LlamaIndex integration guide

LlamaIndex prompt compression

LlamaIndex is a widely used framework for building RAG applications. SuperCompress integrates as a node postprocessor that compresses retrieved context before it reaches the LLM.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

The LlamaIndex compression pattern

In LlamaIndex, node postprocessors run after retrieval and before the LLM prompt is constructed. A SuperCompressPostprocessor intercepts the retrieved nodes and compresses their text against the query before they are assembled into the prompt.

This is the cleanest integration point because compression stays a modular pipeline step and the query engine internals are left untouched.
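To make the ordering concrete, here is a library-free sketch of the three stages: retrieval, postprocessing, prompt assembly. Everything in it is a hypothetical stand-in: Node, retrieve, build_prompt, and toy_compress are illustrative names, and the sentence filter is a toy, not SuperCompress's algorithm or LlamaIndex's internals.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Stand-in for a retrieved chunk (hypothetical, not a LlamaIndex class)."""
    text: str
    score: float


# A postprocessor maps (nodes, query) to nodes, so stages can be chained.
Postprocessor = Callable[[List[Node], str], List[Node]]


def _words(text: str) -> set:
    """Lowercased words with surrounding punctuation removed."""
    return {w.strip("?.,!").lower() for w in text.split()}


def toy_compress(nodes: List[Node], query: str) -> List[Node]:
    """Toy compressor: keep only sentences sharing a longer word with the query."""
    terms = {w for w in _words(query) if len(w) > 3}
    compressed = []
    for node in nodes:
        sentences = [s.strip() for s in node.text.split(". ") if s.strip()]
        kept = [s.rstrip(".") for s in sentences if terms & _words(s)]
        text = (". ".join(kept) + ".") if kept else ""
        compressed.append(Node(text=text, score=node.score))
    return compressed


def retrieve(query: str) -> List[Node]:
    """Stand-in retriever that returns fixed chunks regardless of the query."""
    return [
        Node(
            text="The outage began at 09:14. The cafeteria menu changed on "
                 "Monday. A failed certificate rotation caused it.",
            score=0.91,
        ),
        Node(text="Lunch is served at noon.", score=0.40),
    ]


def build_prompt(
    query: str,
    retrieve_fn: Callable[[str], List[Node]],
    postprocessors: List[Postprocessor],
) -> str:
    nodes = retrieve_fn(query)                 # 1. retrieval
    for postprocess in postprocessors:         # 2. postprocessing, in order
        nodes = postprocess(nodes, query)
    context = "\n".join(n.text for n in nodes if n.text)
    return f"Context:\n{context}\n\nQuestion: {query}"  # 3. prompt assembly


prompt = build_prompt("What caused the outage?", retrieve, [toy_compress])
print(prompt)
```

Because the compressor is just one entry in the postprocessor list, it can be added, removed, or reordered without touching the retrieval or prompt-assembly code.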

Node postprocessor implementation

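from typing import Any, List, Optional

from llama_index.core.bridge.pydantic import PrivateAttr
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle
from supercompress import Compressor

class SuperCompressPostprocessor(BaseNodePostprocessor):
    '''Compress retrieved node text against the query before LLM generation.'''

    # BaseNodePostprocessor is a pydantic model, so private state is declared explicitly.
    _compressor: Any = PrivateAttr()

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._compressor = Compressor()

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        # Without a query there is nothing to compress against; pass nodes through.
        if query_bundle is None:
            return nodes
        query = query_bundle.query_str
        for node in nodes:
            # Write through the wrapped node; NodeWithScore.text has no setter.
            result = self._compressor.compress(node.node.get_content(), query)
            node.node.set_content(result.compressed_text)
        return nodes

# Usage
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# "data" is a placeholder directory; point this at your own documents.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    node_postprocessors=[SuperCompressPostprocessor()]
)
response = query_engine.query("What caused the outage?")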

Savings with LlamaIndex

For a LlamaIndex query engine retrieving 10 chunks of 500 tokens each (5,000 total tokens), compression reduces the context to ~1,750 tokens, a 65% reduction. At GPT-4o input pricing, the 3,250 tokens saved are worth ~$0.008 per query. At 10,000 queries/day, that is $80/day or ~$29,200/year.
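The arithmetic can be reproduced with a few lines of Python, which also makes it easy to plug in your own chunk counts and traffic. The token and query figures come from the paragraph above; the input price of $2.50 per million tokens is an assumption, chosen because it is consistent with the ~$0.008 figure, so substitute your model's current rate.

```python
# Figures taken from the example above.
chunks = 10
tokens_per_chunk = 500
compressed_tokens = 1_750
queries_per_day = 10_000

# Assumption: input price in USD per million tokens (not stated in the text).
price_per_million = 2.50

original_tokens = chunks * tokens_per_chunk
saved_tokens = original_tokens - compressed_tokens
reduction = saved_tokens / original_tokens

saving_per_query = saved_tokens * price_per_million / 1_000_000
saving_per_day = saving_per_query * queries_per_day
saving_per_year = saving_per_day * 365

print(f"Tokens saved per query: {saved_tokens} ({reduction:.0%} reduction)")
print(f"Saving per query: ${saving_per_query:.6f}")
print(f"Saving per day:   ${saving_per_day:,.2f}")
print(f"Saving per year:  ${saving_per_year:,.2f}")
```

With these inputs the unrounded saving is $0.008125 per query, which gives $81.25/day and about $29,656/year. The $80/day and ~$29,200/year figures come from rounding the per-query saving to $0.008 first.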

Frequently asked questions

Does this work with LlamaIndex's chat engine?

Yes. The postprocessor runs before the LLM call in both query and chat engines.

Will it work with hierarchical retrievers?

Yes. The postprocessor compresses each node independently, so it works with any retriever type.

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
