LlamaIndex integration guide
LlamaIndex prompt compression
LlamaIndex is a widely used framework for building retrieval-augmented generation (RAG) applications. SuperCompress integrates as a node postprocessor that compresses retrieved context before it reaches the LLM.
The LlamaIndex compression pattern
In LlamaIndex, node postprocessors run after retrieval and before prompt construction. A SuperCompressPostprocessor intercepts the retrieved nodes and compresses their text against the query before they are assembled into the LLM prompt.
This is the cleanest integration point because it keeps compression as a modular pipeline step rather than modifying the query engine internals.
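The stage ordering described above can be sketched without any LlamaIndex dependency. In this minimal illustration, `Node`, `retrieve`, `compress_against_query`, `postprocess`, and `build_prompt` are hypothetical stand-ins, not LlamaIndex or SuperCompress APIs, and the "compression" is a naive keyword filter used only to make the data flow visible.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a retrieved chunk; not a LlamaIndex class."""
    text: str


def retrieve(query: str) -> List[Node]:
    # Stage 1: retrieval. A real retriever would search an index.
    return [
        Node("The outage began at 09:00. Lunch was served at noon."),
        Node("A bad config push caused the outage. The office has a new logo."),
    ]


def _keywords(text: str) -> set:
    # Lowercased words longer than three characters, punctuation stripped.
    words = (w.strip("?.,").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def compress_against_query(text: str, query: str) -> str:
    # Toy stand-in for query-aware compression: keep only the sentences
    # that share a keyword with the query.
    query_words = _keywords(query)
    kept = [
        sentence.rstrip(".")
        for sentence in text.split(". ")
        if _keywords(sentence) & query_words
    ]
    return ". ".join(kept) + "." if kept else ""


def postprocess(nodes: List[Node], query: str) -> List[Node]:
    # Stage 2: runs after retrieval and before prompt construction.
    for node in nodes:
        node.text = compress_against_query(node.text, query)
    return nodes


def build_prompt(nodes: List[Node], query: str) -> str:
    # Stage 3: assemble the (now smaller) context into the LLM prompt.
    context = "\n".join(node.text for node in nodes)
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer_prompt(query: str) -> str:
    nodes = retrieve(query)
    nodes = postprocess(nodes, query)
    return build_prompt(nodes, query)


if __name__ == "__main__":
    print(answer_prompt("What caused the outage?"))
```

Because the compression step only reads and rewrites node text, it can be added or removed without touching the retrieval or prompt-assembly stages, which is the modularity the postprocessor hook is meant to provide.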
Node postprocessor implementation
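from llama_index.core.bridge.pydantic import PrivateAttr
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from supercompress import Compressor


class SuperCompressPostprocessor(BaseNodePostprocessor):
    '''Compress retrieved node text against the query before LLM generation.'''

    # BaseNodePostprocessor is a pydantic model, so the compressor is kept
    # as a private attribute rather than a model field.
    _compressor: Compressor = PrivateAttr()

    def __init__(self):
        super().__init__()
        self._compressor = Compressor()

    def _postprocess_nodes(self, nodes, query_bundle=None):
        # Without a query there is nothing to compress against.
        if query_bundle is None:
            return nodes
        query = query_bundle.query_str
        for node in nodes:
            # Each item is a NodeWithScore; read and write text on the wrapped node.
            result = self._compressor.compress(node.node.get_content(), query)
            node.node.set_content(result.compressed_text)
        return nodes


# Usage (assumes `documents` is a list of Document objects loaded earlier)
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    node_postprocessors=[SuperCompressPostprocessor()]
)
response = query_engine.query("What caused the outage?")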
Savings with LlamaIndex
For a LlamaIndex query engine retrieving 10 chunks of 500 tokens each (5,000 tokens total), compression at a 65% reduction brings the context down to ~1,750 tokens, saving about 3,250 input tokens per query. At GPT-4o input pricing, that saves ~$0.008 per query. At 10,000 queries/day, that is ~$80/day or ~$29,200/year.
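The arithmetic above can be reproduced with a short calculation. This is a sketch under assumptions: the input price of $2.50 per million tokens is inferred from the quoted ~$0.008 per-query saving rather than stated in the figures, so check current pricing before relying on it, and `estimate_savings` is a hypothetical helper, not part of SuperCompress.

```python
def estimate_savings(
    chunks: int,
    tokens_per_chunk: int,
    compressed_tokens: int,
    price_per_million_input_tokens: float,
    queries_per_day: int,
) -> dict:
    """Estimate input-token cost savings from compressing retrieved context."""
    original_tokens = chunks * tokens_per_chunk
    tokens_saved = original_tokens - compressed_tokens
    per_query = tokens_saved * price_per_million_input_tokens / 1_000_000
    per_day = per_query * queries_per_day
    return {
        "original_tokens": original_tokens,
        "tokens_saved": tokens_saved,
        "reduction": tokens_saved / original_tokens,
        "per_query_usd": per_query,
        "per_day_usd": per_day,
        "per_year_usd": per_day * 365,
    }


if __name__ == "__main__":
    # Assumed price: 2.50 USD per million input tokens (inferred, not quoted).
    savings = estimate_savings(
        chunks=10,
        tokens_per_chunk=500,
        compressed_tokens=1_750,
        price_per_million_input_tokens=2.50,
        queries_per_day=10_000,
    )
    for name, value in savings.items():
        print(f"{name}: {value}")
```

Under that assumed price, the unrounded figures come out slightly higher than the rounded ones quoted above: about $0.008125 per query, $81.25 per day, and $29,656 per year. Output tokens are not affected by context compression, so they are left out of the estimate.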
Frequently asked questions
Does this work with LlamaIndex's chat engine?
Yes, for chat engines that retrieve context. The postprocessor runs on the retrieved nodes before the LLM call, just as it does in a query engine.
Will it work with hierarchical retrievers?
Yes. The postprocessor compresses each node independently, so it works with any retriever type.
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.