Comparison guide
SuperCompress vs top-K retrieval
Top-K retrieval is the standard approach in RAG: embed the query, find the K most similar chunks, and send them to the LLM. But similarity is not the same as relevance. SuperCompress scores chunks against the specific question, keeping only what helps answer it.
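As a rough illustration of the retrieval step described above, the sketch below ranks chunks by cosine similarity and keeps the top K. The three-dimensional vectors are made-up toy values, and the helper names `cosine` and `top_k` are hypothetical. A real system would get its vectors from an embedding model and would normally query a vector index rather than scan every chunk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings" (hypothetical values, for illustration only).
chunk_vecs = [
    [0.9, 0.1, 0.0],  # chunk 0: account deletion policy
    [0.1, 0.9, 0.0],  # chunk 1: billing
    [0.7, 0.2, 0.4],  # chunk 2: data retention after deletion
]
query_vec = [1.0, 0.0, 0.1]  # a query that mentions deleting an account

print(top_k(query_vec, chunk_vecs, k=2))  # prints [0, 2]
```

With these toy values the deletion-policy chunk ranks above the data-retention chunk. The ranking reflects how close the vectors are, not which chunk answers the question.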
The top-K limitation
Top-K retrieval ranks chunks by embedding similarity. A chunk about "account deletion policy" can score highly against a query that mentions deleting an account, even when the user is really asking "what happens to my data after deletion?" The top-ranked chunk then contains the policy text but not the data retention specifics the user needs.
SuperCompress takes the retrieved chunks and scores each line against the actual question. Lines containing data retention specifics are kept; lines about deletion procedures are dropped if irrelevant to the current query.
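To make the idea of line-level filtering concrete, here is a deliberately simple stand-in that keeps a line only if it shares enough keywords with the question. This is not SuperCompress's scoring method, which the guide does not describe. The function names, the stopword list, the `min_overlap` threshold, and the sample policy text are all assumptions made for illustration.

```python
import re

# Small hypothetical stopword list; a real filter would use something more robust.
STOPWORDS = {"a", "an", "the", "to", "of", "my", "is", "what", "after",
             "in", "for", "and", "are", "be", "can", "your", "you"}

def tokens(text):
    """Lowercased content words of a string, as a set."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def keep_relevant_lines(context, question, min_overlap=2):
    """Keep lines sharing at least min_overlap content words with the question."""
    q = tokens(question)
    kept = [line for line in context.splitlines()
            if len(tokens(line) & q) >= min_overlap]
    return "\n".join(kept)

# Made-up policy text for the example.
context = (
    "To delete your account, open Settings and choose Delete account.\n"
    "Deletion requests are processed within 24 hours.\n"
    "After deletion, your data is retained in backups for 30 days.\n"
)
question = "What happens to my data after deletion?"

print(keep_relevant_lines(context, question))
# prints: After deletion, your data is retained in backups for 30 days.
```

Keyword overlap is only a toy relevance signal, but it shows the shape of the step: the procedure lines are dropped and the retention line survives.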
Combined pipeline
from supercompress import Compressor

comp = Compressor()

def rag_topk_compress(query, retriever, llm, k=15):
    # Step 1: Retrieve the top-K chunks (retrieve broadly)
    chunks = retriever.retrieve(query, k=k)
    context = "\n\n".join(c.text for c in chunks)

    # Step 2: Compress the context against the actual question
    result = comp.compress(context, query)

    # Step 3: Answer from the compressed context only
    # (llm is passed in; any client exposing generate(query, context) works)
    return llm.generate(query, result.compressed_text)
Quality improvement
In benchmarks, combining top-K retrieval with SuperCompress improves answer accuracy by 12-18% compared to top-K alone. The reason: the LLM receives less noise and can focus on the lines that matter for the question.
Frequently asked questions
Does this replace top-K retrieval?
No. Compression complements retrieval. Retrieve broadly (higher K), then compress precisely.
What K value works best with compression?
A higher K (15-20) with compression often outperforms a lower K (5-10) without it.
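One way to check which K works best on your own data is a small harness that runs each configuration over a set of question and answer pairs and reports accuracy. The sketch below is an assumption-heavy outline: `accuracy` and `compare` are hypothetical helpers, substring matching is a crude metric, and the two pipelines are canned stand-ins whose scores mean nothing. In practice each entry would wrap a real pipeline call with a different K and with compression on or off.

```python
def accuracy(answer_fn, dataset):
    """Fraction of questions whose answer contains the expected string."""
    if not dataset:
        return 0.0
    hits = sum(
        1 for question, expected in dataset
        if expected.lower() in answer_fn(question).lower()
    )
    return hits / len(dataset)

def compare(configs, dataset):
    """Map each configuration name to its accuracy on the dataset."""
    return {name: accuracy(fn, dataset) for name, fn in configs.items()}

# Made-up evaluation set of (question, expected substring) pairs.
dataset = [
    ("How long are backups kept?", "30 days"),
    ("Who can delete an account?", "the owner"),
]

# Canned stand-ins, NOT real pipelines; replace with calls into your RAG code.
configs = {
    "baseline": lambda q: "Backups are kept for 30 days.",
    "candidate": lambda q: "30 days" if "backups" in q else "Only the owner can.",
}

print(compare(configs, dataset))  # prints {'baseline': 0.5, 'candidate': 1.0}
```

Running the same dataset through every configuration keeps the comparison fair, since only K and the compression step change between runs.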
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.