Prompt compression guide
Prompt compression that keeps the answer
Prompt compression is a practical way to shrink oversized LLM requests before they reach an expensive model. SuperCompress scores context against the current question and keeps the lines most likely to matter.
Why prompt compression matters
Every LLM call processes your full prompt, including context that may be irrelevant to the question. In agent loops, RAG pipelines, and coding assistants, context accumulates rapidly.
Without prompt compression, you pay for every token regardless of relevance. A typical RAG query sending 4,000 tokens may need only 400 of them to answer the question, which means roughly 90% of the input is paid for but never used.
How prompt compression works
SuperCompress uses a learned ~5K-parameter policy that scores every block of context against your question, then keeps only the blocks most likely to contain answer-relevant information.
from supercompress import Compressor

# long_context: the oversized text (logs, retrieved documents, chat history) as a string
compressor = Compressor()
result = compressor.compress(
    context=long_context,
    query="What caused the failed deployment?",
)
print(f"Removed {result.tokens_removed} tokens")
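To make the score-then-select idea concrete, here is a minimal standalone sketch. It is not SuperCompress's learned policy: the scorer is plain word overlap with the query, and the names `tokenize`, `select_lines`, and `keep` are hypothetical, chosen only for illustration. What it shares with the approach described above is the shape of the method: score each piece of context against the question, then keep the top pieces verbatim and in their original order.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_lines(context: str, query: str, keep: int = 2) -> str:
    """Keep the `keep` lines that share the most words with the query."""
    lines = [ln for ln in context.splitlines() if ln.strip()]
    query_words = tokenize(query)
    # Hypothetical scorer: count of words shared with the query.
    # A learned policy would replace this line.
    scored = [(len(tokenize(ln) & query_words), i) for i, ln in enumerate(lines)]
    # Highest score first; ties go to the earlier line.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:keep]
    # Emit the survivors unchanged and in their original order.
    kept_indices = sorted(i for _, i in top)
    return "\n".join(lines[i] for i in kept_indices)


context = """\
09:00 Nightly backup completed.
09:14 Deployment of build 512 started.
09:15 Deployment failed: database migration timed out.
09:20 Cafeteria menu updated.
"""
query = "What caused the failed deployment?"
print(select_lines(context, query))
```

With this toy input, the two deployment lines survive and the backup and cafeteria lines are dropped. Because the kept lines are copied rather than rewritten, nothing in the output is paraphrased.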
Prompt compression vs summarization
| Factor | Prompt Compression | Summarization |
|---|---|---|
| Method | Selects original lines | Rewrites with LLM |
| Extra cost | ~60 ms on CPU | Full LLM call |
| Oracle recall | 100% | ~61% |
Cost savings from prompt compression
| Scenario | Daily Calls | GPT-4o Monthly Savings |
|---|---|---|
| Small agent | 100 | ~$20 |
| Medium RAG app | 1,000 | ~$295 |
| Enterprise | 500,000 | ~$98,500 |
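For a rough estimate with your own traffic, the arithmetic behind figures like these can be sketched as below. The page does not state the prompt sizes or the token price behind the table, so this will not reproduce its rows. The function name `monthly_savings_usd`, the 30-day month, and the price of $2.50 per million input tokens are all assumptions to be replaced with your own numbers.

```python
def monthly_savings_usd(
    daily_calls: int,
    tokens_before: int,
    tokens_after: int,
    usd_per_million_input_tokens: float,
    days_per_month: int = 30,
) -> float:
    """Estimated monthly input-token savings in USD."""
    tokens_saved_per_call = tokens_before - tokens_after
    tokens_saved_per_month = tokens_saved_per_call * daily_calls * days_per_month
    return tokens_saved_per_month / 1_000_000 * usd_per_million_input_tokens


# Illustrative inputs only: 1,000 calls per day, the 4,000 -> 400 token
# example from earlier on this page, and an assumed input price of
# $2.50 per million tokens.
print(f"${monthly_savings_usd(1_000, 4_000, 400, 2.50):,.2f}")
```

This counts input tokens only. Output tokens, caching discounts, and any cost of running the compressor itself are left out, so treat the result as an upper-level estimate rather than a bill.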
Frequently asked questions
Is prompt compression the same as summarization?
No. Summarization rewrites the prompt using another LLM call, changing wording and potentially losing facts. Query-aware compression selects original lines that matter for the current question.
How much can prompt compression save?
SuperCompress averages 82.5% savings on bundled long-context presets.
Does prompt compression add latency?
SuperCompress adds ~60 ms on CPU, which is often offset by shorter GPU prefill time on the smaller prompt.
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.