Best practices
When to use prompt compression
Prompt compression is not always the right answer. Here is a decision framework to help you determine when to compress and when to skip.
Compress when:
- Context exceeds 1,000 tokens — Below this, the savings may not justify the overhead
- Context contains irrelevant information — For example, a standards table that carries update data for many questions, when any given question touches only a few of those standards
- Cost is a primary concern — If you are tracking LLM spending, compression is often one of your highest-ROI optimizations
- Latency matters — Compressing large contexts reduces LLM prefill time, often netting faster responses
Skip when:
- Context is under 500 tokens — Minimal savings, and the compression overhead may not be worth it
- Every token matters — Some critical applications need every line of context available, regardless of relevance
- You are debugging prompt quality — Compression adds a variable that complicates debugging
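The two lists above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name and flags are hypothetical, the 1,000 and 500 token thresholds come from the lists, and the handling of the 500 to 1,000 token range (which the lists do not cover) is a guess.

```python
# Thresholds taken from the "Compress when" and "Skip when" lists.
COMPRESS_ABOVE_TOKENS = 1_000
SKIP_BELOW_TOKENS = 500


def should_compress(
    token_count: int,
    *,
    has_irrelevant_content: bool = False,
    cost_or_latency_sensitive: bool = False,
    every_token_matters: bool = False,
    debugging_prompt_quality: bool = False,
) -> bool:
    """Hypothetical helper that encodes the decision framework above."""
    # Skip conditions take priority over any expected savings.
    if every_token_matters or debugging_prompt_quality:
        return False
    if token_count < SKIP_BELOW_TOKENS:
        return False
    # Large contexts are the clear case for compression.
    if token_count > COMPRESS_ABOVE_TOKENS:
        return True
    # Assumption: between 500 and 1,000 tokens, compress only when
    # there is a concrete reason such as noise, cost, or latency.
    return has_irrelevant_content or cost_or_latency_sensitive


if __name__ == "__main__":
    print(should_compress(4_000))                              # large context
    print(should_compress(300, has_irrelevant_content=True))   # too small
    print(should_compress(800, cost_or_latency_sensitive=True))
```

How you count tokens is left to the caller, since it depends on the tokenizer of the model you use.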
Frequently asked questions
Should I always compress RAG contexts?
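In most cases, yes. RAG contexts are typically the highest-ROI target for compression because of their size and noise level. The skip conditions above still apply, for example when the retrieved context is under 500 tokens or every token matters.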
Should I compress few-shot examples?
Not usually. Few-shot examples are carefully chosen and typically small. Compression is more valuable on large context blocks.
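One way to apply this is to compress only the large context block and pass the few-shot examples through unchanged. The sketch below assumes a caller-supplied `compress(context, question)` function. `build_prompt` and `keep_matching_lines` are hypothetical names, and the keyword filter is a toy stand-in for demonstration, not how SuperCompress works.

```python
from typing import Callable, Sequence


def build_prompt(
    few_shot_examples: Sequence[str],
    context: str,
    question: str,
    compress: Callable[[str, str], str],
) -> str:
    """Assemble a prompt, compressing only the context block."""
    compressed_context = compress(context, question)
    # Few-shot examples are kept verbatim; only the context shrinks.
    parts = [*few_shot_examples, compressed_context, question]
    return "\n\n".join(parts)


def keep_matching_lines(context: str, question: str) -> str:
    """Toy stand-in compressor: keep lines that share a keyword with the question."""
    words = (w.lower().strip("?.,!") for w in question.split())
    keywords = {w for w in words if len(w) > 3}
    kept = [
        line
        for line in context.splitlines()
        if any(k in line.lower() for k in keywords)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    prompt = build_prompt(
        few_shot_examples=["Q: 2+2?\nA: 4"],
        context="The refund window is 30 days.\nOur office is in Berlin.",
        question="What is the refund window?",
        compress=keep_matching_lines,
    )
    print(prompt)
```

Passing the compressor in as a parameter keeps the prompt assembly independent of any particular compression tool.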
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.