Performance guide
Latency benchmarks for compression
Compression adds roughly 60ms of processing time, but it reduces LLM prefill time in proportion to the compression ratio. For long prompts the prefill savings outweigh the overhead, so end-to-end latency is usually unchanged or lower.
End-to-end latency breakdown
| Scenario | Without Compression | With SuperCompress | Net Change |
|---|---|---|---|
| 4K tokens → 1.4K | ~800ms | ~280ms + 60ms = ~340ms | -460ms |
| 8K tokens → 2.8K | ~1,600ms | ~560ms + 60ms = ~620ms | -980ms |
| 16K tokens → 5.6K | ~3,200ms | ~1,120ms + 60ms = ~1,180ms | -2,020ms |
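
The arithmetic behind the table can be sketched as a simple linear model. This is an illustration, not SuperCompress code: the function names are hypothetical, and the 0.2 ms-per-token prefill rate is an assumption inferred from the uncompressed column (~800ms for 4K tokens). Real prefill cost depends on the model and hardware.

```python
# Rough latency model, assuming prefill time scales linearly with token count.
PREFILL_MS_PER_TOKEN = 0.2       # assumption: ~800 ms / 4,000 tokens from the table
COMPRESSION_OVERHEAD_MS = 60.0   # compression cost quoted in this guide


def latency_without_compression_ms(prompt_tokens: int) -> float:
    """Estimated prefill latency when the full prompt is sent."""
    return prompt_tokens * PREFILL_MS_PER_TOKEN


def latency_with_compression_ms(compressed_tokens: int) -> float:
    """Estimated prefill latency for the compressed prompt plus compression overhead."""
    return compressed_tokens * PREFILL_MS_PER_TOKEN + COMPRESSION_OVERHEAD_MS


def net_change_ms(prompt_tokens: int, compressed_tokens: int) -> float:
    """Negative values mean compression made the request faster."""
    return (latency_with_compression_ms(compressed_tokens)
            - latency_without_compression_ms(prompt_tokens))


if __name__ == "__main__":
    # The 8K -> 2.8K row of the table.
    print(f"{net_change_ms(8_000, 2_800):.0f} ms")  # -980 ms
```

To apply the model to your own workload, replace the two constants with values measured on your deployment.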
Frequently asked questions
Does compression make responses faster?
For prompts over 2,000 tokens, yes. The reduction in LLM prefill time more than offsets the ~60ms compression overhead.
Is compression worth it for short prompts?
For prompts under 500 tokens, usually not. The ~60ms overhead can exceed the prefill time saved, so compression may add latency without meaningful savings.
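
One way to see where a threshold of this size comes from is to estimate the break-even prompt length, the point at which the prefill time saved equals the compression overhead. The sketch below is hypothetical and rests on three assumptions: prefill costs about 0.2 ms per token (inferred from the ~800ms for 4K tokens figure), compression keeps about 35% of the tokens (the ratio in the benchmark rows, e.g. 4K → 1.4K), and the overhead is a flat 60ms.

```python
def break_even_tokens(overhead_ms: float,
                      prefill_ms_per_token: float,
                      kept_fraction: float) -> float:
    """Prompt length at which prefill savings equal the compression overhead.

    Saved time for an N-token prompt is N * (1 - kept_fraction) * prefill_ms_per_token,
    so the break-even N solves: saved time == overhead_ms.
    """
    if not 0.0 <= kept_fraction < 1.0:
        raise ValueError("kept_fraction must be in [0, 1)")
    if prefill_ms_per_token <= 0:
        raise ValueError("prefill_ms_per_token must be positive")
    return overhead_ms / (prefill_ms_per_token * (1.0 - kept_fraction))


if __name__ == "__main__":
    # Assumed values: 60 ms overhead, 0.2 ms/token prefill, 35% of tokens kept.
    n = break_even_tokens(60.0, 0.2, 0.35)
    print(f"break-even at roughly {n:.0f} tokens")
```

Under these assumptions the break-even lands a little below 500 tokens. Slower prefill or more aggressive compression moves it lower, and faster prefill moves it higher.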
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.