How does SuperCompress compare to truncation?

Truncation keeps only the head and tail of the context and drops the middle. If the answer-critical line sits in the middle, truncation loses it. SuperCompress compiler mode scores semantic blocks against the query, removes low-value context, and reports how much of the important context was kept.
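A toy illustration of the difference, in Python. Word overlap is only a hypothetical stand-in for SuperCompress's learned scoring, and the sketch works on lines rather than semantic blocks. `truncate_head_tail` and `keep_by_overlap` are illustrative names, not part of the SuperCompress library.

```python
def truncate_head_tail(lines: list[str], keep: int) -> list[str]:
    """Keep the first and last lines of the context, dropping the middle."""
    if keep >= len(lines):
        return list(lines)
    head = keep // 2
    tail = keep - head
    # Slice from len(lines) - tail so that tail == 0 yields an empty list.
    return lines[:head] + lines[len(lines) - tail:]


def keep_by_overlap(lines: list[str], query: str, keep: int) -> list[str]:
    """Hypothetical stand-in scorer: rank lines by word overlap with the query,
    keep the top `keep` lines, and return them in their original order."""
    query_words = set(query.lower().split())

    def score(i: int) -> int:
        return len(query_words & set(lines[i].lower().split()))

    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(range(len(lines)), key=score, reverse=True)
    return [lines[i] for i in sorted(ranked[:keep])]


log = [
    "session started",
    "loaded config",
    "deploy failed: database password expired",
    "retrying",
    "session ended",
]
question = "why did the deploy fail"
print(truncate_head_tail(log, 2))  # ['session started', 'session ended']
print(keep_by_overlap(log, question, 2))  # ['session started', 'deploy failed: database password expired']
```

With the same two-line allowance, truncation keeps the first and last lines and loses the failure message, while the query-aware selection keeps it regardless of where it appears.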

How many tokens can SuperCompress save?

Compiler mode removes as many tokens as it can while preserving answer-critical evidence. On the current bundled long-context presets, it averages 82.5% token savings.

Is SuperCompress free?

Yes. SuperCompress is open source under the MIT License. The Python library and browser demo are free. The hosted API offers a free tier with 100K tokens per month.

LLM Context Compression Benchmarks

Compiler-mode savings for real API-call behavior, plus fixed-ratio baselines for research comparison.

Policy Comparison at 35% Token Budget

Policy	Oracle Recall	Entity Recall	Latency	KV-Cache Savings	Model Size
FIFO / Truncation	25%	73%	~57 ms	~65%	0 (rule-based)
Summarization	61%	65%	~63 ms	~65%	LLM call
H2O (Heavy Hitter Oracle)	98%	73%	~56 ms	~65%	attention-based
SuperCompress Best	100%	73%	~60 ms	~65%	~5K params

Legacy fixed-ratio baseline on 8 project seeds. Oracle recall is the share of answer-critical lines preserved; entity recall is the share of named entities retained.
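The legacy table compares policies at a fixed 35% token budget. The sketch below shows one simple way such a budget can be enforced; it is not the benchmark harness. Given per-line scores from any policy (the values here are illustrative), it greedily keeps the highest-scoring lines that fit under 35% of the original token count. `select_within_budget` is hypothetical, and whitespace-separated words stand in for real tokenizer tokens.

```python
import math


def select_within_budget(
    lines: list[str], scores: list[float], ratio: float = 0.35
) -> tuple[list[str], int, int]:
    """Greedy fixed-ratio selection: keep the highest-scoring lines whose
    combined token count fits within `ratio` of the original token count.

    Tokens are approximated by whitespace-separated words (an assumption;
    a real benchmark would use the model's tokenizer).
    Returns (kept lines in original order, tokens used, token budget).
    """
    counts = [len(line.split()) for line in lines]
    budget = math.floor(ratio * sum(counts))
    by_score = sorted(range(len(lines)), key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    used = 0
    for i in by_score:
        # Skip lines that would overflow the budget, but keep trying smaller ones.
        if used + counts[i] <= budget:
            kept.append(i)
            used += counts[i]
    return [lines[i] for i in sorted(kept)], used, budget


lines = ["a b c d", "e f", "g h i j k l", "m n"]
scores = [0.1, 0.9, 0.2, 0.8]
print(select_within_budget(lines, scores))  # (['e f', 'm n'], 4, 4)
```

Unlike compiler mode, the amount removed here is fixed in advance by `ratio`, whether or not the dropped lines mattered.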

Compiler Mode — Real API-Call Behavior

Compiler mode does not ask users for a budget. It removes as many tokens as it can while preserving query-critical evidence, then returns verifier metadata: how much important context was kept, the verifier risk, and the lists of kept and dropped blocks.

Context Type	Original Tokens	After Compression	Tokens Removed	Savings
To Kill a Mockingbird study context	1,454	611	843	58.0%
Long coding session log	1,020	76	944	92.5%
Markdown documentation	1,195	85	1,110	92.9%
Agent incident log	1,074	56	1,018	94.8%
Average	1,186	207	979	82.5%
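The Average row's 82.5% is token-weighted: it equals average tokens removed divided by average original tokens (979 / 1,186), which is the same as total removed over total original. The unweighted mean of the four per-context percentages is slightly higher, because the context with the lowest savings (the study context) is also the longest and so carries more weight in the pooled figure. The short check below recomputes both numbers from the table.

```python
# (original tokens, tokens after compression) from the table above
rows = {
    "To Kill a Mockingbird study context": (1454, 611),
    "Long coding session log": (1020, 76),
    "Markdown documentation": (1195, 85),
    "Agent incident log": (1074, 56),
}

total_before = sum(before for before, _ in rows.values())
total_after = sum(after for _, after in rows.values())

# Token-weighted savings: total removed / total original.
pooled = 1 - total_after / total_before
# Unweighted mean of the four per-context savings fractions.
mean_of_rows = sum(1 - after / before for before, after in rows.values()) / len(rows)

print(f"pooled savings: {pooled:.1%}")  # pooled savings: 82.5%
print(f"mean of per-context savings: {mean_of_rows:.1%}")  # mean of per-context savings: 84.5%
```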

Visual Benchmarks

Figure: Legacy fixed-ratio oracle recall. SuperCompress 100%, H2O about 98%, FIFO and truncation about 25%.

Figure: Compiler-mode token savings on real-world long-context presets.

Environmental Impact at Scale

Based on the documented SuperCompress assumptions: 2,500 tokens per GPU-second, a 150 W GPU, a 55% KV share, and 0.417 kg CO₂ per kWh of grid electricity.

Scale	Tokens Avoided	kWh Saved	CO₂ Avoided	Water Saved (est.)
1 model call	~800	~0.00003	~0.00001 kg	~0.0001 L
1,000 calls	~800K	~0.03	~0.01 kg	~0.1 L
1M calls	~800M	~29	~12 kg	~100 L
10M calls	~8B	~290	~120 kg	~1,000 L
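The CO₂ column follows directly from the kWh column and the stated grid intensity of 0.417 kg CO₂/kWh. In the sketch below the kWh values are taken from the table as inputs rather than derived; the full energy methodology lives in the Environment guide. `co2_avoided_kg` is a hypothetical helper.

```python
GRID_INTENSITY_KG_PER_KWH = 0.417  # stated assumption: kg CO2 per kWh


def co2_avoided_kg(kwh_saved: float) -> float:
    """CO2 avoided is energy saved multiplied by grid carbon intensity."""
    return kwh_saved * GRID_INTENSITY_KG_PER_KWH


# kWh values copied from the table; they are inputs here, not derived.
for calls, kwh in [(1_000_000, 29), (10_000_000, 290)]:
    print(f"{calls:,} calls: ~{co2_avoided_kg(kwh):.1f} kg CO2")
```

This prints about 12.1 kg and 120.9 kg, which the table rounds to ~12 kg and ~120 kg.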

Full methodology: Environment guide.

Frequently Asked Questions

What is oracle recall?

Oracle recall measures the fraction of lines containing the answer to a specific question that survive compression. 100% oracle recall means every answer-critical line is kept. It is the most important quality metric for context compression.
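In code, oracle recall is a set ratio over line positions. A minimal sketch, assuming the answer-critical lines are known in advance as a set of line indices (as they are in a labeled benchmark); `oracle_recall` is a hypothetical helper, not a SuperCompress API.

```python
def oracle_recall(kept: set[int], oracle: set[int]) -> float:
    """Fraction of answer-critical line indices (`oracle`) present in `kept`.

    Returns 1.0 when there are no answer-critical lines (nothing to lose);
    this empty-set convention is an assumption of the sketch.
    """
    if not oracle:
        return 1.0
    return len(oracle & kept) / len(oracle)


# Four answer-critical lines; compression kept three of them.
print(oracle_recall(kept={0, 2, 5, 7, 9}, oracle={2, 5, 7, 8}))  # 0.75
```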

How is SuperCompress different from truncation?

Truncation keeps only the head and tail of the context, dropping the middle. If the answer-critical line sits in the middle (which it often does), truncation loses it. SuperCompress scores every line against the question and keeps only the most relevant ones, regardless of position.

Does SuperCompress require GPU or extra LLM calls?

No. SuperCompress runs entirely on CPU: it is a small learned policy (~5K parameters) that runs before the language model, with ~60 ms latency on the benchmark seeds. It needs no GPU time and no extra LLM calls.

Why is there no compression budget?

Compiler mode is designed for individual API calls: it removes as much as it safely can, then reports tokens saved, important context kept, and verifier risk. Fixed-ratio mode remains only for legacy comparisons.
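A budget-free policy can be sketched as a score threshold: every block at or above the threshold is kept, everything else is dropped, and savings are reported afterward instead of being fixed in advance. This is a hypothetical illustration, not the SuperCompress compiler. The scores, the `threshold` parameter, and the `CompileReport` fields are assumptions, whitespace words stand in for tokens, and the verifier risk is not modeled because its computation is not described here.

```python
from dataclasses import dataclass


@dataclass
class CompileReport:
    """Hypothetical report: what was kept, what was dropped, and token counts."""
    kept: list[str]
    dropped: list[str]
    original_tokens: int
    kept_tokens: int

    @property
    def savings(self) -> float:
        """Fraction of original tokens removed."""
        if self.original_tokens == 0:
            return 0.0
        return 1 - self.kept_tokens / self.original_tokens


def compile_blocks(blocks: list[str], scores: list[float], threshold: float = 0.5) -> CompileReport:
    """Keep every block whose relevance score meets `threshold`; drop the rest.
    No token budget is involved: how much is removed depends only on the scores."""
    def ntok(text: str) -> int:
        return len(text.split())  # whitespace words approximate tokens

    kept = [b for b, s in zip(blocks, scores) if s >= threshold]
    dropped = [b for b, s in zip(blocks, scores) if s < threshold]
    return CompileReport(
        kept=kept,
        dropped=dropped,
        original_tokens=sum(ntok(b) for b in blocks),
        kept_tokens=sum(ntok(b) for b in kept),
    )


report = compile_blocks(
    ["intro text here now", "the answer is 42", "unrelated footer text"],
    scores=[0.1, 0.9, 0.2],
)
print(report.kept, f"{report.savings:.0%}")  # ['the answer is 42'] 64%
```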

Can I run SuperCompress with any LLM?

Yes. SuperCompress is model-agnostic. It compresses the context before sending it to the language model, so it works with OpenAI, Anthropic, open-weight models, or any LLM that accepts text input.
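Because compression is a text-to-text step that runs before the model call, integrating it amounts to a function boundary. A sketch, assuming both the compressor and the model client are passed in as plain callables; `answer_with_compression`, `keyword_compress`, `echo_llm`, and the prompt layout are hypothetical placeholders, not SuperCompress or provider APIs.

```python
from typing import Callable


def answer_with_compression(
    context: str,
    question: str,
    compress: Callable[[str, str], str],
    llm: Callable[[str], str],
) -> str:
    """Compress the context for this question, then call any text-in/text-out model."""
    compressed = compress(context, question)
    prompt = f"{compressed}\n\nQuestion: {question}"
    return llm(prompt)


# Placeholder components: swap in a real compressor and a real model client.
def keyword_compress(context: str, question: str) -> str:
    """Toy compressor: keep lines sharing at least one word with the question."""
    words = set(question.lower().split())
    return "\n".join(
        line for line in context.splitlines() if words & set(line.lower().split())
    )


def echo_llm(prompt: str) -> str:
    return prompt  # stands in for an OpenAI, Anthropic, or local model call


print(answer_with_compression(
    "boot ok\ndeploy failed\nshutdown", "why did deploy fail", keyword_compress, echo_llm
))
```

Because neither function depends on a particular provider, swapping models only changes the `llm` argument.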

Is there a hosted API?

Yes. The SuperCompress hosted API is available at supercompress.dev/api/v1/compress. To get started, create a free API key in the dashboard. The Python client library wraps both local and hosted API modes.

Try SuperCompress on your own context

Paste your long prompts and see exactly how much can be removed while keeping what matters.
