Comparison guide

Compression vs routing

Model routing sends simple queries to cheap models and complex queries to expensive models. Compression reduces input size for all models. They are complementary strategies.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

How routing works

Model routing classifies each query by complexity. Simple queries go to GPT-4o-mini ($0.15 per 1M input tokens) and complex queries go to GPT-4o ($2.50 per 1M input tokens). This saves money on the roughly 70% of queries that are simple.
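A minimal sketch of the routing step, assuming a toy keyword-and-length heuristic as the classifier. The function names, the 200-character threshold, and the hint words are illustrative assumptions, not part of any particular router; only the two model names come from the text above.

```python
# Hypothetical heuristic router: names, threshold and hints are illustrative.
CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

# Words that (by assumption) suggest the query needs multi-step reasoning.
COMPLEX_HINTS = ("why", "compare", "explain", "analyze", "step by step")


def is_complex(query: str, max_simple_chars: int = 200) -> bool:
    """Return True when the query looks complex under this toy heuristic."""
    text = query.lower()
    if len(text) > max_simple_chars:
        return True
    return any(hint in text for hint in COMPLEX_HINTS)


def route(query: str) -> str:
    """Pick a model name for the query."""
    return EXPENSIVE_MODEL if is_complex(query) else CHEAP_MODEL
```

In practice the heuristic could be swapped for a small trained classifier without changing the shape of route().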

Combining with compression

Add compression to both paths: compress the context before sending it to GPT-4o-mini for simple queries, and before sending it to GPT-4o for complex queries. The two savings stack: routing lowers the price you pay per token, and compression lowers the number of tokens you send.
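The combined pipeline can be sketched as follows. Both helpers are placeholders written for illustration: pick_model stands in for whatever router is used, and compress_context is a naive stand-in that only drops blank and duplicate lines. It is not the SuperCompress API.

```python
def pick_model(query: str) -> str:
    # Placeholder router: long queries are treated as complex (assumption).
    return "gpt-4o" if len(query) > 200 else "gpt-4o-mini"


def compress_context(context: str) -> str:
    # Placeholder compressor: drops blank lines and exact duplicate lines.
    # A real compressor would do far more; this only marks where it runs.
    seen = set()
    kept = []
    for line in context.splitlines():
        stripped = line.strip()
        if stripped and stripped not in seen:
            seen.add(stripped)
            kept.append(stripped)
    return "\n".join(kept)


def build_request(query: str, context: str) -> dict:
    model = pick_model(query)            # routing sees the original query
    smaller = compress_context(context)  # compression runs on both paths
    return {"model": model, "context": smaller, "query": query}
```

Routing runs first and compression second, so the model choice never depends on how the context was shortened.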

Total savings potential

Strategy                 Cost/Query   Savings
No optimization          $0.0100      0%
Routing only             $0.0039      61%
Compression only         $0.0035      65%
Routing + Compression    $0.0014      86%
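The table does not say how the combined figure was derived. One reading that reproduces it is that the two savings compound, with each applied to the cost left over by the other. The sketch below checks that arithmetic; the compounding model is an assumption, and the input numbers are taken from the table.

```python
BASELINE_COST = 0.0100      # $ per query, "No optimization" row
ROUTING_SAVINGS = 0.61      # "Routing only" row
COMPRESSION_SAVINGS = 0.65  # "Compression only" row


def combined_cost(baseline, *savings):
    """Apply each savings fraction to what is left after the previous one."""
    cost = baseline
    for s in savings:
        cost *= (1.0 - s)
    return cost


cost = combined_cost(BASELINE_COST, ROUTING_SAVINGS, COMPRESSION_SAVINGS)
total_savings = 1.0 - cost / BASELINE_COST
print(f"${cost:.4f} per query, {total_savings:.0%} savings")
# prints: $0.0014 per query, 86% savings
```

Under this assumption the combined cost is 0.0100 x 0.39 x 0.35 = 0.001365, which rounds to the $0.0014 and 86% shown in the last row.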

Frequently asked questions

Do I need a separate routing model?

Yes. Use a classifier, either a simple heuristic or a small ML model, to determine query complexity.

Does compression affect routing accuracy?

No. Compression happens after routing. The router sees the original query.

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
