Comparison guide
Compression vs routing
Model routing sends simple queries to cheap models and complex queries to expensive models. Compression reduces input size for all models. They are complementary strategies.
How routing works
Model routing classifies each query by complexity. Simple queries go to GPT-4o-mini ($0.15 per 1M input tokens) and complex queries go to GPT-4o ($2.50 per 1M input tokens). The savings come from the ~70% of queries that are simple, which are billed at the lower rate.
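As a rough illustration of the classification step, here is a minimal sketch of a heuristic router. The keyword list, the 30-word threshold, and the function names `is_complex` and `route` are assumptions made for this example, not a recommended or production rule set.

```python
# Illustrative routing sketch; the heuristic and thresholds are assumptions.
CHEAP_MODEL = "gpt-4o-mini"   # $0.15 per 1M tokens (price quoted in this guide)
STRONG_MODEL = "gpt-4o"       # $2.50 per 1M tokens (price quoted in this guide)

# Hypothetical signals that a query needs multi-step reasoning.
COMPLEX_HINTS = ("why", "compare", "explain", "analyze", "step by step", "trade-off")


def is_complex(query: str, max_simple_words: int = 30) -> bool:
    """Crude complexity check: long queries or reasoning keywords count as complex."""
    text = query.lower()
    if len(text.split()) > max_simple_words:
        return True
    return any(hint in text for hint in COMPLEX_HINTS)


def route(query: str) -> str:
    """Return the name of the model that should handle this query."""
    return STRONG_MODEL if is_complex(query) else CHEAP_MODEL
```

A small ML classifier could replace `is_complex` without changing `route`; the heuristic is only the cheapest place to start.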
Combining with compression
Add compression to both paths: compress the context before sending it to GPT-4o-mini for simple queries and to GPT-4o for complex queries. The two savings stack: routing lowers the price you pay per token, and compression lowers the number of tokens you send.
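The combined pipeline might look like the sketch below. Both `route` and `compress` are hypothetical placeholders: the router is a simple length check, and the compressor just keeps context sentences that share a word with the query. Neither is meant to represent how any particular product works; the point is the order of operations.

```python
import re


def route(query: str) -> str:
    """Placeholder router: short queries go to the cheap model (assumed rule)."""
    return "gpt-4o-mini" if len(query.split()) <= 30 else "gpt-4o"


def compress(context: str, query: str) -> str:
    """Placeholder compressor: keep sentences that share a word with the query.

    A stand-in for a real compression step, not an actual product algorithm.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    kept = [s for s in sentences if query_words & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)


def build_request(query: str, context: str) -> dict:
    """Route on the original query first, then compress the context for either path."""
    model = route(query)                # the router sees the uncompressed query
    smaller = compress(context, query)  # the same compression step runs on both paths
    return {"model": model, "prompt": f"{smaller}\n\nQuestion: {query}"}
```

Because routing runs before compression, the routing decision does not depend on what the compressor removes.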
Total savings potential
| Strategy | Cost/Query | Savings |
|---|---|---|
| No optimization | $0.0100 | 0% |
| Routing only | $0.0039 | 61% |
| Compression only | $0.0035 | 65% |
| Routing + Compression | $0.0014 | 86% |
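The combined row is consistent with the two savings multiplying. The short check below reproduces it from the table's own figures, under the assumption that routing and compression reduce cost independently of each other.

```python
# Figures taken from the table above (cost per query in dollars).
baseline = 0.0100
routing_only = 0.0039
compression_only = 0.0035

# Fraction of the baseline cost that remains after each strategy.
routing_factor = routing_only / baseline          # about 0.39
compression_factor = compression_only / baseline  # about 0.35

# Assumption: the two reductions are independent, so the factors multiply.
combined = baseline * routing_factor * compression_factor
savings = 1 - combined / baseline

print(f"${combined:.4f} per query, {savings:.0%} savings")
# prints "$0.0014 per query, 86% savings"
```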
Frequently asked questions
Do I need a separate routing model?
Yes, you need a separate classification step. It can be a simple heuristic or a small ML model that estimates query complexity before the query is sent to a model.
Does compression affect routing accuracy?
No. Compression happens after routing. The router sees the original query.
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.