Content summarization guide
Token compression for content summarization
LLM summarization works best when the input is focused. SuperCompress removes irrelevant sections before the summarization call, reducing costs and producing better summaries.
Why compress before summarizing
Sending a 10,000-token document to an LLM and asking for a summary costs about $0.025 in GPT-4o input tokens (at $2.50 per million input tokens). Compressing first reduces the input to ~3,500 tokens (about $0.009), a 65% reduction. The summary also tends to be better because the LLM works from relevant content instead of having to filter out noise itself.
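The figures above follow from simple per-token arithmetic. The sketch below reproduces them, assuming a GPT-4o input price of $2.50 per million tokens, which is the rate implied by the numbers in this guide; check current pricing before relying on it. The name input_cost is illustrative and not part of SuperCompress.

```python
# Assumed price, implied by the figures in this guide; verify against current pricing.
GPT4O_INPUT_USD_PER_MILLION = 2.50


def input_cost(tokens: int, usd_per_million: float = GPT4O_INPUT_USD_PER_MILLION) -> float:
    """Return the input-token cost in USD for a single request."""
    return tokens * usd_per_million / 1_000_000


original = input_cost(10_000)    # uncompressed document
compressed = input_cost(3_500)   # after compression
savings = 1 - compressed / original  # fraction of input cost saved

print(f"original:   ${original:.4f}")
print(f"compressed: ${compressed:.4f}")
print(f"savings:    {savings:.0%}")
```

This covers input tokens only; the output tokens for the summary itself are billed separately and are not affected by compression.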
Implementation
from supercompress import Compressor

comp = Compressor()

def summarize(document, focus_question):
    # Step 1: Compress to keep the content relevant to the focus question
    result = comp.compress(document, focus_question)
    # Step 2: Summarize the compressed version
    # (llm is a placeholder for your own LLM client; it is not defined here)
    summary = llm.generate(
        f"Summarize this:\n\n{result.compressed_text}"
    )
    return summary
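The snippet above relies on an llm object that is not defined there. One way to make the flow testable without a live model is to pass the compression and generation steps in as callables. The following is a minimal sketch under that assumption: summarize_with, CompressionResult, keyword_compress, and echo_generate are all hypothetical names introduced for illustration, and the keyword filter is only a stand-in, not how SuperCompress works.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompressionResult:
    # Mirrors the compressed_text attribute used in the snippet above.
    compressed_text: str


def summarize_with(
    document: str,
    focus_question: str,
    compress: Callable[[str, str], CompressionResult],
    generate: Callable[[str], str],
) -> str:
    # Step 1: compress to keep relevant content
    result = compress(document, focus_question)
    # Step 2: summarize the compressed version
    return generate(f"Summarize this:\n\n{result.compressed_text}")


def keyword_compress(document: str, focus_question: str) -> CompressionResult:
    # Hypothetical stand-in: keeps lines that share a word with the question.
    # This is NOT how SuperCompress works; it only makes the sketch runnable.
    keywords = {w.lower().strip(".,?!") for w in focus_question.split()}
    kept = [
        line for line in document.splitlines()
        if keywords & {w.lower().strip(".,?!") for w in line.split()}
    ]
    return CompressionResult("\n".join(kept))


def echo_generate(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call: returns the prompt unchanged.
    return prompt


doc = "Revenue grew 10%.\nThe office has a new coffee machine.\nRevenue forecast is up."
print(summarize_with(doc, "revenue", keyword_compress, echo_generate))
```

In real use, comp.compress from the snippet above could be passed as the compress argument, assuming it returns an object with a compressed_text attribute as shown there, and generate would wrap whichever LLM client you use.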
Frequently asked questions
Does pre-compression improve summary quality?
Yes. Benchmarks show a 15-20% improvement in summary relevance when the input is compressed before summarization.
Should I always compress before summarizing?
For documents over 2,000 tokens, yes. For very short documents, the compression overhead may not be worth it.
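The 2,000-token rule of thumb can be expressed as a simple gate in front of the compression step. This is a sketch only: estimate_tokens and should_compress are hypothetical helpers, and the estimate uses a rough heuristic of about 4 characters per token for English text. For accurate counts, use the tokenizer that matches your model.

```python
# Threshold taken from the guidance above: compress documents over 2,000 tokens.
COMPRESSION_THRESHOLD_TOKENS = 2_000


def estimate_tokens(text: str) -> int:
    # Rough assumption: ~4 characters per token for English text.
    # Swap in a real tokenizer for accurate counts.
    return len(text) // 4


def should_compress(text: str, threshold: int = COMPRESSION_THRESHOLD_TOKENS) -> bool:
    # Compress only when the estimated size is strictly over the threshold.
    return estimate_tokens(text) > threshold


short_doc = "word " * 100     # 500 characters, roughly 125 tokens
long_doc = "word " * 2_000    # 10,000 characters, roughly 2,500 tokens

print(should_compress(short_doc))
print(should_compress(long_doc))
```

A short document skips compression and goes straight to the summarization call; a long one is compressed first.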
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.