Batch processing guide
Batch prompt compression
Many LLM workloads run as batches: offline dataset processing, bulk content generation, nightly RAG indexing. Batch compression lets you pre-process all of those prompts together in a single pass.
Batch processing script
from supercompress import Compressor
import json

# One compressor instance, reused for every prompt in the batch
comp = Compressor()

def batch_compress(prompts: list[dict]) -> list[dict]:
    """Compress a list of prompts in batch."""
    results = []
    for item in prompts:
        result = comp.compress(item["context"], item["query"])
        results.append({
            "id": item["id"],
            "compressed": result.compressed_text,
            "original_tokens": result.original_tokens,
            "kept_tokens": result.kept_tokens,
            # max(..., 1) guards against division by zero on an empty context
            "savings_pct": round(
                (1 - result.kept_tokens / max(result.original_tokens, 1)) * 100, 1
            )
        })
    return results
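A batch job typically reads its prompts from a file and reports aggregate savings at the end. The sketch below assumes a JSON Lines input (one object per line) and works only on the result dictionaries that `batch_compress` returns. The helper names `read_jsonl`, `write_jsonl`, and `summarize` are illustrative and not part of SuperCompress.

```python
import json


def read_jsonl(path: str) -> list[dict]:
    """Load one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path: str, rows: list[dict]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def summarize(results: list[dict]) -> dict:
    """Aggregate token counts over a whole batch of results."""
    original = sum(r["original_tokens"] for r in results)
    kept = sum(r["kept_tokens"] for r in results)
    return {
        "prompts": len(results),
        "original_tokens": original,
        "kept_tokens": kept,
        # Same guard as the per-prompt formula: avoid dividing by zero
        "savings_pct": round((1 - kept / max(original, 1)) * 100, 1),
    }
```

With the script above, the flow would be `results = batch_compress(read_jsonl("prompts.jsonl"))`, followed by `write_jsonl("compressed.jsonl", results)` and `print(summarize(results))`. The file names are placeholders.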
Batch cost comparison
| Batch Size | Without Compression | With Compression | Savings |
|---|---|---|---|
| 1,000 prompts | ~$10.00 | ~$3.50 | ~$6.50 |
| 10,000 prompts | ~$100.00 | ~$35.00 | ~$65.00 |
| 100,000 prompts | ~$1,000.00 | ~$350.00 | ~$650.00 |
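The table's figures follow from simple arithmetic: roughly $0.01 per uncompressed prompt ($10.00 / 1,000) and a 65% cost reduction ($3.50 versus $10.00). The sketch below reproduces those rows. Both constants are read off the table rather than taken from any published pricing, and the function name `batch_cost` is illustrative.

```python
def batch_cost(
    n_prompts: int,
    cost_per_prompt: float = 0.01,  # assumption: implied by the table ($10.00 / 1,000)
    savings: float = 0.65,          # assumption: implied by the table ($6.50 / $10.00)
) -> dict:
    """Estimate batch cost in dollars with and without compression."""
    without = n_prompts * cost_per_prompt
    with_compression = without * (1 - savings)
    return {
        "without": round(without, 2),
        "with": round(with_compression, 2),
        "saved": round(without - with_compression, 2),
    }


for n in (1_000, 10_000, 100_000):
    print(n, batch_cost(n))
```

Your own numbers will depend on prompt length, model pricing, and the compression ratio you actually measure, so substitute those for the defaults.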
Frequently asked questions
Is batch compression faster per-prompt?
No. Compression time per prompt is the same (~60 ms). The gain comes from eliminating the per-request HTTP overhead.
Can I parallelize batch compression?
Yes. The compressor is thread-safe, so one instance can be shared across threads. Use threads, multiprocessing, or asyncio for parallel compression.
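Since the compressor is described as thread-safe, a thread pool is one straightforward way to parallelize. This is a minimal sketch using the standard library's `concurrent.futures`. `parallel_compress` is a hypothetical helper, and `compress_fn` stands in for any callable that takes `(context, query)`. How much speedup threads give depends on whether the compressor releases the GIL while it works, which is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def parallel_compress(
    prompts: list[dict],
    compress_fn: Callable[[str, str], Any],  # stand-in for the real compress call
    max_workers: int = 8,
) -> list[dict]:
    """Run compress_fn over all prompts on a thread pool, keeping input order."""

    def work(item: dict) -> dict:
        return {
            "id": item["id"],
            "result": compress_fn(item["context"], item["query"]),
        }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs
        return list(pool.map(work, prompts))
```

With the script above, this could be called as `parallel_compress(prompts, comp.compress)`. If threads do not help, the same shape works with `ProcessPoolExecutor`, provided the callable and its arguments can be pickled.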
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.