Batch processing guide

Batch prompt compression

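Many LLM workloads run as batch jobs: offline dataset processing, bulk content generation, nightly RAG indexing. Batch compression lets you compress every prompt in one offline pass before any LLM calls are made, so the compression work happens ahead of time rather than inside each request.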

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

Batch processing script

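from supercompress import Compressor

# One shared compressor instance, reused for every prompt in the batch.
comp = Compressor()

def batch_compress(prompts: list[dict]) -> list[dict]:
    """Compress a list of prompts in batch.

    Each input dict needs "id", "context", and "query" keys.
    """
    results = []
    for item in prompts:
        # Query-aware compression: the query guides which parts of the context are kept.
        result = comp.compress(item["context"], item["query"])
        results.append({
            "id": item["id"],
            "compressed": result.compressed_text,
            "original_tokens": result.original_tokens,
            "kept_tokens": result.kept_tokens,
            # Percentage of tokens removed; max(..., 1) avoids division by zero on empty input.
            "savings_pct": round(
                (1 - result.kept_tokens / max(result.original_tokens, 1)) * 100, 1
            ),
        })
    return results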
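In practice the prompts usually come from a file and the results need to be saved. Below is a minimal sketch assuming a JSONL input where each line holds one object with "id", "context", and "query" keys. The file format and the helper names (read_jsonl, write_jsonl, run_jsonl_batch) are assumptions for illustration, not part of SuperCompress.

```python
import json
from typing import Callable, Iterable


def read_jsonl(path: str) -> list[dict]:
    """Load one JSON object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path: str, rows: Iterable[dict]) -> None:
    """Write each dict as one JSON line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def run_jsonl_batch(
    in_path: str,
    out_path: str,
    process: Callable[[list[dict]], list[dict]],
) -> int:
    """Read prompts, run them through `process`, write the results; return the row count."""
    prompts = read_jsonl(in_path)
    results = process(prompts)
    write_jsonl(out_path, results)
    return len(results)
```

With the script above, `run_jsonl_batch("prompts.jsonl", "compressed.jsonl", batch_compress)` would compress a whole file in one pass (both file names are hypothetical). Taking `process` as a parameter keeps the I/O helpers independent of the compressor, so they can be tested with a stand-in function.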

Batch cost comparison

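Batch size	Cost without compression	Cost with compression	Savings
1,000 prompts	~$10.00	~$3.50	~$6.50
10,000 prompts	~$100.00	~$35.00	~$65.00
100,000 prompts	~$1,000.00	~$350.00	~$650.00

These figures are illustrative: they assume about $0.01 of cost per uncompressed prompt and a ~65% cost reduction after compression. Your numbers depend on model pricing and on how much of each prompt can be removed.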
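The rows of the cost table follow from simple arithmetic. This sketch reproduces them under two assumptions read off the table itself: about $0.01 per uncompressed prompt and a 65% cost reduction. Both defaults are placeholders; batch_cost is a hypothetical helper, and you would substitute your model's pricing and your measured savings (for example, the average savings_pct divided by 100).

```python
def batch_cost(
    num_prompts: int, cost_per_prompt: float = 0.01, reduction: float = 0.65
) -> tuple[float, float, float]:
    """Return (cost without compression, cost with compression, savings) in dollars.

    cost_per_prompt and reduction are illustrative defaults, not measured values.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    without = num_prompts * cost_per_prompt
    with_compression = without * (1.0 - reduction)
    return without, with_compression, without - with_compression


for n in (1_000, 10_000, 100_000):
    without, with_c, saved = batch_cost(n)
    print(f"{n:>7,} prompts: ${without:,.2f} -> ${with_c:,.2f} (save ${saved:,.2f})")
```

Note that token savings translate one-to-one into cost savings only when you are billed per input token and output length is unchanged.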

Frequently asked questions

Is batch compression faster per prompt?

No. Compression takes about the same time per prompt (~60 ms) whether you run it in a batch or one request at a time. What batch processing removes is the per-request overhead of HTTP calls.
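At the ~60 ms figure quoted above, total compression time grows linearly with batch size: a sequential pass over 100,000 prompts takes roughly 6,000 s (100 minutes). This sketch turns that into a rough wall-clock estimate. estimate_wall_clock_seconds is a hypothetical helper, and its perfect-scaling assumption for workers > 1 is optimistic.

```python
def estimate_wall_clock_seconds(
    num_prompts: int, ms_per_prompt: float = 60.0, workers: int = 1
) -> float:
    """Idealized wall-clock time for compressing num_prompts prompts.

    Assumes a fixed per-prompt cost and perfect scaling across workers,
    so real runs will be slower when workers > 1.
    """
    if num_prompts < 0 or workers < 1:
        raise ValueError("num_prompts must be >= 0 and workers >= 1")
    return num_prompts * ms_per_prompt / 1000.0 / workers


for n in (1_000, 10_000, 100_000):
    secs = estimate_wall_clock_seconds(n)
    print(f"{n:>7,} prompts: ~{secs:,.0f} s sequential (~{secs / 60:,.1f} min)")
```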

Can I parallelize batch compression?

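Yes. The compressor is thread-safe, so a single shared instance can be called from a thread pool (for example, concurrent.futures.ThreadPoolExecutor). If compression is CPU-bound and holds Python's GIL, threads will not speed it up; multiprocessing avoids that, but each worker process then needs its own Compressor. asyncio on its own does not run work in parallel: wrap the blocking compress calls with asyncio.to_thread or loop.run_in_executor.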
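Because the compressor is described as thread-safe, one shared instance can serve a pool of worker threads. Here is a minimal sketch using the standard library's ThreadPoolExecutor. The name parallel_compress and the default of 8 workers are illustrative choices, and the actual speedup depends on whether compression releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_compress(
    items: list[T], compress_one: Callable[[T], R], max_workers: int = 8
) -> list[R]:
    """Run compress_one over items on a thread pool.

    Executor.map yields results in input order, so results[i] matches items[i].
    An exception raised for any item is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_one, items))
```

With the script above, `parallel_compress(prompts, lambda item: batch_compress([item])[0])` would compress each prompt on a worker thread through the shared `comp` instance. Order preservation matters here because each result is matched back to its prompt by position as well as by "id".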

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
