Batch processing guide

Batch prompt compression

Many LLM workloads run as batch jobs: offline dataset processing, bulk content generation, nightly RAG indexing. With batch compression you compress every prompt up front, in a single pass, so the whole job sends fewer tokens to the model.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

Batch processing script

from supercompress import Compressor

# Create one compressor and reuse it for every prompt in the batch.
comp = Compressor()

def batch_compress(prompts: list[dict]) -> list[dict]:
    """Compress a list of prompts, each a dict with "id", "context", and "query" keys."""
    results = []
    for item in prompts:
        result = comp.compress(item["context"], item["query"])
        # Percentage of tokens removed; max(..., 1) avoids division by zero on empty input.
        savings_pct = round(
            (1 - result.kept_tokens / max(result.original_tokens, 1)) * 100, 1
        )
        results.append({
            "id": item["id"],
            "compressed": result.compressed_text,
            "original_tokens": result.original_tokens,
            "kept_tokens": result.kept_tokens,
            "savings_pct": savings_pct,
        })
    return results
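
In an offline job the prompts usually come from a file rather than an in-memory list. The following is a minimal sketch of the I/O around batch_compress, assuming a JSON Lines layout with one object per line carrying the "id", "context", and "query" keys. The helper names load_prompts and save_results and the file layout are illustrative assumptions, not part of SuperCompress.

```python
import json
from pathlib import Path


def load_prompts(path: str) -> list[dict]:
    """Read one prompt per line from a JSON Lines file, skipping blank lines."""
    prompts = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                prompts.append(json.loads(line))
    return prompts


def save_results(results: list[dict], path: str) -> None:
    """Write one result per line so large batches can be read back incrementally."""
    with Path(path).open("w", encoding="utf-8") as f:
        for row in results:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

A run would then look like save_results(batch_compress(load_prompts("prompts.jsonl")), "compressed.jsonl"), where both file names are placeholders.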

Batch cost comparison

Batch Size        Without Compression   With Compression   Savings
1,000 prompts     ~$10.00               ~$3.50             ~$6.50
10,000 prompts    ~$100.00              ~$35.00            ~$65.00
100,000 prompts   ~$1,000.00            ~$350.00           ~$650.00
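
The figures in the table scale linearly, so they can be reproduced for any batch size. The sketch below back-derives its two defaults from the table itself: about $0.01 per uncompressed prompt ($10.00 / 1,000) and about 35% of the cost remaining after compression ($3.50 / $10.00). Both numbers are assumptions read off the table, not published pricing, and the function name batch_cost is illustrative.

```python
def batch_cost(
    n_prompts: int,
    cost_per_prompt: float = 0.01,  # assumed: $10.00 / 1,000 prompts from the table
    kept_fraction: float = 0.35,    # assumed: $3.50 / $10.00 from the table
) -> dict:
    """Estimate batch cost in dollars with and without compression."""
    without = n_prompts * cost_per_prompt
    with_compression = without * kept_fraction
    return {
        "without": round(without, 2),
        "with": round(with_compression, 2),
        "savings": round(without - with_compression, 2),
    }


# Reproduce the three rows of the table.
for n in (1_000, 10_000, 100_000):
    print(n, batch_cost(n))
```

To estimate your own workload, replace the defaults with your model's per-prompt cost and the savings you actually measure.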

Frequently asked questions

Is batch compression faster per-prompt?

No. Compression time per prompt is unchanged (~60 ms). What batch processing saves is the per-request overhead of HTTP calls.

Can I parallelize batch compression?

Yes. The compressor is thread-safe, so a single instance can be shared across worker threads. You can also use multiprocessing or asyncio to compress prompts in parallel.
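
One way to parallelize is a thread pool from the standard library. This is a minimal sketch, not the library's own batch API: compress_parallel is an illustrative helper, and fake_compress is a hypothetical stand-in so the example runs without SuperCompress installed. In real use you would pass a function that calls your shared compressor instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def compress_parallel(
    prompts: list[dict],
    compress_one: Callable[[dict], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Apply compress_one to every prompt using a pool of worker threads.

    executor.map yields results in input order, so output rows line up
    with the input prompts even when tasks finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(compress_one, prompts))


def fake_compress(item: dict) -> dict:
    """Hypothetical stand-in for a real compressor call: keeps the first 5 words."""
    words = item["context"].split()
    return {"id": item["id"], "compressed": " ".join(words[:5])}


prompts = [{"id": i, "context": "word " * 20, "query": "q"} for i in range(8)]
results = compress_parallel(prompts, fake_compress)
```

Whether threads give a real speedup depends on whether the compression work releases Python's global interpreter lock. If it does not, swapping in ProcessPoolExecutor from the same module is the usual alternative, at the cost of creating one compressor per process.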

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
