Integration guide

FastAPI compression middleware

FastAPI is a widely used Python web framework for AI backends. Add SuperCompress as middleware to compress LLM-bound request bodies automatically, before they reach your route handlers.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

Middleware implementation

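import json

from fastapi import FastAPI, Request
from supercompress import Compressor

app = FastAPI()
comp = Compressor()

@app.middleware("http")
async def compress_prompts(request: Request, call_next):
    # Only rewrite requests bound for the LLM chat endpoint.
    if request.url.path == "/api/llm/chat":
        body = await request.json()
        context = body.get("context", "")
        query = body.get("query", "")
        if context and query:
            # Compress the context with respect to the query.
            result = comp.compress(context, query)
            body["context"] = result.compressed_text
            # Replace the cached body so downstream handlers read the
            # compressed version. _body is a private Starlette attribute,
            # so its behavior may vary across Starlette versions.
            request._body = json.dumps(body).encode()
    return await call_next(request)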
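
One way to make the middleware easier to test is to move the body rewriting into a pure function that needs no running server. The sketch below is an illustration under assumptions, not part of SuperCompress: `rewrite_body` and `fake_compress` are hypothetical names, and `fake_compress` only mimics the `compress(context, query)` call shape used above.

```python
import json
from typing import Callable


def rewrite_body(raw: bytes, compress: Callable[[str, str], str]) -> bytes:
    """Return raw with its "context" field compressed, or raw unchanged."""
    try:
        body = json.loads(raw)
    except ValueError:
        return raw  # not valid JSON: pass the body through untouched
    if not isinstance(body, dict):
        return raw
    context = body.get("context", "")
    query = body.get("query", "")
    # Only compress when both fields are non-empty strings.
    if not (isinstance(context, str) and isinstance(query, str)):
        return raw
    if not context or not query:
        return raw
    body["context"] = compress(context, query)
    return json.dumps(body).encode()


def fake_compress(context: str, query: str) -> str:
    # Hypothetical stand-in for the real compressor: keep only the lines
    # of the context that contain at least one word from the query.
    words = [w.lower() for w in query.split()]
    kept = [
        line for line in context.splitlines()
        if any(w in line.lower() for w in words)
    ]
    return "\n".join(kept)
```

Inside the middleware, the same function could be applied to `await request.body()`, passing a small wrapper that returns `comp.compress(context, query).compressed_text`. Malformed or non-JSON bodies then pass through unchanged instead of raising inside the middleware.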

Frequently asked questions

Does this work with streaming endpoints?

Yes. The middleware rewrites the request body before the route handler runs, so compression finishes before the streaming response starts.
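
The ordering can be sketched with plain asyncio, without any web framework. Everything here is a hypothetical stand-in: `fake_compress` replaces the real compressor, `fake_llm_stream` replaces a streaming LLM client, and `handle_chat` plays the role of a streaming route handler.

```python
import asyncio
from typing import AsyncIterator


def fake_compress(context: str, query: str) -> str:
    # Hypothetical stand-in for the real compressor: truncate the context.
    return context[:40]


async def fake_llm_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming LLM client.
    for token in ("chunk-1 ", "chunk-2 ", "done"):
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield token


async def handle_chat(context: str, query: str) -> AsyncIterator[str]:
    # Step 1: compression runs once, on the full request, before any output.
    compressed = fake_compress(context, query)
    prompt = f"{compressed}\n\nQuestion: {query}"
    # Step 2: the response is streamed chunk by chunk and is not modified.
    async for token in fake_llm_stream(prompt):
        yield token


async def collect(context: str, query: str) -> list[str]:
    # Helper that drains the stream into a list for inspection.
    return [token async for token in handle_chat(context, query)]
```

Because compression only touches the inbound request, the streamed chunks reach the client exactly as the model produced them.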

Can I exclude certain endpoints from compression?

Yes. Check the request path and skip compression for non-LLM endpoints, as the middleware above does for every path other than /api/llm/chat.
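
When more than one endpoint is involved, the path check can live in a small predicate. This is a minimal sketch under assumptions: `should_compress`, the `/api/llm/` prefix, and the excluded `/api/llm/health` path are hypothetical; only `/api/llm/chat` comes from the example above.

```python
# Paths under this prefix are treated as LLM-bound (assumed convention).
LLM_PREFIX = "/api/llm/"

# Paths under the prefix that should never be compressed (hypothetical).
EXCLUDED_PATHS = {"/api/llm/health"}


def should_compress(method: str, path: str) -> bool:
    """Decide whether a request is a candidate for prompt compression."""
    # Only requests that carry a JSON body are worth inspecting.
    if method.upper() != "POST":
        return False
    if path in EXCLUDED_PATHS:
        return False
    return path.startswith(LLM_PREFIX)
```

In the middleware, a guard such as `if should_compress(request.method, request.url.path):` would replace the single-path comparison, and every other request would go straight to `call_next`.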

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
