Serverless prompt compression

Serverless functions have tight resource limits. SuperCompress adds prompt compression in ~60ms with no GPU, no model downloads, and roughly 50-80MB of memory (see the compatibility table below), which makes it a good fit for serverless deployments.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

AWS Lambda deployment

# Lambda function that compresses context before calling an LLM
import json
from supercompress import Compressor

# Created once per execution environment and reused across warm invocations
comp = Compressor()

def lambda_handler(event, context):
    # Expects a JSON request body with "context" and "query" fields
    body = json.loads(event["body"])
    result = comp.compress(body["context"], body["query"])

    # Forward result.compressed_text to your LLM here; this example just returns it
    return {
        "statusCode": 200,
        "body": json.dumps({
            "compressed": result.compressed_text,
            "savings": result.tokens_removed
        })
    }
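
The handler above assumes a well-formed request body. Below is a sketch of a more defensive variant that can be exercised locally before deploying. It assumes an API Gateway proxy-style event, with the JSON payload as a string under "body". `make_handler` and `FakeCompressor` are hypothetical names introduced here so the example runs without SuperCompress installed. The fake's word-overlap filter is only a placeholder and says nothing about how SuperCompress selects text.

```python
import json
from types import SimpleNamespace


class FakeCompressor:
    """Hypothetical stand-in exposing the same compress(context, query) call."""

    def compress(self, context, query):
        # Keep only sentences that share a word with the query (illustration only).
        query_words = set(query.lower().split())
        sentences = [s.strip() for s in context.split(".") if s.strip()]
        kept = [s for s in sentences if query_words & set(s.lower().split())]
        compressed = ". ".join(kept)
        removed = len(context.split()) - len(compressed.split())
        return SimpleNamespace(compressed_text=compressed, tokens_removed=removed)


def make_handler(comp):
    """Build a Lambda-style handler around any object with a compress() method."""

    def handler(event, context):
        try:
            body = json.loads(event.get("body") or "{}")
            result = comp.compress(body["context"], body["query"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed JSON, a non-object payload, or missing fields
            error = {"error": "expected JSON with 'context' and 'query'"}
            return {"statusCode": 400, "body": json.dumps(error)}
        payload = {
            "compressed": result.compressed_text,
            "savings": result.tokens_removed,
        }
        return {"statusCode": 200, "body": json.dumps(payload)}

    return handler


# Local invocation with a hand-built event; no AWS resources involved.
handler = make_handler(FakeCompressor())
event = {
    "body": json.dumps({
        "context": "Cats sleep a lot. Lambda bills per millisecond.",
        "query": "lambda billing",
    })
}
response = handler(event, None)
bad_response = handler({"body": "not json"}, None)
```

Passing the compressor in as an argument keeps the handler testable: in a deployment you would pass a real `Compressor()` created at module level, as in the example above.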

Serverless compatibility

Platform                 Cold Start   Memory   Compression Time
AWS Lambda               ~300ms       ~80MB    ~60ms
Google Cloud Functions   ~200ms       ~80MB    ~60ms
Cloudflare Workers       ~5ms         ~50MB    ~70ms
Vercel Edge Functions    ~50ms        ~60MB    ~65ms
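
The figures above are approximate and will likely vary with payload size and memory configuration, so it can be worth measuring in your own function. Here is a minimal sketch using only the standard library. `timed` is a hypothetical helper, and `fake_compress` stands in for `comp.compress` so the block runs without SuperCompress installed.

```python
import time


def timed(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times; return (last result, per-call durations in ms)."""
    durations_ms = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return result, durations_ms


def fake_compress(context, query):
    # Hypothetical stand-in for comp.compress(context, query): keeps the first 50 words.
    return " ".join(context.split()[:50])


result, durations_ms = timed(fake_compress, "word " * 2000, "what is this about?")
print(f"first call: {durations_ms[0]:.2f} ms, best of rest: {min(durations_ms[1:]):.2f} ms")
```

Reporting the first call separately from the rest helps distinguish one-time warm-up from steady-state cost. Cold-start cost itself (the import and the `Compressor()` construction) is paid at module load, so it would need to be timed around those statements instead.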

Frequently asked questions

Does SuperCompress fit in Lambda's /tmp space?

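Yes. The package is ~200KB and needs no model files, so it fits easily in the deployment package and no model has to be downloaded to /tmp at runtime.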
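
To check the size of a whole deployment bundle (your code plus dependencies), a small script can sum the files in the build directory and compare the total to Lambda's 250 MB quota for unzipped packages. This is a sketch: `directory_size_bytes` is a hypothetical helper, and the temporary directory is a stand-in for a real build directory.

```python
import tempfile
from pathlib import Path

# AWS Lambda quota for an unzipped deployment package (250 MB)
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


# Demo on a throwaway directory; point this at your real build directory instead.
with tempfile.TemporaryDirectory() as build_dir:
    package_dir = Path(build_dir) / "pkg"
    package_dir.mkdir()
    (package_dir / "module.py").write_bytes(b"x" * 2048)
    size = directory_size_bytes(build_dir)
    fits = size <= LAMBDA_UNZIPPED_LIMIT_BYTES
    print(f"{size} bytes, fits in Lambda package limit: {fits}")
```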

Can I use it in edge runtimes?

Yes. The compressor is pure Python and works in edge environments that support Python.

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.