Python prompt compression guide

Python is the primary language for LLM application development. SuperCompress is a native Python library that compresses prompts before they reach your model, cutting prompt token costs by about 65% per API call.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

Installation

pip install supercompress

That is it. The package includes the compression policy, Python bindings, and CLI tool. No GPU, no model download, no external dependencies.
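
To confirm the install succeeded, one option is to look up the installed version with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, and the only assumption is that the distribution is called `supercompress`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "supercompress") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"supercompress {found}" if found else "supercompress is not installed")
```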

Basic usage

from supercompress import Compressor

comp = Compressor()
result = comp.compress(
    context="""Alice has been a Premium subscriber since 2021.
Her account type is Enterprise.
She last logged in on June 14, 2026.
Her current plan costs $199/month.""",
    query="What plan is Alice on?"
)

print(result.compressed_text)
# Output keeps: "Her current plan costs $199/month."
# Drops: login date, subscriber since info (irrelevant to the question)
print(f"Saved {result.tokens_removed} tokens")
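
To turn `result.tokens_removed` into a rough dollar figure, multiply it by your model's input price. The sketch below is illustrative arithmetic only: `estimate_savings_usd` is a hypothetical helper, and the price and call volume are placeholder assumptions, not real pricing.

```python
def estimate_savings_usd(
    tokens_removed: int,
    usd_per_million_tokens: float,
    calls: int = 1,
) -> float:
    """Estimate the input-token spend avoided by compression.

    Assumes every call removes about `tokens_removed` tokens and that input
    tokens are billed at a flat per-million rate (both are simplifications).
    """
    if tokens_removed < 0 or calls < 0:
        raise ValueError("tokens_removed and calls must be non-negative")
    return tokens_removed * calls / 1_000_000 * usd_per_million_tokens


# Placeholder numbers: 1,000 tokens removed per call, 1,000 calls,
# $2.50 per million input tokens.
print(f"${estimate_savings_usd(1_000, 2.50, calls=1_000):.2f}")  # prints $2.50
```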

Integration with OpenAI

import openai
from supercompress import Compressor

comp = Compressor()
client = openai.OpenAI()

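def ask_with_compression(context, query):
    # Compress the context with respect to the question being asked.
    result = comp.compress(context=context, query=query)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer based on the context."},
            # Include the question: the model needs it alongside the compressed context.
            {
                "role": "user",
                "content": f"Context:\n{result.compressed_text}\n\nQuestion: {query}",
            },
        ],
    )
    return response.choices[0].message.content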
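
In production you may not want a compression failure to take down the whole request. The sketch below falls back to the uncompressed context if compression raises. `compress_or_passthrough` is a hypothetical helper, not part of SuperCompress. It assumes only what the examples above show: a `compress(context=..., query=...)` method that returns an object with a `compressed_text` attribute.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def compress_or_passthrough(compressor: Any, context: str, query: str) -> str:
    """Return the compressed context, or the original context if compression fails."""
    try:
        return compressor.compress(context=context, query=query).compressed_text
    except Exception:
        # Log and fall back so the LLM call can still proceed at full token cost.
        logger.exception("Compression failed; sending uncompressed context")
        return context
```

Inside `ask_with_compression`, a call such as `compress_or_passthrough(comp, context, query)` could stand in for the direct `comp.compress` call.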

Frequently asked questions

Does SuperCompress work with Python 3.9+?

Yes. SuperCompress supports Python 3.9 through 3.13.

Can I use it with async Python?

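Yes. The compressor is synchronous, so each call occupies the event loop while it runs, but at roughly 60 ms per call that is acceptable for many async applications. In latency-sensitive services, run it in a worker thread so other tasks keep making progress.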
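
Because the call is synchronous, the event loop waits while it runs. If that pause matters for your service, the call can be pushed to a worker thread. Here is a minimal sketch using `asyncio.to_thread` (available from Python 3.9). `compress_async` is a hypothetical wrapper, and it assumes the compressor is safe to call from a worker thread, which this guide does not state.

```python
import asyncio
from typing import Any


async def compress_async(compressor: Any, context: str, query: str) -> Any:
    """Run the blocking compress() call in a worker thread."""
    return await asyncio.to_thread(
        compressor.compress, context=context, query=query
    )
```

From a coroutine, this would be used as `result = await compress_async(comp, context, query)`.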

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
