Anthropic integration guide

Anthropic Claude prompt compression

Claude models are billed per token, just as OpenAI's are. SuperCompress integrates with the Anthropic Python SDK to compress context before every API call, which lowers input-token costs on expensive models such as Claude 3 Opus.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

Claude cost context

Claude 3.5 Sonnet costs $3.00 per million input tokens, and Claude 3 Opus costs $15.00 per million input tokens. A RAG application that sends 10,000 input tokens per query and serves 10,000 queries per day processes 100 million input tokens daily, which costs $300/day on Sonnet or $1,500/day on Opus.

SuperCompress reduces input tokens by ~65%, dropping those costs to ~$105/day and ~$525/day respectively. For teams running Claude at scale, this is a direct bottom-line improvement.
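The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `daily_input_cost` is made up for illustration, the prices and workload are the ones quoted in this section, and 0.65 is the approximate reduction claimed above. It covers input tokens only; output tokens are billed separately and are not affected by prompt compression.

```python
def daily_input_cost(tokens_per_query: int, queries_per_day: int,
                     price_per_million: float, reduction: float = 0.0) -> float:
    """Daily input-token spend in dollars after an optional fractional reduction."""
    tokens_per_day = tokens_per_query * queries_per_day
    billed_tokens = tokens_per_day * (1.0 - reduction)
    return billed_tokens / 1_000_000 * price_per_million


# Workload and prices from the text; 0.65 is the approximate reduction quoted above.
for name, price in [("Sonnet", 3.00), ("Opus", 15.00)]:
    before = daily_input_cost(10_000, 10_000, price)
    after = daily_input_cost(10_000, 10_000, price, reduction=0.65)
    print(f"{name}: ${before:,.2f}/day -> ${after:,.2f}/day")
# Sonnet: $300.00/day -> $105.00/day
# Opus: $1,500.00/day -> $525.00/day
```

Swapping in your own token counts, query volume, and measured reduction gives a quick estimate before you commit to an integration.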

Anthropic wrapper integration

from anthropic import Anthropic
from supercompress import Compressor


class SuperCompressAnthropic(Anthropic):
    """Anthropic client that compresses conversation history before each call."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._compressor = Compressor()

    def messages_create(self, *, model, messages, **kwargs):
        # The SDK exposes this endpoint as client.messages.create(), so this
        # helper wraps that call rather than overriding a base-class method.
        if len(messages) > 1:
            # Collect the plain-text content of every message except the last.
            # Messages whose content is a list of blocks are skipped.
            history_lines = [
                msg["content"]
                for msg in messages[:-1]
                if isinstance(msg.get("content"), str)
            ]
            history = "\n".join(history_lines)
            last_content = messages[-1].get("content")
            query = last_content if isinstance(last_content, str) else ""
            if history and query:
                result = self._compressor.compress(history, query)
                # Replace the history with a single compressed message,
                # followed by the original final message.
                messages = [
                    {"role": "user", "content": result.compressed_text},
                    messages[-1],
                ]
        return self.messages.create(model=model, messages=messages, **kwargs)
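To see what the wrapper does to a conversation without an API key or a SuperCompress install, the history-folding step can be run on its own. The sketch below is an illustration only: `StubCompressor`, `StubResult`, and `collapse_history` are hypothetical names, and the stub truncates text rather than performing real query-aware compression. It mirrors only the `compress(history, query)` call and the `compressed_text` attribute used in the wrapper above.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    compressed_text: str


class StubCompressor:
    """Hypothetical stand-in for supercompress.Compressor; it only truncates."""

    def compress(self, history: str, query: str) -> StubResult:
        # Real compression is query-aware; this stub keeps the first 40 characters.
        return StubResult(compressed_text=history[:40])


def collapse_history(messages: list[dict], compressor) -> list[dict]:
    """Fold every turn except the last into one compressed user turn."""
    if len(messages) < 2:
        return messages
    history = "\n".join(
        m["content"] for m in messages[:-1] if isinstance(m.get("content"), str)
    )
    last = messages[-1].get("content")
    if not history or not isinstance(last, str) or not last:
        # Nothing to compress, or the final turn is not plain text: leave as-is.
        return messages
    result = compressor.compress(history, last)
    return [{"role": "user", "content": result.compressed_text}, messages[-1]]


conversation = [
    {"role": "user", "content": "Our refund policy allows returns within 30 days of purchase."},
    {"role": "assistant", "content": "Understood. What would you like to know?"},
    {"role": "user", "content": "Can I return an item after 45 days?"},
]
collapsed = collapse_history(conversation, StubCompressor())
print(len(collapsed))            # 2
print(collapsed[-1]["content"])  # Can I return an item after 45 days?
```

Note the trade-off in this design: the original roles are discarded and the whole history is sent as a single user turn, so any message whose content is a list of blocks (images, tool results) is dropped from the compressed history.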

Frequently asked questions

Does compression work with Claude's extended thinking?

Yes. Compression happens before the API call. The thinking budget is unaffected.

Will it work with Claude 3 Haiku for cheaper calls?

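Yes. Compression runs before the request is sent, so it works the same way with every Claude model. The absolute savings scale with the model's input-token price.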

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
