Chatbot optimization

Token compression for AI chatbots

Every chatbot conversation accumulates history that gets sent on every turn. Token compression removes the low-value messages while keeping the evidence needed for the next response.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

Why chatbots need compression

A typical support chatbot conversation has 10-20 messages. By turn 10, the full history may be 3,000-5,000 tokens. At turn 20, it could be 8,000+ tokens. Most of this history is irrelevant to the latest customer question.

Compressing the conversation history before each LLM call saves 60-85% on input tokens while keeping all answer-relevant context.

Integration example

from supercompress import Compressor
comp = Compressor()

async def chat_response(messages, user_query):
    history = format_history(messages)
    compressed = comp.compress(history, user_query)
    return await llm.chat(compressed.compressed_text, user_query)

Frequently asked questions

Does compression change the chatbot personality?

No. System prompts and instructions are kept intact.

Can I use this with Dialogflow or Rasa?

Yes. Compression happens before the LLM call in your middleware.

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.

Open the Playground See benchmarks