
Token compression for ReAct agents

ReAct agents follow a think-act-observe loop. Each iteration appends the thought, action, and observation to the prompt. After 5-10 steps, the prompt can exceed 8,000 tokens. Compressing that accumulated context between steps keeps the prompt, and the cost of each call, under control.

By Arjun Shah - Creator of SuperCompress - Updated 2026-07-03

The ReAct cost problem

A ReAct agent handling a complex query might take 15 steps: 15 thoughts, 15 actions, and 15 observations, or 45 new items of context. At roughly 200 tokens each, that adds 9,000 tokens to the original prompt. Because the full prompt is resent on every model call, the agent pays for those tokens again at each later step, including early steps that are no longer relevant.
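The arithmetic above, plus the cost of resending history on every call, can be sketched as a back-of-the-envelope model. It assumes a fixed 200 tokens per item, three items per step, and that the full history is resent with each model call; `base_tokens` stands in for the original prompt, and both function names are illustrative.

```python
def history_tokens(steps: int, items_per_step: int = 3, tokens_per_item: int = 200) -> int:
    """Tokens of thought/action/observation history after `steps` iterations."""
    return steps * items_per_step * tokens_per_item


def total_prompt_tokens(steps: int, base_tokens: int = 0,
                        items_per_step: int = 3, tokens_per_item: int = 200) -> int:
    """Prompt tokens billed across a whole run, assuming the full history is resent each call.

    The call at step k (1-indexed) carries the base prompt plus the history of the
    k - 1 steps completed before it.
    """
    return sum(
        base_tokens + history_tokens(k - 1, items_per_step, tokens_per_item)
        for k in range(1, steps + 1)
    )


if __name__ == "__main__":
    print(history_tokens(15))       # 9000: history added by a 15-step run
    print(total_prompt_tokens(15))  # 63000: history tokens billed across all 15 calls
```

Under these assumptions, the 9,000 tokens of history end up costing 63,000 prompt tokens over the run, because each early step is paid for again on every later call. Without compression, that total grows roughly quadratically with the number of steps.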

SuperCompress compresses the accumulated context after each step, keeping only the observations and thoughts relevant to the current state of the reasoning process.
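To make "keeping only what is relevant" concrete, here is a deliberately simple stand-in. It scores each earlier step by word overlap with a query, keeps the top few, always retains the most recent step, and preserves the original order. This is not how SuperCompress scores relevance; `prune_steps`, the overlap heuristic, and the `keep` parameter are illustrative assumptions.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def prune_steps(steps: list[str], query: str, keep: int = 2) -> list[str]:
    """Keep the `keep` earlier steps with the most word overlap with `query`,
    plus the latest step, in their original order. Illustrative heuristic only."""
    if not steps:
        return []
    *earlier, latest = steps
    query_words = _words(query)
    # Rank earlier steps by overlap with the query; ties go to the more recent step
    ranked = sorted(
        range(len(earlier)),
        key=lambda i: (len(_words(earlier[i]) & query_words), i),
        reverse=True,
    )
    kept = sorted(ranked[:keep])  # restore chronological order
    return [earlier[i] for i in kept] + [latest]
```

With a query such as "Paris population", a step that searched for the Paris population outranks an unrelated weather lookup, so the weather step is the first to be dropped. A real compressor can also trim inside a step rather than dropping whole steps.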

Compression in the loop

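from supercompress import Compressor

comp = Compressor()

def react_step(context, thought, action, observation):
    """Append one think-act-observe step and compress the accumulated context."""
    new_entry = f"Thought: {thought}\nAction: {action}\nObservation: {observation}"
    # Avoid a leading blank line on the first step, when context is still empty
    full_context = f"{context}\n{new_entry}" if context else new_entry
    # Compress the accumulated context, using the latest observation as the relevance query
    result = comp.compress(full_context, observation)
    return result.compressed_text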
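The compressed context returned by `react_step` becomes the `context` argument of the next iteration. The sketch below shows that threading with a pluggable `compress(text, query)` callable so it runs without SuperCompress installed. `run_loop`, `keep_last_lines`, and the scripted steps are hypothetical: the steps are replayed rather than produced by a model, and the line-truncating stand-in only shows where compression sits in the loop, not what SuperCompress keeps.

```python
from typing import Callable, Iterable, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


def keep_last_lines(text: str, query: str, max_lines: int = 6) -> str:
    """Hypothetical stand-in compressor: keeps only the last `max_lines` lines.
    Ignores `query`; a real compressor would use it to judge relevance."""
    return "\n".join(text.splitlines()[-max_lines:])


def run_loop(steps: Iterable[Step], compress: Callable[[str, str], str]) -> str:
    """Feed each step's compressed output back in as the next step's context."""
    context = ""
    for thought, action, observation in steps:
        new_entry = f"Thought: {thought}\nAction: {action}\nObservation: {observation}"
        full_context = f"{context}\n{new_entry}" if context else new_entry
        # Compress after every step, as react_step does, so context never grows unchecked
        context = compress(full_context, observation)
    return context
```

Swapping `keep_last_lines` for a function that wraps `comp.compress(...).compressed_text` gives the loop described above, while the stand-in makes the control flow easy to test in isolation.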

Frequently asked questions

Does repeated compression hurt quality?

No. Only redundant context is removed. Reasoning chains are preserved.

How much can I save on a 15-step ReAct loop?

Typically 60-75% token reduction on the accumulated context.
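Taking the quoted 60-75% range at face value, the effect on the 9,000 tokens of history from the 15-step example works out as follows. Integer arithmetic avoids float rounding, and the function name is illustrative.

```python
def remaining_tokens(tokens: int, reduction_pct: int) -> int:
    """Tokens left after removing `reduction_pct` percent of them (rounded down)."""
    return tokens * (100 - reduction_pct) // 100


if __name__ == "__main__":
    for pct in (60, 75):
        kept = remaining_tokens(9_000, pct)
        print(f"{pct}% reduction: {kept} tokens kept, {9_000 - kept} saved")
```

At those rates, 2,250 to 3,600 tokens of history remain, and 5,400 to 6,750 tokens are removed from the context.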

Try it yourself

Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.
