Token compression for ReAct agents
ReAct agents follow a think-act-observe loop. Each iteration appends the thought, action, and observation to the prompt. After 5-10 steps, the prompt can exceed 8,000 tokens. Compressing the accumulated context between steps keeps that growth, and the per-step cost, in check.
The ReAct cost problem
A ReAct agent handling a complex query might take 15 steps: 15 thoughts, 15 actions, and 15 observations, or 45 new items of context. At ~200 tokens each, that is 9,000 new tokens added to the original prompt. Because the whole prompt is sent again on every model call, the agent pays for each of those tokens on every later step, including early steps that are no longer relevant.
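The arithmetic above can be checked with a short sketch. It assumes the figures from the text (3 items per step, ~200 tokens per item) and additionally assumes that each model call re-sends the full history of the steps before it; the function names are illustrative and not part of any library.

```python
TOKENS_PER_ITEM = 200  # rough figure from the text
ITEMS_PER_STEP = 3     # thought, action, observation


def accumulated_tokens(steps: int) -> int:
    """History tokens appended to the original prompt after `steps` steps."""
    return steps * ITEMS_PER_STEP * TOKENS_PER_ITEM


def cumulative_input_tokens(steps: int) -> int:
    """History tokens sent across all model calls.

    Assumption: call k re-sends the history of the k-1 steps before it.
    The original prompt is excluded from the count.
    """
    return sum(accumulated_tokens(k - 1) for k in range(1, steps + 1))


if __name__ == "__main__":
    print(accumulated_tokens(15))       # 9000
    print(cumulative_input_tokens(15))  # 63000
```

Under these assumptions the history reaches 9,000 tokens by the end of the run, but the total history sent over the 15 calls is about seven times that, which is why trimming early steps matters more as the loop gets longer.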
SuperCompress compresses the accumulated context after each step, keeping only the observations and thoughts relevant to the current state of the reasoning process.
Compression in the loop
from supercompress import Compressor

comp = Compressor()

def react_step(context, thought, action, observation):
    """Append the latest step, then return the compressed context."""
    new_entry = f"Thought: {thought}\nAction: {action}\nObservation: {observation}"
    full_context = context + "\n" + new_entry
    # Compress the accumulated context against the latest observation
    result = comp.compress(full_context, observation)
    return result.compressed_text
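To show how the step function might sit inside a full loop, here is a minimal sketch that feeds each step's compressed output into the next step. So that it runs without the library installed, it uses a hypothetical `StubCompressor` that only mirrors the `compress(text, query)` call and `compressed_text` attribute shown above; its keyword-overlap filter is a toy and is not how SuperCompress selects content. `run_agent` and the sample steps are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class Result:
    compressed_text: str


class StubCompressor:
    """Hypothetical stand-in with the same call shape as the snippet above."""

    def compress(self, text: str, query: str) -> Result:
        # Toy heuristic: keep lines that share at least one word with the query.
        query_words = set(query.lower().split())
        kept = [
            line for line in text.splitlines()
            if query_words & set(line.lower().split())
        ]
        return Result(compressed_text="\n".join(kept))


comp = StubCompressor()


def react_step(context, thought, action, observation):
    new_entry = f"Thought: {thought}\nAction: {action}\nObservation: {observation}"
    full_context = context + "\n" + new_entry
    return comp.compress(full_context, observation).compressed_text


def run_agent(question, steps):
    """Each step's compressed output becomes the context for the next step."""
    context = f"Question: {question}"
    for thought, action, observation in steps:
        context = react_step(context, thought, action, observation)
    return context


if __name__ == "__main__":
    demo_steps = [
        ("I need the capital", "search[capital of France]",
         "Paris is the capital of France"),
        ("Now the population", "search[Paris population]",
         "Paris population is about 2.1 million"),
    ]
    print(run_agent("What is the population of the capital of France?", demo_steps))
```

One design point worth checking in a real agent: because the whole context, including the original question, passes through the compressor, it may be safer to keep the question outside the compressed span if it must always survive.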
Frequently asked questions
Does compressing at every step hurt quality?
No. Only redundant context is removed. Reasoning chains are preserved.
How much can I save on a 15-step ReAct loop?
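Typically 60-75% token reduction on the accumulated context. For the 15-step example above, that would leave roughly 2,250 to 3,600 of the 9,000 accumulated tokens.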
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.