Model optimization
Mistral compression
Mistral models are popular for self-hosted deployments due to their efficiency. Compression further optimizes inference by reducing prompt prefill time.
Mistral with compression
from supercompress import Compressor
from mistralai import Mistral
comp = Compressor()
client = Mistral(api_key="...")
def chat_with_compression(messages, query):
history = "\n".join(m["content"] for m in messages[:-1])
compressed = comp.compress(history, query)
messages[-1]["content"] = compressed.compressed_text + "\n" + query
return client.chat.complete(model="mistral-large", messages=messages)
Frequently asked questions
Does compression work with Mistral's function calling?
Yes. Function definitions are preserved. Only conversation history is compressed.
Can I use it with self-hosted Mistral?
Yes. The compressor runs locally and compresses before sending to your self-hosted instance.
Try it yourself
Paste your long prompt into the playground, ask a question, and see what SuperCompress keeps and removes. Free, no signup needed.