Mistral Medium 3.5 API Guide — Authentication, Endpoints, and Code Examples (2026)
Mistral Medium 3.5 is a 128B dense model with 77.6% on SWE-bench Verified, a 256K context window, configurable reasoning, and native vision. The API is available through La Plateforme at $1.50 input / $7.50 output per million tokens: half the cost of Claude Sonnet 4.6, with open weights included.
This guide covers everything you need to build with the Mistral Medium 3.5 API: authentication, chat completions, streaming, reasoning effort, function calling, vision, structured output, and integration with coding tools. For the full model overview, see our Mistral Medium 3.5 complete guide.
Getting started
1. Get an API key
Sign up at La Plateforme, add a payment method, and generate an API key from the dashboard. No waitlist — the key works immediately.
2. Install the client library
pip install mistralai
3. Make your first request
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Explain Python generators in 3 sentences."}
    ]
)

print(response.choices[0].message.content)
That’s it. The model ID is mistral-medium-3.5. All requests go through https://api.mistral.ai.
Pricing
All prices are per 1M tokens. For a broader comparison, see AI API pricing compared and how to reduce LLM API costs.
| Model | Input | Output |
|---|---|---|
| Mistral Medium 3.5 | $1.50 | $7.50 |
Medium 3.5 replaces three separate models (Medium 3.1, Magistral, and Devstral 2), so you no longer need to route between models or manage multiple pricing tiers. One model, one price.
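At these rates, per-request cost is easy to estimate. A quick back-of-the-envelope sketch (the token counts are made up for illustration):

# Rates from the table above, converted to $ per token
INPUT_RATE = 1.50 / 1_000_000
OUTPUT_RATE = 7.50 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical request: 2,000 input tokens, 500 output tokens
print(request_cost(2_000, 500))  # ~ $0.0068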
Authentication
Every request requires your API key in the Authorization header:
Authorization: Bearer your-api-key
The Python and JavaScript client libraries handle this automatically when you pass the key at initialization. If you’re calling the REST API directly:
curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Store your key in an environment variable (MISTRAL_API_KEY) rather than hardcoding it. For key management best practices, see secure AI API keys.
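For example, with the key stored in MISTRAL_API_KEY, initialization looks like this:

import os
from mistralai import Mistral

# Read the key from the environment rather than hardcoding it
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])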
Basic chat completion
The core endpoint is /v1/chat/completions. Here’s a complete example with a system prompt:
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "You are a senior Python developer. Be concise."},
        {"role": "user", "content": "Write a decorator that retries a function 3 times on exception."}
    ],
    temperature=0.3
)

print(response.choices[0].message.content)
Medium 3.5 has strong system prompt adherence — it follows formatting instructions, persona constraints, and output rules more reliably than most open-weight models. This makes it well-suited for production applications where consistent behavior matters.
Streaming responses
For user-facing applications, stream responses to reduce perceived latency:
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain how async/await works in Python."}
    ]
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
Streaming is especially important when using reasoning_effort="high", since extended reasoning produces longer outputs.
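Combining the two is a one-line change. A sketch, assuming the same client as above and that chat.stream accepts reasoning_effort the same way chat.complete does:

# Stream an extended-reasoning response so tokens render as they arrive
stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Find the race condition in this worker pool."}],
    reasoning_effort="high"
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)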
Configurable reasoning effort
Medium 3.5 supports two reasoning modes per request. This replaces the need for separate reasoning models like Magistral. For more context on how this compares to other models, see the Mistral Medium 3.5 complete guide.
| Mode | Behavior | When to use |
|---|---|---|
| none | Fast, direct answers. No chain-of-thought. | Classification, autocomplete, formatting, high-throughput tasks |
| high | Extended internal reasoning before responding. | Complex coding, debugging, math, multi-step planning |
reasoning_effort="none"
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Classify this text as positive or negative: 'Great product, fast shipping'"}],
    reasoning_effort="none"
)
Use none when the task is straightforward. It’s faster and uses fewer output tokens, which directly reduces cost.
reasoning_effort="high"
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Debug this race condition in my Go service that causes duplicate writes under load."}],
    reasoning_effort="high"
)
Use high for anything that benefits from step-by-step thinking: complex code generation, architectural decisions, debugging, and math. The model produces more output tokens (and costs more per request), but the quality improvement is significant for hard problems.
Default behavior: if you omit reasoning_effort, the model uses its default mode. Set it explicitly whenever you know how complex the task is.
Temperature recommendations
Mistral recommends different temperature settings depending on the reasoning mode:
- With reasoning (reasoning_effort="high"): use temperature=0.7. The reasoning process benefits from some exploration.
- Without reasoning (reasoning_effort="none"): use temperature=0.0 to 0.7 depending on the task. Lower for deterministic outputs (classification, extraction), higher for creative tasks.
# Deterministic classification — low temperature, no reasoning
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Is this a bug report or feature request? 'The button doesn't work on mobile'"}],
    reasoning_effort="none",
    temperature=0.0
)

# Creative code generation — higher temperature, with reasoning
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Design a novel caching strategy for a multi-tenant SaaS app."}],
    reasoning_effort="high",
    temperature=0.7
)
Function calling / tool use
Medium 3.5 supports function calling for building agents and automated workflows. Define tools with JSON schemas and the model generates structured calls.
from mistralai import Mistral
import json

client = Mistral(api_key="your-api-key")

# Define a calculator tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform a mathematical calculation",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The math expression to evaluate, e.g. '2 + 3 * 4'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Send a message that requires the tool
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "What is 1547 * 382 + 91?"}],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Execute the tool (your implementation)
    result = eval(args["expression"])  # In production, use a safe math parser

    # Send the result back
    follow_up = client.chat.complete(
        model="mistral-medium-3.5",
        messages=[
            {"role": "user", "content": "What is 1547 * 382 + 91?"},
            message,
            {"role": "tool", "content": str(result), "tool_call_id": tool_call.id}
        ],
        tools=tools
    )
    print(follow_up.choices[0].message.content)
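As the comment above notes, eval is only a placeholder. A minimal safe-evaluation sketch using Python's ast module (supports basic arithmetic only; extend the whitelist as needed):

import ast
import operator

# Whitelist of allowed binary operators; anything else is rejected
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expression: str) -> float:
    """Evaluate a basic arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("Unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("1547 * 382 + 91"))  # 591045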
For more on tool use patterns, see what is tool calling.
Vision / multimodal input
Medium 3.5 includes a vision encoder trained from scratch — not a bolted-on adapter. It handles variable image sizes, documents, diagrams, UI screenshots, and charts.
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's in this image and identify any UI issues."},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
            ]
        }
    ]
)

print(response.choices[0].message.content)
You can also pass multiple images in a single request:
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two designs and list the differences."},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v1.png"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v2.png"}}
            ]
        }
    ]
)
Vision works with both reasoning modes. Use reasoning_effort="high" for complex diagram analysis or document understanding.
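For local files, you can pass images inline. A sketch assuming the API accepts OpenAI-style base64 data URLs (verify against Mistral's vision docs; the file path is hypothetical):

import base64

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this screenshot."},
                # Data-URL support is an assumption; the URL form shown above is confirmed
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
            ]
        }
    ]
)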
JSON mode / structured output
Force the model to return valid JSON by setting response_format:
import json

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "Extract the following fields from the text: name, email, company. Return JSON."},
        {"role": "user", "content": "Hi, I'm Sarah Chen from Acme Corp. Reach me at sarah@acme.io."}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)
# {"name": "Sarah Chen", "email": "sarah@acme.io", "company": "Acme Corp"}
JSON mode guarantees the output is parseable JSON. Combine it with a clear schema description in the system prompt for reliable structured extraction. For more on this pattern, see structured outputs explained.
System prompts
Medium 3.5 has notably strong system prompt adherence. It follows formatting rules, persona constraints, language requirements, and output schemas more consistently than most models in its class.
Tips for effective system prompts:
- Be specific about format: “Return a JSON object with keys: summary, severity, recommendation” works better than “return structured data.”
- Set constraints early: Put output format and behavioral rules in the system prompt, not the user message.
- Use examples: One-shot or few-shot examples in the system prompt significantly improve consistency.
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "system",
            "content": """You are a code reviewer. For each review:
1. List issues found (bullet points)
2. Rate severity: low/medium/high
3. Suggest a fix for each issue
Never explain what the code does — only review it."""
        },
        {"role": "user", "content": "Review this:\n\ndef get_user(id):\n    return db.query(f'SELECT * FROM users WHERE id = {id}')"}
    ]
)
OpenAI-compatible endpoint
Mistral’s API is compatible with the OpenAI SDK. If you’re already using the OpenAI Python library, you can switch to Mistral Medium 3.5 by changing the base URL and API key:
from openai import OpenAI

client = OpenAI(
    api_key="your-mistral-api-key",
    base_url="https://api.mistral.ai/v1"
)

response = client.chat.completions.create(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Write a Dockerfile for a Node.js app with multi-stage build."}
    ]
)

print(response.choices[0].message.content)
This means any tool or framework that supports the OpenAI API format can use Mistral Medium 3.5 with minimal configuration changes.
Using with coding tools
Aider
Aider works with Mistral Medium 3.5 directly:
export MISTRAL_API_KEY=your-api-key
aider --model mistral/mistral-medium-3.5
Or use it via OpenRouter:
export OPENROUTER_API_KEY=your-key
aider --model openrouter/mistralai/mistral-medium-3.5
OpenCode
OpenCode supports any OpenAI-compatible endpoint. Add this to your opencode.json:
{
  "provider": {
    "mistral": {
      "apiKey": "your-mistral-api-key",
      "baseURL": "https://api.mistral.ai/v1",
      "models": {
        "mistral-medium-3.5": {
          "maxTokens": 16384,
          "contextWindow": 262144
        }
      }
    }
  },
  "model": "mistral/mistral-medium-3.5"
}
Continue.dev
Continue.dev supports Mistral natively. Add this to your config.json:
{
  "models": [
    {
      "title": "Mistral Medium 3.5",
      "provider": "mistral",
      "model": "mistral-medium-3.5",
      "apiKey": "your-mistral-api-key"
    }
  ]
}
Continue.dev also supports tab autocomplete with Codestral if you want to pair Medium 3.5 for chat with Codestral for completions.
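A sketch of that pairing; the tabAutocompleteModel block and the codestral-latest model ID are assumptions to check against Continue.dev's docs:

{
  "models": [
    {
      "title": "Mistral Medium 3.5",
      "provider": "mistral",
      "model": "mistral-medium-3.5",
      "apiKey": "your-mistral-api-key"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "your-mistral-api-key"
  }
}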
Rate limits and error handling
Mistral applies rate limits based on your account tier. If you hit a limit, the API returns HTTP 429.
| Error code | Meaning | Action |
|---|---|---|
| 401 | Invalid API key | Check your key and ensure it’s active |
| 429 | Rate limit exceeded | Back off and retry with exponential delay |
| 400 | Bad request (invalid model, malformed input) | Check your request body and model ID |
| 500 | Server error | Retry after a short delay |
Retry with exponential backoff
import time
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.complete(
                model="mistral-medium-3.5",
                messages=messages
            )
        except Exception as e:
            if "429" in str(e) and attempt < max_retries - 1:
                wait = 2 ** attempt
                print(f"Rate limited. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise
For production applications, consider using an AI gateway pattern with automatic retries and fallback to other models via OpenRouter.
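A minimal sketch of that fallback pattern, reusing the OpenRouter model ID from the Aider section below (error handling is simplified; a production gateway would inspect status codes):

import os
from mistralai import Mistral
from openai import OpenAI

primary = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# OpenRouter exposes an OpenAI-compatible endpoint
fallback = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1"
)

def complete_with_fallback(messages):
    try:
        resp = primary.chat.complete(model="mistral-medium-3.5", messages=messages)
        return resp.choices[0].message.content
    except Exception:
        # Same model, different provider, when La Plateforme is unavailable
        resp = fallback.chat.completions.create(
            model="mistralai/mistral-medium-3.5", messages=messages
        )
        return resp.choices[0].message.content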
Pricing comparison vs other APIs
How Medium 3.5 compares to other frontier model APIs as of April 2026:
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Open weights |
|---|---|---|---|---|
| Mistral Medium 3.5 | $1.50 | $7.50 | 256K | Yes |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M (beta) | No |
| DeepSeek V4 Pro | $1.74 | $3.48 | 1M | Yes |
| DeepSeek V4 Flash | $0.14 | $0.28 | 1M | Yes |
| GPT-5.4 | ~$5.00 | ~$15.00 | 256K | No |
| Gemini 3.1 Pro | ~$3.50 | ~$10.50 | 2M | No |
| Kimi K2.6 | ~$1.00 | ~$4.00 | 256K | No |
| Qwen 3.5 | Free (Alibaba) | Free (Alibaba) | 128K | Yes |
Key takeaways:
- Cheapest frontier option? No — DeepSeek V4 Flash is far cheaper. But Medium 3.5 beats it on SWE-bench (77.6% vs ~76%).
- vs Claude Sonnet 4.6: Medium 3.5 is half the price. Sonnet leads by 2 points on SWE-bench but you’re paying 2× for that margin.
- vs DeepSeek V4 Pro: V4 Pro has cheaper output ($3.48 vs $7.50) and slightly higher SWE-bench (80.6%). But Medium 3.5 is a simpler dense architecture that’s easier to self-host.
- Open weights advantage: Medium 3.5 and DeepSeek V4 are the only frontier-class models you can self-host. If you need to run on your own infrastructure, these are your options.
For strategies to minimize API spend, see how to reduce LLM API costs.
FAQ
Which model ID do I use for Mistral Medium 3.5?
Use mistral-medium-3.5. This is the model ID for both the Mistral client library and the OpenAI-compatible endpoint.
Does Medium 3.5 replace Devstral 2 in the API?
Medium 3.5 subsumes Devstral 2’s capabilities — coding, agentic tasks, and tool use — while adding vision, configurable reasoning, and general-purpose intelligence. Devstral 2 remains available at a lower price point ($0.40/$2.00) for coding-only workloads where you don’t need vision or reasoning modes.
Can I use Medium 3.5 with the OpenAI SDK in JavaScript/TypeScript?
Yes. Any OpenAI-compatible SDK works. Set the base URL to https://api.mistral.ai/v1 and use your Mistral API key. Alternatively, use the official Mistral JavaScript SDK (@mistralai/mistralai).
What’s the context window limit?
256K tokens. If you exceed it, the API returns a 400 error. For workloads that need more context, consider DeepSeek V4 (1M tokens) or Gemini 3.1 Pro (2M tokens).
Is there a free tier?
No. Mistral’s API is pay-per-token with no free tier. However, at $1.50/M input tokens, light usage costs pennies. For free alternatives, see best free AI APIs 2026.
Does the API support batch processing?
Yes. You can send multiple independent requests concurrently. For high-throughput workloads, combine reasoning_effort="none" with streaming disabled to maximize throughput.
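A simple fan-out with the standard library; a sketch that assumes the client is safe to share across threads (worker count is arbitrary; tune it to your tier's rate limit):

from concurrent.futures import ThreadPoolExecutor
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

prompts = ["Summarize: ...", "Classify: ...", "Translate: ..."]

def run(prompt):
    resp = client.chat.complete(
        model="mistral-medium-3.5",
        messages=[{"role": "user", "content": prompt}],
        reasoning_effort="none"  # high-throughput mode
    )
    return resp.choices[0].message.content

# 8 concurrent workers
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run, prompts))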
How does Medium 3.5 handle GDPR compliance?
Mistral’s API runs on EU infrastructure. Data stays in Europe without additional configuration — no Standard Contractual Clauses needed. This is a significant advantage for European companies. See our AI GDPR developers guide for details.
Can I self-host Medium 3.5 instead of using the API?
Yes. The model has open weights on Hugging Face and runs on as few as 4 GPUs with FP8 quantization. See how to run Mistral Medium 3.5 locally for setup instructions with vLLM, SGLang, and Ollama.
Next steps
- Read the Mistral Medium 3.5 complete guide for benchmarks, architecture, and comparisons.
- Explore the Mistral API guide for other Mistral models (Codestral, Mistral Small, Mistral Large).
- Set up Aider or OpenCode with Medium 3.5 for AI-assisted coding in the terminal.
- Learn how to run Mistral Medium 3.5 locally if you prefer self-hosting over the API.
- Compare costs across all providers in how to reduce LLM API costs.