Mistral Medium 3.5 Complete Guide — Specs, Benchmarks, and How to Use It (2026)
Mistral Medium 3.5 is a 128B dense transformer with a 256K context window, a 77.6% score on SWE-bench Verified, and open weights under a modified MIT license. It merges three separate models (Medium 3.1, Magistral, and Devstral 2) into a single unified model that handles coding, reasoning, vision, and general tasks.
At $1.50 input / $7.50 output per million tokens, it undercuts every closed-source frontier model while running on as few as 4 GPUs for self-hosting. This guide covers everything: architecture, benchmarks, pricing, API setup, self-hosting, and how it compares to Claude Sonnet 4.6, DeepSeek V4, and the rest of the field.
What is Mistral Medium 3.5?
Mistral Medium 3.5 is Mistral AI’s new flagship model, released in April 2026. It is a dense (non-MoE) transformer with 128 billion parameters — every parameter is active on every forward pass. This matters for inference predictability and self-hosting simplicity compared to sparse MoE architectures like DeepSeek V4.
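A quick back-of-envelope shows why the dense design matters for self-hosting (a rough sketch; real deployments also need memory for the KV cache and activation overhead):
# Rough weight-memory estimate for a dense 128B model (weights only)
params = 128e9
bytes_per_param = {"fp16": 2, "fp8": 1, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    total_gb = params * b / 1e9
    per_gpu_gb = total_gb / 4  # tensor parallel across 4 GPUs
    print(f"{fmt}: {total_gb:.0f} GB total, {per_gpu_gb:.0f} GB per GPU")

# fp16: 256 GB total, 64 GB per GPU -> tight on 80 GB cards once the KV cache is added
# fp8:  128 GB total, 32 GB per GPU -> comfortable on 4x 80 GB, matching Mistral's claim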
The key move: Mistral merged three previously separate model lines into one. Medium 3.1 handled general tasks, Magistral handled reasoning, and Devstral 2 handled agentic coding. Medium 3.5 replaces all three. You no longer need to route between models — one model covers everything with configurable reasoning effort per request.
Medium 3.5 also introduces a vision encoder trained from scratch (not bolted on from a separate model). It handles variable image sizes and aspect ratios natively, making it suitable for document analysis, diagram understanding, and UI screenshot interpretation.
The model ships with open weights on Hugging Face under a modified MIT license, meaning you can download, self-host, fine-tune, and deploy commercially.
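To grab the raw weights, here is a minimal sketch using huggingface_hub (the repo ID below matches the one used in the serving commands later in this guide; confirm the exact name on Hugging Face):
from huggingface_hub import snapshot_download

# Download the full weights locally (repo ID assumed; check Hugging Face for the exact name)
local_dir = snapshot_download(
    repo_id="mistralai/Mistral-Medium-3.5",
    local_dir="./mistral-medium-3.5",
)
print(f"Weights downloaded to {local_dir}")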
Key specs
| Spec | Value |
|---|---|
| Parameters | 128B (dense) |
| Architecture | Dense transformer |
| Context window | 256K tokens |
| Vision | Yes (trained from scratch) |
| Reasoning | Configurable (none / high) |
| License | Modified MIT, open weights |
| Weights | Hugging Face |
| Self-hosting | As few as 4 GPUs |
| API input price | $1.50 / 1M tokens |
| API output price | $7.50 / 1M tokens |
| Release date | April 2026 |
| Replaces | Medium 3.1, Magistral, Devstral 2 |
Benchmarks
Mistral published two headline benchmarks for Medium 3.5:
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 77.6% |
| τ³-Telecom | 91.4 |
The 77.6% SWE-bench Verified score beats Devstral 2 (72.2%) by over 5 points and surpasses Qwen 3.5 397B despite being a much smaller dense model. It also beats DeepSeek V4 Flash (~76%) while using a fundamentally different architecture (dense rather than sparse MoE).
The τ³-Telecom score of 91.4 demonstrates strong domain-specific performance in telecommunications — relevant for enterprise customers evaluating models for vertical applications.
Mistral has not published GPQA Diamond, MMLU-Pro, or LiveCodeBench scores for Medium 3.5 at launch. Expect community benchmarks to fill these gaps in the coming weeks.
Comparison vs frontier models
Here’s how Medium 3.5 stacks up against the current frontier as of April 2026:
| Metric | Mistral Medium 3.5 | Claude Sonnet 4.6 | DeepSeek V4 Pro | DeepSeek V4 Flash | GPT-5.4 | Kimi K2.6 | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 77.6% | 79.6% | 80.6% | ~76% | ~78%* | ~75%* | ~77%* |
| Architecture | Dense 128B | Dense (closed) | MoE 1.6T/49B | MoE 284B/13B | Dense (closed) | MoE (closed) | Dense (closed) |
| Context window | 256K | 1M (beta) | 1M | 1M | 256K | 256K | 2M |
| Input $/M tokens | $1.50 | $3.00 | $1.74 | $0.14 | $5.00* | $1.00* | $3.50* |
| Output $/M tokens | $7.50 | $15.00 | $3.48 | $0.28 | $15.00* | $4.00* | $10.50* |
| Open weights | Yes (modified MIT) | No | Yes (MIT) | Yes (MIT) | No | No | No |
| Self-hosting | 4 GPUs | N/A | 8× H100 | 1× H200 | N/A | N/A | N/A |
*Approximate scores and pricing based on available data. GPT-5.4, Kimi K2.6, and Gemini 3.1 Pro scores are estimates from community benchmarks.
Key takeaways:
- vs Claude Sonnet 4.6: Sonnet leads by 2 points on SWE-bench but costs 2× more on both input and output. Sonnet has 1M context (beta) vs 256K. Medium 3.5 is open-weight; Sonnet is closed. If you need open weights or lower cost, Medium 3.5 wins. If you need maximum coding accuracy or huge context, Sonnet 4.6 edges ahead.
- vs DeepSeek V4 Pro: V4 Pro leads by 3 points on SWE-bench and has 1M context, but it’s a 1.6T MoE model requiring 8× H100 GPUs to self-host. Medium 3.5 runs on 4 GPUs. V4 Pro output is cheaper ($3.48 vs $7.50), but Medium 3.5 is simpler to deploy.
- vs DeepSeek V4 Flash: Medium 3.5 beats V4 Flash on SWE-bench (77.6% vs ~76%) but V4 Flash is dramatically cheaper ($0.28/M output). V4 Flash is the budget pick; Medium 3.5 is the quality pick among open-weight models.
How to use Mistral Medium 3.5
Via the Mistral API
The fastest way to start. Get an API key from La Plateforme and call the model directly. For a full walkthrough, see our Mistral API guide and Mistral Medium 3.5 API guide.
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Refactor this function to use async/await"}
    ],
)
print(response.choices[0].message.content)
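For interactive use, the SDK can also stream tokens as they are generated. A minimal sketch using the v1 SDK's chat.stream method:
# Stream tokens as they arrive instead of waiting for the full response
stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Explain async/await in two sentences"}
    ],
)
for chunk in stream:
    # Each event carries a delta with the newly generated text
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)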
Via Vibe CLI
Mistral Vibe CLI is Mistral’s terminal-based coding agent. With Vibe 2.0, you get custom subagents, slash-command skills, and — as of April 2026 — remote agents that run async in the cloud.
Medium 3.5 powers the premium tier of Vibe CLI. You can start a local session and teleport it to the cloud mid-task:
# Install Vibe CLI
npm install -g @mistralai/vibe
# Start a coding session with Medium 3.5
vibe --model mistral-medium-3.5
# Or use remote agents (async cloud sessions)
vibe remote start --task "Add pagination to the /users endpoint"
Vibe CLI is included with Le Chat Pro ($14.99/month) and Le Chat Team ($24.99/seat/month).
Via Le Chat
Le Chat is Mistral’s web interface. The new Work Mode (preview) uses Medium 3.5 for multi-step agentic workflows — cross-tool tasks like email triage, Jira issue creation, Slack summaries, and research synthesis. It’s the non-developer entry point to Medium 3.5.
Self-hosted (vLLM, SGLang, Ollama)
Medium 3.5’s 128B dense architecture runs on as few as 4 GPUs with FP8 quantization. For detailed self-hosting instructions, see our guide to running Mistral Medium 3.5 locally and running Mistral models locally.
vLLM:
vllm serve mistralai/Mistral-Medium-3.5 \
--tensor-parallel-size 4 \
--max-model-len 262144 \
--quantization fp8
SGLang:
python -m sglang.launch_server \
--model-path mistralai/Mistral-Medium-3.5 \
--tp 4
Ollama:
ollama pull mistral-medium-3.5
ollama run mistral-medium-3.5
For more on Ollama model options, see our best Ollama models for coding in 2026.
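Once a server is up, you can talk to it with any OpenAI-compatible client: vLLM serves an OpenAI-style API on port 8000 by default, and SGLang and Ollama ship equivalent compatibility layers on their own default ports. A minimal sketch against a local vLLM instance:
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (default port 8000)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistralai/Mistral-Medium-3.5",  # must match the served model name
    messages=[{"role": "user", "content": "Write a haiku about dense transformers"}],
)
print(response.choices[0].message.content)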
Configurable reasoning effort
Medium 3.5 supports configurable reasoning effort per request — you control how much “thinking” the model does before responding. This replaces the need for separate reasoning models like Magistral.
Two modes:
- None: Fast responses, minimal chain-of-thought. Best for simple tasks, autocomplete, classification, and high-throughput workloads.
- High: Extended reasoning with internal chain-of-thought. Best for complex coding tasks, multi-step debugging, math, and planning.
# Fast mode: no extended reasoning
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    reasoning_effort="none",
)

# Deep reasoning mode
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Debug this race condition in my Go service"}],
    reasoning_effort="high",
)
This is similar to how DeepSeek V4 offers Non-think / Think High / Think Max modes, and how Claude Sonnet 4.6 supports adaptive thinking. The difference: Medium 3.5 is one model with a simple toggle, not separate model variants.
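Since "high" spends extra output tokens on reasoning, it costs more per request. A quick sketch for measuring the gap on your own prompts (assuming the response exposes a usage object, as Mistral's API does elsewhere):
# Compare token usage between reasoning modes on the same prompt
prompt = [{"role": "user", "content": "Is 2^61 - 1 prime? Explain briefly."}]

for effort in ("none", "high"):
    response = client.chat.complete(
        model="mistral-medium-3.5",
        messages=prompt,
        reasoning_effort=effort,
    )
    # Output tokens are billed either way, so this is also a cost comparison
    print(f"{effort}: {response.usage.completion_tokens} output tokens")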
Vision capabilities
Medium 3.5 includes a vision encoder trained from scratch — not a CLIP adapter or a bolted-on module from another model. It handles:
- Variable image sizes and aspect ratios (no forced resizing)
- Document and diagram understanding
- UI screenshot analysis
- Chart and table extraction
- Multi-image inputs in a single request
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's wrong with this UI layout?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
            ]
        }
    ]
)
print(response.choices[0].message.content)
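The example above passes the image by URL. For local files, you can send a base64 data URL instead, following the convention Mistral documents for its other vision models (a minimal sketch; screenshot.png is a placeholder path):
import base64

# Encode a local screenshot as a base64 data URL
with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the table from this screenshot as CSV"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
            ]
        }
    ]
)
print(response.choices[0].message.content)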
The trained-from-scratch approach means the vision encoder and language model share training signal from the start, which typically produces better visual grounding than post-hoc adapter approaches.
What’s new vs Medium 3.1
| Feature | Medium 3.1 | Medium 3.5 |
|---|---|---|
| SWE-bench Verified | ~65%* | 77.6% |
| Merged model | No (separate Magistral + Devstral 2) | Yes (one model for everything) |
| Vision | Limited | Trained from scratch |
| Reasoning | Fixed | Configurable (none / high) |
| Replaces Devstral 2 | No | Yes |
| Replaces Magistral | No | Yes |
| Remote agents (Vibe) | No | Yes |
| Context window | 128K | 256K |
*Medium 3.1 SWE-bench score is approximate.
The biggest change is the merger. Instead of routing between three models depending on the task, you use one model and adjust reasoning effort. This simplifies deployment, reduces API complexity, and means Mistral can focus optimization on a single architecture.
The coding improvement is substantial — jumping from Devstral 2’s 72.2% to 77.6% on SWE-bench while also handling general tasks and reasoning that previously required separate models.
Pricing comparison
| Model | Input ($/M tokens) | Output ($/M tokens) | Open weights |
|---|---|---|---|
| Mistral Medium 3.5 | $1.50 | $7.50 | Yes |
| Devstral 2 | $0.40 | $2.00 | Yes |
| Devstral 2 Small | $0.10 | $0.30 | Yes |
| Claude Sonnet 4.6 | $3.00 | $15.00 | No |
| DeepSeek V4 Pro | $1.74 | $3.48 | Yes |
| DeepSeek V4 Flash | $0.14 | $0.28 | Yes |
| GPT-5.4 | ~$5.00 | ~$15.00 | No |
| Gemini 3.1 Pro | ~$3.50 | ~$10.50 | No |
Medium 3.5 sits in the mid-range: cheaper than all closed-source frontier models, more expensive than DeepSeek V4 variants. The value proposition is the combination of open weights + dense architecture + strong coding performance at a price point that’s half of Claude Sonnet 4.6.
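To make the trade-offs concrete, here is a rough monthly cost sketch for a hypothetical workload of 50M input and 10M output tokens, using the prices from the table above:
# Monthly API cost for a hypothetical workload: 50M input + 10M output tokens
# Prices ($ per 1M tokens) taken from the table above
prices = {
    "Mistral Medium 3.5": (1.50, 7.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4 Pro": (1.74, 3.48),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

input_m, output_m = 50, 10  # millions of tokens per month
for model, (inp, out) in prices.items():
    cost = input_m * inp + output_m * out
    print(f"{model}: ${cost:,.2f}/month")

# Medium 3.5 comes to $150/month vs $300/month for Sonnet 4.6,
# matching the "half of Claude Sonnet 4.6" claim above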
For subscription access, Le Chat Pro at $14.99/month includes Vibe CLI with Medium 3.5 and Devstral 2. Le Chat Team is $24.99/seat/month. Students get 50% off Pro.
FAQ
Is Mistral Medium 3.5 open source?
Yes, with a caveat. The weights are available on Hugging Face under a modified MIT license. “Modified MIT” means it’s permissive but not identical to pure MIT — check the license file for specific restrictions. For most commercial use cases, it’s functionally open.
Does Medium 3.5 replace Devstral 2?
Yes. Medium 3.5 subsumes Devstral 2’s coding capabilities while adding general reasoning, vision, and configurable thinking. Devstral 2 and Devstral 2 Small remain available for cost-sensitive coding-only workloads, but Medium 3.5 is the recommended model going forward.
How many GPUs do I need to self-host Medium 3.5?
Mistral states “as few as 4 GPUs.” With FP8 quantization on 4× A100 80GB or 4× H100 80GB, you can serve the full 128B model. For quantized variants (GGUF), check our guide to running Mistral Medium 3.5 locally.
How does Medium 3.5 compare to Claude Sonnet 4.6 for coding?
Sonnet 4.6 scores 79.6% on SWE-bench vs Medium 3.5’s 77.6% — a 2-point lead. But Sonnet costs 2× more ($3/$15 vs $1.50/$7.50) and is closed-source. If you need open weights, self-hosting, or lower API costs, Medium 3.5 is the better choice. If you need maximum accuracy and don’t care about open weights, Sonnet 4.6 has the edge.
Can I use Medium 3.5 with Ollama?
Yes. Once the GGUF quantizations are available (typically within days of release), you can run it via Ollama. The full 128B model requires significant RAM/VRAM, but quantized versions (Q4_K_M, Q5_K_M) bring requirements down to consumer hardware with enough unified memory (e.g., Mac Studio with 192GB).
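Rough arithmetic on why 192GB of unified memory is enough (approximate bits-per-weight figures for common llama.cpp quants; actual file sizes vary by recipe):
# Approximate GGUF file sizes for a 128B model at common quantization levels
params = 128e9
bits_per_weight = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

for quant, bits in bits_per_weight.items():
    size_gb = params * bits / 8 / 1e9
    print(f"{quant}: ~{size_gb:.0f} GB")

# Q4_K_M: ~78 GB, Q5_K_M: ~91 GB -> both fit in 192 GB unified memory
# with room left for the KV cache at longer contexts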
What’s the difference between reasoning effort “none” and “high”?
“None” skips extended chain-of-thought reasoning — faster responses, lower token usage, suitable for simple tasks. “High” enables deep reasoning with internal thinking steps — slower but significantly better for complex coding, debugging, math, and multi-step planning. You pay for output tokens either way, so “high” costs more per request due to longer outputs.
Does Medium 3.5 support function calling and tool use?
Yes. Medium 3.5 supports function calling, structured outputs (JSON mode), and tool use — the same capabilities needed for agentic workflows in Vibe CLI and Le Chat Work Mode.
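A minimal tool-use sketch (get_weather is a hypothetical function invented for illustration; the schema follows Mistral's standard function-calling format):
# Define a hypothetical tool and let the model decide when to call it
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call shows up on the message
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)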
Bottom line
Mistral Medium 3.5 is the strongest open-weight dense model for coding as of April 2026. At 77.6% on SWE-bench Verified, it trails only Claude Sonnet 4.6 and DeepSeek V4 Pro while costing half as much as Anthropic's offering and running on 4 GPUs.
The model merger strategy is the real story. One model replacing three simplifies everything — deployment, routing, cost management. If you were using Devstral 2 for coding and Magistral for reasoning, you now use Medium 3.5 for both.
For API setup, see our Mistral Medium 3.5 API guide. For self-hosting, see how to run Mistral Medium 3.5 locally. For the CLI experience, check out what is Mistral Vibe CLI.