Mistral AI Complete Model Guide — Every Model, Spec, and Use Case (2026)
📢 Update: Mistral Medium 3.5 is now available — 128B dense model replacing Medium 3.1 and Devstral 2. See the Medium 3.5 complete guide, how to run it locally, and API guide.
Mistral AI offers six main models in 2026, each optimized for different tasks. Here’s the complete breakdown with benchmarks, pricing, and recommendations for which model to use when.
The full lineup
Mistral Large 2 (123B) — Flagship reasoning
The general-purpose powerhouse. 123B dense parameters, 128K context. Competes with Claude Sonnet and GPT-4o on reasoning, coding, and multilingual tasks. Best-in-class for European languages.
- Use for: Complex reasoning, analysis, multilingual content, long documents
- Price: $2/$6 per 1M tokens (input/output)
- Context window: 128K tokens
- Run locally: 1x H100 or Mac Studio Ultra with 192GB
- Key benchmarks: MMLU 84.0%, HumanEval 92%, MATH 83%
- Full guide: Mistral Large 2 Complete Guide
Mistral Large 2 is the model to use when you need frontier-level reasoning without paying Claude Opus prices. It’s particularly strong at structured output, function calling, and multilingual tasks (30+ languages). The 128K context handles most use cases, though it’s smaller than Devstral’s 256K.
Devstral 2 (123B) — Best open coding agent
Same 123B architecture as Large 2 but fine-tuned specifically for agentic coding. 72.2% on SWE-bench Verified — matching Claude Opus. 256K context window.
- Use for: Multi-file refactoring, autonomous coding, complex bug fixes
- Price: $2/$6 per 1M tokens
- Context window: 256K tokens
- License: Modified MIT (commercial OK)
- Key benchmarks: SWE-bench 72.2%, HumanEval 92%, MBPP+ 88%
- Full guide: Devstral 2 Complete Guide
Devstral 2 is the crown jewel of Mistral’s lineup for developers. The 72.2% SWE-bench score means it can autonomously fix real-world GitHub issues at a rate matching the best closed models. The 256K context lets it reason about entire codebases. It’s the default model for Mistral’s Vibe CLI.
Devstral Small 2 (24B) — Local coding agent
The consumer-friendly Devstral. Runs on a single RTX 4090 or 32GB Mac. Still has the 256K context window.
- Use for: Local agentic coding, privacy-sensitive environments
- Price: Free (Modified MIT license)
- Context window: 256K tokens
- VRAM: ~14GB (Q4), ~26GB (Q8)
- Key benchmarks: SWE-bench ~58%, HumanEval ~82%
- Full guide: Devstral Small 2 Guide
The best local agentic coding model in its size class. Nothing else at 24B comes close on SWE-bench. The 256K context is double what competing models offer (Qwen, Gemma cap at 128K). Run it with Ollama for zero-cost coding assistance.
Codestral (22B) — Best autocomplete
Purpose-built for code completion with native Fill-in-the-Middle (FIM). 256K context, 80+ languages, 86.6% HumanEval.
- Use for: IDE autocomplete, tab completions, code suggestions
- Price: $0.30/$0.90 per 1M tokens (or free locally via Ollama)
- Context window: 256K tokens
- VRAM: ~12GB (Q4)
- Key benchmarks: HumanEval 86.6%, MBPP+ 82%, FIM accuracy 91%
- Full guide: Codestral Complete Guide
Codestral is not a chat model — it’s a completion engine. Its native FIM support means it understands code before AND after the cursor, producing contextually perfect completions. It’s the best open model for IDE autocomplete, outperforming GitHub Copilot’s base model on most benchmarks. Use it with Continue.dev, VS Code, or any editor that supports custom completion endpoints.
Mistral Small (22B) — Fast and cheap
General-purpose model optimized for speed and cost. Good enough for most tasks at a fraction of frontier model prices.
- Use for: Chat, summarization, simple coding, high-volume tasks, classification
- Price: $0.10/$0.30 per 1M tokens
- Context window: 128K tokens
- VRAM: ~12GB (Q4)
- Key benchmarks: MMLU 75%, HumanEval 72%, MT-Bench 8.1
- Run locally: Any 16GB+ GPU or 32GB Mac
Mistral Small is the workhorse for production applications where you need good-enough quality at minimal cost. At $0.10/1M input tokens, you can process massive volumes without breaking the budget. It handles summarization, classification, simple Q&A, and basic coding well. Don’t use it for complex reasoning or multi-step coding — that’s what Large 2 and Devstral are for.
Mistral Nemo (12B) — Edge deployment
The smallest Mistral model. Runs on laptops and edge devices. Apache 2.0 licensed.
- Use for: Mobile apps, edge devices, Raspberry Pi, embedded systems
- Price: Free (Apache 2.0)
- Context window: 128K tokens
- VRAM: ~7GB (Q4)
- Key benchmarks: MMLU 68%, HumanEval 65%, MT-Bench 7.6
Nemo is for deployment scenarios where every megabyte of RAM matters. It runs on a Raspberry Pi 5 with 8GB RAM (at Q4), on phones, and on minimal cloud instances. The Apache 2.0 license means zero restrictions on commercial use. Quality is limited compared to larger models, but for simple tasks (classification, extraction, basic chat) it’s more than adequate.
Complete benchmark comparison
| Model | MMLU | HumanEval | SWE-bench | MATH | Context |
|---|---|---|---|---|---|
| Large 2 | 84.0% | 92% | N/A | 83% | 128K |
| Devstral 2 | 80% | 92% | 72.2% | 78% | 256K |
| Devstral Small | 72% | 82% | 58% | 65% | 256K |
| Codestral | 70% | 86.6% | N/A | 60% | 256K |
| Small | 75% | 72% | N/A | 58% | 128K |
| Nemo | 68% | 65% | N/A | 45% | 128K |
Pricing comparison
| Model | Input (per 1M) | Output (per 1M) | Local option |
|---|---|---|---|
| Large 2 | $2.00 | $6.00 | H100 / Mac Ultra 192GB |
| Devstral 2 | $2.00 | $6.00 | H100 / Mac Ultra 192GB |
| Devstral Small | $0.10 | $0.30 | RTX 4090 / Mac 32GB |
| Codestral | $0.30 | $0.90 | RTX 3090 / Mac 32GB |
| Small | $0.10 | $0.30 | RTX 3090 / Mac 32GB |
| Nemo | Free tier | Free tier | Any 8GB+ device |
The recommended Mistral stack
For a complete AI coding setup using only Mistral models:
- Devstral 2 via API for complex agent tasks ($2/1M)
- Codestral locally for autocomplete (free)
- Mistral Small via API for quick questions ($0.10/1M)
Total cost: ~$10-30/month for heavy use. Compare that to Claude Opus at $200+/month.
Budget-conscious stack (all local, $0/month)
- Devstral Small 2 for agent tasks (14GB VRAM)
- Codestral for autocomplete (12GB VRAM)
- Nemo for quick questions (7GB VRAM)
Requires a 24GB GPU (run one at a time) or 48GB+ (run two simultaneously).
Which model for which tool?
| Tool | Best Mistral model | Why |
|---|---|---|
| Vibe CLI | Devstral 2 (native) | Built specifically for it |
| Aider | Devstral 2 or Small | Best agentic performance |
| Continue.dev | Codestral (autocomplete) + Large 2 (chat) | FIM for completions, reasoning for chat |
| OpenCode | Devstral 2 (via API) | Agentic coding focus |
| Cursor | Codestral (autocomplete) | Fast completions |
| Custom agents | Devstral Small (local) or Devstral 2 (API) | Depends on budget |
When to use Mistral vs competitors
| Need | Mistral choice | Alternative |
|---|---|---|
| Best coding agent | Devstral 2 | Claude Opus, GPT-4.5 |
| Best local coding | Devstral Small 2 | Qwen 3.5 27B |
| Best autocomplete | Codestral | GitHub Copilot |
| Cheapest good model | Mistral Small ($0.10/1M) | GPT-4o-mini ($0.15/1M) |
| European languages | Mistral Large 2 | No clear alternative |
| Edge/mobile | Nemo (7GB) | Gemma 4 9B |
Mistral’s key advantage is the open-weight licensing combined with competitive performance. You can run their models locally, modify them, and deploy commercially — something you can’t do with Claude or GPT.
FAQ
Which Mistral model is best?
It depends on your task. For agentic coding (multi-file edits, bug fixes), Devstral 2 is the best — it matches Claude Opus on SWE-bench at 72.2%. For general reasoning and analysis, Mistral Large 2 is the flagship. For IDE autocomplete, Codestral with its native FIM support is unmatched. For local coding on consumer hardware, Devstral Small 2 offers the best quality-per-VRAM ratio. There’s no single “best” — Mistral intentionally offers specialized models rather than one model that does everything.
Is Mistral free?
Partially. Devstral Small 2 (Modified MIT), Nemo (Apache 2.0), and Codestral (when run locally) are free to use with no API costs. The larger models — Large 2, Devstral 2, and Small — are available via the Mistral API at competitive prices ($0.10-$6.00 per million tokens). You can also download the open-weight models and run them locally for free if you have sufficient hardware. Mistral offers a free tier on their platform with rate limits for experimentation.
Can I run Mistral locally?
Yes, most Mistral models can run locally. Nemo (12B) needs just 7GB VRAM — it runs on laptops. Codestral and Mistral Small (22B) need ~12GB — an RTX 3090 or 32GB Mac works. Devstral Small 2 (24B) needs ~14GB. The full Large 2 and Devstral 2 (123B) require ~65GB VRAM at Q4, meaning you need an H100 or Mac Studio Ultra with 192GB. Use Ollama for the easiest setup, or vLLM for production deployments.
How does Mistral compare to Claude?
On coding tasks, Devstral 2 matches Claude Opus (72.2% vs ~72% on SWE-bench) at significantly lower cost ($2/1M vs ~$15/1M tokens). On general reasoning, Claude Sonnet and Mistral Large 2 are comparable, with Claude slightly ahead on complex analysis and Mistral ahead on multilingual tasks. The biggest difference is openness: Mistral’s models are open-weight — you can run them locally, fine-tune them, and deploy without API dependencies. Claude is API-only. For a detailed comparison, see our AI model comparison guide.
Related: What is Mistral AI? · Mistral Large 2 vs Claude Sonnet · How to Run Mistral Large 2 Locally