Apr 11, 2026 · 7 min read

Last updated on Apr 19, 2026

Mistral AI Complete Model Guide — Every Model, Spec, and Use Case (2026)

📢 Update: Mistral Medium 3.5 is now available — 128B dense model replacing Medium 3.1 and Devstral 2. See the Medium 3.5 complete guide, how to run it locally, and API guide.

Mistral AI offers six main models in 2026, each optimized for different tasks. Here’s the complete breakdown with benchmarks, pricing, and recommendations for which model to use when.

The full lineup

Mistral Large 2 (123B) — Flagship reasoning

The general-purpose powerhouse. 123B dense parameters, 128K context. Competes with Claude Sonnet and GPT-4o on reasoning, coding, and multilingual tasks. Best-in-class for European languages.

Use for: Complex reasoning, analysis, multilingual content, long documents
Price: $2/$6 per 1M tokens (input/output)
Context window: 128K tokens
Run locally: 1x H100 or Mac Studio Ultra with 192GB
Key benchmarks: MMLU 84.0%, HumanEval 92%, MATH 83%
Full guide: Mistral Large 2 Complete Guide

Mistral Large 2 is the model to use when you need frontier-level reasoning without paying Claude Opus prices. It’s particularly strong at structured output, function calling, and multilingual tasks (30+ languages). The 128K context handles most use cases, though it’s smaller than Devstral’s 256K.

Devstral 2 (123B) — Best open coding agent

Same 123B architecture as Large 2 but fine-tuned specifically for agentic coding. 72.2% on SWE-bench Verified — matching Claude Opus. 256K context window.

Use for: Multi-file refactoring, autonomous coding, complex bug fixes
Price: $2/$6 per 1M tokens
Context window: 256K tokens
License: Modified MIT (commercial OK)
Key benchmarks: SWE-bench 72.2%, HumanEval 92%, MBPP+ 88%
Full guide: Devstral 2 Complete Guide

Devstral 2 is the crown jewel of Mistral’s lineup for developers. The 72.2% SWE-bench score means it can autonomously fix real-world GitHub issues at a rate matching the best closed models. The 256K context lets it reason about entire codebases. It’s the default model for Mistral’s Vibe CLI.

Devstral Small 2 (24B) — Local coding agent

The consumer-friendly Devstral. Runs on a single RTX 4090 or 32GB Mac. Still has the 256K context window.

Use for: Local agentic coding, privacy-sensitive environments
Price: Free (Modified MIT license)
Context window: 256K tokens
VRAM: ~14GB (Q4), ~26GB (Q8)
Key benchmarks: SWE-bench ~58%, HumanEval ~82%
Full guide: Devstral Small 2 Guide

The best local agentic coding model in its size class. Nothing else at 24B comes close on SWE-bench. The 256K context is double what competing models offer (Qwen, Gemma cap at 128K). Run it with Ollama for zero-cost coding assistance.

Codestral (22B) — Best autocomplete

Purpose-built for code completion with native Fill-in-the-Middle (FIM). 256K context, 80+ languages, 86.6% HumanEval.

Use for: IDE autocomplete, tab completions, code suggestions
Price: $0.30/$0.90 per 1M tokens (or free locally via Ollama)
Context window: 256K tokens
VRAM: ~12GB (Q4)
Key benchmarks: HumanEval 86.6%, MBPP+ 82%, FIM accuracy 91%
Full guide: Codestral Complete Guide

Codestral is not a chat model — it’s a completion engine. Its native FIM support means it understands code before AND after the cursor, producing contextually perfect completions. It’s the best open model for IDE autocomplete, outperforming GitHub Copilot’s base model on most benchmarks. Use it with Continue.dev, VS Code, or any editor that supports custom completion endpoints.

Mistral Small (22B) — Fast and cheap

General-purpose model optimized for speed and cost. Good enough for most tasks at a fraction of frontier model prices.

Use for: Chat, summarization, simple coding, high-volume tasks, classification
Price: $0.10/$0.30 per 1M tokens
Context window: 128K tokens
VRAM: ~12GB (Q4)
Key benchmarks: MMLU 75%, HumanEval 72%, MT-Bench 8.1
Run locally: Any 16GB+ GPU or 32GB Mac

Mistral Small is the workhorse for production applications where you need good-enough quality at minimal cost. At $0.10/1M input tokens, you can process massive volumes without breaking the budget. It handles summarization, classification, simple Q&A, and basic coding well. Don’t use it for complex reasoning or multi-step coding — that’s what Large 2 and Devstral are for.

Mistral Nemo (12B) — Edge deployment

The smallest Mistral model. Runs on laptops and edge devices. Apache 2.0 licensed.

Use for: Mobile apps, edge devices, Raspberry Pi, embedded systems
Price: Free (Apache 2.0)
Context window: 128K tokens
VRAM: ~7GB (Q4)
Key benchmarks: MMLU 68%, HumanEval 65%, MT-Bench 7.6

Nemo is for deployment scenarios where every megabyte of RAM matters. It runs on a Raspberry Pi 5 with 8GB RAM (at Q4), on phones, and on minimal cloud instances. The Apache 2.0 license means zero restrictions on commercial use. Quality is limited compared to larger models, but for simple tasks (classification, extraction, basic chat) it’s more than adequate.

Complete benchmark comparison

Model	MMLU	HumanEval	SWE-bench	MATH	Context
Large 2	84.0%	92%	N/A	83%	128K
Devstral 2	80%	92%	72.2%	78%	256K
Devstral Small	72%	82%	58%	65%	256K
Codestral	70%	86.6%	N/A	60%	256K
Small	75%	72%	N/A	58%	128K
Nemo	68%	65%	N/A	45%	128K

Pricing comparison

Model	Input (per 1M)	Output (per 1M)	Local option
Large 2	$2.00	$6.00	H100 / Mac Ultra 192GB
Devstral 2	$2.00	$6.00	H100 / Mac Ultra 192GB
Devstral Small	$0.10	$0.30	RTX 4090 / Mac 32GB
Codestral	$0.30	$0.90	RTX 3090 / Mac 32GB
Small	$0.10	$0.30	RTX 3090 / Mac 32GB
Nemo	Free tier	Free tier	Any 8GB+ device

The recommended Mistral stack

For a complete AI coding setup using only Mistral models:

Devstral 2 via API for complex agent tasks ($2/1M)
Codestral locally for autocomplete (free)
Mistral Small via API for quick questions ($0.10/1M)

Total cost: ~$10-30/month for heavy use. Compare that to Claude Opus at $200+/month.

Budget-conscious stack (all local, $0/month)

Devstral Small 2 for agent tasks (14GB VRAM)
Codestral for autocomplete (12GB VRAM)
Nemo for quick questions (7GB VRAM)

Requires a 24GB GPU (run one at a time) or 48GB+ (run two simultaneously).

Which model for which tool?

Tool	Best Mistral model	Why
Vibe CLI	Devstral 2 (native)	Built specifically for it
Aider	Devstral 2 or Small	Best agentic performance
Continue.dev	Codestral (autocomplete) + Large 2 (chat)	FIM for completions, reasoning for chat
OpenCode	Devstral 2 (via API)	Agentic coding focus
Cursor	Codestral (autocomplete)	Fast completions
Custom agents	Devstral Small (local) or Devstral 2 (API)	Depends on budget

When to use Mistral vs competitors

Need	Mistral choice	Alternative
Best coding agent	Devstral 2	Claude Opus, GPT-4.5
Best local coding	Devstral Small 2	Qwen 3.5 27B
Best autocomplete	Codestral	GitHub Copilot
Cheapest good model	Mistral Small ($0.10/1M)	GPT-4o-mini ($0.15/1M)
European languages	Mistral Large 2	No clear alternative
Edge/mobile	Nemo (7GB)	Gemma 4 9B

Mistral’s key advantage is the open-weight licensing combined with competitive performance. You can run their models locally, modify them, and deploy commercially — something you can’t do with Claude or GPT.

FAQ

Which Mistral model is best?

It depends on your task. For agentic coding (multi-file edits, bug fixes), Devstral 2 is the best — it matches Claude Opus on SWE-bench at 72.2%. For general reasoning and analysis, Mistral Large 2 is the flagship. For IDE autocomplete, Codestral with its native FIM support is unmatched. For local coding on consumer hardware, Devstral Small 2 offers the best quality-per-VRAM ratio. There’s no single “best” — Mistral intentionally offers specialized models rather than one model that does everything.

Is Mistral free?

Partially. Devstral Small 2 (Modified MIT), Nemo (Apache 2.0), and Codestral (when run locally) are free to use with no API costs. The larger models — Large 2, Devstral 2, and Small — are available via the Mistral API at competitive prices ($0.10-$6.00 per million tokens). You can also download the open-weight models and run them locally for free if you have sufficient hardware. Mistral offers a free tier on their platform with rate limits for experimentation.

Can I run Mistral locally?

Yes, most Mistral models can run locally. Nemo (12B) needs just 7GB VRAM — it runs on laptops. Codestral and Mistral Small (22B) need ~12GB — an RTX 3090 or 32GB Mac works. Devstral Small 2 (24B) needs ~14GB. The full Large 2 and Devstral 2 (123B) require ~65GB VRAM at Q4, meaning you need an H100 or Mac Studio Ultra with 192GB. Use Ollama for the easiest setup, or vLLM for production deployments.

How does Mistral compare to Claude?

On coding tasks, Devstral 2 matches Claude Opus (72.2% vs ~72% on SWE-bench) at significantly lower cost ($2/1M vs ~$15/1M tokens). On general reasoning, Claude Sonnet and Mistral Large 2 are comparable, with Claude slightly ahead on complex analysis and Mistral ahead on multilingual tasks. The biggest difference is openness: Mistral’s models are open-weight — you can run them locally, fine-tune them, and deploy without API dependencies. Claude is API-only. For a detailed comparison, see our AI model comparison guide.