
Mistral AI Complete Model Guide — Every Model, Spec, and Use Case (2026)


📢 Update: Mistral Medium 3.5 is now available — 128B dense model replacing Medium 3.1 and Devstral 2. See the Medium 3.5 complete guide, how to run it locally, and API guide.

Mistral AI offers six main models in 2026, each optimized for different tasks. Here’s the complete breakdown with benchmarks, pricing, and recommendations for which model to use when.

The full lineup

Mistral Large 2 (123B) — Flagship reasoning

The general-purpose powerhouse. 123B dense parameters, 128K context. Competes with Claude Sonnet and GPT-4o on reasoning, coding, and multilingual tasks. Best-in-class for European languages.

  • Use for: Complex reasoning, analysis, multilingual content, long documents
  • Price: $2/$6 per 1M tokens (input/output)
  • Context window: 128K tokens
  • Run locally: 1x H100 or Mac Studio Ultra with 192GB
  • Key benchmarks: MMLU 84.0%, HumanEval 92%, MATH 83%
  • Full guide: Mistral Large 2 Complete Guide

Mistral Large 2 is the model to use when you need frontier-level reasoning without paying Claude Opus prices. It’s particularly strong at structured output, function calling, and multilingual tasks (30+ languages). The 128K context handles most use cases, though it’s smaller than Devstral’s 256K.

Devstral 2 (123B) — Best open coding agent

Same 123B architecture as Large 2 but fine-tuned specifically for agentic coding. 72.2% on SWE-bench Verified — matching Claude Opus. 256K context window.

  • Use for: Multi-file refactoring, autonomous coding, complex bug fixes
  • Price: $2/$6 per 1M tokens
  • Context window: 256K tokens
  • License: Modified MIT (commercial OK)
  • Key benchmarks: SWE-bench 72.2%, HumanEval 92%, MBPP+ 88%
  • Full guide: Devstral 2 Complete Guide

Devstral 2 is the crown jewel of Mistral’s lineup for developers. The 72.2% SWE-bench score means it can autonomously fix real-world GitHub issues at a rate matching the best closed models. The 256K context lets it reason about entire codebases. It’s the default model for Mistral’s Vibe CLI.

Devstral Small 2 (24B) — Local coding agent

The consumer-friendly Devstral. Runs on a single RTX 4090 or 32GB Mac. Still has the 256K context window.

  • Use for: Local agentic coding, privacy-sensitive environments
  • Price: Free (Modified MIT license)
  • Context window: 256K tokens
  • VRAM: ~14GB (Q4), ~26GB (Q8)
  • Key benchmarks: SWE-bench ~58%, HumanEval ~82%
  • Full guide: Devstral Small 2 Guide

The best local agentic coding model in its size class. Nothing else at 24B comes close on SWE-bench. The 256K context is double what competing models offer (Qwen, Gemma cap at 128K). Run it with Ollama for zero-cost coding assistance.
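As a rough illustration of what "run it with Ollama" looks like from code, here is a minimal sketch that talks to a local Ollama server over its default HTTP endpoint. The model tag `devstral-small-2` is a placeholder assumption, not a confirmed tag; run `ollama list` to see the name of the model you actually pulled.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt: str, model: str = "devstral-small-2") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request.

    The default model tag is a placeholder assumption; replace it with
    the tag shown by `ollama list` on your machine.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "devstral-small-2") -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_ollama_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a unit test for a function that reverses a string")` should return the model's answer as a string once the server is running. Everything stays on your machine, which is the point for privacy-sensitive work.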

Codestral (22B) — Best autocomplete

Purpose-built for code completion with native Fill-in-the-Middle (FIM). 256K context, 80+ languages, 86.6% HumanEval.

  • Use for: IDE autocomplete, tab completions, code suggestions
  • Price: $0.30/$0.90 per 1M tokens (or free locally via Ollama)
  • Context window: 256K tokens
  • VRAM: ~12GB (Q4)
  • Key benchmarks: HumanEval 86.6%, MBPP+ 82%, FIM accuracy 91%
  • Full guide: Codestral Complete Guide

Codestral is not a chat model; it’s a completion engine. Native FIM support means it conditions on the code both before and after the cursor, so a completion has to fit the surrounding context instead of merely continuing the prefix. It’s the best open model for IDE autocomplete, and it reportedly outperforms GitHub Copilot’s base model on most benchmarks. Use it with Continue.dev, VS Code, or any editor that supports custom completion endpoints.
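To make the FIM idea concrete, here is a small sketch of how an editor plugin might split a buffer at the cursor and package both halves for a completion request. The field names `prompt` and `suffix` and the model id `codestral-latest` reflect my understanding of Mistral's FIM completions endpoint and should be treated as assumptions; check the current API reference before relying on them. The helper names are hypothetical.

```python
def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a buffer into the text before and after the cursor position."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def build_fim_payload(
    prefix: str,
    suffix: str,
    model: str = "codestral-latest",  # assumed model id
    max_tokens: int = 64,
) -> dict:
    """Build a fill-in-the-middle request body.

    The model sees `prompt` (code before the cursor) and `suffix`
    (code after it) and is asked to generate what goes in between.
    """
    return {
        "model": model,
        "prompt": prefix,
        "suffix": suffix,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output suits autocomplete
    }


source = "def add(a, b):\n    return \n"
cursor = source.index("return ") + len("return ")
prefix, suffix = split_at_cursor(source, cursor)
payload = build_fim_payload(prefix, suffix)
```

A plain left-to-right completion model would only receive `prefix`. Sending `suffix` as well is what lets the model avoid generating code that duplicates or contradicts what already follows the cursor.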

Mistral Small (22B) — Fast and cheap

General-purpose model optimized for speed and cost. Good enough for most tasks at a fraction of frontier model prices.

  • Use for: Chat, summarization, simple coding, high-volume tasks, classification
  • Price: $0.10/$0.30 per 1M tokens
  • Context window: 128K tokens
  • VRAM: ~12GB (Q4)
  • Key benchmarks: MMLU 75%, HumanEval 72%, MT-Bench 8.1
  • Run locally: Any 16GB+ GPU or 32GB Mac

Mistral Small is the workhorse for production applications where you need good-enough quality at minimal cost. At $0.10/1M input tokens, you can process massive volumes without breaking the budget. It handles summarization, classification, simple Q&A, and basic coding well. Don’t use it for complex reasoning or multi-step coding — that’s what Large 2 and Devstral are for.

Mistral Nemo (12B) — Edge deployment

The smallest Mistral model. Runs on laptops and edge devices. Apache 2.0 licensed.

  • Use for: Mobile apps, edge devices, Raspberry Pi, embedded systems
  • Price: Free (Apache 2.0)
  • Context window: 128K tokens
  • VRAM: ~7GB (Q4)
  • Key benchmarks: MMLU 68%, HumanEval 65%, MT-Bench 7.6

Nemo is for deployment scenarios where every megabyte of RAM matters. It runs on a Raspberry Pi 5 with 8GB RAM (at Q4), on phones, and on minimal cloud instances. The Apache 2.0 license means zero restrictions on commercial use. Quality is limited compared to larger models, but for simple tasks (classification, extraction, basic chat) it’s more than adequate.

Complete benchmark comparison

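| Model | MMLU | HumanEval | SWE-bench | MATH | Context |
|---|---|---|---|---|---|
| Large 2 | 84.0% | 92% | N/A | 83% | 128K |
| Devstral 2 | 80% | 92% | 72.2% | 78% | 256K |
| Devstral Small | 72% | 82% | 58% | 65% | 256K |
| Codestral | 70% | 86.6% | N/A | 60% | 256K |
| Small | 75% | 72% | N/A | 58% | 128K |
| Nemo | 68% | 65% | N/A | 45% | 128K |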

Pricing comparison

| Model | Input (per 1M) | Output (per 1M) | Local option |
|---|---|---|---|
| Large 2 | $2.00 | $6.00 | H100 / Mac Ultra 192GB |
| Devstral 2 | $2.00 | $6.00 | H100 / Mac Ultra 192GB |
| Devstral Small | $0.10 | $0.30 | RTX 4090 / Mac 32GB |
| Codestral | $0.30 | $0.90 | RTX 3090 / Mac 32GB |
| Small | $0.10 | $0.30 | RTX 3090 / Mac 32GB |
| Nemo | Free tier | Free tier | Any 8GB+ device |
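If you want to turn these per-million prices into a cost per request, the arithmetic is simple enough to sketch. The dictionary keys below are labels made up for this example, not official API model ids, and the prices are copied from the table above.

```python
# (input, output) price in USD per 1M tokens, taken from the pricing table.
# Keys are illustrative labels, not API model identifiers.
PRICES_PER_MILLION = {
    "large-2": (2.00, 6.00),
    "devstral-2": (2.00, 6.00),
    "devstral-small": (0.10, 0.30),
    "codestral": (0.30, 0.90),
    "small": (0.10, 0.30),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed API prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 1M input tokens plus 200K output tokens on Devstral 2:
# 1.0 * $2.00 + 0.2 * $6.00 = $3.20
devstral_cost = request_cost("devstral-2", 1_000_000, 200_000)

# The same workload on Mistral Small:
# 1.0 * $0.10 + 0.2 * $0.30 = $0.16
small_cost = request_cost("small", 1_000_000, 200_000)
```

The same workload costs 20 times more on Devstral 2 than on Mistral Small, which is why routing simple tasks to the smaller model matters at volume.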

For a complete AI coding setup using only Mistral models:

  1. Devstral 2 via API for complex agent tasks ($2/1M)
  2. Codestral locally for autocomplete (free)
  3. Mistral Small via API for quick questions ($0.10/1M)

Total cost: ~$10-30/month for heavy use. Compare that to Claude Opus at $200+/month.

Budget-conscious stack (all local, $0/month)

  1. Devstral Small 2 for agent tasks (14GB VRAM)
  2. Codestral for autocomplete (12GB VRAM)
  3. Nemo for quick questions (7GB VRAM)

Requires a 24GB GPU (run one at a time) or 48GB+ (run two simultaneously).
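To check which of these models can share a GPU, you can add up the Q4 figures listed above. This is a rough sketch that counts model weights only. The `headroom_gb` parameter is a hypothetical knob for reserving space for context (KV cache) and runtime overhead, which grows with long prompts and is not part of the figures in this article.

```python
from itertools import combinations

# Approximate Q4 VRAM figures from the budget stack above, in GB.
VRAM_GB = {"Devstral Small 2": 14, "Codestral": 12, "Nemo": 7}


def combos_that_fit(gpu_gb: float, headroom_gb: float = 0.0) -> list[tuple[str, ...]]:
    """List every combination of models whose combined weights fit on the GPU.

    headroom_gb reserves memory for context and runtime overhead.
    """
    budget = gpu_gb - headroom_gb
    fits = []
    for size in range(1, len(VRAM_GB) + 1):
        for combo in combinations(VRAM_GB, size):
            if sum(VRAM_GB[name] for name in combo) <= budget:
                fits.append(combo)
    return fits


on_24gb = combos_that_fit(24)
on_48gb = combos_that_fit(48)
```

Devstral Small 2 plus Codestral adds up to 26GB, so that pair does not fit on a 24GB card, while all three together come to 33GB and fit within 48GB by weight alone. Pairings that include Nemo fit on 24GB on paper, but leave little room for long contexts, so running one model at a time remains the safer plan.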

Which model for which tool?

| Tool | Best Mistral model | Why |
|---|---|---|
| Vibe CLI | Devstral 2 (native) | Built specifically for it |
| Aider | Devstral 2 or Small | Best agentic performance |
| Continue.dev | Codestral (autocomplete) + Large 2 (chat) | FIM for completions, reasoning for chat |
| OpenCode | Devstral 2 (via API) | Agentic coding focus |
| Cursor | Codestral (autocomplete) | Fast completions |
| Custom agents | Devstral Small (local) or Devstral 2 (API) | Depends on budget |

When to use Mistral vs competitors

| Need | Mistral choice | Alternative |
|---|---|---|
| Best coding agent | Devstral 2 | Claude Opus, GPT-4.5 |
| Best local coding | Devstral Small 2 | Qwen 3.5 27B |
| Best autocomplete | Codestral | GitHub Copilot |
| Cheapest good model | Mistral Small ($0.10/1M) | GPT-4o-mini ($0.15/1M) |
| European languages | Mistral Large 2 | No clear alternative |
| Edge/mobile | Nemo (7GB) | Gemma 4 9B |

Mistral’s key advantage is the open-weight licensing combined with competitive performance. You can run their models locally, modify them, and deploy commercially — something you can’t do with Claude or GPT.

FAQ

Which Mistral model is best?

It depends on your task. For agentic coding (multi-file edits, bug fixes), Devstral 2 is the best — it matches Claude Opus on SWE-bench at 72.2%. For general reasoning and analysis, Mistral Large 2 is the flagship. For IDE autocomplete, Codestral with its native FIM support is unmatched. For local coding on consumer hardware, Devstral Small 2 offers the best quality-per-VRAM ratio. There’s no single “best” — Mistral intentionally offers specialized models rather than one model that does everything.
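If you route requests in code, the recommendations in this guide boil down to a lookup table. The task labels below are invented for this sketch, and the mapping simply restates the picks made in this article.

```python
# Task label -> recommended model, restating this guide's picks.
# The task labels are hypothetical names chosen for this example.
RECOMMENDED = {
    "agentic_coding": "Devstral 2",
    "local_coding": "Devstral Small 2",
    "autocomplete": "Codestral",
    "reasoning": "Mistral Large 2",
    "high_volume": "Mistral Small",
    "edge": "Mistral Nemo",
}


def pick_model(task: str) -> str:
    """Return the recommended model for a task label, or raise ValueError."""
    try:
        return RECOMMENDED[task]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```

For example, `pick_model("autocomplete")` returns `"Codestral"`. Failing loudly on an unknown label is deliberate: silently falling back to the most expensive model is an easy way to overspend.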

Is Mistral free?

Partially. Devstral Small 2 (Modified MIT), Nemo (Apache 2.0), and Codestral (when run locally) are free to use with no API costs. Large 2, Devstral 2, and Small are available through the Mistral API at $0.10-$6.00 per million tokens, depending on the model and on whether the tokens are input or output. You can also download the open-weight models and run them locally for free if you have sufficient hardware. Mistral offers a free tier on their platform with rate limits for experimentation.

Can I run Mistral locally?

Yes, most Mistral models can run locally. Nemo (12B) needs just 7GB VRAM — it runs on laptops. Codestral and Mistral Small (22B) need ~12GB — an RTX 3090 or 32GB Mac works. Devstral Small 2 (24B) needs ~14GB. The full Large 2 and Devstral 2 (123B) require ~65GB VRAM at Q4, meaning you need an H100 or Mac Studio Ultra with 192GB. Use Ollama for the easiest setup, or vLLM for production deployments.
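The VRAM numbers quoted here follow from simple arithmetic: parameter count times bits per weight. The sketch below estimates weight size only. It assumes exactly 4 or 8 bits per weight, whereas real quantization formats typically store slightly more, and it ignores context and runtime overhead. That is likely why the figures in this guide run a few GB higher than the raw estimate.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate the size of the model weights in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * bits_per_weight / 8 bytes each,
    divided by 1e9 bytes per GB, simplifies to the expression below.
    """
    return params_billion * bits_per_weight / 8


large_q4 = weight_size_gb(123, 4)          # Large 2 / Devstral 2 at 4-bit
devstral_small_q4 = weight_size_gb(24, 4)  # Devstral Small 2 at 4-bit
devstral_small_q8 = weight_size_gb(24, 8)  # Devstral Small 2 at 8-bit
nemo_q4 = weight_size_gb(12, 4)            # Nemo at 4-bit
```

The raw estimates are 61.5GB, 12GB, 24GB, and 6GB, against the ~65GB, ~14GB, ~26GB, and ~7GB quoted in this guide. Use the formula as a floor, then leave extra room for context before deciding whether a model fits your hardware.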

How does Mistral compare to Claude?

On coding tasks, Devstral 2 matches Claude Opus (72.2% vs ~72% on SWE-bench) at significantly lower cost ($2/1M vs ~$15/1M tokens). On general reasoning, Claude Sonnet and Mistral Large 2 are comparable, with Claude slightly ahead on complex analysis and Mistral ahead on multilingual tasks. The biggest difference is openness: Mistral’s models are open-weight — you can run them locally, fine-tune them, and deploy without API dependencies. Claude is API-only. For a detailed comparison, see our AI model comparison guide.
