Mistral Large 2 Complete Guide: Europe's 123B Frontier Model (2026)
Mistral Large 2 is a 123-billion-parameter dense model from Mistral AI, one of Europe's leading AI companies. It achieves roughly 95% of Llama 3.1 405B's benchmark performance with about 30% of the parameters, and therefore roughly 30% of the per-token inference compute, making it one of the most efficient frontier-class models available.
Why Mistral Large 2 matters
In a world dominated by American and Chinese AI labs, Mistral is Europe's answer. Based in Paris, the company has built a model that competes with GPT-5 and Claude on reasoning and coding while being small enough to run on a single server node.
Key advantages:
- 123B dense: no MoE complexity, simpler to deploy than GLM-5.1 (754B) or Kimi K2.5 (1T)
- 128K context: handles large codebases and documents
- Single-node inference: fits on 1x H100 80GB with 4-bit quantization, 2x A100 80GB with 8-bit weights, or 4x 80GB GPUs at full BF16 precision; no multi-node cluster needed
- Strong multilingual support: best-in-class for European languages
- Function calling: native tool use for agent workflows
Specs
| Spec | Mistral Large 2 |
|---|---|
| Parameters | 123B (dense) |
| Context window | 128K tokens |
| Architecture | Dense transformer |
| Languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi |
| Function calling | ✅ Native |
| License | Mistral Research License (non-commercial) |
| Quantized size | ~65GB (Q4) |
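The hardware requirements in this guide follow from simple arithmetic on the parameter count: weight memory is roughly parameters × bits per weight ÷ 8 bytes. A minimal sketch, assuming weights only (KV cache, activations and runtime overhead are ignored, so real requirements are somewhat higher); the function name is illustrative:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage: params * bits / 8 bytes, in decimal GB (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 123e9  # parameter count from the spec table

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(PARAMS, bits):6.1f} GB")
```

This gives about 246 GB for BF16, 123 GB for 8-bit and 61.5 GB for 4-bit. That is why full precision needs four 80GB GPUs, 8-bit fits in 2x A100 80GB, and only a 4-bit build fits on a single 80GB card. Common 4-bit formats also store per-block scale factors, which pushes real files somewhat above the raw 61.5 GB.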
Benchmarks
| Benchmark | Mistral Large 2 | Llama 3.1 405B | Claude Sonnet 4.6 |
|---|---|---|---|
| MMLU | 84.0 | 85.2 | 86.8 |
| HumanEval | 84.1 | 80.5 | 88.7 |
| MATH | 75.0 | 73.8 | 78.3 |
| MT-Bench | 8.65 | 8.52 | 8.81 |
Mistral Large 2 sits just behind Llama 3.1 405B on MMLU, ahead of it on HumanEval, MATH and MT-Bench, and behind Claude Sonnet 4.6 on all four, which is impressive for a model roughly 3x smaller than Llama.
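The efficiency claim in the introduction can be checked against this table. A quick sketch that divides each Mistral Large 2 score by the Llama 3.1 405B score, plus the parameter ratio behind the compute figure (names are illustrative):

```python
# Scores copied from the benchmark table: (Mistral Large 2, Llama 3.1 405B)
SCORES = {
    "MMLU": (84.0, 85.2),
    "HumanEval": (84.1, 80.5),
    "MATH": (75.0, 73.8),
    "MT-Bench": (8.65, 8.52),
}

def relative_score(benchmark: str) -> float:
    """Mistral's score as a fraction of Llama's on one benchmark."""
    mistral, llama = SCORES[benchmark]
    return mistral / llama

for name in SCORES:
    print(f"{name:<10} {relative_score(name):.1%}")

# For dense models, per-token compute scales roughly with parameter count
print(f"Parameter ratio: {123 / 405:.0%}")
```

On these four benchmarks the ratios range from about 98.6% (MMLU) to 104.5% (HumanEval), so "roughly 95%" is a conservative summary of this particular table.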
How to use Mistral Large 2
Via Mistral API
import os
from mistralai import Mistral
# Read the key from the environment instead of hard-coding it
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Design a rate limiter for a REST API"},
    ],
)
print(response.choices[0].message.content)
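Native function calling (listed under the key advantages) goes through the same `chat.complete` call with a `tools` list. The sketch below assumes the `mistralai` v1 SDK and a `MISTRAL_API_KEY` environment variable; the `celsius_to_fahrenheit` tool, `TOOLS`, `REGISTRY`, `dispatch` and `ask_with_tools` are hypothetical names for illustration, not part of Mistral's API:

```python
import json
import os

# Hypothetical local tool the model is allowed to call
def celsius_to_fahrenheit(celsius: float) -> dict:
    return {"fahrenheit": celsius * 9 / 5 + 32}

# JSON-schema description of the tool, sent to the model with the request
TOOLS = [{
    "type": "function",
    "function": {
        "name": "celsius_to_fahrenheit",
        "description": "Convert a temperature from Celsius to Fahrenheit.",
        "parameters": {
            "type": "object",
            "properties": {"celsius": {"type": "number"}},
            "required": ["celsius"],
        },
    },
}]
REGISTRY = {"celsius_to_fahrenheit": celsius_to_fahrenheit}

def dispatch(name: str, arguments) -> str:
    """Run one tool call; arguments may be a JSON string or an already-parsed dict."""
    args = json.loads(arguments) if isinstance(arguments, str) else arguments
    return json.dumps(REGISTRY[name](**args))

def ask_with_tools(prompt: str, model: str = "mistral-large-latest") -> str:
    from mistralai import Mistral  # imported here so dispatch() can be tested offline
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.complete(
        model=model, messages=messages, tools=TOOLS, tool_choice="auto"
    )
    reply = response.choices[0].message
    if not reply.tool_calls:
        return reply.content  # the model answered without using a tool
    messages.append(reply)  # keep the assistant's tool-call turn in the history
    for call in reply.tool_calls:
        messages.append({
            "role": "tool",
            "name": call.function.name,
            "content": dispatch(call.function.name, call.function.arguments),
            "tool_call_id": call.id,
        })
    # Second round trip: the model turns the tool results into a final answer
    final = client.chat.complete(model=model, messages=messages)
    return final.choices[0].message.content
```

With a valid key, `print(ask_with_tools("What is 37 degrees Celsius in Fahrenheit?"))` should trigger one tool call followed by a natural-language answer. Keeping `dispatch` separate from the API call makes the tool logic testable without network access.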
Via OpenRouter
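import os
from openai import OpenAI
# OpenRouter exposes an OpenAI-compatible endpoint, so the standard OpenAI SDK works
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="mistralai/mistral-large-2",  # check OpenRouter's model list for the current id
    messages=[{"role": "user", "content": "Explain the CAP theorem"}],
)
print(response.choices[0].message.content)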
See our OpenRouter guide for more options.
Run locally
At 123B parameters, Mistral Large 2 is about the largest model you can realistically run on a single high-end GPU, and only with 4-bit quantization; at higher precision it needs several GPUs in one node:
| Setup | Speed | Usable? |
|---|---|---|
| 1x H100 (80GB) | ~30 tok/s (Q4) | ✅ Excellent |
| 2x A100 (160GB) | ~25 tok/s (8-bit) | ✅ Good |
| 4x RTX 4090 (96GB) | ~10 tok/s (Q4) | ⚠️ Slow |
| Mac Studio Ultra 192GB | ~5-8 tok/s (Q4) | ⚠️ Usable |
# With Ollama (if you have the VRAM)
ollama pull mistral-large:123b
# With vLLM: BF16 weights are ~246GB, so shard them across 4x 80GB GPUs
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Large-Instruct-2411 \
  --tensor-parallel-size 4
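vLLM's server speaks the OpenAI chat API, so the same `openai` client used for OpenRouter works by swapping `base_url`. A minimal sketch, assuming the server runs locally on vLLM's default port 8000 and serves the model under its `--model` name (vLLM's default); `VLLM_BASE_URL`, `build_messages` and `ask_local` are hypothetical names:

```python
import os
from typing import Optional

# Assumptions: local vLLM server on the default port, model served under its --model name
VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
MODEL_ID = "mistralai/Mistral-Large-Instruct-2411"

def build_messages(question: str, system: Optional[str] = None) -> list:
    """Assemble an OpenAI-style chat history, with an optional system prompt first."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": question})
    return messages

def ask_local(question: str, system: Optional[str] = None) -> str:
    from openai import OpenAI  # lazy import: build_messages() works without the SDK
    # vLLM only checks the key when started with --api-key, but the SDK requires a value
    client = OpenAI(base_url=VLLM_BASE_URL, api_key=os.environ.get("VLLM_API_KEY", "EMPTY"))
    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=build_messages(question, system),
        max_tokens=512,
    )
    return response.choices[0].message.content
```

Once the server is up, `print(ask_local("Explain the CAP theorem"))` sends the same request as the OpenRouter example, but to your own hardware.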
For most developers, the API is more practical. For local alternatives at this quality level, consider Qwen 3.5 72B, which is permissively licensed for commercial use.
Pricing
| Provider | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Mistral API | $2.00 | $6.00 |
| OpenRouter | ~$2.00 | ~$6.00 |
| Self-hosted | Hardware only | N/A |
Mistral's API pricing is competitive with Gemini 3.1 Pro and significantly cheaper than Claude Opus.
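Because input and output tokens are priced differently, the real gap between providers depends on your traffic mix. A small sketch that prices a hypothetical workload using the per-1M-token rates from this guide's tables (the workload numbers and helper name are illustrative):

```python
# USD per 1M tokens (input, output), copied from the pricing and comparison tables
PRICES = {
    "Mistral Large 2": (2.00, 6.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V3": (0.27, 1.10),
}

def daily_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Cost of `requests` calls per day, each with the given input/output token counts."""
    price_in, price_out = PRICES[model]
    million_in = requests * input_tokens / 1_000_000
    million_out = requests * output_tokens / 1_000_000
    return million_in * price_in + million_out * price_out

# Hypothetical workload: 1,000 requests/day, 2,000 input + 500 output tokens each
for model in PRICES:
    print(f"{model:<18} ${daily_cost(model, 1_000, 2_000, 500):.2f}/day")
```

For this input-heavy workload Mistral Large 2 comes to $7.00/day versus $13.50 for Claude Sonnet 4.6 and $1.09 for DeepSeek V3; output-heavy workloads widen the gap to Sonnet because of its $15 output rate.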
Mistral Large 2 vs the competition
| | Mistral Large 2 | Qwen 3.5 Plus | DeepSeek V3 | Claude Sonnet 4.6 |
|---|---|---|---|---|
| Params | 123B dense | 400B+ MoE | 671B MoE | Unknown |
| License | Research only | Apache 2.0 | MIT | Proprietary |
| Multilingual | Best | Good | Good | Good |
| Coding | Very good | Very good | Excellent | Excellent |
| Price | $2/$6 | $0.26/$1.56 | $0.27/$1.10 | $3/$15 |
| Self-host | 1 node | Multi-node | Multi-node | No |
Pick Mistral Large 2 for: European language tasks, single-node self-hosting, balanced quality/cost.
Pick Qwen/DeepSeek for: Cheapest API pricing, permissive licenses (Apache 2.0, MIT) for commercial use.
Pick Claude for: Best coding quality, if you are willing to pay a premium.
The Mistral ecosystem
Mistral offers a family of models for different needs:
| Model | Size | Best for |
|---|---|---|
| Mistral Large 2 | 123B | Complex reasoning, coding |
| Codestral | 22B | Code completion, FIM |
| Codestral Embed | N/A | Code search, RAG |
| Mistral Small | 22B | Fast, cheap general tasks |
| Mistral Nemo | 12B | Edge deployment |
The combination of Mistral Large 2 for reasoning + Codestral for autocomplete is one of the best AI coding setups available.
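Codestral's autocomplete role relies on fill-in-the-middle (FIM): the editor sends the code before the cursor as a prompt and the code after it as a suffix, and the model returns what belongs in between. A minimal sketch, assuming the `mistralai` v1 SDK's `fim.complete` endpoint and a `MISTRAL_API_KEY` with Codestral access; `split_at_cursor` and `complete_at_cursor` are hypothetical helper names:

```python
import os
from typing import Tuple

def split_at_cursor(source: str, cursor: int) -> Tuple[str, str]:
    """FIM input: text before the cursor is the prompt, text after it is the suffix."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor must be inside the source text")
    return source[:cursor], source[cursor:]

def complete_at_cursor(source: str, cursor: int) -> str:
    from mistralai import Mistral  # lazy import keeps split_at_cursor() usable offline
    prompt, suffix = split_at_cursor(source, cursor)
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.fim.complete(
        model="codestral-latest",
        prompt=prompt,
        suffix=suffix,
        temperature=0,  # low temperature keeps autocomplete suggestions stable
    )
    # The returned text is the code the model proposes to insert at the cursor
    return response.choices[0].message.content
```

In an editor integration, `cursor` would be the caret offset in the open file; Mistral Large 2 would then handle the longer chat-style requests such as design reviews or multi-file refactors.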
Bottom line
Mistral Large 2 is the best frontier model you can run on a single server. It won't beat Claude Opus or GPT-5.4 on raw benchmarks, but it's close enough that the cost and deployment advantages matter. For European companies with data sovereignty requirements, it's the obvious choice.
FAQ
Is Mistral Large 2 free?
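Partly. The weights are free to download under the Mistral Research License, which permits research and non-commercial use; commercial self-hosting requires a separate commercial license from Mistral. API access through Mistral's platform has a free tier for experimentation, with competitive per-token pricing for production use.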
Can I run it locally?
Yes. At full BF16 precision, Mistral Large 2 runs on a single server with 4x A100 80GB or equivalent GPUs. Quantized versions reduce requirements further (a 4-bit build fits on one 80GB GPU), making it one of the most deployable frontier-class models available.
How does it compare to Claude Sonnet?
In the benchmark table above, Claude Sonnet 4.6 leads Mistral Large 2 on all four benchmarks, though by fewer than five points in each case, and Sonnet is also slightly ahead on creative and instruction-following tasks. Mistral Large 2 wins on deployment flexibility, European data sovereignty, and cost when self-hosted.
Is Mistral Large good for coding?
Yes. Mistral Large 2 performs well on coding tasks and supports 80+ programming languages with strong multi-file understanding. For dedicated coding work, Mistral also offers Codestral and Devstral, which are specifically optimized for code generation.
Related: Codestral Complete Guide · Mistral Large 2 vs Claude Sonnet · Best AI Models for Coding Locally