Apr 18, 2026

Last updated on Apr 19, 2026

Mistral Large 2 Complete Guide — Europe's 123B Frontier Model (2026)

Mistral Large 2 is a 123-billion-parameter dense model from Mistral AI, the leading European AI company. It achieves roughly 95% of Llama 3.1 405B’s performance with about 30% of the parameters (123B vs. 405B), making it one of the most efficient frontier-class models available.

Why Mistral Large 2 matters

In a world dominated by American and Chinese AI labs, Mistral is Europe’s answer. Based in Paris, they’ve built a model that competes with GPT-5 and Claude on reasoning and coding while being small enough to run on a single server node.

Key advantages:

123B dense: no MoE complexity, simpler to deploy than GLM-5.1 (754B) or Kimi K2.5 (1T)
128K context: handles large codebases and documents
Single-node inference: runs quantized on 1x H100 or 2x A100, no multi-node clusters needed
Strong multilingual: best-in-class for European languages
Function calling: native tool use for agent workflows

Specs

Spec	Mistral Large 2
Parameters	123B (dense)
Context window	128K tokens
Architecture	Dense transformer
Languages	English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi
Function calling	✅ Native
License	Mistral Research License (non-commercial)
Quantized size	~65GB (Q4)
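
As a rough check on the quantized size above, weight memory is approximately the parameter count times the bits stored per weight. The sketch below assumes about 4.25 effective bits per weight for Q4 (4-bit values plus per-block scale overhead; this figure is an assumption and varies by quantization format) and ignores KV cache and activation memory.

```python
# Back-of-envelope weight memory for a dense model.
# Assumption: Q4 formats cost ~4.25 effective bits per weight once
# per-block scales are included; real formats vary.
PARAMS = 123e9  # parameter count from the spec table

BITS_PER_WEIGHT = {
    "fp16/bf16": 16.0,
    "fp8/int8": 8.0,
    "q4 (approx.)": 4.25,
}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>13}: {weight_gb(PARAMS, bits):6.1f} GB")
```

Under these assumptions Q4 comes out near 65 GB, in line with the table, while full 16-bit weights are about 246 GB. That gap is why the single-GPU setups discussed in this guide depend on quantization.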

Benchmarks

Benchmark	Mistral Large 2	Llama 3.1 405B	Claude Sonnet 4.6
MMLU	84.0	85.2	86.8
HumanEval	84.1	80.5	88.7
MATH	75.0	73.8	78.3
MT-Bench	8.65	8.52	8.81

Mistral Large 2 lands between Llama 3.1 405B and Claude Sonnet 4.6 on HumanEval, MATH, and MT-Bench, and trails Llama by about a point on MMLU. That is impressive for a model roughly a third of Llama's size.

How to use Mistral Large 2

Via Mistral API

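import os

from mistralai import Mistral

# Read the key from the environment rather than hard-coding it
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# The -latest alias follows Mistral's newest Large release; pin a dated
# ID (e.g. mistral-large-2411) if you need a fixed version
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Design a rate limiter for a REST API"}
    ]
)

# Print the assistant's reply
print(response.choices[0].message.content)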
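
Function calling follows the usual pattern: you describe tools as JSON schemas, the model returns a tool name plus JSON arguments, and your code runs the matching function. Below is a minimal sketch of the local half of that loop. The tool `get_rate_limit` and its values are hypothetical, invented purely for illustration, and the dispatcher is one possible design rather than anything prescribed by Mistral.

```python
import json


# Hypothetical local function the model is allowed to call.
def get_rate_limit(client_id: str) -> dict:
    limits = {"free": 60, "pro": 600}  # requests per minute, made-up values
    return {"client_id": client_id, "rpm": limits.get(client_id, 60)}


# Tool description in the JSON-schema style used by chat-completion APIs.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_rate_limit",
            "description": "Look up the requests-per-minute limit for a client.",
            "parameters": {
                "type": "object",
                "properties": {"client_id": {"type": "string"}},
                "required": ["client_id"],
            },
        },
    }
]

# Map tool names to Python callables.
REGISTRY = {"get_rate_limit": get_rate_limit}


def run_tool_call(name: str, arguments) -> str:
    """Execute one tool call; `arguments` may be a JSON string or a dict."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments) if isinstance(arguments, str) else arguments
    return json.dumps(REGISTRY[name](**kwargs))


# Simulated tool call, standing in for what the model would return.
print(run_tool_call("get_rate_limit", '{"client_id": "pro"}'))
```

With a real client you would pass `tools=TOOLS` and `tool_choice="auto"` to `client.chat.complete(...)`, then feed each returned tool call's `function.name` and `function.arguments` into `run_tool_call` and send the result back as a `tool` message.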

Via OpenRouter

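import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"]
)

# Model slug assumed here; confirm the exact ID in OpenRouter's model list
response = client.chat.completions.create(
    model="mistralai/mistral-large-2411",
    messages=[{"role": "user", "content": "Explain the CAP theorem"}]
)

# Print the assistant's reply
print(response.choices[0].message.content)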

See our OpenRouter guide for more options.

Run locally

At 123B parameters (about 65GB once quantized to Q4), Mistral Large 2 is about the largest model you can realistically run on a single high-end GPU:

Setup	Speed	Usable?
1x H100 (80GB)	~30 tok/s	✅ Excellent
2x A100 (160GB)	~25 tok/s	✅ Good
4x RTX 4090 (96GB)	~10 tok/s (Q4)	⚠️ Slow
Mac Studio Ultra 192GB	~5-8 tok/s (Q4)	⚠️ Usable

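# With Ollama (if you have the VRAM)
ollama pull mistral-large:123b

# With vLLM
# Unquantized 16-bit weights are roughly 246GB (123B params x 2 bytes), so
# set --tensor-parallel-size to a GPU count whose combined memory covers that,
# or point --model at a quantized checkpoint
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Large-Instruct-2411 \
  --tensor-parallel-size 2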
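
Once the vLLM server is up, it speaks the OpenAI-compatible chat API, so you can query it with nothing but the standard library. This sketch assumes the server is listening on vLLM's default address `http://localhost:8000`; change `BASE_URL` if you passed `--host` or `--port`. The helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumption: default vLLM address; adjust to match your server flags.
BASE_URL = "http://localhost:8000/v1"
MODEL = "mistralai/Mistral-Large-Instruct-2411"


def build_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Explain the CAP theorem"))` should return the model's answer. The request is built separately from the network call so the payload can be inspected or logged without a live server.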

For most developers, the API is more practical. For local alternatives at this quality level, consider Qwen 3.5 72B, which is MIT licensed.

Pricing

Provider	Input (per 1M)	Output (per 1M)
Mistral API	$2.00	$6.00
OpenRouter	~$2.00	~$6.00
Self-hosted	Hardware only	—

Competitive with Gemini 3.1 Pro and significantly cheaper than Claude Opus.
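
To turn per-token prices into a budget, multiply your expected token volumes by the rates. The sketch below uses the prices listed in this article; the monthly workload of 50M input and 10M output tokens is a hypothetical figure chosen only for illustration.

```python
# Prices in USD per 1M tokens (input, output), copied from this article.
PRICES = {
    "Mistral Large 2": (2.00, 6.00),
    "Qwen 3.5 Plus": (0.26, 1.56),
    "DeepSeek V3": (0.27, 1.10),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for the given token volumes."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 50M input and 10M output tokens per month.
for model in PRICES:
    cost = monthly_cost(model, 50_000_000, 10_000_000)
    print(f"{model:<18} ${cost:,.2f}")
```

For this workload Mistral Large 2 comes to $160 per month, against $300 for Claude Sonnet 4.6. Swap in your own volumes, since the input-to-output ratio shifts the comparison.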

Mistral Large 2 vs the competition

	Mistral Large 2	Qwen 3.5 Plus	DeepSeek V3	Claude Sonnet 4.6
Params	123B dense	400B+ MoE	671B MoE	Unknown
License	Research only	Apache 2.0	MIT	Proprietary
Multilingual	Best	Good	Good	Good
Coding	Very good	Very good	Excellent	Excellent
Price	$2/$6	$0.26/$1.56	$0.27/$1.10	$3/$15
Self-host	1 node	Multi-node	Multi-node	No

Pick Mistral Large 2 for: European language tasks, single-node self-hosting, balanced quality/cost.

Pick Qwen/DeepSeek for: Cheapest API pricing, permissive licenses (Apache 2.0 or MIT) that allow commercial use.

Pick Claude for: Best coding quality, willing to pay premium.

The Mistral ecosystem

Mistral offers a family of models for different needs:

Model	Size	Best for
Mistral Large 2	123B	Complex reasoning, coding
Codestral	22B	Code completion, FIM
Codestral Embed	—	Code search, RAG
Mistral Small	22B	Fast, cheap general tasks
Mistral Nemo	12B	Edge deployment

The combination of Mistral Large 2 for reasoning + Codestral for autocomplete is one of the best AI coding setups available.
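
One simple way to wire that pairing together is a small router that sends latency-sensitive completions to Codestral and everything else to the larger model. This is a sketch under assumptions: the task labels are hypothetical, and `mistral-large-latest` and `codestral-latest` are assumed to be the API aliases for the two models, so check your provider's model list before relying on them.

```python
# Assumption: these aliases match the model IDs offered by your provider.
MODEL_FOR_TASK = {
    "reasoning": "mistral-large-latest",
    "autocomplete": "codestral-latest",
}


def pick_model(task: str) -> str:
    """Return the model alias for a task, defaulting to the larger model."""
    return MODEL_FOR_TASK.get(task, MODEL_FOR_TASK["reasoning"])


# Hypothetical task labels an editor plugin might emit.
for task in ("autocomplete", "reasoning", "design review"):
    print(f"{task:<14} -> {pick_model(task)}")
```

Defaulting unknown tasks to the larger model trades some cost for safety: an unrecognized request gets the stronger reasoner rather than the completion-tuned one.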

Bottom line

Mistral Large 2 is the best frontier model you can run on a single server. It won’t beat Claude Opus or GPT-5.4 on raw benchmarks, but it’s close enough that the cost and deployment advantages matter. For European companies with data sovereignty requirements, it’s the obvious choice.

FAQ

Is Mistral Large 2 free?

The model weights are free to download for research and non-commercial use under the Mistral Research License; commercial self-hosting requires a separate commercial license from Mistral. API access through Mistral’s platform has a free tier for experimentation, with per-token pricing for production use.

Can I run it locally?

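Yes, Mistral Large 2 can run on a single high-end server. At full 16-bit precision the weights take roughly 246GB (123B parameters x 2 bytes), which calls for something like 4x A100 80GB. Q4 quantization brings that down to about 65GB, enough for 1x H100 or 2x A100, making it one of the most deployable frontier-class models available.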

How does it compare to Claude Sonnet?

Mistral Large 2 trades blows with Claude Sonnet across most benchmarks, with Sonnet slightly ahead on creative and instruction-following tasks. Mistral Large 2 wins on deployment flexibility, European data sovereignty, and cost when self-hosted.

Is Mistral Large good for coding?

Yes, Mistral Large 2 performs well on coding tasks and supports 80+ programming languages with strong multi-file understanding. For dedicated coding work, Mistral also offers Codestral and Devstral, which are specifically optimized for code generation.