Codestral is Mistral AI’s dedicated coding model — a 22-billion-parameter model trained from the ground up on 80+ programming languages. Unlike general-purpose models that happen to code, Codestral is purpose-built for code generation, completion, and fill-in-the-middle (FIM) tasks.
It’s arguably the best open-weight autocomplete model available and one of the most efficient coding models you can run locally.
Why Codestral matters
Most AI coding happens through two workflows:
- Chat/agent — describe what you want, AI writes it (Claude Code, Aider)
- Autocomplete — AI predicts what you’ll type next (Copilot, Cursor tab)
Codestral dominates the second category. Its FIM (Fill-in-the-Middle) capability means it understands code before AND after your cursor, producing completions that fit naturally into existing code.
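Conceptually, a FIM request sends both sides of the cursor and asks the model to produce only the span in between, which is then spliced back into the file. A minimal sketch in plain Python (`model_fim` is a hypothetical stand-in for the actual model call):

```python
def model_fim(prompt: str, suffix: str) -> str:
    # A real call would go to Codestral's FIM endpoint; this stub
    # just returns a plausible "middle" for the demo below.
    return "left + (right - left) // 2"

prefix = "def midpoint(left, right):\n    return "
suffix = "\n"

middle = model_fim(prefix, suffix)      # model sees both sides of the cursor
completed = prefix + middle + suffix    # splice the completion back in
print(completed)
```

Because the model conditions on the suffix as well as the prefix, the completion has to be syntactically and semantically consistent with the code that follows the cursor, not just the code before it.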
Specs
| Spec | Codestral 25.01 |
|---|---|
| Parameters | 22B |
| Architecture | Dense transformer |
| Context window | 256K tokens |
| Languages | 80+ |
| FIM support | ✅ Native |
| HumanEval | 86.6% |
| License | Mistral Non-Production License |
| Quantized size | ~12GB (Q4) |
The 256K context window is exceptionally large for a 22B model: large enough to hold a substantial portion of most codebases in a single pass.
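As a rough sanity check on that claim (using the common heuristic of ~4 characters per token; actual tokenizer counts will differ), you can estimate whether a project fits in a 256K-token window:

```python
def estimated_tokens(total_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 chars/token heuristic."""
    return int(total_chars / chars_per_token)

CONTEXT_WINDOW = 256_000  # Codestral 25.01

# Hypothetical project: 600 files averaging 1,200 characters each
project_chars = 600 * 1_200
tokens = estimated_tokens(project_chars)
print(tokens, tokens <= CONTEXT_WINDOW)
```

By this estimate a mid-sized project of ~720K characters comes to ~180K tokens, within the window; very large monorepos will still need retrieval or file selection.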
Benchmarks
| Benchmark | Codestral 25.01 | DeepSeek Coder 33B | CodeLlama 70B |
|---|---|---|---|
| HumanEval | 86.6% | 79.3% | 67.8% |
| MBPP | 78.2% | 73.1% | 62.0% |
| RepoBench (long-range) | Best | Good | Limited |
Codestral outperforms models 3x its size on code generation benchmarks. The RepoBench result is particularly notable: it handles long-range, cross-file completion across entire repositories better than the larger models in this comparison.
How to use Codestral
Via Mistral API
```python
from mistralai import Mistral

client = Mistral(api_key="your-mistral-key")

# Chat completion
response = client.chat.complete(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a binary search in Rust"}],
)
print(response.choices[0].message.content)

# Fill-in-the-middle: the model generates the code between prompt and suffix
response = client.fim.complete(
    model="codestral-latest",
    prompt="def fibonacci(n):\n    ",
    suffix="\n    return result",
)
print(response.choices[0].message.content)
```
Via OpenRouter
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

response = client.chat.completions.create(
    model="mistralai/codestral-latest",
    messages=[{"role": "user", "content": "Optimize this SQL query"}],
)
print(response.choices[0].message.content)
```
See our OpenRouter guide for setup.
Run locally with Ollama
Codestral at 22B fits comfortably on consumer hardware:
```shell
ollama pull codestral:22b
ollama run codestral:22b
```
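Once the model is running, Ollama's `/api/generate` endpoint also accepts a `suffix` field for FIM-capable models such as Codestral. A minimal sketch that builds the request body (the commented-out send assumes a local Ollama server on its default port, 11434):

```python
import json
import urllib.request


def fim_request(prompt: str, suffix: str, model: str = "codestral:22b") -> dict:
    """Build the JSON body for an Ollama fill-in-the-middle call."""
    return {"model": model, "prompt": prompt, "suffix": suffix, "stream": False}


body = fim_request("def fibonacci(n):\n    ", "\n    return result")

# Uncomment to send against a running Ollama instance:
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])

print(json.dumps(body, indent=2))
```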
| Hardware | Speed | Usable? |
|---|---|---|
| RTX 4090 (24GB) | ~40 tok/s | ✅ Excellent |
| RTX 4070 (12GB) | ~20 tok/s (Q4) | ✅ Good |
| Mac M4 32GB | ~25 tok/s | ✅ Good |
| Mac M4 16GB | ~15 tok/s (Q4) | ⚠️ Tight |
See our Ollama guide for detailed setup.
In VS Code with Continue.dev
Codestral is one of the best models for Continue.dev autocomplete:
```json
{
  "tabAutocompleteModel": {
    "provider": "ollama",
    "model": "codestral:22b",
    "title": "Codestral"
  }
}
```
Codestral vs other coding models
| | Codestral 22B | Gemma 4 27B | Qwen 2.5 Coder 32B | DeepSeek Coder 33B |
|---|---|---|---|---|
| Best at | Autocomplete/FIM | General + coding | Coding breadth | Reasoning + coding |
| FIM | ✅ Native | ❌ | ✅ | ✅ |
| Context | 256K | 128K | 128K | 128K |
| Local VRAM | 12GB (Q4) | 16GB (Q4) | 18GB (Q4) | 18GB (Q4) |
| License | Non-production | Gemma | Apache 2.0 | MIT |
- Pick Codestral for: IDE autocomplete, FIM tasks, fast local completions
- Pick Gemma 4 for: general-purpose coding plus other tasks
- Pick Qwen Coder for: broadest language support, commercial use
Licensing caveat
Codestral uses Mistral’s Non-Production License (MNPL), which means:
- ✅ Free for research and personal use
- ✅ Free for development and testing
- ❌ Cannot be used in production commercial applications
- ❌ Cannot be redistributed
For commercial production use, you need Mistral’s commercial license or should use Qwen 2.5 Coder (Apache 2.0) or DeepSeek Coder (MIT) instead.
Bottom line
Codestral is arguably the best autocomplete model you can run yourself. At 22B parameters, it runs on consumer hardware while outperforming models 3x its size, and the 256K context window lets it take in most of a project at once. If you’re setting up a local AI coding environment, Codestral should be your autocomplete model, paired with a larger model like Claude or GLM-5.1 for complex tasks.
Related: Codestral vs DeepSeek Coder · Best AI Models for Coding Locally · Continue.dev Complete Guide