🤖 AI Tools
· 3 min read

What Is Codestral? Mistral's 22B Coding Model Explained


Codestral is Mistral AI’s dedicated coding model. It has 22 billion parameters, a 256K token context window, and training focused specifically on code generation across 80+ programming languages. The latest version, Codestral 25.01, debuted at #1 on the LMSys Copilot Arena leaderboard and is the state of the art for fill-in-the-middle (FIM) tasks.

If you use VS Code or JetBrains with an AI code assistant, there’s a good chance Codestral is already one of your model options.

What is Codestral?

Unlike general-purpose models that handle code as one of many capabilities, Codestral was trained from the ground up on a massive code dataset. It’s optimized for three things:

  1. Fill-in-the-middle (FIM): You give it code before and after a cursor position, and it fills in the gap. This is what powers autocomplete in IDEs.
  2. Code generation: Full function and file generation from natural language descriptions.
  3. Code correction and test generation: Finding bugs and writing tests for existing code.
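To make FIM concrete, here is a minimal sketch of the request shape behind it, assuming Mistral's documented FIM endpoint (`POST /v1/fim/completions`): the code before the cursor goes in `prompt`, the code after it in `suffix`, and the model fills the gap. `build_fim_payload` is a hypothetical helper for illustration, not part of any SDK:

```python
import json

# Hypothetical helper: builds the JSON body for a FIM request to
# Mistral's /v1/fim/completions endpoint. "prompt" is the code before
# the cursor, "suffix" the code after it; the model fills in between.
def build_fim_payload(before_cursor: str, after_cursor: str,
                      model: str = "codestral-latest",
                      max_tokens: int = 64) -> str:
    payload = {
        "model": model,
        "prompt": before_cursor,   # code preceding the cursor
        "suffix": after_cursor,    # code following the cursor
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

# Example: ask the model to complete a function body mid-file.
body = build_fim_payload(
    before_cursor="def is_even(n):\n    ",
    after_cursor="\n\nprint(is_even(4))",
)
```

This prompt/suffix split is what distinguishes FIM from plain completion: the model sees both sides of the cursor, which is exactly the situation an IDE autocomplete is in.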

The 22B parameter size is intentional: small enough for low latency, large enough to produce high-quality code. On top of that, Codestral 25.01 generates completions about 2x faster than the original Codestral. Low latency matters for autocomplete, where you need suggestions in milliseconds, not seconds.

Codestral 25.01 benchmarks

The January 2025 update brought significant improvements:

Python:

  • HumanEval: 86.6% (DeepSeek Coder V2 Lite: 83.5%)
  • MBPP: 80.2%
  • CruxEval: 55.5% (DeepSeek: 49.7%)
  • LiveCodeBench: 37.9% (DeepSeek: 28.1%)

Fill-in-the-middle (FIM pass@1):

  • Python: 92.5%
  • Java: 97.1%
  • JavaScript: 96.1%
  • Average: 95.3% — SOTA across the board

Multi-language HumanEval:

  • Python: 86.6%, C++: 78.9%, JavaScript: 82.6%, TypeScript: 82.4%, Bash: 43.0%, C#: 53.2%
  • Average: 71.4% (DeepSeek Coder V2 Lite: 65.9%)

Codestral leads on almost every benchmark against sub-100B coding models. The FIM scores stand out: a 95.3% average pass@1 means it produces the correct completion the vast majority of the time.

Pricing

  • Input: $0.20 per million tokens
  • Output: $0.60 per million tokens

That’s extremely cheap. For context:

  • Claude Sonnet 4.6: $3/$15 per million tokens
  • GPT-5.2: varies but significantly more expensive
  • Qwen 2.5 Coder: free (open-source, self-hosted)

At $0.20/M input tokens, you could run Codestral all day for code completion and spend less than a dollar.
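A quick back-of-the-envelope check of that claim, with illustrative (assumed, not measured) usage numbers for a heavy autocomplete day:

```python
# Cost check for the "less than a dollar a day" claim.
# The usage figures below are assumptions for illustration.
INPUT_PRICE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # dollars per output token

completions_per_day = 2_000      # assumed: a heavy day of autocomplete
input_tokens_each = 500          # assumed: surrounding code context sent
output_tokens_each = 50          # assumed: a short completion returned

daily_cost = completions_per_day * (
    input_tokens_each * INPUT_PRICE + output_tokens_each * OUTPUT_PRICE
)
print(f"${daily_cost:.2f} per day")  # → $0.26 per day
```

Even at 2,000 completions a day, the bill stays around a quarter, so the "less than a dollar" figure holds with plenty of headroom.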

How to use it

In your IDE: Codestral is available through Continue (VS Code and JetBrains), Cursor, and other IDE plugins. Select “Codestral” or “codestral-latest” in the model picker.

Via API:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [
      {"role": "user", "content": "Write a Python function that validates email addresses using regex"}
    ]
  }'

It’s also available on Google Cloud Vertex AI, Azure AI Foundry (preview), and through OpenRouter.
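The same chat call can be made from Python with just the standard library. This sketch only builds the request (sending it needs a real API key); `build_chat_request` is a hypothetical helper:

```python
import json
import urllib.request

# Builds the same request as the curl example above, using only the
# Python standard library. Uncomment the last lines to actually send it.
def build_chat_request(api_key: str, user_message: str) -> urllib.request.Request:
    body = json.dumps({
        "model": "codestral-latest",
        "messages": [{"role": "user", "content": user_message}],
    }).encode()
    return urllib.request.Request(
        "https://api.mistral.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request(
    "YOUR_MISTRAL_API_KEY",
    "Write a Python function that validates email addresses using regex",
)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Mistral also ships an official Python SDK if you'd rather not hand-roll HTTP, but the raw request above shows everything the API actually needs.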

Codestral vs general-purpose models

The question people always ask: why use a specialized coding model instead of Claude or GPT?

Use Codestral for:

  • IDE autocomplete and FIM (it’s literally the best at this)
  • High-volume code completion where cost matters
  • Fast response times for interactive coding

Use Claude/GPT for:

  • Complex multi-file refactoring that needs deep reasoning
  • Code review with natural language explanations
  • Architecture decisions and system design discussions

The sweet spot is using both: Codestral for the fast, cheap, high-frequency autocomplete, and a frontier model for the harder thinking tasks.