
What Is Codestral? Mistral's 22B Coding Model Explained


Codestral is Mistral AI’s dedicated coding model. It has 22 billion parameters, a 256K token context window, and it’s trained specifically for code generation across 80+ programming languages. The latest version, Codestral 25.01, is the #1 model on the LMSys Copilot Arena leaderboard and the state-of-the-art for fill-in-the-middle (FIM) tasks.

If you use VS Code or JetBrains with an AI code assistant, there’s a good chance Codestral is already one of your model options.

What is Codestral?

Unlike general-purpose models that handle code as one of many capabilities, Codestral was trained from the ground up on a massive code dataset. It’s optimized for three things:

  1. Fill-in-the-middle (FIM): You give it code before and after a cursor position, and it fills in the gap. This is what powers autocomplete in IDEs.
  2. Code generation: Full function and file generation from natural language descriptions.
  3. Code correction and test generation: Finding bugs and writing tests for existing code.
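To make the FIM workflow concrete, here is a minimal sketch of what a fill-in-the-middle request body looks like. It only builds the JSON payload; the field names assume Mistral's FIM endpoint (api.mistral.ai/v1/fim/completions), so check the API docs for the current schema before relying on it:

```python
import json

def build_fim_body(before: str, after: str, model: str = "codestral-latest") -> dict:
    """Sketch of a FIM request body: code before the cursor goes in
    "prompt", code after it in "suffix"; the model fills the gap."""
    return {
        "model": model,
        "prompt": before,   # code before the cursor
        "suffix": after,    # code after the cursor
        "max_tokens": 64,   # completions are short, so keep the cap low
    }

body = build_fim_body(
    before="def is_even(n):\n    return ",
    after="\n\nprint(is_even(4))",
)
print(json.dumps(body, indent=2))
```

POST that body (with your API key in the Authorization header) and the model returns the code that belongs between `prompt` and `suffix` — exactly what an IDE autocomplete plugin does on every keystroke pause.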

The 22B parameter size is intentional: small enough for low-latency inference while still producing high-quality code. On top of that, the 25.01 update's more efficient architecture and tokenizer make it about 2x faster than the original Codestral. Low latency matters for autocomplete — you need suggestions in milliseconds, not seconds.

Codestral 25.01 benchmarks

The January 2025 update brought significant improvements:

Python:

  • HumanEval: 86.6% (DeepSeek Coder V2 Lite: 83.5%)
  • MBPP: 80.2%
  • CruxEval: 55.5% (DeepSeek: 49.7%)
  • LiveCodeBench: 37.9% (DeepSeek: 28.1%)

Fill-in-the-middle (FIM pass@1):

  • Python: 92.5%
  • Java: 97.1%
  • JavaScript: 96.1%
  • Average: 95.3% — SOTA across the board

Multi-language HumanEval:

  • Python: 86.6%, C++: 78.9%, JavaScript: 82.6%, TypeScript: 82.4%, Bash: 43.0%, C#: 53.2%
  • Average: 71.4% (DeepSeek Coder V2 Lite: 65.9%)

Codestral leads on almost every benchmark against sub-100B coding models. The FIM scores are particularly impressive — a 95.3% average pass@1 means its first completion is correct the vast majority of the time, which is exactly what you want from autocomplete.

Pricing

  • Input: $0.20 per million tokens
  • Output: $0.60 per million tokens

That’s extremely cheap. For context:

  • Claude Sonnet 4.6: $3/$15 per million tokens
  • GPT-5.2: varies but significantly more expensive
  • Qwen 2.5 Coder: free (open-source, self-hosted)

At $0.20/M input tokens, you could run Codestral all day for code completion and spend less than a dollar.
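To back up that claim, here is the arithmetic; the request volume and token counts are assumptions chosen to represent a heavy day of autocomplete:

```python
# Prices from the section above, converted to dollars per token.
INPUT_PRICE = 0.20 / 1_000_000
OUTPUT_PRICE = 0.60 / 1_000_000

def daily_cost(requests: int, in_tokens: int, out_tokens: int) -> float:
    """Total dollar cost for one day of completions at the given volumes."""
    return requests * (in_tokens * INPUT_PRICE + out_tokens * OUTPUT_PRICE)

# Assumed: 2,000 completions/day, ~1,500 tokens of context in, ~50 tokens out.
cost = daily_cost(requests=2_000, in_tokens=1_500, out_tokens=50)
print(f"${cost:.2f}")  # → $0.66
```

Even at a couple thousand completions a day with generous context windows, the bill stays well under a dollar.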

How to use it

In your IDE: Codestral is available through Continue (VS Code and JetBrains), Cursor, and other IDE plugins. Select “Codestral” or “codestral-latest” in the model picker.

Via API:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [
      {"role": "user", "content": "Write a Python function that validates email addresses using regex"}
    ]
  }'

It’s also available on Google Cloud Vertex AI, Azure AI Foundry (preview), and through OpenRouter.

Codestral vs general-purpose models

The question people always ask: why use a specialized coding model instead of Claude or GPT?

Use Codestral for:

  • IDE autocomplete and FIM (it holds the top FIM benchmark scores)
  • High-volume code completion where cost matters
  • Fast response times for interactive coding

Use Claude/GPT for:

  • Complex multi-file refactoring that needs deep reasoning
  • Code review with natural language explanations
  • Architecture decisions and system design discussions

The sweet spot is using both: Codestral for the fast, cheap, high-frequency autocomplete, and a frontier model for the harder thinking tasks.
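That split can be expressed as a trivial routing rule. The task labels and the "frontier-model" placeholder below are hypothetical — this just illustrates the division of labor, not a real API:

```python
# Latency-sensitive, high-frequency tasks that suit a specialized coding model.
FAST_TASKS = {"autocomplete", "fim", "inline-edit"}

def pick_model(task: str) -> str:
    """Route cheap, fast work to Codestral; deep-reasoning work elsewhere."""
    return "codestral-latest" if task in FAST_TASKS else "frontier-model"

print(pick_model("fim"))          # → codestral-latest
print(pick_model("refactoring"))  # → frontier-model
```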

FAQ

Can I use Codestral for free?

Yes, in two ways. You can self-host it locally via Ollama (ollama pull codestral:22b) for completely free usage on your own hardware. Alternatively, the API at $0.20/1M input tokens is so cheap that typical daily coding usage costs pennies — far less than a GitHub Copilot subscription.

Is Codestral better than GitHub Copilot?

For raw autocomplete quality, Codestral 25.01 is state-of-the-art with 95.3% FIM accuracy. It’s competitive with or better than Copilot’s suggestions. The trade-off is that Copilot is a plug-and-play experience, while Codestral requires some setup (either via API key or local installation with Ollama + Continue.dev).

What’s the difference between Codestral and Devstral?

Codestral (22B) is optimized for fast autocomplete — inline tab completions as you type. Devstral 2 (123B) is built for complex agentic coding tasks like refactoring, bug fixing, and feature implementation. Use Codestral as your IDE autocomplete engine and Devstral as your AI coding agent.

Related: How to Choose an AI Coding Agent · What is Mistral AI