
Mistral Medium 3.5 vs Devstral 2 β€” Why Mistral Replaced Its Own Coding Model (2026)


Mistral made a surprising move in April 2026: they replaced Devstral 2, their dedicated coding model, with Mistral Medium 3.5 as the default model in Vibe CLI. This was not a quiet deprecation β€” it was a deliberate statement that a single general-purpose model can outperform a specialist coding model, even one built by the same team.

This article explains why Mistral made the switch, how the two models compare on benchmarks, and when the smaller Devstral models still make sense.

Why Mistral replaced Devstral 2

The short answer: Mistral Medium 3.5 beats Devstral 2 on the benchmarks that matter most for coding.

Devstral 2 was purpose-built for code generation and editing. It was trained on a code-heavy dataset, optimized for agentic coding workflows, and designed to be the backbone of Vibe CLI. But when Mistral shipped Medium 3.5 β€” a 128B dense general-purpose model β€” it turned out that the larger, more broadly trained model outperformed the specialist on coding tasks too.

This follows a pattern we have seen across the industry in 2026: the era of specialist coding models is ending. As general-purpose models get larger and better trained, the advantage of code-specific fine-tuning shrinks. A 128B model trained on everything β€” code, math, reasoning, multilingual text β€” develops deeper understanding of software engineering concepts than a smaller model trained primarily on code.

For the full story on Medium 3.5, see our Mistral Medium 3.5 complete guide.

Head-to-head specifications

| | Mistral Medium 3.5 | Devstral 2 |
|---|---|---|
| Role | Default model in Vibe CLI | Former default, now secondary |
| Parameters | 128B (dense) | ~70B (estimated, dense) |
| Architecture | Dense transformer | Dense transformer |
| Context window | 256K tokens | 128K tokens |
| SWE-bench Verified | 77.6% | ~72% |
| tau3-Telecom | Strong (highlighted by Mistral) | Not benchmarked |
| Input price (API) | $1.50/M tokens | ~$1.00/M tokens |
| Output price (API) | $7.50/M tokens | ~$4.00/M tokens |
| License | Modified MIT | Modified MIT |
| Self-hosting | 4Γ— A100 80GB (FP8) | 2Γ— A100 80GB (FP8) |
| Vision | Yes | Limited |
| General tasks | Strong (math, reasoning, multilingual) | Weak (coding-focused) |

Benchmark comparison

SWE-bench Verified

Mistral Medium 3.5 scores 77.6% on SWE-bench Verified. Devstral 2 scores approximately 72%. That is a 5.6-point gap β€” significant enough that Mistral could not justify keeping Devstral 2 as the default.

The gap is most visible on complex tasks that require understanding beyond pure code: interpreting requirements, reasoning about architecture, and making design decisions. Medium 3.5’s broader training gives it better judgment on these tasks.

tau3-Telecom

Mistral specifically highlights Medium 3.5’s performance on tau3-Telecom, a domain-specific benchmark for telecom engineering. Devstral 2 was never benchmarked on this because it was not designed for domain-specific tasks. This illustrates the core advantage of the generalist approach: Medium 3.5 handles specialized domains that Devstral 2 cannot.

Code generation quality

On pure code generation tasks β€” writing functions, implementing algorithms, generating boilerplate β€” the gap between the models is smaller. Devstral 2 produces clean, idiomatic code and is slightly faster at simple generation tasks due to its smaller size. But Medium 3.5 produces equally clean code while also handling the reasoning and planning that surrounds code generation in real workflows.

Agentic coding workflows

In agentic workflows (multi-step tasks involving file reading, editing, test execution, and iteration), Medium 3.5 pulls further ahead. Its stronger reasoning capabilities help it plan multi-step operations more effectively, recover from errors more gracefully, and make better decisions about which files to edit and in what order.
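The loop described above can be sketched in a few lines. Everything here is a stand-in: `model_propose_edit` and `run_tests` are hypothetical callbacks, not real Vibe CLI or Mistral APIs; the point is only to show the propose/apply/test/feedback cycle where stronger reasoning pays off.

```python
# Minimal sketch of an agentic coding loop. The two callbacks are stubs you
# would wire to a real model call and a real test runner; nothing here is an
# official Mistral or Vibe CLI interface.
def agentic_fix(task, files, model_propose_edit, run_tests, max_iters=5):
    """Iterate: ask the model for an edit, apply it, run tests, feed failures back."""
    feedback = ""
    for _ in range(max_iters):
        path, new_content = model_propose_edit(task, files, feedback)
        files[path] = new_content            # apply the proposed edit
        ok, feedback = run_tests(files)      # execute the test suite
        if ok:
            return files                     # tests pass: task complete
    raise RuntimeError("could not complete task within iteration budget")
```

Error recovery lives entirely in the `feedback` string: a model that reasons well about failing test output converges in fewer iterations, which is where Medium 3.5's advantage shows up.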

The merged model philosophy

Mistral’s decision to replace Devstral 2 with Medium 3.5 reflects a broader industry shift: one model for everything vs specialists for each task.

The specialist argument (Devstral 2):

  • Smaller model, faster inference
  • Lower cost per token
  • Optimized specifically for code patterns
  • Easier to self-host on limited hardware

The generalist argument (Medium 3.5):

  • Higher accuracy on coding benchmarks despite being general-purpose
  • Handles non-coding tasks (documentation, architecture, planning) in the same session
  • One model to maintain, deploy, and optimize instead of multiple
  • Broader understanding of software engineering context

The generalist argument won. When a single model can beat the specialist at its own game while also handling everything else, there is no reason to maintain the specialist as the default.

This does not mean specialists are dead. It means the bar for a specialist model is now β€œmust beat the best generalist at the specific task.” Devstral 2 does not clear that bar against Medium 3.5.

Pricing comparison

Devstral 2 is cheaper per token, but the cost advantage is smaller than you might expect.

Mistral Medium 3.5:

  • Input: $1.50 per million tokens
  • Output: $7.50 per million tokens

Devstral 2:

  • Input: ~$1.00 per million tokens
  • Output: ~$4.00 per million tokens

For a typical coding session (50K input, 10K output):

  • Medium 3.5: $0.075 + $0.075 = $0.15
  • Devstral 2: $0.05 + $0.04 = $0.09

Devstral 2 is roughly 40% cheaper per session. But if Medium 3.5 solves the task in fewer iterations (due to better reasoning), the total cost per completed task may be similar or even lower. Mistral’s internal data apparently showed this β€” the higher per-token cost was offset by fewer tokens needed to complete tasks.

For more on Devstral 2’s capabilities, see our Devstral 2 complete guide.

When Devstral Small 24B still makes sense

Mistral replaced Devstral 2 as the default, but they did not kill the Devstral line. Devstral Small 24B, a 24B-parameter model, still has a clear niche.

Local development on consumer hardware: Devstral Small 24B runs on a single RTX 4090 or even on Apple Silicon Macs with 32GB+ RAM. Medium 3.5 requires 4Γ— A100 80GB GPUs. If you want a local coding assistant that runs on your laptop, Devstral Small is the only Mistral option.

Cost-sensitive high-volume workloads: For tasks like automated code review, linting suggestions, or simple code completion where you are processing thousands of requests per hour, Devstral Small’s lower cost and faster inference make it more practical than Medium 3.5.

Offline and air-gapped environments: Devstral Small’s modest hardware requirements make it deployable in environments where you cannot access the internet or cloud GPUs. Medium 3.5 is technically self-hostable but requires significant infrastructure.

Edge deployment: If you are building coding tools that run on developer machines rather than in the cloud, Devstral Small is the right choice. Medium 3.5 is a server-side model.

The rule of thumb: use Medium 3.5 when you can (API or server-side self-hosting), use Devstral Small 24B when you must run locally on limited hardware.
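That rule of thumb can be written down as a simple routing helper. This is a sketch of the article's advice, not an official API; the model name strings and thresholds are assumptions drawn from the hardware figures quoted above.

```python
def pick_mistral_model(can_reach_api, server_gpus, local_vram_gb):
    """Apply the rule of thumb: prefer Medium 3.5 whenever API access or
    server-grade GPUs are available; fall back to Devstral Small 24B locally.
    Model names are illustrative, not official IDs."""
    if can_reach_api:
        return "mistral-medium-3.5"       # hosted API: best quality, no hardware needed
    if server_gpus >= 4:                  # 4x A100 80GB covers FP8 self-hosting
        return "mistral-medium-3.5"
    if local_vram_gb >= 16:               # 4-bit Devstral Small fits in 16 GB VRAM
        return "devstral-small-24b"
    raise RuntimeError("no Mistral model fits this hardware")

print(pick_mistral_model(can_reach_api=False, server_gpus=0, local_vram_gb=24))
```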

Self-hosting comparison

Both models are self-hostable under the same modified MIT license, but the hardware requirements differ significantly.

Mistral Medium 3.5 (128B):

  • FP16: 8Γ— A100 80GB
  • FP8: 4Γ— A100 80GB
  • 4-bit quantized: 2Γ— A100 80GB (with quality trade-offs)
  • vLLM, TGI compatible
  • Not practical on consumer hardware

Devstral 2 (~70B):

  • FP16: 4Γ— A100 80GB
  • FP8: 2Γ— A100 80GB
  • 4-bit quantized: 1Γ— A100 80GB or 2Γ— RTX 4090
  • vLLM, TGI compatible
  • Marginal on high-end consumer hardware

Devstral Small 24B:

  • FP16: 1Γ— A100 80GB or 1Γ— RTX 4090
  • 4-bit quantized: Runs on 16GB VRAM or Apple Silicon 32GB
  • Ollama, llama.cpp, vLLM compatible
  • Practical on consumer hardware

If self-hosting is your primary concern and you have server-grade GPUs, Medium 3.5 is worth the extra hardware for the benchmark improvement. If you are limited to consumer hardware, Devstral Small 24B is your only option in the Mistral family.
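A quick sanity check on why the GPU counts scale this way: weight memory alone is roughly parameters times bits per parameter divided by eight. A 128B model at FP8 needs about 128 GB just for weights, before KV cache (large at a 256K context), activations, and framework overhead, which is why the practical requirement lands at 4Γ— 80GB rather than 2Γ—. A minimal sketch of that arithmetic:

```python
def weight_memory_gb(params_billions, bits_per_param):
    """GB needed for model weights alone. Real deployments also need KV cache,
    activations, and runtime overhead on top of this, so practical GPU counts
    exceed what this number alone suggests."""
    return params_billions * bits_per_param / 8

# Parameter counts from the comparison above (~70B for Devstral 2 is an estimate).
for name, params in [("Medium 3.5", 128), ("Devstral 2", 70), ("Devstral Small", 24)]:
    for bits in (16, 8, 4):
        print(f"{name}: {bits}-bit -> {weight_memory_gb(params, bits):.0f} GB of weights")
```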

Migration guide: Devstral 2 to Medium 3.5

If you have been using Devstral 2 and want to switch to Medium 3.5, here is what to expect.

Vibe CLI

If you use Vibe CLI, the migration is automatic. Vibe now defaults to Medium 3.5. Your existing workflows, system prompts, and MCP configurations will work without changes. You may notice slightly different output styles β€” Medium 3.5 tends to be more thorough in its explanations and may include more context in code comments.

API migration

Update your model parameter from the Devstral 2 model ID to the Medium 3.5 model ID. The API format is identical β€” both use Mistral’s standard chat completion endpoint. No code changes beyond the model name.
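Since only the model identifier changes, the migration amounts to a one-field diff in the request body. A sketch of what that looks like for a chat-completion payload (the model ID strings are placeholders, not confirmed official identifiers):

```python
# Sketch of the migration: only the "model" field changes in the request body.
# Model IDs below are illustrative placeholders.
def chat_request(model, prompt):
    """Build a standard chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

old = chat_request("devstral-2", "Refactor utils.py to remove duplication.")
new = chat_request("mistral-medium-3.5", "Refactor utils.py to remove duplication.")

# Everything except the model identifier is identical.
assert {k: v for k, v in old.items() if k != "model"} == \
       {k: v for k, v in new.items() if k != "model"}
```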

Expect higher per-token costs but potentially fewer total tokens per task. Monitor your total spend for the first week to see how the economics play out for your specific workloads.

System prompt adjustments

Medium 3.5 responds well to the same system prompts that worked with Devstral 2, but you may want to adjust:

  • Remove any instructions that compensate for Devstral 2’s weaker reasoning (e.g., β€œthink step by step before coding”)
  • Medium 3.5 handles multi-step reasoning natively, so explicit chain-of-thought prompting is less necessary
  • If you were limiting Devstral 2’s output to avoid verbosity, you can relax those constraints β€” Medium 3.5 is naturally more concise

Self-hosting migration

If you self-host Devstral 2, switching to Medium 3.5 requires more GPU memory. Plan for 2Γ— the hardware. The inference server configuration (vLLM, TGI) stays the same β€” just point it at the new model weights.

The bigger picture

Mistral replacing Devstral 2 with Medium 3.5 is a signal about where the industry is heading. The specialist model era β€” where you needed a different model for coding, reasoning, math, and creative writing β€” is giving way to unified models that handle everything well.

This is good news for developers. Instead of managing multiple models, routing between them, and maintaining separate configurations, you can use one model for your entire workflow. Medium 3.5 handles code generation, code review, documentation writing, architecture planning, and debugging in a single context.

The trade-off is cost and hardware requirements. Medium 3.5 is more expensive per token and harder to self-host than Devstral 2. But for most teams, the simplicity of one model outweighs the cost difference.

FAQ

Why did Mistral replace Devstral 2 with Medium 3.5?

Medium 3.5 scores 77.6% on SWE-bench Verified versus Devstral 2’s approximately 72%. The generalist model beats the specialist at coding while also handling reasoning, math, and multilingual tasks. Mistral decided that maintaining a separate coding model was no longer justified when the general-purpose model was simply better.

Is Devstral 2 deprecated?

No. Devstral 2 is still available via the API and for self-hosting. It is no longer the default in Vibe CLI, but you can still select it explicitly. Mistral has not announced an end-of-life date. However, future development effort will likely focus on the Medium line rather than Devstral.

Should I switch from Devstral 2 to Medium 3.5?

For most users, yes. Medium 3.5 is better at coding and handles non-coding tasks that Devstral 2 cannot. The main reasons to stay on Devstral 2 are if you need the lower per-token cost for high-volume workloads, or if your self-hosting hardware cannot handle the larger model.

Can I still use Devstral Small 24B for local development?

Yes, and you should. Devstral Small 24B fills a different niche β€” local development on consumer hardware. Medium 3.5 cannot run on a laptop. Devstral Small can. Use Medium 3.5 via API or server-side self-hosting for your primary work, and Devstral Small for offline or local-only scenarios.

How much more does Medium 3.5 cost compared to Devstral 2?

Roughly two-thirds more per session ($0.15 vs $0.09 for a typical coding session); put the other way, Devstral 2 is about 40% cheaper. However, Medium 3.5 often completes tasks in fewer iterations due to better reasoning, which can offset the higher per-token cost. Monitor your total spend per completed task, not just per-token cost.

Will Mistral release a Devstral 3?

Mistral has not announced plans for Devstral 3. Given that Medium 3.5 already outperforms Devstral 2 on coding benchmarks, a new specialist model would need to beat Medium 3.5 to justify its existence. The more likely path is continued improvement of the Medium line, with Devstral Small maintained for the local/edge use case.