
How to Use the Devstral 2 API: Setup Guide With Code Examples


Devstral 2 is Mistral’s best coding model: it scores 72.2% on SWE-bench, matching Claude Opus. It’s available via the Mistral API, OpenRouter, and other providers. This guide covers the endpoints, code examples, tool integrations, pricing, and rate limits you need to use it in your projects and coding tools.

API endpoints and authentication

Mistral API (direct)

  • Base URL: https://api.mistral.ai/v1
  • Model ID: devstral-2-latest
  • Auth: Bearer token via Authorization header
  • Get your key: console.mistral.ai

OpenRouter

  • Base URL: https://openrouter.ai/api/v1
  • Model ID: mistralai/devstral-2
  • Auth: Bearer token
  • Get your key: openrouter.ai/keys

Both endpoints are OpenAI-compatible, so a library that speaks the OpenAI chat completions API should work with Devstral 2 once you change the base URL, API key, and model ID.
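Since the two endpoints differ only in base URL, model ID, and key, the choice can be kept in one small lookup table. The following is a minimal sketch using only the Python standard library. The `PROVIDERS` dict and the `build_chat_request` helper are illustrative names, not part of either provider's SDK, and the `/chat/completions` path is assumed to apply to both base URLs. The request is built but not sent.

```python
import json
import urllib.request

# Values copied from the endpoint lists above; the dict layout is hypothetical.
PROVIDERS = {
    "mistral": {
        "base_url": "https://api.mistral.ai/v1",
        "model": "devstral-2-latest",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "mistralai/devstral-2",
    },
}


def build_chat_request(provider: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    cfg = PROVIDERS[provider]
    body = {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        url=cfg["base_url"] + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Both providers expect the key as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping the model ID next to the base URL avoids pairing one provider's endpoint with the other's model name.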

Code examples

Python: Mistral SDK

from mistralai import Mistral

client = Mistral(api_key="your-mistral-key")

response = client.chat.complete(
    model="devstral-2-latest",
    messages=[
        {"role": "system", "content": "You are an expert software engineer."},
        {"role": "user", "content": "Fix the race condition in this handler:\n\n```go\nfunc (s *Server) Handle(w http.ResponseWriter, r *http.Request) {\n    s.count++\n    fmt.Fprintf(w, \"Request %d\", s.count)\n}\n```"}
    ],
    temperature=0.2,
    max_tokens=2048
)

print(response.choices[0].message.content)

Python: OpenAI-compatible (works with any provider)

from openai import OpenAI

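# Pick ONE provider. Creating both clients in sequence would silently
# overwrite the first, so the OpenRouter variant is left commented out.

# Via Mistral directly
client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key="your-mistral-key"
)
model = "devstral-2-latest"

# Or via OpenRouter
# client = OpenAI(
#     base_url="https://openrouter.ai/api/v1",
#     api_key="your-openrouter-key"
# )
# model = "mistralai/devstral-2"

response = client.chat.completions.create(
    model=model,  # the model ID must match the provider chosen above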
    messages=[
        {"role": "user", "content": "Refactor this class to use dependency injection"}
    ],
    temperature=0.2
)

print(response.choices[0].message.content)

curl

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "devstral-2-latest",
    "messages": [
      {"role": "user", "content": "Write a Python function to merge two sorted arrays"}
    ],
    "temperature": 0.2,
    "max_tokens": 1024
  }'

JavaScript/TypeScript

// Assumes @mistralai/mistralai v1.x. Older 0.x releases used
// `new MistralClient(key)` and `client.chat(...)` instead.
import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({ apiKey: "your-mistral-key" });

const response = await client.chat.complete({
  model: "devstral-2-latest",
  messages: [
    { role: "user", content: "Add error handling to this async function" }
  ],
  temperature: 0.2,
});

console.log(response.choices[0].message.content);

Fill-in-the-Middle (FIM) support

Devstral 2 supports FIM for code completion: predicting what goes between a prefix and a suffix. This is how IDE integrations provide inline completions.

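# Reuses the Mistral SDK `client` created in the first Python example
response = client.fim.complete(
    model="devstral-2-latest",
    prompt="def calculate_total(items):\n    ",  # code before the cursor
    suffix="\n    return total",                 # code after the cursor
    temperature=0.1,
    max_tokens=256
)

print(response.choices[0].message.content)
# Example output (actual completions vary between runs):
# total = sum(item.price * item.quantity for item in items)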

curl for FIM

curl https://api.mistral.ai/v1/fim/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "devstral-2-latest",
    "prompt": "function fetchUser(id: string) {\n  ",
    "suffix": "\n  return response.json();\n}",
    "temperature": 0.1
  }'

FIM is particularly useful for IDE plugins and coding tools that need to complete code at the cursor position.
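To make the cursor-position idea concrete, here is a small sketch of how an editor plugin might derive the `prompt` and `suffix` fields from a buffer and splice the completion back in. The helper names `split_at_cursor` and `splice_completion` are hypothetical, and no API call is made.

```python
def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a buffer into the FIM prompt (before cursor) and suffix (after)."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def splice_completion(prompt: str, completion: str, suffix: str) -> str:
    """Insert the model's completion between the prompt and the suffix."""
    return prompt + completion + suffix


# Same buffer as the Python FIM example: the cursor sits at the end of
# the indented blank line, just before "return total".
buffer = "def calculate_total(items):\n    \n    return total"
cursor = buffer.index("\n    return")
prompt, suffix = split_at_cursor(buffer, cursor)
```

The resulting `prompt` and `suffix` are the two strings sent to the FIM endpoint. Whatever text comes back is passed to `splice_completion` to rebuild the file.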

Integration with coding tools

Aider

export MISTRAL_API_KEY=your-key
aider --model mistral/devstral-2-latest

Or in .aider.conf.yml:

model: mistral/devstral-2-latest

OpenCode

export MISTRAL_API_KEY=your-key
opencode --model mistral/devstral-2-latest

Continue.dev

In .continue/config.json:

{
  "models": [{
    "provider": "mistral",
    "model": "devstral-2-latest",
    "apiKey": "your-key"
  }]
}

See our Aider guide and OpenCode guide for full tool setup.

Pricing

Model               Input (per 1M tokens)   Output (per 1M tokens)   Context
Devstral 2 (123B)   $2.00                   $6.00                    128K
Codestral (22B)     $0.30                   $0.90                    32K
Claude Sonnet 4     $3.00                   $15.00                   200K
Claude Opus         $15.00                  $75.00                   200K

Devstral 2 matches Claude Opus on SWE-bench (72.2%) at a fraction of the cost: per the table above, its input tokens are 7.5x cheaper and its output tokens 12.5x cheaper. For coding tasks specifically, it’s one of the best value propositions available.

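Cost estimate for typical usage: A heavy coding session (50 requests, ~2K input + 1K output tokens each) totals 100K input and 50K output tokens. That costs roughly $0.50 with Devstral 2 ($0.20 input + $0.30 output) vs $5.25 with Claude Opus ($1.50 input + $3.75 output).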
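The arithmetic behind this estimate is easy to reproduce. The sketch below copies the per-token prices from the pricing table. The `PRICES` dict keys and the `session_cost` function are illustrative names chosen for this example.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "devstral-2": (2.00, 6.00),
    "codestral": (0.30, 0.90),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus": (15.00, 75.00),
}


def session_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a session with identical per-request token counts."""
    in_price, out_price = PRICES[model]
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return round((total_in * in_price + total_out * out_price) / 1_000_000, 2)


devstral = session_cost("devstral-2", requests=50, input_tokens=2_000, output_tokens=1_000)
opus = session_cost("claude-opus", requests=50, input_tokens=2_000, output_tokens=1_000)
print(f"Devstral 2: ${devstral:.2f}, Claude Opus: ${opus:.2f}")
```

At these prices the Devstral 2 session comes to $0.50. The Claude Opus session comes to $5.25 in total, of which $3.75 is output tokens alone.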

Rate limits

Mistral API rate limits (as of early 2026):

Tier    Requests/min   Tokens/min   Tokens/day
Free    2              4,000        100,000
Build   60             500,000      10M
Scale   300            2,000,000    Unlimited

For coding tool integration (Aider, OpenCode), the Build tier is sufficient for individual developers. Teams should consider Scale tier for uninterrupted workflows.

OpenRouter has its own rate limits that vary by plan and model demand.
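When a tool does hit one of these limits, a common pattern is to retry with exponential backoff rather than fail outright. The sketch below is a generic illustration of that pattern. `RateLimited` is a hypothetical stand-in for whatever exception your client raises on an HTTP 429 response, and the delay values are arbitrary defaults rather than provider recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delays grow as 1s, 2s, 4s, ... plus up to 25% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise ValueError("max_attempts must be at least 1")
```

In real use, wrap the API call in a lambda that translates the SDK's rate-limit error into `RateLimited`. The `sleep` parameter is injectable so the function can be tested without waiting.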

Streaming

For real-time output in coding tools, use streaming:

stream = client.chat.stream(
    model="devstral-2-latest",
    messages=[{"role": "user", "content": "Explain this code"}],
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Streaming is essential for interactive coding tools where you want to see the response as it generates rather than waiting for the full completion.
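Tools usually need the full response as well as the live output, for example to apply an edit once generation finishes. Below is a minimal sketch of accumulating streamed text while echoing it. `iter_text` and `collect` are hypothetical helpers that work on any iterable of text fragments, so they are independent of the SDK.

```python
from typing import Callable, Iterable, Iterator, Optional


def iter_text(deltas: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield only the non-empty text fragments from a stream of deltas."""
    for piece in deltas:
        if piece:
            yield piece


def collect(
    deltas: Iterable[Optional[str]],
    echo: Optional[Callable[[str], None]] = None,
) -> str:
    """Echo each fragment as it arrives and return the full concatenated text."""
    parts = []
    for piece in iter_text(deltas):
        if echo is not None:
            echo(piece)
        parts.append(piece)
    return "".join(parts)


# Simulated stream: some chunks carry no content and are skipped.
full_text = collect(["def ", None, "", "f():", "\n    pass"])
```

With the streaming loop above, the input would be a generator such as `(chunk.data.choices[0].delta.content for chunk in stream)`, and `echo` could be `lambda s: print(s, end="", flush=True)`.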

Tips for best results

  1. Use low temperature (0.1-0.3) for code generation: higher temperatures introduce unnecessary variation
  2. Provide context: include relevant type definitions, interfaces, and surrounding code
  3. Be specific: “Add error handling for network failures and invalid JSON” beats “make it better”
  4. Use system prompts: set the role and constraints upfront for consistent behavior
  5. Leverage FIM for completions: it’s specifically trained for this and produces more natural insertions
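Several of these tips can be folded into one prompt-building step. The sketch below assembles a `messages` list that has a system prompt, the relevant source files as context, and a specific task. The `build_messages` helper, the `File:`/`Task:` layout, and the sample file path are assumptions made for illustration, not a format the API requires.

```python
def build_messages(
    task: str,
    context: dict[str, str],
    system: str = "You are an expert software engineer.",
) -> list[dict[str, str]]:
    """Combine a system prompt, context files, and a specific task into chat messages."""
    # One labelled section per file so the model can tell the sources apart.
    sections = [f"File: {path}\n{code}" for path, code in context.items()]
    user_content = "\n\n".join(sections + [f"Task: {task}"])
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    task="Add error handling for network failures and invalid JSON",
    context={"api/client.py": "def fetch(url):\n    return requests.get(url).json()"},
)
```

The resulting list can be passed as `messages` to any of the chat examples above, together with a low `temperature` such as 0.2.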

FAQ

Is the Devstral 2 API free?

Mistral offers a free tier with tight rate limits (2 requests/min, 100K tokens/day), which is enough for testing but not for real development work. The Build tier ($0 monthly + pay-per-token) is what most developers use. At $2/$6 per million input/output tokens, a typical coding session costs $0.30-0.50. Via OpenRouter, pricing is similar. For completely free usage, run Devstral Small locally with Ollama instead.

Does Devstral support fill-in-the-middle?

Yes. Devstral 2 has native FIM support via the /v1/fim/completions endpoint. You provide a prompt (code before cursor) and suffix (code after cursor), and the model predicts what goes in between. This is how IDE integrations provide inline code completions. FIM works best with low temperature (0.1) and shorter max_tokens (128-256) for snappy completions.

How does Devstral API compare to Codestral?

Codestral is Mistral’s smaller (22B) coding model: faster and cheaper ($0.30/$0.90 per 1M tokens) but less capable. Devstral 2 (123B) scores 72.2% on SWE-bench vs Codestral’s ~45%. Use Codestral for fast completions and simple tasks where speed matters more than quality. Use Devstral 2 for complex refactoring, bug fixing, and tasks requiring deep understanding. Many developers use Codestral for FIM/autocomplete and Devstral 2 for chat-based coding assistance.
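The split described here can be expressed as a small routing table. This is only a sketch: the task labels and the `pick_model` helper are hypothetical, and `codestral-latest` is an assumed model ID that does not appear elsewhere in this guide, so check it against your provider's model list.

```python
# "devstral-2-latest" comes from this guide; "codestral-latest" is an assumed ID.
MODEL_BY_TASK = {
    "autocomplete": "codestral-latest",
    "fim": "codestral-latest",
    "chat": "devstral-2-latest",
    "refactor": "devstral-2-latest",
    "bugfix": "devstral-2-latest",
}


def pick_model(task: str, default: str = "devstral-2-latest") -> str:
    """Return the model ID for a task label, falling back to the larger model."""
    return MODEL_BY_TASK.get(task.lower(), default)
```

Unknown task labels fall back to Devstral 2, trading some cost for quality when the task type is unclear.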
