
Mistral Medium 3.5 API Guide — Authentication, Endpoints, and Code Examples (2026)


Mistral Medium 3.5 is a 128B dense model with 77.6% SWE-bench Verified, a 256K context window, configurable reasoning, and native vision. The API is available through La Plateforme at $1.50 input / $7.50 output per million tokens, half the cost of Claude Sonnet 4.6, with open weights included.

This guide covers everything you need to build with the Mistral Medium 3.5 API: authentication, chat completions, streaming, reasoning effort, function calling, vision, structured output, and integration with coding tools. For the full model overview, see our Mistral Medium 3.5 complete guide.

Getting started

1. Get an API key

Sign up at La Plateforme, add a payment method, and generate an API key from the dashboard. No waitlist — the key works immediately.

2. Install the client library

pip install mistralai

3. Make your first request

from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Explain Python generators in 3 sentences."}
    ]
)

print(response.choices[0].message.content)

That’s it. The model ID is mistral-medium-3.5. All requests go through https://api.mistral.ai.

Pricing

All prices are per 1M tokens. For a broader comparison, see AI API pricing compared and how to reduce LLM API costs.

Model                 Input    Output
Mistral Medium 3.5    $1.50    $7.50

Medium 3.5 replaces three separate models (Medium 3.1, Magistral, and Devstral 2), so you no longer need to route between models or manage multiple pricing tiers. One model, one price.

Authentication

Every request requires your API key in the Authorization header:

Authorization: Bearer your-api-key

The Python and JavaScript client libraries handle this automatically when you pass the key at initialization. If you’re calling the REST API directly:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Store your key in an environment variable (MISTRAL_API_KEY) rather than hardcoding it. For key management best practices, see secure AI API keys.
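As a minimal sketch of that pattern (the helper name is ours, not part of the SDK):

```python
import os

def load_api_key(var: str = "MISTRAL_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before starting the application.")
    return key

# client = Mistral(api_key=load_api_key())
```

Failing at startup beats discovering a missing key on the first request in production.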

Basic chat completion

The core endpoint is /v1/chat/completions. Here’s a complete example with a system prompt:

from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "You are a senior Python developer. Be concise."},
        {"role": "user", "content": "Write a decorator that retries a function 3 times on exception."}
    ],
    temperature=0.3
)

print(response.choices[0].message.content)

Medium 3.5 has strong system prompt adherence — it follows formatting instructions, persona constraints, and output rules more reliably than most open-weight models. This makes it well-suited for production applications where consistent behavior matters.

Streaming responses

For user-facing applications, stream responses to reduce perceived latency:

from mistralai import Mistral

client = Mistral(api_key="your-api-key")

stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain how async/await works in Python."}
    ]
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Streaming is especially important when using reasoning_effort="high", since extended reasoning produces longer outputs.

Configurable reasoning effort

Medium 3.5 supports two reasoning modes per request. This replaces the need for separate reasoning models like Magistral. For more context on how this compares to other models, see the Mistral Medium 3.5 complete guide.

Mode    Behavior                                         When to use
none    Fast, direct answers. No chain-of-thought.       Classification, autocomplete, formatting, high-throughput tasks
high    Extended internal reasoning before responding.   Complex coding, debugging, math, multi-step planning

reasoning_effort="none"

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Classify this text as positive or negative: 'Great product, fast shipping'"}],
    reasoning_effort="none"
)

Use none when the task is straightforward. It’s faster and uses fewer output tokens, which directly reduces cost.

reasoning_effort="high"

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Debug this race condition in my Go service that causes duplicate writes under load."}],
    reasoning_effort="high"
)

Use high for anything that benefits from step-by-step thinking: complex code generation, architectural decisions, debugging, and math. The model produces more output tokens (and costs more per request), but the quality improvement is significant for hard problems.

Default behavior: if you omit reasoning_effort, the API applies the model's own default rather than a setting tuned to your task. Set it explicitly when you know the task complexity.
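If you route requests by task type, a tiny dispatcher keeps that choice explicit. The task buckets below are illustrative, not an official taxonomy:

```python
# Illustrative task buckets -- adjust to your own workload.
FAST_TASKS = {"classification", "extraction", "formatting", "autocomplete"}

def pick_reasoning_effort(task_type: str) -> str:
    """Map a task category to a reasoning_effort value."""
    return "none" if task_type in FAST_TASKS else "high"

# client.chat.complete(..., reasoning_effort=pick_reasoning_effort("classification"))
```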

Temperature recommendations

Mistral recommends different temperature settings depending on the reasoning mode:

  • With reasoning (reasoning_effort="high"): Use temperature=0.7. The reasoning process benefits from some exploration.
  • Without reasoning (reasoning_effort="none"): Use temperature=0.0 to 0.7 depending on the task. Lower for deterministic outputs (classification, extraction), higher for creative tasks.
# Deterministic classification — low temperature, no reasoning
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Is this a bug report or feature request? 'The button doesn't work on mobile'"}],
    reasoning_effort="none",
    temperature=0.0
)

# Creative code generation — higher temperature, with reasoning
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Design a novel caching strategy for a multi-tenant SaaS app."}],
    reasoning_effort="high",
    temperature=0.7
)

Function calling / tool use

Medium 3.5 supports function calling for building agents and automated workflows. Define tools with JSON schemas and the model generates structured calls.

from mistralai import Mistral
import json

client = Mistral(api_key="your-api-key")

# Define a calculator tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform a mathematical calculation",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The math expression to evaluate, e.g. '2 + 3 * 4'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Send a message that requires the tool
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "What is 1547 * 382 + 91?"}],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Execute the tool (your implementation)
    result = eval(args["expression"])  # In production, use a safe math parser

    # Send the result back
    follow_up = client.chat.complete(
        model="mistral-medium-3.5",
        messages=[
            {"role": "user", "content": "What is 1547 * 382 + 91?"},
            message,
            {"role": "tool", "content": str(result), "tool_call_id": tool_call.id}
        ],
        tools=tools
    )
    print(follow_up.choices[0].message.content)

For more on tool use patterns, see what is tool calling.
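The single round trip above generalizes to a loop: keep resubmitting until the model answers without requesting a tool. A sketch against the response shapes shown above (run_tool_loop and tool_registry are our names, not SDK features):

```python
import json

def run_tool_loop(client, messages, tools, tool_registry, max_rounds=5):
    """Call the model repeatedly until it answers without requesting a tool.

    tool_registry maps tool names to plain Python callables.
    """
    for _ in range(max_rounds):
        response = client.chat.complete(
            model="mistral-medium-3.5", messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # final answer
        messages.append(message)
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = tool_registry[call.function.name](**args)
            messages.append(
                {"role": "tool", "content": str(result), "tool_call_id": call.id}
            )
    raise RuntimeError("Tool loop did not converge")
```

Capping the rounds prevents a misbehaving model from looping forever on tool calls.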

Vision / multimodal input

Medium 3.5 includes a vision encoder trained from scratch — not a bolted-on adapter. It handles variable image sizes, documents, diagrams, UI screenshots, and charts.

from mistralai import Mistral

client = Mistral(api_key="your-api-key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's in this image and identify any UI issues."},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
            ]
        }
    ]
)

print(response.choices[0].message.content)

You can also pass multiple images in a single request:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two designs and list the differences."},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v1.png"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/design-v2.png"}}
            ]
        }
    ]
)

Vision works with both reasoning modes. Use reasoning_effort="high" for complex diagram analysis or document understanding.
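For local files, the usual OpenAI-style pattern is to inline the image as a base64 data URL in the image_url field. Check Mistral's docs for the accepted formats; this helper is our assumption, not an SDK feature:

```python
import base64
from pathlib import Path

def image_to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image as a base64 data URL for the image_url field."""
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"

# {"type": "image_url", "image_url": {"url": image_to_data_url("screenshot.png")}}
```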

JSON mode / structured output

Force the model to return valid JSON by setting response_format:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "Extract the following fields from the text: name, email, company. Return JSON."},
        {"role": "user", "content": "Hi, I'm Sarah Chen from Acme Corp. Reach me at sarah@acme.io."}
    ],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
print(data)
# {"name": "Sarah Chen", "email": "sarah@acme.io", "company": "Acme Corp"}

JSON mode guarantees the output is parseable JSON. Combine it with a clear schema description in the system prompt for reliable structured extraction. For more on this pattern, see structured outputs explained.
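JSON mode guarantees syntax, not schema, so it is worth validating that the fields you asked for actually came back. A minimal check for the extraction example above (parse_contact is our helper):

```python
import json

REQUIRED_FIELDS = {"name", "email", "company"}

def parse_contact(raw: str) -> dict:
    """Parse the model's JSON output and verify the schema we requested."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"Model response missing fields: {sorted(missing)}")
    return data
```

For anything more involved, a Pydantic model gives you type checking on top of key presence.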

System prompts

Medium 3.5 has notably strong system prompt adherence. It follows formatting rules, persona constraints, language requirements, and output schemas more consistently than most models in its class.

Tips for effective system prompts:

  • Be specific about format: “Return a JSON object with keys: summary, severity, recommendation” works better than “return structured data.”
  • Set constraints early: Put output format and behavioral rules in the system prompt, not the user message.
  • Use examples: One-shot or few-shot examples in the system prompt significantly improve consistency.
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "system",
            "content": """You are a code reviewer. For each review:
1. List issues found (bullet points)
2. Rate severity: low/medium/high
3. Suggest a fix for each issue
Never explain what the code does — only review it."""
        },
        {"role": "user", "content": "Review this:\n\ndef get_user(id):\n  return db.query(f'SELECT * FROM users WHERE id = {id}')"}
    ]
)

OpenAI-compatible endpoint

Mistral’s API is compatible with the OpenAI SDK. If you’re already using the OpenAI Python library, you can switch to Mistral Medium 3.5 by changing the base URL and API key:

from openai import OpenAI

client = OpenAI(
    api_key="your-mistral-api-key",
    base_url="https://api.mistral.ai/v1"
)

response = client.chat.completions.create(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Write a Dockerfile for a Node.js app with multi-stage build."}
    ]
)

print(response.choices[0].message.content)

This means any tool or framework that supports the OpenAI API format can use Mistral Medium 3.5 with minimal configuration changes.

Using with coding tools

Aider

Aider works with Mistral Medium 3.5 directly:

export MISTRAL_API_KEY=your-api-key

aider --model mistral/mistral-medium-3.5

Or use it via OpenRouter:

export OPENROUTER_API_KEY=your-key
aider --model openrouter/mistralai/mistral-medium-3.5

OpenCode

OpenCode supports any OpenAI-compatible endpoint. Add this to your opencode.json:

{
  "provider": {
    "mistral": {
      "apiKey": "your-mistral-api-key",
      "baseURL": "https://api.mistral.ai/v1",
      "models": {
        "mistral-medium-3.5": {
          "maxTokens": 16384,
          "contextWindow": 262144
        }
      }
    }
  },
  "model": "mistral/mistral-medium-3.5"
}

Continue.dev

Continue.dev supports Mistral natively. Add this to your config.json:

{
  "models": [
    {
      "title": "Mistral Medium 3.5",
      "provider": "mistral",
      "model": "mistral-medium-3.5",
      "apiKey": "your-mistral-api-key"
    }
  ]
}

Continue.dev also supports tab autocomplete with Codestral if you want to pair Medium 3.5 for chat with Codestral for completions.

Rate limits and error handling

Mistral applies rate limits based on your account tier. If you hit a limit, the API returns HTTP 429.

Error code    Meaning                                         Action
401           Invalid API key                                 Check your key and ensure it’s active
429           Rate limit exceeded                             Back off and retry with exponential delay
400           Bad request (invalid model, malformed input)    Check your request body and model ID
500           Server error                                    Retry after a short delay

Retry with exponential backoff

import time
from mistralai import Mistral

client = Mistral(api_key="your-api-key")

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.complete(
                model="mistral-medium-3.5",
                messages=messages
            )
        except Exception as e:
            if "429" in str(e) and attempt < max_retries - 1:
                wait = 2 ** attempt
                print(f"Rate limited. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise

For production applications, consider using an AI gateway pattern with automatic retries and fallback to other models via OpenRouter.
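A minimal version of that fallback pattern, with each provider wrapped as a callable (the structure is ours; in production you would narrow the except clause to rate-limit and server errors):

```python
def chat_with_fallback(providers, messages):
    """Try each provider callable in order; return the first successful response."""
    last_error = None
    for call in providers:
        try:
            return call(messages)
        except Exception as exc:  # narrow to 429/5xx errors in production
            last_error = exc
    raise last_error

# providers = [
#     lambda m: mistral_client.chat.complete(model="mistral-medium-3.5", messages=m),
#     lambda m: openrouter_client.chat.completions.create(
#         model="mistralai/mistral-medium-3.5", messages=m),
# ]
```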

Pricing comparison vs other APIs

How Medium 3.5 compares to other frontier model APIs as of April 2026:

Model                 Input ($/M tokens)    Output ($/M tokens)    Context      Open weights
Mistral Medium 3.5    $1.50                 $7.50                  256K         Yes
Claude Sonnet 4.6     $3.00                 $15.00                 1M (beta)    No
DeepSeek V4 Pro       $1.74                 $3.48                  1M           Yes
DeepSeek V4 Flash     $0.14                 $0.28                  1M           Yes
GPT-5.4               ~$5.00                ~$15.00                256K         No
Gemini 3.1 Pro        ~$3.50                ~$10.50                2M           No
Kimi K2.6             ~$1.00                ~$4.00                 256K         No
Qwen 3.5              Free (Alibaba)        Free (Alibaba)         128K         Yes

Key takeaways:

  • Cheapest frontier option? No — DeepSeek V4 Flash is far cheaper. But Medium 3.5 beats it on SWE-bench (77.6% vs ~76%).
  • vs Claude Sonnet 4.6: Medium 3.5 is half the price. Sonnet leads by 2 points on SWE-bench but you’re paying 2× for that margin.
  • vs DeepSeek V4 Pro: V4 Pro has cheaper output ($3.48 vs $7.50) and slightly higher SWE-bench (80.6%). But Medium 3.5 is a simpler dense architecture that’s easier to self-host.
  • Open weights advantage: Medium 3.5 and DeepSeek V4 are the only frontier-class models you can self-host. If you need to run on your own infrastructure, these are your options.

For strategies to minimize API spend, see how to reduce LLM API costs.
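The arithmetic behind the comparison is simple enough to sketch, using the per-million prices from the table above (only a few rows included here):

```python
# Per-1M-token list prices from the comparison table: (input, output).
PRICES = {
    "mistral-medium-3.5": (1.50, 7.50),
    "claude-sonnet-4.6": (3.00, 15.00),
    "deepseek-v4-pro": (1.74, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A typical 2K-in / 1K-out call on Medium 3.5 costs about a cent.
```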

FAQ

Which model ID do I use for Mistral Medium 3.5?

Use mistral-medium-3.5. This is the model ID for both the Mistral client library and the OpenAI-compatible endpoint.

Does Medium 3.5 replace Devstral 2 in the API?

Medium 3.5 subsumes Devstral 2’s capabilities — coding, agentic tasks, and tool use — while adding vision, configurable reasoning, and general-purpose intelligence. Devstral 2 remains available at a lower price point ($0.40/$2.00) for coding-only workloads where you don’t need vision or reasoning modes.

Can I use Medium 3.5 with the OpenAI SDK in JavaScript/TypeScript?

Yes. Any OpenAI-compatible SDK works. Set the base URL to https://api.mistral.ai/v1 and use your Mistral API key. Alternatively, use the official Mistral JavaScript SDK (@mistralai/mistralai).

What’s the context window limit?

256K tokens. If you exceed it, the API returns a 400 error. For workloads that need more context, consider DeepSeek V4 (1M tokens) or Gemini 3.1 Pro (2M tokens).

Is there a free tier?

No. Mistral’s API is pay-per-token with no free tier. However, at $1.50/M input tokens, light usage costs pennies. For free alternatives, see best free AI APIs 2026.

Does the API support batch processing?

Yes. You can send multiple independent requests concurrently. For high-throughput workloads, combine reasoning_effort="none" with streaming disabled to maximize throughput.
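The concurrency can be as simple as a thread pool over the synchronous client shown throughout this guide (a sketch on our part, not an official batch API):

```python
from concurrent.futures import ThreadPoolExecutor

def batch_complete(client, prompts, max_workers=8):
    """Send independent prompts concurrently; results come back in input order."""
    def one(prompt):
        response = client.chat.complete(
            model="mistral-medium-3.5",
            messages=[{"role": "user", "content": prompt}],
            reasoning_effort="none",
        )
        return response.choices[0].message.content

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, prompts))
```

Keep max_workers below your tier's rate limit, and pair this with the retry helper above for 429s.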

How does Medium 3.5 handle GDPR compliance?

Mistral’s API runs on EU infrastructure. Data stays in Europe without additional configuration — no Standard Contractual Clauses needed. This is a significant advantage for European companies. See our AI GDPR developers guide for details.

Can I self-host Medium 3.5 instead of using the API?

Yes. The model has open weights on Hugging Face and runs on as few as 4 GPUs with FP8 quantization. See how to run Mistral Medium 3.5 locally for setup instructions with vLLM, SGLang, and Ollama.

Next steps