May 26, 2026 · 3 min read

Last updated on Apr 20, 2026

How to Use the Codestral API — Autocomplete and FIM Setup Guide

Codestral is Mistral’s code-focused model, and autocomplete is its main use case. Here’s how to call its API for both chat completions and Fill-in-the-Middle (FIM).

Getting your API key

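Sign up at console.mistral.ai and create an API key. For the full Mistral API guide, see our dedicated article. Codestral is available on both the standard API endpoint (api.mistral.ai) and the dedicated Codestral endpoint (codestral.mistral.ai). The dedicated endpoint uses its own key, created separately in the console, so a standard key will not authenticate there.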

Chat completion

Python

from mistralai import Mistral

client = Mistral(api_key="your-key")
response = client.chat.complete(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a binary search in Python"}]
)
print(response.choices[0].message.content)

curl

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [{"role": "user", "content": "Write a binary search in Python"}],
    "temperature": 0.1
  }'

JavaScript / TypeScript

import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const response = await client.chat.complete({
  model: "codestral-latest",
  messages: [{ role: "user", content: "Write a binary search in Python" }],
});

console.log(response.choices[0].message.content);

Fill-in-the-Middle (FIM)

FIM is Codestral’s standout feature: the model reads the code both before and after your cursor and generates the part that belongs in between. This is what powers autocomplete in IDEs. See how Codestral compares to DeepSeek Coder on FIM tasks.

FIM endpoint details

The FIM endpoint is separate from chat completions:

Endpoint: https://codestral.mistral.ai/v1/fim/completions
Method: POST
Required fields: model, prompt (code before cursor)
Optional fields: suffix (code after cursor), temperature, max_tokens, stop
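
The field list above maps directly onto a JSON request body. The following is a minimal sketch using only the Python standard library; `build_fim_payload` and `post_fim` are hypothetical helper names, and the `CODESTRAL_API_KEY` environment variable is an assumption, not something the API requires.

```python
import json
import os
import urllib.request

FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"  # endpoint listed above


def build_fim_payload(prompt, suffix=None, model="codestral-latest", **options):
    """Build the JSON body: required fields first, optional ones only if set."""
    payload = {"model": model, "prompt": prompt}
    if suffix is not None:
        payload["suffix"] = suffix
    # Pass through only the optional fields named in the list above;
    # any other keyword argument is ignored.
    for key in ("temperature", "max_tokens", "stop"):
        if options.get(key) is not None:
            payload[key] = options[key]
    return payload


def post_fim(payload, api_key, url=FIM_URL):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


body = build_fim_payload("def add(a, b):\n    ", suffix="\n", max_tokens=32)
print(json.dumps(body, indent=2))
# To send it (needs a valid key and network access):
# print(post_fim(body, os.environ["CODESTRAL_API_KEY"]))
```

Leaving unset optional fields out of the body, rather than sending nulls, lets the server apply its own defaults.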

Python FIM example

response = client.fim.complete(
    model="codestral-latest",
    prompt="def calculate_tax(income, rate):\n    ",
    suffix="\n    return round(tax, 2)",
    temperature=0.1,
    max_tokens=128
)
# Returns the middle part that connects prompt to suffix
print(response.choices[0].message.content)
# Example output (actual completions may vary): "tax = income * rate"
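
In an editor, the prompt and suffix come from splitting the open buffer at the cursor, and the returned middle is spliced back between them. A small sketch of that bookkeeping follows; `split_at_cursor` and `apply_completion` are hypothetical names, and the completion string is a stand-in, so no API call is made.

```python
def split_at_cursor(source, cursor):
    """Split a buffer into (prompt, suffix) at a character offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def apply_completion(prompt, completion, suffix):
    """Insert the model's completion between the prompt and the suffix."""
    return prompt + completion + suffix


source = "def calculate_tax(income, rate):\n    \n    return round(tax, 2)"
# Hypothetical cursor position: the end of the empty, indented body line.
cursor = source.index("\n    return")
prompt, suffix = split_at_cursor(source, cursor)

# Stand-in for response.choices[0].message.content.
completion = "tax = income * rate"
print(apply_completion(prompt, completion, suffix))
```

With this cursor position, `prompt` and `suffix` are the same two strings passed to the FIM call above.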

curl FIM example

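# The codestral.mistral.ai endpoint expects a key issued for that endpoint.
# With a standard key, call https://api.mistral.ai/v1/fim/completions instead.
curl https://codestral.mistral.ai/v1/fim/completions \
  -H "Authorization: Bearer $CODESTRAL_API_KEY" \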
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "prompt": "def calculate_tax(income, rate):\n    ",
    "suffix": "\n    return round(tax, 2)",
    "temperature": 0.1,
    "max_tokens": 128
  }'

JavaScript FIM example

const response = await client.fim.complete({
  model: "codestral-latest",
  prompt: "function calculateTax(income, rate) {\n  ",
  suffix: "\n  return Math.round(tax * 100) / 100;\n}",
  temperature: 0.1,
});

console.log(response.choices[0].message.content);
// Example output (actual completions may vary): "const tax = income * rate;"

Streaming responses

For real-time autocomplete, use streaming to get tokens as they’re generated:

stream = client.chat.stream(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a merge sort in Rust"}]
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

curl streaming:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [{"role": "user", "content": "Write a merge sort in Rust"}],
    "stream": true
  }'
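
Without the SDK, the streamed response has to be parsed by hand. The sketch below assumes the stream arrives as server-sent events, one `data: {json}` line per chunk, ending with a `data: [DONE]` line; that wire format is an assumption here, and the sample lines are hand-written rather than captured output. `iter_sse_content` is a hypothetical helper name.

```python
import json


def iter_sse_content(lines):
    """Yield text deltas from an iterable of server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample in the assumed format.
sample = [
    'data: {"choices": [{"delta": {"content": "fn "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "merge_sort"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

The `choices[0].delta.content` path mirrors what the Python SDK exposes as `chunk.data.choices[0].delta.content`.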

Via OpenRouter

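from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the OpenAI SDK works as the client.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="your-openrouter-key")
response = client.chat.completions.create(
    # Model slug as given in this guide; confirm the current ID in OpenRouter's model list.
    model="mistralai/codestral-latest",
    messages=[{"role": "user", "content": "Optimize this SQL query"}]
)
print(response.choices[0].message.content)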

Error handling

Common errors and how to handle them:

# The v1 SDK (the one that exports `Mistral`) raises models.SDKError for HTTP errors.
# The mistralai.exceptions module belongs to the older pre-1.0 client.
from mistralai import Mistral, models

client = Mistral(api_key="your-key")

try:
    response = client.chat.complete(
        model="codestral-latest",
        messages=[{"role": "user", "content": "Fix this code"}]
    )
except models.SDKError as e:
    if e.status_code == 429:
        # Rate limited: back off and retry
        print("Rate limited. Waiting before retry...")
    elif e.status_code == 401:
        print("Invalid API key")
    elif e.status_code == 500:
        print("Server error. Retry after a moment")
    else:
        print(f"API error {e.status_code}: {e.message}")
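
The handler above only prints a message for 429 and 500. An actual retry could look like the following sketch, which uses exponential backoff with a little jitter. `call_with_backoff` is a hypothetical helper, the delays are illustrative defaults, and the `get_status` callback keeps the helper independent of any particular SDK's exception class.

```python
import random
import time

RETRYABLE_STATUS = {429, 500}  # the two retry-worthy codes handled above


def call_with_backoff(call, get_status, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = get_status(exc)
            last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUS or last_attempt:
                raise  # not retryable, or out of attempts
            # Exponential backoff plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call site might wrap the request as `call_with_backoff(lambda: client.chat.complete(...), lambda e: getattr(e, "status_code", None))`, so that 401 and other non-retryable errors still surface immediately.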

Rate limits

Codestral API rate limits (as of early 2026):

Plan	Requests/min	Tokens/min	Tokens/day
Free tier	30	100K	2M
Pay-as-you-go	300	1M	Unlimited
Enterprise	Custom	Custom	Unlimited

For IDE autocomplete, the free tier is usually sufficient for individual use (autocomplete requests are small). Heavy batch processing or team usage requires pay-as-you-go.
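
To stay under a requests-per-minute cap on the client side, requests can be throttled before they are sent. Below is a minimal sliding-window sketch; `SlidingWindowLimiter` is a hypothetical class, its default of 30 requests per 60 seconds is taken from the free-tier row above, and it does not track token limits.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window of `window` seconds."""

    def __init__(self, max_requests=30, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self):
        """Register that a request was just sent."""
        self._sent.append(self.clock())
```

Before each request, a caller would sleep for `limiter.wait_time()` seconds and then call `limiter.record()`. The injectable `clock` exists only to make the class easy to test.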

IDE integration

For autocomplete in VS Code, use Continue.dev:

{
  "tabAutocompleteModel": {
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "your-key"
  }
}

Or run locally for free with Ollama: ollama pull codestral:22b

Running locally eliminates rate limits entirely and keeps your code private — see our guide on what Codestral is and how it works for more on local vs. API tradeoffs.
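
A local model can be called over HTTP in much the same way as the hosted API. The sketch below assumes an Ollama server running on its default port, a version whose `/api/generate` endpoint accepts a `suffix` field for infill, and a pulled `codestral:22b` model; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_local_fim_request(prompt, suffix, model="codestral:22b"):
    """Build a non-streaming generate request with an infill suffix."""
    return {
        "model": model,
        "prompt": prompt,
        "suffix": suffix,
        "stream": False,
        "options": {"temperature": 0.1},
    }


def complete_locally(prompt, suffix):
    """Send the request to a running Ollama server and return the infill text."""
    body = json.dumps(build_local_fim_request(prompt, suffix)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["response"]
```

Calling `complete_locally("def calculate_tax(income, rate):\n    ", "\n    return round(tax, 2)")` would then play the same role as the hosted FIM request, without any API key.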

Pricing

Option	Input	Output
Codestral API	$0.30/1M	$0.90/1M
Via OpenRouter	~$0.30/1M	~$0.90/1M
Local (Ollama)	Free	Free

For typical autocomplete usage (short prompts, short completions), expect to spend $1-3/month on the API. For heavy chat usage with long contexts, costs can reach $10-20/month.
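
The monthly figures follow from simple arithmetic on the per-token prices. As a rough sketch, here is that calculation for a hypothetical autocomplete workload; the request volume and token counts are illustrative assumptions, not measurements.

```python
PRICE_PER_M_INPUT = 0.30   # USD per 1M input tokens, from the table above
PRICE_PER_M_OUTPUT = 0.90  # USD per 1M output tokens, from the table above


def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimated monthly USD cost for a steady daily request volume."""
    total_in = requests_per_day * input_tokens * days
    total_out = requests_per_day * output_tokens * days
    return (total_in * PRICE_PER_M_INPUT + total_out * PRICE_PER_M_OUTPUT) / 1_000_000


# Hypothetical workload: 500 requests/day, 400 tokens in and 40 tokens out each.
print(f"${monthly_cost(500, 400, 40):.2f}")
```

That workload comes to 6M input tokens and 0.6M output tokens a month, or about $2.34, which sits inside the $1-3 range quoted above.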