How to Use the Codestral API: Autocomplete and FIM Setup Guide
Codestral is one of the best models for code autocomplete. Here's how to use its API for both chat completions and Fill-in-the-Middle (FIM).
Getting your API key
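Sign up at console.mistral.ai and create an API key. For the full Mistral API guide, see our dedicated article. Codestral is available on both the standard API endpoint (api.mistral.ai) and the dedicated Codestral endpoint (codestral.mistral.ai). The two hosts may use separate API keys, so check in the console which endpoint your key was issued for.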
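The snippets in this guide hardcode "your-key" for brevity. Here is a minimal sketch of reading the key from the environment instead, assuming the MISTRAL_API_KEY variable name that the curl examples use. load_api_key is a hypothetical helper written for this guide, not part of the Mistral SDK.

```python
import os


def load_api_key(env=None, name="MISTRAL_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing.

    Hypothetical helper for this guide, not an SDK function.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key


# Demo with an explicit mapping so the snippet runs without a real key.
demo_key = load_api_key({"MISTRAL_API_KEY": "demo-key"})
print("key loaded:", bool(demo_key))
```

In real code you would call load_api_key() with no arguments and pass the result to the client constructor in place of the literal string.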
Chat completion
Python
from mistralai import Mistral

client = Mistral(api_key="your-key")

response = client.chat.complete(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a binary search in Python"}],
)
print(response.choices[0].message.content)
curl
curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [{"role": "user", "content": "Write a binary search in Python"}],
    "temperature": 0.1
  }'
JavaScript / TypeScript
// The current SDK exposes Mistral as a named export.
import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const response = await client.chat.complete({
  model: "codestral-latest",
  messages: [{ role: "user", content: "Write a binary search in Python" }],
});
console.log(response.choices[0].message.content);
Fill-in-the-Middle (FIM)
FIM is Codestral's killer feature: it understands the code both before and after your cursor. This is what powers autocomplete in IDEs. See how Codestral compares to DeepSeek Coder on FIM tasks.
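To make "before and after your cursor" concrete, here is a small sketch of how an editor integration might cut a buffer into the two pieces FIM works with and stitch the result back together. Both function names are hypothetical and the buffer is an illustrative four-space-indented function; no API call is made.

```python
def split_at_cursor(source, cursor):
    """Split an editor buffer into (prompt, suffix) at a character offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def apply_completion(prompt, completion, suffix):
    """Rebuild the buffer with the model's completion inserted at the cursor."""
    return prompt + completion + suffix


buffer = "def calculate_tax(income, rate):\n    \n    return round(tax, 2)"
# The cursor sits at the end of the blank, indented line.
cursor = buffer.index("\n    return")
prompt, suffix = split_at_cursor(buffer, cursor)

# A completion such as "tax = income * rate" would be inserted like this:
print(apply_completion(prompt, "tax = income * rate", suffix))
```

The prompt half is everything up to the cursor and the suffix half is everything after it; the model only has to produce the text in between.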
FIM endpoint details
The FIM endpoint is separate from chat completions:
- Endpoint: https://codestral.mistral.ai/v1/fim/completions
- Method: POST
- Required fields: model, prompt (code before cursor)
- Optional fields: suffix (code after cursor), temperature, max_tokens, stop
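The field list above maps directly onto a JSON request body. The sketch below assembles one and leaves unset optional fields out; build_fim_payload is a hypothetical helper, and the only field names it uses are the ones listed above.

```python
import json


def build_fim_payload(prompt, suffix=None, model="codestral-latest",
                      temperature=None, max_tokens=None, stop=None):
    """Assemble a JSON body for POST /v1/fim/completions.

    model and prompt are required; optional fields are omitted when not set.
    """
    if not model or prompt is None:
        raise ValueError("model and prompt are required")
    payload = {"model": model, "prompt": prompt}
    optional = {"suffix": suffix, "temperature": temperature,
                "max_tokens": max_tokens, "stop": stop}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


body = build_fim_payload(
    prompt="def calculate_tax(income, rate):\n    ",
    suffix="\n    return round(tax, 2)",
    temperature=0.1,
    max_tokens=128,
)
print(json.dumps(body, indent=2))
```

The printed body can be sent with any HTTP client, with the API key in an Authorization: Bearer header.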
Python FIM example
response = client.fim.complete(
    model="codestral-latest",
    prompt="def calculate_tax(income, rate):\n    ",
    suffix="\n    return round(tax, 2)",
    temperature=0.1,
    max_tokens=128,
)
# Returns the middle part that connects prompt to suffix
print(response.choices[0].message.content)
# Example output (actual completions vary): "tax = income * rate"
curl FIM example
curl https://codestral.mistral.ai/v1/fim/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "prompt": "def calculate_tax(income, rate):\n    ",
    "suffix": "\n    return round(tax, 2)",
    "temperature": 0.1,
    "max_tokens": 128
  }'
JavaScript FIM example
const response = await client.fim.complete({
  model: "codestral-latest",
  prompt: "function calculateTax(income, rate) {\n  ",
  suffix: "\n  return Math.round(tax * 100) / 100;\n}",
  temperature: 0.1,
});
console.log(response.choices[0].message.content);
// Example output (actual completions vary): "const tax = income * rate;"
Streaming responses
For real-time autocomplete, use streaming to get tokens as they're generated:
stream = client.chat.stream(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a merge sort in Rust"}],
)
for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
curl streaming:
curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [{"role": "user", "content": "Write a merge sort in Rust"}],
    "stream": true
  }'
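An autocomplete client usually needs the full text as well as the individual tokens. The sketch below wraps the Python streaming loop from this section in a reusable function. collect_stream and fake_chunk are hypothetical names, and the fake chunks only imitate the chunk.data.choices[0].delta.content attribute path so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_token=None):
    """Concatenate delta.content from streamed chunks, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        content = chunk.data.choices[0].delta.content
        if content:
            parts.append(content)
            if on_token is not None:
                on_token(content)  # e.g. update the editor's ghost text
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute path as a streamed chunk (offline demo only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(data=SimpleNamespace(choices=[SimpleNamespace(delta=delta)]))


fake_stream = [fake_chunk("fn "), fake_chunk(None), fake_chunk("merge_sort"), fake_chunk("")]
full_text = collect_stream(fake_stream)
print(full_text)
```

With a real client, pass the object returned by client.chat.stream(...) as the chunks argument.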
Via OpenRouter
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="your-key")

response = client.chat.completions.create(
    model="mistralai/codestral-latest",
    messages=[{"role": "user", "content": "Optimize this SQL query"}],
)
Error handling
Common errors and how to handle them:
from mistralai import Mistral, models

client = Mistral(api_key="your-key")

try:
    response = client.chat.complete(
        model="codestral-latest",
        messages=[{"role": "user", "content": "Fix this code"}],
    )
# In the 1.x SDK (the one that exports Mistral), HTTP errors surface as
# models.SDKError; the older 0.x client used a different exceptions module.
except models.SDKError as e:
    if e.status_code == 429:
        # Rate limited: back off and retry
        print("Rate limited. Waiting before retry...")
    elif e.status_code == 401:
        print("Invalid API key")
    elif e.status_code == 500:
        print("Server error: retry after a moment")
    else:
        print(f"API error {e.status_code}: {e.message}")
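The handler above only prints a message when it hits a 429 or 500. One common way to act on those cases is exponential backoff, sketched below. call_with_backoff is a hypothetical helper: it assumes only that the raised exception carries a status_code attribute, and the attempt count and delays are illustrative defaults, not values recommended by Mistral.

```python
import random
import time

RETRYABLE_STATUS = {429, 500}  # the two transient cases handled above


def call_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `request()` and retry transient failures with exponential backoff.

    Exceptions whose `status_code` is in RETRYABLE_STATUS are retried;
    everything else (including 401) is re-raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # 1s, 2s, 4s, ... plus up to 25% jitter so clients do not retry in lockstep
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))
```

Usage would look like call_with_backoff(lambda: client.chat.complete(model="codestral-latest", messages=messages)). The sleep parameter exists so tests can substitute a fake instead of actually waiting.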
Rate limits
Codestral API rate limits (as of early 2026):
| Plan | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| Free tier | 30 | 100K | 2M |
| Pay-as-you-go | 300 | 1M | Unlimited |
| Enterprise | Custom | Custom | Unlimited |
For IDE autocomplete, the free tier is usually sufficient for individual use (autocomplete requests are small). Heavy batch processing or team usage requires pay-as-you-go.
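To see why small requests fit comfortably in the free tier, here is a quick arithmetic sketch using the free-tier row from the table above. The figure of 500 tokens per autocomplete request is an assumption chosen for illustration, and headroom is a hypothetical helper.

```python
# Free-tier limits copied from the table above.
FREE_TIER = {"requests_per_min": 30, "tokens_per_min": 100_000, "tokens_per_day": 2_000_000}


def headroom(tokens_per_request, limits=FREE_TIER):
    """Return how many requests of a given size fit under each limit."""
    per_min_by_tokens = limits["tokens_per_min"] // tokens_per_request
    return {
        # Whichever per-minute limit is hit first is the binding one.
        "requests_per_min": min(limits["requests_per_min"], per_min_by_tokens),
        "requests_per_day": limits["tokens_per_day"] // tokens_per_request,
    }


# Assumption: about 500 tokens per request (prompt + suffix + completion).
print(headroom(500))
```

At that size the request-per-minute cap is the binding limit (30 per minute), and the daily token budget allows 4,000 requests. Larger requests shift the bottleneck to the token limits.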
IDE integration
For autocomplete in VS Code, use Continue.dev:
{
  "tabAutocompleteModel": {
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "your-key"
  }
}
Or run locally for free with Ollama: ollama pull codestral:22b
Running locally eliminates rate limits entirely and keeps your code private. See our guide on what Codestral is and how it works for more on local vs. API tradeoffs.
Pricing
| | Input | Output |
|---|---|---|
| Codestral API | $0.30/1M | $0.90/1M |
| Via OpenRouter | ~$0.30/1M | ~$0.90/1M |
| Local (Ollama) | Free | Free |
For typical autocomplete usage (short prompts, short completions), expect to spend $1-3/month on the API. For heavy chat usage with long contexts, costs can reach $10-20/month.
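The monthly estimate can be reproduced with a few lines of arithmetic. The sketch below uses the Codestral API prices from the table above; the usage pattern (500 completions a day, 400 input and 40 output tokens each) is an assumption for illustration, and monthly_cost is a hypothetical helper.

```python
INPUT_USD_PER_M = 0.30   # per 1M input tokens, from the pricing table above
OUTPUT_USD_PER_M = 0.90  # per 1M output tokens


def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly API spend in USD for a steady daily request volume."""
    total_in = requests_per_day * input_tokens * days
    total_out = requests_per_day * output_tokens * days
    return total_in / 1e6 * INPUT_USD_PER_M + total_out / 1e6 * OUTPUT_USD_PER_M


# Assumption: 500 completions a day, 400 input and 40 output tokens each.
print(f"${monthly_cost(500, 400, 40):.2f}")
```

That works out to 6M input tokens and 0.6M output tokens per month, or about $2.34, which sits inside the $1-3 range quoted above.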
Related: What is Codestral 2026 · Codestral vs DeepSeek Coder · Mistral API Guide · Best AI Autocomplete Models 2026 · Codestral Complete Guide · Continue.dev Complete Guide