How to Use the Codestral API: Autocomplete and FIM Setup Guide


Codestral is one of the best models for code autocomplete. Here's how to use its API for both chat completions and Fill-in-the-Middle (FIM).

Getting your API key

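Sign up at console.mistral.ai and create an API key. For the full Mistral API guide, see our dedicated article. Codestral is available on both the standard API endpoint (api.mistral.ai) and the dedicated Codestral endpoint (codestral.mistral.ai). The dedicated endpoint uses its own API key, separate from the standard one, so make sure the key you send matches the host you call.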
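The curl examples in this guide read the key from a MISTRAL_API_KEY environment variable, and the Python examples can do the same instead of hard-coding "your-key". A minimal sketch, where the helper name `load_api_key` is an illustrative assumption and not part of any SDK:

```python
import os


def load_api_key(var_name: str = "MISTRAL_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    `load_api_key` is a hypothetical helper, not part of the Mistral SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

You would then create the client with `Mistral(api_key=load_api_key())`, which keeps the key out of source control.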

Chat completion

Python

from mistralai import Mistral

client = Mistral(api_key="your-key")
response = client.chat.complete(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a binary search in Python"}]
)
print(response.choices[0].message.content)

curl

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [{"role": "user", "content": "Write a binary search in Python"}],
    "temperature": 0.1
  }'

JavaScript / TypeScript

// The v1 SDK (the one that provides client.chat.complete) uses a named export
import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const response = await client.chat.complete({
  model: "codestral-latest",
  messages: [{ role: "user", content: "Write a binary search in Python" }],
});

console.log(response.choices[0].message.content);

Fill-in-the-Middle (FIM)

FIM is Codestral's killer feature: it understands code both before and after your cursor. This is what powers autocomplete in IDEs. See how Codestral compares to DeepSeek Coder on FIM tasks.
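To make "before and after your cursor" concrete, here is a minimal sketch of how an editor plugin might turn a buffer and a cursor offset into the two strings a FIM request needs. The function name `split_at_cursor` is hypothetical, and the buffer is the same `calculate_tax` snippet used in the FIM examples in this guide:

```python
def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split editor text into (prompt, suffix) around a character offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


buffer = "def calculate_tax(income, rate):\n    \n    return round(tax, 2)"
# Place the cursor at the end of the empty, indented body line.
cursor = buffer.index("\n    return")
prompt, suffix = split_at_cursor(buffer, cursor)
```

Everything before the cursor becomes the prompt, everything after it becomes the suffix, and the model is asked to produce the text that belongs in between.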

FIM endpoint details

The FIM endpoint is separate from chat completions:

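  • Endpoint: https://codestral.mistral.ai/v1/fim/completions (the standard API serves the same route at https://api.mistral.ai/v1/fim/completions, which is what the SDK clients call by default)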
  • Method: POST
  • Required fields: model, prompt (code before cursor)
  • Optional fields: suffix (code after cursor), temperature, max_tokens, stop
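The field list above maps directly onto a JSON request body. The sketch below builds such a request with only the Python standard library so the required and optional fields are easy to see. The helper name `build_fim_request` is an assumption for illustration, and the request is constructed but not sent:

```python
import json
import urllib.request

FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"


def build_fim_request(api_key, prompt, suffix=None, model="codestral-latest", **options):
    """Build (but do not send) a POST request for the FIM endpoint.

    `options` may carry the optional fields: temperature, max_tokens, stop.
    """
    payload = {"model": model, "prompt": prompt}  # required fields
    if suffix is not None:
        payload["suffix"] = suffix  # optional: code after the cursor
    payload.update(options)
    return urllib.request.Request(
        FIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_fim_request("your-key", "def add(a, b):\n    ", suffix="\n", max_tokens=32)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Leaving `suffix` out turns the call into a plain left-to-right completion of the prompt.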

Python FIM example

response = client.fim.complete(
    model="codestral-latest",
    prompt="def calculate_tax(income, rate):\n    ",
    suffix="\n    return round(tax, 2)",
    temperature=0.1,
    max_tokens=128
)
# Returns the middle part that connects prompt to suffix
print(response.choices[0].message.content)
# Example output (the exact completion may vary): tax = income * rate

curl FIM example

curl https://codestral.mistral.ai/v1/fim/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "prompt": "def calculate_tax(income, rate):\n    ",
    "suffix": "\n    return round(tax, 2)",
    "temperature": 0.1,
    "max_tokens": 128
  }'

JavaScript FIM example

const response = await client.fim.complete({
  model: "codestral-latest",
  prompt: "function calculateTax(income, rate) {\n  ",
  suffix: "\n  return Math.round(tax * 100) / 100;\n}",
  temperature: 0.1,
});

console.log(response.choices[0].message.content);
// Example output (the exact completion may vary): const tax = income * rate;

Streaming responses

For real-time autocomplete, use streaming to get tokens as they're generated:

stream = client.chat.stream(
    model="codestral-latest",
    messages=[{"role": "user", "content": "Write a merge sort in Rust"}]
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

curl streaming:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codestral-latest",
    "messages": [{"role": "user", "content": "Write a merge sort in Rust"}],
    "stream": true
  }'
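With "stream": true the response arrives as a series of server-sent events rather than one JSON document. The sketch below shows one way to pull the text out of that stream. It assumes the common framing where each event is a line starting with `data: ` that holds a JSON chunk and the stream ends with `data: [DONE]`, and it assumes the chunk shape matches the `choices[0].delta.content` path used in the Python streaming example. The function name `iter_stream_text` is hypothetical:

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"delta": {"content": "fn "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "merge_sort"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample))
```

In a real client the `lines` argument would be the decoded lines of the HTTP response, consumed as they arrive.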

Via OpenRouter

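from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the OpenAI SDK works with an OpenRouter key
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="your-key")
response = client.chat.completions.create(
    # Confirm the exact model slug in OpenRouter's model list before relying on it
    model="mistralai/codestral-latest",
    messages=[{"role": "user", "content": "Optimize this SQL query"}]
)
print(response.choices[0].message.content)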

Error handling

Common errors and how to handle them:

from mistralai import Mistral, models

client = Mistral(api_key="your-key")

try:
    response = client.chat.complete(
        model="codestral-latest",
        messages=[{"role": "user", "content": "Fix this code"}]
    )
except models.SDKError as e:  # raised by the v1 SDK for HTTP error responses
    if e.status_code == 429:
        # Rate limited: back off and retry
        print("Rate limited. Waiting before retry...")
    elif e.status_code == 401:
        print("Invalid API key")
    elif e.status_code >= 500:
        print("Server error: retry after a moment")
    else:
        print(f"API error {e.status_code}: {e.message}")
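The handler above only prints a message when it is rate limited. One way to actually back off and retry is exponential backoff with a little jitter. This is a generic sketch, not something the SDK provides: `call_with_backoff` and its parameters are hypothetical names, and the `sleep` argument exists only so the wait can be replaced in tests.

```python
import random
import time


def call_with_backoff(fn, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    The wait doubles on each attempt (base_delay, 2x, 4x, ...) plus up to
    0.1 s of random jitter. Non-retryable errors and the final failure
    are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

A call site might pass `lambda: client.chat.complete(...)` as `fn` and `lambda e: getattr(e, "status_code", None) in (429, 500)` as `is_retryable`, so that an invalid key (401) fails immediately instead of being retried.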

Rate limits

Codestral API rate limits (as of early 2026):

| Plan | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| Free tier | 30 | 100K | 2M |
| Pay-as-you-go | 300 | 1M | Unlimited |
| Enterprise | Custom | Custom | Unlimited |

For IDE autocomplete, the free tier is usually sufficient for individual use (autocomplete requests are small). Heavy batch processing or team usage requires pay-as-you-go.
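If you want to stay under a requests-per-minute limit rather than react to 429 responses, a small client-side throttle can help. The sketch below is a rolling-window counter. The class name `MinuteRateLimiter` is hypothetical, the default of 30 is taken from the free-tier figure quoted above, and the `clock` parameter exists only so time can be faked in tests.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most `max_requests` calls in any rolling `window` seconds."""

    def __init__(self, max_requests=30, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # forget requests older than the window
        if len(self.sent) >= self.max_requests:
            return False
        self.sent.append(now)
        return True
```

An autocomplete plugin could call `try_acquire()` before each request and simply skip the completion when it returns False, since a dropped suggestion is usually cheaper than a blocked editor.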

IDE integration

For autocomplete in VS Code, use Continue.dev:

{
  "tabAutocompleteModel": {
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "your-key"
  }
}

Or run locally for free with Ollama: ollama pull codestral:22b

Running locally eliminates rate limits entirely and keeps your code private. See our guide on what Codestral is and how it works for more on local vs. API tradeoffs.

Pricing

| Option | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Codestral API | $0.30 | $0.90 |
| Via OpenRouter | ~$0.30 | ~$0.90 |
| Local (Ollama) | Free | Free |

For typical autocomplete usage (short prompts, short completions), expect to spend $1-3/month on the API. For heavy chat usage with long contexts, costs can reach $10-20/month.
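These monthly figures follow from simple arithmetic on the per-token prices. The sketch below uses the Codestral API prices from the table above. The two workloads are hypothetical illustrations, not measured usage.

```python
INPUT_PRICE_PER_M = 0.30   # USD per 1M input tokens (Codestral API row)
OUTPUT_PRICE_PER_M = 0.90  # USD per 1M output tokens (Codestral API row)


def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly API spend in USD for a steady daily workload."""
    per_request = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return requests_per_day * per_request * days


# Hypothetical autocomplete workload: 500 requests/day, 400 tokens in, 30 out.
autocomplete = monthly_cost(500, 400, 30)

# Hypothetical heavy chat workload: 100 requests/day, 10,000 tokens in, 1,500 out.
heavy_chat = monthly_cost(100, 10_000, 1_500)
```

Under these assumptions the autocomplete workload comes to about $2.20 per month and the heavy chat workload to about $13, both inside the ranges quoted above. Input tokens dominate the cost when contexts are long, so trimming the prompt is the most direct way to lower the bill.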

Related: What is Codestral 2026 · Codestral vs DeepSeek Coder · Mistral API Guide · Best AI Autocomplete Models 2026 · Codestral Complete Guide · Continue.dev Complete Guide