
How to Run Mistral Models Locally: Ollama Setup Guide (2026)


📢 Update: Mistral Medium 3.5 is now available, a 128B dense model replacing Medium 3.1 and Devstral 2. See the Medium 3.5 complete guide, how to run it locally, and API guide.

Mistral has three models you can run locally for free: Codestral (autocomplete), Devstral Small 2 (coding agent), and Nemo (general chat). Here's how to set them up.

Which model for your hardware

| Hardware | RAM/VRAM | Best Mistral model | Command |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | Codestral 22B + Devstral Small 24B | Both fit (one loaded at a time) |
| RTX 4070 (12GB) | 12GB | Codestral 22B (Q4) | ollama pull codestral:22b |
| Mac M4 32GB | 32GB | Devstral Small 24B + Codestral 22B | Both fit |
| Mac M4 16GB | 16GB | Codestral 22B (Q4) | ollama pull codestral:22b |
| 8GB VRAM/RAM | 8GB | Nemo 12B (Q4) | ollama pull mistral-nemo |

If your hardware doesn't have enough VRAM for Codestral or Devstral, cloud GPU providers offer 24GB+ GPU instances starting at a few dollars per hour.
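
The table above can be folded into a small lookup for a setup script. This is a minimal sketch: `suggest_model` is a hypothetical helper name, and its thresholds simply mirror the table rows rather than any official sizing rule.

```shell
#!/usr/bin/env bash
# suggest_model GB: print the pull command(s) the table suggests for a
# machine with GB gigabytes of RAM/VRAM (whole number).
# Thresholds mirror the table rows; they are an assumption, not a sizing rule.
suggest_model() {
  local gb="$1"
  if [ "$gb" -ge 24 ]; then
    # 24GB and up: either coding model fits
    echo "ollama pull codestral:22b && ollama pull devstral-small:24b"
  elif [ "$gb" -ge 12 ]; then
    # 12GB to 23GB: Codestral at Q4
    echo "ollama pull codestral:22b"
  elif [ "$gb" -ge 8 ]; then
    # 8GB to 11GB: Nemo at Q4
    echo "ollama pull mistral-nemo"
  else
    echo "no table entry for ${gb}GB" >&2
    return 1
  fi
}

# Usage: suggest_model 16
```

Calling `suggest_model 16` prints the Codestral pull command. Anything under 8GB returns a non-zero status, since the table has no row for it.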

Setup with Ollama

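# Install Ollama
brew install ollama          # macOS
brew services start ollama   # macOS: start the background server (or run: ollama serve)
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull your model (sizes are approximate downloads)
# If a tag is not found, check the model's page in the Ollama library for the current name
ollama pull codestral:22b        # Best autocomplete (12GB)
ollama pull devstral-small:24b   # Best local coding agent (14GB)
ollama pull mistral-nemo         # General chat (7GB)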
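
After pulling, it can help to confirm that a model is actually installed before pointing an editor at it. The sketch below assumes the usual `ollama list` layout (a header row, then one model per row with the `name:tag` in the first column). `has_model` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# has_model NAME: succeed if NAME appears in the first column of
# `ollama list` output supplied on stdin (the header row is skipped).
# Note: a model pulled without a tag is listed with ":latest" appended.
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Usage (needs a running Ollama server):
#   ollama list | has_model codestral:22b || ollama pull codestral:22b
```

Reading from stdin keeps the helper independent of the server, so the same check can be run against saved output.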

Use with coding tools

Continue.dev (VS Code autocomplete)

{
  "tabAutocompleteModel": {
    "provider": "ollama",
    "model": "codestral:22b"
  },
  "models": [{
    "provider": "ollama",
    "model": "devstral-small:24b",
    "title": "Devstral Small"
  }]
}

See our Continue.dev guide.

Aider (terminal)

export OLLAMA_API_BASE=http://127.0.0.1:11434  # tell Aider where the Ollama server listens
aider --model ollama/devstral-small:24b

See our Aider guide.

OpenCode (terminal)

{"providers": {"ollama": {"baseUrl": "http://localhost:11434"}}, "defaultModel": "ollama/devstral-small:24b"}

See our OpenCode guide.

Performance

| Model | Mac M4 32GB | RTX 4090 |
|---|---|---|
| Codestral 22B | ~25 tok/s | ~40 tok/s |
| Devstral Small 24B | ~22 tok/s | ~35 tok/s |
| Nemo 12B | ~35 tok/s | ~55 tok/s |

All three are fast enough for interactive coding, and Codestral's autocomplete feels instant at these speeds.
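
To check these figures on your own machine, `ollama run <model> --verbose` prints timing statistics after each reply. The same number can be derived from the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that Ollama's generate API returns. The helper below is a minimal sketch of that arithmetic, and `tokens_per_second` is a hypothetical name.

```shell
#!/usr/bin/env bash
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# Generation speed = tokens generated / seconds spent generating.
# eval_duration is reported in nanoseconds, hence the division by 1e9.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Usage: 250 tokens generated in 10 seconds
#   tokens_per_second 250 10000000000
```

Expect the result to vary with prompt length, quantization, and whatever else is using the GPU, so treat the table as a rough guide.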

The ideal local Mistral setup

Run both Codestral (autocomplete) and Devstral Small (agent) on a 32GB machine. Ollama loads and unloads models automatically as requests arrive, so you don't need to manage them manually.
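
Whether both models can stay loaded at once is back-of-the-envelope arithmetic on the download sizes quoted earlier (about 12GB and 14GB). The sketch below makes that explicit. `fits_together` is a hypothetical helper, and the 4GB default headroom for context cache and the operating system is an assumption, not a measured value.

```shell
#!/usr/bin/env bash
# Assumed allowance (GB) for context cache and the OS; override as needed.
HEADROOM_GB="${HEADROOM_GB:-4}"

# fits_together TOTAL_GB SIZE_GB...: succeed if the listed model sizes plus
# the headroom allowance fit in TOTAL_GB at the same time.
fits_together() {
  local total="$1"
  shift
  local sum="$HEADROOM_GB"
  local size
  for size in "$@"; do
    sum=$((sum + size))
  done
  [ "$sum" -le "$total" ]
}

# Usage:
#   fits_together 32 12 14 && echo "both can stay loaded"
#   fits_together 24 12 14 || echo "Ollama will swap between them"
```

When both fit, `ollama ps` should list them side by side. The `OLLAMA_MAX_LOADED_MODELS` and `OLLAMA_KEEP_ALIVE` server settings control how many models stay resident and for how long.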

For tasks that exceed local model quality, fall back to Devstral 2 via API ($2/1M tokens) or Mistral Large 2 for reasoning.

Related: Ollama Complete Guide · Best AI Models for Mac · Codestral Complete Guide · Best AI Models Under 16GB VRAM