Apr 11, 2026 · 2 min read

How to Run Mistral Models Locally — Ollama Setup Guide (2026)

📢 Update: Mistral Medium 3.5 is now available — 128B dense model replacing Medium 3.1 and Devstral 2. See the Medium 3.5 complete guide, how to run it locally, and API guide.

Mistral has three models you can run locally for free: Codestral (autocomplete), Devstral Small 2 (coding agent), and Nemo (general chat). Here’s how to set them up.

Which model for your hardware

Hardware	RAM/VRAM	Best Mistral model	Command or note
RTX 4090 (24GB)	24GB	Codestral 22B + Devstral Small 24B	Each fits; not both at once (12GB + 14GB)
RTX 4070 (12GB)	12GB	Codestral 22B (Q4)	`ollama pull codestral:22b`
Mac M4 32GB	32GB	Devstral Small 24B + Codestral 22B	Both fit
Mac M4 16GB	16GB	Codestral 22B (Q4)	`ollama pull codestral:22b`
8GB VRAM/RAM	8GB	Nemo 12B (Q4)	`ollama pull mistral-nemo`

If your hardware doesn’t have enough VRAM for Codestral or Devstral, cloud GPU providers offer 24GB+ GPU instances starting at a few dollars per hour.
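To sanity-check the table against your own machine, you can approximate the size of a quantized model as parameter count times bits per weight. The sketch below assumes roughly 4.5 bits per weight for a Q4-style quantization; that figure and the helper name `weights_gb` are illustrative assumptions, not values published by Ollama or Mistral.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for a Q4-style quantization.
    """
    # (params_billion * 1e9 weights) * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


# Parameter counts taken from the model names in this guide.
for name, size_b in [("Codestral", 22), ("Devstral Small", 24), ("Nemo", 12)]:
    print(f"{name} {size_b}B: ~{weights_gb(size_b):.1f} GB of weights")
```

Treat the result as a lower bound: the context cache and the runtime itself need extra memory on top of the weights, so a model whose weights roughly equal your available VRAM will be a tight fit.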

Setup with Ollama

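# Install Ollama
brew install ollama  # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Start the Ollama server if it is not already running
brew services start ollama  # macOS (Homebrew); or run `ollama serve` in a separate terminal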

# Pull your model
ollama pull codestral:22b        # Best autocomplete (12GB)
ollama pull devstral-small:24b   # Best local coding agent (14GB)
ollama pull mistral-nemo         # General chat (7GB)
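Once a model is pulled, a quick way to confirm it responds is to call Ollama's local HTTP API. This is a minimal sketch using only the Python standard library; it assumes the server is listening on the default address `http://localhost:11434`, and the helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("mistral-nemo", "Reply with one word: ready")` should return a short string. A connection error usually means the server is not started, and a "model not found" style error means the pull did not complete or the tag differs.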

Use with coding tools

Continue.dev (VS Code autocomplete)

{
  "tabAutocompleteModel": {
    "provider": "ollama",
    "model": "codestral:22b"
  },
  "models": [{
    "provider": "ollama",
    "model": "devstral-small:24b",
    "title": "Devstral Small"
  }]
}

See our Continue.dev guide.

Aider (terminal)

aider --model ollama/devstral-small:24b

See our Aider guide.

OpenCode (terminal)

{"providers": {"ollama": {"baseUrl": "http://localhost:11434"}}, "defaultModel": "ollama/devstral-small:24b"}

See our OpenCode guide.

Performance

Model	Mac M4 32GB	RTX 4090
Codestral 22B	~25 tok/s	~40 tok/s
Devstral Small 24B	~22 tok/s	~35 tok/s
Nemo 12B	~35 tok/s	~55 tok/s

All three are fast enough for interactive coding, and Codestral’s autocomplete feels near-instant at these speeds.
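Your numbers will differ with quantization, context length, and whatever else is using the GPU, so it is worth measuring on your own machine. A non-streaming response from Ollama's `/api/generate` includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds); the sketch below turns those two fields into tokens per second. The function name `tokens_per_second` is just for illustration.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response dict.

    Uses eval_count (generated tokens) and eval_duration (nanoseconds).
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Multiply first so the nanosecond-to-second conversion stays exact for round inputs.
    return response["eval_count"] * 1e9 / duration_ns


# Example with made-up numbers: 250 tokens generated in 10 seconds.
sample = {"eval_count": 250, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
```

Run a few prompts of realistic length and average the results, since the first request after a model loads also pays the load time.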

The ideal local Mistral setup

Run both Codestral (autocomplete) and Devstral Small (agent) on a 32GB machine. Ollama swaps models automatically — you don’t need to manage them manually.
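If you want to see what the swapping is doing, Ollama's `/api/ps` endpoint reports which models are currently loaded. This is a small sketch, again assuming the default address `http://localhost:11434`; the helper names `loaded_models` and `fetch_ps` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def loaded_models(ps_response: dict) -> list[tuple[str, float]]:
    """Return (model name, size in GB) for each model listed in an /api/ps response."""
    return [(m["name"], m["size"] / 1e9) for m in ps_response.get("models", [])]


def fetch_ps(timeout: float = 10.0) -> dict:
    """Query a running Ollama server for its currently loaded models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps", timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling `loaded_models(fetch_ps())` right after an autocomplete request and again after an agent request shows whether both models stay resident or one is unloaded to make room. By default Ollama keeps a model in memory for about five minutes after its last request, so an idle model may also disappear from the list on its own.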

For tasks that exceed local model quality, fall back to Devstral 2 via API ($2/1M tokens) or Mistral Large 2 for reasoning.
