Mistral has three models you can run locally for free: Codestral (autocomplete), Devstral Small 2 (coding agent), and Nemo (general chat). Here’s how to set them up.
## Which model for your hardware
| Hardware | RAM/VRAM | Best Mistral model | Command / note |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | Codestral 22B + Devstral Small 24B | Both fit |
| RTX 4070 (12GB) | 12GB | Codestral 22B (Q4) | ollama pull codestral:22b |
| Mac M4 32GB | 32GB | Devstral Small 24B + Codestral 22B | Both fit |
| Mac M4 16GB | 16GB | Codestral 22B (Q4) | ollama pull codestral:22b |
| 8GB VRAM/RAM | 8GB | Nemo 12B (Q4) | ollama pull mistral-nemo |
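The table above collapses to a simple memory threshold. A minimal sketch of that rule (`pick_model` is a hypothetical helper, not an Ollama command):

```shell
# Pick a model based on available RAM/VRAM in GB, following the table above.
pick_model() {
  local gb=$1
  if [ "$gb" -ge 24 ]; then
    echo "codestral:22b + devstral-small:24b"   # room for both
  elif [ "$gb" -ge 12 ]; then
    echo "codestral:22b"                        # Q4 quant fits
  else
    echo "mistral-nemo"                         # 12B Q4 for 8GB machines
  fi
}

pick_model 12   # -> codestral:22b
```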
## Setup with Ollama
```shell
# Install Ollama
brew install ollama                                    # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh    # Linux

# Pull your model
ollama pull codestral:22b           # Best autocomplete (~12GB download)
ollama pull devstral-small:24b      # Best local coding agent (~14GB download)
ollama pull mistral-nemo            # General chat (~7GB download)
```
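Once a model is pulled and `ollama serve` is running, you can smoke-test it over Ollama's HTTP API at `http://localhost:11434/api/generate`. A sketch of the request (`build_generate_body` is a hypothetical helper for assembling the JSON body, not part of Ollama):

```shell
# Assemble a non-streaming generate request for the Ollama API.
build_generate_body() {
  # $1 = model name, $2 = prompt
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

body=$(build_generate_body "mistral-nemo" "Say hello in one word.")
echo "$body"

# Send it once the server is up:
# curl -s http://localhost:11434/api/generate -d "$body"
```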
## Use with coding tools
### Continue.dev (VS Code autocomplete)
Add to `~/.continue/config.json`:

```json
{
  "tabAutocompleteModel": {
    "provider": "ollama",
    "model": "codestral:22b"
  },
  "models": [{
    "provider": "ollama",
    "model": "devstral-small:24b",
    "title": "Devstral Small"
  }]
}
```
See our Continue.dev guide.
### Aider (terminal)
```shell
aider --model ollama/devstral-small:24b
```
See our Aider guide.
### OpenCode (terminal)
{"providers": {"ollama": {"baseUrl": "http://localhost:11434"}}, "defaultModel": "ollama/devstral-small:24b"}
See our OpenCode guide.
## Performance
| Model | Mac M4 32GB | RTX 4090 |
|---|---|---|
| Codestral 22B | ~25 tok/s | ~40 tok/s |
| Devstral Small 24B | ~22 tok/s | ~35 tok/s |
| Nemo 12B | ~35 tok/s | ~55 tok/s |
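To measure throughput on your own machine, `ollama run <model> --verbose` reports an eval rate after each response, or you can compute it from the `eval_count` and `eval_duration` (nanoseconds) fields of an `/api/generate` response. A small sketch (`toks_per_sec` is a hypothetical helper):

```shell
# Tokens per second from Ollama's eval_count / eval_duration (ns) fields.
toks_per_sec() {
  # $1 = eval_count (tokens), $2 = eval_duration (nanoseconds)
  awk -v c="$1" -v d="$2" 'BEGIN { printf "%.1f", c / (d / 1e9) }'
}

toks_per_sec 500 20000000000   # 500 tokens in 20s -> 25.0
```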
All three are fast enough for interactive coding, and Codestral’s autocomplete feels effectively instant at these speeds.
## The ideal local Mistral setup
Run both Codestral (autocomplete) and Devstral Small (agent) on a 32GB machine. Ollama swaps models automatically — you don’t need to manage them manually.
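If you have the memory headroom and want to avoid swap latency entirely, recent Ollama versions let the server keep more than one model resident via environment variables (defaults vary by version; treat the values below as a sketch):

```shell
# Allow two models in memory at once and keep idle models loaded longer.
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=30m

# Then restart the server so the settings take effect:
# ollama serve
```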
For tasks that exceed local model quality, fall back to Devstral 2 via API ($2/1M tokens) or Mistral Large 2 for reasoning.
Related: Ollama Complete Guide · Best AI Models for Mac · Codestral Complete Guide · Best AI Models Under 16GB VRAM