Update: Mistral Medium 3.5 is now available: a 128B dense model replacing Medium 3.1 and Devstral 2. See the Medium 3.5 complete guide, how to run it locally, and the API guide.
Mistral has three models you can run locally for free: Codestral (autocomplete), Devstral Small 2 (coding agent), and Nemo (general chat). Here's how to set them up.
Which model for your hardware
| Hardware | RAM/VRAM | Best Mistral model | Command or note |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | Codestral 22B + Devstral Small 24B | Both, one loaded at a time |
| RTX 4070 (12GB) | 12GB | Codestral 22B (Q4) | ollama pull codestral:22b |
| Mac M4 32GB | 32GB | Devstral Small 24B + Codestral 22B | Both fit |
| Mac M4 16GB | 16GB | Codestral 22B (Q4) | ollama pull codestral:22b |
| 8GB VRAM/RAM | 8GB | Nemo 12B (Q4) | ollama pull mistral-nemo |
If your hardware doesn't have enough VRAM for Codestral or Devstral, cloud GPU providers offer 24GB+ GPU instances starting at a few dollars per hour.
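To sanity-check the table against your own hardware, a common rule of thumb is that a quantized model needs roughly (parameters × bits per weight ÷ 8) bytes for its weights, plus headroom for the KV cache and runtime. The sketch below assumes 4 bits per weight for Q4 and a 20% overhead factor; both numbers are assumptions, and Ollama's real usage varies with the quantization variant and context length. `estimate_gb` and `fits` are hypothetical helpers, not Ollama commands.

```shell
#!/usr/bin/env bash
# Rough memory estimate for a quantized model (rule of thumb, not Ollama's own logic).
# Assumptions: Q4 is ~4 bits per weight; add ~20% for KV cache and runtime overhead.

# estimate_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT -> prints estimated GB
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}

# fits NEEDED_GB AVAILABLE_GB -> exit status 0 if the estimate fits
fits() {
  awk -v n="$1" -v h="$2" 'BEGIN { exit !(n + 0 <= h + 0) }'
}

# Example: check the three models against a 12GB card
for spec in "codestral:22b 22" "devstral-small:24b 24" "mistral-nemo 12"; do
  read -r name params <<<"$spec"
  need=$(estimate_gb "$params" 4)
  if fits "$need" 12; then
    echo "$name: ~${need}GB, fits in 12GB"
  else
    echo "$name: ~${need}GB, needs CPU offload or more VRAM"
  fi
done
```

Under these assumptions Codestral 22B at Q4 lands around 13GB, so a 12GB card like the RTX 4070 is borderline: expect Ollama to place some layers on the CPU, which costs speed.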
Setup with Ollama
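```shell
# Install Ollama
brew install ollama              # macOS (Homebrew)
brew services start ollama       # macOS: run the Ollama server in the background
# or: curl -fsSL https://ollama.com/install.sh | sh   # Linux (installs and starts the service)

# Pull your model
ollama pull codestral:22b        # Best autocomplete (~12GB download)
ollama pull devstral-small:24b   # Best local coding agent (~14GB)
ollama pull mistral-nemo         # General chat (~7GB)
```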
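Once the pulls finish, `ollama list` shows what is installed. The helper below is a hypothetical convenience function (not an Ollama command) that reads `ollama list` output on stdin and prints any required model that is missing. It assumes NAME is the first column and that untagged pulls such as `mistral-nemo` are listed with an implicit `:latest` tag.

```shell
#!/usr/bin/env bash
# missing_models MODEL... : read `ollama list` output on stdin and print each
# required model that is not installed. Hypothetical helper, not part of Ollama.
missing_models() {
  local installed m
  # Skip the header row and keep only the NAME column.
  installed=$(awk 'NR > 1 { print $1 }')
  for m in "$@"; do
    # Models pulled without a tag show up as NAME:latest.
    [[ $m == *:* ]] || m="$m:latest"
    grep -qxF -- "$m" <<<"$installed" || echo "$m"
  done
}
```

For example, `ollama list | missing_models codestral:22b devstral-small:24b mistral-nemo` prints nothing when all three are present. To confirm the server itself is up, `curl http://localhost:11434/api/tags` returns the installed models as JSON.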
Use with coding tools
Continue.dev (VS Code autocomplete)
In Continue's JSON config (~/.continue/config.json):
```json
{
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral:22b"
  },
  "models": [
    {
      "title": "Devstral Small",
      "provider": "ollama",
      "model": "devstral-small:24b"
    }
  ]
}
```
See our Continue.dev guide.
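Tab autocomplete works by fill-in-the-middle (FIM): the code before the cursor is the prompt, the code after it is a suffix, and the model generates what goes between. Codestral is trained for FIM, and Ollama's `/api/generate` endpoint accepts a `suffix` field for models whose prompt template supports it. The sketch below only builds such a request body so you can try FIM by hand; `json_escape` and `fim_body` are hypothetical helpers, the escaper handles only backslashes, quotes, newlines, and tabs, and this is not necessarily what Continue sends internally.

```shell
#!/usr/bin/env bash
# Minimal JSON string escaper: handles \ " newline and tab only
# (fine for short code snippets, not a general-purpose encoder).
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}     # \       -> \\
  s=${s//"$q"/"$bs$q"}       # "       -> \"
  s=${s//$'\n'/"${bs}n"}     # newline -> \n
  s=${s//$'\t'/"${bs}t"}     # tab     -> \t
  printf '"%s"' "$s"
}

# fim_body MODEL PREFIX SUFFIX -> JSON body for a non-streaming FIM request
fim_body() {
  printf '{"model":%s,"prompt":%s,"suffix":%s,"stream":false}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

To send it, pipe the body to curl, e.g. `fim_body codestral:22b $'def add(a, b):\n' '    return result' | curl -s http://localhost:11434/api/generate -d @-`; the completion comes back in the `response` field.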
Aider (terminal)
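```shell
export OLLAMA_API_BASE=http://127.0.0.1:11434   # tell aider where the Ollama server is
aider --model ollama_chat/devstral-small:24b    # aider recommends ollama_chat/ over ollama/
```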
See our Aider guide.
OpenCode (terminal)
```json
{
  "providers": {
    "ollama": { "baseUrl": "http://localhost:11434" }
  },
  "defaultModel": "ollama/devstral-small:24b"
}
```
See our OpenCode guide.
Performance
| Model | Mac M4 32GB | RTX 4090 |
|---|---|---|
| Codestral 22B | ~25 tok/s | ~40 tok/s |
| Devstral Small 24B | ~22 tok/s | ~35 tok/s |
| Nemo 12B | ~35 tok/s | ~55 tok/s |
All three are fast enough for interactive coding, and Codestral's autocomplete feels instant at these speeds.
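These speeds vary with quantization, context length, and prompt, so it is worth measuring your own machine. `ollama run MODEL --verbose` prints an eval rate after each reply; alternatively, a non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), so generation speed is eval_count ÷ eval_duration × 1e9. The sketch below extracts those two fields with `sed`; `tok_per_sec` is a hypothetical helper and assumes the fields appear as plain integers in the JSON.

```shell
#!/usr/bin/env bash
# tok_per_sec: read one non-streaming /api/generate JSON response on stdin and
# print generation speed in tokens/s. eval_duration is reported in nanoseconds.
tok_per_sec() {
  local json count dur
  json=$(cat)
  # The leading quote keeps these from matching prompt_eval_count / prompt_eval_duration.
  count=$(sed -n 's/.*"eval_count":[[:space:]]*\([0-9][0-9]*\).*/\1/p' <<<"$json")
  dur=$(sed -n 's/.*"eval_duration":[[:space:]]*\([0-9][0-9]*\).*/\1/p' <<<"$json")
  if [[ -z $count || -z $dur || $dur -eq 0 ]]; then
    echo "eval_count/eval_duration not found" >&2
    return 1
  fi
  awk -v c="$count" -v d="$dur" 'BEGIN { printf "%.1f\n", c / (d / 1e9) }'
}
```

For example: `curl -s http://localhost:11434/api/generate -d '{"model":"codestral:22b","prompt":"Write a Python hello world","stream":false}' | tok_per_sec`.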
The ideal local Mistral setup
Run both Codestral (autocomplete) and Devstral Small (agent) on a 32GB machine. Ollama loads and swaps models automatically, so you don't need to manage them manually.
For tasks that exceed local model quality, fall back to Devstral 2 via API ($2/1M tokens) or Mistral Large 2 for reasoning.
Related: Ollama Complete Guide · Best AI Models for Mac · Codestral Complete Guide · Best AI Models Under 16GB VRAM