Update: MiMo V2.5 Pro is now available and is significantly improved over V2. See the V2.5 complete guide, how to use the API, and the V2.5 vs V2 Pro comparison.
MiMo V2 Pro is Xiaomi's flagship coding model. You can run it locally with Ollama for free, private AI coding. Here's the setup.
Install and run
# Install Ollama
brew install ollama # Mac
# or: curl -fsSL https://ollama.com/install.sh | sh # Linux
# Pull MiMo V2 Pro
ollama pull mimo-v2-pro
# Test it
ollama run mimo-v2-pro "Write a Python REST API with FastAPI and SQLAlchemy"
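Before pulling a multi-gigabyte model, it can help to confirm that the install worked and that the Ollama server is listening. The sketch below assumes the default local address `http://localhost:11434`; the helper names `have_cmd`, `ollama_is_up`, and `ollama_status` are made up for illustration and are not part of Ollama.

```bash
# Hypothetical helpers; assumes Ollama's default address, localhost:11434.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if something answers HTTP at the Ollama address.
ollama_is_up() {
  curl -fsS --max-time 2 "$OLLAMA_URL/" >/dev/null 2>&1
}

# Print a one-line status report.
ollama_status() {
  if ! have_cmd ollama; then
    echo "ollama: not installed"
  elif ! ollama_is_up; then
    echo "ollama: installed, server not reachable (try: ollama serve)"
  else
    echo "ollama: ready"
  fi
}
```

Running `ollama_status` after the install step should print `ollama: ready` once the server is up.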
Hardware requirements
| Hardware | Performance | Usable? |
|---|---|---|
| MacBook Air M2 16GB | ~15 tok/s | ✅ Good |
| MacBook Pro M3 36GB | ~25 tok/s | ✅ Great |
| Mac Mini M4 Pro 48GB | ~30 tok/s | ✅ Excellent |
| RTX 4090 24GB | ~40 tok/s | ✅ Excellent |
| 8GB RAM (any) | Too slow | ❌ Need 16GB+ |
MiMo V2 Pro needs at least 16GB RAM. If you only have 8GB, use Yi-Coder 9B or Qwen3 8B instead. If you want to experiment with larger models or faster inference, cloud GPU providers let you rent the exact hardware you need by the hour.
See our VRAM guide for exact memory calculations.
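To check a machine against the 16GB minimum, a small script can read total memory and apply the rule from this section. This is a rough sketch: `total_ram_gb` and `suggest_model` are hypothetical helper names, the Linux branch reads `/proc/meminfo`, the macOS branch uses `sysctl -n hw.memsize`, and the result is rounded to the nearest whole GiB because the kernel reports slightly less than the installed amount.

```bash
# Hypothetical helper: prints total system memory in whole GiB (rounded).
# Linux reads /proc/meminfo (kB); macOS uses sysctl hw.memsize (bytes).
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
  else
    sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 + 0.5 }'
  fi
}

# Suggest a model for a given amount of RAM in GB, following this guide:
# 16 GB or more runs MiMo V2 Pro, anything less falls back to a small model.
suggest_model() {
  ram_gb="$1"
  if [ "$ram_gb" -ge 16 ]; then
    echo "mimo-v2-pro"
  else
    echo "Yi-Coder 9B or Qwen3 8B"
  fi
}
```

For example, `suggest_model "$(total_ram_gb)"` prints the recommendation for the current machine.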
Connect to coding tools
Aider
aider --model ollama/mimo-v2-pro
This is the same setup we use for the Xiaomi agent in the AI Startup Race. See our MiMo + Aider guide for advanced configuration.
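If Aider cannot reach the local server, Aider's Ollama instructions describe an `OLLAMA_API_BASE` environment variable for pointing it at the right address. A minimal sketch, assuming Ollama runs on its default port; the function names `aider_model_arg` and `aider_local` are hypothetical wrappers, not Aider commands.

```bash
# Tell Aider where the local Ollama server lives (default port assumed).
export OLLAMA_API_BASE="${OLLAMA_API_BASE:-http://127.0.0.1:11434}"

# Hypothetical helper: build the --model value, defaulting to the
# model name used in this guide.
aider_model_arg() {
  echo "ollama/${1:-mimo-v2-pro}"
}

# Hypothetical wrapper: start Aider against a local model.
aider_local() {
  aider --model "$(aider_model_arg "$1")"
}
```

Calling `aider_local` with no argument runs the same command shown above; `aider_local mimo-v2-flash` would swap in a lighter model.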
Continue.dev (VS Code)
{
  "models": [{
    "title": "MiMo V2 Pro Local",
    "provider": "ollama",
    "model": "mimo-v2-pro"
  }]
}
OpenCode
opencode --provider ollama --model mimo-v2-pro
MiMo V2 Pro vs other local coding models
| Model | Size | RAM needed | Coding quality | Speed |
|---|---|---|---|---|
| MiMo V2 Pro | ~14 GB | 16 GB | Good | Fast |
| Devstral Small 24B | ~16 GB | 16 GB | Best | Medium |
| Qwen 3.5 27B | ~17 GB | 20 GB | Very good | Medium |
| DeepSeek R1 14B | ~9 GB | 12 GB | Good (reasoning) | Slow |
| Yi-Coder 9B | ~5 GB | 8 GB | Good | Fast |
MiMo V2 Pro sits in the middle: it is better than the small models (Yi-Coder, Qwen3 8B) but not quite as good as Devstral Small 24B for pure coding quality. Its advantage is speed: it generates code faster than the 24B+ models.
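The "RAM needed" column can be turned into a quick lookup. The sketch below copies those figures from the table into a small pipe-separated list and prints every model that fits a given memory budget; `MODEL_TABLE` and `models_that_fit` are illustrative names only.

```bash
# RAM requirements copied from the comparison table (model|RAM needed in GB).
MODEL_TABLE='MiMo V2 Pro|16
Devstral Small 24B|16
Qwen 3.5 27B|20
DeepSeek R1 14B|12
Yi-Coder 9B|8'

# Hypothetical helper: list the models whose RAM requirement fits in $1 GB.
models_that_fit() {
  printf '%s\n' "$MODEL_TABLE" |
    awk -F'|' -v ram="$1" '$2 + 0 <= ram + 0 { print $1 }'
}
```

With 12 GB available, `models_that_fit 12` lists DeepSeek R1 14B and Yi-Coder 9B; with 16 GB, MiMo V2 Pro and Devstral Small 24B join the list.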
Local vs API
| | Local (Ollama) | API (OpenRouter) |
|---|---|---|
| Cost | Free | ~$25/mo |
| Privacy | ✅ Full | ❌ Data sent to API |
| Speed | Depends on hardware | Fast (cloud GPU) |
| Context | Limited by RAM | 128K |
| Offline | ✅ Works offline | ❌ Needs internet |
Run locally for privacy and zero cost. Use the API when you need faster responses or are on weaker hardware.
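Because both Ollama and OpenRouter expose an OpenAI-compatible chat endpoint, switching between the two can come down to changing a URL. The following is a sketch under a few assumptions: Ollama is on its default port, the API key lives in an `OPENROUTER_API_KEY` environment variable, and `chat_url` and `chat_once` are hypothetical helper names. The OpenRouter model ID is deliberately left to the caller, since this guide does not state it.

```bash
# Hypothetical helper: print the chat-completions URL for "local" or "api".
chat_url() {
  case "$1" in
    local) echo "http://localhost:11434/v1/chat/completions" ;;
    api)   echo "https://openrouter.ai/api/v1/chat/completions" ;;
    *)     echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Send one prompt to the chosen backend.
# $1 = backend, $2 = model name, $3 = prompt.
# The prompt is pasted into JSON unescaped, so it must not contain " or \.
chat_once() {
  url="$(chat_url "$1")" || return 1
  curl -fsS "$url" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${OPENROUTER_API_KEY:-unused-for-local}" \
    -d "{\"model\": \"$2\", \"messages\": [{\"role\": \"user\", \"content\": \"$3\"}]}"
}
```

A local call would look like `chat_once local mimo-v2-pro "Explain Python decorators in two sentences"`; for the API, pass `api` and the model ID shown in your OpenRouter account.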
The MiMo V2 family locally
| Model | Use case | Ollama command |
|---|---|---|
| MiMo V2 Pro | Best quality coding | ollama pull mimo-v2-pro |
| MiMo V2 Omni | Balanced quality/speed | ollama pull mimo-v2-omni |
| MiMo V2 Flash | Fastest, lighter tasks | ollama pull mimo-v2-flash |
Use Pro for complex coding, Flash for quick questions and autocomplete. See our MiMo V2 family guide for detailed comparisons.
Troubleshooting
- "model not found": check the exact name with `ollama list`
- Too slow: verify the GPU is being used with `ollama ps`
- Out of memory: try MiMo V2 Flash or a quantized version
- Context too short: raise `num_ctx`, for example with `/set parameter num_ctx 32768` inside an `ollama run` session
See our Ollama troubleshooting guide for all common errors.
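Two of these checks are easy to script. The sketch below assumes that `ollama list` prints a header row followed by one model per line with the name in the first column, and that the context size is passed to Ollama's `/api/generate` endpoint as `options.num_ctx`. The helper names `has_model` and `generate_body` are hypothetical.

```bash
# Hypothetical helper: read `ollama list` output on stdin and succeed if
# the model named in $1 appears in the first column (with or without a tag).
has_model() {
  awk -v want="$1" '
    NR > 1 { split($1, parts, ":"); if ($1 == want || parts[1] == want) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Build a request body for POST /api/generate with a larger context window.
# $1 = model, $2 = prompt (must not contain " or \), $3 = num_ctx (default 32768).
generate_body() {
  printf '{"model": "%s", "prompt": "%s", "stream": false, "options": {"num_ctx": %d}}\n' \
    "$1" "$2" "${3:-32768}"
}
```

Typical use would be `ollama list | has_model mimo-v2-pro || ollama pull mimo-v2-pro`, followed by `curl -s http://localhost:11434/api/generate -d "$(generate_body mimo-v2-pro "Summarize this module" 32768)"`. A larger context window uses more memory, so the out-of-memory advice above still applies.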
Related: MiMo V2 Family Guide · MiMo V2 Pro + Aider Setup · Best Ollama Models for Coding · Ollama Complete Guide · Ollama vs LM Studio vs vLLM