Apr 25, 2026 · 3 min read

How to Run MiMo V2 Pro Locally with Ollama

📢 Update: MiMo V2.5 Pro is now available — significantly improved over V2. See the V2.5 complete guide, how to use the API, and V2.5 vs V2 Pro comparison.

MiMo V2 Pro is Xiaomi’s flagship coding model. You can run it locally with Ollama for free, private AI coding. Here’s the setup.

Install and run

# Install Ollama
brew install ollama  # Mac
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull MiMo V2 Pro
ollama pull mimo-v2-pro

# Test it
ollama run mimo-v2-pro "Write a Python REST API with FastAPI and SQLAlchemy"

Hardware requirements

Hardware	Performance	Usable?
MacBook Air M2 16GB	~15 tok/s	✅ Good
MacBook Pro M3 36GB	~25 tok/s	✅ Great
Mac Mini M4 Pro 48GB	~30 tok/s	✅ Excellent
RTX 4090 24GB	~40 tok/s	✅ Excellent
8GB RAM (any)	Too slow	❌ Need 16GB+

MiMo V2 Pro needs at least 16GB RAM. If you only have 8GB, use Yi-Coder 9B or Qwen3 8B instead. If you want to experiment with larger models or faster inference, cloud GPU providers let you rent the exact hardware you need by the hour.

See our VRAM guide for exact memory calculations.

Connect to coding tools

Aider

aider --model ollama/mimo-v2-pro

This is the same setup we use for the Xiaomi agent in the AI Startup Race. See our MiMo + Aider guide for advanced configuration.

Continue.dev (VS Code)

{
  "models": [{
    "title": "MiMo V2 Pro Local",
    "provider": "ollama",
    "model": "mimo-v2-pro"
  }]
}

OpenCode

opencode --provider ollama --model mimo-v2-pro

MiMo V2 Pro vs other local coding models

Model	Size	RAM needed	Coding quality	Speed
MiMo V2 Pro	~14 GB	16 GB	Good	Fast
Devstral Small 24B	~16 GB	16 GB	Best	Medium
Qwen 3.5 27B	~17 GB	20 GB	Very good	Medium
DeepSeek R1 14B	~9 GB	12 GB	Good (reasoning)	Slow
Yi-Coder 9B	~5 GB	8 GB	Good	Fast

MiMo V2 Pro sits in the middle — better than the small models (Yi-Coder, Qwen3 8B) but not quite as good as Devstral Small 24B for pure coding quality. Its advantage is speed — it generates code faster than the 24B+ models.

Local vs API

	Local (Ollama)	API (OpenRouter)
Cost	Free	~$25/mo
Privacy	✅ Full	❌ Data sent to API
Speed	Depends on hardware	Fast (cloud GPU)
Context	Limited by RAM	128K
Offline	✅ Works offline	❌ Needs internet

Run locally for privacy and zero cost. Use the API when you need faster responses or are on weaker hardware.

The MiMo V2 family locally

Model	Use case	Ollama command
MiMo V2 Pro	Best quality coding	`ollama pull mimo-v2-pro`
MiMo V2 Omni	Balanced quality/speed	`ollama pull mimo-v2-omni`
MiMo V2 Flash	Fastest, lighter tasks	`ollama pull mimo-v2-flash`

Use Pro for complex coding, Flash for quick questions and autocomplete. See our MiMo V2 family guide for detailed comparisons.

Troubleshooting

“model not found” — check exact name with ollama list
Too slow — verify GPU is being used: ollama ps
Out of memory — try MiMo V2 Flash or a quantized version
Context too short — increase with --num-ctx 32768

See our Ollama troubleshooting guide for all common errors.

How to Run MiMo V2 Pro Locally with Ollama

Install and run

Hardware requirements

Connect to coding tools

Aider

Continue.dev (VS Code)

OpenCode

MiMo V2 Pro vs other local coding models

Local vs API

The MiMo V2 family locally

Troubleshooting

📬 AI Dev Weekly

You might also like

How to Use Aider with Ollama — Free Local AI Coding Setup

How to Use OpenCode with Ollama — Free Local AI Coding Setup

How to Run Jais 2 Locally — Arabic AI Model Setup Guide

How to Run Falcon Models Locally with Ollama (2026)