Update (June 13, 2026): MiniMax M3 open weights are now available on Hugging Face (MiniMaxAI/MiniMax-M3). You can download and self-host the 428B-parameter model.
MiniMax released open weights for their M2.5 and M2.7 models. You can run them locally with Ollama for free, private AI coding. Here's how.
Available models
| Model | Parameters | Size (Q5_K_M) | RAM needed | Best for |
|---|---|---|---|---|
| MiniMax-M2.5 | ~45B (MoE, ~8B active) | ~6 GB | 8 GB | Fast coding, budget hardware |
| MiniMax-M2.7 | ~45B (MoE, ~8B active) | ~6 GB | 8 GB | Agentic coding, better quality |
Both use a Mixture of Experts (MoE) architecture: 45B total parameters, but only ~8B active per token. This means they run fast on modest hardware while maintaining quality close to that of much larger models.
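To make the "active per token" idea concrete, here is a minimal arithmetic sketch. It uses only the approximate parameter counts quoted above; the function name `active_fraction` is illustrative and not part of any library.

```python
# Illustrative arithmetic only; the parameter counts are the approximate
# figures quoted in the model table.
TOTAL_PARAMS_B = 45.0   # total parameters, in billions
ACTIVE_PARAMS_B = 8.0   # parameters active per token, in billions


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters used for each generated token."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"Active per token: {fraction:.1%} of all weights")
```

With these figures, a little under a fifth of the weights take part in each token, which is why per-token compute is closer to that of a much smaller dense model.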
Installation
# Install Ollama if you haven't
brew install ollama # Mac
# or: curl -fsSL https://ollama.com/install.sh | sh # Linux
# Pull MiniMax M2.7 (recommended)
ollama pull minimax-m2.7
# Or M2.5 (older but stable)
ollama pull minimax-m2.5
# Test it
ollama run minimax-m2.7 "Write a Python function to validate email addresses"
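Once the model is pulled, you can also call it from a script through Ollama's local REST API instead of the CLI. The following is a minimal sketch using only the Python standard library. It assumes Ollama is serving on its default address (`http://localhost:11434`) and that the model name matches the one pulled above. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "minimax-m2.7") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "minimax-m2.7", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a Python function to validate email addresses")` should return the same kind of answer as the `ollama run` command, as a plain string you can post-process.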
Hardware requirements
| Hardware | M2.5/M2.7 performance | Usable? |
|---|---|---|
| MacBook Air M2 8GB | ~20 tok/s | ✅ Good |
| MacBook Pro M3 16GB | ~30 tok/s | ✅ Great |
| Mac Mini M4 24GB | ~35 tok/s | ✅ Excellent |
| RTX 3080 10GB | ~40 tok/s | ✅ Excellent |
| CPU only (modern) | ~8 tok/s | ⚠️ Slow but works |
The MoE architecture is the key advantage: only ~8B of the 45B parameters are active per token, so generation speed is closer to that of an 8B dense model. See our VRAM guide for detailed calculations.
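The throughput figures above will vary with your hardware, so it is worth measuring your own. A non-streaming response from Ollama's generate endpoint includes timing fields. The sketch below assumes the response has already been parsed into a dictionary and derives tokens per second from `eval_count` and `eval_duration`. The sample values are made up for illustration and are not a benchmark.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from the timing fields of an Ollama response.

    `eval_count` is the number of generated tokens and `eval_duration`
    is the time spent generating them, in nanoseconds.
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


# Hypothetical response fragment, not a real measurement.
sample = {"eval_count": 200, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
```

Running a few prompts of different lengths and averaging the result gives a more reliable number than a single run.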
Connecting to coding tools
Aider
aider --model ollama/minimax-m2.7
Continue.dev (VS Code)
{
  "models": [{
    "title": "MiniMax M2.7 Local",
    "provider": "ollama",
    "model": "minimax-m2.7"
  }]
}
OpenCode
opencode --provider ollama --model minimax-m2.7
MiniMax local vs API
| | Local (Ollama) | API |
|---|---|---|
| Cost | Free | ~$0.15/$0.60 per M tokens |
| Privacy | ✅ Full | ❌ Data sent to MiniMax |
| Speed | Depends on hardware | Fast (cloud GPU) |
| Context | Limited by RAM | 128K |
| Availability | Always on | Depends on service |
Run locally for privacy and zero cost. Use the API when you need faster responses or longer context.
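To judge whether the API cost matters for your workload, a quick estimate helps. This sketch reads the "~$0.15/$0.60 per M tokens" figure as input and output prices respectively, which is an assumption. The token volumes are hypothetical.

```python
# Assumption: $0.15 per million input tokens, $0.60 per million output tokens.
INPUT_USD_PER_M = 0.15
OUTPUT_USD_PER_M = 0.60


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API spend in US dollars for a given token volume."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical month of use: 20M input tokens and 5M output tokens.
monthly = api_cost_usd(20_000_000, 5_000_000)
print(f"Estimated API cost: ${monthly:.2f}")
```

Under these assumptions the example month comes to about six dollars, so for light use the choice is more about privacy and context length than about price.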
MiniMax vs other local models
| Model | Size | Quality | Speed | Best for |
|---|---|---|---|---|
| MiniMax M2.7 | 6 GB | Good (agentic) | Fast | Agent tasks, tool calling |
| Qwen3 8B | 5 GB | Good (general) | Fast | All-purpose coding |
| DeepSeek R1 14B | 9 GB | Good (reasoning) | Medium | Debugging, complex logic |
| Devstral Small 24B | 16 GB | Best | Medium | Best local coding quality |
MiniMax M2.7 is the best choice when you want agentic behavior (reliable tool calling, multi-step planning) on budget hardware. For raw coding quality, Devstral Small 24B is better but needs 16GB+ RAM.
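If you are unsure which of these fits your machine, the comparison can be turned into a small filter. The sketch below copies the sizes from the table and assumes you want about 2 GB of headroom for the operating system and context cache. That headroom value is a guess, so adjust it to your setup.

```python
# (name, approximate size in GB) copied from the comparison table,
# ordered from largest to smallest.
MODELS = [
    ("Devstral Small 24B", 16),
    ("DeepSeek R1 14B", 9),
    ("MiniMax M2.7", 6),
    ("Qwen3 8B", 5),
]


def models_that_fit(ram_gb: float, headroom_gb: float = 2.0) -> list[str]:
    """Return models whose weights fit in RAM with some headroom left over.

    headroom_gb is an assumed allowance for the OS and context cache.
    """
    budget = ram_gb - headroom_gb
    return [name for name, size_gb in MODELS if size_gb <= budget]


print(models_that_fit(8))    # e.g. an 8 GB laptop
print(models_that_fit(16))   # e.g. a 16 GB laptop
```

On an 8 GB machine this leaves MiniMax M2.7 and Qwen3 8B, which matches the budget-hardware recommendation above.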
Troubleshooting
If you run into issues, check our Ollama troubleshooting guide. Common problems:
- Model not found: Check the exact model name with `ollama list`
- Too slow: Ensure the GPU is being used (`ollama ps`)
- Out of memory: Try a more quantized version or close other apps
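The "model not found" check can also be scripted. Ollama's `/api/tags` endpoint returns the locally installed models as JSON. The sketch below assumes that response has already been fetched and parsed, and only shows the name matching. The helper name `has_model` and the sample data are illustrative.

```python
def has_model(tags_response: dict, wanted: str) -> bool:
    """Check whether a model name appears in an Ollama /api/tags response.

    Installed names carry a tag suffix such as ":latest", so the bare
    name is compared against the part before the colon as well.
    """
    for entry in tags_response.get("models", []):
        name = entry.get("name", "")
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


# Hypothetical response fragment for illustration.
sample = {"models": [{"name": "minimax-m2.7:latest"}, {"name": "qwen3:8b"}]}
print(has_model(sample, "minimax-m2.7"))
print(has_model(sample, "minimax-m2.5"))
```

A check like this is handy at the start of a script that launches a coding tool, so a missing model fails early with a clear message.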
Related: MiniMax M2.7 Complete Guide · What is MiniMax? · MiniMax M2.5 vs M2.7 · Ollama Complete Guide · Best Ollama Models for Coding · Ollama Troubleshooting