MiniMax released open weights for their M2.5 and M2.7 models. You can run them locally with Ollama for free, private AI coding. Here's how.
## Available models
| Model | Parameters | Size (Q5_K_M) | RAM needed | Best for |
|---|---|---|---|---|
| MiniMax-M2.5 | ~45B (MoE, ~8B active) | ~6 GB | 8 GB | Fast coding, budget hardware |
| MiniMax-M2.7 | ~45B (MoE, ~8B active) | ~6 GB | 8 GB | Agentic coding, better quality |
Both use a Mixture of Experts (MoE) architecture: 45B total parameters but only ~8B active per token. This means they run fast on modest hardware while maintaining quality close to much larger models.
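To make "only ~8B active" concrete, here's a toy sketch of top-k expert routing in Python. Everything in it (dimensions, expert count, gating) is illustrative, not MiniMax's actual architecture:

```python
import numpy as np

def moe_layer(x, experts, gate, top_k=2):
    """Route one token through only top_k of n experts (toy illustration)."""
    scores = x @ gate                        # one router logit per expert
    top = np.argsort(scores)[-top_k:]        # indices of the k best-scoring experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                             # softmax over the chosen experts only
    # Only the selected experts do any work; the rest are skipped entirely,
    # which is why "active" parameters are far fewer than total parameters.
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

d, n_experts = 64, 8
rng = np.random.default_rng(0)
experts = [rng.standard_normal((d, d)) * 0.02 for _ in range(n_experts)]
gate = rng.standard_normal((d, n_experts)) * 0.02
print(moe_layer(rng.standard_normal(d), experts, gate).shape)  # (64,)
```

Per token, only `top_k` expert matrices are multiplied; the other experts sit idle in memory, so compute scales with active parameters while total parameters set the memory footprint.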
## Installation

```bash
# Install Ollama if you haven't
brew install ollama   # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh   # Linux

# Pull MiniMax M2.7 (recommended)
ollama pull minimax-m2.7

# Or M2.5 (older but stable)
ollama pull minimax-m2.5

# Test it
ollama run minimax-m2.7 "Write a Python function to validate email addresses"
```
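Once the model is pulled, Ollama also serves a local REST API on port 11434, which is handy for scripting. A minimal sketch using the `/api/generate` endpoint (model tag as pulled above):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "minimax-m2.7",  # the tag pulled above
        "prompt": "Write a Python function to validate email addresses",
        "stream": False,          # one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```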
## Hardware requirements
| Hardware | M2.5/M2.7 performance | Usable? |
|---|---|---|
| MacBook Air M2 8GB | ~20 tok/s | ✅ Good |
| MacBook Pro M3 16GB | ~30 tok/s | ✅ Great |
| Mac Mini M4 24GB | ~35 tok/s | ✅ Excellent |
| RTX 3080 10GB | ~40 tok/s | ✅ Excellent |
| CPU only (modern) | ~8 tok/s | ⚠️ Slow but works |
The MoE architecture is the key advantage: you get 45B model quality at 8B model speed and memory usage. See our VRAM guide for detailed calculations.
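For a quick back-of-the-envelope check, a quantized model's file size is roughly parameter count times bits per weight. A sketch assuming Q5_K_M averages about 5.5 bits per weight (actual GGUF files vary, and the KV cache adds memory on top at runtime):

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk size: parameter count x bits per weight, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"{quantized_size_gb(8, 5.5):.1f} GB")   # ~5.5 GB for an 8B model
print(f"{quantized_size_gb(24, 5.5):.1f} GB")  # ~16.5 GB for a 24B model
```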
## Connecting to coding tools

### Aider

```bash
aider --model ollama/minimax-m2.7
```
### Continue.dev (VS Code)

```json
{
  "models": [{
    "title": "MiniMax M2.7 Local",
    "provider": "ollama",
    "model": "minimax-m2.7"
  }]
}
```
### OpenCode

```bash
opencode --provider ollama --model minimax-m2.7
```
## MiniMax local vs API

| | Local (Ollama) | API |
|---|---|---|
| Cost | Free | ~$0.15/$0.60 per M tokens |
| Privacy | ✅ Full | ❌ Data sent to MiniMax |
| Speed | Depends on hardware | Fast (cloud GPU) |
| Context | Limited by RAM | 128K |
| Availability | Always on | Depends on service |
Run locally for privacy and zero cost. Use the API when you need faster responses or longer context.
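Because Ollama exposes an OpenAI-compatible endpoint at `http://localhost:11434/v1`, you can write client code once and switch between local and hosted by swapping the base URL. A sketch with the `openai` Python package (the hosted base URL below is a placeholder; check MiniMax's API docs for the real endpoint and model name):

```python
from openai import OpenAI

# Local: Ollama's OpenAI-compatible endpoint; the key is unused but required.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Hosted alternative (placeholder URL; see MiniMax's API docs):
# client = OpenAI(base_url="https://api.minimax.example/v1", api_key="YOUR_KEY")

chat = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[{"role": "user", "content": "Explain Python list comprehensions in one sentence."}],
)
print(chat.choices[0].message.content)
```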
## MiniMax vs other local models
| Model | Size | Quality | Speed | Best for |
|---|---|---|---|---|
| MiniMax M2.7 | 6 GB | Good (agentic) | Fast | Agent tasks, tool calling |
| Qwen3 8B | 5 GB | Good (general) | Fast | All-purpose coding |
| DeepSeek R1 14B | 9 GB | Good (reasoning) | Medium | Debugging, complex logic |
| Devstral Small 24B | 16 GB | Best | Medium | Best local coding quality |
MiniMax M2.7 is the best choice when you want agentic behavior (reliable tool calling, multi-step planning) on budget hardware. For raw coding quality, Devstral Small 24B is better but needs 16GB+ RAM.
## Troubleshooting
If you run into issues, check our Ollama troubleshooting guide. Common problems:
- Model not found: Check the exact model name with `ollama list`
- Too slow: Ensure the GPU is being used (`ollama ps`)
- Out of memory: Try a more heavily quantized version or close other apps (see the RAM check sketch below)
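For out-of-memory issues specifically, it helps to compare the model's file size against free RAM before pulling it. A rough sketch using the third-party `psutil` package (the headroom figure is a guess, not a hard rule):

```python
import psutil  # pip install psutil

def fits_in_ram(model_size_gb: float, headroom_gb: float = 2.0) -> bool:
    """Crude check: model file plus some working headroom vs. available RAM."""
    free_gb = psutil.virtual_memory().available / 1e9
    return free_gb >= model_size_gb + headroom_gb

print(fits_in_ram(6))  # e.g. the ~6 GB Q5_K_M build from the table above
```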
Related: MiniMax M2.7 Complete Guide · What is MiniMax? · MiniMax M2.5 vs M2.7 · Ollama Complete Guide · Best Ollama Models for Coding · Ollama Troubleshooting