
How to Run MiniMax Models Locally with Ollama


MiniMax released open weights for their M2.5 and M2.7 models. You can run them locally with Ollama for free, private AI coding. Here's how.

Available models

| Model        | Parameters             | Size (Q5_K_M) | RAM needed | Best for                       |
|--------------|------------------------|---------------|------------|--------------------------------|
| MiniMax-M2.5 | ~45B (MoE, ~8B active) | ~6 GB         | 8 GB       | Fast coding, budget hardware   |
| MiniMax-M2.7 | ~45B (MoE, ~8B active) | ~6 GB         | 8 GB       | Agentic coding, better quality |

Both use a Mixture of Experts (MoE) architecture: 45B total parameters, but only ~8B active per token. This means they run fast on modest hardware while keeping quality close to much larger models.

Installation

# Install Ollama if you haven't
brew install ollama  # Mac
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull MiniMax M2.7 (recommended)
ollama pull minimax-m2.7

# Or M2.5 (older but stable)
ollama pull minimax-m2.5

# Test it
ollama run minimax-m2.7 "Write a Python function to validate email addresses"
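
Once the pull finishes, it's worth confirming what you actually downloaded and that the local server answers. ollama show prints the model's architecture, parameter count, and quantization (so you can check the table above against the real model card), and the REST API on port 11434 is the same endpoint the editor integrations below talk to. A minimal sketch, reusing the model tag from above and Ollama's documented /api/generate endpoint:

# Inspect the model card: architecture, parameter count, quantization
ollama show minimax-m2.7

# Query the local REST API directly (Ollama listens on 11434 by default)
curl http://localhost:11434/api/generate -d '{
  "model": "minimax-m2.7",
  "prompt": "Write a Python function to validate email addresses",
  "stream": false
}'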

Hardware requirements

| Hardware            | M2.5/M2.7 performance | Usable?            |
|---------------------|-----------------------|--------------------|
| MacBook Air M2 8GB  | ~20 tok/s             | ✅ Good            |
| MacBook Pro M3 16GB | ~30 tok/s             | ✅ Great           |
| Mac Mini M4 24GB    | ~35 tok/s             | ✅ Excellent       |
| RTX 3080 10GB       | ~40 tok/s             | ✅ Excellent       |
| CPU only (modern)   | ~8 tok/s              | ⚠️ Slow but works |

The MoE architecture is the key advantage: you get 45B-model quality at 8B-model speed and memory usage. See our VRAM guide for detailed calculations.
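
These throughput figures vary with quantization, context length, and background load, so treat them as ballpark numbers. You can measure your own machine with ollama run's --verbose flag, which prints timing stats after each response, including the eval rate in tokens per second:

# --verbose appends load time, prompt eval rate, and eval rate (tok/s)
ollama run minimax-m2.7 --verbose "Write a binary search in Python"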

Connecting to coding tools

Aider

aider --model ollama/minimax-m2.7
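
If aider reports it can't reach the model, point it at your local server explicitly. Aider's Ollama integration reads the OLLAMA_API_BASE environment variable; the default port is shown here:

# Tell aider where the Ollama server is listening
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama/minimax-m2.7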

Continue.dev (VS Code)

{
  "models": [{
    "title": "MiniMax M2.7 Local",
    "provider": "ollama",
    "model": "minimax-m2.7"
  }]
}
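
This block goes in Continue's config file, typically ~/.continue/config.json (newer Continue releases have been moving to a config.yaml format, so match whichever your install uses), then reload the extension.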

OpenCode

opencode --provider ollama --model minimax-m2.7

MiniMax local vs API

|              | Local (Ollama)      | API                       |
|--------------|---------------------|---------------------------|
| Cost         | Free                | ~$0.15/$0.60 per M tokens |
| Privacy      | ✅ Full             | ❌ Data sent to MiniMax   |
| Speed        | Depends on hardware | Fast (cloud GPU)          |
| Context      | Limited by RAM      | 128K                      |
| Availability | Always on           | Depends on service        |

Run locally for privacy and zero cost. Use the API when you need faster responses or longer context.
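
That said, the local context limit is partly tunable if you have spare RAM, since Ollama ships with a modest default context window. A sketch, assuming a recent Ollama build that reads the OLLAMA_CONTEXT_LENGTH variable (the per-request num_ctx option is the older, widely supported alternative):

# Raise the default context window for every model this server loads
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Or raise it per request with the num_ctx option
curl http://localhost:11434/api/generate -d '{
  "model": "minimax-m2.7",
  "prompt": "Summarize the tradeoffs above",
  "options": { "num_ctx": 32768 },
  "stream": false
}'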

MiniMax vs other local models

| Model              | Size  | Quality          | Speed  | Best for                   |
|--------------------|-------|------------------|--------|----------------------------|
| MiniMax M2.7       | 6 GB  | Good (agentic)   | Fast   | Agent tasks, tool calling  |
| Qwen3 8B           | 5 GB  | Good (general)   | Fast   | All-purpose coding         |
| DeepSeek R1 14B    | 9 GB  | Good (reasoning) | Medium | Debugging, complex logic   |
| Devstral Small 24B | 16 GB | Best             | Medium | Best local coding quality  |

MiniMax M2.7 is the best choice when you want agentic behavior (reliable tool calling, multi-step planning) on budget hardware. For raw coding quality, Devstral Small 24B is better but needs 16GB+ RAM.

Troubleshooting

If you run into issues, check our Ollama troubleshooting guide. Common problems:

  • Model not found: Check the exact model name with ollama list
  • Too slow: Ensure GPU is being used (ollama ps)
  • Out of memory: Try a more aggressively quantized variant (e.g. Q4_K_M instead of Q5_K_M) or close other apps to free RAM
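
For reference, the two diagnostic commands mentioned above:

# List installed models and their exact tags
ollama list

# Show loaded models and where they run (the PROCESSOR column reads GPU or CPU)
ollama ps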

Related: MiniMax M2.7 Complete Guide · What is MiniMax? · MiniMax M2.5 vs M2.7 · Ollama Complete Guide · Best Ollama Models for Coding · Ollama Troubleshooting