May 7, 2026 · 3 min read

How to Run MiniMax Models Locally with Ollama

MiniMax released open weights for their M2.5 and M2.7 models. You can run them locally with Ollama for free, private AI coding. Here’s how.

Available models

Model	Parameters	Size (Q5_K_M)	RAM needed	Best for
MiniMax-M2.5	~45B (MoE, ~8B active)	~6 GB	8 GB	Fast coding, budget hardware
MiniMax-M2.7	~45B (MoE, ~8B active)	~6 GB	8 GB	Agentic coding, better quality

Both use Mixture of Experts (MoE) architecture — 45B total parameters but only ~8B active per token. This means they run fast on modest hardware while maintaining quality close to much larger models.

Installation

# Install Ollama if you haven't
brew install ollama  # macOS
brew services start ollama  # macOS: run the Ollama server in the background
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull MiniMax M2.7 (recommended)
ollama pull minimax-m2.7

# Or M2.5 (older but stable)
ollama pull minimax-m2.5

# Test it
ollama run minimax-m2.7 "Write a Python function to validate email addresses"
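
Once the model answers on the command line, the same check can be scripted against Ollama's local REST API. The sketch below is a minimal example using only the Python standard library. It assumes the server is listening on the default address http://localhost:11434 and that the model tag matches the one pulled above; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

Calling generate("minimax-m2.7", "Write a Python function to validate email addresses") mirrors the ollama run command above. It only works while the Ollama server is running.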

Hardware requirements

Hardware	M2.5/M2.7 performance	Usable?
MacBook Air M2 8GB	~20 tok/s	✅ Good
MacBook Pro M3 16GB	~30 tok/s	✅ Great
Mac Mini M4 24GB	~35 tok/s	✅ Excellent
RTX 3080 10GB	~40 tok/s	✅ Excellent
CPU only (modern)	~8 tok/s	⚠️ Slow but works

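The MoE architecture is the key advantage for speed: only ~8B of the 45B parameters are active per token, so generation runs at roughly the pace of an 8B dense model. Memory works differently. Every expert has to stay loaded, so the footprint follows the total parameter count and the quantization level rather than the active count. See our VRAM guide for detailed calculations.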
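
As a rough rule of thumb, the size of the weights is the total parameter count times the bits per weight of the quantization, divided by eight. The sketch below works that out. The ~5.5 bits per weight used for Q5_K_M is an approximation and the function name is hypothetical. The result ignores the KV cache and runtime overhead, so treat the size that ollama list reports as authoritative.

```python
def estimate_weights_gb(total_params: float, bits_per_weight: float) -> float:
    """Estimate weight size in GB (10^9 bytes) from parameter count and quantization.

    For a MoE model, pass the total parameter count, not the active count:
    every expert has to be resident in memory even if only a few run per token.
    """
    bytes_total = total_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: Q5_K_M averages roughly 5.5 bits per weight.
Q5_K_M_BITS = 5.5

# A dense 8B model at Q5_K_M: 8e9 * 5.5 / 8 bytes = 5.5 GB of weights.
dense_8b_gb = estimate_weights_gb(8e9, Q5_K_M_BITS)
```

Plugging in a model's total parameter count gives a quick sanity check before downloading. Leave a few extra GB of headroom for context and the operating system.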

Connecting to coding tools

Aider

export OLLAMA_API_BASE=http://127.0.0.1:11434  # tell Aider where the local Ollama server is
aider --model ollama/minimax-m2.7

Continue.dev (VS Code)

{
  "models": [{
    "title": "MiniMax M2.7 Local",
    "provider": "ollama",
    "model": "minimax-m2.7"
  }]
}

OpenCode

opencode --provider ollama --model minimax-m2.7

MiniMax local vs API

	Local (Ollama)	API
Cost	Free	~$0.15 input / $0.60 output per M tokens
Privacy	✅ Full	❌ Data sent to MiniMax
Speed	Depends on hardware	Fast (cloud GPU)
Context	Limited by RAM	128K
Availability	Always on	Depends on service

Run locally for privacy and zero cost. Use the API when you need faster responses or longer context.
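
To put the API price in perspective, the cost of a workload is simple arithmetic. The sketch below assumes the two figures in the table are the input and output prices per million tokens, which is an interpretation rather than something confirmed here. The token counts in the example are made up for illustration.

```python
INPUT_USD_PER_M = 0.15   # assumption: price per million input tokens
OUTPUT_USD_PER_M = 0.60  # assumption: price per million output tokens


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate API cost in USD for the given token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_M
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    )


# Hypothetical month of use: 20M input tokens and 5M output tokens.
# 20 * 0.15 + 5 * 0.60 comes to about 6 USD.
monthly = api_cost_usd(20_000_000, 5_000_000)
```

Running the same numbers for your own usage shows whether the API bill is large enough to justify local hardware.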

MiniMax vs other local models

Model	Size	Quality	Speed	Best for
MiniMax M2.7	6 GB	Good (agentic)	Fast	Agent tasks, tool calling
Qwen3 8B	5 GB	Good (general)	Fast	All-purpose coding
DeepSeek R1 14B	9 GB	Good (reasoning)	Medium	Debugging, complex logic
Devstral Small 24B	16 GB	Best	Medium	Best local coding quality

MiniMax M2.7 is the best choice when you want agentic behavior (reliable tool calling, multi-step planning) on budget hardware. For raw coding quality, Devstral Small 24B is better but needs 16GB+ RAM.

Troubleshooting

If you run into issues, check our Ollama troubleshooting guide. Common problems:

Model not found: Check the exact model name with ollama list
Too slow: Check that the model is actually running on the GPU; the PROCESSOR column of ollama ps shows the CPU/GPU split
Out of memory: Try a more quantized version or close other apps
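
The "model not found" check can also be scripted. This is a minimal sketch, assuming the default server address; the helper names are hypothetical. It reads the installed models from Ollama's /api/tags endpoint and matches on the name with or without a tag such as :latest.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address


def installed_names(tags_payload: dict) -> list[str]:
    """Extract model names (e.g. 'minimax-m2.7:latest') from an /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_installed(tags_payload: dict, model: str) -> bool:
    """True if `model` matches an installed name exactly or ignoring the tag."""
    for name in installed_names(tags_payload):
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def fetch_tags(timeout: float = 10.0) -> dict:
    """Query a running Ollama server for its locally installed models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=timeout) as reply:
        return json.loads(reply.read())
```

With the server running, is_installed(fetch_tags(), "minimax-m2.7") reports whether the pull succeeded under the name you expect.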
