
MiniMax M3: Complete Guide to the Open-Weight Frontier Model (2026)


MiniMax released M3 on June 1, 2026. It is the first open-weight model to combine frontier-level coding, a 1-million-token context window, and native multimodal capabilities in a single model. It scores 59% on SWE-bench Pro (beating GPT-5.5's 58.6%), supports text, image, and video input, can operate a desktop computer, and costs $0.60 per million input tokens.

The architectural innovation is MiniMax Sparse Attention (MSA), which delivers 15.6× faster decoding and 9.7× faster prefill compared to the previous M2 generation at million-token contexts. Weights will be released within 10 days of launch. The API is live now.

This is a significant moment for open-weight AI: a model that genuinely competes with Claude Opus 4.8 and GPT-5.5 on real-world tasks, at a fraction of the cost, with weights you can download and self-host.

Quick specs

Developer: MiniMax (Shanghai, China)
Release date: June 1, 2026
Architecture: MiniMax Sparse Attention (MSA)
Context window: 1,000,000 tokens (512K guaranteed minimum)
Modalities: Text, images, video input → text output
Computer use: Yes (desktop operation)
Input pricing (≤512K): $0.60/M tokens
Output pricing (≤512K): $2.40/M tokens
Cache reads: $0.12/M tokens
Long context (512K-1M): 2× standard rates ($1.20/$4.80)
Launch discount: 50% off for 7 days
Open weight: Yes (weights in ~10 days)
OpenRouter: Available (minimax/minimax-m3), 50% launch discount
Coding interface: code.minimax.io

Benchmarks

| Benchmark | MiniMax M3 | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro | What it measures |
|---|---|---|---|---|---|
| SWE-bench Pro | 59.0% | 69.2% | 58.6% | 54.2% | Agentic coding |
| Terminal-Bench 2.1 | 66.0% | 74.2% | 72.1% | 70.0% | Command-line tasks |
| SVG-Bench | 63.7% | - | - | 59.2% | Visual code generation |
| BrowseComp | 83.5% | - | - | - | Web browsing accuracy |
| MCP Atlas | 74.2% | 82.2% | - | - | Multi-step tool use |
| BankerToolBench | Beats GPT-5.5 | - | - | - | Financial tool use |

The headline: M3 beats GPT-5.5 on SWE-bench Pro (59.0% vs 58.6%) while costing about 8.3× less on input ($0.60 vs $5.00 per million tokens) and 12.5× less on output ($2.40 vs $30.00). It trails Claude Opus 4.8 by about 10 points on coding but posts strong scores on browsing (BrowseComp: 83.5%) and visual code generation (SVG-Bench: 63.7%, ahead of Gemini 3.1 Pro's 59.2%).
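To make the cost comparison easy to check, here is a small sketch that derives the price ratios from the per-million-token list prices quoted in the pricing comparison later in this guide. The `PRICES` dictionary and the `price_ratio` helper are names made up for this illustration.

```python
# Per-million-token list prices (USD), copied from this article's pricing table.
PRICES = {
    "MiniMax M3": {"input": 0.60, "output": 2.40},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def price_ratio(expensive: str, cheap: str, kind: str) -> float:
    """Return how many times more `expensive` costs than `cheap` for `kind` tokens."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


for rival in ("GPT-5.5", "Claude Opus 4.8"):
    for kind in ("input", "output"):
        ratio = price_ratio(rival, "MiniMax M3", kind)
        print(f"{rival} vs MiniMax M3, {kind}: {ratio:.1f}x")
```

The ratios only reflect list prices at the standard (≤512K) tier; launch discounts and long-context surcharges would shift them.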

MiniMax Sparse Attention (MSA)

The architectural breakthrough behind M3 is MSA, a new sparse attention mechanism that fundamentally changes the economics of long-context inference.

Key numbers vs M2:

  • 15.6× faster decoding at 1M context
  • 9.7× faster prefill at 1M context
  • Works on uncompressed key-values (no precision loss)

Unlike DeepSeek's Multi-head Latent Attention, which compresses the KV cache at the cost of some precision, MSA maintains full precision while achieving comparable or better speed improvements. This matters for tasks where subtle details in long contexts affect output quality: code analysis, legal document review, multi-file debugging.

The practical impact: running agentic workloads over entire codebases or massive document sets becomes economically viable in open-weight form for the first time.
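This guide does not describe how MSA decides which keys to attend to, so the following is not MSA. It is a minimal sketch of generic top-k sparse attention for a single query, written in plain Python, to illustrate the general idea of attending to a subset of keys while leaving the keys and values themselves uncompressed. All function names here are hypothetical.

```python
import math


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _scores(query, keys):
    """Scaled dot-product score of the query against every key."""
    scale = math.sqrt(len(query))
    return [sum(q * x for q, x in zip(query, key)) / scale for key in keys]


def _mix(weights, values):
    """Weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def dense_attention(query, keys, values):
    """Standard attention: every key contributes to the output."""
    return _mix(_softmax(_scores(query, keys)), values)


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k highest-scoring keys, using them at full precision.

    This toy version still scores every key, so it shows the selection idea
    rather than the speedup; practical sparse attention schemes rely on a
    cheaper selection step.
    """
    scores = _scores(query, keys)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = _softmax([scores[i] for i in top])
    return _mix(weights, [values[i] for i in top])
```

With `k` equal to the number of keys, the sparse version reduces to dense attention; shrinking `k` trades coverage for less work in the softmax and value aggregation.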

Three pillars in one model

MiniMax positions M3 around three capabilities that have historically required separate models:

1. Frontier coding and agentic performance

M3 was tested on autonomously reproducing an ICLR 2025 Outstanding Paper: it ran for nearly 12 hours and produced 18 commits and 23 experimental figures without human intervention. This is not a benchmark score; it is a demonstration of sustained autonomous execution over a complex research task.

For more on M3’s agentic capabilities, see our MiniMax M3 for Agentic Coding guide.

2. Million-token context

The 1M context window, powered by MSA, enables:

  • Analyzing entire codebases in a single prompt
  • Processing long video sequences
  • Multi-document reasoning across hundreds of files
  • Long-running agent sessions without context truncation

See our MiniMax M3 1M Context Guide for practical usage patterns.
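Before sending a whole codebase in one prompt, it helps to estimate whether it fits. The sketch below walks a directory and greedily selects files until a token budget is reached. It assumes a rough 4-characters-per-token heuristic, which is not M3's actual tokenizer, and the `pack_files` helper and its `reserve` parameter are hypothetical names for this illustration.

```python
from pathlib import Path

CONTEXT_LIMIT = 1_000_000  # M3's advertised context window, in tokens
CHARS_PER_TOKEN = 4        # assumption: crude heuristic, not M3's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate; deliberately rounds up by one."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_files(root, suffixes=(".py",), budget=CONTEXT_LIMIT, reserve=50_000):
    """Pick files under `root` that fit in `budget - reserve` estimated tokens.

    `reserve` leaves headroom for the instructions and the model's reply.
    Returns (selected_paths, estimated_tokens_used).
    """
    remaining = budget - reserve
    selected, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        tokens = estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
        if used + tokens > remaining:
            continue  # skip files that would overflow the budget
        selected.append(path)
        used += tokens
    return selected, used
```

For real workloads, a count from the provider's own tokenizer or usage metadata would be more reliable than the character heuristic.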

3. Native multimodality

M3 handles images and video as first-class inputs rather than as capabilities bolted on after training. It can:

  • Parse UI interfaces, charts, and documents
  • Process video frames for temporal reasoning
  • Operate a desktop computer (computer use)
  • Generate structured data from visual content
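As an illustration of image input, the sketch below builds a user message in the OpenAI-style chat format, with the image embedded as a base64 data URL. It assumes M3's OpenAI-compatible endpoints accept `image_url` content parts, which this guide does not confirm; `build_image_message` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def build_image_message(prompt: str, image_path: str) -> dict:
    """Build an OpenAI-style chat message containing text plus one local image.

    Assumption: the target endpoint accepts `image_url` parts with data URLs.
    """
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The returned dictionary can be passed as one element of the `messages` list in a chat completion request, for example with the `minimax/minimax-m3` model on OpenRouter.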

Pricing comparison

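| Model | Input/M | Output/M | Cache/M | Context | Open weight |
|---|---|---|---|---|---|
| MiniMax M3 | $0.60 | $2.40 | $0.12 | 1M | ✅ (10 days) |
| MiniMax M3 (launch) | $0.30 | $1.20 | $0.06 | 1M | ✅ |
| Claude Opus 4.8 | $5.00 | $25.00 | $0.50 | 1M | ❌ |
| GPT-5.5 | $5.00 | $30.00 | N/A | 1M | ❌ |
| DeepSeek V4-Pro | $0.435 | $0.87 | $0.004 | 1M | ✅ |
| MiMo V2.5 Pro | $0.435 | $0.87 | $0.004 | 1M | ✅ |
| Step 3.7 Flash | $0.20 | $0.80 | $0.04 | 256K | ✅ |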

M3 sits in the middle tier: more expensive than DeepSeek/MiMo ($0.435/$0.87) but far cheaper than Opus/GPT ($5/$25-30). The premium over DeepSeek buys you native multimodal, computer use, and the MSA speed advantage at long contexts.

During the 7-day launch discount (50% off), M3 costs $0.30/$1.20, which undercuts DeepSeek on input ($0.30 vs $0.435) but remains higher on output ($1.20 vs $0.87).
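The pricing rules above can be turned into a rough per-request estimator. This is a sketch under several assumptions that the guide does not spell out: that "512K" means 512,000 tokens, that a request whose prompt exceeds that threshold is billed entirely at the 2× long-context rate, and that the 2× multiplier and the launch discount apply to cache reads as well. `Rates` and `estimate_cost` are hypothetical names.

```python
from dataclasses import dataclass

LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" means 512,000 tokens


@dataclass(frozen=True)
class Rates:
    """Standard-tier list prices in USD per million tokens (from the table)."""
    input_per_m: float = 0.60
    output_per_m: float = 2.40
    cache_read_per_m: float = 0.12


def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  launch_discount=False, rates=Rates()):
    """Estimate the USD cost of one request.

    Assumptions: the whole request is billed at 2x once the prompt exceeds
    the threshold, and the multiplier also applies to cache reads.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    multiplier = 2.0 if input_tokens > LONG_CONTEXT_THRESHOLD else 1.0
    if launch_discount:
        multiplier *= 0.5  # 50% off during the 7-day launch window
    fresh_tokens = input_tokens - cached_tokens
    base = (
        fresh_tokens * rates.input_per_m
        + cached_tokens * rates.cache_read_per_m
        + output_tokens * rates.output_per_m
    ) / 1_000_000
    return base * multiplier
```

Under these assumptions, a prompt of 800,000 tokens with a 10,000-token reply is billed at the doubled rate, so trimming a prompt below the threshold can roughly halve the bill for the same work.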

How to use MiniMax M3

Via OpenRouter (fastest start)

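import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # raises KeyError if the variable is unset
)

response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[{"role": "user", "content": "Refactor this Express app to use dependency injection"}],
)
print(response.choices[0].message.content)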

Via MiniMax API directly

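import os

from openai import OpenAI

# Same OpenAI-compatible client, pointed at MiniMax's own endpoint.
client = OpenAI(
    base_url="https://api.minimax.io/v1",
    api_key=os.environ["MINIMAX_API_KEY"],
)

response = client.chat.completions.create(
    model="minimax-m3",
    messages=[{"role": "user", "content": "Analyze this codebase for security vulnerabilities"}],
)
print(response.choices[0].message.content)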

For the full setup walkthrough, see our MiniMax M3 API Setup Guide.

MiniMax Code (dedicated coding interface)

MiniMax launched a dedicated coding interface at code.minimax.io, similar to how Anthropic has Claude Code. This provides a purpose-built environment for coding tasks with M3.

Who should use MiniMax M3

  • Teams wanting open-weight frontier quality: M3 is the first open model matching GPT-5.5 on coding while offering 1M context and multimodal
  • Long-context workloads: MSA makes 1M context fast and affordable
  • Multimodal agent builders: native image/video + computer use in one model
  • Cost-conscious developers: 8-12× cheaper than Opus/GPT with competitive quality
  • Self-hosting enterprises: weights coming in 10 days for on-premise deployment

What M3 does NOT do well (yet)

  • Abstract reasoning: Chinese models generally score below US labs on ARC-AGI-2 (generalized fluid intelligence)
  • Opus-level coding: 10 points behind on SWE-bench Pro (59% vs 69.2%). For the hardest coding tasks, Opus 4.8 is still better.
  • Ecosystem maturity: newer model, less community tooling than DeepSeek or Claude
  • Immediate self-hosting: weights are not available yet (10 days). API only for now.

What changed from M2.7

For a detailed comparison, see MiniMax M3 vs M2.7: What Changed. The short version:

  • MSA architecture (15.6× faster decoding at 1M context)
  • 1M context window (up from 200K)
  • Native multimodal (M2.7 was text-only)
  • Computer use capability (new)
  • Significantly higher coding benchmarks
  • Open-weight (M2.7 was API-only)

FAQ

Is MiniMax M3 better than Claude Opus 4.8?

No. Opus 4.8 leads on coding (69.2% vs 59.0% SWE-bench Pro) and has dynamic workflows. But M3 is roughly 8× cheaper on input and 10× cheaper on output, is open-weight, and leads on browsing and visual code generation. See our full comparison.

Is MiniMax M3 better than GPT-5.5?

On SWE-bench Pro, yes (59.0% vs 58.6%). On Terminal-Bench, no (66.0% vs 72.1%). M3 is roughly 8× cheaper on input and 12.5× cheaper on output, and it is open-weight. For most coding tasks, M3 offers better value. See our M3 vs GPT-5.5 comparison.

When will weights be available?

Within 10 days of the June 1 launch, so expected around June 10-11. A full technical report will accompany the release. See our how to run M3 locally guide for hardware requirements.

How does M3 compare to DeepSeek V4-Pro?

DeepSeek is cheaper ($0.435/$0.87 vs $0.60/$2.40) and scores higher on SWE-bench Verified (80.6%). M3 has native multimodal, computer use, and faster long-context inference via MSA. See our detailed comparison.

What is MiniMax Sparse Attention (MSA)?

A new attention architecture that enables fast inference at million-token contexts without compressing key-values. It delivers 15.6× faster decoding and 9.7× faster prefill than the previous M2 generation at 1M tokens, while maintaining full precision.

Can I use M3 with Aider or Claude Code?

M3 works with any tool supporting OpenAI-compatible endpoints (Aider, Continue, Cursor). It does not work directly with Claude Code (Anthropic models only). Use it via OpenRouter for the easiest integration.