MiniMax released M3 on June 1, 2026: the first open-weight model to combine frontier-level coding, a 1-million-token context window, and native multimodal capabilities in a single model. It scores 59% on SWE-bench Pro (beating GPT-5.5's 58.6%), supports text, image, and video input, can operate a desktop computer, and costs $0.60 per million input tokens.
The architectural innovation is MiniMax Sparse Attention (MSA), which delivers 15.6× faster decoding and 9.7× faster prefill than the previous M2 generation at million-token contexts. Weights will be released within 10 days of launch; the API is live now.
This is a significant moment for open-source AI: a model that genuinely competes with Claude Opus 4.8 and GPT-5.5 on real-world tasks, at a fraction of the cost, with weights you can download and self-host.
Quick specs
| Spec | Details |
|---|---|
| Developer | MiniMax (Shanghai, China) |
| Release date | June 1, 2026 |
| Architecture | MiniMax Sparse Attention (MSA) |
| Context window | 1,000,000 tokens (512K guaranteed minimum) |
| Modalities | Text, image, and video input → text output |
| Computer use | Yes (desktop operation) |
| Input pricing (≤512K) | $0.60/M tokens |
| Output pricing (≤512K) | $2.40/M tokens |
| Cache reads | $0.12/M tokens |
| Long context (512K-1M) | 2× standard rates ($1.20/$4.80) |
| Launch discount | 50% off for 7 days |
| Open weight | Yes (weights in ~10 days) |
| OpenRouter | Available (minimax/minimax-m3), 50% launch discount |
| Coding interface | code.minimax.io |
| Coding interface | code.minimax.io |
Benchmarks
| Benchmark | MiniMax M3 | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro | What it measures |
|---|---|---|---|---|---|
| SWE-bench Pro | 59.0% | 69.2% | 58.6% | 54.2% | Agentic coding |
| Terminal-Bench 2.1 | 66.0% | 74.2% | 72.1% | 70.0% | Command-line tasks |
| SVG-Bench | 63.7% | n/a | n/a | 59.2% | Visual code generation |
| BrowseComp | 83.5% | n/a | n/a | n/a | Web browsing accuracy |
| MCP Atlas | 74.2% | 82.2% | n/a | n/a | Multi-step tool use |
| BankerToolBench | Beats GPT-5.5 | n/a | n/a | n/a | Financial tool use |
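The headline: M3 beats GPT-5.5 on SWE-bench Pro (59.0% vs 58.6%) while costing roughly 8× less on input ($0.60 vs $5.00) and 12.5× less on output ($2.40 vs $30.00). It trails Claude Opus 4.8 by about 10 points on coding. However, it posts the strongest listed results on browsing (BrowseComp: 83.5%, though no rival scores are reported) and visual code generation (SVG-Bench: 63.7% vs Gemini 3.1 Pro's 59.2%).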
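The price ratios behind these comparisons can be checked directly. The sketch below uses the standard (non-discount) list prices from this article's pricing comparison table. Against GPT-5.5, M3 works out to roughly 8.3× cheaper on input and 12.5× cheaper on output. Against Opus 4.8, the figures are roughly 8.3× and 10.4×. The `PRICES` dict and `price_ratio` helper are illustrative names, not part of any API.

```python
# List prices in USD per million tokens (input, output), from the pricing table
PRICES = {
    "MiniMax M3": (0.60, 2.40),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio): how many times more `expensive` costs than `cheap`."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


for rival in ("GPT-5.5", "Claude Opus 4.8"):
    inp, out = price_ratio(rival, "MiniMax M3")
    print(f"{rival}: {inp:.1f}x input, {out:.1f}x output")
```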
MiniMax Sparse Attention (MSA)
The architectural breakthrough behind M3 is MSA, a new sparse attention mechanism that fundamentally changes the economics of long-context inference.
Key numbers vs M2:
- 15.6× faster decoding at 1M context
- 9.7× faster prefill at 1M context
- Works on uncompressed key-values (no precision loss)
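Keeping key-values uncompressed is costly at this scale. For plain multi-head or grouped-query attention, the cache stores one key and one value vector per token, per layer, per KV head, so its size grows linearly with context. The layer count, KV-head count, and head dimension below are hypothetical placeholders chosen for illustration. They are not M3's configuration, which MiniMax has not yet published.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Size of an uncompressed KV cache for one sequence.

    The leading 2 accounts for storing both a key and a value vector per token,
    per layer, per KV head. bytes_per_value=2 corresponds to BF16/FP16.
    """
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical model shape for illustration only (NOT M3's real config)
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

for ctx in (128_000, 512_000, 1_000_000):
    gib = kv_cache_bytes(ctx, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{ctx:>9,} tokens -> {gib:,.1f} GiB of 16-bit KV cache per sequence")
```

Compression schemes such as MLA shrink what is stored. A sparse scheme that keeps full-precision keys and values instead has to win its speedups by reading only part of the cache at each step.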
Unlike DeepSeek's Multi-head Latent Attention (MLA), which compresses the KV cache at the cost of some precision, MSA keeps key-values at full precision. According to MiniMax, it still achieves comparable or better speedups. This matters for tasks where subtle details in long contexts affect output quality, such as code analysis, legal document review, and multi-file debugging.
The practical impact: running agentic workloads over entire codebases or massive document sets becomes economically viable in open-weight form for the first time.
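MiniMax has not yet published MSA's internals, so the following is not MSA. It is a minimal sketch of one common family of sparse attention, block-level top-k selection. It illustrates the general idea this section describes:

1. The query scores coarse blocks of a full-precision KV cache.
2. Exact softmax attention runs over only the tokens in the highest-scoring blocks.

The mean-key block score, `block_size`, and `top_k` are illustrative choices.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Exact scaled dot-product attention for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def block_topk_attention(q: Vector, keys: List[Vector], values: List[Vector],
                         block_size: int = 4, top_k: int = 2) -> Vector:
    """Illustrative block-sparse attention (not MSA).

    Scores each KV block by the query's dot product with the block's mean key,
    keeps the top_k blocks, then runs exact attention over the uncompressed
    keys/values of the selected tokens only.
    """
    blocks = [range(i, min(i + block_size, len(keys)))
              for i in range(0, len(keys), block_size)]

    def block_score(idx: range) -> float:
        # Recomputed on every call here; a real system would cache block summaries.
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(len(q))]
        return dot(q, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    selected = sorted(i for blk in chosen for i in blk)  # keep original token order
    return softmax_attention(q, [keys[i] for i in selected], [values[i] for i in selected])
```

With `top_k` equal to the number of blocks, this reduces to dense attention. Otherwise, the exact-attention step touches only `top_k * block_size` tokens instead of the whole context, plus a cheaper pass over per-block summaries. That is where speedups of this kind of scheme come from.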
Three pillars in one model
MiniMax positions M3 around three capabilities that have historically required separate models:
1. Frontier coding and agentic performance
In one demonstration, M3 autonomously reproduced an ICLR 2025 Outstanding Paper. It ran for nearly 12 hours and produced 18 commits and 23 experimental figures without human intervention. This is not a benchmark score; it is a demonstration of sustained autonomous execution on a complex research task.
For more on M3's agentic capabilities, see our MiniMax M3 for Agentic Coding guide.
2. Million-token context
The 1M context window, powered by MSA, enables:
- Analyzing entire codebases in a single prompt
- Processing long video sequences
- Multi-document reasoning across hundreds of files
- Long-running agent sessions without context truncation
See our MiniMax M3 1M Context Guide for practical usage patterns.
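Here is a minimal sketch of the "entire codebase in a single prompt" pattern. It concatenates source files with path headers and stops before an estimated token budget is exceeded. The default budget is the 512K threshold where the article's long-context pricing begins. Several details are assumptions for illustration only: the 4-characters-per-token ratio, the file-suffix filter, and the `### File:` header format. M3's actual tokenizer will count differently.

```python
from pathlib import Path

# Rough heuristic (assumption): ~4 characters per token for code and English text.
# Use the model's real tokenizer for accurate counts.
CHARS_PER_TOKEN = 4
STANDARD_TIER_LIMIT = 512_000  # above this, the article lists 2x pricing


def estimate_tokens(text: str) -> int:
    """Ceiling division so short strings never count as zero tokens."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def pack_repo(root: str, suffixes=(".py", ".js", ".ts", ".md"),
              budget_tokens: int = STANDARD_TIER_LIMIT) -> tuple[str, int]:
    """Concatenate matching files under `root` into one prompt string.

    Each file is prefixed with its relative path. Packing stops at the first
    file whose estimated tokens would push the total over the budget.
    """
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        chunk = f"\n### File: {path.relative_to(root)}\n{path.read_text(errors='replace')}"
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break  # simple policy; a smarter packer might skip or summarize large files
        parts.append(chunk)
        used += cost
    return "".join(parts), used
```

The returned string can be placed in a user message alongside the actual question, such as "find all SQL injection risks".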
3. Native multimodality
M3 handles images and video as first-class inputs rather than capabilities bolted on after training. It can:
- Parse user interfaces, charts, and documents
- Process video frames for temporal reasoning
- Operate a desktop computer (computer use)
- Generate structured data from visual content
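This article does not document M3's multimodal request format. The sketch below assumes the OpenAI-compatible endpoints accept images in the standard OpenAI content-parts shape: a `text` part plus an `image_url` part carrying a base64 data URL. Video input may use a different format. The `image_message` helper is a hypothetical convenience, not a MiniMax API.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message pairing text with an inline base64 image,
    in the OpenAI-style content-parts format (assumed to be accepted by M3)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Usage with an OpenAI client configured as in the setup examples in this article:
# with open("dashboard.png", "rb") as f:
#     msg = image_message("Extract this chart's data as JSON.", f.read())
# response = client.chat.completions.create(model="minimax/minimax-m3", messages=[msg])
```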
Pricing comparison
| Model | Input/M | Output/M | Cache/M | Context | Open weight |
|---|---|---|---|---|---|
| MiniMax M3 | $0.60 | $2.40 | $0.12 | 1M | Yes (in ~10 days) |
| MiniMax M3 (launch) | $0.30 | $1.20 | $0.06 | 1M | Yes (in ~10 days) |
| Claude Opus 4.8 | $5.00 | $25.00 | $0.50 | 1M | No |
| GPT-5.5 | $5.00 | $30.00 | N/A | 1M | No |
| DeepSeek V4-Pro | $0.435 | $0.87 | $0.004 | 1M | Yes |
| MiMo V2.5 Pro | $0.435 | $0.87 | $0.004 | 1M | Yes |
| Step 3.7 Flash | $0.20 | $0.80 | $0.04 | 256K | Yes |
M3 sits in the middle tier β more expensive than DeepSeek/MiMo ($0.435/$0.87) but far cheaper than Opus/GPT ($5/$25-30). The premium over DeepSeek buys you native multimodal, computer use, and the MSA speed advantage at long contexts.
During the 7-day launch discount (50% off), M3 costs $0.30/$1.20. That undercuts DeepSeek on input ($0.30 vs $0.435), though M3 remains pricier on output ($1.20 vs $0.87).
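An M3 bill depends on prompt length, cache hits, and the launch discount, so a small estimator helps when comparing scenarios. This is a sketch, and the function name is hypothetical. The docstring flags how it handles billing details this article does not specify:

- whether the 2× long-context rate applies to the whole request or only to tokens past 512K;
- whether that rate applies to cache reads;
- whether the discount covers every rate.

```python
# Standard M3 list prices in USD per million tokens (from the tables in this article)
INPUT, OUTPUT, CACHE_READ = 0.60, 2.40, 0.12
LONG_CONTEXT_THRESHOLD = 512_000


def m3_request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
                    launch_discount: bool = False) -> float:
    """Estimate the USD cost of one M3 request.

    Assumptions (not confirmed by MiniMax): `cached_tokens` is the portion of
    `input_tokens` served from cache; the 2x long-context rate applies to the
    whole request once the prompt exceeds 512K tokens, cache reads included;
    and the launch discount halves every rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    multiplier = 2.0 if input_tokens > LONG_CONTEXT_THRESHOLD else 1.0
    if launch_discount:
        multiplier *= 0.5
    fresh = input_tokens - cached_tokens
    per_million = fresh * INPUT + cached_tokens * CACHE_READ + output_tokens * OUTPUT
    return multiplier * per_million / 1_000_000


# A 700K-token codebase prompt, half of it cached, with 4K tokens of output
print(f"${m3_request_cost(700_000, 4_000, cached_tokens=350_000):.4f}")  # prints $0.5232
```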
How to use MiniMax M3
Via OpenRouter (fastest start)
```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[{"role": "user", "content": "Refactor this Express app to use dependency injection"}],
)
print(response.choices[0].message.content)
```
Via MiniMax API directly
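```python
import os

from openai import OpenAI

# MiniMax's own OpenAI-compatible endpoint; note the different model ID
client = OpenAI(
    base_url="https://api.minimax.io/v1",
    api_key=os.environ["MINIMAX_API_KEY"],
)

response = client.chat.completions.create(
    model="minimax-m3",
    messages=[{"role": "user", "content": "Analyze this codebase for security vulnerabilities"}],
)
print(response.choices[0].message.content)
```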
For the full setup walkthrough, see our MiniMax M3 API Setup Guide.
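The two endpoints in the setup examples differ only in base URL, API-key variable, and model ID. A small helper can therefore switch between them without duplicating client code. This is a minimal sketch. The `PROVIDERS` table, `m3_client_config` helper, and `M3_PROVIDER` environment variable are hypothetical names introduced for illustration.

```python
import os

# (base_url, API-key env var, model ID), taken from the two setup examples
PROVIDERS = {
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY", "minimax/minimax-m3"),
    "minimax": ("https://api.minimax.io/v1", "MINIMAX_API_KEY", "minimax-m3"),
}


def m3_client_config(provider: str) -> tuple[dict, str]:
    """Return (kwargs for openai.OpenAI(...), model ID) for the chosen provider."""
    base_url, key_var, model = PROVIDERS[provider]
    api_key = os.environ.get(key_var)
    if not api_key:
        raise RuntimeError(f"Set {key_var} to use the {provider} endpoint")
    return {"base_url": base_url, "api_key": api_key}, model


# Usage (requires the `openai` package; M3_PROVIDER is a hypothetical env var):
# from openai import OpenAI
# kwargs, model = m3_client_config(os.environ.get("M3_PROVIDER", "openrouter"))
# client = OpenAI(**kwargs)
# client.chat.completions.create(model=model, messages=[{"role": "user", "content": "Hi"}])
```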
MiniMax Code (dedicated coding interface)
MiniMax launched a dedicated coding interface at code.minimax.io, similar in spirit to Anthropic's Claude Code. It provides a purpose-built environment for coding tasks with M3.
Who should use MiniMax M3
- Teams wanting open-weight frontier quality: M3 is the first open model matching GPT-5.5 on coding while offering 1M context and multimodal input
- Long-context workloads: MSA makes 1M context fast and affordable
- Multimodal agent builders: native image/video input plus computer use in one model
- Cost-conscious developers: roughly 8-12.5× cheaper than Opus/GPT with competitive quality
- Self-hosting enterprises: weights arrive within 10 days for on-premise deployment
What M3 does NOT do well (yet)
- Abstract reasoning: Chinese models generally score below US labs on ARC-AGI-2 (generalized fluid intelligence)
- Opus-level coding: 10 points behind on SWE-bench Pro (59% vs 69.2%). For the hardest coding tasks, Opus 4.8 is still better.
- Ecosystem maturity: as a newer model, M3 has less community tooling than DeepSeek or Claude
- Immediate self-hosting: weights are not available yet (expected within 10 days), so M3 is API-only for now.
What changed from M2.7
For a detailed comparison, see MiniMax M3 vs M2.7: What Changed. The short version:
- MSA architecture (15.6× faster decoding at 1M context)
- 1M context window (up from 200K)
- Native multimodal (M2.7 was text-only)
- Computer use capability (new)
- Significantly higher coding benchmarks
- Open-weight (M2.7 was API-only)
FAQ
Is MiniMax M3 better than Claude Opus 4.8?
No. Opus 4.8 leads on coding (69.2% vs 59.0% on SWE-bench Pro) and has dynamic workflows. But M3 is roughly 8× cheaper on input and 10× cheaper on output, is open-weight, and leads on browsing and visual code generation. See our full comparison.
Is MiniMax M3 better than GPT-5.5?
On SWE-bench Pro, yes (59.0% vs 58.6%). On Terminal-Bench, no (66.0% vs 72.1%). M3 is roughly 8× cheaper on input and 12.5× cheaper on output, and it is open-weight. For most coding tasks, M3 offers better value. See our M3 vs GPT-5.5 comparison.
When will weights be available?
Within 10 days of the June 1 launch, so expected around June 10-11. A full technical report will accompany the release. See our how to run M3 locally guide for hardware requirements.
How does M3 compare to DeepSeek V4-Pro?
DeepSeek is cheaper ($0.435/$0.87 vs $0.60/$2.40) and scores higher on SWE-bench Verified (80.6%). M3 has native multimodal, computer use, and faster long-context inference via MSA. See our detailed comparison.
What is MiniMax Sparse Attention (MSA)?
A new attention architecture that enables fast inference at million-token contexts without compressing key-values. At 1M tokens, it delivers 15.6× faster decoding and 9.7× faster prefill than the previous M2 generation, while maintaining full precision.
Can I use M3 with Aider or Claude Code?
M3 works with any tool supporting OpenAI-compatible endpoints (Aider, Continue, Cursor). It does not work directly with Claude Code (Anthropic models only). Use it via OpenRouter for the easiest integration.