Qwen 3.7 Max vs MiniMax M3: China's Two Newest Frontier Models Compared (2026)
Qwen 3.7 Max and MiniMax M3 are the two newest Chinese frontier models as of June 2026. Both compete with GPT-5.5 on coding benchmarks. Both cost a fraction of American models. But they take fundamentally different approaches: Qwen is a text-only reasoning powerhouse with a 1M context window. M3 is a multimodal model with native vision, video, and computer use via a novel sparse attention architecture.
This guide breaks down exactly where each wins so you can pick the right one β or use both.
Head-to-head comparison
| Qwen 3.7 Max | MiniMax M3 | |
|---|---|---|
| Developer | Alibaba | MiniMax (Shanghai) |
| Released | May 20, 2026 | June 1, 2026 |
| Architecture | Dense (undisclosed size) | MSA (MiniMax Sparse Attention) |
| Input price | $2.50/M | $0.60/M |
| Output price | $7.50/M | $2.40/M |
| Context window | 1M tokens | 1M tokens (512K guaranteed) |
| Modalities | Text only | β Text + images + video |
| Computer use | β | β |
| SWE-bench Pro | ~58%* | 59.0% |
| AI Index score | 56.6 | β |
| BrowseComp | β | 83.5% |
| SVG-Bench | β | 63.7% |
| MCP Atlas | β | 74.2% |
| Open weight | β (API only) | β (weights in ~10 days) |
| Self-hostable | β | β |
| Available on OpenRouter | β | β |
*Qwen 3.7 Maxβs SWE-bench Pro score estimated from available benchmark data.
Pricing: M3 is 4Γ cheaper
The cost difference is substantial:
| Qwen 3.7 Max | MiniMax M3 | Ratio | |
|---|---|---|---|
| Input | $2.50/M | $0.60/M | 4.2Γ |
| Output | $7.50/M | $2.40/M | 3.1Γ |
| Cache | ~$0.25/M | $0.12/M | 2Γ |
| 1hr coding session | ~$1.50 | ~$0.50 | 3Γ |
| Monthly (24/7 agent) | ~$1,080 | ~$360 | 3Γ |
M3 is significantly cheaper. For high-volume workloads, this 3-4Γ gap adds up fast.
Where Qwen 3.7 Max wins
Reasoning depth
Qwen 3.7 Max scored 92.4% on GPQA Diamond (PhD-level science questions) β one of the highest scores recorded. It excels at tasks requiring deep multi-step reasoning, mathematical logic, and complex analysis. Its text-only focus means all compute goes toward reasoning rather than visual processing.
AI Intelligence Index
Qwen 3.7 Max scored 56.6 on Artificial Analysisβs Intelligence Index β the highest of any Chinese model at launch. This composite score measures across coding, reasoning, knowledge, and instruction following.
Established ecosystem
Qwen 3.7 has been available since May 20 with OpenRouter, DashScope, and multiple API providers. Community tooling, guides, and benchmarks are more mature. M3 launched June 1 β it is brand new with less community data.
Simpler pricing model
Qwenβs pricing is flat regardless of context length. M3 doubles its price above 512K tokens ($1.20/$4.80 for 512K-1M). If you routinely use 500K+ context, Qwen may be cheaper in that range despite higher base rates.
Where MiniMax M3 wins
Multimodal (images + video + computer use)
This is the decisive differentiator. M3 handles images and video natively, plus it can operate a desktop computer. Qwen 3.7 Max is text-only. If your workload involves:
- Parsing screenshots or UI mockups
- Processing video content
- Visual code verification (write code β view output β fix)
- Browser automation
- Document/chart analysis
Then M3 is the only option. Qwen cannot do any of this.
Price (4Γ cheaper)
At $0.60/$2.40 vs $2.50/$7.50, M3 is dramatically cheaper. For coding tasks where both perform similarly, M3 offers better value.
Open weight
M3 weights will be available ~June 10. You can self-host, fine-tune, and run completely offline. Qwen 3.7 Max is API-only with no self-hosting option. For enterprises with data privacy requirements, this alone decides the choice.
Long-context speed (MSA)
M3βs MiniMax Sparse Attention delivers 15.6Γ faster decoding at 1M tokens compared to standard attention. While both support 1M context, M3 responds faster at long contexts. See our M3 1M context guide.
Browsing and web tasks
83.5% on BrowseComp makes M3 excellent for research agents and web scraping tasks. Qwen does not have comparable browsing benchmarks published.
Coding comparison
Both models compete with GPT-5.5 on coding:
| Benchmark | Qwen 3.7 Max | MiniMax M3 | Winner |
|---|---|---|---|
| SWE-bench Pro | ~58% | 59.0% | M3 (slightly) |
| Terminal-Bench 2.1 | β | 66.0% | M3 (no Qwen data) |
| SVG-Bench | β | 63.7% | M3 |
| GPQA Diamond | 92.4% | β | Qwen |
For pure text-based coding tasks, both are roughly equivalent. M3 has a slight edge on SWE-bench Pro. Qwen has stronger reasoning scores. The practical difference in code quality is negligible for most tasks.
Decision framework
| Your workload | Best choice | Why |
|---|---|---|
| Pure text coding (budget) | MiniMax M3 | 4Γ cheaper, similar quality |
| Complex reasoning/math | Qwen 3.7 Max | 92.4% GPQA, stronger reasoning |
| Multimodal (images/video) | MiniMax M3 | Qwen is text-only |
| Computer use / GUI agents | MiniMax M3 | Only option |
| Self-hosting / privacy | MiniMax M3 | Open weight |
| Research / web browsing agents | MiniMax M3 | 83.5% BrowseComp |
| Established tooling needed | Qwen 3.7 Max | More mature ecosystem |
| Long documents (>512K tokens) | Qwen 3.7 Max | Flat pricing vs M3βs 2Γ above 512K |
Using both
Both are available on OpenRouter. Route based on task type:
def choose_model(task):
if task.has_images or task.has_video or task.needs_browser:
return "minimax/minimax-m3"
elif task.type == "math" or task.type == "complex_reasoning":
return "qwen/qwen3.7-max"
else:
return "minimax/minimax-m3" # Default: cheaper
The broader Chinese AI landscape
These two models sit at the top of an increasingly crowded Chinese frontier:
| Model | Input/M | Output/M | Strength |
|---|---|---|---|
| DeepSeek V4-Pro | $0.435 | $0.87 | Cheapest frontier, highest SWE-bench Verified |
| MiMo V2.5 Pro | $0.435 | $0.87 | Token efficiency, agentic coding |
| MiniMax M3 | $0.60 | $2.40 | Multimodal, computer use, open weight |
| Qwen 3.7 Max | $2.50 | $7.50 | Reasoning depth, highest AI Index |
| Kimi K2.6 | $0.60 | $2.50 | Agent swarms, open weight |
| Step 3.7 Flash | $0.20 | $0.80 | 400 t/s speed, multimodal |
All of these undercut Claude Opus 4.8 ($5/$25) and GPT-5.5 ($5/$30) by 3-60Γ. See our full Chinese AI pricing analysis.
FAQ
Which is better for coding?
Roughly equivalent. M3 scores slightly higher on SWE-bench Pro (59% vs ~58%). For most coding tasks, you will not notice a quality difference. Pick based on price (M3 wins) or reasoning depth (Qwen wins for complex architecture decisions).
Can I use both through the same tool?
Yes. Both work via OpenRouter on a single API key. Both are OpenAI-compatible. Switch between them by changing the model string.
Which should I pick if I can only choose one?
MiniMax M3. It is cheaper, multimodal, open-weight, and scores equivalently on coding. The only reason to prefer Qwen is if you specifically need deep reasoning capabilities and do not need vision/multimodal.
When will Qwen 3.7 Max be open-weight?
Not announced. Alibaba has released open versions of previous Qwen models (3.6-27B, 3.6-35B) but 3.7 Max remains API-only. M3 weights are expected ~June 10.
How do they compare to DeepSeek V4-Pro?
Both are more expensive than DeepSeek V4-Pro ($0.435/$0.87). DeepSeek scores higher on SWE-bench Verified (80.6%) but is text-only. If you need multimodal, M3 is the cheapest option. If you need pure text coding at minimum cost, DeepSeek wins.
Is the Qwen 3.7 Plus version worth considering?
Qwen 3.7 Plus is the multimodal variant with vision. If you need Qwen + images, Plus exists β but M3 is cheaper and has video + computer use on top of vision.