Jun 4, 2026 · 6 min read

Qwen 3.7 Max vs MiniMax M3: China's Two Newest Frontier Models Compared (2026)

Qwen 3.7 Max and MiniMax M3 are the two newest Chinese frontier models as of June 2026. Both compete with GPT-5.5 on coding benchmarks. Both cost a fraction of American models. But they take fundamentally different approaches: Qwen is a text-only reasoning powerhouse with a 1M context window. M3 is a multimodal model with native vision, video, and computer use via a novel sparse attention architecture.

This guide breaks down exactly where each wins so you can pick the right one — or use both.

Head-to-head comparison

	Qwen 3.7 Max	MiniMax M3
Developer	Alibaba	MiniMax (Shanghai)
Released	May 20, 2026	June 1, 2026
Architecture	Dense (undisclosed size)	MSA (MiniMax Sparse Attention)
Input price	$2.50/M	$0.60/M
Output price	$7.50/M	$2.40/M
Context window	1M tokens	1M tokens (512K guaranteed)
Modalities	Text only	✅ Text + images + video
Computer use	❌	✅
SWE-bench Pro	~58%*	59.0%
AI Index score	56.6	—
BrowseComp	—	83.5%
SVG-Bench	—	63.7%
MCP Atlas	—	74.2%
Open weight	❌ (API only)	✅ (weights in ~10 days)
Self-hostable	❌	✅
Available on OpenRouter	✅	✅

*Qwen 3.7 Max’s SWE-bench Pro score estimated from available benchmark data.

Pricing: M3 is 4× cheaper

The cost difference is substantial:

	Qwen 3.7 Max	MiniMax M3	Ratio
Input	$2.50/M	$0.60/M	4.2×
Output	$7.50/M	$2.40/M	3.1×
Cache	~$0.25/M	$0.12/M	2×
1hr coding session	~$1.50	~$0.50	3×
Monthly (24/7 agent)	~$1,080	~$360	3×

M3 is significantly cheaper. For high-volume workloads, this 3-4× gap adds up fast.

Where Qwen 3.7 Max wins

Reasoning depth

Qwen 3.7 Max scored 92.4% on GPQA Diamond (PhD-level science questions) — one of the highest scores recorded. It excels at tasks requiring deep multi-step reasoning, mathematical logic, and complex analysis. Its text-only focus means all compute goes toward reasoning rather than visual processing.

AI Intelligence Index

Qwen 3.7 Max scored 56.6 on Artificial Analysis’s Intelligence Index — the highest of any Chinese model at launch. This composite score measures across coding, reasoning, knowledge, and instruction following.

Established ecosystem

Qwen 3.7 has been available since May 20 with OpenRouter, DashScope, and multiple API providers. Community tooling, guides, and benchmarks are more mature. M3 launched June 1 — it is brand new with less community data.

Simpler pricing model

Qwen’s pricing is flat regardless of context length. M3 doubles its price above 512K tokens ($1.20/$4.80 for 512K-1M). If you routinely use 500K+ context, Qwen may be cheaper in that range despite higher base rates.

Where MiniMax M3 wins

Multimodal (images + video + computer use)

This is the decisive differentiator. M3 handles images and video natively, plus it can operate a desktop computer. Qwen 3.7 Max is text-only. If your workload involves:

Parsing screenshots or UI mockups
Processing video content
Visual code verification (write code → view output → fix)
Browser automation
Document/chart analysis

Then M3 is the only option. Qwen cannot do any of this.

Price (4× cheaper)

At $0.60/$2.40 vs $2.50/$7.50, M3 is dramatically cheaper. For coding tasks where both perform similarly, M3 offers better value.

Open weight

M3 weights will be available ~June 10. You can self-host, fine-tune, and run completely offline. Qwen 3.7 Max is API-only with no self-hosting option. For enterprises with data privacy requirements, this alone decides the choice.

Long-context speed (MSA)

M3’s MiniMax Sparse Attention delivers 15.6× faster decoding at 1M tokens compared to standard attention. While both support 1M context, M3 responds faster at long contexts. See our M3 1M context guide.

Browsing and web tasks

83.5% on BrowseComp makes M3 excellent for research agents and web scraping tasks. Qwen does not have comparable browsing benchmarks published.

Coding comparison

Both models compete with GPT-5.5 on coding:

Benchmark	Qwen 3.7 Max	MiniMax M3	Winner
SWE-bench Pro	~58%	59.0%	M3 (slightly)
Terminal-Bench 2.1	—	66.0%	M3 (no Qwen data)
SVG-Bench	—	63.7%	M3
GPQA Diamond	92.4%	—	Qwen

For pure text-based coding tasks, both are roughly equivalent. M3 has a slight edge on SWE-bench Pro. Qwen has stronger reasoning scores. The practical difference in code quality is negligible for most tasks.

Decision framework

Your workload	Best choice	Why
Pure text coding (budget)	MiniMax M3	4× cheaper, similar quality
Complex reasoning/math	Qwen 3.7 Max	92.4% GPQA, stronger reasoning
Multimodal (images/video)	MiniMax M3	Qwen is text-only
Computer use / GUI agents	MiniMax M3	Only option
Self-hosting / privacy	MiniMax M3	Open weight
Research / web browsing agents	MiniMax M3	83.5% BrowseComp
Established tooling needed	Qwen 3.7 Max	More mature ecosystem
Long documents (>512K tokens)	Qwen 3.7 Max	Flat pricing vs M3’s 2× above 512K

Using both

Both are available on OpenRouter. Route based on task type:

def choose_model(task):
    if task.has_images or task.has_video or task.needs_browser:
        return "minimax/minimax-m3"
    elif task.type == "math" or task.type == "complex_reasoning":
        return "qwen/qwen3.7-max"
    else:
        return "minimax/minimax-m3"  # Default: cheaper

The broader Chinese AI landscape

These two models sit at the top of an increasingly crowded Chinese frontier:

Model	Input/M	Output/M	Strength
DeepSeek V4-Pro	$0.435	$0.87	Cheapest frontier, highest SWE-bench Verified
MiMo V2.5 Pro	$0.435	$0.87	Token efficiency, agentic coding
MiniMax M3	$0.60	$2.40	Multimodal, computer use, open weight
Qwen 3.7 Max	$2.50	$7.50	Reasoning depth, highest AI Index
Kimi K2.6	$0.60	$2.50	Agent swarms, open weight
Step 3.7 Flash	$0.20	$0.80	400 t/s speed, multimodal

All of these undercut Claude Opus 4.8 ($5/$25) and GPT-5.5 ($5/$30) by 3-60×. See our full Chinese AI pricing analysis.

FAQ

Which is better for coding?

Roughly equivalent. M3 scores slightly higher on SWE-bench Pro (59% vs ~58%). For most coding tasks, you will not notice a quality difference. Pick based on price (M3 wins) or reasoning depth (Qwen wins for complex architecture decisions).

Can I use both through the same tool?

Yes. Both work via OpenRouter on a single API key. Both are OpenAI-compatible. Switch between them by changing the model string.

Which should I pick if I can only choose one?

MiniMax M3. It is cheaper, multimodal, open-weight, and scores equivalently on coding. The only reason to prefer Qwen is if you specifically need deep reasoning capabilities and do not need vision/multimodal.

When will Qwen 3.7 Max be open-weight?

Not announced. Alibaba has released open versions of previous Qwen models (3.6-27B, 3.6-35B) but 3.7 Max remains API-only. M3 weights are expected ~June 10.

How do they compare to DeepSeek V4-Pro?

Both are more expensive than DeepSeek V4-Pro ($0.435/$0.87). DeepSeek scores higher on SWE-bench Verified (80.6%) but is text-only. If you need multimodal, M3 is the cheapest option. If you need pure text coding at minimum cost, DeepSeek wins.

Is the Qwen 3.7 Plus version worth considering?

Qwen 3.7 Plus is the multimodal variant with vision. If you need Qwen + images, Plus exists — but M3 is cheaper and has video + computer use on top of vision.