πŸ€– AI Tools
Β· 6 min read

Qwen 3.7 Max vs MiniMax M3: China's Two Newest Frontier Models Compared (2026)


Qwen 3.7 Max and MiniMax M3 are the two newest Chinese frontier models as of June 2026. Both compete with GPT-5.5 on coding benchmarks. Both cost a fraction of American models. But they take fundamentally different approaches: Qwen is a text-only reasoning powerhouse with a 1M context window. M3 is a multimodal model with native vision, video, and computer use via a novel sparse attention architecture.

This guide breaks down exactly where each wins so you can pick the right one β€” or use both.

Head-to-head comparison

Qwen 3.7 MaxMiniMax M3
DeveloperAlibabaMiniMax (Shanghai)
ReleasedMay 20, 2026June 1, 2026
ArchitectureDense (undisclosed size)MSA (MiniMax Sparse Attention)
Input price$2.50/M$0.60/M
Output price$7.50/M$2.40/M
Context window1M tokens1M tokens (512K guaranteed)
ModalitiesText onlyβœ… Text + images + video
Computer useβŒβœ…
SWE-bench Pro~58%*59.0%
AI Index score56.6β€”
BrowseCompβ€”83.5%
SVG-Benchβ€”63.7%
MCP Atlasβ€”74.2%
Open weight❌ (API only)βœ… (weights in ~10 days)
Self-hostableβŒβœ…
Available on OpenRouterβœ…βœ…

*Qwen 3.7 Max’s SWE-bench Pro score estimated from available benchmark data.

Pricing: M3 is 4Γ— cheaper

The cost difference is substantial:

Qwen 3.7 MaxMiniMax M3Ratio
Input$2.50/M$0.60/M4.2Γ—
Output$7.50/M$2.40/M3.1Γ—
Cache~$0.25/M$0.12/M2Γ—
1hr coding session~$1.50~$0.503Γ—
Monthly (24/7 agent)~$1,080~$3603Γ—

M3 is significantly cheaper. For high-volume workloads, this 3-4Γ— gap adds up fast.

Where Qwen 3.7 Max wins

Reasoning depth

Qwen 3.7 Max scored 92.4% on GPQA Diamond (PhD-level science questions) β€” one of the highest scores recorded. It excels at tasks requiring deep multi-step reasoning, mathematical logic, and complex analysis. Its text-only focus means all compute goes toward reasoning rather than visual processing.

AI Intelligence Index

Qwen 3.7 Max scored 56.6 on Artificial Analysis’s Intelligence Index β€” the highest of any Chinese model at launch. This composite score measures across coding, reasoning, knowledge, and instruction following.

Established ecosystem

Qwen 3.7 has been available since May 20 with OpenRouter, DashScope, and multiple API providers. Community tooling, guides, and benchmarks are more mature. M3 launched June 1 β€” it is brand new with less community data.

Simpler pricing model

Qwen’s pricing is flat regardless of context length. M3 doubles its price above 512K tokens ($1.20/$4.80 for 512K-1M). If you routinely use 500K+ context, Qwen may be cheaper in that range despite higher base rates.

Where MiniMax M3 wins

Multimodal (images + video + computer use)

This is the decisive differentiator. M3 handles images and video natively, plus it can operate a desktop computer. Qwen 3.7 Max is text-only. If your workload involves:

  • Parsing screenshots or UI mockups
  • Processing video content
  • Visual code verification (write code β†’ view output β†’ fix)
  • Browser automation
  • Document/chart analysis

Then M3 is the only option. Qwen cannot do any of this.

Price (4Γ— cheaper)

At $0.60/$2.40 vs $2.50/$7.50, M3 is dramatically cheaper. For coding tasks where both perform similarly, M3 offers better value.

Open weight

M3 weights will be available ~June 10. You can self-host, fine-tune, and run completely offline. Qwen 3.7 Max is API-only with no self-hosting option. For enterprises with data privacy requirements, this alone decides the choice.

Long-context speed (MSA)

M3’s MiniMax Sparse Attention delivers 15.6Γ— faster decoding at 1M tokens compared to standard attention. While both support 1M context, M3 responds faster at long contexts. See our M3 1M context guide.

Browsing and web tasks

83.5% on BrowseComp makes M3 excellent for research agents and web scraping tasks. Qwen does not have comparable browsing benchmarks published.

Coding comparison

Both models compete with GPT-5.5 on coding:

BenchmarkQwen 3.7 MaxMiniMax M3Winner
SWE-bench Pro~58%59.0%M3 (slightly)
Terminal-Bench 2.1β€”66.0%M3 (no Qwen data)
SVG-Benchβ€”63.7%M3
GPQA Diamond92.4%β€”Qwen

For pure text-based coding tasks, both are roughly equivalent. M3 has a slight edge on SWE-bench Pro. Qwen has stronger reasoning scores. The practical difference in code quality is negligible for most tasks.

Decision framework

Your workloadBest choiceWhy
Pure text coding (budget)MiniMax M34Γ— cheaper, similar quality
Complex reasoning/mathQwen 3.7 Max92.4% GPQA, stronger reasoning
Multimodal (images/video)MiniMax M3Qwen is text-only
Computer use / GUI agentsMiniMax M3Only option
Self-hosting / privacyMiniMax M3Open weight
Research / web browsing agentsMiniMax M383.5% BrowseComp
Established tooling neededQwen 3.7 MaxMore mature ecosystem
Long documents (>512K tokens)Qwen 3.7 MaxFlat pricing vs M3’s 2Γ— above 512K

Using both

Both are available on OpenRouter. Route based on task type:

def choose_model(task):
    if task.has_images or task.has_video or task.needs_browser:
        return "minimax/minimax-m3"
    elif task.type == "math" or task.type == "complex_reasoning":
        return "qwen/qwen3.7-max"
    else:
        return "minimax/minimax-m3"  # Default: cheaper

The broader Chinese AI landscape

These two models sit at the top of an increasingly crowded Chinese frontier:

ModelInput/MOutput/MStrength
DeepSeek V4-Pro$0.435$0.87Cheapest frontier, highest SWE-bench Verified
MiMo V2.5 Pro$0.435$0.87Token efficiency, agentic coding
MiniMax M3$0.60$2.40Multimodal, computer use, open weight
Qwen 3.7 Max$2.50$7.50Reasoning depth, highest AI Index
Kimi K2.6$0.60$2.50Agent swarms, open weight
Step 3.7 Flash$0.20$0.80400 t/s speed, multimodal

All of these undercut Claude Opus 4.8 ($5/$25) and GPT-5.5 ($5/$30) by 3-60Γ—. See our full Chinese AI pricing analysis.

FAQ

Which is better for coding?

Roughly equivalent. M3 scores slightly higher on SWE-bench Pro (59% vs ~58%). For most coding tasks, you will not notice a quality difference. Pick based on price (M3 wins) or reasoning depth (Qwen wins for complex architecture decisions).

Can I use both through the same tool?

Yes. Both work via OpenRouter on a single API key. Both are OpenAI-compatible. Switch between them by changing the model string.

Which should I pick if I can only choose one?

MiniMax M3. It is cheaper, multimodal, open-weight, and scores equivalently on coding. The only reason to prefer Qwen is if you specifically need deep reasoning capabilities and do not need vision/multimodal.

When will Qwen 3.7 Max be open-weight?

Not announced. Alibaba has released open versions of previous Qwen models (3.6-27B, 3.6-35B) but 3.7 Max remains API-only. M3 weights are expected ~June 10.

How do they compare to DeepSeek V4-Pro?

Both are more expensive than DeepSeek V4-Pro ($0.435/$0.87). DeepSeek scores higher on SWE-bench Verified (80.6%) but is text-only. If you need multimodal, M3 is the cheapest option. If you need pure text coding at minimum cost, DeepSeek wins.

Is the Qwen 3.7 Plus version worth considering?

Qwen 3.7 Plus is the multimodal variant with vision. If you need Qwen + images, Plus exists β€” but M3 is cheaper and has video + computer use on top of vision.