May 22, 2026 · 6 min read

Qwen 3.7 Complete Guide: Alibaba's Strongest AI Model Yet (2026)

Qwen 3.7 is Alibaba’s newest flagship AI model family, announced May 20-21, 2026 at the Alibaba Cloud Summit in Hangzhou. It comes in two variants: Qwen3.7-Max (text-only flagship) and Qwen3.7-Plus (multimodal with vision). Both are closed-weights and API-only for now.

The headline numbers: Intelligence Index v4.0 score of 56.6 (5th overall, #1 Chinese model), 1 million token context window, 50.8% on Terminal-Bench Hard, and the lowest hallucination rate among frontier models at 22.9%.

If you used Qwen 3.6, this is a significant step up. If you haven’t tried Qwen models before, 3.7 Max is the best entry point Alibaba has ever offered.

Key specs

Spec	Qwen3.7-Max	Qwen3.7-Plus
Type	Text flagship	Multimodal (vision + text)
Context window	1M tokens	1M tokens
Intelligence Index v4.0	56.6	TBD
Terminal-Bench Hard	50.8%	TBD
Humanity’s Last Exam	38.1%	TBD
CritPt	13.4%	TBD
Apex Math	44.5	TBD
MCP-Atlas	76.4	TBD
Arena AI Elo	1,475	TBD
Hallucination rate	22.9% (AA-Omniscience)	TBD
Pricing (input)	$2.50/1M tokens	TBD
Pricing (output)	$7.50/1M tokens	TBD
Open weights	No (expected later)	No
Local execution	Not possible yet	Not possible yet

Benchmark performance

Here’s how Qwen 3.7 Max stacks up against the current frontier:

Model	Intelligence Index v4.0	Terminal-Bench Hard	Humanity’s Last Exam	CritPt	Apex Math
GPT-5.5	60.2	N/A	N/A	N/A	N/A
Claude Opus 4.7	57.3	N/A	N/A	N/A	N/A
Gemini 3.1 Pro Preview	57.2	N/A	N/A	N/A	N/A
Qwen 3.7 Max	56.6	50.8%	38.1%	13.4%	44.5
Gemini 3.5 Flash	55.3	N/A	N/A	N/A	N/A

Qwen 3.7 Max is the #1 Chinese model on Intelligence Index v4.0 and beats Gemini 3.5 Flash by 1.3 points. On Apex Math (44.5), it surpasses Claude Opus 4.6 Max (34.5) by a wide margin.

The CritPt score of 13.4% is almost 4x the 3.6 result (3.7%), making it the strongest Chinese model on critical point reasoning.

Pricing

Qwen 3.7 Max is competitively priced:

Provider	Input	Output
DashScope (native)	$2.50/1M tokens	$7.50/1M tokens
OpenRouter	$2.50/1M tokens	$7.50/1M tokens

For comparison, Claude Opus 4.7 costs $15/$75 per million tokens. GPT-5.5 costs $10/$30. Qwen 3.7 Max delivers frontier-adjacent performance at a fraction of the price.

See our full API setup guide for step-by-step instructions.

Context window: 1 million tokens

Qwen 3.7 Max supports 1 million tokens of context, up from 256K on Qwen 3.6 Max. That’s roughly:

750,000 words of text
An entire medium-sized codebase
Multiple books in a single prompt

This makes it viable for repository-level code analysis, long document processing, and multi-session agent memory without external retrieval.

Autonomous capabilities

Alibaba demonstrated a 35-hour autonomous operation session with Qwen 3.7 Max, executing 1,158 tool calls without human intervention. This positions it as a serious contender for long-running agent workflows.

Key agent metrics:

MCP-Atlas: 76.4 points (strong tool use and protocol adherence)
Arena AI Elo: 1,475 (#13 overall)
Sustained operation: 35 hours demonstrated
Tool calls: 1,158 in a single session

If you’re building agents that need to run for hours or days, Qwen 3.7 Max has the endurance and reliability to handle it.

Cross-harness support

Qwen 3.7 Max supports the Anthropic API protocol natively. This means it works directly with tools built for Claude, including Claude Code. You can point Claude Code at Qwen 3.7 Max and it works out of the box.

This is a big deal for developers who want to use Claude Code’s interface but prefer Qwen’s pricing or performance characteristics. No adapter layer needed.

Who should use Qwen 3.7

Use Qwen 3.7 Max if you:

Need frontier-level reasoning at budget pricing
Build long-running autonomous agents
Want 1M context for codebase analysis
Use Claude Code but want cheaper inference
Need low hallucination rates for factual tasks
Work with Chinese language content (strongest Chinese model)

Use Qwen 3.7 Plus if you:

Need vision/multimodal capabilities
Want image understanding alongside text reasoning

Skip it if you:

Need to run models locally (closed weights, see alternatives)
Need absolute top performance (GPT-5.5 and Claude Opus 4.7 still lead)
Require open-source licensing for compliance

Limitations

No open weights yet. Following the 3.6 pattern, open-weight variants will likely come weeks to months after the API launch.
API-only. You cannot run Qwen 3.7 locally. No GGUF, no Ollama, no vLLM support yet.
Closed weights. No fine-tuning, no self-hosting, no auditing the model.
Behind top 3. GPT-5.5 (60.2), Claude Opus 4.7 (57.3), and Gemini 3.1 Pro Preview (57.2) still score higher on Intelligence Index.
Arena ranking. Elo 1,475 places it #13 overall, suggesting real-world chat performance lags behind benchmark scores.

How to get started

DashScope API: Sign up at dashscope.aliyuncs.com, get an API key, and start making requests. See our API guide.
OpenRouter: Available as qwen/qwen3.7-max at $2.50/1M input. Works with any OpenAI-compatible client.
Claude Code: Point your Claude Code installation at the Qwen 3.7 Max endpoint using the Anthropic protocol compatibility.

What changed from Qwen 3.6

For a detailed comparison, see Qwen 3.7 vs 3.6. The short version:

Context: 256K to 1M tokens
Terminal-Bench Hard: 43.9% to 50.8%
Humanity’s Last Exam: 28.9% to 38.1%
CritPt: 3.7% to 13.4% (almost 4x)
New: Anthropic API protocol support
New: 35-hour autonomous operation capability

How it compares to competitors

vs Gemini 3.5 Flash: Qwen 3.7 wins on Intelligence Index (56.6 vs 55.3) and math, Gemini wins on speed
vs Claude Opus 4.7: Claude leads on Intelligence Index (57.3 vs 56.6), Qwen wins massively on price ($2.50 vs $15 input)

FAQ

Is Qwen 3.7 free?

No. Qwen 3.7 Max costs $2.50/1M input tokens and $7.50/1M output tokens. There’s no free tier, but it’s significantly cheaper than Claude or GPT alternatives.

Can I run Qwen 3.7 locally?

Not yet. Both Max and Plus are closed-weights, API-only models. Open-weight variants are expected to follow based on Alibaba’s release pattern with 3.6. See our local guide for alternatives.

Does Qwen 3.7 work with Claude Code?

Yes. Qwen 3.7 Max supports the Anthropic API protocol natively, so it works as a drop-in backend for Claude Code without any adapter.

How does Qwen 3.7 compare to GPT-5.5?

GPT-5.5 scores 60.2 on Intelligence Index v4.0 vs Qwen 3.7’s 56.6. GPT-5.5 is stronger overall, but costs 4x more ($10/1M input vs $2.50/1M input).

What’s the context window?

1 million tokens. That’s roughly 750,000 words or an entire medium codebase in a single prompt.

Is Qwen 3.7 the best Chinese AI model?

Yes. It’s #1 among Chinese models on Intelligence Index v4.0 (56.6) and CritPt (13.4%). It’s also the first Chinese model to break into the top 5 overall on Intelligence Index.

When will open weights be released?

No official date. Based on the 3.6 pattern (API first, open weights weeks later), expect open-weight variants sometime in June or July 2026.

What’s the hallucination rate?

22.9% on AA-Omniscience, which is the lowest among all frontier models tested. This makes it particularly suitable for factual retrieval and knowledge-intensive tasks.