Qwen 3.7 is Alibaba’s newest flagship AI model family, announced May 20-21, 2026 at the Alibaba Cloud Summit in Hangzhou. It comes in two variants: Qwen3.7-Max (text-only flagship) and Qwen3.7-Plus (multimodal with vision). Both are closed-weights and API-only for now.
The headline numbers: Intelligence Index v4.0 score of 56.6 (5th overall, #1 Chinese model), 1 million token context window, 50.8% on Terminal-Bench Hard, and the lowest hallucination rate among frontier models at 22.9%.
If you used Qwen 3.6, this is a significant step up. If you haven’t tried Qwen models before, 3.7 Max is the best entry point Alibaba has ever offered.
Key specs
| Spec | Qwen3.7-Max | Qwen3.7-Plus |
|---|---|---|
| Type | Text flagship | Multimodal (vision + text) |
| Context window | 1M tokens | 1M tokens |
| Intelligence Index v4.0 | 56.6 | TBD |
| Terminal-Bench Hard | 50.8% | TBD |
| Humanity’s Last Exam | 38.1% | TBD |
| CritPt | 13.4% | TBD |
| Apex Math | 44.5 | TBD |
| MCP-Atlas | 76.4 | TBD |
| Arena AI Elo | 1,475 | TBD |
| Hallucination rate | 22.9% (AA-Omniscience) | TBD |
| Pricing (input) | $2.50/1M tokens | TBD |
| Pricing (output) | $7.50/1M tokens | TBD |
| Open weights | No (expected later) | No |
| Local execution | Not possible yet | Not possible yet |
Benchmark performance
Here’s how Qwen 3.7 Max stacks up against the current frontier:
| Model | Intelligence Index v4.0 | Terminal-Bench Hard | Humanity’s Last Exam | CritPt | Apex Math |
|---|---|---|---|---|---|
| GPT-5.5 | 60.2 | N/A | N/A | N/A | N/A |
| Claude Opus 4.7 | 57.3 | N/A | N/A | N/A | N/A |
| Gemini 3.1 Pro Preview | 57.2 | N/A | N/A | N/A | N/A |
| Qwen 3.7 Max | 56.6 | 50.8% | 38.1% | 13.4% | 44.5 |
| Gemini 3.5 Flash | 55.3 | N/A | N/A | N/A | N/A |
Qwen 3.7 Max is the #1 Chinese model on Intelligence Index v4.0 and beats Gemini 3.5 Flash by 1.3 points. On Apex Math (44.5), it surpasses Claude Opus 4.6 Max (34.5) by a wide margin.
The CritPt score of 13.4% is almost 4x the 3.6 result (3.7%), making it the strongest Chinese model on critical point reasoning.
Pricing
Qwen 3.7 Max is competitively priced:
| Provider | Input | Output |
|---|---|---|
| DashScope (native) | $2.50/1M tokens | $7.50/1M tokens |
| OpenRouter | $2.50/1M tokens | $7.50/1M tokens |
For comparison, Claude Opus 4.7 costs $15/$75 per million tokens. GPT-5.5 costs $10/$30. Qwen 3.7 Max delivers frontier-adjacent performance at a fraction of the price.
See our full API setup guide for step-by-step instructions.
Context window: 1 million tokens
Qwen 3.7 Max supports 1 million tokens of context, up from 256K on Qwen 3.6 Max. That’s roughly:
- 750,000 words of text
- An entire medium-sized codebase
- Multiple books in a single prompt
This makes it viable for repository-level code analysis, long document processing, and multi-session agent memory without external retrieval.
Autonomous capabilities
Alibaba demonstrated a 35-hour autonomous operation session with Qwen 3.7 Max, executing 1,158 tool calls without human intervention. This positions it as a serious contender for long-running agent workflows.
Key agent metrics:
- MCP-Atlas: 76.4 points (strong tool use and protocol adherence)
- Arena AI Elo: 1,475 (#13 overall)
- Sustained operation: 35 hours demonstrated
- Tool calls: 1,158 in a single session
If you’re building agents that need to run for hours or days, Qwen 3.7 Max has the endurance and reliability to handle it.
Cross-harness support
Qwen 3.7 Max supports the Anthropic API protocol natively. This means it works directly with tools built for Claude, including Claude Code. You can point Claude Code at Qwen 3.7 Max and it works out of the box.
This is a big deal for developers who want to use Claude Code’s interface but prefer Qwen’s pricing or performance characteristics. No adapter layer needed.
Who should use Qwen 3.7
Use Qwen 3.7 Max if you:
- Need frontier-level reasoning at budget pricing
- Build long-running autonomous agents
- Want 1M context for codebase analysis
- Use Claude Code but want cheaper inference
- Need low hallucination rates for factual tasks
- Work with Chinese language content (strongest Chinese model)
Use Qwen 3.7 Plus if you:
- Need vision/multimodal capabilities
- Want image understanding alongside text reasoning
Skip it if you:
- Need to run models locally (closed weights, see alternatives)
- Need absolute top performance (GPT-5.5 and Claude Opus 4.7 still lead)
- Require open-source licensing for compliance
Limitations
- No open weights yet. Following the 3.6 pattern, open-weight variants will likely come weeks to months after the API launch.
- API-only. You cannot run Qwen 3.7 locally. No GGUF, no Ollama, no vLLM support yet.
- Closed weights. No fine-tuning, no self-hosting, no auditing the model.
- Behind top 3. GPT-5.5 (60.2), Claude Opus 4.7 (57.3), and Gemini 3.1 Pro Preview (57.2) still score higher on Intelligence Index.
- Arena ranking. Elo 1,475 places it #13 overall, suggesting real-world chat performance lags behind benchmark scores.
How to get started
- DashScope API: Sign up at dashscope.aliyuncs.com, get an API key, and start making requests. See our API guide.
- OpenRouter: Available as
qwen/qwen3.7-maxat $2.50/1M input. Works with any OpenAI-compatible client. - Claude Code: Point your Claude Code installation at the Qwen 3.7 Max endpoint using the Anthropic protocol compatibility.
What changed from Qwen 3.6
For a detailed comparison, see Qwen 3.7 vs 3.6. The short version:
- Context: 256K to 1M tokens
- Terminal-Bench Hard: 43.9% to 50.8%
- Humanity’s Last Exam: 28.9% to 38.1%
- CritPt: 3.7% to 13.4% (almost 4x)
- New: Anthropic API protocol support
- New: 35-hour autonomous operation capability
How it compares to competitors
- vs Gemini 3.5 Flash: Qwen 3.7 wins on Intelligence Index (56.6 vs 55.3) and math, Gemini wins on speed
- vs Claude Opus 4.7: Claude leads on Intelligence Index (57.3 vs 56.6), Qwen wins massively on price ($2.50 vs $15 input)
FAQ
Is Qwen 3.7 free?
No. Qwen 3.7 Max costs $2.50/1M input tokens and $7.50/1M output tokens. There’s no free tier, but it’s significantly cheaper than Claude or GPT alternatives.
Can I run Qwen 3.7 locally?
Not yet. Both Max and Plus are closed-weights, API-only models. Open-weight variants are expected to follow based on Alibaba’s release pattern with 3.6. See our local guide for alternatives.
Does Qwen 3.7 work with Claude Code?
Yes. Qwen 3.7 Max supports the Anthropic API protocol natively, so it works as a drop-in backend for Claude Code without any adapter.
How does Qwen 3.7 compare to GPT-5.5?
GPT-5.5 scores 60.2 on Intelligence Index v4.0 vs Qwen 3.7’s 56.6. GPT-5.5 is stronger overall, but costs 4x more ($10/1M input vs $2.50/1M input).
What’s the context window?
1 million tokens. That’s roughly 750,000 words or an entire medium codebase in a single prompt.
Is Qwen 3.7 the best Chinese AI model?
Yes. It’s #1 among Chinese models on Intelligence Index v4.0 (56.6) and CritPt (13.4%). It’s also the first Chinese model to break into the top 5 overall on Intelligence Index.
When will open weights be released?
No official date. Based on the 3.6 pattern (API first, open weights weeks later), expect open-weight variants sometime in June or July 2026.
What’s the hallucination rate?
22.9% on AA-Omniscience, which is the lowest among all frontier models tested. This makes it particularly suitable for factual retrieval and knowledge-intensive tasks.