🤖 AI Tools
· 6 min read

Qwen 3.7 Complete Guide: Alibaba's Strongest AI Model Yet (2026)


Qwen 3.7 is Alibaba’s newest flagship AI model family, announced May 20-21, 2026 at the Alibaba Cloud Summit in Hangzhou. It comes in two variants: Qwen3.7-Max (text-only flagship) and Qwen3.7-Plus (multimodal with vision). Both are closed-weights and API-only for now.

The headline numbers: Intelligence Index v4.0 score of 56.6 (5th overall, #1 Chinese model), 1 million token context window, 50.8% on Terminal-Bench Hard, and the lowest hallucination rate among frontier models at 22.9%.

If you used Qwen 3.6, this is a significant step up. If you haven’t tried Qwen models before, 3.7 Max is the best entry point Alibaba has ever offered.

Key specs

SpecQwen3.7-MaxQwen3.7-Plus
TypeText flagshipMultimodal (vision + text)
Context window1M tokens1M tokens
Intelligence Index v4.056.6TBD
Terminal-Bench Hard50.8%TBD
Humanity’s Last Exam38.1%TBD
CritPt13.4%TBD
Apex Math44.5TBD
MCP-Atlas76.4TBD
Arena AI Elo1,475TBD
Hallucination rate22.9% (AA-Omniscience)TBD
Pricing (input)$2.50/1M tokensTBD
Pricing (output)$7.50/1M tokensTBD
Open weightsNo (expected later)No
Local executionNot possible yetNot possible yet

Benchmark performance

Here’s how Qwen 3.7 Max stacks up against the current frontier:

ModelIntelligence Index v4.0Terminal-Bench HardHumanity’s Last ExamCritPtApex Math
GPT-5.560.2N/AN/AN/AN/A
Claude Opus 4.757.3N/AN/AN/AN/A
Gemini 3.1 Pro Preview57.2N/AN/AN/AN/A
Qwen 3.7 Max56.650.8%38.1%13.4%44.5
Gemini 3.5 Flash55.3N/AN/AN/AN/A

Qwen 3.7 Max is the #1 Chinese model on Intelligence Index v4.0 and beats Gemini 3.5 Flash by 1.3 points. On Apex Math (44.5), it surpasses Claude Opus 4.6 Max (34.5) by a wide margin.

The CritPt score of 13.4% is almost 4x the 3.6 result (3.7%), making it the strongest Chinese model on critical point reasoning.

Pricing

Qwen 3.7 Max is competitively priced:

ProviderInputOutput
DashScope (native)$2.50/1M tokens$7.50/1M tokens
OpenRouter$2.50/1M tokens$7.50/1M tokens

For comparison, Claude Opus 4.7 costs $15/$75 per million tokens. GPT-5.5 costs $10/$30. Qwen 3.7 Max delivers frontier-adjacent performance at a fraction of the price.

See our full API setup guide for step-by-step instructions.

Context window: 1 million tokens

Qwen 3.7 Max supports 1 million tokens of context, up from 256K on Qwen 3.6 Max. That’s roughly:

  • 750,000 words of text
  • An entire medium-sized codebase
  • Multiple books in a single prompt

This makes it viable for repository-level code analysis, long document processing, and multi-session agent memory without external retrieval.

Autonomous capabilities

Alibaba demonstrated a 35-hour autonomous operation session with Qwen 3.7 Max, executing 1,158 tool calls without human intervention. This positions it as a serious contender for long-running agent workflows.

Key agent metrics:

  • MCP-Atlas: 76.4 points (strong tool use and protocol adherence)
  • Arena AI Elo: 1,475 (#13 overall)
  • Sustained operation: 35 hours demonstrated
  • Tool calls: 1,158 in a single session

If you’re building agents that need to run for hours or days, Qwen 3.7 Max has the endurance and reliability to handle it.

Cross-harness support

Qwen 3.7 Max supports the Anthropic API protocol natively. This means it works directly with tools built for Claude, including Claude Code. You can point Claude Code at Qwen 3.7 Max and it works out of the box.

This is a big deal for developers who want to use Claude Code’s interface but prefer Qwen’s pricing or performance characteristics. No adapter layer needed.

Who should use Qwen 3.7

Use Qwen 3.7 Max if you:

  • Need frontier-level reasoning at budget pricing
  • Build long-running autonomous agents
  • Want 1M context for codebase analysis
  • Use Claude Code but want cheaper inference
  • Need low hallucination rates for factual tasks
  • Work with Chinese language content (strongest Chinese model)

Use Qwen 3.7 Plus if you:

  • Need vision/multimodal capabilities
  • Want image understanding alongside text reasoning

Skip it if you:

  • Need to run models locally (closed weights, see alternatives)
  • Need absolute top performance (GPT-5.5 and Claude Opus 4.7 still lead)
  • Require open-source licensing for compliance

Limitations

  • No open weights yet. Following the 3.6 pattern, open-weight variants will likely come weeks to months after the API launch.
  • API-only. You cannot run Qwen 3.7 locally. No GGUF, no Ollama, no vLLM support yet.
  • Closed weights. No fine-tuning, no self-hosting, no auditing the model.
  • Behind top 3. GPT-5.5 (60.2), Claude Opus 4.7 (57.3), and Gemini 3.1 Pro Preview (57.2) still score higher on Intelligence Index.
  • Arena ranking. Elo 1,475 places it #13 overall, suggesting real-world chat performance lags behind benchmark scores.

How to get started

  1. DashScope API: Sign up at dashscope.aliyuncs.com, get an API key, and start making requests. See our API guide.
  2. OpenRouter: Available as qwen/qwen3.7-max at $2.50/1M input. Works with any OpenAI-compatible client.
  3. Claude Code: Point your Claude Code installation at the Qwen 3.7 Max endpoint using the Anthropic protocol compatibility.

What changed from Qwen 3.6

For a detailed comparison, see Qwen 3.7 vs 3.6. The short version:

  • Context: 256K to 1M tokens
  • Terminal-Bench Hard: 43.9% to 50.8%
  • Humanity’s Last Exam: 28.9% to 38.1%
  • CritPt: 3.7% to 13.4% (almost 4x)
  • New: Anthropic API protocol support
  • New: 35-hour autonomous operation capability

How it compares to competitors

  • vs Gemini 3.5 Flash: Qwen 3.7 wins on Intelligence Index (56.6 vs 55.3) and math, Gemini wins on speed
  • vs Claude Opus 4.7: Claude leads on Intelligence Index (57.3 vs 56.6), Qwen wins massively on price ($2.50 vs $15 input)

FAQ

Is Qwen 3.7 free?

No. Qwen 3.7 Max costs $2.50/1M input tokens and $7.50/1M output tokens. There’s no free tier, but it’s significantly cheaper than Claude or GPT alternatives.

Can I run Qwen 3.7 locally?

Not yet. Both Max and Plus are closed-weights, API-only models. Open-weight variants are expected to follow based on Alibaba’s release pattern with 3.6. See our local guide for alternatives.

Does Qwen 3.7 work with Claude Code?

Yes. Qwen 3.7 Max supports the Anthropic API protocol natively, so it works as a drop-in backend for Claude Code without any adapter.

How does Qwen 3.7 compare to GPT-5.5?

GPT-5.5 scores 60.2 on Intelligence Index v4.0 vs Qwen 3.7’s 56.6. GPT-5.5 is stronger overall, but costs 4x more ($10/1M input vs $2.50/1M input).

What’s the context window?

1 million tokens. That’s roughly 750,000 words or an entire medium codebase in a single prompt.

Is Qwen 3.7 the best Chinese AI model?

Yes. It’s #1 among Chinese models on Intelligence Index v4.0 (56.6) and CritPt (13.4%). It’s also the first Chinese model to break into the top 5 overall on Intelligence Index.

When will open weights be released?

No official date. Based on the 3.6 pattern (API first, open weights weeks later), expect open-weight variants sometime in June or July 2026.

What’s the hallucination rate?

22.9% on AA-Omniscience, which is the lowest among all frontier models tested. This makes it particularly suitable for factual retrieval and knowledge-intensive tasks.