May 22, 2026 · 5 min read

Qwen 3.7 vs 3.6: What Changed and Should You Upgrade?

Alibaba released Qwen 3.7 on May 20-21, 2026, just one month after Qwen 3.6. The monthly cadence continues: 3.5 in March, 3.6 in April, 3.7 in May.

The question: is it worth upgrading? Short answer: yes, if you’re using the API. The improvements are substantial across every benchmark, the context window quadrupled, and you get new capabilities like Anthropic protocol support.

Benchmark comparison

Benchmark	Qwen 3.6 Max	Qwen 3.7 Max	Change
Intelligence Index v4.0	~52 (estimated)	56.6	+4.6
Terminal-Bench Hard	43.9%	50.8%	+6.9%
Humanity’s Last Exam	28.9%	38.1%	+9.2%
CritPt	3.7%	13.4%	+9.7% (3.6x)
Apex Math	N/A	44.5	New benchmark
MCP-Atlas	N/A	76.4	New benchmark
Arena AI Elo	N/A	1,475 (#13)	New ranking
Hallucination (AA-Omniscience)	N/A	22.9% (lowest)	New metric

Every single benchmark shows meaningful improvement. The CritPt jump from 3.7% to 13.4% is the standout, nearly 4x improvement in critical point reasoning. Terminal-Bench Hard went from 43.9% to 50.8%, crossing the 50% threshold for the first time.

Context window

	Qwen 3.6 Max	Qwen 3.7 Max
Context window	256K tokens	1M tokens
Improvement	N/A	4x

This is the single biggest practical upgrade. Going from 256K to 1M tokens means you can now fit entire codebases, full documentation sets, or multiple long documents in a single prompt without chunking or retrieval.

For agent workflows, this means longer conversation histories, more tool call results, and less context management overhead.

Architecture changes

Both models are closed-weights, so internal architecture details are limited. What we know:

Qwen 3.6: Hybrid linear attention + sparse MoE, available in multiple sizes (35B-A3B, 27B dense, Plus, Max Preview, Flash)
Qwen 3.7: Two variants only (Max and Plus), likely evolved architecture optimized for longer context and autonomous operation

The 3.7 release focuses on the flagship API models rather than the open-weight ecosystem. Alibaba appears to be shipping the API first and following up with open weights later.

New capabilities in 3.7

Anthropic API protocol support

Qwen 3.7 Max natively supports the Anthropic API protocol. This means tools built for Claude (including Claude Code) work directly with Qwen 3.7 without any adapter or translation layer.

This didn’t exist in 3.6. It’s a strategic move that lets developers use Qwen as a drop-in replacement for Claude in existing toolchains.

35-hour autonomous operation

Alibaba demonstrated Qwen 3.7 Max running autonomously for 35 hours, executing 1,158 tool calls. This is a new capability class. While 3.6 supported tool calling, the sustained autonomous operation at this scale is new.

Lower hallucination rate

22.9% on AA-Omniscience is the lowest among frontier models. This wasn’t a tracked metric for 3.6, but the improvement in factual reliability is notable for production use cases.

Pricing comparison

	Qwen 3.6 Max Preview	Qwen 3.7 Max
Input	Standard Aliyun pricing	$2.50/1M tokens
Output	Standard Aliyun pricing	$7.50/1M tokens
OpenRouter	Free (preview)	$2.50/1M input

Qwen 3.6 Plus was available free on OpenRouter during its preview period. Qwen 3.7 Max is a paid model from day one at $2.50/$7.50 per million tokens. This is still extremely competitive compared to Western frontier models.

Model variants comparison

Variant	Qwen 3.6	Qwen 3.7
Max/Flagship	Max Preview	Max
Plus/Mid-tier	Plus (free preview)	Plus (multimodal)
Flash/Speed	Flash	Not yet
Open-weight large	35B-A3B (Apache 2.0)	Not yet
Open-weight dense	27B	Not yet

Qwen 3.6 had a broader model family at this point in its lifecycle. Qwen 3.7 launched with just Max and Plus, with open-weight variants expected to follow.

Should you upgrade?

Upgrade if you:

Use Qwen 3.6 Max/Plus via API and want better performance
Need more than 256K context
Build autonomous agents that run for extended periods
Want to use Qwen with Claude Code or other Anthropic-protocol tools
Need lower hallucination rates for factual tasks

Stay on 3.6 if you:

Run models locally (3.7 has no open weights yet, 3.6 35B-A3B and 27B still work)
Need the free tier (3.6 Plus on OpenRouter may still be free)
Have workflows that depend on specific 3.6 behavior and can’t risk regression

Migration notes

API endpoint changes

If you’re using DashScope, update your model parameter:

# Before (3.6)
model = "qwen-max-preview"

# After (3.7)
model = "qwen3.7-max"

OpenRouter

# Before (3.6)
model = "qwen/qwen-max-preview"

# After (3.7)
model = "qwen/qwen3.7-max"

Behavior differences

Output style may differ slightly. Test your prompts before switching production traffic.
The 1M context window means you can send larger payloads, but costs scale with token count.
Tool calling format is compatible but may have improved reliability.

For full API setup instructions, see our Qwen 3.7 API guide.

The bigger picture

Alibaba’s monthly release cadence is aggressive. Each version brings meaningful improvements:

3.5 (March 2026): Established Qwen as a serious coding model
3.6 (April 2026): Open weights, 35B-A3B, free API preview
3.7 (May 2026): Frontier performance, 1M context, autonomous agents

At this pace, Qwen 3.8 could arrive in June. The gap between Qwen and the top 3 (GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro) is narrowing with each release.

For a complete overview of Qwen 3.7’s capabilities, see our complete guide.

FAQ

Is Qwen 3.7 backward compatible with 3.6 API calls?

The API format is compatible (OpenAI-style), but you need to update the model name. Prompts that worked with 3.6 will work with 3.7, though outputs may differ.

Can I still use Qwen 3.6?

Yes. Qwen 3.6 models remain available. The open-weight variants (35B-A3B, 27B) are still the best option for local deployment.

Is the 1M context window real or just marketing?

It’s a real 1M token context window. Whether performance degrades at the edges (common with very long contexts) remains to be tested at scale, but the capability is there.

Why did Alibaba skip open weights for 3.7?

They didn’t skip them permanently. Following the 3.6 pattern, open-weight variants will likely come weeks after the API launch. Alibaba ships API first to monetize, then releases open weights for community adoption.

How much faster is 3.7 than 3.6?

Speed benchmarks haven’t been published yet. The focus of 3.7 is on capability (longer context, better reasoning, autonomous operation) rather than raw inference speed.

Does 3.7 replace 3.6 for coding tasks?

For API users, yes. Qwen 3.7 Max is strictly better than 3.6 Max on every published benchmark. For local users, 3.6 remains the only option until 3.7 open weights drop.