
Qwen 3.6 Max Preview: Alibaba's New Flagship Tops 6 Coding Benchmarks (2026)


Alibaba just dropped its strongest model yet. Qwen 3.6 Max Preview is a closed-weights proprietary flagship that tops six coding benchmarks, scores 52 on the AA Intelligence Index (the highest of any Chinese model), and introduces a preserve_thinking feature built for agentic coding workflows. Released on April 20, 2026, it runs a Mixture-of-Experts architecture with 35B total parameters and only 3B active at inference time, keeping latency low while pushing benchmark scores past every competitor in its class.

If you have been following the Qwen 3.6 complete guide, Max Preview sits at the top of the lineup. It shares the same MoE skeleton as the open-weight 35B-A3B variant but benefits from proprietary training data, longer RLHF runs, and closed-weight optimizations that Alibaba reserves for its API-only tier.

This guide covers architecture, benchmarks, pricing, the preserve_thinking feature, and how Max Preview compares to Plus, the open-weight 27B, and DeepSeek V4 Pro.

Architecture: MoE 35B Total, 3B Active

Qwen 3.6 Max Preview uses the same Mixture-of-Experts topology found in the open-weight Qwen 3.6-27B and 35B-A3B releases, but with key differences under the hood:

  • Total parameters: 35 billion
  • Active parameters per token: 3 billion
  • Context window: 256K tokens
  • Architecture type: Mixture-of-Experts (MoE), transformer-based
  • Weights: Closed (proprietary, API-only)
  • Training: Extended RLHF with proprietary data pipelines not available in the open-weight releases

The 35B/3B split means that only a small fraction of the network fires for each token. This keeps per-token compute costs comparable to a dense 3B model while giving the router access to a much larger pool of specialized expert layers. The result is strong performance at a fraction of the cost you would expect from a 35B dense model.
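The compute claim is easy to sanity-check with a back-of-envelope sketch. The snippet below uses the common approximation of roughly 2 FLOPs per active parameter per token for a forward pass; the approximation and the resulting figures are illustrative assumptions, not numbers Alibaba has published:

```python
# Rough per-token compute comparison: dense 35B vs MoE with 3B active.
# Assumes ~2 FLOPs per parameter per token for a forward pass; real
# numbers depend on architecture details not publicly disclosed.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for a single token."""
    return 2 * active_params

dense_35b = forward_flops_per_token(35e9)     # all 35B parameters fire
moe_3b_active = forward_flops_per_token(3e9)  # only routed experts fire

ratio = dense_35b / moe_3b_active
print(f"Dense 35B:        {dense_35b:.1e} FLOPs/token")
print(f"MoE (3B active):  {moe_3b_active:.1e} FLOPs/token")
print(f"Per-token compute reduction: ~{ratio:.1f}x")
```

Under this approximation, the MoE configuration does roughly 35/3 ≈ 11.7x less compute per token than a dense 35B model, which is the intuition behind "dense-3B cost, 35B capacity."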

The routing mechanism selects a small subset of experts for each token based on learned gating functions. This sparse activation pattern is what makes MoE models so efficient: you get the representational capacity of 35B parameters without paying the compute cost of running all of them on every forward pass. For coding tasks, this means the model can maintain deep specialization across different programming languages, frameworks, and problem types without sacrificing inference speed.
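The gating step described above can be sketched in a few lines. This is a generic top-k routing sketch, not Qwen's actual router: the expert count, k value, and gating scores below are all illustrative assumptions, since Alibaba has not published the model's routing configuration:

```python
import math
import random

# Minimal sketch of top-k expert routing, the core of MoE sparse activation.
# Expert count (16) and k (2) are illustrative; Qwen's real values are not public.

def top_k_routing(logits, k=2):
    """Pick the k highest-scoring experts and softmax over their scores only."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:k]
    m = max(logits[i] for i in chosen)                 # numerical stability
    exp = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exp)
    return chosen, [e / total for e in exp]            # experts to run, mixing weights

random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(16)]  # one token's gate scores
experts, weights = top_k_routing(gate_logits, k=2)
# Only 2 of 16 expert FFNs run for this token; their outputs are blended
# by `weights`, so per-token compute scales with k, not with expert count.
```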

Compared to the open-weight 35B-A3B, Max Preview uses the same expert count and routing strategy but has been trained longer on curated proprietary datasets. Alibaba has not disclosed the exact training corpus, but internal benchmarks suggest meaningfully better code generation, multi-step reasoning, and instruction following. The closed-weight nature allows Alibaba to incorporate data sources that cannot be redistributed under open licenses, giving Max Preview a training advantage that the open-weight variants cannot replicate.

The 256K context window matches the longest context available in the Qwen 3.6 family, including the Flash variant. This makes Max Preview suitable for large codebase analysis, long document summarization, and multi-file agentic tasks. In practice, 256K tokens can hold roughly 500 to 700 files of typical source code, making it possible to load an entire small-to-medium repository into a single prompt for holistic analysis.
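The 500-to-700-file figure is easy to sanity-check. A rough sketch, assuming an average source file in the 360-500 token range (the per-file averages and prompt overhead below are assumptions; real files vary widely):

```python
# Rough sanity check: how many source files fit in a 256K-token window?
CONTEXT_TOKENS = 256_000

def files_that_fit(avg_tokens_per_file: int, overhead_tokens: int = 4_000) -> int:
    """Estimate file capacity, reserving some tokens for the prompt itself."""
    return (CONTEXT_TOKENS - overhead_tokens) // avg_tokens_per_file

for avg in (360, 420, 500):  # assumed per-file averages
    print(f"~{avg} tokens/file -> {files_that_fit(avg)} files")
```

With these assumptions the capacity lands between roughly 500 and 700 files, consistent with the estimate above.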

Benchmark Results: Tops 6 Coding Benchmarks

Max Preview leads on six coding-focused benchmarks and achieves the highest AA Intelligence Index score of any Chinese-origin model.

| Benchmark | Qwen 3.6 Max Preview | Qwen 3.6 Plus | Qwen 3.6-27B | DeepSeek V4 Pro |
|---|---|---|---|---|
| SWE-bench Pro | #1 | #4 | #6 | #2 |
| Terminal-Bench 2.0 | #1 | #5 | #7 | #3 |
| SkillsBench | #1 | #3 | #5 | #2 |
| NL2Repo | #1 | #4 | #8 | #2 |
| QwenClawBench | #1 | #2 | #4 | #3 |
| QwenWebBench | #1 | #3 | #5 | #2 |
| AA Intelligence Index | 52 | 44 | 40 | 49 |

Here is what each benchmark measures:

  • SWE-bench Pro: Real-world GitHub issue resolution across popular open-source repositories. Max Preview resolves more issues end-to-end than any other model tested.
  • Terminal-Bench 2.0: Multi-step terminal command generation and execution. Tests the model’s ability to chain shell commands, handle errors, and complete system administration tasks.
  • SkillsBench: Fine-grained coding skill evaluation across languages, paradigms, and difficulty levels.
  • NL2Repo: Natural language to full repository generation. Measures the model’s ability to scaffold complete project structures from high-level descriptions.
  • QwenClawBench: Alibaba’s internal benchmark for agentic tool use in coding environments.
  • QwenWebBench: Web development tasks including frontend generation, API design, and full-stack scaffolding.

The AA Intelligence Index score of 52 places Max Preview above DeepSeek V4 Pro (49) and well ahead of the open-weight Qwen 3.6-27B (40). For context on how this fits into the broader Chinese AI landscape, see the best Chinese AI models roundup.

preserve_thinking: Reasoning Traces Across Multi-Turn Conversations

One of the most significant features in Max Preview is preserve_thinking. This is not a prompt engineering trick. It is a model-level capability that maintains internal reasoning traces across multiple conversation turns.

In standard chat models, the chain-of-thought reasoning generated during turn 1 is discarded before turn 2 begins. The model starts fresh each time, losing the logical scaffolding it built to arrive at its previous answer. This is a major problem for agentic coding, where a model might need to:

  1. Analyze a codebase in turn 1
  2. Propose a fix in turn 2
  3. Refine the fix based on test output in turn 3
  4. Handle a follow-up edge case in turn 4

With preserve_thinking, Max Preview carries its reasoning context forward. The model retains awareness of why it made earlier decisions, what alternatives it considered, and what constraints it identified. This leads to more coherent multi-step problem solving and fewer contradictions across turns.

To enable it, pass preserve_thinking: true in the API request. The feature adds a small overhead to token usage (reasoning tokens are counted toward context) but dramatically improves consistency in agentic pipelines.

Here is a minimal example of enabling the feature via the API:

{
  "model": "qwen/qwen-3.6-max-preview",
  "messages": [...],
  "preserve_thinking": true
}

When enabled, the model’s reasoning tokens from previous turns are retained in the conversation history. You can inspect these traces for debugging or logging purposes, which adds transparency to agentic decision-making.
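A multi-turn client sketch in Python follows. The endpoint shape assumes OpenRouter's OpenAI-compatible chat completions format, and the `preserve_thinking` flag is taken from the article; check the exact wire format against current provider documentation before relying on it:

```python
import json

# Sketch of building a request body for one turn of an agentic session
# with preserve_thinking enabled. The endpoint URL is an assumption based
# on OpenRouter's OpenAI-compatible API; verify against current docs.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(messages: list, preserve_thinking: bool = True) -> dict:
    """Assemble the request body for one conversation turn."""
    return {
        "model": "qwen/qwen-3.6-max-preview",
        "messages": messages,
        "preserve_thinking": preserve_thinking,
    }

history = [{"role": "user", "content": "Analyze this repo and find the flaky test."}]
payload = build_request(history)

# In a real session you would POST `payload` to API_URL with your API key,
# then append the assistant reply (with its retained reasoning) to `history`
# so the next turn builds on the preserved trace.
print(json.dumps(payload, indent=2))
```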

This is particularly valuable for coding agents that operate in loops: plan, execute, observe, revise. Without preserved reasoning, each iteration starts from scratch. With it, the model builds on its own prior analysis. Early reports from developers using Max Preview in agentic pipelines suggest that preserve_thinking reduces contradictory outputs by a significant margin in conversations that exceed five turns.

Pricing on OpenRouter

Max Preview is available through OpenRouter with straightforward pricing:

| Tier | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|
| Standard (up to 256K context) | $1.30 | $1.30 |
| Extended (above 256K context) | Tiered pricing applies | Tiered pricing applies |

At $1.30 per million input tokens, Max Preview is competitively priced for a flagship model. For comparison, DeepSeek V4 Pro charges slightly more for equivalent context lengths, and proprietary Western models like GPT-5 and Claude Opus 4 sit at higher price points.

The tiered pricing above 256K context means costs increase as you push past the standard window. For most coding tasks, 256K is more than sufficient to hold an entire medium-sized repository in context.

Keep in mind that preserve_thinking tokens count toward your context usage. In long multi-turn sessions, reasoning traces can accumulate and push you into higher pricing tiers faster. Monitor your token usage if you are running extended agentic sessions with the feature enabled.
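The accumulation effect is worth quantifying. A rough sketch, using the article's $1.30/1M rate; the per-turn message and reasoning token counts are illustrative assumptions:

```python
# Rough cost estimate for a long agentic session where the growing history
# (including retained reasoning tokens) is re-sent as input every turn.
# The $1.30/1M rate is from the article; per-turn token counts are assumed.

PRICE_PER_TOKEN = 1.30 / 1_000_000

def session_cost(turns: int, tokens_per_turn: int, reasoning_per_turn: int) -> float:
    """Total input cost when each turn re-sends the accumulated history."""
    total_tokens = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn + reasoning_per_turn  # context grows each turn
        total_tokens += history                          # whole history billed again
    return total_tokens * PRICE_PER_TOKEN

# 10-turn session: 2K message tokens + 1.5K reasoning tokens per turn (assumed)
print(f"${session_cost(10, 2_000, 1_500):.2f}")
```

Because the billed context grows quadratically with turn count under this model, reasoning traces that look cheap per turn can dominate cost in long sessions.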

Closed-Weights vs Open-Weights: Alibaba’s Dual Strategy

Alibaba runs two parallel tracks with Qwen 3.6:

  • Open-weight models (27B, 35B-A3B, Flash): Released under Apache 2.0. You can download, fine-tune, and deploy them anywhere. See the Qwen 3.6-27B guide and Flash guide for details.
  • Closed-weight models (Max Preview, Plus): API-only. Better training, proprietary data, features like preserve_thinking. No self-hosting.

This mirrors the strategy used by other labs. Meta releases Llama openly while keeping internal models proprietary. Alibaba does the same: the open-weight 27B builds community adoption and ecosystem tooling, while Max Preview captures revenue from developers who need the best possible performance and are willing to pay per token.

The gap between Max Preview and the open-weight 27B is meaningful. On the AA Intelligence Index, Max Preview scores 52 versus 40 for the 27B. On SWE-bench Pro, the difference is even starker. If you need top-tier coding performance and can use an API, Max Preview is the clear choice. If you need to self-host, fine-tune, or run offline, the 27B under Apache 2.0 is the best open option in the Qwen family.

This dual approach also benefits the ecosystem. Open-weight models attract researchers, hobbyists, and startups who build tooling, fine-tunes, and integrations. That community activity feeds back into Alibaba’s understanding of how Qwen models are used, which informs improvements to the closed-weight flagship. The open models serve as an on-ramp; the closed models serve as the premium tier.

For a detailed breakdown of how the entire 3.6 lineup compares to the previous generation, see Qwen 3.6 vs 3.5.

Comparison: Max Preview vs Plus vs 27B vs DeepSeek V4 Pro

| Feature | Qwen 3.6 Max Preview | Qwen 3.6 Plus | Qwen 3.6-27B | DeepSeek V4 Pro |
|---|---|---|---|---|
| Parameters (total) | 35B MoE | Undisclosed | 27B dense | Undisclosed MoE |
| Active parameters | 3B | Undisclosed | 27B | Undisclosed |
| Context window | 256K | 128K | 128K | 256K |
| Weights | Closed | Closed | Open (Apache 2.0) | Closed |
| preserve_thinking | Yes | No | No | No |
| AA Intelligence Index | 52 | 44 | 40 | 49 |
| SWE-bench Pro rank | #1 | #4 | #6 | #2 |
| Pricing (input/1M) | $1.30 | $0.80 | Self-host | $1.50 |
| Best for | Top-tier agentic coding | Cost-effective API use | Self-hosted / fine-tuning | General flagship tasks |

Max Preview vs Plus: Max Preview is the stronger model across every benchmark. Plus is cheaper and still capable, making it a good choice for high-volume tasks where marginal quality differences matter less than cost. Max Preview pulls ahead on complex multi-step coding and agentic workflows, especially with preserve_thinking enabled. If your workload involves straightforward code generation, completions, or simple Q&A, Plus at $0.80 per million tokens offers strong value. If your workload involves debugging complex issues, generating entire repositories, or running multi-turn agent loops, the extra $0.50 per million tokens for Max Preview pays for itself in output quality.

Max Preview vs 27B: The 27B is open-weight and free to self-host, but it trails Max Preview by 12 points on the AA Intelligence Index and ranks significantly lower on coding benchmarks. Choose 27B if you need Apache 2.0 licensing, fine-tuning, or offline deployment. Choose Max Preview if you want the best results and can use an API. The 27B is also a dense model (all 27B parameters active per token), which means it requires more compute per token than Max Preview’s 3B active parameters. Self-hosting the 27B requires a capable GPU setup, while Max Preview offloads all compute to Alibaba’s infrastructure.

Max Preview vs DeepSeek V4 Pro: This is the closest competition. DeepSeek V4 Pro scores 49 on the AA Intelligence Index (vs 52 for Max Preview) and takes second place on most of the same coding benchmarks. V4 Pro costs slightly more at $1.50 per million input tokens. Max Preview’s preserve_thinking feature gives it a distinct advantage in multi-turn agentic scenarios. DeepSeek V4 Pro may still be preferable for certain general-purpose tasks where its training data mix provides an edge. Both models support 256K context, so the choice comes down to benchmark performance, pricing, and whether preserve_thinking matters for your use case.

FAQ

Is Qwen 3.6 Max Preview open-source?

No. Max Preview is a closed-weights proprietary model available only through Alibaba’s API and third-party providers like OpenRouter. You cannot download the weights, inspect the architecture details, or run it on your own hardware.

If you need an open-weight Qwen model, the Qwen 3.6-27B is released under Apache 2.0 and can be downloaded, fine-tuned, and self-hosted. The open-weight 35B-A3B variant shares the same MoE architecture as Max Preview but uses publicly available training data.

What is preserve_thinking and do I need it?

preserve_thinking is a model-level feature that carries internal reasoning traces across conversation turns. It is most valuable for agentic coding workflows where a model iterates through plan-execute-observe loops.

If you are using Max Preview for single-turn tasks like code completion or translation, you do not need it. The feature adds token overhead since reasoning traces count toward your context window.

If you are building coding agents or multi-step pipelines, enable it for significantly better coherence. Developers report fewer contradictions, better error recovery, and more consistent architectural decisions across long conversations when the feature is active.

How does Max Preview compare to Western flagship models?

Max Preview’s AA Intelligence Index score of 52 and its #1 rankings on six coding benchmarks place it in direct competition with the top Western proprietary models. Its pricing at $1.30 per million input tokens is notably lower than most Western alternatives.

The main trade-off is that Max Preview is closed-weight with no self-hosting option, and its training data composition is not publicly disclosed. Latency may also vary depending on your geographic location relative to Alibaba’s inference infrastructure.

For a broader comparison across Chinese and Western models, see the best Chinese AI models guide.

Who Should Use Qwen 3.6 Max Preview

Max Preview is built for developers and teams who need the strongest possible coding model and are comfortable using an API. The ideal use cases include:

  • Agentic coding pipelines: The combination of 256K context, preserve_thinking, and top-tier benchmark scores makes Max Preview the best current choice for autonomous coding agents that plan, execute, and iterate.
  • Large codebase analysis: With 256K tokens of context, you can load substantial portions of a codebase for refactoring analysis, security audits, or migration planning.
  • Complex debugging: Max Preview’s strong performance on SWE-bench Pro means it excels at understanding real-world codebases and generating correct fixes for non-trivial bugs.
  • Repository generation: The #1 ranking on NL2Repo makes it a strong choice for scaffolding new projects from natural language descriptions.

If your needs are simpler (code completion, short Q&A, lightweight generation), Qwen 3.6 Plus offers 80% of the capability at 60% of the price. If you need to self-host, the Qwen 3.6-27B under Apache 2.0 is the way to go.

Max Preview is still labeled “Preview,” which means Alibaba may update the model weights, adjust pricing, or change feature availability before a stable release. For production workloads, factor in the possibility of breaking changes during the preview period.