Qwen 3.6 Plus is Alibaba’s latest flagship model, released March 30, 2026. It features a 1M token context window, hybrid linear attention + MoE architecture, and scores 78.8% on SWE-bench Verified. It’s currently free on OpenRouter.
## Key specs
| Spec | Value |
|---|---|
| Developer | Alibaba (Tongyi Lab) |
| Release date | March 30, 2026 |
| Architecture | Hybrid linear attention + sparse MoE |
| Context window | 1M tokens (256K native, extended via YaRN) |
| Max output | 65,536 tokens |
| Chain-of-thought | Always-on |
| SWE-bench Verified | 78.8% |
| Terminal-Bench 2.0 | 61.6% |
| MCPMark | 48.2% |
| Price (OpenRouter) | Free (preview) |
| Price (Aliyun) | Standard Qwen Plus pricing |
| Open weights | Not yet (API-only) |
| License | Proprietary (API access) |
## What it’s good at
Qwen 3.6 Plus was specifically optimized for agentic coding workflows:
- Repository-level coding — understands entire codebases, not just single files
- Front-end generation — HTML/CSS/JS from natural language descriptions
- Code repair — finds and fixes bugs across multiple files
- Terminal automation — executes commands and interprets output
- Tool calling — reliable MCP and function calling
- Long document analysis — 1M context handles entire books, transcripts, or codebases
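For the long-document use case, a quick back-of-the-envelope check can tell you whether a file fits in the window before you send it. The 4-characters-per-token ratio below is a rough rule of thumb, not Qwen's actual tokenizer, and the helper names are made up for illustration:

```python
# Back-of-the-envelope context check. The 4-chars-per-token ratio is a
# common rule of thumb, not Qwen's tokenizer; use the provider's
# tokenizer for exact counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(text: str, context_tokens: int = 1_000_000,
                    reserve_for_output: int = 65_536) -> bool:
    # Leave room for the model's max output (65,536 tokens per the specs)
    return estimate_tokens(text) <= context_tokens - reserve_for_output
```

For precise counts, tokenize with the provider's tokenizer instead of guessing.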
## What it’s not good at
- Not available locally — no Ollama or GGUF downloads yet
- Not a chat model — optimized for technical tasks, not conversational AI
- Peak hour limitations — Aliyun API may have rate limits during high demand
- Not multimodal — text-only for now (Qwen 3.5 Omni handles vision/audio)
## How to use it
### OpenRouter (free, easiest)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

response = client.chat.completions.create(
    model="qwen/qwen3.6-plus:free",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Review this code for security issues:\n\n```python\n...```"},
    ],
    max_tokens=65536,
)
print(response.choices[0].message.content)
```
Sign up at openrouter.ai for a free API key.
### Aliyun BaiLian API (production)
For production use with SLAs and guaranteed rate limits:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key="your-aliyun-key",
)

response = client.chat.completions.create(
    model="qwen-plus-2026-0330",
    messages=[{"role": "user", "content": "..."}],
)
print(response.choices[0].message.content)
```
### With Aider
```bash
# Via OpenRouter
aider --model openrouter/qwen/qwen3.6-plus:free

# Via Aliyun
export OPENAI_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
export OPENAI_API_KEY=your-aliyun-key
aider --model qwen-plus-2026-0330
```
See our Aider Complete Guide for full setup.
### With Continue.dev
In your Continue config, add:
```json
{
  "models": [{
    "provider": "openai",
    "model": "qwen/qwen3.6-plus:free",
    "apiBase": "https://openrouter.ai/api/v1",
    "apiKey": "your-openrouter-key"
  }]
}
```
See our Continue.dev Guide for full setup.
## Benchmarks compared
| Model | SWE-bench | Terminal-Bench | MCPMark | Context | Price |
|---|---|---|---|---|---|
| Qwen 3.6 Plus | 78.8% | 61.6% | 48.2% | 1M | Free* |
| Claude Opus 4.5 | ~80% | 59.3% | ~50% | 200K | $15/$75 |
| Claude Sonnet 4.6 | ~75% | ~55% | ~45% | 200K | $3/$15 |
| GPT-5 | ~72% | ~52% | ~40% | 128K | $5/$15 |
| DeepSeek R1 | ~70% | ~48% | ~35% | 128K | $0.55/$2.19 |
| Gemini 2.5 Pro | ~73% | ~50% | ~42% | 1M | $1.25/$10 |
*Free on OpenRouter preview. Production pricing via Aliyun.
Qwen 3.6 Plus is the only model in this comparison that beats Claude Opus 4.5 on Terminal-Bench, and it’s currently free. Its SWE-bench Verified score of 78.8% puts it in the top tier alongside Claude.
## The `preserve_thinking` parameter
Qwen 3.6 Plus introduces a `preserve_thinking` parameter for agent workflows. When enabled, the model’s chain-of-thought reasoning is included in the response, letting you see why it made specific decisions:
```python
response = client.chat.completions.create(
    model="qwen/qwen3.6-plus:free",
    messages=[{"role": "user", "content": "Debug this error..."}],
    extra_body={"preserve_thinking": True},
)
```
This is useful for debugging agent loops and understanding why the model chose a specific approach.
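Where the preserved reasoning lands in the response isn't standardized across OpenAI-compatible providers, so a defensive accessor helps. The field names checked below (`reasoning`, `reasoning_content`) are assumptions based on common conventions, not a documented Qwen contract — verify against your provider's response schema:

```python
def extract_reasoning(message: dict):
    # Field names are assumptions based on common OpenAI-compatible
    # conventions -- check your provider's actual response schema.
    for key in ("reasoning", "reasoning_content"):
        if message.get(key):
            return message[key]
    return None
```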
## Architecture: hybrid linear attention + MoE
The key innovation in Qwen 3.6 Plus is combining two efficiency techniques:
- Linear attention — reduces the quadratic cost of standard attention to linear, enabling the 1M context window without proportional memory increase
- Sparse MoE — only activates a subset of parameters per token, keeping inference fast despite the large total parameter count
The result is a model that’s roughly 3x faster than Claude Opus 4.5 in community benchmarks while maintaining competitive quality.
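To see why linear attention is linear, here is a toy, non-causal, single-head sketch of the kernel trick behind kernelized attention — illustrative only, not Qwen's actual implementation. Regrouping the matmuls as Q(KᵀV) instead of (QKᵀ)V drops the cost from O(n²d) to O(nd²), so memory and compute stop growing quadratically with context length:

```python
import numpy as np

def phi(x):
    # Positive feature map (elu(x) + 1), a common choice for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def naive_attention(Q, K, V):
    # Standard grouping: materializes the full (n, n) matrix -- quadratic in n
    A = phi(Q) @ phi(K).T
    return (A / A.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V):
    # Same math, regrouped: K^T V is only (d, d_v), so cost is linear in n
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                      # (d, d_v)
    Z = Qp @ Kp.sum(axis=0)            # per-row normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]
```

Both functions return identical outputs; only the grouping (and therefore the cost) differs.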
## Pricing
| Provider | Input | Output | Free tier |
|---|---|---|---|
| OpenRouter | Free (preview) | Free (preview) | Yes |
| Aliyun BaiLian | Standard Qwen Plus rates | Standard rates | Trial credits |
The free OpenRouter preview won’t last forever. For production use, set up the Aliyun API now so you’re ready when the preview ends.
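One way to prepare is to wrap both endpoints in a small fallback helper, so your code keeps working the day the preview ends. The endpoints and model names below are the ones from this article; the retry logic itself is a plain-Python sketch, not an SDK feature:

```python
# Providers in preference order: free OpenRouter preview first,
# Aliyun BaiLian as the production fallback.
PROVIDERS = [
    {"base_url": "https://openrouter.ai/api/v1",
     "model": "qwen/qwen3.6-plus:free"},
    {"base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
     "model": "qwen-plus-2026-0330"},
]

def complete_with_fallback(messages, call, providers=PROVIDERS):
    """Try each provider in order. `call(provider, messages)` performs the
    actual request (e.g. via the OpenAI SDK) and may raise on rate limits
    or when the free preview shuts down."""
    last_err = None
    for provider in providers:
        try:
            return call(provider, messages)
        except Exception as err:
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```

Injecting the `call` function keeps the routing logic independent of any particular SDK and easy to test.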
## vs Qwen 3.5
See our detailed Qwen 3.6 vs 3.5 comparison for the full breakdown. The short version: 3.6 Plus is better at everything, but it’s API-only. If you need to run locally, Qwen 3.5 is still your best option.
Related: Qwen 3.6 vs 3.5 — What Changed · How to Run Qwen 3.5 Locally · OpenRouter Complete Guide · Aider Complete Guide · Best Open Source Coding Models