Qwen 3.6-35B-A3B: Open-Weight Agentic Coding Model That Runs on Your Laptop (2026)
Alibaba released Qwen 3.6-35B-A3B on April 15, 2026, the first open-weight model in the Qwen 3.6 generation. It's a sparse Mixture-of-Experts (MoE) model with 35 billion total parameters but only 3 billion active per token at inference time. That means it runs on consumer hardware while delivering coding performance that competes with dense models roughly ten times its active parameter count.
The model is Apache 2.0 licensed, supports 262K native context (extensible to 1M via YaRN), and includes vision capabilities. It’s already trending on HuggingFace with 540+ likes and 21K+ downloads in its first two days.
Quick specs
| Spec | Value |
|---|---|
| Model | Qwen3.6-35B-A3B |
| Developer | Alibaba (Tongyi Lab / Qwen Team) |
| Release date | April 15, 2026 |
| Total parameters | 35B |
| Active parameters | 3B (sparse MoE — 256 experts, 8 routed + 1 shared) |
| Context window | 262,144 tokens native (extensible to 1,010,000 via YaRN) |
| Architecture | Hybrid: Gated DeltaNet (linear attention) + Gated Attention + MoE |
| Vision | Yes (image + video) |
| License | Apache 2.0 (fully open, commercial use allowed) |
| GGUF quantized size | ~21 GB (Q4_K_S) |
| Thinking mode | On by default (chain-of-thought), can be disabled |
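Extending past the native 262K window uses YaRN rope scaling. Prior Qwen releases document this via vLLM's `--rope-scaling` flag; a hedged sketch below — the `factor` and `original_max_position_embeddings` values here are illustrative assumptions, not confirmed settings for this model, so check the model card before using them:

```shell
# Hypothetical YaRN extension toward the ~1M-token ceiling.
# factor and original_max_position_embeddings are assumptions.
vllm serve Qwen/Qwen3.6-35B-A3B \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":262144}' \
  --max-model-len 1010000
```

Note that rope scaling can slightly degrade short-context quality, so it is generally worth enabling only when you actually need the extended window.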
Benchmark results
The headline: 73.4% on SWE-bench Verified with only 3B active parameters. For context, the dense Qwen 3.5-27B (with 27B active) scores 75.0%. Qwen 3.6-35B-A3B gets 98% of the way there at roughly 1/9th the compute.
| Benchmark | Qwen 3.6-35B-A3B | Gemma 4-31B | Qwen 3.5-27B (dense) |
|---|---|---|---|
| SWE-bench Verified | 73.4% | 52.0% | 75.0% |
| SWE-bench Pro | 49.5% | 35.7% | 51.2% |
| SWE-bench Multilingual | 67.2% | 51.7% | 69.3% |
| Terminal-Bench 2.0 | 51.6% | 42.9% | 41.6% |
| AIME 2026 | 92.7% | 89.2% | 92.6% |
| GPQA Diamond | 86.0% | — | — |
| MMLU-Pro | 85.2% | — | — |
The Terminal-Bench 2.0 score (51.6%) is particularly notable — it actually beats the dense Qwen 3.5-27B (41.6%) by a wide margin. This suggests the 3.6 generation made significant improvements in terminal/CLI task handling.
Why 3B active parameters matters
The MoE architecture is the key story here. With 256 experts and only 8+1 activated per token, Qwen 3.6-35B-A3B gives you the learned capacity of a 35B model at the inference cost of a ~3B model. In practice:
- Runs on a MacBook Pro — Simon Willison ran the Q4_K_S quantized version (~21 GB) on his M5 MacBook via LM Studio
- Runs on a single GPU — fits in 24 GB VRAM with quantization
- Fast inference — 3B active parameters means response times comparable to small models
- Full commercial use — Apache 2.0 means no restrictions
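The routing step that makes this possible can be sketched in a few lines. This is a toy illustration of top-k MoE gating, not Qwen's actual kernel: a gate scores all 256 experts for each token, only the top 8 are executed, and their outputs are combined with renormalized softmax weights.

```python
import math
import random

# Toy sketch of MoE top-k routing (illustrative, not the real implementation):
# a gate scores every expert per token and only the top-k actually run.
NUM_EXPERTS, TOP_K = 256, 8

def route(gate_logits, k=TOP_K):
    """Return indices of the k highest-scoring experts plus their softmax weights."""
    topk = sorted(range(len(gate_logits)),
                  key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in topk]
    z = sum(exps)
    # Renormalize over the selected experts only, as many MoE gates do.
    return topk, [e / z for e in exps]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
experts, weights = route(logits)
print(len(experts))  # 8 routed experts run per token (plus 1 always-on shared expert)
```

Only the parameters inside those 8 routed experts (plus the shared expert and the dense layers) are touched per token, which is why inference cost tracks the ~3B active figure rather than the 35B total.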
For comparison, running Claude Opus 4.7 requires an API call at $5/$25 per million tokens. Running Qwen 3.6-35B-A3B locally costs you electricity.
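To make the cost gap concrete, a back-of-the-envelope sketch. The $5/$25 rates come from the comparison above; the monthly token volumes are made-up workload assumptions:

```python
# Back-of-the-envelope API cost at $5 input / $25 output per million tokens.
INPUT_RATE, OUTPUT_RATE = 5.00, 25.00  # USD per 1M tokens

def api_cost(input_tokens, output_tokens):
    """API bill in USD for a given token volume."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# Hypothetical month of agentic coding: 50M input tokens, 5M output tokens.
monthly = api_cost(50_000_000, 5_000_000)
print(f"${monthly:.2f}/month")  # $375.00/month, vs. electricity for local inference
```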
What it’s good at
Agentic coding. The model was specifically optimized for repository-level coding tasks — understanding entire codebases, planning multi-file changes, and executing through tool calls. The SWE-bench scores confirm this isn’t marketing.
Frontend generation. Qwen highlights frontend workflows as a specific strength. The model handles HTML/CSS/JS generation from natural language with improved fluency over the 3.5 generation.
Thinking preservation. A new feature in 3.6: the model can retain reasoning context from previous messages in multi-turn conversations. This reduces redundant reasoning in agentic loops and can actually lower total token consumption.
Vision. Unlike many coding-focused models, this one includes a vision encoder. It can process images and video alongside text — useful for UI-to-code workflows or diagram interpretation.
Math and reasoning. 92.7% on AIME 2026 and 86.0% on GPQA Diamond put it in frontier territory for reasoning, despite the small active parameter count.
How to run it locally
With LM Studio (easiest)
- Download LM Studio
- Search for `Qwen3.6-35B-A3B`
- Download the `Q4_K_S` quantization (~21 GB) by Unsloth
- Load and chat
With Ollama
```shell
ollama run qwen3.6:35b-a3b
```
With vLLM (for serving)
```shell
uv pip install vllm --torch-backend=auto

vllm serve Qwen/Qwen3.6-35B-A3B \
  --port 8000 \
  --tensor-parallel-size 1 \
  --max-model-len 262144 \
  --reasoning-parser qwen3
```
With SGLang
```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.6-35B-A3B \
  --port 8000 \
  --mem-fraction-static 0.8 \
  --context-length 262144 \
  --reasoning-parser qwen3
```
Recommended sampling parameters
Qwen provides specific recommendations depending on your use case:
| Mode | Temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 1.5 |
| Thinking (precise coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 1.5 |
| Non-thinking (reasoning) | 1.0 | 0.95 | 20 | 1.5 |
Set max output to 32,768 tokens for most tasks, or 81,920 for complex math/programming problems.
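The vLLM and SGLang servers above expose an OpenAI-compatible endpoint, so these settings map directly onto a chat-completions request body. A minimal sketch of the "Thinking (precise coding)" row; `top_k` is a common server-side extension rather than a core OpenAI field, so treat exact field support as an assumption to verify against your server:

```python
import json

# Chat-completions payload using the "Thinking (precise coding)" settings.
payload = {
    "model": "Qwen/Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "Refactor this function..."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,              # server-side extension, not core OpenAI
    "presence_penalty": 0.0,
    "max_tokens": 32768,      # bump to 81920 for complex math/programming
}
print(json.dumps(payload, indent=2))

# POST this to http://localhost:8000/v1/chat/completions, e.g.:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```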
Qwen 3.6-35B-A3B vs Qwen 3.6 Plus
Don’t confuse the two. Qwen 3.6 Plus is the proprietary API model (78.8% SWE-bench Verified). Qwen 3.6-35B-A3B is the open-weight variant — smaller, runs locally, but still impressively capable at 73.4% SWE-bench Verified.
| | Qwen 3.6 Plus | Qwen 3.6-35B-A3B |
|---|---|---|
| Access | API only | Open weights (Apache 2.0) |
| SWE-bench Verified | 78.8% | 73.4% |
| Run locally | No | Yes (~21 GB quantized) |
| Price | Per-token API pricing | Free (self-hosted) |
| Context | 1M tokens | 262K native (1M via YaRN) |
The pelican test
Simon Willison — one of the most respected voices in the AI developer community — tested Qwen 3.6-35B-A3B against Claude Opus 4.7 on his famous “pelican riding a bicycle” SVG benchmark. Running the quantized model on his MacBook, Qwen produced a better pelican than Anthropic’s flagship $5/$25-per-million-token model.
As Willison notes, this doesn’t mean Qwen 3.6-35B-A3B is “better” than Opus 4.7 in any general sense. But it does illustrate something important: a 21 GB model running on a laptop can produce results that compete with — and sometimes beat — the most expensive proprietary models on specific tasks.
The post hit 363 points on Hacker News. The open-source-vs-frontier conversation is alive and well.
Who should use this model
Use Qwen 3.6-35B-A3B if you:
- Want a strong coding model that runs locally with no API costs
- Need Apache 2.0 licensing for commercial products
- Work on a laptop or single-GPU setup
- Want privacy — your code never leaves your machine
- Need a capable model for agentic workflows on a budget
Stick with Opus 4.7 or Qwen 3.6 Plus if you:
- Need the absolute best coding performance (Opus 4.7: 64.3% SWE-bench Pro vs Qwen 3.6-35B-A3B: 49.5%)
- Work with very long contexts (1M tokens natively)
- Need the best vision capabilities (Opus 4.7’s 3.75MP is hard to beat)
- Don’t want to manage local infrastructure
Bottom line
Qwen 3.6-35B-A3B is the most capable open-weight model you can run on consumer hardware right now. 73.4% on SWE-bench Verified with 3B active parameters is remarkable efficiency. The Apache 2.0 license removes all commercial barriers. And the fact that it can beat Opus 4.7 on SVG generation while running on a laptop is the kind of result that keeps the open-source AI movement exciting.
If you’re building products that need local AI inference, this should be on your evaluation list.
See also: Qwen 3.6 Plus Guide · How to Run Qwen 3.5 Locally · Best AI Models for Coding Locally · AI Model Comparison