Qwen 3.6-35B-A3B: Open-Weight Agentic Coding Model That Runs on Your Laptop (2026)
Alibaba released Qwen 3.6-35B-A3B on April 15, 2026, the first open-weight model in the Qwen 3.6 generation. It's a sparse Mixture-of-Experts (MoE) model with 35 billion total parameters but only 3 billion active per token at inference time. That means it runs on consumer hardware while delivering coding performance that competes with dense models roughly ten times its active parameter count.
The model is Apache 2.0 licensed, supports 262K native context (extensible to 1M via YaRN), and includes vision capabilities. It’s already trending on HuggingFace with 540+ likes and 21K+ downloads in its first two days.
Quick specs
| Spec | Value |
|---|---|
| Model | Qwen3.6-35B-A3B |
| Developer | Alibaba (Tongyi Lab / Qwen Team) |
| Release date | April 15, 2026 |
| Total parameters | 35B |
| Active parameters | 3B (sparse MoE — 256 experts, 8 routed + 1 shared) |
| Context window | 262,144 tokens native (extensible to 1,010,000 via YaRN) |
| Architecture | Hybrid: Gated DeltaNet (linear attention) + Gated Attention + MoE |
| Vision | Yes (image + video) |
| License | Apache 2.0 (fully open, commercial use allowed) |
| GGUF quantized size | ~21 GB (Q4_K_S) |
| Thinking mode | On by default (chain-of-thought), can be disabled |
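Extending past the native 262K window uses YaRN rope scaling. Prior Qwen releases document this via vLLM's `--rope-scaling` flag; a hedged sketch below — the `factor` and `original_max_position_embeddings` values here are illustrative assumptions, not confirmed settings for this model, so check the model card before using them:

```shell
# Hypothetical YaRN extension toward the ~1M-token ceiling.
# factor and original_max_position_embeddings are assumptions.
vllm serve Qwen/Qwen3.6-35B-A3B \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":262144}' \
  --max-model-len 1010000
```

Note that rope scaling can slightly degrade short-context quality, so it is generally worth enabling only when you actually need the extended window.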
Benchmark results
The headline: 73.4% on SWE-bench Verified with only 3B active parameters. For context, the dense Qwen 3.5-27B (with 27B active) scores 75.0%. Qwen 3.6-35B-A3B gets 98% of the way there at roughly 1/9th the compute.
| Benchmark | Qwen 3.6-35B-A3B | Gemma 4-31B | Qwen 3.5-27B (dense) |
|---|---|---|---|
| SWE-bench Verified | 73.4% | 52.0% | 75.0% |
| SWE-bench Pro | 49.5% | 35.7% | 51.2% |
| SWE-bench Multilingual | 67.2% | 51.7% | 69.3% |
| Terminal-Bench 2.0 | 51.6% | 42.9% | 41.6% |
| AIME 2026 | 92.7% | 89.2% | 92.6% |
| GPQA Diamond | 86.0% | — | — |
| MMLU-Pro | 85.2% | — | — |
The Terminal-Bench 2.0 score (51.6%) is particularly notable — it actually beats the dense Qwen 3.5-27B (41.6%) by a wide margin. This suggests the 3.6 generation made significant improvements in terminal/CLI task handling.
Why 3B active parameters matters
The MoE architecture is the key story here. With 256 experts and only 8+1 activated per token, Qwen 3.6-35B-A3B gives you the learned capacity of a 35B model at the inference cost of a ~3B model. In practice:
- Runs on a MacBook Pro — Simon Willison ran the Q4_K_S quantized version (~21 GB) on his M5 MacBook via LM Studio
- Runs on a single GPU — fits in 24 GB VRAM with quantization
- Fast inference — 3B active parameters means response times comparable to small models
- Full commercial use — Apache 2.0 means no restrictions
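The routing step that makes this possible can be sketched in a few lines. This is a toy illustration of top-k MoE gating, not Qwen's actual kernel: a gate scores all 256 experts for each token, only the top 8 are executed, and their outputs are combined with renormalized softmax weights.

```python
import math
import random

# Toy sketch of MoE top-k routing (illustrative, not the real implementation):
# a gate scores every expert per token and only the top-k actually run.
NUM_EXPERTS, TOP_K = 256, 8

def route(gate_logits, k=TOP_K):
    """Return indices of the k highest-scoring experts plus their softmax weights."""
    topk = sorted(range(len(gate_logits)),
                  key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in topk]
    z = sum(exps)
    # Renormalize over the selected experts only, as many MoE gates do.
    return topk, [e / z for e in exps]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
experts, weights = route(logits)
print(len(experts))  # 8 routed experts run per token (plus 1 always-on shared expert)
```

Only the parameters inside those 8 routed experts (plus the shared expert and the dense layers) are touched per token, which is why inference cost tracks the ~3B active figure rather than the 35B total.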
For comparison, running Claude Opus 4.7 requires an API call at $5/$25 per million tokens. Running Qwen 3.6-35B-A3B locally costs you electricity.
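To make the cost gap concrete, a back-of-the-envelope sketch. The $5/$25 rates come from the comparison above; the monthly token volumes are made-up workload assumptions:

```python
# Back-of-the-envelope API cost at $5 input / $25 output per million tokens.
INPUT_RATE, OUTPUT_RATE = 5.00, 25.00  # USD per 1M tokens

def api_cost(input_tokens, output_tokens):
    """API bill in USD for a given token volume."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# Hypothetical month of agentic coding: 50M input tokens, 5M output tokens.
monthly = api_cost(50_000_000, 5_000_000)
print(f"${monthly:.2f}/month")  # $375.00/month, vs. electricity for local inference
```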
What it’s good at
Agentic coding. The model was specifically optimized for repository-level coding tasks — understanding entire codebases, planning multi-file changes, and executing through tool calls. The SWE-bench scores confirm this isn’t marketing.
Frontend generation. Qwen highlights frontend workflows as a specific strength. The model handles HTML/CSS/JS generation from natural language with improved fluency over the 3.5 generation.
Thinking preservation. A new feature in 3.6: the model can retain reasoning context from previous messages in multi-turn conversations. This reduces redundant reasoning in agentic loops and can actually lower total token consumption.
Vision. Unlike many coding-focused models, this one includes a vision encoder. It can process images and video alongside text — useful for UI-to-code workflows or diagram interpretation.
Math and reasoning. 92.7% on AIME 2026 and 86.0% on GPQA Diamond put it in frontier territory for reasoning, despite the small active parameter count.
How to run it locally
With LM Studio (easiest)
- Download LM Studio
- Search for `Qwen3.6-35B-A3B`
- Download the `Q4_K_S` quantization (~21 GB) by Unsloth
- Load and chat
With Ollama
```shell
ollama run qwen3.6:35b-a3b
```
With vLLM (for serving)
```shell
uv pip install vllm --torch-backend=auto

vllm serve Qwen/Qwen3.6-35B-A3B \
  --port 8000 \
  --tensor-parallel-size 1 \
  --max-model-len 262144 \
  --reasoning-parser qwen3
```
With SGLang
```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.6-35B-A3B \
  --port 8000 \
  --mem-fraction-static 0.8 \
  --context-length 262144 \
  --reasoning-parser qwen3
```
Recommended sampling parameters
Qwen provides specific recommendations depending on your use case:
| Mode | Temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 1.5 |
| Thinking (precise coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking (general) | 0.7 | 0.8 | 20 | 1.5 |
| Non-thinking (reasoning) | 1.0 | 0.95 | 20 | 1.5 |
Set max output to 32,768 tokens for most tasks, or 81,920 for complex math/programming problems.
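The vLLM and SGLang servers above expose an OpenAI-compatible endpoint, so these settings map directly onto a chat-completions request body. A minimal sketch of the "Thinking (precise coding)" row; `top_k` is a common server-side extension rather than a core OpenAI field, so treat exact field support as an assumption to verify against your server:

```python
import json

# Chat-completions payload using the "Thinking (precise coding)" settings.
payload = {
    "model": "Qwen/Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "Refactor this function..."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,              # server-side extension, not core OpenAI
    "presence_penalty": 0.0,
    "max_tokens": 32768,      # bump to 81920 for complex math/programming
}
print(json.dumps(payload, indent=2))

# POST this to http://localhost:8000/v1/chat/completions, e.g.:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```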
Qwen 3.6-35B-A3B vs Qwen 3.6 Plus
Don’t confuse the two. Qwen 3.6 Plus is the proprietary API model (78.8% SWE-bench Verified). Qwen 3.6-35B-A3B is the open-weight variant — smaller, runs locally, but still impressively capable at 73.4% SWE-bench Verified.
| | Qwen 3.6 Plus | Qwen 3.6-35B-A3B |
|---|---|---|
| Access | API only | Open weights (Apache 2.0) |
| SWE-bench Verified | 78.8% | 73.4% |
| Run locally | No | Yes (~21 GB quantized) |
| Price | Per-token API pricing | Free (self-hosted) |
| Context | 1M tokens | 262K native (1M via YaRN) |
The pelican test
Simon Willison — one of the most respected voices in the AI developer community — tested Qwen 3.6-35B-A3B against Claude Opus 4.7 on his famous “pelican riding a bicycle” SVG benchmark. Running the quantized model on his MacBook, Qwen produced a better pelican than Anthropic’s flagship $5/$25-per-million-token model.
As Willison notes, this doesn’t mean Qwen 3.6-35B-A3B is “better” than Opus 4.7 in any general sense. But it does illustrate something important: a 21 GB model running on a laptop can produce results that compete with — and sometimes beat — the most expensive proprietary models on specific tasks.
The post hit 363 points on Hacker News. The open-source-vs-frontier conversation is alive and well.
Who should use this model
Use Qwen 3.6-35B-A3B if you:
- Want a strong coding model that runs locally with no API costs
- Need Apache 2.0 licensing for commercial products
- Work on a laptop or single-GPU setup
- Want privacy — your code never leaves your machine
- Need a capable model for agentic workflows on a budget
Stick with Opus 4.7 or Qwen 3.6 Plus if you:
- Need the absolute best coding performance (Opus 4.7: 64.3% SWE-bench Pro vs Qwen 3.6-35B-A3B: 49.5%)
- Work with very long contexts (1M tokens natively)
- Need the best vision capabilities (Opus 4.7’s 3.75MP is hard to beat)
- Don’t want to manage local infrastructure
Bottom line
Qwen 3.6-35B-A3B is the most capable open-weight model you can run on consumer hardware right now. 73.4% on SWE-bench Verified with 3B active parameters is remarkable efficiency. The Apache 2.0 license removes all commercial barriers. And the fact that it can beat Opus 4.7 on SVG generation while running on a laptop is the kind of result that keeps the open-source AI movement exciting.
If you’re building products that need local AI inference, this should be on your evaluation list.
See also: Qwen 3.6 Plus Guide · How to Run Qwen 3.5 Locally · Best AI Models for Coding Locally · AI Model Comparison