Qwen 3.5 is Alibaba Cloud's flagship open-source AI model. It launched on February 16, 2026 (Chinese New Year's Day) and immediately became one of the most capable open-weight models available. It has 397 billion total parameters, activates only 17 billion per forward pass, supports 201 languages, and is released under the Apache 2.0 license.
If you’ve been tracking the open-source AI race, Qwen 3.5 is the model that made people seriously question whether you still need to pay for Claude or GPT.
## What is Qwen 3.5?
Qwen 3.5 is a Mixture-of-Experts (MoE) vision-language model. Unlike traditional dense models that activate every parameter for every token, MoE models route each token through a subset of specialized “expert” networks. Qwen 3.5 has 397B total parameters but only activates 17B per token — giving you frontier-level intelligence at a fraction of the compute cost.
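The routing idea can be sketched in a few lines of Python. Everything here uses toy values (8 experts, top-2 routing, a 4-dimensional hidden state); Qwen 3.5's actual router configuration and expert counts are not given in this article.

```python
# Toy sketch of Mixture-of-Experts (MoE) top-k routing. Expert count, top-k,
# and dimensions are illustrative, not Qwen 3.5's real configuration.
import math
import random

random.seed(0)

N_EXPERTS = 8   # toy expert count
TOP_K = 2       # experts activated per token
D_MODEL = 4     # toy hidden size

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
router = rand_matrix(N_EXPERTS, D_MODEL)   # one score per expert

def moe_forward(x):
    """Route one token through its top-k experts; the rest stay idle."""
    scores = matvec(router, x)
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    weights = [math.exp(scores[i]) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]     # softmax over selected experts
    out = [0.0] * D_MODEL
    for w, i in zip(weights, top):             # only TOP_K experts do any work
        for j, yj in enumerate(matvec(experts[i], x)):
            out[j] += w * yj
    return out, top

out, used = moe_forward([random.gauss(0, 1) for _ in range(D_MODEL)])
print(f"activated {len(used)} of {N_EXPERTS} experts: {sorted(used)}")
```

The key property is visible in `moe_forward`: only `TOP_K` expert matrices are multiplied for any given token, which is why a 397B-parameter model can run with 17B-parameter compute per token.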
It’s natively multimodal, meaning text and vision were fused during training from the start, not bolted on afterward. It processes text, images, and video within one unified system.
## The full model family
Qwen 3.5 isn’t a single model. It’s a family released in three waves:
| Series | Models | Released |
|---|---|---|
| Flagship | Qwen3.5-397B-A17B (397B total, 17B active) | Feb 16, 2026 |
| Medium | Qwen3.5-27B (dense), 35B-A3B, 122B-A10B | Feb 24, 2026 |
| Small | Qwen3.5-0.8B, 2B, 4B, 9B | Mar 2, 2026 |
All models share the same architecture, support 201 languages, and ship under Apache 2.0. Compared with Qwen3, the vocabulary expanded from 150K to 250K tokens, improving encoding efficiency by 10–60% across most languages.
The small models are surprisingly capable. The 9B model matches or surpasses GPT-OSS-120B — a model 13x its size — on benchmarks like GPQA Diamond (81.7 vs 71.5) and HMMT (83.2 vs 76.7). The 35B-A3B runs on GPUs with as little as 8GB VRAM.
## Key benchmarks
Here’s how the flagship 397B stacks up against frontier closed models:
**Reasoning and math:**
- AIME 2026: 91.3 (GPT-5.2: 96.7, Claude 4.6: 93.3)
- GPQA Diamond: 81.0 (GPT-5.2: 78.8)
- IFBench (instruction following): 76.5 — highest of any model
- MultiChallenge: 67.6 (GPT-5.2: 57.9, Claude 4.6: 54.2)
**Coding:**
- SWE-bench Verified: 76.4 (Claude 4.6: 80.9, GPT-5.2: 80.0)
- SWE-bench Multilingual: 72.0 (tied with GPT-5.2)
- LiveCodeBench v6: 83.6
**Vision and multimodal:**
- MathVision: 88.6 (GPT-5.2: 83.0, Gemini 3 Pro: 86.6)
- OCRBench: 93.1
- MMMU: 85.0
Qwen 3.5 leads on instruction following, multi-step challenges, and visual reasoning. It trails Claude Opus 4.6 on agentic coding tasks but beats it on multilingual and multimodal benchmarks.
## Pricing
Alibaba Cloud’s Qwen3.5-Plus API costs approximately $0.11 per million input tokens. That’s roughly 13x cheaper than Claude Opus 4.6 via API. The hosted version includes a 1M context window and built-in tools like search and code interpreter.
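The price gap is easiest to see with a back-of-the-envelope calculation. The $0.11 rate comes from the pricing above; the competitor rate below is derived from the "13x" multiple, not an official quote.

```python
# Rough monthly cost comparison at published vs implied rates.
QWEN_PER_M_INPUT = 0.11                          # USD per million input tokens
COMPETITOR_PER_M_INPUT = QWEN_PER_M_INPUT * 13   # implied by "13x cheaper"

def input_cost(tokens, rate_per_million):
    """Cost in USD for a given number of input tokens."""
    return tokens / 1_000_000 * rate_per_million

monthly_tokens = 500_000_000  # example workload: 500M input tokens/month
qwen = input_cost(monthly_tokens, QWEN_PER_M_INPUT)
other = input_cost(monthly_tokens, COMPETITOR_PER_M_INPUT)
print(f"Qwen3.5-Plus: ${qwen:,.2f}  vs  13x-priced API: ${other:,.2f}")
```

At that example volume the difference is $55 versus $715 per month for input tokens alone, before output-token pricing is considered.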
It’s also available through Azure AI Foundry, NVIDIA NIM, and Hugging Face Inference Endpoints. Or you can self-host any model in the family for free.
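Most of these hosts expose an OpenAI-compatible chat endpoint, so a request typically looks like the sketch below. The base URL, environment-variable names, and the `qwen3.5-plus` model id are assumptions for illustration; check your provider's documentation for the real values.

```python
# Hypothetical request against an OpenAI-compatible hosted endpoint.
# BASE_URL, env-var names, and the model id are assumptions, not documented values.
import json
import os
import urllib.request

BASE_URL = os.environ.get("QWEN_BASE_URL", "https://example.invalid/v1")  # placeholder
API_KEY = os.environ.get("QWEN_API_KEY", "")

body = {
    "model": "qwen3.5-plus",  # assumed id for the hosted Plus tier
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
}

def chat():
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# chat()  # requires QWEN_BASE_URL and QWEN_API_KEY to be set
```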
## Running it locally
The small models run on basically anything:
- 0.8B: 2GB RAM, any modern laptop
- 9B: 8GB RAM, runs on a 16GB laptop
- 35B-A3B: 8GB of VRAM, or an M-series Mac (unified memory)
- 397B (Q4 quantized): ~214GB, needs a 256GB M3 Ultra or multi-GPU setup
```shell
# Easiest way — Ollama
ollama run qwen3.5:9b

# Or the flagship if you have the hardware
ollama run qwen3.5
```
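Once a model is pulled, Ollama also serves a local HTTP API (by default on `http://localhost:11434`), so you can script against it. This sketch builds a `/api/generate` request; the network call is left commented out so it only runs when a server is actually up.

```python
# Calling a locally running Ollama server. The model tag matches the
# `ollama run qwen3.5:9b` command above.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "qwen3.5:9b",
    "prompt": "Explain Mixture-of-Experts in one sentence.",
    "stream": False,  # return one JSON object instead of a token stream
}

def generate(url=OLLAMA_URL):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Uncomment when an Ollama server is running locally:
# print(generate())
```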
## Why it matters
Qwen has crossed 600 million downloads on Hugging Face, with over 170,000 derivative models. Over 40% of all new model derivatives on Hugging Face are now Qwen-based. AI Singapore chose Qwen over Meta’s Llama and Google’s Gemma as the foundation for its regional language model.
The gap between open-source and closed models is closing fast. Qwen 3.5 matches or beats GPT-5.2 on several benchmarks while being fully open and 13x cheaper. For developers who need multilingual support, visual understanding, or just want to avoid vendor lock-in, it’s the strongest option available.