Update: Qwen 3.6 is now available. See the Qwen 3.6 complete guide, how to run 3.6-27B locally, and the Qwen 3.6 vs 3.5 comparison.
Qwen 3.5 is available through multiple providers: Alibaba Cloud's Model Studio, OpenRouter, Azure AI Foundry, NVIDIA NIM, and self-hosted via Ollama or llama.cpp. The API follows OpenAI-compatible conventions, so if you've used the OpenAI SDK before, you already know how to use Qwen.
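Because every option below exposes the same OpenAI-compatible surface, switching providers is mostly a matter of swapping the base URL and model name. A minimal sketch (the URLs and model IDs are the ones used in this guide's examples; the `PROVIDERS` table and helper are mine, not part of any SDK):

```python
# Provider name -> OpenAI-compatible connection settings.
# URLs and model IDs match the examples later in this guide.
PROVIDERS = {
    "alibaba": {
        "base_url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        "model": "qwen-plus",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "qwen/qwen3.5-397b-a17b",
    },
    "ollama": {
        "base_url": "http://localhost:11434/v1",
        "model": "qwen3.5:9b",
    },
}

def client_kwargs(provider: str, api_key: str) -> dict:
    """Return the kwargs to pass to openai.OpenAI() for a given provider."""
    return {"api_key": api_key, "base_url": PROVIDERS[provider]["base_url"]}
```

With this in place, moving a workload between providers is one dictionary key: `OpenAI(**client_kwargs("openrouter", key))` instead of `OpenAI(**client_kwargs("alibaba", key))`.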
Option 1: Alibaba Cloud (cheapest)
Alibaba's own Model Studio is the cheapest way to use Qwen 3.5 via API: approximately $0.11 per million input tokens, roughly 13x cheaper than Claude Opus 4.6.
Sign up at dashscope.aliyuncs.com and get an API key.
curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "user", "content": "Explain how MoE architectures work in 3 sentences"}
    ]
  }'
With Python (using the OpenAI SDK):
from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-key",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Write a Python function that validates email addresses"}
    ]
)
print(response.choices[0].message.content)
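At the ~$0.11 per million input tokens quoted above, you can estimate spend directly from the `usage` field the API returns with each response. A quick sketch (the rate is the approximate figure from this section, not a guaranteed price; the helper is mine):

```python
def input_cost_usd(prompt_tokens: int, usd_per_million: float = 0.11) -> float:
    """Estimated input cost at Alibaba Cloud's approximate Qwen 3.5 rate."""
    return prompt_tokens * usd_per_million / 1_000_000

# After a call: input_cost_usd(response.usage.prompt_tokens)
```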
Regional endpoints:
- International: dashscope-intl.aliyuncs.com
- Singapore: for APAC workloads
- Virginia: for US workloads
Option 2: OpenRouter (easiest)
OpenRouter aggregates multiple providers and lets you switch models with a one-line change. It's a good fit if you're already using it for other models.
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-397b-a17b",
    messages=[
        {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
)
print(response.choices[0].message.content)
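One practical benefit of an aggregator is easy fallback: if a model or its upstream provider errors out, retry with the next ID on your list. A hedged sketch (the helper is mine, not part of the OpenRouter API; `create_fn` is assumed to have the same signature as `client.chat.completions.create`):

```python
def complete_with_fallback(create_fn, models, messages):
    """Call create_fn(model=..., messages=...) for each model ID in order,
    returning the first successful response; re-raise the last error if all fail."""
    last_err = None
    for model in models:
        try:
            return create_fn(model=model, messages=messages)
        except Exception as err:  # e.g. rate limit or provider outage
            last_err = err
    raise last_err
```

You would pass `client.chat.completions.create` and a list of OpenRouter model slugs in your preferred order.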
Option 3: Self-hosted (free)
All Qwen 3.5 models are released under the Apache 2.0 license and can be run locally for free.
With Ollama (easiest):
# Small model: runs on any laptop
ollama run qwen3.5:9b

# Medium model: needs a 24GB GPU or an M-series Mac
ollama run qwen3.5:27b

# Flagship: needs serious hardware (214GB+ RAM)
ollama run qwen3.5
Then use it like any local API:
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",  # Ollama ignores the key, but the SDK requires one
    base_url="http://localhost:11434/v1"
)

response = client.chat.completions.create(
    model="qwen3.5:9b",
    messages=[
        {"role": "user", "content": "Explain Docker networking"}
    ]
)
print(response.choices[0].message.content)
Using vision (multimodal)
Qwen 3.5 is natively multimodal. You can send images alongside text:
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
            ]
        }
    ]
)
This works for charts, screenshots, documents, diagrams, anything visual. Qwen 3.5 scores 93.1 on OCRBench and 85.0 on MMMU, making it one of the strongest vision models available.
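Remote URLs aren't the only option: the OpenAI-style content format also accepts data URLs, so you can inline a local image as base64. A minimal sketch (the helper name and the `image/png` default are mine):

```python
import base64

def image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message pairing text with an inline base64 data-URL image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# messages=[image_message("What's in this chart?", open("chart.png", "rb").read())]
```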
Thinking modes
Qwen 3.5 supports three inference modes:
# Auto mode (default): the model decides when to think deeply
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Solve this step by step: ..."}]
)

# Thinking mode: forces deep chain-of-thought reasoning
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/think Prove that sqrt(2) is irrational"}]
)

# Fast mode: no chain-of-thought, instant responses
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/no_think Translate 'hello' to Japanese"}]
)
Use thinking mode for math, complex reasoning, and hard coding problems. Use fast mode for simple tasks where you don't need the overhead.
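Since the mode is just a prefix on the user message, a tiny helper keeps the switches in one place. A sketch (the function is mine; the `/think` and `/no_think` prefixes are the ones shown above):

```python
def mode_prompt(content: str, mode: str = "auto") -> str:
    """Prefix a user prompt with the Qwen 3.5 inference-mode switch.

    mode: "auto" (no prefix), "think" (deep reasoning), or "fast" (no CoT).
    """
    prefixes = {"auto": "", "think": "/think ", "fast": "/no_think "}
    if mode not in prefixes:
        raise ValueError(f"unknown mode: {mode}")
    return prefixes[mode] + content

# messages=[{"role": "user", "content": mode_prompt("Prove sqrt(2) is irrational", "think")}]
```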
Which option to choose
| Use case | Best option |
|---|---|
| Cheapest API | Alibaba Cloud (~$0.11/M) |
| Easiest setup | OpenRouter |
| No API costs | Self-hosted via Ollama |
| Enterprise with SLA | Azure AI Foundry |
| GPU-optimized inference | NVIDIA NIM |
| Privacy-critical | Self-hosted |
Related
- What Is Qwen 3.5? Alibaba's 397B Open-Source Model Explained
- Qwen 3.5 vs MiMo-V2-Flash: Open-Source AI Showdown
- How to Use the MiMo-V2-Pro API: Setup Guide
- AI Model Comparison: Every Major Model Ranked