Qwen 3.5 is available through multiple providers: Alibaba Cloud’s Model Studio, OpenRouter, Azure AI Foundry, NVIDIA NIM, and self-hosted via Ollama or llama.cpp. The API follows OpenAI-compatible conventions, so if you’ve used the OpenAI SDK before, you already know how to use Qwen.
Option 1: Alibaba Cloud (cheapest)
This is the cheapest way to use Qwen 3.5 via API: approximately $0.11 per million input tokens, roughly 13x cheaper than Claude Opus 4.6.
Sign up at dashscope.aliyuncs.com and get an API key.
```bash
curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "user", "content": "Explain how MoE architectures work in 3 sentences"}
    ]
  }'
```
With Python (using the OpenAI SDK):
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-key",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Write a Python function that validates email addresses"}
    ]
)
print(response.choices[0].message.content)
```
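At the ~$0.11/M rate above, you can estimate spend directly from the token counts the API returns in `response.usage`. A minimal sketch (input tokens only; substitute the published output-token rate for a full estimate):

```python
def estimate_input_cost(input_tokens: int, price_per_million: float = 0.11) -> float:
    """Rough input-token cost in USD at the ~$0.11/M rate quoted above."""
    return input_tokens / 1_000_000 * price_per_million

# e.g. a batch job that consumes 2M input tokens:
print(f"${estimate_input_cost(2_000_000):.2f}")  # → $0.22
```

In practice you would feed in `response.usage.prompt_tokens` accumulated across calls.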
Regional endpoints:
- International: dashscope-intl.aliyuncs.com
- Singapore: for APAC workloads
- Virginia: for US workloads
Option 2: OpenRouter (easiest)
OpenRouter aggregates multiple providers and lets you switch models with one line change. Good if you’re already using it for other models.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-397b-a17b",
    messages=[
        {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
)
print(response.choices[0].message.content)
```
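The "one line change" is literal: if you wrap the call in a small helper, the model id becomes a parameter and switching providers means swapping one string. A minimal sketch (the helper name and structure are illustrative, not part of the OpenRouter API):

```python
def chat_request(model: str, prompt: str) -> dict:
    """Build the kwargs for client.chat.completions.create().
    Switching models on OpenRouter is just a different `model` string."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = chat_request("qwen/qwen3.5-397b-a17b", "Compare React and Vue in 5 bullet points")
# response = client.chat.completions.create(**req)
```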
Option 3: Self-hosted (free)
All Qwen 3.5 models are released under the Apache 2.0 license and can be run locally for free.
With Ollama (easiest):
```bash
# Small model — runs on any laptop
ollama run qwen3.5:9b

# Medium model — needs 24GB GPU or M-series Mac
ollama run qwen3.5:27b

# Flagship — needs serious hardware (214GB+ RAM)
ollama run qwen3.5
```
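If you script against several machines, a small helper can pick a tag from available memory. The thresholds below are the rough figures from the comments above, not official requirements:

```python
def pick_qwen_tag(mem_gb: float) -> str:
    """Map available RAM/VRAM to an Ollama tag, per the rough sizes above."""
    if mem_gb >= 214:
        return "qwen3.5"       # flagship
    if mem_gb >= 24:
        return "qwen3.5:27b"   # medium
    return "qwen3.5:9b"        # small, runs on any laptop

print(pick_qwen_tag(16))   # → qwen3.5:9b
print(pick_qwen_tag(256))  # → qwen3.5
```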
Then use it like any local API:
```python
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:11434/v1"
)

response = client.chat.completions.create(
    model="qwen3.5:9b",
    messages=[
        {"role": "user", "content": "Explain Docker networking"}
    ]
)
```
Using vision (multimodal)
Qwen 3.5 is natively multimodal. You can send images alongside text:
```python
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
            ]
        }
    ]
)
```
This works for charts, screenshots, documents, diagrams — anything visual. Qwen 3.5 scores 93.1 on OCRBench and 85.0 on MMMU, making it one of the strongest vision models available.
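For local files, OpenAI-compatible APIs generally accept a base64 data URL in place of an HTTP URL; assuming Qwen's compatible mode does too, you can build one like this:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL usable in an image_url part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# with open("chart.png", "rb") as f:
#     url = to_data_url(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": url}} as shown above.
```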
Thinking modes
Qwen 3.5 supports three inference modes:
```python
# Auto mode (default) — model decides when to think deeply
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Solve this step by step: ..."}]
)

# Thinking mode — forces deep chain-of-thought reasoning
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/think Prove that sqrt(2) is irrational"}]
)

# Fast mode — no chain-of-thought, instant responses
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/no_think Translate 'hello' to Japanese"}]
)
```
Use thinking mode for math, complex reasoning, and hard coding problems. Use fast mode for simple tasks where you don’t need the overhead.
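If both kinds of traffic share one code path, the mode switch can be a simple prefix helper (the helper and its mode names are illustrative; the `/think` and `/no_think` tags are the ones shown above):

```python
def with_mode(prompt: str, mode: str = "auto") -> str:
    """Prefix the /think or /no_think tag described above; auto adds nothing."""
    if mode == "think":
        return "/think " + prompt
    if mode == "fast":
        return "/no_think " + prompt
    return prompt

print(with_mode("Prove that sqrt(2) is irrational", mode="think"))
# → /think Prove that sqrt(2) is irrational
```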
Which option to choose
| Use case | Best option |
|---|---|
| Cheapest API | Alibaba Cloud (~$0.11/M) |
| Easiest setup | OpenRouter |
| No API costs | Self-hosted via Ollama |
| Enterprise with SLA | Azure AI Foundry |
| GPU-optimized inference | NVIDIA NIM |
| Privacy-critical | Self-hosted |