Qwen 3.5 is available through multiple providers: Alibaba Cloud’s Model Studio, OpenRouter, Azure AI Foundry, NVIDIA NIM, and self-hosted via Ollama or llama.cpp. The API follows OpenAI-compatible conventions, so if you’ve used the OpenAI SDK before, you already know how to use Qwen.
Option 1: Alibaba Cloud (cheapest)
This is the cheapest way to use Qwen 3.5 via API: approximately $0.11 per million input tokens, roughly 13x cheaper than Claude Opus 4.6.
Sign up at dashscope.aliyuncs.com and get an API key.
```bash
curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "user", "content": "Explain how MoE architectures work in 3 sentences"}
    ]
  }'
```
With Python (using the OpenAI SDK):
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-key",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Write a Python function that validates email addresses"}
    ]
)
print(response.choices[0].message.content)
```
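At the ~$0.11/M rate above, you can estimate spend directly from the token counts the API returns in `response.usage`. A minimal sketch (input tokens only; substitute the published output-token rate for a full estimate):

```python
def estimate_input_cost(input_tokens: int, price_per_million: float = 0.11) -> float:
    """Rough input-token cost in USD at the ~$0.11/M rate quoted above."""
    return input_tokens / 1_000_000 * price_per_million

# e.g. a batch job that consumes 2M input tokens:
print(f"${estimate_input_cost(2_000_000):.2f}")  # → $0.22
```

In practice you would feed in `response.usage.prompt_tokens` accumulated across calls.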
Regional endpoints:
- International: dashscope-intl.aliyuncs.com
- Singapore: for APAC workloads
- Virginia: for US workloads
Option 2: OpenRouter (easiest)
OpenRouter aggregates multiple providers and lets you switch models with one line change. Good if you’re already using it for other models.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-397b-a17b",
    messages=[
        {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
)
print(response.choices[0].message.content)
```
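The "one line change" is literal: if you wrap the call in a small helper, the model id becomes a parameter and switching providers means swapping one string. A minimal sketch (the helper name and structure are illustrative, not part of the OpenRouter API):

```python
def chat_request(model: str, prompt: str) -> dict:
    """Build the kwargs for client.chat.completions.create().
    Switching models on OpenRouter is just a different `model` string."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = chat_request("qwen/qwen3.5-397b-a17b", "Compare React and Vue in 5 bullet points")
# response = client.chat.completions.create(**req)
```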
Option 3: Self-hosted (free)
All Qwen 3.5 models are released under the Apache 2.0 license and can be run locally for free.
With Ollama (easiest):
```bash
# Small model — runs on any laptop
ollama run qwen3.5:9b

# Medium model — needs 24GB GPU or M-series Mac
ollama run qwen3.5:27b

# Flagship — needs serious hardware (214GB+ RAM)
ollama run qwen3.5
```
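If you script against several machines, a small helper can pick a tag from available memory. The thresholds below are the rough figures from the comments above, not official requirements:

```python
def pick_qwen_tag(mem_gb: float) -> str:
    """Map available RAM/VRAM to an Ollama tag, per the rough sizes above."""
    if mem_gb >= 214:
        return "qwen3.5"       # flagship
    if mem_gb >= 24:
        return "qwen3.5:27b"   # medium
    return "qwen3.5:9b"        # small, runs on any laptop

print(pick_qwen_tag(16))   # → qwen3.5:9b
print(pick_qwen_tag(256))  # → qwen3.5
```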
Then use it like any local API:
```python
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:11434/v1"
)

response = client.chat.completions.create(
    model="qwen3.5:9b",
    messages=[
        {"role": "user", "content": "Explain Docker networking"}
    ]
)
```
Using vision (multimodal)
Qwen 3.5 is natively multimodal. You can send images alongside text:
```python
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
            ]
        }
    ]
)
```
This works for charts, screenshots, documents, diagrams — anything visual. Qwen 3.5 scores 93.1 on OCRBench and 85.0 on MMMU, making it one of the strongest vision models available.
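For local files, OpenAI-compatible APIs generally accept a base64 data URL in place of an HTTP URL; assuming Qwen's compatible mode does too, you can build one like this:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL usable in an image_url part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# with open("chart.png", "rb") as f:
#     url = to_data_url(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": url}} as shown above.
```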
Thinking modes
Qwen 3.5 supports three inference modes:
```python
# Auto mode (default) — model decides when to think deeply
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Solve this step by step: ..."}]
)

# Thinking mode — forces deep chain-of-thought reasoning
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/think Prove that sqrt(2) is irrational"}]
)

# Fast mode — no chain-of-thought, instant responses
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/no_think Translate 'hello' to Japanese"}]
)
```
Use thinking mode for math, complex reasoning, and hard coding problems. Use fast mode for simple tasks where you don’t need the overhead.
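If both kinds of traffic share one code path, the mode switch can be a simple prefix helper (the helper and its mode names are illustrative; the `/think` and `/no_think` tags are the ones shown above):

```python
def with_mode(prompt: str, mode: str = "auto") -> str:
    """Prefix the /think or /no_think tag described above; auto adds nothing."""
    if mode == "think":
        return "/think " + prompt
    if mode == "fast":
        return "/no_think " + prompt
    return prompt

print(with_mode("Prove that sqrt(2) is irrational", mode="think"))
# → /think Prove that sqrt(2) is irrational
```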
Which option to choose
| Use case | Best option |
|---|---|
| Cheapest API | Alibaba Cloud (~$0.11/M) |
| Easiest setup | OpenRouter |
| No API costs | Self-hosted via Ollama |
| Enterprise with SLA | Azure AI Foundry |
| GPU-optimized inference | NVIDIA NIM |
| Privacy-critical | Self-hosted |