🤖 AI Tools
· 3 min read

How to Use the Qwen 3.5 API — Setup Guide With Code Examples


Qwen 3.5 is available through multiple providers: Alibaba Cloud’s Model Studio, OpenRouter, Azure AI Foundry, NVIDIA NIM, and self-hosted via Ollama or llama.cpp. The API follows OpenAI-compatible conventions, so if you’ve used the OpenAI SDK before, you already know how to use Qwen.

Option 1: Alibaba Cloud (cheapest)

Alibaba Cloud's Model Studio is the cheapest way to call Qwen 3.5 via API: approximately $0.11 per million input tokens, roughly 13x cheaper than Claude Opus 4.6.
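To put that rate in perspective, here's a quick back-of-the-envelope estimator using the ~$0.11-per-million-input-tokens figure above (output tokens are billed separately and aren't covered here, so this is input cost only):

```python
# Rough input-cost estimator based on the ~$0.11 / 1M input tokens rate above.
# Output tokens usually cost more; check the provider's pricing page for
# current numbers.

INPUT_PRICE_PER_M = 0.11  # USD per 1,000,000 input tokens (rate quoted above)

def estimate_input_cost(input_tokens: int, price_per_m: float = INPUT_PRICE_PER_M) -> float:
    """Return the approximate USD cost of sending `input_tokens` tokens."""
    return input_tokens / 1_000_000 * price_per_m

# e.g. a 50k-token context sent 100 times:
print(f"${estimate_input_cost(50_000 * 100):.2f}")  # → $0.55
```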

Sign up at dashscope.aliyuncs.com and get an API key.

curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "user", "content": "Explain how MoE architectures work in 3 sentences"}
    ]
  }'

With Python (using the OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-key",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Write a Python function that validates email addresses"}
    ]
)

print(response.choices[0].message.content)
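API calls can fail transiently (rate limits, network blips), so production code usually wraps the call in a retry loop. A minimal sketch with exponential backoff, usable around any callable — the delay and attempt count here are arbitrary choices, not DashScope recommendations:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(); on any exception, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Usage with the client from the example above:
# result = with_retries(lambda: client.chat.completions.create(
#     model="qwen-plus",
#     messages=[{"role": "user", "content": "ping"}],
# ))
```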

Regional endpoints:

  • International: dashscope-intl.aliyuncs.com (used in the examples above)
  • Singapore: for APAC workloads
  • Virginia: for US workloads

Option 2: OpenRouter (easiest)

OpenRouter aggregates multiple providers and lets you switch models with one line change. Good if you’re already using it for other models.

from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-397b-a17b",
    messages=[
        {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
)

print(response.choices[0].message.content)
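The "one line change" is the model string. A tiny helper makes that explicit — `qwen/qwen3.5-397b-a17b` is the ID used above, and any other OpenRouter model ID slots in the same way:

```python
def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble the kwargs for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Switching models is just a different first argument:
# client.chat.completions.create(**build_chat_request("qwen/qwen3.5-397b-a17b", "hi"))
```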

Option 3: Self-hosted (free)

All Qwen 3.5 models are released under the Apache 2.0 license and can be run locally for free.

With Ollama (easiest):

# Small model — runs on any laptop
ollama run qwen3.5:9b

# Medium model — needs 24GB GPU or M-series Mac
ollama run qwen3.5:27b

# Flagship — needs serious hardware (214GB+ RAM)
ollama run qwen3.5

Then use it like any local API:

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:11434/v1"
)

response = client.chat.completions.create(
    model="qwen3.5:9b",
    messages=[
        {"role": "user", "content": "Explain Docker networking"}
    ]
)

print(response.choices[0].message.content)
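If you're scripting model selection, the rough hardware notes in the ollama commands above can be turned into a picker. The RAM cutoffs below just mirror those comments (laptop / 24GB / 214GB+); actual requirements depend on quantization and context length:

```python
def pick_ollama_tag(ram_gb: float) -> str:
    """Pick a Qwen 3.5 Ollama tag from the rough RAM thresholds above.

    These cutoffs are taken from the comments in the ollama commands;
    real requirements vary with quantization and context size.
    """
    if ram_gb >= 214:
        return "qwen3.5"       # flagship — serious hardware
    if ram_gb >= 24:
        return "qwen3.5:27b"   # medium — 24GB GPU or M-series Mac
    return "qwen3.5:9b"        # small — runs on any laptop
```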

Using vision (multimodal)

Qwen 3.5 is natively multimodal. You can send images alongside text:

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
            ]
        }
    ]
)
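Remote URLs aren't the only option: most OpenAI-compatible endpoints also accept images inlined as base64 data URLs, which is handy for local files (double-check that your provider supports data URLs before relying on this):

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Inline raw image bytes as a data URL for the image_url field."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# with open("chart.png", "rb") as f:
#     url = image_to_data_url(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": url}} as above.
```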

This works for charts, screenshots, documents, diagrams — anything visual. Qwen 3.5 scores 93.1 on OCRBench and 85.0 on MMMU, making it one of the strongest vision models available.

Thinking modes

Qwen 3.5 supports three inference modes:

# Auto mode (default) — model decides when to think deeply
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Solve this step by step: ..."}]
)

# Thinking mode — forces deep chain-of-thought reasoning
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/think Prove that sqrt(2) is irrational"}]
)

# Fast mode — no chain-of-thought, instant responses
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "/no_think Translate 'hello' to Japanese"}]
)

Use thinking mode for math, complex reasoning, and hard coding problems. Use fast mode for simple tasks where you don’t need the overhead.
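The /think and /no_think prefixes are soft switches embedded in the user message (exact syntax and support can vary between deployments). A small wrapper keeps them out of your prompt strings:

```python
def make_message(prompt: str, mode: str = "auto") -> dict:
    """Build a user message, prefixing the soft switch for the chosen mode.

    mode: "auto" (no prefix), "think", or "no_think" — the prefixes
    shown in the examples above.
    """
    prefix = {"auto": "", "think": "/think ", "no_think": "/no_think "}[mode]
    return {"role": "user", "content": prefix + prompt}

# messages=[make_message("Prove that sqrt(2) is irrational", mode="think")]
```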

Which option to choose

| Use case | Best option |
| --- | --- |
| Cheapest API | Alibaba Cloud (~$0.11/M) |
| Easiest setup | OpenRouter |
| No API costs | Self-hosted via Ollama |
| Enterprise with SLA | Azure AI Foundry |
| GPU-optimized inference | NVIDIA NIM |
| Privacy-critical | Self-hosted |