May 22, 2026 · 5 min read

How to Use the Qwen 3.7 API: Setup, Pricing, and First Request (2026)

Qwen 3.7 Max is available via two main channels: Alibaba’s DashScope API (native) and OpenRouter (third-party aggregator). Both support OpenAI-compatible request formats, so integration is straightforward if you’ve used any modern LLM API.

This guide covers setup for both providers, your first request in curl/Python/Node.js, streaming, tool use, and tips for working with the 1M token context window.

For background on what Qwen 3.7 can do, see our complete guide.

Pricing breakdown

Provider	Input	Output	Notes
DashScope	$2.50/1M tokens	$7.50/1M tokens	Native API, lowest latency
OpenRouter	$2.50/1M tokens	$7.50/1M tokens	OpenAI-compatible, multi-provider

Cost example: A typical coding request with 2,000 input tokens and 1,000 output tokens costs about $0.0125. A heavy agent session using 100K input and 50K output costs about $0.625.
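
The arithmetic behind those figures can be checked with a few lines of Python. This is a minimal sketch that assumes the per-million-token rates in the table above; `estimate_cost` is a hypothetical helper name, not part of any SDK.

```python
# Per-million-token rates copied from the pricing table above (USD).
INPUT_RATE_PER_M = 2.50
OUTPUT_RATE_PER_M = 7.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD."""
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


print(f"${estimate_cost(2_000, 1_000):.4f}")     # typical coding request
print(f"${estimate_cost(100_000, 50_000):.4f}")  # heavy agent session
```

If the provider changes its rates, only the two constants need updating.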

Option 1: DashScope (native API)

Step 1: Create an account

Go to dashscope.aliyuncs.com
Sign up with an Alibaba Cloud account
Navigate to the API Keys section
Generate a new API key

Step 2: Make your first request (curl)

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.7-max",
    "messages": [
      {"role": "user", "content": "Write a Python function that finds the longest palindromic substring."}
    ]
  }'

Step 3: Python example

from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "user", "content": "Explain how B-trees work in database indexes."}
    ]
)

print(response.choices[0].message.content)

Step 4: Node.js example

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1",
});

const response = await client.chat.completions.create({
  model: "qwen3.7-max",
  messages: [
    { role: "user", content: "Write a Rust function for concurrent file processing." },
  ],
});

console.log(response.choices[0].message.content);

Option 2: OpenRouter

OpenRouter lists Qwen 3.7 Max as qwen/qwen3.7-max. If you already use OpenRouter for other models, this is the fastest path.

Setup

Go to openrouter.ai
Create an account and add credits
Generate an API key

Python example

from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.7-max",
    messages=[
        {"role": "user", "content": "Design a rate limiter using Redis sorted sets."}
    ]
)

print(response.choices[0].message.content)
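
Since the two providers differ only in base URL, API key, and model ID, one way to switch between them is a small lookup table. The sketch below uses the URLs and model IDs from this guide; `provider_settings` is a hypothetical helper, and `OPENROUTER_API_KEY` is an assumed environment variable name.

```python
import os

# Base URLs and model IDs are the ones used in this guide.
# OPENROUTER_API_KEY is an assumed environment variable name.
PROVIDERS = {
    "dashscope": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "model": "qwen3.7-max",
        "key_env": "DASHSCOPE_API_KEY",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "qwen/qwen3.7-max",
        "key_env": "OPENROUTER_API_KEY",
    },
}


def provider_settings(name: str) -> dict:
    """Return client keyword arguments and the model ID for a provider."""
    try:
        cfg = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown provider: {name!r}") from None
    return {
        "client_kwargs": {
            "api_key": os.environ.get(cfg["key_env"], ""),
            "base_url": cfg["base_url"],
        },
        "model": cfg["model"],
    }


settings = provider_settings("openrouter")
print(settings["client_kwargs"]["base_url"], settings["model"])
```

With the SDK installed, `OpenAI(**settings["client_kwargs"])` would build the client and `settings["model"]` would go into the `model` argument.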

Streaming responses

For real-time output (useful in chat interfaces or agent loops):

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

stream = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "user", "content": "Write a comprehensive guide to WebSocket authentication."}
    ],
    stream=True
)

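for chunk in stream:
    # Some chunks (for example a final usage chunk) can arrive with no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)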
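
In an agent loop you usually need the full text after streaming finishes, not only the live printout. A minimal sketch of accumulating the deltas follows; `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `choices[0].delta.content` shape used above so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas from an iterable of streamed chunks."""
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        # Skip empty or missing deltas.
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, for offline testing."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), SimpleNamespace(choices=[]), fake_chunk(None), fake_chunk(", world")]
print(collect_stream(demo))
```

Passing the real `stream` object instead of `demo` should work the same way, as long as the chunks have the shape shown in the loop above.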

Tool use (function calling)

Qwen 3.7 Max scored 76.4 on MCP-Atlas, indicating strong tool use capabilities. Here’s how to define and use tools:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_codebase",
            "description": "Search for code patterns in the repository",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "file_type": {"type": "string", "description": "File extension filter"}
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "user", "content": "Find all usages of the deprecated auth middleware."}
    ],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")
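
The snippet above stops at printing the requested call. To complete the round trip in the OpenAI-compatible format, each call is executed locally and its result is sent back as a message with role `tool`. The sketch below assumes that format; `run_tool_calls`, `TOOL_REGISTRY`, and the body of `search_codebase` are hypothetical.

```python
import json
from types import SimpleNamespace


def search_codebase(query: str, file_type: str = "") -> str:
    """Hypothetical local implementation of the tool declared above."""
    return f"0 matches for {query!r} in files matching *{file_type}"


TOOL_REGISTRY = {"search_codebase": search_codebase}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY) -> list:
    """Execute each requested tool and build the follow-up tool messages."""
    results = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            content = f"Unknown tool: {call.function.name}"
        else:
            # Arguments arrive as a JSON-encoded string, not a dict.
            args = json.loads(call.function.arguments or "{}")
            content = func(**args)
        results.append({"role": "tool", "tool_call_id": call.id, "content": content})
    return results


# Offline stand-in for one entry of message.tool_calls.
demo_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="search_codebase",
        arguments='{"query": "authMiddleware", "file_type": ".ts"}',
    ),
)
print(run_tool_calls([demo_call]))
```

In a real loop, append the assistant message and then these tool messages to `messages`, and call `client.chat.completions.create` again so the model can use the results.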

Working with the 1M context window

The 1M token context window is powerful but requires some planning:

Tips for large context usage

Put instructions at the start and end. Models tend to recall material near the beginning and end of a long context more reliably than material buried in the middle.
Use clear section markers. When passing multiple files, use headers like ### File: src/auth.ts to help the model navigate.
Be specific about what you need. With 1M tokens of context, vague questions produce vague answers. Point the model at specific sections.
Watch your costs. A full 1M-token input costs $2.50 per request at the rates above. Only send what’s relevant.
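
The first two tips can be combined in a small prompt builder. This is a sketch only: `build_long_prompt` is a hypothetical helper, and the wording of the closing reminder is an arbitrary choice.

```python
def build_long_prompt(instruction: str, files: dict) -> str:
    """Wrap file contents in section markers, with the instruction first and last."""
    sections = [instruction, ""]
    for path, text in files.items():
        # One header per file, in the "### File: <path>" style suggested above.
        sections.append(f"### File: {path}")
        sections.append(text.rstrip())
        sections.append("")
    # Repeat the instruction at the end of the context.
    sections.append(f"Reminder of the task: {instruction}")
    return "\n".join(sections)


demo_files = {"src/auth.ts": "export const login = () => {};\n"}
prompt = build_long_prompt("List every place a token is validated.", demo_files)
print(prompt)
```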

Example: Analyzing a codebase

import os
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# Collect source files
codebase = ""
for root, dirs, files in os.walk("./src"):
    for file in files:
        if file.endswith((".ts", ".tsx")):
            path = os.path.join(root, file)
            with open(path) as f:
                codebase += f"### File: {path}\n```\n{f.read()}\n```\n\n"

response = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "system", "content": "You are a code reviewer. Analyze the codebase for security issues."},
        {"role": "user", "content": f"Review this codebase for SQL injection vulnerabilities:\n\n{codebase}"}
    ]
)

print(response.choices[0].message.content)
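
The loop above concatenates every matching file, which can overshoot the context window or your budget on a large repository. One way to guard against that is to stop adding files once an estimated token budget is used up. The sketch below assumes the rough heuristic of 4 characters per token, so treat the counts as approximate; `estimate_tokens` and `pack_files` are hypothetical helpers.

```python
CHARS_PER_TOKEN = 4  # rough English-text heuristic; real tokenizers will differ


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN) if text else 0


def pack_files(files: dict, max_tokens: int) -> tuple:
    """Keep files in order, skipping any that would exceed the token budget."""
    kept, skipped, used = {}, [], 0
    for path, text in files.items():
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            skipped.append(path)
            continue
        kept[path] = text
        used += cost
    return kept, skipped, used


demo = {"a.ts": "x" * 400, "b.ts": "y" * 4000, "c.ts": "z" * 40}
kept, skipped, used = pack_files(demo, max_tokens=200)
print(list(kept), skipped, used)
```

Logging the skipped paths makes it obvious when a review ran on only part of the codebase.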

Error handling

Common errors and how to handle them:

from openai import OpenAI, APIConnectionError, APIStatusError, APITimeoutError, RateLimitError

client = OpenAI(
    api_key="your-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

try:
    response = client.chat.completions.create(
        model="qwen3.7-max",
        messages=[{"role": "user", "content": "Hello"}],
        timeout=60
    )
except RateLimitError:
    # HTTP 429: back off and retry
    print("Rate limited. Wait and retry.")
except APITimeoutError:
    # Increase timeout for large context requests
    print("Request timed out. Try a shorter prompt or increase timeout.")
except APIStatusError as e:
    # Any other non-2xx response; only status errors carry status_code
    print(f"API error: {e.status_code} - {e.message}")
except APIConnectionError as e:
    # Network failure before a response arrived, so there is no status code
    print(f"Connection error: {e}")

For large context requests (500K+ tokens), increase your timeout to 120-180 seconds. These requests take longer to process.
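
The "back off and retry" step can be sketched as a generic wrapper with exponential backoff and jitter. `with_backoff` is a hypothetical helper, and the retry count, base delay, and jitter fraction are illustrative defaults rather than provider recommendations.

```python
import random
import time


def with_backoff(fn, retryable=(Exception,), max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on retryable errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1x, 2x, 4x, ...) plus up to 25% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))


attempts = {"n": 0}


def flaky():
    """Simulated call that fails twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


print(with_backoff(flaky, retryable=(TimeoutError,), base_delay=0.01))
```

With the SDK, you would pass `retryable=(RateLimitError, APITimeoutError)` and wrap the `client.chat.completions.create(...)` call in a lambda.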

Using with Claude Code

Qwen 3.7 Max supports the Anthropic API protocol. To use it with Claude Code:

export ANTHROPIC_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/anthropic/v1"
export ANTHROPIC_API_KEY="your-dashscope-key"

Then run Claude Code as normal. It will route requests to Qwen 3.7 Max instead of Claude.

Comparison with Qwen 3.6 API

If you’re migrating from Qwen 3.6, the main changes are:

Model name: qwen-max-preview becomes qwen3.7-max
Context: 256K to 1M tokens
Pricing: May differ from 3.6 preview pricing
New: Anthropic protocol endpoint available

The request format is identical. Update the model name and you’re done.

FAQ

Do I need an account based in China to use DashScope?

No. DashScope supports international accounts. You can sign up with an email address and international payment method.

Is there a free tier?

No free tier for Qwen 3.7 Max. If you want free access to Qwen models, check if Qwen 3.6 Plus is still available on OpenRouter’s free preview.

What’s the rate limit?

Rate limits depend on your DashScope account tier. New accounts typically start with lower limits. OpenRouter has its own rate limiting based on your credit balance.

Can I use the OpenAI Python SDK?

Yes. Both DashScope and OpenRouter are OpenAI-compatible. Just change the base_url and api_key parameters.

What’s the maximum output length?

Check the API documentation for the current max output token limit. Typically frontier Qwen models support 8K-65K output tokens depending on the endpoint.

How do I count tokens before sending?

Use the tiktoken library with the cl100k_base encoding as an approximation, or use Alibaba’s tokenizer if available. For cost estimation, assume roughly 1 token per 4 characters in English.

Is the API stable for production use?

Qwen 3.7 Max is a production release (not a preview). It should be stable, but as with any new model, test thoroughly before routing production traffic. Consider keeping Qwen 3.6 as a fallback during the transition period.
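
A fallback during the transition can be as simple as trying model IDs in order. The sketch below is one possible shape, not a prescribed pattern: `complete_with_fallback` and `fake_send` are hypothetical, and `qwen-max-preview` is the Qwen 3.6 model name from the migration notes above.

```python
def complete_with_fallback(send, models):
    """Try each model ID in order; return (model, result) from the first success."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # narrow this to SDK error types in real code
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def fake_send(model):
    """Hypothetical stand-in for a real API call, used to show the control flow."""
    if model == "qwen3.7-max":
        raise ConnectionError("simulated outage")
    return f"response from {model}"


model_used, result = complete_with_fallback(fake_send, ["qwen3.7-max", "qwen-max-preview"])
print(model_used, "->", result)
```

In practice `send` would wrap `client.chat.completions.create(model=model, ...)`, and recording `model_used` shows how often the fallback actually fires.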