# Z.ai API Complete Guide — GLM Models, Pricing, and Setup (2026)
Z.ai (the international brand of Zhipu AI) offers API access to the GLM model family through its Coding Plan. The standout feature: an Anthropic-compatible endpoint that lets you use GLM-5.1 through Claude Code.
## Available models
| Model | Quality | Speed | Best for |
|---|---|---|---|
| GLM-5.1 | Best | Medium | Complex coding, architecture |
| GLM-5-Turbo | Very good | Fast | Balanced quality/speed |
| GLM-4.7 | Good | Fast | Routine coding, budget |
| GLM-4.5-Air | Basic | Fastest | Quick questions, autocomplete |
All models are included in the Coding Plan — no per-model pricing.
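The trade-offs in the table above can be captured in a small routing helper. This is an illustrative sketch, not part of the Z.ai API; the task categories are our own labels, while the model IDs match the names used throughout this guide:

```python
# Illustrative helper: map a task category to a GLM model ID,
# following the quality/speed trade-offs in the table above.
# The task categories are our own labels, not part of the Z.ai API.
MODEL_FOR_TASK = {
    "architecture": "glm-5.1",      # best quality for complex work
    "debugging": "glm-5.1",
    "feature": "glm-5-turbo",       # balanced quality/speed
    "refactor": "glm-4.7",          # routine coding, budget
    "autocomplete": "glm-4.5-air",  # fastest, basic quality
}

def pick_model(task: str) -> str:
    # Default to the budget model for unknown task types.
    return MODEL_FOR_TASK.get(task, "glm-4.7")
```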
## Pricing: the Coding Plan
| Plan | Price | Quota | Models |
|---|---|---|---|
| Coding Plan Lite | $18/month | 5-hour blocks + weekly quota | All 4 models |
### How quota works
All models consume from the same quota pool:
| Model | Off-peak rate | Peak rate (14:00-18:00 UTC+8) |
|---|---|---|
| GLM-5.1 | 1x | 2-3x |
| GLM-5-Turbo | 1x | 2-3x |
| GLM-4.7 | 1x | 1x |
| GLM-4.5-Air | 1x | 1x |
Key insight: During off-peak hours, GLM-5.1 costs the same as GLM-4.7. For European and US developers, peak hours (14:00-18:00 UTC+8 = 08:00-12:00 CEST = 02:00-06:00 ET) are early morning — you’re likely sleeping anyway.
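The peak/off-peak logic can be sketched as a small helper. Note the 2.5x figure below is just the midpoint of the quoted "2-3x" range, not an official number:

```python
from datetime import datetime, timezone, timedelta

# Premium models that carry a peak-hour surcharge (per the table above).
PREMIUM = {"glm-5.1", "glm-5-turbo"}
# Assumed midpoint of the quoted "2-3x" peak range; not an official figure.
PEAK_MULTIPLIER = 2.5

def quota_multiplier(model: str, when_utc: datetime) -> float:
    """Return the quota rate multiplier for a request at a given UTC time."""
    # Peak window is 14:00-18:00 in UTC+8.
    local = when_utc.astimezone(timezone(timedelta(hours=8)))
    in_peak = 14 <= local.hour < 18
    return PEAK_MULTIPLIER if (model in PREMIUM and in_peak) else 1.0
```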
See our GLM-5.1 API pricing guide for detailed quota analysis.
## Setup: Anthropic-compatible endpoint
The killer feature of Z.ai is the Anthropic-compatible API. This means any tool that works with Claude also works with GLM:
```bash
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
```
### With Claude Code
```bash
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"

claude  # Now uses GLM-5.1 instead of Claude
```
You get the full Claude Code agentic experience at $18/month. See our complete Claude Code + GLM setup guide.
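If you switch between providers often, the exports above can be bundled into a wrapper function so they don't leak into your global environment. The `zai_claude` name and `ZAI_API_KEY` variable are our own conventions, not Z.ai's:

```bash
# Hypothetical wrapper: launch Claude Code against Z.ai without polluting
# your login shell's environment. Expects your key in ZAI_API_KEY.
zai_claude() {
  export ANTHROPIC_AUTH_TOKEN="${ZAI_API_KEY:?set ZAI_API_KEY first}"
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
  export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.1"
  export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7"
  export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air"
  claude "$@"
}
```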
### With Python (OpenAI-compatible)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/api/openai/v1",
    api_key="your-zai-api-key",
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Review this code for bugs"}],
)
print(response.choices[0].message.content)
```
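If you'd rather not depend on the openai SDK, the same call can be made with the standard library. This sketch builds the request without sending it; the endpoint path follows the standard OpenAI chat-completions convention, which we assume Z.ai's compatible endpoint mirrors:

```python
# Minimal sketch of a raw chat-completion request to Z.ai's OpenAI-compatible
# endpoint using only the standard library (no SDK required).
import json
import urllib.request

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat.completions request for Z.ai."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.z.ai/api/openai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_chat_request("your-zai-api-key", "glm-5.1", "Review this code for bugs")
# resp = urllib.request.urlopen(req)  # actually sends the request
```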
### With Aider
```bash
export OPENAI_API_BASE="https://api.z.ai/api/openai/v1"
export OPENAI_API_KEY="your-zai-api-key"

aider --model glm-5.1
```
## Getting your API key

1. Sign up at z.ai
2. Subscribe to the Coding Plan Lite ($18/month)
3. Go to API Keys in your dashboard
4. Generate a new key
5. Use it with the endpoints above
## Quota management tips
From our AI Startup Race testing:
- A 30-minute GLM-5.1 session uses ~44% of the 5-hour quota block
- A 30-minute GLM-4.7 session uses ~35% of the 5-hour quota block
- Each GLM-5.1 session consumes ~22% of the weekly quota, allowing roughly 4-5 premium sessions per week
Sustainable daily schedule: 1 premium (GLM-5.1) + 1 cheap (GLM-4.7) session per day.
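The arithmetic behind those session counts can be made explicit. The 22% and 44% figures come from the testing notes above and should be treated as rough estimates:

```python
def sessions_from_quota(pct_per_session: float) -> float:
    """How many sessions fit in 100% of a quota pool, given cost per session."""
    return 100.0 / pct_per_session

# Weekly quota: ~22% per GLM-5.1 session -> ~4.5 sessions per week.
weekly = sessions_from_quota(22)
# One 5-hour block: ~44% per GLM-5.1 session -> ~2 sessions per block.
per_block = sessions_from_quota(44)
```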
Maximize value:
- Use GLM-5.1 for complex tasks (architecture, debugging, code review)
- Use GLM-4.7 for routine tasks (refactoring, simple generation)
- Schedule GLM-5.1 sessions during off-peak hours (1x rate instead of 2-3x)
- Monitor quota in the Z.ai dashboard
## Z.ai vs other API providers
| Provider | Best model | Monthly cost | Anthropic-compatible |
|---|---|---|---|
| Z.ai | GLM-5.1 | $18 | ✅ |
| Anthropic | Claude Sonnet | $20 | ✅ (native) |
| OpenRouter | Multiple | Pay per token | ❌ (OpenAI-compatible) |
| DeepSeek | DeepSeek V3 | ~$25 | ❌ |
| Moonshot | Kimi K2.5 | ~$19 | ❌ |
Among the providers above, Z.ai is the only one besides Anthropic that offers an Anthropic-compatible endpoint, which makes it the only drop-in replacement for Claude Code in this comparison.
## Rate limits and reliability
- Rate limits: usage is governed by the Coding Plan quota; there is no separate per-request rate limit
- Uptime: Generally reliable, occasional slowdowns during Chinese business hours
- Regions: Servers primarily in China, ~100-200ms latency from Europe/US
- Support: Chinese-language support primarily, English documentation available
For production use outside of coding tools, consider the latency. For Claude Code sessions where you’re waiting for responses anyway, the latency is not noticeable.
## FAQ

### What is Z.ai?
Z.ai is the international-facing AI platform from Zhipu AI, a leading Chinese AI company. It provides API access to the GLM model family, including GLM-5.1, GLM-5-Turbo, and GLM-4.7, through both OpenAI-compatible and Anthropic-compatible endpoints.
### Is Z.ai the same as Zhipu AI?
Z.ai is the global brand and API platform operated by Zhipu AI. Zhipu AI is the parent company that develops the GLM models, while z.ai is the domain and product name used for their international developer-facing services and Coding Plan subscriptions.
### How much does Z.ai cost?
The Coding Plan Lite is $18/month and includes access to all GLM models (GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air) with a quota-based system. There is no pay-per-token option through Z.ai directly, though you can access GLM models via OpenRouter on a per-token basis.
### Can I use Z.ai with Claude Code?
Yes — this is Z.ai’s killer feature. The Anthropic-compatible endpoint at https://api.z.ai/api/anthropic lets you use GLM-5.1 as a drop-in replacement for Claude in Claude Code. Just set the ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables to point to Z.ai.
Related: GLM-5.1 Complete Guide · GLM-5.1 Claude Code Setup · What is Z.ai? · GLM-5.1 API Pricing · GLM-5.1 vs Kimi K2.5 · AI Startup Race