πŸ€– AI Tools
Β· 4 min read
Last updated on

GLM-5.1 API Guide β€” Endpoints, Pricing, and Integration


GLM-5.1 is available through multiple API providers. Here’s everything you need to know about accessing it programmatically.

API options

There are three main ways to access GLM-5.1 via API:

  1. Z.ai direct API β€” Official endpoint from the model creator
  2. GLM Coding Plan β€” Subscription-based access optimized for coding tools
  3. OpenRouter β€” Third-party aggregator with pay-per-token pricing

Z.ai Direct API

Authentication

Sign up at z.ai and generate an API key from your dashboard.

Endpoints

Z.ai provides both OpenAI-compatible and Anthropic-compatible endpoints:

OpenAI-compatible:

curl https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {"role": "system", "content": "You are a senior software engineer."},
      {"role": "user", "content": "Refactor this function to use async/await"}
    ],
    "temperature": 0.7,
    "max_tokens": 4096
  }'

Anthropic-compatible:

curl https://api.z.ai/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "max_tokens": 4096,
    "messages": [
      {"role": "user", "content": "Write unit tests for this module"}
    ]
  }'

The Anthropic-compatible endpoint is what makes GLM-5.1 work as a drop-in replacement for Claude Code.

Pricing

ModelInput (per 1M tokens)Output (per 1M tokens)
GLM-5.1~$1.00~$2.30
GLM-5~$0.80~$2.00
GLM-5-Turbo~$0.40~$1.00
GLM-4.7~$0.20~$0.50

GLM-5.1 is priced higher than GLM-5 despite sharing the same architecture. Z.ai justifies this with the improved agentic performance, though the community has questioned this since inference costs should be identical.

GLM Coding Plan

The Coding Plan is a subscription that bundles API access with coding tool integration:

TierPriceModelsBest for
Lite$3/monthGLM-5.1, GLM-5-Turbo, GLM-4.xLight usage, learning
Pro$10/monthAll models including GLM-5Daily development
MaxHigherAll models, higher limitsTeams, heavy usage

All tiers support GLM-5.1. The Coding Plan includes setup guides for Claude Code, OpenClaw, Cline, and other popular tools.

Setup

# Install via the GLM CLI
npm install -g @zai/glm-cli
glm auth login

# Or configure manually
export ANTHROPIC_BASE_URL="https://api.z.ai/v1"
export ANTHROPIC_API_KEY="your-coding-plan-key"

OpenRouter

OpenRouter provides GLM-5.1 through their unified API:

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_OPENROUTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "z-ai/glm-5.1",
    "messages": [
      {"role": "user", "content": "Explain this error and fix it"}
    ]
  }'

OpenRouter pricing varies by provider but is typically competitive with Z.ai’s direct pricing. Check openrouter.ai/z-ai/glm-5 for current rates.

Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a REST API in FastAPI with user authentication"}
    ],
    temperature=0.3,
    max_tokens=8192
)

print(response.choices[0].message.content)

JavaScript/TypeScript SDK

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.z.ai/v1',
  apiKey: 'your-api-key',
});

const response = await client.chat.completions.create({
  model: 'glm-5.1',
  messages: [
    { role: 'user', content: 'Create a React component for a data table with sorting and filtering' }
  ],
  temperature: 0.3,
});

console.log(response.choices[0].message.content);

Reasoning mode

GLM-5.1 supports reasoning (chain-of-thought) through OpenRouter’s reasoning parameter:

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "z-ai/glm-5.1",
    "messages": [{"role": "user", "content": "Debug this race condition"}],
    "reasoning": {"effort": "high"}
  }'

The response includes a reasoning_details array showing the model’s step-by-step thinking.

Tool use

GLM-5.1 supports function calling through the standard OpenAI tools format:

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
)

This is critical for agentic workflows where GLM-5.1 needs to call external tools over thousands of iterations.

Tips for best results

  1. Use low temperature (0.1-0.3) for coding β€” Higher temperatures introduce unnecessary variation in code output
  2. Provide system prompts β€” GLM-5.1 responds well to role-based system messages
  3. Use the Anthropic endpoint for Claude Code β€” The OpenAI endpoint works for general use, but Claude Code specifically needs the Anthropic-compatible one
  4. Monitor token usage β€” GLM-5.1’s 200K context window is generous, but long agentic sessions can consume millions of tokens

FAQ

Is the GLM-5.1 API free?

GLM-5.1 does not have a permanent free tier, but Z.ai occasionally offers trial credits for new accounts. The cheapest ongoing access is through the Coding Plan Lite at $3/month, which includes GLM-5.1 along with all other GLM models.

How do I get a Z.ai API key?

Sign up at z.ai, choose a plan (Coding Plan Lite starts at $3/month), then navigate to the API Keys section in your dashboard. Generate a new key and use it with either the OpenAI-compatible or Anthropic-compatible endpoint.

What’s the rate limit?

Rate limits are bundled into the Coding Plan quota rather than enforced as separate per-minute caps. Your usage is governed by the quota pool shared across all models, with premium models like GLM-5.1 consuming more quota during peak hours (14:00-18:00 UTC+8).

Can I use GLM-5.1 with LangChain?

Yes, since Z.ai provides an OpenAI-compatible endpoint, you can use GLM-5.1 with LangChain’s ChatOpenAI class by setting the base_url to https://api.z.ai/v1 and providing your API key. No custom integration code is needed.

Related: GLM-5.1 Complete Guide Β· How to Use GLM-5.1 with Claude Code Β· Best Free AI APIs 2026