Jun 15, 2026 · 6 min read

Last updated on Jul 24, 2026

GLM-5.2: How to Use Z.ai's Free 1M Context Model (MIT License, 2026)

Z.ai just dropped GLM-5.2: a 744B-parameter mixture-of-experts model with a 1 million token context window, 131K max output tokens, and MIT-licensed open weights. Released June 13, 2026, this is the third major iteration in the GLM-5 line and the biggest leap yet. It launched the same day Anthropic’s Claude Fable 5 was banned under US export controls, making it an even more significant release for the global coding model landscape.

Here’s everything you need to know: architecture, what changed from GLM-5.1, how to set it up, pricing, and FAQs.

Who is Z.ai?

Z.ai is the international brand of Zhipu AI, a Beijing-based company spun off from Tsinghua University. They completed a Hong Kong IPO in January 2026 and have been shipping models at a relentless pace ever since. The GLM-5 line started with GLM-5 on February 11, followed by GLM-5-Turbo on March 15, and GLM-5.1 on April 7. GLM-5.2 is the culmination of that rapid iteration cycle.

What’s New in GLM-5.2 vs GLM-5.1

Feature	GLM-5.1	GLM-5.2
Context window	200K tokens	1M tokens
Max output tokens	64K	131K
Architecture	MoE	MoE (DeepSeek Sparse Attention)
Parameters	Not disclosed	744B total, 40B active
Training data	Not disclosed	28.5T tokens
Thinking modes	Standard	High + Max
Open weights	No	MIT (available on Hugging Face)

The headline number is obvious: a 5x jump in context window from 200K to 1M tokens. The model is labeled glm-5.2[1m] to reflect this. But the architectural changes underneath are just as important.

For a deeper dive into how the two compare in practice, see our GLM-5.2 vs GLM-5.1 comparison.

Architecture

GLM-5.2 is a 744-billion-parameter mixture-of-experts model with 40 billion parameters active per token. It uses DeepSeek Sparse Attention, an attention mechanism that enables the 1M context window without the quadratic scaling costs of dense attention. The model was trained on 28.5 trillion tokens.
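To see why sparse attention matters at this length, here is a rough sketch that counts the query-key pairs scored under dense attention versus a scheme where each query attends to at most a fixed number of keys. The `top_k` value of 2,048 is a hypothetical chosen for illustration; the article does not give GLM-5.2's actual selection budget, and the sketch ignores the cost of deciding which keys to keep.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored when every token attends to every token."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, top_k: int) -> int:
    """Query-key pairs scored when each query keeps at most top_k keys."""
    return n_tokens * min(top_k, n_tokens)


if __name__ == "__main__":
    n = 1_000_000   # context length from the article
    k = 2_048       # ASSUMPTION: hypothetical per-query key budget
    dense = dense_attention_pairs(n)
    sparse = sparse_attention_pairs(n, k)
    print(f"dense:  {dense:,} pairs")
    print(f"sparse: {sparse:,} pairs")
    print(f"ratio:  {dense / sparse:.0f}x fewer pairs")
```

The point is the shape of the curves rather than the exact numbers: the dense count grows with the square of the context length, while the capped count grows linearly once the context exceeds `top_k`.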

The 1M context window is the standout feature for coding use cases. You can fit entire repositories into context — monorepos, full documentation sets, and multi-file refactors that previously required chunking or RAG workarounds.
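Before loading a whole repository, it can help to estimate whether it fits. The sketch below uses a crude four-characters-per-token heuristic, which is an assumption and not GLM-5.2's tokenizer, and it assumes the output budget shares the same 1M window. The function names and the file-extension filter are illustrative.

```python
import os

CONTEXT_WINDOW = 1_000_000   # tokens, from the article
MAX_OUTPUT = 131_000         # tokens, the article's "131K" as a round figure
CHARS_PER_TOKEN = 4          # ASSUMPTION: rough heuristic, not GLM-5.2's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token count: characters divided by CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str, extensions=(".py", ".ts", ".md")) -> int:
    """Sum rough token estimates for matching files under root, skipping .git."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total


def fits_in_context(prompt_tokens: int, reserve: int = MAX_OUTPUT) -> bool:
    """True if the prompt plus a reserved output budget fits in the window."""
    return prompt_tokens + reserve <= CONTEXT_WINDOW


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"~{tokens:,} tokens; fits: {fits_in_context(tokens)}")
```

Treat the result as a first check only. Real token counts depend on the tokenizer and on how much of the window the tool itself uses for system prompts and tool definitions.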

Thinking Modes

GLM-5.2 introduces two thinking modes:

High — Balanced reasoning for general tasks. Good for code review, documentation, explanations, and straightforward implementations.
Max — Deep reasoning recommended for coding. Use this for complex refactors, debugging, architectural decisions, and multi-step problem solving.

In Claude Code, these map as follows:

Claude Code setting	GLM-5.2 thinking mode
`low` / `medium` / `high`	High
`xhigh` / `max` / `ultracode`	Max

Max uses significantly more compute per prompt but produces markedly better results on complex coding tasks. For most development work, Max is the recommended default.
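If you script around these settings, the mapping table above can be written as a small lookup. This is only a restatement of the table in code; the function name is hypothetical and nothing here calls Claude Code or the GLM API.

```python
# Mapping copied from the table above: Claude Code setting -> GLM-5.2 mode.
THINKING_MODE_BY_SETTING = {
    "low": "High",
    "medium": "High",
    "high": "High",
    "xhigh": "Max",
    "max": "Max",
    "ultracode": "Max",
}


def glm_thinking_mode(setting: str) -> str:
    """Return the GLM-5.2 thinking mode for a Claude Code effort setting."""
    key = setting.strip().lower()
    if key not in THINKING_MODE_BY_SETTING:
        raise ValueError(f"unknown Claude Code setting: {setting!r}")
    return THINKING_MODE_BY_SETTING[key]


if __name__ == "__main__":
    for name in ("medium", "xhigh"):
        print(name, "->", glm_thinking_mode(name))
```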

Benchmarks

Z.ai published no benchmarks at launch — an unusual move. For reference, the predecessor GLM-5.1 scored:

58.4 on SWE-bench Pro
1530 Elo on Code Arena (3rd globally at the time)

We expect third-party benchmarks to appear within days. In the meantime, see our hands-on comparisons: GLM-5.2 vs Claude Opus 4.8 and GLM-5.2 vs Kimi K2.7 Code.

Setup Instructions

Claude Code

This is the most common setup. Set the following environment variables:

export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000

Add these to your .bashrc, .zshrc, or shell config of choice. The CLAUDE_CODE_AUTO_COMPACT_WINDOW setting ensures Claude Code takes advantage of the full 1M context instead of compacting at the default threshold.
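A quick way to confirm a new shell actually picked up the settings is to compare the environment against the values you expect. The sketch below checks two of the variables named above. It only inspects the environment and does not verify that Claude Code honors the values; `find_mismatches` is a hypothetical helper name.

```python
import os

# Expected values, taken from the export lines above.
EXPECTED = {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
}


def find_mismatches(env, expected=EXPECTED) -> dict:
    """Return {name: actual_value_or_None} for variables that differ."""
    return {
        name: env.get(name)
        for name, want in expected.items()
        if env.get(name) != want
    }


if __name__ == "__main__":
    problems = find_mismatches(os.environ)
    if not problems:
        print("All expected variables are set.")
    for name, got in problems.items():
        print(f"{name}: expected {EXPECTED[name]!r}, got {got!r}")
```

Run it from a freshly opened terminal so that it reflects what your shell config exports, not what you typed in the current session.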

For a step-by-step walkthrough, see our dedicated GLM-5.2 Claude Code setup guide.

Other Supported Tools

GLM-5.2 works with all major agentic coding tools:

Claude Code — Environment variable config (see above)
Cline — Select glm-5.2[1m] from model dropdown
OpenCode — Set in config.yaml
Roo Code — Model selector in settings
OpenClaw — API configuration
Kilo Code — Provider settings
Crush — Model config
Goose — Provider configuration

Each tool connects through the GLM API. If you were previously using GLM-5.1, switching is typically a one-line model name change.

Standalone API

According to Z.ai’s launch announcement, the standalone API and chatbot were due the week after the June 13 release. The API is expected to follow their existing patterns; see our GLM-5.1 API guide for what to expect.

Pricing

GLM-5.2 uses prompt-based pricing, not token-based. One prompt equals approximately 15–20 model invocations under the hood (accounting for agentic loops, tool calls, retries, etc.).

It’s available immediately on all GLM Coding Plan tiers:

Plan	Price	Notes
Lite	~$18/month	Good for individual hobbyists
Pro	—	Higher prompt limits
Max	—	Highest individual limits
Team	—	Shared billing, admin controls

The prompt-based model means you don’t need to worry about context window size inflating your bill. Whether you use 200K or the full 1M context, a prompt is a prompt. This is a significant advantage over token-based pricing for large-context workloads.
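The arithmetic behind that claim can be sketched as follows. The 15 to 20 invocations per prompt come from the article; the per-token price is a hypothetical placeholder, not a real rate from Z.ai or any other provider, and the comparison assumes every invocation re-sends the full context with no caching discount.

```python
INVOCATIONS_PER_PROMPT = (15, 20)  # range stated in the article


def invocation_range(prompts: int) -> tuple[int, int]:
    """Lower and upper bound on model invocations for a number of prompts."""
    low, high = INVOCATIONS_PER_PROMPT
    return prompts * low, prompts * high


def token_billed_cost(context_tokens: int, invocations: int,
                      usd_per_million_input: float) -> float:
    """Input cost if each invocation re-sent the context and was billed per token."""
    return context_tokens * invocations * usd_per_million_input / 1_000_000


if __name__ == "__main__":
    print("100 prompts ->", invocation_range(100), "invocations")
    price = 1.0  # ASSUMPTION: hypothetical USD per million input tokens
    for context in (200_000, 1_000_000):
        cost = token_billed_cost(context, 15, price)
        print(f"{context:>9,} tokens x 15 invocations = ${cost:.2f} per prompt")
```

Under these assumptions the token-billed cost of one prompt rises fivefold when the context grows from 200K to 1M, while the prompt-based charge stays at one prompt.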

Open Weights

Z.ai has released MIT-licensed open weights on Hugging Face. This makes GLM-5.2 one of the largest open-weight models available, and the largest with a 1M context window. The MIT license permits commercial use, modification, and redistribution with no usage restrictions or registration requirements; its only condition is that the copyright and license notice stay with any copies.

This continues the pattern Z.ai set with GLM-5.1’s agentic engineering approach: build in the open, and monetize through the hosted platform.

What This Means for the Market

GLM-5.2 arrives at an inflection point. The same day it launched, Claude Fable 5 was banned under US export controls, restricting access for developers in several markets. GLM-5.2 fills that gap immediately, and with MIT licensing it can be deployed anywhere.

The 1M context window puts it in direct competition with Gemini’s long-context offerings, while the MoE architecture keeps inference costs manageable. With 40B parameters active per token, the compute per token is closer to that of a 40B dense model than a 744B one. All 744B parameters still have to be held in memory, though, so self-hosting remains a multi-GPU job.
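For a sense of scale when self-hosting, the sketch below estimates the memory needed just to store the weights. The parameter counts are from the article; the bytes-per-parameter figures are generic storage precisions, not formats Z.ai has confirmed, and the estimate leaves out the KV cache and activations. Active parameters reduce per-token compute, but every expert's weights still have to be stored.

```python
TOTAL_PARAMS = 744e9    # from the article
ACTIVE_PARAMS = 40e9    # from the article

# ASSUMPTION: generic storage precisions, not confirmed release formats.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active per token: {share:.1%} of parameters")
    for label, size in BYTES_PER_PARAM.items():
        print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS, size):,.0f} GB of weights")
```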

FAQs

Is GLM-5.2 free? Not on the hosted platform — you need a GLM Coding Plan (starting at ~$18/month for Lite). However, MIT open weights are now available on Hugging Face, which you can self-host for free.

Can I use it today? Yes. It’s available immediately on all GLM Coding Plan tiers through supported tools like Claude Code, Cline, and others.

How does the 1M context compare to GLM-5.1’s 200K? It’s a 5x increase. In practice, this means you can load entire codebases into context without chunking. See our 1M context deep dive.

Should I use High or Max thinking mode? For coding, use Max. It’s slower and uses more compute, but produces significantly better results on complex tasks. High is fine for simpler queries, documentation, and code review.

Why didn’t Z.ai publish benchmarks? Unknown. They may be waiting for independent verification. GLM-5.1 scored 58.4 on SWE-bench Pro and 1530 Elo on Code Arena; we expect GLM-5.2 to exceed those, but that remains unverified until results are published.

What’s the difference between GLM-5.2 and glm-5.2[1m]? They’re the same model. The [1m] suffix denotes the 1M context window variant, which is the only version available at launch.

Does it support tool use / function calling? Yes. It works with all major agentic coding tools that rely on tool use patterns.

How does prompt-based pricing work? One prompt ≈ 15–20 model invocations. You pay per prompt regardless of token count. This makes large-context usage significantly cheaper than token-based alternatives.

When are open weights available? They are available now on Hugging Face under the MIT license. At the June 13 launch, Z.ai said they would follow “next week”.
