Z.ai just dropped GLM-5.2: a 744B-parameter mixture-of-experts model with a 1 million token context window, 131K max output tokens, and MIT open weights on the way. Released June 13, 2026, this is the third major iteration in the GLM-5 line and the biggest leap yet. It launched the same day Anthropic's Claude Fable 5 was banned under US export controls, making it an even more significant release for the global coding model landscape.
Here's everything you need to know: architecture, what changed from GLM-5.1, how to set it up, pricing, and FAQs.
Who is Z.ai?
Z.ai is the international brand of Zhipu AI, a Beijing-based company spun off from Tsinghua University. They completed a Hong Kong IPO in January 2026 and have been shipping models at a relentless pace ever since. The GLM-5 line started with GLM-5 on February 11, followed by GLM-5-Turbo on March 15, and GLM-5.1 on April 7. GLM-5.2 is the culmination of that rapid iteration cycle.
What's New in GLM-5.2 vs GLM-5.1
| Feature | GLM-5.1 | GLM-5.2 |
|---|---|---|
| Context window | 200K tokens | 1M tokens |
| Max output tokens | 64K | 131K |
| Architecture | MoE | MoE (DeepSeek Sparse Attention) |
| Parameters | Not disclosed | 744B total, 40B active |
| Training data | Not disclosed | 28.5T tokens |
| Thinking modes | Standard | High + Max |
| Open weights | No | MIT (coming next week) |
The headline number is obvious: a 5x jump in context window from 200K to 1M tokens. The model is labeled glm-5.2[1m] to reflect this. But the architectural changes underneath are just as important.
For a deeper dive into how the two compare in practice, see our GLM-5.2 vs GLM-5.1 comparison.
Architecture
GLM-5.2 is a 744 billion parameter mixture-of-experts model with 40 billion parameters active per token. It's built on the DeepSeek Sparse Attention architecture, which enables the 1M context window without the quadratic scaling costs of dense attention. The model was trained on 28.5 trillion tokens.
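To put the 744B total and 40B active figures in proportion, here is a minimal arithmetic sketch. The function names are illustrative, and the bytes-per-parameter values are assumptions about storage precision, not something Z.ai has published.

```shell
TOTAL_PARAMS_B=744    # total parameters, in billions (from the article)
ACTIVE_PARAMS_B=40    # active parameters per token, in billions (from the article)

# Percentage of parameters active per token.
active_percent() {
  awk -v a="$ACTIVE_PARAMS_B" -v t="$TOTAL_PARAMS_B" \
    'BEGIN { printf "%.1f\n", a / t * 100 }'
}

# Approximate size of the full weights in GB for an ASSUMED bytes-per-parameter
# precision (2 for 16-bit, 1 for 8-bit). 1B parameters at 1 byte each is about 1 GB.
weights_gb() {
  echo $(( TOTAL_PARAMS_B * $1 ))
}

active_percent   # prints 5.4
weights_gb 2     # prints 1488
```

Roughly 5% of the parameters do the work for any given token, but the whole set of weights still has to be stored, which is why the second number stays large.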
The 1M context window is the standout feature for coding use cases. You can fit entire repositories into context: monorepos, full documentation sets, and multi-file refactors that previously required chunking or RAG workarounds.
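Whether a given repository actually fits is easy to estimate before you try. The sketch below assumes roughly 4 bytes per token, a common rule of thumb rather than GLM-5.2's real tokenizer ratio, and the function names are hypothetical.

```shell
CONTEXT_TOKENS=1000000   # 1M window, from the article
BYTES_PER_TOKEN=4        # ASSUMED heuristic, not the model's real tokenizer ratio

# Estimate the token count of every regular file under a directory,
# skipping anything inside .git.
estimate_tokens() {
  bytes=$(find "$1" -type f ! -path '*/.git/*' -exec cat {} + 2>/dev/null | wc -c)
  echo $(( bytes / BYTES_PER_TOKEN ))
}

# Succeeds (exit status 0) if the estimate fits in the context window.
fits_in_context() {
  [ "$(estimate_tokens "$1")" -le "$CONTEXT_TOKENS" ]
}

# Usage (not run here):
#   estimate_tokens ~/src/my-repo
#   fits_in_context ~/src/my-repo && echo "fits" || echo "needs chunking"
```

The estimate counts binaries and lockfiles too, so treat it as an upper bound and exclude build output if the number looks inflated.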
Thinking Modes
GLM-5.2 introduces two thinking modes:
- High: Balanced reasoning for general tasks. Good for code review, documentation, explanations, and straightforward implementations.
- Max: Deep reasoning recommended for coding. Use this for complex refactors, debugging, architectural decisions, and multi-step problem solving.
In Claude Code, these map as follows:
| Claude Code setting | GLM-5.2 thinking mode |
|---|---|
| low / medium / high | High |
| xhigh / max / ultracode | Max |
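The mapping in the table can be expressed as a small lookup. This is only an illustration of the table; the function name is hypothetical and is not part of Claude Code or the GLM API.

```shell
# Map a Claude Code effort setting to the GLM-5.2 thinking mode from the table.
glm_thinking_mode() {
  case "$1" in
    low|medium|high)      echo "High" ;;
    xhigh|max|ultracode)  echo "Max" ;;
    *) echo "unknown setting: $1" >&2; return 1 ;;
  esac
}

glm_thinking_mode medium   # prints High
glm_thinking_mode xhigh    # prints Max
```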
Max uses significantly more compute per prompt but produces markedly better results on complex coding tasks. For most development work, Max is the recommended default.
Benchmarks
Z.ai published no benchmarks at launch, an unusual move. For reference, the predecessor GLM-5.1 scored:
- 58.4 on SWE-bench Pro
- 1530 Elo on Code Arena (3rd globally at the time)
We expect third-party benchmarks to appear within days. In the meantime, see our hands-on comparisons: GLM-5.2 vs Claude Opus 4.8 and GLM-5.2 vs Kimi K2.7 Code.
Setup Instructions
Claude Code
This is the most common setup. Set the following environment variables:
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
Add these to your .bashrc, .zshrc, or shell config of choice. The CLAUDE_CODE_AUTO_COMPACT_WINDOW setting ensures Claude Code takes advantage of the full 1M context instead of compacting at the default threshold.
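After reloading your shell, it is worth confirming the variables are actually exported, since a variable that is set but not exported will not reach Claude Code. The helper below is a hypothetical convenience function, not part of any tool; pass it whichever variable names you configured.

```shell
# Report which of the named variables are exported and non-empty.
# Returns non-zero if any of them is missing.
check_env_vars() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes get.
    if value=$(printenv "$name") && [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here):
#   check_env_vars ANTHROPIC_DEFAULT_SONNET_MODEL CLAUDE_CODE_AUTO_COMPACT_WINDOW
```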
For a step-by-step walkthrough, see our dedicated GLM-5.2 Claude Code setup guide.
Other Supported Tools
GLM-5.2 works with all major agentic coding tools:
- Claude Code: Environment variable config (see above)
- Cline: Select glm-5.2[1m] from model dropdown
- OpenCode: Set in config.yaml
- Roo Code: Model selector in settings
- OpenClaw: API configuration
- Kilo Code: Provider settings
- Crush: Model config
- Goose: Provider configuration
Each tool connects through the GLM API. If you were previously using GLM-5.1, switching is typically a one-line model name change.
Standalone API
The standalone API and chatbot are coming next week according to Z.ai's announcement. The API will follow their existing patterns; see our GLM-5.1 API guide for what to expect.
Pricing
GLM-5.2 uses prompt-based pricing, not token-based. One prompt equals approximately 15 to 20 model invocations under the hood (accounting for agentic loops, tool calls, retries, etc.).
It's available immediately on all GLM Coding Plan tiers:
| Plan | Price | Notes |
|---|---|---|
| Lite | ~$18/month | Good for individual hobbyists |
| Pro | - | Higher prompt limits |
| Max | - | Highest individual limits |
| Team | - | Shared billing, admin controls |
The prompt-based model means you don't need to worry about context window size inflating your bill. Whether you use 200K or the full 1M context, a prompt is a prompt. This is a significant advantage over token-based pricing for large-context workloads.
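A rough sketch of the arithmetic behind that claim follows. The invocation range comes from the figures above; the token-billed comparison is purely hypothetical, using a made-up price of 100 cents per million input tokens and assuming the full context is re-sent on every invocation with no caching.

```shell
# Invocations implied by "one prompt is roughly 15 to 20 model invocations".
MIN_CALLS_PER_PROMPT=15
MAX_CALLS_PER_PROMPT=20

# Print the low-high range of model invocations for a number of prompts.
invocation_range() {
  echo "$(( $1 * MIN_CALLS_PER_PROMPT ))-$(( $1 * MAX_CALLS_PER_PROMPT ))"
}

# HYPOTHETICAL token-billed input cost in cents, for contrast only.
# Args: context tokens per call, number of calls, price in cents per 1M tokens.
# Assumes the whole context is re-sent on each call; ignores caching and output.
token_billed_cents() {
  echo $(( $1 * $2 * $3 / 1000000 ))
}

invocation_range 100                 # prints 1500-2000
token_billed_cents 200000 15 100     # prints 300
token_billed_cents 1000000 15 100    # prints 1500
```

Under these assumed numbers, the token-billed cost of a single prompt grows fivefold when the context goes from 200K to 1M, while the prompt-based cost stays at one prompt.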
Open Weights
Z.ai confirmed MIT-licensed open weights are coming next week. This will make GLM-5.2 one of the largest open-weight models available, and the largest with a 1M context window. The MIT license means unrestricted commercial use: no usage restrictions, no registration requirements.
This continues Z.ai's pattern from GLM-5.1's agentic engineering approach of building in the open while monetizing through their hosted platform.
What This Means for the Market
GLM-5.2 arrives at an inflection point. The same day it launched, Claude Fable 5 was banned under US export controls, restricting access for developers in several markets. GLM-5.2 fills that gap immediately, and with MIT licensing, it will be deployable anywhere.
The 1M context window puts it in direct competition with Gemini's long-context offerings, while the MoE architecture keeps inference costs manageable. Only 40B parameters are active per token, so per-token compute is a fraction of what a dense 744B model would need, although the full set of weights still has to be loaded to serve it.
FAQs
Is GLM-5.2 free? Not on the hosted platform; you need a GLM Coding Plan (starting at ~$18/month for Lite). However, MIT open weights are coming next week, which you can self-host for free.
Can I use it today? Yes. It's available immediately on all GLM Coding Plan tiers through supported tools like Claude Code, Cline, and others.
How does the 1M context compare to GLM-5.1's 200K? It's a 5x increase. In practice, this means you can load entire codebases into context without chunking. See our 1M context deep dive.
Should I use High or Max thinking mode? For coding, use Max. It's slower and uses more compute, but produces significantly better results on complex tasks. High is fine for simpler queries, documentation, and code review.
Why didn't Z.ai publish benchmarks? Unknown. They may be waiting for independent verification or plan to release them alongside the open weights. GLM-5.1 scored 58.4 on SWE-bench Pro and 1530 Elo on Code Arena; expect GLM-5.2 to exceed those.
What's the difference between GLM-5.2 and glm-5.2[1m]? They're the same model. The [1m] suffix denotes the 1M context window variant, which is the only version available at launch.
Does it support tool use / function calling? Yes. It works with all major agentic coding tools that rely on tool use patterns.
How does prompt-based pricing work? One prompt is roughly 15 to 20 model invocations. You pay per prompt regardless of token count. This makes large-context usage significantly cheaper than token-based alternatives.
When are open weights available? Z.ai said "next week" from the June 13 launch date, so expect them around June 16 to 20, 2026.