
GLM-5.2 Complete Guide: 1M Context, MIT License, Setup (2026)


Z.ai just dropped GLM-5.2, a 744B-parameter mixture-of-experts (MoE) model with a 1 million token context window, 131K max output tokens, and MIT-licensed open weights on the way. Released June 13, 2026, it is the third major iteration in the GLM-5 line and the biggest leap yet. It launched the same day Anthropic's Claude Fable 5 was banned under US export controls, which makes it an even more significant release for the global coding-model landscape.

Here's everything you need to know: architecture, what changed from GLM-5.1, how to set it up, pricing, and FAQs.

Who is Z.ai?

Z.ai is the international brand of Zhipu AI, a Beijing-based company spun off from Tsinghua University. They completed a Hong Kong IPO in January 2026 and have been shipping models at a relentless pace ever since. The GLM-5 line started with GLM-5 on February 11, followed by GLM-5-Turbo on March 15, and GLM-5.1 on April 7. GLM-5.2 is the culmination of that rapid iteration cycle.

What's New in GLM-5.2 vs GLM-5.1

| Feature | GLM-5.1 | GLM-5.2 |
|---|---|---|
| Context window | 200K tokens | 1M tokens |
| Max output tokens | 64K | 131K |
| Architecture | MoE | MoE (DeepSeek Sparse Attention) |
| Parameters | Not disclosed | 744B total, 40B active |
| Training data | Not disclosed | 28.5T tokens |
| Thinking modes | Standard | High + Max |
| Open weights | No | MIT (coming next week) |

The headline number is obvious: a 5x jump in context window from 200K to 1M tokens. The model is labeled glm-5.2[1m] to reflect this. But the architectural changes underneath are just as important.

For a deeper dive into how the two compare in practice, see our GLM-5.2 vs GLM-5.1 comparison.

Architecture

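GLM-5.2 is a 744-billion-parameter mixture-of-experts model with 40 billion parameters active per token. It's built on DeepSeek Sparse Attention (DSA), in which each query attends to a selected subset of earlier tokens instead of all of them. That sidesteps most of the quadratic cost of dense attention, whose work grows with the square of the context length, and it is what makes a 1M context window practical. The model was trained on 28.5 trillion tokens.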
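To make the selection idea concrete, here is a toy, pure-Python sketch of causal top-k sparse attention in the spirit of DSA. It assumes a cheap scoring function (the `indexer_score` argument, a hypothetical stand-in for DSA's lightweight indexer) ranks the visible tokens for each query, and ordinary softmax attention then runs over only the top k of them. It is illustrative only, not Z.ai's or DeepSeek's implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    k: int,
    indexer_score: Callable[[Vector, Vector], float] = dot,  # hypothetical cheap scorer
) -> List[List[float]]:
    """Causal attention where each query attends to at most k visible tokens."""
    d = len(queries[0])
    dim_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: query t can only see positions 0..t.
        visible = range(t + 1)
        # The cheap indexer still scores every visible position...
        selected = sorted(visible, key=lambda j: indexer_score(q, keys[j]), reverse=True)[:k]
        # ...but the expensive softmax-and-weighted-sum touches only k of them.
        weights = softmax([dot(q, keys[j]) / math.sqrt(d) for j in selected])
        out = [sum(w * values[j][i] for w, j in zip(weights, selected)) for i in range(dim_v)]
        outputs.append(out)
    return outputs
```

When k is at least the sequence length, this reduces to ordinary dense causal attention; with k held fixed as the context grows, the attention step per query stays constant-size instead of growing with the context.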

The 1M context window is the standout feature for coding use cases. Many repositories, including mid-sized monorepos, now fit in context whole, along with full documentation sets, so multi-file refactors that previously required chunking or RAG workarounds can run against the complete code.
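A quick way to check whether a given codebase plausibly fits is to estimate its token count. The sketch below walks a directory and applies a rough rule of thumb of about 4 characters per token; that ratio is an assumption (GLM-5.2's real tokenizer will differ, especially for non-English text and dense code), and so are the default file extensions and skipped folders, so treat the result as a ballpark figure.

```python
import os
from typing import Iterable

CHARS_PER_TOKEN = 4  # rough heuristic, NOT GLM-5.2's actual tokenizer
CONTEXT_LIMIT = 1_000_000  # glm-5.2[1m] context window
SKIP_DIRS = {".git", "node_modules", "vendor"}  # assumed: folders you wouldn't load


def estimate_repo_tokens(
    root: str, extensions: Iterable[str] = (".py", ".ts", ".go", ".md")
) -> int:
    """Roughly estimate the token count of matching files under root."""
    exts = tuple(extensions)
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS and dependency folders in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, reserve_tokens: int = 131_000) -> bool:
    """True if the estimate leaves headroom for a max-length (131K) response."""
    return estimate_repo_tokens(root) + reserve_tokens <= CONTEXT_LIMIT
```

Reserving the full 131K output budget is deliberately conservative; shrink `reserve_tokens` if you expect short answers.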

Thinking Modes

GLM-5.2 introduces two thinking modes:

  • High: Balanced reasoning for general tasks. Good for code review, documentation, explanations, and straightforward implementations.
  • Max: Deep reasoning, recommended for coding. Use this for complex refactors, debugging, architectural decisions, and multi-step problem solving.

In Claude Code, these map as follows:

| Claude Code setting | GLM-5.2 thinking mode |
|---|---|
| low / medium / high | High |
| xhigh / max / ultracode | Max |
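If you script around Claude Code, the mapping in the table is easy to encode. This is a hypothetical helper: the function name and its error handling are ours, not part of Claude Code or Z.ai's API, and it simply reproduces the table.

```python
HIGH_SETTINGS = {"low", "medium", "high"}
MAX_SETTINGS = {"xhigh", "max", "ultracode"}


def glm_thinking_mode(claude_code_setting: str) -> str:
    """Map a Claude Code effort setting to the GLM-5.2 thinking mode it uses."""
    setting = claude_code_setting.strip().lower()
    if setting in HIGH_SETTINGS:
        return "High"
    if setting in MAX_SETTINGS:
        return "Max"
    raise ValueError(f"unknown Claude Code setting: {claude_code_setting!r}")
```

For example, `glm_thinking_mode("medium")` returns `"High"` and `glm_thinking_mode("ultracode")` returns `"Max"`.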

Max uses significantly more compute per prompt but produces markedly better results on complex coding tasks. For most development work, Max is the recommended default.

Benchmarks

Z.ai published no benchmarks at launch, an unusual move. For reference, its predecessor GLM-5.1 scored:

  • 58.4 on SWE-bench Pro
  • 1530 Elo on Code Arena (3rd globally at the time)

We expect third-party benchmarks to appear within days. In the meantime, see our hands-on comparisons: GLM-5.2 vs Claude Opus 4.8 and GLM-5.2 vs Kimi K2.7 Code.

Setup Instructions

Claude Code

This is the most common setup. Set the following environment variables:

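export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # Z.ai's Anthropic-compatible endpoint; confirm in Z.ai's docs
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"  # placeholder: your GLM Coding Plan API key
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000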

Add these to your .bashrc, .zshrc, or shell config of choice. The CLAUDE_CODE_AUTO_COMPACT_WINDOW setting ensures Claude Code takes advantage of the full 1M context instead of compacting at the default threshold.
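Before launching Claude Code, it can help to confirm the variables actually reached your shell. A minimal sketch in Python: it checks Claude Code's model-override variables (`ANTHROPIC_DEFAULT_SONNET_MODEL`, `ANTHROPIC_DEFAULT_OPUS_MODEL`) plus the compaction setting used in this guide. The script and its function name are our own hypothetical helper, not part of Claude Code.

```python
import os
from typing import Dict, List, Mapping

# Expected values for the GLM-5.2 setup described in this guide.
EXPECTED: Dict[str, str] = {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]",
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
}


def check_glm_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems; an empty list means everything matches."""
    problems = []
    for name, expected in EXPECTED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set")
        elif actual != expected:
            problems.append(f"{name}={actual!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    issues = check_glm_env()
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("GLM-5.2 environment looks good")
```

Run it from the same shell you'll start Claude Code in, after sourcing your updated config.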

For a step-by-step walkthrough, see our dedicated GLM-5.2 Claude Code setup guide.

Other Supported Tools

GLM-5.2 works with all major agentic coding tools:

  • Claude Code: Environment variable config (see above)
  • Cline: Select glm-5.2[1m] from the model dropdown
  • OpenCode: Set in config.yaml
  • Roo Code: Model selector in settings
  • OpenClaw: API configuration
  • Kilo Code: Provider settings
  • Crush: Model config
  • Goose: Provider configuration

Each tool connects through the GLM API. If you were previously using GLM-5.1, switching is typically a one-line model name change.

Standalone API

The standalone API and chatbot are coming next week, according to Z.ai's announcement. The API is expected to follow Z.ai's existing patterns; see our GLM-5.1 API guide for what to expect.

Pricing

GLM-5.2 uses prompt-based pricing, not token-based. One prompt equals approximately 15–20 model invocations under the hood (accounting for agentic loops, tool calls, retries, and so on).
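To see what that ratio means in practice, here is the arithmetic for translating billed prompts into underlying model calls. The 15–20 range comes from Z.ai's description above; the prompt count in the example is arbitrary, and the function name is our own.

```python
from typing import Tuple

# Z.ai's stated range of model invocations behind one billed prompt.
INVOCATIONS_PER_PROMPT = (15, 20)


def invocation_range(prompts: int) -> Tuple[int, int]:
    """Approximate low/high number of model invocations behind `prompts` prompts."""
    low, high = INVOCATIONS_PER_PROMPT
    return prompts * low, prompts * high


print(invocation_range(100))  # (1500, 2000)
```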

It's available immediately on all GLM Coding Plan tiers:

| Plan | Price | Notes |
|---|---|---|
| Lite | ~$18/month | Good for individual hobbyists |
| Pro | Not listed | Higher prompt limits |
| Max | Not listed | Highest individual limits |
| Team | Not listed | Shared billing, admin controls |

The prompt-based model means you don't need to worry about context window size inflating your bill. Whether you use 200K or the full 1M context, a prompt is a prompt. For large-context workloads, that's a significant advantage over token-based pricing.
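The contrast is easy to see side by side. This sketch compares how a per-token charge and a per-prompt quota respond as context grows. The per-million-token price is a made-up placeholder (no GLM-5.2 per-token rate is given here), output tokens are ignored for simplicity, and the function names are ours, so only the shape of the comparison matters.

```python
def token_based_cost(context_tokens: int, price_per_million: float) -> float:
    """Cost of one request billed per input token (output ignored for simplicity)."""
    return context_tokens / 1_000_000 * price_per_million


def prompt_quota_used(context_tokens: int) -> int:
    """Under prompt-based pricing one prompt counts once; context size is deliberately unused."""
    return 1


HYPOTHETICAL_PRICE_PER_MILLION = 1.00  # placeholder, not a real GLM-5.2 rate

for ctx in (200_000, 1_000_000):
    cost = token_based_cost(ctx, HYPOTHETICAL_PRICE_PER_MILLION)
    print(f"{ctx:>9} tokens: token-based ${cost:.2f}, prompt quota used {prompt_quota_used(ctx)}")
```

Under per-token billing, the 1M-token request costs five times the 200K one; under prompt-based billing, both consume the same single prompt.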

Open Weights

Z.ai confirmed MIT-licensed open weights are coming next week. That will make GLM-5.2 one of the largest open-weight models available, and the largest with a 1M context window. The MIT license permits unrestricted commercial use, with no usage restrictions and no registration requirements; the only condition is keeping the copyright and license notice with copies you distribute.

This continues Z.ai's pattern, covered in our look at GLM-5.1's agentic engineering approach, of building in the open while monetizing through its hosted platform.

What This Means for the Market

GLM-5.2 arrives at an inflection point. The same day it launched, Claude Fable 5 was banned under US export controls, restricting access for developers in several markets. GLM-5.2 fills that gap immediately, and once the MIT-licensed weights ship, it will be deployable anywhere.

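The 1M context window puts it in direct competition with Gemini's long-context offerings, while the MoE architecture keeps inference costs manageable. Because only 40B parameters are active per token, per-token compute is closer to that of a 40B dense model than a dense 744B one. Memory is a different story: all 744B parameters still have to be loaded, so self-hosting still calls for multi-GPU hardware.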
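Here is the back-of-the-envelope arithmetic behind that compute-versus-memory distinction, using the parameter counts above. Bytes per parameter depend on the precision you serve at (2 for BF16 and 1 for FP8 are common choices); KV cache, activations, and framework overhead are ignored, so real memory needs are higher than these figures.

```python
TOTAL_PARAMS = 744e9  # every expert must be resident in memory
ACTIVE_PARAMS = 40e9  # parameters actually used for each token


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active per token: {active_fraction:.1%}")  # 5.4%
print(f"BF16 weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")  # 1488 GB
print(f"FP8 weights:  {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")  # 744 GB
```

Roughly 5% of the weights do the work for any given token, but even at FP8 the full set is far larger than a single GPU's memory.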

FAQs

Is GLM-5.2 free? Not on the hosted platform: you need a GLM Coding Plan (starting at ~$18/month for Lite). However, MIT open weights are coming next week, and you'll be able to self-host them without license fees (the hardware is on you).

Can I use it today? Yes. It's available immediately on all GLM Coding Plan tiers through supported tools like Claude Code, Cline, and others.

How does the 1M context compare to GLM-5.1's 200K? It's a 5x increase. In practice, many codebases can now be loaded into context whole, without chunking. See our 1M context deep dive.

Should I use High or Max thinking mode? For coding, use Max. It's slower and uses more compute, but produces significantly better results on complex tasks. High is fine for simpler queries, documentation, and code review.

Why didn't Z.ai publish benchmarks? Unknown. They may be waiting for independent verification or plan to release them alongside the open weights. GLM-5.1 scored 58.4 on SWE-bench Pro and 1530 Elo on Code Arena; GLM-5.2 would be expected to beat those, but that stays unconfirmed until numbers are published.

What's the difference between GLM-5.2 and glm-5.2[1m]? They're the same model. The [1m] suffix denotes the 1M context window variant, which is the only version available at launch.
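If you handle model IDs programmatically, a hypothetical helper like this separates the base model name from the bracketed context suffix. The suffix format follows the glm-5.2[1m] label; the helper's name and its rejection of anything beyond a single bracketed suffix are our own assumptions.

```python
import re
from typing import Optional, Tuple

# Base name without brackets, optionally followed by one "[...]" suffix.
_MODEL_ID = re.compile(r"^(?P<base>[^\[\]]+)(?:\[(?P<ctx>[^\[\]]+)\])?$")


def split_model_id(model_id: str) -> Tuple[str, Optional[str]]:
    """Split 'glm-5.2[1m]' into ('glm-5.2', '1m'); an ID without a suffix yields None."""
    match = _MODEL_ID.match(model_id.strip())
    if match is None:
        raise ValueError(f"unrecognized model id: {model_id!r}")
    return match.group("base"), match.group("ctx")
```

For example, `split_model_id("glm-5.2[1m]")` returns `("glm-5.2", "1m")`.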

Does it support tool use / function calling? Yes. It works with all major agentic coding tools that rely on tool use patterns.

How does prompt-based pricing work? One prompt ≈ 15–20 model invocations. You pay per prompt regardless of token count. This makes large-context usage significantly cheaper than token-based alternatives.

When are open weights available? Z.ai said "next week" from the June 13 launch date (a Saturday), so expect them during the week of June 15, 2026.

Next Steps