GLM-5.1 from Zhipu AI is MIT licensed and available for local inference. While the full model is massive (754B MoE), smaller versions run on consumer hardware via Ollama.
## Available GLM models on Ollama
| Model | Parameters | Size | RAM needed | Best for |
|---|---|---|---|---|
| GLM-4 | 9B | ~6 GB | 8 GB | General chat, lightweight |
| GLM-4-9B-Chat | 9B | ~6 GB | 8 GB | Chat-optimized |
| CodeGeeX4 | 9B | ~6 GB | 8 GB | Code generation (GLM-based) |
**Important:** The full GLM-5.1 (754B MoE) is too large for local inference on consumer hardware. For GLM-5.1-level quality, use the Z.ai API ($18/month) or Claude Code with the GLM backend.
## Setup
```shell
# Install Ollama
brew install ollama                                    # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh    # Linux

# Pull GLM models
ollama pull glm4:9b             # General purpose
ollama pull codegeex4:latest    # Code-focused (GLM-based)

# Test
ollama run glm4:9b "Explain Docker networking in simple terms"
ollama run codegeex4 "Write a Python function to parse CSV files"
```
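Beyond the CLI, Ollama serves a REST API on `localhost:11434`, which is handy for scripting. The sketch below (stdlib only) sends a non-streaming prompt to the documented `/api/generate` endpoint; the helper function names are our own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate payload for Ollama's REST API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to a locally running Ollama server and return the reply."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running `ollama serve`):
# print(generate("glm4:9b", "Explain Docker networking in simple terms"))
```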
## CodeGeeX4: the coding variant
CodeGeeX4 is Zhipu’s dedicated coding model built on the GLM architecture. At 9B parameters, it’s designed for:
- Code generation across 100+ languages
- Code completion (fill-in-the-middle)
- Code translation between languages
- Code explanation and documentation
```shell
# Use with Aider
aider --model ollama/codegeex4

# Use with Continue.dev
# Add to .continue/config.json:
# { "models": [{ "provider": "ollama", "model": "codegeex4" }] }
```
## Hardware requirements
| Hardware | GLM-4 9B | CodeGeeX4 |
|---|---|---|
| 8GB Mac/laptop | ~15 tok/s | ~15 tok/s |
| 16GB Mac | ~25 tok/s | ~25 tok/s |
| RTX 3080 | ~35 tok/s | ~35 tok/s |
| RTX 4090 | ~45 tok/s | ~45 tok/s |
Both models are 9B parameters and have identical hardware requirements. See our VRAM guide for detailed calculations.
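The ~6 GB figure in the model table can be sanity-checked with back-of-the-envelope math: weights at 4-bit quantization plus a fixed allowance for KV cache and runtime overhead. The overhead constant below is a rough assumption, not a measured value.

```python
def estimate_ram_gb(params_billions: float, bits_per_weight: int = 4,
                    overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate: quantized weights plus a fixed allowance
    for KV cache and runtime overhead (assumed, not measured)."""
    weights_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ≈ 1 GB
    return round(weights_gb + overhead_gb, 1)

# A 9B model at 4-bit quantization: 4.5 GB of weights + overhead ≈ 6 GB,
# in line with the ~6 GB size listed in the table above.
print(estimate_ram_gb(9))  # → 6.0
```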
## Local GLM vs Z.ai API
| | Local (Ollama) | Z.ai API |
|---|---|---|
| Model | GLM-4 9B / CodeGeeX4 | GLM-5.1 (754B) |
| Quality | Good (9B level) | Excellent (frontier) |
| Cost | Free | $18/month |
| Privacy | ✅ Full | ❌ Data sent to Z.ai |
| Claude Code | ❌ | ✅ Full integration |
| Offline | ✅ | ❌ |
The practical approach: Use local GLM-4/CodeGeeX4 for quick tasks and autocomplete. Use Z.ai API with Claude Code for complex coding sessions. Total cost: $18/month for the best of both worlds.
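One way to wire up that split is a trivial router that defaults to the local model and escalates to the hosted backend only when asked, or when the prompt gets large. The length cutoff and the hosted-side model identifier are illustrative assumptions, not fixed values.

```python
def pick_model(prompt: str, needs_frontier: bool = False) -> str:
    """Route quick tasks to the local model; escalate complex sessions.
    The 2000-character cutoff and the remote identifier are assumptions."""
    if needs_frontier or len(prompt) > 2000:  # arbitrary "complex task" cutoff
        return "zai/glm-5.1"    # hypothetical identifier for the hosted API
    return "ollama/codegeex4"   # local model, as used with Aider below

print(pick_model("rename this variable"))           # → ollama/codegeex4
print(pick_model("refactor", needs_frontier=True))  # → zai/glm-5.1
```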
## GLM local vs other 9B models
| Model | Coding | General | Chinese | Speed |
|---|---|---|---|---|
| CodeGeeX4 9B | Good | Decent | ✅ Excellent | Fast |
| Yi-Coder 9B | Good | Decent | ✅ Good | Fast |
| Qwen3 8B | Good | Good | ✅ Excellent | Fast |
| Gemma 4 9B | Good | Good | ❌ | Fast |
At the 9B size, all models are competitive. CodeGeeX4 has a slight edge on Chinese code and documentation. Qwen3 8B is the best all-rounder. Yi-Coder 9B is best for pure coding.
## Connecting to coding tools
### Aider
```shell
# CodeGeeX4 for coding
aider --model ollama/codegeex4

# GLM-4 for general tasks
aider --model ollama/glm4:9b
```
### Continue.dev
```json
{
  "models": [{
    "title": "CodeGeeX4 Local",
    "provider": "ollama",
    "model": "codegeex4"
  }],
  "tabAutocompleteModel": {
    "title": "CodeGeeX4 Autocomplete",
    "provider": "ollama",
    "model": "codegeex4"
  }
}
```
### OpenCode
```shell
opencode --provider ollama --model codegeex4
```
## Troubleshooting

- **Model not found** — check the exact name with `ollama list`
- **Slow performance** — confirm the GPU is being used with `ollama ps`
- **Want better quality** — upgrade to the Z.ai API for GLM-5.1
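The "model not found" check can be scripted against Ollama's `/api/tags` endpoint, which lists locally installed models. The endpoint is part of Ollama's documented API; the helper names below are our own.

```python
import json
import urllib.request

def installed_models(tags_json: dict) -> list:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in tags_json.get("models", [])]

def has_model(name: str) -> bool:
    """True if `name` is installed locally (requires a running Ollama server)."""
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        tags = json.loads(resp.read())
    return any(n.startswith(name) for n in installed_models(tags))

# Usage (requires a running `ollama serve`):
# print("codegeex4 installed:", has_model("codegeex4"))
```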
See our Ollama troubleshooting guide for all common errors.
Related: GLM-5.1 Complete Guide · How to Run GLM-5.1 Locally · Z.ai API Guide · GLM-5.1 Claude Code Setup · Best Ollama Models for Coding · Ollama Complete Guide