GLM-5.1 from Zhipu AI is MIT licensed and available for local inference. While the full model is massive (754B MoE), smaller versions run on consumer hardware via Ollama.
## Available GLM models on Ollama
| Model | Parameters | Size | RAM needed | Best for |
|---|---|---|---|---|
| GLM-4 | 9B | ~6 GB | 8 GB | General chat, lightweight |
| GLM-4-9B-Chat | 9B | ~6 GB | 8 GB | Chat-optimized |
| CodeGeeX4 | 9B | ~6 GB | 8 GB | Code generation (GLM-based) |
**Important:** The full GLM-5.1 (754B MoE) is too large for local inference on consumer hardware. For GLM-5.1-level quality, use the Z.ai API ($18/month) or Claude Code with the GLM backend.
## Setup
```shell
# Install Ollama
brew install ollama                                    # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh    # Linux

# Pull GLM models
ollama pull glm4:9b             # General purpose
ollama pull codegeex4:latest    # Code-focused (GLM-based)

# Test
ollama run glm4:9b "Explain Docker networking in simple terms"
ollama run codegeex4 "Write a Python function to parse CSV files"
```
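Beyond the CLI, Ollama serves a REST API on `localhost:11434`, which is handy for scripting. The sketch below (stdlib only) sends a non-streaming prompt to the documented `/api/generate` endpoint; the helper function names are our own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate payload for Ollama's REST API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to a locally running Ollama server and return the reply."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running `ollama serve`):
# print(generate("glm4:9b", "Explain Docker networking in simple terms"))
```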
## CodeGeeX4: the coding variant
CodeGeeX4 is Zhipu’s dedicated coding model built on the GLM architecture. At 9B parameters, it’s designed for:
- Code generation across 100+ languages
- Code completion (fill-in-the-middle)
- Code translation between languages
- Code explanation and documentation
```shell
# Use with Aider
aider --model ollama/codegeex4

# Use with Continue.dev
# Add to .continue/config.json:
# { "models": [{ "provider": "ollama", "model": "codegeex4" }] }
```
## Hardware requirements
| Hardware | GLM-4 9B | CodeGeeX4 |
|---|---|---|
| 8GB Mac/laptop | ~15 tok/s | ~15 tok/s |
| 16GB Mac | ~25 tok/s | ~25 tok/s |
| RTX 3080 | ~35 tok/s | ~35 tok/s |
| RTX 4090 | ~45 tok/s | ~45 tok/s |
Both models are 9B parameters and have identical hardware requirements. See our VRAM guide for detailed calculations.
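The ~6 GB figure in the model table can be sanity-checked with back-of-the-envelope math: weights at 4-bit quantization plus a fixed allowance for KV cache and runtime overhead. The overhead constant below is a rough assumption, not a measured value.

```python
def estimate_ram_gb(params_billions: float, bits_per_weight: int = 4,
                    overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate: quantized weights plus a fixed allowance
    for KV cache and runtime overhead (assumed, not measured)."""
    weights_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ≈ 1 GB
    return round(weights_gb + overhead_gb, 1)

# A 9B model at 4-bit quantization: 4.5 GB of weights + overhead ≈ 6 GB,
# in line with the ~6 GB size listed in the table above.
print(estimate_ram_gb(9))  # → 6.0
```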
## Local GLM vs Z.ai API
| | Local (Ollama) | Z.ai API |
|---|---|---|
| Model | GLM-4 9B / CodeGeeX4 | GLM-5.1 (754B) |
| Quality | Good (9B level) | Excellent (frontier) |
| Cost | Free | $18/month |
| Privacy | ✅ Full | ❌ Data sent to Z.ai |
| Claude Code | ❌ | ✅ Full integration |
| Offline | ✅ | ❌ |
The practical approach: Use local GLM-4/CodeGeeX4 for quick tasks and autocomplete. Use Z.ai API with Claude Code for complex coding sessions. Total cost: $18/month for the best of both worlds.
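One way to wire up that split is a trivial router that defaults to the local model and escalates to the hosted backend only when asked, or when the prompt gets large. The length cutoff and the hosted-side model identifier are illustrative assumptions, not fixed values.

```python
def pick_model(prompt: str, needs_frontier: bool = False) -> str:
    """Route quick tasks to the local model; escalate complex sessions.
    The 2000-character cutoff and the remote identifier are assumptions."""
    if needs_frontier or len(prompt) > 2000:  # arbitrary "complex task" cutoff
        return "zai/glm-5.1"    # hypothetical identifier for the hosted API
    return "ollama/codegeex4"   # local model, as used with Aider below

print(pick_model("rename this variable"))           # → ollama/codegeex4
print(pick_model("refactor", needs_frontier=True))  # → zai/glm-5.1
```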
## GLM local vs other 9B models
| Model | Coding | General | Chinese | Speed |
|---|---|---|---|---|
| CodeGeeX4 9B | Good | Decent | ✅ Excellent | Fast |
| Yi-Coder 9B | Good | Decent | ✅ Good | Fast |
| Qwen3 8B | Good | Good | ✅ Excellent | Fast |
| Gemma 4 9B | Good | Good | ❌ | Fast |
At the 9B size, all models are competitive. CodeGeeX4 has a slight edge on Chinese code and documentation. Qwen3 8B is the best all-rounder. Yi-Coder 9B is best for pure coding.
## Connecting to coding tools
### Aider
```shell
# CodeGeeX4 for coding
aider --model ollama/codegeex4

# GLM-4 for general tasks
aider --model ollama/glm4:9b
```
### Continue.dev
```json
{
  "models": [{
    "title": "CodeGeeX4 Local",
    "provider": "ollama",
    "model": "codegeex4"
  }],
  "tabAutocompleteModel": {
    "title": "CodeGeeX4 Autocomplete",
    "provider": "ollama",
    "model": "codegeex4"
  }
}
```
### OpenCode
```shell
opencode --provider ollama --model codegeex4
```
## Troubleshooting

- **Model not found** — check the exact name with `ollama list`
- **Slow performance** — confirm the GPU is being used with `ollama ps`
- **Want better quality** — upgrade to the Z.ai API for GLM-5.1
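The "model not found" check can be scripted against Ollama's `/api/tags` endpoint, which lists locally installed models. The endpoint is part of Ollama's documented API; the helper names below are our own.

```python
import json
import urllib.request

def installed_models(tags_json: dict) -> list:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in tags_json.get("models", [])]

def has_model(name: str) -> bool:
    """True if `name` is installed locally (requires a running Ollama server)."""
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        tags = json.loads(resp.read())
    return any(n.startswith(name) for n in installed_models(tags))

# Usage (requires a running `ollama serve`):
# print("codegeex4 installed:", has_model("codegeex4"))
```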
See our Ollama troubleshooting guide for all common errors.
Related: GLM-5.1 Complete Guide · How to Run GLM-5.1 Locally · Z.ai API Guide · GLM-5.1 Claude Code Setup · Best Ollama Models for Coding · Ollama Complete Guide