Apr 26, 2026

Last updated on May 15, 2026

How to Run Yi Models Locally with Ollama — Yi-34B and Yi-Coder

Yi models from 01.AI are released under the Apache 2.0 license and run well on consumer hardware. Here’s how to set them up with Ollama.

Which Yi model to pick

Model	Size	RAM needed	Best for
Yi-Coder 9B	5 GB	8 GB	Coding, best bang for buck
Yi-6B	4 GB	6 GB	Lightweight chat, edge devices
Yi-34B	20 GB	24 GB	Best quality, general purpose
Yi-1.5-34B	20 GB	24 GB	Improved 34B, more training data
Yi-VL-34B	20 GB	24 GB	Vision + language (multimodal)

For most developers, Yi-Coder 9B is the sweet spot: strong coding ability in a 5 GB download that fits on an 8 GB machine.
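
As a rough illustration of how to read the table, the sketch below filters the models by the RAM you have. The numbers are copied from the "RAM needed" column for three of the rows; the function name `models_that_fit` is made up for this example and is not part of any Yi or Ollama tooling.

```python
# Rows copied from the table above: (model name, download size in GB, RAM needed in GB).
YI_MODELS = [
    ("Yi-6B", 4, 6),
    ("Yi-Coder 9B", 5, 8),
    ("Yi-34B", 20, 24),
]


def models_that_fit(ram_gb: float) -> list[str]:
    """Return the Yi models whose listed RAM requirement is within ram_gb."""
    return [name for name, _size_gb, ram_needed in YI_MODELS if ram_needed <= ram_gb]


if __name__ == "__main__":
    for ram in (6, 8, 16, 32):
        print(f"{ram} GB RAM -> {models_that_fit(ram)}")
```

Treat the result as a starting point only: other applications on your machine also need memory, so a model that only just fits may still swap.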

Setup

# Install Ollama
brew install ollama  # Mac
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# If the server is not already running, start it in a separate terminal:
#   ollama serve

# Pull your chosen model
ollama pull yi-coder:9b     # Recommended for coding
ollama pull yi:34b          # Best quality
ollama pull yi:6b           # Lightest

# Test it
ollama run yi-coder:9b "Explain async/await in Python with examples"
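
Once the model responds on the command line, you can also reach it from a script. The sketch below sends the same test prompt to Ollama's local REST endpoint (`/api/generate` on port 11434, the default) using only the Python standard library. The helper names `build_generate_request` and `generate` are illustrative, and the code assumes the server is running on the default address.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    try:
        print(generate("yi-coder:9b", "Explain async/await in Python with examples"))
    except OSError as exc:  # URLError subclasses OSError; raised if the server is unreachable
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
```

Setting `"stream": False` returns the whole answer in one JSON object, which keeps the example short; interactive tools usually stream instead.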

Hardware requirements

Your hardware	Best Yi model	Performance
8GB Mac/laptop	Yi-Coder 9B	~20 tok/s
16GB Mac	Yi-Coder 9B or Yi-6B	~30 tok/s
24GB+ Mac	Yi-34B	~15 tok/s
RTX 3080 10GB	Yi-Coder 9B	~40 tok/s
RTX 4090 24GB	Yi-34B	~30 tok/s
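
To get a feel for what these speeds mean in practice, the arithmetic below converts tokens per second into waiting time. The speeds come from the table above; the 500-token answer length is an assumption chosen purely for illustration.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady generation speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Speeds taken from the table above; 500 tokens is an assumed answer length.
    setups = [
        ("8GB Mac, Yi-Coder 9B", 20),
        ("24GB+ Mac, Yi-34B", 15),
        ("RTX 4090, Yi-34B", 30),
    ]
    for setup, speed in setups:
        print(f"{setup}: ~{seconds_to_generate(500, speed):.0f} s for 500 tokens")
```

This ignores prompt processing time, which adds a delay before the first token on long inputs.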

For Yi-34B and larger models that need 24GB+ VRAM, cloud GPU providers are often cheaper than buying a second graphics card.

See our VRAM guide for detailed calculations and GPU vs CPU guide for when you need a GPU.

Connect to coding tools

Aider

aider --model ollama/yi-coder:9b

Continue.dev (VS Code)

{
  "models": [{
    "title": "Yi Coder Local",
    "provider": "ollama",
    "model": "yi-coder:9b"
  }]
}

OpenCode

opencode --provider ollama --model yi-coder:9b

Yi-Coder vs other small coding models

Model	Size	Coding quality	Speed
Yi-Coder 9B	5 GB	Good	Fast
Qwen3 8B	5 GB	Good	Fast
DeepSeek R1 14B	9 GB	Good (reasoning)	Medium
Devstral Small 24B	16 GB	Best	Medium

Yi-Coder 9B and Qwen3 8B are very close in quality at the same size. Yi-Coder has a slight edge on Chinese code comments and documentation. Qwen3 8B is better for general-purpose tasks.

Troubleshooting

Common issues and fixes:

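“model not found”: check the exact name and tag. ollama list shows the models you have already pulled; anything else needs ollama pull first.
Too slow: check that the GPU is being used. ollama ps shows, in its PROCESSOR column, how much of the loaded model sits on the GPU versus the CPU.
Out of memory: try Yi-6B or Yi-Coder 9B instead of Yi-34B, or pull a smaller quantization.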
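
The first two checks can also be scripted against the local Ollama server. The sketch below reads `/api/tags` (the data behind `ollama list`) and `/api/ps` (the data behind `ollama ps`) and compares the `size` and `size_vram` fields to estimate how much of a loaded model is on the GPU. The helper names are illustrative, and the code assumes the server is on its default address.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:11434"  # Ollama's default local address


def fetch_json(path: str) -> dict:
    """GET a JSON document from the local Ollama server."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        return json.loads(resp.read())


def is_pulled(tags: dict, model: str) -> bool:
    """True if `model` appears in an /api/tags response."""
    return any(m.get("name") == model for m in tags.get("models", []))


def gpu_fraction(ps: dict, model: str) -> Optional[float]:
    """Share of a loaded model held in GPU memory, from an /api/ps response.

    Returns None if the model is not currently loaded.
    """
    for m in ps.get("models", []):
        if m.get("name") == model and m.get("size"):
            return m.get("size_vram", 0) / m["size"]
    return None


if __name__ == "__main__":
    try:
        print("pulled:", is_pulled(fetch_json("/api/tags"), "yi-coder:9b"))
        print("GPU share:", gpu_fraction(fetch_json("/api/ps"), "yi-coder:9b"))
    except OSError as exc:  # raised if the server is unreachable
        print(f"Could not reach Ollama at {BASE_URL}: {exc}")
```

A GPU share well below 1.0 means part of the model is running on the CPU, which is a common cause of slow generation.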

See our Ollama troubleshooting guide for all common errors.

Quantization options

Yi models on Ollama come in different quantizations. Pick based on your RAM:

Quantization	Yi-34B size	Yi-Coder 9B size	Quality
Q8_0	~34 GB	~9 GB	Best
Q5_K_M	~23 GB	~6 GB	Sweet spot
Q4_K_M	~19 GB	~5 GB	Good enough
Q3_K_M	~15 GB	~4 GB	Noticeable loss

# Pull a specific quantization (tag names vary by model; check the model's tags page on ollama.com for the exact spelling)
ollama pull yi:34b-q5_K_M
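
One way to turn "pick based on your RAM" into a rule is sketched below: choose the largest quantization whose file size still leaves some memory free. The sizes are copied from the table above. The 3 GB default headroom is an assumption for this example, picked because it roughly matches the gap between model size and "RAM needed" in the first table; the function name `best_quant` is hypothetical.

```python
from typing import Optional

# Approximate file sizes in GB, copied from the table above, largest first.
QUANT_SIZES_GB = {
    "Yi-34B": [("Q8_0", 34), ("Q5_K_M", 23), ("Q4_K_M", 19), ("Q3_K_M", 15)],
    "Yi-Coder 9B": [("Q8_0", 9), ("Q5_K_M", 6), ("Q4_K_M", 5), ("Q3_K_M", 4)],
}


def best_quant(model: str, ram_gb: float, headroom_gb: float = 3.0) -> Optional[str]:
    """Return the highest-quality quantization that fits in ram_gb minus headroom.

    headroom_gb is an assumed allowance for the OS, context cache, and other apps.
    Returns None if even the smallest listed quantization does not fit.
    """
    budget = ram_gb - headroom_gb
    for quant, size_gb in QUANT_SIZES_GB[model]:
        if size_gb <= budget:
            return quant
    return None


if __name__ == "__main__":
    for model, ram in [("Yi-Coder 9B", 8), ("Yi-Coder 9B", 16), ("Yi-34B", 24), ("Yi-34B", 32)]:
        print(f"{model} on {ram} GB -> {best_quant(model, ram)}")
```

Raise the headroom if you run long contexts or keep a browser and IDE open alongside the model.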

See our VRAM guide for the full quantization breakdown.

Yi-Coder for autocomplete

Yi-Coder supports fill-in-the-middle (FIM), making it usable for code autocomplete in Continue.dev:

{
  "tabAutocompleteModel": {
    "title": "Yi Coder Autocomplete",
    "provider": "ollama",
    "model": "yi-coder:9b"
  }
}
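
For a sense of what a fill-in-the-middle request looks like outside an editor, the sketch below builds a request body for Ollama's `/api/generate` endpoint, which accepts a `suffix` field holding the text after the cursor. Whether the `yi-coder:9b` tag accepts a suffix depends on its prompt template, so treat that as an assumption to verify; the server rejects the request if the model does not support it. The helper name `build_fim_payload` is illustrative.

```python
import json


def build_fim_payload(model: str, prefix: str, suffix: str, max_tokens: int = 64) -> dict:
    """Build a /api/generate body asking the model to fill in code between prefix and suffix."""
    return {
        "model": model,
        "prompt": prefix,  # code before the cursor
        "suffix": suffix,  # code after the cursor
        "stream": False,
        # Short, deterministic completions suit autocomplete.
        "options": {"num_predict": max_tokens, "temperature": 0},
    }


if __name__ == "__main__":
    payload = build_fim_payload(
        "yi-coder:9b",
        prefix="def add(a, b):\n    return ",
        suffix="\n\nprint(add(1, 2))\n",
    )
    print(json.dumps(payload, indent=2))
```

Editor integrations such as Continue.dev handle this for you; the payload is shown only to make the mechanism concrete.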

This gives you free, local code autocomplete on a machine with 8 GB of RAM. Codestral (22B, needs 16 GB) produces higher-quality completions but requires more hardware.

When to use Yi vs other local models

Scenario	Best model	Why
Only 8GB RAM, need coding	Yi-Coder 9B	Best coding at this size
16GB RAM, general purpose	Qwen 3.5 27B	Better all-rounder
16GB RAM, coding focus	Devstral Small 24B	Best coding quality
Need deep reasoning	DeepSeek R1 14B	Chain-of-thought
Chinese + English bilingual	Yi-34B	Best bilingual at this size

FAQ

Is Yi free for commercial use?

Yes. Yi models are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. The main conditions apply when you redistribute: keep the license text and copyright notices, and state any significant changes you made.

How does Yi-Coder compare to GitHub Copilot?

Yi-Coder 9B is significantly less capable than Copilot (which uses GPT-4 class models). However, it’s free, runs offline, keeps your code private, and avoids network round-trips, so response time depends only on your own hardware. For simple completions and boilerplate, it’s surprisingly good. For complex multi-file reasoning, Copilot wins.

Can I fine-tune Yi on my own data?

Yes. Yi’s Apache 2.0 license allows fine-tuning. Use QLoRA for efficient fine-tuning on consumer hardware (16GB VRAM). The Yi-6B and Yi-9B models are the most practical to fine-tune locally.

Why is Yi-34B so slow on my machine?

Yi-34B requires ~20GB RAM at Q4 quantization. If your system is swapping to disk, it’ll be extremely slow. Either use a smaller model (Yi-Coder 9B or Yi-6B) or ensure you have enough RAM. On Apple Silicon, unified memory helps: the GPU shares system RAM, so a 32GB M2/M3 Mac runs Yi-34B comfortably.