
How to Run Yi Models Locally with Ollama: Yi-34B and Yi-Coder


Yi models from 01.AI are fully open source (Apache 2.0) and run great locally. Here's how to set them up with Ollama.

Which Yi model to pick

| Model | Size | RAM needed | Best for |
|---|---|---|---|
| Yi-Coder 9B | 5 GB | 8 GB | Coding, best bang for buck |
| Yi-6B | 4 GB | 6 GB | Lightweight chat, edge devices |
| Yi-34B | 20 GB | 24 GB | Best quality, general purpose |
| Yi-1.5-34B | 20 GB | 24 GB | Improved 34B, more training data |
| Yi-VL-34B | 20 GB | 24 GB | Vision + language (multimodal) |

For most developers, Yi-Coder 9B is the sweet spot: strong coding at just 5GB.
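
As a rough sketch, the "RAM needed" column can be turned into a small shell helper that maps total RAM to an Ollama tag. The function name `pick_yi_model` is hypothetical (it is not part of Ollama), and the thresholds are simply the table's figures in whole GB.

```shell
# Hypothetical helper, not part of Ollama: map total RAM (whole GB)
# to a Yi tag, using the "RAM needed" column of the table.
pick_yi_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 24 ]; then
    echo "yi:34b"          # 34B needs about 24 GB
  elif [ "$ram_gb" -ge 8 ]; then
    echo "yi-coder:9b"     # 9B coder needs about 8 GB
  elif [ "$ram_gb" -ge 6 ]; then
    echo "yi:6b"           # 6B needs about 6 GB
  else
    echo "none"            # below the smallest listed requirement
  fi
}

pick_yi_model 16   # prints yi-coder:9b
```

The helper always prefers the largest model that fits, so on a 24GB machine it returns `yi:34b` even if you mainly want coding; adjust the order if Yi-Coder is your priority.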

Setup

# Install Ollama
brew install ollama  # Mac
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull your chosen model
ollama pull yi-coder:9b     # Recommended for coding
ollama pull yi:34b          # Best quality
ollama pull yi:6b           # Lightest

# Test it
ollama run yi-coder:9b "Explain async/await in Python with examples"
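
Besides the interactive CLI, Ollama serves an HTTP API on `localhost:11434` by default, which is handy for scripting. The sketch below posts a prompt to the `/api/generate` endpoint with `curl`. The helper names `yi_payload` and `yi_generate` are hypothetical, and the payload builder assumes the prompt contains no double quotes or backslashes (it does no JSON escaping).

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# Assumption: the prompt has no double quotes or backslashes.
yi_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}

# Send the request to a locally running Ollama server.
# With "stream": false the server replies with a single JSON object
# whose "response" field holds the generated text.
yi_generate() {
  curl -s http://localhost:11434/api/generate -d "$(yi_payload "$1" "$2")"
}

# Usage, once the model is pulled and the server is running:
# yi_generate yi-coder:9b "Write a Python function that reverses a string"
```

For prompts with quotes or newlines, build the JSON with a tool such as `jq` instead of `printf`.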

Hardware requirements

| Your hardware | Best Yi model | Performance |
|---|---|---|
| 8GB Mac/laptop | Yi-Coder 9B | ~20 tok/s |
| 16GB Mac | Yi-Coder 9B or Yi-6B | ~30 tok/s |
| 24GB+ Mac | Yi-34B | ~15 tok/s |
| RTX 3080 10GB | Yi-Coder 9B | ~40 tok/s |
| RTX 4090 24GB | Yi-34B | ~30 tok/s |
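
To get a feel for what these throughput figures mean in practice, divide the length of a reply by the tokens-per-second rate. The sketch below does that with shell integer arithmetic. `gen_seconds` is a hypothetical helper, and the 500-token reply length is an arbitrary illustration.

```shell
# Hypothetical helper: seconds needed to generate `tokens` tokens at
# `rate` tokens per second, rounded up to a whole second.
gen_seconds() {
  tokens=$1
  rate=$2
  echo $(( (tokens + rate - 1) / rate ))
}

gen_seconds 500 20   # Yi-Coder 9B on an 8GB Mac: prints 25
gen_seconds 500 15   # Yi-34B on a 24GB+ Mac: prints 34
```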

For Yi-34B and larger models that need 24GB+ VRAM, cloud GPU providers are often cheaper than buying a second graphics card.

See our VRAM guide for detailed calculations and GPU vs CPU guide for when you need a GPU.

Connect to coding tools

Aider

aider --model ollama/yi-coder:9b

Continue.dev (VS Code)

{
  "models": [{
    "title": "Yi Coder Local",
    "provider": "ollama",
    "model": "yi-coder:9b"
  }]
}

OpenCode

opencode --provider ollama --model yi-coder:9b

Yi-Coder vs other small coding models

| Model | Size | Coding quality | Speed |
|---|---|---|---|
| Yi-Coder 9B | 5 GB | Good | Fast |
| Qwen3 8B | 5 GB | Good | Fast |
| DeepSeek R1 14B | 9 GB | Good (reasoning) | Medium |
| Devstral Small 24B | 16 GB | Best | Medium |

Yi-Coder 9B and Qwen3 8B are very close in quality at the same size. Yi-Coder has a slight edge on Chinese code comments and documentation. Qwen3 8B is better for general-purpose tasks.

Troubleshooting

Common issues and fixes:

  • "model not found": check the exact tag; ollama list shows the models you have already pulled
  • Too slow: make sure the GPU is being used; ollama ps shows where each loaded model is running
  • Out of memory: try Yi-6B or Yi-Coder 9B instead of Yi-34B
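
The GPU check can be scripted. The sketch below assumes that the PROCESSOR column of `ollama ps` reads like `100% GPU`, `100% CPU`, or a CPU/GPU split, as in recent Ollama releases; if your version prints something different, adjust the pattern. `runs_on_cpu` is a hypothetical helper that reads that output on standard input.

```shell
# Hypothetical helper: reads `ollama ps` output on stdin and succeeds
# if any loaded model runs fully or partly on the CPU.
# Assumption: the PROCESSOR column contains the word "CPU" in that case.
runs_on_cpu() {
  grep -q 'CPU'
}

# Usage, with a model loaded:
# ollama ps | runs_on_cpu && echo "Model is (partly) on CPU; expect slow output"
```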

See our Ollama troubleshooting guide for all common errors.

Quantization options

Yi models on Ollama come in different quantizations. Pick based on your RAM:

| Quantization | Yi-34B size | Yi-Coder 9B size | Quality |
|---|---|---|---|
| Q8_0 | ~34 GB | ~9 GB | Best |
| Q5_K_M | ~23 GB | ~6 GB | Sweet spot |
| Q4_K_M | ~19 GB | ~5 GB | Good enough |
| Q3_K_M | ~15 GB | ~4 GB | Noticeable loss |

# Pull specific quantization
ollama pull yi:34b-q5_K_M
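
The sizes in the quantization table follow a simple rule of thumb: parameters (in billions) times bits per weight, divided by 8, gives gigabytes. The sketch below applies it with `awk`. `quant_size_gb` is a hypothetical helper, and the bits-per-weight values are assumptions fitted to reproduce the table, not official figures for each format.

```shell
# Rough model size in GB: parameters (billions) x bits per weight / 8.
# Ignores context (KV cache) memory and file metadata.
quant_size_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.0f\n", params * bits / 8 }'
}

# Bits-per-weight values below are assumptions fitted to the table.
quant_size_gb 34 8.0   # Q8_0   -> 34
quant_size_gb 34 5.4   # Q5_K_M -> 23
quant_size_gb 34 4.5   # Q4_K_M -> 19
quant_size_gb 34 3.5   # Q3_K_M -> 15
```

The same call with `9` as the first argument gives the Yi-Coder 9B column. Leave a few extra GB on top of the result for the context window and the operating system.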

See our VRAM guide for the full quantization breakdown.

Yi-Coder for autocomplete

Yi-Coder supports fill-in-the-middle (FIM), making it usable for code autocomplete in Continue.dev:

{
  "tabAutocompleteModel": {
    "title": "Yi Coder Autocomplete",
    "provider": "ollama",
    "model": "yi-coder:9b"
  }
}

This gives you free, local code autocomplete that runs on any 8GB machine. By comparison, Codestral (22B, needs 16GB) offers higher quality but requires more hardware.

When to use Yi vs other local models

| Scenario | Best model | Why |
|---|---|---|
| Only 8GB RAM, need coding | Yi-Coder 9B | Best coding at this size |
| 16GB RAM, general purpose | Qwen 3.5 27B | Better all-rounder |
| 16GB RAM, coding focus | Devstral Small 24B | Best coding quality |
| Need deep reasoning | DeepSeek R1 14B | Chain-of-thought |
| Chinese + English bilingual | Yi-34B | Best bilingual at this size |

FAQ

Is Yi free for commercial use?

Yes. Yi models are released under the Apache 2.0 license, which allows commercial use. You can use Yi in production products, modify it, and distribute it, provided you retain the license and attribution notices.

How does Yi-Coder compare to GitHub Copilot?

Yi-Coder 9B is significantly less capable than Copilot (which uses GPT-4 class models). However, it's free, runs offline, keeps your code private, and has no network round trip. For simple completions and boilerplate, it's surprisingly good. For complex multi-file reasoning, Copilot wins.

Can I fine-tune Yi on my own data?

Yes. Yi's Apache 2.0 license allows fine-tuning. Use QLoRA for efficient fine-tuning on consumer hardware (16GB VRAM). The Yi-6B and Yi-9B models are the most practical to fine-tune locally.

Why is Yi-34B so slow on my machine?

Yi-34B requires ~20GB RAM at Q4 quantization. If your system is swapping to disk, it'll be extremely slow. Either use a smaller model (Yi-9B) or ensure you have enough RAM. On Apple Silicon, unified memory helps: a 32GB M2/M3 Mac runs Yi-34B comfortably.
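
A quick way to tell whether swapping is the likely cause is to compare the model's memory need with the machine's total RAM. The sketch below is one way to do that on Linux or macOS. Both helper names are hypothetical, and the 4 GB default headroom for the OS and context is an assumption (it matches the gap between Yi-34B's 20 GB size and its 24 GB RAM requirement).

```shell
# Total physical RAM in whole GiB, rounded down. Linux and macOS only.
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { print int($2 / 1048576) }' /proc/meminfo
  else
    echo $(( $(sysctl -n hw.memsize) / 1073741824 ))
  fi
}

# Succeeds when have_gb covers need_gb plus headroom for the OS.
# The 4 GB default headroom is an assumption, not a measured figure.
fits_in_ram() {
  need_gb=$1
  have_gb=$2
  headroom_gb=${3:-4}
  [ "$have_gb" -ge $(( need_gb + headroom_gb )) ]
}

# Usage:
# fits_in_ram 20 "$(total_ram_gb)" || echo "Yi-34B will likely swap; try a smaller model"
```

Because `total_ram_gb` rounds down, a machine sold as 24GB may report 23 on Linux; treat a near miss as a warning rather than a hard failure.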