Yi API Setup Guide: Yi-Lightning, Yi-Coder, and Yi-34B (2026)
01.AI offers API access to its Yi model family. The flagship Yi-Lightning ranked 6th on Chatbot Arena. Here's how to set it up.
Available via API
| Model | Access | Best for |
|---|---|---|
| Yi-Lightning | 01.AI platform | Best quality, flagship |
| Yi-Large | 01.AI platform | Large context tasks |
| Yi-Coder 9B | Local or HuggingFace | Coding (free locally) |
| Yi-34B | Local or HuggingFace | General purpose (free locally) |
Setup via 01.AI platform
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.01.ai/v1",
    api_key="your-01ai-key",
)

response = client.chat.completions.create(
    model="yi-lightning",
    messages=[{"role": "user", "content": "Write a Python web scraper with error handling"}],
)
print(response.choices[0].message.content)
```
Sign up at platform.01.ai for an API key.
Setup via OpenRouter
Some Yi models are available on OpenRouter:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

response = client.chat.completions.create(
    model="01-ai/yi-large",
    messages=[{"role": "user", "content": "Explain Kubernetes networking"}],
)
print(response.choices[0].message.content)
```
With coding tools
Aider
```bash
# Via 01.AI API
export OPENAI_API_BASE=https://api.01.ai/v1
export OPENAI_API_KEY=your-01ai-key
aider --model yi-lightning

# Via local (free)
aider --model ollama/yi-coder:9b
```
Continue.dev
```json
{
  "models": [{
    "title": "Yi-Lightning",
    "provider": "openai",
    "model": "yi-lightning",
    "apiBase": "https://api.01.ai/v1",
    "apiKey": "your-key"
  }]
}
```
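For the free local route, Continue.dev can point at Ollama instead. A sketch of the equivalent config, assuming you have already pulled `yi-coder:9b` with Ollama:

```json
{
  "models": [{
    "title": "Yi-Coder (local)",
    "provider": "ollama",
    "model": "yi-coder:9b"
  }]
}
```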
Pricing
| Model | Input | Output |
|---|---|---|
| Yi-Lightning | ~$0.99/M tokens | ~$0.99/M tokens |
| Yi-Large | ~$3/M tokens | ~$3/M tokens |
| Yi-Coder 9B (local) | Free | Free |
| Yi-34B (local) | Free | Free |
For most developers, running Yi-Coder locally is the best value. Use Yi-Lightning via API only when you need the flagship quality.
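To sanity-check a budget, the per-million rates above can be turned into a quick per-request estimate. A minimal sketch using the table's approximate prices (adjust the numbers to current pricing):

```python
# Approximate per-million-token rates from the pricing table above.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "yi-lightning": (0.99, 0.99),
    "yi-large": (3.00, 3.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of one request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 1,000-token reply on Yi-Lightning:
print(f"${estimate_cost('yi-lightning', 2000, 1000):.4f}")  # prints $0.0030
```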
Yi API vs alternatives
| Provider | Best model | Input | Output | Unique feature |
|---|---|---|---|---|
| 01.AI | Yi-Lightning | ~$0.99/M | ~$0.99/M | Strong Chinese + coding |
| Z.ai | GLM-5.1 | $18/mo flat | Included | Claude Code compatible |
| OpenRouter | Qwen 3.6 | Free (preview) | Free | 300+ models, one key |
| DeepSeek | DeepSeek V3 | $0.27/M | $1.10/M | Cheapest paid |
| Anthropic | Claude Sonnet | $3/M | $15/M | Best coding quality |
Yi-Lightning is competitive on price, but the 01.AI platform is less mature than OpenRouter or Anthropic. For most developers, running Yi-Coder locally and using Qwen 3.6 free on OpenRouter covers most needs at zero cost.
Streaming responses
For real-time output in chat interfaces:
```python
stream = client.chat.completions.create(
    model="yi-lightning",
    messages=[{"role": "user", "content": "Write a Python web scraper"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
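In a chat UI you usually want both the live output and the final string. A small helper can assemble the chunks as they print; the fake chunks below mimic the SDK's object shape so this sketch runs without an API key:

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Print streamed chunks as they arrive and return the assembled reply."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's content is typically None
            print(delta, end="")
            parts.append(delta)
    return "".join(parts)

# Offline demo: objects shaped like the SDK's streaming chunks.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hello", ", ", "world", None]
]
full_reply = collect_stream(fake_chunks)  # prints "Hello, world"
```

With a real call, pass the `stream` object from the snippet above directly to `collect_stream`.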
Error handling
```python
import time

from openai import OpenAI, APIError, RateLimitError

client = OpenAI(base_url="https://api.01.ai/v1", api_key="your-key")

def call_yi(prompt, retries=3):
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="yi-lightning",
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return response.choices[0].message.content
        except RateLimitError:
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        except APIError:
            if attempt == retries - 1:
                raise
            time.sleep(1)
    return None
```
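The retry loop above doubles its wait on each rate-limit hit (1s, 2s, 4s). If many workers are rate-limited together, they will also retry together; adding random jitter spreads them out. A small sketch of a jittered delay schedule (the 30s cap is an arbitrary choice):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Drop-in replacement for time.sleep(2 ** attempt) in the retry loop:
for attempt in range(4):
    print(f"attempt {attempt}: wait up to {min(30.0, 2.0 ** attempt):.0f}s")
```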
When to use API vs local
| Scenario | Use API (Yi-Lightning) | Use local (Yi-Coder) |
|---|---|---|
| Need best quality | ✓ | |
| Need privacy | | ✓ |
| Budget-conscious | | ✓ (free) |
| No GPU/weak hardware | ✓ | |
| Offline work | | ✓ |
| Chinese language tasks | ✓ (best quality) | Good |
| Code autocomplete | | ✓ (1.5B, instant) |
The practical setup: Yi-Coder 9B locally for daily coding (free, private), Yi-Lightning API for complex tasks that need flagship quality.
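That split can be captured in a tiny routing helper. The Ollama URL, model names, and the "needs flagship" flag are illustrative; plug the returned values into the OpenAI-compatible client shown earlier:

```python
# Hypothetical routing table: local Ollama for daily coding,
# 01.AI's API for flagship-quality tasks.
BACKENDS = {
    "local":    {"base_url": "http://localhost:11434/v1", "model": "yi-coder:9b"},
    "flagship": {"base_url": "https://api.01.ai/v1",      "model": "yi-lightning"},
}

def pick_backend(needs_flagship: bool) -> dict:
    """Route flagship-quality work to the API, everything else to local Yi-Coder."""
    return BACKENDS["flagship" if needs_flagship else "local"]

print(pick_backend(False)["model"])  # prints yi-coder:9b
```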
FAQ
Is the Yi API free?
The Yi-Lightning and Yi-Large APIs are paid (around $0.99–$3 per million tokens), but Yi-Coder 9B and Yi-34B can be run locally for free using Ollama or other local inference tools. For most developers, the local option covers daily needs at zero cost.
How do I get a Yi API key?
Sign up at platform.01.ai, create an account, and generate an API key from the dashboard. The key works with their OpenAI-compatible endpoint at https://api.01.ai/v1.
Which Yi model is best?
Yi-Lightning is the flagship with the best quality (ranked 6th on Chatbot Arena), while Yi-Coder 9B is the best for coding tasks and runs free locally. Use Yi-Lightning via API for complex reasoning and Yi-Coder locally for daily development work.
Can I use Yi with OpenAI-compatible clients?
Yes, the Yi API uses an OpenAI-compatible interface. You can use the standard OpenAI Python SDK by pointing base_url to https://api.01.ai/v1; any tool that supports custom OpenAI endpoints (Aider, Continue.dev, etc.) works out of the box.
Related: What is Yi? · How to Run Yi Locally · Yi-Coder Guide · OpenRouter Guide · Z.ai API Guide