MiMo V2.5 Pro is Xiaomi’s most capable coding model, built for long-horizon agentic tasks with 1,000+ tool calls per session. The API is OpenAI-compatible, which means you can use the standard OpenAI Python library to make calls. At $1.00 per million input tokens and $3.00 per million output tokens, it undercuts Claude Opus and GPT-5.4 by 10-25x while using 40-60% fewer tokens to complete the same tasks.
This guide covers API setup, pricing, code examples, and integration with coding tools. If you previously used MiMo V2 Pro with Aider, the setup is nearly identical. Just swap the model tag.
Get Your API Key
- Go to platform.xiaomimimo.com
- Sign up with email or phone number
- Navigate to API Keys in the dashboard
- Generate a new key and copy it
The free tier gives you enough credits to test the setup. Paid plans start at $10/month for full-time coding use.
Store the key in an environment variable:
export MIMO_API_KEY="your-api-key-here"
Add this to your .bashrc or .zshrc so it persists across sessions.
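To fail fast when the variable is missing, you can add a small guard before constructing the client. The helper name below is ours, not part of any SDK:

```python
import os

def get_mimo_key() -> str:
    """Read the MiMo API key from the environment, failing fast if it is unset."""
    key = os.environ.get("MIMO_API_KEY")
    if not key:
        raise RuntimeError(
            "MIMO_API_KEY is not set. Add it to your shell profile, e.g. "
            'export MIMO_API_KEY="your-api-key-here"'
        )
    return key

# For illustration only: set a placeholder so the call below succeeds.
os.environ.setdefault("MIMO_API_KEY", "demo-key")
print(get_mimo_key())
```

A clear error at startup beats an opaque 401 from the API mid-session.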
Basic Chat Completion (Python)
Install the OpenAI Python library:
pip install openai
Make your first API call:
from openai import OpenAI
import os
client = OpenAI(
    base_url="https://api.xiaomimimo.com/v1",
    api_key=os.environ["MIMO_API_KEY"],
)

response = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Write a thread-safe LRU cache in Python."},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
The base URL is https://api.xiaomimimo.com/v1. The model tag is mimo-v2.5-pro. That is all you need to change from a standard OpenAI setup.
Model Tags
Xiaomi released two models in the V2.5 family. Use the right tag for your use case:
| Model Tag | Use Case | Modalities | Best For |
|---|---|---|---|
| mimo-v2.5-pro | Complex agentic coding | Text only | Long-horizon coding, autonomous agents, multi-file refactors |
| mimo-v2.5 | General-purpose | Text, image, audio, video | Chat, content, analysis, multimodal tasks |
mimo-v2.5-pro is the specialist for coding and agentic workflows. mimo-v2.5 (standard) is faster, cheaper, and handles multimodal inputs. Switching between them is just a model tag change in your API call.
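Since switching is only a tag change, you can centralize the choice in one place. This is a rough heuristic of our own, not an official routing API; the keyword list is a placeholder you would tune for your workload:

```python
def pick_model(task: str, multimodal: bool = False) -> str:
    """Pick a MiMo V2.5 model tag for a task.

    Heuristic sketch: multimodal input must go to the standard model
    (Pro is text-only); coding/agentic work goes to Pro; everything
    else defaults to standard.
    """
    coding_keywords = ("refactor", "debug", "implement", "code", "agent")
    if multimodal:
        return "mimo-v2.5"
    if any(kw in task.lower() for kw in coding_keywords):
        return "mimo-v2.5-pro"
    return "mimo-v2.5"

print(pick_model("Refactor the auth module"))                  # mimo-v2.5-pro
print(pick_model("Summarize this podcast", multimodal=True))   # mimo-v2.5
```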
Streaming Responses
For real-time output, enable streaming:
stream = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "user", "content": "Explain how B-trees work, then implement one in Rust."}
    ],
    temperature=0.6,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Streaming is recommended for agentic workflows where you want to see progress as the model works through a problem.
API Pricing
V2.5 Pro ships at the same base price as V2 Pro, but the effective cost is roughly half thanks to Token Plan changes and the model’s built-in token efficiency.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical 1hr coding session |
|---|---|---|---|
| MiMo V2.5 Pro | $1.00 | $3.00 | ~$0.40-0.80 |
| MiMo V2.5 Standard | $0.50 | $1.50 | ~$0.15-0.30 |
| Claude Opus 4.6 | $15.00 | $75.00 | ~$8.00-15.00 |
| GPT-5.4 | $10.00 | $30.00 | ~$5.00-10.00 |
| Kimi K2.6 | $0.60 | $3.00 | ~$0.50-1.00 |
V2.5 Pro is 10-25x cheaper than frontier models on raw token price. Factor in the 40-60% token efficiency gain, and the real-world cost difference is even larger. A task that costs $10 with Opus might cost $0.30-0.50 with V2.5 Pro.
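You can verify these numbers yourself with the per-million rates from the table. The function below is a plain calculator, not an API call; the model keys are ours:

```python
# Per-1M-token (input, output) prices from the table above.
PRICES = {
    "mimo-v2.5-pro":   (1.00, 3.00),
    "mimo-v2.5":       (0.50, 1.50),
    "claude-opus-4.6": (15.00, 75.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the listed per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A heavy session: 2M input tokens, 200K output tokens.
print(f"{session_cost('mimo-v2.5-pro', 2_000_000, 200_000):.2f}")    # 2.60
print(f"{session_cost('claude-opus-4.6', 2_000_000, 200_000):.2f}")  # 45.00
```

The gap widens further once you account for V2.5 Pro needing fewer tokens for the same task.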
Token Plan
The Token Plan is Xiaomi’s prepaid pricing system. V2.5 Pro introduces three changes over V2 Pro:
- No context-length multiplier. V2 Pro charged more for longer contexts. V2.5 Pro doesn’t. You pay the same rate whether you use 10K or 500K tokens of context. This is a big deal for agentic workflows that accumulate large contexts over time.
- Night-time discounts. Reduced rates during off-peak hours (roughly 10 PM to 8 AM local time). If you run batch agent jobs, schedule them at night to save more.
- Auto-renewal. Token Plans now auto-renew, so you don’t lose unused capacity or forget to top up mid-workflow. You can disable this in the dashboard if you prefer manual control.
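If you schedule batch jobs around the night-time discount, the window spans midnight, which trips up naive comparisons. A minimal sketch, assuming the rough 10 PM to 8 AM window mentioned above (check the dashboard for your region's exact hours):

```python
from datetime import time

def is_off_peak(t: time, start: time = time(22, 0), end: time = time(8, 0)) -> bool:
    """True if t falls in an off-peak window that spans midnight."""
    return t >= start or t < end

print(is_off_peak(time(23, 30)))  # True
print(is_off_peak(time(12, 0)))   # False
```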
For exact pricing tiers and regional variations, check platform.xiaomimimo.com.
Long-Context Usage (1M Tokens)
V2.5 Pro supports a 1-million-token context window. This is useful for large codebases, long agent sessions, and tasks that accumulate thousands of tool call results.
# Long-context example: analyzing a large codebase
with open("full_codebase.txt", "r") as f:
codebase = f.read()
response = client.chat.completions.create(
model="mimo-v2.5-pro",
messages=[
{"role": "system", "content": "You are a code reviewer."},
{"role": "user", "content": f"Review this codebase for security issues:\n\n{codebase}"}
],
temperature=0.3
)
print(response.choices[0].message.content)
A few things to keep in mind with long contexts:
- The hybrid attention mechanism maintains quality even near the context limit, but shorter, focused prompts still produce better results.
- No context-length multiplier on pricing means you pay the same per-token rate regardless of how much context you use.
- For agentic sessions that run for hours, V2.5 Pro’s harness awareness helps it manage context proactively, summarizing earlier work and avoiding redundant reads.
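Before sending a very large payload, a cheap pre-flight check avoids a rejected request. The ~4-characters-per-token ratio below is a common rule of thumb for English and code, not an exact tokenizer; use the API's `usage` field for real counts:

```python
def rough_token_count(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English/code."""
    return max(1, len(text) // 4)

def fits_context(text: str, limit: int = 1_000_000, headroom: float = 0.9) -> bool:
    """Leave ~10% headroom for the system prompt and the model's reply."""
    return rough_token_count(text) <= int(limit * headroom)

snippet = "def add(a, b):\n    return a + b\n" * 1000
print(rough_token_count(snippet), fits_context(snippet))  # 8000 True
```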
Integration with Coding Tools
V2.5 Pro works with the major agentic coding harnesses. Xiaomi specifically recommends Claude Code as the primary harness.
Claude Code
claude config set model mimo-v2.5-pro
claude config set apiBaseUrl https://api.xiaomimimo.com/v1
claude config set apiKey $MIMO_API_KEY
See our full MiMo V2.5 Pro Claude Code setup guide for detailed configuration and tips.
OpenCode
opencode --provider mimo --model mimo-v2.5-pro
Or add it to your OpenCode config file. The setup is similar to the Aider configuration for MiMo V2 Pro.
Kilo
Point the base URL to https://api.xiaomimimo.com/v1 and select mimo-v2.5-pro as the model in Kilo’s settings. Kilo’s lightweight harness pairs well with V2.5 Pro’s token efficiency.
All three tools use the OpenAI-compatible API format, so the underlying connection is the same. Claude Code gets the best results because V2.5 Pro was specifically fine-tuned on traces from Claude Code sessions.
Also Available on OpenRouter
If you prefer a single API key for multiple models, V2.5 Pro is available through OpenRouter:
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="xiaomi/mimo-v2.5-pro",
    messages=[
        {"role": "user", "content": "Refactor this function to use async/await."}
    ],
)
OpenRouter pricing includes a small markup over the direct API. For most users, the convenience of a single billing account and the ability to fall back to other models is worth it. Check the OpenRouter complete guide for setup details.
cURL Example
For quick testing or non-Python environments:
curl https://api.xiaomimimo.com/v1/chat/completions \
  -H "Authorization: Bearer $MIMO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mimo-v2.5-pro",
    "messages": [{"role": "user", "content": "Write a binary search in Go."}],
    "temperature": 0.6
  }'
FAQ
How does V2.5 Pro compare to V2 Pro on the API?
Same API endpoint, same format. The differences are all in model capability: V2.5 Pro handles 1,000+ tool calls per session (vs hundreds for V2 Pro), uses 40-60% fewer tokens, and scores higher on every benchmark. The Token Plan is also better with no context-length multiplier and night discounts. If you are currently using V2 Pro, switch the model tag to mimo-v2.5-pro and you are done.
What are the rate limits?
The free tier has conservative rate limits suitable for testing. Paid plans ($10/month and up) provide enough throughput for full-time coding use. If you hit limits on a paid tier, contact MiMo support through the dashboard to request an increase. Rate limits are per-key, so you can create separate keys for different projects.
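When you do hit a limit, retrying with exponential backoff is the standard pattern; the OpenAI Python library raises `openai.RateLimitError` on 429 responses, which you would catch around the call. The schedule helper below is our own sketch of the delay math only, with no network code:

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule for 429 responses: 1s, 2s, 4s, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]

print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In a real client you would `time.sleep(delay)` between attempts, ideally with random jitter so parallel agents don't retry in lockstep.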
Can I use V2.5 Pro for non-coding tasks?
You can, but you will get better results with mimo-v2.5 (standard) for general-purpose tasks. V2.5 Pro is optimized for coding and agentic workflows. For chat, content generation, analysis, or anything involving images, audio, or video, the standard model is the better choice. See our AI model comparison for help picking the right model for your use case.
Related: MiMo V2.5 Pro Complete Guide · MiMo V2 Pro Aider Setup · OpenRouter Complete Guide · AI Model Comparison · MiMo V2.5 Pro Claude Code Setup · Best AI Coding Tools 2026