MiMo V2.5 Pro is Xiaomi’s most capable coding model, built for long-horizon agentic tasks with 1,000+ tool calls per session. The API is OpenAI-compatible, which means you can use the standard OpenAI Python library to make calls. At $1.00 per million input tokens and $3.00 per million output tokens, it undercuts Claude Opus and GPT-5.4 by 10-25x while using 40-60% fewer tokens to complete the same tasks.
This guide covers API setup, pricing, code examples, and integration with coding tools. If you previously used MiMo V2 Pro with Aider, the setup is nearly identical. Just swap the model tag.
Get Your API Key
- Go to platform.xiaomimimo.com
- Sign up with email or phone number
- Navigate to API Keys in the dashboard
- Generate a new key and copy it
The free tier gives you enough credits to test the setup. Paid plans start at $10/month for full-time coding use.
Store the key in an environment variable:
export MIMO_API_KEY="your-api-key-here"
Add this to your .bashrc or .zshrc so it persists across sessions.
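To fail fast when the variable is missing, you can add a small guard before constructing the client. The helper name below is ours, not part of any SDK:

```python
import os

def get_mimo_key() -> str:
    """Read the MiMo API key from the environment, failing fast if it is unset."""
    key = os.environ.get("MIMO_API_KEY")
    if not key:
        raise RuntimeError(
            "MIMO_API_KEY is not set. Add it to your shell profile, e.g. "
            'export MIMO_API_KEY="your-api-key-here"'
        )
    return key

# For illustration only: set a placeholder so the call below succeeds.
os.environ.setdefault("MIMO_API_KEY", "demo-key")
print(get_mimo_key())
```

A clear error at startup beats an opaque 401 from the API mid-session.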
Basic Chat Completion (Python)
Install the OpenAI Python library:
pip install openai
Make your first API call:
from openai import OpenAI
import os
client = OpenAI(
    base_url="https://api.xiaomimimo.com/v1",
    api_key=os.environ["MIMO_API_KEY"],
)

response = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Write a thread-safe LRU cache in Python."},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
The base URL is https://api.xiaomimimo.com/v1. The model tag is mimo-v2.5-pro. That is all you need to change from a standard OpenAI setup.
Model Tags
Xiaomi released two models in the V2.5 family. Use the right tag for your use case:
| Model Tag | Use Case | Modalities | Best For |
|---|---|---|---|
| mimo-v2.5-pro | Complex agentic coding | Text only | Long-horizon coding, autonomous agents, multi-file refactors |
| mimo-v2.5 | General-purpose | Text, image, audio, video | Chat, content, analysis, multimodal tasks |
mimo-v2.5-pro is the specialist for coding and agentic workflows. mimo-v2.5 (standard) is faster, cheaper, and handles multimodal inputs. Switching between them is just a model tag change in your API call.
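Since switching is only a tag change, you can centralize the choice in one place. This is a rough heuristic of our own, not an official routing API; the keyword list is a placeholder you would tune for your workload:

```python
def pick_model(task: str, multimodal: bool = False) -> str:
    """Pick a MiMo V2.5 model tag for a task.

    Heuristic sketch: multimodal input must go to the standard model
    (Pro is text-only); coding/agentic work goes to Pro; everything
    else defaults to standard.
    """
    coding_keywords = ("refactor", "debug", "implement", "code", "agent")
    if multimodal:
        return "mimo-v2.5"
    if any(kw in task.lower() for kw in coding_keywords):
        return "mimo-v2.5-pro"
    return "mimo-v2.5"

print(pick_model("Refactor the auth module"))                  # mimo-v2.5-pro
print(pick_model("Summarize this podcast", multimodal=True))   # mimo-v2.5
```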
Streaming Responses
For real-time output, enable streaming:
stream = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "user", "content": "Explain how B-trees work, then implement one in Rust."}
    ],
    temperature=0.6,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Streaming is recommended for agentic workflows where you want to see progress as the model works through a problem.
API Pricing
V2.5 Pro ships at the same base price as V2 Pro, but the effective cost is roughly half thanks to Token Plan changes and the model’s built-in token efficiency.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical 1hr coding session |
|---|---|---|---|
| MiMo V2.5 Pro | $1.00 | $3.00 | ~$0.40-0.80 |
| MiMo V2.5 Standard | $0.50 | $1.50 | ~$0.15-0.30 |
| Claude Opus 4.6 | $15.00 | $75.00 | ~$8.00-15.00 |
| GPT-5.4 | $10.00 | $30.00 | ~$5.00-10.00 |
| Kimi K2.6 | $0.60 | $3.00 | ~$0.50-1.00 |
V2.5 Pro is 10-25x cheaper than frontier models on raw token price. Factor in the 40-60% token efficiency gain, and the real-world cost difference is even larger. A task that costs $10 with Opus might cost $0.30-0.50 with V2.5 Pro.
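You can verify these numbers yourself with the per-million rates from the table. The function below is a plain calculator, not an API call; the model keys are ours:

```python
# Per-1M-token (input, output) prices from the table above.
PRICES = {
    "mimo-v2.5-pro":   (1.00, 3.00),
    "mimo-v2.5":       (0.50, 1.50),
    "claude-opus-4.6": (15.00, 75.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the listed per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A heavy session: 2M input tokens, 200K output tokens.
print(f"{session_cost('mimo-v2.5-pro', 2_000_000, 200_000):.2f}")    # 2.60
print(f"{session_cost('claude-opus-4.6', 2_000_000, 200_000):.2f}")  # 45.00
```

The gap widens further once you account for V2.5 Pro needing fewer tokens for the same task.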
Token Plan
The Token Plan is Xiaomi’s prepaid pricing system. V2.5 Pro introduces three changes over V2 Pro:
- No context-length multiplier. V2 Pro charged more for longer contexts. V2.5 Pro doesn’t. You pay the same rate whether you use 10K or 500K tokens of context. This is a big deal for agentic workflows that accumulate large contexts over time.
- Night-time discounts. Reduced rates during off-peak hours (roughly 10 PM to 8 AM local time). If you run batch agent jobs, schedule them at night to save more.
- Auto-renewal. Token Plans now auto-renew, so you don’t lose unused capacity or forget to top up mid-workflow. You can disable this in the dashboard if you prefer manual control.
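If you schedule batch jobs around the night-time discount, the window spans midnight, which trips up naive comparisons. A minimal sketch, assuming the rough 10 PM to 8 AM window mentioned above (check the dashboard for your region's exact hours):

```python
from datetime import time

def is_off_peak(t: time, start: time = time(22, 0), end: time = time(8, 0)) -> bool:
    """True if t falls in an off-peak window that spans midnight."""
    return t >= start or t < end

print(is_off_peak(time(23, 30)))  # True
print(is_off_peak(time(12, 0)))   # False
```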
For exact pricing tiers and regional variations, check platform.xiaomimimo.com.
Long-Context Usage (1M Tokens)
V2.5 Pro supports a 1-million-token context window. This is useful for large codebases, long agent sessions, and tasks that accumulate thousands of tool call results.
# Long-context example: analyzing a large codebase
with open("full_codebase.txt", "r") as f:
codebase = f.read()
response = client.chat.completions.create(
model="mimo-v2.5-pro",
messages=[
{"role": "system", "content": "You are a code reviewer."},
{"role": "user", "content": f"Review this codebase for security issues:\n\n{codebase}"}
],
temperature=0.3
)
print(response.choices[0].message.content)
A few things to keep in mind with long contexts:
- The hybrid attention mechanism maintains quality even near the context limit, but shorter, focused prompts still produce better results.
- No context-length multiplier on pricing means you pay the same per-token rate regardless of how much context you use.
- For agentic sessions that run for hours, V2.5 Pro’s harness awareness helps it manage context proactively, summarizing earlier work and avoiding redundant reads.
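Before sending a very large payload, a cheap pre-flight check avoids a rejected request. The ~4-characters-per-token ratio below is a common rule of thumb for English and code, not an exact tokenizer; use the API's `usage` field for real counts:

```python
def rough_token_count(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English/code."""
    return max(1, len(text) // 4)

def fits_context(text: str, limit: int = 1_000_000, headroom: float = 0.9) -> bool:
    """Leave ~10% headroom for the system prompt and the model's reply."""
    return rough_token_count(text) <= int(limit * headroom)

snippet = "def add(a, b):\n    return a + b\n" * 1000
print(rough_token_count(snippet), fits_context(snippet))  # 8000 True
```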
Integration with Coding Tools
V2.5 Pro works with the major agentic coding harnesses. Xiaomi specifically recommends Claude Code as the primary harness.
Claude Code
claude config set model mimo-v2.5-pro
claude config set apiBaseUrl https://api.xiaomimimo.com/v1
claude config set apiKey $MIMO_API_KEY
See our full MiMo V2.5 Pro Claude Code setup guide for detailed configuration and tips.
OpenCode
opencode --provider mimo --model mimo-v2.5-pro
Or add it to your OpenCode config file. The setup is similar to the Aider configuration for MiMo V2 Pro.
Kilo
Point the base URL to https://api.xiaomimimo.com/v1 and select mimo-v2.5-pro as the model in Kilo’s settings. Kilo’s lightweight harness pairs well with V2.5 Pro’s token efficiency.
All three tools use the OpenAI-compatible API format, so the underlying connection is the same. Claude Code gets the best results because V2.5 Pro was specifically fine-tuned on traces from Claude Code sessions.
Also Available on OpenRouter
If you prefer a single API key for multiple models, V2.5 Pro is available through OpenRouter:
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="xiaomi/mimo-v2.5-pro",
    messages=[
        {"role": "user", "content": "Refactor this function to use async/await."}
    ],
)
OpenRouter pricing includes a small markup over the direct API. For most users, the convenience of a single billing account and the ability to fall back to other models is worth it. Check the OpenRouter complete guide for setup details.
cURL Example
For quick testing or non-Python environments:
curl https://api.xiaomimimo.com/v1/chat/completions \
  -H "Authorization: Bearer $MIMO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mimo-v2.5-pro",
    "messages": [{"role": "user", "content": "Write a binary search in Go."}],
    "temperature": 0.6
  }'
FAQ
How does V2.5 Pro compare to V2 Pro on the API?
Same API endpoint, same format. The differences are all in model capability: V2.5 Pro handles 1,000+ tool calls per session (vs hundreds for V2 Pro), uses 40-60% fewer tokens, and scores higher on every benchmark. The Token Plan is also better with no context-length multiplier and night discounts. If you are currently using V2 Pro, switch the model tag to mimo-v2.5-pro and you are done.
What are the rate limits?
The free tier has conservative rate limits suitable for testing. Paid plans ($10/month and up) provide enough throughput for full-time coding use. If you hit limits on a paid tier, contact MiMo support through the dashboard to request an increase. Rate limits are per-key, so you can create separate keys for different projects.
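When you do hit a limit, retrying with exponential backoff is the standard pattern; the OpenAI Python library raises `openai.RateLimitError` on 429 responses, which you would catch around the call. The schedule helper below is our own sketch of the delay math only, with no network code:

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule for 429 responses: 1s, 2s, 4s, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]

print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In a real client you would `time.sleep(delay)` between attempts, ideally with random jitter so parallel agents don't retry in lockstep.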
Can I use V2.5 Pro for non-coding tasks?
You can, but you will get better results with mimo-v2.5 (standard) for general-purpose tasks. V2.5 Pro is optimized for coding and agentic workflows. For chat, content generation, analysis, or anything involving images, audio, or video, the standard model is the better choice. See our AI model comparison for help picking the right model for your use case.
Related: MiMo V2.5 Pro Complete Guide · MiMo V2 Pro Aider Setup · OpenRouter Complete Guide · AI Model Comparison · MiMo V2.5 Pro Claude Code Setup · Best AI Coding Tools 2026