May 28, 2026 · 4 min read

MiMo V2.5 Pro Price Cut: 99% Cheaper Cached Input — Full Breakdown

On May 26, 2026, Xiaomi announced a permanent price reduction for the MiMo V2.5 series API. The headline number is a 99% cut on cached input tokens. The Pro model now costs $0.0036 per million cached input tokens — down from $0.36. That is not a promotional discount. It is the new permanent price.

This makes MiMo V2.5 Pro one of the cheapest frontier-class coding models available anywhere, matching DeepSeek V4-Pro’s pricing tier while offering different architectural strengths.

What actually changed

Xiaomi made three pricing changes simultaneously:

Cached input tokens cut by 99% — from ~$0.36/M to $0.0036/M
Standard input/output prices cut by 57-71% — input from $1.00/M to $0.435/M, output from $3.00/M to $0.87/M
Context-length tiers eliminated — flat pricing regardless of whether you use 10K or 1M tokens of context

The old pricing had multipliers for longer contexts. That is gone. You pay the same rate whether your prompt is 1,000 tokens or 900,000 tokens.

New vs old pricing table

	Old Price	New Price	Reduction
Input (cache hit)	~$0.36/M	$0.0036/M	99%
Input (cache miss)	$1.00/M	$0.435/M	57%
Output	$3.00/M	$0.87/M	71%
Context multiplier	2-4x for long contexts	None	Eliminated

How this compares to other models

Model	Input/M	Output/M	Cache Hit/M
MiMo V2.5 Pro (new)	$0.435	$0.87	$0.0036
DeepSeek V4-Pro	$0.435	$0.87	$0.003625
Claude Opus 4.7	$5.00	$25.00	$0.50
GPT-5.5	$5.00	$30.00	N/A
Gemini 2.5 Pro	$1.25	$10.00	$0.315
Kimi K2.5	$0.60	$2.50	N/A

MiMo V2.5 Pro is now 34x cheaper than GPT-5.5 on output and 139x cheaper than Claude Opus 4.7 on cached input. For agent pipelines that reuse system prompts and context heavily, the effective cost approaches zero.

Why the cut is technically possible

Fuli Luo, head of Xiaomi’s MiMo team and a former core DeepSeek developer, published a technical explanation. The savings come from a hierarchical KV cache optimization for Sliding Window Attention (SWA).

The MiMo V2.5 Pro architecture uses a 70-layer model with a 10-layer equivalent attention computation. It employs a 1:7 sparsity ratio between global attention layers and sliding window attention layers. In practice, this means:

The KV cache for cached tokens is roughly 5x smaller than a standard transformer
Storage and processing costs drop by approximately 80%
The inference engine runs at near full capacity at these prices while breaking even

Luo stated: “Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even.”

This is not a loss-leader strategy. The architecture genuinely costs less to run.

Token Plan changes

Xiaomi also restructured the prepaid Token Plans:

Plan	Price	Old Tokens	New Tokens	Increase
Starter	$10	160M	820M	5.1x
Pro	$50	800M	4.1B	5.1x
Max	$100	1.6B	82B	51x

The Max plan at $100 now gets you 82 billion tokens. For context, that is over 60 billion words — enough to run an autonomous coding agent continuously for months.

Who benefits most

The 99% cache hit discount matters most for:

Agent pipelines with stable system prompts that get reused across thousands of calls
RAG systems where the retrieval context is cached between queries
Document processors that analyze the same corpus repeatedly
Coding agents like Claude Code or Aider that maintain long conversation histories

If your workload hits cache frequently (and most production workloads do), your effective per-token cost just dropped to nearly nothing. For a practical guide on switching from expensive US models, see how to migrate from GPT-5.5/Claude to DeepSeek or MiMo.

What this means for the market

This price cut landed the same week DeepSeek made its 75% V4-Pro discount permanent. Two of the strongest coding models in the world now cost the same: $0.435 input, $0.87 output. The gap between Chinese and American frontier models is now 15-34x on raw pricing, and much wider when cache discounts apply.

For a broader analysis of this trend, see our piece on why Chinese AI models now cost 30x less than American ones.

How to switch

If you are already using the MiMo API, you do not need to change anything. The new prices apply automatically to all existing API keys and Token Plans. Your next bill will reflect the reduced rates.

If you are new to MiMo, check our MiMo V2.5 Pro API setup guide for step-by-step instructions.

FAQ

Is this a temporary promotion?

No. Xiaomi explicitly stated this is a permanent price adjustment. The technical architecture supports these prices at break-even margins.

Do I need to update my API key or endpoint?

No. The price reduction applies automatically to all existing integrations. No code changes required.

How does MiMo V2.5 Pro compare to DeepSeek V4-Pro now that they cost the same?

They are identically priced but architecturally different. DeepSeek V4-Pro is a 1.6T parameter MoE model optimized for raw benchmark performance. MiMo V2.5 Pro is a dense model optimized for token efficiency and tool-calling. See our detailed comparison.

What about MiMo V2.5 Flash/Standard?

The price cuts apply across the entire V2.5 family. Flash and Standard models received proportional reductions.

Does the 99% discount apply to all cached tokens?

Yes. Any token that hits the KV cache — whether from a system prompt, prior conversation turns, or cached retrieval context — gets the $0.0036/M rate.