Xiaomi released four AI models in the MiMo-V2 family between December 2025 and March 2026. Together, they form a complete AI agent stack — reasoning, perception, and speech. Here’s everything you need to know about each one.
The family at a glance
| Model | Role | Params (total/active) | Context | Pricing (in/out) | Open source |
|---|---|---|---|---|---|
| MiMo-V2-Pro | Brain | 1T / 42B | 1M tokens | $1.00/$3.00 | ❌ |
| MiMo-V2-Flash | Fast worker | 309B / 15B | 56K tokens | $0.10/$0.30 | ✅ |
| MiMo-V2-Omni | Eyes & ears | — | — | TBD | ❌ |
| MiMo-V2-TTS | Voice | — | — | TBD | ❌ |
MiMo-V2-Pro — The flagship
Released: March 18, 2026 (previously known as “Hunter Alpha”)
Pro is Xiaomi’s frontier model, designed for complex reasoning, coding, and autonomous agent workflows. It spent a week on OpenRouter as an anonymous stealth model before Xiaomi revealed it — and the AI community mistook it for DeepSeek V4.
Key specs:
- 1 trillion total parameters, 42B active (MoE)
- 1 million token context window
- Hybrid attention mechanism
- Multi-Token Prediction for speed
- #3 globally on PinchBench and ClawEval agent benchmarks
Benchmark highlights:
- Approaches Claude Opus 4.6 on coding and agent tasks
- Surpasses Claude Sonnet 4.6 on most benchmarks
- 5-8x cheaper than Opus for comparable quality
Best for: AI agents, complex coding, long-context processing, multi-step workflows.
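If you just want to poke at Pro, the quickest route is an OpenAI-compatible client pointed at OpenRouter, where the model first surfaced. A minimal sketch follows; the model id `xiaomi/mimo-v2-pro` and the endpoint are assumptions to verify against OpenRouter's current listing, not official values.

```python
# Minimal sketch: calling MiMo-V2-Pro through an OpenAI-compatible endpoint.
# The model id "xiaomi/mimo-v2-pro" is an assumption -- confirm the exact
# identifier on OpenRouter (or Xiaomi's own docs) before relying on it.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible API
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="xiaomi/mimo-v2-pro",  # assumed model id
    messages=[
        {"role": "system", "content": "You are a coding agent. Plan before you act."},
        {"role": "user", "content": "Refactor this module and outline your steps first."},
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)
```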
Deep dive: What Is MiMo-V2-Pro? | How to Use the MiMo-V2-Pro API
MiMo-V2-Flash — The open-source workhorse
Released: December 17, 2025
Flash is the model that put Xiaomi on the AI map. Open-source, blazing fast, and absurdly cheap — it became one of the most popular models on OpenRouter within weeks of launch.
Key specs:
- 309B total parameters, 15B active (MoE)
- 56K token context window
- 150 tokens/sec inference speed
- Hybrid sliding-window attention (128-token window, 5:1 ratio)
- Weights available on HuggingFace
Benchmark highlights:
- 73.4% on SWE-Bench Verified (#1 open-source)
- Comparable to Claude Sonnet 4.5 at 3.5% of the cost
- Top 2 among open-source models on agent benchmarks
Best for: High-volume coding tasks, self-hosting, prototyping, cost-sensitive production workloads.
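Since the weights are on HuggingFace, a basic self-hosted setup follows the standard transformers pattern. The sketch below assumes a repo id of `XiaomiMiMo/MiMo-V2-Flash` (unverified), and keep in mind that a 309B-parameter MoE needs multi-GPU hardware; for real serving you would more likely reach for an inference engine such as vLLM.

```python
# Minimal sketch: loading MiMo-V2-Flash from HuggingFace with transformers.
# The repo id is an assumption -- check the actual name on HuggingFace.
# A 309B-parameter MoE will not fit on a single consumer GPU; this is
# illustrative, not a production serving setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "XiaomiMiMo/MiMo-V2-Flash"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # custom MoE/attention code often requires this
)

inputs = tokenizer("Write a function that deduplicates a list:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```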
Deep dive: What Is MiMo-V2-Flash? | MiMo-V2-Flash vs DeepSeek V3
MiMo-V2-Omni — The multimodal perceiver
Released: March 18, 2026
Omni is Xiaomi’s multimodal model — it natively processes text, images, video, and audio within a unified architecture. While Pro thinks and Flash codes, Omni sees and hears.
Key capabilities:
- Text, image, video, and audio input in one model
- 10+ hours of continuous audio processing
- GUI interaction — can navigate and operate browser interfaces
- Cross-modal reasoning (processes visual and textual information together)
Designed for: Browser automation, video analysis, document understanding, voice-controlled agents.
Xiaomi positions Omni as the “executor” in their agent stack — it perceives the environment and carries out actions that Pro plans.
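To make the executor role concrete, here is a rough perceive-act loop, assuming Omni is served behind an OpenAI-compatible multimodal endpoint. The model id, the JSON action format, and the `take_screenshot`/`perform` helpers are placeholders for illustration, not a documented interface.

```python
# Rough perceive-act loop for a GUI agent built on MiMo-V2-Omni.
# Everything beyond the OpenAI-compatible call shape is an assumption:
# the model id, the JSON action format, and the take_screenshot / perform
# helpers are placeholders you would replace with your own stack.
import base64
import json
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

def take_screenshot() -> bytes:
    """Placeholder: capture the current browser viewport as PNG bytes."""
    raise NotImplementedError

def perform(action: dict) -> None:
    """Placeholder: dispatch a click/type/scroll action to the browser."""
    raise NotImplementedError

def step(goal: str) -> dict:
    png = take_screenshot()
    image_url = "data:image/png;base64," + base64.b64encode(png).decode()
    response = client.chat.completions.create(
        model="xiaomi/mimo-v2-omni",  # assumed model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Goal: {goal}\nReply with one JSON action: "
                         '{"type": "click|type|scroll|done", ...}'},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    action = json.loads(response.choices[0].message.content)
    if action["type"] != "done":
        perform(action)
    return action
```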
Deep dive: What Is MiMo-V2-Omni?
MiMo-V2-TTS — The voice
Released: March 18, 2026
MiMo-V2-TTS is Xiaomi’s text-to-speech model, designed to give AI agents a human-like voice. It’s not a general-purpose TTS system — it’s specifically built for agent interaction.
Key capabilities:
- Emotional nuance in speech output
- Real-time adaptability (adjusts tone based on context)
- Designed for conversational AI, not just reading text aloud
- Integrates with Pro and Omni for end-to-end agent communication
TTS completes the agent loop: Pro reasons, Omni perceives, and TTS communicates. For Xiaomi’s smart home and automotive products, this means AI assistants that sound natural rather than robotic.
How they work together
Xiaomi designed these models as a system, not as standalone products:
User request
↓
MiMo-V2-Pro (plans the task, breaks it into steps)
↓
MiMo-V2-Flash (handles high-volume subtasks cheaply)
↓
MiMo-V2-Omni (perceives environment, executes GUI actions)
↓
MiMo-V2-TTS (communicates results to user)
This is Xiaomi’s play for their “person-vehicle-home” ecosystem. The AI assistants in your Xiaomi phone, car, and smart home devices all run on this stack.
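In code, that division of labor might be wired up roughly like this, assuming all the models sit behind one OpenAI-compatible gateway and using the same assumed model ids as above; the `speak()` stub stands in for TTS, whose API isn't covered here.

```python
# Sketch of the Pro -> Flash -> Omni -> TTS division of labor.
# Assumes an OpenAI-compatible gateway and assumed model ids; speak()
# is a stand-in for MiMo-V2-TTS.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

def ask(model: str, prompt: str) -> str:
    r = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

def speak(text: str) -> None:
    """Placeholder for MiMo-V2-TTS output."""
    print(f"[TTS] {text}")

def handle(request: str) -> None:
    # 1. Pro plans: break the request into numbered steps.
    plan = ask("xiaomi/mimo-v2-pro", f"Break this into short numbered steps:\n{request}")
    # 2. Flash does the high-volume text work for each step.
    #    (GUI steps would go to MiMo-V2-Omni instead -- see the Omni sketch above.)
    results = [
        ask("xiaomi/mimo-v2-flash", f"Do this step and report the result:\n{step}")
        for step in plan.splitlines() if step.strip()
    ]
    # 3. Pro summarizes, TTS communicates the outcome to the user.
    summary = ask("xiaomi/mimo-v2-pro",
                  "Summarize these results in two sentences:\n" + "\n".join(results))
    speak(summary)
```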
Which model should you use?
For coding and development: Start with Flash. It’s open source, fast, and cheap. Upgrade to Pro when you need better quality or longer context.
For AI agents: Pro for planning and reasoning. Consider Omni if your agent needs to interact with visual interfaces.
For cost optimization: Use Flash for 80% of tasks, Pro for the remaining 20%. See MiMo-V2-Pro vs Flash for the detailed comparison.
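As a back-of-the-envelope illustration of that 80/20 split, assuming the prices in the table above are per million tokens (the article doesn't state the unit), the arithmetic looks like this:

```python
# Back-of-the-envelope cost of an 80/20 Flash/Pro split, assuming the
# table's prices are per million tokens (an assumption, not a stated fact).
PRICES = {  # (input, output) in $ per 1M tokens
    "flash": (0.10, 0.30),
    "pro":   (1.00, 3.00),
}

def cost(model, m_in, m_out):
    p_in, p_out = PRICES[model]
    return m_in * p_in + m_out * p_out

# Example workload: 100M input tokens, 20M output tokens per month.
all_pro = cost("pro", 100, 20)                        # $160.00
split   = cost("flash", 80, 16) + cost("pro", 20, 4)  # $44.80
print(f"All Pro: ${all_pro:.2f}  80/20 split: ${split:.2f}  "
      f"savings: {1 - split / all_pro:.0%}")           # ~72% savings
```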
For self-hosting: Flash is your only option — it’s the only open-source model in the family.
The bigger picture
A year ago, Xiaomi was known for phones and rice cookers. Now they have a four-model AI family that competes with Anthropic and OpenAI on specific benchmarks. The lead researcher behind MiMo came from DeepSeek, and the architectural DNA shows.
What makes the MiMo family interesting isn’t any single model — it’s the system. Xiaomi is building an integrated AI stack for their hardware ecosystem, and they’re making the individual models available to developers along the way. Whether that strategy succeeds depends on execution, but the technical foundation is impressive.
Related: MiMo-V2-Pro vs Claude Opus 4.6
Related: MiMo-V2-Pro vs Claude vs GPT
Related: AI Model Comparison 2026