
How to Use the MiMo-V2-Pro API: Get Started in 5 Minutes


Build It With AI — practical tutorials for developers who want to use AI tools effectively.

MiMo-V2-Pro is Xiaomi’s trillion-parameter agent model that just launched this week. It’s OpenAI-compatible, costs $1/$3 per million tokens (8x cheaper than Claude Opus on output), and ranks #3 globally on agent benchmarks. If you’ve used the OpenAI SDK before, you can be up and running in under 5 minutes.

Here’s how.

Option 1: Xiaomi’s MiMo API (direct)

Get an API key

  1. Go to platform.xiaomimimo.com
  2. Create an account
  3. Generate an API key from the dashboard

There’s a free tier available during the launch period — Xiaomi hasn’t announced when it ends, so grab it while it lasts.

Install the OpenAI SDK

MiMo’s API is OpenAI-compatible, so you use the standard OpenAI SDK:

npm install openai

Or for Python:

pip install openai

Your first request (Node.js)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MIMO_API_KEY,
  baseURL: "https://api.xiaomimimo.com/v1",
});

const response = await client.chat.completions.create({
  model: "mimo-v2-pro",
  messages: [
    {
      role: "system",
      content: "You are a senior software engineer. Be concise and precise.",
    },
    {
      role: "user",
      content: "Write a TypeScript function that retries a fetch request with exponential backoff. Max 3 retries.",
    },
  ],
  max_tokens: 4096,
});

console.log(response.choices[0].message.content);
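For reference, the kind of helper that prompt asks for can be sketched like this (my own TypeScript illustration, not MiMo's output; the retry count and delays are arbitrary):

```typescript
// Retry an async operation with exponential backoff: waits 1s, 2s, 4s
// between attempts before giving up and rethrowing the last error.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      // Wait baseDelayMs * 2^attempt before the next try.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

You'd wrap any flaky call, e.g. `retryWithBackoff(() => fetch(url))` — handy for API calls too, since new endpoints tend to rate-limit aggressively.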

Your first request (Python)

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["MIMO_API_KEY"],
    base_url="https://api.xiaomimimo.com/v1",
)

response = client.chat.completions.create(
    model="mimo-v2-pro",
    messages=[
        {"role": "system", "content": "You are a senior software engineer. Be concise and precise."},
        {"role": "user", "content": "Write a Python function that retries a request with exponential backoff. Max 3 retries."},
    ],
    max_tokens=4096,
)

print(response.choices[0].message.content)

That’s it. If you’ve used the OpenAI API before, the only differences are the baseURL and the model name.

Option 2: OpenRouter

OpenRouter lets you access MiMo-V2-Pro alongside 300+ other models through a single API. This is what I'd recommend if you want to switch easily between MiMo, Claude, and GPT without changing your code.

Setup

  1. Get an API key at openrouter.ai
  2. Add credits (or use free models to test)

Then point the OpenAI SDK at OpenRouter's base URL:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: "https://openrouter.ai/api/v1",
});

const response = await client.chat.completions.create({
  model: "xiaomi/mimo-v2-pro",
  messages: [
    { role: "user", content: "Explain the MoE architecture in 3 sentences." },
  ],
});

console.log(response.choices[0].message.content);

The advantage: swap xiaomi/mimo-v2-pro for anthropic/claude-opus-4.6 or openai/gpt-5.4 and everything else stays the same. Great for A/B testing models or building a routing layer.
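A routing layer can start as a single function that maps a task type to a model ID; here's a hypothetical sketch (the routing rules are my own, and the model IDs follow OpenRouter's vendor/model format from above):

```typescript
type TaskKind = "routine" | "complex";

// Route routine work (code gen, summaries) to the cheap model and
// hard work (critical refactors) to the expensive one.
function pickModel(kind: TaskKind): string {
  return kind === "complex" ? "anthropic/claude-opus-4.6" : "xiaomi/mimo-v2-pro";
}
```

Because OpenRouter keeps the request shape identical across models, you can pass `pickModel("routine")` straight into `client.chat.completions.create({ model: ... })` and change nothing else.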

Swapping MiMo into existing code

If you already have code that calls the OpenAI API, switching to MiMo takes two lines:

// Before (OpenAI)
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
const model = "gpt-5.4";

// After (MiMo via OpenRouter)
const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: "https://openrouter.ai/api/v1",
});
const model = "xiaomi/mimo-v2-pro";

Everything else — message format, streaming, function calling — works the same.

Using the 1M context window

MiMo-V2-Pro supports up to 1 million tokens of context. That’s roughly 750,000 words or an entire medium-sized codebase. Here’s how to use it for repo analysis:

import { readFileSync, readdirSync, statSync } from "fs";
import { join } from "path";

function readRepo(dir, extensions = [".ts", ".js", ".py"]) {
  let content = "";
  for (const file of readdirSync(dir, { recursive: true })) {
    const fullPath = join(dir, file);
    if (statSync(fullPath).isFile() && extensions.some((e) => file.endsWith(e))) {
      content += `\n--- ${file} ---\n${readFileSync(fullPath, "utf-8")}\n`;
    }
  }
  return content;
}

const repoContent = readRepo("./src");

const response = await client.chat.completions.create({
  model: "xiaomi/mimo-v2-pro",
  messages: [
    {
      role: "system",
      content: "You are a senior architect reviewing a codebase. Identify architectural issues, security concerns, and suggest improvements.",
    },
    {
      role: "user",
      content: `Here is the full codebase:\n\n${repoContent}\n\nProvide a detailed architectural review.`,
    },
  ],
  max_tokens: 8192,
});

Keep in mind: context above 256K tokens costs $2/$6 per million tokens instead of $1/$3. Still much cheaper than Opus’s long-context pricing ($10/$37.50).
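For budgeting, a rough per-request cost estimate based on those two tiers (I'm assuming the higher rate applies to the whole request once input exceeds 256K tokens — check Xiaomi's billing docs for the authoritative rule):

```typescript
// Estimate cost in USD for one request under MiMo-V2-Pro's tiered pricing.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  const longContext = inputTokens > 256_000;
  const inputRate = longContext ? 2.0 : 1.0;  // $ per 1M input tokens
  const outputRate = longContext ? 6.0 : 3.0; // $ per 1M output tokens
  return (inputTokens / 1e6) * inputRate + (outputTokens / 1e6) * outputRate;
}

// A 500K-token repo review with an 8K-token response comes to about $1.05.
```

At Opus's long-context rates ($10/$37.50), the same call would run roughly $5.30 — the gap widens fast on big-context workloads.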

Building a simple agent loop

MiMo-V2-Pro was designed for agent tasks. Here’s a minimal agent loop that can use tools:

import { readFileSync, writeFileSync } from "fs";

const tools = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read the contents of a file",
      parameters: {
        type: "object",
        properties: { path: { type: "string", description: "File path" } },
        required: ["path"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "write_file",
      description: "Write content to a file",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string", description: "File path" },
          content: { type: "string", description: "File content" },
        },
        required: ["path", "content"],
      },
    },
  },
];

async function runAgent(task) {
  const messages = [
    { role: "system", content: "You are a coding agent. Use the provided tools to complete tasks. Think step by step." },
    { role: "user", content: task },
  ];

  for (let i = 0; i < 10; i++) {
    const response = await client.chat.completions.create({
      model: "xiaomi/mimo-v2-pro",
      messages,
      tools,
    });

    const msg = response.choices[0].message;
    messages.push(msg);

    if (msg.tool_calls) {
      for (const call of msg.tool_calls) {
        const result = executeToolCall(call);
        messages.push({ role: "tool", tool_call_id: call.id, content: result });
      }
    } else {
      return msg.content; // Agent is done
    }
  }
}

function executeToolCall(call) {
  const args = JSON.parse(call.function.arguments);
  switch (call.function.name) {
    case "read_file":
      return readFileSync(args.path, "utf-8");
    case "write_file":
      writeFileSync(args.path, args.content);
      return `Wrote ${args.content.length} chars to ${args.path}`;
    default:
      return "Unknown tool";
  }
}

// Run it
const result = await runAgent(
  "Read package.json, identify outdated patterns, and create an IMPROVEMENTS.md file with recommendations."
);
console.log(result);

This is a simplified version, but it demonstrates MiMo’s strength: multi-step tool use with planning. In my testing, MiMo-V2-Pro handles 5-10 step tool chains reliably without losing track of the overall goal.

Tips from my testing

A few things I’ve learned using MiMo-V2-Pro this week:

Be explicit about output format. MiMo occasionally deviates from formatting instructions more than Claude does. Adding “Respond ONLY with valid JSON, no markdown” or “Use exactly this format:” helps a lot.
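When you do ask for JSON, a defensive parse that tolerates a stray markdown fence saves a lot of grief. A minimal sketch (my own helper, not part of any SDK; production code should also validate the parsed shape):

```typescript
// Parse a model response as JSON, stripping a ```json ... ``` wrapper
// if the model ignored the "no markdown" instruction.
function parseModelJSON(raw: string): unknown {
  const trimmed = raw.trim();
  const match = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  return JSON.parse(match ? match[1] : trimmed);
}
```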

It’s great at planning, good at execution. For complex tasks, I’ve had better results asking MiMo to first output a plan, then execute each step, rather than doing everything in one shot. The planning quality is genuinely close to Opus.

Watch the 32K output limit. If you’re generating long files or comprehensive documentation, you’ll hit the ceiling faster than with Opus (128K) or GPT (64K). Split large generation tasks into chunks.

Streaming works well. The time-to-first-token is fast — noticeably faster than Opus in my experience. The MoE architecture helps here since only 42B parameters activate per token.

Use it for the bulk, escalate for the hard stuff. My current workflow: MiMo for 80% of API calls (routine code gen, analysis, planning), Claude Opus for the 20% that needs maximum quality (complex refactoring, critical code). Cuts costs by ~60%.

Pricing cheat sheet

| Context   | Input (per 1M tokens) | Output (per 1M tokens) |
|-----------|-----------------------|------------------------|
| ≤ 256K    | $1.00                 | $3.00                  |
| 256K – 1M | $2.00                 | $6.00                  |

For comparison: Claude Opus is $5/$25, GPT-5.4 is $2.50/$15, DeepSeek V3.2 is $0.28/$1.10.

What’s next

MiMo-V2-Pro is brand new — the API launched this week. Expect rough edges, possible rate limit changes, and pricing adjustments as Xiaomi figures out demand. But the core model is solid, the OpenAI compatibility makes integration trivial, and the price-to-performance ratio is hard to ignore.

If you’re building AI-powered features and model cost is a real line item in your budget, MiMo-V2-Pro deserves a spot in your testing rotation.


Related: What Is MiMo-V2-Pro? Xiaomi’s AI Model Explained

Related: The Complete MiMo-V2 Family Guide

Related: MiMo-V2-Pro vs Claude Opus 4.6: Can the $1 Model Replace the $25 King?

Related: How to Use Claude Code: A Beginner’s Guide