May 14, 2026 · 3 min read

Last updated on Apr 21, 2026

What is Kimi K2.5? Moonshot AI's Trillion-Parameter Model Explained

Kimi K2.5 is a 1-trillion-parameter AI model from Moonshot AI, a Chinese AI company. Despite its massive size, only 32 billion parameters activate per token — making it efficient to run. It’s MIT licensed (fully open source) and secretly powers Cursor’s Composer.

Key facts

1 trillion parameters (32B active) — one of the largest open models
Agent Swarm — coordinates up to 100 parallel AI agents for faster coding
256K context — understands large codebases in one pass
MIT license — free for any use, including commercial
Multimodal — understands text, images, and video natively
$0.60/1M tokens — 25x cheaper than Claude Opus

K2.6 is here

On April 20, 2026, Moonshot AI released Kimi K2.6, the successor to K2.5. Same architecture (1T/32B MoE), but with major upgrades:

Agent Swarm scales to 300 sub-agents (up from 100), executing 4,000 coordinated steps
80.2% on SWE-Bench Verified, matching Claude Opus 4.6
185% improvement on long-horizon coding tasks
Coding-driven design: generates production-ready UIs from prompts
Same pricing ($0.60/$3.00 per M tokens)

If you’re starting a new project, use K2.6. If you’re already on K2.5, the upgrade is seamless since the architecture is identical. See our K2.6 vs K2.5 comparison for the full breakdown.

Architecture

Kimi K2.5 uses a Mixture-of-Experts (MoE) architecture similar to DeepSeek V3 and Qwen 3.5. The 1 trillion total parameters are distributed across hundreds of expert networks, but each token only activates 32 billion parameters through a learned routing mechanism.

This design gives K2.5 the knowledge capacity of a trillion-parameter model while keeping inference costs comparable to a 32B dense model. The routing is dynamic — different types of tasks activate different expert combinations, allowing the model to specialize without separate fine-tuned versions.

Agent Swarm

Agent Swarm is Kimi K2.5’s signature feature. Instead of processing tasks sequentially like most AI coding tools, Agent Swarm spawns up to 100 parallel agents that work on different parts of a problem simultaneously.

For example, when refactoring a large codebase, Agent Swarm might assign separate agents to handle different modules, coordinate their changes to avoid conflicts, and merge the results. This parallelism can reduce complex multi-file tasks from minutes to seconds.

The Cursor connection

Developers discovered that Cursor’s Composer 2.0 uses Kimi K2.5 under the hood. If you’ve used Cursor, you’ve already used this model. The integration was initially undisclosed, which sparked debate about transparency in AI-powered developer tools.

Benchmarks

Benchmark	Kimi K2.5	Claude Opus 4.6	GPT-5.2
SWE-bench Verified	71.8%	80.9%	80.0%
AIME 2026	88.2%	93.3%	96.7%
HumanEval	92.1%	93.7%	94.2%
GPQA Diamond	79.4%	78.1%	78.8%

K2.5 trails the frontier closed models on coding benchmarks but competes strongly on reasoning and science tasks — especially impressive given its 25x lower cost.

How to use it

Kimi CLI — terminal coding agent with Agent Swarm support
Kimi API — direct API access at $0.60/1M tokens
OpenRouter — via unified API alongside 300+ other models
Aider — as a backend model for terminal-based coding
Cursor — built-in (Composer mode)

Pricing

Kimi K2.5 is one of the cheapest frontier-class models available:

Provider	Input (per 1M)	Output (per 1M)
Kimi API	$0.60	$2.00
OpenRouter	$0.60	$2.00

For comparison, Claude Opus 4.6 costs $15/$25 per million tokens — making K2.5 roughly 25x cheaper on input and 12x cheaper on output.

FAQ

Is Kimi K2.5 truly open source?

Yes, Kimi K2.5 is released under the MIT license, which is the most permissive open-source license available. You can use it commercially, modify it, and redistribute it with no restrictions — the same license used by DeepSeek.

Can I run Kimi K2.5 locally?

The full 1 trillion parameter model requires enterprise-grade hardware (multiple high-end GPUs with hundreds of GB of VRAM). However, quantized versions and the 32B active parameter design mean distilled variants can run on more modest setups. Most developers access it via API through OpenRouter or the Kimi API directly.

How does Agent Swarm differ from running multiple AI agents manually?

Agent Swarm is built into the model’s inference architecture — it’s not just spawning separate API calls. The agents share context and coordinate through a built-in orchestration layer, which means they avoid conflicting edits and can work on interdependent code without manual merge resolution.

Learn more

Kimi K2.5 Complete Guide — architecture, benchmarks, full details
Kimi CLI Complete Guide — terminal tool setup
Kimi K2.5 vs Claude vs GPT-5 — head-to-head comparison

What is Kimi K2.5? Moonshot AI's Trillion-Parameter Model Explained

Key facts

K2.6 is here

Architecture

Agent Swarm

The Cursor connection

Benchmarks

How to use it

Pricing

FAQ

Is Kimi K2.5 truly open source?

Can I run Kimi K2.5 locally?

How does Agent Swarm differ from running multiple AI agents manually?

Learn more

📬 AI Dev Weekly

You might also like

Kimi K2.6 Complete Guide — Open-Source Agentic Model With 300 Sub-Agents

Kimi K2.6 vs K2.5 — What Changed and Should You Upgrade?

Kimi K2.5 Complete Guide — The Trillion-Parameter Open-Source Model Explained

Qwen 3.7 Max vs Kimi K2.6: Reasoning King vs Agent Swarm Master (2026)