πŸ€– AI Tools
Β· 3 min read
Last updated on

What is Kimi K2.5? Moonshot AI's Trillion-Parameter Model Explained


Kimi K2.5 is a 1-trillion-parameter AI model from Moonshot AI, a Chinese AI company. Despite its massive size, only 32 billion parameters activate per token β€” making it efficient to run. It’s MIT licensed (fully open source) and secretly powers Cursor’s Composer.

Key facts

  • 1 trillion parameters (32B active) β€” one of the largest open models
  • Agent Swarm β€” coordinates up to 100 parallel AI agents for faster coding
  • 256K context β€” understands large codebases in one pass
  • MIT license β€” free for any use, including commercial
  • Multimodal β€” understands text, images, and video natively
  • $0.60/1M tokens β€” 25x cheaper than Claude Opus

K2.6 is here

On April 20, 2026, Moonshot AI released Kimi K2.6, the successor to K2.5. Same architecture (1T/32B MoE), but with major upgrades:

  • Agent Swarm scales to 300 sub-agents (up from 100), executing 4,000 coordinated steps
  • 80.2% on SWE-Bench Verified, matching Claude Opus 4.6
  • 185% improvement on long-horizon coding tasks
  • Coding-driven design: generates production-ready UIs from prompts
  • Same pricing ($0.60/$3.00 per M tokens)

If you’re starting a new project, use K2.6. If you’re already on K2.5, the upgrade is seamless since the architecture is identical. See our K2.6 vs K2.5 comparison for the full breakdown.

Architecture

Kimi K2.5 uses a Mixture-of-Experts (MoE) architecture similar to DeepSeek V3 and Qwen 3.5. The 1 trillion total parameters are distributed across hundreds of expert networks, but each token only activates 32 billion parameters through a learned routing mechanism.

This design gives K2.5 the knowledge capacity of a trillion-parameter model while keeping inference costs comparable to a 32B dense model. The routing is dynamic β€” different types of tasks activate different expert combinations, allowing the model to specialize without separate fine-tuned versions.

Agent Swarm

Agent Swarm is Kimi K2.5’s signature feature. Instead of processing tasks sequentially like most AI coding tools, Agent Swarm spawns up to 100 parallel agents that work on different parts of a problem simultaneously.

For example, when refactoring a large codebase, Agent Swarm might assign separate agents to handle different modules, coordinate their changes to avoid conflicts, and merge the results. This parallelism can reduce complex multi-file tasks from minutes to seconds.

The Cursor connection

Developers discovered that Cursor’s Composer 2.0 uses Kimi K2.5 under the hood. If you’ve used Cursor, you’ve already used this model. The integration was initially undisclosed, which sparked debate about transparency in AI-powered developer tools.

Benchmarks

BenchmarkKimi K2.5Claude Opus 4.6GPT-5.2
SWE-bench Verified71.8%80.9%80.0%
AIME 202688.2%93.3%96.7%
HumanEval92.1%93.7%94.2%
GPQA Diamond79.4%78.1%78.8%

K2.5 trails the frontier closed models on coding benchmarks but competes strongly on reasoning and science tasks β€” especially impressive given its 25x lower cost.

How to use it

  • Kimi CLI β€” terminal coding agent with Agent Swarm support
  • Kimi API β€” direct API access at $0.60/1M tokens
  • OpenRouter β€” via unified API alongside 300+ other models
  • Aider β€” as a backend model for terminal-based coding
  • Cursor β€” built-in (Composer mode)

Pricing

Kimi K2.5 is one of the cheapest frontier-class models available:

ProviderInput (per 1M)Output (per 1M)
Kimi API$0.60$2.00
OpenRouter$0.60$2.00

For comparison, Claude Opus 4.6 costs $15/$25 per million tokens β€” making K2.5 roughly 25x cheaper on input and 12x cheaper on output.

FAQ

Is Kimi K2.5 truly open source?

Yes, Kimi K2.5 is released under the MIT license, which is the most permissive open-source license available. You can use it commercially, modify it, and redistribute it with no restrictions β€” the same license used by DeepSeek.

Can I run Kimi K2.5 locally?

The full 1 trillion parameter model requires enterprise-grade hardware (multiple high-end GPUs with hundreds of GB of VRAM). However, quantized versions and the 32B active parameter design mean distilled variants can run on more modest setups. Most developers access it via API through OpenRouter or the Kimi API directly.

How does Agent Swarm differ from running multiple AI agents manually?

Agent Swarm is built into the model’s inference architecture β€” it’s not just spawning separate API calls. The agents share context and coordinate through a built-in orchestration layer, which means they avoid conflicting edits and can work on interdependent code without manual merge resolution.

Learn more