Kimi K2.5 Complete Guide — The Trillion-Parameter Open-Source Model Explained
Kimi K2.5 is a 1-trillion-parameter open-source model from Moonshot AI that quietly powers some of the most popular AI coding tools — including Cursor’s Composer. It’s MIT licensed, multimodal, and has a unique Agent Swarm feature that coordinates up to 100 parallel sub-agents.
Here’s everything you need to know.
What is Kimi K2.5?
Kimi K2.5 is the flagship model from Moonshot AI, a Chinese AI company. Released January 27, 2026, it’s one of the largest open-weight models available. Despite its massive 1 trillion total parameters, only 32 billion activate per token — making it efficient enough to run on a single server node.
The model is natively multimodal: it understands text, images, and video without bolted-on adapters. It was trained on approximately 15 trillion mixed visual and text tokens.
Architecture
| Spec | Value |
|---|---|
| Total parameters | 1.04 trillion |
| Active parameters | 32B per token |
| Architecture | Mixture-of-Experts (384 experts, 8 active per token) |
| Context window | 256K tokens |
| Attention | Multi-Latent Attention (MLA) |
| Activation | SwiGLU |
| Training data | ~15 trillion tokens (text + visual) |
| License | MIT |
| Multimodal | Native (text, image, video) |
The MoE architecture with 384 experts is one of the largest expert pools in any model. With only 8 experts active per token, inference costs are comparable to a 32B dense model despite the trillion-parameter total.
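Conceptually, the router behaves like a top-k softmax over per-expert scores: each token is scored against all 384 experts, only the 8 best are run, and their outputs are mixed by normalized weights. A minimal illustrative sketch (not Moonshot's actual implementation):

```python
import math
import random

def route_token(router_logits, top_k=8):
    # Select the k highest-scoring experts for this token...
    top = sorted(range(len(router_logits)), key=router_logits.__getitem__, reverse=True)[:top_k]
    # ...and softmax-normalize their scores into mixing weights.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(384)]  # one router score per expert
experts = route_token(logits)  # 8 (expert_id, weight) pairs; weights sum to 1
```

Because only the 8 selected experts' feed-forward weights are touched per token, the compute per token tracks the 32B active parameters, not the 1.04T total.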
Modes
Kimi K2.5 operates in four distinct modes:
Instant — Fast responses for simple queries. Minimal reasoning overhead, optimized for speed.
Thinking — Transparent chain-of-thought reasoning. Shows its work step by step, similar to DeepSeek’s reasoning models.
Agent — Tool-oriented mode for executing tasks. Can read files, run commands, search the web, and interact with APIs.
Agent Swarm — The headline feature. Coordinates up to 100 parallel sub-agents, cutting execution time by 4.5x on parallelizable tasks like batch refactoring and large-scale code generation.
Agent Swarm explained
Most AI coding tools work sequentially — one task at a time. Kimi K2.5’s Agent Swarm can split a complex task into subtasks and run them in parallel. For example:
- Refactoring 50 files? Spawn 50 sub-agents, one per file.
- Running tests across multiple modules? Parallelize them.
- Generating documentation for an entire codebase? Each sub-agent handles a module.
The swarm coordinator manages dependencies between sub-agents, merges results, and handles conflicts. In benchmarks, this achieves a 4.5x speedup on parallelizable tasks.
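The fan-out pattern the coordinator uses can be sketched in plain Python. Here `refactor_file` is a hypothetical stand-in for one sub-agent's work; this is an illustration of the pattern, not Moonshot's implementation:

```python
import asyncio

async def refactor_file(path: str) -> str:
    # Stand-in for one sub-agent's work (a real agent would call the model API).
    await asyncio.sleep(0)
    return f"refactored {path}"

async def run_swarm(paths, max_agents=100):
    # Cap concurrency at the documented 100-sub-agent limit.
    sem = asyncio.Semaphore(max_agents)

    async def worker(path):
        async with sem:
            return await refactor_file(path)

    # The coordinator fans out one sub-agent per file and merges results in order.
    return await asyncio.gather(*(worker(p) for p in paths))

results = asyncio.run(run_swarm([f"src/module_{i}.py" for i in range(50)]))
```

With 50 independent files and up to 100 concurrent sub-agents, wall-clock time approaches the duration of the slowest single subtask rather than the sum of all of them.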
Benchmarks
Kimi K2.5 competes with frontier proprietary models:
| Benchmark | Kimi K2.5 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-Bench Verified | 65.8 | 72.1 | 69.3 |
| AIME 2024 | 77.5 | — | — |
| MATH-500 | 96.2 | — | — |
| Codeforces | 1950 Elo | — | — |
On coding benchmarks, K2.5 doesn’t quite match Claude Opus or GPT-5, but it’s remarkably close for an open-source model. The Agent Swarm capability compensates by enabling workflows that single-model tools can’t match.
The Cursor connection
In March 2026, developers discovered that Cursor’s Composer 2.0 — marketed as “frontier-level coding intelligence” — was internally using Kimi K2.5. The model identifier kimi-k2p5-rl-0317-s515-fast was found in Cursor’s code.
This means if you’ve used Cursor, you’ve already used Kimi K2.5. The model’s quality is proven at scale across millions of Cursor users.
Pricing
Kimi K2.5 is available through several channels:
| Access method | Cost |
|---|---|
| Self-hosted (MIT license) | Free (hardware only) |
| Kimi Code membership | ~$19/month + API fees |
| Kimi API | $0.60/$2.50 per 1M tokens (input/output) |
| OpenRouter | Varies by provider |
| Kimi CLI | Free tool, pay for API |
At $0.60/$2.50 per million tokens, Kimi K2.5 is 4-17x cheaper than GPT-5.4 for equivalent coding tasks.
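To make that concrete, a small back-of-envelope helper using the rates from the table above (the helper function and the example token counts are illustrative, not from Moonshot):

```python
def kimi_cost_usd(input_tokens, output_tokens, in_rate=0.60, out_rate=2.50):
    # Rates are USD per 1M tokens (input / output), per the pricing table above.
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A heavy coding session: 10M input tokens, 2M output tokens.
session = kimi_cost_usd(10_000_000, 2_000_000)  # 6.00 + 5.00 = 11.00 USD
```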
How to use Kimi K2.5
Via Kimi CLI (terminal)
```shell
# Note: package scope assumed for illustration; check Moonshot's docs for the exact name.
npm install -g @moonshot-ai/kimi-cli
kimi login --device-auth
kimi
```
See our full Kimi CLI guide for setup details.
Via API
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.cn/v1",
    api_key="your-kimi-api-key",
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Refactor this function to use async/await"}],
    temperature=0.3,  # low temperature keeps code edits deterministic
)
print(response.choices[0].message.content)
```
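Since the model is natively multimodal, the same endpoint can take images alongside text. The payload shape below assumes Moonshot's API accepts OpenAI-style `image_url` content parts; confirm the exact format against their API docs:

```python
# Hypothetical multimodal payload, assuming OpenAI-style content parts.
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the bug shown in this screenshot."},
        {"type": "image_url", "image_url": {"url": "https://example.com/error.png"}},
    ],
}]

# Sent the same way as the text-only example:
# response = client.chat.completions.create(model="kimi-k2.5", messages=messages)
```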
Via OpenRouter
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key",
)

response = client.chat.completions.create(
    model="moonshot/kimi-k2.5",
    messages=[{"role": "user", "content": "Write a REST API in Express"}],
)
```
See our OpenRouter guide for more details.
Self-hosting requirements
At 1 trillion parameters, self-hosting K2.5 requires serious hardware:
| Precision | Memory needed |
|---|---|
| FP16 | ~2TB |
| INT8 | ~1TB |
| 4-bit | ~520GB |

A 4-bit quantized version needs roughly 520GB of weight memory, which fits on 8x A100 80GB GPUs. For most developers, the API at $0.60/1M input tokens is more practical than self-hosting.
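These figures follow directly from parameter count times bytes per parameter. A quick sanity check (raw weight storage only; straight quantization with no expert offloading, and excluding KV cache and activations, which add more):

```python
def weight_memory_gb(params=1.04e12, bits_per_param=16):
    # Raw weight storage only: excludes KV cache, activations, and runtime overhead.
    return params * bits_per_param / 8 / 1e9

fp16 = weight_memory_gb(bits_per_param=16)  # ~2080 GB
int8 = weight_memory_gb(bits_per_param=8)   # ~1040 GB
q4 = weight_memory_gb(bits_per_param=4)     # ~520 GB
```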
For smaller local models, consider Gemma 4 or Qwen 3.5, which run on consumer hardware.
Who should use Kimi K2.5?
Best for:
- Parallelizable coding tasks (Agent Swarm)
- Cost-conscious teams needing frontier-class quality
- Multimodal workflows (code + images + video)
- Teams wanting MIT-licensed model weights
Not ideal for:
- Consumer hardware (too large for local use)
- Tasks requiring the absolute best single-pass coding (Claude Opus still leads)
- Simple autocomplete (overkill — use Codestral or smaller models)
Bottom line
Kimi K2.5 is the most underrated model in AI. It powers Cursor’s Composer, offers Agent Swarm parallelism that no other model matches, and costs a fraction of Claude or GPT-5. The MIT license and 1T parameter scale make it a serious option for teams building AI-powered development tools.
Related: Kimi CLI Complete Guide · Kimi K2.5 vs Claude vs GPT-5 · Best Open-Source Coding Models 2026