Jun 12, 2026 · 7 min read

Kimi K2.7 Code Complete Guide: 1T Coding Agent That Beats Opus on Tool Use (2026)

Moonshot AI just dropped Kimi K2.7 Code today, and it’s a big deal. This is a 1 trillion parameter open-source coding model that actually beats Claude Opus 4.8 on tool use benchmarks. Let that sink in for a second — an open-source model outperforming one of the most expensive closed models on agentic coding tasks.

I’ve been tracking the Kimi line since K2.5 and through the impressive K2.6 release, and K2.7 Code represents a focused evolution: less about being everything to everyone, more about being the best open-source coding agent you can run.

Let me break down everything you need to know.

What Is Kimi K2.7 Code?

Kimi K2.7 Code is Moonshot AI’s latest large language model, released June 12, 2026 under a Modified MIT license. It’s built specifically for coding and agentic tasks — think code generation, debugging, tool use, and multi-step programming workflows.

The key stats:

1 trillion total parameters (Mixture of Experts)
32 billion activated per token (efficient inference)
256K token context window
384 experts, 8 selected + 1 shared per forward pass
MoonViT vision encoder (400M params) for multimodal input
61 layers with MLA attention and SwiGLU activation

It’s available on HuggingFace, the Moonshot API, and ModelScope. You can run it locally with vLLM, SGLang, or Docker Model Runner.

Architecture Deep Dive

K2.7 Code uses a Mixture of Experts (MoE) architecture, which is why it can have 1T total parameters while only activating 32B per token. This gives you frontier-level intelligence at a fraction of the compute cost of a dense model with equivalent capabilities.

Here’s how the expert routing works:

384 total experts across the model
8 experts selected per token based on learned routing
1 shared expert always active (handles common patterns)
MLA (Multi-Latent Attention) for efficient KV-cache compression
SwiGLU activation function (smoother gradients than ReLU)
61 transformer layers deep

The MLA attention mechanism is particularly clever — it compresses the key-value cache using learned latent projections, which means you can actually fit that 256K context window in memory without needing an absurd amount of VRAM.

For those wanting to run it locally, there’s a native INT4 quantization available that significantly reduces memory requirements while maintaining most of the model’s capability.

What Changed from K2.6

If you’ve been using K2.6, here’s what’s different:

30% Fewer Thinking Tokens

K2.7 Code uses 30% fewer thinking tokens than K2.6 to reach the same conclusions. That’s not a marginal improvement — it means faster responses, lower costs, and less wasted compute on reasoning overhead.

Coding-Focused Agentic Fine-Tuning

While K2.6 was a generalist with agent swarm capabilities, K2.7 Code is laser-focused on coding tasks. The fine-tuning pipeline specifically targeted:

Code generation and completion
Multi-file editing workflows
Tool calling and MCP integration
Debugging and refactoring patterns

Preserve Thinking Mode

This is a genuinely novel feature. In “Preserve Thinking” mode, K2.7 Code maintains its reasoning chain across multiple conversation turns. Most models reset their internal reasoning with each new message — K2.7 keeps the thread, which means it doesn’t lose context about why it made certain decisions in complex multi-step coding tasks.

+21.8% on Kimi Code Bench

The improvement over K2.6 on Moonshot’s own coding benchmark is dramatic: from 50.9 to 62.0, a 21.8% jump. That’s not incremental — that’s a generational leap in coding capability.

Benchmark Performance

Here’s how K2.7 Code stacks up against the competition:

Benchmark	K2.6	K2.7 Code	GPT-5.5	Opus 4.8
Kimi Code Bench v2	50.9	62.0	69.0	67.4
Program Bench	48.3	53.6	69.1	63.8
MLS Bench Lite	26.7	35.1	35.4	81.3
MCP Mark Verified	72.8	81.1	92.9	76.4

The headline number: K2.7 Code scores 81.1% on MCPMark Verified, beating Claude Opus 4.8’s 76.4%. That means it’s better at using tools via MCP than a model that costs $5/$25 per million tokens.

On Kimi Code Bench v2, the gap to GPT-5.5 narrowed from 18 points (K2.6 era) to just 7 points. Open-source is catching up fast.

Where it still lags: MLS Bench Lite (inventing novel ML methods) shows Opus 4.8 at 81.3% vs K2.7’s 35.1%. For pure research creativity, the closed models still dominate.

How to Access Kimi K2.7 Code

Moonshot API

The easiest path. Sign up at kimi.com, grab an API key, and you’re running. Pricing is similar to K2.6 — the Moderato plan runs about $19/month, or you can pay per token.

HuggingFace

The full model weights are at moonshotai/Kimi-K2.7-Code. Download and run locally with your preferred framework.

Kimi Code CLI

K2.7 Code works best with the Kimi Code CLI available at kimi.com/code. This gives you the full agentic coding experience with native MCP tool integration.

Self-Hosting Options

vLLM: Full support with tensor parallelism
SGLang: Optimized for high-throughput serving
Docker Model Runner: Containerized deployment

For local setup guidance, check our how to run Kimi locally guide — the process is similar for K2.7.

Pricing

K2.7 Code follows the same pricing structure as K2.6:

Moderato Plan: ~$19/month for generous usage
Pay-per-token: Competitive rates similar to other open-source model APIs
Self-hosted: Free (you pay for compute only)

Compare that to Claude Opus 4.8 at $5/$25 per million tokens or GPT-5.5 at ~$5/$15. For coding-focused work where K2.7 matches or beats these models, the economics are compelling.

Who Is Kimi K2.7 Code For?

Perfect for:

Developers who want an open-source coding agent they can self-host
Teams building MCP-integrated coding workflows
Anyone who needs strong tool use without paying Opus/GPT prices
Developers working within agentic coding pipelines

Maybe not ideal for:

Pure ML research tasks (MLS Bench shows weakness here)
General-purpose chat (K2.6 is still better for that)
Teams that need 1M context (stuck at 256K vs Opus’s 1M)
Production environments needing the absolute best code quality regardless of cost

The Preserve Thinking Innovation

Let me expand on this because it’s genuinely interesting. Traditional LLMs treat each turn independently — the model’s chain-of-thought doesn’t carry over. K2.7 Code’s Preserve Thinking mode forces the reasoning to persist.

In practice, this means:

Turn 1: You describe a complex refactoring task
Turn 2: You ask for the next file — the model remembers its reasoning about the architecture decisions it made
Turn 3: You point out an edge case — it can reason about how that affects its prior decisions

This is huge for multi-step coding workflows where context about why matters as much as what.

Running K2.7 Code Locally

With native INT4 quantization, you can run K2.7 Code on hardware that would be impractical for the full-precision model. The 32B activated parameters mean inference is comparable to running a 32B dense model — manageable on high-end consumer GPUs or small clusters.

For deployment guides, see our local running guide and API setup walkthrough.

Frequently Asked Questions

Is Kimi K2.7 Code really open-source?

Yes, under a Modified MIT license. You can download the weights from HuggingFace, self-host, fine-tune, and use commercially. The “modified” part typically involves attribution requirements and some usage restrictions, but for most developer use cases it’s functionally open-source.

How does K2.7 Code compare to DeepSeek V4 Pro?

Both are open-source MoE models from Chinese AI labs. DeepSeek V4 Pro has a higher SWE-bench score (~85%) and is cheaper per token ($0.44/$0.87 per M), but K2.7 Code excels on tool use (MCPMark) and has a larger context window (256K vs 128K). See our full comparison.

Can I run K2.7 Code on my local machine?

With the INT4 quantized version, yes — if you have sufficient VRAM. The 32B activated parameters make inference similar to dense 32B models. You’ll want at least 24GB VRAM for comfortable inference, or multiple GPUs for the full-precision model. vLLM, SGLang, and Docker Model Runner are all supported.

Should I upgrade from K2.6 to K2.7 Code?

If your primary use case is coding, absolutely. The 21.8% improvement on coding benchmarks and 30% reduction in thinking tokens make it a clear upgrade. If you use K2.6 for general-purpose tasks, multimodal work, or agent swarms, keep K2.6 for those — K2.7 is a coding specialist.

What is Preserve Thinking mode?

It’s K2.7 Code’s approach to maintaining reasoning chains across multiple conversation turns. Instead of resetting the internal chain-of-thought with each message, the model preserves its reasoning context, leading to more coherent multi-step coding workflows.

How does K2.7 Code beat Opus 4.8 on tool use but score lower overall?

MCPMark specifically tests the model’s ability to correctly invoke tools via the Model Context Protocol. K2.7 Code’s agentic fine-tuning focused heavily on tool calling patterns. On other benchmarks that test raw code generation or novel problem-solving, Opus 4.8’s larger effective compute still wins. Different benchmarks test different capabilities.

Bottom Line

Kimi K2.7 Code is the best open-source coding agent available today for tool-integrated development workflows. It won’t beat Claude Fable 5 on raw SWE-bench scores, and GPT-5.5 still edges it on pure code generation. But for the price (free to self-host, ~$19/mo for API) and the openness (Modified MIT), it’s an incredible value proposition.

The fact that an open-source model now beats Opus 4.8 on MCP tool use is a watershed moment. The gap between open and closed is narrowing faster than anyone expected.

AI API Pricing Compared 2026