Kimi K2.6 is Moonshot AI’s latest reasoning model, available through OpenAI-compatible API endpoints at platform.moonshot.ai. At $0.60 per million input tokens and $3.00 per million output tokens, it undercuts most frontier models by 5-25x while adding features K2.5 never had: instant mode, preserve_thinking for multi-turn reasoning, and experimental video input.
This guide covers everything you need to start making API calls. If you used the Kimi K2.5 API, the base setup is the same. The differences are in the model name and the new thinking controls.
Get your API key
1. Go to platform.moonshot.ai
2. Sign up with email or phone number
3. Navigate to API Keys in the dashboard
4. Generate a new key and copy it
Store the key in an environment variable:
```bash
export MOONSHOT_API_KEY="your-kimi-api-key"
```
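If you load the key in Python, a small guard gives a clearer failure than a bare `KeyError` when the variable is missing. A minimal sketch (the `get_api_key` helper is hypothetical, not part of any SDK):

```python
import os

def get_api_key() -> str:
    """Read the Moonshot API key from the environment, failing loudly if unset."""
    key = os.environ.get("MOONSHOT_API_KEY")
    if not key:
        raise RuntimeError(
            "MOONSHOT_API_KEY is not set; export it before creating the client."
        )
    return key
```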
Basic chat completion (Python)
Install the OpenAI Python library if you haven’t:
```bash
pip install openai
```
K2.6 defaults to thinking mode, which means the model reasons internally before responding. You can access the reasoning via reasoning_content on the response:
```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ["MOONSHOT_API_KEY"]
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "Design a rate limiter using a sliding window algorithm in Go."}
    ],
    temperature=1.0,
    top_p=0.95
)

# Access the final answer
print(response.choices[0].message.content)

# Access the reasoning trace (thinking mode only)
print(response.choices[0].message.reasoning_content)
```
The reasoning_content field contains the model’s chain-of-thought. This is useful for debugging, auditing, or building UIs that show the model’s reasoning process.
JavaScript/TypeScript
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.moonshot.ai/v1',
  apiKey: process.env.MOONSHOT_API_KEY,
});

const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [
    { role: 'user', content: 'Explain the difference between mutex and semaphore with code examples.' }
  ],
  temperature: 1.0,
  top_p: 0.95,
});

console.log(response.choices[0].message.content);
```
K2.6-specific features
These features are new in K2.6 and not available in K2.5.
Instant mode (disable thinking)
By default, K2.6 uses thinking mode where it reasons before answering. For latency-sensitive tasks where you don’t need chain-of-thought, disable it:
```python
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Convert 72°F to Celsius."}
    ],
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "thinking": {"type": "disabled"}
    }
)
```
Instant mode is faster and cheaper since no reasoning tokens are generated. Use it for simple lookups, translations, formatting tasks, and anything that doesn’t benefit from step-by-step reasoning.
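If you switch between the two modes often, the mode choice can be factored into a small request builder. This is a sketch, not part of the SDK — `make_request_kwargs` is a hypothetical helper; the sampling defaults and the `thinking` control come from the examples in this guide:

```python
def make_request_kwargs(prompt: str, instant: bool = False) -> dict:
    """Build chat-completion kwargs for kimi-k2.6, switching sampling
    parameters and the thinking control based on the chosen mode."""
    kwargs = {
        "model": "kimi-k2.6",
        "messages": [{"role": "user", "content": prompt}],
        # Moonshot's recommended defaults: 1.0 for thinking, 0.6 for instant
        "temperature": 0.6 if instant else 1.0,
        "top_p": 0.95,
    }
    if instant:
        # Instant mode: skip chain-of-thought generation entirely
        kwargs["extra_body"] = {"thinking": {"type": "disabled"}}
    return kwargs
```

Usage: `client.chat.completions.create(**make_request_kwargs("Convert 72°F to Celsius.", instant=True))`.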
Preserve thinking (multi-turn reasoning)
This is the biggest K2.6 addition. In K2.5, reasoning context was discarded between turns. In K2.6, you can retain it:
```python
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Analyze this database schema and suggest indexes."},
        {"role": "assistant", "content": "Based on the query patterns..."},
        {"role": "user", "content": "Now optimize the slowest query you identified."}
    ],
    extra_body={
        "thinking": {"type": "enabled", "keep": "all"}
    }
)
```
With "keep": "all", the model retains its reasoning from previous turns. This matters for:
- Multi-step agent loops where context builds over time
- Debugging sessions where the model needs to remember what it already analyzed
- Complex coding tasks that span multiple back-and-forth exchanges
Without preserve_thinking, the model would re-derive context from scratch each turn, leading to inconsistencies and wasted tokens.
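In an agent loop this means sending the same `extra_body` on every request while the message history grows. A minimal sketch of the turn-building side (the `build_turn_request` helper is hypothetical; it only assembles the request, the network call is left out):

```python
def build_turn_request(history: list, user_message: str) -> dict:
    """Append the next user turn to the running history and build a
    request that carries reasoning context across turns via keep: all."""
    history.append({"role": "user", "content": user_message})
    return {
        "model": "kimi-k2.6",
        "messages": list(history),
        "extra_body": {"thinking": {"type": "enabled", "keep": "all"}},
    }

# After each call, append the assistant reply so the next turn sees it:
# history.append({"role": "assistant",
#                 "content": response.choices[0].message.content})
```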
Recommended temperature settings
| Mode | Temperature | top_p | Best for |
|---|---|---|---|
| Thinking (default) | 1.0 | 0.95 | Coding, math, analysis, complex reasoning |
| Instant | 0.6 | 0.95 | Simple Q&A, formatting, translations |
Moonshot recommends these defaults. Lowering temperature in thinking mode can reduce reasoning quality.
Streaming
For real-time output, use streaming:
```python
stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Write a Python async web scraper with error handling."}
    ],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(f"[thinking] {delta.reasoning_content}", end="")
    if delta.content:
        print(delta.content, end="")
When streaming with thinking mode, reasoning tokens arrive first, followed by the final answer.
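Because of that ordering, you can accumulate the two phases separately instead of printing them interleaved. A sketch (the `split_stream` helper is hypothetical; it works on any iterable of chunks shaped like the chat-completions stream):

```python
def split_stream(chunks) -> tuple[str, str]:
    """Accumulate a streamed response into (reasoning, answer) strings.

    Each chunk is expected to carry .choices[0].delta with optional
    reasoning_content and content fields, as in the streaming example above.
    """
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)
```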
Multimodal input
Image understanding
K2.6 accepts images as base64-encoded data or URLs:
```python
import base64

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this architecture diagram show?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}}
            ]
        }
    ]
)
```
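If you attach images often, the encode-and-wrap step can be factored into a helper that returns a ready-made content part. A sketch (the `image_part` name is hypothetical; the data-URL format matches the example above):

```python
import base64

def image_part(raw_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an image_url content part from raw image bytes."""
    b64 = base64.b64encode(raw_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
```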
Video input (experimental)
Video understanding is available through the official Moonshot API only (not through OpenRouter or third-party proxies). This feature is experimental and may change.
```python
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/demo.mp4"}}
            ]
        }
    ]
)
```
Pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Kimi K2.6 | $0.60 | $3.00 | Thinking + instant modes |
| Kimi K2.6 (cached input) | $0.10 - $0.15 | $3.00 | Automatic for repeated prefixes |
| GPT-5.4 | $2.50 | $15.00 | For comparison |
| Claude Opus 4.6 | $15.00 | $75.00 | For comparison |
K2.6 is 4x cheaper than GPT-5.4 on input and 5x cheaper on output. Compared to Opus 4.6, it’s 25x cheaper on both sides. Cached input pricing kicks in automatically when you send repeated system prompts or context prefixes.
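The per-token rates translate directly into a per-request estimate from the response's usage counts. A sketch (the `estimate_cost` helper is hypothetical; rates come from the table above, and cached input is assumed at the upper bound of $0.15/M to keep the estimate conservative):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate a K2.6 request cost in USD from token counts.

    In thinking mode, reasoning tokens are already included in
    completion_tokens, so no separate term is needed for them.
    """
    uncached = prompt_tokens - cached_tokens
    cost = (
        uncached / 1e6 * 0.60          # fresh input at $0.60/M
        + cached_tokens / 1e6 * 0.15   # cached input, upper-bound rate
        + completion_tokens / 1e6 * 3.00  # output at $3.00/M
    )
    return round(cost, 6)
```

Usage: `estimate_cost(response.usage.prompt_tokens, response.usage.completion_tokens)`.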
For a full breakdown across providers, see the AI model comparison.
Integration with coding tools
Kimi CLI
The Kimi CLI uses K2.6 by default:
```bash
kimi "Refactor this function to use dependency injection"
```
Aider
```bash
export OPENAI_API_BASE=https://api.moonshot.ai/v1
export OPENAI_API_KEY=$MOONSHOT_API_KEY
aider --model kimi-k2.6
```
Cursor
In Cursor settings, add a custom model:
- API Base: https://api.moonshot.ai/v1
- API Key: your Moonshot key
- Model: kimi-k2.6
Alternative providers
OpenRouter
K2.6 is available on OpenRouter, which lets you use one API key across hundreds of models:
```python
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

response = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Hello"}]
)
```
Note: video input and some K2.6-specific features may not be available through OpenRouter.
Cloudflare Workers AI
K2.6 is also available on Cloudflare Workers AI for edge deployment. This is useful if you want low-latency inference close to your users without managing infrastructure.
Running locally
If you prefer self-hosting, K2.6 open weights are available for local deployment. See How to run Kimi K2.6 locally for setup instructions with Ollama and vLLM. Note that local deployments won’t have video input support.
FAQ
What is the Kimi K2.6 API pricing?
$0.60 per million input tokens and $3.00 per million output tokens. Cached inputs drop to $0.10-$0.15 per million tokens. This applies to both thinking and instant modes. Reasoning tokens in thinking mode count toward output tokens.
Is the Kimi K2.6 API compatible with OpenAI?
Yes. K2.6 uses OpenAI-compatible endpoints. You can use the official OpenAI Python or JavaScript SDK by changing the base_url to https://api.moonshot.ai/v1 and providing your Moonshot API key. The request and response formats match the OpenAI chat completions API.
What is preserve_thinking mode?
Preserve_thinking ("keep": "all") retains the model’s internal reasoning across multiple conversation turns. Without it, K2.6 discards its chain-of-thought between turns and starts fresh. With it enabled, the model can reference its previous reasoning, which improves consistency in multi-step tasks and agent loops. This feature is new in K2.6 and not available in K2.5.
Can I use Kimi K2.6 for image understanding?
Yes. K2.6 accepts images as base64-encoded data or URLs using the same multimodal message format as GPT-4o. Video input is also supported experimentally through the official Moonshot API, but not through third-party providers like OpenRouter.
Related: Kimi K2.6 complete guide · How to use Kimi K2.5 API · Kimi CLI complete guide · How to run Kimi K2.6 locally · AI model comparison · OpenRouter complete guide