
How to Use the Kimi K2.6 API — Setup, Pricing, and Code Examples


Kimi K2.6 is Moonshot AI’s latest reasoning model, available through OpenAI-compatible API endpoints at platform.moonshot.ai. At $0.60 per million input tokens and $3.00 per million output tokens, it undercuts most frontier models by 5-25x while adding features K2.5 never had: instant mode, preserve_thinking for multi-turn reasoning, and experimental video input.

This guide covers everything you need to start making API calls. If you used the Kimi K2.5 API, the base setup is the same. The differences are in the model name and the new thinking controls.

Get your API key

  1. Go to platform.moonshot.ai
  2. Sign up with email or phone number
  3. Navigate to API Keys in the dashboard
  4. Generate a new key and copy it

Store the key in an environment variable:

export MOONSHOT_API_KEY="your-kimi-api-key"

Basic chat completion (Python)

Install the OpenAI Python library if you haven’t:

pip install openai

K2.6 defaults to thinking mode, which means the model reasons internally before responding. You can access the reasoning via reasoning_content on the response:

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ["MOONSHOT_API_KEY"]
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "Design a rate limiter using a sliding window algorithm in Go."}
    ],
    temperature=1.0,
    top_p=0.95
)

# Access the final answer
print(response.choices[0].message.content)

# Access the reasoning trace (thinking mode only)
print(response.choices[0].message.reasoning_content)

The reasoning_content field contains the model’s chain-of-thought. This is useful for debugging, auditing, or building UIs that show the model’s reasoning process.

JavaScript/TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.moonshot.ai/v1',
  apiKey: process.env.MOONSHOT_API_KEY,
});

const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [
    { role: 'user', content: 'Explain the difference between mutex and semaphore with code examples.' }
  ],
  temperature: 1.0,
  top_p: 0.95,
});

console.log(response.choices[0].message.content);

K2.6-specific features

These features are new in K2.6 and not available in K2.5.

Instant mode (disable thinking)

By default, K2.6 uses thinking mode where it reasons before answering. For latency-sensitive tasks where you don’t need chain-of-thought, disable it:

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Convert 72°F to Celsius."}
    ],
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "thinking": {"type": "disabled"}
    }
)

Instant mode is faster and cheaper since no reasoning tokens are generated. Use it for simple lookups, translations, formatting tasks, and anything that doesn’t benefit from step-by-step reasoning.
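If you switch between the two modes often, it can help to centralize the request settings. This is a minimal sketch (the `build_request` helper is hypothetical) that applies the recommended temperature for each mode and only adds `extra_body` when thinking is disabled:

```python
def build_request(prompt, instant=False):
    """Build kwargs for client.chat.completions.create().

    Instant mode disables thinking via extra_body and uses the lower
    recommended temperature (0.6 instead of 1.0).
    """
    kwargs = {
        "model": "kimi-k2.6",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6 if instant else 1.0,
        "top_p": 0.95,
    }
    if instant:
        kwargs["extra_body"] = {"thinking": {"type": "disabled"}}
    return kwargs
```

Usage: `client.chat.completions.create(**build_request("Convert 72°F to Celsius.", instant=True))`.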

Preserve thinking (multi-turn reasoning)

This is the biggest K2.6 addition. In K2.5, reasoning context was discarded between turns. In K2.6, you can retain it:

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Analyze this database schema and suggest indexes."},
        {"role": "assistant", "content": "Based on the query patterns..."},
        {"role": "user", "content": "Now optimize the slowest query you identified."}
    ],
    extra_body={
        "thinking": {"type": "enabled", "keep": "all"}
    }
)

With "keep": "all", the model retains its reasoning from previous turns. This matters for:

  • Multi-step agent loops where context builds over time
  • Debugging sessions where the model needs to remember what it already analyzed
  • Complex coding tasks that span multiple back-and-forth exchanges

Without preserve_thinking, the model would re-derive context from scratch each turn, leading to inconsistencies and wasted tokens.
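In an agent loop, that means every call should carry the full message history plus the preserve-thinking flag. A minimal sketch (assuming the `client` configured earlier; `multi_turn_request` is an illustrative helper, not an SDK function):

```python
def multi_turn_request(messages):
    """Request kwargs for a thinking-preserving turn (K2.6 only)."""
    return {
        "model": "kimi-k2.6",
        "messages": messages,
        "extra_body": {"thinking": {"type": "enabled", "keep": "all"}},
    }

# Sketch of an agent loop using the helper:
# history = [{"role": "user", "content": "Analyze this database schema..."}]
# while not done:
#     resp = client.chat.completions.create(**multi_turn_request(history))
#     history.append({"role": "assistant",
#                     "content": resp.choices[0].message.content})
#     history.append({"role": "user", "content": next_instruction})
```

Because every turn sends the same kwargs, the model's reasoning carries forward instead of being rebuilt from the visible messages alone.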

| Mode               | Temperature | top_p | Best for                                  |
|--------------------|-------------|-------|-------------------------------------------|
| Thinking (default) | 1.0         | 0.95  | Coding, math, analysis, complex reasoning |
| Instant            | 0.6         | 0.95  | Simple Q&A, formatting, translations      |

Moonshot recommends these defaults. Lowering temperature in thinking mode can reduce reasoning quality.

Streaming

For real-time output, use streaming:

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Write a Python async web scraper with error handling."}
    ],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(f"[thinking] {delta.reasoning_content}", end="")
    if delta.content:
        print(delta.content, end="")

When streaming with thinking mode, reasoning tokens arrive first, followed by the final answer.

Multimodal input

Image understanding

K2.6 accepts images as base64-encoded data or URLs:

import base64

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this architecture diagram show?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}}
            ]
        }
    ]
)

Video input (experimental)

Video understanding is available through the official Moonshot API only (not through OpenRouter or third-party proxies). This feature is experimental and may change.

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/demo.mp4"}}
            ]
        }
    ]
)

Pricing

| Model                    | Input (per 1M tokens) | Output (per 1M tokens) | Notes                           |
|--------------------------|-----------------------|------------------------|---------------------------------|
| Kimi K2.6                | $0.60                 | $3.00                  | Thinking + instant modes        |
| Kimi K2.6 (cached input) | $0.10 - $0.15         | $3.00                  | Automatic for repeated prefixes |
| GPT-5.4                  | $2.50                 | $15.00                 | For comparison                  |
| Claude Opus 4.6          | $15.00                | $75.00                 | For comparison                  |

K2.6 is 4x cheaper than GPT-5.4 on input and 5x cheaper on output. Compared to Opus 4.6, it’s 25x cheaper on both sides. Cached input pricing kicks in automatically when you send repeated system prompts or context prefixes.
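Using the rates in the table, estimating a request's cost is simple arithmetic. A sketch (`estimate_cost` is an illustrative helper; it uses the low end of the cached-input range):

```python
# K2.6 rates in USD per million tokens; cached input uses the low end
# of the published $0.10-$0.15 range.
INPUT_RATE, CACHED_RATE, OUTPUT_RATE = 0.60, 0.10, 3.00

def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Rough request cost in USD. Reasoning tokens bill as output."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_RATE
            + cached_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000
```

For example, a request with 100k input tokens and 10k output tokens comes to $0.09, and fully cached input drops the input side to a sixth of the cost.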

For a full breakdown across providers, see the AI model comparison.

Integration with coding tools

Kimi CLI

The Kimi CLI uses K2.6 by default:

kimi "Refactor this function to use dependency injection"

Aider

export OPENAI_API_BASE=https://api.moonshot.ai/v1
export OPENAI_API_KEY=$MOONSHOT_API_KEY
aider --model kimi-k2.6

Cursor

In Cursor settings, add a custom model:

  • API Base: https://api.moonshot.ai/v1
  • API Key: your Moonshot key
  • Model: kimi-k2.6

Alternative providers

OpenRouter

K2.6 is available on OpenRouter, which lets you use one API key across hundreds of models:

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

response = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Hello"}]
)

Note: video input and some K2.6-specific features may not be available through OpenRouter.

Cloudflare Workers AI

K2.6 is also available on Cloudflare Workers AI for edge deployment. This is useful if you want low-latency inference close to your users without managing infrastructure.

Running locally

If you prefer self-hosting, K2.6 open weights are available for local deployment. See How to run Kimi K2.6 locally for setup instructions with Ollama and vLLM. Note that local deployments won’t have video input support.

FAQ

What is the Kimi K2.6 API pricing?

$0.60 per million input tokens and $3.00 per million output tokens. Cached inputs drop to $0.10-$0.15 per million tokens. This applies to both thinking and instant modes. Reasoning tokens in thinking mode count toward output tokens.

Is the Kimi K2.6 API compatible with OpenAI?

Yes. K2.6 uses OpenAI-compatible endpoints. You can use the official OpenAI Python or JavaScript SDK by changing the base_url to https://api.moonshot.ai/v1 and providing your Moonshot API key. The request and response formats match the OpenAI chat completions API.

What is preserve_thinking mode?

The preserve_thinking setting ("keep": "all") retains the model’s internal reasoning across conversation turns. Without it, K2.6 discards its chain-of-thought between turns and starts fresh. With it enabled, the model can reference its previous reasoning, which improves consistency in multi-step tasks and agent loops. This feature is new in K2.6 and not available in K2.5.

Can I use Kimi K2.6 for image understanding?

Yes. K2.6 accepts images as base64-encoded data or URLs using the same multimodal message format as GPT-4o. Video input is also supported experimentally through the official Moonshot API, but not through third-party providers like OpenRouter.

Related: Kimi K2.6 complete guide · How to use Kimi K2.5 API · Kimi CLI complete guide · How to run Kimi K2.6 locally · AI model comparison · OpenRouter complete guide