Moonshot’s Kimi K2.7 Code is one of the most capable coding models available today: 1 trillion total parameters with 32 billion active via a mixture-of-experts (MoE) architecture, a 256K-token context window, and interleaved thinking. You can access it through a standard OpenAI-compatible API, so it drops into almost any tool or workflow you already use.
In this guide, I’ll walk you through setting up API access, making your first request, enabling thinking mode, using tool calling, and streaming responses. If you’re wondering whether to use the API or self-host locally, this article will help you decide.
Setting Up Your Moonshot Account
First, you need access to the Moonshot platform:
- Go to platform.moonshot.ai
- Create an account (email or GitHub login)
- Navigate to API Keys in your dashboard
- Generate a new API key
- Save it somewhere secure — you won’t see it again
The platform offers both free tier credits for testing and pay-as-you-go pricing for production use.
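Since the key is shown only once, it helps to keep it out of source files from the start. The sketch below reads it from an environment variable; the name MOONSHOT_API_KEY matches the curl example later in this guide, while the helper name load_api_key and its environ parameter are illustrative, not part of any SDK.

```python
import os


def load_api_key(env_var: str = "MOONSHOT_API_KEY", environ=None) -> str:
    """Return the API key from the environment, failing loudly if it is unset.

    `environ` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    source = os.environ if environ is None else environ
    key = source.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Moonshot API key")
    return key
```

You can then pass api_key=load_api_key() wherever the examples below use a literal placeholder string.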
API Basics
Moonshot provides two compatible endpoint formats:
- OpenAI-compatible: Drop-in replacement for OpenAI’s API format
- Anthropic-compatible: For tools that expect Claude-style message formatting
The base URL for both:
https://api.moonshot.ai/v1
The model ID for K2.7 Code:
kimi-k2.7-code
Your First API Request
Let’s start with a simple curl request to verify everything works:
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{
    "model": "kimi-k2.7-code",
    "messages": [
      {"role": "system", "content": "You are a senior software engineer. Write clean, well-documented code."},
      {"role": "user", "content": "Write a TypeScript function that debounces an async function and cancels pending calls"}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'
If you get a valid response with generated code, you’re good to go.
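For readers who want to see what the curl command is doing without installing anything, here is a minimal sketch of the same request using only the Python standard library. The URL, model ID, and headers come from the curl example above; the function names build_chat_request and send are illustrative, and the response shape is assumed to follow the OpenAI format.

```python
import json
import urllib.request

BASE_URL = "https://api.moonshot.ai/v1"
MODEL_ID = "kimi-k2.7-code"


def build_chat_request(api_key: str, messages: list, max_tokens: int = 2048,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON body; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Assuming an OpenAI-style response body, the generated text would be at send(request)["choices"][0]["message"]["content"].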
Python SDK Setup
For most developers, using Python with the OpenAI SDK is the most ergonomic approach:
pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="your-moonshot-api-key",
    base_url="https://api.moonshot.ai/v1"
)

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are an expert Python developer."},
        {"role": "user", "content": "Implement a thread-safe LRU cache with TTL expiry"}
    ],
    max_tokens=4096,
    temperature=0.7
)

print(response.choices[0].message.content)
That’s it. Because the API is OpenAI-compatible, you can use the official OpenAI Python SDK with just a different base_url and API key.
Enabling Thinking Mode
Kimi K2.7 Code supports “preserve thinking” — the model’s reasoning is retained across turns, giving it better coherence in multi-turn conversations. This is especially powerful for complex coding tasks where the model needs to track state across multiple interactions.
response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Design a pub/sub system in Go that supports wildcard topic matching"}
    ],
    max_tokens=8192,
    temperature=0.7,
    extra_body={
        "thinking": {
            "enabled": True,
            "preserve": True
        }
    }
)

if hasattr(response.choices[0].message, 'thinking'):
    print("Reasoning:", response.choices[0].message.thinking)
print("Response:", response.choices[0].message.content)
With preserve thinking enabled, follow-up messages in the same conversation will benefit from the model’s previous reasoning — it doesn’t “forget” its analysis between turns.
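Because the reasoning field may be missing on some responses, it can be convenient to wrap the check in a small helper. This is a sketch only: it assumes the reasoning arrives as a thinking attribute on the message, as in the example above, and the SimpleNamespace objects are stand-ins for real response messages.

```python
from types import SimpleNamespace


def split_reasoning(message):
    """Return (reasoning, answer); reasoning is None when the field is absent."""
    reasoning = getattr(message, "thinking", None)
    answer = getattr(message, "content", None) or ""
    return reasoning, answer


# Stand-in objects for illustration; real ones come from response.choices[0].message
with_thinking = SimpleNamespace(thinking="Compare trie and regex matching", content="package pubsub")
without_thinking = SimpleNamespace(content="package pubsub")

reasoning, answer = split_reasoning(with_thinking)
no_reasoning, same_answer = split_reasoning(without_thinking)
```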
Streaming Responses
For interactive applications or coding tools, streaming gives you token-by-token output:
stream = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Write a Rust HTTP server with graceful shutdown handling"}
    ],
    max_tokens=4096,
    temperature=0.7,
    stream=True
)

for chunk in stream:
    # Guard against chunks that arrive without any choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming works exactly like OpenAI’s streaming API — same SSE format, same delta structure.
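If you need the full text after streaming finishes (for example, to append it to the conversation history), you can accumulate the deltas as they arrive. The helper below is a sketch that assumes the OpenAI-style chunk shape described above; collect_stream and the fake chunks are illustrative names so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream, on_token=None) -> str:
    """Consume a chat completion stream and return the concatenated text.

    `on_token`, if given, is called with each text fragment as it arrives.
    """
    parts = []
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # role-only or empty deltas have no content
            parts.append(text)
            if on_token is not None:
                on_token(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout as a real one."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("fn "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("main() {}")]
seen = []
full_text = collect_stream(fake_stream, on_token=seen.append)
```

With a real stream, passing on_token=lambda t: print(t, end="", flush=True) reproduces the live output from the loop above while still returning the complete text.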
Tool Calling
K2.7 Code supports interleaved thinking and multi-step tool calling, which makes it excellent for agentic workflows. Here’s how to define and use tools:
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "The file path to read"
                    }
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "The file path"},
                    "content": {"type": "string", "description": "Content to write"}
                },
                "required": ["path", "content"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Read the config.yaml file and add a new database connection pool setting"}
    ],
    tools=tools,
    tool_choice="auto",
    max_tokens=4096
)

message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Tool: {tool_call.function.name}")
        print(f"Args: {tool_call.function.arguments}")
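The multi-step capability means K2.7 can chain several tool calls within one user request: read a file, analyze it, then write a modified version, all without separate user messages in between. As with any OpenAI-style tool calling, the model only requests the calls; your code runs each one and sends the result back as a tool message before the model continues. For a deeper understanding of how tool calling works, see our tool calling guide.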
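The example above only prints the requested calls. To complete the cycle, your code has to run each tool and return its output to the model. The loop below is a minimal sketch of that pattern using the standard OpenAI message format for tool results. run_tool_loop, ScriptedClient, and the in-memory files dict are all hypothetical stand-ins so the example runs without network access; with the real SDK you would pass your OpenAI client and the tools list defined earlier.

```python
import json
from types import SimpleNamespace


def run_tool_loop(client, messages, tools, handlers,
                  model="kimi-k2.7-code", max_steps=8):
    """Call the model, run any tools it requests, and repeat until it replies in text."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools,
            tool_choice="auto", max_tokens=4096,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no more tool requests: this is the final answer

        # Record the assistant's tool request so each tool result has a call to refer to.
        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {"id": call.id, "type": "function",
                 "function": {"name": call.function.name,
                              "arguments": call.function.arguments}}
                for call in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            try:
                args = json.loads(call.function.arguments)
                result = handlers[call.function.name](**args)
            except Exception as exc:  # report failures to the model instead of crashing
                result = f"error: {exc!r}"
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    raise RuntimeError("tool loop did not finish within max_steps")


# --- Offline demo: a scripted stand-in for the client and an in-memory "file system" ---
class ScriptedClient:
    """Replays canned assistant messages through the same call path as the SDK client."""

    def __init__(self, replies):
        self._replies = iter(replies)
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        return SimpleNamespace(choices=[SimpleNamespace(message=next(self._replies))])


def tool_request(call_id, name, **args):
    """Build a fake assistant message that requests a single tool call."""
    call = SimpleNamespace(id=call_id,
                           function=SimpleNamespace(name=name, arguments=json.dumps(args)))
    return SimpleNamespace(content=None, tool_calls=[call])


files = {"config.yaml": "db:\n  host: localhost\n"}


def read_file(path):
    return files[path]


def write_file(path, content):
    files[path] = content
    return "ok"


history = [{"role": "user", "content": "Add a connection pool setting to config.yaml"}]
final_answer = run_tool_loop(
    ScriptedClient([
        tool_request("call_1", "read_file", path="config.yaml"),
        tool_request("call_2", "write_file", path="config.yaml",
                     content="db:\n  host: localhost\n  pool_size: 10\n"),
        SimpleNamespace(content="Added pool_size to config.yaml", tool_calls=None),
    ]),
    history, tools=[], handlers={"read_file": read_file, "write_file": write_file},
)
```

The max_steps cap is a design choice: it keeps a confused model from looping on tool calls indefinitely. Tool errors are returned to the model as text so it has a chance to recover.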
Multi-Turn Conversation Example
Here’s a more realistic example showing a multi-turn coding session:
from openai import OpenAI

client = OpenAI(
    api_key="your-moonshot-api-key",
    base_url="https://api.moonshot.ai/v1"
)

messages = [
    {"role": "system", "content": "You are a senior full-stack developer helping with a FastAPI project."}
]

def chat(user_message):
    messages.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="kimi-k2.7-code",
        messages=messages,
        max_tokens=4096,
        temperature=0.7,
        extra_body={"thinking": {"enabled": True, "preserve": True}}
    )
    assistant_message = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_message})
    return assistant_message

print(chat("I need to add WebSocket support to my FastAPI app for real-time notifications"))
print(chat("Now add authentication to the WebSocket connections using JWT"))
print(chat("Add a connection manager that handles disconnections gracefully"))
With preserve thinking, each response builds on the model’s accumulated understanding of your project structure.
Error Handling and Retries
Production code needs proper error handling:
import time

from openai import OpenAI, APIError, RateLimitError, APITimeoutError

client = OpenAI(
    api_key="your-moonshot-api-key",
    base_url="https://api.moonshot.ai/v1",
    timeout=120.0,
    # The SDK already retries failed requests itself; the loop below adds a second layer on top
    max_retries=3
)

def robust_completion(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="kimi-k2.7-code",
                messages=messages,
                max_tokens=4096,
                temperature=0.7
            )
            return response.choices[0].message.content
        except RateLimitError:
            # Exponential backoff: 1s, 2s, 4s
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APITimeoutError:
            # Retry immediately on the next loop iteration
            print(f"Timeout on attempt {attempt + 1}")
            continue
        except APIError as e:
            # Any other API error is treated as non-retryable
            print(f"API error: {e}")
            raise
    raise RuntimeError("Max retries exceeded")
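When many clients hit a rate limit at the same moment, fixed exponential delays can make them all retry in lockstep. A common refinement is to add random jitter and cap the delay. The sketch below shows that idea as a generic helper that does not depend on the SDK; retry and backoff_delay are illustrative names, and the injected sleep and rng parameters exist only so the demo runs instantly and deterministically.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Jittered delay: a random fraction of min(cap, base * 2**attempt) seconds."""
    return rng() * min(cap, base * 2 ** attempt)


def retry(fn, retry_on, max_attempts=4, sleep=time.sleep,
          base=1.0, cap=30.0, rng=random.random):
    """Call fn(); on a retryable exception wait and try again, re-raising on the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(backoff_delay(attempt, base, cap, rng))


# Demo with a function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


waits = []  # record delays instead of actually sleeping
result = retry(flaky, retry_on=TimeoutError, sleep=waits.append, rng=lambda: 1.0)
```

With the OpenAI SDK you would pass retry_on=(RateLimitError, APITimeoutError) and wrap the client.chat.completions.create call in a lambda.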
API vs Self-Hosting: When to Choose What
| Factor | API | Self-Hosted |
|---|---|---|
| Setup time | 5 minutes | 2-4 hours |
| Maintenance | Zero | Ongoing |
| Privacy | Data sent to Moonshot | Fully private |
| Cost (low volume) | Pay per token | High fixed cost |
| Cost (high volume) | Expensive | Predictable |
| Rate limits | Platform-imposed | None |
| Latency | Network + inference | Inference only |
For most individual developers and small teams, the API is the right choice. You get started instantly and only pay for what you use. If you’re processing sensitive code at scale, self-hosting makes more sense.
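To put a number on "at scale", you can estimate the monthly token volume at which pay-per-token spend matches a fixed self-hosting bill. The sketch below does that arithmetic. The price and hosting cost in it are hypothetical placeholders, not Moonshot’s pricing, and the model uses a single blended per-token price rather than separate input and output rates.

```python
def monthly_api_cost(tokens_per_month: int, price_per_million: float) -> float:
    """API spend for a month, given a blended price per one million tokens."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_tokens(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Monthly token volume at which API spend equals the fixed self-hosting cost."""
    return fixed_monthly_cost / price_per_million * 1_000_000


# Hypothetical placeholder figures; substitute your own quotes.
HYPOTHETICAL_PRICE_PER_MILLION = 2.0   # currency units per 1M tokens
HYPOTHETICAL_FIXED_MONTHLY = 5000.0    # hardware, power, and operations per month

threshold = break_even_tokens(HYPOTHETICAL_FIXED_MONTHLY, HYPOTHETICAL_PRICE_PER_MILLION)
print(f"Break-even at about {threshold:,.0f} tokens per month")
```

Below the threshold the API is cheaper under these assumptions; above it, self-hosting starts to pay off, before accounting for engineering time.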
Using K2.7 with Coding Tools
The OpenAI-compatible API means K2.7 plugs into a wide range of existing coding tools:
- Aider: Set K2.7 as your model with a custom API base
- OpenCode: Configure as an OpenAI-compatible provider
- Kimi Code CLI: Purpose-built for K2.7, best experience
- Continue, Cursor, etc.: Any tool supporting custom OpenAI endpoints
For detailed setup instructions with these tools, check our K2.7 integration guide.
Comparison with K2.6 API
If you’ve been using the K2.6 API, here’s what’s different in K2.7 Code:
- Specialized for code: K2.7 Code is fine-tuned specifically for programming tasks
- Better tool calling: Multi-step tool calls in a single turn
- Preserve thinking: Reasoning state maintained across conversation turns
- MoonViT vision: Can process screenshots, diagrams, and images of code
- Same API format: Drop-in replacement — just change the model ID
FAQ
What’s the model ID for Kimi K2.7 Code in API calls?
The model ID is kimi-k2.7-code. Use this in the model field of your API requests. The base URL is https://api.moonshot.ai/v1 and the API format is OpenAI-compatible.
Does Kimi K2.7 Code API support function/tool calling?
Yes. K2.7 Code supports full OpenAI-style tool calling with multi-step execution. The model can chain multiple tool calls in a single response turn, making it excellent for agentic coding workflows. It also interleaves thinking with tool calls for better reasoning.
How does “preserve thinking” work in the API?
Preserve thinking has the model retain its reasoning chain across conversation turns. Enable it with extra_body={"thinking": {"enabled": True, "preserve": True}}. The model’s analysis from turn 1 then informs its responses in turns 2, 3, and beyond, without you needing to repeat context.
Can I use the OpenAI Python SDK with Moonshot’s API?
Absolutely. Just set base_url="https://api.moonshot.ai/v1" and use your Moonshot API key. The API is fully OpenAI-compatible — chat completions, streaming, tool calling, all work identically. There’s also an Anthropic-compatible endpoint if your tools expect that format.
What’s the context window limit for K2.7 Code via API?
256K tokens. This is the full context window available through the API, so you can send long codebases, full file contents, and lengthy conversation histories. For budgeting, a rough rule of thumb is about 4 characters per token for code; the exact count depends on the tokenizer and the content.
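For long sessions, a quick estimate can tell you when the history is getting close to the limit. The sketch below applies the 4-characters-per-token rule of thumb and drops the oldest non-system messages until the estimate fits. It rests on several assumptions: "256K" is read as 256,000 tokens, the reserved output (max_tokens) is assumed to count against the same window, and the character heuristic stands in for a real tokenizer. The function names are illustrative.

```python
CONTEXT_WINDOW = 256_000   # assumption: "256K" taken as 256,000 tokens
CHARS_PER_TOKEN = 4        # rough heuristic, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def trim_history(messages, max_tokens_out, budget=CONTEXT_WINDOW):
    """Drop the oldest non-system messages until the estimate fits the budget.

    System messages and the most recent message are always kept.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"] or "") for m in msgs)

    limit = budget - max_tokens_out  # leave room for the reply
    while len(rest) > 1 and total(system) + total(rest) > limit:
        rest.pop(0)
    return system + rest


# Tiny budget so the trimming is visible: 100 tokens total, 40 reserved for output.
demo = [
    {"role": "system", "content": "s" * 40},      # about 10 tokens
    {"role": "user", "content": "a" * 120},       # about 30 tokens
    {"role": "assistant", "content": "b" * 120},  # about 30 tokens
    {"role": "user", "content": "c" * 80},        # about 20 tokens
]
trimmed = trim_history(demo, max_tokens_out=40, budget=100)
```

If your history contains tool calls, trim in whole exchanges instead, so a tool result is never separated from the assistant message that requested it.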
Next Steps
Now that you have API access working, consider:
- Setting up Kimi Code CLI for the best K2.7 coding experience
- Integrating K2.7 into your existing workflow with Aider or Claude Code
- Reading the complete K2.7 Code guide for benchmarks and capabilities
- Exploring MCP integration for advanced tool setups