Multi-agent systems use multiple AI agents that collaborate on tasks. Instead of one model doing everything, specialized agents handle different parts of a workflow. Here's when this works, when it doesn't, and how to build it.
When multi-agent makes sense
Good use cases:
- Different parts of a task need different expertise (research agent + writing agent + review agent)
- Tasks are parallelizable (Kimi's Agent Swarm refactoring 50 files at once)
- You need different models for different subtasks (model routing at the agent level)
- Workflows span multiple systems (support agent → billing agent → shipping agent)
Bad use cases:
- Simple tasks that one model handles fine (don't add complexity for no reason)
- Tasks with heavy dependencies between steps (agents spend more time coordinating than working)
- When you don't have the infrastructure to manage multiple agents
Architecture patterns
Pattern 1: Sequential pipeline
Agent A (research) → Agent B (draft) → Agent C (review) → Output
Each agent processes the output of the previous one. Simple, predictable, easy to debug.
When to use: Content generation, data processing pipelines, code review workflows.
Pattern 2: Parallel fan-out
               → Agent B1 (file1.ts)
Task → Agent A → Agent B2 (file2.ts) → Agent C (merge)
               → Agent B3 (file3.ts)
A coordinator splits work across parallel agents, then merges results. This is what Kimi's Agent Swarm does.
When to use: Batch refactoring, multi-file changes, independent subtasks.
Pattern 3: Hierarchical delegation
Manager Agent
├── Research Agent (uses web search MCP)
├── Coding Agent (uses filesystem MCP)
└── Review Agent (uses git MCP)
A manager agent decides which specialist to delegate to based on the task. Each specialist has its own MCP tools.
When to use: Complex workflows where different steps need different tools and expertise.
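The delegation step can be sketched as follows. This is a minimal illustration, assuming a keyword-based router in place of an LLM-driven manager. The specialist functions, the `ROUTES` table, and `manager` are hypothetical names, and the specialists return canned strings instead of calling real MCP tools.

```python
from typing import Callable

# Hypothetical specialists: in a real system each would wrap an LLM call
# plus its own MCP tools. Here they only return canned strings.
def research_agent(task: str) -> str:
    return f"[research] notes on: {task}"

def coding_agent(task: str) -> str:
    return f"[coding] patch for: {task}"

def review_agent(task: str) -> str:
    return f"[review] comments on: {task}"

# Assumed routing table (keyword, specialist). A real manager would more
# likely ask a model to choose; keywords keep the sketch deterministic.
ROUTES: list[tuple[str, Callable[[str], str]]] = [
    ("research", research_agent),
    ("implement", coding_agent),
    ("review", review_agent),
]

def manager(task: str) -> str:
    """Delegate to the first specialist whose keyword appears in the task."""
    lowered = task.lower()
    for keyword, specialist in ROUTES:
        if keyword in lowered:
            return specialist(task)
    raise ValueError(f"no specialist for task: {task!r}")
```

Raising on an unroutable task, instead of guessing a default specialist, makes gaps in the routing table visible early.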
Pattern 4: Peer collaboration (A2A)
Agent A ←→ Agent B ←→ Agent C
(each is independent, communicates via A2A protocol)
Agents from different vendors/teams communicate as peers using A2A. No central coordinator.
When to use: Cross-organization workflows, enterprise integrations.
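To make the "no central coordinator" idea concrete, here is a toy in-process sketch. It is not the A2A wire protocol: it assumes `asyncio` queues as a stand-in for network messages, and the `Peer` class, `next_peer` attribute, and `demo` function are hypothetical names. Each peer owns its inbox and forwards its result to the next peer it knows about.

```python
import asyncio

class Peer:
    """A toy peer: owns an inbox and forwards its own results onward."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform
        self.inbox = asyncio.Queue()
        self.next_peer = None  # set by whoever wires the peers together

    async def run(self, results):
        # Handle exactly one message to keep the sketch short.
        message = await self.inbox.get()
        reply = self.transform(message)
        if self.next_peer is not None:
            await self.next_peer.inbox.put(reply)
        else:
            results.append(reply)  # last peer in the chain reports the result

async def demo():
    # Peers are created inside the running event loop.
    support = Peer("support", lambda m: m + " -> support")
    billing = Peer("billing", lambda m: m + " -> billing")
    shipping = Peer("shipping", lambda m: m + " -> shipping")
    support.next_peer, billing.next_peer = billing, shipping

    results = []
    await support.inbox.put("ticket")
    # All peers run concurrently; nothing orchestrates the hand-offs.
    await asyncio.gather(*(p.run(results) for p in (support, billing, shipping)))
    return results
```

Calling `asyncio.run(demo())` passes one message through the support, billing, and shipping peers in turn. In a real A2A deployment the queues would be replaced by network calls between separately hosted agents.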
The protocol stack
| Layer | Protocol | Purpose |
|---|---|---|
| Agent → Tools | MCP | Each agent accesses its tools |
| Agent ↔ Agent | A2A | Agents communicate with each other |
| Orchestration | Your code | Manages the workflow |
See our MCP vs A2A comparison for when to use each.
Building it in practice
Simple: Sequential with different models
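# call_llm(model, prompt) is a placeholder for your provider's client;
# it is assumed to return the model's text response. `topic` is your input.
# Research with cheap model
research = call_llm("deepseek-chat", f"Research: {topic}")
# Draft with medium model
draft = call_llm("claude-sonnet-4.6", f"Write article based on: {research}")
# Review with best model
review = call_llm("claude-opus-4.6", f"Review and improve: {draft}")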
This is multi-agent in the simplest form: different models for different steps. No framework needed.
Medium: Parallel with MCP
import asyncio

# call_llm, mcp_read_file, and mcp_write_file are placeholders for your own
# async LLM client and MCP filesystem helpers.
async def refactor_file(filepath, instructions):
    """Each 'agent' is an MCP-connected LLM call."""
    content = await mcp_read_file(filepath)
    refactored = await call_llm(
        "claude-sonnet-4.6",
        f"Refactor this file: {instructions}\n\n{content}",
    )
    await mcp_write_file(filepath, refactored)

async def main():
    # Fan out across files
    files = ["src/auth.ts", "src/api.ts", "src/db.ts"]
    await asyncio.gather(*[refactor_file(f, "Use dependency injection") for f in files])

asyncio.run(main())
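One question the fan-out leaves open is what happens when a single file fails. The sketch below is one possible approach, assuming a stubbed refactor step: `fake_refactor` and `fan_out` are hypothetical names, and the failure on `db.ts` is simulated on purpose. It uses `asyncio.gather(..., return_exceptions=True)` so that every file's outcome is reported.

```python
import asyncio

async def fake_refactor(filepath: str) -> str:
    """Stand-in for an LLM-backed refactor; fails on one path on purpose."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if filepath.endswith("db.ts"):
        raise RuntimeError(f"model returned invalid code for {filepath}")
    return f"refactored {filepath}"

async def fan_out(files):
    # return_exceptions=True returns failures as values instead of raising
    # the first one, so every file's outcome ends up in the result list.
    outcomes = await asyncio.gather(
        *(fake_refactor(f) for f in files), return_exceptions=True
    )
    succeeded = {f: o for f, o in zip(files, outcomes) if not isinstance(o, Exception)}
    failed = {f: str(o) for f, o in zip(files, outcomes) if isinstance(o, Exception)}
    return succeeded, failed
```

The merge step can then work from `succeeded` and retry or report whatever is in `failed`.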
Advanced: A2A delegation
For cross-system workflows, use the A2A protocol to delegate between specialized agents. This is enterprise-grade and requires more infrastructure.
Common pitfalls
- Over-engineering: Most tasks don't need multi-agent. Start with one agent, add more only when you hit limits.
- Coordination overhead: Agents spend tokens communicating. If coordination costs exceed the benefit of parallelism, use a single agent.
- Error cascading: One agent's bad output becomes another agent's bad input. Add validation between steps.
- Cost multiplication: N agents can mean roughly N× the API costs. Use cheap models for routine agents.
- Debugging complexity: When something goes wrong, which agent caused it? Use observability with per-agent tracing.
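Two of these pitfalls, error cascading and debugging complexity, can be addressed in the orchestration code itself. The following is a minimal sketch, not a full observability setup: `run_pipeline`, the `Step` tuple shape, and the example `steps` are all hypothetical, and the agents are plain functions standing in for LLM calls. Each step's output is validated before the next step sees it, and a per-agent trace records timing and validity.

```python
import time
from typing import Callable

# (agent name, agent function, validator for that agent's output)
Step = tuple[str, Callable[[str], str], Callable[[str], bool]]

def run_pipeline(steps: list[Step], initial: str):
    """Run steps in order; validate each output and record a per-agent trace."""
    trace = []
    data = initial
    for name, agent, is_valid in steps:
        started = time.perf_counter()
        output = agent(data)
        elapsed = time.perf_counter() - started
        ok = is_valid(output)
        trace.append({"agent": name, "seconds": elapsed, "valid": ok})
        if not ok:
            # Stop here so bad output never reaches the next agent.
            raise ValueError(f"{name} produced invalid output; trace={trace}")
        data = output
    return data, trace

# Example wiring with stub agents (assumed for illustration only).
steps = [
    ("research", lambda topic: f"facts about {topic}", lambda out: len(out) > 0),
    ("draft", lambda facts: f"Article: {facts}", lambda out: out.startswith("Article:")),
]
```

Calling `run_pipeline(steps, "MCP")` returns the final text together with the trace. When a validator fails, the raised error names the responsible agent.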
Tools
| Tool | Multi-agent support |
|---|---|
| Kimi CLI | Agent Swarm (built-in) |
| LangGraph | Graph-based agent orchestration |
| CrewAI | Role-based multi-agent framework |
| MCP + custom code | DIY with full control |
For most developers, start with sequential pipelines using different models. Graduate to parallel execution when you have parallelizable tasks. Use frameworks like CrewAI or LangGraph only when your workflow is complex enough to justify the abstraction.
Related: Kimi Agent Swarm Deep Dive · MCP Complete Guide · What is A2A? · Tool Calling Patterns