You've used Claude Code or Aider. They feel like magic. But under the hood, every AI agent is a surprisingly simple loop. Here's exactly how it works.
The core loop
At its core, every agent, from a simple chatbot with tools to Devin, runs some version of this loop:
```python
def agent(goal, tools, model="claude-sonnet-4.6"):
    # call_llm, execute_tool and SYSTEM_PROMPT stand in for your own client code
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": goal},
    ]
    while True:
        # 1. Send messages + available tools to the LLM
        response = call_llm(model, messages, tools=tools)
        # 2. If the LLM wants to use tools, record its request, then execute each call
        if response.tool_calls:
            messages.append({
                "role": "assistant",
                "content": response.content,
                "tool_calls": response.tool_calls,
            })
            for tool_call in response.tool_calls:
                result = execute_tool(tool_call.name, tool_call.arguments)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result),
                })
        # 3. If the LLM returns text (no tool calls), it's done
        else:
            return response.content
```
That's the whole core, in a couple dozen lines. Everything else is error handling, memory management, and guardrails.
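To watch the loop turn without an API key, the minimal sketch below swaps `call_llm` for a scripted stand-in. `ToolCall`, `FakeResponse`, `SCRIPT`, `scripted_llm` and `fake_execute_tool` are hypothetical helpers invented for illustration; a real client would return the model's own decisions instead of a fixed script.

```python
import json
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str  # JSON-encoded string, as function-calling APIs send it

@dataclass
class FakeResponse:
    content: str = ""
    tool_calls: list = field(default_factory=list)

# Hypothetical scripted "model": first requests one tool call, then answers.
SCRIPT = [
    FakeResponse(tool_calls=[ToolCall("call_1", "read_file", json.dumps({"path": "notes.txt"}))]),
    FakeResponse(content="The file says: hello"),
]

def scripted_llm(messages, step):
    # Ignores the messages; a real LLM would read them to decide
    return SCRIPT[step]

def fake_execute_tool(name, arguments):
    args = json.loads(arguments)
    return f"contents of {args['path']}: hello"

def run_agent(goal, max_steps=10):
    messages = [{"role": "user", "content": goal}]
    for step in range(max_steps):
        response = scripted_llm(messages, step)
        if response.tool_calls:
            # Record the assistant's tool request before answering it
            messages.append({"role": "assistant", "content": response.content,
                             "tool_calls": response.tool_calls})
            for call in response.tool_calls:
                result = fake_execute_tool(call.name, call.arguments)
                messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
        else:
            return response.content, messages
    return None, messages

answer, history = run_agent("Summarize notes.txt")
print(answer)        # The file says: hello
print(len(history))  # 3
```

The shape matches the loop above: the assistant's tool request is recorded, each result is appended as a `tool` message, and the first reply without tool calls ends the run.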
Step 1: Tool definition
Tools are described to the LLM as JSON schemas. The model reads these descriptions and decides when to use each tool:
```python
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read the contents of a file at the given path",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or relative file path"
                }
            },
            "required": ["path"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write content to a file. Creates the file if it doesn't exist.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "run_command",
        "description": "Execute a shell command and return stdout/stderr",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string"}
            },
            "required": ["command"]
        }
    }
}]
```
The quality of tool descriptions directly affects agent performance. Vague descriptions lead to wrong tool choices. See our tool calling patterns guide.
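One cheap way to catch vague descriptions before the model ever sees them is a small lint pass over the schemas. `lint_tools` and its four-word threshold are assumptions for illustration, not a standard check; the sample list copies two of the tools above.

```python
def lint_tools(tools, min_words=4):
    """Return warnings for tools or parameters with missing or very short descriptions."""
    warnings = []
    for tool in tools:
        fn = tool["function"]
        name = fn["name"]
        # Arbitrary heuristic: very short tool descriptions are usually too vague
        if len(fn.get("description", "").split()) < min_words:
            warnings.append(f"{name}: tool description is missing or too short")
        props = fn.get("parameters", {}).get("properties", {})
        for param, schema in props.items():
            if not schema.get("description"):
                warnings.append(f"{name}.{param}: parameter has no description")
    return warnings

sample = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Read the contents of a file at the given path",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string",
                                               "description": "Absolute or relative file path"}},
                       "required": ["path"]}}},
    {"type": "function", "function": {
        "name": "run_command",
        "description": "Execute a shell command and return stdout/stderr",
        "parameters": {"type": "object",
                       "properties": {"command": {"type": "string"}},
                       "required": ["command"]}}},
]

print(lint_tools(sample))  # ['run_command.command: parameter has no description']
```

Run against the full list above, the same check would also flag `write_file`'s `path` and `content` parameters, which have no descriptions.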
Step 2: The LLM decides
When the LLM receives messages + tool definitions, it outputs either:
A) A tool call (structured JSON):
```json
{
  "tool_calls": [{
    "id": "call_abc123",
    "function": {
      "name": "read_file",
      "arguments": "{\"path\": \"src/main.py\"}"
    }
  }]
}
```
B) A text response (the agent is done):
```json
{
  "content": "I've fixed the bug. The issue was a missing null check on line 42."
}
```
The model makes this choice based on the conversation history, the goal, and the available tools. This is where model quality matters: frontier models such as Claude and GPT-5 generally make better decisions than smaller models.
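Whichever shape comes back, your code has to tell the two apart and decode the arguments, which arrive as a JSON-encoded string rather than an object. This sketch assumes the plain dict shapes shown above; `interpret` is a hypothetical helper, and real SDKs return typed objects rather than dicts.

```python
import json

def interpret(response):
    """Classify a raw response dict as a tool request or a final answer."""
    calls = response.get("tool_calls") or []
    if calls:
        # "arguments" is a JSON string, so decode it before executing the tool
        return "tool", [
            (c["id"], c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls
        ]
    return "final", response.get("content", "")

tool_reply = {"tool_calls": [{"id": "call_abc123",
                              "function": {"name": "read_file",
                                           "arguments": "{\"path\": \"src/main.py\"}"}}]}
text_reply = {"content": "I've fixed the bug. The issue was a missing null check on line 42."}

print(interpret(tool_reply))  # ('tool', [('call_abc123', 'read_file', {'path': 'src/main.py'})])
print(interpret(text_reply)[0])  # final
```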
Step 3: Tool execution
Your code executes the tool and returns the result:
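```python
import json
import subprocess

def execute_tool(name, arguments):
    # Tool-call arguments arrive as a JSON string
    args = json.loads(arguments)
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    elif name == "write_file":
        with open(args["path"], "w") as f:
            f.write(args["content"])
        return f"Wrote {len(args['content'])} characters to {args['path']}"
    elif name == "run_command":
        # shell=True runs whatever command the model chooses: sandbox this in real use
        result = subprocess.run(args["command"], shell=True, capture_output=True, text=True)
        return f"stdout: {result.stdout}\nstderr: {result.stderr}\nexit code: {result.returncode}"
    # Report unknown tools to the model instead of silently returning None
    return f"Error: unknown tool {name!r}"
```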
The tool result goes back into the conversation as a "tool" message. The LLM reads it and decides the next action.
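An `if`/`elif` chain gets unwieldy as tools multiply. A common alternative, sketched here under the assumption that each tool is a plain Python function whose parameter names match its schema, is a registry keyed by tool name. `tool`, `TOOL_REGISTRY`, `add`, `shout` and `tool_message` are illustrative names, not part of any library.

```python
import json

TOOL_REGISTRY = {}

def tool(fn):
    """Decorator: register a function under its own name so the model can call it."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def add(a, b):  # toy tool for illustration
    return a + b

@tool
def shout(text):  # toy tool for illustration
    return text.upper()

def execute_tool(name, arguments):
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: unknown tool {name!r}"
    # Decoded JSON keys become keyword arguments
    return fn(**json.loads(arguments))

def tool_message(call_id, name, arguments):
    # Wrap the result as the "tool" message the loop appends to the history
    return {"role": "tool", "tool_call_id": call_id,
            "content": str(execute_tool(name, arguments))}

print(tool_message("call_1", "add", '{"a": 2, "b": 3}'))
# {'role': 'tool', 'tool_call_id': 'call_1', 'content': '5'}
```

Adding a tool then means writing one decorated function (plus its schema) rather than editing a central dispatcher.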
Step 4: The loop continues
The LLM sees the tool result and decides:
- Need more info? → Call another tool
- Need to modify something? → Call write_file or run_command
- Task complete? → Return a text response (loop ends)
A typical coding task might look like:
```text
User: "Fix the failing test in auth.test.js"
Agent: [calls run_command("npm test")] → sees test failure
Agent: [calls read_file("src/auth.js")] → reads the source
Agent: [calls read_file("tests/auth.test.js")] → reads the test
Agent: [calls write_file("src/auth.js", ...)] → fixes the bug
Agent: [calls run_command("npm test")] → verifies fix passes
Agent: "Fixed. The issue was..." → done
```
Five tool calls and a final answer: six turns of a single loop. That's essentially how Claude Code works.
What makes agents fail
The model makes a bad decision
The LLM picks the wrong tool, passes wrong arguments, or misunderstands the goal. This is the most common failure and the hardest to fix: it's a model quality issue.
Mitigation: Use better models, improve tool descriptions, add examples to the system prompt.
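Wrong arguments in particular can be caught before anything runs by checking each call against the tool's own schema and handing the problems back to the model as text, so it can retry. `validate_args` is a hypothetical helper that handles only `required`, unknown keys and basic types; for full JSON Schema support, the third-party `jsonschema` package provides `jsonschema.validate`.

```python
# Simplified mapping from JSON Schema types to Python types
# (note: bool passes as "integer"/"number" because bool subclasses int)
TYPE_MAP = {"string": str, "integer": int, "number": (int, float),
            "boolean": bool, "object": dict, "array": list}

def validate_args(schema, args):
    """Return a list of problems with `args` under a simplified JSON schema."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument '{key}'")
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument '{key}'")
            continue
        expected = TYPE_MAP.get(props[key].get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"argument '{key}' should be of type {props[key]['type']}")
    return problems

write_schema = {"type": "object",
                "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
                "required": ["path", "content"]}

print(validate_args(write_schema, {"path": 42}))
# ["missing required argument 'content'", "argument 'path' should be of type string"]
```

If the list is non-empty, return it as the tool result instead of executing the call.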
Context window overflow
After many tool calls, the conversation gets too long. It can eventually exceed the model's context window, and well before that point the model may start "forgetting" earlier context.
Mitigation: Summarize old context, limit tool output length, set maximum session length. See our context management guide.
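Two of those mitigations fit in a few lines. This sketch truncates long tool output (keeping the head and tail, where commands tend to print headers and final errors) and trims the history to the opening messages plus the most recent ones. It assumes the OpenAI-style message list from the core loop, with the system prompt and goal as the first two entries; the function names and limits are illustrative. Summarizing would replace the dropped middle with an LLM-written recap instead of discarding it.

```python
def truncate_output(text, max_chars=2000):
    """Keep the head and tail of long tool output, dropping the middle."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n... [{omitted} characters omitted] ...\n{text[-half:]}"

def trim_history(messages, keep_first=2, keep_last=10):
    """Keep the system prompt and goal plus the most recent messages."""
    if len(messages) <= keep_first + keep_last:
        return list(messages)
    start = len(messages) - keep_last
    # Never start on a "tool" message: the assistant turn that requested it would be cut
    while start < len(messages) and messages[start]["role"] == "tool":
        start += 1
    return messages[:keep_first] + messages[start:]

history = ([{"role": "system", "content": "sys"}, {"role": "user", "content": "goal"}]
           + [{"role": "assistant", "content": ""}, {"role": "tool", "content": "out"}] * 10)
trimmed = trim_history(history, keep_last=5)
print(len(history), len(trimmed))  # 22 6
```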
Infinite loops
The agent tries the same action repeatedly because it doesn't recognize failure.
Mitigation: Track action history, detect repetition, force alternative approaches after 3 retries. See our debugging guide.
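Repetition detection can be as simple as counting identical calls. In this sketch, the hypothetical `RepetitionGuard` keys each call on its name plus its arguments serialized with sorted keys, and returns a nudge once the same call is attempted more than three times. Feed that text back as the tool result instead of running the call.

```python
import json
from collections import Counter

class RepetitionGuard:
    """Count identical tool calls and flag one that repeats too often."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def check(self, name, args):
        # Sorted keys make {"a": 1, "b": 2} and {"b": 2, "a": 1} the same call
        key = (name, json.dumps(args, sort_keys=True))
        self.counts[key] += 1
        if self.counts[key] > self.max_repeats:
            return (f"You have already called {name} with these arguments "
                    f"{self.max_repeats} times. Try a different approach.")
        return None

guard = RepetitionGuard(max_repeats=3)
results = [guard.check("run_command", {"command": "npm test"}) for _ in range(4)]
print(results[:3])  # [None, None, None]
```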
Tool execution errors
The tool crashes, returns unexpected output, or times out.
Mitigation: Wrap tool execution in try/except, return clear error messages, and set timeouts.
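Here is a minimal sketch of those three mitigations for the `run_command` tool: the `timeout` argument of `subprocess.run` bounds runtime, exceptions become readable error strings, and output is capped. `run_command_safely`, `safe_call` and the default limits are assumptions for illustration. Catching every `Exception` in `safe_call` is deliberate here, because the model can only react to errors it can read.

```python
import subprocess

def run_command_safely(command, timeout=30, max_chars=4000):
    """Run a shell command without raising: failures come back as text."""
    try:
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout}s"
    except OSError as exc:
        return f"Error: could not start command: {exc}"
    output = f"stdout: {result.stdout}\nstderr: {result.stderr}\nexit code: {result.returncode}"
    return output[:max_chars]  # cap output so one command can't flood the context

def safe_call(fn, *args, **kwargs):
    """Generic wrapper: turn any exception raised by a tool into an error string."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        return f"Error: {type(exc).__name__}: {exc}"

print(run_command_safely("echo hi"))
print(safe_call(int, "abc"))  # Error: ValueError: invalid literal for int() with base 10: 'abc'
```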
MCP: standardized tool access
MCP (Model Context Protocol) standardizes how agents discover and use tools. Instead of defining tools in your code, MCP servers expose tools via a protocol:
```text
Agent ↔ MCP Client ↔ MCP Server (filesystem)
                   ↔ MCP Server (GitHub)
                   ↔ MCP Server (database)
```
This means the same tools work across Claude Code, Cursor, and any MCP-compatible client. See our MCP vs function calling comparison.
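Under the hood, MCP messages are JSON-RPC 2.0, and a server advertises its tools through `tools/list`, giving a `name`, `description` and `inputSchema` for each one. That maps almost directly onto the function-calling format from Step 1. This sketch only builds request messages and converts schemas; it omits the transport and the initialization handshake, the helper names are hypothetical, and the sample server response is invented. In practice you would use an MCP SDK rather than hand-rolling the protocol.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request, the message format MCP is built on."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def mcp_tool_to_function(tool):
    """Convert one entry of an MCP tools/list result into function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }

list_req = jsonrpc_request("tools/list")
call_req = jsonrpc_request("tools/call", {"name": "read_file",
                                          "arguments": {"path": "src/main.py"}})

# Invented sample of what a server's tools/list result could contain
listed = {"tools": [{"name": "read_file", "description": "Read a file",
                     "inputSchema": {"type": "object",
                                     "properties": {"path": {"type": "string"}},
                                     "required": ["path"]}}]}
functions = [mcp_tool_to_function(t) for t in listed["tools"]]

print(json.dumps(call_req))
# {"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "read_file", "arguments": {"path": "src/main.py"}}}
```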
Building your first agent
Start simple. This short agent, built on the Anthropic Python SDK, can read files, run commands, and fix bugs:
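```python
import json
import subprocess  # needed by execute_tool from Step 3

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The Messages API expects {"name", "description", "input_schema"},
# so convert the function-calling `tools` list from Step 1.
TOOLS = [
    {
        "name": t["function"]["name"],
        "description": t["function"].get("description", ""),
        "input_schema": t["function"]["parameters"],
    }
    for t in tools
]

def agent(goal, max_steps=20):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):  # hard cap on loop iterations
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason == "tool_use":
            # Record the assistant turn once, then answer every tool call in it
            messages.append({"role": "assistant", "content": response.content})
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    output = execute_tool(block.name, json.dumps(block.input))
                    results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)})
            # All tool results go back together in a single user message
            messages.append({"role": "user", "content": results})
        else:
            # No tool calls: return the model's final text
            return "".join(b.text for b in response.content if b.type == "text")
    return "Stopped: reached the step limit without a final answer."

print(agent("Read main.py and suggest improvements"))
```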
From here, add memory, security, cost limits, and observability as needed.
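Cost limits, for example, can start as a running tally of token usage. This sketch assumes prices expressed in USD per million tokens; the rates in the usage lines are placeholders, not any provider's actual prices, and `CostGuard` is a hypothetical helper.

```python
class CostGuard:
    """Track estimated spend across LLM calls and flag when a budget is exceeded."""

    def __init__(self, budget_usd, input_price_per_mtok, output_price_per_mtok):
        self.budget_usd = budget_usd
        self.input_price = input_price_per_mtok    # USD per million input tokens
        self.output_price = output_price_per_mtok  # USD per million output tokens
        self.spent_usd = 0.0

    def record(self, input_tokens, output_tokens):
        self.spent_usd += (input_tokens * self.input_price
                           + output_tokens * self.output_price) / 1_000_000
        return self.spent_usd

    def exceeded(self):
        return self.spent_usd > self.budget_usd

# Placeholder prices for illustration only
guard = CostGuard(budget_usd=0.05, input_price_per_mtok=2.0, output_price_per_mtok=10.0)
guard.record(10_000, 1_000)
print(guard.exceeded())  # False
guard.record(10_000, 1_000)
print(guard.exceeded())  # True
```

In the Anthropic example above, you would call `guard.record(response.usage.input_tokens, response.usage.output_tokens)` after each `messages.create` call and end the loop once `guard.exceeded()` returns true.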
Related: What is an AI Agent? · How to Build Multi-Agent Systems · Agent Memory Patterns · How to Debug AI Agents · Best AI Agent Frameworks · Tool Calling Patterns