An AI agent without state management is like a goldfish — every interaction starts from zero. The user explains their project, the agent helps, and next time the user comes back, the agent has forgotten everything. In production, this is unacceptable.
State management for AI agents is harder than for traditional apps because you’re managing three types of state simultaneously: conversation history (what was said), tool state (what was done), and world state (what changed in the environment).
The three types of agent state
1. Conversation state
The chat history between user and agent. This is what most people think of when they say “memory.”
```python
# Simplest form: a list of messages
conversation = [
    {"role": "user", "content": "I'm building a SaaS with Stripe billing"},
    {"role": "assistant", "content": "I'll help with that. What framework are you using?"},
    {"role": "user", "content": "Next.js with the App Router"},
]
```
The challenge: conversation history grows with every turn. A 50-turn conversation can be 20,000+ tokens. At some point, you need to summarize or truncate.
2. Tool state
What the agent has done: files read, APIs called, code executed. This matters for multi-step tasks where the agent needs to remember what it already tried.
tool_state = {
"files_read": ["src/auth/middleware.ts", "src/lib/stripe.ts"],
"files_modified": ["src/auth/middleware.ts"],
"commands_run": ["npm test -- --grep auth"],
"test_results": {"passed": 12, "failed": 2},
"current_task": "fixing the 2 failing auth tests",
}
3. World state
The external environment: database contents, file system state, deployment status. This is the hardest to manage because it changes independently of the agent.
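Because world state drifts underneath the agent, cached observations have to be verified before they are acted on. One defensive technique is drift detection: snapshot what the agent last saw, then re-check before the next action. A minimal sketch for file-system state (`snapshot_files` and `detect_drift` are illustrative names, not from any framework):

```python
import hashlib
from pathlib import Path

def snapshot_files(paths: list[str]) -> dict[str, str]:
    """Record a content hash per file so later changes can be detected."""
    return {
        p: hashlib.sha256(Path(p).read_bytes()).hexdigest()
        for p in paths
        if Path(p).exists()
    }

def detect_drift(snapshot: dict[str, str]) -> list[str]:
    """Return files whose contents changed (or disappeared) since the snapshot."""
    current = snapshot_files(list(snapshot))
    return [p for p, digest in snapshot.items() if current.get(p) != digest]
```

If `detect_drift` returns anything, the agent should re-read those files before editing them rather than trusting its cached view.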
Implementation patterns
Pattern 1: Session-based (simplest)
Store conversation history per session. Each session is independent.
```python
from agents import Runner
from agents.extensions.memory import SQLAlchemySession

# One session per user conversation
session = SQLAlchemySession.from_url(
    f"user_{user_id}_session_{session_id}",
    url="postgresql+asyncpg://user:pass@localhost/agents",
    create_tables=True,  # Create the session tables if they don't exist
)

result = await Runner.run(agent, message, session=session)
# The session automatically persists conversation history
```
The OpenAI Agents SDK supports multiple session backends:
| Backend | Best for | Persistence | Speed |
|---|---|---|---|
| SQLAlchemy (PostgreSQL) | Production | ✅ Durable | Good |
| SQLAlchemy (SQLite) | Development | ✅ File-based | Fast |
| Redis | High-throughput | Configurable TTL | Fastest |
| Encrypted Session | Sensitive data | ✅ Encrypted at rest | Good |
| In-memory | Testing only | ❌ Lost on restart | Fastest |
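To demystify what these backends have in common, here's a minimal sqlite-backed sketch of the core idea: append message items keyed by session ID, read them back in order. `MiniSession` is illustrative only, not the SDK's implementation:

```python
import json
import sqlite3

class MiniSession:
    """Minimal sketch of a persistent session store (not the SDK's code)."""

    def __init__(self, session_id: str, db_path: str = ":memory:"):
        self.session_id = session_id
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (session_id TEXT, item TEXT)"
        )

    def add_items(self, items: list[dict]) -> None:
        # Serialize each message as JSON under this session's ID
        self.conn.executemany(
            "INSERT INTO items VALUES (?, ?)",
            [(self.session_id, json.dumps(i)) for i in items],
        )
        self.conn.commit()

    def get_items(self) -> list[dict]:
        rows = self.conn.execute(
            "SELECT item FROM items WHERE session_id = ?", (self.session_id,)
        )
        return [json.loads(r[0]) for r in rows]
```

Every backend in the table is a variation on this pattern; they differ in where the rows live and how long they survive.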
Pattern 2: Sliding window
Keep only the last N messages in context. Older messages are stored but not sent to the model:
```python
MAX_CONTEXT_MESSAGES = 20

async def get_context(session_id: str, new_message: str):
    all_messages = await db.get_messages(session_id)
    # Always include the system prompt plus the last N messages
    context = [all_messages[0]]  # System prompt
    # Slice past index 0 so short conversations don't duplicate the system prompt
    context.extend(all_messages[1:][-MAX_CONTEXT_MESSAGES:])
    context.append({"role": "user", "content": new_message})
    return context
```
Simple but lossy. The agent forgets details from early in the conversation.
Pattern 3: Summary + recent
Summarize older messages, keep recent ones in full:
```python
async def get_context_with_summary(session_id: str):
    all_messages = await db.get_messages(session_id)
    if len(all_messages) > 30:
        # Summarize older messages
        old_messages = all_messages[1:-10]  # Skip system prompt and last 10
        summary = await summarize(old_messages)
        return [
            all_messages[0],  # System prompt
            {"role": "system", "content": f"Previous conversation summary: {summary}"},
            *all_messages[-10:],  # Last 10 messages in full
        ]
    return all_messages
```
This is what Claude Code does with its /compact command — summarize the session, replace history with the summary, keep going.
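Triggering compaction on message count, as in the `len(all_messages) > 30` check above, is crude; what actually costs money is tokens. A sketch of a token-budget trigger, assuming the common rough heuristic of ~4 characters per token for English text (an approximation, not an exact tokenizer):

```python
def estimate_tokens(messages: list[dict]) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return sum(len(m["content"]) for m in messages) // 4

def needs_compaction(messages: list[dict], budget: int = 8000) -> bool:
    """Trigger summarization once the estimated context size exceeds the budget."""
    return estimate_tokens(messages) > budget
```

For precise counts you'd use the model provider's tokenizer, but a cheap estimate like this is usually enough to decide when to compact.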
Pattern 4: Checkpoints and recovery
For long-running agents (like those in our AI Startup Race), save checkpoints so the agent can recover from crashes:
```python
import json
from datetime import datetime, timezone

async def save_checkpoint(agent_id: str, state: dict):
    checkpoint = {
        "agent_id": agent_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "conversation": state["conversation"],
        "tool_state": state["tool_state"],
        "current_task": state["current_task"],
        "budget_remaining": state["budget_remaining"],
    }
    await db.execute(
        "INSERT INTO checkpoints (agent_id, data) VALUES ($1, $2)",
        agent_id, json.dumps(checkpoint),
    )

async def restore_from_checkpoint(agent_id: str):
    row = await db.fetchone(
        "SELECT data FROM checkpoints WHERE agent_id = $1 ORDER BY id DESC LIMIT 1",
        agent_id,
    )
    if row:
        return json.loads(row["data"])
    return None
```
Save checkpoints after every significant action (file write, API call, deployment). If the agent crashes, restore from the last checkpoint instead of starting over.
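The recovery loop itself can be as simple as "skip the steps the checkpoint says are done." A minimal sketch with persistence abstracted into `save(agent_id, checkpoint)` and `restore(agent_id)` callbacks (hypothetical names; in production these would wrap database helpers like the ones above):

```python
def run_with_recovery(agent_id: str, steps, save, restore):
    """Resume a list of steps from the last completed index in the checkpoint."""
    checkpoint = restore(agent_id) or {"completed": 0}
    for i, step in enumerate(steps):
        if i < checkpoint["completed"]:
            continue  # This step finished before the crash; don't redo it
        step()
        checkpoint["completed"] = i + 1
        save(agent_id, checkpoint)  # Persist after every significant action
```

Re-running after a crash is then safe: completed steps are skipped, and work resumes exactly where it stopped.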
Database schema
A practical schema for production agent state:
```sql
CREATE TABLE sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id TEXT NOT NULL,
    agent_name TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    metadata JSONB DEFAULT '{}'
);

CREATE TABLE messages (
    id BIGSERIAL PRIMARY KEY,
    session_id UUID REFERENCES sessions(id),
    role TEXT NOT NULL, -- 'user', 'assistant', 'system', 'tool'
    content TEXT NOT NULL,
    tokens_used INTEGER,
    cost_usd NUMERIC(10, 6),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE tool_calls (
    id BIGSERIAL PRIMARY KEY,
    session_id UUID REFERENCES sessions(id),
    message_id BIGINT REFERENCES messages(id),
    tool_name TEXT NOT NULL,
    input JSONB,
    output JSONB,
    duration_ms INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_messages_session ON messages(session_id, created_at);
CREATE INDEX idx_tool_calls_session ON tool_calls(session_id);
```
This gives you full audit trails: every message, every tool call, every cost. Essential for debugging agents and cost management.
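With that schema, per-session cost reporting is a single aggregate query. A sketch of the same `GROUP BY` against an SQLite stand-in for the `messages` table (column types simplified; the PostgreSQL query is identical in shape):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the messages table (SQLite types, no foreign keys)
conn.execute(
    "CREATE TABLE messages (session_id TEXT, role TEXT, tokens_used INTEGER, cost_usd REAL)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("s1", "user", 120, 0.0003),
        ("s1", "assistant", 800, 0.0120),
        ("s2", "assistant", 400, 0.0060),
    ],
)
# Per-session token and dollar totals -- the audit query this schema enables
rows = conn.execute(
    "SELECT session_id, SUM(tokens_used), SUM(cost_usd) "
    "FROM messages GROUP BY session_id ORDER BY session_id"
).fetchall()
```

The same pattern extends to per-user or per-agent rollups by joining through `sessions`.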
Cross-session memory
Sometimes agents need to remember things across sessions — user preferences, project context, learned patterns:
```python
async def get_user_context(user_id: str) -> str:
    # Fetch persistent user facts
    facts = await db.fetch(
        "SELECT fact FROM user_memory WHERE user_id = $1 ORDER BY importance DESC LIMIT 10",
        user_id,
    )
    if facts:
        return "What I know about this user:\n" + "\n".join(f"- {f['fact']}" for f in facts)
    return ""

# Inject into the system prompt
system_prompt = f"""You are a coding assistant.

{await get_user_context(user_id)}

Help the user with their current request."""
```
For deeper patterns, see our agent memory patterns guide.
State management anti-patterns
Don’t store full tool outputs in conversation history. A file read that returns 5,000 tokens shouldn’t be in the conversation forever. Store a summary: “Read src/auth.ts (450 lines, JWT middleware).”
Don’t rely on in-memory state in production. Your server will restart. Your process will crash. Always persist to disk or database.
Don’t send the entire conversation history every time. Use sliding windows or summaries. A 100-message conversation costs 50,000+ tokens per request just for context.
Don’t forget to clean up old sessions. Set TTLs on session data. A user who hasn’t interacted in 30 days doesn’t need their full conversation history in hot storage.
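The TTL sweep can be a single scheduled `DELETE`. A sketch against an SQLite stand-in for the `sessions` table, using Unix timestamps in place of `TIMESTAMPTZ` (in PostgreSQL you'd compare against `NOW() - INTERVAL '30 days'` instead):

```python
import sqlite3
import time

TTL_SECONDS = 30 * 24 * 3600  # Evict sessions idle for more than 30 days

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the sessions table, with Unix timestamps for updated_at
conn.execute("CREATE TABLE sessions (id TEXT, updated_at REAL)")
now = time.time()
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("fresh", now), ("stale", now - 40 * 24 * 3600)],
)
# The sweep itself: run this from a daily cron or scheduled job
conn.execute("DELETE FROM sessions WHERE updated_at < ?", (now - TTL_SECONDS,))
remaining = [row[0] for row in conn.execute("SELECT id FROM sessions")]
```

Archiving the deleted rows to cold storage first (instead of dropping them outright) keeps the audit trail while freeing hot storage.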
Related: Agent Memory Patterns · How to Debug AI Agents · AI Agent Cost Management · Deploy AI Agents to Production · OpenAI Agents SDK Guide · LLM Observability