AI agents have amnesia. Every session starts from zero. The agent that brilliantly debugged your auth system yesterday has no memory of it today. Here are 4 patterns for fixing that.
Pattern 1: Conversation history (simplest)
Store the full conversation and replay it at the start of each session.
import json
import os

def load_history(session_id):
    """Return the saved message list, or an empty list for a new session."""
    try:
        with open(f"history/{session_id}.json") as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_history(session_id, messages):
    os.makedirs("history", exist_ok=True)  # create the directory on first save
    with open(f"history/{session_id}.json", "w") as f:
        json.dump(messages, f)

# Resume a conversation
# call_llm is a placeholder for your model client: it takes a message list and returns the reply text
messages = load_history("project-auth-refactor")
messages.append({"role": "user", "content": "Continue where we left off"})
response = call_llm(messages)
messages.append({"role": "assistant", "content": response})
save_history("project-auth-refactor", messages)
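Pros: Simple, preserves full context. Cons: The context window fills up fast. With a 200K-token window you get roughly 50 back-and-forth exchanges before hitting the limit, assuming each exchange averages about 4K tokens (pasted code and tool output add up quickly).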
When to use: Short-lived tasks (1-3 sessions), debugging sessions, code reviews.
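One way to delay hitting the limit is to drop the oldest messages once the history exceeds a budget. The sketch below is illustrative only: `estimate_tokens` and `trim_to_budget` are hypothetical helpers, and the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer.

```python
def estimate_tokens(message):
    """Rough token estimate: about 4 characters per token (heuristic, not a tokenizer)."""
    return max(1, len(message["content"]) // 4)


def trim_to_budget(messages, max_tokens):
    """Keep the most recent messages whose estimated total fits in max_tokens."""
    kept = []
    total = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 400},       # ~100 tokens
]
trimmed = trim_to_budget(history, max_tokens=250)
print([m["content"][0] for m in trimmed])  # ['b', 'c']
```

Trimming loses the oldest context outright, which is the gap the next pattern tries to close.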
Pattern 2: Summarized memory
Periodically summarize old conversations and keep only the summary + recent messages.
import json

def compress_history(messages, keep_recent=10):
    # Only compress once the history is well past the "recent" window
    if len(messages) <= keep_recent * 2:
        return messages
    old_messages = messages[:-keep_recent]
    recent_messages = messages[-keep_recent:]
    # call_llm is a placeholder for your model client; it returns the summary text
    summary = call_llm([
        {"role": "system", "content": "Summarize this conversation. Include key decisions, code changes, and unresolved issues."},
        {"role": "user", "content": json.dumps(old_messages)}
    ])
    return [
        {"role": "system", "content": f"Previous conversation summary:\n{summary}"},
        *recent_messages
    ]
Pros: Fits in any context window, preserves key information. Cons: Lossy; details get dropped during summarization.
When to use: Multi-day projects, ongoing agent tasks, the AI Startup Race agents.
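One detail the function above leaves open is what happens to an original system prompt: if it sits at the start of the history, it gets folded into the summary along with everything else. The variant below is a sketch under that assumption. It keeps a leading system message verbatim and takes the summarizer as a parameter so the shape of the output can be checked without a model. `compress_keep_system` and `fake_summarize` are hypothetical names.

```python
def compress_keep_system(messages, summarize, keep_recent=10):
    """Summarize older messages but keep a leading system prompt verbatim.

    `summarize` is any callable that takes a list of messages and returns a string.
    """
    # Separate an optional leading system prompt from the rest of the conversation.
    if messages and messages[0]["role"] == "system":
        head, body = [messages[0]], messages[1:]
    else:
        head, body = [], messages

    if len(body) <= keep_recent * 2:
        return messages  # short enough, nothing to compress

    old, recent = body[:-keep_recent], body[-keep_recent:]
    summary_message = {
        "role": "system",
        "content": f"Previous conversation summary:\n{summarize(old)}",
    }
    return head + [summary_message] + recent


def fake_summarize(old_messages):
    # Stand-in for a real model call, used only to show the output shape.
    return f"{len(old_messages)} earlier messages omitted"


conversation = [{"role": "system", "content": "You are a coding agent."}]
for i in range(12):
    conversation.append({"role": "user", "content": f"question {i}"})
    conversation.append({"role": "assistant", "content": f"answer {i}"})

compressed = compress_keep_system(conversation, fake_summarize, keep_recent=4)
print(len(compressed))  # 6: system prompt + summary + 4 recent messages
print(compressed[1]["content"])
```

In real use, `summarize` would wrap the same kind of model call that `compress_history` makes.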
Pattern 3: Vector store memory (semantic search)
Store every interaction as an embedding in a vector database. When the agent needs context, retrieve the most relevant past interactions.
import uuid

import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
db = chromadb.PersistentClient(path="./agent_memory")
collection = db.get_or_create_collection("memories")

def embed(text):
    """Embed one string with the text-embedding-3-small model."""
    return client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    ).data[0].embedding

def remember(text, metadata=None):
    collection.add(
        documents=[text],
        embeddings=[embed(text)],
        # Omit metadata when empty; some Chroma versions reject an empty dict
        metadatas=[metadata] if metadata else None,
        # uuid4 avoids the ID collisions that second-resolution timestamps cause
        ids=[f"mem_{uuid.uuid4().hex}"]
    )

def recall(query, n_results=5):
    results = collection.query(query_embeddings=[embed(query)], n_results=n_results)
    return results["documents"][0]
Pros: Scales to thousands of memories, retrieves only relevant context. Cons: Requires embeddings infrastructure, retrieval quality varies.
When to use: Long-running agents, knowledge-heavy tasks, agents that need to reference past decisions.
See our RAG guide and vector database comparison for implementation details.
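To make the retrieval step less of a black box, here is a dependency-free sketch of what a similarity search does. It is not how Chroma or the OpenAI embeddings work internally: `toy_embed` is a bag-of-words stand-in for a real embedding model, and cosine similarity is just one common choice (vector stores differ in their default distance metric). All function names are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_recall(query, memories, n_results=2):
    """Return the n_results memories most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        memories,
        key=lambda text: cosine_similarity(query_vec, toy_embed(text)),
        reverse=True,
    )
    return ranked[:n_results]


memories = [
    "chose session tokens over JWT for revocation support",
    "added rate limiting to the login endpoint",
    "redis session store needs connection pooling",
]
print(toy_recall("why did we pick session tokens", memories, n_results=1))
```

Word overlap only matches literal terms. A learned embedding can also match paraphrases, which is why the real pattern pays for an embedding model.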
Pattern 4: Structured state (most reliable)
Store agent state as structured data (JSON, database rows) rather than natural language. The agent reads and writes specific fields.
# agent_state.json
{
  "project": "auth-refactor",
  "status": "in_progress",
  "completed_tasks": [
    "Migrated from JWT to session tokens",
    "Added rate limiting to login endpoint"
  ],
  "pending_tasks": [
    "Add 2FA support",
    "Write integration tests"
  ],
  "decisions": {
    "auth_method": "session tokens (chosen over JWT for revocation support)",
    "rate_limit": "10 requests/minute per IP"
  },
  "known_issues": [
    "Redis session store needs connection pooling"
  ]
}
Inject this state into the system prompt:
state = load_state("auth-refactor")
system_prompt = f"""You are working on the auth-refactor project.
Current state:
{json.dumps(state, indent=2)}
Continue from where you left off. Update the state file when you complete tasks or make decisions."""
Pros: Precise, no information loss, easy to inspect and debug. Cons: Requires defining the state schema upfront, and the agent must be explicitly instructed to update it.
When to use: Production agents, multi-agent systems, any agent that runs for more than a few sessions.
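The snippet above calls `load_state` without defining it. A minimal sketch of that helper and a matching `save_state` might look like the following. The `state` directory, the default field values, the `state_dir` parameter, and the `complete_task` helper are all assumptions made for illustration, not part of the original pattern.

```python
import json
import os
import tempfile


def load_state(project_id, state_dir="state"):
    """Return the saved state, or a fresh empty state if none exists yet."""
    try:
        with open(os.path.join(state_dir, f"{project_id}.json")) as f:
            return json.load(f)
    except FileNotFoundError:
        return {
            "project": project_id,
            "status": "not_started",
            "completed_tasks": [],
            "pending_tasks": [],
            "decisions": {},
            "known_issues": [],
        }


def save_state(project_id, state, state_dir="state"):
    """Write to a temp file, then rename, so a crash mid-write cannot truncate the state."""
    os.makedirs(state_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    # The rename is atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_path, os.path.join(state_dir, f"{project_id}.json"))


def complete_task(state, task):
    """Move a task from pending to completed; raises ValueError if it is not pending."""
    state["pending_tasks"].remove(task)
    state["completed_tasks"].append(task)
    return state


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as demo_dir:
    state = load_state("auth-refactor", state_dir=demo_dir)
    state["pending_tasks"].append("Add 2FA support")
    complete_task(state, "Add 2FA support")
    save_state("auth-refactor", state, state_dir=demo_dir)
    print(load_state("auth-refactor", state_dir=demo_dir)["completed_tasks"])
    # ['Add 2FA support']
```

Routing updates through small functions like `complete_task`, rather than letting the agent rewrite the whole file, keeps the schema intact even when the model's output is sloppy.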
Which pattern to use
| Scenario | Pattern | Why |
|---|---|---|
| Quick debugging session | Conversation history | Simple, full context |
| Multi-day coding project | Summarized memory | Fits context window |
| Knowledge assistant | Vector store | Semantic retrieval |
| Production agent | Structured state | Reliable, debuggable |
| Complex agent system | Structured state + vector store | Best of both |
Combining patterns
The most robust agents use multiple patterns:
def build_agent_context(project_id, current_task):
    # 1. Structured state (always included)
    state = load_state(project_id)
    # 2. Relevant memories (semantic search, using recall from Pattern 3)
    memories = recall(current_task, n_results=3)
    # 3. Recent conversation (last 5 exchanges); load_recent_history is assumed
    #    to return them as a list of message dicts
    recent = load_recent_history(project_id, limit=5)
    system_prompt = f"""Project state:
{json.dumps(state, indent=2)}
Relevant past context:
{chr(10).join(memories)}
Continue working on: {current_task}"""
    return [{"role": "system", "content": system_prompt}] + recent
This gives the agent: structured knowledge of the project, semantic recall of relevant past work, and recent conversation context.
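`load_recent_history` is not defined anywhere above. One possible sketch, assuming the history is stored in the Pattern 1 file format, is shown here. The `history_dir` parameter and the `last_exchanges` helper are hypothetical, and counting an exchange as one user message plus one assistant message is an assumption.

```python
import json


def last_exchanges(messages, limit=5):
    """Return the last `limit` user/assistant exchanges (up to 2 * limit messages).

    System messages are skipped because build_agent_context supplies its own.
    """
    if limit <= 0:
        return []
    dialogue = [m for m in messages if m["role"] in ("user", "assistant")]
    return dialogue[-2 * limit:]


def load_recent_history(project_id, limit=5, history_dir="history"):
    """Load a Pattern 1 style history file and keep only the recent exchanges."""
    try:
        with open(f"{history_dir}/{project_id}.json") as f:
            messages = json.load(f)
    except FileNotFoundError:
        messages = []
    return last_exchanges(messages, limit)


sample = [{"role": "system", "content": "summary"}]
for i in range(8):
    sample.append({"role": "user", "content": f"q{i}"})
    sample.append({"role": "assistant", "content": f"a{i}"})

recent = last_exchanges(sample, limit=2)
print([m["content"] for m in recent])  # ['q6', 'a6', 'q7', 'a7']
```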
Related: How to Build Multi-Agent Systems · Agent Orchestration Patterns · What is RAG? · What is a Vector Database? · Tool Calling Patterns