Most AI agent interactions last seconds to minutes. But the most valuable agent work — building features, refactoring codebases, running multi-step experiments — takes hours. GLM-5.1 demonstrated this with 8-hour autonomous coding sessions. Claude Code supports sessions with 1M token context windows.
Long-running agents face problems short sessions don’t: context degradation, state corruption, crash recovery, and cost accumulation. Here’s how to manage them.
The context degradation problem
Every AI model has a context window. As the session grows, three things happen:
- Quality drops: The model pays less attention to information in the middle of long contexts (the “lost in the middle” problem)
- Cost increases: Every new message includes the full conversation history as input tokens
- Latency grows: More input tokens = slower responses
A 4-hour session with Claude Code can easily accumulate 200,000+ tokens of context. At that point, the model is spending more time re-reading old context than doing new work.
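The cost point is worth seeing in numbers: because each turn resends the full history, billed input tokens grow roughly quadratically with session length. A small sketch (turn sizes are an assumption for illustration):

```python
def cumulative_input_tokens(turn_sizes):
    """Total input tokens billed when every turn resends the full history."""
    total, history = 0, 0
    for size in turn_sizes:
        history += size   # the context keeps growing...
        total += history  # ...and each turn pays for all of it again
    return total

# 100 turns of ~2,000 tokens each: the history ends at 200k tokens,
# but total billed input is ~10.1M tokens.
print(cumulative_input_tokens([2_000] * 100))
```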
Strategy 1: Proactive compaction
Don’t wait for the context to fill up. Compact proactively:
```python
COMPACTION_THRESHOLD = 100_000  # tokens

async def maybe_compact(session):
    if session.total_tokens > COMPACTION_THRESHOLD:
        summary = await summarize_session(
            session.messages,
            focus="Keep technical decisions, file changes, and current task. "
                  "Drop debugging attempts that didn't work.",
        )
        session.replace_history(summary)
```
In Claude Code, use /compact with a focus instruction:
```
/compact Focus on the auth refactor. Drop the test debugging from earlier.
```
The key insight from Anthropic’s Thariq: compact before the model hits the context wall. Once the window is nearly full, compaction quality is at its worst, because the summarizer is itself working inside a degraded context.
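The `summarize_session` call above is where an LLM does the heavy lifting. To show the shape of the operation without an API call, here is a minimal heuristic sketch, assuming messages are role/content dicts: it keeps the first (system) message and the most recent turns, and collapses the middle into a summary stub that a real implementation would generate with a model.

```python
def compact_messages(messages, keep_recent=10):
    """Heuristic compaction: keep the system message and the most recent
    turns; collapse everything in between into one summary stub.
    In practice the stub would come from an LLM summarization call."""
    if len(messages) <= keep_recent + 1:
        return messages
    dropped = messages[1:-keep_recent]
    stub = {
        "role": "user",
        "content": f"[Summary of {len(dropped)} earlier messages: "
                   "technical decisions and file changes preserved, "
                   "failed debugging attempts dropped.]",
    }
    return [messages[0], stub] + messages[-keep_recent:]
```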
Strategy 2: Checkpoint and resume
Save state periodically so the agent can recover from crashes:
```python
import json
from datetime import datetime, timezone

class AgentCheckpoint:
    def __init__(self, agent_id, db):
        self.agent_id = agent_id
        self.db = db

    async def save(self, state):
        await self.db.execute("""
            INSERT INTO checkpoints (agent_id, state, created_at)
            VALUES ($1, $2, $3)
        """, self.agent_id, json.dumps({
            "task": state["current_task"],
            "files_modified": state["files_modified"],
            "decisions_made": state["decisions"],
            "budget_spent": state["budget_spent"],
            "conversation_summary": state["summary"],
        }), datetime.now(timezone.utc))

    async def restore(self):
        row = await self.db.fetchone("""
            SELECT state FROM checkpoints
            WHERE agent_id = $1 ORDER BY created_at DESC LIMIT 1
        """, self.agent_id)
        return json.loads(row["state"]) if row else None
```
Save after every significant action: file write, deployment, test run. If the process crashes, restore from the last checkpoint and continue.
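To see the save/restore cycle end to end without a real database, here is a self-contained sketch. `InMemoryDB` and the condensed `Checkpointer` are stand-ins for illustration, not real drivers:

```python
import asyncio
import json

class InMemoryDB:
    """Stand-in for the real database so the cycle can run anywhere."""
    def __init__(self):
        self.rows = []

    async def execute(self, _query, agent_id, state):
        self.rows.append((agent_id, state))

    async def fetchone(self, _query, agent_id):
        matches = [s for a, s in self.rows if a == agent_id]
        return {"state": matches[-1]} if matches else None

class Checkpointer:
    """Condensed version of the checkpoint pattern, minus timestamps."""
    def __init__(self, agent_id, db):
        self.agent_id, self.db = agent_id, db

    async def save(self, state):
        await self.db.execute("INSERT ...", self.agent_id, json.dumps(state))

    async def restore(self):
        row = await self.db.fetchone("SELECT ...", self.agent_id)
        return json.loads(row["state"]) if row else None

async def demo():
    cp = Checkpointer("agent-1", InMemoryDB())
    await cp.save({"current_task": "auth refactor", "budget_spent": 1.25})
    # Simulate a crash: a fresh process restores the last known state.
    return await cp.restore()

print(asyncio.run(demo()))
```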
This is exactly how the agents in our AI Startup Race work — each agent checkpoints after every session so it can resume the next day.
Strategy 3: Subagent delegation
Offload intermediate work to subagents that run in their own context:
```
Main agent (lean context):
  "Build the auth module"
  ├── Subagent 1: "Research JWT best practices" → returns summary
  ├── Subagent 2: "Write the middleware"        → returns code
  ├── Subagent 3: "Write tests"                 → returns test results
  └── Main agent: integrates results, moves to next task
```
Each subagent gets a fresh context window. Only the results come back to the main agent, not the intermediate steps. This keeps the main context lean.
Gemini CLI and Claude Code both support this pattern natively. The mental test: “Will I need the tool output again, or just the conclusion?” If just the conclusion, use a subagent.
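The orchestration logic behind that diagram is simple. A generic sketch, with `run_subagent` stubbed out (in practice it would spawn a real agent session in its own context window):

```python
import asyncio

async def run_subagent(task):
    """Hypothetical subagent call: runs in its own fresh context and
    returns only a short conclusion, never its intermediate steps.
    Stubbed here for illustration."""
    return f"result: {task}"

async def build_auth_module():
    subtasks = [
        "Research JWT best practices",
        "Write the middleware",
        "Write tests",
    ]
    # Each subagent burns its own context window; only these short
    # results land in the main agent's history.
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    return dict(zip(subtasks, results))

print(asyncio.run(build_auth_module()))
```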
Strategy 4: Task decomposition
Break long tasks into independent phases:
```python
import json

PHASES = [
    {"name": "analysis", "prompt": "Analyze the codebase and create a plan", "max_tokens": 50_000},
    {"name": "implementation", "prompt": "Implement the plan from the analysis phase", "max_tokens": 200_000},
    {"name": "testing", "prompt": "Write and run tests for the implementation", "max_tokens": 100_000},
    {"name": "review", "prompt": "Review the changes and fix any issues", "max_tokens": 50_000},
]

async def run_phased(agent, task, phases):
    # Runner, summarize, and checkpoint are assumed to exist in scope.
    context = {"task": task}
    for phase in phases:
        # Start a fresh context for each phase
        result = await Runner.run(
            agent,
            f"Phase: {phase['name']}\n"
            f"Instructions: {phase['prompt']}\n"
            f"Task: {task}\n"
            f"Previous context: {json.dumps(context)}",
            max_tokens=phase["max_tokens"],
        )
        # Carry forward only the essential context
        context[phase["name"]] = summarize(result.final_output)
        await checkpoint.save(context)
```
Each phase starts with a clean context plus a summary of previous phases. This prevents context degradation while maintaining continuity.
Strategy 5: Handoff notes
When switching sessions (end of day, context full, crash recovery), write explicit handoff notes:
```
Current state:
- Refactoring auth module (60% complete)
- Files modified: src/auth/middleware.ts, src/auth/jwt.ts
- Approach: replacing session-based auth with JWT
- Constraint: must maintain backward compatibility with v1 API
- Ruled out: OAuth2 (too complex for current scope)
- Next step: update the user model to store refresh tokens
- Blocked on: need to decide token expiration policy
```
In Claude Code, use /clear and paste this as the first message of the new session. It’s more work than /compact but gives the cleanest context.
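If you already checkpoint structured state, the handoff note can be rendered from it instead of written by hand. A minimal sketch; the field names mirror the example above and are assumptions, not a standard schema:

```python
def handoff_note(state):
    """Render a handoff note from checkpoint-style state.
    Field names are illustrative, not a standard."""
    lines = [
        "Current state:",
        f"- Task: {state['task']} ({state['progress']} complete)",
        f"- Files modified: {', '.join(state['files_modified'])}",
        f"- Next step: {state['next_step']}",
    ]
    if state.get("blocked_on"):
        lines.append(f"- Blocked on: {state['blocked_on']}")
    return "\n".join(lines)
```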
Cost management for long sessions
An 8-hour session can cost $5-50 depending on the model and activity level:
| Model | 8-hour session (estimated) | Strategy |
|---|---|---|
| GPT-4o | $20-50 | Use for complex reasoning only |
| GPT-4o-mini | $2-5 | Good for routine coding |
| Claude Sonnet | $15-40 | Good balance |
| DeepSeek | $1-3 | Cheapest API option |
| Ollama local | $0 (electricity) | Free but slower |
The AI Startup Race agents use model routing: expensive models for planning and architecture, cheap models for routine implementation. See our cost management guide for detailed strategies.
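In its simplest form, model routing is just a lookup from task kind to model tier. A sketch; the routing table and model names are assumptions for illustration, not the actual configuration:

```python
# Hypothetical routing table: expensive models for high-leverage
# reasoning, cheap models for routine work.
ROUTES = {
    "planning": "gpt-4o",
    "architecture": "gpt-4o",
    "implementation": "gpt-4o-mini",
    "boilerplate": "deepseek-chat",
}

def pick_model(task_kind, default="gpt-4o-mini"):
    """Route a task to a model tier; unknown kinds fall back to the cheap default."""
    return ROUTES.get(task_kind, default)
```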
When to end a session
End the session and start fresh when:
- Context exceeds 70% of the model’s window
- The agent starts repeating itself or going in circles
- You’re switching to a fundamentally different task
- Quality noticeably degrades (responses become vague or generic)
- The session has been running for 4+ hours without compaction
Don’t end the session when:
- You’re in the middle of a multi-step task (compact instead)
- The agent has important context about your codebase (save handoff notes first)
- You’re debugging a specific issue (the debugging context is valuable)
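The "end the session" checklist can be encoded as a mechanical guard. A sketch using the thresholds from the list above (70% of the window, 4 hours); detecting repetition or vague responses is left to the caller as a boolean flag:

```python
def should_end_session(tokens_used, window_size, hours_running,
                       compacted_recently, quality_degraded):
    """End-session heuristics: thresholds come from the checklist above."""
    if tokens_used > 0.7 * window_size:
        return True          # context is too close to the wall
    if quality_degraded:
        return True          # responses are vague, repetitive, or generic
    if hours_running >= 4 and not compacted_recently:
        return True          # long run with no compaction
    return False
```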
Related: GLM-5.1 Agentic Engineering · Agent Memory Patterns · AI Agent State Management · AI Agent Cost Management · How to Use Claude Code · Gemini CLI Subagents