Most AI agent interactions last seconds to minutes. But the most valuable agent work — building features, refactoring codebases, running multi-step experiments — takes hours. GLM-5.1 demonstrated this with 8-hour autonomous coding sessions. Claude Code supports sessions with 1M token context windows.
Long-running agents face problems short sessions don’t: context degradation, state corruption, crash recovery, and cost accumulation. Here’s how to manage them.
The context degradation problem
Every AI model has a context window. As the session grows, three things happen:
- Quality drops: The model pays less attention to information in the middle of long contexts (the “lost in the middle” problem)
- Cost increases: Every new message includes the full conversation history as input tokens
- Latency grows: More input tokens = slower responses
A 4-hour session with Claude Code can easily accumulate 200,000+ tokens of context. At that point, the model is spending more time re-reading old context than doing new work.
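The cost point is worth seeing in numbers: because each turn resends the full history, billed input tokens grow roughly quadratically with session length. A small sketch (turn sizes are an assumption for illustration):

```python
def cumulative_input_tokens(turn_sizes):
    """Total input tokens billed when every turn resends the full history."""
    total, history = 0, 0
    for size in turn_sizes:
        history += size   # the context keeps growing...
        total += history  # ...and each turn pays for all of it again
    return total

# 100 turns of ~2,000 tokens each: the history ends at 200k tokens,
# but total billed input is ~10.1M tokens.
print(cumulative_input_tokens([2_000] * 100))
```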
Strategy 1: Proactive compaction
Don’t wait for the context to fill up. Compact proactively:
```python
COMPACTION_THRESHOLD = 100_000  # tokens

async def maybe_compact(session):
    if session.total_tokens > COMPACTION_THRESHOLD:
        summary = await summarize_session(
            session.messages,
            focus="Keep technical decisions, file changes, and current task. "
                  "Drop debugging attempts that didn't work.",
        )
        session.replace_history(summary)
```
In Claude Code, use /compact with a focus instruction:
```
/compact Focus on the auth refactor. Drop the test debugging from earlier.
```
The key insight from Anthropic’s Thariq: compact before the model hits the context wall. Once the window is nearly full, compaction quality is at its worst, because the summarizer is itself working inside a degraded context.
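The `summarize_session` call above is where an LLM does the heavy lifting. To show the shape of the operation without an API call, here is a minimal heuristic sketch, assuming messages are role/content dicts: it keeps the first (system) message and the most recent turns, and collapses the middle into a summary stub that a real implementation would generate with a model.

```python
def compact_messages(messages, keep_recent=10):
    """Heuristic compaction: keep the system message and the most recent
    turns; collapse everything in between into one summary stub.
    In practice the stub would come from an LLM summarization call."""
    if len(messages) <= keep_recent + 1:
        return messages
    dropped = messages[1:-keep_recent]
    stub = {
        "role": "user",
        "content": f"[Summary of {len(dropped)} earlier messages: "
                   "technical decisions and file changes preserved, "
                   "failed debugging attempts dropped.]",
    }
    return [messages[0], stub] + messages[-keep_recent:]
```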
Strategy 2: Checkpoint and resume
Save state periodically so the agent can recover from crashes:
```python
import json
from datetime import datetime, timezone

class AgentCheckpoint:
    def __init__(self, agent_id, db):
        self.agent_id = agent_id
        self.db = db

    async def save(self, state):
        await self.db.execute("""
            INSERT INTO checkpoints (agent_id, state, created_at)
            VALUES ($1, $2, $3)
        """, self.agent_id, json.dumps({
            "task": state["current_task"],
            "files_modified": state["files_modified"],
            "decisions_made": state["decisions"],
            "budget_spent": state["budget_spent"],
            "conversation_summary": state["summary"],
        }), datetime.now(timezone.utc))

    async def restore(self):
        row = await self.db.fetchone("""
            SELECT state FROM checkpoints
            WHERE agent_id = $1 ORDER BY created_at DESC LIMIT 1
        """, self.agent_id)
        return json.loads(row["state"]) if row else None
```
Save after every significant action: file write, deployment, test run. If the process crashes, restore from the last checkpoint and continue.
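To see the save/restore cycle end to end without a real database, here is a self-contained sketch. `InMemoryDB` and the condensed `Checkpointer` are stand-ins for illustration, not real drivers:

```python
import asyncio
import json

class InMemoryDB:
    """Stand-in for the real database so the cycle can run anywhere."""
    def __init__(self):
        self.rows = []

    async def execute(self, _query, agent_id, state):
        self.rows.append((agent_id, state))

    async def fetchone(self, _query, agent_id):
        matches = [s for a, s in self.rows if a == agent_id]
        return {"state": matches[-1]} if matches else None

class Checkpointer:
    """Condensed version of the checkpoint pattern, minus timestamps."""
    def __init__(self, agent_id, db):
        self.agent_id, self.db = agent_id, db

    async def save(self, state):
        await self.db.execute("INSERT ...", self.agent_id, json.dumps(state))

    async def restore(self):
        row = await self.db.fetchone("SELECT ...", self.agent_id)
        return json.loads(row["state"]) if row else None

async def demo():
    cp = Checkpointer("agent-1", InMemoryDB())
    await cp.save({"current_task": "auth refactor", "budget_spent": 1.25})
    # Simulate a crash: a fresh process restores the last known state.
    return await cp.restore()

print(asyncio.run(demo()))
```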
This is exactly how the agents in our AI Startup Race work — each agent checkpoints after every session so it can resume the next day.
Strategy 3: Subagent delegation
Offload intermediate work to subagents that run in their own context:
```
Main agent (lean context):
  "Build the auth module"
  ├── Subagent 1: "Research JWT best practices" → returns summary
  ├── Subagent 2: "Write the middleware"        → returns code
  ├── Subagent 3: "Write tests"                 → returns test results
  └── Main agent: integrates results, moves to next task
```
Each subagent gets a fresh context window. Only the results come back to the main agent, not the intermediate steps. This keeps the main context lean.
Gemini CLI and Claude Code both support this pattern natively. The mental test: “Will I need the tool output again, or just the conclusion?” If just the conclusion, use a subagent.
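The orchestration logic behind that diagram is simple. A generic sketch, with `run_subagent` stubbed out (in practice it would spawn a real agent session in its own context window):

```python
import asyncio

async def run_subagent(task):
    """Hypothetical subagent call: runs in its own fresh context and
    returns only a short conclusion, never its intermediate steps.
    Stubbed here for illustration."""
    return f"result: {task}"

async def build_auth_module():
    subtasks = [
        "Research JWT best practices",
        "Write the middleware",
        "Write tests",
    ]
    # Each subagent burns its own context window; only these short
    # results land in the main agent's history.
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    return dict(zip(subtasks, results))

print(asyncio.run(build_auth_module()))
```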
Strategy 4: Task decomposition
Break long tasks into independent phases:
```python
import json

PHASES = [
    {"name": "analysis", "prompt": "Analyze the codebase and create a plan", "max_tokens": 50_000},
    {"name": "implementation", "prompt": "Implement the plan from the analysis phase", "max_tokens": 200_000},
    {"name": "testing", "prompt": "Write and run tests for the implementation", "max_tokens": 100_000},
    {"name": "review", "prompt": "Review the changes and fix any issues", "max_tokens": 50_000},
]

async def run_phased(agent, task, phases):
    # Runner, summarize, and checkpoint are assumed to exist in scope.
    context = {"task": task}
    for phase in phases:
        # Start a fresh context for each phase
        result = await Runner.run(
            agent,
            f"Phase: {phase['name']}\n"
            f"Instructions: {phase['prompt']}\n"
            f"Task: {task}\n"
            f"Previous context: {json.dumps(context)}",
            max_tokens=phase["max_tokens"],
        )
        # Carry forward only the essential context
        context[phase["name"]] = summarize(result.final_output)
        await checkpoint.save(context)
```
Each phase starts with a clean context plus a summary of previous phases. This prevents context degradation while maintaining continuity.
Strategy 5: Handoff notes
When switching sessions (end of day, context full, crash recovery), write explicit handoff notes:
```
Current state:
- Refactoring auth module (60% complete)
- Files modified: src/auth/middleware.ts, src/auth/jwt.ts
- Approach: replacing session-based auth with JWT
- Constraint: must maintain backward compatibility with v1 API
- Ruled out: OAuth2 (too complex for current scope)
- Next step: update the user model to store refresh tokens
- Blocked on: need to decide token expiration policy
```
In Claude Code, use /clear and paste this as the first message of the new session. It’s more work than /compact but gives the cleanest context.
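If you already checkpoint structured state, the handoff note can be rendered from it instead of written by hand. A minimal sketch; the field names mirror the example above and are assumptions, not a standard schema:

```python
def handoff_note(state):
    """Render a handoff note from checkpoint-style state.
    Field names are illustrative, not a standard."""
    lines = [
        "Current state:",
        f"- Task: {state['task']} ({state['progress']} complete)",
        f"- Files modified: {', '.join(state['files_modified'])}",
        f"- Next step: {state['next_step']}",
    ]
    if state.get("blocked_on"):
        lines.append(f"- Blocked on: {state['blocked_on']}")
    return "\n".join(lines)
```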
Cost management for long sessions
An 8-hour session can cost $5-50 depending on the model and activity level:
| Model | 8-hour session (estimated) | Strategy |
|---|---|---|
| GPT-4o | $20-50 | Use for complex reasoning only |
| GPT-4o-mini | $2-5 | Good for routine coding |
| Claude Sonnet | $15-40 | Good balance |
| DeepSeek | $1-3 | Cheapest API option |
| Ollama local | $0 (electricity) | Free but slower |
The AI Startup Race agents use model routing: expensive models for planning and architecture, cheap models for routine implementation. See our cost management guide for detailed strategies.
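In its simplest form, model routing is just a lookup from task kind to model tier. A sketch; the routing table and model names are assumptions for illustration, not the actual configuration:

```python
# Hypothetical routing table: expensive models for high-leverage
# reasoning, cheap models for routine work.
ROUTES = {
    "planning": "gpt-4o",
    "architecture": "gpt-4o",
    "implementation": "gpt-4o-mini",
    "boilerplate": "deepseek-chat",
}

def pick_model(task_kind, default="gpt-4o-mini"):
    """Route a task to a model tier; unknown kinds fall back to the cheap default."""
    return ROUTES.get(task_kind, default)
```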
When to end a session
End the session and start fresh when:
- Context exceeds 70% of the model’s window
- The agent starts repeating itself or going in circles
- You’re switching to a fundamentally different task
- Quality noticeably degrades (responses become vague or generic)
- The session has been running for 4+ hours without compaction
Don’t end the session when:
- You’re in the middle of a multi-step task (compact instead)
- The agent has important context about your codebase (save handoff notes first)
- You’re debugging a specific issue (the debugging context is valuable)
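The "end the session" checklist can be encoded as a mechanical guard. A sketch using the thresholds from the list above (70% of the window, 4 hours); detecting repetition or vague responses is left to the caller as a boolean flag:

```python
def should_end_session(tokens_used, window_size, hours_running,
                       compacted_recently, quality_degraded):
    """End-session heuristics: thresholds come from the checklist above."""
    if tokens_used > 0.7 * window_size:
        return True          # context is too close to the wall
    if quality_degraded:
        return True          # responses are vague, repetitive, or generic
    if hours_running >= 4 and not compacted_recently:
        return True          # long run with no compaction
    return False
```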
Related: GLM-5.1 Agentic Engineering · Agent Memory Patterns · AI Agent State Management · AI Agent Cost Management · How to Use Claude Code · Gemini CLI Subagents