πŸ€– AI Tools
· 2 min read

Context Packing Strategies for AI Coding Agents


Your AI coding agent has a finite context window. How you fill it determines output quality. Here are the strategies that work.

Strategy 1: Repo maps (what Aider does)

Instead of loading entire files, build a map of your codebase (function signatures, class definitions, imports) without the implementation details.

Aider uses tree-sitter to parse your repo into an AST map. The model sees the structure of every file without the cost of loading every line. When it needs details, it requests specific files.

Cost: ~500-2000 tokens for a medium project (vs 50K+ for all files)
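Aider's actual implementation uses tree-sitter so it works across languages; as a rough sketch of the same idea for Python-only code, the stdlib ast module can extract signatures while discarding bodies:

```python
import ast

def repo_map(source: str) -> list[str]:
    """Collect signature-level entries from a module, dropping bodies."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name}")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            entries.append(f"def {node.name}({args})")
    return entries

module = '''
class UserStore:
    def get_user(self, user_id):
        ...  # body never enters the context window

def hash_password(password, salt):
    ...
'''
print(repo_map(module))
```

The model sees that `get_user` and `hash_password` exist and what they take; it only pays for their bodies if it asks for the file.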

Strategy 2: Selective file loading

Only load files relevant to the current task:

/add src/auth.ts         # File being edited
/add src/types/user.ts   # Type definitions needed
/read src/middleware.ts  # Reference only, don't edit

Claude Code and Aider both support this. The /read flag is critical: it tells the model "use this for context but don't modify it," saving output tokens.
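In prompt terms, the editable/read-only split just labels each file section. A minimal sketch (the `pack_files` helper is hypothetical; Aider and Claude Code handle this internally):

```python
def pack_files(editable: dict[str, str], read_only: dict[str, str]) -> str:
    """Label each file so the model knows which ones it may modify."""
    parts = [f"### {path} (editable)\n{text}" for path, text in editable.items()]
    parts += [f"### {path} (read-only, do not edit)\n{text}"
              for path, text in read_only.items()]
    return "\n\n".join(parts)

context = pack_files(
    {"src/auth.ts": "export function login() { /* ... */ }"},
    {"src/middleware.ts": "export function requireAuth() { /* ... */ }"},
)
print(context)
```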

Strategy 3: Conversation summarization

Long coding sessions accumulate history. Instead of keeping every message:

Messages 1-48: [Summary: Fixed auth bug, added rate limiting, updated tests]
Message 49: [Full content]
Message 50: [Full content - current]

This is how GLM-5.1 maintains coherence over 8-hour sessions: it summarizes older context rather than dropping it.
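The mechanics are simple; in a real agent the summary text comes from a cheap LLM call, stubbed out here:

```python
def compact_history(messages: list[str], keep_last: int = 2) -> list[str]:
    """Collapse everything but the newest messages into one summary entry."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    # In practice, summarize `older` with a small/cheap model; stubbed here.
    summary = f"[Summary of {len(older)} earlier messages]"
    return [summary] + recent

history = [f"message {i}" for i in range(1, 51)]
print(compact_history(history))
# ['[Summary of 48 earlier messages]', 'message 49', 'message 50']
```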

Strategy 4: Priority ordering

Models pay more attention to the beginning and end of context ("lost in the middle" problem). Put the most important information first and last:

1. System prompt (first, always attended to)
2. Current error/task (high priority)
3. Relevant code files
4. Background context (middle, less attended)
5. Specific instruction (last, well attended)
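The ordering above can be enforced in a small assembly function (a sketch; the segment names are mine):

```python
def pack_context(system: str, task: str, files: list[str],
                 background: str, instruction: str) -> str:
    """Place high-priority text at the edges, background in the middle."""
    segments = [system, task, *files, background, instruction]
    return "\n\n".join(s for s in segments if s)

prompt = pack_context(
    system="You are a careful TypeScript refactoring assistant.",
    task="Error: cannot read property 'user' of undefined in auth.ts:42",
    files=["// src/auth.ts\n// (file contents here)"],
    background="// project conventions, style guide excerpts",
    instruction="Fix the error above; change only src/auth.ts.",
)
print(prompt.splitlines()[0])
```

Note the instruction is repeated last even though the task appears near the top; both edges are well attended.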

Strategy 5: Context-aware retrieval

Use RAG or MCP to pull in context on demand rather than pre-loading everything:

User: "Fix the database connection timeout"
→ MCP server retrieves: connection config, recent error logs, DB schema
→ Only relevant context loaded, not the entire codebase
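As a toy illustration of that flow, keyword overlap stands in for embeddings or an MCP tool call (the source names and contents are invented):

```python
SOURCES = {
    "connection config": "pool_size=5, connect_timeout=30s",
    "recent error logs": "TimeoutError: could not acquire connection after 30s",
    "database schema": "users(id, email), sessions(id, user_id, expires_at)",
}

def retrieve(query: str) -> dict[str, str]:
    """Toy retriever: return only sources sharing words with the query."""
    words = set(query.lower().split())
    return {
        name: text for name, text in SOURCES.items()
        if words & (set(name.split()) | set(text.lower().split()))
    }

print(sorted(retrieve("Fix the database connection timeout")))
# ['connection config', 'database schema', 'recent error logs']
```

All three sources match here, but an unrelated query would load none of them, which is the point: the codebase stays out of context until the task demands it.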

Strategy 6: File truncation

For large files, load only the relevant section:

# Instead of loading all 2000 lines of utils.py
# Load lines 150-200 where the relevant function is

Our AI race orchestrator truncates PROGRESS.md and backlog files to the last 60 lines before each session β€” keeping recent context while saving tokens.
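Tail truncation is a one-liner; a minimal sketch of what the orchestrator does (the `tail` helper name is mine):

```python
def tail(text: str, max_lines: int = 60) -> str:
    """Keep only the last max_lines lines of a file's contents."""
    return "\n".join(text.splitlines()[-max_lines:])

progress = "\n".join(f"entry {i}" for i in range(200))
recent = tail(progress)
print(recent.splitlines()[0], "...", recent.splitlines()[-1])
# entry 140 ... entry 199
```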

The cost connection

Better context packing = fewer tokens = lower costs. Combined with prompt caching, good context engineering can reduce API costs by 50-70%.

Related: What is Context Engineering? · Prompt Engineering vs Context Engineering · How to Reduce LLM API Costs · KV Cache Explained · Minimax M2 7 Agentic Coding · Retrieval vs Memory vs Tools