Context Engineering
Memory Management -- Budget, Not Compress
Context Engineering > Prompt Engineering; proactive budget allocation, not reactive compression
"Context Engineering > Prompt Engineering" -- The core engineering discipline for AI agents.
Harness layer: Memory Management -- Proactive context curation, not reactive compression.
Problem
s06 solves "what to do when context is full" (compression). But the real question is: how to manage context proactively from the start? Not reacting to overflow, but budgeting every token — like financial planning, not bankruptcy proceedings.
Solution
Context Window = A finite budget
┌────────────────────────────────────────────────────┐
│                                                    │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────────┐  │
│  │System│ │Config│ │Tools │ │Buffer│ │ History  │  │
│  │ 10%  │ │  5%  │ │ 10%  │ │ 15%  │ │   60%    │  │
│  │Incom-│ │Layer │ │Dyna- │ │Curr. │ │ Largest  │  │
│  │press.│ │over- │ │mic   │ │turn  │ │compress. │  │
│  │      │ │ride  │ │(MCP) │ │      │ │ region   │  │
│  └──────┘ └──────┘ └──────┘ └──────┘ └──────────┘  │
│                                                    │
│  Three proactive management strategies:            │
│  1. Budget Allocation — manage tokens by region    │
│  2. Layered Config — global → project → dir        │
│  3. Agentic Memory — proactively save findings     │
└────────────────────────────────────────────────────┘
Core Concepts
1. Context Budget Manager
Divides the context window into 5 regions, each with a token budget:
| Region | Budget | Content | Characteristics |
|---|---|---|---|
| system | 10% | System prompt | Incompressible |
| config | 5% | Layered config | CLAUDE.md etc. |
| tools | 10% | Tool definitions | Dynamic discovery (MCP) |
| history | 60% | Conversation history | Largest compressible region |
| buffer | 15% | Current turn's tool results | Temporary buffer |
When any single region exceeds its budget, the manager triggers compression for that specific region — not globally.
```python
class ContextBudgetManager:
    def __init__(self, max_tokens=100000):
        self.budget = {
            "system": int(max_tokens * 0.10),   # 10,000 tokens
            "config": int(max_tokens * 0.05),   #  5,000 tokens
            "tools": int(max_tokens * 0.10),    # 10,000 tokens
            "history": int(max_tokens * 0.60),  # 60,000 tokens
            "buffer": int(max_tokens * 0.15),   # 15,000 tokens
        }
        self.usage = {k: 0 for k in self.budget}

    def update_usage(self, region, text):
        """Record a region's current size (rough ~4 chars/token estimate)."""
        self.usage[region] = len(text) // 4

    def needs_compaction(self, region):
        """Check if a specific region exceeds its budget."""
        return self.usage[region] > self.budget[region]

    def check_budget(self):
        """Usage report for every region."""
        return {
            region: {"used": self.usage[region], "budget": budget,
                     "pct": round(100 * self.usage[region] / budget, 1)}
            for region, budget in self.budget.items()
        }

    def get_status(self):
        """Visual budget report."""
        for region, info in self.check_budget().items():
            filled = min(10, int(info["pct"] / 10))
            bar = "█" * filled + "░" * (10 - filled)
            print(f"  {region:8s} [{bar}] {info['used']}/{info['budget']} ({info['pct']}%)")
```
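The budget numbers above presuppose a token count for each region. A minimal sketch of that estimation step, assuming the common ~4 characters/token heuristic for English text (a real agent would call the provider's token-counting API; `estimate_tokens` and `region_usage_pct` are hypothetical helpers, not part of the s15 code):

```python
# Budget ratios mirror the table above
BUDGET_RATIOS = {"system": 0.10, "config": 0.05, "tools": 0.10,
                 "history": 0.60, "buffer": 0.15}

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def region_usage_pct(text: str, region: str, max_tokens: int = 100_000) -> float:
    """How much of a region's budget the given text would consume, in percent."""
    budget = int(max_tokens * BUDGET_RATIOS[region])
    return 100 * estimate_tokens(text) / budget
```

The same 400-character string consumes 2% of the small `config` budget but well under 1% of `history`, which is why per-region accounting matters.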
2. Layered Configuration
Simulates Claude Code's CLAUDE.md layered strategy. More specific configs override more general ones:
~/.agent/config ← Global defaults (coding style, language)
└─ /project/AGENT.md ← Project overrides (tech stack, conventions)
└─ /project/src/AGENT.md ← Directory overrides (module-specific rules)
```python
import os

class LayeredConfig:
    def __init__(self, project_root):
        self.project_root = project_root

    def load(self, current_dir):
        """Collect config layers from least to most specific."""
        paths = [
            os.path.expanduser("~/.agent/config"),  # Layer 1: Global
            f"{self.project_root}/AGENT.md",        # Layer 2: Project
            f"{current_dir}/AGENT.md",              # Layer 3: Directory
        ]
        configs = []
        for path in paths:
            if os.path.exists(path):
                with open(path) as f:
                    configs.append(f.read())
        # Concatenated in order; later (more specific) layers override earlier ones
        return "\n\n".join(configs)
```
3. Agentic Memory
Unlike s06's passive compression, Agentic Memory is proactive: the agent identifies "this is worth remembering" during work and writes it to a memory file. On next session start, memories are loaded automatically:
```python
import os

class AgenticMemory:
    def __init__(self, memory_file="AGENT_MEMORY.md"):
        self.memory_file = memory_file

    def save(self, key, value):
        """Agent proactively saves an important finding."""
        with open(self.memory_file, "a") as f:
            f.write(f"\n## {key}\n{value}\n")

    def load(self):
        """Load all memories at session start (empty on first run)."""
        if not os.path.exists(self.memory_file):
            return ""
        with open(self.memory_file) as f:
            return f.read()
```
Example memories:
- "This project uses SQLAlchemy 2.0 async syntax, not legacy 1.x"
- "CI pipeline requires Python 3.11+, not 3.9"
- "User prefers type hints on all public functions"
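Because `save()` appends `## key` sections to a plain Markdown file, memories can be parsed back into a dictionary at load time. A sketch of that step (`parse_memories` is a hypothetical helper, not part of the s15 agent):

```python
import re

def parse_memories(text: str) -> dict:
    """Split the append-only memory file (one '## key' section per entry)
    back into a {key: value} dictionary."""
    memories = {}
    for section in re.split(r"^## ", text, flags=re.M):
        if not section.strip():
            continue  # skip the leading fragment before the first heading
        key, _, value = section.partition("\n")
        memories[key.strip()] = value.strip()
    return memories

sample = ("\n## Python version\nCI requires Python 3.11+\n"
          "\n## Style\nType hints on all public functions\n")
parsed = parse_memories(sample)
# parsed == {"Python version": "CI requires Python 3.11+",
#            "Style": "Type hints on all public functions"}
```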
Key Code
```python
# In the agent loop: proactive budget check before every API call
def agent_loop(messages):
    while True:
        budget_mgr.update_usage("history", json.dumps(messages))

        # Proactive compression if over budget
        if budget_mgr.needs_compaction("history"):
            if len(messages) > 8:
                summary = "[Earlier conversation summary]"
                messages[:] = [{"role": "user", "content": summary}] + messages[-6:]

        # System prompt includes layered config + memories
        system = f"Config: {layered_config}\nMemories: {memory.load()}"
        budget_mgr.update_usage("system", system)

        response = client.messages.create(model=MODEL, system=system, messages=messages, ...)
```
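The history-truncation branch in the loop can be factored into a standalone helper for testing. A minimal sketch using the same placeholder summary (a real agent would have the model summarize the dropped turns, as in s06):

```python
def compact_history(messages, keep_last=6):
    """Replace older turns with a one-line summary, keeping recent turns intact.
    The summary is a placeholder; a real agent would generate it with the model."""
    if len(messages) <= keep_last + 2:  # mirrors the `len(messages) > 8` guard
        return messages
    summary = {"role": "user", "content": "[Earlier conversation summary]"}
    return [summary] + messages[-keep_last:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(12)]
compacted = compact_history(history)
# 12 messages collapse to 7: one summary plus the 6 most recent turns
```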
New Tools
| Tool | Purpose |
|---|---|
| `context_budget` | Check context budget usage across all 5 regions |
| `save_memory` | Save important findings to persistent memory file |
| `load_memory` | Load all saved memories from previous sessions |
What's New (s06 → s15)
| Aspect | s06 (Compact) | s15 (Context Engineering) |
|---|---|---|
| Strategy | Reactive — compress when full | Proactive — pre-allocate budgets |
| Configuration | Fixed system prompt | Layered config (global→project→dir) |
| Memory | None | Agent proactively writes key findings |
| Visibility | None | context_budget tool for real-time monitoring |
| Compression | Global compression | Per-region targeted compression |
| Philosophy | Emergency response | Budget management |
Deep Dive: Design Decisions
Q1: Why the 10/5/10/60/15 budget ratios? Can they be adjusted?
These ratios come from actual usage pattern analysis:
- History 60%: The only continuously growing part, needs maximum space
- Buffer 15%: A single tool call (e.g., reading a large file) can produce massive output
- System 10%: System prompts typically 1000-3000 tokens; 10K budget is more than enough
- Tools 10%: MCP may register dozens of tools, each definition ~200 tokens
- Config 5%: CLAUDE.md configs are usually brief
Adjustable: If your agent uses many tools (multiple MCP Servers), increase Tools to 20% and reduce History accordingly. The key is the total must be ≤ 100%.
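One way to make the ratios adjustable while enforcing the ≤ 100% constraint is a small factory function. A sketch under those assumptions (`make_budget` is a hypothetical helper):

```python
def make_budget(max_tokens=100_000, **overrides):
    """Build a token budget from the default ratios, applying per-region
    overrides and rejecting allocations that exceed the window."""
    ratios = {"system": 0.10, "config": 0.05, "tools": 0.10,
              "history": 0.60, "buffer": 0.15}
    ratios.update(overrides)
    if sum(ratios.values()) > 1.0 + 1e-9:
        raise ValueError(f"Budget ratios sum to {sum(ratios.values()):.2f} > 1.0")
    return {region: int(max_tokens * r) for region, r in ratios.items()}

# Tool-heavy agent: grow tools to 20%, shrink history to 50%
budget = make_budget(tools=0.20, history=0.50)
```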
Q2: How are layered config conflicts resolved?
Follow the most-specific-wins principle — same as CSS specificity:
Global: "Use 4-space indentation"
Project: "Use 2-space indentation" ← overrides global
Directory: "Use tab indentation" ← overrides project
Effective: Tab indentation (most specific directory-level config)
For list-type configs (like "ignored files"), use merge instead of override:
Global ignore: [".git", "node_modules"]
Project ignore: ["dist"]
Effective: [".git", "node_modules", "dist"] ← merged
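The two rules, most-specific-wins for scalars and union for lists, can be sketched in one merge function (illustrative, not the s15 implementation):

```python
def merge_configs(layers):
    """Merge config layers ordered global → project → directory.
    Scalars: the later (more specific) layer wins.
    Lists: values are unioned across layers, order preserved."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                merged[key] = merged[key] + [v for v in value if v not in merged[key]]
            else:
                merged[key] = value
    return merged

layers = [
    {"indent": "4 spaces", "ignore": [".git", "node_modules"]},  # global
    {"indent": "2 spaces", "ignore": ["dist"]},                  # project
    {"indent": "tabs"},                                          # directory
]
merged = merge_configs(layers)
# merged == {"indent": "tabs", "ignore": [".git", "node_modules", "dist"]}
```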
Q3: What should Agentic Memory save vs. not save?
Should save — Cross-session valuable info that's hard to infer from code:
- ✅ User preferences: "User prefers TypeScript strict mode"
- ✅ Project-specific rules: "PRs require 2 reviewers to merge"
- ✅ Pitfalls: "Use psycopg2 not psycopg3 due to compatibility issues"
Should NOT save — Info already in code, or quickly stale:
- ❌ Code structure: "main.py has 200 lines" — changes constantly
- ❌ Temporary state: "Just fixed bug #42" — irrelevant next session
- ❌ Inferable info: "Project uses React" — package.json already says this
Q4: When budget overflows, which region gets compressed first?
Compression priority, from highest to lowest:
1. buffer — Current-turn tool results, least impact on memory
2. history — Conversation history, use s06's three-layer compression
3. config — Remove low-priority config layers
4. tools — Unload rarely-used MCP Server tools
5. system — Last resort; the system prompt is usually incompressible
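This priority order can be encoded directly. A sketch that returns the over-budget regions in the order they should be compressed (`pick_compaction_targets` is a hypothetical helper):

```python
# Regions ordered by compression priority, highest first
COMPACTION_ORDER = ["buffer", "history", "config", "tools", "system"]

def pick_compaction_targets(usage, budget):
    """Return over-budget regions in the order they should be compressed."""
    return [r for r in COMPACTION_ORDER if usage[r] > budget[r]]

budget = {"system": 10_000, "config": 5_000, "tools": 10_000,
          "history": 60_000, "buffer": 15_000}
usage = {"system": 2_000, "config": 1_000, "tools": 12_000,
         "history": 70_000, "buffer": 20_000}
targets = pick_compaction_targets(usage, budget)
# → ["buffer", "history", "tools"]: buffer goes first, system only as a last resort
```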
Q5: What's the relationship between Context Engineering and RAG?
RAG is one technique within Context Engineering, solving "what information to put in the context":
| Aspect | Context Engineering | RAG |
|---|---|---|
| Scope | Manage the entire context window | Manage retrieved documents |
| Focus | Budget allocation, priority, compression | Relevance ranking, chunking |
| Timing | Continuous, before every API call | Specific steps (when there's a query) |
| In s15 | Complete framework | s05's Skill Loading is simplified RAG |
Anthropic defines Context Engineering as managing five dimensions: System Prompt (s15), Tool Definitions (s17/MCP), Retrieved Documents (s05), Message History (s06+s15), MCP Resources (s17).
Try It
```shell
cd learn-claude-code
python agents/s15_context_engineering.py
```
Recommended prompts:
- "Show me the context budget" — see 5-region budget usage
- "Remember that this project requires Python 3.11" — trigger memory write
- "Help me refactor this large file" — watch budget management handle large file reads
- "What do you remember about this project?" — load and display all memories
References
- Context Engineering for AI Agents — Anthropic, Sep 2025. Defines Context Engineering as the core engineering discipline for AI agents, with a five-dimension management framework. Theoretical foundation for s15.
- Building Effective Agents — Anthropic, Dec 2024. Discusses "Just-in-Time" retrieval and layered config strategies, emphasizing context as a finite attention budget.
- Claude Code: Context Management — Anthropic Docs. Official documentation for Claude Code's CLAUDE.md layered config system, the real implementation reference for s15.
- Lost in the Middle: How Language Models Use Long Contexts — Liu et al., 2023. Research showing that model attention drops significantly for information in the middle of long contexts. Academic foundation for "context is not a junk drawer."
- Agentic Memory and State Management — Anthropic. Discusses how agents achieve cross-session memory through filesystem persistence of key findings.