Learn Claude Code
s15

Context Engineering

Memory Management

Budget, Not Compress

291 LOC · 6 tools · ContextBudgetManager + LayeredConfig + AgenticMemory
Context Engineering > Prompt Engineering; proactive budget allocation, not reactive compression

"Context Engineering > Prompt Engineering" -- The core engineering discipline for AI agents.

Harness layer: Memory Management -- Proactive context curation, not reactive compression.

Problem

s06 answers "what to do when the context is full" (compression). The real question is how to manage context proactively from the start: instead of reacting to overflow, budget every token. Think financial planning, not bankruptcy proceedings.

Solution

Context Window = A finite budget

┌───────────────────────────────────────────────────────────────┐
│                                                               │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐  │
│  │System   │ │Config   │ │Tools    │ │Buffer   │ │History  │  │
│  │10%      │ │5%       │ │10%      │ │15%      │ │60%      │  │
│  │Incom-   │ │Layered  │ │Dynamic  │ │Current  │ │Largest  │  │
│  │pressible│ │override │ │discovery│ │turn     │ │compress.│  │
│  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘  │
│                                                               │
│  Three proactive management strategies:                       │
│    1. Budget Allocation: manage tokens by region              │
│    2. Layered Config: global → project → dir                  │
│    3. Agentic Memory: proactively save findings               │
└───────────────────────────────────────────────────────────────┘

Core Concepts

1. Context Budget Manager

Divides the context window into 5 regions, each with a token budget:

Region   Budget  Content                      Characteristics
system   10%     System prompt                Incompressible
config   5%      Layered config               CLAUDE.md etc.
tools    10%     Tool definitions             Dynamic discovery (MCP)
history  60%     Conversation history         Largest compressible region
buffer   15%     Current turn's tool results  Temporary buffer

When any single region exceeds its budget, the manager triggers compression for that specific region — not globally.

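class ContextBudgetManager:
    def __init__(self, max_tokens=100000):
        self.budget = {
            "system":  int(max_tokens * 0.10),  # 10,000 tokens
            "config":  int(max_tokens * 0.05),  # 5,000 tokens
            "tools":   int(max_tokens * 0.10),  # 10,000 tokens
            "history": int(max_tokens * 0.60),  # 60,000 tokens
            "buffer":  int(max_tokens * 0.15),  # 15,000 tokens
        }
        self.usage = {k: 0 for k in self.budget}

    def update_usage(self, region, text):
        """Record a region's usage (minimal sketch: ~4 chars per token, not a real tokenizer)."""
        self.usage[region] = len(text) // 4

    def needs_compaction(self, region):
        """Check if a specific region exceeds its budget."""
        return self.usage[region] > self.budget[region]

    def check_budget(self):
        """Per-region usage, budget, and percent of budget used (minimal sketch)."""
        return {
            region: {
                "used": self.usage[region],
                "budget": self.budget[region],
                "pct": round(100 * self.usage[region] / self.budget[region]),
            }
            for region in self.budget
        }

    def get_status(self):
        """Visual budget report."""
        for region, info in self.check_budget().items():
            filled = min(10, info["pct"] // 10)  # clamp so an over-budget region still draws 10 cells
            bar = "█" * filled + "░" * (10 - filled)
            print(f"  {region:8s} [{bar}] {info['used']}/{info['budget']} ({info['pct']}%)")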

2. Layered Configuration

Simulates Claude Code's CLAUDE.md layered strategy. More specific configs override more general ones:

~/.agent/config               ← Global defaults (coding style, language)
  └─ /project/AGENT.md        ← Project overrides (tech stack, conventions)
      └─ /project/src/AGENT.md    ← Directory overrides (module-specific rules)

import os

class LayeredConfig:
    def __init__(self, project_root):
        self.project_root = project_root

    def load(self, current_dir):
        # Candidate files, ordered from most general to most specific.
        paths = [
            os.path.expanduser("~/.agent/config"),        # Layer 1: Global ("~" is not expanded automatically)
            os.path.join(self.project_root, "AGENT.md"),  # Layer 2: Project
            os.path.join(current_dir, "AGENT.md"),        # Layer 3: Directory
        ]
        configs = []
        for path in paths:
            if os.path.exists(path):
                with open(path) as f:
                    configs.append(f.read())
        return merge(configs)  # merge(): helper defined elsewhere; later layers override earlier ones

3. Agentic Memory

Unlike s06's passive compression, Agentic Memory is proactive: the agent identifies "this is worth remembering" during work and writes it to a memory file. On next session start, memories are loaded automatically:

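import os

class AgenticMemory:
    def __init__(self, memory_file):
        self.memory_file = memory_file  # path to the persistent memory file

    def save(self, key, value):
        """Agent proactively saves an important finding."""
        with open(self.memory_file, "a") as f:
            f.write(f"\n## {key}\n{value}\n")

    def load(self):
        """Load all memories at session start."""
        if not os.path.exists(self.memory_file):
            return ""  # first session: nothing has been saved yet
        with open(self.memory_file) as f:
            return f.read()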

Example memories:

  • "This project uses SQLAlchemy 2.0 async syntax, not legacy 1.x"
  • "CI pipeline requires Python 3.11+, not 3.9"
  • "User prefers type hints on all public functions"
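A save/load round trip makes the memory file format concrete. The sketch below is self-contained and writes to a throwaway temporary path; `parse_memories` is a hypothetical helper (not part of the AgenticMemory class above) that splits the file back into key/value pairs, assuming the "## key" heading format used by `save`.

```python
import os
import tempfile


def save_memory(path, key, value):
    """Append one finding using the '## key' heading format shown above."""
    with open(path, "a") as f:
        f.write(f"\n## {key}\n{value}\n")


def parse_memories(path):
    """Hypothetical helper: read the file back into a {key: value} dict.

    If a key was saved more than once, the latest entry wins.
    """
    if not os.path.exists(path):
        return {}
    memories, key, lines = {}, None, []
    with open(path) as f:
        for line in f.read().splitlines():
            if line.startswith("## "):
                if key is not None:
                    memories[key] = "\n".join(lines).strip()
                key, lines = line[3:].strip(), []
            elif key is not None:
                lines.append(line)
    if key is not None:
        memories[key] = "\n".join(lines).strip()
    return memories


# Throwaway location for the demo; a real agent would use a stable path.
demo_path = os.path.join(tempfile.mkdtemp(), "memory.md")
save_memory(demo_path, "python-version", "CI pipeline requires Python 3.11+, not 3.9")
save_memory(demo_path, "style", "User prefers type hints on all public functions")
memories = parse_memories(demo_path)
```

Parsing into a dict is one possible design choice: it lets a later save with the same key supersede an earlier one, whereas loading the raw text keeps every entry.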

Key Code

# In the agent loop: proactive budget check before every API call
def agent_loop(messages):
    while True:
        budget_mgr.update_usage("history", json.dumps(messages))
        # Proactive compression if over budget
        if budget_mgr.needs_compaction("history"):
            if len(messages) > 8:
                summary = "[Earlier conversation summary]"
                messages[:] = [{"role": "user", "content": summary}] + messages[-6:]
        # System prompt includes layered config + memories
        system = f"Config: {layered_config}\nMemories: {memory.load()}"
        budget_mgr.update_usage("system", system)
        response = client.messages.create(model=MODEL, system=system, messages=messages, ...)
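
The history-trimming step inside the loop can be pulled out into a small function so it is testable on its own. The sketch below mirrors the thresholds in the snippet above (more than 8 messages, keep the last 6); `compact_history` and its parameter names are illustrative, and the summary is still the placeholder string rather than a real model-generated summary.

```python
def compact_history(messages, min_len=8, keep_last=6,
                    summary="[Earlier conversation summary]"):
    """Replace older messages with a single summary message, in place.

    Returns True if the list was compacted, False if it was left alone.
    """
    if len(messages) <= min_len:
        return False
    # Slice assignment mutates the caller's list instead of rebinding a local name.
    messages[:] = [{"role": "user", "content": summary}] + messages[-keep_last:]
    return True


history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
           for i in range(10)]
compacted = compact_history(history)
```

One caveat worth keeping in mind: a blind slice can separate a tool result from the tool call that produced it, so a production version would likely need to cut on a turn boundary.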

New Tools

Tool            Purpose
context_budget  Check context budget usage across all 5 regions
save_memory     Save important findings to persistent memory file
load_memory     Load all saved memories from previous sessions
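
For reference, the three tools could be declared to the model roughly as follows. This is a sketch using the `name` / `description` / `input_schema` shape of Anthropic tool definitions. The descriptions are copied from the table, the `key` and `value` parameter names are an assumption based on `AgenticMemory.save(key, value)`, and the tutorial's actual definitions may differ.

```python
# Assumed tool declarations; parameter names are illustrative, not taken from the source file.
TOOLS = [
    {
        "name": "context_budget",
        "description": "Check context budget usage across all 5 regions",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "save_memory",
        "description": "Save important findings to persistent memory file",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {"type": "string", "description": "Short title for the finding"},
                "value": {"type": "string", "description": "The finding itself"},
            },
            "required": ["key", "value"],
        },
    },
    {
        "name": "load_memory",
        "description": "Load all saved memories from previous sessions",
        "input_schema": {"type": "object", "properties": {}},
    },
]

tool_names = [tool["name"] for tool in TOOLS]
```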

What's New (s06 → s15)

Aspect         s06 (Compact)                 s15 (Context Engineering)
Strategy       Reactive: compress when full  Proactive: pre-allocate budgets
Configuration  Fixed system prompt           Layered config (global → project → dir)
Memory         None                          Agent proactively writes key findings
Visibility     None                          context_budget tool for real-time monitoring
Compression    Global compression            Per-region targeted compression
Philosophy     Emergency response            Budget management

Deep Dive: Design Decisions

Q1: Why the 10/5/10/60/15 budget ratios? Can they be adjusted?

These ratios come from actual usage pattern analysis:

  • History 60%: The only continuously growing part, needs maximum space
  • Buffer 15%: A single tool call (e.g., reading a large file) can produce massive output
  • System 10%: System prompts are typically 1,000-3,000 tokens, so a 10K budget is more than enough
  • Tools 10%: MCP may register dozens of tools, each definition ~200 tokens
  • Config 5%: CLAUDE.md configs are usually brief

The ratios are adjustable. If your agent uses many tools (multiple MCP Servers), increase Tools to 20% and reduce History accordingly. The only hard constraint is that the total must be ≤ 100%.
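
The constraint that the ratios total at most 100% is easy to enforce in code. The sketch below shows one way to do it; `make_budget`, `DEFAULT_RATIOS`, and the error message are hypothetical names and not part of the tutorial's ContextBudgetManager.

```python
DEFAULT_RATIOS = {"system": 0.10, "config": 0.05, "tools": 0.10,
                  "history": 0.60, "buffer": 0.15}


def make_budget(max_tokens, ratios=None):
    """Turn per-region ratios into token budgets, rejecting over-allocation."""
    ratios = dict(DEFAULT_RATIOS if ratios is None else ratios)
    total = sum(ratios.values())
    # Small tolerance because float ratios rarely sum to exactly 1.0.
    if total > 1.0 + 1e-9:
        raise ValueError(f"ratios sum to {total:.2f}, which exceeds 1.0")
    return {region: int(max_tokens * share) for region, share in ratios.items()}


# Tool-heavy agent: move 10 points from history to tools.
tool_heavy = dict(DEFAULT_RATIOS, tools=0.20, history=0.50)
budget = make_budget(100_000, tool_heavy)
```

Raising on over-allocation, rather than silently rescaling, keeps a misconfigured budget visible at startup.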

Q2: How are layered config conflicts resolved?

Follow the most-specific-wins principle, similar in spirit to CSS specificity:

Global: "Use 4-space indentation"
Project: "Use 2-space indentation"    ← overrides global
Directory: "Use tab indentation"      ← overrides project
Effective: Tab indentation (most specific directory-level config)

For list-type configs (like "ignored files"), use merge instead of override:

Global ignore: [".git", "node_modules"]
Project ignore: ["dist"]
Effective: [".git", "node_modules", "dist"]  ← merged
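
The two rules above (scalar settings override, list settings merge) can be sketched as a small merge function. This assumes each layer has already been parsed into a dict; `merge_layers` is an illustrative name, and the tutorial's own `merge` may work differently, for example by concatenating raw Markdown text.

```python
def merge_layers(layers):
    """Merge config dicts ordered from most general to most specific."""
    effective = {}
    for layer in layers:
        for key, value in layer.items():
            current = effective.get(key)
            if isinstance(value, list) and isinstance(current, list):
                # List-type settings accumulate, skipping duplicates, order preserved.
                effective[key] = current + [v for v in value if v not in current]
            elif isinstance(value, list):
                effective[key] = list(value)  # copy so the input layer is not aliased
            else:
                # Scalar settings: the more specific layer wins.
                effective[key] = value
    return effective


global_cfg = {"indent": "4 spaces", "ignore": [".git", "node_modules"]}
project_cfg = {"indent": "2 spaces", "ignore": ["dist"]}
directory_cfg = {"indent": "tab"}
effective = merge_layers([global_cfg, project_cfg, directory_cfg])
```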

Q3: What should Agentic Memory save vs. not save?

Should save — Cross-session valuable info that's hard to infer from code:

  • ✅ User preferences: "User prefers TypeScript strict mode"
  • ✅ Project-specific rules: "PRs require 2 reviewers to merge"
  • ✅ Pitfalls: "Use psycopg2 not psycopg3 due to compatibility issues"

Should NOT save — Info already in code, or quickly stale:

  • ❌ Code structure: "main.py has 200 lines" — changes constantly
  • ❌ Temporary state: "Just fixed bug #42" — irrelevant next session
  • ❌ Inferable info: "Project uses React" — package.json already says this

Q4: When budget overflows, which region gets compressed first?

Compression priority from highest to lowest:

  1. buffer: Current-turn tool results, least impact on memory
  2. history: Conversation history, use s06's three-layer compression
  3. config: Remove low-priority config layers
  4. tools: Unload rarely-used MCP Server tools
  5. system: Last resort, system prompt is usually incompressible
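
The priority order above could drive region selection like this. The sketch is standalone; `next_region_to_compress` is a hypothetical helper, and it assumes usage and budget are plain dicts keyed by region name, as in ContextBudgetManager.

```python
COMPRESSION_PRIORITY = ["buffer", "history", "config", "tools", "system"]


def next_region_to_compress(usage, budget):
    """Return the highest-priority region that is over budget, or None."""
    for region in COMPRESSION_PRIORITY:
        if usage.get(region, 0) > budget[region]:
            return region
    return None


budget = {"system": 10_000, "config": 5_000, "tools": 10_000,
          "history": 60_000, "buffer": 15_000}
usage = {"system": 2_000, "config": 1_000, "tools": 12_000,
         "history": 70_000, "buffer": 3_000}
first = next_region_to_compress(usage, budget)
```

Here both history and tools are over budget, and history is picked first because it sits higher in the priority list.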

Q5: What's the relationship between Context Engineering and RAG?

RAG is one technique within Context Engineering, solving "what information to put in the context":

Aspect  Context Engineering                       RAG
Scope   Manage the entire context window          Manage retrieved documents
Focus   Budget allocation, priority, compression  Relevance ranking, chunking
Timing  Continuous, before every API call         Specific steps (when there's a query)
In s15  Complete framework                        s05's Skill Loading is simplified RAG

Anthropic defines Context Engineering as managing five dimensions: System Prompt (s15), Tool Definitions (s17/MCP), Retrieved Documents (s05), Message History (s06+s15), MCP Resources (s17).

Try It

cd learn-claude-code
python agents/s15_context_engineering.py

Recommended prompts:

  • "Show me the context budget" — see 5-region budget usage
  • "Remember that this project requires Python 3.11" — trigger memory write
  • "Help me refactor this large file" — watch budget management handle large file reads
  • "What do you remember about this project?" — load and display all memories

References

  • Context Engineering for AI Agents (Anthropic, Sep 2025). Defines Context Engineering as the core engineering discipline for AI agents, with a five-dimension management framework. Theoretical foundation for s15.
  • Building Effective Agents (Anthropic, Dec 2024). Discusses "Just-in-Time" retrieval and layered config strategies, emphasizing context as a finite attention budget.
  • Claude Code: Context Management (Anthropic Docs). Official documentation for Claude Code's CLAUDE.md layered config system, the real implementation reference for s15.
  • The Lost in the Middle Problem. Research showing that even with long context windows, models use information in the middle of the context less reliably than information near the start or end. Academic foundation for "context is not a junk drawer."
  • Agentic Memory and State Management (Anthropic). Discusses how agents achieve cross-session memory through filesystem persistence of key findings.