TodoWrite
Planning & Coordination
Plan Before You Act
An agent without a plan drifts; list the steps first, then execute
s01 > s02 > [ s03 ] s04 > s05 > s06 > s07 > s08 > s09 > s10 > s11 > s12
"An agent without a plan drifts" -- plan steps first, completion rate doubles.
Harness layer: Planning -- keeping the model on course without drawing the route.
Problem
In multi-step tasks, models lose track of progress: they repeat completed work, skip steps, and drift. The problem worsens as conversations grow, because tool results fill the context window and dilute the system prompt's influence. A 10-step refactor may execute steps 1-3 faithfully and then improvise, because steps 4-10 have fallen out of attention.
Solution
+--------+      +-------+      +---------+
|  User  | ---> |  LLM  | ---> |  Tools  |
| prompt |      |       |      |  + todo |
+--------+      +---+---+      +----+----+
                    ^               |
                    |  tool_result  |
                    +---------------+
TodoManager: [ ] A [>] B [x] C
if rounds_since_todo >= 3: inject <reminder>
Core Concepts
TodoManager
Single in_progress constraint forces sequential focus:
```python
class TodoManager:
    def update(self, items): ...  # Validates only 1 in_progress
    def render(self): ...         # ASCII checklist for context
```
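Fleshed out, the manager might look like the sketch below. This is an illustration, not the repo's actual implementation; the item schema (dicts with `content` and `status` keys) and the checklist marks are assumptions.

```python
class TodoManager:
    """Holds the agent's checklist; enforces a single in_progress item."""

    STATES = {"pending", "in_progress", "completed"}
    MARKS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}

    def __init__(self):
        # Each item is assumed to be {"content": str, "status": str}.
        self.items = []

    def update(self, items):
        # Reject unknown states and more than one in_progress item.
        for item in items:
            if item["status"] not in self.STATES:
                raise ValueError(f"unknown status: {item['status']}")
        active = [i for i in items if i["status"] == "in_progress"]
        if len(active) > 1:
            raise ValueError("only one item may be in_progress at a time")
        self.items = items
        return self.render()

    def render(self):
        # ASCII checklist that gets injected back into the model's context.
        return "\n".join(
            f"{self.MARKS[i['status']]} {i['content']}" for i in self.items
        )
```

Rejecting a second in_progress item at the validation layer, rather than merely asking for focus in the prompt, is what makes the depth-first behavior reliable.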
Nag Reminder
Injected after 3 rounds without todo update — accountability pressure.
Key Code
```python
TOOL_HANDLERS = {
    # ...base tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}
```
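The nag reminder itself can be sketched as a small helper called once per loop round. `NAG_THRESHOLD`, `maybe_nag`, and the message shape are hypothetical names for illustration, not the file's actual API:

```python
NAG_THRESHOLD = 3  # rounds without a todo update before nagging

def maybe_nag(rounds_since_todo, messages):
    """Append a <reminder> message once the counter hits the threshold.

    Returns the new counter value: reset to 0 after nagging,
    otherwise unchanged.
    """
    if rounds_since_todo >= NAG_THRESHOLD:
        messages.append({
            "role": "user",
            "content": "<reminder>Your todo list is stale; update it "
                       "before continuing.</reminder>",
        })
        return 0
    return rounds_since_todo
```

Resetting the counter after each nag keeps the pressure periodic rather than piling a reminder onto every subsequent round.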
What's New (s02 → s03)
| Component | s02 | s03 |
|---|---|---|
| Tools | 4 | 5 (+todo) |
| Planning | None | Stateful TodoManager |
| Nag injection | None | `<reminder>` after 3 rounds |
| Agent loop | Simple dispatch | + rounds_since_todo counter |
Deep Dive: Design Decisions
Q1: Why only one in_progress at a time?
Attention focusing. LLMs scatter easily — multiple in_progress items cause random switching with half-finished results. Single in_progress = depth-first, one at a time. Mirrors GTD methodology.
Q2: Why 3 rounds as the nag threshold?
Empirical sweet spot. 1 round feels micromanaged; 10 rounds and the plan is already stale. 3 rounds ≈ 1-2 meaningful tool calls, then back to plan update.
Q3: Todo as tool vs system prompt?
A tool is better: the model can update the state, and the harness can render progress back into context. A system prompt is static — "make a plan first" can't track dynamic state changes.
Q4: How does this differ from s07's Task System?
s03 is a flat list. s07 supports dependency graphs: task B can't start until task A completes.
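The contrast can be sketched with a hypothetical `next_runnable` helper (the task schema here is invented for illustration): with dependencies declared, a task becomes runnable only once its prerequisites complete; with empty `deps` everywhere it degenerates to s03's flat list.

```python
def next_runnable(tasks):
    """Return pending tasks whose dependencies are all completed.

    s07-style: respects a dependency graph.
    s03-style: every task has deps == [], so all pending tasks qualify.
    """
    done = {t["id"] for t in tasks if t["status"] == "completed"}
    return [t for t in tasks
            if t["status"] == "pending" and set(t["deps"]) <= done]
```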
Q5: Does completion rate really double?
Per Anthropic internal testing: simple tasks (1-3 steps) improve ~10%; complex tasks (5-10 steps) improve 40-60%. Gains come from reducing omitted steps and redundant work.
Try It
cd learn-claude-code
python agents/s03_todo_write.py
Recommended prompts:
- "Refactor hello.py: add type hints, docstrings, and a main guard" — watch auto-decomposition
- "Create a Python package with __init__.py, utils.py, and tests/" — multi-file task
- "Review all Python files and fix style issues" — observe the nag reminder
References
- Building Effective Agents: Workflows — Anthropic, Dec 2025. Planning and prompt chaining in agents; s03 is the simplest form of inline planning.
- Getting Things Done — David Allen. The "one thing at a time" methodology; s03's single in_progress constraint borrows it directly.
- Claude Code: Task Management — Anthropic Docs. Claude Code's built-in TodoWrite tool; s03 is its teaching version.
- Agentic Coding Trends 2026 — Anthropic, Mar 2026. Agent "steering" challenges; s03's nag reminder is a lightweight solution.