Learn Claude Code
s03

TodoWrite

Planning & Coordination

Plan Before You Act

212 LOC · 5 tools · TodoManager + nag reminder
An agent without a plan drifts; list the steps first, then execute

s01 > s02 > [ s03 ] s04 > s05 > s06 > s07 > s08 > s09 > s10 > s11 > s12

"An agent without a plan drifts" -- list the steps first, and completion rates roughly double.

Harness layer: Planning -- keeping the model on course without drawing the route.

Problem

In multi-step tasks, models lose track of progress: they repeat finished work, skip steps, or drift off course. The problem worsens as the conversation grows, because tool results fill the context window and dilute the system prompt's influence. A 10-step refactor might execute steps 1-3 and then improvise, since steps 4-10 have fallen out of attention.

Solution

+--------+      +-------+      +---------+
|  User  | ---> |  LLM  | ---> | Tools   |
| prompt |      |       |      | + todo  |
+--------+      +---+---+      +----+----+
                    ^                |
                    |   tool_result  |
                    +----------------+
              TodoManager: [ ] A  [>] B  [x] C
              if rounds_since_todo >= 3: inject <reminder>

Core Concepts

TodoManager

Single in_progress constraint forces sequential focus:

class TodoManager:
    def update(self, items): ...  # validates that only one item is in_progress
    def render(self): ...         # renders an ASCII checklist into context
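The two methods above can be fleshed out as follows. This is a minimal sketch, not the tutorial's exact implementation: the item fields (`text`, `status`) and status names are assumptions.

```python
class TodoManager:
    """Holds the agent's todo list; enforces a single in_progress item."""

    VALID_STATUSES = {"pending", "in_progress", "completed"}

    def __init__(self):
        self.items = []

    def update(self, items):
        # Reject the whole update if more than one item is in_progress,
        # or if any item carries an unknown status.
        active = [i for i in items if i["status"] == "in_progress"]
        if len(active) > 1:
            raise ValueError("only one item may be in_progress at a time")
        for i in items:
            if i["status"] not in self.VALID_STATUSES:
                raise ValueError(f"unknown status: {i['status']}")
        self.items = items
        return self.render()

    def render(self):
        # ASCII checklist: [ ] pending, [>] in_progress, [x] completed
        marks = {"pending": " ", "in_progress": ">", "completed": "x"}
        return "\n".join(f"[{marks[i['status']]}] {i['text']}" for i in self.items)
```

Validating in `update` rather than at render time means an invalid plan never enters the agent's state: the tool call fails loudly and the model must resubmit a well-formed list.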

Nag Reminder

Injected after 3 rounds without a todo update, applying accountability pressure.
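The injection step from the diagram can be sketched like this; the function name and reminder wording are assumptions, but the mechanism (appending a `<reminder>` block to the next tool result once the counter hits the threshold) matches the diagram.

```python
NAG_THRESHOLD = 3  # rounds without a todo update before nagging

def maybe_nag(rounds_since_todo, tool_result):
    """Append a <reminder> block to a tool result when the plan goes stale."""
    if rounds_since_todo >= NAG_THRESHOLD:
        reminder = (
            "<reminder>You have not updated your todo list in "
            f"{rounds_since_todo} rounds. Update it to stay on plan.</reminder>"
        )
        return tool_result + "\n" + reminder
    return tool_result
```

Piggybacking on the tool result keeps the reminder inside the normal conversation flow, so no extra user turn is needed.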

Key Code

TOOL_HANDLERS = {
    # ...base tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}

What's New (s02 → s03)

Component       s02              s03
Tools           4                5 (+todo)
Planning        None             Stateful TodoManager
Nag injection   None             <reminder> after 3 rounds
Agent loop      Simple dispatch  + rounds_since_todo counter

Deep Dive: Design Decisions

Q1: Why only one in_progress at a time?

Attention focusing. LLMs scatter easily: multiple in_progress items lead to random task switching and half-finished results. A single in_progress item forces depth-first execution, one task at a time, mirroring the "next action" discipline of Getting Things Done (GTD).

Q2: Why 3 rounds as the nag threshold?

Empirical sweet spot. 1 round feels micromanaged; 10 rounds and the plan is already stale. 3 rounds ≈ 1-2 meaningful tool calls, then back to plan update.

Q3: Todo as tool vs system prompt?

A tool is better: the model can update the state, and the harness can render progress back into context. A system prompt is static, so "make a plan first" can't track dynamic state changes.
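To make the tool route concrete, here is a plausible schema for the todo tool in the Anthropic Messages API tool format (`name` / `description` / `input_schema` with JSON Schema). The description text and field names are assumptions, not the tutorial's exact definition.

```python
# Hypothetical tool definition for the todo tool.
TODO_TOOL = {
    "name": "todo",
    "description": (
        "Replace the full todo list. Exactly one item may be in_progress."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string"},
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                        },
                    },
                    "required": ["text", "status"],
                },
            }
        },
        "required": ["items"],
    },
}
```

Because the schema constrains `status` to an enum, malformed plans are rejected at the API layer before the handler's own single-in_progress check even runs.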

Q4: How does this differ from s07's Task System?

s03 is a flat list. s07 supports dependency graphs: task B can't start until task A completes.

Q5: Does completion rate really double?

Per Anthropic internal testing: simple tasks (1-3 steps) improve ~10%; complex tasks (5-10 steps) improve 40-60%. Gains come from reducing omitted steps and redundant work.

Try It

cd learn-claude-code
python agents/s03_todo_write.py

Recommended prompts:

  • "Refactor hello.py: add type hints, docstrings, and a main guard" — watch auto-decomposition
  • "Create a Python package with __init__.py, utils.py, and tests/" — multi-file task
  • "Review all Python files and fix style issues" — observe nag reminder

References