Learn Claude Code
s03

TodoWrite

Planning & Coordination

Plan Before You Act

212 LOC · 5 tools · TodoManager + nag reminder
An agent without a plan drifts; list the steps first, then execute

s01 > s02 > [ s03 ] s04 > s05 > s06 > s07 > s08 > s09 > s10 > s11 > s12

"An agent without a plan drifts" -- list the steps first, and completion rates roughly double.

Harness layer: Planning -- keeping the model on course without drawing the route.

Problem

In multi-step tasks, models lose track of progress: they repeat finished work, skip steps, or drift off course. The problem worsens as the conversation grows, because tool results fill the context window and dilute the system prompt's influence. A 10-step refactor might execute steps 1-3 and then improvise, since steps 4-10 have fallen out of attention.

Solution

+--------+      +-------+      +---------+
|  User  | ---> |  LLM  | ---> | Tools   |
| prompt |      |       |      | + todo  |
+--------+      +---+---+      +----+----+
                    ^                |
                    |   tool_result  |
                    +----------------+
              TodoManager: [ ] A  [>] B  [x] C
              if rounds_since_todo >= 3: inject <reminder>

Core Concepts

TodoManager

Single in_progress constraint forces sequential focus:

class TodoManager:
    def update(self, items): ...  # validates that only one item is in_progress
    def render(self): ...         # renders an ASCII checklist into context
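The two methods above can be fleshed out as follows. This is a minimal sketch, not the tutorial's exact implementation: the item fields (`text`, `status`) and status names are assumptions.

```python
class TodoManager:
    """Holds the agent's todo list; enforces a single in_progress item."""

    VALID_STATUSES = {"pending", "in_progress", "completed"}

    def __init__(self):
        self.items = []

    def update(self, items):
        # Reject the whole update if more than one item is in_progress,
        # or if any item carries an unknown status.
        active = [i for i in items if i["status"] == "in_progress"]
        if len(active) > 1:
            raise ValueError("only one item may be in_progress at a time")
        for i in items:
            if i["status"] not in self.VALID_STATUSES:
                raise ValueError(f"unknown status: {i['status']}")
        self.items = items
        return self.render()

    def render(self):
        # ASCII checklist: [ ] pending, [>] in_progress, [x] completed
        marks = {"pending": " ", "in_progress": ">", "completed": "x"}
        return "\n".join(f"[{marks[i['status']]}] {i['text']}" for i in self.items)
```

Validating in `update` rather than at render time means an invalid plan never enters the agent's state: the tool call fails loudly and the model must resubmit a well-formed list.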

Nag Reminder

Injected after 3 rounds without a todo update, applying accountability pressure.
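The injection step from the diagram can be sketched like this; the function name and reminder wording are assumptions, but the mechanism (appending a `<reminder>` block to the next tool result once the counter hits the threshold) matches the diagram.

```python
NAG_THRESHOLD = 3  # rounds without a todo update before nagging

def maybe_nag(rounds_since_todo, tool_result):
    """Append a <reminder> block to a tool result when the plan goes stale."""
    if rounds_since_todo >= NAG_THRESHOLD:
        reminder = (
            "<reminder>You have not updated your todo list in "
            f"{rounds_since_todo} rounds. Update it to stay on plan.</reminder>"
        )
        return tool_result + "\n" + reminder
    return tool_result
```

Piggybacking on the tool result keeps the reminder inside the normal conversation flow, so no extra user turn is needed.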

Key Code

TOOL_HANDLERS = {
    # ...base tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}

What's New (s02 → s03)

Component       s02              s03
Tools           4                5 (+todo)
Planning        None             Stateful TodoManager
Nag injection   None             <reminder> after 3 rounds
Agent loop      Simple dispatch  + rounds_since_todo counter

Deep Dive: Design Decisions

Q1: Why only one in_progress at a time?

Attention focusing. LLMs scatter easily: multiple in_progress items lead to random task switching and half-finished results. A single in_progress item forces depth-first execution, one task at a time, mirroring the "next action" discipline of Getting Things Done (GTD).

Q2: Why 3 rounds as the nag threshold?

Empirical sweet spot. 1 round feels micromanaged; 10 rounds and the plan is already stale. 3 rounds ≈ 1-2 meaningful tool calls, then back to plan update.

Q3: Todo as tool vs system prompt?

A tool is better: the model can update the state, and the harness can render progress back into context. A system prompt is static, so "make a plan first" can't track dynamic state changes.
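To make the tool route concrete, here is a plausible schema for the todo tool in the Anthropic Messages API tool format (`name` / `description` / `input_schema` with JSON Schema). The description text and field names are assumptions, not the tutorial's exact definition.

```python
# Hypothetical tool definition for the todo tool.
TODO_TOOL = {
    "name": "todo",
    "description": (
        "Replace the full todo list. Exactly one item may be in_progress."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string"},
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                        },
                    },
                    "required": ["text", "status"],
                },
            }
        },
        "required": ["items"],
    },
}
```

Because the schema constrains `status` to an enum, malformed plans are rejected at the API layer before the handler's own single-in_progress check even runs.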

Q4: How does this differ from s07's Task System?

s03 is a flat list. s07 supports dependency graphs: task B can't start until task A completes.

Q5: Does completion rate really double?

Per Anthropic internal testing: simple tasks (1-3 steps) improve ~10%; complex tasks (5-10 steps) improve 40-60%. Gains come from reducing omitted steps and redundant work.

Try It

cd learn-claude-code
python agents/s03_todo_write.py

Recommended prompts:

  • "Refactor hello.py: add type hints, docstrings, and a main guard" — watch auto-decomposition
  • "Create a Python package with __init__.py, utils.py, and tests/" — multi-file task
  • "Review all Python files and fix style issues" — observe nag reminder

References