Auto Mode + Sandbox
Safety & Governance
Permission Tiers + Sandbox
Safety should not be the enemy of efficiency; 95% auto-approve, 5% human oversight
"Safety should not be the enemy of efficiency." -- Security and speed can coexist.
Harness layer: Safety & Governance -- Three-tier permission system + sandbox isolation.
Problem
Previous agents required user confirmation for every tool call. For routine coding tasks — reading files, running lint, executing tests — requiring confirmation every time severely slows work. But removing all confirmations risks rm -rf / or data leaks. We need a tiered permission system: safe operations auto-execute, dangerous ones need approval.
Solution
Three Permission Tiers (SAFE / ASK / DENY)
       Tool call request
               │
               ▼
    ┌─────────────────────────────┐
    │    Permission Classifier    │
    │  (classify by tool + args)  │
    └──────────┬──────────────────┘
               │
       ┌───────┼───────┐
       ▼       ▼       ▼
    ┌──────┐ ┌────┐ ┌─────┐
    │ SAFE │ │ASK │ │DENY │
    │ Auto │ │Need│ │Block│
    │ exec │ │ ok │ │     │
    └──┬───┘ └─┬──┘ └──┬──┘
       │       │       └──► "Permission denied"
       │       └──► "Allow? [y/n]"
       └──────► [Execute in sandbox]
Core Concepts
Three Permission Tiers
| Tier | Action | Examples |
|---|---|---|
| SAFE | Auto-execute, no confirmation | Read files, grep, lint, git status |
| ASK | Requires user confirmation | Write files, install packages, git commit |
| DENY | Blocked immediately | rm -rf, network requests, /etc access, sudo |
Classifier Logic
import json
import re

class PermissionClassifier:
    SAFE_TOOLS = {"read_file", "grep", "lint", "git_status", "git_log"}
    DENY_PATTERNS = [r"rm\s+-rf", r"sudo\s+", r"curl|wget", r"/etc/|/var/"]

    def classify(self, tool_name, args):
        # Check DENY first so a SAFE tool cannot be pointed at a blocked path.
        if any(re.search(p, json.dumps(args)) for p in self.DENY_PATTERNS):
            return "DENY"
        if tool_name in self.SAFE_TOOLS:
            return "SAFE"
        return "ASK"  # Default: ask user
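To see how the three tiers fall out in practice, the sketch below runs a few sample tool calls through a classifier shaped like the one above. The tool names `write_file` and `bash` and all sample arguments are assumptions made for illustration. This version checks the DENY patterns before the SAFE set, so a read-only tool pointed at a blocked path is still refused.

```python
import json
import re

SAFE_TOOLS = {"read_file", "grep", "lint", "git_status", "git_log"}
DENY_PATTERNS = [r"rm\s+-rf", r"sudo\s+", r"curl|wget", r"/etc/|/var/"]


def classify(tool_name, args):
    """Return "DENY", "SAFE", or "ASK" for one tool call."""
    serialized = json.dumps(args)
    # DENY is checked first: arguments can make an otherwise safe tool dangerous.
    if any(re.search(p, serialized) for p in DENY_PATTERNS):
        return "DENY"
    if tool_name in SAFE_TOOLS:
        return "SAFE"
    return "ASK"  # anything unknown falls back to asking the user


# Hypothetical sample calls; tool names outside SAFE_TOOLS are assumptions.
samples = [
    ("read_file", {"path": "README.md"}),
    ("write_file", {"path": "hello.py", "content": "print('hi')"}),
    ("bash", {"command": "rm -rf build"}),
    ("read_file", {"path": "/etc/passwd"}),
]
results = [classify(name, args) for name, args in samples]

if __name__ == "__main__":
    for (name, args), tier in zip(samples, results):
        print(f"{tier:5} {name} {args}")
```

Because the patterns are matched against the serialized arguments as plain text, they are a coarse filter: a file whose content merely mentions `curl` would also be denied. That trade-off favors false positives over false negatives.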
Sandbox Isolation
Because Auto Mode executes SAFE calls without confirmation, every tool call also runs inside a sandbox:
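import os
import subprocess

class SandboxExecutor:
    def __init__(self):
        self.allowed_dirs = ["/workspace", "/tmp"]
        self.network_enabled = False

    def execute(self, command):
        # extract_paths is a helper (not shown) that pulls file paths out of the command.
        for path in extract_paths(command):
            # Normalize first so "/workspace/../etc" cannot pass the prefix check.
            norm = os.path.normpath(path)
            if not any(norm == d or norm.startswith(d + os.sep) for d in self.allowed_dirs):
                raise PermissionError(f"Access denied: {path}")
        return subprocess.run(command, ...)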
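The executor above relies on an `extract_paths` helper that the lesson does not show. The following is a minimal sketch of what such a helper and the allowlist check might look like. The tokenizing rule (any token containing a slash counts as a path) and the function names `is_allowed` and `check_command` are assumptions, not the lesson's actual implementation.

```python
import posixpath
import shlex

ALLOWED_DIRS = ["/workspace", "/tmp"]


def extract_paths(command):
    """Hypothetical helper: treat every shell token containing a slash as a path."""
    return [tok for tok in shlex.split(command) if "/" in tok]


def is_allowed(path, allowed_dirs=ALLOWED_DIRS):
    """True if the normalized path is an allowed directory or lies inside one."""
    # normpath collapses ".." segments; it does not resolve symlinks.
    norm = posixpath.normpath(path)
    # Compare against "dir/" so "/workspace-evil" does not match "/workspace".
    return any(norm == d or norm.startswith(d + "/") for d in allowed_dirs)


def check_command(command):
    """Raise PermissionError if the command names a path outside the allowlist."""
    for path in extract_paths(command):
        if not is_allowed(path):
            raise PermissionError(f"Access denied: {path}")
```

Relative paths such as `src/main.py` are rejected by this sketch because they are not resolved against a working directory; a fuller version would join them onto the sandbox root before checking.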
Key Code
def agent_loop(messages):
    classifier = PermissionClassifier()
    sandbox = SandboxExecutor()
    while True:
        response = client.messages.create(...)
        # Tool calls arrive as content blocks of type "tool_use".
        for tool_use in (b for b in response.content if b.type == "tool_use"):
            level = classifier.classify(tool_use.name, tool_use.input)
            if level == "DENY":
                tool_result = f"❌ Permission denied: {tool_use.name}"
            elif level == "ASK":
                answer = input(f"🔒 Allow {tool_use.name}? [y/n] ")
                if answer.strip().lower() == "y":
                    # execute_tool is assumed to dispatch the tool through the sandbox (not shown above).
                    tool_result = sandbox.execute_tool(tool_use.name, tool_use.input)
                else:
                    tool_result = "⏭️ Skipped by user"
            else:  # SAFE
                tool_result = sandbox.execute_tool(tool_use.name, tool_use.input)
            messages.append(...)
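The tier routing in the loop above can be pulled into a small function so it can be exercised without a model or a terminal. This is a sketch under assumptions: `handle_tool_call`, `ask_on_terminal`, and the three injected callables are hypothetical names, not part of the lesson's code.

```python
def handle_tool_call(name, args, classify, run, confirm):
    """Route one tool call through the three tiers and return the tool result.

    classify(name, args) -> "SAFE" | "ASK" | "DENY"
    run(name, args)      -> result of executing the tool (for example in a sandbox)
    confirm(prompt)      -> True if the user approves
    """
    level = classify(name, args)
    if level == "DENY":
        return f"Permission denied: {name}"
    if level == "ASK" and not confirm(f"Allow {name}? [y/n] "):
        return "Skipped by user"
    # SAFE calls and approved ASK calls both reach the executor.
    return run(name, args)


def ask_on_terminal(prompt):
    """Interactive confirm: accepts "y" or "Y" and ignores surrounding spaces."""
    return input(prompt).strip().lower() == "y"
```

In the agent loop, `confirm` would be `ask_on_terminal`; in tests it can be replaced with a function that always returns `True` or `False`.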
What's New (s02 → s18)
| Aspect | s02 (Tools) | s18 (Auto Mode) |
|---|---|---|
| Permissions | All require confirmation | Three tiers |
| Auto execution | None | SAFE tier auto-runs |
| Danger protection | None | DENY blocklist |
| Sandbox | None | Filesystem + network isolation |
| User experience | Confirm everything | Only ASK tier needs confirmation |
Deep Dive: Design Decisions
Q1: Should the classifier use rules or an LLM?
Rules, not LLM. Reasons: (1) Speed: < 1ms vs 1-3s. (2) Determinism: Security policies can't be probabilistic. (3) Auditability: Rules can be fully listed and reviewed. (4) Cost: Extra LLM call per tool call is expensive.
Q2: What's the difference between Auto Mode and skip-permissions?
Auto Mode keeps DENY tier protection. Skip-permissions has zero safety net — only for fully isolated CI/CD containers.
Q3: How does the sandbox isolate filesystem access?
Three layers: (1) Path allowlist. (2) Symlink detection (prevent escape). (3) OS-level sandbox (macOS sandbox-exec, Linux seccomp-bpf).
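The symlink layer can be sketched as follows: resolve the path with `os.path.realpath` before comparing it against the allowlist, so a link inside the workspace that points elsewhere is caught. The function name `resolves_inside` is an assumption; the OS-level layer is outside what a short Python example can show.

```python
import os


def resolves_inside(path, allowed_dirs):
    """True if path, after following symlinks, stays inside an allowed directory."""
    real = os.path.realpath(path)
    for d in allowed_dirs:
        root = os.path.realpath(d)  # the allowed directory itself may be a symlink
        # commonpath equals root only when real is root or a descendant of it.
        if os.path.commonpath([real, root]) == root:
            return True
    return False
```

A check like this is still subject to a race if the link is swapped between the check and the actual file access, which is one reason the answer above also lists an OS-level sandbox.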
Q4: What if the SAFE allowlist is too conservative?
Users can extend via config: {"allowedTools": ["npm_test", "git_diff", "prettier_format"]}. Safety advice: consider worst case — if the agent misunderstands parameters, what happens? write_file worst case = overwrite important file. read_file worst case = read wrong file (much lower risk).
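One way the `allowedTools` extension could be wired in is to merge the configured names into the default SAFE set at startup. This is a sketch only: the `allowedTools` key comes from the text above, while `load_safe_tools` and the way the config text is obtained are assumptions.

```python
import json

DEFAULT_SAFE_TOOLS = frozenset({"read_file", "grep", "lint", "git_status", "git_log"})


def load_safe_tools(config_text, defaults=DEFAULT_SAFE_TOOLS):
    """Merge the "allowedTools" list from a JSON config into the default SAFE set."""
    config = json.loads(config_text)
    extra = config.get("allowedTools", [])
    # Reject malformed values instead of silently widening the SAFE set.
    if not isinstance(extra, list) or not all(isinstance(n, str) for n in extra):
        raise ValueError("allowedTools must be a list of tool-name strings")
    return set(defaults) | set(extra)


config_text = '{"allowedTools": ["npm_test", "git_diff", "prettier_format"]}'
safe_tools = load_safe_tools(config_text)
```

The merge only ever adds to the SAFE set, so the worst-case question from the text applies to every name placed in the config.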
Q5: How does enterprise permission management differ from personal use?
Enterprise adds: mandatory audit logging, non-overridable DENY lists, centralized policy distribution, MCP Server approval requirements.
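Two of these additions, a non-overridable DENY list and audit logging, can be illustrated in a few lines. All names here (`effective_safe_tools`, `audit_record`, the record fields) are hypothetical; the lesson does not specify a policy format or log schema.

```python
import json
import time


def effective_safe_tools(user_allowed, org_denied, defaults):
    """User config may widen the SAFE set, but never past the organization's DENY list."""
    # Subtracting last means an org-level deny always wins over a user-level allow.
    return (set(defaults) | set(user_allowed)) - set(org_denied)


def audit_record(tool_name, args, level, user):
    """Build one JSON audit line; where it is shipped is left to the deployment."""
    return json.dumps(
        {
            "ts": time.time(),
            "user": user,
            "tool": tool_name,
            "args": args,
            "level": level,
        },
        sort_keys=True,
    )
```

Writing a record for every classification, including DENY and skipped ASK calls, is what makes the log useful for review after the fact.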
Try It
cd learn-claude-code
python agents/s18_auto_mode.py
Recommended prompts:
- "Read the README file": observe SAFE auto-execution
- "Delete the temp files": observe DENY blocking
- "Write a new file called hello.py": observe ASK confirmation
- "Run the test suite": observe SAFE/ASK tier classification
References
- Claude Code: Auto Mode. Anthropic Docs. Auto Mode settings, including --auto and --dangerously-skip-permissions.
- Claude Code: Native Sandboxing. Anthropic, Feb 2026. macOS Sandbox and Linux seccomp integration.
- Building Effective Agents. Anthropic, Dec 2025. Balancing safety and efficiency.
- OWASP Top 10 for LLM Applications. OWASP, 2025. Prompt injection attacks via malicious tool calls; the DENY mechanism defends against this.
- Agentic Coding Trends 2026. Anthropic, Mar 2026. Enterprise agent permission management trends.