What is autopilot mode for Claude Code?

Autopilot is a loop. KanBots spawns up to 4 concurrent slots, each slot claims the next persona off a round-robin counter, dispatches a Claude Code (or Codex) run against the issue, and keeps cycling until the work converges, you press stop, or the session cost cap is hit.

The mental model

A normal dispatch is one click, one run, one result. Autopilot replaces that with a long-running orchestrator that owns the issue for hours instead of minutes. The orchestrator keeps three things in memory: which personas to rotate through, how many runs to keep in flight at once, and how much money it is allowed to spend before it gives up.

KanBots ships two flavors. Feature-Dev is the multi-persona round-robin — you set personas (product author, engineer, reviewer, tester), parallelism 1–4, an effort level, a model, and a session budget; slots take turns being each persona on the parent issue and any subtasks it spawns. QA is the fix-loop — typecheck, tests, lint, build, e2e, optionally a watched dev server; for every failing check it dispatches a fix run, re-runs the checks, and stops when everything is green or the budget is gone. Both write to the autopilot_sessions table so you can stop the parent and kill every child in one button press.

How the loop is shaped

Inside a Feature-Dev session the orchestrator does five things in order, then repeats:

  1. Spawn N parallel slots (clamped to MAX_PARALLELISM = 4 inside packages/api/src/autopilot/feature-dev.ts).
  2. Each slot atomically claims the next persona from cycle_index — slot 0 might grab the engineer, slot 1 grabs the reviewer, slot 2 grabs the tester, slot 3 cycles back to product author. The claim is serialized through a single promise chain so two slots never grab the same persona at the same instant.
  3. The slot dispatches a child run for that persona, in a fresh worktree under .kanbots/worktrees/issue-N-runId/, branching from your default branch as kanbots/issue-N-runId.
  4. When the run finishes (success, failure, or stop), the slot reads its accumulated total_cost_usd, rolls it into the session total, and checks the budget cap. Cap hit means the slot exits and a SessionBudgetExceededError propagates.
  5. Otherwise the slot waits 500ms and claims the next persona. The backlog itself can have grown — a previous engineer or product run may have called splitIssue and dropped child cards on the board for later cycles to pick up.
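The five steps above can be sketched in a few dozen lines. This is a minimal illustration, not the code in feature-dev.ts: only MAX_PARALLELISM, cycle_index, total_cost_usd, and SessionBudgetExceededError come from the text. PersonaCycle, runSession, the dispatch callback, maxCycles (a stand-in for "the work converges"), and pauseMs are hypothetical names, and worktree creation is left out entirely.

```typescript
// Sketch only. Names other than MAX_PARALLELISM, total_cost_usd, and
// SessionBudgetExceededError are assumptions made for illustration.
const MAX_PARALLELISM = 4;

class SessionBudgetExceededError extends Error {}

interface RunResult { total_cost_usd: number }
type Dispatch = (persona: string, slot: number) => Promise<RunResult>;

class PersonaCycle {
  private cycleIndex = 0; // plays the role of cycle_index
  private chain: Promise<unknown> = Promise.resolve();

  constructor(private readonly personas: string[]) {}

  // Every claim is appended to one promise chain, so claims run one at a time
  // even if the real claim involved async I/O such as a database write.
  claim(): Promise<string> {
    const next = this.chain.then(() => {
      const persona = this.personas[this.cycleIndex % this.personas.length];
      this.cycleIndex += 1;
      return persona;
    });
    this.chain = next;
    return next;
  }
}

async function runSession(opts: {
  personas: string[];
  parallelism: number;
  budgetUsd: number;
  maxCycles: number; // hypothetical stand-in for convergence
  pauseMs?: number;  // the text describes a 500ms wait between cycles
  dispatch: Dispatch;
}): Promise<{ totalCostUsd: number; claimed: string[] }> {
  // Step 1: clamp the requested parallelism to 1..MAX_PARALLELISM.
  const slots = Math.min(Math.max(1, opts.parallelism), MAX_PARALLELISM);
  const cycle = new PersonaCycle(opts.personas);
  const claimed: string[] = [];
  let totalCostUsd = 0;
  let cyclesStarted = 0;

  const slotLoop = async (slot: number): Promise<void> => {
    while (cyclesStarted < opts.maxCycles) {
      cyclesStarted += 1;
      const persona = await cycle.claim();               // step 2
      claimed.push(persona);
      const result = await opts.dispatch(persona, slot); // step 3
      totalCostUsd += result.total_cost_usd;             // step 4
      if (totalCostUsd >= opts.budgetUsd) {
        throw new SessionBudgetExceededError(`spent $${totalCostUsd.toFixed(2)}`);
      }
      // Step 5: short pause before claiming the next persona.
      await new Promise<void>((resolve) => setTimeout(resolve, opts.pauseMs ?? 500));
    }
  };

  await Promise.all(Array.from({ length: slots }, (_, i) => slotLoop(i)));
  return { totalCostUsd, claimed };
}
```

In a single-threaded runtime a purely in-memory counter would already be safe without the chain. The chain earns its keep once the claim has to await something, which is presumably why the text describes the claim as serialized.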

QA's loop is shorter and dumber on purpose. It does not need personas. It runs every configured AutopilotCheckCommand, collects failures, dispatches a single-persona fix run per failure, and re-runs the checks. The session is done when everything is green or the budget cap is hit.
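As a rough sketch of that QA loop, assuming the checks and the fix dispatcher are injected as callbacks: AutopilotCheckCommand is named in the text, but its fields here are guesses, and runQaLoop, runCheck, dispatchFix, and the 'green' stop reason are hypothetical names. Only 'cost-budget' appears in the text as a stop reason.

```typescript
// Sketch only. The shape of AutopilotCheckCommand is assumed, not documented.
interface AutopilotCheckCommand { name: string; command: string }

type RunCheck = (check: AutopilotCheckCommand) => Promise<boolean>; // true = passed
type DispatchFix = (failed: AutopilotCheckCommand) => Promise<{ total_cost_usd: number }>;

interface QaOutcome {
  stopReason: "green" | "cost-budget";
  totalCostUsd: number;
  cycles: number;
}

async function runQaLoop(
  checks: AutopilotCheckCommand[],
  budgetUsd: number,
  runCheck: RunCheck,
  dispatchFix: DispatchFix,
): Promise<QaOutcome> {
  let totalCostUsd = 0;
  let cycles = 0;
  for (;;) {
    cycles += 1;
    // Run every configured check and collect the ones that fail.
    const failures: AutopilotCheckCommand[] = [];
    for (const check of checks) {
      if (!(await runCheck(check))) failures.push(check);
    }
    if (failures.length === 0) {
      return { stopReason: "green", totalCostUsd, cycles };
    }
    // One fix run per failure, checking the budget after each run.
    for (const failed of failures) {
      const run = await dispatchFix(failed);
      totalCostUsd += run.total_cost_usd;
      if (totalCostUsd >= budgetUsd) {
        return { stopReason: "cost-budget", totalCostUsd, cycles };
      }
    }
  }
}
```

The loop has exactly two exits, which is why a check that never stays fixed only ends when the budget does.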

The Autopilot — Feature Dev modal

Open it from the autopilot button on any card. The modal has these controls, in this order:

  • Personas — a chip row. Click to toggle. KanBots ships built-ins (Product Manager, Senior Engineer, UX Designer, Growth Lead, Reliability Engineer, Tester) and you can write your own from the persona picker. Order in the chip row is the round-robin order.
  • Parallelism slider — 1 through 4. The CLI loves memory; on a 16GB machine 2 is the comfortable ceiling, on 32GB you can run 4.
  • Effort: low / medium / high / xhigh / max. Maps to model selection and tool budget. low keeps the run on cheap models with tight context; max lets the run think for a long time.
  • Model selector — overrides the workspace default for this session only.
  • Budget cap — a USD number. When the session's accumulated cost crosses it, the loop stops with stopReason: 'cost-budget'.
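One way to picture what the modal submits is a small typed config object. The sketch below is an assumption about shape, not KanBots' actual schema: FeatureDevSessionConfig, normalizeConfig, and every field name are hypothetical, and the validation rules (at least one persona, a positive budget) are guesses. Only the effort levels and the 1 through 4 parallelism range come from the controls listed above.

```typescript
// Sketch only. Field names and validation rules are assumed for illustration.
type Effort = "low" | "medium" | "high" | "xhigh" | "max";

interface FeatureDevSessionConfig {
  personas: string[];  // chip-row order is the round-robin order
  parallelism: number; // slider, 1 through 4
  effort: Effort;
  model?: string;      // overrides the workspace default for this session only
  budgetUsd: number;   // session budget cap in USD
}

function normalizeConfig(input: FeatureDevSessionConfig): FeatureDevSessionConfig {
  if (input.personas.length === 0) {
    throw new Error("pick at least one persona");
  }
  if (!(input.budgetUsd > 0)) {
    throw new Error("budget cap must be a positive USD amount");
  }
  // Clamp to the slider's range instead of rejecting out-of-range values.
  const parallelism = Math.min(4, Math.max(1, Math.trunc(input.parallelism)));
  return { ...input, parallelism };
}

// A hypothetical session: four personas, two slots, medium effort, $15 cap.
const exampleSession = normalizeConfig({
  personas: ["product author", "engineer", "reviewer", "tester"],
  parallelism: 2,
  effort: "medium",
  model: "sonnet",
  budgetUsd: 15,
});
```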

QA's modal is different — instead of personas it has a check list (typecheck / tests / lint / build / e2e), a live UI toggle, an optional dev server command, and the same budget input.

A worked example

  1. Card #142, "Add password reset endpoint with rate limit." Click autopilot, pick Feature-Dev.
  2. Personas: product author, engineer, reviewer, tester. Parallelism 2. Model: Sonnet. Effort: medium. Budget: $15.
  3. Slot 0 claims index 0 (product author). It splits the issue into three subtasks: schema migration, endpoint handler, rate-limit middleware. Three new cards appear under #142.
  4. Slot 1 claims index 1 (engineer) and starts implementing the parent's first subtask in its worktree. Slot 0 returns and claims index 2 (reviewer) — but there is nothing to review yet, so it waits on artifacts and exits cheaply.
  5. By cycle three both slots are running engineer runs on different subtasks in parallel. By cycle five the reviewer slot is reading diffs. By cycle seven the tester slot is invoking the test command.
  6. Session total cost reads $11.40. You watch from the autopilot panel, drinking coffee. When the loop finishes you have three promotable worktrees and a parent issue marked done.

When to use which flavor

Use Feature-Dev when the work is generative — a new endpoint, a new feature, an issue with prose but no failing test. Use QA when the work is corrective — the test suite is red, the typecheck is broken, the build won't ship. Running QA on a green tree is wasted spend; it will run every check, find nothing, and exit having charged you for a typecheck.

For multi-persona detail see multi-persona agent orchestration; for the budget mechanics see AI agent cost and budget control.

Three failure modes

Autopilot drifts on an ambiguous parent. The product persona keeps splitting because the parent is under-specified, the engineer keeps over-implementing, the reviewer keeps asking for changes. Symptoms: cycle count above 8 with no done children, $20+ burned. Fix: stop the session, edit the parent issue to be three sentences plus a concrete acceptance criterion, restart with budget $5 and parallelism 1.

Decision prompts pile up. Personas issue decision events, and the orchestrator pauses on each one; if you are not watching, they just sit. Fix: open the run detail modal, answer the decisions, or tighten the persona prompt to avoid asking ("If unsure, pick the safer option and continue").

QA autopilot loops on a flaky test. The fix run passes, the next QA cycle fails on the same test, the fix run passes again. Symptoms: identical failing-test name across three cycles. Fix: skip or quarantine the flaky test before re-running QA; the autopilot is not a flaky-test detector.
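If you record the failing test names from each QA cycle yourself, the symptom above is easy to check for. The helper below is a hypothetical user-side sketch, not a KanBots feature: findRepeatOffenders and its input shape are assumptions, and the default window of three cycles mirrors the symptom described.

```typescript
// Sketch only. Input is one array of failing test names per QA cycle,
// oldest first; how you collect those names is up to you.
function findRepeatOffenders(failuresByCycle: string[][], window = 3): string[] {
  if (window < 1 || failuresByCycle.length < window) return [];
  const recent = failuresByCycle.slice(-window);
  // Keep each name once, and only if it failed in every one of the recent cycles.
  return recent[0].filter(
    (name, i, all) =>
      all.indexOf(name) === i &&
      recent.every((cycle) => cycle.indexOf(name) !== -1),
  );
}

// Hypothetical test names for illustration.
const suspects = findRepeatOffenders([
  ["auth.reset.rate-limit", "auth.reset.schema"],
  ["auth.reset.rate-limit"],
  ["auth.reset.rate-limit", "auth.login.smoke"],
]);
```

Any name this returns is a candidate to skip or quarantine before restarting QA. A test that fails three cycles in a row may be genuinely broken rather than flaky, so treat the result as a prompt to look, not a verdict.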

When autopilot is wrong

Autopilot is the wrong tool when the issue is one-shot and small — renaming a function, fixing a typo, bumping a dep. The orchestrator's overhead (worktree creation, persona round-robin, cost rollup) is higher than the cost of running a single dispatch. It is also wrong when the work is exploratory and you actually want to be in the loop — autopilot is a fire-and-walk-away tool; if you want to think alongside the agent, use a normal dispatch and the live thread.

For the branch isolation that makes parallel slots safe, see the feature-branch workflow.

Try it on your own folder

Drop a folder, get a board, dispatch parallel agents. The desktop app runs locally on macOS, Linux, and Windows.