How do you run AI agents in the background on GitHub issues?
Background agents are the use case that pays for autopilot's existence. You go to bed with a backlog of refined GitHub issues; you wake up to a stack of draft PRs ready for review. While you slept, the autopilot session ran: agents split parent issues into subtasks, slots round-robined through personas, QA gates failed closed on broken changes, and the cost cap was there to halt the run before any one ticket could spiral.
The honest version of "fire it and walk away" is: this works well on cards that were triaged first, and it fails on cards that weren't. The mitigation is to run a triage pass first so the autopilot has refined inputs to work on.
Why background work is the autopilot use case
Three things have to be true for unsupervised agent work to pay off:
- Each card is shippable in isolation. No agent should need another agent's output mid-run. (Subtask splitting is fine: subtasks queue, they don't block.)
- The QA gate catches the easy failures. Typecheck, tests, and lint run after every change in the worktree; broken worktrees never reach review.
- The budget cap is a real ceiling. Autopilot halts on session cost. You set a number you're prepared to spend and you trust the halt.
KanBots autopilot enforces all three. The session writes to autopilot_sessions so you can see every cycle in the morning, including which cards completed, which failed, and how much each cost.
How the Feature-Dev autopilot loop works
Feature-Dev flavor cycles through your enabled personas on each card. With the default roster — product, engineer, reviewer, tester — a card goes through:
- Product pass. Rewrite the card body into acceptance criteria. Split into subtasks if scope is too large. New cards land in the same column with a parent link.
- Engineer pass. Open the relevant files, write the change in the worktree, run typecheck. Iterate until the typecheck passes.
- Reviewer pass. Read the diff, run tests, post a verdict. If the verdict is "request changes," loop back to engineer with the feedback.
- Tester pass. Add or update tests for the change. Run the full suite.
Personas spawn personas: an engineer agent that discovers the work is bigger than the card can spawn child cards for the rest, and those child cards re-enter the round-robin. The backlog evolves as the work uncovers itself. Parallelism applies per persona — with parallelism 4, you can have four engineer slots running on four cards concurrently, then four reviewer slots when those finish.
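The persona cycle above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not KanBots' implementation: the Card class, run_card, the review callback, and the max_review_rounds limit are all hypothetical names introduced here, and spawning child cards and parallel slots are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Card:
    """Hypothetical stand-in for a board card."""
    title: str
    history: List[str] = field(default_factory=list)  # personas that have run


def run_card(card: Card, review: Callable[[Card], str],
             max_review_rounds: int = 3) -> str:
    """Run one card through product -> engineer -> reviewer -> tester.

    Returns the column the card ends in: "review" on approval, "failed"
    if the reviewer keeps requesting changes. max_review_rounds is an
    assumed safeguard, not a documented setting.
    """
    card.history.append("product")         # rewrite body into acceptance criteria
    for _ in range(max_review_rounds):
        card.history.append("engineer")    # write the change, iterate on typecheck
        card.history.append("reviewer")    # read the diff, run tests, post a verdict
        if review(card) == "approve":
            card.history.append("tester")  # add or update tests, run the suite
            return "review"
        # "request changes": loop back to the engineer with the feedback
    return "failed"


# Example: a reviewer that requests changes once, then approves.
verdicts = iter(["request changes", "approve"])
card = Card("Add CSV export")
final_column = run_card(card, lambda c: next(verdicts))
print(final_column, card.history)
```

In this example the engineer and reviewer each run twice before the tester pass, which is the "loop back to engineer with the feedback" path described above.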
Walkthrough — set up an overnight session
- Switch the workspace to GitHub mode in Settings → Workspace → Issue source. Authenticate via gh CLI or the GitHub App. The board pulls open issues into cards mapped by status:* labels.
- Triage first. Open autopilot Feature-Dev with only the product persona enabled and let it run for 15 minutes to refine the cards. Skip this only if the issues are already well-scoped.
- Open autopilot Feature-Dev again. Enable all four personas. Parallelism: 4. Effort: Medium (Heavy burns money overnight; Light skips edge-case handling).
- Set a session cost cap you'd be comfortable losing if the autopilot goes sideways. $80–$150 is a reasonable overnight budget for a small team's worth of work; calibrate down based on your first session.
- Pick the backlog column as input. Click Start. The autopilot session row appears in the bottom drawer with a live cycle counter and running cost.
- Close the laptop. The agents run regardless of UI focus; the dispatcher is in the Electron main process, not the renderer. Leave the desktop app running.
- In the morning, open the autopilot session row to see the cycle log: how many cards completed, which ones failed, total cost. Review the review column. Each card with a draft PR is ready for human review.
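The choices made in the walkthrough can be written down as data and sanity-checked before a session starts. The sketch below is purely illustrative: the walkthrough configures everything through the UI, so SessionSettings, its field names, and preflight_warnings are hypothetical, and the $100 cap is just a value picked from the $80–$150 range above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SessionSettings:
    """Hypothetical mirror of the choices made when starting a session."""
    personas: Tuple[str, ...]
    parallelism: int
    effort: str          # "Light", "Medium", or "Heavy"
    cost_cap_usd: float
    input_column: str


def preflight_warnings(s: SessionSettings) -> List[str]:
    """Return warnings for settings the walkthrough advises against."""
    warnings = []
    if s.cost_cap_usd <= 0:
        warnings.append("no session cost cap set")
    if s.effort == "Heavy":
        warnings.append("Heavy effort burns money overnight")
    if s.effort == "Light":
        warnings.append("Light effort skips edge-case handling")
    if s.parallelism < 1:
        warnings.append("parallelism must be at least 1")
    return warnings


# The overnight session from the walkthrough (cap value is illustrative).
overnight = SessionSettings(
    personas=("product", "engineer", "reviewer", "tester"),
    parallelism=4,
    effort="Medium",
    cost_cap_usd=100.0,
    input_column="backlog",
)
print(preflight_warnings(overnight))
```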
Failure modes and fixes
The agent picks wrong scope on an underspecified ticket
Symptom: a card titled "improve auth" with no acceptance criteria gets an engineer agent that interprets it as "rewrite the entire auth module." The agent burns 20% of your budget on one ticket. Mitigation: run triage first. The product-persona pass rewrites ambiguous cards into something an engineer agent can scope. The per-cycle-budget-per-card setting in autopilot config also helps — set max_cost_per_card_usd: 5 as a soft ceiling so one runaway card can't eat the session.
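One way to picture a soft per-card ceiling is a running total that is checked after each cycle. The sketch below assumes that reading; only the max_cost_per_card_usd name comes from the text above, while the CardSpend class and its record method are hypothetical. Because the check runs after a cycle's cost is recorded, a card can overshoot the ceiling by up to one cycle, which is one sense in which such a ceiling is "soft".

```python
from collections import defaultdict
from typing import Dict


class CardSpend:
    """Track spend per card against a soft per-card ceiling (illustrative)."""

    def __init__(self, max_cost_per_card_usd: float) -> None:
        self.max_cost_per_card_usd = max_cost_per_card_usd
        self.spent: Dict[str, float] = defaultdict(float)

    def record(self, card_id: str, cost_usd: float) -> bool:
        """Add one cycle's cost; return True if the card may keep running."""
        self.spent[card_id] += cost_usd
        return self.spent[card_id] < self.max_cost_per_card_usd


spend = CardSpend(max_cost_per_card_usd=5)
print(spend.record("improve-auth", 2.0))  # True: 2.0 spent, under the ceiling
print(spend.record("improve-auth", 3.5))  # False: 5.5 spent, stop this card
print(spend.record("fix-typo", 0.4))      # True: other cards are unaffected
```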
Two agents touch the same file in parallel
Symptom: at Promote time, two worktrees have conflicting edits to src/billing/format.ts. Fix: this is partly unavoidable on a complex codebase — both cards legitimately needed the file. The promote flow makes the conflict visible (a "rebase needed" badge on the second card to land). Resolve manually or dispatch a third agent to merge the two worktrees. For chronic cases, group related cards into a sequential chain rather than parallel slots; KanBots supports depends_on relationships between cards.
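Grouping related cards into a sequential chain amounts to a topological ordering over depends_on links. The sketch below uses Python's standard-library graphlib (3.9+) to show which cards could share a parallel batch and which must wait. The shape of the mapping (card id to the ids it depends on) and the card names are assumptions for illustration, not KanBots' data model.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def dispatch_batches(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group cards into batches; a card only appears after all its dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the links form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # cards whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)                 # unblock cards waiting on this batch
    return batches


# Two cards touch src/billing/format.ts, so one is chained after the other.
cards = {
    "format-currency": [],
    "format-invoice": ["format-currency"],
    "update-docs": [],
}
print(dispatch_batches(cards))
# [['format-currency', 'update-docs'], ['format-invoice']]
```

The two billing cards never land in the same batch, so their worktrees cannot conflict at Promote time; the unrelated card still runs in parallel with the first.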
The autopilot session stops on cost cap mid-card
Symptom: the budget hits at 3am while 4 cards are still mid-engineering. The worktrees stay on disk; the cards drop back to in-progress with their thread intact. Fix: this is by design. Either raise the cap and resume from the same session row, or promote what's salvageable from the four worktrees and discard the rest. The cost cap is the safety; don't let it become "the system that always finishes by morning no matter what."
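The halt behavior described above can be sketched as a loop that stops as soon as cumulative spend reaches the cap and leaves unfinished cards in-progress. This is a simplified model under stated assumptions: run_until_cap and the (card, cost, finished) cycle tuples are hypothetical, and real sessions run slots concurrently rather than one cycle at a time.

```python
from typing import Dict, List, Tuple


def run_until_cap(cycles: List[Tuple[str, float, bool]],
                  cost_cap_usd: float) -> Dict[str, object]:
    """Process (card, cost_usd, finished) cycles until the session cap is hit."""
    spent = 0.0
    started: List[str] = []
    completed: List[str] = []
    halted = False
    for card, cost_usd, finished in cycles:
        if card not in started:
            started.append(card)
        spent += cost_usd
        if finished:
            completed.append(card)
        if spent >= cost_cap_usd:  # the cap is a hard stop for the whole session
            halted = True
            break
    # Cards that started but did not finish drop back to in-progress.
    in_progress = [c for c in started if c not in completed]
    return {"spent": spent, "completed": completed,
            "in_progress": in_progress, "halted": halted}


result = run_until_cap(
    [("a", 3.0, False), ("b", 2.0, True), ("a", 4.0, False), ("c", 5.0, True)],
    cost_cap_usd=9.0,
)
print(result)
```

Here the cap is reached on card a's second cycle, so b is completed, a stays in-progress, and c never starts. Resuming with a higher cap would pick up from that state.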
When background autopilot is the wrong tool
Anything that needs decisions only a human can make. Architecture choices, API contracts that other teams consume, anything where "the agent picked one of two reasonable approaches and committed" is a real cost. The autopilot will charge ahead because that's its job; the result will need re-litigation.
Also: anything time-sensitive. The agents work at their pace, the QA gate adds latency, and there's no SLA. If a customer-blocking bug needs a fix in 20 minutes, you're faster running one agent yourself in the foreground than handing it to a background session with three persona cycles per card.
And finally: cards that depend on each other in ways the dependency model can't capture. If card B's correctness depends on a decision made during card A's review, parallel slots will produce two worktrees that have to be reconciled. Run them sequentially in the foreground instead.
Adjacent reads: the autopilot mental model, how cost caps actually halt the loop, and how the backlog evolves as agents discover work.
Try it on your own folder
Drop a folder, get a board, dispatch parallel agents. The desktop runs locally on macOS, Linux, and Windows.
Related questions
- How do you automate backlog triage with AI agents? Point an agent at a column of unrefined tickets and let it split, estimate, label, and propose owners. The exact persona setup for triage autopilot.
- Can you use an AI agent to triage Sentry issues? Auto-import Sentry error groups onto a kanban board, hand each to an agent for root-cause analysis, and promote the fix as a PR. End-to-end walkthrough.
- How do you build an AI agent PR review workflow? A reviewer persona reads the diff, runs the tests in its own worktree, and posts a structured verdict. How to wire it into your existing GitHub flow.
- Can AI agents handle tech debt cleanup? A pattern for dispatching dozens of small refactor agents in parallel, each scoped to one file or module, with QA autopilot guarding the merge.