How do you put a budget cap on AI coding agents?
Every run reports its cost in its final result event. KanBots accumulates that into per-run, per-card, and per-autopilot-session totals, and a budget cap on the autopilot session stops the loop with stopReason: 'cost-budget' when total spend crosses the limit.
The arithmetic that bites
A single Claude Sonnet run at medium effort on a small ticket costs roughly $0.40 to $1.20 and finishes in 2 to 6 minutes. That is cheap. The bite comes from compounding.
Four parallel autopilot slots, each running about $0.50 per minute of wall time, looping for an hour: 4 slots × 60 min × $0.50 = $120. Run that for an hour on each of three issues overnight and you wake up to $360 of CLI charges that no one approved. The economic case for parallel agents needs a meter on it.
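The compounding is plain multiplication. A minimal sketch, assuming every slot burns a flat rate per minute of wall time (real runs vary); projectSpend is a hypothetical helper, not part of KanBots:

```typescript
// Hypothetical helper: projected spend for parallel autopilot slots,
// assuming every slot burns the same flat rate per minute of wall time.
function projectSpend(slots: number, minutes: number, usdPerSlotMinute: number): number {
  return slots * minutes * usdPerSlotMinute;
}

const perIssue = projectSpend(4, 60, 0.5); // one hour of four slots on one issue
const overnight = 3 * perIssue;            // the same hour on each of three issues

console.log(perIssue, overnight); // prints "120 360"
```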
Three rules keep this honest. Track every run. Roll up by card and by session. Halt automatically when a number you picked in advance is crossed. KanBots does all three.
Where the cost numbers come from
Both supported CLIs emit a final result JSON object on stdout when a run ends. It carries total_cost_usd, plus token counts and a success flag. The dispatcher's stream-parser classifies that line as the result event and writes the cost onto the agent_runs row. Sum across runs for a card to get the card total; sum across runs in an autopilot session to get the session total.
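A minimal sketch of the parse-and-sum step. It assumes the result line is a JSON object with "type": "result" next to total_cost_usd; the text only names total_cost_usd, so treat the type discriminator as an assumption. AgentRun, parseResultCost, and rollUp are hypothetical names, not the dispatcher's actual code.

```typescript
// Assumed shape of the final stdout line; only the fields used here are typed.
interface ResultEvent {
  type: "result";
  total_cost_usd: number;
}

// Hypothetical row shape, loosely modeled on an agent_runs row.
interface AgentRun {
  runId: string;
  cardId: string;
  sessionId: string | null; // null for runs dispatched outside autopilot
  costUsd: number;
}

// Returns the cost if the line is a result event, otherwise null.
function parseResultCost(line: string): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null; // not JSON, e.g. plain log output
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const event = parsed as Partial<ResultEvent>;
  if (event.type !== "result" || typeof event.total_cost_usd !== "number") return null;
  return event.total_cost_usd;
}

// Sums run costs grouped by a key (card or session); runs with a null key are skipped.
function rollUp(runs: AgentRun[], key: (r: AgentRun) => string | null): Map<string, number> {
  const totals = new Map<string, number>();
  for (const run of runs) {
    const k = key(run);
    if (k === null) continue;
    totals.set(k, (totals.get(k) ?? 0) + run.costUsd);
  }
  return totals;
}
```

Calling rollUp with a card key gives the card total; calling it with a session key gives the session total.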
Two budgets are configured in .kanbots/config.json:
```json
{
  "defaults": {
    "runCostBudgetUsd": 2.50,
    "sessionCostBudgetUsd": 25.00
  }
}
```
runCostBudgetUsd is per-dispatch: the dispatcher kills the run if the accumulated cost of a single CLI invocation exceeds the cap. sessionCostBudgetUsd is the autopilot cap: when the sum across every child run in the session crosses it, the orchestrator throws SessionBudgetExceededError and all slots return. Setting either to null (or omitting it) disables that cap. The Autopilot — Feature Dev modal has a budget input that overrides sessionCostBudgetUsd for the session you are about to start.
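A sketch of how the two keys and the modal override could combine, assuming "crosses" means strictly greater than the cap. SessionBudgetExceededError is the name the text uses; its body here, along with BudgetDefaults, resolveSessionCap, and assertWithinSessionBudget, is hypothetical.

```typescript
// Mirrors the two keys shown in .kanbots/config.json; null or missing disables a cap.
interface BudgetDefaults {
  runCostBudgetUsd?: number | null;
  sessionCostBudgetUsd?: number | null;
}

// Error name taken from the text; this class body is a hypothetical stand-in.
class SessionBudgetExceededError extends Error {
  readonly totalUsd: number;
  readonly capUsd: number;
  constructor(totalUsd: number, capUsd: number) {
    super(`session spend $${totalUsd.toFixed(2)} crossed cap $${capUsd.toFixed(2)}`);
    this.name = "SessionBudgetExceededError";
    this.totalUsd = totalUsd;
    this.capUsd = capUsd;
  }
}

// A filled-in modal budget wins; otherwise fall back to the config default.
function resolveSessionCap(defaults: BudgetDefaults, modalOverrideUsd?: number | null): number | null {
  if (modalOverrideUsd !== undefined && modalOverrideUsd !== null) return modalOverrideUsd;
  return defaults.sessionCostBudgetUsd ?? null;
}

// Throws once the session total is strictly above the cap; a null cap never throws.
function assertWithinSessionBudget(totalUsd: number, capUsd: number | null): void {
  if (capUsd !== null && totalUsd > capUsd) {
    throw new SessionBudgetExceededError(totalUsd, capUsd);
  }
}
```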
What the UI shows you
Three places to see the number, in order of zoom:
- The run card in the live thread shows the running dispatch's accumulated cost in the run stats row alongside model, elapsed time, and tokens. It updates as result events land.
- The card detail shows the rollup across every run ever dispatched on that card, which is useful for asking "how much have we spent on this issue in total?"
- The autopilot session panel shows the cumulative spend across every child run in the session, plotted against the cap. When the bar fills, the session ends.
Why this beats fire-and-forget cron
A cron that runs claude -p on a queue has no per-run spend visibility, no rollup, no automatic halt. You discover the cost at the end of the month from the Anthropic console. By then the loop has already run for 19 days. KanBots reads the result event for every run and persists it before the next run starts; the budget cap is a constraint the orchestrator actually checks before claiming the next persona.
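The difference from cron is where the check sits. A minimal sketch of a sequential round-robin loop that tests the budget before claiming each persona; runAutopilot, PersonaRun, and the "no-work" stop reason are hypothetical, and only "cost-budget" comes from the text. Real autopilot runs slots in parallel, which this sketch leaves out.

```typescript
// Only "cost-budget" comes from the text; "no-work" is a hypothetical stop reason.
type StopReason = "no-work" | "cost-budget";

// Hypothetical: a persona run returns its cost in USD, or null when it finds no work.
type PersonaRun = (persona: string) => number | null;

interface LoopResult {
  stopReason: StopReason;
  totalUsd: number;
  runs: number;
}

// Sequential round-robin over personas. The budget is checked before each
// claim, so once the total crosses the cap, the next persona never starts.
function runAutopilot(personas: string[], capUsd: number | null, run: PersonaRun): LoopResult {
  let totalUsd = 0;
  let runs = 0;
  let i = 0;
  while (personas.length > 0) {
    if (capUsd !== null && totalUsd > capUsd) {
      return { stopReason: "cost-budget", totalUsd, runs };
    }
    const cost = run(personas[i]);
    if (cost === null) break; // nothing left to claim
    totalUsd += cost; // record the cost before the next claim
    runs += 1;
    i = (i + 1) % personas.length;
  }
  return { stopReason: "no-work", totalUsd, runs };
}
```

In this sketch, spend can overshoot the cap by up to one run's cost, which is the gap a per-run cap such as runCostBudgetUsd narrows.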
KanBots inherits its economic shape from its OSS thesis: bring your own keys, pay the model provider directly, never resell inference. The budget cap protects the wallet the keys are billed to: yours.
A worked budget
- Card #221, "Build invoice export." You open Autopilot — Feature Dev. Personas: product, engineer, reviewer, tester. Parallelism 2. Effort: medium. Model: Sonnet.
- Budget cap: $15. (A sensible default the first time you run autopilot on a non-trivial card; raise it as you learn the work.)
- Cycle 0 (product, $0.30): splits #221 into three subtasks. Cycle 1 (engineer, $1.10): writes the schema. Cycle 2 (reviewer, $0.45): approves. Cycle 3 (tester, $0.80): writes a round-trip test, which passes.
- Cycles 4–7 (parallelism 2 picking up the remaining subtasks): $4.80 across four runs.
- Cycles 8–10: $3.20 polishing reviewer feedback. Session total now $10.65. Cap not hit.
- Cycle 11 spawns and finishes; total $11.40; loop sees no work left and exits cleanly. You spent $11.40 of the $15 you authorized.
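The running total in this worked budget can be checked mechanically. A sketch using integer cents so the sums stay exact; cycle 11's 75 cents is derived from the two totals given ($11.40 minus $10.65), not stated directly.

```typescript
// Cycle costs from the worked example, in integer cents to avoid float drift.
// Cycle 11's 75 cents is derived: $11.40 final total minus $10.65.
const cycleGroups: Array<[string, number]> = [
  ["cycle 0 (product)", 30],
  ["cycle 1 (engineer)", 110],
  ["cycle 2 (reviewer)", 45],
  ["cycle 3 (tester)", 80],
  ["cycles 4-7", 480],
  ["cycles 8-10", 320],
  ["cycle 11", 75],
];
const capCents = 1500; // the $15 cap

let runningCents = 0;
for (const [label, cents] of cycleGroups) {
  runningCents += cents;
  const status = runningCents > capCents ? "OVER CAP" : "ok";
  console.log(`${label}: $${(runningCents / 100).toFixed(2)} ${status}`);
}
console.log(`headroom left: $${((capCents - runningCents) / 100).toFixed(2)}`);
```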
Defaults and recommended caps
First-time autopilot on a card you have never touched: cap $10 to $25. Watch what the loop does for one cycle, then decide whether to raise the cap.
Repeated autopilot on a card type you have done before (CRUD endpoint, schema migration, small refactor): cap $5 to $15.
Large multi-day feature with four personas and parallelism 4: start with $50, watch the autopilot panel after 30 minutes, raise if the loop is making real progress and stop if not.
For the autopilot mechanics themselves, see autopilot mode; for why a runaway backlog can spike spend, see self-evolving backlog.
Three failure modes
You forgot to set a session cap. The budget input was empty, the session ran for two hours, total cost $87. Fix: set sessionCostBudgetUsd in .kanbots/config.json so the modal pre-fills it on every future session. The right default for your team's usage is whatever you'd be comfortable paying for one card.
Effort: max with parallelism 4. Each slot runs the most expensive model with the largest context. The cost rate spikes to $2/minute per slot, or $480/hour across four slots at full burn. Fix: drop effort to medium for the parent issue's first pass and bump it only for cycles that are obviously stuck.
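Turning a burn rate into time-to-cap shows how little a cap buys at max effort. A sketch assuming a flat per-slot rate; minutesUntilCap is a hypothetical helper, and the $25 cap is the top of the first-time range suggested earlier.

```typescript
// Hypothetical helper: minutes until a session cap is crossed if every slot
// burns a flat rate. Real burn rates vary run to run; treat this as an estimate.
function minutesUntilCap(capUsd: number, slots: number, usdPerSlotMinute: number): number {
  return capUsd / (slots * usdPerSlotMinute);
}

const hourlyBurn = 4 * 2 * 60;                       // effort max, parallelism 4
const maxEffortMinutes = minutesUntilCap(25, 4, 2);  // $25 cap at $8/minute total
const slowerMinutes = minutesUntilCap(25, 4, 0.5);   // same cap at the $0.50/minute rate from the opening example
```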
The cap halts a near-finish. The session hit $25 with one subtask still mid-cycle. The cap was correct; you just want this one to finish. Fix: the orchestrator already lets in-flight child runs complete their current iteration and only blocks new child runs from starting. To resume, open a fresh autopilot session on the same card with a smaller cap targeting just the remaining work.
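The drain behavior amounts to a small piece of session state: stop admitting new runs once the cap is crossed, but keep counting cost from runs already in flight. A sketch under that reading; DrainingSession and its methods are hypothetical, not the orchestrator's API.

```typescript
// Hypothetical session state: once the cap is crossed the session drains.
// In-flight runs may finish and report cost, but no new run may start.
class DrainingSession {
  private totalUsd = 0;
  private inFlight = 0;
  private draining = false;
  private readonly capUsd: number | null;

  constructor(capUsd: number | null) {
    this.capUsd = capUsd;
  }

  // Called before claiming a persona; false means do not start a new run.
  tryStart(): boolean {
    if (this.draining) return false;
    this.inFlight += 1;
    return true;
  }

  // Called when a run's result event lands with its cost.
  finish(costUsd: number): void {
    this.inFlight -= 1;
    this.totalUsd += costUsd;
    if (this.capUsd !== null && this.totalUsd > this.capUsd) this.draining = true;
  }

  // The session is over once it is draining and nothing is left in flight.
  get done(): boolean {
    return this.draining && this.inFlight === 0;
  }

  get spentUsd(): number {
    return this.totalUsd;
  }
}
```

Because in-flight runs still bill after the cap trips, the final session total can land somewhat above the cap.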
When budget caps are wrong
Budget caps are wrong when the work is open-ended exploration and you genuinely want to spend whatever it takes. Set sessionCostBudgetUsd: null and watch manually. They are also wrong as a substitute for a real spec: capping spend on an ambiguous parent does not produce a good outcome; it just produces a cheap bad one.
For how spend is isolated per branch (so promoting one slot does not duplicate cost), see the feature-branch workflow.
Try it on your own folder
Drop a folder, get a board, dispatch parallel agents. The desktop app runs locally on macOS, Linux, and Windows.
Related questions
- What is autopilot mode for Claude Code? Autopilot picks personas, parallelism, and budget. It loops until the work converges or the cost cap hits. The mental model and when to use it.
- How does multi-persona AI agent orchestration work? Product author → engineer → reviewer → tester. How round-robin persona cycles produce better output than single-persona loops, and how to configure them.
- How do AI agents fit a feature-branch workflow? One agent → one branch → one PR, isolated by worktree, with pre-push hooks preventing agent-side pushes. The exact branch naming and promote flow.
- Can an AI agent backlog evolve itself? When personas split a parent issue into subtasks, the backlog grows. How to keep that growth productive instead of runaway.