Can AI agents handle tech debt cleanup?

Yes, provided you scope each agent to one file or one module, run the agents in parallel slots under QA autopilot, and let the typecheck/test gate fail closed on every change. An M2 Mac runs 4 slots comfortably; an EC2 c7i.4xlarge runs 8+. It works because the refactors are small, bounded, and verifiable.

Tech debt is the textbook fit for parallel coding agents because each unit of work is independent. Renaming a deprecated API call across 60 files is 60 separate jobs with no inter-job dependencies. KanBots was built to dispatch those 60 jobs from one board with the QA autopilot guarding every merge.

The trick is single-file scope and a green-tests gate. The worktree never lands on your real branch without the configured check commands passing, so the failure mode of "agent made a confident change that breaks the build" turns into "worktree sits in the discard pile while you look at it" — not a broken main.
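
To make "fail closed" concrete, here is a minimal sketch of such a gate. It is not KanBots' implementation: run_checks, its arguments, and the timeout are hypothetical names chosen for illustration. The one property it shows is that anything other than a clean exit from every command counts as red, including a command that cannot start, a command that hangs, and an empty check list.

```python
import subprocess
from pathlib import Path


def run_checks(
    checks: dict[str, str], worktree: Path, timeout_s: float = 600.0
) -> tuple[bool, dict[str, str]]:
    """Run every check command inside the worktree.

    Returns (green, failures). green is True only if every command exits 0.
    Hypothetical helper, not part of KanBots.
    """
    if not checks:
        # Fail closed: with no checks configured, nothing vouches for the change.
        return False, {"<none>": "no check commands configured"}

    failures: dict[str, str] = {}
    for name, command in checks.items():
        try:
            result = subprocess.run(
                command,
                shell=True,          # commands are shell strings like "pnpm test"
                cwd=worktree,        # run against the agent's worktree, not your branch
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Fail closed: a check that cannot run or hangs is a failure.
            failures[name] = f"could not complete: {exc}"
            continue
        if result.returncode != 0:
            failures[name] = (result.stdout + result.stderr).strip()
    return not failures, failures
```

The failures mapping keeps each command's output, which is what a follow-up fix run would need to see.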

The pattern: many small agents, not one big one

It's tempting to dispatch one agent on "migrate everything from moment to date-fns." Don't. Single-large-task runs blow context budget, lose track of which files they've already done, and produce inconsistent transformations across the codebase. Instead:

  • Generate a card per file (a script or a 30-second product persona run will do this).
  • Each card's title looks like "Migrate moment → date-fns in src/billing/format.ts".
  • Each agent's prompt is "do this for exactly this file; if you need to touch another file, stop and ask."
  • Run with QA autopilot enabled so typecheck and tests run after the change. The worktree is gated.
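
The card-per-file step in the list above is easy to script. The following is a rough sketch of the shape of that script, not KanBots' card format: the Card class and cards_for_files are hypothetical, and the prompt wording simply mirrors the bullet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical stand-in for a board card: a title plus the agent prompt."""
    title: str
    prompt: str


def cards_for_files(task: str, paths: list[str]) -> list[Card]:
    """Build one card per file, de-duplicated and in sorted order."""
    unique_paths = sorted({p.strip() for p in paths if p.strip()})
    return [
        Card(
            title=f"{task} in {path}",
            prompt=(
                f"{task} in exactly this file: {path}. "
                "If you need to touch another file, stop and ask."
            ),
        )
        for path in unique_paths
    ]
```

Calling cards_for_files("Migrate moment → date-fns", ["src/billing/format.ts"]) yields a single card whose title matches the example above. Sorting and de-duplicating keeps the output stable when the file list comes from more than one search.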

With 60 cards and parallelism 4, you finish in roughly 15 waves of four cards each, because the slots are independent and run concurrently rather than one after another. A typical small refactor runs 4–7 minutes per agent including the test run, so the whole sweep takes about 60–105 minutes of wall time (15 × 4 to 15 × 7).
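
The estimate is a ceiling division followed by a multiplication. A minimal sketch, assuming every card passes on its first attempt and that slots advance in lockstep waves (in practice a slot can pick up the next card as soon as it is free, so real runs tend to land between these bounds). The function name is made up for illustration.

```python
import math


def sweep_estimate(
    cards: int, slots: int, min_minutes: float, max_minutes: float
) -> tuple[int, float, float]:
    """Return (waves, best-case minutes, worst-case minutes) for a sweep."""
    if cards < 0 or slots < 1:
        raise ValueError("cards must be >= 0 and slots must be >= 1")
    waves = math.ceil(cards / slots)  # 60 cards on 4 slots -> 15 waves
    return waves, waves * min_minutes, waves * max_minutes


print(sweep_estimate(60, 4, 4, 7))  # (15, 60, 105)
print(sweep_estimate(60, 8, 4, 7))  # (8, 32, 56)
```

The upper bound assumes every agent takes the full seven minutes. Fix cycles on red cards add to the total.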

How KanBots wires this up

Open the autopilot modal, pick QA flavor (not Feature-Dev — you already have the cards), and set parallelism. The QA autopilot loop is "dispatch on a card, wait for completion, run check commands in the worktree, if green mark the card review, if red dispatch a fix run on the same card." So a single misfire doesn't tank the run; the autopilot iterates until the check commands pass or the per-card cycle budget hits the cap.

Configure your check commands in .kanbots/config.json. For a typical TypeScript repo:

{
  "checks": {
    "typecheck": "pnpm typecheck",
    "tests": "pnpm test",
    "lint": "pnpm lint"
  },
  "autopilot": {
    "qa": {
      "max_fix_cycles_per_card": 3,
      "session_cost_cap_usd": 50
    }
  }
}

max_fix_cycles_per_card: 3 is the safety net. If an agent's first attempt fails the typecheck, autopilot dispatches a second agent on the same card with the failure output in context. Third strike and the card moves to failed for human eyes — usually because the refactor crossed a boundary the single-file scope wasn't enough for.
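
The per-card loop can be pictured as follows. This is a sketch of the behavior described above, not KanBots' code: dispatch and gate are hypothetical callables standing in for the agent run and the check commands, and it reads max_fix_cycles_per_card as the total number of attempts on a card, which is an interpretation of the "third strike" wording.

```python
from typing import Callable, Optional

# Hypothetical signatures; the real dispatch and gate live inside KanBots.
Dispatch = Callable[[str, Optional[str]], None]  # (card, failure output of last attempt)
Gate = Callable[[str], tuple[bool, str]]         # card -> (green?, check output)


def run_card(card: str, dispatch: Dispatch, gate: Gate, max_fix_cycles: int = 3) -> str:
    """Return "review" once the checks pass, or "failed" when the budget runs out."""
    failure_output: Optional[str] = None
    for _attempt in range(max_fix_cycles):
        # The first attempt gets no failure context; later ones get the last red output.
        dispatch(card, failure_output)
        green, output = gate(card)
        if green:
            return "review"
        failure_output = output
    return "failed"
```

Passing the previous failure output into the next dispatch is what lets a fix run start from the error rather than from scratch.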

Walkthrough — a real tech-debt sweep

  1. Pick the debt. Say you're removing all uses of a deprecated logger. Run rg -l "from '@old/logger'" src/ to get the file list.
  2. Generate cards. Either script it (kanbots cards create-from-list files.txt via the OSS CLI) or dispatch a single product-persona agent with the file list and a "create one card per file" prompt.
  3. Open autopilot. Flavor: QA. Parallelism: 4 (or 8 if you're on a big box). Cost cap: $50 for the session.
  4. Point it at the column with your debt cards. Click Start.
  5. Watch the board. Cards flip running → review as their typecheck passes. Cards in failed are the ones to look at first; they're where the refactor leaked across files.
  6. When the session halts, you have a column of review cards. Each has a worktree and a one-line diff summary. Promote them as a batch: either land them all as individual commits, or open one mega-PR.
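
If ripgrep isn't available for step 1, the file list can be built with a short script. This is a rough stand-in for rg -l, with two differences worth knowing: it does not honor .gitignore, and it only looks at the file suffixes listed (an assumption about a typical TypeScript repo). The function name is hypothetical.

```python
from pathlib import Path


def files_containing(
    root: Path,
    needle: str,
    suffixes: tuple[str, ...] = (".ts", ".tsx", ".js", ".jsx"),
) -> list[str]:
    """Return sorted paths under root whose text contains needle."""
    matches = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text:
                matches.append(path.as_posix())
    return sorted(matches)
```

For the logger example, files_containing(Path("src"), "from '@old/logger'") returns the paths to turn into cards. Writing them one per line gives a plain list file; whether that is the exact format create-from-list expects is an assumption to check against the CLI's own help.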

Failure modes and fixes

The agent edits files outside its scope

Symptom: an agent dispatched on src/billing/format.ts also edits src/billing/types.ts because a type changed. Now two agents touching the same module are racing each other in parallel slots. Fix: the engineer persona prompt for tech debt should be tight — "edit only the file named in the card title; if a type or interface elsewhere needs to change, stop and post a decision request." When the decision lands, you batch the type changes into a parent card the autopilot picks up first.
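
A prompt is a request, not a guarantee, so it can help to verify scope after the run as well. The sketch below is one way to do that outside KanBots: it compares the paths a worktree changed against the single file the card names. Both function names are hypothetical. The git invocation is the standard git diff --name-only, which lists tracked files that differ from HEAD and does not include untracked new files.

```python
import posixpath
import subprocess
from pathlib import Path


def out_of_scope(changed_files: list[str], allowed_file: str) -> list[str]:
    """Return changed paths other than the one file the card names."""
    allowed = posixpath.normpath(allowed_file)
    changed = {posixpath.normpath(p.strip()) for p in changed_files if p.strip()}
    return sorted(changed - {allowed})


def changed_in_worktree(worktree: Path) -> list[str]:
    """List tracked files that differ from HEAD in the given worktree."""
    result = subprocess.run(
        ["git", "-C", str(worktree), "diff", "--name-only", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

A non-empty result from out_of_scope is the signal that a type or interface change leaked and belongs in a parent card.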

The test suite is too slow for parallelism

Symptom: 4 agents running typecheck simultaneously thrash the CPU, and each typecheck takes 3x longer than baseline. Fix: in .kanbots/config.json, set "checks.typecheck": "pnpm typecheck --skipLibCheck --incremental" or scope the check to changed files only. If your suite simply doesn't scale to parallel slots, drop parallelism to 2 and accept the longer wall time. Two slots still finish sooner than running one agent at a time.

The check commands pass but the code is wrong

Symptom: typecheck and tests are green but the refactor changed observable behavior. Fix: this is why Promote → draft PR exists. Don't auto-land; review the diff. For high-stakes refactors, add an e2e check command that runs Playwright/Cypress; the QA autopilot will run it as part of the gate.

When this is the wrong tool

The pattern requires single-file scope. If your refactor crosses module boundaries — moving a function from one package to another, renaming a public API used in three apps, changing a database schema — single-file scope can't express the change. The agent will either refuse, or worse, half-do the change in a way that the typecheck doesn't catch.

Use a different shape for those: one careful agent on Heavy effort with the entire affected scope in its prompt, not a swarm of small ones. Or split the work manually into a sequence of single-file changes that can be parallelized.

Performance refactors are also dicey. The agent can verify "tests still pass" but not "this code is now faster." Benchmarks need a human or a custom check command wired into the QA gate.
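
A custom benchmark check can be as small as a script that times the hot path and exits non-zero when it regresses. The helpers below are a sketch under assumptions: the names, the 10% tolerance, and the idea of comparing against a stored baseline are illustrative choices, not something KanBots prescribes. Timings on a machine that is also running parallel agents will be noisy, so the tolerance needs headroom.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], runs: int = 5) -> float:
    """Time fn several times and return the median wall-clock duration."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def within_budget(measured_s: float, baseline_s: float, tolerance: float = 0.10) -> bool:
    """True when the measured time is at most `tolerance` above the baseline."""
    return measured_s <= baseline_s * (1.0 + tolerance)
```

A check script would call median_seconds on the refactored function, compare it with a recorded baseline using within_budget, and call sys.exit(1) on a miss so that the gate sees a red result.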

See also: the safety patterns for running parallel agents without clobber and machine-level guidance on how many slots your laptop can really run.

Try it on your own folder

Drop a folder, get a board, dispatch parallel agents. The desktop app runs locally on macOS, Linux, and Windows.