The temptation with a bug is to paste the error message to the agent and ask it to "fix it". Sometimes that works; more often the agent makes a plausible-looking fix to the wrong thing, masks a symptom, or breaks neighboring code. Debugging with an agent works when it proceeds systematically, through the same steps as good debugging in general; the agent just performs them faster.
Reproduce first, then fix
You can't reliably fix what you can't stably trigger. The first step isn't a fix, it's reproduction:
- Work out with the agent the exact steps under which the bug appears, and the data it occurs on.
- Ideally, capture it as a failing test that reproduces the problem. This is gold: now the agent (and you) have an objective criterion, "fixed = test green".
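Until the bug reproduces stably, any "fix" is guessing. If the bug is flaky, tell the agent so ("reproduces about one time in three"). This changes the approach: a single passing run proves nothing, so both the reproduction and the fix have to be checked over repeated runs.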
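For a flaky bug, one way to turn "about one time in three" into a number is to run the reproducing check many times and count the failures. This is a minimal sketch: `failure_rate` and `flaky_check` are illustrative names, and the seeded random generator only simulates an intermittent failure.

```python
import random
from typing import Callable


def failure_rate(check: Callable[[], None], runs: int = 100) -> float:
    """Run `check` repeatedly; return the fraction of runs that raised AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            check()
        except AssertionError:
            failures += 1
    return failures / runs


# Stand-in for a real flaky check: fails on roughly one call in three.
_rng = random.Random(0)


def flaky_check() -> None:
    assert _rng.random() >= 1 / 3, "intermittent failure"
```

After a fix, the same measurement should drop to zero over enough runs; one green run says little when the bug only fails a third of the time.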
Hypothesis and localization — against the code
The next request is not "fix it" but "find the cause". Ask the agent to reason from the code itself:
- "Where is this value formed and why is it wrong here?"
- "Trace the path of this data from the input to the point of error."
- "What hypotheses are there for why it fails exactly like this, and how to check each?"
Require references to specific places in the code (file, function, line), not general reasoning. An agent that reads files directly localizes the cause more reliably than a model answering from memory. Keep in mind that a confident hypothesis may still be a hallucination: verify it rather than accept it.
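One way to check a hypothesis about where a value goes wrong is to run the data through the path step by step and stop at the first step that breaks an expected property. The sketch below assumes a hypothetical three-stage pipeline and the invariant "an amount in cents is never negative"; the stage names and values are invented for illustration.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def first_bad_stage(
    value: Any,
    stages: Sequence[Stage],
    is_valid: Callable[[Any], bool],
) -> Optional[str]:
    """Apply stages in order; return the name of the first stage whose output is invalid."""
    for name, step in stages:
        value = step(value)
        print(f"{name}: {value!r}")  # the trace: what the value looks like after each step
        if not is_valid(value):
            return name
    return None


# Hypothetical pipeline: raw form text -> cents -> discounted cents.
stages = [
    ("strip", str.strip),
    ("to_cents", lambda s: round(float(s) * 100)),
    ("apply_discount", lambda cents: cents - 5000),  # hypothesis: discount may exceed price
]

# Invariant: once the value is an integer amount, it must not be negative.
culprit = first_bad_stage(" 20.00 ", stages, lambda v: not isinstance(v, int) or v >= 0)
```

Here `culprit` names the stage to inspect, which confirms or rejects the hypothesis with evidence instead of a confident guess.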
Fix the cause, not the symptom
Having found the cause, aim for the minimal fix that removes it, not a workaround that hides the symptom. Common anti-patterns to watch for:
- a stub that suppresses the error instead of fixing what caused it;
- a "just in case" change in five places when the cause is in one;
- special-casing the one failing input instead of fixing the general logic.
A good fix is explainable: it's clear why it fixes the bug.
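The difference between masking a symptom and fixing the cause can be shown with a small sketch. It assumes a hypothetical order form whose `qty` field arrives as a string; all function names are invented for illustration.

```python
def read_qty_buggy(form: dict) -> str:
    # Cause of the bug: the form value is a string and is never converted.
    return form["qty"]


def line_total_masked(form: dict, unit_price: float) -> float:
    # Anti-pattern: the error disappears, but the total is now silently wrong.
    try:
        return read_qty_buggy(form) * unit_price
    except TypeError:
        return 0.0


def read_qty(form: dict) -> int:
    # Minimal fix at the cause: convert once, where the value enters the system.
    return int(form["qty"])


def line_total(form: dict, unit_price: float) -> float:
    # No workaround needed here once the cause is fixed.
    return read_qty(form) * unit_price
```

The second version is explainable in one sentence: the quantity was a string, and now it is converted to an integer at the point of entry. The first version can only be described as "the error no longer shows".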
Check that it's gone and nothing broke
A fix isn't finished until it answers two questions:
- Is the bug gone? Rerun the same reproducing test or steps; they should now pass.
- Did anything nearby break? Run the rest of the tests. The fix could have affected neighboring code, which is a regression, and the agent won't notice it on its own.
This is part of accepting any AI output: a fix is a result like any other, to be verified rather than taken on faith.
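The two questions can be kept as two separate checks. The sketch below uses the standard `unittest` module; `parse_price`, the test classes, and `verify_fix` are hypothetical names standing in for a real project's code and test suite.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical function after the fix: thousands separators are removed before parsing."""
    return float(text.replace(",", ""))


class ReproTest(unittest.TestCase):
    def test_thousands_separator(self):
        # The test that originally reproduced the bug.
        self.assertEqual(parse_price("1,234.50"), 1234.50)


class NeighborTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_price("12.00"), 12.0)

    def test_rejects_garbage(self):
        with self.assertRaises(ValueError):
            parse_price("abc")


def passes(*cases: type) -> bool:
    """Run the given TestCase classes and report whether all of them passed."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(c) for c in cases)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


def verify_fix() -> dict:
    return {
        "bug_gone": passes(ReproTest),                      # question 1
        "no_regression": passes(ReproTest, NeighborTests),  # question 2
    }
```

A fix is accepted only when both values are true; a green reproducing test alone answers just the first question.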
What this means in practice
Debugging with an agent is a discipline, not the incantation "fix it". A product engineer leads the agent through the steps: reproduce stably (ideally with a test) → find the cause in the code → minimally fix the cause → check that the bug is gone and there's no regression. The agent gives speed at each step, but the order and the judgment stay with the human; otherwise you get a fast non-fix.
What's next
Once the code is written and fixed, its quality has to be locked in through code review and testing, and then by configuring the agent to run part of these checks itself.