The code is written; next it needs checking: review and test coverage. The agent helps on both fronts, but this is exactly where it is easy to fool yourself: the agent that writes the code and the agent that checks it are prone to the same blind spots. So here it matters most who has the final say.

The agent as a first review pass

The agent is an excellent first review pass that takes the routine off a human:

  • catches the obvious: missing error handling, unhandled edge cases, style violations, suspicious spots;
  • explains unfamiliar code and suggests what to look at more closely;
  • checks against given rules, if it has them (see configuring the agent and the executable standard).

But a first pass isn't the final one. What really matters in review — the contract at the seams, edge cases, behavior tests — is covered in detail in the article on how to review AI code. The key point: architectural and risky decisions stay with the human. The agent highlights, the human judges.

Test generation — with a caveat

The agent writes tests quickly, and that is a big help: covering routine cases, putting together boundary-value checks, raising coverage. But two traps are hidden here:

  • Tests from the same agent inherit its blind spots. If the agent misunderstood how the code should behave, it will write a test that locks the wrong behavior in as "correct". Such a test is green and useless.
  • A test that checks nothing looks like a test. The agent sometimes generates checks that always pass.

So tests written by the agent get extra scrutiny: verify that they assert exactly what should be the case per the requirements, not what the current code happens to do. A good approach is to write tests from the acceptance criteria and expected behavior, not "cover what's there".
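To make the difference concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the shipping_cost function, the assumed acceptance criterion (orders of 50 or more ship free, everything else costs 5), and the check names are invented for illustration and do not come from a real project.

```python
def shipping_cost(order_total: float) -> float:
    """Hypothetical implementation with a boundary bug.

    Assumed acceptance criterion: orders of 50 or more ship free,
    everything else costs 5. The code uses > where >= is required.
    """
    return 0.0 if order_total > 50 else 5.0


def check_always_passes() -> bool:
    # Looks like a test, but the result is never compared to an expected value.
    result = shipping_cost(50)
    return result is not None


def check_locks_in_current_behavior() -> bool:
    # The expected value was copied from what the code returns today,
    # so the boundary bug is recorded as "correct".
    return shipping_cost(50) == 5.0


def check_from_acceptance_criteria() -> bool:
    # Expected values come from the requirement, not from the code.
    cases = [(49.99, 5.0), (50, 0.0), (120, 0.0)]
    return all(shipping_cost(total) == expected for total, expected in cases)


if __name__ == "__main__":
    checks = (
        check_always_passes,
        check_locks_in_current_behavior,
        check_from_acceptance_criteria,
    )
    for check in checks:
        print(check.__name__, "passed" if check() else "FAILED")
```

Against this buggy implementation the first two checks pass and only the criteria-based one fails, which is exactly the signal a reviewer needs.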

A verifiable criterion beats volume

For both review and tests, one rule holds: a check should have an objective criterion, not "the agent said it's fine". Tests either pass or not; a style rule is either observed or not. Rely on what gives an unambiguous signal, not on the model's confident tone — it's unrelated to correctness.
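One way to picture this rule is a gate that looks only at exit codes and deliberately ignores whatever the agent says about its own work. The sketch below is an assumption-laden illustration: merge_allowed, gate, and the PASSING and FAILING stand-in commands are hypothetical names, and in a real project the commands would be your actual test runner and linter.

```python
import subprocess
import sys
from typing import Sequence


def gate(commands: Sequence[Sequence[str]]) -> bool:
    """Return True only if every command exits with status 0."""
    for command in commands:
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            return False
    return True


def merge_allowed(agent_verdict: str, commands: Sequence[Sequence[str]]) -> bool:
    # agent_verdict is accepted but deliberately unused:
    # only the exit codes of the checks decide the outcome.
    return gate(commands)


# Hypothetical stand-ins for a real test runner and a real linter.
PASSING = [sys.executable, "-c", "raise SystemExit(0)"]
FAILING = [sys.executable, "-c", "raise SystemExit(1)"]

if __name__ == "__main__":
    verdict = "Looks good to me, all cases are covered."
    print(merge_allowed(verdict, [PASSING, FAILING]))
```

However confident the verdict string sounds, the result depends only on whether the checks passed.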

The approach is stronger still when review rules are put into executable form (see the executable standard and its comparison with SonarQube): then the agent checks against the same standard on every PR, not "however it turns out this time".
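As a rough idea of what "executable form" can look like, here is a sketch of a single review rule written as code. The rule itself ("no bare except clauses") and the names find_bare_excepts and SAMPLE are assumptions chosen for illustration; this is not how the executable standard or SonarQube is implemented, only a small example built on Python's standard ast module.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of `except:` clauses that name no exception type."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical code under review: the first handler violates the rule,
# the second one names an exception type and is fine.
SAMPLE = '''
try:
    risky()
except:
    pass

try:
    risky()
except ValueError:
    pass
'''

if __name__ == "__main__":
    violations = find_bare_excepts(SAMPLE)
    print("violations on lines:", violations)
```

A rule in this form gives the same answer on every PR, whoever or whatever runs it.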

What this means in practice

The agent handles the routine of review and test generation well, but it does not replace judgment, especially where the checker and the checked tend to err the same way. A product engineer uses the agent as a fast first pass and a generator, but keeps architectural decisions, verifies that tests assert correct behavior, and relies on objective criteria rather than the model's confidence.

What's next

For the agent to do part of the review and check itself, and to do so the same way every time, it has to be configured for the project. See configuring agents: skills, rules, tools, and memory.