AI as a design critic: what a code generator doesn't do

AI is best known as a code generator: a skill takes a specification or a prompt and produces an implementation. That's useful, but it isn't the most valuable thing AI does in architectural work.

The most underrated role is AI as a design critic: checking the design before a single line of code is written. That means validating the spec, going through an Event Storming draft, hunting for contradictions in the Ubiquitous Language, and spotting aggregates overloaded with invariants. These are the things a human architect stumbles on precisely because they read the spec top to bottom and can't hold all 16 sections in their head at once.

What's inside (in 30 seconds):

Three roles of AI: code generator, code reviewer, design critic. They're often confused, but in reality they're different tasks with different classes of errors

9 categories of design checks: Maturity level consistency, Ubiquitous Language, Bounded Context, Aggregates, Actors/Roles, Commands, Domain Events, Failure Domains, and Data Ownership / NFR / Acceptance Criteria (grouped as the ninth category)

The ucp-spec-review skill, paired with ucp-spec-design. It completes the symmetry: every design skill now has a matching review skill

Existing service: the skill works on a spec reconstructed from the real state of the code — not only on a fresh draft

The trigger for its creation: a senior architect's comment under a publication saying that AI as a design critic is often more useful than AI as a code generator

Three roles of AI in working with code and architecture

They're often lumped together. Yet these are three distinct classes of tasks, with different requirements, different errors, and different points of entry into the process.

Code generator

AI writes code from a description. Input: a prompt or a specification; output: an implementation. Example skills in our chain: ucp-pattern-design, ucp-api-design, ucp-ddd-tactical-design, ucp-bootstrap-design.

What AI does well: it reproduces idiomatic code from clearly defined rules. With the Use Case Pattern skills, the same task produces the same code across different developers, regardless of the implementation language.

What AI does poorly without rules: it produces "average code from the training data," with the wrong persistence layer, a controller with business logic, or missing idempotency. There's a separate article on this: "AI writes code. So why a methodology?".

Code reviewer

AI checks already-written code against the rules. Input: a prompt and a diff (or the whole file); output: a list of findings citing the rules. Example skills: ucp-pattern-review, ucp-api-review, ucp-ddd-tactical-review, ucp-auth-review.

What AI does well: it goes through a checklist of dozens of rules faster than any human. It doesn't get tired. It doesn't let a finding slide because "this is the third time already." It cites the rule code (R-7, JS-2.5, BR-C5), so the finding is clear to the author.

What AI does poorly without context: it doesn't see "does this fit the architecture as a whole," it doesn't weigh trade-offs. That's the human's job; AI clears out the trivial so the human can focus on the non-trivial.

More on this: "How to review code that AI wrote".

Design critic

AI checks the design — the spec, an Event Storming draft, the Context Map — for contradictions, gaps, inconsistencies. Before a line of code is written. This is the most underrated role.

What AI does uniquely well:

Cross-section checks. A human reads the spec top to bottom, while the glossary in §2, a command in §7, an event in §8, and an AC in §15 sit dozens of pages apart. A human will miss a mismatch like "the command CancelOrder in §7 vs. the term cancellation missing from the glossary" in 80% of specs. AI checks all 16 sections in a single pass.
Duplicates and synonyms in the Ubiquitous Language. Order vs. Purchase vs. Sale for one entity. An architect notices only once they're already sitting in review; AI catches it right away.
Counting invariants in aggregates. Here even an architect usually doesn't count — and an aggregate with 12 invariants is almost always a real split candidate.
Orphans. An event with no consumer, an actor with no role, a BR with no AC. The "something extra or something missing" class of errors — AI is stronger here.
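
The orphan checks above largely reduce to set differences between sections. A minimal sketch in Python, assuming a heavily simplified spec model: the `Spec` class, its fields, and `find_orphans` are hypothetical names for illustration, not how ucp-spec-review is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical, simplified view of a spec (not the real 16-section format)."""
    events: set[str] = field(default_factory=set)           # declared domain events
    consumed_events: set[str] = field(default_factory=set)  # events with a consumer
    business_rules: set[str] = field(default_factory=set)   # BR ids
    ac_covers: dict[str, set[str]] = field(default_factory=dict)  # AC id -> BR ids


def find_orphans(spec: Spec) -> list[str]:
    """Report events nobody consumes and business rules no AC covers."""
    findings = []
    for event in sorted(spec.events - spec.consumed_events):
        findings.append(f"event without consumer: {event}")
    covered = set().union(*spec.ac_covers.values())
    for rule in sorted(spec.business_rules - covered):
        findings.append(f"business rule without AC: {rule}")
    return findings
```

The same shape (collect what is declared, subtract what is referenced) would apply to actors without roles.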

What AI doesn't do: it doesn't make the architectural decision. It doesn't choose between Saga and 2PC, between event sourcing and snapshot. That's still the human.

9 categories of checks: what `ucp-spec-review` validates

The ucp-spec-review skill is paired with ucp-spec-design. A symmetric design ↔ critic pair, like the other methodology skills. Input — a spec (16 sections, maturity level 1/2/3) or an Event Storming draft; output — a list of findings with rule codes and priorities.

There are nine rule categories. Each answers a specific class of "what usually breaks in a spec."

1. Maturity level consistency (`SR-LV-*`)

The declared maturity level and the actual depth of the spec match. A spec for "Order Service (Level 3)" with no aggregates is a critical desync. A spec for "Notification (Level 1)" with three aggregates is over-engineering — move it to Level 2.

2. Ubiquitous Language (`SR-UL-*`)

Every glossary term is used in other sections. One concept — one name (no Order / Purchase / Sale for one entity). Commands and events from §7-§8 use glossary terms, not freshly invented ones. Every term has a definition, not just a mention.

This is the most common category of findings on fresh specs. On a Level 3 service with 30+ terms in the glossary, people don't have the working memory to check every mention.
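
To make the category concrete, here is a rough sketch of three of these checks as plain text matching. It assumes the glossary is a term-to-definition mapping and that candidate synonym groups are supplied by hand; `mentions`, `check_language`, and the finding texts are hypothetical, and a real review would need far more than whole-word matching.

```python
import re


def mentions(term: str, text: str) -> bool:
    """True if the term appears in the text as a whole word (case-insensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def check_language(
    glossary: dict[str, str],
    sections: dict[str, str],
    synonym_groups: list[set[str]],
) -> list[str]:
    findings = []
    body = " ".join(sections.values())  # all non-glossary sections as one text
    for term, definition in sorted(glossary.items()):
        if not definition.strip():
            findings.append(f"term without definition: {term}")
        if not mentions(term, body):
            findings.append(f"term never used outside glossary: {term}")
    for group in synonym_groups:
        used = sorted(t for t in group if mentions(t, body))
        if len(used) > 1:
            findings.append(f"competing names for one concept: {', '.join(used)}")
    return findings
```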

3. Bounded Context (`SR-BC-*`)

§1 has an explicit scope AND not-scope. Every command in §7 belongs to this context, not a neighboring one. There's no invisible overlap with neighbors (Catalog is upstream — and yet in §3 we have Product as an entity? An error).

4. Aggregates / Domain Model (`SR-AG-*`, Level 3)

Every aggregate has ≤ 7 invariants; more than that makes it a split candidate. The aggregate root is explicit, sub-entities are inside. There are no circular references. Value Objects are immutable. Identity types are dedicated types such as OrderId, not a raw Long.
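
Two of these rules are mechanical enough to sketch directly. The example below is an illustration under assumptions: the `Aggregate` model, the `PRIMITIVE_ID_TYPES` set, and the finding texts are hypothetical; only the threshold of 7 comes from the rule above.

```python
from dataclasses import dataclass

MAX_INVARIANTS = 7  # threshold stated in the rule above

# Assumption: type names treated as "raw" identities for this sketch.
PRIMITIVE_ID_TYPES = {"Long", "Integer", "int", "String"}


@dataclass(frozen=True)
class Aggregate:
    name: str
    invariants: tuple[str, ...]
    id_type: str


def check_aggregate(agg: Aggregate) -> list[str]:
    findings = []
    if len(agg.invariants) > MAX_INVARIANTS:
        findings.append(
            f"{agg.name}: {len(agg.invariants)} invariants, split candidate"
        )
    if agg.id_type in PRIMITIVE_ID_TYPES:
        findings.append(
            f"{agg.name}: identity is a raw {agg.id_type}, expected a dedicated type"
        )
    return findings
```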

5. Actors / Roles (`SR-AR-*`)

Every actor mentioned in §10 or §7 is present in the §5 roles. Every role has a permissions matrix over the §7 commands and §9 queries. Every command in §7 declares its initiating roles.
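
As an illustration of the command-to-role part of this category, a small sketch that assumes commands are given as a mapping from command name to the set of initiating actors. `check_roles` and its parameters are hypothetical names.

```python
def check_roles(
    roles: set[str],
    command_initiators: dict[str, set[str]],
) -> list[str]:
    """Flag commands with no initiator and initiators missing from the roles section."""
    findings = []
    for command, initiators in sorted(command_initiators.items()):
        if not initiators:
            findings.append(f"{command}: no initiating role declared")
        for actor in sorted(initiators - roles):
            findings.append(f"{command}: actor '{actor}' is missing from the roles section")
    return findings
```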

6. Commands (`SR-CM-*`)

For every command: a success result, a set of business errors (referencing §13), pre- and post-conditions. At Level 3 — the target aggregate. CQRS: commands don't return "fat" read DTOs. Money commands — mandatory idempotency.
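
A hedged sketch of how the per-command checklist could look as data plus a check. The `CommandSpec` fields (including `moves_money` and `idempotency_key`) are assumptions made for this example, not the spec template's actual structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommandSpec:
    name: str
    success_result: Optional[str]
    business_errors: tuple[str, ...]
    moves_money: bool = False
    idempotency_key: Optional[str] = None  # e.g. the name of a request id field


def check_command(cmd: CommandSpec) -> list[str]:
    findings = []
    if not cmd.success_result:
        findings.append(f"{cmd.name}: no success result")
    if not cmd.business_errors:
        findings.append(f"{cmd.name}: no business errors listed")
    if cmd.moves_money and not cmd.idempotency_key:
        findings.append(f"{cmd.name}: money command without idempotency")
    return findings
```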

7. Domain Events (`SR-EV-*`, Level 2+)

Every event has at least one consumer (internal or external from §14). Names are past-tense verbs (OrderPaid, not PayOrder). Retryable events require an idempotent receiver. At Level 3 — publication via the Outbox, not eventBus.publish() after save().
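
The Outbox rule exists because publishing after save() can lose the event if the process fails between the two calls. A minimal sketch of the idea using Python's built-in sqlite3: the state change and the event record are written in one transaction, and a separate relay (not shown) would later publish the unpublished rows. The table layout and function names are assumptions for illustration.

```python
import json
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT NOT NULL, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def mark_order_paid(conn: sqlite3.Connection, order_id: str) -> None:
    # `with conn` commits on success and rolls back on any exception,
    # so the order row and the outbox row are stored together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PAID')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPaid", json.dumps({"orderId": order_id})),
        )
```

Because delivery from the outbox is typically at-least-once, this pairs with the idempotent-receiver rule in the same category.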

8. Failure Domains (`SR-FD-*`)

Every external neighbor has a strategy on failure (graceful degradation / queue / fallback path / reject). §16 NFR states where eventual consistency is acceptable and where strong consistency is mandatory. External calls have a timeout or a Circuit Breaker.
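
A small sketch of the per-neighbor part of this category. The four strategy names come from the rule above; the `Neighbor` model and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Strategy names as listed in the rule above.
ALLOWED_STRATEGIES = {"graceful degradation", "queue", "fallback path", "reject"}


@dataclass(frozen=True)
class Neighbor:
    name: str
    failure_strategy: Optional[str] = None
    timeout_ms: Optional[int] = None
    circuit_breaker: bool = False


def check_neighbor(n: Neighbor) -> list[str]:
    findings = []
    if n.failure_strategy not in ALLOWED_STRATEGIES:
        findings.append(f"{n.name}: no recognized failure strategy")
    if n.timeout_ms is None and not n.circuit_breaker:
        findings.append(f"{n.name}: no timeout or circuit breaker")
    return findings
```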

9. Data Ownership / NFR / Acceptance Criteria

Every entity in §3 has exactly one owner service. PII fields have an explicit storage and deletion policy. Every business rule in §6 is covered by at least one AC in §15. Every command has a happy + error path AC. NFRs are measurable thresholds, not "fast" / "reliable."
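
Two of these rules lend themselves to a quick illustration. The sketch below uses a deliberately crude heuristic (an NFR without any number is treated as unmeasurable) and assumes ownership is given as a mapping from entity to owning services. Both function names are hypothetical.

```python
import re


def unmeasurable_nfrs(nfrs: list[str]) -> list[str]:
    """Return NFR statements that contain no number at all (crude heuristic)."""
    return [text for text in nfrs if not re.search(r"\d", text)]


def ownership_findings(owners: dict[str, list[str]]) -> list[str]:
    """Flag entities that do not have exactly one owner service."""
    return [
        f"{entity}: expected exactly one owner service, found {len(services)}"
        for entity, services in sorted(owners.items())
        if len(services) != 1
    ]
```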

Trigger: the market's reaction

This skill and this article didn't come from a plan — they came from a comment under a LinkedIn publication about the 7 steps to AI. A Senior architect wrote in the comment roughly this:

"AI works especially well at step 0 — validating an Event Storming draft, hunting for missing actors, checking the Ubiquitous Language for consistency. This isn't 'AI after the spec,' it's AI as a design critic. Often more useful than AI as a code generator."

A precise observation from a practitioner. Replying to it isn't enough; it makes sense to build it into the product. A day later ucp-spec-review appeared.

That's exactly the value of open feedback: the market highlights gaps faster than the team sees them on its own.

Existing service: the same skill on a reconstructed spec

An objection from the same comment: most AI coding is incremental work in existing codebases, not work on a blank slate. There, step 0 is different:

Reconstruct the real boundaries — not from Confluence, but from an audit of the dependency graph, broker events, and shared DB tables. We reconstruct data ownership as it actually is.
The spec comes out partial — half of it is already done, and often done poorly. But even a partial spec can be run through ucp-spec-review to get a list of "what needs to be filled in right now, what can wait."
A decision by layers — what we touch along a clean boundary as a new module, and what for now keeps working by the rules of the existing architecture.

ucp-spec-review works in es mode (Event Storming draft) or on a partially filled spec — it skips rule categories that require fully filled sections and focuses on what's there.
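
One way to picture the skipping behavior: each rule category depends on certain sections, and a category runs only when all of them are filled. The mapping below is a guess made for this example (only the section numbers for the glossary, roles, commands, events, and business errors are taken from the text); it is not the skill's actual configuration.

```python
# Hypothetical mapping: rule category -> section numbers it needs.
REQUIRED_SECTIONS: dict[str, set[int]] = {
    "SR-UL": {2},      # glossary
    "SR-AR": {5, 7},   # roles, commands
    "SR-CM": {7, 13},  # commands, business errors
    "SR-EV": {8},      # domain events
}


def applicable_categories(filled_sections: set[int]) -> list[str]:
    """Return the categories whose required sections are all filled."""
    return sorted(
        category
        for category, required in REQUIRED_SECTIONS.items()
        if required <= filled_sections
    )
```

For a partial spec with only sections 2, 7, and 13 filled, this sketch would run the language and command checks and skip the rest.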

This turns the spec audit from "let's rewrite the whole documentation from scratch" (which no one does) into "10 minutes on the skill, get a prioritized list — what's critical, what can wait."

The design ↔ critic ↔ code chain

To make it clear how this fits into the overall process:

business description / Event Storming draft
        ↓
ucp-spec-design   →  spec in docs/spec/  (16 sections, level 1/2/3)
        ↓
ucp-spec-review   →  list of findings with rule codes
        ↓
human edits the spec (or re-runs ucp-spec-design with corrections)
        ↓
ucp-spec-review (Fast)   →  0 Critical
        ↓
ucp-pattern-design / ucp-ddd-tactical-design / ucp-api-design
        ↓
code in repo
        ↓
ucp-pattern-review / ucp-api-review / ...   →  findings on the code
        ↓
PR review by a human
        ↓
merge
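
The "0 Critical" step in the chain above acts as a gate before code generation. A minimal sketch of that gate, assuming findings carry a rule code and a priority: only the Critical level is named in the chain, so the other priority levels and the rule code numbers used here are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"  # assumed level
    MINOR = "Minor"  # assumed level


@dataclass(frozen=True)
class Finding:
    rule: str  # e.g. a code with the SR-UL prefix; numbering is hypothetical
    priority: Priority
    message: str


def ready_for_codegen(findings: list[Finding]) -> bool:
    """Allow the design skills to run only when no Critical finding remains."""
    return not any(f.priority is Priority.CRITICAL for f in findings)
```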

The "spec → code" transition used to be a one-way street. The spec is written, code is written from it, a hole is found two weeks later, and the code is rewritten. With the spec-design ↔ spec-review skill pair, holes are highlighted before code generation. This is what "AI as an architect's partner" means, as opposed to "AI as a typewriter."

Where it applies, where it doesn't

It applies:

A new service from scratch — after ucp-spec-design, run ucp-spec-review before the first code generation.
Auditing an existing service — you have a standard that isn't being followed. Import the spec as markdown, run it through ucp-spec-review — get a snapshot of its weak spots.
An Event Storming draft — after the workshop, before formalization. es mode will highlight missing actors and language inconsistencies.
A regular design audit — once a quarter, run it over all the service specs in the monorepo. On large teams this removes "Confluence rot."

It doesn't apply:

A very thin spec (< 5 of 16 sections filled). The skill will return "not enough material" — nothing to check.
Making the architectural decision — Saga vs. 2PC, monolith vs. microservices. That's the human's job or superpowers:brainstorming, not a review skill.
Sanity-checking the business logic for correctness — a spec can be perfectly internally consistent yet implement the wrong business process. That's the job of a business analyst and a product manager.

What's next

UCP skills — catalog and installation — how to add ucp-spec-review to your project (after cloning the repo, install.sh picks it up automatically)
Use Case Pattern: a step-by-step guide — how ucp-spec-review fits into the "business description → code" chain
Use Case specification: a universal template — the format the skill validates against
Adoption on your team — the services page describes the format: 2–6 months, methodology + AI skills + support from the author