Taste rules agents can fail

Kami’s most useful design rule is not its most lyrical one.

The page can say “good content deserves good paper.” Fine. The line that makes the system trustworthy is smaller and less decorative: do not use rgba for tags, because WeasyPrint draws double rectangles.

That is the turn. Without the rendering bug, Kami could be mistaken for moodboard prose: parchment instead of pure white, ink blue as the only accent, warm neutrals, serif-led hierarchy, controlled shadows, fixed line-height bands. With the exporter bug on the page, the taste becomes operational. A visual preference has met a failing renderer and returned as a rule an agent can obey.

Kami makes the pattern easy to see because the constraint is so plain. The system does not ask an agent to “make it elegant”, which has almost no operational content. It names the cheap impostor and blocks it. Use solid hex tag colours. Keep the exporter from drawing the wrong thing. Let the visual standard survive contact with the file that leaves the design tool.
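A rule this plain can also be checked mechanically. What follows is a minimal sketch, not Kami's own tooling: it assumes tag styles live under selectors containing `.tag` (a hypothetical convention), the colour values are invented placeholders, and the flat regular expression will not cope with nested blocks such as `@media`.

```python
import re

# Assumption: tag styles are written under selectors that contain ".tag".
# This selector convention is hypothetical, not taken from Kami itself.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
RGBA = re.compile(r"\brgba\s*\(", re.IGNORECASE)


def find_rgba_tag_rules(css: str) -> list[str]:
    """Return the selectors of tag rules whose declarations use rgba()."""
    offenders = []
    for selector, body in RULE.findall(css):
        selector = selector.strip()
        if ".tag" in selector and RGBA.search(body):
            offenders.append(selector)
    return offenders


# Placeholder stylesheet; the colours are made up for illustration.
sample_css = """
.tag { background: rgba(27, 54, 93, 0.12); color: #1b365d; }
.tag-solid { background: #e4ecf5; color: #1b365d; }
.card { box-shadow: 0 1px 2px rgba(0, 0, 0, 0.08); }
"""

print(find_rgba_tag_rules(sample_css))  # ['.tag']
```

The check is deliberately narrow. It leaves the `.card` shadow alone, because the rule is about tags and the exporter, not about rgba in general.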

AI work is producing more of these small hard artifacts now: skills, style guides, evaluation harnesses, reference repos, memory documents, and little warnings that say, in effect: when the agent is about to be clever, make it answer to something inspectable.

I saw the same movement in adewale/testing-best-practices, a Claude Code skill created on April 11, 2026. “Write sharper tests” would be another decorative wish. This skill loads language-specific guidance, pushes property-based testing when appropriate, looks for skipped tests and log-instead-of-assert behaviour, distinguishes test tiers, and asks the agent to assess assertion density and mock fidelity. Testing taste becomes an intervention.

Teaching an agent to be suspicious of its own tests is faintly absurd, of course. The absurdity is also the point. A model can generate a large number of plausible tests very quickly, and plausible tests are one of software’s more treacherous substances. They look responsible. They make CI turn green. They reassure everyone until the first real bug walks through them with its shoes on.

Plausible tests are not enough. Agents inherit suspicion more reliably when it is written down: a test with no meaningful assertion is paperwork, an integration test made entirely of mocks is theatre, and a suite which cannot fail for the right reason is not a safety net so much as a decorative hammock.

[Image: a painterly editorial collage of rules, jigs, export checks, and design constraints.]

Amp’s Opus 4.7 note belongs in the same pane. Its practical advice is not to script every move for the model, but to define success clearly: no public API changes, no database changes, the relevant tests pass, the typechecker passes. Subtle distinction, important consequence. Steps lend the model a pair of hands; success criteria lend it judgement.
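Success criteria of this kind translate into a check that runs after the agent finishes, whatever steps it took. The sketch below is an illustration under loud assumptions. `ChangeReport` and its fields are hypothetical, "database changes" is approximated as any touched path under a `migrations/` directory, and the test and typechecker results are assumed to come from whatever tools the project already runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeReport:
    """Facts gathered about an agent's change. All fields are hypothetical."""
    public_api_before: frozenset[str]
    public_api_after: frozenset[str]
    changed_paths: tuple[str, ...]
    tests_passed: bool
    typecheck_passed: bool


def failed_criteria(report: ChangeReport) -> list[str]:
    """Return the success criteria that the change violates."""
    failures = []
    if report.public_api_before != report.public_api_after:
        failures.append("public API changed")
    # Assumption: schema changes show up as files under migrations/.
    if any(path.startswith("migrations/") for path in report.changed_paths):
        failures.append("database changed")
    if not report.tests_passed:
        failures.append("tests failed")
    if not report.typecheck_passed:
        failures.append("typechecker failed")
    return failures


# The agent added a public function while fixing something else.
report = ChangeReport(
    public_api_before=frozenset({"load", "save"}),
    public_api_after=frozenset({"load", "save", "save_all"}),
    changed_paths=("src/store.py", "tests/test_store.py"),
    tests_passed=True,
    typecheck_passed=True,
)

print(failed_criteria(report))  # ['public API changed']
```

Nothing here tells the model how to do the work. It only states which outcomes do not count, and that is the difference between a script and a boundary.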

Capability makes the distinction more valuable. A weak model asks for instructions because it cannot infer enough. A strong model asks for constraints because it can infer too much. It can solve a neighbouring problem with fluent confidence, polish the wrong surface, or refactor a piece of code that was meant to be left alone. The craft is no longer just prompting the model forward; it is drawing the boundary within which forward motion counts as progress.

Kami, testing skills, and model-operation notes rhyme because they all take something that used to live tacitly in a senior person’s head and make it legible to a machine. Not perfectly, and not finally, but concretely. The designer’s restraint becomes a palette and a shadow rule. The test engineer’s scepticism becomes an anti-pattern catalogue. The staff engineer’s sense of “done” becomes a list of invariants the agent preserves.

Automation replacing taste is the tempting story, but it is too neat and mostly wrong. Automation increases the return on taste that can be expressed as structure. Vague taste still evaporates at the edge of the model. Structured judgement compounds.

Competent AI-generated work can still feel strangely bad. It has capability without discipline. The sentences parse, the UI renders, the tests run, the document exports, but nothing in the system has told the agent what it is making or which forms of success are cheap impostors. Not failure in the old sense. Abundance without standards, which is more exhausting because it arrives wearing the costume of productivity.

Interesting builders are responding by making standards portable. They are producing skills, style guides, reference implementations, evaluation fixtures, memory documents, and little systems that can be handed to an agent before the work begins. The strongest of these do not say “make it good.” They name the cheap impostor and put a fence around it.

In an earlier software era, infrastructure meant servers, queues, databases, and deployment scripts. Those still matter, but AI work adds a stranger layer: preference that can be run, checked, and failed. What a strong document feels like has to become a rubric. What a reliable test proves has to become an assertion. What an agent leaves alone has to become an invariant.

Hand-wavy standards no longer survive first contact. The agent does exactly what the rule permits, and the rule reveals whether the taste behind it was real.

Sometimes the whole argument is hiding in the small warning: do not use rgba for tags because the renderer creates double rectangles.

Chris Chabot · May 2026
