Field entry, 3 May.

I had a suspicion that sounded less like a bug report and more like an accusation.

I would hand Codex a large Markdown file, maybe a 25 KB design spec, and say: follow this. The result would follow some of it. Sometimes the opening mood. Sometimes the broad component shape. Sometimes the easy parts. But large sections would simply not appear to have entered the agent’s working reality. It felt like sending a letter abroad and getting back a reply that addressed only the first page.

It was not random. It felt like there were rules somewhere. First X, last Y, maybe a chunk from the middle. Some protective mechanism to prevent context overload. Sensible, probably. Infuriating, definitely.

So I asked the agent to investigate the code and find the actual rules, which is one of those small questions that opens a trapdoor under a lot of AI workflow advice. “Give it good instructions” is true, but incomplete, because the runtime has to actually read them; a gentleman may own a fine library and still fail to open the relevant volume.

Not merely list the file, know the file exists, ingest a summary of the first part, or include the path in an attached context object, but read the instructions with enough fidelity that the important details survive into the work.

The investigation found what I expected in spirit: there are budgets and truncation paths. Project docs can share a fixed budget. Command output may be shaped with head/tail truncation. Large files can be summarized or partially included depending on how they enter the system. The exact mechanics matter, but the broader lesson matters more: “I gave the agent the spec” is not the same as “the agent operated with the full spec.”
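
To make the shape of the failure concrete, here is a minimal sketch of head/tail truncation, the pattern the investigation described for command output. The byte budgets and the omission marker are my inventions for illustration, not Codex’s actual values.

```python
def head_tail_truncate(text: str, head_bytes: int = 4096, tail_bytes: int = 4096) -> str:
    """Keep the start and end of a large payload and drop the middle.

    A sketch of the truncation shape described above. The default
    budgets and the marker string are hypothetical, not the real ones.
    """
    data = text.encode("utf-8")
    if len(data) <= head_bytes + tail_bytes:
        return text  # small enough: pass through untouched
    head = data[:head_bytes].decode("utf-8", errors="ignore")
    tail = data[-tail_bytes:].decode("utf-8", errors="ignore")
    omitted = len(data) - head_bytes - tail_bytes
    return f"{head}\n[... {omitted} bytes omitted ...]\n{tail}"
```

Run a 25 KB document through anything of this shape and the opening mood and the closing summary survive, while the constraints buried in the middle quietly do not. First X, last Y, nothing in between.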

This explains a lot of bad AI UI work.

You hand over a design document full of spacing, behavior, edge cases, and visual rhythm. The agent builds something plausible but wrong. You wonder whether the model has poor taste, or whether it ignored you, or whether your prompt was not forceful enough. Sometimes the answer is simpler and more annoying: the details were not actually in the model’s usable context by the time it made decisions.

The failure mode is subtle because the agent may still sound like it read the file. It can mention the title. It can reference a section. It can comply with the high-level direction. That makes the missing pieces feel like negligence rather than truncation.

But large instructions are not a spell. They are data. Data has transport rules.

This changes how I want to use big design docs with agents.

If the file is important, do not merely attach it. Ask the agent to produce a coverage map before implementation. Which sections did it read? What are the non-negotiable requirements? What are the visual constants? What should be tested? Which parts are ambiguous? If the file is large, chunk it deliberately. Turn key requirements into fixtures. Use screenshots. Use browser validation. Ask for read-back of the details that matter before the first patch lands.
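
One way to make the coverage map mechanical rather than optional is to generate the checklist from the document itself. A sketch, assuming the doc is Markdown with ordinary headings; the heading regex and the checklist wording are my choices, not a standard format.

```python
import re
from pathlib import Path

def coverage_checklist(md_path: str) -> str:
    """Split a Markdown doc on its headings and emit a checklist the
    agent must fill in before implementation. Crude heuristic: it will
    also match '#' lines inside code fences, which is fine for a sketch."""
    text = Path(md_path).read_text(encoding="utf-8")
    headings = re.findall(r"^(#{1,3})\s+(.+)$", text, flags=re.MULTILINE)
    lines = [
        "Coverage map. For each section, state: read in full / skimmed / not seen,",
        "plus any non-negotiable requirement it contains.",
        "",
    ]
    for hashes, title in headings:
        indent = "  " * (len(hashes) - 1)  # nest sub-sections under parents
        lines.append(f"{indent}- [ ] {title}")
    return "\n".join(lines)
```

The script is not the point. The point is that the checklist gives truncation somewhere visible to fail: a section the agent never saw becomes a box it cannot honestly tick.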

This is not busywork. It is context verification, the equivalent of asking the guide to point at the route before setting off.

The same habit applies to architecture docs. A twenty-page protocol design can contain one paragraph that changes everything. If the agent misses that paragraph, it may build the wrong thing with great confidence. The solution is not to yell “read carefully” in all caps. The solution is to create a workflow where careful reading leaves artifacts.

Summaries are not enough either. A summary can preserve the idea and lose the constraint. Design often lives in constraints: exact spacing, collapse behavior, empty states, animation timing, ordering, disabled states, what not to show, when to hide the internal chatter. Those are the first things to vanish when context gets compressed.

The uncomfortable conclusion is that long-form instructions need tests just like code.

If a UI pattern matters, create a rendered fixture and compare against it.

If a protocol rule matters, create a regression case.

If a writing voice matters, provide representative passages and ask for a style inventory, not just “write in my voice.”

If a document is large, make the agent prove it has not only seen it, but internalized the parts that will make the work succeed.
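
Here is one concrete shape the fixture idea can take: transcribe the exact values out of the design doc once, by hand, then let a regression check catch any build that drops them. The constant names, values, and CSS below are hypothetical examples, not from any real spec.

```python
import re

# Transcribed by hand from the design doc. Typing them out once is the
# ingestion check; the test then guards them forever.
SPEC_CONSTANTS = {
    "--card-gap": "12px",
    "--collapse-duration": "200ms",
}

def check_css_against_spec(css: str) -> list[str]:
    """Return the spec constants that the shipped CSS gets wrong or omits."""
    failures = []
    for name, expected in SPEC_CONSTANTS.items():
        match = re.search(rf"{re.escape(name)}\s*:\s*([^;]+);", css)
        if match is None or match.group(1).strip() != expected:
            failures.append(name)
    return failures

# A passing example; a build that lost the collapse timing would fail here.
assert check_css_against_spec(
    ":root { --card-gap: 12px; --collapse-duration: 200ms; }"
) == []
```

The test is trivial. That is the virtue: it cannot be compressed away.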

This is especially relevant for the field journal work itself. If these entries are supposed to represent my actual sessions, then the archive cannot become atmospheric context. It has to become evidence. Dates, prompts, incidents, root causes, corrections, all traceable.
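
What “evidence, not atmosphere” might look like as a record type, sketched as a dataclass; the field set is my guess at a minimal schema, not an existing format.

```python
from dataclasses import dataclass, field

@dataclass
class SessionRecord:
    """One traceable unit of the archive. Hypothetical schema."""
    date: str                  # e.g. "3 May"
    prompt: str                # what was actually asked, verbatim
    incident: str              # what went wrong, as observed
    root_cause: str            # truncation? bad spec? wrong assumption?
    corrections: list[str] = field(default_factory=list)
```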

Otherwise the same bug repeats at the level of memory: the agent will write something plausible, it may even sound good, and none of that will matter if plausible is not the standard.

FIG. 02 — LARGE DOCUMENT BUDGETS, COVERAGE MAP, AND MISSING CLAUSES.

Field note

The new rule is that important instructions need an ingestion check: before implementation, ask for a map; before trusting the map, spot-check the source; before publishing the result, remove anything that cannot be traced.
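
The “spot-check the source” step can also be mechanical. A sketch: pick a few headings from the source at random and report any that the agent’s map never mentions. The sample size and the heading heuristic are illustrative choices.

```python
import random
import re
from pathlib import Path

def spot_check(source_md: str, coverage_map: str, samples: int = 3) -> list[str]:
    """Sample headings from the source doc and return any that the
    agent's coverage map never mentions. Empty list: the map survives
    this round. Non-empty: the map cannot be trusted yet."""
    text = Path(source_md).read_text(encoding="utf-8")
    headings = re.findall(r"^#{1,3}\s+(.+)$", text, flags=re.MULTILINE)
    picked = random.sample(headings, min(samples, len(headings)))
    return [title for title in picked if title.lower() not in coverage_map.lower()]
```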

This makes the workflow slower, which is fine; some things should be slower, and accuracy is one of them.