I had a suspicion that sounded less like a bug report and more like an accusation.
I would hand Codex a large Markdown file, maybe a 25 KB design pattern, and say: follow this. The result would follow some of it. Sometimes the opening mood. Sometimes the broad component shape. Sometimes the easy parts. Large sections simply did not appear to enter the agent’s working reality. It felt like sending a letter abroad and receiving back a reply to the first page.
After a few runs, it stopped feeling random. First X, last Y, maybe a chunk from the middle. Some protective mechanism to prevent context overload. Sensible, probably. Infuriating, definitely.
So I asked the agent to investigate the code and find the actual rules, which is one of those small questions that opens a trapdoor under a lot of AI workflow advice. “Give it careful instructions” is true, but incomplete, because instructions only help when the runtime carries them into working context; a gentleman may own a fine library and still fail to open the relevant volume.
The agent had to do more than list the file, know the file existed, ingest a summary of the first page, or include the path in an attached context object. The instructions needed to survive into the work with enough fidelity to constrain the result.
The investigation found the shape of the problem: budgets and truncation paths. Project docs can share a fixed budget. Command output may be shaped with head/tail truncation. Large files can be summarized or partially included depending on how they enter the system. The exact mechanics matter, but they all point at the same operational fact: “I gave the agent the spec” is not the same as “the agent operated with the full spec.”
Plenty of bad AI UI work starts there.
You hand over a design document full of spacing, behavior, edge cases, and visual rhythm. The agent builds something plausible but wrong. You wonder whether the model has poor taste, or whether it ignored you, or whether your prompt was not forceful enough. Sometimes the answer is simpler and more annoying: the details were not actually in the model’s usable context by the time it made decisions.
The failure is subtle because the agent may still sound like it read the file. It can mention the title. It can reference a section. It can comply with the high-level direction. Missing pieces then feel like negligence rather than truncation.
But large instructions are not a spell. They are data. Data has transport rules.
Since then, I have wanted to use big design docs with agents differently.
When a file is load-bearing, attachment is not enough. Ask the agent to produce a coverage map before implementation. Which sections did it read? What are the non-negotiable requirements? What are the visual constants? What gets tested? Which parts are ambiguous? If the file is large, chunk it deliberately. Turn key requirements into fixtures. Use screenshots. Use browser validation. Ask for read-back of the details that matter before the first patch lands.
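One way to chunk deliberately and check a coverage map is to split the document on its own headings and compare them against the sections the agent says it read. The sketch below assumes ATX-style Markdown headings (`#` through `######`); the names `Section`, `split_by_heading`, and `coverage_gaps` are hypothetical helpers, not part of any agent tooling.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    body: str


HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[Section]:
    """Split a Markdown document into sections, one per heading.

    Simplification: a line starting with '#' inside a fenced code block
    is also treated as a heading.
    """
    sections: list[Section] = []
    title, lines = "(preamble)", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Emit the previous section, skipping an empty preamble.
            if lines or title != "(preamble)":
                sections.append(Section(title, "\n".join(lines).strip()))
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    sections.append(Section(title, "\n".join(lines).strip()))
    return sections


def coverage_gaps(sections: list[Section], reported: list[str]) -> list[str]:
    """Return titles of sections the agent's coverage map did not mention."""
    seen = {title.strip().lower() for title in reported}
    return [s.title for s in sections if s.title.lower() not in seen]


doc = (
    "# Layout\nGap is 8px.\n\n"
    "## Empty states\nShow a hint.\n\n"
    "## Animation\n150ms ease-out.\n"
)
sections = split_by_heading(doc)
print(coverage_gaps(sections, ["layout", "Animation"]))
```

Each `Section` can also be handed to the agent as its own chunk, so no single piece has to compete with the rest of the file for budget.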
This is not busywork. Context verification is the equivalent of asking the guide to point at the route before setting off.
Architecture docs deserve the same habit. A twenty-page protocol design can contain one paragraph that changes everything. If the agent misses that paragraph, it may build the wrong system with great confidence. Yelling “read carefully” in all caps does not solve it. A stronger workflow makes careful reading leave artifacts.
Summaries are not enough either. A summary can preserve the idea and lose the constraint. Design often lives in constraints: exact spacing, collapse behavior, empty states, animation timing, ordering, disabled states, what not to show, when to hide the internal chatter. Those are the first details to vanish when context gets compressed.
Uncomfortable conclusion: long-form instructions deserve tests just like code.
- For a UI pattern, create a rendered fixture and compare against it.
- For a protocol rule, create a regression case.
- For a writing voice, provide representative passages and ask for a style inventory instead of “write in my voice.”
- For a large document, make the agent prove which parts it absorbed and which parts still require a spot-check.
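The last item can be partly mechanized. As a rough sketch, assume the agent is asked to read back key constraints as near-verbatim quotes; each quote can then be checked against the source. The helper names `normalize` and `untraceable_quotes` are made up for this example, and the match is a plain case-insensitive substring test after collapsing whitespace, so it will not catch paraphrases.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def untraceable_quotes(source: str, quotes: list[str]) -> list[str]:
    """Return the read-back quotes that do not appear in the source document.

    Empty quotes count as untraceable, since they prove nothing.
    """
    haystack = normalize(source)
    missing = []
    for quote in quotes:
        key = normalize(quote)
        if not key or key not in haystack:
            missing.append(quote)
    return missing


source = "Rows collapse after 3 items.\nGap between cards:   8px.\n"
read_back = ["gap between cards: 8px", "Rows collapse after 5 items"]
print(untraceable_quotes(source, read_back))
```

A quote that fails the check is either a paraphrase or an invention. Either way, it marks a spot to check by hand.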
Session-derived writing has the same risk. If the archive is supposed to represent actual work, it cannot be treated as atmospheric context. It has to be treated as evidence: dates, prompts, incidents, root causes, corrections, all traceable.
Otherwise the same bug repeats at the level of memory. The agent will write something plausible, it may even sound polished, and none of that will matter if plausible is not the standard.
Load-bearing instructions deserve an ingestion check: before implementation, ask for a map; before trusting the map, spot-check the source; before publishing the result, remove anything that cannot be traced.
This makes the workflow slower, which is fine; accuracy is allowed to cost time.