A test file made the agent look less magical and more dangerous.

Read test.txt. Wait ten milliseconds. Write different text to the same file outside the agent. Ask the agent’s edit tool to replace World with Universe. The correct result is not a clever patch. The correct result is an error:

File was modified externally since last read. Please re-read the file before editing.

One small refusal changed the status of the project. Before that test, the prototype was a way to study coding agents from the inside: Bun, TypeScript, Anthropic streaming, tool calls, a terminal UI, and enough borrowed anatomy from OpenCode and Claude Code to see the pattern. After the test, the interesting question was narrower and better: what has to be true before an agent is allowed to touch a user’s files?

One file, two clocks

Implementation keeps a per-thread record of files the agent has read. A successful Read stores the file path and mtime. Before Edit writes, it checks the current mtime against the recorded one. If the file moved under the agent’s feet, the edit stops.

Closer to a door latch than a grand autonomy feature. Without it, the agent can overwrite the correction you made in an editor five minutes ago because its context still contains the older file. With it, the model has to come back through observation before it acts.

Small edge cases tell you where the boundary really sits. A different thread does not inherit the conflict. A missing file clears the record. A successful edit records the new state, so the next edit can proceed without treating the agent’s own write as hostile. The rule is not “never edit.” It is “do not pretend stale knowledge is authority.”

At that point, coding-agent design stops being a prompt-design problem. The prompt can say “be careful” forever. The product has to notice the file changed.

Resource batches

Parallel tools produced the same lesson from another angle.

Happy path first. If the model asks to read A.txt twice and write B.txt, those calls can run together. They do not threaten the same resource. If it asks to read A.txt and edit A.txt, the calls belong in different batches. The test names the whole rule:

Read(A.txt), Read(A.txt), Write(B.txt) // one batch
Edit(A.txt)                            // next batch

In a chat demo, this machinery disappears. A user sees the agent “using tools.” The runtime sees resource keys and modes. Read declares a read lock on a path. Edit, Write, and Delete declare write locks. Two reads can share. Anything involving a write to the same path waits.

Serial execution would be safer and worse. A useful agent wants to search, inspect, and test without standing in line for no reason. Full parallelism would be faster and dishonest. Resource batching is the middle claim: move quickly where the work is independent, slow down where state can be damaged.

Delegation boundaries

Once file authority became explicit, subagents looked different too. They were not cute names for different prompts. They were permission shapes.

Six subagents made the boundary visible:

AgentSpecialtyPhilosophy
TaskImplementation”Just get it done”
FinderCode search”Find first, ask questions later”
OracleTechnical advice”Measure twice, cut once”
PainterFrontend work”Form follows function, but form matters”
LibrarianDocumentation”The truth is in the source”
KrakenBulk changes”One pattern, many files”

Names matter less than fences. Finder and Oracle are read-only. Task and Painter can edit. Kraken splits bulk work into a read-only scoping pass and an executor with focused write access. Subagents are also fire-and-forget: they cannot call other subagents, and only their final message returns to the orchestrator.

Blunt, but useful. Delegation without a boundary is just recursion with better branding. A finder that can edit has stopped being a finder. A bulk executor with shell access can turn a mechanical refactor into an unbounded side-effect machine. The product has to remember why the work was delegated in the first place.

A painterly editorial collage for inspectable coding agents, showing planner, tools, memory, and execution loop.
Planner, tools, memory, and execution loop.

Permission boundaries

Permissions are the visible version of the same argument.

Rules are evaluated in order. First match wins. Safe reads can pass. Common build commands can pass. git push, rm -rf, and sudo pause. Custom rules can override the built-ins:

{ tool: "Bash", action: "ask", matches: { command: "*git*push*" } }
{ tool: "Read", action: "allow" }
{ tool: "Delete", action: "ask" }

Permission tests have an almost comic spread of commands: ls -la, cat file.txt, npm test, bun run build, git status, git diff HEAD, git commit, git push origin main, rm -rf /, sudo rm file, some-random-command. That list is the product conversation in miniature. An agent that asks before every harmless read teaches users to click through danger. An agent that treats every command as safe has confused convenience with trust.

A permission dialog is the easy part. Making the boring path quiet enough that the dangerous path still means something is harder.

Managing context

Context management follows the same pattern. The loop estimates thread tokens, compresses old messages near the limit, keeps recent tool-use pairs together, and records compaction state so later summaries can be incremental.

None of that sounds as dramatic as “agentic coding.” It is the part that decides whether the next turn has continuity or amnesia. A summary that drops the wrong tool result can make the agent confidently explain a world that no longer exists. A stale file record catches one class of that problem on disk. Compaction catches another inside the conversation.

Course correction

Course correction is the least glamorous feature and probably the most familiar one to anyone who has watched an agent fail. Every three turns, the loop inspects recent messages for repeated failures. If the same tool keeps failing, or the work is visibly repeating, it injects a correction message instead of letting the model continue with more confident grammar.

Again, the mechanism is small. That is the point. Autonomy is not proved by the model continuing forever. Sometimes the most agentic move is to interrupt the agent.

Refusal changed the project

Building the prototype made the old sentence too small. “I asked the agent to change code” is not the product. The product is the loop around that request: which files it read, which writes were serialized, which commands paused, which context survived, which subagent carried which authority, and which stale assumption was forced back through observation.

By the end, the project had become more useful than its original study question. A test refused to edit a file.

Repository: https://github.com/chrischabot/coding-agent-poc

Refusal is the bench with the casing removed.

Chris Chabot · January 2026