Building a Coding Agent from Scratch
What happens when you stop using AI coding tools and start building one? A deep dive into the architecture of autonomous software agents.

There’s a certain irony in using Claude to build a system designed to orchestrate Claude. It’s turtles all the way down, as they say, but the exercise proves more illuminating than recursive.
Foundry began as an attempt to answer a deceptively simple question: how do AI coding assistants actually work? Not in the hand-wavy “it’s machine learning” sense, but in the brass-tacks architectural sense. What makes them tick? What makes them fail? And more importantly, what would it take to build one from scratch?
To find out, I did what any self-respecting engineer would do: I studied the competition. OpenCode, an open-source coding agent, provided valuable architectural insights. But the more interesting exercise was reverse engineering Anthropic’s own Claude Code CLI—a Bun-compiled binary that, with some persuasion, yielded its TypeScript source. Examining how Anthropic structures their agent loop, manages tool execution, and handles context provided a fascinating reference implementation. Not to copy, but to understand the design decisions that shaped a production system.
The Anatomy of an Agent
Strip away the marketing language and an AI coding agent is, at its core, a loop. A rather sophisticated loop, to be sure, but a loop nonetheless:
- Receive a message
- Think about it
- Decide what tools to use
- Use those tools
- Observe the results
- Repeat until done
This is the THINK-ACT-OBSERVE pattern that underpins most agentic systems. The elegance lies not in the loop itself but in the machinery that makes each step work.
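In code, the loop is almost embarrassingly small. Here's a minimal sketch in TypeScript; the client and registry interfaces are stand-ins for illustration, not Foundry's actual API:

```typescript
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_calls"; calls: ToolCall[] };

// Hypothetical interfaces standing in for the real model client and tool registry.
interface ModelClient {
  chat(history: unknown[]): Promise<ModelTurn>;
}
interface ToolRegistry {
  run(call: ToolCall): Promise<string>;
}

async function agentLoop(client: ModelClient, tools: ToolRegistry, task: string): Promise<string> {
  const history: unknown[] = [{ role: "user", content: task }];
  while (true) {
    // THINK: ask the model what to do next.
    const turn = await client.chat(history);
    // Done: the model answered with plain text instead of tool calls.
    if (turn.kind === "text") return turn.text;
    // ACT: execute the requested tools.
    history.push({ role: "assistant", toolCalls: turn.calls });
    for (const call of turn.calls) {
      // OBSERVE: feed each result back into the conversation.
      const result = await tools.run(call);
      history.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
}
```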
Parallel Execution and Resource Conflicts
The naive approach to tool execution is sequential: do one thing, then the next. But Claude, like any reasonably intelligent agent, often wants to do several things at once. Read three files. Search two directories. Run a test while checking git status.
The challenge emerges when tools compete for the same resource. Two simultaneous reads of the same file? Harmless. A read and a write to the same file? Potentially catastrophic. The solution is a resource-based batching system where tools declare their execution profiles:
```
// Multiple reads, and writes to distinct resources, can run in parallel
Read(fileA) + Read(fileA) + Write(fileB) // All parallel
// But a write and a read of the same resource must serialize
Write(fileA) + Read(fileA) // Sequential
```
This seemingly small detail turns out to be crucial for performance. Real-world coding tasks often involve dozens of file operations, and serializing them all would make the system feel sluggish. Parallel execution with intelligent conflict detection keeps things snappy.
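One way to implement this batching (a sketch of the idea, not Foundry's exact code) is to have each call declare the resources it reads and writes, then greedily pack consecutive non-conflicting calls into batches that run fully in parallel:

```typescript
interface PlannedCall {
  name: string;
  reads: Set<string>;   // resources this call only reads (e.g. file paths)
  writes: Set<string>;  // resources this call mutates
}

// Two calls conflict if either one writes a resource the other touches.
function conflicts(a: PlannedCall, b: PlannedCall): boolean {
  const touches = (c: PlannedCall, r: string) => c.reads.has(r) || c.writes.has(r);
  for (const r of a.writes) if (touches(b, r)) return true;
  for (const r of b.writes) if (touches(a, r)) return true;
  return false;
}

// Greedily pack calls into batches; each batch runs fully in parallel,
// and batches execute in order, preserving the original call sequence.
function planBatches(calls: PlannedCall[]): PlannedCall[][] {
  const batches: PlannedCall[][] = [];
  for (const call of calls) {
    const open = batches[batches.length - 1];
    if (open && open.every((c) => !conflicts(c, call))) {
      open.push(call);
    } else {
      batches.push([call]);
    }
  }
  return batches;
}
```

Run against the example above: the two reads of fileA and the write to fileB share no written resource, so all three land in one batch, while Write(fileA) followed by Read(fileA) conflicts and splits into two sequential batches.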
The Delegation Problem
A single agent working in isolation hits limits quickly. Complex tasks require different modes of thinking: broad exploration, deep analysis, careful modification, bulk transformation. Asking one agent to context-switch between these modes is like asking a surgeon to also handle hospital administration, catering, and janitorial services—possible, perhaps, but not advisable.
The architecture that emerged uses specialized sub-agents, each tuned for a particular kind of work:
| Agent | Specialty | Philosophy |
|---|---|---|
| Task | Implementation | “Just get it done” |
| Finder | Code search | “Find first, ask questions later” |
| Oracle | Technical advice | “Measure twice, cut once” |
| Painter | Frontend work | “Form follows function, but form matters” |
| Librarian | Documentation | “The truth is in the source” |
| Kraken | Bulk changes | “One pattern, many files” |
The main agent acts as an orchestrator, delegating to specialists as needed. The Finder rapidly explores a codebase to answer “where is X defined?” The Oracle provides measured architectural guidance. The Kraken—so named for its many-tentacled approach—handles bulk refactoring across dozens of files.
Critically, sub-agents operate in “fire-and-forget” mode. They cannot call other sub-agents (preventing infinite delegation loops) and only their final message returns to the orchestrator. This constraint forces clean boundaries and prevents the architectural equivalent of a phone tree from hell.
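The "no nested delegation" rule can be enforced structurally rather than by prompt. A sketch, reusing the types and agentLoop from the loop sketch earlier; buildRegistry and the "Task" tool name are assumptions for illustration:

```typescript
// Hypothetical sub-agent spec: a system prompt plus an allowed toolset.
interface AgentSpec {
  name: string;
  systemPrompt: string;
  tools: string[]; // tool names this agent may call
}

declare const client: ModelClient;                             // from the loop sketch above
declare function buildRegistry(names: string[]): ToolRegistry; // hypothetical factory

async function delegate(spec: AgentSpec, prompt: string): Promise<string> {
  // Sub-agents never receive the delegation tool itself, so they
  // structurally cannot spawn further sub-agents.
  const registry = buildRegistry(spec.tools.filter((t) => t !== "Task"));
  // Fire-and-forget: only the final message returns to the orchestrator;
  // intermediate tool calls and reasoning stay inside the sub-agent.
  return agentLoop(client, registry, `${spec.systemPrompt}\n\n${prompt}`);
}
```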
The Trust Problem
Giving an AI agent access to your filesystem is, let’s be honest, a bit like handing your car keys to a teenager. The potential for mischief is considerable.
The permission system uses first-match rules with glob pattern support:
```
// Dangerous commands require confirmation
{ tool: "Bash", action: "ask", matches: { command: "*git*push*" } }
// Safe operations proceed without friction
{ tool: "Read", action: "allow" }
// Some things are simply forbidden
{ tool: "Delete", action: "deny" }
```
This allows a useful middle ground between “trust nothing” (too cumbersome) and “trust everything” (too risky). Read operations flow freely. Write operations proceed but are logged. Potentially destructive operations pause for human confirmation.
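First-match evaluation is pleasingly simple to sketch. Assuming rules shaped like the ones above and a bare-bones glob matcher (only `*` supported; this is illustrative, not Foundry's actual matcher):

```typescript
type Action = "allow" | "ask" | "deny";

interface Rule {
  tool: string;
  action: Action;
  matches?: Record<string, string>; // glob patterns tested against tool arguments
}

// Convert a simple glob like "*git*push*" to an anchored regex.
function globToRegex(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

function evaluate(rules: Rule[], tool: string, args: Record<string, string>): Action {
  for (const rule of rules) {
    if (rule.tool !== tool) continue;
    const patterns = Object.entries(rule.matches ?? {});
    // First rule whose tool and argument patterns all match wins.
    if (patterns.every(([key, glob]) => globToRegex(glob).test(args[key] ?? ""))) {
      return rule.action;
    }
  }
  return "ask"; // conservative default when nothing matches
}
```

With the rules above, `evaluate(rules, "Bash", { command: "git push origin main" })` walks the list top to bottom and returns "ask" from the first matching rule; falling through to "ask" by default keeps unlisted operations on the safe side.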
Context Management
LLMs have finite context windows, and coding tasks can sprawl across many files and long conversations. The system tracks token usage and employs automatic summarization to compress older conversation history while preserving the essential information.
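The mechanics reduce to a threshold check: estimate tokens, and once the history nears the budget, fold the oldest turns into a model-generated summary. A sketch, with estimateTokens and summarize as assumed helpers:

```typescript
interface Message { role: string; content: string }

declare function estimateTokens(messages: Message[]): number;     // e.g. a tokenizer count
declare function summarize(messages: Message[]): Promise<string>; // one extra model call

async function maybeCompact(history: Message[], budget: number): Promise<Message[]> {
  // Leave headroom: compact once we cross 80% of the context budget.
  if (estimateTokens(history) < budget * 0.8) return history;
  const keep = 10; // recent turns survive verbatim
  const old = history.slice(0, -keep);
  const recent = history.slice(-keep);
  if (old.length === 0) return history; // nothing old enough to fold away
  const summary = await summarize(old);
  return [{ role: "system", content: `Summary of earlier conversation: ${summary}` }, ...recent];
}
```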
File state tracking provides another layer of intelligence. If a file changes externally during a session—perhaps you edited it in your IDE while the agent was working—the system detects the modification and can alert the agent to re-read before making changes. This prevents the infuriating scenario of having your manual edits overwritten by an agent working with stale information.
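Staleness detection needs nothing fancier than recording each file's modification time at read and comparing before write. A minimal sketch using Node's fs:

```typescript
import { statSync } from "node:fs";

// Modification time recorded when the agent last read each file.
const lastReadMtime = new Map<string, number>();

function recordRead(path: string): void {
  lastReadMtime.set(path, statSync(path).mtimeMs);
}

// Before writing, check whether the file changed since the agent read it.
function isStale(path: string): boolean {
  const seen = lastReadMtime.get(path);
  if (seen === undefined) return true; // never read: force a read first
  return statSync(path).mtimeMs !== seen;
}
```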
Course Correction
Agents fail. They make incorrect assumptions, generate buggy code, misunderstand requirements. The architecture includes a course correction mechanism that detects repeated failures and attempts recovery strategies. When an agent finds itself in a loop—trying the same failing approach multiple times—the system can step back, reassess, and try a different tack.
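One simple way to detect such loops (a sketch of the approach, not Foundry's exact implementation) is to key each failure on its (tool, input, error) triple and count repeats; past a threshold, inject a corrective nudge into the conversation instead of retrying blindly:

```typescript
// Count how many times the same failing action has been attempted.
const failureCounts = new Map<string, number>();

function recordFailure(tool: string, input: string, error: string): number {
  const key = `${tool}:${input}:${error}`;
  const count = (failureCounts.get(key) ?? 0) + 1;
  failureCounts.set(key, count);
  return count;
}

// After repeated identical failures, steer the agent toward a new approach.
function courseCorrect(tool: string, input: string, error: string): string | null {
  if (recordFailure(tool, input, error) < 3) return null;
  return (
    `You have tried ${tool} with the same input three times and it keeps ` +
    `failing with: ${error}. Step back, reassess your assumptions, and ` +
    `try a different approach.`
  );
}
```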
This is perhaps the least glamorous but most practically important feature. Real-world coding is messy, and an agent that gives up at the first sign of trouble is worse than useless.
What It Teaches
Building a coding agent teaches you things that using one never could. You develop an intuition for why certain prompts work better than others, why some tasks confound AI while others don't, and where the architectural chokepoints live.
More broadly, it illuminates the current state of the art. We’re at an inflection point where AI can meaningfully assist with coding, but we’re nowhere near the “just tell it what you want” fantasy. The gap between those two realities is filled with careful engineering: resource management, permission systems, delegation hierarchies, context optimization.
The repository is open source and available at: https://github.com/chrischabot/foundry
It’s not meant to replace your existing tools. It’s meant to help you understand them. And perhaps, in understanding them, to imagine what comes next.