Field entry, 7 May.

Agentic development is not usually slow because the agent cannot type fast enough. It is slow because the work keeps leaving the room.

A coding agent can read a file, change it, run a test, inspect the failure, and try again with the brisk confidence of a small team that has accidentally been condensed into one process. Then it touches the actual application boundary: KV, object storage, a queue, a database, a Durable Object, a scheduled job, a browser renderer, an AI gateway, a Git-backed artifact store. Suddenly the clean local loop turns into a ceremony of accounts, dashboards, remote state, partial emulators, CLI flags, and logs that live just far enough away to become folklore.

That is where the speed goes. Not in the model. In the trip from cause to evidence.

For agentic software, the platform around the code is part of the toolchain. If the local version of that platform is simple, quick to start, possible to inspect, easy to reset, and boring enough to trace, the agent can work like an engineer: make a hypothesis, run the smallest check, read the real state, and correct the implementation. If the local platform is a black box with a friendly command wrapped around it, the agent starts behaving like everyone behaves around black boxes: it guesses, retries, adds defensive code, and occasionally writes the software equivalent of “perhaps the spirits were unavailable.”

This is the practical reason for open-cloud: a self-hosted, API-compatible replica of the Cloudflare developer platform, written in TypeScript on Bun, backed by SQLite and the filesystem. It is not trying to be the edge. It is trying to make the edge-shaped development loop local enough that a person or an agent can hold the whole system in their hands.

Cloudflare’s own local story is better than most. workerd is open source, and Miniflare moved onto workerd so local Workers run on the same underlying runtime family as production. The current local development docs make the split explicit: worker execution can be local while bindings may be locally simulated or connected to remote resources.

That is useful, and also not quite the same thing as owning the whole development substrate. The wider pattern in developer infrastructure is familiar by now: an open local tool becomes a hosted product surface, a generous simulator turns into an implementation detail of a vendor CLI, or a once-inspectable project is archived, absorbed, or made irrelevant by the closed part of the platform growing around it. This is not always villainy. Sometimes it is just maintenance, economics, and the quiet gravity of the business model. But the result is the same for local agent work: the piece you need to understand is now the piece you cannot quite open.

The problem is not that every cloud service should have a perfect clone. The problem is that development gets worse when the only honest version of a service is remote.

What local means here

open-cloud takes a narrower and more useful position than “run Cloudflare on your laptop.” It targets Cloudflare API compatibility, not bit-for-bit platform parity. A Worker written against env.DB, env.CACHE, env.BUCKET, env.QUEUE, or env.ARTIFACTS should run unchanged on the real platform, but global propagation delay, anycast routing, internal production limits, and other edge-shaped weather are deliberately outside the local contract.

That distinction keeps the project honest. The goal is not to impersonate a planet. The goal is to provide the local services that let you build, test, debug, and reason about the application before the planet gets involved.
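The contract can be sketched as a Worker that touches nothing but its bindings. The `KVNamespaceLike` interface below is illustrative and only uses `env.CACHE` from the list above; the point is that the module contains no platform-specific import, so the same code can run against a local replica or the real platform.

```typescript
// A Worker written purely against its bindings. Only the interface
// surface the handler actually uses is declared, so the module does
// not care what implements it on the other side.

interface KVNamespaceLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface Env {
  CACHE: KVNamespaceLike;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const key = url.pathname.slice(1) || "home";
    // Serve from the cache binding if present, otherwise generate and store.
    const cached = await env.CACHE.get(key);
    if (cached !== null) return new Response(cached, { status: 200 });
    await env.CACHE.put(key, `generated:${key}`);
    return new Response(`generated:${key}`, { status: 201 });
  },
};

export default worker;
```

Handing this module an in-memory stub for `env.CACHE` is enough to exercise it; handing it the real binding is a deployment, not a rewrite.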

The service list is broad because modern Workers applications are broad: Workers, D1, KV, R2, Durable Objects, Queues, Vectorize, Workers AI, AI Gateway, Cache, Static Assets, Browser Rendering, Cron Triggers, Rate Limiting, Secrets, Hyperdrive, Analytics Engine, Email Workers, Workflows, MCP, OAuth, service bindings, and now Artifacts. Most of those services reduce to a familiar local shape if one resists the temptation to decorate them: SQLite for durable ledgers, filesystem blobs where bytes should be bytes, in-process routers for internal calls, and JSON logs for the trail.

This is not glamorous architecture, which is one of its better qualities.

The server starts as one process. PlatformServer owns HTTP routing, admin endpoints, optional built-in services, shutdown, metrics, and persisted worker registration. WorkerRuntime owns route matching, dispatch, ctx.waitUntil, service-binding targets, and optional Worker-thread isolation. createBindings() builds the env object at deploy time, caches shared service instances by resource id, and keeps the binding shape close to Cloudflare’s own worker types.
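The caching behaviour described for createBindings() can be sketched in a few lines. The names here (`ServiceCache`, `makeBindings`, `BindingSpec`) are illustrative, not the project's actual identifiers; the idea is just that two bindings pointing at the same resource id must share one service instance.

```typescript
// Sketch: build an env object at deploy time, sharing service instances
// by resource id so two workers bound to the same namespace see the
// same underlying state.

type ServiceFactory<T> = (resourceId: string) => T;

class ServiceCache {
  private instances = new Map<string, unknown>();

  get<T>(resourceId: string, create: ServiceFactory<T>): T {
    if (!this.instances.has(resourceId)) {
      this.instances.set(resourceId, create(resourceId));
    }
    return this.instances.get(resourceId) as T;
  }
}

interface BindingSpec {
  name: string;       // the env property, e.g. "DB"
  resourceId: string; // identifies the underlying local resource
}

function makeBindings<T>(
  specs: BindingSpec[],
  cache: ServiceCache,
  create: ServiceFactory<T>,
): Record<string, T> {
  const env: Record<string, T> = {};
  for (const spec of specs) {
    env[spec.name] = cache.get(spec.resourceId, create);
  }
  return env;
}
```

The cache is what makes a queue written to by one worker visible to the worker that consumes it: same resource id, same instance.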

The shape matters because agents do not only need APIs. They need stable nouns. If a worker uses env.DB.prepare().bind().first(), the local version should not make that call pass through a bespoke testing abstraction with a cheerful name and a different set of lies. The more the local service looks like the production binding, the less translation work the agent has to invent.
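A minimal sketch of what "stable nouns" means for D1: keep the exact call chain the Worker already uses, `env.DB.prepare(sql).bind(...args).first()`. The in-memory handler registry below is purely illustrative; the real local service sits on actual SQLite, but the surface an agent touches is the same.

```typescript
// A D1-shaped binding: same method chain, no bespoke testing abstraction.

type Row = Record<string, unknown>;

class LocalD1Statement {
  constructor(
    private run: (args: unknown[]) => Row[],
    private args: unknown[] = [],
  ) {}

  bind(...args: unknown[]): LocalD1Statement {
    // bind() returns a new statement, matching the immutable chain shape.
    return new LocalD1Statement(this.run, args);
  }

  async first<T = Row>(): Promise<T | null> {
    const rows = this.run(this.args);
    return (rows[0] as T) ?? null;
  }

  async all<T = Row>(): Promise<{ results: T[] }> {
    return { results: this.run(this.args) as T[] };
  }
}

class LocalD1Database {
  // A real implementation executes SQL; this sketch registers a handler
  // per statement so the binding shape itself can be exercised.
  private handlers = new Map<string, (args: unknown[]) => Row[]>();

  register(sql: string, handler: (args: unknown[]) => Row[]): void {
    this.handlers.set(sql, handler);
  }

  prepare(sql: string): LocalD1Statement {
    const handler = this.handlers.get(sql);
    if (!handler) throw new Error(`no handler for: ${sql}`);
    return new LocalD1Statement(handler);
  }
}
```

Code written against this shape needs zero translation when the backing store changes from a Map to SQLite to the production service.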

State with handles

SQLite is the load-bearing material of the project. D1 is SQLite because D1 is already SQLite-shaped. KV is a table per namespace. Queues are durable rows with pending, processing, retry, and dead-letter states. Durable Objects keep per-instance storage in their own SQLite files while a coordinator serializes calls through an actor loop. R2 stores blobs on disk and metadata in SQLite. Cache persists request and response records. Secrets and rate limits use the same general discipline: put the important state somewhere inspectable, with enough structure that tests can prove behaviour without mocking the thing being tested.
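The queue ledger above reduces to a small state machine over durable rows. This sketch keeps the rows in an array for brevity (the real service keeps them in SQLite so they survive a restart), and field names like `attempts` and `maxRetries` are illustrative; a failed delivery returns to pending until the retry budget is spent, then crosses the dead-letter boundary.

```typescript
// Message lifecycle: pending -> processing -> (acked | pending again | dead-letter).

type MessageState = "pending" | "processing" | "dead-letter";

interface QueueRow {
  id: number;
  body: string;
  state: MessageState;
  attempts: number;
}

class LocalQueue {
  private rows: QueueRow[] = [];
  private nextId = 1;

  constructor(private maxRetries = 3) {}

  send(body: string): number {
    const id = this.nextId++;
    this.rows.push({ id, body, state: "pending", attempts: 0 });
    return id;
  }

  // Lease the oldest pending message, marking it processing.
  lease(): QueueRow | null {
    const row = this.rows.find((r) => r.state === "pending");
    if (!row) return null;
    row.state = "processing";
    row.attempts++;
    return row;
  }

  // Successful delivery removes the row from the ledger.
  ack(id: number): void {
    this.rows = this.rows.filter((r) => r.id !== id);
  }

  // Failed delivery: retry until the budget is spent, then dead-letter.
  retry(id: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (!row) return;
    row.state = row.attempts >= this.maxRetries ? "dead-letter" : "pending";
  }

  deadLetters(): QueueRow[] {
    return this.rows.filter((r) => r.state === "dead-letter");
  }
}
```

Because the states are rows rather than in-flight promises, a test can assert the dead-letter policy fired at exactly the right attempt count.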

That last clause is doing more work than it first appears. A mock can tell you that your code called a queue. It cannot tell you that the queue recovered a stuck message after a process restart, that the dead-letter policy fired at the right boundary, or that a Durable Object alarm survived long enough to be restored. Those are not unit-test details. They are the product.

The test suite therefore spins up real services against temporary directories. No global state. No fake SQLite. No worker dispatch mock. The full suite is now 451 tests across 25 files and runs in roughly nine seconds on this machine, including a real git CLI integration test for Artifacts. That is the kind of number that changes behaviour: if the real integration test is cheap enough, the agent can run it instead of narrating why it probably would have passed.
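The pattern is simple enough to show in miniature. The `FileBackedKV` class below is a stand-in for the real SQLite-backed services, but the harness shape is the one described: every test gets a real service in a fresh temporary directory, and reset is deleting the directory.

```typescript
// Real service, temp directory, no global state, cleanup in finally.

import {
  mkdtempSync,
  writeFileSync,
  readFileSync,
  rmSync,
  existsSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

class FileBackedKV {
  constructor(private dir: string) {}

  put(key: string, value: string): void {
    writeFileSync(join(this.dir, key), value, "utf8");
  }

  get(key: string): string | null {
    const path = join(this.dir, key);
    return existsSync(path) ? readFileSync(path, "utf8") : null;
  }
}

function withTempService<T>(fn: (kv: FileBackedKV, dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "open-cloud-test-"));
  try {
    return fn(new FileBackedKV(dir), dir);
  } finally {
    // Reset is not a cleanup API call. It is rm -rf.
    rmSync(dir, { recursive: true, force: true });
  }
}
```

When teardown is this cheap, tests never need to share state, which is most of what makes a large suite fast.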

This is one of the central benefits of an inspectable local cloud. It turns platform behaviour into ordinary evidence. Want to know what a namespace contains? Open the SQLite file. Want to reset a service? Delete the temp directory. Want to understand why a binding returned a value? Read the implementation file, not a dashboard export and a prayer.

Artifacts made the point sharper

The new Artifacts service made the argument less theoretical. Cloudflare Artifacts gives Workers a Git-native storage surface for agent workflows: create a repo, fork it, hand an agent a remote and a token, let it push work, inspect the result. That is exactly the sort of service that agentic development wants, because agents produce code, diffs, histories, and small working worlds, not just chat messages with aspirations.

The local version in open-cloud is a from-scratch TypeScript Git server. It does not shell out to git, and it does not use isomorphic-git. Each repository is one SQLite file under data/artifacts/{namespace}/{name}/repo.sqlite. Objects are zlib-compressed and chunked into rows. Refs, deltas, operation logs, and repository metadata live beside them. The REST control plane and Git smart-HTTP endpoints sit behind the same platform server as the rest of the services.

There is a faintly unreasonable amount of machinery hiding behind that calm description: protocol v1 and v2, info/refs, ls-refs, fetch, upload-pack, receive-pack, side-band framing, pack reading and writing, delta application, shallow clone by depth and date, streamed pack responses, hashed tokens, per-token and per-IP rate limits, quotas, token audit logs, fsck, garbage collection, and a small pack reuse cache. The point of building it locally is not to win a prize for reimplementing Git badly in a different language, which would be a very competitive category. The point is that the agent’s repository service can now be tested, inspected, forked, reset, and debugged in the same loop as the worker that asked for it.
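One concrete slice of that machinery is pkt-line framing, which both protocol v1 and v2 build on: each packet is a four-digit lowercase hex length (counting the four length bytes themselves) followed by the payload, and `0000` is the flush packet that ends a section. This sketch works on strings for readability; the real format is byte-oriented.

```typescript
// Git pkt-line framing, the base layer under info/refs, ls-refs,
// upload-pack, and receive-pack exchanges.

const FLUSH_PKT = "0000";

function encodePktLine(payload: string): string {
  const length = payload.length + 4; // length field counts itself
  if (length > 0xffff) throw new Error("pkt-line payload too long");
  return length.toString(16).padStart(4, "0") + payload;
}

// Returns payloads in order; null marks a flush packet.
function decodePktLines(stream: string): (string | null)[] {
  const out: (string | null)[] = [];
  let i = 0;
  while (i < stream.length) {
    const length = parseInt(stream.slice(i, i + 4), 16);
    if (Number.isNaN(length)) throw new Error("bad pkt-line length");
    if (length === 0) {
      out.push(null); // flush-pkt
      i += 4;
    } else {
      out.push(stream.slice(i + 4, i + length));
      i += length;
    }
  }
  return out;
}
```

A real client is merciless about this layer, which is exactly why testing against one is worth so much: the length field either counts itself or the conversation ends immediately.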

The tricks are pleasingly concrete. Forks use link(2) after a WAL checkpoint for instant zero-copy creation, then VACUUM INTO breaks the hard link before either repository diverges. Tokens are stored as SHA-256 hashes bound to a namespace and repo, and every comparison happens against the hash, so a leaked registry database does not hand an attacker a usable credential. Pack responses stream so a large clone does not materialize the entire framed pack in memory. Mutating operations take per-repo locks, and two-repo operations acquire those locks in sorted order so the deadlock does not get to feel clever.
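The sorted-order trick is small enough to show whole. When an operation needs locks on two repositories, it acquires them in a canonical order regardless of argument order, so two concurrent fork operations between the same pair of repos can never wait on each other in a cycle. The promise-chain `Mutex` here is a minimal sketch, not the project's lock implementation.

```typescript
// Deadlock avoidance by lock ordering: always take the smaller key first.

class Mutex {
  private tail: Promise<void> = Promise.resolve();

  async lock(): Promise<() => void> {
    let release!: () => void;
    const next = new Promise<void>((resolve) => { release = resolve; });
    const prev = this.tail;
    this.tail = this.tail.then(() => next);
    await prev;
    return release;
  }
}

const repoLocks = new Map<string, Mutex>();

function lockFor(repo: string): Mutex {
  let m = repoLocks.get(repo);
  if (!m) { m = new Mutex(); repoLocks.set(repo, m); }
  return m;
}

async function withTwoRepos<T>(
  a: string,
  b: string,
  fn: () => Promise<T>,
): Promise<T> {
  if (a === b) {
    // Same repo twice: one lock, or we would deadlock against ourselves.
    const release = await lockFor(a).lock();
    try { return await fn(); } finally { release(); }
  }
  const [first, second] = [a, b].sort(); // canonical order, not caller order
  const releaseFirst = await lockFor(first).lock();
  try {
    const releaseSecond = await lockFor(second).lock();
    try {
      return await fn();
    } finally {
      releaseSecond();
    }
  } finally {
    releaseFirst();
  }
}
```

Sorting makes the lock graph acyclic by construction, which is cheaper and easier to test than any detection scheme.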

Most importantly, the compatibility test talks to the real git binary over HTTP. It initializes a working tree, commits files, pushes to the local Artifacts service, clones the repo back, checks shallow clones, checks protocol variants, and verifies the worktree. That single test is worth a surprising amount of prose because it refuses to let the implementation be “Git-like” in the comfortable private sense. A real client either understands the wire format or it does not.

Debugging without theatre

The local platform also changes what debugging feels like.

In the remote version of many problems, the first task is finding the place where truth lives. The request log says one thing, the dashboard says another, the CLI says a third, and somewhere under that disagreement is the actual state transition. An agent can work through that, but it spends its context budget becoming a helpdesk clerk for the architecture.

In the local version, the truth has fewer hiding places. The server emits structured JSON logs. Admin endpoints expose health, readiness, liveness, worker lists, and Prometheus metrics. Service files are small enough to read. Persistent data is a directory full of ordinary files. The architecture document contains the request-flow diagrams rather than pretending the diagrams are a substitute for the code.

This does not make bugs disappear. It makes them embarrassingly available.

That is exactly what an agent needs. Agents are good at following evidence when the evidence is nearby and machine-readable. They are much worse when the evidence is spread across proprietary dashboards, partial logs, and “try this again with remote mode” advice. A local service that can be started in milliseconds, traced with logs, inspected with SQLite, and reset with a temp directory gives the agent the same advantage a human gets from a good test harness: fewer mysteries per unit of time.

The tradeoff

There is, of course, a cost to building local replicas. They can drift. They can create false confidence. They can make the wrong edge cases feel real because they are easier to test than the real ones. Pretending otherwise would be a fine way to build a charmingly wrong tool.

The answer is not to claim perfect parity. The answer is to draw the boundary clearly. open-cloud is for application behaviour, binding compatibility, local state, CI, self-hosting, and agent loops. It is not for proving global propagation semantics, production capacity limits, or every undocumented corner of a vendor’s internal control plane. When the platform-specific edge matters, test on the platform. When the application needs to iterate, keep the loop local.

This is the same division one wants in agent systems generally. Local truth for development. Remote truth for deployment. Do not confuse the two, and do not make either one carry the other’s job.

Why it helps agents

The benefit for agentic development is not simply speed, though speed is the visible part. The deeper benefit is that the agent can form better habits.

It can read the same service code the tests exercise. It can add a regression test against the real local binding instead of constructing a mock with suspiciously convenient behaviour. It can inspect persisted rows after a failure. It can run the server in a temporary directory, deploy a worker, call it, observe metrics, and shut everything down without needing a cloud account, a dashboard session, or a network connection. It can treat queues, object storage, databases, Git remotes, scheduled jobs, and service bindings as ordinary local instruments.

That is what makes the loop faster: the work remains close enough to verify.

At a certain point, agentic development stops being about whether the agent can write code and becomes about whether the surrounding system lets it find out what happened. A simple local platform is not a toy version of the cloud. It is a workbench version of the cloud. The edges are smaller, the labels are visible, and the screws are not hidden under a subscription tier.

That is the part worth preserving. Open source local infrastructure is not only a licensing preference; it is a debugging surface, a teaching surface, and an autonomy surface. If agents are going to build more of our software, they need local worlds where cause and effect are close together, where state can be opened, where tests can touch the real services, and where the platform does not disappear the moment the interesting question begins.

The useful cloud, in other words, is not always the largest one.

Sometimes it is the one you can start, break, inspect, and understand before lunch.