Ninety minutes to a real tool

Nineteen paintings turned up on this blog yesterday evening — hero images, mid-article figures — and I didn’t make any of them. A coding agent painted them, using a tool that did not exist when the evening started.

Earlier that evening the run had stopped. The agent scaffolding a page on this site reached the artwork and had nothing to make it with; its choices were to park the whole build and ask me to go produce a picture, or to ship a gray rectangle where the picture should go. It parked, and asked. You probably have an itch in this shape somewhere — the file you keep opening in the wrong app, the asset you keep making by hand. Mine was images.

imagegen is the fix, and its whole history fits inside the same evening: seventeen minutes from first commit to a live Homebrew tap, ninety minutes end to end. At 21:55 my site’s repo recorded a commit titled “Painted heroes and mid-article figures via imagegen,” written by the agent, which had already run the tool nineteen times to illustrate the blog you’re reading. The first outside user was an agent.

This morning the agent ran imagegen again, for this post about the tool, and the run wasn’t clean. The first draft of the workbench painting at the top of this page came back photorealistic — a photograph of a workbench where a painting of one should have been, exactly the product-photo realism this site’s design spec bans — and had to be re-prompted with the flatness spelled out harder.

The shape is deliberately unglamorous — a Rust CLI wrapping OpenAI’s image endpoints — because, as the README puts it, “the best interface to give an agent is a boring, generic CLI. CLIs compose with everything an agent already knows: pipes, loops, exit codes, $(...).” The same file compresses that stopped run to one line: “Without a tool for it, an agent either stops to ask you for assets or ships gray rectangles.”

Small as it is — 1,065 lines across four files, seven direct dependencies — everything in it is aimed at that user. Saved paths go to stdout and everything else to stderr, so mv $(imagegen gen "..." --quiet) site/hero.png just works. An agent that hits an error needs to tell “rephrase the prompt” apart from “the key is missing” without parsing prose, so exit codes are typed and stable: 0 success, 1 API or network error, 2 missing credentials, 3 blocked by moderation, 4 invalid input. The Homebrew formula asserts the failure contract in its own test block:

test do
  assert_match version.to_s, shell_output("#{bin}/imagegen --version")
  # No API key in the test environment: expect the auth error path (exit 2).
  output = shell_output("OPENAI_API_KEY= #{bin}/imagegen generate hello 2>&1", 2)
  assert_match "no API key found", output
end
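Here is what that contract buys on the calling side, as a minimal sketch. The five numeric codes are the ones listed above; the `Outcome` enum, its variant names, and the `next_step` policy are my own illustration, not imagegen's source. The invocation reuses the `gen ... --quiet` form from the `mv` example.

```rust
use std::process::Command;

/// The exit codes listed in the post, as a type. Names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Success,
    ApiOrNetwork,
    MissingCredentials,
    BlockedByModeration,
    InvalidInput,
    Unknown,
}

impl Outcome {
    /// Map a raw process exit code onto the documented contract.
    fn from_code(code: i32) -> Self {
        match code {
            0 => Outcome::Success,
            1 => Outcome::ApiOrNetwork,
            2 => Outcome::MissingCredentials,
            3 => Outcome::BlockedByModeration,
            4 => Outcome::InvalidInput,
            _ => Outcome::Unknown,
        }
    }

    /// An assumed policy for a calling agent: no prose parsing needed.
    fn next_step(self) -> &'static str {
        match self {
            Outcome::Success => "use the saved path from stdout",
            Outcome::ApiOrNetwork => "retry later",
            Outcome::MissingCredentials => "stop and ask for an API key",
            Outcome::BlockedByModeration => "rephrase the prompt",
            Outcome::InvalidInput => "fix the arguments",
            Outcome::Unknown => "stop and report",
        }
    }
}

fn main() {
    let run = Command::new("imagegen")
        .args(["gen", "a flat painterly workbench", "--quiet"])
        .output();

    match run {
        Ok(out) => {
            // code() is None when the process was killed by a signal.
            let outcome = Outcome::from_code(out.status.code().unwrap_or(-1));
            if outcome == Outcome::Success {
                // Saved paths are the only thing written to stdout.
                println!("saved: {}", String::from_utf8_lossy(&out.stdout).trim());
            } else {
                eprintln!("{:?}: {}", outcome, outcome.next_step());
            }
        }
        // The binary is not installed or not on PATH.
        Err(err) => eprintln!("could not start imagegen: {err}"),
    }
}
```

The useful split is between exit 3 and exit 2: one means try again with different words, the other means no amount of rewording will help.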

And the repo ships a skill file — an instruction manual for agents — teaching cost discipline: draft at low for about $0.006 a shot, upscale the winner at high for $0.21. A tool whose primary user is an agent needs its manual written for one.
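The arithmetic behind that advice is worth one look. This sketch takes the two prices quoted above as fixed per-image costs (an assumption; real pricing depends on size and model) and compares drafting at low then rendering one winner at high against iterating at high the whole way. The function names and the five-attempt figure are mine.

```rust
// Prices from the post, in thousandths of a dollar to avoid float rounding.
const LOW_DRAFT_MILLS: u64 = 6; // about $0.006 per low-quality draft
const HIGH_FINAL_MILLS: u64 = 210; // $0.21 per high-quality render

/// Cost of `drafts` cheap attempts plus one high-quality render of the winner.
fn draft_then_upscale(drafts: u64) -> u64 {
    drafts * LOW_DRAFT_MILLS + HIGH_FINAL_MILLS
}

/// Cost of making every attempt at high quality.
fn all_high(attempts: u64) -> u64 {
    attempts * HIGH_FINAL_MILLS
}

/// Format thousandths of a dollar as a dollar string.
fn dollars(mills: u64) -> String {
    format!("${}.{:03}", mills / 1000, mills % 1000)
}

fn main() {
    let attempts = 5; // hypothetical number of tries before a keeper
    println!("draft low, finish high: {}", dollars(draft_then_upscale(attempts)));
    println!("every attempt at high:  {}", dollars(all_high(attempts)));
}
```

Under those assumptions, five drafts and one final render come to $0.240, against $1.050 for five high-quality attempts.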

ProseDown is the same story at a different scale. The itch: markdown is everywhere — READMEs, design docs, agent output — and opening a heavy IDE just to read a file is friction. The README states the want exactly: “ProseDown is the thing you double-click from Finder and get a beautifully-rendered document before your hand leaves the trackpad.” That takes a native macOS app: a Tauri shell in Rust, a worker-based renderer in TypeScript, a SwiftUI layer for the parts only Swift can reach, about 3,800 lines all in, 8.2 MB on disk. Syntax highlighting and math lazy-load so “a document with no code pays ~0 for Shiki; a document with no math pays ~0 for KaTeX.” First source commit to signed, notarized, stapled, brew-installable release: seven days, eight commits.

The part of ProseDown I’d show a skeptic is the repo’s CLAUDE.md, which reads like a foreman’s briefing for the next agent on shift: “Performance isn’t a nice-to-have — it’s the product,” followed by prohibitions with reasons attached — don’t add scroll listeners, the ToC tracker uses IntersectionObserver on purpose, scroll handlers “will trash the 60-fps budget on long docs.” These repos are built with agents in the loop and it shows in the artifacts: the docs address a collaborator who reads fast, forgets nothing, and needs the why written down or it will helpfully undo your decisions.

[Figure: A painterly collage of a small picture-dispensing machine and a reading lens on a workbench, beside a corked bottle and a workshop clock. Caption: Ninety minutes and seven days.]

Four words and the diffs you don’t read

“You can just do things” — the entirety of a Sam Altman tweet from December 2024, amplifying an attitude usually traced back to Pieter Levels — is permission, and permission was never the missing piece. Andrej Karpathy’s “vibe coding” tweet of February 2025 described the missing piece arriving — give in to the model, stop reading the diffs — and he scoped it himself: “not too bad for throwaway weekend projects.”

A brew-installed CLI with typed exit codes sits outside that scope. Simon Willison drew the line precisely: if you reviewed the code, tested it, and could explain it to someone else, “that’s not vibe coding, it’s software development.” Both of mine sit on the working side of that line — unit tests, documented failure modes, a notarized binary — at a cost that used to buy a vibe-coded toy. The itch is the old part — Eric Raymond’s 1997 line about every good work of software scratching a developer’s personal itch, McIlroy’s do-one-thing-well rule at nearly fifty. What changed is the price of doing it properly.

Robin Sloan named the aspiration in 2020: an app can be a home-cooked meal, software made for your own household with no pivot and no flood of ads coming. But his essay also recorded the blocker — roughly half his build time went to “wrestling with different flavors of code-signing and identity provisioning,” and he wished for “a modern, flexible HyperCard” that would have made it a one-day build. Geoffrey Litt predicted in 2023 that LLMs would dissolve exactly that bottleneck. About three hours into ProseDown’s day one, I had a release script that does the half Sloan lost his time to — build, codesign, notarize, staple, DMG, GitHub release, tap update — in one command.

So the threshold moved: an itch used to have three outcomes — live with it, bodge a script, or subscribe to somebody’s SaaS — because a real tool, with distribution and docs, only penciled out if it had an audience. Now the itch-to-tool cost sits below the itch-to-workaround cost. A tool with exactly one user can have typed exit codes. Anthropic’s own study of internal Claude usage found that 27% of what people built with it wouldn’t have been built at all. That band of software that never used to exist is precisely where these two tools live.

Nineteen percent slower

A METR study found experienced developers 19% slower with early-2025 AI tools on mature repos they knew well, even while they believed they’d sped up. Its authors note the result doesn’t generalize to greenfield work, which is exactly what a ninety-minute CLI is, but the lesson holds: the gain shrinks as the codebase and its history grow. Security scans keep finding flaws in roughly half of AI-generated code, which is why the review-test-explain line has to hold. And every tool you build is a tool you now maintain. Agents cut that cost for me too, but only for as long as I keep reviewing what they do, and the reviewing stays on me.

There’s also a quieter catch, from the researchers who’ve pushed malleable software longest: generation alone doesn’t give you agency over your computing — “Bringing AI coding tools into today’s software ecosystem is like bringing a talented sous chef to a food court.” You can generate all the code you want; if it has nowhere composable to live, you’ve built nothing you control. Which is why both of these are terminal-and-filesystem tools distributed over Homebrew rather than web apps. Files, pipes, exit codes, $(...) — the substrate McIlroy’s generation left behind turns out to be the one an agent can actually cook in.

If you have an itch of your own, both repos are small enough to read in a sitting. The workbench painting on this page is the agent’s second try — on register, shipped.

Chris Chabot · July 2026
