Maestro CLI made authentication feel like a moral failure.
You log in. The browser says yes. The app says yes. Then the app kicks you back to the browser, as if you had done something suspicious in the three seconds since being welcomed inside. It is less software than a very small border crossing with an overactive stamp.
Maestro CLI hit that exact shape. Running the CLI against dev opened the auth page. I confirmed the login. The app returned and said it was authenticated. Then it sent me back to auth again, a polite revolving door with delusions of security.
First suspicion: token storage, because token storage is where authentication systems go to misplace their correspondence. Maestro had multiple environments: dev, staging, and prod. Authentication in one did not carry over to another. That meant token storage had to be keyed by domain or environment, not treated as one global blob of “user is logged in” optimism. We checked that and discovered the more annoying version: prod and staging worked, but dev did not, which is how you know the bug has put on a waistcoat and decided to lecture.
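For concreteness, here is a minimal sketch of what "keyed by environment" can look like. It is not Maestro's actual storage code: the class name, the JSON file layout, and the token field names are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional


class EnvTokenStore:
    """Keeps one token record per environment in a single JSON file.

    Hypothetical layout: {"dev": {...}, "staging": {...}, "prod": {...}}.
    """

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        # A missing file simply means nobody has logged in yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, env: str, tokens: dict) -> None:
        data = self._load()
        data[env] = tokens  # only this environment's entry is replaced
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def load(self, env: str) -> Optional[dict]:
        # Returns None when this environment has no stored login.
        return self._load().get(env)
```

The point of the shape is that logging in to dev can never overwrite the prod entry, and a missing entry is reported as missing rather than silently borrowed from another environment.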
Next came the practical failures. Multiple browser auth windows opened. Each window made the debugging loop riskier because failed auth loops can become accidental rate-limit generators. A terminal app that repeatedly opens login windows is not merely wrong; it is rude.
After the second browser window, guessing had lost its charm. Diagnostics became the next feature.
We added an errors.txt path under the local app data directory and started logging the states the UI could not explain: socket errors, auth failures, login errors, and URL and connection state. From there, the work shifted from “maybe tokens are overwriting each other” to “let the app leave a trail we can actually inspect.”
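A diagnostics file of this kind does not need to be clever. The sketch below is an assumption about the general shape, not the code we shipped: the function name, the one-JSON-object-per-line format, and the field names are all hypothetical, and the caller is expected to pass in wherever its errors.txt lives.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_diagnostic(log_path: Path, kind: str, **fields: object) -> None:
    """Append one timestamped JSON line describing a failure state."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "kind": kind,  # e.g. "socket_error", "auth_failure", "login_error"
        **fields,      # free-form context such as url= or state=
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier attempts, so a loop leaves a visible trail.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, default=str) + "\n")
```

A call such as `log_diagnostic(path, "socket_error", url=url, state="closed")` adds one line per event, which is easy to copy out of a terminal and paste somewhere useful.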
Only after the file existed could we see that the bug was not one bug. It crossed environment-specific token storage, token candidates, /project/settings, socket connection behavior, and terminal UX. The client had been assuming a single blessed token string where current, access, and ID-style tokens all mattered. Dev could require user environment values that prod and staging did not. A socket attempt after preflight failure created another auth-looking failure. And a TUI without copyable diagnostics had turned the user into a logging system with eyes.
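The "token candidates" idea is easier to see in code than in prose. This is a sketch under assumptions: the key names `current`, `access`, and `id`, and the preference order between them, are hypothetical stand-ins for whatever the stored record actually contains.

```python
from typing import Dict, List, Optional

# Hypothetical field names; the real stored record may use different keys.
CANDIDATE_KEYS = ("current", "access", "id")


def token_candidates(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the distinct non-empty tokens in a fixed preference order."""
    seen = set()
    candidates = []
    for key in CANDIDATE_KEYS:
        token = record.get(key)
        # Skip missing or empty values, and tokens already listed under another key.
        if token and token not in seen:
            seen.add(token)
            candidates.append(token)
    return candidates
```

Treating the result as a list rather than a single string is the whole change: the client stops assuming it already knows which token the environment wants.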
Auth debugging is maddening because we call it “authentication” as if it were a small gate at the front of the building. In practice it is a corridor that passes through storage, routing, browser handoff, backend configuration, session transport, environment policy, and error presentation.
Any one of those can fail. Many of them produce the same user experience: please log in again.
Repair started when we stopped treating the browser loop as the problem. The loop was the symptom. The product stopped launching the browser until it understood why the previous attempt had failed. It fetched project settings before trying the socket. It logged the exact status/body when that preflight failed. It showed missing user environment requirements rather than collapsing everything into “auth failed.” It refused the socket connect when the preflight was already telling us no.
Refusing to connect is the step that is easy to miss, because retries feel like resilience. But retries without a diagnosis are just a faster way to make the same mistake. In an AI-assisted loop, this becomes more dangerous because the agent is perfectly happy to keep testing, keep opening windows, keep trying variants, keep producing motion. Motion is not progress. Sometimes progress is making the system refuse to continue until the error has a name.
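The ordering described above (settings first, socket only if the settings call succeeds, no retry on failure) can be sketched roughly as follows. The function and type names are hypothetical, the three callables stand in for whatever the real client uses, and treating only HTTP 200 as success is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ConnectResult:
    connected: bool
    reason: str


def connect_with_preflight(
    fetch_settings: Callable[[], Tuple[int, str]],  # returns (status, body)
    open_socket: Callable[[], None],
    log: Callable[[str], None],
) -> ConnectResult:
    """Fetch project settings first; open the socket only if that succeeds."""
    status, body = fetch_settings()
    if status != 200:
        # Record the exact status and body, then stop: no socket, no browser.
        log(f"preflight failed: status={status} body={body}")
        return ConnectResult(False, f"preflight returned {status}")
    open_socket()
    return ConnectResult(True, "ok")
```

There is deliberately no retry loop in here. A failed preflight returns a named reason to the caller, and whatever happens next is a decision rather than a reflex.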
What landed was less glamorous than a single elegant fix. It was a diagnostics path. It was environment-aware storage. It was preflight checks. It was less eagerness. It was making the app say, in effect: I am not going to pretend this is a login problem until I have asked the house what sort of guest it expects.
I keep the pattern because it travels well: auth can fail loudly enough to be fixed without opening browser windows until the user worries about rate limits; it can be generous with evidence and conservative with retries; it can distinguish “I do not know who you are” from “I know who you are, but this environment requires something else”; and for terminal tools, it can always leave behind a file you can copy from.
Because when the app is trapped between a browser, a socket, and a remote service, the user’s clipboard may be the most reliable instrument in the room.
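The distinction between “I do not know who you are” and “this environment requires something else” can be made explicit in the client. The sketch below is illustrative only: the `missing_env` field is a hypothetical name, and mapping 401 to “needs login” follows general HTTP convention, not anything confirmed about Maestro's backend.

```python
import json
from enum import Enum


class AuthOutcome(Enum):
    OK = "ok"
    NEEDS_LOGIN = "needs_login"          # "I do not know who you are"
    ENV_REQUIREMENT = "env_requirement"  # known user, environment wants more
    UNKNOWN = "unknown"                  # anything else: log it, do not retry blindly


def classify_preflight(status: int, body: str) -> AuthOutcome:
    """Turn a preflight response into a named outcome."""
    if status == 200:
        return AuthOutcome.OK
    if status == 401:
        return AuthOutcome.NEEDS_LOGIN
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return AuthOutcome.UNKNOWN
    # "missing_env" is a hypothetical field name for this sketch.
    if isinstance(payload, dict) and payload.get("missing_env"):
        return AuthOutcome.ENV_REQUIREMENT
    return AuthOutcome.UNKNOWN
```

Only `NEEDS_LOGIN` justifies opening a browser. Every other outcome is better served by a message and a line in the diagnostics file.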
The more I use agents for debugging, the less I trust single-cause explanations for boundary bugs.
Auth loops are rarely just auth loops. Streaming bugs are rarely just streaming bugs. “It does nothing when I click send” is rarely just a button.
A sharper question is “which boundary is hiding the evidence?” In this case the missing facts lived between the successful browser login and the failed dev runtime connection. Once the CLI could write down what happened there, the bug remained annoying, but at least it became less spooky.