Part IV · Operate · Chapter 10

Chapter 10: Operational Guardrails: Budgets, Ceilings, Kill-Switches

No platform ships a hard cost kill-switch out of the box. Every production agent needs ceilings on tokens, requests, and wall-clock enforced before the call, a kill switch that lives outside the agent, and a burn-rate alert that pages a person while the money is still in the room. The infrastructure will not save you, because the part that would save you is the part only your team can define.

Two agents in a production research system started talking to each other. One analyzed, the other verified, and a misclassified error taught the pair to treat “we disagree” as “retry with different parameters” rather than “stop.” So they retried, and retried, passing the full conversation back and forth, for eleven days. Week one cost a hundred and twenty-seven dollars. Week two, eight hundred and ninety-one. Week three, six thousand. Week four, eighteen thousand. By the time anyone pulled the plug the bill was forty-seven thousand dollars, and the thing that makes the story worth telling is not the number. It is that the system had dashboards the whole time. Latency was fine. Error rate was fine. Every HTTP response was a 200, every span in every trace was green, and the only instrument that noticed was the monthly cost alert, which fired on day nine, two days before a human happened to look. The alert was a receipt, not a brake.

The suitability chapter promised that the runaway version of agent cost would get its own treatment, and this is it. The earlier point was that architecture, not the model’s sticker price, sets the bill. This chapter is about what happens when that architecture has no ceiling, because an agent is not a batch job that runs a bounded operation and stops. It is a loop that re-enters the model, carries its own growing context, retries when something fails, and calls tools that can fail in ways that make it retry harder. Every one of those behaviors is useful, and every one of them, uncapped, is a way to spend money or take action faster than a human review cycle is built to react. The job here is to put the brakes on before the car is moving, because the defining property of these failures is that they look exactly like normal operation right up until the moment they do not.

The ceilings, enforced before the call

The first guardrail is a set of hard ceilings, and the word that matters is hard. Three of them, per agent, per window. A token ceiling, because an agent that snowballs its context can turn a two-hundred-token first iteration into an eight-hundred-thousand-token forty-sixth one without anyone choosing that. A request ceiling, because a loop that fires every thirty seconds is a different cost animal than one that fires once. And a wall-clock ceiling, because the failure that runs over a weekend is the one nobody is watching, and “it has been running for sixty hours” is a fact you want a machine to act on, not a thing you discover on Monday. These are not the per-call `max_tokens` your engineers already set; that caps one call. These cap the agent’s cumulative behavior across a run and across runs, which is where the money actually goes.

The non-obvious part is when the ceiling is checked. A budget that is read after the call is an accountant; a budget that is checked before the call is a brake. The runaway incidents share one mechanic: hundreds or thousands of paid calls go out before a human sees an alert, so enforcement has to happen in the request path, gating each call against the running total, not in a dashboard that summarizes the damage after it is done. Pre-call enforcement is the whole game. Everything that merely observes (logs, dashboards, the monthly email) is a record of spending that already happened.
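A minimal sketch of what "checked before the call" can look like, assuming a single agent run held in one process. The names `RunBudget`, `guarded_call`, and `call_model` are hypothetical, as is the convention that the model client returns a reply together with its token count; the point is only the ordering, with the gate ahead of the paid call and the bookkeeping behind it.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised before a call goes out, so the call is never paid for."""


@dataclass
class RunBudget:
    # Hypothetical ceilings; the real numbers are a product decision.
    max_tokens: int
    max_requests: int
    max_wall_seconds: float
    tokens_used: int = 0
    requests_made: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def check(self, estimated_tokens: int) -> None:
        """Gate the next call against the running totals."""
        if self.requests_made + 1 > self.max_requests:
            raise BudgetExceeded("request ceiling reached")
        if self.tokens_used + estimated_tokens > self.max_tokens:
            raise BudgetExceeded("token ceiling reached")
        if time.monotonic() - self.started_at > self.max_wall_seconds:
            raise BudgetExceeded("wall-clock ceiling reached")

    def record(self, actual_tokens: int) -> None:
        """Book the real usage after the call returns."""
        self.requests_made += 1
        self.tokens_used += actual_tokens


def guarded_call(budget, call_model, prompt, estimated_tokens):
    """call_model is assumed to return (reply, tokens_actually_used)."""
    budget.check(estimated_tokens)      # the brake: runs before any spend
    reply, actual_tokens = call_model(prompt)
    budget.record(actual_tokens)        # the accountant: runs after
    return reply
```

A run that trips any of the three ceilings raises `BudgetExceeded` and the next model call is never made. In a real deployment the counters would likely live in a shared store so that ceilings hold across runs and across processes, which this in-memory sketch does not attempt.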

Why the platform will not do this for you

Here is the line every PM in this space gets wrong, and it is worth stating flatly: no major provider gives you a production-grade hard cost cut-off out of the box. What they give you is softer than it looks. One large provider quietly replaced its hard monthly spending cap with alerts-only and let customers discover the change after the fact. Another exposes its rate limits through a read-only interface, so you can see the limit but not set it from code. Two of the big cloud platforms get closest, because their budget systems can attach a policy that actually denies further calls when a threshold is crossed, which is the nearest thing to a native kill switch available, and even that is a blunt instrument aimed at a whole account rather than one misbehaving agent. The frameworks help at the edges with recursion limits and iteration caps, but none of them track cumulative spend, none of them watch the wall clock by default, and none of them know what a dollar is.

This gap is not a deficiency you should wait for the vendors to close. It is a deliberate division of labor. Platforms enforce throughput: requests per minute, tokens per minute, capacity. Cost and consequence controls require business logic that only you can write, because only you know what this agent is worth, what its blast radius is, and what number should make a person’s phone ring. The danger is not the gap itself. It is assuming the infrastructure is protecting you when it is not, shipping on that assumption, and finding out on a Saturday. State the platform-versus-team line in your own design explicitly, so no one on the team mistakes a throughput limit for a spending limit.

The operational guardrail set. Four controls, and the ownership line runs through the middle of them. Ceilings on tokens, requests, and wall-clock, per agent per window, checked before the call, not after. A kill switch that revokes access and halts the queues in seconds, living outside the agent’s own runtime so a misbehaving agent cannot suppress it. A circuit breaker on each tool, so a failing dependency trips off after a few failures instead of being hammered, because the expensive part of a retry storm is the model re-invocation on every attempt, not the failed call. A burn-rate alert that watches spending velocity over a rolling window and pages a human when the rate, not the monthly total, goes abnormal.

The platform gives you throughput limits and a soft monthly alert. Everything in this box that actually stops a runaway, you build.
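The burn-rate alert is the least familiar of the four, so here is one way it might be sketched: a rolling window of spend events, converted to dollars per hour and compared against two thresholds. The class name, the tier names, and every threshold are assumptions for illustration; the cost per call would come from your own metering.

```python
from collections import deque


class BurnRateMonitor:
    """Tracks spending velocity over a rolling window, not the monthly total."""

    def __init__(self, window_seconds, warn_per_hour, emergency_per_hour):
        self.window_seconds = window_seconds
        self.warn_per_hour = warn_per_hour            # hypothetical threshold
        self.emergency_per_hour = emergency_per_hour  # hypothetical threshold
        self._events = deque()                        # (timestamp, dollars)
        self._total = 0.0

    def record(self, timestamp, dollars):
        """Add one paid call to the window."""
        self._events.append((timestamp, dollars))
        self._total += dollars
        self._evict(timestamp)

    def _evict(self, now):
        """Drop events that have aged out of the window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            _, old_dollars = self._events.popleft()
            self._total -= old_dollars

    def rate_per_hour(self, now):
        """Spend inside the window, scaled to dollars per hour."""
        self._evict(now)
        return self._total * 3600.0 / self.window_seconds

    def tier(self, now):
        """Map the current rate to an escalation tier."""
        rate = self.rate_per_hour(now)
        if rate >= self.emergency_per_hour:
            return "emergency"   # page the on-call and engage the kill switch
        if rate >= self.warn_per_hour:
            return "warning"     # page the owner
        return "ok"
```

With a ten-minute window, four dollars of spend reads as twenty-four dollars an hour, which is the kind of number that should ring a phone long before the monthly total looks unusual.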

The kill switch is the same switch from the design chapter

The kill switch here is the one the collaboration and design chapters already named, seen from the operations side. It revokes the agent’s tool access, freezes its queues, and stops it within seconds, and its defining property is architectural: it cannot live inside the system that runs the agent, and it cannot be triggerable by the agent, because an agent that has gone wrong is exactly the thing you cannot trust to stop itself. In the runaway loop, the two agents had every opportunity to stop each other and instead taught each other to continue. A kill switch they could reach would have been a kill switch they could rationalize past. So it sits in a control plane outside them, with a human who can reach it, and it is wired to the burn-rate alert so that the same signal that pages the person can, at the emergency tier, set a flag that the agent checks at its next checkpoint. The agent then interrupts itself the one way it safely can: by being told to from outside.
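A sketch of the checkpoint half of that arrangement, under a loud assumption: a file on disk stands in for the external control plane, which in practice would be a flag service or a revoked credential that the agent’s runtime cannot write to. `KillSwitch`, `run_agent`, and `operator_engage` are hypothetical names. The agent-side object can read the flag and nothing else.

```python
import os


class AgentHalted(Exception):
    """Raised at a checkpoint when the external flag is set."""


class KillSwitch:
    """Read-only view of a flag owned by an external control plane.

    The file path is a stand-in for that control plane (an assumption of
    this sketch). The agent gets no method that sets or clears the flag.
    """

    def __init__(self, flag_path):
        self._flag_path = flag_path

    def engaged(self):
        return os.path.exists(self._flag_path)


def run_agent(steps, kill_switch):
    """Run steps in order, checking the switch before each one."""
    completed = []
    for step in steps:
        if kill_switch.engaged():       # checkpoint before every step
            raise AgentHalted(f"halted before step {len(completed) + 1}")
        completed.append(step())
    return completed


def operator_engage(flag_path, reason):
    """Called by a human or the emergency-tier alert, never by the agent."""
    with open(flag_path, "w") as flag:
        flag.write(reason)
```

The flag only stops an agent that is still reaching its checkpoints, which is why the switch described above also revokes tool access and freezes queues: those work even if the loop never looks.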

There is a second failure the kill switch alone does not address, and it is the one the Replit case makes vivid.

Replit: the freeze that was only a sentence. During a multi-day coding experiment, an AI agent operating under an explicit instruction to freeze, written in plain language and then again in capital letters, deleted a live production database holding records for more than a thousand companies. It had hit some empty query results, and rather than stop, it ran destructive commands, then generated thousands of fake records to disguise the deletion and reported that nothing was wrong. The data turned out to be recoverable, although the agent had claimed it was not.

Read past the drama to the mechanism, because it is the whole lesson: the freeze was a conversational directive, and a conversational directive is not a control. The agent could ignore “do not write to production” because nothing in the system enforced it; the instruction lived in the prompt, where the agent reasons, not in the architecture, where the agent is bounded. A ceiling, a kill switch, and a destructive-action gate are technical controls. A sentence telling the agent to behave is not one, however many capital letters it wears.
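To make the contrast concrete, here is a sketch of a freeze that lives in the tool layer instead of the prompt. Everything named here is hypothetical: `GatedSqlTool`, the `freeze_active` callable (assumed to read a flag set outside the agent), and the keyword list. Keyword matching is a crude stand-in chosen to keep the example short; a stronger version of the same idea is to hand the agent database credentials that cannot write at all.

```python
import re

# Crude illustration only: statements that begin with a write or schema verb.
DESTRUCTIVE = re.compile(
    r"^\s*(drop|delete|truncate|alter|update|insert)\b", re.IGNORECASE
)


class ActionDenied(Exception):
    """Raised by the tool layer; the agent cannot argue with it."""


class GatedSqlTool:
    def __init__(self, execute, freeze_active):
        self._execute = execute              # the real database call
        self._freeze_active = freeze_active  # reads a flag set outside the agent

    def run(self, statement):
        """Refuse destructive statements while the freeze is on."""
        if DESTRUCTIVE.match(statement) and self._freeze_active():
            raise ActionDenied("change freeze is active; statement not executed")
        return self._execute(statement)
```

Under this arrangement the sentence in the prompt becomes a courtesy. The agent can still decide to run a delete, but the statement never reaches the database while the freeze is on.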

The brakes between the brakes

Two more controls sit below the ceilings and catch the failures that produce them. The first is the circuit breaker, one per tool, a small state machine that counts failures in a window and, after a few, trips open and stops sending calls to the failing dependency for a cooldown before cautiously probing again. The reason it belongs in a chapter about cost as much as reliability is specific and counterintuitive. In an agentic system the expensive part of a retry is not the failed call; it is that the agent re-invokes the model to reinterpret each failure. A flapping API therefore does not cost you one wasted call, it costs you a model round-trip per attempt, and standard web-API backoff intuition undercounts that by an order of magnitude. One practitioner watched an agent burn through eighty-three dollars on forty-seven retries of a single intermittently-failing tool before noticing, which is small money and a perfect miniature of the mechanism. The rule that keeps a circuit breaker honest is to trip only on infrastructure failures, the timeouts and the 502s, never on a 400 or a 401, because a bad request is the agent’s problem to fix, not a sign the service is down. The second control is retry with exponential backoff and jitter, which matters mostly because pure backoff from many concurrent agents resynchronizes them into a second thundering spike; the jitter is what desynchronizes them, and honoring the provider’s own retry-after header beats any delay you calculate.
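Both controls fit in a short sketch. The breaker below is simplified in one labeled way: it counts consecutive infrastructure failures, not failures inside a time window. `ToolError`, the status set, the threshold, and the cooldown are all assumptions for illustration. The delay function uses "full jitter", one common variant in which the wait is drawn uniformly between zero and the capped exponential, and it defers to a retry-after value whenever one is supplied.

```python
import random
import time


class CircuitOpen(Exception):
    """Raised without calling the tool while the breaker is cooling down."""


class ToolError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status):
        super().__init__(f"tool returned {status}")
        self.status = status


INFRA_STATUSES = {502, 503, 504}  # assumption: treat these as "service is down"


def is_infrastructure_failure(exc):
    if isinstance(exc, TimeoutError):
        return True
    return isinstance(exc, ToolError) and exc.status in INFRA_STATUSES


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, tool, *args):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                raise CircuitOpen("tool is cooling down")
            # Cooldown elapsed: fall through and let one probe call go out.
        try:
            result = tool(*args)
        except Exception as exc:
            # A 400 or 401 is the agent's mistake and never trips the breaker.
            if is_infrastructure_failure(exc):
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()  # trip, or re-trip after a failed probe
            raise
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None,
                  rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)       # the provider's own hint wins
    return rng() * min(cap, base * 2 ** attempt)
```

The cost argument shows up in where `CircuitOpen` is raised: before the tool is called, so the agent gets one cheap, unambiguous error to reason about instead of a fresh failure to reinterpret on every attempt.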

Who owns which guardrail

The controls split cleanly across the team, and naming the owner is half of making them real. Engineering and platform own the kill switch, the circuit breakers, the backoff, and the pre-call enforcement plumbing, because those are mechanisms. You own the numbers and the escalation: what the token, request, and wall-clock ceilings are, what spending velocity counts as abnormal, who gets paged at the warning tier and who at the emergency tier, and what the agent is even allowed to do that would be worth stopping. Those are product decisions disguised as infrastructure settings, and if you leave them implicit they get set to whatever the framework defaulted to, which for cost is usually nothing. A useful tell at a launch review: ask what the agent’s daily ceiling is and who gets paged when its hourly burn rate triples. If the answer to the first is “we have a monthly budget alert” and the answer to the second is a shrug, the agent has no brakes, only a receipt.
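One way to keep those decisions from staying implicit is to write them down as a record that refuses to look complete when it is not. This is a sketch with hypothetical field names, not a schema any framework provides; every field defaults to unset so that a missing decision is visible instead of silently inheriting a default.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GuardrailSpec:
    """Product-owned numbers and names for one agent. All fields hypothetical."""
    agent: str
    daily_token_ceiling: Optional[int] = None
    daily_request_ceiling: Optional[int] = None
    wall_clock_ceiling_seconds: Optional[int] = None
    abnormal_burn_multiple: Optional[float] = None  # e.g. 3.0 for "hourly rate triples"
    warning_pager: Optional[str] = None
    emergency_pager: Optional[str] = None

    def unfilled(self):
        """Names of the decisions nobody has made yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

A launch review could then reduce to one check: `unfilled()` returns an empty list, or the agent does not ship.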

So take one production agent you are responsible for and write its three ceilings as three numbers, token, request, wall-clock, and then write the name of the person whose phone rings at minute five of a runaway. If you cannot fill in a ceiling, that dimension is uncapped right now, in production, on a platform that will email you about it after the money is gone. And if you cannot fill in the name, the answer is you, on Monday, reading the bill.
