Part III · Failures · Chapter 14

Built Before, Authorized at the Moment

A health insurer ran an AI system to review medical-necessity claims, and a physician signed each denial, so on paper a human was in the loop on every decision. Reporting on the program described physicians signing off on more than three hundred thousand denials in a two-month window, at a reported average of around a second per claim. There was a human in the loop on every one of those denials, and there was no human oversight of any of them. The gate existed. It had been designed. Someone had specified that a physician must review each denial, and that specification was met, by the letter, at about a second a case.

This chapter is about the two things an agentic product decides before it ever runs: the security that bounds what the agent can do, and the approval moment that decides what a human authorizes. The prior books handed both largely to the product manager; each is in truth a thing several people build together, with the product manager holding one piece. The insurer’s gate is the warning the whole chapter circles. It did not fail because no one designed it. It failed because designing a gate and staffing a gate are different jobs, and a gate designed by one person and funded by another, with no one owning the seam between them, is a gate that exists on the org chart and nowhere a patient could feel it. The two halves of this chapter, security and the approval moment, are the two places where the agentic product is most often handed to the product manager whole, and they are the two places where that handing is most quietly wrong.

Security was never five spec decisions

Start with security, because the prior books made the cleanest version of the over-assignment here, and it is worth correcting precisely. The argument those books made was that an agent’s security model is a set of decisions that belong in the product spec rather than buried in engineering documents, and they listed them: treat every natural-language input as untrusted, bound the blast radius of each tool, define the memory architecture, classify every action as automated or gated, make observability a compensating control. Five decisions, framed as the product manager’s to own and write down. Read against a real team, that framing puts the security of a system that acts on its own, in an adversarial world, on the desk of the person least equipped to build it.

The substance of security lives with people the triad never named. That substance is the threat model for an agentic system; the recognition that the agent is a non-human identity needing the same scoped, short-lived, cryptographically bound credentials a human would need; the memory segmentation that decides what can write to the agent’s long-term store and what validates an entry before it lands; and the privilege architecture that prevents the agent from escalating its own access. All of it is the architect’s and the security engineer’s work, and it is deep work, not a spec bullet. An agent’s attack surface is not its API; it is its reasoning layer, the sequence of legitimate-looking tool calls it will follow when the inputs are assembled in the right order. Static analysis does not see that, and a conventional penetration test does not model it. Finding the prompt-injection patterns that actually work against this agent, the inputs that walk it across its tool boundary one reasonable step at a time, is adversarial testing, and adversarial testing is a discipline and a role, the red team, which on most teams is not a standing function but a thing that happened once before launch, if it happened at all. A security section that is one paragraph about prompt injection is a checkbox; a security posture is an architect’s threat model plus a red team that keeps testing as the attackers learn.

What the product manager actually owns inside security is narrower and real: the blast-radius business judgment. For each tool the agent can call, what is the maximum acceptable damage if that tool were abused, and which actions are consequential enough that they require a gate by construction, however inconvenient that makes the agent? That judgment, the worst acceptable damage per tool, the line past which an action must not run autonomously, is the product manager’s to own, because it is a question about what the business can absorb and what it owes the people the agent affects, and no architect can answer it. But it is one input to the security model, not the security model. The product manager names the blast radius the architecture must respect; the architect builds the architecture that respects it; the red team keeps proving whether it holds. Three roles, and the over-assignment was reading the first as the whole.
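One way to picture that division of labor is to record the product manager’s judgment as data and let the architecture enforce it. What follows is a minimal sketch under stated assumptions, not a description of any real system: ToolPolicy, POLICIES, may_run_autonomously, the tool names, and the dollar figures are all hypothetical, and measuring damage in dollars is itself a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """The product manager's blast-radius judgment for one tool (hypothetical schema)."""
    tool: str
    max_acceptable_damage_usd: float  # worst loss the business will absorb unreviewed
    gated_by_construction: bool       # True: never runs without a human, whatever the estimate


# Illustrative values only; a real team would set these per tool.
POLICIES = {
    "send_reminder_email": ToolPolicy("send_reminder_email", 50.0, False),
    "issue_refund": ToolPolicy("issue_refund", 500.0, False),
    "deny_claim": ToolPolicy("deny_claim", 0.0, True),
}


def may_run_autonomously(tool: str, estimated_damage_usd: float) -> bool:
    """Enforcement side: the architecture respects the limits the policy names."""
    policy = POLICIES.get(tool)
    if policy is None:
        return False  # tools with no recorded judgment are denied by default
    if policy.gated_by_construction:
        return False  # the gate holds regardless of the damage estimate
    return estimated_damage_usd <= policy.max_acceptable_damage_usd
```

The values in POLICIES are the product manager’s contribution; the check that refuses to run without them is the architect’s; whether an attacker can talk the agent around the check is the red team’s question, and nothing in this sketch answers it.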

The approval moment is shared four ways

The approval moment is the place an agent pauses for a human to authorize a consequential action, and the prior books treated its design as the sharp new work the product manager owns. The design judgment is real and some of it is the product manager’s. But the approval moment, examined closely, is the single most shared artifact in an agentic product, because for the human’s authorization to mean anything, four different competences have to converge on one screen at one instant, and three of them are not the product manager’s.

The product manager owns which actions get an approval moment at all. That is the assignment problem, and it is a real one: gate too little and the irreversible action runs unwatched, gate too much and you manufacture the alert fatigue that trains the supervisor to wave everything through, which is its own way of having no gate. Deciding which actions stop and which run free, by harm asymmetry and reversibility and regulatory exposure and the cost of each review, is product work and it stays the product manager’s.

But once an action is gated, what the human sees at that moment is a design problem, and it belongs to the designer. An approval moment is not a confirmation dialog; it is a decision package, and the difference is whether the human can actually judge or only click. The package has to show what the agent knows, what it is uncertain about, what proceeding will cost, and what the alternatives were, and it has to present that in a way a person under time pressure can absorb in the seconds they have, which is interaction design of the hardest kind, designing for a real human whose attention is decaying rather than an idealized reviewer who reads everything. The chapter on the designer’s craft is about exactly this; here it is enough to say the approval moment’s experience is the designer’s, not the product manager’s, and a decision package designed by someone without the designer’s skill is a wall of text the reviewer learns to scroll past.

What the human is being asked to judge is a domain question, and that belongs to the domain expert. The approval moment cannot ask the supervisor to re-derive the agent’s reasoning, because the supervisor usually cannot reproduce forty steps of machine reasoning and certainly cannot after months of the agent’s operation have dulled what they notice. What it can ask is for the human to authorize the action against what they know that the agent does not, the context, the stakes, the thing that is true about this case that never reached the model. But knowing what that thing is, what a competent authorization of this specific decision actually requires, is domain knowledge. What must a physician see to authorize a denial, what must an underwriter see to authorize an adverse credit action, what must a procurement lead see to authorize a payment-term change, is a question only someone who has made those decisions can answer, and a decision package that omits the field the domain expert would have insisted on is a gate that lets the wrong authorizations through while looking complete.

And whether the human can calibrate their trust at all depends on a signal the eval owner owns. An approval moment is most dangerous when the agent presents everything with equal confidence, because the human cannot tell the case the agent is sure of from the case it is guessing at, and a plausible-looking output triggers the same easy approval whether the agent is right or wrong. For the approval moment to help the human defer well, it has to carry a confidence signal that is actually calibrated, that means something, and a model’s own stated confidence is poorly calibrated by default, so the trustworthy signal is one the eval owner has measured against reality. A confidence number the human can rely on is an eval artifact, not a thing the model emits for free, and an approval moment built on the agent’s self-reported certainty is building the human’s judgment on a number no one checked.
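What “measured against reality” can mean is easiest to see in a small computation. The sketch below uses expected calibration error, one common way to compare stated confidence with observed accuracy; it assumes the eval owner holds a set of past cases, each recorded as a stated confidence between 0 and 1 and whether the agent turned out to be right. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, bool]  # (stated confidence in [0, 1], was the agent correct?)


def calibration_bins(records: Sequence[Record], n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Group cases by stated confidence; return (mean confidence, accuracy, count) per bin."""
    buckets: List[List[Record]] = [[] for _ in range(n_bins)]
    for confidence, correct in records:
        index = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        buckets[index].append((confidence, correct))
    rows = []
    for bucket in buckets:
        if not bucket:
            continue  # skip empty bins rather than report a made-up accuracy
        mean_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        rows.append((mean_confidence, accuracy, len(bucket)))
    return rows


def expected_calibration_error(records: Sequence[Record], n_bins: int = 5) -> float:
    """Average gap between stated confidence and observed accuracy, weighted by bin size."""
    rows = calibration_bins(records, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(count / total * abs(conf - acc) for conf, acc, count in rows)


# Hypothetical eval data: the agent says 0.9 every time and is right half the time.
overconfident = [(0.9, True)] * 5 + [(0.9, False)] * 5
gap = expected_calibration_error(overconfident)  # roughly 0.4
```

A gap near zero means the number on the screen tracks how often the agent is right; a gap like the one above means the reviewer is being shown ninety percent on cases that hold up half the time. The measurement is only as current as the eval data behind it.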

So the approval moment is four hands on one surface: the product manager decides it should exist, the designer builds the experience of it, the domain expert decides what it must show for the authorization to be real, and the eval owner supplies the calibrated confidence that lets the human judge. Take any one of the four away and the gate degrades in a specific way, the unwatched action, the unreadable wall of text, the missing field, the meaningless confidence, and a release meeting cannot see which of the four is missing, because all four failures render as the same green confirmation step that someone, somewhere, will click.
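A team that wants the missing hand to be visible before release could make each contribution an explicit field and check for its presence. This is a sketch under assumptions, with hypothetical names throughout (DecisionPackage, missing_hands, and the example fields); it checks only that each contribution exists, not that it is any good.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DecisionPackage:
    """What one approval moment carries, with the owner of each part noted (hypothetical)."""
    action: str
    gated: bool                                     # product manager: should this stop for a human?
    reviewer_summary: Optional[str] = None          # designer: what the human actually reads
    domain_fields: Dict[str, Optional[str]] = field(default_factory=dict)  # domain expert
    calibrated_confidence: Optional[float] = None   # eval owner: measured, not self-reported


def missing_hands(pkg: DecisionPackage) -> List[str]:
    """Name the roles whose contribution is absent from this package."""
    missing = []
    if not pkg.gated:
        missing.append("product manager")
    if not pkg.reviewer_summary:
        missing.append("designer")
    # An empty field list, or any required field left unfilled, counts as missing.
    if not pkg.domain_fields or any(v is None for v in pkg.domain_fields.values()):
        missing.append("domain expert")
    if pkg.calibrated_confidence is None:
        missing.append("eval owner")
    return missing


pkg = DecisionPackage(
    action="deny_claim",
    gated=True,
    reviewer_summary="Denial proposed; see uncertainty and alternatives below.",
    domain_fields={"treating_physician_notes": None},
)
absent = missing_hands(pkg)  # ["domain expert", "eval owner"]
```

A presence check of this kind cannot tell a readable package from a wall of text, or a calibrated number from a stale one. It only turns the question of which hand is missing into something a release meeting can see.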

The seam is the gate that was designed and never staffed

Return to the insurer, because it is the seam of this chapter. Someone designed a transaction-level gate: a physician must review each medical-necessity denial. That is the right design. The denial of care to an individual is irreversible and asymmetric and regulated, and it is exactly the kind of action that earns a per-decision human in the way. The design was correct, and it was the product side’s to make, and it was made. And then the gate was funded at a program-level budget, staffed as if it were an aggregate audit rather than a per-decision review, and physicians were handed three hundred thousand decisions in a couple of months and cleared them at a reported average of about a second each, which is not review. The gate was a transaction-level mechanism funded at a program-level budget, and the person who designed it and the part of the organization that funded it were not the same, and no one owned the hand-off between them.

That is the failure, and it is not a design failure and not exactly a funding failure. It is a seam failure: the product manager designed a real gate and handed it across to an organization that resourced it as something cheaper, and the gate that existed in the design and the gate that existed in the budget were two different gates, and the gap between them was a patient at about a second. The lesson is the one this whole part keeps teaching from different angles. Specifying the control is not the same as the control existing. The boundary chapter showed it for enforcement: a rule the agent can ignore is not a wall. The eval chapter showed it for correctness: a green checkmark on an expired dataset is not a pass. Here it is for oversight: a gate designed but not staffed is presence performing as oversight, the absence of review wearing its uniform. A control is real only when the person who designs it and the people who build, staff, and resource it have closed the hand-off between them, and on the insurer’s team that hand-off was open, and the gate fell through it.

The gate that was drawn but never paid for

A control can be designed by one person and funded, staffed, and given time by another, and when no one owns the distance between the drawing and the funding, you get a gate that exists on the slide and gives a human about a second on the floor. That was the insurer’s failure. Security and the approval moment each have their owners, but the owner that matters most here is the one for the seam between design and resourcing.

For security, the architect and the security engineer own the substance: the threat model, the scoped non-human credentials, the memory architecture, the privilege model that the agent cannot escalate. The red team owns the adversarial testing, as a standing function and not a one-time launch report, because the threat environment learns the agent’s shape and last spring’s clean test is this autumn’s open door. The product manager owns the blast-radius judgment: the worst acceptable damage per tool and the actions that must be gated by construction. For the approval moment, the product manager owns which actions are gated, the designer owns the decision package the human reads, the domain expert owns what that package must contain for the authorization to be competent, and the eval owner owns the calibrated confidence signal that makes the human’s judgment possible. A control is not done when it is designed; it is done when the hand-off from the person who designed it to the people who build and staff it has been closed, verified, and owned, so that the gate in the budget is the same gate as the gate in the spec.

There is one question that exposes this faster than any audit, and it is the question the insurer never asked out loud: how many seconds does the human who staffs this gate actually have to judge each case? Everything else about a gate can look healthy, the design reviewed, the package well made, the confidence number present, while that single number quietly makes the whole thing theater. A gate that gives its human about a second is not a slower gate than one that gives ninety seconds; it is a different gate, one that authorizes nothing and only records that a human was nearby. Ask the seconds question of every gate your product ships, and where the answer is a fraction of what a competent judgment takes, you have found the insurer’s gate before it finds you, and the patient at the end of it before there is one.
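The seconds question is plain arithmetic, and it can be asked in both directions: how long the staffing allows per case, and how much staffing a competent judgment would require. The sketch below uses the three hundred thousand denials from the reporting and the ninety seconds named above. The 320 review hours per reviewer (roughly eight full-time weeks) is an assumption made for illustration, as are the function names.

```python
def seconds_per_case(reviewers: int, hours_per_reviewer: float, cases: int) -> float:
    """Time each case can receive if all available review hours are spread evenly."""
    return reviewers * hours_per_reviewer * 3600 / cases


def reviewer_hours_needed(cases: int, target_seconds_per_case: float) -> float:
    """Total review hours required to give every case the target amount of attention."""
    return cases * target_seconds_per_case / 3600


CASES = 300_000               # denials in the reported two-month window
HOURS_PER_REVIEWER = 320.0    # assumption: about eight full-time weeks of review

# One full-time reviewer spread across the whole queue: about 3.84 seconds per case.
one_reviewer = seconds_per_case(1, HOURS_PER_REVIEWER, CASES)

# Ninety seconds per case takes 7,500 review hours in total.
hours_at_ninety = reviewer_hours_needed(CASES, 90)

# Under the same assumption, that is roughly 23.4 full-time reviewers.
reviewers_at_ninety = hours_at_ninety / HOURS_PER_REVIEWER
```

Under these assumptions the gap between the gate in the spec and the gate in the budget is a headcount, and it can be computed before launch rather than discovered after it.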
