Chapter 17: Why Human-in-the-Loop Fails
“Human-in-the-loop” on the architecture diagram is not the same as a human who can actually catch the error. This chapter collects the named failure modes and makes them operational, so you can tell a real loop from a decorative one before it is the thing standing between your agent and a patient, a customer, or a court.
Every architecture review for an agentic product reaches the same slide. There is a box for the agent, a box for the action it takes, and somewhere between them a smaller box labeled “human review.” The box is usually a lie. Not a deliberate one. The human is really there, and really means to review. But that box is what lets the team say the system is supervised, the regulator say it meets the human-oversight requirement, the executive say a person is still accountable, and most of the time it earns none of those claims.
“A human is in the loop” and “a human can actually catch this error” are different claims, and the gap between them is where the failures live: the specific, named, well-studied ways a loop that exists on the diagram does nothing in the world. The chapter ends with a test you can run on your own diagram to tell the difference.
The irony you inherited
Lisanne Bainbridge wrote the foundational paper in 1983, long before any of this, and she was writing about industrial process control. She called it “Ironies of Automation,” and it has aged into one of those papers that turns out to have been about a future its author could not have seen.
Her argument: when you automate a process, you do not eliminate the human. You automate the parts that are easy to specify and leave the human the parts that are not. So the human is left with the residue, the hardest and least structured work in the system, and asked to do it under the worst conditions. They are no longer practicing the task moment to moment, so their skill at it fades. And they are asked to step in only when the automation fails, which is to say rarely, unpredictably, and in exactly the situations the automation could not handle, which are the hard ones. The better the automation, the rarer the intervention, and the less ready the human is when the intervention finally comes.
The mechanism is the whole problem in one move. Your supervisory design improves the normal case and degrades the abnormal case, and the abnormal case is the only one that needed a human at all. You did not add a safeguard. You built a person into a role engineered to make them fail, and then wrote their presence on the slide as if it were protection.
Automation bias: the human bends toward the wrong answer
The first failure mode is the one most people would deny applies to them. Automation bias is the tendency to accept an automated system’s output without the independent check you would have applied to a colleague, even when your own judgment would have been right, and even when you are an expert.
The word “expert” is load-bearing. The intuition most teams carry is that bias is a novice problem, that a senior reviewer is immune because they know enough to push back. The evidence says otherwise, and it is not thin. Parasuraman and Manzey’s 2010 review, the canonical reference in the field, found automation bias in experts and novices alike. It also found that training and instructions did not prevent it, and that it produced errors of two kinds: commission (accepting a wrong recommendation) and omission (failing to notice a problem the system did not flag). The bias is not a skill gap. It is a property of how attention works when a confident machine is in the room.
It gets worse where it matters most. In a 2023 mammography study in Radiology, expert readers shown an incorrect AI classification moved their own scores toward the wrong answer. The machine did not just fail to help; it pulled trained radiologists off assessments they had gotten right on their own. A 2024 multi-site study found that when the AI added a highlight on the image showing where it was looking, physicians accepted its incorrect reading more often, because the visual cue felt like evidence. The thing we reach for to make AI more trustworthy, the explanation, made the bias stronger.
The numbers alleged in litigation over automated coverage denials make the point more sharply than any principle. Of the denials that were appealed, around nine in ten were reportedly overturned, yet only about two in a thousand patients ever appealed at all. The lesson is not that the people signing off were bad people. The system was built so that genuine review was impossible at the required throughput, and then the human’s presence was reported as oversight anyway. You build that system, without meaning to, whenever you put a human approval step in front of a volume the human cannot process.
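A minimal sketch of the arithmetic behind those two figures makes the scale visible. It rests on an assumption the figures themselves do not establish: that appealed denials are representative of all denials, so the overturn rate also estimates how many denials were wrong. The batch of 10,000 and the helper name `denial_outcomes` are illustrative only.

```python
def denial_outcomes(denials: int, appeal_rate: float, overturn_rate: float) -> dict:
    """Rough arithmetic on automated denials.

    Assumption (hypothetical): appealed denials are representative of all
    denials, so overturn_rate also estimates the share of denials that were wrong.
    """
    appealed = denials * appeal_rate
    overturned = appealed * overturn_rate
    estimated_wrong = denials * overturn_rate  # valid only under the assumption above
    return {
        "appealed": appealed,
        "overturned": overturned,
        "estimated_wrong": estimated_wrong,
        "wrong_but_never_corrected": estimated_wrong - overturned,
    }

# Figures from the text: about 2 in 1,000 appeal; about 9 in 10 appeals succeed.
result = denial_outcomes(denials=10_000, appeal_rate=0.002, overturn_rate=0.9)
for key, value in result.items():
    print(f"{key}: {value:.0f}")
```

Under that assumption, about 18 of 10,000 denials get corrected while roughly 8,982 wrong ones stand. If appealed cases differ systematically from the rest, the second number moves, but the appeal channel still corrects only a sliver of the volume.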
Speed asymmetry: the loop the human cannot reach in time
The second failure mode is not psychological. It is physical, and you cannot train it away because it is about clock speed.
An agent operates on machine time. The human operates on human time. When an agent can take fifty actions in the span it takes a person to read one email, “the human approves each action” stops describing anything real. Either the human waves the actions through, which is automation bias wearing a stopwatch, or the human becomes the bottleneck the whole system was built to remove, at which point someone quietly removes the human. There is no third option at machine speed.
What actually happens, in the gap between those two, hides in plain sight. The reviewer who cannot keep up does not stop reviewing; they switch, without deciding to, from reading the work to scanning it for things that look wrong. They approve on a glance, a shape, a vibe of correctness. That is not slower review. It is a different activity, pattern-matching against a surface, and it carries none of the guarantee that real review was supposed to provide. The danger is that it feels like review from the inside and reads like review on the report. A senior reviewer skimming nineteen of an agent’s outputs and approving all nineteen has not done nineteen reviews quickly. They have done zero reviews and signed nineteen times, and unless someone names the difference, the system records it as oversight.
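You can run the arithmetic of that switch before launch. The sketch below computes how many seconds of reviewer attention each agent action actually receives. The 60-second review floor, the 45 review minutes per reviewer-hour, and the names `ReviewLoad` and `MIN_SECONDS_FOR_REVIEW` are hypothetical placeholders, and the floor in particular has to be measured for your own task. The example rate of fifty actions a minute assumes that reading one email takes about a minute.

```python
from dataclasses import dataclass

# Hypothetical floor: the minimum seconds a reviewer needs to actually read
# one output in this domain. Measure it for your own task; 60 is a placeholder.
MIN_SECONDS_FOR_REVIEW = 60.0

@dataclass
class ReviewLoad:
    agent_actions_per_hour: float
    reviewers: int
    review_minutes_per_reviewer_hour: float = 45.0  # assumed; allows for breaks

    def seconds_per_action(self) -> float:
        """Reviewer seconds available per agent action at steady state."""
        available_seconds = self.reviewers * self.review_minutes_per_reviewer_hour * 60
        return available_seconds / self.agent_actions_per_hour

    def verdict(self) -> str:
        budget = self.seconds_per_action()
        if budget >= MIN_SECONDS_FOR_REVIEW:
            return f"review possible: {budget:.1f}s per action"
        return f"scanning, not reviewing: {budget:.1f}s per action"

# An agent taking fifty actions a minute, with one reviewer and then sixty.
load = ReviewLoad(agent_actions_per_hour=50 * 60, reviewers=1)
print(load.verdict())  # → scanning, not reviewing: 0.9s per action
print(ReviewLoad(agent_actions_per_hour=50 * 60, reviewers=60).verdict())  # → scanning, not reviewing: 54.0s per action
```

The design choice is to compare the attention available with the attention needed, rather than to ask whether a reviewer exists. A review step can be fully staffed on paper and still leave under a second per action.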
The clearest version of this is not from software. The Boeing 737 MAX’s MCAS applied repeated nose-down inputs faster than crews could diagnose and counter them. It reactivated each time the pilots trimmed against it, reapplying the input in a loop the safety analysis never modeled. The pilots were the humans in the loop. When the aircraft entered service, they had not been told the system existed. They could not act fast enough on a situation they had no model for. Three hundred and forty-six people died inside the gap between “a human is in control” and “a human can intervene in time.” The agentic version rarely kills anyone, but the structure is identical: an automated process moves faster than the human can understand, decide, and act, and the human’s nominal presence stands in for an oversight the speed has already foreclosed.
The two-stage collapse
Put speed asymmetry together with the skill fade Bainbridge described and you get the pattern to watch for, because the two do not arrive at the same time.
Speed asymmetry hits on day one. The moment you ship an agent that acts faster than a human reviews, the per-transaction loop is already decorative, whatever the diagram says.
Skill erosion arrives later, on a slower clock. As the human supervises instead of doing, the skill that supervision depends on fades, the way Bainbridge predicted. There is now a direct clinical measurement of this. A 2025 study in The Lancet Gastroenterology & Hepatology, run across four centers, measured endoscopists who had each performed thousands of colonoscopies. It compared their unaided adenoma detection rate before and after routine AI assistance was introduced. On standard colonoscopies without AI, their detection rate fell from 28.4 percent to 22.4 percent. The AI did not just help while present; once the doctors had used it routinely, they were worse without it. The next chapter takes this up in full. Here it serves only as evidence that the second stage is real and measurable.
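Two percentages are easy to read past, so it helps to make the size of that drop concrete. The study measured the adenoma detection rate: the share of colonoscopies in which at least one adenoma is found. The sketch below converts the two figures into an absolute drop, a relative drop, and a count per 1,000 procedures. It assumes an unchanged patient mix, and `adr_change` is a hypothetical helper.

```python
def adr_change(before_pct: float, after_pct: float, colonoscopies: int = 1000) -> dict:
    """Translate a drop in adenoma detection rate (ADR) into concrete terms.

    ADR is the percentage of colonoscopies that find at least one adenoma, so
    the drop in percentage points times the procedure count (over 100) gives
    how many procedures no longer find one, assuming the same patient mix.
    """
    absolute_drop = before_pct - after_pct
    relative_drop = absolute_drop / before_pct
    fewer_positive = colonoscopies * absolute_drop / 100
    return {
        "absolute_drop_points": absolute_drop,
        "relative_drop": relative_drop,
        "fewer_positive_procedures": fewer_positive,
    }

change = adr_change(28.4, 22.4)
print(f"{change['absolute_drop_points']:.1f} points, "
      f"{change['relative_drop']:.0%} relative, "
      f"{change['fewer_positive_procedures']:.0f} fewer per 1,000 procedures")
```

A six-point drop is about a fifth of the baseline rate. Per 1,000 procedures, that is roughly 60 in which an adenoma that would previously have been found is not.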
So the loop fails twice, on two timelines. It is theater from the first day because of speed. Over the months that follow, it becomes theater in a deeper way, because the human you were counting on is no longer the human who could have caught the error. A supervision design that looks fine in the launch review can be hollow at launch and hollower a year later, and the dashboard will not tell you, because the dashboard measures the agent, not the supervisor.
Program oversight is not transaction oversight
There is a move teams make to feel safe that deserves its own warning, because it is so reasonable. They cannot review every transaction, so they review the aggregate. Weekly quality metrics, sampled audits, drift dashboards, an incident review when something breaks. This is real and valuable, and it is not the same thing as a human catching a specific bad output before it reaches a specific person.
The two-kinds-of-oversight chapter drew this line as a design choice: program-level oversight asks whether the system is behaving correctly in aggregate; transaction-level interrupt asks whether this output reaches the user without a human in the chain. The failure here is conflating them, treating a strong program-level practice as if it covered transaction-level risk. It does not. You can have airtight monthly reviews and still send a patient a message no clinician saw, deny a claim no physician read, ship a contract clause no lawyer approved. The aggregate was fine. The one that mattered went out anyway. Where a single bad transaction is catastrophic regardless of the average, program-level oversight is not a smaller version of the safeguard you need. It is a different safeguard that leaves the one you need unbuilt.
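The difference between the two kinds of oversight shows up clearly in a toy model. This sketch uses hypothetical names throughout (`Output`, `program_review`, `transaction_gate`) and an arbitrary 1 percent error threshold. A batch with one wrong output in 1,000 passes the aggregate review. Only a transaction-level gate on the catastrophic class of output keeps that one output from going out. The gate keys on the class of action, which is known in advance, not on whether the output is wrong, which nobody knows until later.

```python
from dataclasses import dataclass

@dataclass
class Output:
    id: int
    wrong: bool         # ground truth, known only in hindsight
    catastrophic: bool  # hypothetical class flag: one error here is unacceptable

def program_review(outputs: list[Output], max_error_rate: float = 0.01) -> bool:
    """Program-level oversight: is the system behaving correctly in aggregate?"""
    error_rate = sum(o.wrong for o in outputs) / len(outputs)
    return error_rate <= max_error_rate

def transaction_gate(output: Output, human_approved: set[int]) -> bool:
    """Transaction-level interrupt: may this specific output be released?

    Catastrophic-class outputs are held until a human has approved this one.
    """
    if output.catastrophic:
        return output.id in human_approved
    return True

# 1,000 outputs; exactly one is wrong, and it falls in the catastrophic class.
batch = [Output(i, wrong=(i == 417), catastrophic=(i in {417, 902})) for i in range(1000)]
print("program review passes:", program_review(batch))

# Without the gate every output ships, including 417. With it, 417 waits for a human.
released = [o.id for o in batch if transaction_gate(o, human_approved={902})]
print("wrong catastrophic output released:", 417 in released)
```

The aggregate check and the gate answer different questions, which is why a passing monthly review says nothing about output 417.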
The Loop Test
If the loop is going to be more than a box on a slide, three things have to be true at once. Call it the Loop Test, and run it on every human-review step in your design, because a loop that fails any one of the three is decorative no matter how official it looks on the diagram.
- Time. The human can intervene before the action lands, given the agent’s actual speed. Not in principle, on the agent’s clock.
- Skill. The human still has the competence to know when to intervene. Not on their resume, in their hands this quarter.
- Attention. The human is doing genuine review, at a throughput that makes review possible. Not a signature, an actual look.
Fail any one and the loop is theater. The value of the test is that it tells you which one failed, because each has a different fix.
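Writing the Loop Test down as three comparisons against measured values forces the team to actually produce the measurements. The sketch below is one way to encode it. The field names are assumptions rather than a standard, and so is the idea of testing skill against a recent unaided test set. The returned list names each failed condition, so it maps directly to the fixes described next.

```python
from dataclasses import dataclass

# Hypothetical mapping from a failed condition to the kind of fix it calls for.
FIXES = {
    "time": "replace the real-time veto with a hard constraint or pre-commitment",
    "skill": "design unaided practice back into the reviewer's schedule",
    "attention": "cut the volume per reviewer or redesign the review interface",
}

@dataclass
class LoopMeasurements:
    agent_action_latency_s: float   # time from the agent's decision to the action landing
    human_response_time_s: float    # time for the reviewer to notice, decide, and act
    unaided_accuracy: float         # reviewer accuracy on a recent unaided test set
    required_accuracy: float        # accuracy the step needs to count as a real check
    seconds_per_item: float         # reviewer time actually available per item
    seconds_needed_per_item: float  # time a genuine review takes in this domain

def loop_test(m: LoopMeasurements) -> list[str]:
    """Return the Loop Test conditions this review step fails; empty means it passes."""
    failures = []
    if m.human_response_time_s >= m.agent_action_latency_s:
        failures.append("time")
    if m.unaided_accuracy < m.required_accuracy:
        failures.append("skill")
    if m.seconds_per_item < m.seconds_needed_per_item:
        failures.append("attention")
    return failures

step = LoopMeasurements(
    agent_action_latency_s=0.5,
    human_response_time_s=30.0,
    unaided_accuracy=0.91,
    required_accuracy=0.85,
    seconds_per_item=1.2,
    seconds_needed_per_item=120.0,
)
for failure in loop_test(step):
    print(f"fails on {failure}: {FIXES[failure]}")
```

In this made-up example the reviewer is skilled enough, but the loop still fails on time and on attention. Each of those failures needs its own fix.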
Take them in order. If the loop fails on time, the per-transaction loop is the wrong design, and no amount of training rescues it. You need a different control: a hard constraint the agent cannot cross, a class of action it is not permitted to take on its own, a pre-commitment rather than a real-time veto. If it fails on skill, you have to design the practice back in, the way airlines mandate manual flying hours so the skill is there on the day the automation quits. If it fails on attention, you have a volume and interface problem. A reviewer with 1.2 seconds per item is not a reviewer, and hiring will not close the gap: stretching 1.2 seconds to even five minutes per item takes 250 times the reviewer hours.
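For the time failure, the fix is a control that does not depend on human speed at all. Here is a minimal sketch of a pre-commitment, assuming an agent whose actions all pass through a single `execute` function. Certain action classes are refused outright unless a human signed off beforehand, so the agent waits for the human instead of the human racing the agent. The action names, `FORBIDDEN_WITHOUT_HUMAN`, and `ActionBlocked` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action classes the agent may never take on its own. The list is
# fixed before deployment, so enforcing it does not depend on human reaction time.
FORBIDDEN_WITHOUT_HUMAN = frozenset({"deny_claim", "send_patient_message", "sign_contract"})

class ActionBlocked(Exception):
    """Raised when the agent attempts a forbidden action without prior sign-off."""

@dataclass(frozen=True)
class Action:
    kind: str
    payload: str

def execute(action: Action, human_signoff: bool = False) -> str:
    """Run an action unless it is in a forbidden class and lacks prior human sign-off.

    In this sketch the caller passes human_signoff; a real system would check a
    verified approval record that the agent itself cannot write.
    """
    if action.kind in FORBIDDEN_WITHOUT_HUMAN and not human_signoff:
        raise ActionBlocked(f"{action.kind} requires human sign-off before it runs")
    return f"executed {action.kind}"

print(execute(Action("draft_reply", "Thanks, we received your request.")))
try:
    execute(Action("deny_claim", "claim 1234"))
except ActionBlocked as err:
    print("blocked:", err)
```

The check sits in front of the action rather than behind it, so a slow human makes the system slower, not unsupervised.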
None of the three is satisfied by putting the box on the diagram. Each has to be checked against the real agent, at its real speed, with the real person who will sit in the seat. The European Data Protection Supervisor, reviewing how human oversight performs in deployed systems, catalogued twelve assumptions that organizations make and that turn out to be false: that automation does not influence the human’s judgment, that a human supervising the system means there is no automated decision to govern, that combining human and machine always beats either alone, that an explanation improves oversight rather than deepening over-reliance. They named the result rubber-stamping, oversight that is performative rather than independent. The through-line of all twelve is one sentence. Oversight is something you engineer and verify, not something you get for free by leaving a person in the room. The person in the room is the beginning of the design problem, not the solution to it.