Part V · The Human System · Chapter 15

Chapter 15: Deference: The Problem After Capability

The question is no longer whether the agent can reason as well as a person. On some tasks, under real conditions, it sometimes reasons better. That settles the capability question and opens a harder one, deference: how to build the workflow that shifts reliance toward whichever party is likelier to be right on this case, without collapsing into blind automation on one side or reflexive rejection on the other. It is a design problem, it has no settled answer, and it is yours.

In a field as young as this one, there is very little solid evidence about how AI actually performs alongside people on real work, as opposed to on the curated benchmarks that flatter it. Most of what circulates is demo and anecdote. There is one striking exception, and it is medicine, which has been running AI against human experts for years, in real settings, and publishing the results whether they are flattering or not. So when the question is how a capable agent and a capable human should share a decision, the best evidence we have comes from the place that has been testing it the longest and reporting it most openly.

A 2026 study in Science is the sharpest example. It put a frontier model against attending physicians on seventy-six real emergency presentations, pulled straight from the record, evaluated blind. At initial triage, where there is the least information and the most pressure to commit, the model reached the correct or near-correct diagnosis in sixty-seven percent of cases. The two attendings reached fifty-five and fifty percent. The blinding held; the evaluators could rarely tell which differential came from a machine. The capability question, on this task, under these conditions, was answered, and not in the direction the comfortable version of the story predicted.

What made the study different is also the lesson. The earlier studies that made AI look weak shared a design flaw: they forced the model into a single-choice answer with no room to reason, then measured what was left. Given the same cases in free text, where they could reason in their own words, the same models went, on one replication, from a floor of near zero accuracy to a ceiling of a hundred percent. Same models, same scenarios. Those studies had not measured AI capability. They had removed the reasoning mechanism from a reasoning model and measured the absence. Once you stop doing that, the capability is real, and the argument has to move on.

The harder problem

If the agent is sometimes right when the human is wrong, and the human is sometimes right when the agent is wrong, then the design problem is no longer making either one better. It is deciding, case by case, which one to believe. That is deference, and it is harder than capability for a reason that does not go away with a better model: the right amount of deference is not a property of the agent, it is a property of the case. On this presentation the agent is likelier right; on that one the human catches what the agent missed; and the workflow has to move reliance toward the better party each time, in real time, without a person being able to compute in advance which kind of case they are in. A system that sets deference once, whether to trust the agent or to trust the human, will be wrong on a large share of the cases it sees, because it answered a per-case question with a per-system setting.

Deference is where this part of the book starts, because it is the root the other human-system problems grow from. Get deference right and you still have four consequences to manage, each its own chapter. The person doing the deferring is now in a transformed job, an operator turned supervisor, and that transition has to be designed or it fails on its own. The structure you build for them to defer through, the human-in-the-loop, fails in specific, named ways that look like oversight and are not. The skill the whole arrangement depends on erodes on a schedule the agent itself sets, so the supervisor gets worse at the exact moment the cases get harder. And the agent they are supervising is, functionally, a new member of the team that no one hired, onboarded, or manages. Those are the next four chapters, in that order: the role, the loop, the skill, and the colleague. Taken together with deference, these are not five warnings about human frailty. They are five faces of one problem, which is that an agentic product is only as good as the human system around it, and that system is the half almost no one designs.

The decision-theoretic bones of this are old. Horvitz, whose mixed-initiative work the design chapter drew on for where to place the autonomy boundary, framed the same question for an earlier generation of systems: act, or defer to the human, based on which is likelier to be right and at what cost. Deference is that question moved from the system’s internal logic out into the workflow and the human, where it becomes a product problem rather than a research one. The research can tell you the agent is capable. It cannot tell you how the physician knows, at three in the afternoon on the twentieth patient, that this is one of the cases where the agent’s confident differential is the one to follow and the last one was not. That knowing has to be built into the workflow, into how the disagreement between human and agent is surfaced, into whether the moment invites the person to reconsider or lets them rubber-stamp, into whether the agent’s confidence is legible or just loud. None of that is in the model. All of it is in the design, which means it is in your hands, and almost no team is treating it as the central problem it now is.
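The act-or-defer framing can be written down as a small expected-cost comparison. What follows is a minimal sketch in Python, not Horvitz’s own formulation and not a method this chapter prescribes. The `Case` fields, the two cost functions, and every number are hypothetical, and the per-case probabilities are assumed to be given, which is exactly the part the workflow has to supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One decision point. Every field is a hypothetical illustration value."""
    p_agent_right: float   # estimated chance the agent's answer is correct
    p_human_right: float   # estimated chance the reviewer's answer is correct
    error_cost: float      # cost of acting on a wrong answer
    review_cost: float     # cost of spending the reviewer's attention


def expected_cost_follow_agent(case: Case) -> float:
    # Following the agent costs nothing up front; it costs error_cost
    # whenever the agent turns out to be wrong.
    return (1.0 - case.p_agent_right) * case.error_cost


def expected_cost_defer_to_human(case: Case) -> float:
    # Deferring always pays for the reviewer's attention, then costs
    # error_cost whenever the reviewer turns out to be wrong.
    return case.review_cost + (1.0 - case.p_human_right) * case.error_cost


def choose(case: Case) -> str:
    """Return 'agent' or 'human', whichever has the lower expected cost."""
    if expected_cost_follow_agent(case) <= expected_cost_defer_to_human(case):
        return "agent"
    return "human"


if __name__ == "__main__":
    # Two made-up cases with the same costs but different reliabilities.
    agent_favoured = Case(p_agent_right=0.9, p_human_right=0.7,
                          error_cost=100.0, review_cost=5.0)
    human_favoured = Case(p_agent_right=0.6, p_human_right=0.9,
                          error_cost=100.0, review_cost=5.0)
    for case in (agent_favoured, human_favoured):
        print(case, "->", choose(case))
```

The arithmetic is trivial once the inputs exist. The sketch mainly shows that the answer flips when the inputs change from one case to the next, and that nothing in the model hands you those inputs.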

The two ways it collapses

Deference fails in two opposite directions, and a workflow that prevents one usually invites the other, which is what makes it hard.

The first collapse is blind automation. The agent is right often enough that the human stops checking, and reliance slides from earned to automatic. Now the cases where the agent is wrong sail through, because the one party who could have caught them has been trained by the agent’s reliability to stop looking. This is the failure the operate chapters and the supervision chapters keep circling, and it is worse here because the agent’s high average accuracy is precisely what produces it. Good performance manufactures the complacency that makes the rare bad performance lethal.

The second collapse is the mirror, reflexive rejection. The human distrusts the agent, overrides it as a matter of habit, and discards the cases where the agent was right and they were wrong. A capable agent that is reflexively overridden is an expense with no benefit, and the override feels like diligence while it quietly throws away the value. Algorithm aversion, the documented tendency to abandon a statistically better automated system after seeing it fail once, lives here: one visible miss and the human writes the agent off, including on all the cases where it would have been right.

The design target sits between them and it is not a fixed point. You are not trying to set trust at the correct level; there is no correct level, because the level is different per case. You are trying to build a surface that moves the human toward the agent when the agent is likelier right and toward their own judgment when the human is. That surface depends on the agent being able to signal which case this is, which loops back to confidence calibration from the observation and evals chapters: an agent whose expressed confidence is detached from its actual reliability cannot help the human defer correctly, because the signal the human would defer on is noise.
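One way to check whether expressed confidence tracks actual reliability is to bin past cases by the confidence the agent stated and compare each bin’s average confidence with how often the agent was in fact right. The sketch below is a simplified, hypothetical version of that check. The record format, the function names, and the choice of five equal-width bins are assumptions made for illustration, not something the earlier chapters specify.

```python
from collections import defaultdict


def calibration_table(records, n_bins=5):
    """Group (confidence, was_correct) pairs into equal-width confidence bins.

    records: iterable of (confidence in [0, 1], was_correct as bool).
    Returns a list of (bin_low, bin_high, mean_confidence, accuracy, count),
    one entry per non-empty bin, in ascending order of confidence.
    """
    bins = defaultdict(list)
    for confidence, was_correct in records:
        # A confidence of exactly 1.0 is placed in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, was_correct))

    table = []
    for index in sorted(bins):
        items = bins[index]
        mean_confidence = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        table.append((index / n_bins, (index + 1) / n_bins,
                      mean_confidence, accuracy, len(items)))
    return table


def expected_calibration_error(records, n_bins=5):
    """Count-weighted average gap between stated confidence and accuracy."""
    records = list(records)
    if not records:
        return 0.0
    total = len(records)
    return sum(count / total * abs(mean_confidence - accuracy)
               for _, _, mean_confidence, accuracy, count
               in calibration_table(records, n_bins))


if __name__ == "__main__":
    # Made-up history: the agent always says 0.9 but is right half the time.
    loud = [(0.9, True)] * 5 + [(0.9, False)] * 5
    print(calibration_table(loud))
    print(expected_calibration_error(loud))
```

A gap near zero means the stated confidence is something a reviewer could reasonably defer on. A large gap is the loud-but-not-legible case: the number looks like a signal and carries none.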

What this asks of the design

The instruments you already have are deference instruments; you just have to see them that way. The refund agent shows it at the smallest scale: when it escalates a borderline case, the human is being asked to allocate deference for that one case, either following the agent’s recommendation to deny or overriding it, and they can only decide well if the escalation shows them what the agent saw and how sure it was. A queue of escalations with no confidence and no reasoning trains the reviewer to rubber-stamp, which is deference set to “trust the agent” by accident. The approval moment is where deference happens, so it has to do more than ask yes or no: it has to show the human what the agent knows and how sure it is, so the human can tell this case from the last one. The autonomy boundary is a coarse deference setting, with full reliance below the line and full human control above it, and the art is putting the line where the case-by-case judgment actually shifts rather than at a round number. The audit surface is what lets you learn, after the fact, whether your deference was calibrated: the cases where the human overrode and was wrong, and the cases where they followed and were wrong, are the data that tell you the surface is miscalibrated. None of these were built for deference originally. All of them are doing deference whether or not you designed them to.
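The audit idea can be made concrete as a tally of how each reviewed decision turned out. This is a minimal sketch under stated assumptions: the `ReviewOutcome` record, its two fields, and the counts in the usage example are hypothetical, and it assumes you can eventually learn whether the final decision was correct, which many real workflows can only do for a sample.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewOutcome:
    """One reviewed decision. Both fields are hypothetical audit columns."""
    followed_agent: bool   # True if the reviewer accepted the agent's recommendation
    final_correct: bool    # True if the final decision later proved correct


def audit(outcomes):
    """Count outcomes in four cells: followed/overrode crossed with right/wrong."""
    counts = Counter()
    for outcome in outcomes:
        action = "followed" if outcome.followed_agent else "overrode"
        result = "right" if outcome.final_correct else "wrong"
        counts[f"{action}_{result}"] += 1
    return counts


def error_rates(counts):
    """Share of follows and of overrides that ended in a wrong decision.

    A rate is None when there were no decisions of that kind. Note that a
    wrong override does not prove the agent was right; both can be wrong.
    """
    followed = counts["followed_right"] + counts["followed_wrong"]
    overrode = counts["overrode_right"] + counts["overrode_wrong"]
    return {
        "followed_wrong_rate": counts["followed_wrong"] / followed if followed else None,
        "overrode_wrong_rate": counts["overrode_wrong"] / overrode if overrode else None,
    }


if __name__ == "__main__":
    # Made-up audit log for illustration only.
    log = ([ReviewOutcome(True, True)] * 8 + [ReviewOutcome(True, False)] * 2
           + [ReviewOutcome(False, True)] * 1 + [ReviewOutcome(False, False)] * 3)
    counts = audit(log)
    print(dict(counts))
    print(error_rates(counts))
```

A high wrong-follow rate points toward blind automation, a high wrong-override rate toward reflexive rejection. Either one is a prompt to look at what the approval moment is showing the reviewer, not a verdict on the reviewer.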

So the move for a product manager is to stop asking “is the agent good enough to trust” and start asking “good enough to trust for which cases, and does my workflow let the human tell those cases apart in the moment.” The first question has an answer now and it is increasingly yes. The second question is the one your product lives or dies on, and it does not have an answer yet, not in the literature and not on most teams’ roadmaps. Take one agent where a human reviews its output and ask the deference question directly: when the agent and the reviewer disagree, what in your design helps the reviewer know which of them to believe this time? If the answer is nothing, the human is deferring by habit, in one direction or the other, and habit is the one thing deference cannot afford.