Part V · The Human System · Chapter 18

Chapter 18: Skill Erosion: Deskilling, Never-Skilling, Cognitive Surrender

Your agent does not just change what the team produces. It changes what the team can still do. Three distinct erosions follow, on three different timelines, and you are the one designing the conditions that decide which way each one goes.

The last chapter was about the day the loop fails. This one is about the slower thing underneath it, the reason the loop keeps getting weaker even when nothing visibly breaks. A human-in-the-loop control depends on the human having a skill. That skill is not fixed. It responds to the presence of the agent, and mostly it responds by fading. The failure in the last chapter assumed a competent supervisor who could not act in time. This chapter is about where the competent supervisor goes.

There are three ways the competence leaves, and they are worth separating because they happen to different people, on different clocks, and they need different fixes. Call them the three erosions: deskilling, never-skilling, and cognitive surrender. Most discussions of “AI deskilling” blur all three into one worry, and the blur is expensive, because a remedy for one does nothing for the others.

The Three Erosions. Three distinct ways AI removes the competence your supervision depends on:

Deskilling. A skill that existed fades because the agent now does it and the human stopped practicing. Affects your experts. Reversible, slowly.
Never-skilling. A skill that was supposed to form never does, because the novice met the automated version before the formative struggle. Affects your juniors. Far harder to reverse: you cannot remediate practice that never happened.
Cognitive surrender. The skill is intact, but the human stops using it, deferring to the agent’s output without checking. Affects everyone, immediately, and it is the one that scales.

They share a result and nothing else. Name which one you are looking at before you reach for a fix.

Deskilling: the skill that fades

Deskilling is the one with the cleanest evidence, because someone finally measured it in a setting where the stakes made measurement worth doing.

In 2025, a study across four endoscopy centers did something most AI evaluations never do: it measured what happened to the doctors, not the model. Nineteen endoscopists, each with more than two thousand colonoscopies behind them, had an AI polyp-detection tool introduced into their practice. The researchers then compared their detection rate on the colonoscopies they performed without the AI, before the tool arrived and after. Unassisted adenoma detection fell from 28.4 percent to 22.4 percent, a drop of six percentage points. Put as counts, about six colonoscopies in every hundred, roughly one in seventeen, now came up empty where one would earlier have found an adenoma. That is about a fifth of the detections the same doctors used to make, working alone, as they always had.
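The arithmetic behind those two rates is worth checking by hand, because the absolute and the relative drop tell different stories. The sketch below uses only the two percentages quoted above. The variable names are illustrative, and this is a crude comparison of the raw rates, not the study's own adjusted analysis.

```python
# Unassisted adenoma detection rates quoted in the text (percent).
adr_before = 28.4
adr_after = 22.4

# Absolute drop in percentage points: detections lost per 100 colonoscopies.
absolute_drop = adr_before - adr_after

# Relative drop: the share of earlier detections that no longer happen.
relative_drop = absolute_drop / adr_before

# "One in N" framing across all colonoscopies, not only the positive ones.
one_in_n = 100 / absolute_drop

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
print(f"about one in {one_in_n:.0f} colonoscopies")
```

Run as written, this prints a six-point absolute drop, a relative drop of about 21 percent, and a rate of about one in seventeen. The relative figure is the one to watch when judging the reviewer rather than the caseload: it says how much of the expert's former catch rate is gone.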

The mechanism is the one Bainbridge named four decades earlier, now visible in a clinical dataset. The AI took over the part of the task that held the doctor’s attention, the scanning, the noticing. With that cognitive load lifted, the attention pattern that produced expert detection relaxed. And when the AI was removed, the relaxed pattern stayed relaxed. The skill did not vanish; it decayed from disuse, the way any skill does, except that the disuse was engineered by a tool introduced precisely because it was supposed to help. The authors were careful, as good clinical researchers are, about the study’s limits: observational, one country, a short window, experienced doctors only. Take the number as a signal rather than a verdict. But it is the first direct measurement of a thing the field had only been able to worry about, and it points the way the worry pointed.

This is the validator-of-validator problem made concrete. You assign a human to check the agent. Over time, checking is all they do; they no longer perform the underlying work. The skill that made them a good checker came from doing the work, and it fades on exactly the schedule that makes them a worse checker, while their title and their place on the diagram say nothing has changed.

There is a quieter version of the same decay, harder to feel from inside. A reviewer who has spent two years looking at AI-generated work, and less and less time producing their own, slowly recalibrates what “good” looks like. Their pattern recognition drifts from what good work looks like toward what the agent’s work looks like, and the two are not the same. They begin, without noticing, to wave through the agent’s characteristic outputs because those outputs have become their reference for normal. The standard did not hold steady while the skill faded. The standard moved too, toward the thing it was supposed to be judging.

And there is a turn of the screw beyond that, for any agent that learns from the work it helps produce. The agent is trained, in part, on outputs these weakening validators approved. So each cycle, a slightly less capable reviewer signs off on the data that trains the next model, which is then reviewed by a reviewer who has practiced even less, who approves the data for the model after that. The ground truth itself erodes, one approval at a time, with no single step looking like a failure. This is not a near-term operational risk the way the others are; it is a slow structural one, and it is the reason the question “who validates the validator” does not have a comfortable answer. Right now, increasingly, the answer is no one, and the standard is drifting in the dark.

Never-skilling: the skill that never forms

Deskilling is reversible, slowly, because there was something there to restore. Never-skilling is worse, and most teams do not see it coming because it does not look like loss. Nothing was lost. Something just never arrived.

A field experiment in high school mathematics, published in PNAS in 2025, ran the test cleanly. Students were randomly assigned to no AI, to a standard chatbot that would give them answers, or to a constrained tutor that gave only hints. The students with the answer-giving chatbot improved while they had it, exactly as you would expect, and exactly as every productivity demo shows. Then the researchers took the AI away and tested them. The students who had used the answer-giving version performed worse than the students who had never had AI at all. They had not learned the mathematics. They had learned to get answers from the machine, which is a different skill, and a useless one the moment the machine is gone. The hint-only tutor did not produce the harm, which is the part worth holding onto: the damage came from the design, not the technology.

The medical literature gave this its own name in 2026, in a Nature Medicine piece that put never-skilling into peer-reviewed print alongside a third term I had not seen made explicit before: mis-skilling. A trainee who watches an AI state a wrong diagnosis confidently, accepts it, and files it as knowledge has not failed to learn. They have learned something false and stored it as expertise. That is a worse failure than ignorance, because the trainee does not know to distrust it, and it will surface years later in a decision no one traces back to the afternoon the model was wrong and the resident believed it.

The reason never-skilling matters more than it appears is the asymmetry the deskilling case does not have. You can ask a deskilled expert to practice again. You cannot ask a never-skilled junior to go back and have the formative struggle they skipped, because the struggle was supposed to happen during a window that has closed. The judgment that comes from drafting a thousand mediocre specs, or reading a thousand cases, or sitting through the slow miserable work of being wrong and finding out why, is not available as a remedial course. It was supposed to accrue while the person was junior, and the agent did the thousand specs instead.

Cognitive surrender: the skill that goes unused

The third erosion does not touch the skill at all. The human still has it. They simply stop reaching for it.

Two Wharton researchers ran the experiment in 2026 and gave the behavior a name. Across preregistered studies with around thirteen hundred people, when participants chose to consult ChatGPT, they accepted its answer without pausing to evaluate it in eighty percent of cases, including when the answer was wrong. They called it cognitive surrender, and distinguished it sharply from the ordinary offloading we have always done. A calculator is offloading: you hand off a computation you could do but would rather not, and you keep oversight of whether the result makes sense. Surrender is different in kind. It is handing off the judgment itself, deferring to the output because it arrived fluently and confidently and questioning it would take effort the moment does not seem to reward.

The finding that should worry a PM most is what moves the number. Surrender got worse under time pressure: a thirty-second timer cut people’s willingness to correct a wrong AI answer. It got better under performance incentives and real-time feedback. Read those two together against how AI actually gets deployed in a company. The entire pitch for the agent is that it makes things faster, which manufactures the time pressure that maximizes surrender, while the incentive to slow down and check is usually absent because checking is the part the agent was supposed to remove. The deployment that looks most successful on the throughput metric is the one most efficiently producing the surrender that makes the throughput meaningless.

Surrender is the erosion that scales, because it needs no time to set in. Deskilling takes months and never-skilling takes a career, but surrender is available on the first afternoon, to your most experienced people, precisely when they are busiest.

The pipeline you are draining

Put the three together on the org chart and you get the structural problem, the one that almost never appears in a book on shipping AI products.

Every domain of expert work runs on an apprenticeship. Novices do the formative work, the low-stakes drudgery that builds judgment. Intermediates develop that judgment under supervision. Experts supervise the next generation and do the work the judgment is for. The structure has a bottom rung, and the bottom rung is the boring work: the junior analyst’s models, the associate’s document review, the resident’s note, the new PM’s first dozen specs. It is exactly the work the agent is best at and cheapest at, and so it is exactly the work we are handing to the agent first.

Remove the bottom rung and the ladder still stands for a while, because the people already on it climbed before it was gone. The seniors are fine. The problem is that no one new is climbing. In ten or fifteen years the seniors retire, and the people who were supposed to replace them will have spent their formative years managing an agent that did the formative work. They have the title and the seniority and not the judgment, and they are now the humans in the loop, supervising agents they cannot evaluate because they never built the baseline the evaluation requires. The errors propagate upward with no one able to catch them. You win every quarter between now and then, and you hollow out the profession that was supposed to supervise the thing you built.

The evidence here is weaker than the rest of the chapter, and the argument is too important to oversell. The deskilling number is measured. The never-skilling experiment is real. The pipeline collapse is mostly logic, grounded in adjacent evidence but not yet demonstrated, because it operates on a timescale no study has run long enough to capture. It is a structural prediction, not a finding. I believe it the way I believe a bridge with a removed support will eventually fail: I have not watched this particular one fall, but the mechanism is not mysterious.

What this asks of you

None of the three erosions is your fault, and all three are your responsibility, because you are designing the conditions that determine which way each one goes. Not HR, not the engineering manager, not the model vendor whose defaults you inherited. The person who decides what the agent absorbs and what the humans keep doing is the person who decides whether the skill survives, and on this team that is you.

The design moves are not exotic. For deskilling, keep the experts practicing the underlying skill on some cadence even when the agent could do it, the way airlines mandate manual flying hours not because the autopilot is bad but because the pilot’s skill is perishable. For never-skilling, protect the formative struggle for juniors: let them do the hard version before you let them reach for the agent, the way the hint-only tutor preserved the learning the answer-giving one destroyed. For surrender, fight the time pressure your own deployment created, build in the performance incentives and real-time feedback that the Wharton study showed restore checking, and refuse to let the throughput metric be the only one anyone looks at. And do not treat your supervisors as interchangeable: the literature is clear that the human’s ability to oversee AI is itself a variable, trainable and uneven, which means “a human reviews it” is not a specification until you have said which human and what they can still do.
