Front Matter

Preface: The Job Did Not Shrink. It Shifted.

For most of the past twenty years, the binding constraint on enterprise product management was execution. The question a product manager answered every week was some version of “can we build it?” Scope, cost, dependencies, engineering capacity. The craft was translating messy human needs into something a team could ship inside a quarter.

That constraint is gone.

AI collapsed the build side of the equation faster than most of us noticed. Prototypes that took six weeks take a weekend. A first draft of a spec, a competitive summary, a set of release notes, a stakeholder email: work that used to fill a senior PM’s week now takes an afternoon, sometimes less. For a growing share of what product teams do, “can we build it?” has stopped being the interesting question. The interesting question is “should we, and if so, what should it actually do?”

This is the part most people get wrong, including some of the people who sign your paycheck. They look at the compression and conclude the job got smaller. It did not. The job changed shape. You have shipped enough products to know the difference between a task getting faster and a job getting easier, and nothing about the last two years has felt easier. That instinct is correct, and this book is an argument for why.

Every agentic product is two products: the agent that acts, and the supervisory layer that governs it. Call them Channel 1 and Channel 2. The first is the part everyone builds, with the same models and frameworks as everyone else, which is why it is commoditizing toward sameness. The second has no standard toolkit, does not demo, and is where agentic products actually succeed or fail. The reason they fail is almost never the model. It is that the team built Channel 1 and assumed Channel 2 would take care of itself. Designing Channel 2 is the new job, and it has four dimensions, technical, organizational, regulatory, and moral, that organize this entire book. That is the whole argument. Everything else is how.

A claim, stated carefully

The careless version of this argument is everywhere and it is doing damage, so I will say exactly what I am and am not claiming.

I am not claiming that routine work fell to some specific fraction of your week. I have looked for that number. It is not there. The studies that exist measure task speed, not weekly composition, and the two are not the same thing. A controlled study at MIT found structured writing tasks completed in about forty percent less time with AI assistance, with the quality rated higher, not lower. That is real. But “the spec took forty percent less time” does not tell you what happened to the other thirty-nine hours of the week. That depends on what you and your organization did next.

So I will not sell you a ratio. What I will claim is sturdier than a ratio.

First: designing an agentic system is a different discipline from designing a deterministic one. Not a harder version of the same job. A different job, with new questions, new deliverables, and new processes you have to learn. A rules-based system does what you specified. An agentic system decides, acts, and compounds those actions, sometimes faster and at larger scale than the humans around it. Specifying behavior under uncertainty, designing the human system that supervises the agent, defining what “good” means when output is probabilistic, setting the safety boundary, watching for the slow drift that no demo reveals: none of this was on the old job description. All of it is on the new one. This book is, in large part, a field guide to that new discipline.

The difference shows up first in the frameworks you trust. Take the most load-bearing example, the user journey map. Ask the simplest question, the one I cannot stop asking: what happens to it when there is no user, when the actor moving through the flow is an agent your customer delegated to? The map does not break. It moves down a layer, and a second map appears above it that no methodology you learned describes yet. The rest of the canon bends the same way under the same pressure, but I will not march you through each one; the journey map is the pattern, and once you see it you will see it everywhere.

Second: the productivity gain is real, and it is the thing that makes room for the new work. If AI had not compressed the drafting and the summarizing and the ticket-tending, there would be no hours to redirect. The compression is the enabler. It is not the headline.

Third, and this is the actual argument: the job shifted rather than shrank, and it is worth being exact about what shifted and what did not, because “the job changed” is the kind of thing every book says about its subject and most of the time it means nothing.

What did not change is the core that made the job worth doing in the first place. You are still deciding what should exist and what should not. You are still making calls under ambiguity with incomplete information in a room where the priorities truly conflict. You are still accountable for whether the thing you shipped was the right thing. That requirement, judgment about products and the people they serve, is exactly as central as it was twenty years ago. If anything it is more central, because it is now the part that cannot be handed to a machine.

What changed is where that judgment gets applied, how fast it has to operate, and what it is now accountable for. It used to be applied to a spec a team would build over a quarter, at the speed of a sprint, accountable to a launch review. Now it is applied to a system that decides and acts on its own, at a speed no human can keep pace with, accountable for outcomes that compound before anyone reviews them. Same faculty, radically different surface. The PM who treats the new surface as overhead, something to get through on the way back to writing specs, will be the PM a cost-cutting org decides it can do without. The PM who treats it as the craft becomes the center of gravity of the team.

There is a harder version of this worth saying plainly, because you have probably heard the softer version used as a threat. When leadership says product managers are getting automated, they are often right, about a particular kind of product manager. The one whose job had quietly become the ticket-moving, the status decks, the standup-running, the calendar of syncs. That job is being absorbed, and it should be. But that was never the job you signed up for. It was the silt that accumulated on top of it. The shift that is frightening the process manager is the best thing that has happened to the product manager in a decade, because it is washing away the silt and leaving the work you got into this for. The fear and the opportunity are the same event seen from two different careers.

That last sentence is a wager, and I will name it as one. Whether the freed time becomes judgment work or becomes a smaller headcount is not an economic law; it is a choice organizations make, differently depending on whether they are growing or contracting and on whether they can see the new scope at all. Left alone, much of the freed time is eaten by reworking what the AI produced and by overseeing it, a verification tax that is easy to underestimate. The reclaimed time becomes strategic time only when someone deliberately converts it. This book is about being the person who converts it, before someone else claims the hours for you.

Two objections, raised now and answered later

Two objections will have occurred to you already, and the book owes you real answers to both. Here I only want to put them on the table.

The first is your leadership’s: if the scope grew, why are some companies cutting product managers? Because titles lag the work, and the CEO still pictures the job as the PRDs and tickets that got automated, so he watches the floor fall out of the old job and calls it shrinkage. The new scope is invisible to him. And because in a period this uncertain the value of experience goes up: the difference between a resident and an attending is not knowledge, it is the calm of having seen enough unexpected things to know most are navigable. The juniors are faster than any prior generation; the judgment gap is as wide as it ever was. The chapter on why you are not a bridge anymore makes this case in full.

The second is the one floated to you in a skip-level: if you can prototype the product now, are you just a builder who owns the whole lifecycle? No, and the question contains a category error worth naming, because some of the people who sign your paycheck are making it. Validating a bet by building something rough enough to learn from was always the PM’s job. You did it with wireframes, with clickable mocks, with a slide that faked the flow. Vibe coding did not hand you the engineer’s job; it gave you a faster instrument for the job you already had. The CEO who watches a PM prototype with AI and concludes the PM has replaced the engineer has confused the PM doing their own long-standing work faster with the PM taking over someone else’s work, and has misunderstood both jobs at once. A prototype validates value; it does not ship enterprise-ready software, and the gap between the two is precisely the engineering team’s craft. The fantasy bets your career on an interface skill, and interface skills are temporary; I have learned and buried a long line of them, DOS to early HTML to CSS, and prompting is next. What survives every interface change is knowing what job you are doing and how to tell when the result is wrong. You still own the customer’s voice, the definition of success, the benefits and capabilities. And now you also own the safety boundary, the evals (the repeated, scored tests of whether the agent behaves, defined in full later), the drift, the provenance of a decision appealed a year later, the human system that supervises the agent without losing the skill to do so. The role does not dissolve; the non-delegable part expands. The chapters on vibe coding and on collaborating with the team do the work.

Why now is different

I have lived through enough technology waves to be suspicious of anyone who says this time is different. So I will say it carefully. I have shipped products through several of these waves, and they rhymed. SOA, big data, the IoT platform I built years ago, each arrived the same way. A new capability shows up expensive and scarce. You get a few years of architectural ferment, competing designs, nobody sure which one wins. Then the cost falls, a dominant design settles, and the market shakes out to a few reliable options you can finally build on with confidence. Ferment first, commoditization later. You could time your bets to it. The discipline was knowing which phase you were in and not standardizing too early.

This wave broke that pattern, and that is the specific thing that makes it different. The ferment and the commoditization are happening at the same time. The field is still in open architectural ferment, no settled answer to what the dominant design will be, while the cost of frontier inference has fallen roughly a thousandfold in three years and enterprise spending on it has more than tripled. With IoT I could wait for the dust to settle because the dust settling and the price falling were the same event, arriving together at the end. Here they have come apart. The thing is getting cheaper and more capable by the month while the question of what it fundamentally is remains open. You cannot wait for the shakeout the way you could before, because the economics are forcing adoption now, before the architecture has resolved. No prior wave asked that of us.

For the PM, the practical consequence is a discipline, not a panic. Be AI-literate, which means understanding the systems well enough to design them. And be tool-proficient, which means fluent at picking and chaining the right tool for the job, and willing to keep relearning, because the toolset resets constantly. There is a new skill, a new connector, a new product every week. Naming today’s tools in a book is a hazard, because they will date; the durable skill is not any one tool but the habit of staying current with what serves the work. I will name some anyway, as dated examples, and where a specific fact is the kind that decays in months I flag it at the point it appears, rather than asking you to guess which numbers to trust. The note on the evidence at the end of this preface says how that flagging works. The frameworks are built to outlast the tools; the tool names are deliberately disposable.

The part nobody writes about

There is a dimension of this that the frameworks and the cost models miss entirely, and it is the one I think about most.

You do not introduce an agent into an empty room. You introduce it into a team that has worked together for years, that has a way of arguing, a person everyone quietly defers to on hard calls, a rhythm to how decisions actually get made underneath the process diagram. An agent does not slot into that politely. It takes work the senior engineer used to do and was respected for. It changes whose judgment the team trusts at three in the afternoon when something is on fire. It quietly reassigns who feels useful. I have watched a capable team get worse for a quarter, not because the agent was bad, but because nobody designed for what it did to the people. The org chart did not change. Everything underneath it did.

Designing for that is not a soft skill, and it is not optional. It means deciding, before the agent ships, which judgments still route through a human and saying so out loud, so the senior engineer whose work is being absorbed knows what is now theirs to own rather than discovering by subtraction that it is gone. It means naming who is accountable when the agent is wrong, because a team that cannot answer that question answers it anyway, in the moment, by blaming whoever is nearest. It means noticing when an experienced person has stopped checking the agent’s output because arguing with it is more tiring than accepting it. And it means protecting the apprenticeship, because the work you let the agent absorb is the work the next generation learned judgment on; you can win this quarter while hollowing out the one three years out. None of that is in the model. All of it is in the room, and the room is yours.

I learned to watch for this in clinical teams long before I built agentic products. Medicine has spent decades absorbing machines into rooms full of people whose authority the machine partly displaces, and getting it wrong in ways that show up in the outcomes. Failing to design for the people is the most common reason good agentic products fail in organizations that were perfectly capable of building them. Building the model is the part you already know how to staff. A team absorbing a new kind of colleague is the part nobody owns, and it is yours now. This book takes it as seriously as it takes evals.

What this book does

The shift is no longer coming; it is here, and this book is about the work it makes, not the trend it represents.

That means more of the concrete and less of the conceptual. Most chapters end with something to do, a small test you can run on a real product on a real Monday, not a principle to admire. The chapters on the human system and the moral weight end differently, on a question to sit with rather than a task to run, because that material does not reduce to a checklist and I will not pretend it does. The book covers the full arc of the new scope: deciding whether an agent should exist at all, prototyping to decide rather than to ship, working with the human team of engineers and designers and domain experts, designing the agent’s behavior and the supervisory layer that bounds it, the operational guardrails that keep a production agent from becoming a financial event, and an entire part on the human system, on why supervision fails, on skill erosion, and on the fact that you have effectively hired a team member nobody onboarded.

A word on how this book was made, since the subject demands the honesty. It was written with substantial AI assistance, in the way it tells you to work: the machine drafted and surfaced options, and a human chose, cut, ordered, argued, and put his name on the result. The facts in here will date, and I will flag the ones most likely to. The reasoning is the bet, and I put it in print so it can be checked, dated, and, where I got it wrong, caught. A model will answer the question you ask it and then tell the next person the opposite, accountable to no one. This book is the other thing: one named person’s judgment about where agentic AI quietly burns good teams, offered by someone who can be held to it.

How to read this book

If you read it in order, the chapters compound. The early chapters are about deciding what should exist. The middle chapters are about designing how an autonomous system behaves and keeping it honest in production. The later chapters are about the human system around it and the people your agent will affect who never touch it. The last chapter walks through an ordinary week, to show the whole argument in motion rather than in pieces.

If you read it by chapter pull, that works too. Each one stands on its own, with cross-references where they earn their place.

A word on the title, because it sets an expectation worth correcting early. Why Agentic AI Products Fail sounds like it will hand you a catalog of failures to read about. It will not, or not mainly. The failures are the motivation, not the structure; the book is organized around how to build the supervisory layer whose absence is the reason these products fail. The autopsy is the first chapter of the operating manual, not the whole book.

If it helps to carry a small set of load-bearing ideas through the rest, these are the spine, and everything else hangs off them: the two channels (the agent and the supervisory layer); the autonomy ladder (authority earned, not scheduled); the outcome-centric spec and its eval set (the unit of work for an agent’s behavior); the human-system failures, deference and skill erosion (how the people around the agent come apart); and the people the agent never sees (the affected person inside the error rate). When a later chapter introduces a new construct, it is almost always a facet of one of these five.

What this book asks of you is not a new vocabulary. It is the willingness to treat the part of the job that did not get automated as the job, to design two products at once, Channel 1 and Channel 2, and to carry the accountability that comes with both. That is more than most books on this subject ask. It is the work the discipline actually requires now.

A note on the evidence

This book makes claims, and claims are only as good as what stands behind them. So a word on what stands behind each one, because the subject moves fast enough that you need to know how much weight any given example can bear.

The evidence comes in tiers, and I have tried to make the tier legible wherever a case appears. Peer-reviewed studies and primary regulatory text carry the most weight; where a claim depends on one, it is cited, and the numbers are the published numbers, not my rounding of them. Documented industry incidents are real events drawn from primary reporting; I name them where I can and carry them as examples of a pattern, not as a verdict on a particular company. Litigation-stage allegations are exactly that, presented as alleged rather than adjudicated, because the discovery is not finished and neither is the case. Vendor and company claims are labeled as such, because a company reporting that its own agent did the work of seven hundred people is making a marketing statement, not publishing a measurement. My own cases are the clinical and product situations I have been close to, offered as a practitioner’s judgment, not as data. And a few scenarios are illustrative composites, assembled from real failure patterns into a single story that did not happen to one named company exactly as told; where a case is a composite, I say so on the page.

I flag this because the failure mode in a field this loud is to launder a vendor’s press release into a fact, or a plausible anecdote into a statistic. I have tried not to. Where a number is most likely to be wrong by the time you read it, the kind that decays in months, I say so at that point in the text rather than asking you to take the freshness on faith. The reasoning is the part I am putting my name to. The facts are dated to mid-2026, and the ones that move fastest are flagged where they sit.

A note on the examples

You will find clinical and healthcare cases throughout this book. It is not a book about healthcare AI. They are here because medicine has been running the highest-stakes AI experiment in the world for decades, under the strictest regulation and the least tolerance for a confident wrong answer, and publishing the results, failures included. The governance, equity, and accountability problems enterprise AI is meeting for the first time were studied and partly solved in clinical practice a generation ago. When I reach for a clinical example, it is because that is where the evidence is most mature and the lesson transfers most cleanly.

How the Job Actually Changed: A Field Snapshot