Front matter

Preface

For most of the past twenty years, the binding constraint on enterprise product management was execution. The question a PM answered every week was some version of “can we build it?” Scope, cost, dependencies, engineering capacity. The craft was translating messy human needs into something a team could ship inside a quarter.

That constraint is gone.

AI collapsed the build side of the equation faster than most of us noticed. Prototypes that used to take six weeks take a weekend. Integrations that used to require a dedicated squad run on a foundation model and a handful of tools. For a growing share of what enterprise teams do, “can we build it?” has stopped being the interesting question. The interesting question is “should we, and if so, what should it actually do?”

That shift is the reason this book exists.

In healthcare, the field that has been running the longest and most rigorous AI experiment, a large majority of AI pilots never reach production at scale. The models usually work. The projects fail on something else: judgment about which problem to attack, how to specify what right looks like, what happens when the system is wrong, and who is accountable when it is. That is PM work. It is not secondary to engineering capacity. It is the binding constraint that replaced it.

This book is for product managers who are good at their jobs and honest about what they do not know.

You have shipped enterprise software. You have run workshops, argued over priorities, and delivered systems that real people use to do real work. You understand vertical markets, customer pain, and the discipline of translating messy human needs into something an engineering team can build. Now someone has put the word “agent” on your roadmap, and you are being asked to design a system that acts on its own. Not a workflow automation. Not a chatbot with a better prompt. A system that makes decisions, takes actions, and compounds those actions over time, sometimes faster and at larger scale than the humans around it.

Three ideas run through this book. One: AI is probabilistic, not deterministic, and that changes the PM toolkit at the level of first principles. Two: you are designing two products at once, the agent and the human system that supervises it. Three: the supervisor is not a fixed input. The supervisor population is being reshaped by the same deployment that depends on it. If you hold those three ideas in view through every chapter, the rest of the book is a toolkit. If you forget them, the frameworks will produce the right answers to the wrong problems.

Three case-study threads run through this book, used selectively to fit the argument. Software engineering, because that is the closest empirical evidence to every PM’s daily work. Healthcare and clinical AI, because the evidence base is sharpest, the regulation is most rigorous, and the stakes anchor what irreversibility actually means. Enterprise SAP and operational systems, because that is the vocabulary most PMs reading this book live inside. Each chapter leads with the thread that fits its argument best, but no chapter is single-thread, and no thread is decorative.

If you are strong on product but light on AI, Chapter 1 starts with the technical foundations you need, explained without jargon and grounded in concrete examples. If you already have AI fluency, skip to Chapter 2 and start with the role shift that reframes the rest.

Chapters 3 and 4 are the design half of the job: deciding when an agent is the right answer, and the four runtime artifacts every agent needs before it ships. Chapter 4 also expands substantially on adversarial security, because the threat model for agentic systems has matured fast and the runtime artifacts have to ship with that threat model designed in. Chapters 5 through 8 are the operating half: evals, production observation including the observability literacy a PM now needs, change management for the humans the agent now works with, and the maintenance problem nobody budgets for, the slow degradation of a deployed agent and of the instruments built to watch it. Chapters 9 and 10 widen the lens to frameworks, governance, and the people the agent affects who never touch the product. Chapter 11 compresses the whole book into a field manual of checklists.

If you are an experienced PM who already knows the frameworks, this book will not re-teach them. It will show you where they bend under agentic conditions, what new artifacts you need, and how to design the supervisory system that most teams are not building.

The voice throughout is that of someone who has been on both sides: a physician who has managed patients under general anesthesia while machines monitored their vital signs, and a product leader who has shipped enterprise software at SAP, Walmart, and a regulated health tech company. That clinical-product intersection is where most of the examples in this book come from, and it is the reason they are there.

A note on the examples. You will find clinical and healthcare references throughout this book. This is not because the book is about healthcare AI. It is because healthcare has been running the most rigorous AI experiment in the world for decades, under the highest stakes, the strictest regulation, and the least tolerance for confident wrong answers, and publishing the results, including the failures, so everyone else can learn from them. ECG interpretation algorithms have been running inside hospital machines since the early 1980s. Anesthesiology built human-machine supervisory architectures borrowed from aviation cockpit design over thirty years. The governance, equity, and accountability problems that enterprise AI is encountering for the first time have been studied, debated, and partially resolved in clinical practice for a generation. When I reach for a healthcare example, it is because that is where the evidence is most mature and the lessons are most transferable.

One more thing. This book will not hype what agents can do. It will help you specify what they should do, what happens when they are wrong, and who is accountable. Those are the questions the market is not asking clearly enough. They are the questions that will separate products that last from demos that do not survive their first production incident.

Every technology era gave ancient business processes a better vessel. R/2 automated integration. R/3 standardized global processes. ESA opened the API surface. HANA eliminated batch. None changed what the system understood about the work. Agentic AI is the first vessel that begins to reason over the logic it carries. That is the shift you are being asked to design for, whether anyone on your team names it that way or not.


What Changed in v3.0

This is the third major revision of the book. v2.3 (April 2026) was a working manuscript with the chapter structure and the core frameworks intact. v3.0 (May 2026) is the public release: enriched, format-converted to HTML for the web, and integrated with the broader body of articles published in the months since v2.3.

Three substantive additions are worth naming explicitly, because they shift how the rest of the book reads.

First, the Supervision Paradox is named explicitly in Chapter 1 and threads through Chapters 4, 7, 8, and 10. Bainbridge’s 1983 ironies-of-automation paper, the recent empirical anchors (Anthropic 2026 RCT on AI-assisted coding, Bastani PNAS 2025, Lightrun 2026, Budzyń ACCEPT 2025, the NEJM 2025 deskilling-mis-skilling-never-skilling taxonomy, EASA SIB 2025-09), and the Klarna case together establish that the human in the loop is being eroded by the deployment the loop is supposed to make safe. The design implication recurs in every chapter: the supervisor is not a fixed input, and the safety architecture cannot assume otherwise.

Second, adversarial security has been expanded substantially in Chapter 4 and added as a fifth degradation vector in Chapter 8. The 2025 STAC paper (over ninety percent attack success on GPT-4.1 across four hundred and eighty-three test scenarios), the 2025 audit of eight hundred and forty-seven production agentic deployments (ninety-five percent tool privilege escalation, ninety-four percent memory poisoning, the OpenClaw incident), and the OWASP and MAESTRO frameworks anchor the treatment. The PocketOS Cursor incident from April 2026 closes the chapter as a concrete blast-radius example. The book stays at PM-literacy altitude on security. Enough to ask the right questions, push back on hand-waving answers, and recognize when the team is missing something.

Third, observability literacy is now a section in Chapter 1 and three new sections in Chapter 6 (real-time versus retrospective observability, multi-agent observability, and data observability). The point is the same: the PM does not need to implement observability, but does need the vocabulary to ask engineering whether the right events are being emitted and the right metrics are being composed. The PocketOS case in Chapter 4 is referenced from Chapter 6’s real-time observability section because the timescale of the failure (nine seconds to delete production data and volume backups) is the empirical anchor for why retrospective dashboards are not enough for some classes of action.

Smaller changes throughout. Three running case-study threads (software engineering, healthcare, SAP enterprise) instead of two. The interactive cost calculator in Chapter 3 lets you stress-test the break-even assumptions yourself. Earned-versus-scheduled autonomy is named explicitly in Chapters 2 and 3 (the Utah Doctronic case anchors it). The Michelin Condition is introduced in Chapter 9 to handle the governance-economics question that did not have a clean home in v2.3. The DAX Copilot RCT case study is added to Chapter 5, because it is the cleanest published example of a well-designed eval suite passing against the wrong outcome metric.
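The break-even framing that the Chapter 3 calculator stress-tests can be sketched in a few lines. Everything below (the function name, the cost and savings figures) is an illustrative assumption for this preface, not the book’s calculator or its numbers:

```python
# Hypothetical sketch of the break-even arithmetic a cost calculator
# automates. All names and figures here are illustrative assumptions.

def break_even_months(build_cost: float,
                      monthly_run_cost: float,
                      monthly_savings: float) -> float:
    """Months until cumulative savings cover the build investment."""
    net_monthly = monthly_savings - monthly_run_cost
    if net_monthly <= 0:
        return float("inf")  # the agent never pays for itself
    return build_cost / net_monthly

# Stress-test the key assumption: halve the savings estimate.
base = break_even_months(build_cost=120_000,
                         monthly_run_cost=8_000,
                         monthly_savings=28_000)
pessimistic = break_even_months(120_000, 8_000, 14_000)
print(base, pessimistic)  # 6.0 vs. 20.0 months
```

The point of the exercise is the sensitivity, not the point estimate: halving the savings assumption more than triples the break-even horizon, which is exactly the kind of fragility the calculator is there to expose.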

The book’s structure is unchanged. Eleven numbered chapters plus this preface, plus two appendices. The Channel 1 / Channel 2 framing is still the spine. The four runtime artifacts and the six observation instruments are still the operational core. What has changed is that the supervision paradox is now named, the security treatment is now substantial, and the observability vocabulary has caught up to the rest of the toolkit. The book argues for the same discipline. It argues for it with a sharper picture of what is breaking around the discipline, and what the PM has to design against.


How to Read This Book

If you read this book in order, the chapters compound. Chapter 1 establishes the vocabulary and names the recurring ideas. Chapter 2 reframes the PM role around them. Chapter 3 decides whether to build. Chapter 4 designs the runtime. Chapter 5 evaluates before launch. Chapter 6 observes after. Chapter 7 manages the human transition. Chapter 8 names the slow failure mode that makes month eighteen the hardest test. Chapter 9 turns frameworks into boundaries. Chapter 10 adds obligations to the affected person. Chapter 11 is the field manual you go back to.

If you read this book by chapter pull, it works too. Each chapter is self-contained with cross-references where they matter. The concept boxes are the load-bearing definitions. The diagrams visualize what the prose argues for. The endnotes anchor the empirical claims to specific sources.

What this book asks of you is not new vocabulary or new frameworks. It is the willingness to design two products simultaneously, to maintain that design across the deployment’s operational life, and to take responsibility for the people the agent will affect who will never see the interface. That is more than most agentic AI books ask. It is the work the discipline actually requires.