Front matter

Preface

For most of the past twenty years, the binding constraint on enterprise product management was execution. The question a PM answered every week was some version of “can we build it?” Scope, cost, dependencies, engineering capacity. The craft was translating messy human needs into something a team could ship inside a quarter.

That constraint is gone.

AI collapsed the build side of the equation faster than most of us noticed. Prototypes that used to take six weeks take a weekend. Integrations that used to require a dedicated squad run on a foundation model and a handful of tools. For a growing share of what enterprise teams do, “can we build it?” has stopped being the interesting question at all. The interesting question is now “should we, and if so, what should it actually do, and what happens when it is wrong?”

That shift is the reason this book exists.

In healthcare, the field that has been running the longest and most rigorous AI experiment, most AI pilots never reach production at scale (the cost chapter gives the figures, and the companion volume carries the sources). The models usually work. The projects fail on something else: judgment about which problem to attack, how to specify what right looks like, what happens when the system is wrong, and who is accountable when it is. That is PM work. It is not secondary to engineering capacity. It is the binding constraint that replaced it.

This book is for product managers who are good at their jobs and honest about what they do not know.

You have shipped enterprise software. You have run workshops, argued over priorities, and delivered systems that real people use to do real work. You understand vertical markets, customer pain, and the discipline of translating messy human needs into something an engineering team can build. Now someone has put the word “agent” on your roadmap, and you are being asked to design a system that acts on its own. Not a workflow automation. Not a chatbot with a better prompt. A system that makes decisions, takes actions, and compounds those actions over time, sometimes faster and at larger scale than the humans around it.

Three ideas run through this book. One: AI is probabilistic, not deterministic, and that changes the PM toolkit at the level of first principles. Two: you are designing two products at once, the agent and the human system that supervises it. Three: the supervisor is not a fixed input. The supervisor population is being reshaped by the same deployment that depends on it. If you hold those three ideas in view through every chapter, the rest of the book is a toolkit. If you forget them, the frameworks will produce the right answers to the wrong problems.

Three case-study threads run through the book, used selectively to fit the argument. Software engineering, because that is the closest empirical evidence to every PM’s daily work. Healthcare and clinical AI, because the evidence base is sharpest, the regulation is most rigorous, and the stakes anchor what irreversibility actually means. Enterprise SAP and operational systems, because that is the vocabulary most PMs reading this book live inside. Each chapter leads with the thread that fits its argument best. No chapter is single-thread. No thread is decorative.

If you are strong on product but light on AI, Chapter 1 starts with the technical foundations you need, explained without jargon and grounded in concrete examples. If you already have AI fluency, skip to Chapter 2 and start with the role shift that reframes the rest.

Chapters 3 through 5 are the design half of the job: deciding when an agent is the right answer, writing the two briefs that turn that decision into a buildable spec, and designing the four runtime artifacts every agent needs before it ships. Chapter 5 also expands substantially on adversarial security, because the threat model for agentic systems has matured fast and the runtime artifacts have to ship with it. Chapters 6 through 9 are the operating half: evals, production observation including the observability literacy a PM now needs, change management for the humans the agent now works with, and the maintenance problem nobody budgets for, the slow degradation of a deployed agent and of the instruments built to watch it. Chapters 10 and 11 widen the lens to frameworks, governance, and the people the agent affects who never touch the product. Chapter 12 names the institutional gap underneath all of it: the personnel infrastructure that does not yet exist for AI agents. Chapter 13 compresses the whole book into a field manual of checklists.

If you are an experienced PM who already knows the frameworks, this book will not re-teach them. It will show you where they bend under agentic conditions, what new artifacts you need, and how to design the supervisory system that most teams are not building.

The voice throughout is that of someone who has been on both sides: a physician who has managed patients under general anesthesia while machines monitored their vital signs, and a product leader who has shipped enterprise software at SAP, Walmart, and a regulated health tech company. Most of the examples in this book come from that clinical-product intersection, for the reasons given in the note that follows.

A note on the examples. You will find clinical and healthcare references in this book, used sparingly and on purpose. This is not because the book is about healthcare AI. It is because healthcare has been running the most rigorous AI experiment in the world for decades, under the highest stakes, the strictest regulation, and the least tolerance for confident wrong answers, and publishing the results, including the failures, so everyone else can learn from them. ECG interpretation algorithms have been running inside hospital machines since the early 1980s. Over thirty years, anesthesiology built human-machine supervisory architectures borrowed from aviation cockpit design. The governance, equity, and accountability problems that enterprise AI is encountering for the first time have been studied, debated, and partially resolved in clinical practice for a generation. When I reach for a healthcare example, it is because that is where the evidence is most mature and the lessons are most transferable.

One more thing. This book will not hype what agents can do. It will help you specify what they should do, what happens when they are wrong, and who is accountable. Those are the questions the market is not asking clearly enough. They are the questions that will separate products that last from demos that do not survive their first production incident.

Every technology era gave ancient business processes a better vessel. R/2 automated integration. R/3 standardized global processes. ESA opened the API surface. HANA eliminated batch. None changed what the system understood about the work. Agentic AI is the first vessel that begins to reason over the logic it carries. That is the shift you are being asked to design for, whether anyone on your team names it that way or not.


How to Read This Book

Five ideas recur through the whole book. Hold them in your pocket as you read; every chapter is, in some way, about one or more of them.

The five ideas to keep in your pocket

(they run through the whole book)

  1. Non-determinism is permanent, not a bug. The same input can give different outputs. You cannot test an agent the way you test ordinary software.
  2. Confident wrong answers happen at a measurable rate. The agent will not flag its own uncertainty; you design that surface.
  3. The tool boundary is the agent’s authority. What it can touch is what it can do, and what it can do wrong.
  4. The hard part is not the agent’s intelligence. It is the supervisory system around it, the product this book teaches you to build.
  5. The supervisor’s competence erodes. The agent that does the work degrades the human who is supposed to watch it.

If you read this book in order, the chapters compound. Chapter 1 establishes the vocabulary. Chapter 2 reframes the PM role around it. Chapter 3 decides whether to build. Chapter 4 turns that decision into the two briefs. Chapter 5 designs the runtime. Chapter 6 evaluates before launch. Chapter 7 observes after. Chapter 8 manages the human transition. Chapter 9 names the slow failure mode that makes month eighteen the hardest test. Chapter 10 turns frameworks into boundaries. Chapter 11 adds obligations to the affected person. Chapter 12 names the personnel infrastructure that has to be built for any of this to work at the institutional level. Chapter 13 is the field manual you go back to.

Underneath the chapters is a simpler shape, six phases the work moves through, and it is worth holding in mind as you read.

  • Phase 0, AI Literacy: the team becomes fluent enough to specify an agent and catch it when it is wrong (Chapter 1).
  • Phase 1, Discover and Decide: you decide whether to build at all, and produce the go or no-go (Chapter 3).
  • Phase 2, Design: you write the two briefs and the four runtime artifacts (Chapters 4 and 5).
  • Phase 3, Eval: you prove readiness with evals run many times, not once (Chapter 6).
  • Phase 4, Observe: you measure what the agent is actually doing in production (Chapter 7).
  • Phase 5, Operate: you keep it honest as it drifts, and govern, supervise, and eventually retire it (Chapters 8 through 13).

The field manual in Chapter 13 lays these out as a one-page table of phase, deliverable, and the question each phase exists to answer. If you ever lose the thread, find the phase whose question you cannot yet answer.

If you read this book out of order, pulling whichever chapter you need, it works too. Each chapter is self-contained with cross-references where they matter. Each chapter opens with the same five-ideas box, with the one or two it leans on marked, and carries a single concept box for its keystone idea. Those are the load-bearing definitions. The checklists, worksheets, and red flags are the apparatus you take into your own work. Where you want the deeper evidence and the academic sources behind a claim, the companion volume is where that lives.

What this book asks of you is not new vocabulary or new frameworks. It is the willingness to design two products simultaneously, to maintain that design across the deployment’s operational life, and to take responsibility for the people the agent will affect who will never see the interface. That is more than most agentic AI books ask. It is the work the discipline actually requires.