Front Matter

The Series in One Page

This book stands on three earlier ones. You do not need to have read them, but you will meet their vocabulary, so here is the whole inheritance in one page. Each idea gets a fuller treatment in the earlier books; here it gets one honest sentence.

Every agentic product is two products: the agent that acts, and the supervisory layer through which humans direct, check, and correct it. The books call these the two channels, and the second one is the one teams forget to build.

An agent’s authority climbs an autonomy ladder, from suggesting, to drafting, to acting with approval, to acting with oversight, to acting alone. Autonomy is earned by demonstrated competence in the failure modes that matter, never granted on a schedule, and there is always a path back down.

Four runtime artifacts make an agent governable: the boundary that limits what it may do alone, the approval moment where a human decides, the audit surface that records what actually happened, and the recovery workflow that undoes what can be undone.

Before any agent is built, four suitability tests apply: the task repeats at volume, tool use is bounded, consequences are recoverable, and the outcome is measurable and trusted, meaning you would stake a decision on the metric.

An agent’s quality is not proven by tests but bounded by evals: a golden set of real cases with endorsed outcomes, run repeatedly because the same input can pass and fail, graded by a judge that must itself be calibrated, and summarized plainly in a coverage statement that says what was never tested. A model update counts as a deployment.
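One way to picture "run repeatedly because the same input can pass and fail" is a per-case pass rate rather than a single checkmark. The sketch below is illustrative only: the golden-set cases, the `make_flaky_agent` stand-in, and the exact-match comparison are all assumptions made for the example, and the calibrated judge and the coverage statement are left out.

```python
from itertools import cycle


def pass_rate(agent, case_input, endorsed_outcome, trials):
    """Fraction of trials in which the agent's output equals the endorsed outcome."""
    passes = sum(agent(case_input) == endorsed_outcome for _ in range(trials))
    return passes / trials


def make_flaky_agent(outputs):
    """Hypothetical stand-in for a nondeterministic agent.

    It ignores its input and replays the given outputs in a loop, so the
    same case can pass on one run and fail on the next.
    """
    replay = cycle(outputs)
    return lambda case_input: next(replay)


# Made-up golden set: (case, endorsed outcome) pairs.
golden_set = [
    ("refund request, order 1", "approve"),
    ("refund request, order 2", "deny"),
]

agent = make_flaky_agent(["approve", "approve", "deny", "approve"])

# One rate per case, each measured over four runs of the same input.
rates = {
    case: pass_rate(agent, case, endorsed, trials=4)
    for case, endorsed in golden_set
}

for case, rate in rates.items():
    print(f"{case}: passed {rate:.0%} of runs")
```

A single run of either case would have reported a clean pass or a clean fail; only the repeated runs show that the first case holds three times in four and the second only once in four.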

A running agent is watched through six instruments: task success rate, unintended action rate, override frequency, confidence calibration, rollback time, and incident recovery time. Instruments age; they need recalibration on roughly an eighteen-month clock.

Reliable automation erodes the skill of the human supervising it; the better the system, the weaker the oversight becomes. The books call this the supervision paradox. Deployed systems also decay quietly along many fronts at once, a condition the series names silent degradation.

The agentic team adds seats the old triad never had, among them the eval owner who keeps the checkmark honest, the context owner who knows what the agent can see, and the agent supervisor, sometimes called AgentOps, whose week is the running agent. The expensive failures happen at the seams between owners.

Two shorthand cases recur. Cigna: insurance reviewers spending around a second per denial, humans in the loop in title only. Utah: a prescribing system that removed its physicians after a fixed count of approvals, autonomy by schedule rather than evidence.

That is the inheritance. The rest of this book is about the one dependency none of it manages: you.
