Glossary
The terms below are the ones the book uses as working constructs, the things you should be able to name in a room. Each is defined where it first does its work in the text; this is the quick reference. Chapter pointers are in parentheses.
Affected person. The person who absorbs an agent’s output rather than operating it: the patient, applicant, candidate, or supplier who is never a user, never in your analytics, and often the person the product was actually for. Distinguished from the user and the supervisor. (Chapter 20)
Agent HR stack. The personnel functions a deployed agent silently requires: scope of work, access scope, a manager, re-evaluation on upgrade, and decommissioning. Together they treat deployment as the onboarding of an actor rather than the configuration of a tool. (Chapter 19)
Autonomy Ladder. The rungs of authority an agent can hold, from suggestion through bounded autonomous action. You climb on demonstrated competence at the current rung, never on a schedule. (Chapters 2, 3)
Background failure / the invisible action. A correct-looking output sitting on top of an action that never happened: the agent reports “done,” the eval scores the text as right, and nothing changed in the target system. Caught only by state validation, not semantic validation. (Chapters 9, 11)
Bridge operator vs bridge architect. The shift from running the translation layer between business and engineering to designing the system that does the translating; the old role is the one that drifted into project management. (Chapter 2)
Burn-rate alert. A control that watches spending velocity over a rolling window and pages a human when the rate goes abnormal, as distinct from a monthly budget alert, which is a receipt, not a brake. (Chapter 10)
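As a rough illustration of the burn-rate alert, the sketch below sums charges inside a rolling window and signals a page when the window total crosses a limit. The class name, parameters, and the one-hour and 50-unit figures are hypothetical, chosen only to show the mechanism; the book does not prescribe an implementation.

```python
from collections import deque


class BurnRateAlert:
    """Rolling-window spend monitor (illustrative; all names are hypothetical)."""

    def __init__(self, window_seconds: float, max_spend_in_window: float):
        self.window_seconds = window_seconds
        self.max_spend_in_window = max_spend_in_window
        self._events = deque()  # (timestamp, cost) pairs, oldest first
        self._total = 0.0

    def record(self, timestamp: float, cost: float) -> bool:
        """Record one charge; return True if a human should be paged."""
        self._events.append((timestamp, cost))
        self._total += cost
        # Drop charges that have aged out of the rolling window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_cost = self._events.popleft()
            self._total -= old_cost
        return self._total > self.max_spend_in_window


# Assumed limits: at most 50 units of spend in any one-hour window.
alert = BurnRateAlert(window_seconds=3600, max_spend_in_window=50.0)
first = alert.record(timestamp=0, cost=10.0)      # window total 10
second = alert.record(timestamp=600, cost=10.0)   # window total 20
spike = alert.record(timestamp=1200, cost=40.0)   # window total 60, over the limit
later = alert.record(timestamp=4900, cost=5.0)    # earlier charges have aged out
```

The check runs on every charge, which is what makes it a brake: a monthly budget alert would report the same spike only after the money was spent.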
Cognitive surrender. The erosion in which a person stops checking the agent’s output against their own judgment and accepts it by default, even when it is wrong. One of the three erosions. (Chapter 18)
Constitution (AI). The agent’s behavioral contract: the written principles governing what it may do, what it must never do, and how it decides under ambiguity. Runs from values at the top (fairness over speed) to non-negotiable constraints at the bottom (this action class always escalates). Distinct from a prompt, a policy, and a guardrail (the rule enforced in the execution path); the policy and guardrail are the constitution made specific and made to bite. Stated is not enforced: the constitution declares intent; evals, guardrails, and audit logs test whether the agent honors it. (Chapter 1; runtime enforcement in Chapter 20)
Constitutional runtime layer. Where the constitution bites: an enforced, non-delegable boundary that refuses an action on principle even when the technical path is open, with refusals logged as first-class events. Not a committee. (Chapter 20)
Currency question. The question the buyer is not yet asking a vendor: when was the training data last refreshed, on what corpus, curated by whom, with which retractions pulled, and are behavioral change notes published at each release. Absence of an answer is an answer. (Chapter 13)
Decision provenance / sealed decision artifact. The lineage of inputs, model version, configuration, reasoning, and human authorization that produced a specific decision, recorded as a write-once, signed bundle from which a stranger could reconstruct the decision years later. (Chapter 14)
Deskilling. The erosion of a skill an expert once had, through disuse after an agent took it over. One of the three erosions. (Chapter 18)
Disparate performance. Aggregate accuracy hiding population-level failure: a model that is accurate overall while failing a specific group almost entirely, invisible in the summary metric, catchable only by stratifying before launch. (Chapter 20)
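A minimal sketch of what stratifying means in the disparate performance entry, using made-up data: 95 cases from a hypothetical group "A" and 5 from a hypothetical group "B". The function name and the record layout are assumptions for illustration only.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, is_correct) pairs. Returns {group: accuracy}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


# Made-up evaluation results: group A is almost always right, group B never is.
records = [("A", True)] * 94 + [("A", False)] * 1 + [("B", False)] * 5

overall = sum(is_correct for _, is_correct in records) / len(records)
per_group = accuracy_by_group(records)
```

In this toy data the overall figure is 0.94 while group B scores 0.0; the summary metric alone gives no hint of that.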
Four-layer stack. The four things any agent is made of: the model, the orchestration, the tools, and the context. The layer you can describe least precisely is where your quality problems will come from. (Chapter 1)
Four runtime artifacts. The four designed surfaces at the moment of action: the autonomy boundary, the approval moment, the audit surface, and the recovery workflow. If one is missing, the behavior defaulted rather than being designed. (Chapter 7)
Four suitability tests. The four conditions that must all hold to build an agent: repeats at volume, bounded tool use, recoverable consequences, and measurable and trusted output. (Chapter 3)
Iceberg. The framework for the second sense of context: the value an agent reads is the tip above the water; the meaning beneath it — ontology, semantics, references, relationships, authority, currency — is the rest. When data moves between systems the tip travels and the meaning often stays behind. (Chapters 1, 12)
Instrument half-life. The roughly eighteen-month useful life of an observation instrument, pegged to the frontier-model release cadence, after which it measures an agent that no longer exists unless it is re-calibrated. (Chapter 13)
Loop Test. The three conditions that decide whether a human-in-the-loop is real: whether the reviewer has the time, the skill, and the attention to exercise judgment. Fail one and the oversight is nominal. (Chapter 17)
MVP house of cards. Disposable prototypes stacked into a system nobody designed, because the MVP’s “stop” decision was skipped; each card load-bearing, the structure failing quietly when one layer drifts. (Chapters 3, 5)
Never-skilling. The erosion in which a junior never acquires a skill at all, because the agent did the work they would have learned from. One of the three erosions. (Chapter 18)
Observation phase. The deleted back half of the product lifecycle: the resourced period after launch where you monitor real behavior, detect drift, and feed it back. For an agent, skipping it is not lost opportunity but lost control. (Chapter 11)
Platform emits, PM composes. The working contract for instrumentation: the platform owes you the raw events reliably and queryably; you own composing them into the instruments and setting the thresholds. (Chapter 11)
Program-level vs transaction-level oversight. Two different commitments inside “human-in-the-loop”: review of the system in aggregate versus a competent human looking at an individual output before it executes. Assuming the first discharges the second is the common, dangerous error. (Chapter 8)
Silent degradation. The composite condition in which a deployed agent keeps producing fluent, plausible output while its real performance decays across six drift vectors at once, none caught by the launch instruments. (Chapter 13)
Six instruments. The production metrics an agentic product requires: task success rate, unintended action rate, override frequency, confidence calibration, rollback time, incident recovery time. The absence of any one is a design finding. (Chapter 11)
Spec-driven development. A development pattern in which a written specification, not an ad hoc prompt, is the living source of truth that drives the plan, the tasks, and the implementation an AI coding agent produces. (Chapter 4, Appendix A)
Supervision paradox. The structural irony that the more reliable an agent becomes, the less able its human supervisor is to catch the failure when it comes, because sustained reliability erodes vigilance and skill. (Chapters 1, 17)
System-type declaration. Stating in the spec which of three products this is — a suggestion engine, a copilot, or an autonomous actor — because each has a different accountability model. (Chapter 7)
Three eval breaks. The three places the QA instinct fails on a non-deterministic system: determinism (a green check is one sample, not a proof), compounding (end-to-end reliability is the product of the steps), and the invisible action. (Chapter 9)
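The compounding break can be made concrete with a little arithmetic. The sketch below assumes each step succeeds or fails independently of the others, which is the condition under which end-to-end reliability is the plain product of the per-step rates; the function name and the 95 percent figure are illustrative, not taken from the book.

```python
import math


def end_to_end_reliability(step_success_rates):
    """Probability that every step succeeds, assuming independent steps."""
    return math.prod(step_success_rates)


# Illustrative figures: ten steps, each succeeding 95% of the time.
ten_steps = end_to_end_reliability([0.95] * 10)
print(f"{ten_steps:.3f}")
```

Under these assumptions a chain of ten steps, each of which looks reliable on its own, completes cleanly a little under 60 percent of the time.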
Two-channel agentic design. The recognition that an agentic product ships two products at once: the agent (Channel 1) and the supervisory system around it (Channel 2). The second is the one teams forget to design. (Chapters 2, 16)
Vibe coding. Building a working prototype by describing it to a model rather than writing the code; valuable as a learning instrument to validate a bet, dangerous when mistaken for a shippable product. (Chapter 5)