Appendix B · Reference

Glossary

Single-line definitions of the technical and conceptual terms introduced in this book. Sorted alphabetically. Category letters in parentheses refer to the platform taxonomy in Appendix A.

Accuracy
Proportion of classifier outputs that match ground truth; a single-step validator metric that breaks down for multi-step agent trajectories.
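In standard confusion-matrix notation (TP, TN, FP, FN for true and false positives and negatives), the formula behind this definition:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]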
Actor-to-supervisor transition
The shift from performing a task to supervising an agent that performs it; the central change-management problem for agentic adoption.
Adaptive governance
Governance that adjusts rule intensity to context (acuity, consequence class, time), with structured override rights and audit.
Adversarial testing
Deliberate attempts to make an agent misbehave via prompt injection, tool abuse, credential leakage, or permission escalation.
Affected person
The individual whose life is changed by an agent’s decision, distinct from the user who operates the agent.
Agent
An AI system that uses tools and takes actions on behalf of a user or another system, typically across multiple reasoning steps.
Agent candidacy checklist
A planning artifact listing the criteria a problem must meet before an agent is built for it.
Agent identity
An IAM-bound principal under which the agent acts; a base platform dependency relabeled as agent-specific by most vendors.
Agentic AI
AI that plans and executes multi-step actions on behalf of a user, as distinct from AI that responds to a single prompt.
Algorithm aversion
The tendency of users to abandon an AI tool after a single visible failure even when its track record is otherwise strong.
Approval moment
The typed handoff point where an agent pauses, presents a decision package, and waits for human judgment before continuing.
AUC (area under the curve)
A classifier metric measuring discrimination between two classes across thresholds; a single-step validator metric.
Audit surface
The reconstructable decision trajectory of an agent, composed from observable action traces rather than the model’s self-narration.
Automation bias
The tendency to accept an automated recommendation even when contradicted by the evidence in front of you.
Automation complacency
The erosion of operator vigilance as a system’s observed reliability increases over time.
Automation expectation
The hand-off orientation users bring to a new AI tool, carried over from every other AI tool they have used.
Autonomy boundary
The typed runtime declaration of what an agent may do unilaterally versus what requires a human decision.
Autonomy ladder
A graduated scale of agent autonomy from suggestion through copilot to autonomous actor.
Background failure
An agent output that passes semantic evaluation but is not matched by any state change in the target system; the agent said “done” but nothing happened.
Bainbridge irony
The 1983 observation that the more reliable the automated system, the more thoroughly atrophied the operator’s monitoring skill becomes; the foundational citation for the supervision paradox.
Base platform dependency (Category E)
A non-agent-specific platform capability (IAM, audit logging, tenant isolation) that is required for agentic systems to function safely.
Behavioral monitoring
Observation focused on whether the agent’s judgment is correct, distinct from infrastructure monitoring of whether the system is running.
Blast radius
The scope of consequences an agent action can produce in the same execution; the PocketOS case (production data plus volume backups deleted in nine seconds) is the canonical example.
Bounded objective
An agent task with a clearly delimited goal, not an open-ended one.
Break-even volume
The task volume at which the marginal cost of running an agent falls below the marginal cost of the human process it replaces.
Brownfield deployment
An agentic deployment on top of an existing AI-capable platform license (SAP, Salesforce, ServiceNow, Microsoft); incremental floor cost.
Channel 1
The agent itself: autonomy boundary, logging, error handling, recovery workflow.
Channel 2
The human experience of supervising the agent: queue, intervention surface, workflow fit, supervisory interface.
Classifier metric
A metric defined against a binary or multi-class ground truth (accuracy, sensitivity, specificity, AUC, precision, recall, F1); valid at single-step validators.
Compound probability
The product of per-step success probabilities across a chained agent workflow; 0.95 across ten steps yields roughly 0.60 end to end.
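The arithmetic behind the example, worked out:
\[ 0.95^{10} \approx 0.599 \]
Each added step multiplies in another per-step factor, so reliability compounds against the workflow.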
Confidence calibration
The degree to which an agent’s stated confidence tracks its observed correctness; well-calibrated agents are uncertain on things they get wrong.
Constitutional AI
Anthropic’s 2022 framework for training models to follow declared principles; the architectural reference for the Constitutional Runtime Layer.
Constitutional runtime layer
Platform-level rules that execute in the agent’s request path and return a refusal or modified action, independent of design-time governance.
Context sufficiency
Whether the agent receives the relationships, calculated semantics, and governance rules it needs to act, not only the raw fields.
Copilot
A declared agent system type: acts only on explicit step-by-step instructions; the human initiates each step.
Coverage statement
A PM artifact listing which user intents, failure modes, and adversarial inputs the eval suite tested, and which were known but deliberately deferred.
Currency question
The set of contract-time vendor questions that ensure the platform’s AI capabilities remain current through the contract period.
Data lineage
The recorded chain of provenance from agent output back through retrieved sources, training data, and source-system origin; required for upstream-data-wrong detection.
Data observability
Detection of when the data the agent reasons from is not what the agent thinks it is: freshness, completeness, referential integrity, context availability, knowledge graph mapping accuracy.
Decision package
The information required for a human to approve an agent’s proposed action: what the agent knows, what is uncertain, what the consequences are, what the alternatives are.
Decision-trace capture
The capture of an agent’s reasoning steps and tool calls as a reconstructable sequence; distinct from the agent’s self-narration.
Deference allocation
The case-by-case design problem of when the supervisor should rely on the agent and when they should pause and verify; harder than blanket trust or blanket review.
Deployment event
Any change to the agent’s model, prompt template, or tool configuration; treated as a deployment requiring re-baselining. Foundation model updates from the provider are deployment events.
Derived KPI (Category B)
A metric computed on top of events the platform’s observability layer captures, not shipped as a named platform feature.
Deskilling
The loss of an existing capability through sustained AI substitution; the Budzyń ACCEPT colonoscopy result (twenty-eight to twenty-two percent adenoma detection in three months) is the empirical anchor.
Drift
Divergence of production behavior from the reference baseline; includes data drift, concept drift, prediction drift, and behavioral drift.
Earned autonomy
Movement up the autonomy ladder triggered by demonstrated competence in the specific failure modes that matter, not by a count or a calendar date.
End-to-end task success
Whether the agent completed the task the user intended, measured across the full trajectory rather than per step.
Eval / evaluation
A structured test of agent behavior under controlled conditions; a source of partial observability into system behavior, not proof of correctness.
Eval sign-off checklist
A planning artifact enumerating the eval outcomes required before a release decision.
External audit trigger
A pre-declared condition that invokes external audit of an agent’s behavior, regardless of internal governance status.
F1 score
The harmonic mean of precision and recall; a single-step classifier metric.
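As a formula, with precision \(P\) and recall \(R\):
\[ F_1 = \frac{2PR}{P + R} \]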
Fermentation culture
The 2025 to 2026 enterprise pattern of using agent count as a competitive benchmark while the security and supervisory containers remain underbuilt.
First-contact
The point where an agent handles the beginning of a problem story; a frequent mismatch for models trained on end-of-story documentation.
Floor cost
The minimum cost per agent task including model inference, tool invocation, human review, and coordination overhead.
Greenfield deployment
An agentic deployment without an existing AI-capable platform license; full infrastructure and observability build cost.
Hallucination
Plausible-sounding agent output not grounded in reality; categories include factual errors, outdated references, spurious correlations, fabricated sources, incomplete reasoning, and upstream-data-wrong.
Horvitz principle
The observation that interruption has a cost; routing approval requests requires a budget, not unlimited attention.
Iceberg
The phenomenon where data transferred from a source system loses its relationships, calculated semantics, and governance rules in transit; the fields arrive, the meaning stays behind.
Incident recovery time
The organizational time from detecting an agent incident to freezing, attributing, notifying, reauthorizing, and resuming.
Instrument half-life
The roughly eighteen-month useful life of an agentic observation instrument before a frontier-model generation change forces re-calibration.
Interruption budget
A cap on the number and priority of approval requests a supervisor receives per unit time, with routing, deferral, and batching against the cap.
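A minimal sketch of the mechanism in Python, with hypothetical names; a production router would also weight priority and consequence class:

    from collections import deque

    class InterruptionBudget:
        # Cap how many approval requests interrupt the supervisor per window;
        # everything over the cap is deferred and batched for later review.
        def __init__(self, cap: int):
            self.remaining = cap
            self.deferred = deque()

        def route(self, request, urgent: bool) -> str:
            if urgent and self.remaining > 0:
                self.remaining -= 1
                return "interrupt"         # surface to the supervisor now
            self.deferred.append(request)  # hold for batched review
            return "deferred"

        def next_window(self, cap: int) -> list:
            # Reset the cap and hand back the deferred batch for review.
            self.remaining = cap
            batch, self.deferred = list(self.deferred), deque()
            return batch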
Kill-switch
An architectural intervention that stops an agent action upstream of execution; distinct from emergency shutdown of the service.
Large language model (LLM)
A neural network trained on text to predict tokens; the model at the base of most contemporary agentic systems.
LLM-as-a-judge
The use of a language model to score another model’s output against a rubric; introduces its own error rate that must be calibrated against human labels.
LLM-as-judge bias
Documented systematic errors in LLM judges: longer-answer preference, position effect, same-model-family preference; present by default unless calibrated.
MAESTRO
Multi-Agent Environment, Security, Threat, Risk, and Outcome framework; the first agentic-specific threat modeling framework, Cloud Security Alliance, February 2025.
MedLog
The Harvard Medical School proposal for a clinical AI logging standard; nine fields covering model, user, target, inputs, artifacts, outputs, outcomes, feedback. The pattern generalizes.
Memory poisoning
An adversarial pattern that injects malicious content into an agent’s persistent memory store; present in ninety-four percent of audited production deployments (2025).
Mental model declaration
The explicit declaration of whether an agent is a suggestion tool, a copilot, or an autonomous actor; prevents user miscalibration.
Michelin Condition
The structural alignment between guide accuracy and business model: a guide is trustworthy when accuracy is the mechanism that drives the behavior that drives the revenue. Fails when revenue depends on the audience trusting the guide rather than acting on it.
Mis-skilling
The development of a capability calibrated to a flawed reference; the middle category in the NEJM 2025 deskilling-mis-skilling-never-skilling taxonomy.
MVP House of Cards
The pattern where MVP delivery culture defers governance layer after layer until the backlog exceeds the roadmap’s capacity to close it.
Never-skilling
The failure to acquire a foundational capability because AI was present during the entire formative window; the most consequential category in the NEJM 2025 taxonomy.
Non-delegable list
The set of decisions a PM declares must not be performed by the agent, regardless of autonomy level.
OpenTelemetry
The open-source observability standard for distributed tracing; the substrate most AI platforms emit agent traces against.
Orchestration layer
The software above the model that manages prompts, tool calls, memory, and multi-step workflow.
Override frequency
The rate at which humans reject or modify agent-proposed actions; falling over time can indicate supervisory erosion, not improved agent quality.
OWASP Top 10 for Agentic Applications
The OWASP Gen AI Security Project’s December 2025 vulnerability list specifically for agentic systems; policy framework, not platform primitive.
Pass@K
The practice of running an eval K times and reporting the success distribution; the correct quality gate for non-deterministic systems.
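A minimal sketch of the practice, assuming a hypothetical run_eval callable that executes one test case against the agent and returns pass or fail:

    def pass_at_k(case, run_eval, k: int = 10) -> float:
        # The system is non-deterministic, so one run is a sample, not a
        # verdict: run the same case k times and report the success rate.
        results = [run_eval(case) for _ in range(k)]
        return sum(results) / k

Across a reference dataset this yields a distribution of success rates, which is the quantity a release gate should inspect.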
PII detection
Identification of personally identifiable information in agent input or output; a single-step validator where classifier metrics apply.
Planning artifact (Category C)
A PM-team output tracked in a work-management system (suitability declaration, go or no-go memo, coverage statement); not a platform feature.
Platform emits, PM composes
The working contract for the six observation instruments: the platform provides events; the PM’s team composes the metrics.
Platform primitive (Category A)
A first-class platform feature the PM or builder consumes directly.
Policy requirement (Category D)
An organizational standard the platform must support but does not itself define (GDPR compliance, retention policies, kill-switch obligations).
Precision
The proportion of agent outputs classified as positive that are truly positive; a single-step classifier metric.
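In confusion-matrix notation, with TP true positives and FP false positives:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]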
Project Glasswing
Anthropic’s 2026 controlled-deployment consortium for the Claude Mythos Preview model, restricted to defensive security work; example of frontier model capability outpacing public release.
Prompt injection
An adversarial input that attempts to override the agent’s instructions through crafted text in the user prompt, tool output, or retrieved content.
RAG poisoning
An adversarial pattern that corrupts the retrieval corpus an agent reads from; ninety-percent attack success at five malicious texts in a base of millions (PoisonedRAG, USENIX Security 2025).
Real-time observability
Observation that fires upstream of irreversible action; required for actions whose consequence timescale is shorter than alert-and-respond cycles.
Reasoning trace
The model’s self-narration of its chain of thought; distinct from action trace and treated as commentary rather than evidence.
Recall
The proportion of truly positive cases the agent correctly classifies as positive; a single-step classifier metric. Equivalent to sensitivity.
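In confusion-matrix notation, with TP true positives and FN false negatives:
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]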
Recovery workflow
The typed declaration of how an agent recovers from an error: compensating action, rollback, or explicit non-recovery.
Reference dataset
The immutable, versioned collection of test cases against which an agent’s evals are run; carries source-document lineage.
Refusal detection
A classifier that identifies when an agent has correctly declined to answer; a single-step validator where classifier metrics apply.
Retirement workflow
The lifecycle primitive that decommissions an agent while preserving its audit trail and blocking new invocations.
Retrieval-augmented generation (RAG)
A pattern where the system retrieves relevant text at query time and provides it to the model as context; base platform dependency, not a sufficiency proof.
Rollback time
The time from detecting an incorrect agent action to restoring the last known good state; measured as a distribution.
Router
A single agent step that selects which tool to invoke; a common single-step validator where classifier metrics apply.
Scheduled autonomy
Movement up the autonomy ladder triggered by a count or a calendar date rather than demonstrated competence; the pattern in the Utah Doctronic case.
Semantic validation
Verification that the agent’s output matches what the rubric expects; complementary to state validation.
Sensitivity
Same as recall; the proportion of truly positive cases correctly identified. A single-step classifier metric.
Sequential Tool Attack Chaining (STAC)
An adversarial pattern that chains legitimate-looking tool calls toward an unauthorized outcome; over ninety-percent attack success on GPT-4.1 across four hundred and eighty-three test scenarios (AWS / UC Berkeley, 2025).
Shadow workflow
A parallel manual process maintained by users alongside the agent, indicating distrust; the canonical sign of a failing Channel 2.
Silent degradation
Gradual erosion of agent performance without a triggering incident; the failure mode that continuous monitoring and periodic checkpoints catch differently.
Skill decay
The erosion of cognitive and strategic skills in people when agents take over tasks; invisible to the performer, visible in downstream handoffs.
Specificity
The proportion of truly negative cases correctly identified; a single-step classifier metric.
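In confusion-matrix notation, with TN true negatives and FP false positives:
\[ \mathrm{Specificity} = \frac{TN}{TN + FP} \]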
State validation
Verification that an agent’s claimed action produced a corresponding state change in the target system; complementary to semantic validation.
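A minimal sketch in Python pairing the two checks; rubric_matches and read_state are hypothetical stand-ins for a semantic scorer and a query against the target system:

    def validate_action(claimed_output, expected_state,
                        rubric_matches, read_state) -> str:
        semantic_ok = rubric_matches(claimed_output)  # semantic validation
        state_ok = read_state() == expected_state     # state validation
        if semantic_ok and not state_ok:
            # The agent said "done" but nothing happened: a background failure.
            return "background failure"
        return "ok" if semantic_ok and state_ok else "failed"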
Suggestion engine
A declared agent system type: surfaces options and waits; the human always decides and acts.
Suitability assessment
A pre-build gate determining whether a problem is appropriate for an agent: bounded objective, tolerable error, clear success signal, delegatable authority, recoverable consequences, measurable outcome.
Supervision paradox
The structural condition in which the same deployment that requires human supervision erodes the supervisor’s ability to perform it; framework anchored on Bainbridge 1983 and the recent empirical literature.
Supervisory system
The second product shipped with every agent: the tools, metrics, and interventions the human uses to supervise agent behavior.
System type
The declared category of an agent (suggestion engine, copilot, autonomous actor) that determines UI affordances and user expectations.
Task success rate
The proportion of agent tasks that achieved the user’s intended outcome; distinct from task completion rate.
Tolerable error
The explicit declaration of which categories and magnitudes of agent error are acceptable before a problem is suitable for agent automation.
Tool boundary
The enumerated, logged set of systems an agent can access; the most concrete expression of the agent’s authority.
Tool call accuracy
Whether the agent invoked the correct tool with correct parameters; a single-step validator where classifier metrics apply.
Tool privilege escalation
An adversarial pattern that uses an agent’s authorized tool calls to gain unauthorized access; present in ninety-five percent of audited production deployments (2025).
Trajectory
The full sequence of reasoning steps, tool calls, and observations an agent produces for one task.
Trajectory evaluation
Eval of the full multi-step agent path end to end, scored by match against a reference trajectory (exact, in-order, unordered, or superset) or by a judge against criteria.
Trust boundary
The demarcation between inputs and tools an agent may act on autonomously versus those requiring explicit authorization.
Two-channel agentic design
The design discipline of treating Channel 1 (the agent) and Channel 2 (the supervisory system) as two products shipped together, not one product with a training plan.
Unintended action rate
The frequency at which an agent takes actions outside its declared autonomy boundary.
Upstream-data-wrong
A hallucination category where the model faithfully produces output from input that is complete but operationally wrong; undetectable at the model layer.
When-wrong spec
A PM artifact specifying how the system must behave when it is wrong, not only when it is right; operationalized into the four-question pre-launch review with named owners.