Back Matter

References and Sources

A note on sourcing. The evidence tiers used throughout the book are defined in the preface: peer-reviewed and primary regulatory material; documented industry incidents; litigation-stage allegations (presented as alleged); vendor and company claims (labeled as such); the author’s own cases; and a few illustrative composites (flagged on the page). The list below is the working bibliography by chapter; full formal citations are compiled at typesetting. Dates and figures in a fast-moving field are stamped to mid-2026 and should be re-verified before publication.

Front matter and Part I: Decide

Field snapshot. Time-reallocation evidence: synthesized across multiple 2026 industry productivity studies and presented as an informed estimate, not a single measured finding.

AI literacy (Chapter 1). The four-layer stack: author framework. BIDMC / o1-preview diagnostic comparison (model 67.1% exact-or-close at triage vs 55.3% and 50.0% for two internal-medicine attending physicians, 76 ED cases): Brodeur et al., “Performance of a large language model on the reasoning tasks of a physician,” Science, April 2026. Supervision paradox preview: Bainbridge (1983) and the modern deskilling literature, fully cited in Chapters 17–18. Constitution as the agent’s behavioral contract: the term originates in Anthropic’s Constitutional AI (Bai et al., arXiv:2212.08073, 2022); “Claude’s Constitution,” Anthropic, 2023.

Not a bridge (Chapter 2). Bridge operator vs architect, the Autonomy Ladder, two-channel design: author frameworks.

Not every problem (Chapter 3). Token-price decline (~10x/yr) and the architecture multiplier: industry pricing data, figures stated as a range and stamped to mid-2026. Four suitability tests and earned-vs-scheduled autonomy: author frameworks. Utah Doctronic (2026 prescription-renewal pilot under a regulatory sandbox; NEJM Perspective NEJMp2601148). Klarna (assistant reported as doing the work of ~700 agents, then rebalanced toward humans): company-reported figure, labeled as a company claim. MVP house of cards: author article, “The House of Cards: Why MVP Thinking Is Breaking Healthtech” (2026).

Part II: Prototype & Collaborate

How the work splits (Chapter 4). The two-channel operating model, the dual brief (Human / Executable), and the work-unit shift: author frameworks. GitHub’s open-source Spec Kit (github.com/github/spec-kit; “Spec-driven development with AI,” The GitHub Blog, 2026).

Vibe coding (Chapter 5). AI-generated-code vulnerability figures: Perry et al. (Stanford, 2023) and subsequent 2024–2025 code-security audits (Veracode 2025 GenAI code-security report; Cloud Security Alliance vibe-coding research note). Package hallucination / slopsquatting: Lanyado’s package-hallucination research and follow-on slopsquatting research, 2024–2025. Lovable audit (a featured app with inverted authentication exposing ~18,697 records): security-researcher disclosure and trade coverage (Vibe Graveyard, 2025–2026).

Collaborator (Chapter 6). ChatGPT Health under-triage of gold-standard emergencies (~52% routed to 24–48h care rather than the emergency department): Mount Sinai (Icahn School of Medicine), “ChatGPT Health performance in a structured test of triage recommendations,” Nature Medicine 2026 (s41591-026-04297-7).

Part III: Design

Design the behavior (Chapter 7). System-type declaration, four runtime artifacts, consequence classification: author chapter. The placement rule for the autonomy boundary draws on Eric Horvitz, “Principles of Mixed-Initiative User Interfaces” (CHI 1999). Current security frameworks: OWASP Top 10 for Agentic Applications (late 2025), MAESTRO (Cloud Security Alliance, 2025), and CISA / Five Eyes “Careful Adoption of Agentic AI Services” (April 2026). PocketOS nine-second deletion: primary first-hand account by Jer Crane (@lifeof_jer), X, 25 April 2026 (https://x.com/lifeof_jer/status/2048103471019434248). This is a separate incident from the 2025 Replit production-database deletion (Chapter 10).

Two kinds of HITL (Chapter 8). Regulatory tiers: EU AI Act Article 14; California SB 1120 (signed 2024, effective January 2025); the CMS prior-authorization pilot; credit adverse-action reason requirements; the Colorado AI Act (SB 24-205). Cigna PXDX claim-review allegations (physicians signing >300,000 denials over ~2 months at ~1.2 s each; ~90% of appealed denials allegedly overturned): ProPublica (2023) and Kisting-Leung v. Cigna class action; litigation-stage, presented as alleged.

Evals (Chapter 9). The three eval breaks, pass@k, judge-bias, state-vs-semantic validation: author chapter. DAX Copilot RCT (high adoption and improved physician task-load ratings, but no statistically significant change in documentation time vs. control: DAX time-in-note -1.7%, p=0.66): “Ambient AI Scribes in Clinical Practice: A Randomized Trial,” NEJM AI 2025.

Part IV: Operate

Operational guardrails (Chapter 10). Platform cost-control reality: synthesized from platform documentation and 2026 agent-ops practice. The Replit production-database deletion under an unenforced code freeze (July 2025; Replit CEO public apology; AI Incident Database entry; documented by Jason Lemkin): public record. Distinct from the PocketOS incident (Chapter 7).

Observation (Chapter 11). The observation phase, the six instruments, platform-emits-PM-composes: author chapter. The procurement agent (340 transactions over six months, every one confirmed, none delivered): illustrative composite.

Silent degradation (Chapter 13). Epic Sepsis Model v1 external-validation critique: Wong et al., JAMA Internal Medicine 2021. GPT-4 behavioral drift (prime-number identification task, accuracy ~84% to ~51% over three months): Chen, Zaharia & Zou (arXiv 2023; Harvard Data Science Review 2024).

Audit trails (Chapter 14). Decision provenance, the sealed decision artifact: synthesized from EU AI Act Articles 12/19/26, GDPR Article 22, sector retention rules, and the 2026 LLM audit-trail academic framework. Estate of Lokken et al. v. UnitedHealth Group (D. Minn., filed Nov 2023; nH Predict algorithm in Medicare Advantage post-acute denials): plaintiff allegations, not adjudicated facts.

Part V: The Human System

Change management (Chapter 16). Actor-to-supervisor transition: author frameworks. United 173 (1978) as a catalyst for Crew Resource Management (developed by United/NASA, 1979–1981): aviation human-factors record.

Why HITL fails (Chapter 17). The Loop Test: author framework. Supervision paradox: Bainbridge, “Ironies of Automation,” Automatica (1983), and the modern literature. ACCEPT / Budzyń colonoscopy result: peer-reviewed; fully cited in Chapter 18.

Skill erosion (Chapter 18). Deskilling: Budzyń et al. / ACCEPT trial, Lancet Gastroenterology & Hepatology 2025 (unassisted adenoma detection 28.4% → 22.4% after AI exposure). Never-skilling: Bastani et al., PNAS 2025 (high-school math; GPT-4 access improved practice but degraded subsequent unaided performance). Cognitive surrender (79.8% acceptance of incorrect AI answers in the relevant condition; 1,372 participants): Shaw & Nave, “Thinking, Fast, Slow, and Artificial,” Wharton, 2026.

The agent as team member (Chapter 19). The agent HR stack and its constituent constructs: author frameworks.

Part VI: Carry the Weight

The people the agent never sees (Chapter 20). User-vs-affected-person, disparate performance, the constitutional runtime layer: author chapter. The supervision-paradox-meets-regulation argument: cross-referenced to Chapters 17–18; BMJ Digital Health 2026 and the deskilling literature.

Agent behavior governance (Chapter 21). The four supervisory dimensions, the governance stack, the two-product thesis at scale: author chapter. Waymo two-year San Francisco safety data (2.1M autonomous miles; AI-fault rate 0.51 per million miles vs a human-fault rate of 7.0 per million miles; ~690k miles estimated as the minimum exposure for a statistically significant comparison): Waymo Research, “Waymo’s collision and other safety reportable incident data compared to human benchmarks for the purpose of safety assessment,” 2024. Figure labeled as company-reported.

A day in the life of the Agentic PM (Chapter 22). Illustrative composite synthesized from the book’s frameworks. Not a single case; representative of the supervisory discipline the book argues for.

Glossary