Reference

Notes and Sources

A note on how this book carries its evidence. Where serious research exists it is cited, and the heavy load sits under Part I, where the perishable-asset claim has to be earned. Much of Parts II through IV is designed practice rather than evidenced finding, and the text says so where it is true. A few illustrative cases (the two-PM demo, the refund-triage agent and its quarter, the gate meeting, the instant-payouts brief, the postings and interviews of Part IV) are constructed to carry a pattern; the absence of a citation is the signal that a case is built rather than reported. First-person episodes told in the author’s own voice are his own; where a scenario is generalized to a practitioner or an industry (“a PM who,” “product managers in regulated health industries”), it is a recognizable pattern rather than a personal report. Figures are current to mid-2026 in a field that moves monthly.

Preface. No external study; the perishable-asset claim is evidenced in Part I.

Chapter 1. The taxi-driver and GPS findings: Maguire et al. (2000, 2006), Woollett and Maguire (2011), Dahmani and Bohbot (2020), Javadi et al. (2017). Aviation: EASA SIB 2013-05R1 and the BEA final report on Air France 447 (2012); the under-one-hour manual-flying estimate is industry analysis around the EASA bulletin. Endoscopy: Budzyn et al. (Lancet Gastroenterology and Hepatology, 2025), nineteen experienced endoscopists, adenoma detection rate in unassisted colonoscopy falling from 28.4 to 22.4 percent. Pathology: the roughly seven percent reversal under wrong AI advice is Rosbach et al. (2025); Bellahsen-Harrar et al. (PLOS ONE, 2025) documents the qualitative over-reliance pattern in less-experienced readers. The writing study: Kosmyna et al. (arXiv, 2025), including the crossover finding. The reasoning experiments: Shaw and Nave (SSRN working paper, 2026), preregistered, 1,372 participants; not yet peer-reviewed, and labeled so here.

Chapter 2. The three-quality framework and the supervisory fallacy: Chen, Pfeffer, and Longhurst (BMJ Digital Health and AI, 2026). The literacy gradient: Yang Xin et al. (npj Digital Medicine, 2026), twelve months, three waves, the 38 percent mediation and the Matthew effect. Certainty inflation is this book’s name for the overconfidence pattern documented in Zhou et al. (npj Digital Medicine, 2025, the ConfiDx study). The two-PM demo is a constructed scene.

Chapter 3. The deskilling, mis-skilling, and never-skilling taxonomy: Abdulnour, Gin, and Boscardin (NEJM, 2025). The measured deskilling is Budzyn et al.; the never-skilling claim is labeled in the text as a prediction, because the first cohort trained inside the tools is still in training. Fast-feedback training systems are described as pilots, not deployments. The clinical memories are the author’s.

Chapters 4 through 7. Designed practice, built on the author’s documented working methods: the multi-model stack and divergence runs, the nine-agent workshop, the writing loop and catalog, the voice profile, the cross-model coaching episode. No external studies are claimed for these chapters’ methods; that is the point of the proficiency regime that follows them.

Chapter 8. The evidence restated in the opening (Wharton, Lancet, MIT) is cited under Chapters 1 and 3. Aviation’s recurrent proficiency: EASA SIB 2013-05R1 (“continuous use of automated systems does not contribute to maintaining pilot manual flying skills”). The reviewer-with-seconds-per-denial: ProPublica’s Cigna PxDx reporting (2023), about 1.2 seconds per case. The schedule-versus-evidence case: the Utah Doctronic prescription-renewal pilot (launched January 2026; first 250 renewals per medication group physician-reviewed, then sampling). The structured-session reflection pattern (stated intent to adopt collapsing back in normal workflow) is described qualitatively, from the author’s own experience running such sessions, and carries no cited figure. The regime itself is designed practice, and the chapter says so.

Chapter 9. The two-brief method is the author’s; the instant-payouts ladder is a worked example. No external study.

Chapter 10. Pass-rate-not-pass, reliability compounding, judge calibration, and the coverage statement are established evaluation practice; the walkthrough numbers are invented for teaching and labeled so in the text. The right-pass-wrong-target case is drawn from public reporting on clinical documentation AI. The author’s hands-on eval coursework: Harvard Medical School executive education, submitted assignments.

Chapter 11. The five signals and the steady-state page are this book’s contribution; the refund agent’s quarter is a constructed continuation of Chapter 10’s example. Earned-not-scheduled autonomy is series vocabulary, anchored by the Utah pilot cited under Chapter 8.

Chapter 12. The senior-party assumption and the sorting rule are the author’s analysis. The unobserved-agent cautionary case is illustrative.

Chapter 13. The four moves are the author’s dissent playbook. Klarna’s public rebalance toward human customer service: CEO statements and trade coverage, 2025. The EU AI Act’s high-risk obligations: Regulation (EU) 2024/1689. Cigna as cited under Chapter 8. The Dana and Marcus dialogues are composites.

Chapter 14. The postings, the interview, and the choosing are constructed scenes. The compensation contrast (well-published medians for frontier-lab product managers against a thin-to-absent market for agent-supervision roles) is qualitative by design, drawn from public compensation data as a 2026 snapshot; the contrast, not any precise figure, is the claim.

Chapter 15. The author’s argument; the certificate-wall is an observed pattern of the moment.

The Practitioner’s Record (appendices). Author-built artifacts; their sources are the chapters that introduce them.
