References

Consolidated bibliographic references for the empirical claims in the book, organized by the chapter where each is primarily anchored. Per-chapter endnotes carry the in-text superscripts; this page is the complete reference list. Articles by the author at data-decisions-and-clinics.com are cited inline in the text and gathered, without full bibliographic detail, under “Author’s Articles Referenced” at the end of this list.


The Supervision Paradox (Chapters 1, 7, 10)

Bainbridge, L. “Ironies of Automation.” Automatica 19(6):775–779 (1983).

Anthropic. “How AI Assistance Impacts the Formation of Coding Skills.” Anthropic Research, 2026. anthropic.com/research/AI-assistance-coding-skills

Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, O., Mariman, R. “Generative AI Can Harm Learning.” PNAS 122(44), 2025. (Originally circulated as SSRN 4895486.)

Lightrun. “State of AI-Powered Engineering Report.” 2026.

Pragmatic Engineer. “The Impact of AI on Software Engineers in 2026.” Pragmatic Engineer Newsletter.

Budzyń, K. et al. “Endoscopist Deskilling Risk after Exposure to Artificial Intelligence in Colonoscopy: A Multicentre, Observational Study.” The Lancet Gastroenterology and Hepatology 10(10):896–903 (2025). doi:10.1016/S2468-1253(25)00133-5.

Abdulnour, R-EE., Gin, B., Boscardin, C.K. “Educational Strategies for Clinical Supervision of Artificial Intelligence Use.” New England Journal of Medicine 393(8):786–797 (2025).

European Union Aviation Safety Agency. Safety Information Bulletin 2025-09, “Manual Flying Skills Degradation,” September 2025.

Maguire, E.A., Gadian, D.G., Johnsrude, I.S., et al. “Navigation-Related Structural Change in the Hippocampi of Taxi Drivers.” PNAS 97(8):4398–4403 (2000).

Dahmani, L., Bohbot, V.D. “Habitual Use of GPS Negatively Impacts Spatial Memory During Self-Guided Navigation.” Scientific Reports 10:6310 (2020).

Javadi, A.H. et al. “Hippocampal and Prefrontal Processing of Network Topology to Simulate the Future.” Nature Communications 8:14652 (2017).

Kosmyna, N., Hauptmann, A., Olwal, A., et al. “Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing.” arXiv 2506.08872 (2025).

Shaw, S.D., Nave, G. “Thinking, Fast, Slow, and Artificial: How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender.” SSRN 6097646, Wharton Behavioral Lab (2026).

Chen, J.H., Pfeffer, M.A., Longhurst, C.A. “Why Are Humans Still in the Loop with Advancing AI Capabilities?” BMJ Digital Health 2:e000057 (2026). doi:10.1136/bmjdh-2026-000057.


Cost Model (Chapter 3)

HfS Research. RPA licensing economics surveys (2024–2025).

Deloitte Global RPA Survey, 2024.

EY. “Get Ready for Robots.” RPA practitioner analysis, 2024.

Gartner. “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027.” June 2025 / February 2026.

McKinsey State of AI Global Survey, 2025.

KPMG AI Pulse Survey, Q1 2026.

Yao, S. et al. “τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains.” arXiv 2406.12045 (2024). (Pass@8 reliability for GPT-4-class agents in retail customer service.)

Tipirneni, R. et al. “DAX Copilot Randomized Controlled Trial.” NEJM AI, 2024 (n=215 physicians).

Klarna. Press release, February 27, 2024 (AI customer service assistant deployment).

Bloomberg. “Klarna CEO Siemiatkowski on Cost as Predominant Evaluation Factor.” May 8, 2025.

Digital Applied Customer Service AI Statistics 2026.

SAP Joule pricing documentation. Salesforce Agentforce / Flex Credits pricing. ServiceNow April 2026 platform consolidation. Microsoft Copilot Studio billing rates (Microsoft Learn).


Adversarial Security (Chapter 4)

Greshake et al. “Sequential Tool Attack Chaining in Agentic AI.” arXiv 2509.25624v2, AWS / UC Berkeley (2025).

“Toward an Immune System for Agentic AI.” Stanford / MIT CSAIL / CMU / Elloe AI, 2025.

OWASP Top 10 for Agentic Applications 2026. OWASP Gen AI Security Project, December 2025. genai.owasp.org.

MAESTRO: Multi-Agent Environment, Security, Threat, Risk, and Outcome framework. Cloud Security Alliance, February 2025.

Microsoft Agent Governance Toolkit. Microsoft Open Source, April 2026.

PyRIT v3 (Python Risk Identification Tool). Microsoft Azure.

Anthropic. Claude Mythos Preview red team report (2026). red.anthropic.com/2026/mythos-preview/

“Anthropic Is Giving Some Firms Early Access to Claude Mythos to Bolster Cybersecurity Defenses.” Fortune, April 2026.

Anthropic Project Glasswing (2026). anthropic.com/glasswing.

Crane, J. PocketOS / Cursor incident report (April 2026). runcycles.io.


Evaluation (Chapter 5)

Zheng, L. et al. “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.” NeurIPS 2023, arXiv 2306.05685.

Saadi, S., Fleti, F., Rajjoub, O.H. et al. “Comparison of Three Large Language Models’ Ability to Assess the Risk of Bias Using ROBINS-I Tool.” BMJ Digital Health 2:e000034 (2026).

Zou, W. et al. “PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models.” USENIX Security 2025.

Turpin et al. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.” arXiv 2305.04388, 2023.

Heinze et al. JAMIA, 2024 (FHIR extension proliferation in large healthcare organizations).


Observation and Drift (Chapters 6, 8)

Akhawe, D., Felt, A.P. “Alice in Warningland: A Large-Scale Field Study of Browser Security Warning Effectiveness.” USENIX Security 2013 (Chrome SSL warning click-through rates).

Parasuraman, R., Manzey, D.H. “Complacency and Bias in Human Use of Automation: An Attentional Integration.” Human Factors 52(3):381–410 (2010).

Vaughan, D. The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA. University of Chicago Press, 1996. (Origin of normalization of deviance.)

Wong, A. et al. “External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients.” JAMA Internal Medicine, 2021. (Epic Sepsis Model audit, n=27,697 patients, 38,455 hospitalizations.)

Haug, C.J., Harrison, E.M. “Which Human-in-the-Loop? Why Context, Culture, and Health Systems Matter.” NEJM AI 3(3) (2026). doi:10.1056/AIe2600084.

Rotalinti, Y., Ordish, J., Liu, X. et al. “Identifying and Understanding Significant Change Due to Drift When Assessing AI Models in Healthcare.” BMJ Digital Health 2:e000085 (2026). MHRA expert working group.

Leung, H-H., Duckworth, C., Burns, D., Guy, M., Boniface, M. “How Data Drift Impacts the Safety and Interpretability of Machine Learning Models Predicting Risk from Blood Glucose Control.” BMJ Digital Health 2:e000269 (2026).

Chen, L., Zaharia, M., Zou, J. “How Is ChatGPT’s Behavior Changing Over Time?” arXiv 2307.09009 (2023).

Stankowski et al. CHI 2026, environmental reversion field study at SAP and Microsoft.

Pan, M.Z. et al. “Measuring Agents in Production.” arXiv 2512.04123 (2025).

Noori, M. et al. “A Global Log for Medical AI” (MedLog). arXiv 2510.04033, Harvard DBMI (2025).

Moffatt v. Air Canada, 2024 BCCRT 149.

OpenTelemetry GenAI Semantic Conventions, v1.37+. GenAI SIG, April 2024 onwards.


Hallucination, Equity, Obligations (Chapter 10)

Kim et al. “Medical Hallucination in AI: An MIT / Harvard Medical School / Google Research Taxonomy.” 2025.

Matthias, A. “The Responsibility Gap: Ascribing Responsibility for the Actions of Learning Automata.” Ethics and Information Technology 6:175–183 (2004).

Anthropic. “Constitutional AI: Harmlessness from AI Feedback.” arXiv 2212.08073 (2022).

Anthropic. “Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.” 2025.

Hwang, Y-M., Rice, B.T., Hernandez-Boussard, T. “The Inverse Care Law in the Age of AI: Geographic Disparities in Health Care Technology Access.” NEJM AI 3(4) (2026).

Amirfar, S.J. “I Hope You Are Doing Well: Will AI Widen or Close Health Care’s Disparity Gap?” NEJM AI 3 (2026).

European Union. Regulation (EU) 2024/1689 (Artificial Intelligence Act), Article 10 (data governance for high-risk AI systems), 2024.


Author’s Articles Referenced

The following articles by Yoram Friedman are cited as the longer treatments of arguments compressed in this book. All available at data-decisions-and-clinics.com unless otherwise noted.

“The Healthcare AI Spectrum” (seven generations of healthcare AI). “The Quiet Erosion” (cognitive-decay synthesis). “The Last Generation That Can Supervise AI” (Bainbridge applied to clinical and software). “What Physicians Know That Cannot Be Written Down” (medium constraint). “The Education Model Is Cracking” (apprenticeship erosion).

“The Cost Model Your Business Case Is Missing” (SAP Community series). “Utah Climbed the Autonomy Ladder. Nobody Designed the Rungs” (earned-versus-scheduled autonomy). “Stop Waiting for Clean Data”.

“You Built the Agent. Nobody Designed the Experience” (four runtime artifacts). “Security Was the Next Sprint” (STAC, 847-deployment audit, OpenClaw). “The Agent Worked, Limitless and Unguarded” (flea magnet, fence-and-model).

“What the Checkmarks Actually Prove” (SAP Community version of three eval breaks). “You Cannot Measure What You Did Not Design” (observation phase). “The Stack Is Green. The Agent Is Wrong” (data observability).

“Your Agent Worked. Your Users Bypassed It” (Stankowski environmental reversion).

“Silent Degradation: What a Deployed Clinical AI Looks Like at Month Eighteen”. “The Architect Who Should Have Read JAMA”.

“Governance: The Word Nobody Agrees On”. “Why Healthcare AI Governance Isn’t What You Think It Is”. “The 3 A.M. Problem”. “The Guide Is Not the Business” (Michelin Condition).

“What You Owe the People Who Will Never Be in the Room”. “Not All AI Errors Look Like Errors”. “Trained on the Wrong End of the Story”. “Do No Harm, Encoded” (Constitutional Runtime Layer).

The SAP Community series (posts 0 through 9) is published on community.sap.com.