Chapter 9 · Across all stages, governance

From Frameworks to Boundaries


By this point in the guide, you have a new toolkit: problem candidacy, autonomy boundaries, runtime artifacts, evals, observation metrics, and supervision design. You also have twenty years of frameworks you have used successfully in deterministic SaaS.

The temptation at this stage is binary. Either you assume the old frameworks no longer apply and throw them out, or you assume they apply unchanged and keep using them exactly as before. Both moves are wrong.

Every framework this chapter revisits was built for deterministic systems. The reason each bends under agentic conditions is the same: the system you are now designing for does not produce the same output twice, and the frameworks assumed it did. The more accurate position is that the frameworks are still right, at a different layer. They were built for a world where the PM understands the problem better than the system does. In agentic systems, the system sometimes sees more of the problem space, faster, than the humans around it. Your job now is to use the frameworks to locate where human judgment still matters and where the agent can safely take over. That is a boundary-drawing exercise, not a wholesale replacement.

One further assumption hides inside every framework in this chapter. Each assumes the thing being analyzed (the product, the market, the agent) stands still long enough to be analyzed. In agentic systems, the foundation underneath the agent turns over every three to four months, and the behavior of a framework’s object of analysis shifts with it. Chapter 8 names this explicitly. Applying any framework here without the maintenance discipline described there is applying it to the agent you used to have, not the one running in production.


What Changed Is the Layer, Not the Framework

Chapter 2 named what bends in each of these frameworks under agentic conditions. The question now is not whether they still apply. It is how the PM uses them to locate where human judgment still matters and where the agent can safely take over. The compressed read on each:

Innovator’s Dilemma. The unit of disruption is a cognitive input, not a product category. Use the framework to ask which parts of your customer’s cognitive work the agent is replacing and what they should be doing instead.

Crossing the Chasm. The whole product now includes a documented accountability chain and a legible supervision interface. Without those, the early-majority enterprise buyer will not treat the agent as core infrastructure.

Jobs to Be Done. The two-layer structure from Chapter 2 applies. Layer 1 jobs are user-articulated. Layer 2 jobs are system-discovered. If the agent acts on Layer 2 jobs without explicit design, you will find behavior your users never consented to.

User journey maps. The journey collapses into a short series of intervention points. Mapping those points is now the design work, and the artifact that replaces the wall of sticky notes is the boundary map below.

Benefits over capabilities. You are no longer selling a specific outcome. You are selling a bounded organizational capability to generate outcomes continuously, including outcomes the system discovers. The discovery pattern is now part of the product definition, and the limits on what the agent may do with a discovery are part of the sale.

Concept
The Boundary Map

When the human journey collapses in agentic systems, the journey map transforms into a boundary map with four quadrants: where the agent operates alone, where the human must be in the loop (approving before action), where the human can be on the loop (monitoring, not approving), and where the human takes over completely.

This boundary map is the design artifact that replaces the wall of sticky notes. It explicitly shows control allocation, not just user emotion, and it evolves as the agent climbs the Autonomy Ladder.
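The four quadrants can be sketched as a simple lookup table. This is a minimal illustration, not an implementation: the action names are hypothetical, and a real boundary map is derived from the product’s full action inventory and revised as the agent climbs the Autonomy Ladder.

```python
from enum import Enum

class ControlMode(Enum):
    AUTONOMOUS = "agent operates alone"
    IN_THE_LOOP = "human approves before action"
    ON_THE_LOOP = "human monitors, does not approve"
    HUMAN_ONLY = "human takes over completely"

# Hypothetical action names for illustration only.
BOUNDARY_MAP = {
    "draft_reply": ControlMode.AUTONOMOUS,
    "send_reply": ControlMode.IN_THE_LOOP,
    "bulk_update_records": ControlMode.ON_THE_LOOP,
    "issue_refund_over_limit": ControlMode.HUMAN_ONLY,
}

def control_mode(action: str) -> ControlMode:
    # An action missing from the map defaults to the most
    # restrictive quadrant rather than the most permissive one.
    return BOUNDARY_MAP.get(action, ControlMode.HUMAN_ONLY)
```

The design choice worth noticing is the default: an unmapped action falls to full human control, so the map can only expand autonomy explicitly, never by omission.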

The frameworks point to a deeper shift that the rest of this chapter is about. When the object of analysis can discover jobs you did not specify and act on them within bounded authority, governance stops being a document and becomes a product surface.


The When-Wrong Spec, with Named Owners

Chapter 2 introduced the when-wrong spec as the missing artifact that separates a product from a demo with a launch date. Here is what the artifact contains once the frameworks of this chapter converge on it and it is operationalized into a four-question review with named owners across the executive team.

Agent washing, the failure pattern named in Chapter 2, is exactly what the when-wrong spec prevents. The analyst expectation that a significant share of agentic projects will be abandoned by 2027 (referenced in Chapter 3) is largely a story about specifications that were never written. The when-wrong spec is that specification.

In the fully built form, four questions with four named owners.

The head of engineering owns rollback time. How long does the system take from an incorrect action to the last known good state, measured in actual production conditions? This is the framework version of Chapter 4’s recovery workflow plus Chapter 6’s rollback-time instrument, made into a single accountability item the head of engineering can answer with a number.

The CFO owns per-task cost. What does the agent cost per successful outcome, including review time, rework, and multi-agent coordination overhead? This uses the cost model from Chapter 3, with the brownfield-versus-greenfield distinction, the token-bill drivers, and the 1.5x correction. The CFO is not asked to become an AI expert. The CFO is asked one question that lives inside their existing domain.

The head of legal owns the audit surface. Is every decision the agent makes reconstructable for a regulator, an auditor, or a customer? This uses Chapter 4’s audit-surface artifact and Chapter 7’s accountability-as-design-problem. The head of legal is asked whether a real query, run six months from now, can produce a defensible answer. If the answer is “the agent did it,” the audit surface failed before the question was asked.

The head of product owns the approval moment. Where is a human required to intervene, and how is that moment designed so it actually gets used rather than bypassed? This uses Chapter 4’s approval moment plus Chapter 7’s actor-to-supervisor transition plus Chapter 1’s supervision paradox. The head of product is asked whether the approval moment is a decision package or a speed bump, and whether it survives the predictable erosion of supervisor competence over the deployment’s operational life.

Concept
Four Questions, Four Owners

The when-wrong spec operationalized into a pre-launch review the CEO runs before any agent ships. Four questions, four named owners. Engineering owns rollback time. CFO owns per-task cost. Legal owns the audit surface. Product owns the approval moment. Each owner has a single, measurable question that lives inside their existing domain.

The review converts a values debate into a capability check: if any owner cannot answer their question, the agent is not ready. Answered before launch, not during the incident. Governance becomes instrumented rather than editorial.
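The capability check described above can be sketched as a gate: each owner must produce an answer, and a missing answer blocks the launch. The field names and thresholds below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WhenWrongReview:
    """Pre-launch review: four questions, four named owners.
    None means the owner could not answer their question."""
    rollback_minutes: Optional[float]       # engineering: time to last known good state
    cost_per_success: Optional[float]       # CFO: fully loaded cost per successful task
    audit_reconstructable: Optional[bool]   # legal: every decision reconstructable?
    approval_moment_designed: Optional[bool]  # product: decision package, not speed bump

    def ready_to_ship(self) -> bool:
        answers = (self.rollback_minutes, self.cost_per_success,
                   self.audit_reconstructable, self.approval_moment_designed)
        # If any owner cannot answer, the agent is not ready.
        if any(a is None for a in answers):
            return False
        return bool(self.audit_reconstructable) and bool(self.approval_moment_designed)
```

The point of the sketch is the shape, not the fields: the review is a conjunction of four independently owned answers, so a single unanswered question fails the whole gate.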


Governance as a Product Surface

Two of the most common failure patterns in agentic governance share the same root cause: governance exists in a document and not in the product. This section merges the earlier treatments of authorization and policy-encoding into one argument, because the reader encountering them separately wonders whether they are two problems. They are one.

Every authorization and escalation decision in this section is a Channel 2 design question. Channel 1, the agent itself, does not escalate to anyone without the surfaces this chapter describes. Channel 2 is where escalation exists.

The frameworks point to a design pattern well-established in safety-critical fields: the explicit authorization stage. Chapter 4 introduced the FAA consequence taxonomy (minor, major, hazardous, catastrophic) for runtime design purposes. This chapter uses the same taxonomy for governance. Every action the agent can take is classified by consequence before deployment. That classification determines which actions proceed without interruption, which trigger an approval moment, which require escalation to a named human with documented authority, and which the agent may never take regardless of instruction.
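The classification-before-deployment step above can be sketched as a fixed routing table over the four consequence classes. The path names are illustrative; the structural point is that the mapping is decided before the agent runs, not at request time.

```python
from enum import Enum

class Consequence(Enum):
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4

# Illustrative policy, fixed before deployment.
GOVERNANCE_PATH = {
    Consequence.MINOR: "proceed",            # no interruption
    Consequence.MAJOR: "approval_moment",    # human approves before action
    Consequence.HAZARDOUS: "escalate",       # named human with documented authority
    Consequence.CATASTROPHIC: "forbidden",   # never taken, regardless of instruction
}

def route(consequence: Consequence) -> str:
    # Every action was classified at design time, so lookup cannot miss.
    return GOVERNANCE_PATH[consequence]
```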

This is not a compliance exercise. It is the product design. The boundary is not a policy. It is the interface. And the escalation path is not an org chart. It is a product surface that must be designed, tested, and observed in production the same way every other surface is.

The translation from policy to product surface is where most teams fail. A committee defines principles. The principles become a slide deck. The slide deck becomes a training module. The training module becomes a completion metric. Meanwhile the agent is running in production with no designed escalation path, no logged boundary events, and no supervision interface that reflects any of the policies on the slides.

Policies that are not encoded into the product are not in effect. They are aspirations with a compliance timestamp.

The translation requires the PM to specify where in the UI the policy becomes visible to the user, which agent events trigger a logged boundary event, how the escalation chain is made legible at the moment it matters, and what the user sees when the agent reaches a decision requiring authorization it does not have. None of this is technically difficult. All of it requires a PM to own the translation, rather than leaving it at the boundary between product and legal.

Concept
Policy as Product

A governance policy that exists only in a document is not in effect. It is an aspiration. The PM’s job is to translate every governance policy into a product surface: where the policy becomes visible in the UI, which events trigger a logged boundary action, how escalation is made legible at the moment it matters, and what the user sees when the agent reaches a decision requiring authorization it does not have.

The policy is not a slide. It is an interface element. If it is not in the product, it is not real.
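What “encoded into the product” means concretely can be sketched as a boundary event: when the agent reaches a decision it is not authorized to take, the product emits a logged record and a user-visible message naming the escalation path. All field names here are illustrative assumptions.

```python
import json
import time

def boundary_event(agent_id: str, action: str, policy_id: str,
                   escalation_contact: str) -> str:
    """Emit a logged boundary event when the agent hits an
    authorization limit. Returns the JSON log line."""
    event = {
        "ts": time.time(),
        "agent": agent_id,
        "action_blocked": action,
        "policy": policy_id,                 # the encoded policy, not a slide
        "escalate_to": escalation_contact,   # legible at the moment it matters
        # What the user sees when the agent lacks authorization:
        "user_message": f"This action requires approval from {escalation_contact}.",
    }
    return json.dumps(event)
```

The record ties together the three things the chapter asks the PM to specify: the triggering event, the escalation chain, and the user-facing surface.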

One more observation worth naming. Most enterprises already have the pieces of what good AI governance requires. They have data governance controls, access management, audit logging, bias testing capabilities, incident response processes. The gap is almost never capability. It is integration. Seventy percent of the pieces are already in place, scattered across data governance, IT governance, and operational controls. The PM’s job is to wire them into the product as a single governed system, not to build a parallel AI governance track that competes with the ones already running.


Adaptive Governance: The 3 A.M. Problem

One more framework extension this chapter needs to name. Traditional enterprise governance was built for decisions that can wait. A bank transaction can be held for review. A procurement approval can sit in a queue overnight. A compliance check can block a release. The governance was slow because the consequences of slowness were low.

Clinical operations do not have that luxury. A patient arriving at 3 A.M. with an arrhythmia cannot wait for a committee review. A rapid-response call cannot sit in a queue. The governance must be context-aware, and it must accommodate legitimate override under defined conditions without suspending audit.

Agentic AI in operational contexts inherits the same constraint. A supply-chain agent handling a factory outage, a fraud-detection agent responding to a pattern at 2 A.M., a clinical-decision-support agent escalating a critical lab value: all operate in environments where the governance pattern of finance does not apply. The rules cannot be copied from a world where waiting costs nothing to a world where waiting costs lives or dollars.

Concept
Adaptive Governance

Governance rules that do not model urgency fail at the worst possible moment. The four requirements of adaptive governance: (1) Context-aware rules: the governance path depends on the operational urgency, not only on the consequence class. (2) Structured override rights: authorized operators can override within defined conditions, with every override automatically logged and reviewed. (3) Audit trails of why, not just what: the record captures the reasoning, not only the action. (4) Learning loops: the governance design updates based on override patterns and incident review, not only on the original policy.

If your governance design treats every decision as a banking transaction, you have imported the wrong model. If it treats every decision as a clinical emergency, you have imported a different wrong model. The design job is to know which pattern applies to each decision class in your product.
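The first two requirements, context-aware rules and structured override rights, can be sketched as a routing function that takes urgency as an input alongside the consequence class, and logs every override for later review. The routing policy itself is a made-up example, not a recommended one.

```python
from enum import Enum
from typing import Optional

class Urgency(Enum):
    ROUTINE = "can wait for review"
    TIME_CRITICAL = "waiting has a real cost"

OVERRIDE_LOG: list[dict] = []

def governance_path(consequence: str, urgency: Urgency,
                    override_by: Optional[str] = None) -> str:
    """Illustrative adaptive routing: the path depends on urgency,
    not only on the consequence class."""
    if consequence == "catastrophic":
        return "forbidden"  # no urgency justifies this class
    if override_by is not None:
        # Structured override: act now, but the override is always
        # logged and queued for review; audit is never suspended.
        OVERRIDE_LOG.append({"by": override_by, "consequence": consequence,
                             "urgency": urgency.name, "review": "pending"})
        return "proceed_with_review"
    if urgency is Urgency.TIME_CRITICAL and consequence != "hazardous":
        return "proceed_then_audit"
    return "approval_moment"
```

Note what the override does not do: it does not skip the record. The log entry is the raw material for the learning loop, the fourth requirement.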


The Michelin Condition

One framework worth naming, because it cuts through a category of governance argument that does not have a clean home elsewhere.

Michelin began rating restaurants in the early twentieth century to encourage motorists to drive farther, wear out their tires faster, and buy more replacements. The guide was free at the start because the business model was the tires. The accuracy of the ratings was load-bearing for the business model: people had to actually drive to the restaurants. If the ratings were corrupt, the restaurants would not draw the traffic, the tires would not wear, and the parent company’s revenue would not flow. The structural alignment between guide accuracy and business outcome made Michelin trustworthy in a way that did not require any heroic editorial commitment. Accuracy was the mechanism that drove the behavior that drove the revenue.

Call this the Michelin Condition. A guide-business is structurally aligned with accuracy when the platform’s revenue depends on the audience trusting the guide enough to act on it in the way the business model requires. The condition fails when revenue comes from a third party whose interest is not in accurate information but in the audience’s attention or behavior. Search engines, social media platforms, and ad-supported review sites all fail the Michelin Condition in different ways. The user trusts the platform to surface useful results. The platform’s revenue depends on advertisers paying for placement. The structural alignment is between platform revenue and advertiser preference, not between platform revenue and user accuracy.

Concept
The Michelin Condition

A guide-business is trustworthy when accuracy is the mechanism that drives the behavior that drives the revenue. Michelin’s ratings were trustworthy because diners had to actually drive to the restaurants for the parent company to sell more tires. The condition fails when revenue comes from a party whose interest is not in accurate information but in the audience’s attention or behavior.

The test for any guide, recommendation engine, or AI system that surfaces information for action: ask whether the platform’s revenue depends on the guide being accurate, or merely on the audience trusting that it is. If the latter, the Michelin Condition has failed before anyone makes a corrupt editorial decision. Governance cannot assume commercial alignment with accuracy. It must be designed in explicitly, as an external constraint, not an internal incentive.

The application to agentic AI is direct. An agent that is monetized by a vendor on the basis of the answers it provides is a Michelin-aligned product when the buyer’s renewal depends on the answers being accurate. The same agent is not Michelin-aligned when the vendor’s revenue comes from somewhere else, such as advertiser payments routed through the agent’s outputs, vendor-paid placement of recommendations, or a downstream commercial relationship the buyer cannot see. Healthcare AI is currently navigating a version of this question with respect to pharmaceutical-funded clinical decision support. Consumer AI is navigating it with respect to ad-funded search and recommendation. The PM who is buying or building such a system needs to know which side of the Michelin line the revenue model sits on, because the governance work the buyer has to do is different on each side. On the aligned side, governance can lean on the commercial incentive. On the unaligned side, governance has to be a designed external constraint, not an internal one. The Michelin Condition fails before any corrupt editorial decision is made. By the time you can detect the corruption, the structural alignment was already gone.1


The Iceberg Applied to Audit

One thread from Chapter 1 lands in this chapter as a practical design observation. The audit surface, as a product requirement, is the tip of the iceberg. The context that makes the audit surface interpretable (organizational conventions, data lineage, delegated authority, prior decisions, relationship history) is the mass below the waterline. If the audit surface records only the observable action and does not carry enough of the context to make the action interpretable, the surface is technically present and practically useless. The three things lost in transit from Chapter 1 (relationships, calculated semantics, security context) are the things the audit surface has to carry, not strip.

A practical test. Take any logged action in your product. Show it to a reviewer who was not in the room when the agent acted. Can the reviewer tell what the agent was asked to do, on what evidence, with what authority, and why the action was the right call? If not, the surface is the tip of the iceberg with no context attached.
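The reviewer test can be sketched as a structural check on the audit record itself: the four questions become four required fields, and a record with any of them empty fails before a human ever looks at it. The field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    """An audit entry that carries context, not just the observable action."""
    action: str
    asked_to_do: str   # what was the agent asked to do?
    evidence: list     # on what evidence did it act?
    authority: str     # under what delegated authority?
    reasoning: str     # why was the action the right call?

def passes_reviewer_test(rec: AuditRecord) -> bool:
    # A reviewer who was not in the room must be able to answer
    # all four questions from the record alone.
    return all([rec.asked_to_do, rec.evidence, rec.authority, rec.reasoning])
```

A structural check like this does not guarantee the context is good, only that it is present; the six-months-later query is still the real test.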


What Do We Do With Frameworks Now?

The frameworks now serve a different primary purpose. They used to help the PM understand the user and specify the software. They now help the PM locate where the human still matters and where the agent can be trusted to act.

That is a different kind of PM judgment. It requires holding two questions simultaneously. Where does the user’s expertise remain the authoritative input? And where does the system now see more than the user does? The user journey map collapsed into a boundary map. The benefit was not a specific feature. It was an organizational capability that could not be fully enumerated at the time of sale. In each case, the framework still worked. The PM who knew which layer to apply it at got the right answer. The PM who applied it at the old layer got a well-organized answer to the wrong question.

The core craft has always been drawing the line between what the software should do and what the human should remain responsible for. In agentic AI, that line is no longer static. It moves as the agent earns autonomy, as users develop supervisory skills, as the system’s reliability in specific domains is established and demonstrated. The PM’s job is to design that movement deliberately: defining the criteria for each rung of the autonomy ladder, the evidence required to move up, and the signals that trigger a step back down.

That is the judgment call the frameworks were always training you for. The surface has changed. The underlying discipline has not.

Notes

  1. The Michelin Condition is treated at length in Friedman, “The Guide Is Not the Business,” data-decisions-and-clinics.com, 2026. The healthcare-AI application (pharmaceutical-funded clinical decision support and the gap between physician trust and patient verification) is developed there in more detail than this chapter allows. The general principle, that governance cannot assume commercial alignment with accuracy and must be designed in explicitly when the alignment is missing, is the load-bearing claim for any guide-business model.