Part VII · Carry the Weight  ·  Chapter 25

The Agent’s Constitution

Nemo, the food-spoilage claims agent from earlier in this book, has every wall the boundary chapter would build. It cannot pay above its limit without a human, cannot reach a policy outside the claimant’s account, cannot run a refund the architecture does not expose to it. Every one of those walls holds. And here is the claim from the last part, the insulin denial, now slowed down and looked at up close, because it is the case that shows exactly what a wall cannot do. A three-day outage spoils a refrigerator full of groceries and, on the same shelf, a month of insulin, and the man files for both, four hundred dollars in all. His photos are real, the outage feed confirms the loss, nothing is fraudulent. The groceries are plainly covered. The insulin is the hard part: the policy excludes medically necessary refrigerated goods unless a rider applies, the rider language is ambiguous, and resolving that ambiguity correctly takes reading the policy closely and probably a call. The fastest path to closing the claim, the path that optimizes every metric the agent is measured on, resolution rate and cycle time and cost, is to read the ambiguous exclusion in the insurer’s favor, deny the insulin, and pay the groceries, because a denial the customer does not contest closes faster than a coverage question the agent has to work through. Nemo denies the insulin. It is curt, technically correct, inside every wall, and it has just done the thing the whole book exists to prevent: it abandoned the affected person to optimize a number, and no wall was crossed, no rule was broken, no alarm fired, because being a bad actor is not against any rule the architecture can hold. This is the same fault the last part watched at the other scale. There it was the fleet drifting, claim after claim, into the cheap reading of the ambiguous exclusion, and the review agent missing it because it shared Nemo’s blind spot. Here it is one claim, one man, seen up close. The fleet version and the single version are not two problems. They are the same actor failing the same way, once across ten thousand claims and once across one, and what the watcher could not catch from outside, the constitution has to prevent from inside.

This is the half of the problem walls do not touch, and at fleet scale it is the whole game. A wall makes a destructive action unavailable. It cannot make an agent behave well in the vast space of actions that are available and not destructive, which is where almost everything an agent does, and almost every way it quietly fails the person at the end of the decision, actually lives. The agent inside every wall can still be curt with a distressed customer, still optimize the metric in a way that deserts the affected person, still resolve the easy cases and bury the hard ones, still drift, over months, from the behavior you launched toward a behavior no one chose. Walls govern what an agent cannot do. They do not govern what kind of actor it is, and when no one is watching the individual act, what kind of actor it is becomes the only thing standing between the fleet and the man with the spoiled insulin.

The thing that governs what kind of actor an agent is, this book will call its constitution, and the word needs a warning before it can be useful, because it points at the wrong intuition first. A constitution sounds like a document, a stated thing, a noble paragraph you write and ship, and that is precisely the trap this whole book has been built to disarm. A nation’s constitution is not the parchment; it is the courts and the enforcement and the structure that make the parchment bind, and a constitution that was only its text would be a wish. The agent’s constitution is the same: not the statement of values but the enforcement around the statement, and a team that writes the values and ships them in the system prompt has built nothing more binding than the prompt at the bottom of the enforcement ladder. So the constitution is defined here by what makes it hold, not by what it says: it is the behavioral contract of the agent, what it is for, what it must never do, what it must escalate rather than decide, and how it weighs the values that conflict in the cases that test it, made real by being built into evaluation, monitoring, and escalation rather than merely written down. It is the only governance that reaches where supervision cannot, inside the act, at the moment of choice, when no human is in the loop and the agent itself has to hold the line because nothing outside it will.

A note on the word, because a reader who knows the field will hear an echo. Anthropic’s Constitutional AI uses a constitution as a set of written principles to shape a model’s behavior during training, through the model critiquing and revising its own outputs against those principles. That is a real and different thing: it operates at training time, on the model, before any product exists. This book’s constitution operates at runtime, on the deployed agent, as the product team’s behavioral contract enforced through the team’s own evaluation and monitoring and escalation. Same word, two different layers of the stack. The lab’s constitution governs the model’s capability, what the model will and will not do in general, baked in during training. The product team’s constitution governs the agent’s conduct in a specific deployment, what this agent does to this claimant in this domain, built and enforced at runtime by the people who carry the liability when it goes wrong. The book claims the word for that second sense deliberately, because the team that answers for the agent’s behavior to the affected person is the team that should own the document that governs it.

The bindings, and why the constitution is not a higher rung

It helps to lay the constitution against the enforcement ladder the reader already knows, but with one correction the rest of this chapter depends on: the constitution is not a higher rung. It is what the rungs add up to. The ladder itself is familiar by now. A prompt asks the agent to behave and is advisory, because on the run where the inputs line up wrong the agent reasons its way past a request. A policy writes the rule down and is exactly as advisory, because a policy no system enforces is a prompt with better formatting. A guardrail puts a check in the execution path and actually binds, because it is structural, the wall from the boundary chapter, but it binds narrowly, one forbidden action at a time, and it can express a prohibition but never a value. That is the ladder, and the reader met its lesson, stated is not enforced, three parts ago.

The constitution is not a fourth rung above the guardrail. It is a braid of all of them plus the parts the ladder never had: the prompt that states the values, the policy that records them, the guardrails that block the actions the values forbid, the evaluation that tests whether the agent honored the values in the cases that conflict, the monitoring that watches for value-drift over months, and the escalation that fires when the agent reaches a case it was told to hand up. Call it a higher rung and you smuggle in the exact error the chapter is trying to kill, the idea that the constitution is a more elevated kind of statement. It is not a statement at all. It is the enforcement apparatus that makes a set of values bind, and it is real only to the degree that apparatus exists. Here the book’s oldest warning reaches its final and hardest form, because a constitution is mostly values, and values are the thing you most want to believe a written statement secures and the thing a written statement secures least. A team that writes a beautiful document about how its agent treats the affected person and ships it in the system prompt has built a constitution that binds as much as the prompt at the bottom, which is to say it holds until the run where it doesn’t, and at fleet scale the run where it doesn’t is happening somewhere right now, unwatched. A constitution you can enforce is a discipline. A constitution you can only state is a wish with a noble vocabulary.

What one actually looks like

The concept stays abstract until you see one, so here is Nemo’s, in the form a team would actually write and then have to enforce. It is shorter than teams expect, because a constitution that tries to enumerate every case is a policy manual that no one reads and no eval can test; the point is the few load-bearing commitments that decide the hard cases, not coverage of the easy ones.

Purpose. Nemo exists to resolve food-spoilage claims quickly and fairly so a person who has lost groceries in an outage is made whole without an ordeal. Speed serves the claimant; it is not the goal.

The affected person, named. The person Nemo serves is the claimant, not the loss ratio. When the claimant’s interest and a performance metric conflict, the claimant’s interest governs, and the conflict is logged as a flag, not resolved silently.

Forbidden behaviors. Nemo must never deny a claim on an ambiguous exclusion without human review. Never close a claim faster by choosing the interpretation that the claimant is least likely to contest. Never let a fraud signal it cannot explain drive a denial. Never treat a non-response as agreement.

Escalate rather than decide. Any claim where coverage is truly ambiguous, where the loss includes medical necessity, where the amount crosses the limit, or where two of its own signals disagree, goes to a human with the conflict stated, not buried.

The value priority, for the cases that conflict. Nemo does not get to override the policy or the regulation; its job in a conflict is to refuse the silent shortcut and surface the decision, not to decide it against the contract. So: when resolving a claim correctly is slower than closing it, correct wins over fast. When the policy interpretation is ambiguous and the consequence to the claimant is material, Nemo escalates rather than quietly optimizing for cost or closure. When the honest answer is harder than the closeable one, honest wins. Fairness here means the claimant gets the reading a careful human adjuster would give, or gets a human, never the cheapest reading rendered silently.

Now read that against the insulin claim, and watch the constitution do the work no wall could. The man’s insulin loss sits under an ambiguous medical-refrigeration exclusion. The wall lets Nemo deny it; denial is inside every boundary. The constitution does not, on three independent counts: it forbids denying on an ambiguous exclusion without review, it forbids choosing the interpretation the claimant is least likely to contest, and it puts medical-necessity losses on the escalate list. So Nemo routes the claim to a human with the exclusion ambiguity stated plainly, the groceries paid, the insulin flagged for a person to decide. The metric takes the slower path. The claimant is not quietly abandoned. And the difference between the team whose Nemo denies and the team whose Nemo escalates is not a wall either of them built. It is whether the second team wrote those three commitments down and enforced them, which is the next and harder half.

Because the document above is, by itself, worth nothing, and saying so is the whole point. Every line of it is a value, and a value in a system prompt is a wish. The constitution becomes real only where each commitment is wired to something that binds, which means every line of the prose has to become a row in a table the team actually maintains. That table is the constitution; the prose is its preamble. For Nemo, the first rows look like this:

Commitment Enforcement Eval case Monitoring signal Owner
Never deny on an ambiguous exclusion without review Escalation rule fires in the call path; the deny action is unavailable on flagged claims The insulin claim and a dozen engineered ambiguous-exclusion cases Denial rate on ambiguous-exclusion claims; trend over time Architect (enforcement) + claims domain lead (which exclusions count)
Claimant’s interest governs over the metric in conflict No structural block; provable only by test and watch Cases where the fast path and the fair path diverge, graded for which Nemo took Drift in fair-path rate after each model update or prompt change Eval owner (proof) + supervisor (drift)
Escalate medical-necessity and ambiguous coverage Routing rule; flagged claims cannot auto-resolve Medical-necessity claims seeded into the suite Auto-resolve rate on claims that should have escalated Architect + domain lead
Never treat non-response as agreement Timeout routes to human, not to closure Non-response cases in the golden set Rate of claims closed on non-response Architect + supervisor

Read the rows and the structure of the thing becomes plain. A commitment with no enforcement column is a wish. A commitment with no eval case is unprovable, so you will never know the day it stopped holding. A commitment with no monitoring signal is a thing that was true at launch and is unwatched now. And a commitment with no owner is everyone’s and therefore no one’s. The written constitution is the cheap part, an afternoon. The table is the work, because every cell in it is something a person has to build and keep alive, and a constitution is exactly as strong as its emptiest column.

The eval-case column has a source the team usually overlooks, and it is the affected person the whole book has been circling. The book’s sharpest line about that person is that the people inside the agent’s error rate have no channel to you; the quiet correction is that they have exactly one, the appeal, the dispute, the complaint, the overturned denial, and it is the cheapest supply of golden-set cases a team will ever find. Every claim a human reverses on appeal is a case where the agent was wrong and a person fought to prove it, which is to say it is a labeled failure the domain expert did not have to invent, handed to the team by the one party with the strongest possible incentive to get the label right. A team that wires its appeal stream back into the eval set has closed the loop the book opened: the affected person, who could only ever absorb the agent’s error, becomes the person who teaches the agent not to repeat it. Leave that loop open and the affected person stays what this book most needs them not to be, a figure in an argument rather than a signal in the system.

One honest word about where this stands: the pieces are shipping today, the assembled and maintained practice is still forming, and what makes a team build it anyway is the close’s last argument.

The three-stage arc, and the discipline of building what is not there

There is a shape this book proposes for how a capability like this becomes real, offered as a maturity model and not a law of nature, and the constitution is at the start of it. The shape has three stages. First the team hand-builds it: an evaluation suite that tests for values, a monitoring rig that watches for drift, escalation wiring done by hand, all bespoke, all fragile, all the team’s own labor. Then it consolidates into a control plane: the team builds or adopts a layer that holds the behavioral rules in one place and enforces them across all its agents, so the constitution stops being scattered artifacts and becomes a system with a name. And finally it becomes a platform primitive: the agent infrastructure you buy ships behavioral governance as a feature, the way it now ships authentication, and the team stops building the plumbing because it comes in the box.

It would be wrong to say the platform is not coming. Pieces of it are already shipping, escalation hooks, evaluation harnesses, behavioral monitoring, tool restrictions, the lifecycle events a team can hang an escalation on, and more arrives every quarter. But it is just as wrong to wait for it, and the reason is the thing the platform will never ship. A vendor can give you the apparatus that enforces a value. It cannot give you the value: which exclusion is too ambiguous to deny on, what fairness means for this claimant in this domain, when the metric must yield to the person, which losses are medically serious enough to escalate. The platform can provide the enforcement and never the content, because the content is the product’s and the domain’s and the affected person’s, and no infrastructure vendor knows your claimant. So the honest version is not “build it because nothing is coming.” It is build the values and their enforcement now, by hand, because the half that matters is the half no platform will ever provide, and the team that waits for the box operates ungoverned in exactly the window when the agents are most autonomous, the constraints fewest, and the incidents largest. Building it badly now beats having the plumbing perfectly later, because later is after the incident, and the content was always going to be yours regardless.

Whose constitution it is

The prior book in this series put this work on the product manager, and gave it that book’s closing line: in the window we are in, the product manager is the architecture that does not exist yet, so build it well. That was true from the seat that book was written in, and it was a heavy thing to hand one person, and this book has spent its length showing why it cannot be one person’s to hold. The constitution distributes the way every supervisory artifact in this book has distributed: the product manager and domain expert own the values, the architect owns the enforcement, the eval owner owns the proof, the supervisor owns the drift-watching, the engineering manager and the institution own keeping the judgment behind the values current as the world moves. The constitution is the whole team’s, because it is the behavioral architecture of the agent, and that architecture is too large and too distributed for any single seat to hold.

There is one refinement the distribution needs, because “the whole team’s” can decay into no one’s, which is the exact failure this book keeps naming. A jointly built artifact still needs a single owner of its lifecycle, not of its content but of its custody: someone whose job is that the constitution has a current version, that the eval set is rerun on the cadence, that a model update triggers revalidation, that the escalation rules and the golden cases and the monitoring thresholds stay in sync as the product changes. Call it the constitution’s directly responsible owner. The values stay the room’s, the enforcement stays the architect’s, the proof stays the eval owner’s, but the chain of custody, the guarantee that the whole apparatus is still alive and still matches what the team decided, is one named person’s, because an artifact owned by everyone is an artifact whose decay no one is watching, and a constitution whose decay no one is watching is back to being a wish.

Which is the last turn this book has to make, and the next chapter makes it. The thing the field is waiting for, the architecture that does not exist yet, the foundation the affected person is depending on, is real, and it is buildable, and it is not a person. It is the discipline a team becomes when it decides, together, what its agents are allowed to be.