Part III · The Craft · Chapter 9

Chapter 9: The Brief as Craft

A payments team gets a one-line request from leadership: instant payouts. Sellers on the marketplace want their money the moment a sale clears, not three business days later. Build an agent that decides each payout and releases the funds.

Two words. Everyone in the room nods, because everyone in the room is hearing a different product. The head of growth hears retention. The risk lead hears fraud exposure. The engineer hears a queue and a banking API. The PM writes “instant payouts” at the top of a document and now has to turn it into something an agent can be built from and a room can argue about, which, it turns out, are two different documents with two different jobs.

If the request sounds familiar, it should: Chapter 5’s spec loop used this very feature as a drill, one model drafting against your spine while a hostile reader hunted the ambiguities. That was the loop at practice speed. This chapter slows the same material down to craft speed, because the brief is where the practice has to hold up in public. Not the theory of the split, which this book takes as settled. The craft of it. How you actually put words on the page so that the agent does what you meant rather than what you said, and how you get measurably better at that on purpose, using the fastest brief-writing teacher ever built, which is the agent itself.

Why the old brief broke

The PRD was a good tool for a world that no longer exists in the half of the product where the agent lives.

A traditional requirement holds because the system underneath it is deterministic. You write an acceptance criterion (given a valid order inside the return window, the system issues the refund and sends confirmation), and the criterion means something because the system does what it is told, every time, and a test that passes once will pass again. Pass and fail are real answers. The acceptance criterion is a contract the code cannot wriggle out of.

That contract breaks exactly where the agent exercises judgment, which is the half of an agentic product that determines whether you can trust it. The deterministic shell around the agent still takes a PRD perfectly well. The banking API call, the database write, the confirmation email: write those as user stories and you will be fine. But the part in the middle, the part that decides which payout to release and which to hold, cannot be held by an acceptance criterion, because an acceptance criterion describes a point and the agent’s behavior is a distribution. “The agent releases the payout for a valid sale” is true of the case that was never hard. It says nothing about the sale that looks valid and is not, the seller with a sudden volume spike, the refund that has not posted yet, the account three days old. Those are the cases the agent will be judged on, and the user story has no field for any of them.
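The difference between a point and a distribution is easy to show in miniature. The sketch below is illustrative, not a model of any real agent: `refund_rule` stands in for deterministic shell logic, and `toy_agent` is a hypothetical stand-in that fakes a judgment call with a seeded random draw. The deterministic check gives one answer forever; the only honest description of the agent on a hard case is a rate over many runs.

```python
import random

def refund_rule(order_valid: bool, inside_return_window: bool) -> str:
    """Deterministic shell logic: the same inputs give the same answer every run."""
    return "refund" if order_valid and inside_return_window else "deny"

def toy_agent(p_release: float, rng: random.Random) -> str:
    """Hypothetical stand-in for an agent's judgment on one payout case.

    A real agent is not a weighted coin; the seeded draw only simulates the fact
    that repeated runs on the same hard case need not agree with each other.
    """
    return "release" if rng.random() < p_release else "hold"

def release_rate(p_release: float, runs: int = 1000, seed: int = 0) -> float:
    """Summarize behavior on one case as a rate over many runs, not a pass/fail."""
    rng = random.Random(seed)
    return sum(toy_agent(p_release, rng) == "release" for _ in range(runs)) / runs

# A point: the criterion that passes once passes on every rerun.
assert all(refund_rule(True, True) == "refund" for _ in range(1000))

# A distribution: these probabilities are assumptions, not measurements.
easy_case = 1.0   # established seller, ordinary amount
hard_case = 0.7   # three-day-old account with a sudden volume spike
print(release_rate(easy_case))  # 1.0
print(release_rate(hard_case))  # a rate near 0.7, which no single run reveals
```

The design point is the return type: `refund_rule` returns a verdict, while `release_rate` can only return a number, which is why the judgment half of the product cannot be held by a single pass/fail criterion.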

So the request splits. Not because someone likes process. Because the single document was hiding a decision it had no place to hold.

The Human Brief: written for the room that argues

The first document is the Human Brief, and its job is not requirements. Its job is the argument.

It is the successor to the PRD in slot only. What it actually does is force, onto a page, the decisions a demo will otherwise make for you silently. It states the problem and the alternatives you considered and rejected. It names what the product will deliberately not become. It carries the real cost model, not the model’s token price but the fully loaded per-task cost, the supervision overhead, the break-even against the three-day batch process you are replacing. And it answers the go or no-go on paper, before anyone builds anything, because the moment a prototype ships and demos well, the go or no-go decision leaves the room and does not come back.
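The cost model does not need to be elaborate to do its job. A minimal sketch, assuming the fully loaded cost is model cost plus supervision overhead plus the fraud loss the brief has agreed to accept, and that break-even is measured per payout against the batch process. Every field name and every number in the example is hypothetical, not drawn from a real deployment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CostInputs:
    """Per-payout cost drivers; each field is a hypothetical input the brief would state."""
    model_cost: float            # tokens, tool calls, and infrastructure per decision
    escalation_rate: float       # fraction of payouts routed to a human reviewer
    review_minutes: float        # reviewer time per escalated payout
    reviewer_hourly_cost: float  # fully loaded hourly cost of a reviewer
    fraud_loss: float            # expected loss per payout from releasing early (assumption)

def loaded_cost_per_task(c: CostInputs) -> float:
    """Model cost plus supervision overhead plus accepted loss, per payout."""
    supervision = c.escalation_rate * (c.review_minutes / 60) * c.reviewer_hourly_cost
    return c.model_cost + supervision + c.fraud_loss

def net_cost_vs_batch(c: CostInputs, retention_value: float, batch_cost: float) -> float:
    """Positive means the agent costs more per payout than the batch process it replaces.

    retention_value is the assumed per-payout gain from paying legitimate sellers
    instantly; batch_cost is the per-payout cost of the existing three-day process.
    """
    return (loaded_cost_per_task(c) - retention_value) - batch_cost

# Illustrative numbers only.
inputs = CostInputs(model_cost=0.02, escalation_rate=0.10, review_minutes=6,
                    reviewer_hourly_cost=60.0, fraud_loss=0.05)
print(round(loaded_cost_per_task(inputs), 2))  # 0.67
print(round(net_cost_vs_batch(inputs, retention_value=0.50, batch_cost=0.10), 2))  # 0.07
```

In this made-up example the reviewer minutes, not the token price, dominate the loaded cost, and the agent misses break-even by seven cents a payout. That is the kind of result the room should see on paper before a prototype argues the other way.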

For the instant-payouts agent, the Human Brief is where someone writes the sentence that the two-word request was hiding: we will release funds before settlement clears, accepting some fraud loss, because the retention gain on legitimate sellers is worth more than the loss, up to a stated amount, and a named person owns that call. That is not a technical statement. It is a business decision, load-bearing from the first line of the agent’s code, and it belongs to the room, not the architecture.

The part teams skip most reliably is accountability: who answers for what the agent does to people who never touch it. An agent that releases a payout to a fraud ring has done something to the company’s balance sheet and possibly to a defrauded buyer. An agent that holds a legitimate seller’s funds over a holiday weekend has done something to a small business that needed the cash. Someone owns each of those outcomes in advance, before the incident makes the decision for everyone at two in the morning. The Human Brief is where that owner is named. This is a product decision with a name attached, not a compliance function, and the difference shows the night something breaks.

The Human Brief is the artifact the room argues with. A room that cannot argue with it has not made the decisions that determine whether the product is buildable, trustworthy, and worth building at all. Which means the Human Brief is won or lost on the quality of the argument, and the argument is where most of these documents are thin, because the author confused stating a goal with making a case.

The Executable Brief: written for the build

The second document is the Executable Brief, and its job is precision.

It is the successor to the epic. It structures the experience, the behavior, and the governance in explicit enough form that a generation tool, and after it the engineers, take the path you intended rather than the shortest path through whatever ambiguity you left lying around. The mature pattern, and the one worth adopting, is to derive the Executable Brief from the Human Brief rather than write it cold. The Human Brief is the source of intent. The Executable Brief is what the build consumes. A model can assist with the derivation; you own it.

The Executable Brief is read twice, by two readers who need different things from the same words. First by the coding agent that generates the prototype, then by the engineers who build the production system. The requirement “the agent must not release funds outside the authorized scope” becomes, in a prototype, an instruction the agent follows as far as it cooperates. In production it becomes an architectural control, a release-authorization layer that the agent cannot route around whether it cooperates or not. One requirement, two implementations, hardening from a prompt into a structure as the product moves from bet to build. The PM who writes the requirement clearly enough that both readers can act on it has done the job. The PM who writes it as a wish has shipped a prototype behavior and called it a production control.
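In the prototype the requirement is a sentence in the prompt. A minimal sketch of the structural version, assuming a hypothetical `send_payout` banking client and a scope made of an allowed-seller set and a per-release cap; none of these names come from a real payments API. The point is placement: the check runs in code the agent must call through, so no rationale the agent produces can widen it. A real deployment would put this boundary in a separate service with its own credentials, since a function in the same process enforces nothing against code that can reach the bank client directly.

```python
from dataclasses import dataclass
from typing import Callable

class ReleaseNotAuthorized(Exception):
    """Raised when a proposed release falls outside the authorized scope."""

@dataclass(frozen=True)
class ReleaseRequest:
    seller_id: str
    amount_cents: int

@dataclass(frozen=True)
class AuthorizedScope:
    allowed_sellers: frozenset   # sellers cleared for automatic release
    max_amount_cents: int        # per-release cap set by policy, not by the agent

def authorize(req: ReleaseRequest, scope: AuthorizedScope) -> None:
    """Check the request against the scope; the agent's rationale is never an input."""
    if req.seller_id not in scope.allowed_sellers:
        raise ReleaseNotAuthorized(f"seller {req.seller_id} is outside the authorized scope")
    if req.amount_cents > scope.max_amount_cents:
        raise ReleaseNotAuthorized(f"{req.amount_cents} cents exceeds the per-release cap")

def release_funds(req: ReleaseRequest, scope: AuthorizedScope,
                  send_payout: Callable[[ReleaseRequest], str]) -> str:
    """The only release path exposed to the agent; the banking call sits behind the check."""
    authorize(req, scope)
    return send_payout(req)

# Usage with a fake client standing in for a real payments API.
sent = []
def fake_send_payout(req: ReleaseRequest) -> str:
    sent.append(req)
    return "submitted"

scope = AuthorizedScope(allowed_sellers=frozenset({"seller-1"}), max_amount_cents=50_000)
print(release_funds(ReleaseRequest("seller-1", 20_000), scope, fake_send_payout))  # submitted
try:
    release_funds(ReleaseRequest("seller-1", 90_000), scope, fake_send_payout)
except ReleaseNotAuthorized as err:
    print("blocked:", err)  # blocked: 90000 cents exceeds the per-release cap
print(len(sent))  # 1
```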

The Executable Brief carries four things the user story structurally cannot, and they are the four that decide whether the agent is trustworthy. The outcome the agent is trying to achieve, stated as a reasonable senior operator would state it. The bounds, which are the supervisory layer made buildable: what the agent may do alone, and where it must route to a human. The eval set, a curated collection of real cases each paired with the outcome a senior person actually endorsed, including the generous exception and the justified denial and the ambiguous escalation, so the agent is graded against judgment instead of a synthetic happy path. And what acceptable failure looks like, stated in advance, because some failures are tolerable and some are never-ship, and the document has to say which is which.

That fourth field is the one practitioners find hardest and skip most. For the payouts agent, a wrong hold, routing a clean payout to a human reviewer who did not strictly need to see it, is tolerable and expected. It costs a seller some patience and a reviewer some minutes. A wrong release on a fraud-pattern account is a never-ship failure; one occurrence fails the eval. The user story has nowhere to put that distinction. The outcome-centric spec is built around it. And notice the seam: the escalation rule in the Executable Brief is a direct order to the supervisory layer, because every case the agent declines to decide is a case someone must be ready to catch.
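The two failure classes translate directly into a gate. A minimal sketch, assuming each eval case records the decision a senior reviewer endorsed, the agent’s decision, and whether the account was labeled fraud-pattern; the field names, the cases, and the target rate are hypothetical, and “hold” here means routing the payout to a human.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalCase:
    case_id: str
    endorsed: str        # what the senior reviewer decided: "release" or "hold"
    agent: str           # what the agent decided: "release" or "hold"
    fraud_pattern: bool  # reviewer labeled the account as fraud-pattern

def gate(cases: list, max_wrong_hold_rate: float) -> tuple:
    """Return (passed, reason) for one eval run."""
    # Never-ship failure: one release on a fraud-pattern account fails the gate outright.
    never_ship = [c.case_id for c in cases if c.fraud_pattern and c.agent == "release"]
    if never_ship:
        return False, f"never-ship release on {never_ship}"
    # Tolerable failure: holding a payout the reviewer endorsed releasing, judged as a rate.
    clean = [c for c in cases if c.endorsed == "release"]
    wrong_holds = sum(c.agent == "hold" for c in clean)
    rate = wrong_holds / len(clean) if clean else 0.0
    if rate > max_wrong_hold_rate:
        return False, f"wrong-hold rate {rate:.0%} exceeds target {max_wrong_hold_rate:.0%}"
    return True, f"pass, wrong-hold rate {rate:.0%}"

def review_queue(cases: list) -> list:
    """Every payout the agent held is one the supervisory layer must be staffed to catch."""
    return [c.case_id for c in cases if c.agent == "hold"]

# Illustrative cases; not drawn from any real eval set.
cases = [
    EvalCase("c1", "release", "release", False),
    EvalCase("c2", "release", "hold", False),   # wrong hold: tolerable
    EvalCase("c3", "hold", "hold", True),
    EvalCase("c4", "release", "release", False),
]
print(gate(cases, max_wrong_hold_rate=0.40))  # (True, 'pass, wrong-hold rate 33%')
print(review_queue(cases))                    # ['c2', 'c3']
leak = cases + [EvalCase("c5", "hold", "release", True)]
print(gate(leak, max_wrong_hold_rate=0.40))   # (False, "never-ship release on ['c5']")
```

Note the asymmetry: the never-ship check runs first and has no rate, while wrong holds are averaged against a target. The `review_queue` list is the supervisory layer’s side of the seam.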

The grading ladder

Here is the same feature, instant payouts, written at three levels of craft, with a sketch of what the agent does in each case. The agent behaviors here are illustrative, the kind of thing these systems do, not measured results from a real deployment. The point is to let you feel the difference precision makes, because the difference is not stylistic. It is the difference between a product and an incident.

Rung one: the weak brief

The weak brief reads: build an agent that issues instant payouts to sellers as soon as a sale completes, while protecting the business from fraud. Make the experience fast and seamless. Flag anything suspicious for review.

Every word of that is true and none of it is buildable. “As soon as a sale completes” does not say what “completes” means; the agent will pick a definition, and it will pick the one that makes the demo look good, which is the earliest possible signal. “Protecting the business from fraud” is a value, not a boundary. “Make the experience fast and seamless” tells the agent that speed is the goal it will be graded on, so when speed and caution conflict, and they conflict constantly, the agent resolves toward speed. “Flag anything suspicious” hands the agent the entire risk policy and asks it to invent one on the fly.

Run the agent against this brief and you get a system that releases nearly everything, fast, with a confident-sounding rationale attached to each release. It demos beautifully. The adjectives did exactly what adjectives do: they set a vibe and left every threshold to the model. The first fraud ring to notice finds an agent optimized, by its own instructions, to pay them quickly and seamlessly. The brief did not fail to anticipate the fraud case. It instructed the agent to lose it.

Rung two: the better brief

The better brief adds structure. It says: release payouts automatically for sellers in good standing once the buyer’s payment has cleared settlement. Hold payouts for new accounts, unusually large amounts, or accounts with recent disputes, and route those to a human reviewer. Optimize for fast release of legitimate payouts while keeping fraud loss within acceptable limits.

This is a real improvement. It names categories the agent must treat differently. It moves the completion signal from “sale completes” to “payment has cleared settlement,” which is a sharper line. It names the escalation path. An agent built from this brief behaves far more sensibly than the first: it holds the obvious cases, releases the obvious cases, and escalates a reasonable middle.

It still leaves the agent three decisions you did not make. What counts as “good standing.” What counts as “unusually large.” What “acceptable limits” means as a number. The agent will resolve each of these, because it has to, and it will resolve them by inference from whatever examples and context it has, which means the boundary of your fraud policy is being set by the model’s guess rather than your decision. Two engineers reading this brief build two different products. The agent passes a casual review and then drifts, case by case, into a risk posture nobody chose. The better brief named the right kinds of cases to escalate. It just never said where the lines are, so the lines moved.
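“Two engineers reading this brief build two different products” is not a figure of speech. A minimal sketch, with both readings of “unusually large” invented for illustration (a hypothetical fixed ceiling versus a hypothetical multiple of the seller’s own median), shows them disagreeing in both directions on the same brief:

```python
from statistics import median
from typing import List

def unusually_large_engineer_a(amount: float, history: List[float]) -> bool:
    """Reading A: one fixed ceiling for every seller (hypothetical $5,000)."""
    return amount > 5_000

def unusually_large_engineer_b(amount: float, history: List[float]) -> bool:
    """Reading B: relative to this seller's own history (hypothetical 3x median)."""
    return amount > 3 * median(history)

# Illustrative sellers, not real data.
small_seller = [150.0, 200.0, 180.0, 220.0, 170.0]   # median 180
large_seller = [6_000.0, 7_500.0, 7_000.0]            # median 7,000

for label, amount, history in [("small seller, $1,200", 1_200.0, small_seller),
                               ("large seller, $8,000", 8_000.0, large_seller)]:
    a = unusually_large_engineer_a(amount, history)
    b = unusually_large_engineer_b(amount, history)
    print(f"{label}: A holds={a}, B holds={b}")
# small seller, $1,200: A holds=False, B holds=True
# large seller, $8,000: A holds=True, B holds=False
```

Both readings are defensible, and neither is the policy anyone decided on; the brief simply never said.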

Rung three: the precise brief

The precise brief states the outcome, the bounds, the eval set, and the acceptable failure, and it states them as decisions rather than adjectives.

The outcome: resolve each payout as a senior risk manager would endorse on review, releasing funds quickly for sellers whose history and current signals support it, and holding for human judgment any payout where the cost of a wrong release exceeds the cost of a brief delay. The bounds: release automatically when the seller account is older than ninety days, the payout is no more than a stated multiple of the seller’s trailing median payout, settlement has posted, and no dispute or chargeback signal has fired within a stated window; route to a human in every other case. The eval set: a few hundred real historical payouts, each labeled with what a senior risk reviewer actually decided, including the legitimate large payout that should have been released, the small payout from a fraud-pattern account that should have been held, and the genuine edge case the reviewer themselves found hard. The acceptable failure: holding a clean payout is tolerable and the target rate for it is stated; releasing on a fraud-pattern account is never-ship and a single occurrence in the eval set fails the gate.
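Written as decisions, the bounds translate almost line for line into code anyone in the room can read. A minimal sketch, assuming ages are counted in calendar days and treating a seller with no payout history as outside the bounds, which the brief’s “every other case” rule implies. The brief leaves the multiple and the window as stated decisions, so the policy takes them as parameters; the values in the example are placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Policy:
    min_account_age_days: int   # ninety, per the brief
    median_multiple: float      # the brief's "stated multiple", chosen by the room
    dispute_window_days: int    # the brief's "stated window", chosen by the room

@dataclass(frozen=True)
class Payout:
    amount: float
    account_opened: date
    trailing_amounts: Tuple[float, ...]        # seller's recent payout amounts
    settlement_posted: bool
    last_dispute_or_chargeback: Optional[date]  # None if no signal has ever fired

def decide(p: Payout, policy: Policy, today: date) -> Tuple[str, List[str]]:
    """Release only when every bound holds; otherwise route to a human and say which failed."""
    failed = []
    if (today - p.account_opened).days <= policy.min_account_age_days:
        failed.append("account not older than minimum age")
    if not p.trailing_amounts or p.amount > policy.median_multiple * median(p.trailing_amounts):
        failed.append("amount above stated multiple of trailing median")
    if not p.settlement_posted:
        failed.append("settlement not posted")
    last = p.last_dispute_or_chargeback
    if last is not None and (today - last).days <= policy.dispute_window_days:
        failed.append("dispute or chargeback inside window")
    return ("route_to_human", failed) if failed else ("release", [])

# Placeholder policy values: the multiple and the window are the room's decisions.
policy = Policy(min_account_age_days=90, median_multiple=3.0, dispute_window_days=60)
history = (150.0, 200.0, 180.0, 220.0, 170.0)   # illustrative; median is 180
today = date(2024, 6, 1)
established = Payout(400.0, date(2023, 1, 15), history, True, None)
brand_new = Payout(400.0, date(2024, 5, 20), history, True, None)
print(decide(established, policy, today))  # ('release', [])
print(decide(brand_new, policy, today))    # ('route_to_human', ['account not older than minimum age'])
```

Returning the failed conditions alongside the decision keeps the brief interrogable in production: a reviewer who sees a routed payout also sees which line it crossed.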

The agent built from this brief does not need to guess your risk policy, because you wrote it. It releases inside a boundary you set, holds at a line you drew, and gets graded against decisions a real reviewer made. When it fails, it fails in the direction you chose to tolerate. And the brief is now interrogable: anyone in the room can argue with the ninety-day threshold or the median multiple, because those are visible decisions rather than the model’s private inferences. The precise brief did not remove judgment from the agent. It moved the judgment back into the room, where it belongs, and left the agent only the execution.

Read the three rungs in sequence and the lesson is mechanical, not stylistic. The weak brief used adjectives where thresholds belonged and the agent supplied the thresholds. The better brief named the right categories but not their edges and the agent supplied the edges. The precise brief supplied the edges and the eval set that proves them, and left the agent with no policy decisions to make on your behalf. Precision is not better prose. It is the count of decisions you made instead of the agent making them for you.
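Counting decisions is mechanical enough to sketch. A minimal, hypothetical rubric: list the decision points the feature forces, record the brief’s answer to each, or `None` where the brief left it to the agent, and count. The decision points and answers below are taken from the better and precise briefs in this chapter; the scoring function itself is an illustration, not a standard.

```python
from typing import Dict, List, Optional, Tuple

def decision_score(brief: Dict[str, Optional[str]]) -> Tuple[int, int, List[str]]:
    """Return (decisions made, decision points, points left to the agent)."""
    delegated = [point for point, answer in brief.items() if answer is None]
    return len(brief) - len(delegated), len(brief), delegated

better_brief = {
    "completion signal": "buyer's payment has cleared settlement",
    "escalation path": "route held payouts to a human reviewer",
    "good standing": None,
    "unusually large": None,
    "acceptable fraud loss": None,
}
precise_brief = {
    "completion signal": "settlement has posted",
    "escalation path": "route to a human in every other case",
    "good standing": "account older than ninety days, no dispute or chargeback in the stated window",
    "unusually large": "above the stated multiple of the trailing median",
    "acceptable fraud loss": "one release on a fraud-pattern account fails the gate",
}
print(decision_score(better_brief))   # (2, 5, ['good standing', 'unusually large', 'acceptable fraud loss'])
print(decision_score(precise_brief))  # (5, 5, [])
```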

The practice loop: the agent is the teacher

Here is the part the spine cares about most, because it is what turns brief-writing from a talent into a trainable skill.

Precision under ambiguity feels like an innate gift, the way some PMs just write tighter specs. It is not a gift. It is a feedback loop, and until recently you could not run it, because the gap between what you wrote and what got built took weeks and passed through engineers who quietly corrected your ambiguities without telling you which ones they were. You never saw your own imprecision. The engineers absorbed it before anything was built.

The agent does not absorb it. The agent does exactly what you said, including the parts you did not mean, and it does them in minutes. That makes it the fastest brief-writing teacher ever built, because it closes the loop between intent and behavior to the length of a single run.

The loop is three moves. Write the brief. Run the agent against a handful of cases, including the hard ones. Diff your intent against its behavior, and every divergence is a sentence in your brief that was less precise than you thought. The agent released a payout you would have held; somewhere your bounds had a gap, and now you can see exactly where. The agent escalated something obvious; your escalation rule was too broad, and the diff shows you the word that did it. You are not debugging the agent. You are debugging your own writing, against a reader that has no charity, no shared context to fall back on, and no ability to guess what you meant. Run that loop a dozen times on real briefs and your first drafts get tighter, not because you learned a template but because you have felt, repeatedly, the specific ways your language leaks.
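The diff step is simple enough to keep as a habit. A minimal sketch, assuming you log each case as your intended decision next to the agent’s observed one, with “escalate” meaning a hold routed to a human; the case names are hypothetical, and in practice the observed column comes from actually running the agent:

```python
from typing import List, Tuple

def diff_intent(runs: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Compare intended and observed decisions; each divergence points at a brief sentence.

    Each run is (case_id, intended, observed), with decisions "release" or "escalate".
    """
    findings = []
    for case_id, intended, observed in runs:
        if intended == observed:
            continue
        if observed == "release":
            findings.append((case_id, "bounds gap: released a payout you would have held"))
        else:
            findings.append((case_id, "escalation rule too broad: escalated an obvious release"))
    return findings

# Hypothetical run log.
runs = [
    ("new-account-large", "escalate", "release"),
    ("established-routine", "release", "escalate"),
    ("established-normal", "release", "release"),
]
for case_id, finding in diff_intent(runs):
    print(case_id, "->", finding)
```

Each finding names the kind of gap, not the fix; finding the sentence in the brief that caused it is still your job, and that search is where the learning happens.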

This is the active-learning loop from earlier in the book, pointed at the brief. The agent drafts the behavior, you own the spine and the diff, and the cognitive work of running the comparison is the education. The brief you can write after fifty of these is not a better-worded version of the brief you wrote before. It is the brief of someone who has watched, over and over, what their own ambiguity does when an agent takes it literally.

Common failures, named so you can catch them

Two patterns account for most weak briefs, and both are visible once you know to look.

The first is adjectives where thresholds belong. “Fast,” “seamless,” “suspicious,” “reasonable,” “high-value.” Each one feels like a requirement and is actually a delegation; you have handed the agent the number and asked it to pick one. The fix is not to delete the adjectives but to ask, of each one, what number or rule would let an engineer build it without guessing, and then write that instead. If you cannot name the number, you have found a decision the room has not made, which is more valuable to surface than to paper over.
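A crude lint can catch the first pattern before a reviewer does. A minimal sketch, assuming a hand-maintained list of vague terms (the ones this chapter names) and treating any digit in the same sentence as a sign that a threshold may be present. That heuristic is deliberately weak, since it cannot tell a real threshold from an unrelated number; it only points at sentences worth asking the number question about.

```python
import re
from typing import List, Tuple

# Vague terms this chapter calls out; extend the list for your own domain.
VAGUE_TERMS = ["fast", "seamless", "suspicious", "reasonable", "high-value",
               "good standing", "unusually large", "acceptable"]
PATTERN = re.compile(r"\b(" + "|".join(re.escape(t) for t in VAGUE_TERMS) + r")\b",
                     re.IGNORECASE)

def delegated_adjectives(brief: str) -> List[Tuple[str, str]]:
    """Flag vague terms in sentences that contain no number to pin them down."""
    flags = []
    for sentence in re.split(r"(?<=[.!?])\s+", brief.strip()):
        if re.search(r"\d", sentence):
            continue  # some number is present; a human still has to check it is the right one
        for match in PATTERN.finditer(sentence):
            flags.append((match.group(1).lower(), sentence))
    return flags

weak = ("Build an agent that issues instant payouts to sellers as soon as a sale completes, "
        "while protecting the business from fraud. Make the experience fast and seamless. "
        "Flag anything suspicious for review.")
print([term for term, _ in delegated_adjectives(weak)])  # ['fast', 'seamless', 'suspicious']
```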

The second is journey language where boundary language belongs. The PRD habit is to describe the happy path as a flow: the seller requests, the agent verifies, the funds release, the seller is delighted. That language is built for a system that walks one path. An agent does not walk a path; it sits at a boundary and decides which side each case falls on. So the question that produces a good Executable Brief is never “what is the journey,” it is “where is the line, and what happens on each side of it.” When you catch yourself writing a sequence of steps, stop and ask what decision the agent is actually making at each one, and write the decision and its boundary instead of the step. The journey map describes a user who no longer takes the journey, because the agent took it for them; a later chapter on your toolbox will have more to say about that, but the brief is where the habit either breaks or persists.

What joins the record

The two-brief template pair, with the grading rubric that scores a draft on how many decisions it made versus delegated, is the artifact this chapter leaves you. It goes into the record you have been building since Part II, alongside the proficiency log, and a later chapter will show you why the accumulating set is the credential.

The Executable Brief ends in an eval set. That is the brief’s teeth: the cases that prove the agent does what the brief said, including the ones that define what it is allowed to get wrong. Which means the brief hands off directly to a document you will be asked to stand behind, the green checkmark at the gate. A PM who can write a precise brief but cannot read the eval that brief produced is signing a gate they cannot see. That gate is the next chapter, and it is the one document of your job nobody ever taught you to read.
