Part I · Foundations · Chapter 1

What You Are Actually Building

A team shipped an agent that booked travel for the company. It read an employee’s request, found the flights, compared them against policy, and booked the one that fit, with no human in the middle for the ordinary case. It worked. For about six weeks it was the most popular thing the team had ever built, and then it booked a non-refundable international fare for a trip that had been cancelled an hour earlier, because the cancellation lived in a calendar the agent could not see, and nobody on the team could say whose job it had been to notice that the agent could not see it. Not the engineer, who had built exactly what the spec described. Not the product manager, who had defined a clear and reasonable goal. Not the designer, who had made a clean confirmation screen for a confirmation that, in this case, no human ever saw. The failure did not belong to any of them, which is another way of saying it belonged to the shape of the team itself.

This book is about that shape, and the chapters ahead take it apart role by role. But you cannot take apart a team built for agentic products without first being precise about what an agentic product is and what it demands, because almost everything that is hard about the team follows from properties of the product that did not exist a few years ago. So this first part builds the floor. If you have spent time shipping agents you will know some of this, though I am going to put it together in a way aimed at the team rather than the individual, which changes what matters. If you are starting from the beginning, start here and you will have everything the rest of the book leans on. I am not going to assume you have read anything else. Three things make an agentic product the thing it is: it decides, it holds some amount of autonomy, and it needs a second product nobody remembers to build. Take them in order.

A system that decides

Ask an engineer what they built and they will point at a thing. For thirty years that thing did what it was told. You wrote rules, the software followed them, and if it did something you did not expect, that was a bug, a place where the rules you wrote did not match the rules you meant. The whole craft of building software, and the whole shape of the team that built it, rested on an assumption so deep nobody stated it: that the system would do what it was instructed, and the work was getting the instructions right.

An agent does not run your rules. It reasons toward a goal you give it and decides the steps itself. You tell it to resolve the customer’s billing complaint, and it decides to pull the account, read the last three invoices, notice the double charge, issue the refund, and write the apology, or it decides to escalate, or it decides to ask a question, and which of those it does is not in any rule you wrote. It chose. That is the line between an agent and everything that came before it. A tool does what you invoke and stops. An agent decides what to do next, takes the action, sees what happened, and continues, in a loop, toward the goal, without you in the middle for each move.
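A minimal sketch may make the difference between a tool and an agent concrete. Everything named here is invented for illustration: run_tool, run_agent, the toy policy standing in for the model's reasoning, and the step budget are all assumptions, not a description of any real framework. The point is only the shape: the tool runs once and stops, and the agent loops through decide, act, and observe until it decides it is done.

```python
from typing import Callable

# Hypothetical types for illustration: a policy picks the next action from
# the goal and what has been observed so far; act carries the action out.
Policy = Callable[[str, list[str]], str]
Act = Callable[[str], str]


def run_tool(act: Act, action: str) -> str:
    """A tool: does the one thing it was invoked to do, then stops."""
    return act(action)


def run_agent(goal: str, decide: Policy, act: Act, max_steps: int = 10) -> list[str]:
    """An agent: decide, act, observe, repeat, until it decides it is done."""
    observations: list[str] = []
    for _ in range(max_steps):  # a step budget, so the loop always ends
        action = decide(goal, observations)
        if action == "done":
            break
        observations.append(act(action))  # what happened feeds the next decision
    return observations


def toy_decide(goal: str, seen: list[str]) -> str:
    """Toy stand-in for the model: the next step depends on what was observed."""
    if not seen:
        return "pull_account"
    if seen[-1] == "pull_account: double charge found":
        return "issue_refund"
    return "done"


def toy_act(action: str) -> str:
    """Toy stand-in for the outside world: returns the result of each action."""
    results = {
        "pull_account": "pull_account: double charge found",
        "issue_refund": "issue_refund: refunded",
    }
    return results[action]


trace = run_agent("resolve the billing complaint", toy_decide, toy_act)
```

Nothing in run_agent lists the steps in advance. The sequence in trace exists only because of what the toy policy saw along the way, which is the property the rest of this chapter is about.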

Hold the two words apart, because the team will need them precise. An agent is the thing, the system that decides and acts. Agentic is the property that makes it one, and it comes in degrees. A product is agentic to the degree that it acts on its own between the points where a human checks it. A spell-checker that suggests a correction is barely agentic; you approve every change. The travel system was very agentic; it acted, often, with no one watching the individual act. Most products live somewhere between, and where they live is a decision someone has to make on purpose.

What this does to a team is the long subject of the book, but here is the short of it. When the system did what it was told, the team’s job was to tell it the right things, and the system’s behavior was the sum of the instructions, so getting the instructions right got the behavior right, and the work lived upstream, in the deciding and the specifying and the building. When the system decides for itself, the instructions no longer determine the behavior. They bound it, they shape it, they make some choices likelier than others, but the agent will encounter situations no one specified and choose anyway, the way the travel agent chose to book a fare for a trip cancelled in a calendar it could not read. The behavior is now something that happens at runtime, in the world, in situations the team never saw, and a whole category of the team’s responsibility moved downstream with it, into watching what the thing does. There was no one whose job that was, because until recently there was no such job. And the watching never ends, because the agent does not hold still. The model underneath it gets updated by a vendor on a Saturday, the data it reads goes stale, the world it operates in shifts, and its behavior drifts with all of them. The agent you launch is not the agent you will have in six months, which means the team is not building a thing and walking away. It is taking responsibility for a thing that keeps changing after it ships, which is a kind of ownership software teams have rarely had to hold.

One more precision, and it is the one this book is named for. I have been saying “the agent,” singular, the way the field does, and it is a useful simplification for learning the ideas. It is also not how real systems are built. A production agentic product is usually several agents working as a group: an orchestrator that holds the goal and parcels out the work, and specialists that each do one part and hand back the result. When I say “the agent” in the chapters ahead, read it as “each agent, and the team of them,” because the moment a product is a team of agents, the question of who supervises which one, and who owns the seams between them, stops being a metaphor for the human team and becomes a literal second copy of the same problem. The team builds a team.

How much rope

If an agent acts on its own between the points where a human checks it, then the whole game is in how far apart those points are, and “should this be an agent” turns out to be the wrong question, because it has a yes-or-no shape and the real answer is a number. An agent is not autonomous or not. It holds some amount of authority, somewhere on a scale, and the team’s job is to decide how much, to design for that amount, and to know what would justify giving it more.

The scale has five rungs. At the bottom the agent suggests: it proposes, and a human does everything else, deciding, acting, and committing. One rung up it drafts, producing the whole artifact, the email, the plan, the code, while a human reviews it and commits. Up again it acts with approval, taking the action itself but only after a human signs off on each one. The fourth rung is acts with oversight, where it acts on its own and a human watches in aggregate, not approving each move but able to see the pattern and step in. At the top it acts autonomously, with no human in the loop for the individual decision at all. The travel agent that booked the fare was on the fourth rung sliding toward the fifth, acting on its own with a supervisor who, as it happened, was watching nothing in particular.

The five rungs matter less than the one line that runs between two of them, and if you remember nothing else from the ladder, remember where that line falls. It falls between the third rung and the fourth, between acts with approval and acts with oversight, and it is the line where a human stops authorizing each action before it happens and starts seeing actions after they have already happened. Below the line, every consequential act passes through a person first, so a human error-check sits in the path by default and nothing irreversible occurs without someone having said yes to that specific thing. Above the line, the act occurs and the human finds out, whether a second later or a week later, which means the error-check is no longer in the path and has to be rebuilt on purpose somewhere else or it simply is not there. That is the line this whole book is organized around: a product below it is supervised by the act of approving, which the team already knows how to do, and a product above it needs the second product this chapter is about, because the approving that used to catch the errors is gone. Find your own product on the ladder, then ask the only question that matters: is it above the line or below it. If it is above, the rest of this book is about the half of it you have probably not built.
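The ladder and its one line are small enough to write down. What follows is a sketch, not a standard: the names Rung, human_approves_before_each_act, and needs_built_supervision are my own shorthand for the five rungs and the line between the third and the fourth.

```python
from enum import IntEnum


class Rung(IntEnum):
    """The five rungs of the autonomy ladder, lowest authority first."""
    SUGGESTS = 1
    DRAFTS = 2
    ACTS_WITH_APPROVAL = 3
    ACTS_WITH_OVERSIGHT = 4
    ACTS_AUTONOMOUSLY = 5


def human_approves_before_each_act(rung: Rung) -> bool:
    """Below the line: a person says yes before each consequential act."""
    return rung <= Rung.ACTS_WITH_APPROVAL


def needs_built_supervision(rung: Rung) -> bool:
    """Above the line: the error-check has left the path and must be rebuilt."""
    return not human_approves_before_each_act(rung)


above_the_line = [r.name for r in Rung if needs_built_supervision(r)]
```

The function that matters is the second one. It returns a yes or a no even though the ladder has five positions, because the question it answers is the one the line answers.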

The rungs are easy. The rule that governs them is the part teams get wrong, and it is worth stating flatly: you do not earn a higher rung by scheduling it. You earn it by demonstrating, at the rung you are on, that the system behaves well enough to deserve more rope. Most agentic failures come from products placed two rungs above what they had earned, an agent given the authority to act autonomously when it had not yet proven it could act with oversight, because a roadmap said the autonomous version shipped in the third quarter and the third quarter arrived. The competence did not arrive with the calendar. The autonomy did anyway. One can read a telehealth prescribing pilot, run under a regulatory sandbox, as a product whose authority advanced on a timetable rather than on evidence that the supervision could keep up, which is the same mistake in a setting where the cost of it is measured in people. The opposite move is also real and also healthy: a company that had pushed an agent to a high rung and found the failures unacceptable pulled it back down, put humans back in the loop, and treated that not as a retreat but as discovering, in production, the rung the thing had actually earned.
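The rule can be put as a sketch, with the caveat that every number in it is a placeholder. The fields of RungEvidence, the thresholds min_actions and max_error_rate, and the function names are assumptions made for illustration; a real team would choose its own evidence and its own bar. What the sketch is meant to show is which inputs the decision is allowed to read.

```python
from dataclasses import dataclass


@dataclass
class RungEvidence:
    """What the system has demonstrated at its current rung (hypothetical fields)."""
    actions_observed: int
    serious_errors: int


def may_climb(evidence: RungEvidence, roadmap_says_ship: bool,
              min_actions: int = 1000, max_error_rate: float = 0.001) -> bool:
    """Promotion depends on demonstrated behavior at the current rung.

    The roadmap flag is accepted and deliberately ignored, to make the
    point that the calendar is not evidence.
    """
    del roadmap_says_ship
    if evidence.actions_observed < max(min_actions, 1):
        return False  # not enough track record at this rung yet
    error_rate = evidence.serious_errors / evidence.actions_observed
    return error_rate <= max_error_rate


def should_step_down(evidence: RungEvidence, max_error_rate: float = 0.001) -> bool:
    """The healthy opposite move: pull back when production shows the rung was not earned."""
    if evidence.actions_observed == 0:
        return False
    return evidence.serious_errors / evidence.actions_observed > max_error_rate


earned = may_climb(RungEvidence(actions_observed=5000, serious_errors=2), roadmap_says_ship=False)
scheduled = may_climb(RungEvidence(actions_observed=200, serious_errors=0), roadmap_says_ship=True)
```

Here earned is true and scheduled is false, and the only difference between them that the function can see is the track record.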

Now place the ladder on a team, because the climb is not just a product decision, it is an organizational one, and the two get confused. Every rung upward removes a human from the loop, and you had better have designed for what that human was catching before you took them out. On the drafting rung, a person reads every draft, so a person catches the agent’s errors as a side effect of doing their job; the catching is free, baked into a step that exists anyway. On the oversight rung, no one reads each action, so whatever the reviewer was quietly catching, the malformed output, the edge case, the thing that smells wrong, is now uncaught unless someone designed a different mechanism to catch it and staffed a person to run that mechanism. Climbing the ladder is not a setting you change. It is a transfer of responsibility from a human who was catching errors by hand to a system that has to catch them by design, and to a different human who has to watch that system. The team that climbs without building the catching, and naming who owns it, has not made the agent more autonomous. It has made the failures invisible until they are incidents.

There is a trap waiting for the human you leave on watch, and it is the reason the team has to be built the way the rest of these pages argue. A human supervising a reliable automated system gets worse at supervising it, precisely because it is reliable. This is not a failure of character or training. It is a documented and decades-old finding from aviation and process control, the irony of automation: the better the machine, the less the human practices the judgment they are there to provide, and the less ready they are on the rare occasion the machine needs them. A supervisor who watches an agent be right nine hundred times in a row is not being trained into vigilance by those nine hundred successes. They are being trained out of it. Their attention drifts, their skepticism dulls, and the muscle they would need to catch the nine-hundred-and-first case, the one that matters, has quietly atrophied from disuse. The agent’s reliability is the thing eroding its own supervision.

Sit with what that does to the idea of a single owner. If supervising an agent erodes the supervisor, then supervision cannot be a thing one person holds steadily over time, because the person holding it is being degraded by the holding. The watcher needs watching. The supervisor’s own competence has to be measured, refreshed, and rotated by someone whose job that is, and the design that decides when the agent may act has to assume a human whose attention is decaying rather than a human who is constantly sharp. None of that is a single role’s work, and none of it is the product manager’s gift to give, because a product manager cannot mandate another person’s practice hours or measure another person’s drift. The reliable agent does not just need a supervisor. It needs a system around the supervisor, and a system is a team. This is why responsibility for an agentic product cannot rest on one seat: the failure modes defeat any single sightline, starting with the sightline of the one person watching.

Every agentic product is two products

That mechanism, the catching that has to exist once the human steps out of the loop, is not a feature you bolt onto the agent. It is a second product, and it is the one teams forget.

Picture a daycare. Someone fills a room with wonderful toys, blocks and paint and a small climbing frame, stocks the shelves with snacks, and walks out to watch television, leaving the toddlers to it. The toys are good. The snacks are good. The room is, by the measure of what is in it, an excellent daycare. It is also about to become a disaster, and not because anything in the room is bad, but because the one thing the room needed was the adult, and the adult is the thing that did not get designed. I will not stretch the metaphor past its use. The toys are the agent’s data and tools, the things it acts with. The adult is the supervision. A room full of capability with no one watching is not a capable room. It is a liability with good production values. The image has a second half worth holding onto, because it is the one the ladder a few pages back was about: the adult does not watch a two-year-old and a twelve-year-old the same way, and the child earns the longer leash by showing, at each age, that it can be trusted with a little more. Supervision is not a fixed thing you switch on. It is a relationship that loosens as the watched party proves itself, which is the autonomy ladder told as a sentence about people, and it is the reason the adult is never simply present or absent but always calibrated to how much the room has earned.

Here is the fact the daycare makes concrete, and it is the single most important idea in this book: every agentic product is two products. The first is the agent itself, the thing that decides and acts, the part everyone means when they say they are building an agent. The second is the layer that supervises it, how a human sees what the agent is doing, intervenes before it does something irreversible, investigates when something goes wrong, and stays accountable for the result. Call them Channel 1 and Channel 2. Channel 1 is the agent. Channel 2 is the supervisory layer. They are both products. They both need to be designed and built and staffed. And the entire argument of this book can be compressed into one sentence about them: every team knows how to build Channel 1, and Channel 2 is the one they forget.

This is also the line that separates two things people now say in the same words, and mistake for each other to their cost. There is using AI to build a conventional product faster, a feature, a dashboard, the next release of something familiar, where the agent is a tool in the workshop and the thing you ship does what it is told, every time. And there is building a product that is itself an agent, that decides and acts on its own once it leaves your hands. The first kind has no Channel 2, because there is nothing to supervise after it ships; a dashboard does not act. It is one product, and the change AI brings to it is a change to how fast and how leanly a team can make it. The second kind is two products, and the second of them is the supervisory layer this book is about. The same team can do both, often in the same week, with the same tools, which is exactly why the two get blurred. But they are not the same job, and the team that treats them as the same, that brings the lean make-it-faster posture to a product that has to be watched after it ships, has built half of what it shipped. The rest of this book is about the half that the first kind never needed and the second kind cannot live without.

It is worth being exact about why teams forget it, because the reason is structural and not a failure of attention. Channel 1 is what the demo shows. It is the impressive part, the part that gets the budget approved and the part the engineers find interesting to build. Channel 2 is invisible when everything works, which is most of the time, because a well-behaved agent rarely needs supervising and so the supervision looks like overhead right up until the morning it is the only thing standing between you and the incident. You can build a flawless Channel 1 and ship a liability, and many teams have, because the thing they did not build does not announce its absence until it is too late to build it calmly. There is also a deeper current under this, and it is the reason the supervisory layer is where the money is going to be: when the capability becomes a commodity, when any team can wire up a competent agent from the same models everyone else uses, the agent stops being the thing that differentiates the product. What differentiates it is whether you can trust it, and trust is built in Channel 2. The value moves to the layer that supervises.

Channel 2 is not one thing, and the team needs to see its parts, because each part is a different person’s job and the seams between them are where products fail. It has four dimensions. The first is technical: whether a human can actually see what the agent is doing and stop it in time, the logs and the kill switch and the latency of intervention. The second is organizational: whether the supervisor is a real role with a name and a headcount, or an assumption, a someone who is presumed to be watching but was never actually assigned. The third is regulatory: what oversight the law requires for this particular decision, which is not the same for a refund and a diagnosis and a loan denial. And the fourth is moral, and it is the one with no obvious owner: the person the agent affects who is never in the room. Not the user who operates the product and not the supervisor who watches it, but the patient, the applicant, the candidate, the supplier, the person who absorbs the consequence of the agent’s decision and was never in your analytics and never in your design review and is, very often, the person the product was actually for. When the supervision is thin, they are the one who pays.

Those four dimensions are what Channel 2 has to cover. Where it actually gets built is narrower and more concrete, and it comes down to four surfaces the agent touches at the moment it acts. There is the autonomy boundary, the line between what the agent may do on its own and what it must hand back to a human. There is the approval moment, the point where a consequential action pauses for a person to authorize it, and the question of what that person is shown in order to decide. There is the audit surface, the record of what the agent did and why, durable enough to answer a question asked months later. And there is the recovery workflow, the path back when the agent has done something wrong, the undo, the rollback, the escalation. Every agentic product has all four whether or not anyone designed them, because if you do not decide where the boundary sits, the boundary sits wherever the code happened to put it, and if you do not design the recovery, the recovery is whatever the on-call engineer improvises at two in the morning. These four are not a single person’s work either, and that is the point worth carrying out of this chapter: the boundary is a product decision the architect must then physically enforce, the approval moment is a design problem and a domain-judgment problem at once, the audit surface is built by the platform and specified by whoever will need it in court, and the recovery workflow is owned by whoever runs the thing in production. One agent, four surfaces, and already no single owner.
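The four surfaces can be seen side by side in a very small sketch. This is an illustration under stated assumptions, not a design: SupervisedExecutor, Action, the auto_limit threshold, and the ask_human callback are hypothetical names, the in-memory list stands in for a durable audit store, and the refund amounts are made up.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    name: str
    amount: float
    do: Callable[[], None]    # what the agent wants to carry out
    undo: Callable[[], None]  # recovery workflow: the way back, defined up front


@dataclass
class SupervisedExecutor:
    auto_limit: float                         # autonomy boundary (a product decision)
    ask_human: Callable[[Action, str], bool]  # approval moment: what the person is shown
    audit_log: list[dict] = field(default_factory=list)  # audit surface

    def execute(self, action: Action, reason: str) -> bool:
        """Run the action if it is inside the boundary or a human approves it."""
        needs_human = action.amount > self.auto_limit
        approved = self.ask_human(action, reason) if needs_human else True
        self.audit_log.append({
            "event": "execute", "action": action.name, "amount": action.amount,
            "reason": reason, "needed_human": needs_human, "approved": approved,
        })
        if approved:
            action.do()
        return approved

    def recover(self, action: Action, reason: str) -> None:
        """Undo an action that turned out to be wrong, and record that it was undone."""
        action.undo()
        self.audit_log.append({
            "event": "recover", "action": action.name,
            "amount": action.amount, "reason": reason,
        })


ledger: list[str] = []
# A reviewer who declines everything, so the boundary's effect is easy to see.
gate = SupervisedExecutor(auto_limit=50.0, ask_human=lambda action, reason: False)

small = Action("refund", 20.0,
               do=lambda: ledger.append("refund 20"),
               undo=lambda: ledger.remove("refund 20"))
big = Action("refund", 900.0,
             do=lambda: ledger.append("refund 900"),
             undo=lambda: ledger.remove("refund 900"))

gate.execute(small, reason="double charge")   # inside the boundary: runs on its own
gate.execute(big, reason="goodwill")          # outside it: pauses for a person, who declines
gate.recover(small, reason="charge was valid")
```

Each field in the sketch is a question with a different owner. Someone chose 50.0, someone decided what ask_human shows the reviewer, someone decided what a log entry must contain, and someone wrote the undo. If nobody did, the values are still there, chosen by default.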

A word on the word, before it starts carrying weight, because the title of this book hangs on it. When this book says the team, it means one thing and not the others it could mean: the people who build the agent and the people who supervise it, the product manager and the engineer and the designer and the new owners the agentic product forces into being. Not the agents. When several agents work together later in this book, that is a fleet, never a team. Not the users the agent serves, and not the person on the receiving end of its decisions. And not the supervisors alone, as if watching were a separate department. The team is the humans who own the thing, all of them, and the argument of the book is about how that group has to change. Hold that meaning steady and the rest stays clear; let it blur and the spine of the book blurs with it.

Read all of this as a staffing question, because that is what this book is about and it is where the daycare stops being a metaphor and starts being an org chart. Technical supervision needs someone who owns the observability and the controls. Organizational supervision needs someone whose actual job, on paper, is to watch the running agent, which is a role most companies have not created. Regulatory supervision needs someone who knows what the law demands of this decision. Moral supervision needs someone who holds the interest of the person who is not in the room. These are not all the product manager’s job, and they are not all the engineer’s or the designer’s either. They are, by name, the people this book is about: the architect who decides how the supervision is structured and what gets physically enforced rather than merely requested, the eval owner who can say whether the agent is still correct and not just still running, the agent supervisor or AgentOps whose actual job is to watch the fleet in production, and the domain expert who knows what a wrong answer costs and who pays it. Some of those titles do not exist yet on most teams. The responsibilities exist whether the titles do or not, and on most teams the honest answer to “who owns this one” is, for at least two of them, no one. That is the empty half of every agentic product, the second column of a two-column problem, and the team was built with one column. You are not shipping an agent. You are shipping an agent and the layer that supervises it, and the layer is the part you have not staffed. What you decide to put in it, and whether the thing is even worth building in the first place, is the planning the next chapter is about.

Deciding What to Build, and Whether To