Part II · The Practice · Chapter 5

Chapter 5: The Loop That Teaches You Back

On a Sunday this past winter I ran a full design thinking workshop alone. No flights, no hotel meeting room configured for creativity, no markers in four colors, no sticky notes arranged in optimistic bricks. Nine AI agents, each built to disagree with the others, plus a tenth whose only job was to enforce the process and protect the phase boundaries. Setting up the personas and writing the business challenge took about two hours. Running the session took one more. Total cost in API calls, under ten dollars.

The output was good. The agents took a heart failure readmissions problem I thought I understood, which I had framed as a hospital data platform closing the medication-reconciliation gap, and reframed it out from under me. It was the community physician agent who said it first: when a discharged patient walks into her clinic three days later, she often does not have the discharge medication list, so she is reconciling from the patient’s memory, and what looks like non-adherence is compliance with the wrong instructions. The hospitalist agent had not said it, because that failure happens after her patient leaves. The clinical architect had not said it, because his lens is the infrastructure inside the hospital. The reframe came from the one agent positioned to see the other side of the handoff, and it changed the following week’s product conversation.

That is the output, and the output is not why I am telling you this. The output reframed one product. The session reframed how I work. What I want you to notice is not the answer the nine agents produced. It is what running them did to me.

The faucet and the loop

There is a default way of using AI, and almost everyone is using it. The model is a content faucet. You turn it on, output comes out, you glance at the output, you sign off, you ship. The faucet produces volume and the volume feels like productivity, and the loop, if you can even call it a loop, has no validator role in it except by accident. You are the user of a tool.

Chapter 1 named what that pattern does to the operator and Chapter 3 named why it is hard to escape, so one sentence suffices here: the faucet absorbs the volume that used to build your judgment and asks for verdicts you are no longer equipped to give. It is not a tool that makes you more productive while you stay the same. It is a tool that quietly changes who you are while you watch the output, and the change is in the wrong direction.

There is another way to use the exact same models, and it produces the opposite effect on the operator. Same tools, opposite outcome. Instead of being the customer at the faucet, you are the supervisor at the top of a loop, and the loop is a system you built on purpose, with roles deliberately split so that no single model ever both produces and approves its own work. The faucet hands you an answer. The loop hands you an answer, an argument you had to defend to get it, and a sharper version of yourself for having run it. The output is a byproduct. The learning is the product, and unlike the faucet’s output the learning compounds.

This is the inversion the whole chapter turns on. Most people will tell you AI usage is a spectrum from light to heavy, and that heavier is more advanced. It is not a spectrum. It is two different machines that happen to share a model, one that atrophies the operator and one that educates them. Which machine you are running has nothing to do with how much AI you use and everything to do with whether you are the customer or the supervisor, and you can be the supervisor on a fifteen-minute task or the customer on a three-week one.

The anatomy of a loop

Here is the loop I actually run, the one I built for writing before I understood it was a template for everything. Six roles, and I am one of them.

The argument and the spine are always mine. Before any model touches anything, I write a one- to three-page spine document by hand: the thesis, the argument, the lived experience that grounds it, three or four claims that need evidence, the gaps I need filled. This is the part that cannot be delegated, because it is the part that determines what all the volume is for. A loop with no human-owned spine is a faucet with extra steps.

Then research runs in parallel, for triangulation, not for coverage. I open three deep-research sessions at once, in different models, feed the same questions to each, and read the three reports side by side. The agreement is usually the canonical evidence. The disagreement is usually where the interesting argument lives. The omissions are what I learn to chase next. This is the divergence run from the last chapter, promoted from a calibration exercise into a working step, and it does for an unfamiliar claim what an academic does checking it across multiple databases, except the synthesis happens in twenty minutes instead of two weeks. More sources is not the point. Disagreement is the point, because disagreement is where you are forced to adjudicate, and adjudication is where you learn.
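The side-by-side reading can be made concrete with a small sorting step. What follows is a minimal sketch, not the tooling used for this book. It assumes you have already reduced each report by hand to short claim labels with a stance, which is the hard part. The function name `triangulate`, the report names, and the stance strings are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: each report has been reduced by hand to {claim label: stance}.
Report = Dict[str, str]


def triangulate(
    reports: Dict[str, Report],
) -> Tuple[Set[str], Set[str], Dict[str, List[str]]]:
    """Sort claims into agreement, disagreement, and omissions."""
    all_claims: Set[str] = set()
    for report in reports.values():
        all_claims.update(report)

    agreement: Set[str] = set()
    disagreement: Set[str] = set()
    omissions: Dict[str, List[str]] = {}

    for claim in all_claims:
        stances = {name: r[claim] for name, r in reports.items() if claim in r}
        missing = sorted(name for name in reports if name not in stances)
        if missing:
            omissions[claim] = missing      # reports that never raised it
        if len(set(stances.values())) > 1:
            disagreement.add(claim)         # stances differ: adjudicate
        elif not missing:
            agreement.add(claim)            # every report, same stance
    return agreement, disagreement, omissions


# Placeholder data, not real research output.
reports = {
    "model_a": {"claim_1": "supports", "claim_2": "supports"},
    "model_b": {"claim_1": "supports", "claim_2": "contradicts"},
    "model_c": {"claim_1": "supports", "claim_3": "supports"},
}
agreement, disagreement, omissions = triangulate(reports)
```

The code does none of the learning. Deciding what counts as the same claim, and who is right when the stances differ, stays with you.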

Then one model drafts. I hand it the spine plus the synthesized research and ask for a draft that hits the argument I want to make. The draft is always wrong in interesting ways, and we iterate, sometimes ten or fifteen passes, until it makes the argument cleanly. The drafting model is doing volume work. It is not deciding anything.

Then a different model judges. Before anything is final I run every factual claim, every citation, every dated study through a second model in validator mode. Fact-checking by a different model from the one that drafted, because a model checking its own work is the faucet again, the producer approving the producer. The judge catches hallucinated citations, misremembered statistics, inverted findings. It is not perfect. It catches far more than my own reading ever would.

And I own the spine and the ship decision, start to finish. The thesis is mine, the personal experience is mine, the final calls about what is true and what ships are mine. I am the supervisor at the top of the loop, doing the work that none of the models can do, which is deciding what the volume is for and whether it is good enough to leave the building.

Role separation is the active ingredient

Now the part that matters most, because it is the part people get wrong when they try to copy this.

The stack is not the point. The specific models do not matter, and they will be different by the time you read this. What does the work is the role separation, and you can have all six roles inside one chat window or spread across five vendors and the result is the same if the roles are real and different if they collapse. A loop is not defined by how many tools are in it. It is defined by whether the producing role and the judging role are held by different parties, so that no output gets approved by the thing that made it.

This is the same mechanism that made the nine-agent workshop work, seen from a different angle. Those agents produced a better reframe than the human workshop they replaced not because the models were smart but because the agents had distinct, irreconcilable, persistent perspectives and an instruction every one of them carried: you never agree just to agree, if you see a flaw you name it. In the hotel meeting room that instruction is also given, and it is called the brainstorming rules, and it fails, because the VP of product is in the room and the person who should name the flaw has learned through long professional experience to read the room before speaking. The agents have no room to read. The governance concern that gets softened in the conference room because the lawyers are not in it yet arrives on schedule from the agent, because the agent has no flight to catch. Role separation produces friction, and friction is the thing a single perspective, human or model, cannot generate on its own. The faucet has no friction. That is precisely what is wrong with it.

So when you build your own loop, the design question is never how many models. It is which roles must be held apart so the work cannot quietly approve itself. Producer and judge, always. Often researcher and synthesizer. Always, at the top, you.
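The rule that the work cannot approve itself can be written down as a check. This is a minimal sketch under stated assumptions, not a description of any particular stack. `Role`, `run_loop`, and the `party` label are hypothetical names, each `run` callable stands in for a call to whatever holds that role, and the `ship` callback stands in for your own decision.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Role:
    name: str                    # "producer" or "judge"
    party: str                   # whoever holds the role (hypothetical label)
    run: Callable[[str], str]    # stand-in for a call to that party


def run_loop(
    spine: str,
    producer: Role,
    judge: Role,
    ship: Callable[[str], bool],
    max_passes: int = 15,
) -> Optional[str]:
    """Draft, judge, revise; the human `ship` callback makes the final call."""
    if producer.party == judge.party:
        raise ValueError("producer and judge must be held by different parties")
    draft = producer.run(spine)
    for _ in range(max_passes):
        objections = judge.run(draft)    # empty string means nothing found
        if not objections:
            break
        draft = producer.run(f"{spine}\n\nFix these objections:\n{objections}")
    # The loop never ships on its own; it returns None unless the human says yes.
    return draft if ship(draft) else None


# Stub parties so the sketch runs without any API; swap in real calls.
producer = Role(
    "producer", "model_a",
    lambda p: "clean draft" if "Fix" in p else "draft with a shaky citation",
)
judge = Role("judge", "model_b", lambda d: "shaky citation" if "shaky" in d else "")
result = run_loop("my spine", producer, judge, ship=lambda draft: True)
```

The one line that matters is the check at the top of `run_loop`. Everything below it is ordinary plumbing, and the plumbing is only worth building once that check can fail.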

From writing to the work

The loop I described produces articles, and you do not, mostly, produce articles. So translate it, because the pattern is general and the surface is incidental. Here are three PM loops with the role split made explicit, each one a structure you run rather than a tool you turn on.

The discovery loop is the nine-agent workshop, generalized. You write the spine: the problem as you currently understand it, the framing you are bringing, the assumption you most want stress-tested. Then you stand up a council of deliberately conflicting perspectives, the clinical safety anchor against the product lead, the business viability skeptic against the innovation advocate, the one positioned to see the user the others cannot, and you make them disagree before you let them converge. You own the synthesis and the decision about what the reframe means. What the loop teaches you is where your own framing had a blind spot, and it teaches it the way the community physician agent taught me, by surfacing the observation that is obvious in retrospect and invisible beforehand, from the corner of the room you did not staff.

The spec loop turns the model into the fastest precision teacher ever built. It is the same choreography I run on my own writing, where the producing model and the judging model are deliberately different parties, pointed at a spec instead of an argument. Take a real one-line feature request, the kind that arrives as “instant payouts” and means nothing yet. You write the spine: the decision, the boundary, what counts as wrong. One model drafts the spec from your spine. A different model, configured as a hostile reader, judges it, told explicitly to find the ambiguity an engineer could exploit, the requirement that hides an adjective where a threshold belongs, the journey language standing in for a boundary. You read the gap between what you meant and what the judge could misread, and you tighten. The role split here is producer against adversary, and the adversary is doing for free what a sprint of misbuilt work used to do for a price. (The full craft of the two-document brief is its own chapter later in the book; here the point is only that the loop, not the tool, is what teaches the precision.)
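The hostile reader does far more than a script can, but the narrowest part of its brief, the adjective sitting where a threshold belongs, can be sketched crudely. The heuristic below is an assumption made for illustration, not the judge described above: it flags any line that uses a word from a hypothetical vague-word list and contains no digit at all.

```python
import re
from typing import List, Tuple

# Hypothetical word list; extend it with the adjectives your own specs lean on.
VAGUE_WORDS = {"instant", "fast", "quick", "seamless", "easy", "reliable", "scalable"}


def flag_vague_requirements(spec: str) -> List[Tuple[int, str]]:
    """Return (line number, vague words) for lines with no number in them."""
    findings = []
    for lineno, line in enumerate(spec.splitlines(), start=1):
        words = set(re.findall(r"[a-z]+", line.lower()))
        hits = sorted(words & VAGUE_WORDS)
        if hits and not re.search(r"\d", line):
            findings.append((lineno, ", ".join(hits)))
    return findings


# Placeholder spec text, invented for the example.
spec = "\n".join([
    "Payouts must be instant.",
    "Fast path: funds arrive in under 30 seconds.",
    "The flow should feel seamless and fast.",
])
findings = flag_vague_requirements(spec)
```

A line that passes this check can still be ambiguous in ways only an adversarial reader will find. The sketch shows what the judge is looking for and does not replace it.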

The analysis loop inverts the order. You have a dashboard, a cohort, a set of customer calls. The faucet version is to ask the model what the data says and ship its synthesis. The loop version is to write your own read first, three sentences, before the model touches it, then have one model produce its independent read, then a second model adjudicate the two against the raw numbers, with you owning the final interpretation. The front half of that, committing your own read before you see the machine’s, is a discipline this book will make much of in Chapter 8, where it gets its proper name. The role split is your judgment against the model’s, refereed by a third party that touches only the evidence. What it teaches is exactly where your reading of data has gone stale and where it still outruns the machine, which is the most valuable thing any loop can tell you, because it is information about you.
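The ordering in the analysis loop is the discipline, and an ordering is easy to enforce. Here is a minimal sketch, assuming nothing about which models you use: a record that refuses the model's read until yours is committed, and refuses a verdict until both reads exist. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisLoop:
    question: str
    human_read: Optional[str] = None
    model_read: Optional[str] = None
    verdict: Optional[str] = None

    def commit_human_read(self, text: str) -> None:
        # Your read goes in once and cannot be revised after the fact.
        if self.human_read is not None:
            raise RuntimeError("your read is already committed")
        self.human_read = text

    def record_model_read(self, text: str) -> None:
        # The model's read is refused until yours exists.
        if self.human_read is None:
            raise RuntimeError("write your own read before looking at the model's")
        self.model_read = text

    def record_verdict(self, text: str) -> None:
        # The referee compares two reads, so both must be present.
        if self.human_read is None or self.model_read is None:
            raise RuntimeError("the referee needs both reads")
        self.verdict = text


loop = AnalysisLoop(question="What is driving the change in this cohort?")
loop.commit_human_read("My three-sentence read goes here.")
loop.record_model_read("The first model's independent read goes here.")
loop.record_verdict("The second model's adjudication against the raw numbers.")
```

The final interpretation is deliberately absent from the record. That stays with you, written after the verdict and not by it.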

Three loops, one pattern. In each, the human owns the spine and the ship decision, the producing role and the judging role are held apart, and the cognitive work of running the loop is the education. Write down the role split for the loop you run most, one page, and you have one of this part’s keepers. By the end of the book you will see why one page mattered.

The catalog, or why the reps must connect

There is one more piece, and without it the loops produce learning that evaporates.

I keep a catalog of everything I write. It started as titles and dates. Within a month it held the core argument of each piece, the key evidence, the frameworks introduced or referenced, the cross-links. Today it holds more than a hundred entries, each with a paragraph summary, each named framework with its own record of origin and core insight and design consequence, every cross-reference traceable. The catalog is not the body of work. The articles are the body of work. The catalog is the index, the synthesis layer that lets me write the next thing without losing track of the previous hundred.

It is bidirectional memory, and that is the property that matters. I hand it to a model at the start of a piece so it can find the framework I want to extend, the example I already used, the article I do not want to repeat, and it returns answers I had forgotten I had given. I search it before a meeting, before a talk, before a hard email, for the reference or the dated study that fits the moment. It remembers on my behalf the things I could not hold in working memory while doing the job, and it surfaces the gaps, because a topic with no entry is a topic the work has not yet covered, which is sometimes the most useful thing the catalog produces.

Your loops will produce a great deal of work. Without something that ties it together, each run starts from zero and the compounding never happens. The catalog is what makes a year of loops into a body of judgment rather than a pile of outputs. Keep one from the first piece, before you think you need it, because the value is in the accumulation and the accumulation cannot be retrofitted.
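If it helps to see the shape, a catalog can start as small as this. The sketch is an assumption about structure, not a description of the catalog behind this book: the field names, the `Catalog` class, and the sample entries are all hypothetical, and a spreadsheet or a folder of notes would serve the same purpose.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    title: str
    date: str
    summary: str                                          # one-paragraph core argument
    frameworks: List[str] = field(default_factory=list)   # named frameworks used
    links: List[str] = field(default_factory=list)        # titles of related entries


class Catalog:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        self.entries[entry.title] = entry

    def search(self, term: str) -> List[str]:
        """Titles whose title, summary, or frameworks mention the term."""
        needle = term.lower()
        return sorted(
            e.title for e in self.entries.values()
            if needle in " ".join([e.title, e.summary, *e.frameworks]).lower()
        )

    def referenced_by(self, title: str) -> List[str]:
        """Trace a cross-reference backwards: who links to this entry?"""
        return sorted(e.title for e in self.entries.values() if title in e.links)

    def gaps(self, topics: List[str]) -> List[str]:
        """Topics with no matching entry, meaning work not yet done."""
        return [t for t in topics if not self.search(t)]


# Placeholder entries, invented for the example.
catalog = Catalog()
catalog.add(Entry("Piece A", "2025-01-05",
                  "Why role separation beats model count.", ["role separation"]))
catalog.add(Entry("Piece B", "2025-02-02",
                  "Extends the earlier argument to spec writing.",
                  ["hostile reader"], ["Piece A"]))
```

The `gaps` method is the smallest of the four and arguably the most useful, for the reason given above: a topic with no entry is a topic the work has not yet covered.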

There is a reason this is the practice and not just the productivity hack. I had undiagnosed ADHD for most of my life, and the only way I could ever learn at any real depth was the way I worked in libraries: skip the lecture, run my own research, extract and reorganize and cross-reference, build a personal binder from primary sources that bore no resemblance to the syllabus. The lectures were the source. The notebook was the curriculum. I knew the method for decades. What I did not have was scale, because the depth I could reach in a week was capped by how many books I could carry from the stacks. The loop removed the cap. The catalog is the binder, grown past anything I could carry.

I had to build the course in order to take it

So here is what the loop actually is, named plainly, because the productivity framing undersells it by an order of magnitude.

Done as a faucet, the work is content generation and the operator atrophies. Done as a loop, the exact same work is an active-learning protocol, and the cognitive labor of running it is the education. Every piece I write is a synthesis exercise that forces me to defend a thesis against three independent research traces, work through ten or fifteen drafts of an argument, and check every claim against a judge that did not write it. In my experience the cognitive work inside a single article is roughly what I would have gotten from a full graduate seminar on the same topic, compressed into a week and shaped to a question that mattered to me at the time.

The field I work in, agentic AI for product managers, the personnel infrastructure that does not exist for agents, has no textbook and no course. There is not one. So I had to build the course in order to take it, and the building was the taking. Everything else in this chapter is downstream of that one move. The loop is not a tool you use to produce work faster. It is a curriculum you administer to yourself, and the artifact it leaves behind, the catalog, the drafts, the adjudications, is the transcript.

The loop has multiple models in it. You have been running them as roles. Now learn to read them as individuals, because the day you choose one for a product you will be choosing a temperament, and you should be the one in the room who knows which.
