Part II · Prototype & Collaborate · Chapter 6

Chapter 6: The PM as Collaborator: Dev, UX, and Domain Experts

An agentic product fractures the clean “PM owns what, engineering owns how” split into several runtime domains no single role covers. Your real job is working across that seam, with engineers, designers, and domain experts. The failure mode vibe coding makes tempting is trying to own all of it alone.

The last chapter ended with a handoff: you build the prototype, the engineering team builds the thing that ships. That handoff is the small version of this chapter’s subject. The large version is that an agentic product does not have one clean handoff. It has a fractured ownership map that the old PM-and-engineering division does not describe, and most teams discover the fracture the way you discover a missing stair, in the dark, with weight on the wrong foot.

The usual playbook treats the team around the PM as a governance footnote, a list of who signs what at launch. That is a mistake the agentic shift makes expensive. The people you work with are not approvers; they own runtime domains you cannot own, and the product is the seam between you.

The clean split that no longer holds

Traditional software had stable lines. The PM owned the what, the requirements and the success criteria. Engineering owned the how, the implementation. QA owned test coverage, platform owned infrastructure. Those lines held because the software did what it was told and stopped.

An agentic system acts on its own, across tools and APIs and other systems, over time. That breaks the lines, because “what it does” is no longer something you specified once; it is something the system decides, repeatedly, in conditions you did not enumerate. The product splits into several runtime ownership domains, each with a different natural owner, and none of them is simply “the PM owns the what.”

The runtime ownership map. An agentic product has at least these distinct domains, and the owner of each is mostly not you:

Goal definition – what the agent is authorised to try, its scope and success criteria. Yours, and higher-stakes than before: underscope and it fails, overscope and it harms.
The control plane – kill switches, spend governors, rate limits, sandboxing. Platform engineering. A new layer that must live outside the agent’s own runtime.
Rollback – reversing actions already taken, often across several systems at once. Engineering and DevOps. Harder than traditional rollback because the agent’s actions span systems.
Supervisory interface – how a human monitors, approves, pauses, and redirects the agent. UX and product. A new design discipline.
Correctness certification – verifying outputs meet a domain’s real standard, beyond what an eval checks. The domain expert, with you. Evals catch structural failure; experts catch domain failure that looks structurally fine.

Read down the owner column. The product is what happens when these people coordinate, not when you specify and they build.

Working with engineers: rollback, the platform, and the switch you cannot own

The single insight that should change how you work with engineering is about the kill switch, and it is sharper than it first sounds. A production agent needs a hard stop: something that revokes its tool access, halts its queues, and freezes it in seconds. The non-obvious part is the ownership. The kill switch cannot be owned by the same system that runs the agent, and it cannot be triggerable by the agent itself, because an agent that has gone wrong is exactly the thing you cannot trust to stop itself. As the people writing the risk standards put it, killing the parent agent does not recall the child agents it already spawned. The switch has to live in a control plane that is architecturally separate from the agent, owned by a platform or security team, with a real human on call who can reach it.

That is an engineering decision with a product consequence, which makes it yours to understand even though it is theirs to build. The same is true of rollback. When an agent has taken twenty actions across four systems before anyone noticed it was wrong, “undo” is not a button; it is a distributed-systems problem that engineering owns and that you must scope the requirements for. The thing you bring to these conversations is not the implementation. It is the answer to “what must we be able to stop, and how fast, and what must we be able to reverse,” which is a product question that determines what engineering has to build.

A concrete number makes this real for the conversation: ask your engineering lead what the rollback time is for the agent’s most consequential action. If they do not have a number, you have found work, and it is the kind of work that does not get done unless the PM names it.

Working with designers: the approval moment is the product

Designers own a domain that did not exist in the old division, and it is the one most teams forget until a user is confused at the wrong moment. When an agent is about to act, there is an instant where a human either approves, redirects, or lets it proceed. That instant, the approval moment, is the central design surface of an agentic product, not an afterthought bolted to the end. It is where trust is built or destroyed, where the human’s authority over the agent is either real or theater.

This is a new design discipline, sometimes called agentic UX, and it does not map to any prior design category. Designing a form is not designing the moment a human decides whether to let an autonomous system spend money, send a message, or change a record. The questions are different: how much does the human need to see to approve well, how do you keep them from rubber-stamping, what does “pause and let me look” feel like, how does an override get recorded for the audit that may come a year later. The designer owns the answers, and your job is to insist the approval moment is treated as a first-class part of the product rather than a confirmation dialog someone adds in the last sprint.

Working with domain experts: the correctness no eval can certify

The third collaborator is the one teams most often skip, and skipping them is the most expensive omission in the book. Evals, the subject of a later chapter, catch structural failures: the output is malformed, the agent called the wrong tool, the format is broken. They cannot catch a failure that is structurally perfect and substantively wrong, because that requires knowing the domain well enough to see that a plausible answer is the wrong answer. Only a domain expert can certify that.

Built against the form, not the voice. When researchers tested a general AI assistant on real emergency-department triage, it under-triaged about fifty-two percent of genuine emergencies, routing people who needed an emergency department to wait-and-see care. The system was not broken in any way an eval would catch; it produced fluent, well-formatted, confident dispositions.

It was built against the intake form, the structured fields, rather than against what an experienced triage nurse actually listens for: the cadence, the thing the caller mentions as an aside, the wrongness underneath the words. A domain expert would have caught the gap before launch. The eval never could, because the outputs looked correct. Correctness in a domain is a judgment only someone steeped in it can render, and no test suite substitutes for that.

The practical move is to put the domain expert in the loop before launch, certifying the cases that look fine and are not, and to treat their certification as a gate the eval cannot replace. In regulated and clinical work this is not optional; everywhere else it is the difference between an agent that is plausibly right and one that is actually right.

The four co-owners, and the accountability that stays yours

Underneath the runtime domains sits a smaller, sharper structure for the launch decision itself: the four co-owners of whether an agent is ready. Engineering owns the rollback time. The CFO owns the per-task cost, including the runaway case. Legal owns the audit surface, what can be reconstructed when a decision is questioned. And product, you, owns the approval moment, where the human’s authority is exercised. The discipline is that each of these has a named owner who can answer for it before launch, not a diffuse “we looked at it.” A pre-launch review where any of the four questions has no name attached is a review that has found its own gap.

What does not get delegated is the synthesis. The engineers own rollback, the designers own the approval moment, the experts own correctness, but holding them together, deciding whether the whole adds up to a product that should exist and behave the way it does, is the PM’s, and it is the part of the job the previous chapters have been describing. You do not own the domains. You own whether they cohere.

The failure that vibe coding makes tempting

There is a specific failure this part of the book has been building toward, and it is worth naming because the last chapter made it more tempting than ever. When you can prototype the product yourself, mock the UI yourself, and describe the backend to a model yourself, it becomes possible to believe you can own all of it alone, that the collaborators are friction to route around. The vibe-coding chapter showed why that belief ships a house of cards. This chapter shows the deeper cost: the PM who builds everything alone is not just shipping insecure code, they are personally occupying five runtime domains they are not equipped to own, which means each of those domains is unowned by anyone who can actually do it. The kill switch does not get architected, the approval moment does not get designed, the domain correctness does not get certified, because the one person doing all of it could only ever do one of them well.

The agentic shift did not make the team optional. It made the seams between the team the place the product lives. Pick one agent you are responsible for and ask who owns each of the five domains by name. Every domain that resolves to “me” or to no one is a domain that is about to fail, and it will fail in production, where the failures are expensive and the postmortem lands on your desk.

Vibe Coding for Value: Prototyping to Decide You Built the Agent. Now Design the Behavior