The Team Member Nobody Hired
Early in my career, not long after my internship, I worked as the physician on shift at a cardiac center. Subscribers transmitted their ECGs over the phone, and a team of cardiac nurses, most of them with twenty years on the job, read the traces, made the calls, and dispatched the ambulances. On paper, I was the supervisor. In practice I was learning from them while pretending to supervise, and what I learned fastest was not cardiology. It was who to trust without checking and who to check twice. None of that was in anyone’s credentials. It was how they behaved under pressure: whether they admitted uncertainty or presented every reading as certain, whether their voice in a chaotic moment got tighter or steadier, whether they told you when something had gone wrong or covered it. Twenty years of experience meant something on paper. On the actual shift, what mattered was trust, calibration, composure, and honesty about what you did not know, the qualities an interview tries to approximate and never fully can.
I have been thinking about those nurses while writing this book, because I have spent it supervising an actor whose credentials tell me nothing and whose behavior tells me everything. The book was drafted with an AI agent. I read what it produced, caught where it drifted, and learned its tendencies: the places it reaches for a flourish, the claims it will confidently invent if I am not watching, the kind of error it makes when it is uncertain. Then I decided, sentence by sentence, what to keep. I was doing with the agent exactly what I did with the nurses: judging an actor by how it behaves rather than by what its documentation claims, because the documentation was never the thing that kept the patient safe or the chapter honest. And that is the subject of this chapter, because the thing I was supervising is not a tool I used. It is a team member I added, the most consequential one most teams have added without noticing they were hiring.
The apparatus nobody pointed at it
When you add a human to a team, an entire apparatus activates without anyone thinking about it. The person was interviewed against the role. Someone checked references, which is to say someone asked whether their behavior matched their resume. They were onboarded, told what they own, what they may not touch, who to escalate to. They have a manager who notices when they drift. They can be coached, reassigned, and if it comes to it, let go, with a record of why. None of this feels like infrastructure, because it is so old it has become invisible. It is the apparatus a century of organizations built to answer one question: how do you put an actor with its own judgment into a system and stay accountable for what it does.
You just added an actor with its own judgment to your system, and it went through none of that. There was no interview, because the procurement category was software license, not hire. There were no references, because nobody asks a model to demonstrate that its behavior under pressure matches its documentation. There was no onboarding beyond a system prompt someone wrote in an afternoon. There is no manager, because managing it is in no one’s job description. And there is no termination clause, because you do not fire a subscription. The agent is a team member nobody hired, and the apparatus that would have caught a human’s problems was never pointed at it. Of all the absences this book describes, this one stays invisible longest, because it does not announce itself. The agent works. It is only later, when it acts in a way no one anticipated and no one owns, that you discover the category was missing the whole time.
Someone will call this anthropomorphism, treating software as if it had judgment or agency, and the objection is fair enough to answer directly. I am not claiming the agent thinks the way the nurses thought. I am claiming the org chart does not know the difference, because the supervisory apparatus exists for an actor that takes action, not for an actor that is conscious. Removing the consciousness does not remove the action; it just removes the excuse for skipping the apparatus. If an agent acts in your workflow, generates output, makes decisions inside an authority boundary, and hands results to humans who treat them as input to their own work, it is structurally a team member, whatever is or is not happening behind its API. And if you would rather not call it a team member, call it what you actually have: a persistent contractor with broad access, no statement of work, no acceptance criteria, no performance bond, and no termination clause, an actor in the workflow without the machinery that exists precisely to make a new actor safe to add.
The missing category
The reason competent organizations keep getting caught by this is structural: an AI agent fits no existing slot in the org. It is not an employee: it has no contract, no review cycle, no manager. It is not a vendor: there is no sales rep who answers for its behavior, and its behavior changes without a release you approved. It is not a tool in the old sense: a tool does what you tell it, and this one decides. It is not infrastructure: infrastructure does not develop a personality or give different answers to the same question on different days. It sits in the gap between all of them, and because it fits none of them, none of the governance built for them applies. Procurement waves it through as a software line. IT treats it as an integration. The people function never hears about it, because that function is for people. The thing acting on your behalf, at scale, all day, is governed by no one whose job is to govern actors.
That gap is not a temporary immaturity the market will fix on its own. It is a category that has to be built, and organizations are starting to build it in plain sight. Titles are appearing that did not exist eighteen months ago: an agent supervisor who owns what the fleet is doing right now and where it is escalating, an agent quality lead who owns whether it is still correct, an operations manager for AI who owns the cost per completed task. Industry forecasts say that within a year or two enterprise applications will embed agents at a scale where informal oversight stops being possible, and at that scale the titles are the org chart admitting what this chapter is about. The agent is staff, and staff need the apparatus. The only choice is whether you build it on purpose or discover you needed it after the incident.
The useful way to hold the agent in mind, then, is as a kind of hire you have managed before: the contractor with broad access and a thin contract. A contractor is not steeped in your culture, did not come up through your norms, and defaults to their own habits the moment your instructions run out. You manage that with a statement of work that says exactly what they may do, an access scope no broader than the work requires, an escalation path for when they hit something out of bounds, and a way to end it when it is not working. The agent needs every one of those and ships with none, and it has, if anything, more access than you would hand a new contractor, because wiring it into your tools is the entire point and the wiring happens before anyone has asked what it should be forbidden to touch. This is why the practitioner language has started borrowing from regulated hiring. Know-your-agent, the phrase now circulating as the agent analogue to the bank’s know-your-customer, is a structured enrollment that captures, before the agent reaches production, its identity, its scope, its permissions, what memory and context it can reach, its escalation policy, and the conditions under which it gets decommissioned. That list is a contractor file. The field is reinventing the statement of work because the absence of one proved expensive.
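One way to picture that contractor file is as a record that refuses to call an agent ready until every part of it is filled in. What follows is a minimal sketch under my own assumptions: the class name AgentEnrollment, its field names, and the example agent are hypothetical illustrations, not a standard schema or any vendor's format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentEnrollment:
    """Hypothetical know-your-agent record; field names are illustrative."""
    identity: str = ""
    scope: str = ""                                      # what the agent may do
    permissions: list[str] = field(default_factory=list)
    memory_and_context: list[str] = field(default_factory=list)
    escalation_policy: str = ""
    decommission_conditions: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_for_production(self) -> bool:
        """An agent is enrolled only when nothing in the file is blank."""
        return not self.missing_fields()


# Hypothetical agent with half its file filled in.
refund_agent = AgentEnrollment(
    identity="refund-agent-01",
    scope="issue refunds under a fixed limit",
    permissions=["orders:read", "refunds:create"],
)
print(refund_agent.missing_fields())
# ['memory_and_context', 'escalation_policy', 'decommission_conditions']
```

The design choice worth noticing is that the empty fields are the output. The record does not describe the agent so much as list what nobody has decided yet.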
The institution that already solved this
The contractor analogy is useful and it is also too weak, because a contractor is a low-trust actor doing bounded work, and an agent is a judgment-bearing actor you are letting decide. There is an institution that has spent a century learning how to add exactly that kind of actor safely, and I spent years inside it before I ever wrote a product spec. A hospital does not hire a physician by checking that they are licensed and turning them loose. A license is the floor. What the hospital actually does is credentialing and privileging: it grants specific privileges, procedure by procedure, at defined levels of supervision, and it grants them provisionally at first. The new cardiologist is not simply permitted to place a stent. They are permitted to place one proctored, with a senior watching, and only after a reviewed period of proctored cases does the privilege become independent, and even then it is not permanent. It is reviewed continuously against data, the outcomes and the complication rates, and it is revocable the moment the data says it should be. Trust is granted per task, earned through evidence, and withdrawn on evidence. That is not a metaphor for the autonomy ladder. It is the autonomy ladder, with a full institutional apparatus around it that the agentic field is missing.
Read the agent through that lens and the seats this book has named stop looking speculative, because each is a part of a credentialing pipeline that medicine runs every day. The autonomy ladder is privileging: an agent earns a higher rung the way a physician earns an unsupervised privilege, by demonstrated performance at the rung below, not by a calendar or a vendor’s claim. The proctored provisional period is the agent’s first weeks acting with a human reviewing every consequential decision, the rung the travel agent’s team skipped. Know-your-agent is the credentialing file, the enrolled record of what this actor is permitted to do and under what supervision. And the standing review of outcomes that decides whether a privilege continues is exactly the supervisor’s and eval owner’s job, the watching this book keeps insisting someone must own, except medicine made it a committee with a name and a cadence and the authority to pull a privilege, and the agentic team made it nobody’s. One correction the borrow needs before anyone uses it: the hospital reviews its physicians on a slow clock, reappointment in years, ongoing review in months, and the agent’s substrate moves on the vendor’s release calendar. The privilege review runs on the agent’s clock, not the hospital’s: a model update is a deployment, and a deployment reopens the privilege. The thing the org chart is missing is not novel. It is the trust pipeline that the one profession that routinely onboards autonomous, judgment-bearing, occasionally-wrong actors built decades ago, and the agentic team is being asked to build it from scratch under deadline because it did not know the blueprint already existed.
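The privileging logic in that paragraph is mechanical enough to sketch. The code below is an illustration under stated assumptions, not a description of any hospital's or vendor's system: the three rungs are a simplification of the autonomy ladder, the names Rung and Privilege are hypothetical, and the thresholds (50 reviewed cases, a 2 percent error rate) are placeholders I chose for the example. It shows the three rules the text names: promotion only on evidence from the rung below, revocation on evidence, and a model update reopening the privilege.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    SUGGEST = 0      # agent proposes, human acts
    PROCTORED = 1    # agent acts, human reviews every consequential decision
    INDEPENDENT = 2  # agent acts, outcomes reviewed continuously


@dataclass
class Privilege:
    """One privilege for one task, granted per model version."""
    task: str
    model_version: str
    rung: Rung = Rung.SUGGEST
    reviewed_cases: int = 0
    errors: int = 0

    def record_case(self, correct: bool) -> None:
        self.reviewed_cases += 1
        if not correct:
            self.errors += 1

    def review(self, min_cases: int = 50, max_error_rate: float = 0.02) -> Rung:
        """Promote one rung on evidence, or revoke to the bottom on evidence."""
        if self.reviewed_cases == 0:
            return self.rung
        error_rate = self.errors / self.reviewed_cases
        if error_rate > max_error_rate:
            self.rung = Rung.SUGGEST
        elif self.reviewed_cases >= min_cases and self.rung < Rung.INDEPENDENT:
            self.rung = Rung(self.rung + 1)
        else:
            return self.rung  # not enough evidence yet; keep counting
        self.reviewed_cases = self.errors = 0  # evidence is earned per rung
        return self.rung

    def on_model_update(self, new_version: str) -> None:
        """A model update is a deployment, and a deployment reopens the privilege."""
        self.model_version = new_version
        self.rung = min(self.rung, Rung.PROCTORED)
        self.reviewed_cases = self.errors = 0


p = Privilege(task="draft refund decision", model_version="v1")
for _ in range(50):
    p.record_case(correct=True)
p.review()                 # SUGGEST -> PROCTORED
for _ in range(50):
    p.record_case(correct=True)
p.review()                 # PROCTORED -> INDEPENDENT
p.on_model_update("v2")    # back under proctoring on the new model
print(p.rung.name)         # PROCTORED
```

Nothing in the sketch promotes on a calendar. The only inputs that move an agent up or down are reviewed cases and a change in the model underneath it.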
One caution keeps this borrow honest rather than smug, and it comes from the same field. Healthcare’s record with AI specifically is poor, and this book’s own sepsis case is the proof: a model marketed with a strong score, validated far worse in the world, firing for months after clinicians had already acted. Medicine, so good at credentialing human actors, governed the AI model as software, a procurement decision, an integration, a validation report filed once, instead of as an actor that needed a credential, a privilege, a proctored period, and continuous review. It onboarded the algorithm the way it would onboard a database and was surprised to find the database had judgment, and was sometimes wrong. So the lesson is not “be like healthcare with AI.” It is the sharper one: healthcare failed with AI precisely where it succeeded with humans, because it treated the model as a tool when it was an actor. The machinery to govern the actor exists, in medicine and in aviation and in banking, fields that learned to add high-autonomy actors the hard way. The agentic team’s task is to claim that machinery for the agent, which no one yet does, and not to repeat the specific mistake of the very fields that built it.
The agent’s career, and the rung it cannot climb
Credentialing answers one axis of what a human team member accrues, the task axis: which procedures this actor is privileged to perform, earned one at a time. But a person on a team accrues a second thing alongside the privileges, and it is worth following the analogy one step further, because the place where it breaks is the argument of this whole book. A human does not only earn privileges; they earn rank. They start as an intern doing only what is scoped and checked, and over years they become the senior the team turns to for the problem nobody scoped, and at the top of the ladder sits someone trusted with the unbounded call, the judgment about a situation no one wrote a procedure for. The autonomy ladder is the agent’s version of this climb, and for a while the mapping holds: the agent earns its way from suggesting to acting the way a junior earns their way from supervised to trusted, and the orchestrator that directs other agents in the next part of this book is, quite literally, a senior agent given authority over juniors. But the human ladder does not stop at the rungs the agent can reach, and the two places it keeps going are the two the agent never will.
The first is unscoped judgment. A senior human is trusted precisely on the problems no one anticipated, the situation off the edge of every procedure, and that trust is what “senior” means. An agent can be made extraordinarily reliable inside its scope and remains, at the edge of it, an intern who is very fast, because the thing that makes a human senior is judgment about the case the training set never held, and that is the one thing the agent does not get better at by climbing. The second, and it is the sharper one, is liability. A human’s accountability rises with their rank: the resident answers for the chart, the attending answers for the patient, the executive signs the filing and goes to the meeting when it is wrong. The agent’s liability does not rise with its autonomy at all. It stays exactly zero, no matter how high the agent climbs, because an agent cannot be accountable; the liability stays pinned, the whole way up, to the human who deployed it. So the agent’s climb has a strange shape no human career has: rising authority and flat responsibility, more power at every rung and never one ounce more answerability, and every ounce the agent does not carry is carried by a person whose own exposure goes up as the agent’s autonomy does. That is the affected-person problem and the someone-must-answer problem from earlier in this book, told as an HR chart: the agent can be promoted toward the top of the ladder, but the accountability that is supposed to ride up with rank gets left behind on a human at the bottom.
This is why the question people ask half as a joke, can an agent run the company, has a real answer, and the answer is no, for a reason that is not about capability and will not be fixed by a better model. The top of a human ladder is the seat defined by holding accountability for what no one foresaw, the unbounded judgment and the unbounded answerability fused into one chair, and an agent can hold the first half and never the second. It can be given the authority of the highest rung. It can never be given the accountability, because accountability is a thing only a person can carry, and a rung that is all authority and no accountability is not the top of the ladder. It is the most dangerous seat on it. The ladder has a ceiling, and the ceiling is not how capable the agent becomes. It is the point where the next rung would require it to answer for itself, and that rung is reserved, permanently, for a human, which is the same sentence this book has been writing from its first page in a different vocabulary: somewhere up the stack, a person is still accountable, and no amount of the agent’s climbing moves that person’s name off the line.
Onboarding has a file name now
Here the abstraction becomes concrete, and it becomes concrete on the engineering team first, because that is where the agent was hired first and where someone already had to solve the onboarding problem out of necessity. When a human engineer joins a team, they are onboarded into the codebase: the conventions, the patterns the team uses and the ones it has banned, the architectural decisions that are not up for relitigating, the hard-won “we never do it this way, because the last time we did, it cost us a weekend.” A human absorbs that over months, through code review, through the senior who says “not like that” and explains why. The agent that now writes code joins with none of it, and writes confidently in whatever style its training suggested, against conventions it has never been told, which is to say it joins exactly like a contractor with no onboarding and full commit access.
The engineering teams that work well with agents solved this the way you onboard anyone: they wrote it down. The guidance file that the coding agents read, the one named CLAUDE.md or AGENTS.md or its equivalent depending on the tool, is the agent’s onboarding document, and treating it as anything less is the mistake. It holds the conventions, the constraints, the architectural decisions the agent must respect, the things this team does not do and why, the standing instruction a senior would have given across a hundred code reviews. It is the new hire’s first-week document, written once, for an employee who will never absorb it any other way. The sharp version of the point is the one worth carrying: if you have not written your team’s guidance file, your codebase is being written by an employee you never onboarded, working from whatever habits it happened to arrive with. The file is not configuration. It is the agent’s induction into how this particular team works, and it is the first artifact of the agent-as-team-member made real, produced by the engineers because the engineers were the ones who could not avoid the problem.
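A team that wants to know whether its agent was onboarded at all can check for the file the way it checks for anything else. The sketch below is a hypothetical illustration: the four section headings are ones I made up to mirror the list above, not a required or standard layout for CLAUDE.md or AGENTS.md, and the sample text is invented for the example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical headings mirroring the paragraph above; a real team picks its own.
REQUIRED_SECTIONS = [
    "Conventions",
    "Constraints",
    "Architectural decisions",
    "Things we do not do",
]
GUIDANCE_FILENAMES = ["CLAUDE.md", "AGENTS.md"]


def find_guidance_file(repo_root: Path) -> Optional[Path]:
    """Return the first guidance file found at the repository root, if any."""
    for name in GUIDANCE_FILENAMES:
        candidate = repo_root / name
        if candidate.is_file():
            return candidate
    return None


def missing_sections(text: str) -> list[str]:
    """Return required headings that never appear as a Markdown heading line."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


# Invented sample content, standing in for a half-written guidance file.
sample = (
    "# Conventions\n"
    "Use the repository's existing formatter.\n"
    "\n"
    "## Things we do not do\n"
    "No new dependencies without review.\n"
)
print(missing_sections(sample))
# ['Constraints', 'Architectural decisions']
```

A check like this only proves the headings exist. Whether what sits under them is what a senior would have said across a hundred code reviews is still a human judgment.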
The role already living the answer
Which brings the chapter to its sharpest point, and it is a point about the engineer that reframes the whole book. This book has spent its length describing a role that does not fully exist yet on most teams, the supervisor of the agent, the human whose job is to watch an actor with its own judgment, calibrate trust in it, catch it when it drifts, and decide what to accept. We have argued that this role is necessary, that the failures are designed to defeat a single sightline, that the supervisory apparatus has to be built. And all the while there has been one seat in the box that has been doing exactly this job, full time, out of necessity, for longer than the rest of the org has known the job existed. The engineer working with a coding agent is the supervisor, already. They read the agent’s output in the most unforgiving medium there is, code that either runs or does not, passes the test or fails it, holds under load or collapses. They onboard the agent through the guidance file. They catch its drift, learn its tendencies, calibrate how much to trust which kind of output, and decide, commit by commit, what to keep and what to throw away. Every discipline this book prescribes for the supervisor of an agent, the engineer already practices, because the agent landed on them first and they had no choice but to invent the practice.
And there is a thing the engineer has that no other role on the team has, which makes them not just a supervisor but the deepest reader of agent behavior in the building. No one else sees the agent expressed as precisely as the engineer does. A product manager sees the agent’s output as a result, good or bad. A designer sees it as an experience. The engineer sees it as code, the most literal and least forgiving expression of a model’s behavior that exists, where every assumption the model made is written out explicitly and either works or breaks. Through that medium the engineer develops an intimate understanding of how different models actually behave, their tendencies, their characteristic mistakes, what one model reaches for that another avoids, the way one is cautious where another is reckless, a feel for the personality of the thing as it shows up in the work. No other role has that read, because no other role sees the model’s behavior rendered so exactly. The engineer is not only the role that already does the supervisor’s job. It is the role with the most granular, most tested knowledge of the actor being supervised, which makes the engineering team the existence proof for this entire book: the supervisor is not a theoretical new seat the org has to imagine. It is a person already at the table, already doing the work, and the rest of the organization’s task is less to invent the role than to learn from the one place it was forced into being.
It is worth being exact about what the existence proof does and does not settle, because it is easy to over-read. The engineer supervising a coding agent is the local prototype of the supervisory posture, not the organization’s answer to it. What the engineer proves is that the posture is real and learnable: that a human can hold the boundary on an agent, read its drift, and catch it, as a standing part of the work rather than a theory. What the engineer does not do is supervise the business agent the company ships to its customers, the claims agent, the triage agent, the refund agent, whose failures land on an affected person and whose correctness only a domain expert can judge. That supervision is a different object with different owners: the eval owner for trust, the AgentOps supervisor for the running behavior, the domain expert for correctness, the architect for the enforced boundary. The engineer’s example is the seed those seats grow from, not a substitute for them. A team that watches its engineers supervise their coding agents and concludes the supervision problem is handled has drawn precisely the wrong lesson; it has confused the one place the posture was forced into existence with the many places it still has to be deliberately staffed. The right lesson is the opposite: if the hardest-to-fool role on the team had to invent this out of necessity, the customer-facing parts of the product, the ones with no engineer reading their every move as code, need it more, and have no one improvising it yet.
This is also the bridge to the harder version of the problem, the one the next part takes up, because the engineer supervising one coding agent is the small case. The teams furthest ahead are running agents that build agents, fleets of them, and the engineer who supervises a single agent’s code is about to become the engineer who supervises a system of agents supervising each other, which is the agent-as-team-member problem raised one level, where the team member you never hired is itself hiring team members you have never met. The existence proof scales into a governance problem, and that is where the book goes next. For now the point stands: the agent is staff, the engineering team already treats it as staff because it had to, and the practice the rest of the organization needs was invented by the people the agent was handed to first.
Hiring the team member nobody hired
The agent-as-team-member is not one person’s responsibility, which by now is the expected shape, and the division is worth naming because the apparatus has owners. Someone owns the agent’s statement of work, what it may do and what it may never do, which is a product and a risk decision, the product manager with security. Someone owns its onboarding, the guidance file and the conventions it must follow, which on the engineering team is the engineers and the architect, because they know the codebase the agent is joining. Someone owns its access scope, the credentials and the boundary, which is the architect, the same enforcement decision from earlier in the book seen from the hiring side. Someone owns its ongoing management, watching what the fleet does and whether it is still behaving, which is the agent supervisor, the role the org is now naming. Someone owns its performance review, whether it is still correct, which is the eval owner. And someone owns its decommission, the revocation of access and memory when it is retired, which is security. The agent went through none of this on the way in, and building the apparatus after the fact, role by role, is the work, with the engineering team’s existing practice as the template for the rest.
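That division can be written down as a table of seats and then checked against the names actually assigned to a given agent. This is a minimal sketch: the role mapping is taken from the paragraph above, while the function name unowned and the two example names are hypothetical.

```python
# Responsibility -> owning role, as divided in this chapter.
APPARATUS_OWNERS = {
    "statement of work": "product manager with security",
    "onboarding": "engineers and architect",
    "access scope": "architect",
    "ongoing management": "agent supervisor",
    "performance review": "eval owner",
    "decommission": "security",
}


def unowned(named_people: dict[str, str]) -> list[str]:
    """List responsibilities where no actual person has been named yet."""
    return [
        f"{duty} (expected: {role})"
        for duty, role in APPARATUS_OWNERS.items()
        if not named_people.get(duty, "").strip()
    ]


# Hypothetical agent: only two of the six seats have a name against them.
refund_agent_owners = {"onboarding": "Dana", "access scope": "Lee"}
for gap in unowned(refund_agent_owners):
    print(gap)
```

For the hypothetical agent above, four of the six lines print, and each of them names a role rather than a person.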
So take the most consequential agent your team runs and treat it, for an afternoon, as a new hire who started without any paperwork. Ask who wrote its statement of work, who onboarded it and into what, who scoped its access, who manages it, who reviews whether it is still good at its job, and who would revoke its access if it had to go. For a human in a consequential seat every one of those has an owner, and the discomfort of asking them about the agent is the measure of how much of the hiring you skipped. The engineers, if you have them working with coding agents, will recognize every question, because they have already had to answer most of them. The rest of the organization is where the questions still come back empty, and the empty ones are the category the agent has been operating in all along.