Part II · The Work Reshapes · Chapter 4

Is There Still a UX?

When I was building IoT products at SAP, the most talented person on the team was the UX lead, and he did not think of himself as a UX lead. He thought of himself as an artist. He said so. His initial designs were usually stunning and usually impossible to implement, and the first round of every project was him and the engineers negotiating his vision back down to something that could ship. He would push back on their compromises, not on technical grounds but because a compromise was ugly, and he was usually right, and the product was better for the fight. He also insisted on joining the PM for the customer meetings and the workshops, because he wanted the customer’s voice directly and would not design from a requirements doc someone else had filtered. His craft was the screen. He arranged what the user would see, how it would feel to move through it, where the eye would land. Everything he was good at assumed there was something to look at.

Here is the thing I did not understand at the time. His real subject was never the pixels. It was the person on the other side of them. Where the eye lands, how it feels to move through a flow, when someone hesitates and when they commit, that is behavior, not decoration. The screen was the instrument he used to study and shape it, and it was a good instrument because it made the human side observable. The discipline got so good with that instrument that it came to mistake the instrument for the subject. We called it UI design and validation, and we let ourselves believe the object was the interface.

I think about him now when I look at the products my teams ship, because for a growing number of them there is nothing to look at. The agent runs. It reads, it reasons, it calls forty tools in ninety seconds, it acts. The person who is supposed to be in charge of it sees a three-line summary in the morning and a single question the agent decided was worth asking. There is no screen the artist could have made beautiful. And so the easy answer to the title of this chapter, “no, the screen is dying, so UX is dying with it,” is exactly wrong. Take the screen away and the thing the discipline was always secretly about, the human, does not disappear. It is exposed, with nowhere left to hide. There is still a UX. It just lost its crutch.

This chapter is about what design becomes once the crutch is gone. But “the screen went away” is not the real force at work, and starting there would miss the mechanism. The screen receded because of something underneath it, and that something, not the absence of pixels, is what reshapes the entire discipline.

The ladder is the engine

The force is the autonomy ladder. It is the single most useful idea for understanding what is happening to UX, because everything else in this chapter is a consequence of climbing it.

An agent’s independence is not a switch. It is the same ladder the foundations laid out, and for any given task a product sits on one of its rungs or moves between them. At the bottom it suggests, proposing while the human does everything. One rung up it drafts, producing the whole artifact for a human to review and commit. Up again it acts with approval, taking the step itself but only after a yes on each one. On the fourth rung it acts with oversight, acting on its own while a human watches the pattern in aggregate and can step in. At the top it acts autonomously, with no human in the loop for the individual decision, surfacing only what crosses a threshold it judges worth raising.

Now watch what happens to the human as you climb, because this is the whole argument. The human’s role does not stay fixed while the agent gets more capable. It transforms, rung by rung, into a different job:

Rung	What the agent does	What the human is	What design must produce
Suggests	Proposes; human decides and acts	A chooser	Comparable options, honest uncertainty
Drafts	Produces the whole artifact; human commits	A reviewer of work	A reviewable draft, clear provenance
Acts with approval	Proposes an action, waits for a yes	An approver	A decision package sized to the stakes
Acts with oversight	Acts on its own; human watches the pattern	A supervisor	A reviewable trail, exception flags
Acts autonomously	Acts, speaks only on threshold	A supervisor at a distance	Calibrated trust, drift prevention

The same product climbs this ladder as it earns trust, and different tasks inside one product sit on different rungs. As the human rises from chooser to supervisor, the design object rises with them: from arranging information to sustaining judgment over work the person no longer does themselves.

The word “user” quietly stops fitting somewhere around the middle of that ladder. An approver is already half a supervisor, and the overseer on the fourth rung and the distant supervisor on the fifth are supervisors outright; they are not completing a task, they are overseeing an autonomous worker that completes it. The shift from designing for a user to designing for a supervisor is not a slogan and it did not happen all at once. It is what the autonomy ladder does to the human as the product climbs it, and the serious AI-infused products are climbing.

That is why the screen receded. On the bottom rungs the screen is everything, how the agent informs and how the human chooses; as you climb, the screen shrinks from a workspace to a window, and at the top to a few lines and a question. The screen did not die of its own accord. It got displaced by autonomy, and the human’s job got displaced with it, from doing to supervising. Not every product is high on this ladder, and many never will be; the claim is not that screens are vanishing everywhere, only that wherever autonomy rises, the human’s role rises with it, and the design object follows. So the real claim of this chapter is the one this whole section makes about every craft, now reaching the designer: as autonomy rises, you stop designing the thing the user operates and start designing the supervisor who oversees the agent.

Two things wearing one word

Before going further, clear up a confusion that muddies a lot of published advice. “AI UX” covers two problems that sit at opposite ends of that ladder and have almost nothing to do with each other.

The first is the product that still has a screen, with AI added near the bottom rungs. Copilot inside the code editor, the generate button in the design tool, the assistant docked in the corner of the dashboard. The human is a chooser, and the design problem is the familiar one with a new ingredient: fit the agent’s suggestions into an interface a person still navigates with their hands and their eyes. For this case the answer to the chapter’s title question is the comfortable one: yes, there is still a UX, and it is mostly the UX you already know. The screen still matters, the patterns still hold, the discipline that learned to lay out a dashboard still applies, and the new work is incremental, designing the suggestion’s entry point, the accept-and-reject affordance, the way generated content is marked as generated. The field has good patterns for this, and most of what gets published under “agentic UX” is really about this case, which is part of why it feels solved. It is the easier half, and it is close to handled. The role that does it well is increasingly the design engineer, the designer who ships the interface they design, and that title is already real in the market, named in hiring at companies like Vercel, Linear, Stripe, and Cursor. If your product is on the bottom rungs, this is the chapter you can mostly skip, and the answer to its title is a reassuring yes.

The second is the product living near the top of the ladder, where the screen is incidental or absent, and here the same title question has a different and far less comfortable answer. You set a goal, you wait, you review an outcome. A coding agent works a ticket overnight. A procurement agent reviews a stack of vendor contracts. A claims agent takes a photo and returns a risk assessment. The interaction is not navigation, it is delegation and, later, judgment. The human is a supervisor. This chapter is about this case, because this is where the field has not caught up and where teams keep shipping an experience nobody designed. Call it the agent-native product. The screen is no longer the product; the supervisor’s judgment is, and the role that designs for it does not have a settled name yet. This chapter will argue for one, the behavioral designer, but unlike the design engineer that is not a title you will find in a job posting today. It is a name for a responsibility the market has not yet staffed, which is the whole problem. So the honest two-part answer to “is there still a UX” is yes for the screen that has AI on it, and not the UX you know for the agent that has no screen at all, and the rest of this chapter is about the second answer, because the first one does not need a book.

One product, every rung at once

To keep the rest of this concrete, hold one product in mind. Imagine an insurer launches an agentic system for food-spoilage claims, the small payouts people file after a power outage ruins a refrigerator full of groceries. Call it the claims agent; the team that built it named it Nemo. It takes a first notice of loss, photos of the spoiled food, a short description, the policy number. It pulls the policy and checks coverage and limits. It runs computer vision over the photos to estimate the loss and catch the obvious frauds, the stock image, the same salmon filed twice. It checks the utility’s outage feed against the customer’s address. Then, depending on the amount and the signals, it either pays, asks, or escalates, and it logs every step for the auditor and the regulator. This is not science fiction; it is roughly what the 2026 claims-automation vendors describe, and it is useful here because the same product lives on every rung of the ladder at the same time, depending on the claim in front of it.

A clean ninety-dollar claim with a confirmed outage sits near the top: the agent pays it and reports later. A six-hundred-dollar claim sits in the middle: the agent drafts the payout and waits for a person to approve. A two-thousand-dollar claim, or one with a fraud signal, drops to the bottom: the agent assembles the file and a human decides. One product, one agent, and the human is a chooser for one claim, an approver for the next, and a supervisor of a portfolio for the hundreds that resolved while they were asleep. Every idea ahead lands somewhere in that one example.
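The routing rule in that paragraph is simple enough to write down, and writing it down is a useful exercise, because it forces the team to say where the rung boundaries actually are. What follows is a minimal Python sketch, assuming a claim carries just an amount, an outage check, and a fraud flag. The $200 auto-pay line matches the policy used for Nemo elsewhere in this chapter; the $1,000 line above which a human decides is a hypothetical placeholder, and every name here is illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Rung(Enum):
    HUMAN_DECIDES = "agent assembles the file; a human decides"
    APPROVE = "agent drafts the payout; a human approves"
    AUTO_PAY = "agent pays and reports later"


# Hypothetical thresholds: $200 echoes the auto-pay policy in this chapter,
# $1,000 is a placeholder the team would have to choose for itself.
AUTO_PAY_LIMIT = 200
APPROVAL_LIMIT = 1_000


@dataclass
class Claim:
    amount: float
    outage_confirmed: bool
    fraud_signal: bool


def route(claim: Claim) -> Rung:
    """Pick the rung for this claim: one agent, different rungs per claim."""
    # Fraud is checked first, so no amount is small enough to skip a human.
    if claim.fraud_signal or claim.amount > APPROVAL_LIMIT:
        return Rung.HUMAN_DECIDES
    # Auto-pay only for small claims whose outage the utility feed confirms.
    if claim.amount < AUTO_PAY_LIMIT and claim.outage_confirmed:
        return Rung.AUTO_PAY
    return Rung.APPROVE
```

With those placeholders, the ninety-dollar claim with a confirmed outage routes to AUTO_PAY, the six-hundred-dollar claim to APPROVE, and the two-thousand-dollar claim, or any claim carrying a fraud signal, to HUMAN_DECIDES.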

The lab moves with the human

If the human’s role transforms as the product climbs, the way you test the design has to transform with it. The old craft had a rigorous empirical heart that often went unremarked: usability testing. You sat a person in front of an interface, gave them a task, and watched where they struggled. UI validation was applied behavioral science with a screen as the stimulus. It never had to call itself that, because at the bottom of the ladder the behavior was easy to see.

Higher on the ladder the stimulus changes and the questions get harder, but the method survives and matters more. You are no longer testing whether someone can find the button. You are testing how a supervisor behaves over a session with an autonomous system. How often does the agent interrupt before the person starts dismissing interruptions without reading them. Does the approval message carry enough for a conscious decision, or does it produce a reflexive yes. How does the supervisor’s cognitive load move across a long session, and at what point does their vigilance fall off. Does the person catch a seeded error, an action the agent took that they should have stopped, and how does the catch rate change after an hour, after a week, after the agent has been right so many times that checking feels pointless. Run that test on the claims product and it gets uncomfortably concrete: slip one claim into the adjuster’s queue where the photos are real but the outage feed shows no power loss at that address, and measure how many sessions deep the adjuster is before they stop catching it. The number you get is not a usability score. It is the moment the supervisor stopped supervising, and it is the single most important thing the design has to move. None of those can be answered by looking at a comp. All of them can be tested, with the same discipline that made usability testing trustworthy: a real person, a realistic task, an observer watching for the moment behavior breaks. The deliverable is no longer a heatmap of where people clicked. It is a map of where supervisors drift, and a design that pushes the drift point further out.
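The number that test produces can be computed from very plain data. Here is a minimal Python sketch, assuming each session contains one seeded claim, like the outage mismatch above, and records only whether the adjuster caught it. The rule that three consecutive misses mark the point where supervision stopped is an illustrative assumption, not an established threshold; a team would set its own.

```python
from typing import Optional, Sequence


def drift_point(caught: Sequence[bool], misses_in_a_row: int = 3) -> Optional[int]:
    """Return the 1-based session number where the first unbroken run of
    `misses_in_a_row` missed seeded errors began, or None if no such run
    occurs. `misses_in_a_row` is a hypothetical cutoff, not a standard."""
    run = 0
    for session, was_caught in enumerate(caught, start=1):
        run = 0 if was_caught else run + 1  # a catch resets the miss streak
        if run == misses_in_a_row:
            return session - misses_in_a_row + 1  # first session of the run
    return None
```

For a record of caught, caught, caught, missed, caught, missed, missed, missed, the sketch returns 6: the session where the unbroken run of misses began.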

The drift modes, and designing against them

If the supervisor is who you are designing for at the top of the ladder, then the supervisor’s failure modes are the design brief. And the supervisor fails in a particular, dangerous way. A user fails by getting confused or giving up, a visible, self-correcting failure: they leave. A supervisor fails by staying present and going passive, by approving without reading, by trusting a system that has been right fifty times and is wrong on the fifty-first. That failure is invisible and it accumulates. The whole job of designing the supervisor is designing against it.

The failure modes are not random. The human-factors literature, built over decades in aviation, anesthesia, and process control, names them, and every one shows up in agent supervision. A behavioral designer who cannot name these is decorating. One who can is the failure-prevention layer of the product.

Drift mode	What happens	The design counter
Automation complacency	Monitoring degrades as the agent proves reliable; checking stops paying off, so the supervisor stops checking	Vary friction by consequence, not by frequency; never let a high-stakes action inherit the low scrutiny a clean streak earned
Vigilance decay	Sustained attention to a mostly-quiet process falls off within the first half hour	Engineer attention rather than assume it; surface the decision-relevant moments, keep the rest quiet
Over-reliance paradox	A plausible explanation lowers the felt cost of checking, so showing more reasoning deepens misplaced trust	Surface uncertainty over confidence; use deliberate friction; ground claims in what the supervisor can verify (treated in full below)
Mode confusion	The supervisor does not know which autonomy rung is live right now, and is surprised by an action they thought they had to approve	Make the current rung continuously, unmistakably visible; mark every transition

Watch all four arrive in the claims product, because they do not stay theoretical for long. Automation complacency comes first: after a few hundred clean auto-pays the adjuster stops reading the six-hundred-dollar decision packages and approves on reflex, because the agent never seems to be wrong. The counter is to tie friction to the stakes and not to the streak, so a two-thousand-dollar claim always demands a richer package and a one-line justification from the adjuster no matter how perfect the last month looked, and to keep a spot-check mode where the agent auto-pays but still routes a random sample back for review. Vigilance decay comes next, half an hour into a smooth queue, and the counter is to stop streaming every case and surface only the exceptions, the threshold crossings and the fraud flags and the mismatched data, so the adjuster’s attention is spent where it changes an outcome. Mode confusion is the quiet one: a policy change moves the agent from “suggest only” to “auto-pay under two hundred,” and the adjuster is startled the first time a claim resolves without them, a rung change they never saw. The counter costs almost nothing, a persistent indicator that always says which rung is live (“Mode: auto-pay under $200, review required above”) and an explicit message whenever the rule changes.
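The first counter, friction tied to the stakes rather than the streak, plus a spot-check mode, fits in a few lines of Python. This is a sketch under stated assumptions: the tier boundaries are hypothetical (the $200 line echoes the auto-pay mode above, the $1,000 line is a placeholder), the 5% sampling rate is an assumption, and the function names are illustrative. The point of the sketch is the signature: the clean streak is passed in and deliberately ignored.

```python
import random

# Hypothetical tier boundaries; $200 echoes the auto-pay mode, $1,000 is a placeholder.
AUTO_PAY_LIMIT = 200
APPROVAL_LIMIT = 1_000


def required_scrutiny(amount: float, clean_streak: int) -> str:
    """Return the review a claim demands. Scrutiny depends on the stakes only:
    `clean_streak` is accepted and deliberately ignored, so a perfect month
    never lowers the bar for a high-stakes payout."""
    del clean_streak  # intentionally unused
    if amount > APPROVAL_LIMIT:
        return "full decision package + one-line adjuster justification"
    if amount >= AUTO_PAY_LIMIT:
        return "decision package"
    return "auto-pay (eligible for spot check)"


def route_to_spot_check(rng: random.Random, rate: float = 0.05) -> bool:
    """Spot-check mode: the agent still auto-pays, but a random sample of
    those claims is routed back to a human for review."""
    return rng.random() < rate
```

Passing the streak in and discarding it is a deliberate choice: it makes the rule visible in review, so nobody later "optimizes" the gate by loosening it after a quiet month.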

Three of those have clean design counters and you can build them today. The fourth, the over-reliance paradox, is where the discipline’s instinct is most confidently wrong, and it needs its own section, because the obvious fix makes it worse. It is already visible in the claims case: the agent that captions every decision with a fluent “based on your policy’s seven-hundred-fifty-dollar spoilage limit and similar claims this quarter, the fair payout is six hundred dollars” is teaching the adjuster to stop checking whether the outage data actually matches the address, because the explanation feels like proof. Hold that thought; we will come back to why showing more reasoning makes it worse, not better.

The approval moment is the interface, so size it to the decision

On the upper rungs the agent’s autonomous work happens in the gaps between approvals. By design, those gaps are the part the supervisor is not in. The only places the human and the product actually meet are the moments the agent pauses to ask. Those moments are the interface. Teams build this backwards, lavishing attention on the reasoning the supervisor never sees and treating the approval step as a speed bump.

Which raises the question that matters most in practice: is the approval moment a one-line text message, or a full dashboard of everything the agent knows. Neither fixed answer works, and “always show everything” is a trap. A standing dashboard for every action manufactures alarm fatigue: the supervisor learns the wall of detail means nothing in particular and clicks past all of it, including the one that mattered. A one-line “approve?” for every action is the opposite failure: not enough to support a conscious decision, so it produces a reflexive yes. Both fixed answers train the very drift you are trying to prevent.

The approval surface has to scale to the stakes, and the autonomy ladder already tells you how. A low-stakes, reversible action on a high rung needs a line of text, if it interrupts at all. A high-stakes, irreversible action needs a decision package built in the moment: not a standing dashboard of everything the agent knows, but exactly the facts that change this decision, surfaced now, for this action. What the agent is about to do. What it is uncertain about. The one or two things that cannot be undone. The alternatives it set aside and why. Assembled on the fly, scaled to the consequence, and built for authorization rather than re-derivation, because the supervisor cannot re-run forty steps of machine reasoning and should not be asked to. They are there to authorize the action against what they know that the agent does not: the context, the stakes, the thing that is true about this customer or this quarter that never reached the model.

The difference is concrete enough to show. Here is the six-hundred-dollar claim as most teams ship it:

Nemo recommends a payout of $600. [Approve] [Reject]

That is a reflex trainer. It carries nothing the adjuster can weigh, so after the tenth one they stop weighing. Here is the same moment built as a decision package:

ACTION: Issue $600 to policy 38492, send standard notification

WHY: Estimated loss $610 from three photos; outage confirmed seven hours at this address (utility feed); policy covers spoilage to $750, no deductible

UNCERTAIN: 30% of one photo could not be identified; fraud check clean, but the model is not validated in this region

NOT REVERSIBLE: Once paid, clawback requires a formal investigation

RULED OUT: Decline (coverage is present); pay the $750 maximum (no high-value items detected)

CONTROLS: [ Approve ] [ Adjust amount ] [ Send to fraud team ]

Same recommendation, same dollar figure. But the second one gives the adjuster the two things that actually decide it, the unvalidated fraud model and the irreversibility of the payout, and it surfaces them only because this claim crossed into the rung where they matter. That is the surface scaling to the stakes rather than to a house style: the richer the consequence, the more the card carries, and the cheaper the claim, the lighter the moment.
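The scaling rule behind the two cards can be made explicit. This is a minimal Python sketch, assuming a proposal arrives with its action, amount, rationale, and optional lists of uncertainties, irreversible effects, and ruled-out alternatives. The $200 line for a one-line prompt and all names are hypothetical. Empty sections are dropped, so the card carries only what bears on this decision.

```python
from dataclasses import dataclass, field

# Hypothetical line below which a cheap, reversible action gets one line of text.
ONE_LINE_LIMIT = 200


@dataclass
class Proposal:
    action: str
    amount: float
    why: str
    uncertain: list[str] = field(default_factory=list)
    irreversible: list[str] = field(default_factory=list)
    ruled_out: list[str] = field(default_factory=list)


def render_approval(p: Proposal) -> str:
    """Size the approval surface to the stakes: one line for cheap, reversible
    actions; a decision package otherwise, omitting any empty section."""
    if p.amount < ONE_LINE_LIMIT and not p.irreversible:
        return f"{p.action}? [Approve] [Reject]"
    lines = [f"ACTION: {p.action}", f"WHY: {p.why}"]
    for label, items in (("UNCERTAIN", p.uncertain),
                         ("NOT REVERSIBLE", p.irreversible),
                         ("RULED OUT", p.ruled_out)):
        if items:
            lines.append(f"{label}: " + "; ".join(items))
    lines.append("CONTROLS: [ Approve ] [ Adjust amount ] [ Send to fraud team ]")
    return "\n".join(lines)
```

Fed the six-hundred-dollar claim with every field filled, the sketch produces a card with the same sections as the one above; fed a cheap, reversible action, it produces a single line. The amount and the irreversibility, not a house template, decide which shape appears.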

And when the same product runs a rung higher, for the sub-two-hundred-dollar claims it settles on its own, the supervisor does not get a card at all. They get a tile at the end of the day: the agent paid a hundred and forty-three claims under two hundred dollars, zero overrides, two pulled for fraud review. That is the screen shrinking from a workspace to a window, the approver becoming a reviewer, and the design problem changing from “what goes on this card” to “what belongs in this digest and what can stay in the log.”
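The digest is the log with almost everything left out. A minimal Python sketch, assuming each autonomous decision is logged with an amount and one of three outcome labels; the `LogEntry` type, its fields, and the labels are hypothetical, not a real schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LogEntry:
    amount: float
    outcome: str  # hypothetical labels: "paid", "overridden", "fraud_review"


def daily_digest(log: Iterable[LogEntry], limit: float = 200) -> str:
    """Summarize the claims the agent settled on its own (those under `limit`)
    as one tile; every other detail about them stays in the log."""
    counts = Counter(e.outcome for e in log if e.amount < limit)
    return (f"Paid {counts['paid']} claims under ${limit:,.0f}, "
            f"{counts['overridden']} overrides, "
            f"{counts['fraud_review']} pulled for fraud review.")
```

The design question the chapter raises lives in the three counters: anything not named there is, by construction, something the supervisor has to go to the log to see.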

The supervisor’s authority should itself be tunable along the ladder. The best products let the human move the agent up and down the rungs per domain and per level of risk, turning autonomy up as trust is earned and down the moment it is not. The enterprise policy that says “a human must approve before the agent acts” becomes, in the interface, a rung the supervisor sets rather than a rule bolted on from outside. And the movement is itself a signal the team can watch: which way the supervisor moves the agent, and whether they ever ratchet autonomy up faster than the agent’s reliability justifies.
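A per-domain rung setting, the persistent mode indicator from the drift table, and the ratchet signal can live in one small object. This is a minimal Python sketch: the five rung names follow the ladder, but the 50-decision and 98% bars for "reliability justifies it" are hypothetical, and the audited counts are assumed to come from spot checks. The change is applied either way; the flag is for the team to watch, not a lock on the supervisor.

```python
from dataclasses import dataclass, field

RUNGS = ["suggests", "drafts", "acts with approval",
         "acts with oversight", "acts autonomously"]


@dataclass
class AutonomyPolicy:
    """The rung the supervisor has set for each domain (default: bottom rung)."""
    rungs: dict[str, int] = field(default_factory=dict)
    min_audited: int = 50        # hypothetical: audited decisions needed before a raise
    min_accuracy: float = 0.98   # hypothetical: audited accuracy needed before a raise

    def banner(self, domain: str) -> str:
        """Persistent mode indicator, always showing which rung is live."""
        return f"Mode: {RUNGS[self.rungs.get(domain, 0)]}"

    def set_rung(self, domain: str, new: int,
                 audited: int, correct: int) -> tuple[str, bool]:
        """Apply the supervisor's change; return the explicit transition message
        and whether the raise outpaces the agent's audited reliability."""
        if not 0 <= new < len(RUNGS):
            raise ValueError(f"no such rung: {new}")
        old = self.rungs.get(domain, 0)
        accuracy = correct / audited if audited else 0.0
        # Lowering autonomy is never flagged; raising it is flagged when the
        # audited record is too thin or below the bar.
        outpaces = new > old and (audited < self.min_audited
                                  or accuracy < self.min_accuracy)
        self.rungs[domain] = new
        message = f"Mode change for '{domain}': {RUNGS[old]} -> {RUNGS[new]}"
        return message, outpaces
```

Jumping a domain straight to "acts autonomously" after ten audited decisions comes back flagged; moving it back down never does, which matches the asymmetry of trust the chapter describes.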

Underneath all of it is a fact about trust that screen UX never had to reckon with. In screen software, trust came from consistency: the button does the same thing every time. In agent software, trust is behavioral and asymmetric. It accrues slowly, through many small moments where the agent respected a boundary or asked at the right time, and it collapses in one unexpected action. The person who told the agent to clean up their calendar and lost three real meetings is not coming back, however capable the model. Capability trust, does it do the task right, builds with each success. Behavioral trust, does it stay inside the lines I set, breaks all at once. The supervisor’s behavioral trust is the thing the designer is really managing, and it is far easier to destroy than to build.

The thing that should make every designer nervous

So far this reads like a craft you can master: read the ladder, run the new usability test, counter the drifts, size the approval surface. If the chapter stopped here it would be a pattern catalog, and it would be lying, because the central problem of keeping a supervisor calibrated does not have a solved answer, and the obvious answer is wrong.

The obvious answer is transparency. The supervisor cannot see the agent’s reasoning, so show it to them. Explain the recommendation, surface the chain of thought, cite the sources, and trust will calibrate itself. This is the founding assumption of nearly the entire field of explainable AI, and the best evidence we have says it does not hold.

The sharpest result comes from a controlled study by Schoeffer, De-Arteaga, and Kühl, published at CHI in 2024. Explanations, they found, move how much people rely on the AI, but they do not move the accuracy of that reliance. A feature-based explanation did not help people tell a correct AI recommendation from an incorrect one. It changed how much they leaned on the AI regardless of whether the AI was right. And this is not one contrarian paper; it is where the weight of the evidence sits. Explanations that are too technical, or too demanding, or even too simple can all deepen misplaced trust rather than correct it, and the effect is worst on the least experienced users, the very supervisors least able to absorb the error.

The mechanism is almost cruel in its logic. A plausible explanation lowers the felt cost of not checking. If the agent appears to have shown its work, the supervisor’s sense of risk drops, and with it their scrutiny. The explanation does not arm them to catch the error. It reassures them past the point where they would have caught it. The same pattern shows up at the edge of the literature in a form every product leader should sit with: in domains where people lean on AI assistance heavily, their own unaided skill measurably erodes, so the explanation meant to keep the supervisor in the loop is quietly removing the judgment the loop depends on. The drift and the deskilling feed each other.

This does not mean give up on legibility. It means stop equating legibility with explanation. What the evidence supports is narrower and harder. Surface uncertainty rather than confidence, because a confident tone is exactly what triggers over-reliance, and a model’s self-reported confidence is poorly calibrated anyway, so an empirically derived reliability number beats the agent’s own assurance. Use friction on purpose: a cognitive forcing function, a small demand that the supervisor actually engage before they accept, scores worse on user preference and better on decision accuracy, which is the trade the situation calls for. Ground the explanation in something the supervisor can actually verify, “I did this because you told me last week to prioritize the enterprise accounts,” not a paragraph of reasoning they have no way to check. And declare scope, say plainly what the agent is not equipped to judge, so trust is not applied where it does not belong.
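Of those moves, the empirically derived reliability number is the most mechanical, and a sketch shows why it differs from the agent's own confidence. This is a minimal Python version, assuming the team keeps audited outcomes (from spot checks or later review) tagged with the confidence the agent reported at the time. The five equal-width buckets are a common binning choice, not a prescription, and the function names are hypothetical. The supervisor sees what decisions like this one turned out to be, not what the agent says about itself.

```python
from collections import defaultdict
from typing import Iterable


def _bucket(confidence: float, bins: int) -> int:
    """Equal-width bucket for a confidence in [0, 1]; 1.0 joins the top bucket."""
    return min(max(int(confidence * bins), 0), bins - 1)


def reliability_table(audited: Iterable[tuple[float, bool]],
                      bins: int = 5) -> dict[int, float]:
    """Observed accuracy per self-reported-confidence bucket, computed from
    decisions whose outcome was later checked by a human."""
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for confidence, was_correct in audited:
        b = _bucket(confidence, bins)
        totals[b] += 1
        hits[b] += int(was_correct)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


def uncertainty_line(confidence: float, table: dict[int, float],
                     bins: int = 5) -> str:
    """What the supervisor sees: the audited rate, never the raw confidence."""
    b = _bucket(confidence, bins)
    if b not in table:
        return "No audited track record at this confidence level; check by hand."
    return f"Audited accuracy for decisions like this: {table[b]:.0%}"
```

If the agent reported 0.95 confidence on ten audited decisions and seven held up, the line the supervisor sees says 70%, not 95%. Where there is no audited record, the sketch says so instead of inventing a number, which doubles as a small declaration of scope.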

The field has not solved this. There is no validated way to produce appropriate trust, steering a supervisor between the under-trust that makes them reject a correct agent and the over-trust that makes them wave a wrong one through, across domains and users. Anyone who tells you their explanation pattern fixes trust calibration is selling the thing the research says does not exist. That is not a reason to avoid the problem. It is a reason to treat it as the open, dangerous, central problem of the discipline, and to design with the humility the evidence demands.

The absence of a theory does not leave you with nothing to do on Monday. You cannot reliably manufacture calibrated trust, but you can measure where it is failing, and that you can start this week. Run the seeded-error session from earlier in this chapter as a standing instrument rather than a one-time test, on a cadence, tracking the catch rate over time and across tenure. You will not get a trust-calibration theory out of it. You will get the one thing the theory would have given you anyway, an early-warning number that tells you which supervisors have slid into rubber-stamping before the incident does, and a falling catch rate is a signal to rotate the reviewer, retrain, or tighten the gate, none of which requires solving the underlying problem. Measure the drift you cannot yet prevent; it converts an unsolved theory into a maintained metric.
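The standing instrument is a rolling catch rate with a floor. A minimal Python sketch, assuming each seeded case for a given supervisor is recorded in order as caught or missed. The ten-case window and 80% floor are hypothetical settings, and what to do when the alert fires, rotate, retrain, or tighten the gate, stays a human call outside the code.

```python
from collections import deque
from typing import Iterable, Optional


def first_alert(outcomes: Iterable[bool], window: int = 10,
                floor: float = 0.8) -> Optional[int]:
    """Walk one supervisor's seeded cases in order and return the 1-based index
    of the case at which the rolling catch rate first falls below `floor`.
    Returns None if it never does (or there are fewer than `window` cases)."""
    recent: deque = deque(maxlen=window)  # oldest result drops off automatically
    for i, caught in enumerate(outcomes, start=1):
        recent.append(caught)
        if len(recent) == window and sum(recent) / window < floor:
            return i
    return None
```

Ten catches followed by three misses fires on the thirteenth seeded case, when the windowed rate drops to 70%; at the twelfth it sits exactly at the floor and does not fire. Running the same function per supervisor, and comparing across tenure, is what turns a test into a metric.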

What the artist becomes

Which brings us back to the UX lead who thought of himself as an artist. His real subject was always the human. The screen was how he reached it. So when autonomy rises and the screen recedes, his craft does not end; it loses its disguise and has to face its subject directly.

The career splits, and the split is already visible in who companies are hiring. For products that still live on the lower rungs, with a screen the human operates, the artist’s heir is the design engineer, the person who designs and ships in code because a static mockup no longer survives contact with a live, generative product; as noted earlier in this chapter, Vercel, Linear, Stripe, and Cursor are already hiring for the role. It is a real answer, but it is an answer for the bottom of the ladder.

For the agent-native case at the top, the heir is different, and it does not have a settled name yet. Call it the behavioral designer, because that names the subject plainly. Its deliverables are not wireframes. They are the rung map, which tasks sit at which level of autonomy and what the human is on each. The drift-prevention plan, the specific countermeasure for each supervision drift in this product, where complacency will set in and what arrests it, when vigilance will fall and what re-engages it, which autonomy transitions risk mode confusion and how the live rung stays unmistakable. The supervisor test protocol, the seeded-error sessions that measure catch rate over time. The recovery scripts, the words the product uses to make a person whole after the agent gets it wrong. And increasingly the artifacts that used to belong to engineering: the system prompt, the eval rubric, the very questions the agent asks the supervisor instead of the form fields the user used to fill in. The clearest real-world version is a recent enterprise posting for a lead designer on an agentic platform, whose listed deliverables are feedback loops, error states, fallback behaviors, and alignment with the model team on constraints and confidence, with not a wireframe in sight.

There is a discomfort here worth naming rather than smoothing over. On most teams today the system prompt, arguably the most consequential behavioral-design artifact in an agent-native product, is written by a machine-learning engineer or a product manager, not a designer. Whether that should change is unresolved. But it is the cleanest test of whether your organization understands what design has become. If the things that shape the supervisor, what the agent may do, when it speaks, how it asks, what it discloses, are owned by whoever touched the code last, then the supervisor is being designed by accident, and the discipline that spent decades learning how humans behave under uncertainty is sitting outside the room where that behavior is now decided.

The part nobody has written yet

I have walked up the ladder, through the new usability test, the drift modes, the approval surface, the trust problem, and the changing craft, leaning on the people doing the best work in each. I want to be honest about where that work stops, because the edge of it is exactly where this book has something to add.

The field has good patterns for the lower rungs, screens with AI on them. It has first-generation, practitioner-grade frameworks for the middle rungs, agents with a thin interface a supervisor can still glance at. What it does not have, anywhere I can find, is a worked theory of the top rung, the fully headless case. The agent that runs overnight with no interface at all and hands a person a summary in the morning. The agent that acts in systems the human never opens and surfaces a nugget only when its own model decides one is warranted. How do you keep a supervisor calibrated over a system they never watch in real time. How do you run a usability test on an experience with no screen to sit in front of. What is the signal that an action was taken by an agent and not a person, and who is owed it. There is no settled answer to any of this, and no settled answer to who on the team owns the question.

Push the claims example one rung past where any pattern library goes and you can feel the floor disappear. Imagine the insurer runs a second agent overnight, a claims guardian that nobody watches, scanning yesterday’s settled claims and the data around them for the things the daytime agents missed: recoveries left on the table, mis-coded losses, fraud clusters that only show up across many small claims at once. No one ever opens its interface. Each morning it sends one email: three high-risk clusters it found, one question it needs answered, “should we open a special investigation on this grocery chain,” and links into the existing claims systems rather than into itself. Now ask the questions the design discipline has no answer for. How do you keep that supervisor calibrated when they never see the agent work and only ever read its morning nuggets, so they have no felt sense of how often it is right. When should the email even fire, every morning, or only when the risk crosses a line, and how does that choice shape whether the human keeps paying attention. How many times must the supervisor answer a given kind of question before the agent is allowed to act on that pattern without asking, and who decides that the trust has been earned. And the hardest one for the lab: how do you seed a test case into a morning packet, the cluster that should not be there, to measure whether the supervisor’s catch rate is decaying over months when there is no session to observe and no screen to record. The vendors who build these systems have solved the orchestration and the audit trail; the overnight multi-agent pipeline is a documented, shipping thing. What none of them has is a theory of how to keep the human at the end of it calibrated. That is not an oversight at the edge of the field. It is the center of the next one.

The honest thing to do with a frontier you cannot yet chart is not to wait for the map; it is to assign the question and start measuring badly. Two moves are available even now. First, give the question an owner: name the person responsible for the headless agent’s supervisor before the agent ships, because the failure mode that guarantees the worst outcome is the one where the morning email goes to a distribution list and the distribution list is no one. Second, run the seeded-packet test even though it is crude: salt the morning summary with a planted cluster that should not be there, on an irregular cadence, and see whether the supervisor flags it or signs off. You will not get a calibration theory; you will get the only number that matters before there is one, whether the human at the end of the pipe is still reading. A bad measurement of the right thing beats a perfect theory you do not have, and it is what a team can do with the headless case on Monday, which is more than the field’s silence suggests is possible.
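The seeded-packet test needs almost no machinery, only a private ledger of which mornings were salted and what the supervisor did with each plant. What follows is a minimal sketch in Python of that bookkeeping, assuming a morning packet is simply a list of clusters. Every name in it (`Cluster`, `SeedLedger`, `build_packet`, `record_review`, `catch_rate`) is hypothetical, as is the default 15 percent seeding rate; none of it comes from any vendor’s system. The plant itself is passed in by the caller, because a plant that looks like a test teaches you nothing about the supervisor.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """One item in a morning packet (hypothetical shape, for illustration only)."""
    cluster_id: str
    summary: str


@dataclass
class SeedLedger:
    """Private record of salted mornings; never shown to the supervisor."""
    # day index -> cluster_id of the plant inserted that morning
    planted: dict = field(default_factory=dict)
    # day index -> True if the supervisor flagged the plant, False if they signed off
    outcomes: dict = field(default_factory=dict)


def build_packet(day, real_clusters, plant, ledger, rng, seed_rate=0.15):
    """Return the morning packet, salted with `plant` on a random subset of days.

    Seeding by coin flip instead of a fixed schedule keeps the cadence
    irregular, so the supervisor cannot learn which mornings are tests.
    seed_rate is an assumed default, not a recommended value.
    """
    packet = list(real_clusters)  # copy so the caller's list is untouched
    if rng.random() < seed_rate:
        # Insert at a random position so the plant is not always in the same slot.
        packet.insert(rng.randrange(len(packet) + 1), plant)
        ledger.planted[day] = plant.cluster_id
    return packet


def record_review(day, flagged_ids, ledger):
    """Log whether the supervisor flagged that day's plant, if one was inserted."""
    if day in ledger.planted:
        ledger.outcomes[day] = ledger.planted[day] in flagged_ids


def catch_rate(ledger, start_day, end_day) -> Optional[float]:
    """Share of reviewed plants in [start_day, end_day) that the supervisor caught.

    Returns None for a window with no reviewed plants, so an empty window
    is not read as a perfect (or a zero) score.
    """
    results = [caught for day, caught in ledger.outcomes.items()
               if start_day <= day < end_day]
    if not results:
        return None
    return sum(results) / len(results)
```

In use, you would call `build_packet` each morning, pass the IDs the supervisor flagged to `record_review`, and compare `catch_rate` across consecutive months. A falling rate is the decay signal the chapter is asking for; it is not a calibration theory, only the crude number that stands in for one.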

The absence of that theory is not a gap in the research. It is the frontier, and naming it plainly is the most useful thing this chapter can do. The top rung is not an edge case. The projections say it is about to become the common case, and the design discipline is going to meet it mostly unprepared, still reaching for the screen, still designing the user when the job is to design the supervisor.

So a working definition, offered not because the field has converged on one but because it has not, and because the work needs a shape to grow around. The experience of an agentic product is the design of the human’s role at whatever rung of the autonomy ladder the product occupies: at the bottom, the legibility of information; in the middle, the quality of choices and approvals; at the top, the calibration of a supervisor’s trust and the prevention of their drift, over work they no longer do themselves. Where there is a screen, that work sits as a layer on top of ordinary interface design. Where there is not, that work is the product. It is the system prompt, the eval, the escalation policy, and the log.

The morning, again

Go back to the person opening her laptop, reading the three lines the agent left her and the one question it decided to ask. She is near the top of the ladder. She does not have a screen to operate; she has a worker to oversee. In the next few seconds she will either stay a supervisor or quietly stop being one. She will read the question or wave it through. She will catch the thing the agent got wrong, or it will go out with her name on it.

Either way, someone designed the conditions under which she made that call. Someone decided what the agent would surface and what it would keep to itself. Someone decided this was a one-line question and not a decision package, or the reverse. Someone decided whether anything in the way it was presented would make her pause at the right moment, or let a long clean streak lull her past it. Every one of those was a behavioral design decision. Together they were the entire experience, and they decided whether she stayed a real supervisor.

The artist used to make the thing the user operated. The thing was never the point; the person was. As the agent climbed the ladder, the person stopped operating and started overseeing, the screen shrank to three lines and a question, and the design became everything that holds her attention where it belongs. There is still a UX. It is bigger and more consequential than it was when there was a screen to hide behind. The only question left is whether anyone on the team will admit they are the one designing the supervisor, or whether she will keep being designed the way she is designed on most teams today, which is to say, by accident.

The Work Reshapes

Who Wrote This Code, and Who Answers for It?