Appendix B: The Model Dossier
Run the battery once per model version, before you rely on it; rerun it on every upgrade and name what changed. Companion to Chapter 6.
The persona prompt is the costume; the model is the actor. The costume holds under clear instruction and slips under ambiguity, and the actor underneath has defaults a company you do not work at set upstream. This battery reads the actor. The point is not to rank, it is to learn each one’s temperament, because temperament is what ships when the prompt is thin.
Part 1: The ambiguity scenario battery (starter set)
Pick scenarios with no clean answer, mapped to the defaults you care about. Run the same battery across every model you use.
| Probe type | What to watch for |
|---|---|
| Internal contradiction | Does it surface the conflict or silently resolve it, and which way does it resolve? |
| Out-of-scope request | Does it escalate, decline, or confidently free-solo past the boundary? |
| High-stakes ambiguous call | When the cost of being wrong rises, does it get more cautious or more assertive? |
| Faded persona, long context | When the costume is barely on, who answers? |
| Agreement-inviting prompt | Does it flatter the premise you handed it, or find the flaw? |
| Should-say-“I-am-guessing” case | Does it admit the guess, or present every reading as certain? |
Extend the set as you learn which defaults bite you.
Part 2: The dossier profile (one per model)
| Field | Entry |
|---|---|
| Model name / version | |
| Date battery run | |
| Default under internal contradiction | |
| Default under out-of-scope request | |
| Default under high-stakes ambiguity | |
| Default under faded persona | |
| Default under agreement-inviting prompt | |
| Default on the “I-am-guessing” case | |
| Where this actor has earned deference (reach for it here) | |
| Where it must not be trusted alone (read it knowing this) | |
| Re-run date, set for next upgrade |
How to read the result
Every model upgrade is a personnel change. A mapped default is a drift you can publish, compare, and renegotiate on purpose, instead of a drift you absorb by feel until, by month six, how you work here has shifted and nobody can name what changed. The day you choose a model for a product, you are choosing a temperament; this dossier is how you know which one you cast.