A tutorial teaches you a concept. A playbook tells you what to run when a specific situation shows up. This is the second kind. If you already understand the fundamentals and now need an operating manual, something you and your team can reach for under pressure, this is built for that.
Each play below has the same anatomy: a trigger that tells you when to run it, the sequence of moves, and a note on who owns it when more than one person is involved. The plays are ordered roughly by how often you'll reach for them, not by difficulty.
Treat the playbook as a living document. The specific phrasings will age as models change, but the structure, named situations mapped to named responses, is durable. Borrow it, then adapt the plays to your own work.
Play 1: The Cold Start
Trigger: You're facing a new task and don't have a proven prompt for it yet.
Do not start typing your final prompt. Start by writing down, in plain language, what a perfect output would contain. List the sections, the tone, the length, the must-haves and must-avoids. That description becomes your spec.
Then build the prompt from the spec in this order:
- State the role and goal in one sentence.
- Provide context the model needs but doesn't have.
- Give the task instruction.
- Specify the output format, ideally with a template.
- Add constraints last.
Owner: whoever will use the output most owns the spec. The spec is the real artifact; the prompt is just its implementation. For a full treatment of building from scratch, see our step-by-step approach.
Play 2: The Format Lock
Trigger: The content is right but the structure keeps drifting.
When a model gets the substance correct but won't hold a consistent shape, stop describing the format and start showing it.
- Paste a filled-in example of the exact output you want.
- Or provide a skeleton with placeholders the model fills in.
- Use delimiters and labeled sections so the boundaries are unambiguous.
The mistake here is escalating your descriptions, adding more adjectives about the format. Descriptions are weak; examples are strong. One good example ends most format problems.
Play 3: The Grounding Pass
Trigger: The task is factual and the cost of a made-up answer is high.
This is the play for anything where being confidently wrong is worse than admitting uncertainty: client deliverables, research summaries, anything that gets forwarded.
- Supply the source material directly in the prompt.
- Instruct: answer using only the provided material.
- Add the escape hatch: if it's not in the source, say so.
- Require the model to point to the supporting text for each claim.
Owner: a human reviewer still signs off. Grounding reduces fabrication, it does not remove the need for a check. Our common mistakes guide explains why skipping the review step is the most expensive error of all.
Play 4: The Decomposition
Trigger: A single prompt is trying to do three jobs and doing all of them poorly.
When output quality sags on a complex task, the cause is usually that you've crammed multiple stages into one request. Split it.
How to split
- Identify the natural stages: research, structure, draft, polish.
- Make each stage its own prompt.
- Pass the validated output of one stage as the input to the next.
- Insert a human checkpoint at the highest-risk stage, usually structure.
The cost is more moving parts and more latency. Run this play only when single-prompt quality has actually failed, not preemptively.
Play 5: The Tone Calibration
Trigger: The output is competent but sounds wrong, too stiff, too casual, off-brand.
Tone is hard to describe and easy to demonstrate.
- Provide two or three sentences written in the target voice.
- Name the audience explicitly: "writing for a busy executive who skims."
- Specify what to avoid: jargon, hedging, exclamation points, whatever the offense is.
Tone is also the most personal axis, so this play often needs a real person to judge the result. You can't fully automate taste.
Play 6: The Constraint Audit
Trigger: The model is ignoring a rule you clearly stated.
Before adding more emphasis, diagnose why the rule is being dropped:
- Is it buried in the middle of a long block? Move it to the top or bottom.
- Is it phrased as a prohibition? Rewrite it as a positive instruction.
- Does it conflict with another instruction? Resolve the contradiction.
- Is it competing with five other rules? Cut the low-priority ones.
Shouting in capital letters is not a play. Diagnosing the structural reason is. The framework piece treats constraints as a first-class part of prompt design rather than an afterthought.
Play 7: The Regression Guard
Trigger: You're about to change a prompt that's already in regular use.
The danger of editing a working prompt is that you fix one case and silently break three others. Guard against it.
- Keep a small set of test inputs with known-good outputs.
- Run the current prompt against them and save the results.
- Make your edit.
- Rerun and compare. If anything that used to pass now fails, you've regressed.
Owner: whoever maintains the prompt owns its test set. A prompt without a test set is a prompt nobody can safely improve. The best practices guide makes the case for treating prompts like code in this respect.
Sequencing the Plays
These plays aren't standalone, they chain. A typical lifecycle:
- Cold Start to get a working draft.
- Format Lock and Tone Calibration to make it usable.
- Grounding Pass if the task is factual.
- Decomposition if quality won't reach the bar.
- Constraint Audit whenever a rule slips.
- Regression Guard every time you touch a prompt that's earning its keep.
The mark of maturity is recognizing which play a situation calls for before you start fiddling. Random tweaking is the opposite of a playbook.
Reading the Signals
The hardest part of using a playbook is diagnosis, knowing which play the situation actually calls for. The same surface symptom can have different causes, and running the wrong play wastes time.
A short diagnostic guide:
- Output is the wrong shape but the content is right. That's a Format Lock, not a Constraint Audit. Show an example instead of restating rules.
- Output is confidently wrong on facts. That's a Grounding Pass. More instructions won't fix a model that has nothing to ground against.
- Output is fine in isolation but degrades when the task gets bigger. That's a Decomposition. The prompt is overloaded, not poorly written.
- One specific rule keeps slipping while everything else holds. That's a Constraint Audit. Don't rewrite the whole prompt to fix one line.
Spending thirty seconds on diagnosis before you touch the prompt saves the much larger cost of editing in the wrong direction. The examples and use cases library is a useful reference for matching real symptoms to the play that resolved them.
Frequently Asked Questions
How is a playbook different from a checklist?
A checklist is one fixed sequence you run every time. A playbook is a library of responses, and your job is to pick the right one for the situation. The checklist for 2026 covers the fixed-sequence version; this playbook covers the situational one.
Do I need a team to use a playbook?
No. The ownership notes matter more in a team, but a solo practitioner benefits from naming situations and standardizing responses. It turns ad hoc fiddling into a deliberate practice.
Which play should I learn first?
The Format Lock and the Cold Start. They solve the two most common frustrations, drifting output and the blank-page problem, and together they cover a large share of everyday prompting.
How often should the playbook change?
Revisit it whenever you switch primary models or hit a recurring problem no existing play solves. The structure is stable; the specific phrasings inside each play will need refreshing as models evolve.
Can these plays be automated?
Partly. The mechanical plays, Grounding Pass, Regression Guard, can be built into tooling. The judgment-heavy ones, Tone Calibration especially, still need a human in the loop.
Key Takeaways
- A playbook maps named situations to named responses, unlike a single fixed checklist.
- Start every new task from a spec describing the ideal output, not from the prompt itself.
- Fix format and tone problems with examples, not more description.
- Decompose only when single-prompt quality has actually failed; it adds real cost.
- Diagnose ignored constraints structurally before adding emphasis.
- Never edit a working prompt without a test set to guard against regressions.