Most teams treat summarization as a single prompt and a hopeful glance at the result. That works until the summary drops the one caveat that mattered, or invents a number that nobody said, or balloons to three paragraphs when a client wanted three sentences. At that point summarization stops being a convenience and becomes a liability you have to babysit.
A playbook fixes this by turning summarization into a set of repeatable plays instead of one-off improvisation. Each play has a trigger that tells you when to run it, an owner who is accountable for it, and a place in the sequence so the plays compound instead of colliding. The goal is not a perfect prompt. The goal is a system where quality is the default and failures are caught early by design.
This article lays out the plays we run, in the order we run them, with the triggers and owners attached. Use it as a reference you can hand to anyone on your team, not as a one-time read.
The Core Plays
Play 1: Define the Summary Contract
Before any prompt is written, name what the summary is for. A summary that feeds a sales rep before a call has different requirements than one that goes into a compliance log. The contract specifies the audience, the maximum length, the must-keep elements, and the things that must never be inferred.
- Trigger: a new summarization use case enters the workflow.
- Owner: the workflow lead who understands the downstream reader.
- Output: a one-paragraph contract pinned to the prompt template.
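To make the contract concrete, here is a minimal Python sketch that stores one as a small record and renders it as the paragraph pinned to the prompt template. The class name `SummaryContract`, its field names, and the example values are hypothetical. Adapt them to whatever your template system expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryContract:
    """One contract per summarization use case (hypothetical field names)."""
    use_case: str
    audience: str
    max_words: int
    must_keep: tuple[str, ...] = ()
    never_infer: tuple[str, ...] = ()

    def as_prompt_header(self) -> str:
        """Render the contract as the paragraph pinned to the prompt template."""
        parts = [
            f"Audience: {self.audience}.",
            f"Maximum length: {self.max_words} words.",
        ]
        if self.must_keep:
            parts.append("Must keep: " + "; ".join(self.must_keep) + ".")
        if self.never_infer:
            parts.append("Never infer: " + "; ".join(self.never_infer) + ".")
        return " ".join(parts)


# Illustrative contract for the pre-call sales brief mentioned above.
sales_call = SummaryContract(
    use_case="pre-call brief",
    audience="account executive preparing for a sales call",
    max_words=80,
    must_keep=("open support tickets", "renewal date"),
    never_infer=("budget figures not stated in the source",),
)
print(sales_call.as_prompt_header())
```

Keeping the contract as data rather than free text means the same fields can later drive the length and coverage checks.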
Play 2: Constrain the Source Window
Quality degrades when the model is asked to summarize more than it can hold cleanly. Chunk long documents, summarize each chunk, then summarize the summaries. This map-reduce pattern beats stuffing everything into one window and praying.
- Trigger: source text exceeds roughly a quarter of the context budget.
- Owner: the engineer who built the pipeline.
- Output: a chunking rule documented next to the prompt.
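A minimal sketch of the chunk-then-summarize pattern in Python follows. It assumes a rough four-characters-per-token estimate (swap in your model's real tokenizer) and takes the model call as a `summarize` function you supply. The names `estimate_tokens`, `needs_chunking`, `chunk_paragraphs`, and `map_reduce_summarize` are illustrative, not a library API. Chunks are cut on paragraph boundaries so no thought is split mid-sentence.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Assumption: about 4 characters per token. Replace with your model's tokenizer.
    return max(1, len(text) // 4)


def needs_chunking(text: str, context_budget: int) -> bool:
    # The play's trigger: source exceeds roughly a quarter of the context budget.
    return estimate_tokens(text) > context_budget // 4


def chunk_paragraphs(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of about max_tokens each.

    A single paragraph larger than max_tokens still becomes its own chunk;
    split it further if that happens often in your documents.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        cost = estimate_tokens(para)
        if current and size + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def map_reduce_summarize(text: str, summarize: Callable[[str], str], context_budget: int) -> str:
    """Summarize directly if the source is small; otherwise map over chunks, then reduce."""
    if not needs_chunking(text, context_budget):
        return summarize(text)
    chunks = chunk_paragraphs(text, max_tokens=context_budget // 4)
    partials = [summarize(chunk) for chunk in chunks]  # map: one summary per chunk
    return summarize("\n\n".join(partials))  # reduce: summarize the summaries
```

Passing the model call in as a parameter keeps the chunking rule testable with a stand-in function. If the combined partial summaries are themselves too long for one window, the reduce step can be applied again to groups of partials.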
Play 3: Anchor With Extractive Quotes
Ask the model to pull verbatim quotes that support each claim before it writes the prose summary. Extraction anchors the summary in the source and makes fabrication visible, because an unsupported claim has no quote behind it.
- Trigger: the summary will be used for decisions or shared externally.
- Owner: the prompt author.
- Output: a quote-then-summarize prompt structure.
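One way to encode the quote-then-summarize structure is a template that forces an evidence section before the prose, plus a parser that splits the response back apart. The template wording, the `<evidence>`/`<summary>` tags, the claim :: "quote" line format, and the names `build_prompt` and `parse_response` are assumptions for illustration. Any format works as long as the prompt and the parser agree.

```python
import re

# Hypothetical template; the tags and the claim :: "quote" line format are illustrative.
QUOTE_THEN_SUMMARIZE = """\
Summarize the source below in two steps.

Step 1 (evidence): list each claim you intend to make, followed by a verbatim
quote from the source that supports it. Copy quotes exactly. If no quote
supports a claim, drop the claim.

Step 2 (summary): write at most {max_words} words for {audience}, using only
claims that have a quote in Step 1.

Respond in this format:
<evidence>
- claim :: "verbatim quote"
</evidence>
<summary>
summary text
</summary>

Source:
{source}
"""


def build_prompt(source: str, audience: str, max_words: int) -> str:
    return QUOTE_THEN_SUMMARIZE.format(source=source, audience=audience, max_words=max_words)


def parse_response(text: str) -> tuple[list[tuple[str, str]], str]:
    """Split a model response into (claim, quote) pairs and the prose summary."""
    evidence = re.search(r"<evidence>(.*?)</evidence>", text, re.S)
    summary = re.search(r"<summary>(.*?)</summary>", text, re.S)
    pairs: list[tuple[str, str]] = []
    if evidence:
        for line in evidence.group(1).splitlines():
            match = re.match(r'\s*-\s*(.+?)\s*::\s*"(.+)"\s*$', line)
            if match:
                pairs.append((match.group(1), match.group(2)))
    return pairs, summary.group(1).strip() if summary else ""
```

Keeping the quotes as structured pairs means the scoring play can check each one against the source mechanically.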
Running the Plays in Sequence
The plays are not interchangeable. They build on each other, and running them out of order wastes effort.
The Standard Order
- Define the contract so everyone agrees on the target.
- Constrain the source window so the model works with material it can handle.
- Anchor with extractive quotes so claims are traceable.
- Generate the summary against the contract.
- Score the output before it ships.
Skipping step one is the most common failure. Teams jump straight to prompting and then argue about whether the result is good, because they never agreed on what good meant.
When to Branch
Not every summary needs the full sequence. A throwaway summary for a private Slack note can skip the quote-anchoring play. A summary headed for a client deliverable runs every play and adds a human review gate. Match the rigor to the stakes, and write the branching rule down so the decision is not remade every time.
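Writing the branching rule down can be as simple as a lookup that returns the plays to run, in the standard order, for each stakes level. This Python sketch encodes the two cases named above: a private note skips quote anchoring, and a client deliverable adds a human review gate. The `Stakes` levels, the middle `INTERNAL` tier, and the play names are hypothetical.

```python
from enum import Enum


class Stakes(Enum):
    PRIVATE_NOTE = "private_note"              # e.g. a throwaway Slack summary
    INTERNAL = "internal"                      # hypothetical middle tier
    CLIENT_DELIVERABLE = "client_deliverable"


# The standard order; plays always run in this sequence when they run at all.
STANDARD_ORDER = ("contract", "constrain_source", "anchor_quotes", "generate", "score")


def plays_for(stakes: Stakes) -> list[str]:
    """Return the plays to run, in order, for a given stakes level."""
    plays = list(STANDARD_ORDER)
    if stakes is Stakes.PRIVATE_NOTE:
        plays.remove("anchor_quotes")  # low stakes: skip quote anchoring
    if stakes is Stakes.CLIENT_DELIVERABLE:
        plays.append("human_review")   # high stakes: add a human gate after scoring
    return plays
```

Because the rule lives in one function, changing it is a reviewable edit rather than a decision remade case by case.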
The Scoring Play
A summary you cannot measure is a summary you cannot trust. Scoring closes the loop.
What to Score
- Faithfulness: does every claim trace to the source?
- Coverage: are the must-keep elements from the contract present?
- Length compliance: does it fit the contract's ceiling?
- Tone fit: does it read the way the audience expects?
How to Score Cheaply
You do not need a research lab. A second model pass that checks the summary against the source quotes catches most faithfulness failures. A simple length check is a one-line script. For high-stakes summaries, a human spot-check on a sample each week keeps the automated scores honest. Paired with disciplined prompt iteration, these scores give you a trend line, so you can tell whether a prompt change actually improved quality instead of guessing.
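The cheap mechanical checks fit in a few lines of Python. This sketch assumes the quotes have already been extracted by the anchoring play and that the contract supplies the must-keep elements and a word ceiling. `score_summary` and `Scorecard` are hypothetical names. The substring checks are deliberately crude: a verbatim-quote check confirms a quote exists in the source, not that it supports the claim, so it complements the second model pass rather than replacing it. Tone fit is not covered here.

```python
import re
from dataclasses import dataclass


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so line wraps and casing don't cause false misses.
    return re.sub(r"\s+", " ", text).strip().lower()


@dataclass
class Scorecard:
    length_ok: bool
    missing_must_keep: list[str]
    unsupported_quotes: list[str]

    @property
    def passed(self) -> bool:
        return self.length_ok and not self.missing_must_keep and not self.unsupported_quotes


def score_summary(summary: str, source: str, quotes: list[str],
                  must_keep: list[str], max_words: int) -> Scorecard:
    src, summ = normalize(source), normalize(summary)
    return Scorecard(
        # Length compliance: the one-line check against the contract's ceiling.
        length_ok=len(summary.split()) <= max_words,
        # Coverage (crude): each must-keep phrase appears in the summary.
        missing_must_keep=[k for k in must_keep if normalize(k) not in summ],
        # Faithfulness precheck: each extracted quote appears verbatim in the source.
        unsupported_quotes=[q for q in quotes if normalize(q) not in src],
    )
```

A quote that fails the verbatim check is exactly the visible evidence of fabrication the anchoring play is meant to produce.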
Assigning Owners and Triggers
A play without an owner is a suggestion, and suggestions do not run reliably.
Make Ownership Explicit
Each play names a single accountable person, even if others help. When faithfulness scores drop, there is no debate about who investigates. When the contract needs updating because the audience changed, one person holds the pen.
Make Triggers Automatic Where You Can
The best trigger is one the system fires for you. A length check that runs on every output does not depend on anyone remembering. A faithfulness score that posts to a dashboard surfaces drift before a human notices. Reserve manual triggers for judgment calls, and automate the mechanical ones.
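As a sketch of automatic triggers, the hook below runs on every output: it raises a length alert immediately and a drift alert when the rolling average of faithfulness scores falls below a threshold. The window size, the threshold, and the assumption that faithfulness arrives as a score between 0 and 1 are illustrative, not recommendations. `DriftMonitor` and `on_output` are hypothetical names, and where the alerts go (a dashboard, a chat channel) depends on your stack.

```python
from collections import deque


class DriftMonitor:
    """Fires when the rolling mean of a score falls below a threshold (illustrative defaults)."""

    def __init__(self, window: int = 50, threshold: float = 0.9):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score: float) -> bool:
        """Record one output's score; return True if the drift alert should fire."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # wait for a full window before judging drift
        return sum(self.scores) / len(self.scores) < self.threshold


def on_output(summary: str, faithfulness: float, max_words: int, monitor: DriftMonitor) -> list[str]:
    """Hook to call on every generated summary; returns the alerts it raised."""
    alerts = []
    if len(summary.split()) > max_words:
        alerts.append("length")
    if monitor.record(faithfulness):
        alerts.append("faithfulness_drift")
    return alerts
```

Note that the drift alert keeps firing on every output while the rolling mean stays low; you may want to deduplicate it before it reaches a human.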
Maintaining the Playbook
A playbook rots if no one tends it. The plays that worked on last quarter's documents may not fit this quarter's.
Review on a Cadence
Walk the plays every month or so. Ask which ones earned their keep, which triggers fired too often or too rarely, and which contracts drifted from what readers actually need. Retire plays that no longer pull their weight instead of carrying them forever.
Treat Prompts as Versioned Assets
Store prompt templates in version control alongside the contracts and scoring rules. When a prompt changes, the change is reviewable and reversible. That same discipline is what keeps a repeatable summarization workflow sustainable rather than fragile.
Common Failure Modes the Plays Prevent
Naming the failures the plays exist to stop makes the whole system easier to defend and easier to teach.
Silent Omission
The most dangerous summarization failure is the one nobody sees: a caveat or qualifier quietly dropped, leaving a summary that reads cleanly but misleads. The contract play prevents this by naming the must-keep elements up front, and the scoring play's coverage check confirms they survived. Without both, omission hides in plain sight because the output still looks complete.
Confident Fabrication
A close second is the invented number or claim that sounds authoritative. The anchoring play is the direct defense: a claim with no supporting quote is a claim the model made up. Make extraction a required step and fabrication stops hiding behind fluent prose, because the missing quote is now visible evidence.
Length Creep
Summaries drift longer over time as prompts accumulate "also mention" additions. The contract's length ceiling and the automated length check hold the line, so a summary meant to be skimmed in seconds does not quietly become a three-paragraph read that defeats its own purpose.
Frequently Asked Questions
How many plays should a summarization playbook have?
Fewer than you think. Five to seven well-defined plays with clear triggers beat a sprawling manual nobody reads. Start with the contract, source constraint, anchoring, generation, and scoring plays, then add only what real failures justify.
What if my team is too small to assign separate owners?
One person can own multiple plays. The point of ownership is accountability, not headcount. Name yourself the owner of every play if you have to, but write it down so the responsibility is explicit rather than assumed.
Do I need a separate model to score summaries?
Not necessarily. A second pass with the same model, prompted to check claims against extracted quotes, catches most faithfulness problems. Reserve human scoring for high-stakes outputs and periodic audits of the automated scores.
How is a playbook different from just writing a good prompt?
A good prompt is one play. A playbook is the system around it: the trigger that tells you to run it, the owner who is accountable, the sequence that makes plays compound, and the scoring that proves it worked. Prompts drift; systems endure.
How long before a playbook pays off?
The contract and length-checking plays pay off almost immediately because they stop the most common arguments and the most embarrassing failures. The scoring and review cadence compound over weeks as drift gets caught early instead of in front of a client.
Key Takeaways
- Treat summarization as a set of named plays, not a single hopeful prompt.
- Every play needs a trigger that tells you when to run it and an owner accountable for it.
- Run the plays in order: contract, source constraint, anchoring, generation, scoring.
- Match rigor to stakes by branching, and write the branching rule down.
- Automate mechanical triggers like length checks; reserve manual triggers for judgment.
- Review the playbook on a cadence and version your prompts so the system improves instead of rotting.