Plays That Stop AI From Making Things Up

A guide explains concepts. A playbook tells you exactly what to run, when to run it, and who is responsible when it fires. Reducing hallucinations is rarely a single clever prompt; it is a set of repeatable plays that activate under specific conditions and that someone on your team actually owns. When accuracy is left to whoever happens to be writing the prompt that day, results swing wildly and nobody can explain why a wrong answer slipped through.

This playbook organizes the work into named plays. Each one has a trigger that tells you when to deploy it, the mechanics of how it works, and an owner who keeps it healthy. Treat it as a menu you draw from based on the situation, not a script you run top to bottom every time. The aim is to make accuracy a deliberate operational choice rather than an accident of good intentions.

How to use this playbook

Each play below answers three questions: when does this fire, what does it do, and who owns it. Map the plays to your own roles. In a small team one person may own several; in a larger one they distribute across prompt engineers, reviewers, and the person who owns the client relationship.

The sequencing principle

Run grounding plays first, abstention plays second, and verification plays last. Grounding prevents most fabrication at the source, abstention catches what grounding misses, and verification is your safety net before anything reaches a client. Skipping straight to verification means you are inspecting quality in rather than building it in.

Play 1: Ground every factual answer

Trigger: Any prompt that asks for facts, figures, policy, or anything a client could act on.

Mechanics: Retrieve the relevant source documents and place them in the prompt. Instruct the model to answer only from the supplied material and to flag when the material is silent. This converts open recall into closed reading, which is where fabrication collapses.
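The mechanics above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `Passage` type, the `build_grounded_prompt` function, and the exact instruction wording are assumptions made for the example, and the retrieval step itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """A retrieved source passage (hypothetical shape, for illustration only)."""
    source_id: str
    text: str


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Assemble a prompt that restricts the model to the supplied passages."""
    if not passages:
        # Nothing was retrieved, so there is nothing to ground the answer on.
        raise ValueError("no passages retrieved; fix retrieval before prompting")
    # Label each passage with its id so later plays can cite it.
    sources = "\n\n".join(f"[{p.source_id}]\n{p.text}" for p in passages)
    return (
        "Answer using ONLY the source material below.\n"
        "If the material is silent on the question, say so and do not guess.\n\n"
        f"Source material:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Failing loudly on an empty retrieval result is a deliberate choice in this sketch: an ungrounded prompt would quietly fall back to open recall, which is the situation the play exists to prevent.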

Owner: The prompt engineer who builds the workflow. They are responsible for ensuring the retrieval step actually returns the right passages.

For teams standing up grounding for the first time, A Step-by-Step Approach to Reducing Hallucinations Through Prompting sequences the setup cleanly.

Play 2: License the model to abstain

Trigger: Whenever a wrong answer would be worse than no answer, which covers nearly all client-facing work.

Mechanics: Add explicit permission to decline. State that "I cannot find this in the provided material" is a correct response, and that guessing is not. Reinforce it by asking the model to separate supported claims from inferences.

Phrases to standardize

"If the answer is not supported, say so plainly."
"Mark any claim that is an inference rather than a stated fact."
"Returning nothing is better than returning an unverified answer."

Owner: Whoever maintains your prompt library. These phrases should live as reusable snippets, not be reinvented each time.
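One way to keep these phrases as reusable snippets is a small shared module. The sketch below is an assumption about how a prompt library might be organized; the names `ABSTENTION_SNIPPETS` and `with_abstention` are hypothetical, and only the three phrases come from this playbook.

```python
# Hypothetical snippet store; the phrases are the three standardized above.
ABSTENTION_SNIPPETS = {
    "unsupported": "If the answer is not supported, say so plainly.",
    "inference": "Mark any claim that is an inference rather than a stated fact.",
    "nothing": "Returning nothing is better than returning an unverified answer.",
}


def with_abstention(prompt: str, keys=("unsupported", "inference", "nothing")) -> str:
    """Append the selected abstention rules to a prompt as a bulleted block."""
    # An unknown key raises KeyError, so a typo cannot silently drop a rule.
    rules = [ABSTENTION_SNIPPETS[key] for key in keys]
    bullets = "\n".join(f"- {rule}" for rule in rules)
    return f"{prompt.rstrip()}\n\nRules:\n{bullets}"
```

Because every workflow pulls the wording from one place, a change to the abstention standard is made once and picked up everywhere.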

Play 3: Demand traceable citations

Trigger: Any output where a reviewer or client may need to verify a claim later.

Mechanics: Require the model to attach the specific source passage to each factual claim. Requiring a citation exposes unsupported claims, because there is nothing real for them to point to.

Owner: The reviewer. They spot-check that cited passages actually say what the model claims, since a fabricated citation is still a fabrication. The reusable phrasings in Reducing Hallucinations Through Prompting: Best Practices That Actually Work help here.
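Part of the reviewer's spot-check can be automated as a first filter. The sketch below assumes each citation is a dict with `source_id` and `quote` keys and that sources are available as plain text; those shapes, and the function names, are hypothetical. It only confirms that the quoted passage exists in the named source. Whether the passage actually supports the claim still needs a human read.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing_quotes(citations: list[dict], sources: dict[str, str]) -> list[dict]:
    """Return citations whose quote is empty or absent from the named source."""
    missing = []
    for citation in citations:
        quote = normalize(citation["quote"])
        # An unknown source id is treated as an empty source.
        source_text = normalize(sources.get(citation["source_id"], ""))
        if not quote or quote not in source_text:
            missing.append(citation)
    return missing
```

Anything this check returns is a fabricated or mangled citation and can go straight back for grounding; anything it passes still belongs in the reviewer's sample.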

Play 4: Decompose complex questions

Trigger: Multi-step reasoning, comparisons across several facts, or calculations.

Mechanics: Ask the model to work through intermediate steps before committing to an answer. Step-by-step reasoning reduces leaps to wrong conclusions in logic and math. It will not invent missing knowledge, so pair it with grounding.
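A small sketch of how a workflow might request the intermediate steps and then separate them from the conclusion. The suffix wording, the "Final answer:" marker, and the function name are assumptions chosen for this example, not a fixed convention.

```python
from typing import Optional

# Hypothetical instruction appended to prompts that need multi-step reasoning.
DECOMPOSE_SUFFIX = (
    "Work through the problem in numbered steps, using only the supplied material.\n"
    "Then give the result on a last line that starts with 'Final answer:'."
)


def split_reasoning(response: str) -> tuple[str, Optional[str]]:
    """Split a response into (steps, final answer); answer is None if unmarked."""
    steps, marker, answer = response.rpartition("Final answer:")
    if not marker:
        # The model ignored the format; hand everything back as steps.
        return response.strip(), None
    return steps.strip(), answer.strip()
```

Keeping the steps separate lets a reviewer check the reasoning without it leaking into the client deliverable.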

What this play does not fix

It does not help recall a single obscure fact the model never learned.
It does not replace source material; it only organizes what is present.

Owner: The prompt engineer, who decides which workflows warrant the extra reasoning tokens.

Play 5: Run a pre-delivery verification pass

Trigger: Before any AI-generated output reaches a client.

Mechanics: A second check, human or a separate model prompt, that asks "is each claim here supported by the source?" The verifier does not rewrite; it flags. Flagged claims go back for grounding or get removed.
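The flag-only behavior can be sketched as a function that sorts claims into two buckets and never edits them. The checker is passed in as a callable so it can be a human decision, a separate model prompt, or anything else; `verification_pass` and `is_supported` are hypothetical names for this illustration.

```python
from typing import Callable


def verification_pass(
    claims: list[str],
    source: str,
    is_supported: Callable[[str, str], bool],
) -> dict[str, list[str]]:
    """Sort claims into passed and flagged without rewriting any of them."""
    report: dict[str, list[str]] = {"passed": [], "flagged": []}
    for claim in claims:
        # The verifier only decides; flagged claims are fixed in a later step.
        bucket = "passed" if is_supported(claim, source) else "flagged"
        report[bucket].append(claim)
    return report
```

A delivery step can then refuse to send anything while the flagged list is non-empty, which keeps the verifier's job narrow and auditable.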

Owner: The reviewer, with the account owner as backstop for anything client-sensitive. This mirrors the discipline in The Reducing Hallucinations Through Prompting Checklist for 2026.

Play 6: Measure and tune on a known-answer set

Trigger: Whenever you change a prompt, swap a model, or onboard a new use case.

Mechanics: Maintain a small evaluation set with known correct answers and some deliberately unanswerable questions. Run old and new prompts against it and track accuracy, fabrication rate, and whether the model correctly refuses the unanswerable ones.

The scorecard

Accuracy on answerable questions
Fabrication rate (share of answers that are confident but wrong)
Abstention quality on unanswerable questions
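The three scorecard numbers can be computed from a list of evaluation results. The sketch below makes several assumptions that are not part of the playbook: an `EvalResult` record where `expected=None` marks an unanswerable question and `answer=None` marks an abstention, exact string matching as the correctness test, and fabrication rate defined as the share of all questions that received a non-abstaining wrong answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    expected: Optional[str]  # None marks a deliberately unanswerable question
    answer: Optional[str]    # None means the model abstained


def scorecard(results: list[EvalResult]) -> dict[str, float]:
    """Compute accuracy, fabrication rate, and abstention quality."""
    answerable = [r for r in results if r.expected is not None]
    unanswerable = [r for r in results if r.expected is None]
    correct = sum(1 for r in answerable if r.answer == r.expected)
    # Any non-abstaining answer that differs from the expected value counts,
    # including any answer given to an unanswerable question.
    fabricated = sum(
        1 for r in results if r.answer is not None and r.answer != r.expected
    )
    refused = sum(1 for r in unanswerable if r.answer is None)

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(correct, len(answerable)),
        "fabrication_rate": ratio(fabricated, len(results)),
        "abstention_quality": ratio(refused, len(unanswerable)),
    }
```

Running `scorecard` on the same set for the old and the new prompt gives two directly comparable rows. Exact matching is the weakest part of this sketch; free-text answers usually need a more forgiving comparison.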

Owner: The prompt engineer owns the set; the reviewer audits the results. The underlying mechanics live in The Complete Guide to Reducing Hallucinations Through Prompting.

Assigning owners across a small team

You do not need a large org to run these plays. A practical split for a three-person team:

Prompt engineer: owns grounding, decomposition, and the evaluation set.
Reviewer: owns citations, verification, and result audits.
Account owner: owns the abstention standard and the final go or no-go on client delivery.

The point of naming owners is accountability. When a fabricated answer reaches a client, you can trace which play failed and who tunes it, instead of shrugging at the model.

When to escalate ownership

Some situations warrant pulling a play up to a more senior owner. A new client domain the team has never worked in, a regulated industry where a wrong claim carries legal weight, or a workflow feeding an automated downstream action all justify tighter ownership of grounding and verification. The rule of thumb: the higher the cost of a confident wrong answer, the more senior the person who signs off on the play. For everyday internal drafting, lightweight ownership is fine; for client deliverables that drive decisions, the account owner should be in the loop on every verification pass.

Common ways a playbook breaks down

Even a well-designed playbook fails if the plays are run inconsistently or treated as suggestions. A few failure modes show up repeatedly across teams.

Plays that depend on memory

Any play that requires someone to remember to add a phrase or run a check will eventually be skipped under deadline pressure. The fix is to embed the play in a template or tool so it fires by default rather than relying on discipline in the moment.
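One way to take memory out of the loop is a check that refuses to accept a prompt template missing a required play. This is a sketch under stated assumptions: the marker strings and the names `REQUIRED_MARKERS` and `missing_plays` are hypothetical, and a real team would choose markers that match its own templates.

```python
# Hypothetical markers: a placeholder for sources and a standardized phrase.
REQUIRED_MARKERS = {
    "grounding": "{sources}",
    "abstention": "say so plainly",
}


def missing_plays(template: str) -> list[str]:
    """Return the names of plays whose marker is absent from the template."""
    return [
        play for play, marker in REQUIRED_MARKERS.items() if marker not in template
    ]
```

Run as part of whatever review or save step a template already goes through, a non-empty result blocks the template before anyone has to remember anything.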

Verification that rubber-stamps

A verification pass that always returns "looks good" is not a verification pass; it is a formality. Reviewers need a concrete checklist and the authority to send work back. If verification never flags anything, either the work is perfect or, far more likely, no one is actually performing the check.

Evaluation sets that go stale

A known-answer set built once and never updated slowly stops reflecting the questions you actually field. Refresh it as your use cases shift so the metrics keep meaning something. The examples in Reducing Hallucinations Through Prompting: Real-World Examples and Use Cases are a good source of fresh test cases.

Frequently Asked Questions

Do I need to run every play on every task?

No. Grounding and abstention apply almost universally, but decomposition only matters for multi-step questions and full verification passes belong to client-facing outputs. Match the play to the stakes of the task.

Who should own the plays in a solo practice?

You own all of them, but stagger them in time. Build grounding and abstention into your prompts up front, then deliberately switch hats to review and verify before sending anything out, rather than judging your own output in the same pass.

How do these plays interact with model upgrades?

Upgrades change baseline accuracy but not the need for the plays. Re-run your evaluation set after any upgrade to confirm the new model still respects abstention instructions and grounding, since behavior can shift in subtle ways.

What is the fastest play to adopt first?

Grounding. Supplying source material and constraining answers to it removes the largest single source of fabrication, and it requires no new tooling beyond a way to fetch the relevant documents.

How do I keep the playbook from being ignored?

Tie the plays to owners and bake them into your prompt templates so they fire by default. A play that depends on someone remembering to add a phrase will eventually be forgotten; a play embedded in a reusable template runs every time.

Key Takeaways

A playbook turns accuracy into named plays with triggers and owners, not a one-off clever prompt.
Sequence grounding first, abstention second, and verification last to build quality in rather than inspect it in.
Citations only count when a reviewer verifies the cited passage actually supports the claim.
Assign clear owners so a fabricated answer can be traced to the play that failed and the person who tunes it.
Re-run a known-answer evaluation set on every prompt change, model swap, or new use case.

Agency Script Editorial

