Grounding Prompts in Action: Five Scenarios That Tell

Agency Script Editorial

Editorial Team

December 23, 2023 · 8 min read

Principles are easy to nod along to and hard to apply. The gap closes when you see the same technique work on a specific, recognizable problem. This article walks through five scenarios where a prompt was producing fabricated output, what change fixed it, and, just as important, what the change did not fix.

The scenarios are drawn from common patterns: a support bot, a document Q&A tool, a research assistant, a data-extraction task, and a coding helper. Each one shows the mechanism in context, so you can map it to your own work. Where a fix had a limit, we say so, because knowing the boundary is what keeps you from over-trusting a single technique.

For the underlying concepts behind these examples, see Stop Your Model From Inventing Facts at the Prompt Layer.

Scenario 1: The Support Bot That Invented Features

A customer support assistant kept describing product capabilities that did not exist. Users asked whether the product could do something, and the bot, eager to help, said yes and described how.

What was happening

The prompt asked the model to answer product questions but supplied no product documentation. The model reconstructed answers from its training data and from general expectations of what such products do.

The fix and its limit

Grounding the bot in the actual feature documentation and instructing it to answer only from that text stopped the invented features. The limit: when documentation was incomplete, the bot still occasionally guessed until an abstention clause was added to cover the gaps.
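The grounding instruction and the abstention clause described above can be sketched as a prompt template. This is a minimal illustration, not the prompt used in the scenario: the function name `build_grounded_prompt`, the `<docs>` delimiters, and the wording of `ABSTAIN_MESSAGE` are all assumptions chosen for the example.

```python
# Hypothetical abstention wording; adjust to your product's voice.
ABSTAIN_MESSAGE = "I can't confirm that from the current documentation."


def build_grounded_prompt(question: str, documentation: str) -> str:
    """Assemble a prompt that restricts answers to the supplied documentation."""
    return (
        "You are a product support assistant.\n"
        "Answer using ONLY the documentation between the <docs> tags.\n"
        "If the documentation does not cover the question, reply exactly:\n"
        f"{ABSTAIN_MESSAGE}\n"
        "Do not describe features that are not in the documentation.\n\n"
        f"<docs>\n{documentation.strip()}\n</docs>\n\n"
        f"Question: {question.strip()}"
    )


prompt = build_grounded_prompt("Can it export PDF?", "Exports: CSV only.")
```

Keeping the abstention message as a fixed string makes it easy to detect downstream, which helps when counting how often the bot declines.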

Scenario 2: Document Q&A With Confident Wrong Answers

A tool that answered questions about uploaded contracts returned answers that sounded authoritative but cited clauses that were not in the document.

What was happening

The retrieval step pulled roughly relevant passages, but the prompt did not require the model to tie its answer to a specific clause. The model blended retrieved text with assumptions about typical contracts.

The fix and its limit

Requiring the model to quote the exact clause supporting each answer exposed the gaps and forced abstention when no clause matched. The limit: when retrieval pulled the wrong passages entirely, the model grounded its answer in irrelevant text. That is an upstream problem prompting could not solve.
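A quoting requirement is only useful if something checks the quote. One way to do that, sketched here under the assumption that the model returns its answer and its supporting quote as separate strings, is a verbatim match against the document after whitespace normalization. The helper names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_supported(quote: str, document: str) -> bool:
    """Return True only if the quoted clause appears verbatim in the document."""
    q = normalize(quote)
    return bool(q) and q in normalize(document)


def accept_answer(answer: str, quote: str, document: str) -> str:
    """Pass the answer through only when its quote is found; otherwise abstain."""
    if quote_is_supported(quote, document):
        return answer
    return "No clause in the document supports an answer."


contract = "Section 4. Either party may terminate\nwith 30 days written notice."
ok = accept_answer("30 days", "terminate with 30 days written notice", contract)
rejected = accept_answer("60 days", "terminate with 60 days notice", contract)
```

This check confirms the quote exists in the document. It does not confirm that retrieval supplied the right part of the document, which is the limit noted above.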

Scenario 3: The Research Assistant and Fake Citations

A research helper produced summaries studded with academic citations, several of which referred to papers that did not exist.

What was happening

The model was asked to support claims with citations but had no real source list to draw from, so it generated citations that looked structurally correct and were entirely invented.

The fix and its limit

Supplying a real list of source documents and instructing the model to cite only from that list eliminated the fabricated references. The limit: the model occasionally cited a real source that did not actually support the specific claim, which required a verification pass to catch.
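The "cite only from this list" rule can be enforced mechanically. The sketch below assumes the prompt asks the model to cite with bracketed IDs such as `[S1]`; that marker format and the function name are illustrative choices, not something the scenario specifies.

```python
import re

# Assumed citation format: [S1], [S2], ... matching IDs in the supplied source list.
CITATION_PATTERN = re.compile(r"\[(S\d+)\]")


def find_unknown_citations(summary: str, allowed_ids: set[str]) -> list[str]:
    """Return cited IDs that are not in the supplied source list."""
    cited = CITATION_PATTERN.findall(summary)
    return sorted({c for c in cited if c not in allowed_ids})


sources = {"S1", "S2"}
summary = "Grounding reduces errors [S1]. Costs fall too [S4]."
unknown = find_unknown_citations(summary, sources)
```

An empty result means every citation points at a supplied source. It says nothing about whether that source supports the claim, so the verification pass mentioned above is still needed.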

Scenario 4: Data Extraction That Filled Empty Fields

A task extracting structured fields from messy text kept inventing values for fields that were simply not present in the input.

What was happening

The output schema required every field, and the model, faced with a required slot and no data, supplied a plausible guess rather than leaving it blank.

The fix and its limit

Allowing fields to be explicitly marked "not present" gave the model a place to put honesty, and the invented values disappeared. The limit: the model sometimes marked present-but-hard-to-find values as missing, an over-correction that needed tuning. This balance mirrors the calibration discussed in Build a Fabrication-Resistant Prompt in Eight Moves.
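A rough sketch of the "not present" idea: let each field be either a string or `None`, then classify every returned value. The field names and the substring check are assumptions for illustration; real inputs often need fuzzier matching (dates, for example, may be reformatted by the model).

```python
from typing import Optional

# Hypothetical schema for an invoice-like document.
FIELDS = ("invoice_number", "due_date", "po_number")


def validate_extraction(
    record: dict[str, Optional[str]], source_text: str
) -> dict[str, str]:
    """Classify each field as 'found', 'not_present', or 'unsupported'."""
    report = {}
    for field in FIELDS:
        value = record.get(field)
        if value is None:
            report[field] = "not_present"   # model declined to fill the slot
        elif value in source_text:
            report[field] = "found"         # value appears verbatim in the input
        else:
            report[field] = "unsupported"   # value was supplied but is not in the input
    return report


text = "Invoice INV-1042 is due 2024-03-01."
honest = validate_extraction(
    {"invoice_number": "INV-1042", "due_date": "2024-03-01", "po_number": None}, text
)
invented = validate_extraction(
    {"invoice_number": "INV-1042", "due_date": "2024-03-01", "po_number": "PO-77"}, text
)
```

Comparing the `not_present` results against a hand-labeled sample is one way to spot the over-correction described above, where values that exist in the input get marked as missing.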

Scenario 5: The Coding Helper That Hallucinated APIs

A coding assistant suggested functions and parameters that did not exist in the library being used, sending developers chasing methods that were never real.

What was happening

The model drew on a blurred memory of many libraries and versions, confidently mixing real and imagined APIs. No authoritative reference grounded its suggestions.

The fix and its limit

Supplying the relevant library documentation or type definitions in the prompt and instructing the model to use only documented APIs sharply reduced the invented calls. The limit: for very large libraries, not all relevant documentation fit in the prompt, so gaps remained where grounding was incomplete.
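Alongside grounding the prompt, generated code can be checked against the documented API surface before a developer sees it. The sketch below uses Python's standard `ast` module to find calls of the form `alias.name(...)` whose name is missing from a documented set. The library `shapes` and its function names are made up for the example, and the check only covers direct attribute calls on the module alias.

```python
import ast


def undocumented_calls(code: str, module_alias: str, documented: set[str]) -> list[str]:
    """List attribute calls on module_alias that are absent from the documented set."""
    unknown = set()
    for node in ast.walk(ast.parse(code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == module_alias
            and node.func.attr not in documented
        ):
            unknown.add(node.func.attr)
    return sorted(unknown)


# 'shapes' is a hypothetical library; the code is parsed, never imported or run.
generated = "import shapes\na = shapes.circle(2)\nb = shapes.hexagon(3)\n"
missing = undocumented_calls(generated, "shapes", {"circle", "square"})
```

A non-empty result is a signal to regenerate or to flag the suggestion. It does not validate parameters, which would need the type definitions mentioned above.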

What the Scenarios Share

Across all five, the same pattern repeats, and it is worth naming directly.

Grounding is the workhorse

Every successful fix started by replacing memory with supplied source material. When the source was present and relevant, fabrication dropped sharply.

Retrieval quality sets the ceiling

Several limits traced not to the prompt but to what was retrieved. A perfect prompt over the wrong passages still produces grounded-but-wrong answers.
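Because the prompt cannot repair bad retrieval, some teams add a relevance gate before the model ever sees a passage. The sketch below uses plain token overlap as a crude stand-in for a real relevance score; the functions, the scoring rule, and the `min_score` threshold are all assumptions for illustration, and production systems typically rely on stronger signals.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question: str, passage: str) -> float:
    """Fraction of question tokens that also appear in the passage."""
    q = tokens(question)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def filter_passages(question: str, passages: list[str], min_score: float = 0.5) -> list[str]:
    """Keep only passages that clear the relevance threshold."""
    return [p for p in passages if overlap_score(question, p) >= min_score]


passages = [
    "The notice period for termination is 30 days.",
    "Payment is due within 15 days of invoice.",
]
kept = filter_passages("termination notice period", passages)
```

If nothing survives the filter, abstaining immediately is usually safer than asking the model to ground an answer in passages that do not address the question.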

Abstention and verification fill the cracks

Where grounding was incomplete, an abstention clause prevented guessing, and a verification pass caught the subtler errors that survived. For the failure modes that recur across these patterns, see 7 Prompting Habits That Make AI Fabricate More, Not Less.

Over-correction lurks behind every fix

In several scenarios, pushing a fix too hard introduced the opposite problem: a model that abstained too readily or marked present data as missing. The lesson is that none of these techniques has a single correct intensity. Each needs tuning until the model answers what it can and declines what it cannot, and that balance point shifts with the task and the source.
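Tuning toward that balance point is easier with two numbers tracked side by side: how often the model declines a question it could have answered, and how often it answers one it should have declined. The sketch below assumes a small hand-labeled test set; the `Case` structure and the metric names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Case:
    answerable: bool  # does the source actually contain the answer?
    abstained: bool   # did the model decline to answer?


def calibration_rates(cases: list[Case]) -> dict[str, float]:
    """Compute over-abstention and guessing rates on a labeled test set."""
    answerable = [c for c in cases if c.answerable]
    unanswerable = [c for c in cases if not c.answerable]
    over_abstention = (
        sum(c.abstained for c in answerable) / len(answerable) if answerable else 0.0
    )
    guessing = (
        sum(not c.abstained for c in unanswerable) / len(unanswerable)
        if unanswerable
        else 0.0
    )
    return {"over_abstention": over_abstention, "guessing": guessing}


cases = [
    Case(answerable=True, abstained=False),
    Case(answerable=True, abstained=False),
    Case(answerable=True, abstained=False),
    Case(answerable=True, abstained=True),    # over-correction
    Case(answerable=False, abstained=True),
    Case(answerable=False, abstained=False),  # guessed where it should have declined
]
rates = calibration_rates(cases)
```

Strengthening an abstention clause should lower `guessing`; if `over_abstention` climbs at the same time, the fix has been pushed too far for this task and source.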

Mapping the Scenarios to Your Work

The value of these examples is not the stories themselves but the mapping to whatever you are building. A quick translation helps.

Identify your grounding source

For each scenario, the first question was always what authoritative source the answer should come from. Ask the same of your task. If you cannot name a source, that is your highest-priority gap, because no prompt fixes missing ground truth. Document Q&A grounds in the document; the coding helper grounds in library docs; your task grounds in something specific you must identify.

Locate where your gaps will appear

Each scenario had a characteristic place where grounding ran out: thin documentation, wrong retrieval, oversized references. Predict yours. Knowing in advance where the source will fall short tells you where to aim your abstention clause and which questions are likely to trigger fabrication. Those questions become your most valuable test cases.

Frequently Asked Questions

Why did grounding not fully fix the document Q&A case?

Because grounding can only work with the passages it receives. When retrieval pulled the wrong clauses, the model faithfully grounded its answer in irrelevant text. The prompt did its job; the upstream retrieval did not. Fixing it required improving retrieval, not the prompt.

How can a model fabricate a citation when given a real source list?

It can cite a source that exists but does not actually support the specific claim. The citation is real; the connection is invented. This is subtler than a fake reference and usually requires a separate verification pass that checks whether the cited passage genuinely backs the statement.

Is allowing blank fields always the right call for extraction?

It is the right call to stop invented values, but it introduces the opposite risk: marking present values as missing. The fix needs tuning so the model abstains only when data is genuinely absent, not whenever it is hard to find. Calibration, again, beats either extreme.

Why does the coding helper still hallucinate with documentation supplied?

Usually because the documentation is too large to fit entirely in the prompt, leaving gaps where the model falls back to memory. Grounding reduces fabrication only over the portion of the reference it actually sees. Incomplete grounding leaves incomplete protection.

Which scenario is most like my situation?

If you answer questions over your own documents, scenarios one through three apply most. If you extract structured data, scenario four. If you generate code, scenario five. The shared lesson is to ground in real source material first, then patch the gaps with abstention and verification.

Key Takeaways

  • Across every scenario, replacing the model's memory with supplied source material was the workhorse fix that dropped fabrication sharply.
  • Citation and quoting requirements expose gaps and force abstention, but models can still cite real sources that do not support the claim.
  • Allowing explicit blank or not-present values stops invented data, while risking over-correction that needs tuning.
  • Retrieval quality sets the ceiling: a perfect prompt over the wrong passages still produces grounded-but-wrong answers.
  • Abstention clauses and verification passes fill the cracks where grounding is incomplete, especially for large references that do not fit in the prompt.
