Principles are easy to nod along to and hard to apply. The gap closes when you see the same technique work on a specific, recognizable problem. This article walks through five scenarios where a prompt was fabricating, what change fixed it, and—just as important—what the change did not fix.
The scenarios are drawn from common patterns: a support bot, a document Q&A tool, a research assistant, a data-extraction task, and a coding helper. Each one shows the mechanism in context, so you can map it to your own work. Where a fix had a limit, we say so, because knowing the boundary is what keeps you from over-trusting a single technique.
For the underlying concepts behind these examples, see Stop Your Model From Inventing Facts at the Prompt Layer.
Scenario 1: The Support Bot That Invented Features
A customer support assistant kept describing product capabilities that did not exist. Users asked whether the product could do something, and the bot, eager to help, said yes and described how.
What was happening
The prompt asked the model to answer product questions but supplied no product documentation. The model reconstructed answers from its training data and from general expectations of what such products do.
The fix and its limit
Grounding the bot in the actual feature documentation and instructing it to answer only from that text stopped the invented features. The limit: when documentation was incomplete, the bot still occasionally guessed until an abstention clause was added to cover the gaps.
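A minimal sketch of what such a grounded prompt with an abstention clause might look like. The function name build_support_prompt, the <docs> tag convention, and the exact abstention wording are illustrative assumptions, not the prompt used in this scenario.

```python
# Hypothetical abstention sentence; any fixed, recognizable wording would do.
ABSTAIN = "I don't have documentation on that."


def build_support_prompt(docs: str, question: str) -> str:
    """Assemble a prompt that grounds the answer in supplied docs and allows abstention."""
    return (
        "Answer the customer's question using ONLY the product documentation "
        "between the <docs> tags.\n"
        f"If the documentation does not cover the question, reply exactly: {ABSTAIN}\n"
        "Do not describe features that are not documented.\n\n"
        f"<docs>\n{docs.strip()}\n</docs>\n\n"
        f"Question: {question.strip()}"
    )


prompt = build_support_prompt(
    docs="Export: reports can be exported as CSV.",
    question="Can I export reports as PDF?",
)
```

A fixed abstention sentence has a side benefit: downstream code can detect it with a simple string comparison and route the question to a human.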
Scenario 2: Document Q&A With Confident Wrong Answers
A tool that answered questions about uploaded contracts returned answers that sounded authoritative but cited clauses that were not in the document.
What was happening
The retrieval step pulled roughly relevant passages, but the prompt did not require the model to tie its answer to a specific clause. The model blended retrieved text with assumptions about typical contracts.
The fix and its limit
Requiring the model to quote the exact clause supporting each answer exposed the gaps and forced abstention when no clause matched. The limit: when retrieval pulled the wrong passages entirely, the model grounded its answer in irrelevant text—an upstream problem prompting could not solve.
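The quoting requirement is easier to enforce when the quote is checked mechanically. The sketch below assumes the model returns its supporting quote as a separate string; the helper names normalize and quote_is_supported are hypothetical. It confirms only that the quote exists in the supplied text, so it does nothing about the retrieval limit described above.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_supported(quote: str, document: str) -> bool:
    """True only if the quote appears in the document, ignoring case and whitespace."""
    q = normalize(quote)
    return bool(q) and q in normalize(document)


contract = "12.1 Termination. Either party may terminate\nwith 30 days notice."
real_quote_ok = quote_is_supported("Either party may terminate with 30 days notice.", contract)
invented_quote_ok = quote_is_supported("Either party may terminate with 60 days notice.", contract)
```

An answer whose quote fails this check can be replaced with an abstention rather than shown to the user.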
Scenario 3: The Research Assistant and Fake Citations
A research helper produced summaries studded with academic citations, several of which referred to papers that did not exist.
What was happening
The model was asked to support claims with citations but had no real source list to draw from, so it generated citations that looked structurally correct and were entirely invented.
The fix and its limit
Supplying a real list of source documents and instructing the model to cite only from that list eliminated the fabricated references. The limit: the model occasionally cited a real source that did not actually support the specific claim, which required a verification pass to catch.
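One way to enforce cite-only-from-the-list is to give each source an ID and check the output against the allowed set. This sketch assumes citations are written as bracketed IDs such as [S1]; that format, the pattern, and the function name check_citations are assumptions made for illustration.

```python
import re

# Assumed citation format: [S1], [S2], ... matching IDs in the supplied source list.
CITATION_PATTERN = re.compile(r"\[(S\d+)\]")


def check_citations(summary: str, allowed_ids: set[str]) -> dict[str, list[str]]:
    """Split cited IDs into those on the supplied list and those that are not."""
    cited = CITATION_PATTERN.findall(summary)
    return {
        "used": sorted({c for c in cited if c in allowed_ids}),
        "unknown": sorted({c for c in cited if c not in allowed_ids}),
    }


report = check_citations(
    "Sleep improves recall [S1]. Caffeine has no effect [S7].",
    allowed_ids={"S1", "S2"},
)
```

This catches citations to sources that were never supplied. It cannot tell whether S1 actually supports the sentence it is attached to, which is the job of the verification pass.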
Scenario 4: Data Extraction That Filled Empty Fields
A task extracting structured fields from messy text kept inventing values for fields that were simply not present in the input.
What was happening
The output schema required every field, and the model, faced with a required slot and no data, supplied a plausible guess rather than leaving it blank.
The fix and its limit
Allowing fields to be explicitly marked "not present" gave the model a place to put honesty, and the invented values disappeared. The limit: the model sometimes marked present-but-hard-to-find values as missing, an over-correction that needed tuning. This balance mirrors the calibration discussed in Build a Fabrication-Resistant Prompt in Eight Moves.
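In code, giving the model a place to put honesty usually means making every field optional and treating the sentinel as a real value. The sketch below assumes the model returns JSON; the Invoice fields and the helper parse_extraction are hypothetical examples rather than the schema from this scenario.

```python
import json
from dataclasses import dataclass
from typing import Optional

NOT_PRESENT = "not present"  # sentinel the prompt tells the model to use


@dataclass
class Invoice:
    # Every field is optional, so a missing value never forces a guess.
    vendor: Optional[str]
    total: Optional[str]
    due_date: Optional[str]


def parse_extraction(raw_json: str) -> Invoice:
    """Parse model output, mapping null, absent keys, and the sentinel to None."""
    data = json.loads(raw_json)

    def clean(key: str) -> Optional[str]:
        value = data.get(key)
        if value is None or str(value).strip().lower() == NOT_PRESENT:
            return None
        return str(value).strip()

    return Invoice(vendor=clean("vendor"), total=clean("total"), due_date=clean("due_date"))


invoice = parse_extraction('{"vendor": "Acme", "total": "not present", "due_date": null}')
```

Counting how often each field comes back as None on inputs where the value is known to be present is a simple way to spot the over-correction described above.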
Scenario 5: The Coding Helper That Hallucinated APIs
A coding assistant suggested functions and parameters that did not exist in the library being used, sending developers chasing methods that were never real.
What was happening
The model drew on a blurred memory of many libraries and versions, confidently mixing real and imagined APIs. No authoritative reference grounded its suggestions.
The fix and its limit
Supplying the relevant library documentation or type definitions in the prompt and instructing the model to use only documented APIs sharply reduced the invented calls. The limit: for very large libraries, not all relevant documentation fit in the prompt, so gaps remained where grounding was incomplete.
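Where grounding is incomplete, a check after generation can flag calls that do not appear in the documented API. The sketch below uses Python's standard ast module to find attribute calls on a module alias and compares them with a set of documented names. The library name shapes, its functions, and the helper undocumented_calls are invented for illustration, and the check covers only direct calls of the form alias.name(...).

```python
import ast


def undocumented_calls(code: str, module_alias: str, documented: set[str]) -> list[str]:
    """Return names called as module_alias.name(...) that are not in the documented set."""
    tree = ast.parse(code)  # parses only; the generated code is never executed
    missing = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == module_alias
            and node.func.attr not in documented
        ):
            missing.add(node.func.attr)
    return sorted(missing)


generated = "import shapes\na = shapes.circle(2)\nb = shapes.hexagonify(a)\n"
suspect = undocumented_calls(generated, "shapes", documented={"circle", "square"})
```

A non-empty result is a prompt to check the reference, not proof of fabrication: the documented set may itself be incomplete for a very large library.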
What the Scenarios Share
Across all five, the same patterns repeat, and they are worth naming directly.
Grounding is the workhorse
Every successful fix started by replacing memory with supplied source material. When the source was present and relevant, fabrication dropped sharply.
Retrieval quality sets the ceiling
Some limits traced not to the prompt but to what reached it: the wrong passages in the document Q&A case, and documentation too large to fit in the coding case. A perfect prompt over the wrong passages still produces grounded-but-wrong answers.
Abstention and verification fill the cracks
Where grounding was incomplete, an abstention clause prevented guessing, and a verification pass caught the subtler errors that survived. For the failure modes that recur across these patterns, see 7 Prompting Habits That Make AI Fabricate More, Not Less.
Over-correction lurks behind every fix
In several scenarios, pushing a fix too hard introduced the opposite problem: a model that abstained too readily or marked present data as missing. The lesson is that none of these techniques has a single correct intensity. Each needs tuning until the model answers what it can and declines what it cannot, and that balance point shifts with the task and the source.
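Tuning toward that balance point is easier with two numbers tracked side by side. The sketch below assumes a small labeled test set where each case records whether the source actually contains the answer and whether the model abstained; the Case type and the metric names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Case:
    answerable: bool  # does the source actually contain the answer?
    abstained: bool   # did the model decline to answer?


def calibration_rates(cases: list[Case]) -> dict[str, float]:
    """Rate of declining answerable questions and of answering unanswerable ones."""
    answerable = [c for c in cases if c.answerable]
    unanswerable = [c for c in cases if not c.answerable]
    over_abstention = (
        sum(c.abstained for c in answerable) / len(answerable) if answerable else 0.0
    )
    guessing = (
        sum(not c.abstained for c in unanswerable) / len(unanswerable) if unanswerable else 0.0
    )
    return {"over_abstention": over_abstention, "guessing": guessing}


rates = calibration_rates([
    Case(answerable=True, abstained=False),
    Case(answerable=True, abstained=False),
    Case(answerable=True, abstained=False),
    Case(answerable=True, abstained=True),    # over-correction
    Case(answerable=False, abstained=True),
    Case(answerable=False, abstained=False),  # likely fabrication
])
```

If tightening the abstention clause lowers guessing but raises over_abstention, the fix has been pushed too far for that task. Note that answering an unanswerable question only suggests fabrication; confirming it still takes a look at the answer itself.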
Mapping the Scenarios to Your Work
The value of these examples is not the stories themselves but the mapping to whatever you are building. A quick translation helps.
Identify your grounding source
For each scenario, the first question was always what authoritative source the answer should come from. Ask the same of your task. If you cannot name a source, that is your highest-priority gap, because no prompt fixes missing ground truth. Document Q&A grounds in the document; the coding helper grounds in library docs; your task grounds in something specific you must identify.
Locate where your gaps will appear
Each scenario had a characteristic place where grounding ran out—thin documentation, wrong retrieval, oversized references. Predict yours. Knowing in advance where the source will fall short tells you where to aim your abstention clause and where to expect the questions that trigger fabrication, which become your most valuable test cases.
Frequently Asked Questions
Why did grounding not fully fix the document Q&A case?
Because grounding can only work with the passages it receives. When retrieval pulled the wrong clauses, the model faithfully grounded its answer in irrelevant text. The prompt did its job; the upstream retrieval did not. Fixing it required improving retrieval, not the prompt.
How can a model fabricate a citation when given a real source list?
It can cite a source that exists but does not actually support the specific claim. The citation is real; the connection is invented. This is subtler than a fake reference and usually requires a separate verification pass that checks whether the cited passage genuinely backs the statement.
Is allowing blank fields always the right call for extraction?
It is the right call to stop invented values, but it introduces the opposite risk: marking present values as missing. The fix needs tuning so the model abstains only when data is genuinely absent, not whenever it is hard to find. Calibration, again, beats either extreme.
Why does the coding helper still hallucinate with documentation supplied?
Usually because the documentation is too large to fit entirely in the prompt, leaving gaps where the model falls back to memory. Grounding reduces fabrication only over the portion of the reference it actually sees. Incomplete grounding leaves incomplete protection.
Which scenario is most like my situation?
If you answer questions over your own documents, scenarios one through three apply most. If you extract structured data, scenario four. If you generate code, scenario five. The shared lesson is to ground in real source material first, then patch the gaps with abstention and verification.
Key Takeaways
- Across every scenario, replacing the model's memory with supplied source material was the workhorse fix that dropped fabrication sharply.
- Citation and quoting requirements expose gaps and force abstention, but models can still cite real sources that do not support the claim.
- Allowing explicit blank or not-present values stops invented data, while risking over-correction that needs tuning.
- Retrieval quality sets the ceiling: a perfect prompt over the wrong passages still produces grounded-but-wrong answers.
- Abstention clauses and verification passes fill the cracks where grounding is incomplete, especially for large references that do not fit in the prompt.