Once you have grounding, refusal calibration, and a measurement loop in place, the obvious hallucinations mostly disappear. What remains is a harder, quieter class of failures that the basic toolkit does not touch: the model that confidently resolves a contradiction between two sources by inventing a third answer, the model that hallucinates to fill a partial answer rather than flagging the gap, the model that gets manipulated into ignoring its grounding by a cleverly phrased input.
These are the cases that separate practitioners who have shipped a careful demo from those who run reliable systems at scale. This article assumes you know the fundamentals and focuses on the edge cases, failure modes, and nuances that only show up once the easy wins are behind you.
Handling Conflicting and Ambiguous Sources
Real knowledge bases contradict themselves. Two documents give different numbers, a policy was updated but the old version still lives in the index, a source is outdated. A naively grounded model often picks one silently or, worse, splits the difference into something neither source says.
Make the Model Surface Conflict Instead of Resolving It
Instruct the model that when sources disagree, it should report the disagreement rather than choose. The desired behavior is to present both positions with their sources and let a human or downstream rule adjudicate.
- Silent resolution is a hidden hallucination: the answer looks grounded but the reasoning that produced it was invented.
- Surfacing conflict also exposes data quality problems in your knowledge base that would otherwise stay buried.
Distinguish Stale From Current
When recency matters, give the model the means to prefer the more recent source — metadata, timestamps, or explicit instruction — rather than treating all retrieved material as equally authoritative.
Catching Partial-Answer Fabrication
A subtle failure: the source contains part of the answer, and the model fills the rest from imagination, producing a response that is half-grounded and fully confident. Because part of it is supported, faithfulness checks that score at the answer level can pass it.
Score at the Claim Level
Decompose answers into individual claims and check each against the source separately. A claim-level check catches the invented half that an answer-level check waves through. This is the kind of rigor that How to Measure Reducing Hallucinations Through Prompting: Metrics That Matter argues for and that becomes essential at this level.
Require Explicit Gap Acknowledgment
Instruct the model to state which parts of a question it could answer from the source and which it could not. Forcing it to name the boundary makes partial fabrication far less likely than leaving the boundary implicit.
Defending Against Adversarial and Injection Inputs
When inputs come from untrusted sources — user messages, scraped web content, documents you did not author — those inputs can contain instructions that try to override your grounding. A retrieved document might literally say to ignore previous instructions.
Separate Instructions From Data
Structure your prompts so the model treats retrieved or user-supplied content as data to be analyzed, never as instructions to be followed. Make the boundary explicit and reinforce it, because models will otherwise obey instructions wherever they find them.
- This is where anti-hallucination work overlaps with security; the same input that injects a fabrication can inject a policy violation.
- Test with deliberately adversarial inputs, the same way you test with absent-answer questions.
Treat Tool Outputs as Untrusted Too
When a model calls a tool and feeds the result back into its context, that result can also carry injected content. Apply the same instruction-versus-data discipline to tool outputs. The patterns here are part of the broader discipline in Reducing Hallucinations Through Prompting: Best Practices That Actually Work.
Advanced Verification Patterns
Single-pass self-verification helps, but it shares the model's blind spots. At the expert level, verification gets more structured.
Chain-of-Verification
Have the model generate its answer, then generate a list of verification questions about its own claims, answer those independently, and revise based on the results. Decomposing verification into discrete checks catches errors that a vague check yourself instruction misses.
Diverse Verifiers
Where the stakes justify it, verify with a different model or a different prompt framing than the one that generated the answer. A verifier that shares the generator's exact configuration shares its exact blind spots; diversity is what makes the second pass worth the cost.
Gate Verification on Confidence
Running full verification on every answer is wasteful. Use the model's expressed or estimated confidence to route only uncertain answers through the expensive checks. This selective gating is what makes heavy verification economically viable, and A Framework for Reducing Hallucinations Through Prompting shows where it fits in the larger architecture.
Managing Drift and Model Changes
A system that was well-tuned six months ago may be quietly degrading. Model versions change, your data shifts, and prompts that once worked produce subtly different behavior.
- Treat your evaluation set as a regression suite and re-run it on every model or prompt change.
- Watch production signals for slow drift that the frozen evaluation set cannot see, since real inputs evolve in ways your test set does not.
- When you upgrade models, do not assume your defenses transfer; re-tune against the new model's actual behavior.
Designing for Graceful Degradation
Expert systems are defined less by how they behave when everything works and more by how they fail. The advanced practitioner designs for the failure path deliberately rather than hoping it never arrives.
Decide the Default Failure Mode
When the model is uncertain, what should happen? For some applications the safe default is to refuse; for others it is to escalate to a human; for others it is to answer with a visible caveat. Choosing this consciously, per application, is more important than any single prompting trick, because the failure path is where the real damage happens.
Make Uncertainty a First-Class Output
Rather than forcing every answer into confident prose, design the system to emit a structured signal of how confident it is, and route downstream behavior on that signal. An answer the model is unsure about should travel a different path than one it is sure about, and that is only possible if uncertainty is captured rather than smoothed away.
Instrument the Failure Cases Specifically
General metrics tell you the aggregate rate; they do not tell you how the system behaves on the hard cases you care about. Build a dedicated slice of your evaluation set for conflicting sources, partial answers, and adversarial inputs, and track those separately. The aggregate can look healthy while the hard cases quietly regress. This targeted measurement extends the discipline in How to Measure Reducing Hallucinations Through Prompting: Metrics That Matter.
Keep a Human Path for the Irreducible Cases
Some questions cannot be answered safely by any prompting technique — the source is genuinely ambiguous, or the stakes are too high to automate. The mature design accepts this and routes those cases to a human cleanly, rather than pushing the model to produce an answer it should not.
Frequently Asked Questions
How should the model handle two sources that disagree?
It should surface the disagreement rather than silently pick one or blend them, because silent resolution is a hidden hallucination — the answer looks grounded but the reasoning was invented. Instruct it to present both positions with their sources and let a human or a downstream rule decide.
Why is partial-answer fabrication so easy to miss?
Because part of the answer is genuinely supported by the source, so answer-level faithfulness checks pass it while the invented half slips through. The fix is to score at the claim level, checking each statement against the source separately, and to require the model to name which parts it could and could not answer.
What makes self-verification fail?
The verifier usually shares the generator's blind spots, especially when it uses the same model and prompt framing. Structured verification — decomposing claims into discrete checks, or using a different model or framing for the verifier — addresses this by introducing the diversity that a single pass lacks.
How do I keep an advanced setup from degrading over time?
Treat your evaluation set as a regression suite and re-run it on every model upgrade and prompt change, since defenses tuned to one model's behavior do not automatically transfer. Also watch production signals for slow drift that a frozen test set cannot capture as real inputs evolve.
Key Takeaways
- The hard cases — conflicting sources, partial answers, adversarial inputs — survive basic grounding and need targeted techniques.
- Make the model surface source conflicts rather than silently resolving them, which is a hidden hallucination.
- Catch partial-answer fabrication with claim-level scoring and explicit gap acknowledgment.
- Separate instructions from data, including tool outputs, to defend against injected fabrications.
- Use structured, diverse, confidence-gated verification, and re-tune defenses on every model change.