Grounding looks deceptively simple. Retrieve some passages, paste them into the prompt, ask the question. Yet teams that follow this recipe often end up with answers that are wrong, vague, or confidently fabricated, and they cannot understand why, because they did everything the tutorials said. The trouble is that most failures happen in the seams between steps, where no single component looks broken but the whole pipeline misbehaves.
This article names seven failure modes we see repeatedly. For each one we explain the mechanism, the cost it imposes, and the specific change that corrects it. None of these are exotic. They are the ordinary ways grounded systems disappoint, and recognizing them early saves weeks of frustrated tuning.
Read these not as a list to memorize but as a diagnostic. When your grounded answers go wrong, the cause is almost always one of these seven.
Mistake 1: Retrieving the Wrong Passages
Why It Happens
The model can only be as good as the chunks it receives. If retrieval returns passages that do not contain the answer, the model is set up to fail before it generates a single word. This often traces back to poorly written search queries or an index built on badly cleaned text.
The Cost and the Fix
The cost is silent: answers look plausible but are built on irrelevant material. The fix is to inspect retrieval output directly, separate from the model. If a human cannot answer from the retrieved chunks, neither can the model. Improving the query and the index beats every prompt tweak. The full inspection routine is laid out in Build a Grounded Prompt Pipeline in Eight Concrete Steps.
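Here is a minimal Python sketch of that inspection. It assumes your retriever can be wrapped as a function `retrieve(question, k)` that returns chunk texts, which is a hypothetical signature. For each question it prints what was actually retrieved, then flags the cases where no chunk contains a phrase the answer must include. Substring matching is only a crude stand-in for a human reading the chunks. Still, it catches the obvious misses before any model is involved.

```python
from typing import Callable, List, Tuple

# Hypothetical retriever signature: (question, k) -> list of chunk texts.
Retriever = Callable[[str, int], List[str]]


def inspect_retrieval(
    cases: List[Tuple[str, str]],  # (question, phrase a correct answer must contain)
    retrieve: Retriever,
    k: int = 3,
) -> List[str]:
    """Return the questions whose retrieved chunks never mention the expected phrase."""
    misses: List[str] = []
    for question, expected in cases:
        chunks = retrieve(question, k)
        # Show the raw retrieval output, separate from any model call,
        # so a person can judge whether the answer is answerable from it.
        print(f"Q: {question}")
        for i, chunk in enumerate(chunks):
            print(f"  [{i}] {chunk[:80]}")
        # Crude proxy for "a human could answer from these chunks".
        if not any(expected.lower() in chunk.lower() for chunk in chunks):
            misses.append(question)
    return misses
```

Every question this function returns is one where no prompt change can help. Fix the query or the index first.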
Mistake 2: Chunking That Splits Meaning
Why It Happens
When documents are split mechanically by character count, a single idea gets cut across two chunks. Retrieval then returns half an answer, or a chunk that mentions a topic without the detail that resolves it. Tables, lists, and step sequences suffer the most.
The Cost and the Fix
Answers come back incomplete or subtly wrong because the model never saw the full picture. Fix it by splitting on natural boundaries such as paragraphs and sections. Then add a small overlap between adjacent chunks, so that an idea straddling a boundary survives intact in at least one piece.
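Here is a sketch of paragraph-aware chunking with overlap. It assumes paragraphs are separated by blank lines. It also treats a character budget (`max_chars`, a placeholder value) as an acceptable proxy for token count. Paragraphs are packed whole into chunks, and the last `overlap_paras` paragraphs of each chunk are repeated at the start of the next one.

```python
import re
from typing import List


def chunk_by_paragraph(text: str, max_chars: int = 800, overlap_paras: int = 1) -> List[str]:
    """Pack whole paragraphs into chunks, repeating trailing paragraphs as overlap."""
    # Blank lines mark paragraph boundaries; single newlines (lists, table rows) stay inside.
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paras:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward so a straddling idea survives.
            current = current[-overlap_paras:] if overlap_paras > 0 else []
            # Drop the overlap if it would push the next chunk over budget on its own.
            if current and len("\n\n".join(current + [para])) > max_chars:
                current = []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A list or table with no blank lines inside it counts as one paragraph here, so it stays in a single chunk. A paragraph longer than `max_chars` is kept whole rather than cut mid-sentence. Splitting such paragraphs by sentence is a reasonable refinement.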
Mistake 3: Drowning the Model in Context
Why It Happens
Believing more context is safer, teams retrieve twenty chunks when three would do. The relevant facts are now buried among loosely related text, and the model's attention spreads thin. Details that sit in the middle of a long context are especially likely to be overlooked.
The Cost and the Fix
You pay more, wait longer, and get worse answers. The fix is counterintuitive but firm: retrieve the smallest set of chunks that fully answers the question. Quality of selection beats quantity every time.
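Here is one way to enforce a small context. The sketch assumes your retriever returns `(score, text)` pairs, with higher scores meaning more relevant. It has three limits: a cap on the number of chunks (`max_k`), a relevance floor (`min_score`), and a character budget (`max_chars`). All three defaults are hypothetical values to tune against your own test set.

```python
from typing import List, Tuple


def select_chunks(
    scored: List[Tuple[float, str]],
    max_k: int = 3,
    min_score: float = 0.5,
    max_chars: int = 2000,
) -> List[str]:
    """Keep the fewest, most relevant chunks that fit the limits."""
    selected: List[str] = []
    used = 0
    for score, text in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if len(selected) >= max_k or score < min_score:
            break  # sorted descending, so every remaining chunk is also below the floor
        if used + len(text) > max_chars:
            continue  # skip a chunk that would blow the budget; a shorter one may still fit
        selected.append(text)
        used += len(text)
    return selected
```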
Mistake 4: No Instruction to Stay Within the Context
Why It Happens
Without an explicit rule, the model blends retrieved facts with its own training knowledge. When the two agree, no one notices. When they conflict, or when the context is silent, the model fills gaps from memory and presents the result as if it came from your documents.
The Cost and the Fix
This produces the worst kind of error: a fabrication wearing the costume of a sourced fact. Add a direct instruction to answer only from the supplied context and to say so when the context lacks the answer. This single sentence prevents a large share of hallucinations.
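Here is a minimal prompt builder that carries the stay-within-context rule. The wording is illustrative, not a canonical prompt. It asks the model for a fixed refusal sentence, the hypothetical `NOT_FOUND` string, when the context lacks the answer. That makes the case easy to detect in code and in tests.

```python
from typing import List

# Illustrative refusal wording; any fixed, easily matched sentence works.
NOT_FOUND = "The provided context does not contain the answer."


def build_grounded_prompt(question: str, chunks: List[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied chunks."""
    # Number the chunks so the model (and later checks) can refer to them.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "Do not use outside knowledge. "
        f'If the context does not contain the answer, reply exactly: "{NOT_FOUND}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```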
Mistake 5: Ignoring Conflicting Sources
Why It Happens
Real document sets contradict themselves. An old policy and a new one both get retrieved, and the model picks one, often the wrong one, with no awareness that a conflict exists. Nothing in the pipeline flagged the contradiction.
The Cost and the Fix
Users receive outdated or inconsistent answers and lose trust in the system. The fix is twofold: prune stale documents from your index, and instruct the model to surface conflicts rather than silently resolving them. Asking it to note when sources disagree turns a hidden risk into a visible one.
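Both halves of the fix can be sketched in a few lines. The sketch assumes each document carries a version date and a stable ID shared by all of its versions; the `Doc` fields are hypothetical.

- `keep_latest` prunes superseded versions before indexing.
- `format_with_date` puts the date into the chunk text so the model can see it.
- `CONFLICT_RULE` is an illustrative instruction to append to the prompt.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Doc:
    doc_id: str        # hypothetical stable ID shared by all versions of one policy
    version_date: date
    text: str


def keep_latest(docs: List[Doc]) -> List[Doc]:
    """Drop every version of a document except the most recent one."""
    latest: Dict[str, Doc] = {}
    for doc in docs:
        current = latest.get(doc.doc_id)
        if current is None or doc.version_date > current.version_date:
            latest[doc.doc_id] = doc
    return list(latest.values())


def format_with_date(doc: Doc) -> str:
    """Expose the version date inside the chunk so conflicts can be explained."""
    return f"({doc.doc_id}, updated {doc.version_date.isoformat()}) {doc.text}"


# Illustrative instruction asking the model to surface conflicts instead of resolving them.
CONFLICT_RULE = (
    "If the sources disagree, do not pick one silently. "
    "State that they conflict, quote each version, and give its date."
)
```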
Mistake 6: No Source Attribution
Why It Happens
Teams skip attribution because it feels like extra work and the answers read fine without it. But an answer with no traceable source cannot be verified, and fabrication hides perfectly inside fluent prose.
The Cost and the Fix
When something goes wrong in production, you cannot tell whether the model misread a source or invented the claim, so you cannot fix it. Require the model to cite which chunk supports each claim. Attribution makes fabrication obvious and gives you an audit trail. The reasoning behind this discipline is expanded in Grounding Prompts with Retrieved Context: Best Practices That Actually Work.
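Here is a sketch of a citation check that runs after generation. It assumes the prompt numbers chunks from 1. It also assumes the model is asked to place a bracketed number such as `[2]` before the closing punctuation of each sentence. The function returns two things: sentences that cite nothing, and citations that point at chunks that were never supplied. Both are worth logging as part of the audit trail.

```python
import re
from typing import List, Tuple

# Matches bracketed chunk numbers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_chunks: int) -> Tuple[List[str], List[int]]:
    """Return (sentences with no citation, cited numbers that match no supplied chunk)."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION.search(s)]
    cited = {int(n) for n in CITATION.findall(answer)}
    invalid = sorted(n for n in cited if not 1 <= n <= num_chunks)
    return uncited, invalid
```

The sentence splitter is deliberately naive. Abbreviations such as "e.g." will split early, which can flag a fragment as uncited. That only makes the check stricter.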
Mistake 7: Testing on a Single Happy Example
Why It Happens
It works on the one question you tried, so you ship it. But that question happened to retrieve cleanly. The questions users actually ask are messier, vaguer, and more varied, and they expose weaknesses your single test never touched.
The Cost and the Fix
Confidence built on one example collapses on contact with real traffic. Fix it by building a test set of ten to twenty real questions with known answers and running the whole set after every change. Patterns of failure only appear across many cases, as the worked scenarios in Grounding Prompts with Retrieved Context: Real-World Examples and Use Cases show.
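Here is a minimal harness for that test set. It assumes the whole pipeline can be called as `answer(question)` and returns a string; this is a hypothetical wrapper. It also assumes each case can be graded by checking for a key phrase. Phrase matching is crude, but it is enough to show whether a change helped or hurt across the set. Include a few questions the documents cannot answer, using the expected refusal wording as the phrase.

```python
from typing import Callable, List, NamedTuple, Tuple


class Case(NamedTuple):
    question: str
    expected: str  # phrase a correct answer must contain (crude grading)


def run_test_set(cases: List[Case], answer: Callable[[str], str]) -> float:
    """Run every case through the pipeline, print failures, and return the pass rate."""
    failures: List[Tuple[str, str]] = []
    for case in cases:
        output = answer(case.question)
        if case.expected.lower() not in output.lower():
            failures.append((case.question, output))
    for question, output in failures:
        print(f"FAIL: {question!r} -> {output!r}")
    passed = len(cases) - len(failures)
    print(f"{passed}/{len(cases)} passed")
    return passed / len(cases) if cases else 0.0
```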
How These Mistakes Compound
One Failure Hides Another
These seven rarely arrive alone, and that is what makes them dangerous. Bad chunking degrades retrieval, so a chunking problem disguises itself as a retrieval problem. A missing instruction to stay in context lets the model paper over thin retrieval with invented detail, so a retrieval gap hides behind a fluent fabrication. When you debug, fix the earliest mistake in the chain first: source material and chunking, then retrieval, then the prompt. Fixing a later stage while an earlier one festers only moves the symptom around without curing the cause.
Why Looking Plausible Is the Real Trap
The thread connecting every mistake here is that the broken output still looks good. A grounded system rarely fails loudly; it fails by producing confident, well-written answers that happen to be wrong. That is precisely why retrieval inspection, source attribution, and a standing test set matter so much. They are the only tools that distinguish an answer that is right from one that merely sounds right, and without them these seven mistakes stay invisible until a user pays for them.
Frequently Asked Questions
Which of these mistakes is the most common?
Retrieving the wrong passages, by a wide margin. Teams spend days adjusting prompt wording when the real problem is that the answer was never in the retrieved chunks. Always inspect retrieval first.
How do I tell a retrieval problem from a prompt problem?
Look at the chunks the model received. If they contain the answer and the model still got it wrong, the prompt or the model is at fault. If they do not contain the answer, no prompt change will help, and retrieval is your target.
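The same rule can be expressed as a small triage function. This sketch makes a simplifying assumption: each test case has a key phrase that a correct answer must contain. One subtlety deserves attention. If the answer is correct but the phrase appears in no chunk, the answer came from the model's memory. That is still a retrieval problem, and the answer is ungrounded.

```python
from typing import List


def triage(chunks: List[str], expected: str, answer: str) -> str:
    """Classify a failure as 'retrieval', 'prompt_or_model', or 'ok'."""
    in_context = any(expected.lower() in chunk.lower() for chunk in chunks)
    correct = expected.lower() in answer.lower()
    if not in_context:
        # The answer was never in the chunks; a correct answer here came from memory.
        return "retrieval"
    if not correct:
        # The chunks held the answer, so the prompt or the model is at fault.
        return "prompt_or_model"
    return "ok"
```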
Is adding more context ever the right move?
Occasionally, when answers are genuinely missing detail that lives in additional chunks. But far more often, missing detail signals a chunking or query problem. Add context deliberately and measure the effect, rather than as a reflex.
Why does attribution matter if the answers look correct?
Because looking correct and being correct are different. Attribution is how you verify the difference and how you debug failures later. Answers that cannot be traced cannot be trusted at scale.
Key Takeaways
- The most damaging failures live in retrieval and chunking, not in the model; inspect retrieved passages before blaming the prompt.
- Always instruct the model to answer only from the supplied context and to admit when the answer is missing.
- Retrieve the smallest set of relevant chunks; drowning the model in context lowers quality and raises cost.
- Require source attribution so fabrication becomes visible and answers stay auditable.
- Test on a varied set of real questions, not a single convenient example, and rerun it after every change.