The dangerous thing about a vector database is that it almost always returns something. There is no error, no empty page, no obvious signal that anything is wrong. It hands back the nearest vectors it can find, and if those vectors are garbage, you get confident, well-formatted garbage. That silent quality drift is why so many similarity search projects underwhelm without anyone quite knowing why.
This piece names seven failure modes that show up again and again. For each, we explain the mechanism, the cost you pay, and the corrective practice. None of these are exotic. They are the ordinary mistakes that turn a promising prototype into a search that users quietly stop trusting.
If your retrieval feels slightly off but you cannot point to the cause, the answer is probably below.
Mistake 1: Mismatched Embedding Models
Why It Happens
Someone embeds the corpus with one model, then later embeds queries with a different one, or upgrades the model and forgets to re-embed stored content. The vectors no longer live in the same space, so distances become meaningless. The system still returns results because the math still runs; the results are just noise dressed as relevance.
The Fix
Treat the embedding model and version as an immutable property of an index. Record it in metadata, assert it at query time, and re-embed everything when you migrate. The walkthrough "Standing Up Your First Similarity Search, Step by Step" builds this discipline in from the start.
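One way to make the query-time assertion concrete is a small guard that compares the model identity stored with the index against the one used for the query. This is a minimal sketch, not any particular database's API: `EmbeddingSpec`, `EmbeddingMismatchError`, `assert_same_space`, and the model name `example-embedder` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    """Identifies the embedding space an index was built in."""
    model: str
    version: str
    dimensions: int


class EmbeddingMismatchError(ValueError):
    """Raised when a query vector comes from a different embedding space."""


def assert_same_space(index_spec: EmbeddingSpec, query_spec: EmbeddingSpec) -> None:
    """Fail loudly instead of returning noise dressed as relevance."""
    if index_spec != query_spec:
        raise EmbeddingMismatchError(
            f"index built with {index_spec}, query embedded with {query_spec}"
        )


# Hypothetical model name and versions, for illustration only.
index_spec = EmbeddingSpec(model="example-embedder", version="2", dimensions=768)
query_spec = EmbeddingSpec(model="example-embedder", version="3", dimensions=768)

try:
    assert_same_space(index_spec, query_spec)
except EmbeddingMismatchError as err:
    print(f"refusing to search: {err}")
```

The point of the guard is to convert a silent quality failure into a loud one: a version mismatch stops the query instead of producing plausible-looking results.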
Mistake 2: Chunks That Are Too Large or Too Small
Why It Happens
Teams either embed entire documents, which blurs many topics into one vague vector, or split content so finely that each chunk loses the context needed to be meaningful. Both extremes hurt retrieval, but they fail differently: oversized chunks return vaguely-related giants, undersized chunks return precise fragments stripped of their surroundings.
The Fix
Aim for chunks of a few hundred words with modest overlap, then test against real queries and adjust. Chunking is the highest-leverage tuning knob most people ignore. Inspect what comes back and look for context that got severed at a boundary.
The cost of getting this wrong is subtle but real. Oversized chunks waste the user's attention by burying the answer inside a wall of loosely related text, and they make any downstream model work harder to find the relevant sentence. Undersized chunks return fragments that read as confident but lack the surrounding context needed to be correct. Both degrade trust slowly, since neither produces an obvious error, just a steady stream of results that are almost but not quite right.
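As a starting point for experimentation, here is a minimal sketch of word-based chunking with overlap. The function name `chunk_words` and the defaults of 300 words with 50 words of overlap are assumptions picked to match "a few hundred words with modest overlap", not recommended values. Splitting on whitespace is a simplification; many pipelines split on sentences, headings, or tokens instead.

```python
def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `size` words.

    Each chunk after the first repeats the last `overlap` words of the
    previous chunk, so a thought cut at one boundary survives in the next.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")

    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

With the size and overlap exposed as parameters, it is easy to re-chunk a sample of documents at a few settings and compare what comes back for the same real queries.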
Mistake 3: Ignoring Metadata and Filtering
Why It Happens
In the rush to get semantic search working, teams store only vectors and IDs. Later they need "similar items from this quarter" or "results this user is allowed to see," and the data simply is not there. Retrofitting metadata usually means reprocessing the entire index.
The Fix
Store filterable fields from day one: source, date, category, language, access level. Combining hard filters with similarity search is what makes results both relevant and correct. The deeper rationale appears in "Opinionated Rules for Running Embeddings in Production."
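To illustrate how hard filters and similarity combine, here is a brute-force sketch in plain Python. Real vector databases apply filters inside the engine and expose their own filter syntax; `Record`, `filtered_search`, and the field names below are hypothetical and only show the shape of the idea.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(records, query_vector, filters, k=3):
    """Apply hard metadata filters first, then rank the survivors by similarity."""
    allowed = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    allowed.sort(key=lambda r: cosine(r.vector, query_vector), reverse=True)
    return allowed[:k]


records = [
    Record("a", [1.0, 0.0], {"language": "en", "access_level": "public"}),
    Record("b", [0.9, 0.1], {"language": "en", "access_level": "internal"}),
    Record("c", [0.0, 1.0], {"language": "en", "access_level": "public"}),
]
hits = filtered_search(records, [1.0, 0.0], {"access_level": "public"}, k=2)
print([r.id for r in hits])
```

Record "b" is a close match by similarity but never appears, because the access filter removes it before ranking. That is the "correct" half of "relevant and correct", and it is only possible if the field was stored in the first place.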
Mistake 4: Trusting the Index Without Measuring Recall
Why It Happens
Approximate nearest neighbor indexes trade accuracy for speed, and the default settings lean toward speed. Teams accept whatever the index returns without ever checking how many genuinely relevant items it missed. The search feels fast and looks fine, while quietly dropping good matches.
The Fix
Periodically compare approximate results against a brute-force baseline on a sample of queries. If recall is too low, raise the index's search-effort parameters, such as the search-time candidate list size in HNSW graphs or the number of clusters probed in IVF indexes. Knowing your recall turns a black box into a tunable system, a point explored in "Twelve Items to Verify Before You Trust a Vector Index."
The reason this matters more than it seems is compounding. A retrieval system that quietly misses ten percent of relevant items feels fine in a demo, where you test queries you know will work. But across thousands of real, varied queries, that ten percent becomes a steady undercurrent of "the system should have found that and did not." When retrieval feeds an AI assistant, the missed items are simply absent from the model's context, and the assistant answers as if they never existed.
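The comparison against a brute-force baseline can be sketched in a few lines. The function names here are hypothetical, and `approx_search` stands in for whatever call your index exposes. The sketch assumes Euclidean distance; use the same metric your index uses.

```python
def brute_force_top_k(vectors, query, k):
    """Exact nearest neighbors by squared Euclidean distance: the ground truth."""
    def dist(item):
        _, vec = item
        return sum((a - b) ** 2 for a, b in zip(vec, query))
    return [item_id for item_id, _ in sorted(vectors.items(), key=dist)[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate index also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(vectors, queries, approx_search, k=10):
    """Average recall@k over a sample of queries."""
    scores = [
        recall_at_k(approx_search(q, k), brute_force_top_k(vectors, q, k))
        for q in queries
    ]
    return sum(scores) / len(scores)


vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0], "d": [0.0, 3.0]}


def lossy_search(query, k):
    """Stand-in for a real approximate index: it never returns "b"."""
    exact = brute_force_top_k(vectors, query, k + 1)
    return [item_id for item_id in exact if item_id != "b"][:k]


print(mean_recall(vectors, [[0.0, 0.0]], lossy_search, k=2))
```

Brute force is too slow to serve production traffic, but it is fast enough for a few hundred sampled queries, which is all a periodic recall check needs.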
Mistake 5: Letting Near-Duplicates Dominate Results
Why It Happens
When content contains many near-identical passages (boilerplate, repeated disclaimers, slightly edited versions), the top results fill up with copies of the same thing. The user sees five flavors of one answer and none of the variety they needed.
The Fix
Deduplicate during ingestion and add a diversity step at query time that suppresses results too similar to ones already chosen. A little post-processing turns a repetitive list into a genuinely useful one.
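The query-time diversity step can be as simple as a greedy pass over the ranked results. The sketch below is one possible approach, not the only one; the function name `diversify` and the 0.95 similarity threshold are assumptions to tune against your own data.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diversify(ranked, threshold=0.95, k=5):
    """Greedily keep results that are not near-duplicates of ones already kept.

    `ranked` is a list of (id, vector) pairs, best match first. A result is
    skipped when its cosine similarity to any kept result reaches `threshold`.
    """
    kept = []
    for item_id, vec in ranked:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((item_id, vec))
        if len(kept) == k:
            break
    return [item_id for item_id, _ in kept]


ranked = [
    ("disclaimer-v1", [1.0, 0.0]),
    ("disclaimer-v2", [0.999, 0.01]),  # near-copy of the first result
    ("pricing-faq", [0.0, 1.0]),
]
print(diversify(ranked, threshold=0.95, k=2))
```

Because the pass walks the list in relevance order, the best copy of each near-duplicate group is the one that survives. In practice you would fetch more candidates than you intend to show, so that suppressed duplicates can be replaced by the next distinct result.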
Mistake 6: Stale Indexes
Why It Happens
Content changes, but the index does not. New documents go unindexed, deleted ones still surface, and edits are never reflected. Because nothing errors, the drift between reality and the index grows unnoticed until users complain about missing or outdated results.
The Fix
Build ingestion as a scheduled pipeline that adds, updates, and removes vectors as content changes. Freshness is a maintenance habit, not a one-time setup. Treat the index like any other living dataset.
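At the core of such a pipeline is a diff between what the source contains now and what the index last saw. Here is a minimal sketch using content hashes; `plan_sync` and its argument shapes are hypothetical, and the actual embed, upsert, and delete calls depend on your database.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source_docs: dict[str, str], indexed_hashes: dict[str, str]) -> dict:
    """Decide which documents to add, re-embed, or remove.

    source_docs maps document ID -> current text.
    indexed_hashes maps document ID -> hash recorded when it was last indexed.
    """
    to_add, to_update = [], []
    for doc_id, text in source_docs.items():
        if doc_id not in indexed_hashes:
            to_add.append(doc_id)
        elif indexed_hashes[doc_id] != content_hash(text):
            to_update.append(doc_id)
    to_remove = [doc_id for doc_id in indexed_hashes if doc_id not in source_docs]
    return {
        "add": sorted(to_add),
        "update": sorted(to_update),
        "remove": sorted(to_remove),
    }


source = {"a": "unchanged text", "b": "edited text", "c": "brand new"}
indexed = {
    "a": content_hash("unchanged text"),
    "b": content_hash("original text"),
    "d": content_hash("deleted from source"),
}
print(plan_sync(source, indexed))
```

Storing the hash alongside each vector means unchanged documents are skipped, so a scheduled run only pays embedding costs for content that actually changed.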
Mistake 7: Using Vectors Where Keywords Would Win
Why It Happens
Semantic search is exciting, so teams reach for it even when queries are exact: product codes, error numbers, names, legal citations. Vectors are great at meaning and clumsy at precise tokens, so an exact lookup gets fuzzed into approximate nonsense.
The Fix
Use keyword or exact matching for precise queries and vectors for meaning, often blending both. Knowing which tool fits which query is half the battle, a theme in "Flat, Graph, or Inverted: Choosing How Vectors Get Searched."
A practical tell is whether the user is typing something they expect to match exactly. An order number, a SKU, a person's name, or an error code is a precise token, and the user will be baffled if a "close" but wrong match comes back. Reserve vectors for the questions where the user is describing a concept in their own words and would happily accept a paraphrase. The strongest systems route each query to the right mechanism rather than forcing everything through one.
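A simple router can catch the most obvious precise tokens before they reach the vector index. The sketch below is a heuristic under stated assumptions: the regular expressions and the `route` function are hypothetical, the identifier formats are invented for illustration, and person names cannot be detected reliably this way. Treat it as a first pass to adapt to your own identifiers.

```python
import re

# Assumed patterns for "precise token" queries; replace with your own formats.
EXACT_PATTERNS = [
    re.compile(r"^[A-Z]{2,}-\d+$"),                    # e.g. SKU-10442, ORD-99812
    re.compile(r"^\d{5,}$"),                           # long bare numbers
    re.compile(r"^(E|ERR)[-_ ]?\d+$", re.IGNORECASE),  # e.g. E1234, ERR_42
]


def route(query: str) -> str:
    """Return "keyword" for exact-looking queries, "vector" otherwise."""
    q = query.strip()
    # A fully quoted query signals that the user wants a literal match.
    if len(q) > 1 and q.startswith('"') and q.endswith('"'):
        return "keyword"
    if any(pattern.match(q) for pattern in EXACT_PATTERNS):
        return "keyword"
    return "vector"


for q in ["SKU-10442", "ERR_42", '"Jane Doe"', "how do I reset my password"]:
    print(f"{q!r} -> {route(q)}")
```

A router like this does not have to be perfect to help. Sending even the clearly exact queries to keyword matching removes the most baffling failures, and ambiguous queries can be sent to both mechanisms and blended.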
Frequently Asked Questions
Why does bad vector search rarely throw an error?
Because the database is doing exactly what it was designed to do: return the nearest vectors. Whether those vectors are meaningful is outside its concern. Quality problems live in the embeddings, chunking, and tuning, none of which produce errors when they go wrong.
How do I know if my chunks are the problem?
Read the chunks that come back for real queries. If you see context cut off mid-thought, sprawling multi-topic blobs, or fragments too short to stand alone, chunking is your issue. Adjust size and overlap, then re-check the same queries.
What is recall and why should I measure it?
Recall is the fraction of truly relevant items your index actually returns. Approximate indexes sacrifice some recall for speed. Measuring it against a brute-force baseline tells you whether your speed settings are silently dropping good matches you care about.
Can metadata really not be added later?
It can, but adding filterable fields after the fact usually means reprocessing every record, because the data was never captured at ingestion. Storing it upfront is far cheaper than reconstructing it from sources you may no longer have.
When should I not use a vector database at all?
When your queries are exact lookups (product codes, IDs, precise names), keyword or exact matching wins. Vectors excel at fuzzy, meaning-based queries. Forcing them onto precise lookups produces worse results than a plain index would.
How often should I rebuild or refresh the index?
Match it to how fast your content changes. High-churn content needs near-continuous updates; stable reference material can refresh on a slower schedule. The key is having an automated process so freshness never depends on someone remembering.
Key Takeaways
- Vector search fails silently, returning confident results even when the underlying vectors are meaningless.
- Keep the embedding model and version immutable per index, and re-embed everything when you migrate.
- Chunk size and overlap are high-leverage knobs; inspect returned chunks for severed context.
- Capture filterable metadata at ingestion, because retrofitting it usually means reprocessing the whole index.
- Measure recall against a brute-force baseline so the approximate index stays honest.
- Deduplicate, keep indexes fresh, and route exact lookups to keyword matching rather than vectors.