The RECALL Model: Five Stages for Embedding Pipelines

Teams building vector search tend to make decisions in a scattered order: picking a database before they have decided what a record is, or tuning an index before they have measured recall. The result is rework. A framework imposes a sensible sequence and a shared vocabulary, so a team can reason about where it is and what comes next.

This article introduces the RECALL model, a five-stage structure for designing and operating vector database systems. The name is a mnemonic for the stages and a reminder that recall, the fraction of relevant items you actually retrieve, is the quality the whole system exists to protect. Each stage has a clear question to answer and a signal that tells you it is solid enough to move on.

Use it as a map. You can enter at any stage to diagnose a problem, but building in order saves the most rework.

Stage One: Represent

The Question It Answers

What is a single retrievable unit, and what does it mean for two units to be similar? Before writing any code, you decide whether you are matching documents, sections, or images, and what kind of similarity (topical, stylistic, or visual) you actually want. Getting this wrong poisons everything downstream.

Knowing You Are Done

You can describe, in one sentence, what a record is and what "similar" means for your use case. If two people on the team would define the retrieval unit differently, you are not done. The data-preparation items in Twelve Items to Verify Before You Trust a Vector Index operationalize this stage.

The reason this stage comes first, and feels almost philosophical, is that every later decision inherits its answer. If you have not decided whether "similar" means topically related or stylistically alike, you cannot choose a model that captures the right notion, and you cannot judge whether results are good. Teams that skip straight to picking a database are answering a Stage Four question before they have answered the Stage One question, which is exactly how projects end up technically working and practically useless.

Stage Two: Encode

The Question It Answers

Which model turns your units into vectors, and how do you prepare the input? This stage covers choosing an embedding model suited to your content and chunking text into units that match how people query. The model choice shapes quality more than any later decision.

Knowing You Are Done

You have pinned a specific model and version, confirmed it fits your domain, and settled chunk size and overlap against real queries. The mechanics live in Standing Up Your First Similarity Search, Step by Step.
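Chunk size and overlap are easier to settle when they are explicit parameters you can vary against real queries. The following is a minimal sketch, assuming word-based chunks; the function name `chunk_words` and its default values are illustrative assumptions, not recommendations from this article, and a real pipeline might split on tokens or sentences instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Hypothetical helper for illustration. The defaults are placeholders to be
    tuned against real queries, not recommended values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so we do not emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Keeping both values as arguments makes it straightforward to re-chunk the same corpus with a few settings and compare retrieval quality on the same set of queries.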

Stage Three: Curate

The Question It Answers

What metadata travels with each vector, and how do you keep the corpus clean? Here you decide which filterable fields to store, deduplicate near-identical content, and strip boilerplate. Curation is what lets similarity search combine with hard constraints and avoid repetitive results.

Knowing You Are Done

Every vector carries the metadata you will filter on, duplicates are handled, and noise is removed. If you cannot answer "similar items, but only recent ones," curation is incomplete. The mistakes this stage prevents appear in Seven Ways a Vector Store Quietly Returns Junk.
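As a rough illustration of what "handled" can look like, the sketch below attaches filterable metadata to each record, drops duplicates, and answers the "only recent ones" constraint. The `Record` fields and helper names are assumptions made for this example. Note that hashing normalized text only catches duplicates that are identical after lowercasing and whitespace cleanup; true near-duplicate detection needs a fuzzier technique.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical schema: the fields you store should be the ones you filter on.
    record_id: str
    text: str
    published: date
    source: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations hash the same."""
    return " ".join(text.lower().split())


def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first record for each distinct normalized text."""
    seen: set[str] = set()
    kept = []
    for record in records:
        digest = hashlib.sha256(normalize(record.text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(record)
    return kept


def only_recent(records: list[Record], cutoff: date) -> list[Record]:
    """Hard constraint applied through metadata, independent of similarity."""
    return [r for r in records if r.published >= cutoff]
```

If a filter like `only_recent` cannot be written because the field was never stored, that is the signal that curation is incomplete.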

Stage Four: Approximate

The Question It Answers

How do you index for fast retrieval without losing too many relevant results? This stage is about choosing an index structure, sizing it to your scale, and tuning the speed-versus-recall dial deliberately rather than accepting defaults.

Knowing You Are Done

You have measured recall against a brute-force baseline and tuned the index until recall meets a target tied to your stakes. The trade-offs behind index choice are mapped in Flat, Graph, or Inverted: Choosing How Vectors Get Searched.
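Measuring recall against a brute-force baseline can be sketched in a few lines. The example below assumes cosine similarity and small in-memory vectors, and its function names are illustrative; in practice the approximate IDs would come from whichever index you are tuning, and the exact search would run over a sample of queries rather than the full workload.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest neighbors: score every vector, then keep the best k IDs."""
    ranked = sorted(vectors.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [vector_id for vector_id, _ in ranked[:k]]


def recall_at_k(approx_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the exact top-k that the approximate index also returned."""
    if not exact_ids:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

Averaging `recall_at_k` over a representative set of queries gives the number to compare against your target as you adjust index parameters.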

Stage Five: Locate and Learn

The Question It Answers

How do queries actually retrieve, and how does the system improve over time? This stage combines the query path (embedding the query, filtering, and re-ranking) with the operational loop (freshness, monitoring, and cost control). It is where the system meets real traffic and keeps adapting.

Knowing You Are Done

Queries return filtered, re-ranked results; ingestion keeps the index fresh automatically; and you track recall and cost as live metrics. This stage never fully closes, because operation is continuous. The operating posture is detailed in Opinionated Rules for Running Embeddings in Production.
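The filter-then-re-rank part of the query path can be sketched as a small function. Everything here is an assumption for illustration: candidates are plain dictionaries as they might come back from an index, `keep` is a metadata predicate, and `rerank_score` stands in for whatever second-pass scorer you use.

```python
from typing import Any, Callable

Candidate = dict[str, Any]  # hypothetical shape: {"id": ..., "score": ..., "metadata": {...}}


def run_query(
    candidates: list[Candidate],
    keep: Callable[[dict[str, Any]], bool],
    rerank_score: Callable[[Candidate], float],
    k: int,
) -> list[Candidate]:
    """Apply a metadata filter, re-rank the survivors, and return the top k."""
    filtered = [c for c in candidates if keep(c["metadata"])]
    reranked = sorted(filtered, key=rerank_score, reverse=True)
    return reranked[:k]


candidates = [
    {"id": "a", "score": 0.91, "metadata": {"year": 2019}},
    {"id": "b", "score": 0.88, "metadata": {"year": 2024}},
    {"id": "c", "score": 0.80, "metadata": {"year": 2025}},
]
results = run_query(
    candidates,
    keep=lambda meta: meta["year"] >= 2023,
    rerank_score=lambda c: c["score"],
    k=2,
)
```

Because this sketch filters after retrieval, a strict filter can leave fewer than `k` results; retrieving more candidates than you need, or filtering inside the index where the database supports it, are common ways to compensate.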

Applying the Framework

Building Versus Diagnosing

When building, move through the stages in order; each depends on the one before. When diagnosing a problem, jump to the stage that matches the symptom: vague results point to Represent or Encode, repetitive results to Curate, missing results to Approximate, stale results to Locate and Learn. The framework turns a fuzzy complaint into a located cause.

This diagnostic mapping is where the framework earns its keep in practice. "The search is bad" is unactionable; "results are vaguely relevant but never precise" points straight at chunking in Encode; "the same answer appears five times" points at deduplication in Curate; "it cannot find a document I know exists" points at recall in Approximate. By translating a symptom into a stage, the team stops arguing about where to look and starts fixing the actual cause, which is usually upstream of where the symptom appeared.
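The symptom-to-stage mapping described above is small enough to write down as a lookup table, which some teams may find useful in a runbook or triage script. This is only a sketch; the symptom keys and the `stages_for` helper are illustrative names, not part of the framework.

```python
# Mapping taken from the diagnostic guidance above; keys are illustrative labels.
SYMPTOM_TO_STAGES = {
    "vague": ["Represent", "Encode"],
    "repetitive": ["Curate"],
    "missing": ["Approximate"],
    "stale": ["Locate and Learn"],
}


def stages_for(symptom: str) -> list[str]:
    """Return the stages to investigate for a symptom, or an empty list if unknown."""
    return SYMPTOM_TO_STAGES.get(symptom.strip().lower(), [])
```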

Using It to Onboard and Communicate

Beyond building and diagnosing, the framework gives a team a shared language. Saying "we are solid through Encode but Curate is shaky" communicates more in nine words than a paragraph of vague status. New team members can locate themselves quickly, and stakeholders can understand where effort is going without a technical deep dive. A named structure with agreed stages turns scattered individual knowledge into something the whole team can reason about together, which is often worth more than any single technical optimization.

Why Recall Anchors It All

Every stage exists to protect recall, the fraction of relevant items you actually surface. A weak representation, a wrong model, dirty curation, or an over-aggressive index all show up as lost recall. Keeping that one number in view across all five stages keeps the whole system honest, a habit reinforced by the examples in Inside Five Products Powered by Nearest-Neighbor Lookup.

This is why a single metric can tie together five seemingly different concerns. Recall is downstream of every choice you make, so it acts as an integration test for the whole pipeline. When recall drops, the framework tells you which stage to investigate; when recall is high and stable, you have evidence that all five stages are working in concert. Anchoring on one outcome metric prevents the common failure of optimizing each stage in isolation while the system as a whole quietly underperforms.

Frequently Asked Questions

Do I have to follow the stages strictly in order?

For building, mostly yes, since each stage depends on decisions from the previous one. For diagnosing, no: you enter at whichever stage matches the symptom. The order matters most the first time through; afterward the framework is a map you navigate freely.

Why is the framework named after recall specifically?

Because recall is the quality the entire system exists to deliver: retrieving the relevant items. Latency, cost, and freshness all matter, but a fast, cheap, fresh system that misses relevant results has failed at its core job. The name keeps that priority visible.

Which stage do teams underinvest in most?

Curate. Teams enthusiastically pick models and tune indexes but skip metadata, deduplication, and cleaning. That omission shows up later as un-filterable results and repetitive lists, problems that are expensive to fix after the corpus is built.

Can I use this framework with any vector database?

Yes. The stages are technology-agnostic; they describe the decisions any vector search system requires, regardless of which database or service you use. The tooling choice is one decision inside the Approximate stage, not the framework itself.

How does the framework handle model upgrades?

A model upgrade re-enters the Encode stage and forces a re-embedding pass that ripples forward through Curate, Approximate, and Locate and Learn. Treating it as re-entering the framework, rather than a one-off patch, ensures you re-verify recall after the change.

Is this framework only for large systems?

No. Small systems still pass through the same five questions, just with simpler answers. The framework scales down cleanly: a tiny project might spend an hour per stage, while a large one spends weeks, but the sequence and the recall anchor hold either way.

Key Takeaways

The RECALL model structures vector search into five stages: Represent, Encode, Curate, Approximate, and Locate and Learn.
Each stage answers a specific question and has a clear signal that it is solid enough to proceed.
Build in order, since each stage depends on the last; diagnose by jumping to the stage matching the symptom.
Recall, the fraction of relevant items retrieved, anchors every stage and keeps the system honest.
Curate is the most underinvested stage, and its omission surfaces as un-filterable, repetitive results.
The framework is technology-agnostic and scales from tiny projects to large systems without changing shape.

Agency Script Editorial
