The SPADE Model for Structuring Extraction Work

Most people approach extraction as a single act: write a prompt, get data. That works until the documents get messy, and then the lack of structure shows. A framework helps because it breaks the work into stages you can reason about independently, so when something fails you know which stage to fix rather than rewriting the whole prompt and hoping.

This article introduces SPADE, a five-stage model for extraction work: Schema, Prompt, Ambiguity, Decode, and Evaluate. The name is a memory aid, but the value is in the separation. Each stage owns a distinct concern, and treating them separately is what keeps a pipeline maintainable as it grows. The stages run roughly in order, though you will loop back as testing reveals gaps.

The framework is deliberately tool-agnostic. It applies whether you are typing into a chat box or building an automated pipeline, because the concerns it separates exist regardless of the surrounding machinery. Learn the five stages once and you have a structure for every extraction problem you meet.

S: Schema

The first stage defines what you are extracting before you decide how.

Specify the Target Record

List every field with a name, type, and required flag, and decide how repeated structures and lists are represented. The schema is the contract the rest of the framework serves; everything downstream either fills it or checks it. When to invest most here: always, because a vague schema undermines every later stage. The schema-first principle anchors The Complete Guide to Prompting for Data Extraction.
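
As a minimal sketch of what such a contract can look like, assume a hypothetical invoice record. The field names, the FieldSpec helper, and the required_fields function are illustrative assumptions, not part of SPADE itself.

```python
from dataclasses import dataclass

# Hypothetical helper for illustration: one entry per field in the target record.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    py_type: type
    required: bool

# Assumed invoice schema; every name below is an example, not a prescribed field.
INVOICE_SCHEMA = [
    FieldSpec("invoice_number", str, required=True),
    FieldSpec("issue_date", str, required=True),    # kept as a raw string at this stage
    FieldSpec("total", float, required=True),
    FieldSpec("po_number", str, required=False),
    FieldSpec("line_items", list, required=True),   # repeated structure held as a list
]

def required_fields(schema):
    """Return the names of fields that must be present in every record."""
    return [spec.name for spec in schema if spec.required]
```

Writing the schema as data rather than prose means later stages can read it directly: the prompt can print it, and validation can iterate over it.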

P: Prompt

The second stage maps input onto the schema with minimal room for interpretation.

Instruct, Format, Exemplify

State the task, specify the exact output structure, and include at least one worked example. The prompt's only job is to make the mapping unambiguous; it should not contain logic that belongs in code. When to invest most here: when input is varied enough that a bare instruction produces inconsistent results, which is most real cases.

Instruction: what to extract, stated plainly
Format: the exact JSON shape, pasted in
Example: one input-output pair demonstrating an edge case
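
The three parts above can be sketched as a small prompt builder. This is one possible layout under stated assumptions: the invoice fields, the example text, and the build_prompt function are all hypothetical, and the wording of the instruction is only a placeholder.

```python
import json

def build_prompt(document: str) -> str:
    """Assemble instruction, format, and one worked example into a prompt string."""
    # Instruction: what to extract, stated plainly.
    instruction = "Extract the invoice number, issue date, and total from the document."
    # Format: the exact JSON shape, pasted in.
    output_shape = {
        "invoice_number": "string",
        "issue_date": "string",
        "total": "number or null",
    }
    # Example: one input-output pair showing an edge case (total not stated).
    example_input = "Invoice INV-7 dated 3 March 2024. Total due: see attached."
    example_output = {"invoice_number": "INV-7", "issue_date": "3 March 2024", "total": None}

    sections = [
        instruction,
        "Return JSON in exactly this shape:\n" + json.dumps(output_shape, indent=2),
        "Example input:\n" + example_input
        + "\nExample output:\n" + json.dumps(example_output),
        "Document:\n" + document,
    ]
    return "\n\n".join(sections)
```

Note that the builder only assembles text. Anything that looks like logic, such as date conversion, is left out on purpose so it can live in code.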

A: Ambiguity

The third stage addresses every place the input can be read more than one way.

Resolve It Deterministically

For each field with competing candidates, write a rule that selects by meaning rather than by position (for example, the date the agreement takes effect, not the first date on the page), and define what every missing field returns. Ambiguity left unaddressed becomes randomness in the output. When to invest most here: documents like contracts and invoices with multiple dates or amounts, where the failures detailed in 7 Common Mistakes with Prompting for Data Extraction (and How to Avoid Them) cluster.
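
One way to keep these rules explicit is to hold them as data and render them into the prompt. The sketch below assumes a hypothetical contract with two date fields and one amount; the rule wording, the AMBIGUITY_RULES name, and the render_rules function are illustrative only.

```python
import json

# Assumed rules for a hypothetical contract; each field gets a selection rule
# and an explicit value to return when the document does not contain it.
AMBIGUITY_RULES = {
    "effective_date": {
        "select": "the date the agreement takes effect, not the signature or renewal date",
        "if_missing": None,
    },
    "renewal_date": {
        "select": "the date the agreement next renews",
        "if_missing": None,
    },
    "total_amount": {
        "select": "the final amount due after tax, not a subtotal",
        "if_missing": None,
    },
}

def render_rules(rules: dict) -> str:
    """Turn the rule table into prompt lines, one per field."""
    lines = []
    for field, rule in rules.items():
        missing = json.dumps(rule["if_missing"])  # None becomes the JSON literal null
        lines.append(f"- {field}: use {rule['select']}. If absent, return {missing}.")
    return "\n".join(lines)
```

Because every field has an if_missing entry, the question of what absence returns is answered once, in one place, instead of being left to the model.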

D: Decode

The fourth stage turns model output into a validated record your systems can trust.

Parse, Validate, Transform

Parse the output, validate it against the schema in code, reject failures, and apply any normalization here rather than in the prompt. Keeping transformation in this stage means it is explicit and testable. When to invest most here: any pipeline feeding a system of record, where unvalidated output corrupts downstream data. The step-by-step mechanics appear in A Step-by-Step Approach to Prompting for Data Extraction.
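
A minimal sketch of parse, validate, and transform follows, using only the Python standard library. It assumes the hypothetical three-field invoice record, an input date written like "3 March 2024", and an English locale for month names; DecodeError and decode are names introduced here for illustration.

```python
import json
from datetime import datetime

class DecodeError(ValueError):
    """Raised when model output cannot be turned into a valid record."""

def decode(raw: str) -> dict:
    # Parse: reject anything that is not a JSON object.
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise DecodeError(f"unparseable output: {exc}") from exc
    if not isinstance(record, dict):
        raise DecodeError("expected a JSON object")

    # Validate: check required fields and types in code.
    if not isinstance(record.get("invoice_number"), str):
        raise DecodeError("invoice_number must be a string")
    total = record.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise DecodeError("total must be a number")

    # Transform: normalize the raw date string here, not in the prompt.
    try:
        issue_date = datetime.strptime(record.get("issue_date"), "%d %B %Y").date()
    except (TypeError, ValueError) as exc:
        raise DecodeError("issue_date is missing or not in the assumed format") from exc

    return {
        "invoice_number": record["invoice_number"],
        "issue_date": issue_date.isoformat(),
        "total": float(total),
    }
```

A caller would wrap decode in a try block and route each DecodeError to a reject queue, so a bad record never reaches storage.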

E: Evaluate

The fifth stage measures whether the pipeline works and keeps working.

Test, Monitor, Audit

Test against a varied sample set with known correct answers, then in production track parse-failure and validation-failure rates and audit a sample on a schedule. Evaluation is what catches drift as input changes. When to invest most here: ongoing production pipelines, where quality silently degrades without measurement. A ready-made evaluation list lives in The Prompting for Data Extraction Checklist for 2026.
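
The test half of this stage can be sketched as a comparison against known answers. The function below is an assumption about how such a check might look, not a prescribed metric: it takes decoded records (or None where decoding failed) alongside the correct records and reports a failure rate and per-field accuracy.

```python
def evaluate(predictions: list, gold: list) -> dict:
    """Compare decoded records with known correct answers.

    predictions[i] is a dict, or None when parsing or validation failed.
    gold[i] is the correct record for the same document.
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and the same length")

    failures = sum(1 for p in predictions if p is None)
    accuracy = {}
    for field in sorted(gold[0]):
        correct = sum(
            1
            for p, g in zip(predictions, gold)
            if p is not None and p.get(field) == g[field]
        )
        accuracy[field] = correct / len(gold)

    return {"failure_rate": failures / len(gold), "field_accuracy": accuracy}
```

Running the same function on a scheduled production sample gives a number to watch over time, which is how drift becomes visible instead of silent.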

Applying SPADE in Order

The stages run in sequence but loop as needed.

The Feedback Loop

Evaluate feeds back into the earlier stages: a recurring validation failure might mean the Schema was wrong, the Prompt was unclear, or an Ambiguity rule was missing. The framework's value is that it tells you which stage to revisit rather than leaving you to rewrite blindly. Each loop tightens one stage, and the pipeline converges on reliability.

Diagnosing Failures by Stage

The clearest payoff of SPADE arrives when something goes wrong, because the framework converts a vague problem into a specific question about one stage. Instead of staring at a wrong record wondering what happened, you ask which stage owned the concern that failed.

A Triage Routine

When a record comes out wrong, walk the stages in reverse. If the output was malformed or an invalid type slipped through, the Decode stage's validation is incomplete. If the value was well-formed but semantically wrong, such as the renewal date appearing where the effective date belonged, the Ambiguity stage is missing a rule. If a field the document clearly contained came back null, the Prompt stage failed to map it, or the Schema never defined it. This reverse walk turns debugging from guesswork into a checklist.
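
The reverse walk can be written down as a small function. This is a sketch of the routine described above, with the Observation fields and the triage function invented for illustration; a real team would likely record richer evidence than three flags.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """What a reviewer noted about one wrong record (hypothetical shape)."""
    well_formed: bool       # output parsed and every type was valid
    wrong_value: bool       # a well-formed, non-null value that is semantically wrong
    null_but_present: bool  # field came back null although the document contained it

def triage(obs: Observation) -> str:
    """Walk the stages in reverse and name the first one implicated."""
    if not obs.well_formed:
        return "Decode"            # validation let malformed output through
    if obs.wrong_value:
        return "Ambiguity"         # a selection rule is missing
    if obs.null_but_present:
        return "Prompt or Schema"  # the mapping failed or the field was never defined
    return "no stage implicated"
```

For example, a record where the renewal date sits in the effective date field is well-formed but carries a wrong value, so triage returns "Ambiguity".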

Mapping Symptoms to Stages

Fabricated value where the document had none: missing Ambiguity rule for absence
Inconsistent field names across runs: Schema underspecified
Correct data in the wrong field: Ambiguity rule needed for competing candidates
Malformed or wrong-type output reaching storage: Decode validation gap
Quality declining over weeks: Evaluate stage not monitoring drift

This mapping is why the failures catalogued in 7 Common Mistakes with Prompting for Data Extraction (and How to Avoid Them) each trace cleanly to a single SPADE stage.

Adapting SPADE to Your Context

The framework is a structure, not a script, and it flexes with the difficulty of your work.

Scaling the Stages Up and Down

For a quick extraction on clean documents, the Schema and Decode stages still earn their keep, but the Ambiguity and Evaluate stages can be light. For a high-volume pipeline feeding a system of record, every stage deserves real investment, with Evaluate becoming a continuous operational practice rather than a one-time test. The framework does not prescribe equal effort everywhere; it prescribes that you consciously decide how much each stage warrants given your input and the cost of an error. The checklist that operationalizes these decisions appears in The Prompting for Data Extraction Checklist for 2026.

Frequently Asked Questions

Do I have to run all five stages for every extraction?

For a quick one-off on clean documents you can compress the middle stages, but the Schema and Decode stages always pay off because they define your target and catch bad output. For anything feeding a system or running at volume, all five stages matter. The framework scales: invest in each stage in proportion to the difficulty of your input and the cost of an error.

How is SPADE different from just following best practices?

Best practices are a list of good habits; SPADE organizes those habits into stages that own distinct concerns, which is what makes failures diagnosable. When a record is wrong, the framework points you to the stage responsible (Schema, Prompt, Ambiguity, Decode, or Evaluate) rather than leaving you to guess. The structure turns a flat list into a troubleshooting map.

Which stage do teams most often skip?

Ambiguity and Evaluate are the two most commonly skipped. Teams write a schema, a prompt, and some validation, then ship without resolving competing values or measuring quality in production. Both omissions cause failures that surface later as mysterious bad records. Explicitly naming these as stages forces them onto the agenda rather than letting them fall through the gaps between writing a prompt and validating its output.

Where does normalization belong in the framework?

Normalization belongs in the Decode stage, in code, not in the Prompt. Asking the model to normalize values during extraction can introduce silent errors that validation may not catch, because a wrongly converted value can still be well-formed. Extracting raw values in the Prompt stage and transforming them in Decode keeps the conversion logic explicit and testable. Placing normalization deliberately in Decode is one of the clearest benefits of separating the stages.

Key Takeaways

SPADE separates extraction into five stages: Schema, Prompt, Ambiguity, Decode, Evaluate
Schema defines the typed target record that every later stage fills or checks
Prompt maps input to the schema with an instruction, format, and worked example
Ambiguity resolves competing and missing values with deterministic rules
Decode parses, validates, rejects failures, and performs normalization in testable code
Evaluate tests, monitors, and audits, feeding failures back to the stage responsible

Agency Script Editorial
