Pulling Clean Graphs From Messy Source Text

A playbook is different from a tutorial. A tutorial walks you through a task once, in ideal conditions. A playbook gives you a set of named plays, each with a clear trigger for when to run it, a role responsible for it, and a place in the larger sequence. When something goes sideways in production, you do not want to reason from first principles. You want to recognize the situation and reach for the play that handles it.

This article lays out the plays that matter for prompting a language model to extract a knowledge graph. They are ordered roughly the way you would run them in a project: define the target, prove the approach on a sample, harden it for volume, and keep it honest over time. Each play stands alone, so you can run only the ones your situation calls for.

Throughout, "owner" refers to a role rather than a person, because the same individual often wears several hats on a small team. The point is that every play has a single accountable role, so nothing falls through the cracks.

Play 1: Pin the Schema

Trigger

Run this first, before writing any extraction prompt, and rerun it whenever the downstream graph reveals a relationship type nobody anticipated.

Owner and moves

The domain owner defines a closed list of entity types and relationship types, each with a one-line definition and a worked example drawn from real text. Ambiguous cases get explicit rules: when two entities co-occur in a sentence, which relationship applies, and when do you emit none. The output is a written schema specification that the prompt will reference directly. Without this play, every later play produces inconsistent results you cannot aggregate.
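A schema specification can also be kept as data, so the prompt and the validator read from the same source. The sketch below is a minimal illustration under stated assumptions: the type names (Company, Person, EMPLOYS, ACQUIRED), the TypeSpec class, and the helper functions are all hypothetical and are not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSpec:
    name: str
    definition: str  # one-line definition
    example: str     # worked example drawn from real text


# Hypothetical types for illustration; a real schema comes from the domain owner.
ENTITY_TYPES = [
    TypeSpec("Company", "A named business organization.", "Acme Corp opened a plant."),
    TypeSpec("Person", "A named individual.", "Jane Doe joined the board."),
]
RELATION_TYPES = [
    TypeSpec("EMPLOYS", "Company has the person on staff.", "Acme Corp hired Jane Doe."),
    TypeSpec("ACQUIRED", "Company bought another company.", "Acme Corp bought Globex."),
]
# Closed list of allowed (subject type, relation, object type) combinations.
SIGNATURES = {
    ("Company", "EMPLOYS", "Person"),
    ("Company", "ACQUIRED", "Company"),
}


def is_allowed(subject_type: str, relation: str, object_type: str) -> bool:
    """Return True only for combinations the schema explicitly permits."""
    return (subject_type, relation, object_type) in SIGNATURES


def render_schema() -> str:
    """Render the schema as plain text that a prompt can embed verbatim."""
    lines = ["Entity types:"]
    lines += [f"- {s.name}: {s.definition} Example: {s.example}" for s in ENTITY_TYPES]
    lines.append("Relationship types:")
    lines += [f"- {s.name}: {s.definition} Example: {s.example}" for s in RELATION_TYPES]
    lines.append("If no listed relationship applies, emit nothing.")
    return "\n".join(lines)
```

Keeping the signatures in a closed set makes the "emit none" rule checkable: anything outside the set is rejected rather than silently added to the graph.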

Play 2: Prove It on a Sample

Trigger

Immediately after the schema exists, before any investment in scale.

Owner and moves

The prompt owner builds an extraction prompt that embeds the schema, includes three to five diverse examples covering different relationship types, and requests structured output with a source span for each triple. Run it against twenty representative documents and read every result by hand. You are checking two things: does the format hold, and does the model interpret the schema the way you intended. Disagreements here are cheap to fix and expensive to ignore later.
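One way to assemble such a prompt and check that the format holds is sketched below. The key names (subject, relation, object, source_span), the prompt wording, and both function names are assumptions made for illustration. The model call itself is left out, since it depends on the provider you use.

```python
import json

# Assumed output contract: one JSON object per triple with these keys.
REQUIRED_KEYS = {"subject", "relation", "object", "source_span"}


def build_prompt(schema_text: str, examples: list[dict], document: str) -> str:
    """Embed the schema and a few worked examples ahead of the document."""
    parts = [
        "Extract knowledge-graph triples from the document.",
        "Use only the types defined in this schema:",
        schema_text,
        "Return a JSON array. Each item needs the keys "
        "subject, relation, object, source_span.",
        "Copy source_span verbatim from the document.",
        "Examples:",
    ]
    for ex in examples:
        parts.append("Text: " + ex["text"])
        parts.append("Output: " + json.dumps(ex["triples"]))
    parts.append("Document:")
    parts.append(document)
    return "\n".join(parts)


def format_holds(raw_output: str) -> bool:
    """Check the first question of the sample run: does the output parse?"""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, list):
        return False
    return all(
        isinstance(item, dict) and REQUIRED_KEYS.issubset(item.keys())
        for item in data
    )
```

The format check only answers the first of the two questions. Whether the model reads the schema the way you intended still requires reading the twenty results by hand.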

Play 3: Split Entities From Relationships

Trigger

When documents are long, when relationships span distant paragraphs, or when recall in the sample run was disappointing.

Owner and moves

The pipeline owner decomposes extraction into two passes. The first pass extracts and canonicalizes entities, producing a resolved entity list per document. The second pass takes that list and asks only about relationships among the known entities. This recovers long-range relationships a single pass misses and makes each stage independently testable. The deeper rationale for decomposition lives in "What People Get Wrong About Pulling Graphs From Text."
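The two passes can be sketched as a function that takes the model call as a parameter. Here call_model is a hypothetical callable (prompt in, raw text out) standing in for whatever client you use, and the prompt wording and JSON shapes are assumptions rather than a fixed contract.

```python
import json
from typing import Callable


def extract_two_pass(document: str, call_model: Callable[[str], str]) -> dict:
    """Run an entity pass, then a relationship pass restricted to those entities."""
    # Pass 1: entities only.
    entity_prompt = (
        "List the canonical entities in this document "
        "as a JSON array of strings.\n\n" + document
    )
    entities = json.loads(call_model(entity_prompt))

    # Pass 2: relationships among the known entities only.
    relation_prompt = (
        "Using ONLY these entities: " + json.dumps(entities) + "\n"
        "Return a JSON array of [subject, relation, object] triples "
        "found in the document.\n\n" + document
    )
    triples = json.loads(call_model(relation_prompt))

    # Drop any triple that mentions an entity the first pass did not produce.
    known = set(entities)
    kept = [t for t in triples if t[0] in known and t[2] in known]
    return {"entities": entities, "triples": kept}
```

Because the model call is injected, each pass can be tested with a stub that returns canned output, which is what makes the stages independently testable.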

Play 4: Resolve Entities Against a Canon

Trigger

The moment you see duplicate nodes for the same real-world entity in the sample graph.

Owner and moves

The pipeline owner maintains a canonical entity list and passes it into extraction so the model extends it rather than inventing new nodes for each spelling variant. New mentions are matched against the canon; genuinely new entities get added with a canonical form. Doing this during extraction, not after, is what keeps the graph from fragmenting into edges that point at half a dozen versions of the same company.
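A minimal sketch of the matching side of this play follows. The Canon class, its normalization rules, and the list of corporate suffixes are illustrative assumptions. Real entity resolution usually needs richer matching than lowercasing and suffix stripping.

```python
import re


class Canon:
    """Maps spelling variants of an entity to one canonical form."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}  # normalized key -> canonical form

    @staticmethod
    def _key(mention: str) -> str:
        text = mention.lower()
        text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
        # Assumed suffix list; extend it for your own corpus.
        text = re.sub(r"\b(inc|corp|corporation|ltd|llc)\b", "", text)
        return " ".join(text.split())

    def add_alias(self, alias: str, canonical: str) -> None:
        """Register a known variant by hand."""
        self._by_key[self._key(alias)] = canonical

    def resolve(self, mention: str) -> str:
        """Return the canonical form, adding the mention if it is new."""
        key = self._key(mention)
        if key not in self._by_key:
            self._by_key[key] = mention.strip()
        return self._by_key[key]

    def canonical_forms(self) -> list[str]:
        """Sorted canonical names, suitable for passing into the prompt."""
        return sorted(set(self._by_key.values()))
```

For example, after resolve("Acme Corp.") the later mentions "ACME Corporation" and "acme, inc" both resolve to "Acme Corp." instead of creating new nodes.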

Play 5: Verify Against the Source

Trigger

Before any extracted triple is allowed into the production graph.

Owner and moves

The validation owner runs every triple through two checks. Structural validation confirms the output parses and conforms to the schema. Content validation confirms the cited source span actually exists in the document and that the entities appear in it. Triples whose confidence falls below a threshold route to human review instead of the graph. This play is where fabricated relationships get caught, and it is the one teams most often skip because the JSON "looks fine."
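The two checks and the review route can be sketched as a single function. The field names, the allowed relation set, and the 0.7 threshold are assumptions for illustration, and the sketch assumes each triple carries a confidence value. The entity check uses plain substring matching, which is a simplification: if entities have been canonicalized, you would compare against the surface mention instead.

```python
# Hypothetical closed relation list; in practice, load it from the schema.
ALLOWED_RELATIONS = {"EMPLOYS", "ACQUIRED"}
REQUIRED_FIELDS = ("subject", "relation", "object", "source_span", "confidence")


def check_triple(triple: dict, document: str, threshold: float = 0.7) -> str:
    """Return 'accept', 'review', or 'reject' for one extracted triple."""
    # Structural validation: required fields present, relation in the schema.
    if not all(field in triple for field in REQUIRED_FIELDS):
        return "reject"
    if triple["relation"] not in ALLOWED_RELATIONS:
        return "reject"

    # Content validation: the cited span must exist in the source verbatim,
    # and both entities must appear inside that span.
    span = triple["source_span"]
    if span not in document:
        return "reject"
    if triple["subject"] not in span or triple["object"] not in span:
        return "reject"

    # Low-confidence triples go to a human instead of the graph.
    if triple["confidence"] < threshold:
        return "review"
    return "accept"
```

A triple whose JSON is perfectly well formed but whose span does not occur in the document comes back as "reject", which is exactly the case that structural validation alone lets through.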

Play 6: Route by Difficulty

Trigger

When extraction cost at volume is higher than the value justifies.

Owner and moves

The pipeline owner adds a cheap classifier that sorts incoming documents by length and complexity. Short, simple documents take the single-pass path; long or dense ones take the decomposed multi-pass path. This keeps average cost low without sacrificing quality on the hard documents. It is the difference between paying premium rates on every document and paying them only where they matter.
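The classifier does not need to be a model. A rough sketch using two heuristics is shown below. The thresholds and the use of capitalized tokens as a stand-in for entity density are assumptions to be tuned against your own corpus, not recommended values.

```python
def route(document: str, max_words: int = 400, max_entity_candidates: int = 15) -> str:
    """Pick an extraction path from cheap surface features of the document."""
    words = document.split()
    # Crude complexity proxy: the number of distinct capitalized tokens.
    candidates = {w.strip(".,;:()\"'") for w in words if w[:1].isupper()}
    if len(words) > max_words or len(candidates) > max_entity_candidates:
        return "multi_pass"
    return "single_pass"
```

A short sentence such as "Acme hired Jane." routes to "single_pass", while a 500-word document or a short passage naming twenty distinct entities routes to "multi_pass".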

Play 7: Measure Against a Gold Set

Trigger

Continuously, on every prompt or model change.

Owner and moves

The evaluation owner maintains a held-out set of documents with hand-labeled triples and reports precision and recall on every change. Precision guards against fabrication; recall guards against silent drops. Watching both prevents the common failure where tightening the prompt to reduce hallucination quietly halves recall. The discipline mirrors the evaluation thinking in "Controlling Formality and Register in Output: Best Practices That Actually Work," where surface compliance can hide real regressions.
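Both metrics reduce to set arithmetic over triples. The sketch below assumes exact matching on (subject, relation, object) tuples. Looser matching, such as treating aliases as equal, is a design choice left out here.

```python
def precision_recall(predicted: list[tuple], gold: list[tuple]) -> tuple[float, float]:
    """Compare extracted triples with hand-labeled ones by exact match."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    # Precision: what share of extracted triples are in the gold set.
    precision = true_positives / len(pred) if pred else 0.0
    # Recall: what share of gold triples were extracted.
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall
```

If the gold set holds four triples and the pipeline returns three, two of them correct, precision is 2/3 and recall is 2/4. Reporting only the first number would hide the two triples that were silently dropped.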

Play 8: Keep the Graph Fresh

Trigger

Whenever a source document changes or is replaced.

Owner and moves

The pipeline owner tracks provenance at the triple level: every relationship records which document and version it came from. When a document changes, re-extract it, reconcile the new triples against the old, and retire any relationship that no longer has source support. Triple-level provenance is what makes incremental updates possible without rebuilding the entire graph from scratch.
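The reconcile step can be sketched with a graph stored as a mapping from each triple to the set of documents that support it. The Provenance class and the reconcile function are hypothetical names, and the in-memory dict stands in for whatever graph store you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    doc_id: str
    version: int


def reconcile(graph: dict, doc_id: str, new_version: int, new_triples: set) -> dict:
    """Replace one document's contribution to the graph and retire orphans.

    graph maps a (subject, relation, object) tuple to a set of Provenance.
    """
    updated = {}
    # Remove the old version's support; keep triples other documents still back.
    for triple, sources in graph.items():
        remaining = {p for p in sources if p.doc_id != doc_id}
        if remaining:
            updated[triple] = remaining
    # Record support from the re-extracted version.
    for triple in new_triples:
        updated.setdefault(triple, set()).add(Provenance(doc_id, new_version))
    return updated
```

A triple supported only by the changed document disappears unless the new version re-asserts it, while a triple that another document also supports survives with that document's provenance intact.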

Play 9: Contain Ambiguity

Trigger

When repeated runs of the same document disagree on a relationship, or when reviewers flag passages that read multiple ways.

Owner and moves

The validation owner treats run-to-run disagreement as a signal rather than noise. Rather than forcing a deterministic answer, the pipeline records the competing interpretations, attaches their source spans, and routes the passage to human judgment. The reviewer decides which reading is correct, or records both with provenance if the source is genuinely ambiguous. This play prevents the false confidence that comes from collapsing real ambiguity into a single committed triple, a trap explored in "Straight Answers on Turning Text Into Knowledge Graphs."
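Detecting disagreement can be sketched by grouping the triples from several runs by entity pair and counting which relationship each run proposed. The function name and return shape are assumptions, and source spans are omitted to keep the example short. A fuller version would carry each reading's span along to the reviewer.

```python
from collections import defaultdict


def find_contested(runs: list[set]) -> tuple[list, list]:
    """Split triples into those every run agrees on and those needing review.

    runs is a list with one set of (subject, relation, object) tuples per run.
    """
    votes = defaultdict(lambda: defaultdict(int))
    for run in runs:
        for subj, rel, obj in run:
            votes[(subj, obj)][rel] += 1

    stable, review = [], []
    for (subj, obj), rels in votes.items():
        # Stable means one reading only, and every run produced it.
        if len(rels) == 1 and next(iter(rels.values())) == len(runs):
            stable.append((subj, next(iter(rels)), obj))
        else:
            review.append({"pair": (subj, obj), "votes": dict(rels)})
    return stable, review
```

The review list keeps every competing reading with its vote count instead of picking the majority, so the human reviewer sees the disagreement rather than a single committed triple.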

Play 10: Document the Pipeline for Handoff

Trigger

Before any team member who built the pipeline leaves it, and ideally from the start.

Owner and moves

The pipeline owner produces a short runbook per stage: its input, its output, how to run it, and how to recognize failure. Together with the schema document and the gold set, these runbooks form the handoff package that lets a newcomer operate the pipeline without decoding anyone's prompts. The play converts a pipeline that lives in one person's head into an asset the team owns, which is the difference between a fragile script and a durable system.

Frequently Asked Questions

In what order should I run these plays?

Roughly the order presented: pin the schema, prove it on a sample, then add decomposition, resolution, and validation as the sample reveals their need. Routing and freshness are operational plays you add once the pipeline runs at volume. Do not add complexity before the sample run shows it is warranted.

Can one person own all of these plays?

On a small team, yes. The roles exist to ensure single accountability per play, not to mandate headcount. What matters is that the schema, the prompt, validation, and evaluation each have a clear owner, even if that owner is the same person.

Which play has the highest payoff?

Pinning the schema. Every downstream play depends on a clear, closed vocabulary, and most extraction problems trace back to a schema that was never fully specified. Time spent here returns more than anywhere else in the sequence.

How often should I rerun the sample play?

Whenever the schema changes meaningfully, whenever you switch models, and whenever the production graph surprises you. The sample run is cheap and catches interpretation drift before it contaminates the full corpus.

Do I need all ten plays for a small project?

No. A small project with short, clean documents might run only the schema, sample, and validation plays. Add decomposition, resolution, routing, and freshness as your corpus grows in size and messiness. The plays are a menu, not a mandatory sequence.

Key Takeaways

Pin a closed schema first; nearly every extraction problem traces back to an underspecified vocabulary.
Prove the prompt on a hand-read sample before investing in scale, catching interpretation drift while it is cheap to fix.
Decompose long documents into entity and relationship passes, and resolve entities against a canon to prevent fragmentation.
Verify every triple against its source span before it enters the graph; structural validation alone does not catch fabrication.
Measure precision and recall on a gold set continuously, and track triple-level provenance so the graph can update incrementally.

Agency Script Editorial
