A knowledge graph turns a pile of documents into a queryable web of facts. Instead of searching text for keywords, you traverse relationships: which companies a person founded, which drugs interact with which conditions, which contracts reference which parties. Building that graph used to require brittle rule systems and named-entity models trained per domain. Language models changed the economics. With a well-constructed prompt, a general model can read raw text and emit structured triples ready to load into a graph.
The catch is that "well-constructed" carries a lot of weight. A naive prompt that says "extract the entities and relationships" produces inconsistent labels, hallucinated edges, and output you cannot reliably parse. Getting extraction that holds up at scale requires deliberate prompt design: a schema, an output contract, grounding constraints, and a verification step.
This guide covers the full arc—what a knowledge graph extraction prompt actually needs to contain, how to structure it, how to keep the model honest, and how to validate what comes out. It is written for someone who wants to move past toy demos and run extraction they can trust against thousands of documents.
What Knowledge Graph Extraction Actually Produces
Entities, Relations, and Triples
A knowledge graph is built from triples: subject, predicate, object. "Marie Curie" (subject) "won" (predicate) "Nobel Prize in Physics" (object). The subject and object are entities; the predicate is the relation between them. Extraction is the process of reading text and emitting these triples. Everything in prompt design serves that output.
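To make the shape concrete, here is a minimal sketch of a triple as a Python value. The Triple class and its field names are illustrative assumptions, not a required format. The object slot is named obj only to avoid shadowing the Python builtin.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One edge in the graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str  # the object entity; renamed to avoid shadowing the builtin

fact = Triple("Marie Curie", "won", "Nobel Prize in Physics")
print(f"({fact.subject}) -[{fact.predicate}]-> ({fact.obj})")
# prints: (Marie Curie) -[won]-> (Nobel Prize in Physics)
```

Making the class frozen also makes it hashable, so identical triples collapse naturally when stored in a set.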
Why Schema Comes Before Prompting
The single biggest determinant of quality is whether you define a schema first. A schema lists the entity types you care about (Person, Organization, Drug, Disease) and the relation types that connect them (founded, employs, treats, interacts_with). Without it, the model invents labels on the fly ("works_at" in one document, "employed_by" in the next) and your graph fragments into synonyms that never join.
Anatomy of an Extraction Prompt
The Schema Block
The prompt should open with an explicit enumeration of allowed entity types and relation types. Constraining the model to a closed vocabulary is what makes downstream merging possible. List each type with a one-line definition so the model resolves ambiguity the way you would.
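One way to keep the schema block consistent across runs is to store the vocabulary as data and render it into prompt text. This is a sketch under assumptions: the render_schema_block helper is hypothetical, and the definitions below are illustrative stand-ins for the ones you would write for your own domain.

```python
# Hypothetical schema: each allowed type maps to a one-line definition.
ENTITY_TYPES = {
    "Person": "A named individual human.",
    "Organization": "A company, agency, or institution.",
    "Drug": "A named pharmaceutical compound or product.",
    "Disease": "A named medical condition.",
}
RELATION_TYPES = {
    "founded": "A Person started an Organization.",
    "employs": "An Organization has a Person on staff.",
    "treats": "A Drug is used to treat a Disease.",
    "interacts_with": "A Drug has a documented interaction with another Drug.",
}

def render_schema_block(entities: dict[str, str], relations: dict[str, str]) -> str:
    """Render the closed vocabulary as the opening block of the prompt."""
    lines = ["Allowed entity types (use no others):"]
    lines += [f"- {name}: {definition}" for name, definition in entities.items()]
    lines.append("Allowed relation types (use no others):")
    lines += [f"- {name}: {definition}" for name, definition in relations.items()]
    return "\n".join(lines)

print(render_schema_block(ENTITY_TYPES, RELATION_TYPES))
```

Keeping the schema in one data structure means the same source of truth can drive both the prompt and any later conformance check.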
The Output Contract
State the exact output format and require nothing else. JSON is the usual choice: an array of objects, each with subject, subject_type, predicate, object, and object_type fields. Demand that the model emit only valid JSON with no commentary, so a parser can consume it directly.
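On the consuming side, the contract can be enforced with a strict parser that refuses anything other than a bare JSON array of complete triple objects. A minimal sketch, assuming the field names listed above; parse_extraction is a hypothetical helper name.

```python
import json

REQUIRED_FIELDS = ("subject", "subject_type", "predicate", "object", "object_type")

def parse_extraction(raw: str) -> list[dict]:
    """Parse model output under a strict contract: a bare JSON array of triples.

    json.JSONDecodeError is a subclass of ValueError, so callers can catch
    ValueError for both malformed JSON and contract violations.
    """
    data = json.loads(raw)  # fails if the model wrapped the JSON in commentary
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of triples")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise ValueError(f"item {i} is missing fields: {missing}")
    return data

raw = ('[{"subject": "Marie Curie", "subject_type": "Person", "predicate": "won", '
       '"object": "Nobel Prize in Physics", "object_type": "Award"}]')
triples = parse_extraction(raw)
```

Failing loudly here is deliberate: a response that needs cleanup is a prompt problem to fix, not text to repair silently.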
Grounding Constraints
Add an instruction that every triple must be supported by the source text, and that the model should omit anything it cannot ground rather than guess. This single sentence is the difference between a graph of facts and a graph of plausible fabrications.
Worked Examples
Include one or two short input-output pairs. Demonstrating the format with a real example anchors the model far more effectively than describing the format in prose. This is the same lesson covered in depth in Three Real Extraction Jobs, From Contracts to Clinical Notes.
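Putting the pieces together, a prompt might be assembled as below. This is a sketch under assumptions: build_prompt is a hypothetical helper, the schema block arrives as a plain string, and the worked example (Ada Lopez, Brightline Labs) is invented purely for illustration.

```python
import json

# Invented worked example: one short input paired with its expected output.
EXAMPLE_TEXT = "Ada Lopez founded Brightline Labs in 2015."
EXAMPLE_OUTPUT = [{
    "subject": "Ada Lopez", "subject_type": "Person",
    "predicate": "founded",
    "object": "Brightline Labs", "object_type": "Organization",
}]

def build_prompt(schema_block: str, document: str) -> str:
    """Assemble schema, output contract, grounding rule, example, and input."""
    return "\n\n".join([
        schema_block,
        # Output contract
        "Return only a JSON array of objects with fields subject, subject_type, "
        "predicate, object, object_type. Do not add any commentary.",
        # Grounding constraint
        "Every triple must be supported by the text. If you cannot ground a "
        "triple in the text, omit it rather than guess.",
        # Worked example
        f"Example input:\n{EXAMPLE_TEXT}\nExample output:\n{json.dumps(EXAMPLE_OUTPUT)}",
        f"Input:\n{document}\nOutput:",
    ])
```

Ending the prompt with the "Output:" cue mirrors the example pair, which nudges the model to answer in the same shape.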
Designing the Schema
Scope It to the Questions You Will Ask
A schema should be driven by the queries the graph must answer, not by everything the text mentions. If you will never ask about locations, do not extract them. A tighter schema produces a cleaner, more useful graph and a shorter, cheaper prompt.
Normalize Entity and Relation Names
Decide on canonical forms up front. Will organizations be stored by legal name or common name? Will "acquired" and "bought" collapse into one relation? Encoding these decisions in the schema prevents the fragmentation that kills graph usability.
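The decisions can also be applied mechanically to model output as a backstop, assuming the same mapping is stated in the prompt. In this sketch the alias table and canonical_relation are hypothetical; an unmapped label raises instead of passing through, so new synonyms surface for a deliberate decision.

```python
# Hypothetical canonicalization table, decided before extraction starts.
RELATION_ALIASES = {
    "acquired": "acquired",
    "bought": "acquired",
    "purchased": "acquired",
}

def canonical_relation(label: str) -> str:
    """Map a relation label to its canonical form, or fail loudly."""
    key = label.strip().lower()
    if key not in RELATION_ALIASES:
        raise KeyError(f"unmapped relation label: {label!r}")
    return RELATION_ALIASES[key]
```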
Handle Entity Types You Did Not Anticipate
Provide a small escape hatch—an "Other" entity type or a flag for triples that almost fit—so the model surfaces edge cases instead of silently forcing them into the wrong bucket. You can review these flagged items and extend the schema deliberately.
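A sketch of how flagged items might be routed, assuming the prompt allows an "Other" entity type and asks the model to set a hypothetical needs_review boolean on triples that almost fit. Nothing is dropped; anything outside the schema goes to a review queue.

```python
# Hypothetical closed vocabulary; "Other" is deliberately not in it.
ENTITY_TYPES = {"Person", "Organization", "Drug", "Disease"}

def route(triples: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split triples into ones ready to load and ones flagged for human review."""
    accepted, review = [], []
    for t in triples:
        types = {t["subject_type"], t["object_type"]}
        # "needs_review" is an assumed flag the model sets for near-fits.
        if types <= ENTITY_TYPES and not t.get("needs_review", False):
            accepted.append(t)
        else:
            review.append(t)  # "Other", unknown types, and flagged triples
    return accepted, review
```

Reviewing the review queue periodically shows which edge cases recur often enough to earn their own type in the schema.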
Keeping the Model Grounded
Demand Source Spans
Ask the model to include the exact text span supporting each triple. This makes verification trivial: you can check that the span exists in the source and actually expresses the relation. The technique also discourages fabrication, since the model must point at evidence.
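The existence check is easy to automate. This sketch assumes each triple carries its evidence in a hypothetical span field. It normalizes whitespace before matching and also requires both endpoints to appear in the span, a cheap heuristic that will reject valid triples whose span uses a pronoun ("she founded..."), so treat failures as candidates for review. Whether the span actually expresses the relation still needs a human or a second check.

```python
import re

def _norm(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip()

def span_is_grounded(triple: dict, source: str) -> bool:
    """True if the cited span occurs verbatim in the source and names both endpoints."""
    span = _norm(triple.get("span", ""))
    if not span or span not in _norm(source):
        return False
    return triple["subject"] in span and triple["object"] in span
```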
Forbid Inference Beyond the Text
Be explicit that the model should extract only stated facts, not inferred ones, unless inference is a stated goal. "The CEO attended the meeting" does not license a "Person employed_by Organization" triple unless the text says so. Spelling this out prevents the most common class of hallucinated edges, a problem dissected in Why Graph Extraction Prompts Silently Drop Half Your Entities.
Processing Documents at Scale
Chunking Long Documents
Models have context limits, so long documents must be chunked. The risk is that a relationship spanning two chunks gets lost. Overlap adjacent chunks slightly, so that a sentence straddling a boundary appears whole in at least one chunk, and, where the schema allows, run a second pass that links entities across chunk boundaries using consistent identifiers.
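A minimal sliding-window chunker illustrates the overlap. It counts words as a stand-in for tokens, which is an assumption; a real pipeline would measure size with the target model's tokenizer. The helper name is hypothetical.

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size`, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return chunks

words = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(words, size=4, overlap=1)
```

Here the windows start at a, d, and g, so each boundary word appears in two chunks.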
Resolving Duplicate Entities
The same entity appears in many forms—"IBM," "International Business Machines," "the company." Entity resolution merges these into one node. You can prompt the model to canonicalize names against a known list, or run a separate resolution pass after extraction. Without this step the graph contains duplicate nodes that should be one.
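For the "known list" approach, a lookup keyed on a loosely normalized name handles spelling and punctuation variants. In this sketch the reference list and resolve helper are hypothetical. Contextual mentions such as "the company" return None, since resolving them needs the surrounding text and belongs in a separate pass.

```python
import re
from typing import Optional

# Hypothetical reference list: canonical name -> known surface forms.
CANONICAL = {
    "International Business Machines": [
        "IBM", "I.B.M.", "IBM Corp.", "International Business Machines",
    ],
}

def _key(name: str) -> str:
    """Lowercase and drop punctuation and spaces so trivial variants collide."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

ALIAS_INDEX = {_key(alias): canon for canon, aliases in CANONICAL.items() for alias in aliases}

def resolve(name: str) -> Optional[str]:
    """Return the canonical node name, or None if the mention is not in the list."""
    return ALIAS_INDEX.get(_key(name))
```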
Deduplicating and Merging Triples
Across thousands of documents the same fact appears repeatedly. Deduplicate identical triples and decide how to handle conflicting ones—two documents claiming different founders for a company. A provenance field recording which document produced each triple makes conflict resolution auditable.
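A sketch of deduplication with provenance, assuming triples arrive as (subject, predicate, object, document id) tuples. Identical triples merge into one key that remembers every source document. The conflict check assumes a relation you expect to have one subject per object; companies can legitimately have several founders, so its output is a list of cases to review, not errors. Names and data are hypothetical.

```python
from collections import defaultdict

def merge_triples(records):
    """Collapse duplicate (s, p, o) triples, recording which documents assert each."""
    provenance = defaultdict(set)
    for subject, predicate, obj, doc_id in records:
        provenance[(subject, predicate, obj)].add(doc_id)
    return dict(provenance)

def find_conflicts(provenance, predicate):
    """Objects linked to more than one distinct subject under `predicate`, with sources."""
    claims = defaultdict(dict)  # object -> {subject: sorted doc ids}
    for (subject, pred, obj), docs in provenance.items():
        if pred == predicate:
            claims[obj][subject] = sorted(docs)
    return {obj: subjects for obj, subjects in claims.items() if len(subjects) > 1}

records = [
    ("Ada Lopez", "founded", "Brightline Labs", "doc1"),
    ("Ada Lopez", "founded", "Brightline Labs", "doc2"),
    ("Ben Ortiz", "founded", "Brightline Labs", "doc3"),
]
prov = merge_triples(records)
conflicts = find_conflicts(prov, "founded")
```

Because each claim keeps its document ids, a reviewer can open the exact sources behind both sides of a conflict.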
Validating Extraction Output
Schema Conformance Checks
Programmatically verify that every entity type and relation type in the output belongs to your schema. Anything outside the vocabulary is a signal that the prompt drifted or the schema is incomplete. This automated gate catches problems before they reach the graph.
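The gate itself can be a few lines. This sketch assumes triples parsed into dicts with the field names used earlier in this guide and a hypothetical closed vocabulary; it returns human-readable errors rather than raising, so one run reports every violation at once.

```python
# Hypothetical closed vocabulary.
ENTITY_TYPES = {"Person", "Organization", "Drug", "Disease"}
RELATION_TYPES = {"founded", "employs", "treats", "interacts_with"}

def conformance_errors(triples: list[dict]) -> list[str]:
    """List every type or relation label that falls outside the schema."""
    errors = []
    for i, t in enumerate(triples):
        for field in ("subject_type", "object_type"):
            if t.get(field) not in ENTITY_TYPES:
                errors.append(f"triple {i}: {field} {t.get(field)!r} not in schema")
        if t.get("predicate") not in RELATION_TYPES:
            errors.append(f"triple {i}: predicate {t.get('predicate')!r} not in schema")
    return errors
```

An empty list means the batch can proceed to loading; anything else points at prompt drift or a gap in the schema.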
Spot-Checking Against Source
Sample a set of triples and confirm each against its source span. A small manual review reveals systematic errors—a relation the model consistently misreads—that aggregate metrics hide. Pair this with the discipline in Ship-Ready Verification Steps for Graph Extraction Prompts.
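Because the goal is to catch a relation the model consistently misreads, sampling per relation type tends to work better than a flat random sample, where rare relations may never appear. A sketch, assuming each triple has a predicate field; the fixed seed keeps the review set reproducible between runs.

```python
import random
from collections import defaultdict

def stratified_sample(triples: list[dict], per_predicate: int, seed: int = 0) -> list[dict]:
    """Draw up to `per_predicate` triples from each relation type for manual review."""
    by_predicate = defaultdict(list)
    for t in triples:
        by_predicate[t["predicate"]].append(t)
    rng = random.Random(seed)
    sample = []
    for predicate in sorted(by_predicate):  # sorted for a deterministic order
        group = by_predicate[predicate]
        sample.extend(rng.sample(group, min(per_predicate, len(group))))
    return sample
```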
Measuring Precision and Recall
Build a gold-standard set: a handful of documents with manually labeled triples. Precision tells you how many extracted triples are correct; recall tells you how many true triples you captured. Tracking both as you iterate on the prompt turns prompt tuning into engineering rather than guesswork.
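With triples represented as hashable tuples, both metrics reduce to set arithmetic. This sketch assumes exact matching after normalization, which is stricter than a human judge; a triple with a slightly different but acceptable object name counts as a miss.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = {("A", "founded", "B"), ("C", "employs", "D"), ("E", "treats", "F"), ("G", "treats", "H")}
predicted = {("A", "founded", "B"), ("C", "employs", "D"), ("X", "treats", "Y")}
p, r = precision_recall(predicted, gold)
```

Here two of three extracted triples are correct (precision 2/3) and two of four true triples were found (recall 0.5).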
Frequently Asked Questions
Do I need a separate model trained for my domain?
Usually not. A capable general model with a well-designed prompt, a tight schema, and grounding constraints handles most domains well. Train or fine-tune only when you have a highly specialized vocabulary, very high accuracy requirements, or extreme volume where per-call cost matters.
Why does my output keep using inconsistent relation names?
Because the prompt does not constrain the model to a closed set of relations. Enumerate the exact relation types you allow, define each in one line, and instruct the model to use only those. Free-form relation extraction always drifts into synonyms.
How do I stop the model from inventing relationships?
Add an explicit grounding instruction—every triple must be supported by the source text—and require the model to include the supporting span. Tell it to omit anything it cannot ground rather than guess. Then verify spans programmatically during validation.
What format should the extraction output use?
Structured JSON is the practical default: an array of triple objects with typed subject, predicate, and object fields plus a source span. Require the model to emit only valid JSON so a parser consumes it without cleanup. Avoid free prose, which is brittle to parse.
How do I handle the same entity appearing under different names?
Run entity resolution, either by prompting the model to canonicalize names against a reference list or with a dedicated post-extraction pass that merges variants into a single node. Skipping this step leaves duplicate nodes that fragment your graph.
How do I measure whether my extraction is good?
Build a small gold-standard set of manually labeled documents and compute precision and recall against it. Precision measures correctness of what you extracted; recall measures how much of the truth you captured. Track both as you refine the prompt.
Key Takeaways
- A knowledge graph is built from subject-predicate-object triples, and every extraction prompt exists to produce those triples reliably.
- Define a schema of allowed entity and relation types before writing the prompt; it is the single biggest driver of quality.
- A strong prompt contains a schema block, a strict output contract, grounding constraints, and worked examples.
- Keep the model honest by demanding source spans and forbidding inference beyond the stated text.
- At scale, handle chunking, entity resolution, and triple deduplication, and record provenance for auditability.
- Validate with schema conformance checks, source spot-checks, and precision/recall against a gold-standard set.