AGENCYSCRIPT

Misconceptions About Pulling Graphs From Text

Agency Script Editorial · Editorial Team · September 22, 2019 · 8 min read

Knowledge graph extraction sounds like one of those tasks language models were born for. Feed in a paragraph of unstructured text, ask for entities and the relationships between them, and receive clean triples ready to load into a graph database. The demos are convincing. The reality, once you run the same prompt across ten thousand documents, is messier than the demos suggest.

Most of the trouble comes from beliefs that are half true. A technique works on the examples someone tested, so it gets written up as a rule. Then it travels through blog posts and conference talks until it hardens into received wisdom. By the time you inherit the advice, the conditions that made it work have been stripped away. The result is a pipeline that looks reasonable and produces a graph nobody trusts.

This article walks through the most common misconceptions about prompting for knowledge graph extraction, explains why each one feels right, and describes what actually happens when you push the technique past a handful of curated examples.

Myth: A Bigger Model Solves Extraction Quality

Why the belief takes hold

When extraction is sloppy, the first instinct is to reach for a more capable model. Sometimes that helps. A stronger model does recognize more entity types and handles ambiguous phrasing better. So people conclude that quality is mostly a function of model size and stop investigating.

What actually happens

The dominant source of error in extraction is usually not reasoning capacity. It is schema ambiguity. If your prompt does not tell the model whether "Acme acquired Beta" should produce an acquired relationship, an owns relationship, or both, a bigger model will simply make a more confident guess. It may also make a different guess on the next document, which is worse than being consistently wrong: a consistent error can be fixed with one mapping rule, while inconsistent labels have to be hunted down edge by edge.

The fix is rarely a model upgrade. It is a tighter specification of the target schema, a controlled vocabulary for relationship types, and a few worked examples that show the model how to handle the edge cases you care about. Once the schema is pinned down, the gap between mid-tier and top-tier models often narrows considerably.
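To make "a tighter specification of the target schema" concrete, here is a minimal Python sketch of a prompt builder that states a closed relationship vocabulary and an explicit edge-case rule. The relation names, their definitions, and the choice that an acquisition yields ACQUIRED rather than OWNS are illustrative assumptions; your schema will differ, and `build_schema_prompt` is a hypothetical helper, not a library API.

```python
# Illustrative controlled vocabulary (an assumption, not a standard schema):
# each allowed relation gets a one-line definition the model can apply.
RELATION_TYPES = {
    "ACQUIRED": "Subject completed a purchase of the object company.",
    "OWNS": "Subject holds a controlling stake in the object, stated explicitly.",
    "PARTNERS_WITH": "Subject and object have an announced commercial partnership.",
    "SUPPLIES": "Subject sells goods or services to the object.",
}

ENTITY_TYPES = ("Company", "Person", "Product")

# Edge cases spelled out so the model does not have to guess.
EDGE_CASE_RULES = (
    "An acquisition yields ACQUIRED only; emit OWNS only if ownership is stated separately.",
    "If no allowed relation fits a pair, emit nothing for that pair.",
)


def build_schema_prompt(text: str) -> str:
    """Assemble an extraction prompt that pins down the target schema."""
    relations = "\n".join(f"- {name}: {desc}" for name, desc in RELATION_TYPES.items())
    rules = "\n".join(f"- {rule}" for rule in EDGE_CASE_RULES)
    return (
        "Extract (subject, relation, object) triples from the text below.\n"
        f"Allowed entity types: {', '.join(ENTITY_TYPES)}.\n"
        "Allowed relation types (use no others):\n"
        f"{relations}\n"
        "Edge-case rules:\n"
        f"{rules}\n\n"
        f"Text:\n{text}"
    )


if __name__ == "__main__":
    print(build_schema_prompt("Acme acquired Beta in 2019."))
```

Keeping the vocabulary in one data structure means the same list can later drive output validation, so the prompt and the checker cannot drift apart.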

Myth: One Prompt Can Handle the Whole Document

The appeal of a single pass

Sending an entire document and asking for the complete graph in one response is tidy. It keeps the code simple and the token accounting easy to reason about. For short, well-structured text it works fine.

Where it breaks

Long documents strain a single extraction pass in two ways. First, the model tends to drop entities that appear late in the text, because its attention is spread across more content than it can track reliably. Second, relationships whose two ends sit in distant sentences are easy to miss, because nothing in a single-pass prompt forces the model to connect mentions that are far apart.

A more reliable pattern decomposes the task. Extract entities first, resolve them to canonical forms, then run a second pass that asks only about relationships among the already-identified entities. This is slower and costs more tokens, but recall typically improves substantially and the output becomes auditable. If you are weighing this trade-off, our piece on Building a Repeatable Workflow for Prompting for Knowledge Graph Extraction lays out the staging in detail.
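The staged pattern can be sketched as follows. This is a minimal illustration under several assumptions: `call_model` is a hypothetical function that sends a prompt to whatever model you use and returns its text reply, the model is assumed to answer with bare JSON, and `canonical` is a deliberately naive normalizer standing in for real entity resolution. Relationships are extracted chunk by chunk against the full entity list, so relations that span chunks would still need overlapping windows or a document-level pass.

```python
import json
from typing import Callable

# Hypothetical: takes a prompt string, returns the model's raw text reply.
ModelFn = Callable[[str], str]


def chunk_paragraphs(document: str, max_chars: int = 2000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of about max_chars.
    A single paragraph longer than max_chars becomes its own chunk."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in document.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def canonical(name: str) -> str:
    """Naive key: collapse whitespace and lowercase. Stands in for real resolution."""
    return " ".join(name.split()).lower()


def extract_graph(document: str, call_model: ModelFn,
                  max_chars: int = 2000) -> list[tuple[str, str, str]]:
    chunks = chunk_paragraphs(document, max_chars)

    # Stage 1: entities only, one chunk at a time, so late mentions are not crowded out.
    entities: dict[str, str] = {}  # canonical key -> first surface form seen
    for chunk in chunks:
        reply = call_model("List every named entity in this text as a JSON array of strings.\n\n" + chunk)
        for name in json.loads(reply):
            entities.setdefault(canonical(name), name)

    # Stage 2: relationships, restricted to the entities found in stage 1.
    entity_list = json.dumps(sorted(entities.values()))
    triples: set[tuple[str, str, str]] = set()
    for chunk in chunks:
        reply = call_model(
            "Using only these entities: " + entity_list + "\n"
            "Return the relationships stated in this text as a JSON array of "
            "[subject, relation, object] triples.\n\n" + chunk
        )
        for subj, rel, obj in json.loads(reply):
            s, o = canonical(subj), canonical(obj)
            if s in entities and o in entities:  # drop edges to unknown entities
                triples.add((entities[s], rel, entities[o]))
    return sorted(triples)
```

Because each stage has its own prompt and output, the entity list can be logged and inspected before any relationships are drawn.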

Myth: Few-Shot Examples Always Improve Results

The reasonable starting point

Few-shot prompting genuinely helps the model understand your schema and formatting. Showing two or three examples of input text paired with the desired triples anchors the output format and reduces hallucinated relationship types.

The hidden cost

Examples also bias the model toward the patterns they contain. If all your examples involve corporate acquisitions, the model starts seeing acquisitions everywhere, even in documents about partnerships or supplier relationships. The examples that taught the format also taught a prior that distorts extraction on dissimilar text.

The corrective is to diversify examples across the relationship types you expect, and to periodically run the prompt with zero examples to see whether the few-shot set is helping or merely steering. Treat your example set as a tunable component, not a fixed asset.
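One way to treat the example set as a tunable component is sketched below: examples are drawn round-robin across relationship types, and the same input is rendered with zero and with several examples so the two variants can be scored side by side. The example-bank format and the function names are hypothetical; scoring the resulting extractions against labeled data is left to whatever evaluation you already run.

```python
from collections import defaultdict

# Hypothetical example bank: each entry pairs an input text with the triples you
# want back and the relation type it demonstrates.
Example = dict  # keys: "text", "triples", "relation"


def select_diverse_examples(bank: list[Example], k: int) -> list[Example]:
    """Pick up to k examples, round-robin across relation types."""
    by_type: dict[str, list[Example]] = defaultdict(list)
    for ex in bank:
        by_type[ex["relation"]].append(ex)
    types = sorted(by_type)
    selected: list[Example] = []
    depth = 0
    while len(selected) < k and any(depth < len(by_type[t]) for t in types):
        for t in types:
            if len(selected) < k and depth < len(by_type[t]):
                selected.append(by_type[t][depth])
        depth += 1
    return selected


def build_few_shot_prompt(text: str, examples: list[Example]) -> str:
    parts = [f"Text: {ex['text']}\nTriples: {ex['triples']}" for ex in examples]
    parts.append(f"Text: {text}\nTriples:")
    return "\n\n".join(parts)


def ablation_prompts(text: str, bank: list[Example], ks=(0, 3)) -> dict[int, str]:
    """Same input rendered with different example counts (0 = zero-shot)."""
    return {k: build_few_shot_prompt(text, select_diverse_examples(bank, k)) for k in ks}
```

If the zero-shot variant scores as well as the few-shot one on dissimilar documents, the examples may be steering more than teaching.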

Myth: JSON Output Means Reliable Output

Why structure feels like safety

Asking for JSON output, and validating it against a schema, catches a real class of errors. Malformed responses get rejected, and you avoid downstream parsing failures. It is a genuine improvement over free text.

What structure cannot guarantee

A perfectly valid JSON object can still contain a wrong relationship, a misattributed entity, or a confidently invented fact. Schema validation checks shape, not truth. Teams that lean on structured output sometimes stop checking content because the pipeline "passes," which is exactly when silent errors accumulate in the graph.

Pair structural validation with content checks: verify that extracted entities actually appear in the source text, that relationship directions are consistent, and that confidence below a threshold routes to human review. Our companion article on What People Get Wrong About Controlling Formality and Register in Output makes a parallel point about how surface compliance can mask real defects.
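A minimal sketch of layering content checks on top of structural validation follows. It assumes each triple arrives as a dict with `subject`, `relation`, `object`, and a numeric `confidence` (how that score is produced is outside this sketch), and it treats an asymmetric relation asserted in both directions as a direction conflict. The field names, the set of asymmetric relations, and the 0.7 threshold are illustrative assumptions, and the entity test is an exact, case-sensitive substring match.

```python
# Illustrative assumptions: field names, which relations are asymmetric,
# and the review threshold.
REQUIRED_FIELDS = {"subject": str, "relation": str, "object": str, "confidence": (int, float)}
ASYMMETRIC = {"ACQUIRED", "OWNS", "SUPPLIES"}


def shape_errors(triple: dict) -> list[str]:
    """Structural check only: right keys, right types. Says nothing about truth."""
    return [f"bad or missing field: {name}"
            for name, expected in REQUIRED_FIELDS.items()
            if not isinstance(triple.get(name), expected)]


def triage(triples: list[dict], source: str, threshold: float = 0.7):
    """Split triples into accepted, needs-review, and rejected (with reasons)."""
    well_formed = [t for t in triples if not shape_errors(t)]
    asserted = {(t["subject"], t["relation"], t["object"]) for t in well_formed}
    accepted, review, rejected = [], [], []
    for t in triples:
        if shape_errors(t):
            rejected.append((t, "shape"))
        elif t["subject"] not in source or t["object"] not in source:
            # Exact, case-sensitive substring test against the source text.
            rejected.append((t, "entity not in source"))
        elif t["relation"] in ASYMMETRIC and (t["object"], t["relation"], t["subject"]) in asserted:
            review.append((t, "conflicting direction"))
        elif t["confidence"] < threshold:
            review.append((t, "low confidence"))
        else:
            accepted.append(t)
    return accepted, review, rejected
```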

Myth: Temperature Zero Makes Extraction Deterministic

The intuitive expectation

Setting temperature to zero is supposed to make the model deterministic, which sounds ideal for extraction. Run the same document twice, get the same graph, and your pipeline is reproducible.

The uncomfortable reality

Temperature zero reduces run-to-run variance, but it does not guarantee identical output: results can still shift across model versions and infrastructure changes, and even minor prompt reformatting can change the answer. More importantly, low temperature makes the model commit hard to its first interpretation, so a single ambiguous phrasing produces a single confident answer with no signal that other readings existed.

For extraction, a small amount of sampling combined with multiple passes can actually surface ambiguity. If three runs produce three different relationship labels, that disagreement is information: the document is genuinely ambiguous and deserves review. Determinism hides that signal.
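The multi-pass idea can be sketched as a simple vote over repeated runs. `sample_fn` is a hypothetical callable that performs one extraction at a small nonzero temperature and returns (subject, relation, object) tuples; any entity pair that does not receive the same label in every run, including pairs that appear in only some runs, is flagged as ambiguous.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable

# Hypothetical: one extraction run at a small nonzero temperature,
# returning (subject, relation, object) tuples.
SampleFn = Callable[[str], Iterable[tuple[str, str, str]]]


def label_agreement(document: str, sample_fn: SampleFn, runs: int = 3):
    """Run extraction several times and separate stable labels from disputed ones."""
    votes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for _ in range(runs):
        for subj, rel, obj in set(sample_fn(document)):  # dedupe within a run
            votes[(subj, obj)][rel] += 1
    stable, ambiguous = {}, {}
    for pair, counts in votes.items():
        if len(counts) == 1 and sum(counts.values()) == runs:
            stable[pair] = next(iter(counts))  # same label in every run
        else:
            ambiguous[pair] = dict(counts)  # disagreement, or missing from some runs
    return stable, ambiguous
```

The ambiguous pairs, with their vote counts, are a natural queue for human review.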

Myth: Entity Resolution Is a Separate Problem

The clean mental model

It is tempting to treat extraction and entity resolution as distinct stages owned by different systems. The model extracts mentions; a downstream service decides that "IBM," "I.B.M.," and "International Business Machines" are the same node.

Why coupling matters

In practice, resolution decisions feed back into extraction quality. If the model knows that two mentions refer to the same entity, it produces cleaner relationships. If it does not, you get duplicate nodes with fragmented edges, and merging them later is lossy. The strongest pipelines give the model a running list of already-resolved entities and ask it to extend that list rather than start fresh each time, a technique discussed alongside other staging choices in The Prompting for Knowledge Graph Extraction Playbook.
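A running entity list can be as simple as the registry sketched below. It is an illustrative assumption rather than a prescribed design: punctuation-insensitive normalization merges variants like "IBM" and "I.B.M.", while expansions like "International Business Machines" need an explicit alias, which in practice might come from the model's own resolution decisions. `prompt_context()` renders the current list for inclusion in the next extraction prompt. Aggressive normalization can also merge distinct entities that happen to share a key, so the key function deserves review.

```python
import re
from typing import Optional


def normalize(mention: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", mention.lower())


class EntityRegistry:
    """Hypothetical running list of resolved entities, carried across extraction calls."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}  # normalized key -> canonical name

    def add(self, name: str, aliases: tuple[str, ...] = ()) -> str:
        """Register a name (and any known aliases); return its canonical form."""
        canon = self._by_key.setdefault(normalize(name), name)
        for alias in aliases:
            self._by_key.setdefault(normalize(alias), canon)
        return canon

    def resolve(self, mention: str) -> Optional[str]:
        return self._by_key.get(normalize(mention))

    def prompt_context(self) -> str:
        """Render the current list for inclusion in the next extraction prompt."""
        names = sorted(set(self._by_key.values()))
        lines = "\n".join(f"- {n}" for n in names)
        return ("Already-resolved entities. Reuse these exact names; "
                "add a new entity only if none of these match:\n" + lines)
```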

Frequently Asked Questions

Do I need a fine-tuned model for knowledge graph extraction?

Usually not at first. A well-specified schema, a controlled relationship vocabulary, and a diverse few-shot set get most teams to a usable baseline with a general-purpose model. Fine-tuning earns its keep when you have a stable schema, a large volume of domain-specific text, and labeled data, at which point it improves consistency and lowers per-call cost. Reach for it after prompt engineering plateaus, not before.

How do I stop the model from inventing relationships?

Constrain it. Provide an explicit, closed list of allowed relationship types and instruct the model to use only those. Require that every extracted triple cite the span of source text supporting it, then verify that span exists. Hallucinations drop sharply when the model knows its output will be checked against the source.
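Span verification can be sketched like this, assuming the prompt asks the model to return each triple with an `evidence` field quoting the supporting text verbatim. The relation list and field names are illustrative, and the exact-substring match is deliberately strict; real documents may need whitespace or Unicode normalization before comparison.

```python
from typing import Optional

# Illustrative closed vocabulary; use your own schema's relation names.
ALLOWED_RELATIONS = frozenset({"ACQUIRED", "OWNS", "PARTNERS_WITH", "SUPPLIES"})


def grounding_error(triple: dict, source: str) -> Optional[str]:
    """Return why a triple fails grounding, or None if it passes.

    Assumes the model was asked to quote its supporting text verbatim in
    an "evidence" field.
    """
    if triple.get("relation") not in ALLOWED_RELATIONS:
        return "relation outside closed vocabulary"
    evidence = triple.get("evidence") or ""
    if not evidence or evidence not in source:  # exact substring match
        return "evidence span not found in source"
    subj, obj = triple.get("subject") or "", triple.get("object") or ""
    if not subj or not obj or subj not in evidence or obj not in evidence:
        return "entities not mentioned in evidence span"
    return None
```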

Is a single end-to-end prompt ever the right choice?

Yes, for short, structurally simple documents where recall is less critical than speed. A single pass is cheaper and simpler to operate. The moment documents grow long or relationships span paragraphs, decomposition into entity and relationship stages pays for itself in recall and auditability.

Why does my extracted graph have so many duplicate nodes?

Almost always because entity resolution is happening too late or not at all. The model extracts surface mentions and treats each variant spelling as a new entity. Feeding the model a canonical entity list during extraction, or resolving aggressively immediately after, prevents fragmentation that is painful to repair downstream.

Should I validate output with a schema?

Always validate shape, but never stop there. Schema validation guarantees parseable, well-formed output; it says nothing about correctness. Layer content checks on top: span verification, direction consistency, and confidence thresholds that route uncertain extractions to human review.

Key Takeaways

  • Most extraction failures come from schema ambiguity, not model weakness; specify your relationship vocabulary before reaching for a bigger model.
  • Long documents need decomposed extraction (entities first, then relationships) to maintain recall and auditability.
  • Few-shot examples both teach format and impose bias; diversify them and test against zero-shot periodically.
  • Valid JSON proves structure, not truth; pair schema validation with span verification and confidence-based review.
  • Treat entity resolution as coupled to extraction, not a separate downstream cleanup, to avoid fragmented graphs.
