AGENCYSCRIPT

Coreference, Long Context, and Other Graph Extraction Hard Parts

Agency Script Editorial

Editorial Team

November 17, 2019 · 9 min read
Tags: prompting for knowledge graph extraction, prompting for knowledge graph extraction advanced, prompting for knowledge graph extraction guide, prompt engineering

A first knowledge graph extraction pipeline works on clean, short, single-topic documents. The trouble starts when reality intrudes: a contract that refers to a party as "the Licensee" forty times, a research corpus where the same author appears under three name variants, a transcript where the crucial relationship is implied rather than stated. These are not bugs in your prompt. They are the genuinely hard parts of the problem, and handling them is what separates a demo from a system people trust with decisions.

This piece assumes you already have the fundamentals: a schema, structured output, basic normalization, and a way to measure quality. It goes after the problems that only appear once those basics are in place. Each of them has a real solution, but the solutions involve trade-offs and judgment rather than a setting you flip on.

The unifying theme is identity and inference. Most advanced extraction difficulty comes down to two questions: is this the same entity I saw before, and is this relationship actually supported by the text? Get disciplined about both and your graph quality jumps.

One more framing helps before diving in. The advanced problems share a common shape: they are all places where the locally correct answer and the globally correct answer diverge. A triple can be right in isolation and wrong in the context of the whole graph, because it duplicates an entity, contradicts another source, or rests on an inference the text does not license. Advanced extraction is largely the practice of making local extraction decisions that hold up globally, which is why so much of it concerns identity and provenance rather than the extraction prompt itself.

Coreference Resolution Across a Document

A single document refers to the same entity many ways: by name, by role, by pronoun. If your pipeline treats each surface form as a new entity, the graph fractures.

Resolving within the context window

When the whole document fits in the model's context, you can instruct the model to resolve references and emit a canonical entity for each mention. This works far better than post-hoc string matching because the model uses the surrounding meaning, not just spelling. The growth of context windows that makes this practical is part of the shift described in "Schema-Constrained Decoding Is Reshaping Graph Extraction."
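As a rough illustration, here is one way the instruction and the handling of its output might look. The prompt wording, the JSON shape, and the function names are all assumptions made for this sketch, and the model call itself is replaced by a hand-written response.

```python
import json

# Hypothetical prompt template; the wording and JSON shape are assumptions.
COREF_PROMPT = """Read the document below. For every mention of a person or
organization, return the canonical entity it refers to. Resolve role names
("the Licensee") and pronouns to the named entity they denote.
Respond with JSON: {{"mentions": [{{"text": "...", "canonical": "..."}}]}}

Document:
{document}"""


def build_coref_prompt(document: str) -> str:
    """Fill the template with the full document text."""
    return COREF_PROMPT.format(document=document)


def group_mentions(model_response: str) -> dict[str, list[str]]:
    """Group surface forms under the canonical entity the model assigned."""
    grouped: dict[str, list[str]] = {}
    for mention in json.loads(model_response)["mentions"]:
        forms = grouped.setdefault(mention["canonical"], [])
        if mention["text"] not in forms:
            forms.append(mention["text"])
    return grouped


# A hand-written response standing in for a real model call.
sample_response = json.dumps({"mentions": [
    {"text": "Acme Corp", "canonical": "Acme Corp"},
    {"text": "the Licensee", "canonical": "Acme Corp"},
    {"text": "it", "canonical": "Acme Corp"},
    {"text": "Globex", "canonical": "Globex"},
]})
entities = group_mentions(sample_response)
```

Each key in `entities` becomes one node, and the surface forms become aliases on that node instead of separate entities.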

When references span chunks

If a document exceeds the context window, references break at chunk boundaries. Mitigate this with overlapping chunks and by carrying a running list of resolved entities into each subsequent chunk's prompt, so the model can link new mentions to entities it already established.
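A minimal sketch of both mitigations follows, assuming character-based chunking and a plain dictionary as the running entity list. The function names and the prompt wording are hypothetical; a real pipeline would more likely chunk by tokens or by section.

```python
def overlapping_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def merge_entities(known: dict[str, list[str]], new: dict[str, list[str]]) -> None:
    """Fold entities resolved in the latest chunk into the running list."""
    for name, aliases in new.items():
        seen = known.setdefault(name, [])
        for alias in aliases:
            if alias not in seen:
                seen.append(alias)


def chunk_prompt(chunk: str, known: dict[str, list[str]]) -> str:
    """Prefix a chunk with the entities established in earlier chunks."""
    lines = [f"- {name} (also seen as: {', '.join(aliases)})"
             for name, aliases in known.items()]
    listing = "\n".join(lines) or "- none yet"
    return (
        "Entities already established in this document:\n"
        f"{listing}\n\n"
        "Link new mentions to these entities where they match.\n\n"
        f"Text:\n{chunk}"
    )


chunks = overlapping_chunks("abcdefghij", size=4, overlap=1)
known: dict[str, list[str]] = {}
merge_entities(known, {"Acme Corp": ["the Licensee"]})
prompt = chunk_prompt(chunks[1], known)
```

The overlap protects mentions that straddle a boundary, while the carried list protects references whose antecedent sits several chunks back.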

Cross-Document Entity Resolution

Resolution within a document is hard; resolution across thousands of documents is harder. The same organization, person, or concept must collapse to one node across the entire corpus.

Candidate generation and disambiguation

Generate candidate matches with cheap signals such as normalized strings and embeddings, then disambiguate the ambiguous candidates with a more expensive model call that considers context. Resolving every pair with a model is too expensive; resolving none leaves a graph full of duplicates. The art is routing only the genuinely ambiguous cases to expensive resolution.
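The routing can be sketched with nothing more than string similarity. In this sketch `difflib` stands in for whatever cheap signal you use (embeddings would slot into `similarity` the same way), and the suffix list and both thresholds are illustrative assumptions that would need tuning on your own data.

```python
import re
from difflib import SequenceMatcher

# Illustrative suffix list; extend it for your corpus.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Cheap signal: character-level similarity of the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(mention: str, candidates: list[str],
           accept: float = 0.92, reject: float = 0.6):
    """Return ('match', name), ('ambiguous', [names]) or ('new', None)."""
    scored = sorted(((similarity(mention, c), c) for c in candidates), reverse=True)
    confident = [c for score, c in scored if score >= accept]
    if len(confident) == 1:
        return ("match", confident[0])
    plausible = [c for score, c in scored if score >= reject]
    if plausible:
        # Only these cases are worth an expensive, context-aware model call.
        return ("ambiguous", plausible)
    return ("new", None)


clear = triage("Acme Corp.", ["Acme Corporation", "Globex"])
unclear = triage("J. Smith", ["John Smith", "Jane Smith"])
unseen = triage("Initech", ["Acme Corporation", "Globex"])
```

Here `clear` resolves without a model call, `unseen` becomes a new node, and only `unclear` is escalated, because spelling alone cannot say which Smith is meant.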

Resolving against an existing graph

The strongest approach consults your existing graph during extraction, so a new document's entities link to known nodes rather than spawning duplicates. This folds resolution into extraction and prevents duplication at the source rather than cleaning it up later.
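One simple way to picture this is an alias index that extraction consults before it creates a node. The class below is a hypothetical in-memory stand-in for a lookup against your real graph store, and exact matching on a normalized key is an assumption; ambiguous names would still go through disambiguation.

```python
class EntityIndex:
    """In-memory stand-in for looking up entities in an existing graph."""

    def __init__(self) -> None:
        self._by_alias: dict[str, str] = {}
        self._next_id = 1

    @staticmethod
    def _key(name: str) -> str:
        # Case-insensitive, whitespace-insensitive lookup key.
        return " ".join(name.lower().split())

    def resolve(self, name: str) -> tuple[str, bool]:
        """Return (node_id, created). Known names link to their existing node."""
        key = self._key(name)
        if key in self._by_alias:
            return self._by_alias[key], False
        node_id = f"ent-{self._next_id}"
        self._next_id += 1
        self._by_alias[key] = node_id
        return node_id, True

    def add_alias(self, node_id: str, alias: str) -> None:
        """Record another surface form for an existing node."""
        self._by_alias[self._key(alias)] = node_id


index = EntityIndex()
acme_id, created = index.resolve("Acme Corp")
index.add_alias(acme_id, "the Licensee")
linked = index.resolve("The Licensee")
```

The second lookup returns the existing node with `created` set to `False`, so the duplicate never enters the graph.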

Implicit and Inferred Relationships

The hardest relationships are the ones the text implies without stating. A document might establish that a person leads a team and that the team owns a product, implying the person's authority over the product without ever saying so.

  • Decide deliberately whether your graph should contain only stated relationships or also inferred ones.
  • If you allow inference, demand that each inferred edge cite the stated facts it rests on.
  • Keep inferred edges distinguishable from stated edges so downstream consumers can choose their trust level.

Allowing inference increases recall but invites error, because a model that infers freely will infer wrongly. The safe default is to extract only stated relationships and run inference as a separate, audited pass.
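The leads-a-team example can be written as a separate pass over stated edges. The `Edge` shape, the predicate names, and the single rule below are assumptions for illustration; the point is only that every inferred edge is flagged and carries the stated edges it rests on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    inferred: bool = False
    derived_from: tuple["Edge", ...] = ()  # stated edges an inference rests on


def infer_authority(edges: list[Edge]) -> list[Edge]:
    """If X leads T and T owns P, emit an inferred 'has_authority_over' edge."""
    stated = [e for e in edges if not e.inferred]
    leads = [e for e in stated if e.predicate == "leads"]
    owns = [e for e in stated if e.predicate == "owns"]
    return [
        Edge(lead.subject, "has_authority_over", own.obj,
             inferred=True, derived_from=(lead, own))
        for lead in leads
        for own in owns
        if lead.obj == own.subject
    ]


stated_edges = [
    Edge("Dana", "leads", "Platform Team"),
    Edge("Platform Team", "owns", "Billing API"),
]
inferred_edges = infer_authority(stated_edges)
```

Because the rule only reads edges where `inferred` is false, inferences never stack on other inferences, which keeps the audit trail one step long.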

Handling Contradictions and Uncertainty

Real corpora contradict themselves. One document says a contract expires in 2025, another says 2026. A naive pipeline stores both as fact and produces an incoherent graph.

Modeling assertions, not facts

Treat each extracted relationship as an assertion with a source, not as ground truth. When two assertions conflict, the graph records both with their provenance rather than silently overwriting one. Consumers then resolve conflicts with their own logic, which preserves the evidence rather than destroying it.
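A small sketch of that idea, using the contract expiry example. The class and field names are hypothetical, and the conflict check assumes the predicate is single-valued (a contract has one expiry year); multi-valued predicates would need a different test.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    source: str  # provenance: the document that made the claim


class AssertionStore:
    """Keeps every assertion; never overwrites one with another."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], list[Assertion]] = defaultdict(list)

    def add(self, assertion: Assertion) -> None:
        self._by_key[(assertion.subject, assertion.predicate)].append(assertion)

    def conflicts(self) -> dict[tuple[str, str], list[Assertion]]:
        """Return keys whose assertions disagree, with all evidence attached."""
        return {
            key: assertions
            for key, assertions in self._by_key.items()
            if len({a.value for a in assertions}) > 1
        }


store = AssertionStore()
store.add(Assertion("Contract-17", "expires_in", "2025", source="msa.pdf"))
store.add(Assertion("Contract-17", "expires_in", "2026", source="amendment.pdf"))
store.add(Assertion("Contract-17", "licensee", "Acme Corp", source="msa.pdf"))
disputed = store.conflicts()
```

A consumer that trusts amendments over originals can pick the 2026 assertion by source, and a different consumer can apply a different rule to the same evidence.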

Surfacing low-confidence extractions

Let the model express uncertainty and route low-confidence assertions to review. This connects directly to the measurement discipline in "Scoring Whether Your Extracted Triples Are Actually Right," because confidence is only useful if you have calibrated what it predicts.
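Calibration here can be as plain as bucketing reviewed extractions by the confidence the model reported and checking how often each bucket was right. The sketch below assumes you have a sample of `(confidence, was_correct)` pairs from human review; the function name, the bin count, and the sample values are illustrative.

```python
def calibration_table(samples, bins: int = 5):
    """Summarize reviewed extractions as (low, high, count, precision) per bucket.

    `samples` is an iterable of (confidence, was_correct) pairs, with
    confidence in the range 0.0 to 1.0. Empty buckets are omitted.
    """
    buckets = [[0, 0] for _ in range(bins)]  # [total, correct] per bucket
    for confidence, correct in samples:
        index = min(int(confidence * bins), bins - 1)  # 1.0 falls in the top bucket
        buckets[index][0] += 1
        buckets[index][1] += int(correct)
    return [
        (i / bins, (i + 1) / bins, total, hits / total)
        for i, (total, hits) in enumerate(buckets)
        if total
    ]


# Made-up review results, for illustration only.
reviewed = [(0.95, True), (0.9, True), (0.85, False), (0.3, False), (0.35, True)]
table = calibration_table(reviewed)
```

If the top bucket turns out to be right only two times in three, as in this toy sample, a review threshold set on the raw confidence value would let too many errors through.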

Schema Evolution Without Breaking the Graph

Your ontology will change. A new relationship type becomes important, an entity type splits in two. Doing this on a live graph without corrupting history is an advanced skill.

Versioning the ontology

Version your schema and tag each edge with the schema version that produced it. When the ontology changes, you know which edges predate the change and may need re-extraction, rather than guessing.
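In code, the tag is just one more field on the edge, and the payoff is a query like the one below. The record shape and the idea of tracking changes per predicate are assumptions for this sketch; an entity type split would need an analogous check on node types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedEdge:
    subject: str
    predicate: str
    obj: str
    source_doc: str
    schema_version: int  # version of the ontology that produced this edge


def docs_needing_reextraction(edges, changed_predicates, changed_in_version):
    """Source documents whose edges predate a change to the given predicates."""
    return sorted({
        edge.source_doc
        for edge in edges
        if edge.schema_version < changed_in_version
        and edge.predicate in changed_predicates
    })


edges = [
    VersionedEdge("Acme Corp", "employs", "Dana", "doc-1", schema_version=1),
    VersionedEdge("Acme Corp", "located_in", "Berlin", "doc-2", schema_version=1),
    VersionedEdge("Globex", "employs", "Lee", "doc-3", schema_version=2),
]
# Suppose version 2 redefined "employs".
stale_docs = docs_needing_reextraction(edges, {"employs"}, changed_in_version=2)
```

Only `doc-1` comes back: `doc-2` predates the change but carries an unaffected predicate, and `doc-3` was already extracted under the new schema.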

Backfilling deliberately

When you add a relationship type, decide whether to re-extract historical documents to populate it or to apply it only going forward. Both are valid; the error is making the choice implicitly and ending up with a graph that is inconsistent about its own coverage.
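One way to make the choice explicit is to record it next to the schema, so consumers can tell a missing edge from an unextracted one. The `CoveragePolicy` record and its field names below are hypothetical, as are the dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CoveragePolicy:
    predicate: str
    introduced_on: date  # when the relationship type entered the schema
    backfilled: bool     # True if historical documents were re-extracted


def is_covered(policy: CoveragePolicy, document_date: date) -> bool:
    """True if documents of this date were extracted for the predicate."""
    return policy.backfilled or document_date >= policy.introduced_on


forward_only = CoveragePolicy("subsidiary_of", date(2024, 3, 1), backfilled=False)
old_doc_covered = is_covered(forward_only, date(2023, 6, 15))
new_doc_covered = is_covered(forward_only, date(2024, 5, 2))
```

For the 2023 document the absence of a `subsidiary_of` edge means "not extracted", not "no such relationship", and a query layer can report it that way.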

Performance and Cost at Scale

Advanced extraction is expensive if run naively. The skill is spending model capacity where it changes the answer.

Tiered model usage

Use a cheaper, faster model for the bulk of straightforward extraction and reserve a stronger model for ambiguous resolution and inference. This tiering preserves quality where it matters while keeping cost defensible, which feeds the economics in "What Knowledge Graph Extraction Actually Saves a Data Team."

Routing work to the right tier

The practical question is which extractions are straightforward enough for the cheap tier and which deserve the expensive one. A confidence signal from the first pass is the usual router: high-confidence extractions ship, low-confidence ones escalate to the stronger model. This keeps the expensive model focused on the cases where its extra capability changes the answer, rather than spending it uniformly on documents the cheap model already handles well.
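The router itself is only a few lines. In this sketch the two extractors are stub functions standing in for real model calls, each returning a result and a confidence, and the 0.8 threshold is an arbitrary placeholder that should come from calibration on your own data.

```python
from typing import Callable

Extraction = tuple[dict, float]  # (extracted result, confidence between 0 and 1)


def route(document: str,
          cheap: Callable[[str], Extraction],
          strong: Callable[[str], Extraction],
          threshold: float = 0.8) -> tuple[dict, str]:
    """Run the cheap tier first; escalate only when its confidence is low."""
    result, confidence = cheap(document)
    if confidence >= threshold:
        return result, "cheap"
    result, _ = strong(document)
    return result, "strong"


# Stubs standing in for real model calls.
def cheap_stub(document: str) -> Extraction:
    confidence = 0.55 if "the Licensee" in document else 0.95
    return {"tier": "cheap"}, confidence


def strong_stub(document: str) -> Extraction:
    return {"tier": "strong"}, 0.9


easy = route("Acme Corp acquired Globex.", cheap_stub, strong_stub)
hard = route("the Licensee shall indemnify the Licensor.", cheap_stub, strong_stub)
```

Logging the tier alongside each result also shows what share of traffic escalates, which is the number that drives cost.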

Frequently Asked Questions

How do I handle a document longer than the context window?

Chunk with overlap and carry a running list of resolved entities into each chunk's prompt so identity survives the boundaries. As context windows grow, more documents avoid chunking entirely, but the carry-forward technique remains essential for the truly long ones.

Should my graph store inferred relationships?

Only if you mark them as inferred and record the stated facts they derive from. Mixing inferred and stated edges without distinction destroys your ability to reason about trust. Many teams keep inference as a separate, audited layer rather than baking it into the base graph.

What is the best approach to cross-document entity resolution?

A tiered one: cheap signals to generate candidates, expensive model calls only on the ambiguous ones, and ideally resolution against your existing graph during extraction. Resolving everything with a model is too costly; resolving nothing leaves duplicates that distort queries.

How do I keep contradictions from corrupting the graph?

Model relationships as sourced assertions rather than facts. When sources conflict, store both with provenance and let consumers resolve the conflict. Silently overwriting one assertion with another hides the very disagreement your users need to see.

When is it worth using a stronger, more expensive model?

For the ambiguous cases: hard coreference, cross-document disambiguation, and inference. Routine extraction rarely justifies the premium. Tier your model usage so the expensive model only touches the decisions that actually change the graph.

Key Takeaways

  • Most advanced difficulty reduces to two questions: is this the same entity, and is this relationship truly supported by the text.
  • Resolve coreference within the context window using meaning, and carry resolved entities across chunks when documents are too long.
  • Use tiered resolution across documents: cheap candidate generation, expensive disambiguation only where genuinely ambiguous.
  • Model relationships as sourced assertions so contradictions and uncertainty are preserved rather than silently overwritten.
  • Version your ontology and tier your model usage to evolve the schema safely and keep cost defensible at scale.
