Build a Working Knowledge Graph in One Sitting

Reading about knowledge graphs only gets you so far. The concept clicks when you build one and watch a question you couldn't answer before become a simple traversal. This guide is a sequential, do-this-then-that process. Follow it in order and you'll have a working knowledge graph by the end, plus the judgment to grow it.

We'll use a deliberately small example — a content team's articles, authors, and topics — because the value of a graph shows up even at small scale, and a small graph is something you can actually finish today. If you need the conceptual grounding first, read the complete guide and come back. Otherwise, let's build.

The order of these steps is not arbitrary. The single most common way knowledge graph projects fail is starting with the data instead of the questions. We start with questions on purpose.

Step 1: Write Down Three Questions

Before touching any tool, write three specific questions you want the graph to answer. Concrete ones, not "understand my data."

For our content team:

Which authors have written about the same topic?
If we retire a topic, which articles lose coverage?
What topics has author X never covered?

These questions are the entire blueprint. Every modeling decision later is judged against them. If a node or edge doesn't help answer one of your three questions, it doesn't belong in the first version.

Step 2: Identify Your Nodes

Read your questions and underline every noun that names a type of thing. From ours: author, topic, article.

Those three become your node types. Resist adding more. You might think "I should also track publish dates and word counts" — but those are properties, not nodes, and none of your three questions need them yet. Keep the first model lean.

Step 3: Identify Your Edges

Now find the verbs and relationships connecting your nodes. Ours imply:

An author wrote an article.
An article covers a topic.

Two edge types. Notice we don't directly connect authors to topics — that connection is derived by walking through articles. This is the graph's strength: relationships you didn't store explicitly emerge from traversal. "Authors who share a topic" comes from author → wrote → article → covers → topic ← covers ← article ← wrote ← author, without ever storing an author-topic link.
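To see that derivation concretely, here is a minimal sketch in plain Python rather than Cypher. The author "Dev", the second article title, and the function name `topics_by_author` are hypothetical, invented for illustration. Only WROTE and COVERS edges are stored; the author-to-topic link is computed by walking them.

```python
from collections import defaultdict

# Hypothetical sample edges as (source, relationship, target) triples.
EDGES = [
    ("Maria", "WROTE", "Graph Basics"),
    ("Dev", "WROTE", "Graphs in Practice"),
    ("Graph Basics", "COVERS", "Knowledge Graphs"),
    ("Graphs in Practice", "COVERS", "Knowledge Graphs"),
]

def topics_by_author(edges):
    """Walk author -> WROTE -> article -> COVERS -> topic."""
    covers = defaultdict(set)  # article -> topics it covers
    for src, rel, dst in edges:
        if rel == "COVERS":
            covers[src].add(dst)
    derived = defaultdict(set)  # author -> topics, never stored directly
    for src, rel, dst in edges:
        if rel == "WROTE":
            derived[src] |= covers[dst]
    return dict(derived)

author_topics = topics_by_author(EDGES)
print(author_topics)
```

Both authors end up linked to "Knowledge Graphs" even though no edge in the list connects an author to a topic.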

Step 4: Choose a Storage Approach

You have three realistic options, in roughly increasing order of setup effort and formality:

A drawing or spreadsheet of edges — fine for under ~50 nodes, good for learning.
A graph database like Neo4j (free tier available) — the standard professional choice for labeled property graphs.
An RDF triplestore — better when you need formal standards and shared vocabularies.

For a first build, start with a free graph database so you can run real queries. Our tools guide compares the options in detail; for now, pick one and move on. Agonizing over tools is a form of procrastination.
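If you try the first option while learning, a spreadsheet of edges can be as simple as three columns. The sketch below assumes a hypothetical column layout (source, relationship, target) and sample rows, and reads them with Python's standard csv module. It is one possible layout, not a required format.

```python
import csv
import io

# Hypothetical edge sheet: one row per relationship.
EDGE_SHEET = """source,relationship,target
Maria,WROTE,Graph Basics
Graph Basics,COVERS,Knowledge Graphs
"""

def load_edges(text):
    """Parse the edge sheet into (source, relationship, target) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"], row["relationship"], row["target"]) for row in reader]

edges = load_edges(EDGE_SHEET)

# Every name that appears at either end of an edge is a node.
nodes = {name for src, _, dst in edges for name in (src, dst)}
print(len(edges), sorted(nodes))
```

A sheet like this can later be imported into a graph database, since each row already names one edge and its two endpoints.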

Step 5: Load Your Data

Create your nodes first, then connect them with edges. In a tool like Neo4j using Cypher, that looks like:

CREATE (a:Author {name: 'Maria'})
CREATE (art:Article {title: 'Graph Basics'})
CREATE (t:Topic {name: 'Knowledge Graphs'})
CREATE (a)-[:WROTE]->(art)
CREATE (art)-[:COVERS]->(t)

Do this for your real records. The hard part is not syntax — it's deciding when two things are the same node. If "Maria" and "Maria S." appear in your data, are they one author or two? Getting this wrong corrupts every answer. This is called entity resolution, and it deserves real care. Rushing it is the top entry in our common mistakes list.
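One simple way to make that decision explicit is an alias table you maintain by hand. The sketch below is an illustration in plain Python, not a full entity-resolution method: the table contents, the choice of "Maria S." as the canonical spelling, and the function names are all assumptions. Anything the table does not recognize is flagged for a human decision instead of being silently created as a new node.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
ALIASES = {
    "maria": "Maria S.",
    "maria s.": "Maria S.",
}

def normalize(raw):
    """Collapse whitespace and case so trivial variants compare equal."""
    return " ".join(raw.split()).lower()

def resolve(raw, aliases=ALIASES):
    """Return (name_to_use, was_known) for one raw byline."""
    key = normalize(raw)
    if key in aliases:
        return aliases[key], True
    return " ".join(raw.split()), False

raw_bylines = ["Maria", "maria s.", "  Maria  S. ", "J. Smith"]
resolved = [resolve(name) for name in raw_bylines]

# Bylines with no alias entry need a human to decide before loading.
needs_review = [name for name, known in resolved if not known]
print(needs_review)
# → ['J. Smith']
```

Run a check like this before you create nodes, so that each author is loaded once under a single name.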

Step 6: Ask Your Three Questions as Queries

Now translate each of your Step 1 questions into a traversal. "Which authors have written about the same topic?" becomes:

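MATCH (a1:Author)-[:WROTE]->()-[:COVERS]->(t:Topic)<-[:COVERS]-()<-[:WROTE]-(a2:Author)
// Compare names so each pair is listed once, not as both (A, B) and (B, A);
// this assumes author names are unique after entity resolution
WHERE a1.name < a2.name
RETURN DISTINCT a1.name, t.name, a2.name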

Run it. If you get sensible answers, your model works. If you get nonsense or nothing, the problem is almost always your data (duplicate or missing nodes) or your model (a missing edge type), not the query. Debug by checking the graph visually first.
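The other two Step 1 questions translate into traversals in the same way. The sketch below expresses them in plain Python over two sets of edges rather than in Cypher, so it can run without a database. The sample data, the author "Dev", and the function names are hypothetical. It also assumes one reading of "lose coverage": an article loses coverage if retiring the topic leaves it with no topic at all.

```python
# Hypothetical data, stored as the two edge types from Step 3.
WROTE = {
    ("Maria", "Graph Basics"),
    ("Dev", "Editing 101"),
    ("Dev", "Graphs for Editors"),
}
COVERS = {
    ("Graph Basics", "Knowledge Graphs"),
    ("Editing 101", "Editing"),
    ("Graphs for Editors", "Knowledge Graphs"),
    ("Graphs for Editors", "Editing"),
}

def articles_losing_coverage(topic):
    """Articles whose only topic is the one being retired."""
    topics_of = {}
    for article, t in COVERS:
        topics_of.setdefault(article, set()).add(t)
    return sorted(a for a, ts in topics_of.items() if ts == {topic})

def topics_never_covered(author):
    """All topics minus those reachable via author -> article -> topic."""
    all_topics = {t for _, t in COVERS}
    written = {article for a, article in WROTE if a == author}
    covered = {t for article, t in COVERS if article in written}
    return sorted(all_topics - covered)

print(articles_losing_coverage("Knowledge Graphs"))  # → ['Graph Basics']
print(topics_never_covered("Maria"))                 # → ['Editing']
```

The second question is a set difference: collect every topic, then subtract the ones the author reaches through their articles.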

Step 7: Validate, Then Iterate

Compare the graph's answers against what you know to be true. Spot-check five results by hand. Once you trust the answers, expand: add a new node type or edge only when a new question demands it.

This iterative loop is the whole discipline:

A new question arrives.
You add only the nodes and edges that question needs.
You re-validate.

Graphs that grow this way stay coherent. Graphs that grow by "let's add everything we have" become tangled within months. The best practices article goes deeper on keeping a growing graph healthy.
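The spot-check in this step can be written down so that it is repeatable after every change. The following is a minimal sketch, assuming you keep a short hand-verified list of facts that must appear in the results and facts that must not. The function name `spot_check`, the authors "Dev" and "Sam", and the sample results are hypothetical.

```python
def spot_check(results, known_true, known_false=()):
    """Compare query results with facts you have verified by hand."""
    found = set(results)
    return {
        "missing": sorted(set(known_true) - found),        # should appear, did not
        "unexpected": sorted(set(known_false) & found),    # should not appear, did
    }

# Hypothetical output of the shared-topic query, as (author, author) pairs.
query_results = [("Dev", "Maria")]

report = spot_check(
    query_results,
    known_true=[("Dev", "Maria"), ("Maria", "Sam")],
    known_false=[("Dev", "Sam")],
)
print(report)
# → {'missing': [('Maria', 'Sam')], 'unexpected': []}
```

A non-empty "missing" list usually points to an absent edge or a duplicated node; a non-empty "unexpected" list usually points to two records that were wrongly merged.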

A Worked Mini-Example, End to End

To make the seven steps concrete, here's the whole thing compressed into one tiny run you could replicate in an hour.

Step 1 (questions): "Which two authors have both written about graphs?"

Steps 2–3 (model): Nodes are Author, Article, Topic. Edges are WROTE and COVERS.

Step 4 (storage): A free Neo4j instance.

Step 5 (load): Create three authors, five articles, and two topics, wiring up WROTE and COVERS edges. Pause to check: is "J. Smith" in one article the same as "John Smith" in another? Decide now, before you query.

Step 6 (query): Run the shared-topic traversal. You get back pairs of authors and the topic they share.

Step 7 (validate): Eyeball the pairs against what you know. If two authors you know both wrote about graphs don't appear, you have a missing COVERS edge or a duplicated topic node.

The point of running this small is speed of feedback. With ten nodes you can verify every answer by hand, which builds the judgment you'll need when the graph has ten thousand. Scale comes after correctness, never before.
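For a sense of how small this run is, here is the same mini-example as a plain-Python sketch with three authors, five articles, and two topics. All names (Ana, Ben, Cy, A1 to A5, the "Editing" topic) are hypothetical placeholders, and the dictionaries stand in for WROTE and COVERS edges in a real graph database.

```python
from itertools import combinations

# WROTE edges: author -> articles (hypothetical data).
WROTE = {
    "Ana": ["A1", "A2"],
    "Ben": ["A3"],
    "Cy": ["A4", "A5"],
}
# COVERS edges: article -> topics (hypothetical data).
COVERS = {
    "A1": ["Graphs"],
    "A2": ["Editing"],
    "A3": ["Graphs"],
    "A4": ["Editing"],
    "A5": ["Editing"],
}

def shared_topic_pairs(wrote, covers):
    """Return sorted (author, author, topic) triples for authors sharing a topic."""
    authors_by_topic = {}
    for author, articles in wrote.items():
        for article in articles:
            for topic in covers.get(article, []):
                authors_by_topic.setdefault(topic, set()).add(author)
    pairs = set()
    for topic, authors in authors_by_topic.items():
        for a1, a2 in combinations(sorted(authors), 2):
            pairs.add((a1, a2, topic))
    return sorted(pairs)

pairs = shared_topic_pairs(WROTE, COVERS)
graph_pairs = [(a1, a2) for a1, a2, topic in pairs if topic == "Graphs"]
print(pairs)        # → [('Ana', 'Ben', 'Graphs'), ('Ana', 'Cy', 'Editing')]
print(graph_pairs)  # → [('Ana', 'Ben')]
```

With ten nodes, both results can be confirmed by reading the two dictionaries, which is the kind of hand verification Step 7 asks for.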

A Note on Letting AI Help

Once you're comfortable building by hand, you can accelerate Steps 5 and 6 with a large language model. Modern models can read unstructured text — say, a folder of articles — and extract author, topic, and article nodes automatically, then even draft the Cypher. Do this after you understand the manual process, not before. If you can't tell when the AI mis-extracts an entity, you can't trust the graph it builds.
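One guard you might put between a model's output and your graph is a check against the model from Steps 2 and 3. The sketch below makes no real model call: the `extracted` list is hard-coded, hypothetical output, and `check_triples` is an invented helper. It rejects any triple whose edge type or endpoint types do not match the WROTE and COVERS definitions, so that a person can review the rejects.

```python
# Edge types from Step 3, with the node types each one may connect.
ALLOWED = {
    "WROTE": ("Author", "Article"),
    "COVERS": ("Article", "Topic"),
}

def check_triples(triples, node_types):
    """Split extracted triples into those that fit the model and those that do not."""
    accepted, rejected = [], []
    for src, rel, dst in triples:
        expected = ALLOWED.get(rel)
        actual = (node_types.get(src), node_types.get(dst))
        if expected is not None and actual == expected:
            accepted.append((src, rel, dst))
        else:
            rejected.append((src, rel, dst))
    return accepted, rejected

# Nodes you have already verified by hand.
node_types = {
    "Maria": "Author",
    "Graph Basics": "Article",
    "Knowledge Graphs": "Topic",
}

# Hypothetical model output; the last two triples do not fit the model.
extracted = [
    ("Maria", "WROTE", "Graph Basics"),
    ("Maria", "COVERS", "Knowledge Graphs"),           # authors do not COVER topics here
    ("Graph Basics", "MENTIONS", "Knowledge Graphs"),  # edge type not in the model
]

accepted, rejected = check_triples(extracted, node_types)
print(len(accepted), len(rejected))  # → 1 2
```

A check like this catches structural mistakes only. It cannot tell whether an accepted triple is factually right, which is why the manual understanding still comes first.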

Frequently Asked Questions

How long does building a first knowledge graph take?

A small, focused graph following these steps takes an afternoon, not a quarter. The seven steps for a three-node-type model are genuinely doable in a few hours. Projects that take months are usually ones that skipped Step 1 and tried to model everything at once.

What if my questions change after I build it?

That's expected and fine. Step 7 is built around it: when a new question arrives, add only the nodes and edges it needs, then re-validate. A graph is meant to evolve. The discipline is adding incrementally rather than rebuilding from scratch each time.

Do I have to use Cypher specifically?

No. Cypher is the query language for labeled property graph databases such as Neo4j. If you choose an RDF triplestore, you'll write SPARQL instead. The steps in this guide are identical regardless of language; only the syntax in Steps 5 and 6 changes. Pick the language that matches your chosen tool.

Can I skip entity resolution for a small graph?

Only if you've personally verified there are no duplicates, which is realistic at ten nodes and unrealistic at ten thousand. Even small graphs produce wrong answers when "Maria" and "Maria S." are treated as two people. Build the habit early; it does not get easier at scale.

What's the most important step?

Step 1, by a wide margin. Three concrete questions constrain every later decision and keep the graph lean and useful. Teams that start with questions ship working graphs; teams that start with data ship sprawling ones nobody can query.

Key Takeaways

Always start with three concrete questions — they are the blueprint for the entire build.
Derive node types from nouns and edge types from verbs in your questions.
Choose a free graph database for a first real build instead of agonizing over tooling.
Entity resolution — deciding when two records are the same node — is the hard, essential part.
Validate against known truths, then grow the graph one new question at a time.



Agency Script Editorial
