GraphRAG Entity Model
The data shape of the knowledge graph — GraphNode, GraphEdge, evidence chunks, and the constraints that make multi-tenant graph queries safe.
A GraphRAG-enabled Knowledge space stores a directed labelled graph alongside its KnowledgeChunk rows. This page is the reference for the schema, the LLM passes that populate it, and the constraints to keep in mind when querying it directly.
Schema
Two tables in packages/database/prisma/schema.prisma:
GraphNode
model GraphNode {
id String @id @default(cuid())
knowledgeSpaceId String
knowledgeSpace KnowledgeSpace @relation(fields: [knowledgeSpaceId], references: [id], onDelete: Cascade)
canonicalName String
nodeType String
metadata Json @default("{}")
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
deletedAt DateTime?
@@unique([knowledgeSpaceId, canonicalName, nodeType])
@@index([knowledgeSpaceId, nodeType])
@@map("graph_node")
}

| Field | Purpose |
|---|---|
| canonicalName | The normalised entity label ("OpenAI", "Wallet API", "PrestaShop Module"). Resolved by the LLM at ingest. |
| nodeType | Semantic type: "Product", "Person", "Concept", "API", "Organization", "Document", "Event", … |
| metadata | Free-form JSON. Today: { aliases: string[], firstSeenChunkId: string }. Subject to additive evolution. |
| (knowledgeSpaceId, canonicalName, nodeType) | Unique key. "OpenAI/Organization" is one node; "OpenAI/Product" would be a separate node. |
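The uniqueness rule can be mirrored in application code when deduplicating nodes in memory. The following is a minimal sketch, not the project's implementation: `NodeIdentity` and `nodeKey` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical in-memory mirror of the GraphNode columns that form its identity.
interface NodeIdentity {
  knowledgeSpaceId: string;
  canonicalName: string;
  nodeType: string;
}

// Builds a string key mirroring @@unique([knowledgeSpaceId, canonicalName, nodeType]).
// Serialising a tuple avoids the collisions a naive join("/") could produce
// when a name itself contains the separator.
function nodeKey(n: NodeIdentity): string {
  return JSON.stringify([n.knowledgeSpaceId, n.canonicalName, n.nodeType]);
}

const org: NodeIdentity = { knowledgeSpaceId: "space_1", canonicalName: "OpenAI", nodeType: "Organization" };
const product: NodeIdentity = { knowledgeSpaceId: "space_1", canonicalName: "OpenAI", nodeType: "Product" };

// The same name with a different type is a different node; an exact repeat is not.
const seen = new Set<string>([nodeKey(org), nodeKey(product), nodeKey({ ...org })]);
console.log(seen.size); // 2
```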
GraphEdge
model GraphEdge {
id String @id @default(cuid())
knowledgeSpaceId String
knowledgeSpace KnowledgeSpace @relation(fields: [knowledgeSpaceId], references: [id], onDelete: Cascade)
fromNodeId String
toNodeId String
relationType String
weight Float @default(1)
evidenceChunkId String?
metadata Json @default("{}")
createdAt DateTime @default(now())
deletedAt DateTime?
@@index([knowledgeSpaceId, fromNodeId, relationType])
@@index([knowledgeSpaceId, toNodeId, relationType])
@@map("graph_edge")
}

| Field | Purpose |
|---|---|
| fromNodeId / toNodeId | The directed pair. Order matters: "OpenAI provides GPT-4" is from=OpenAI, to=GPT-4, rel=provides. |
| relationType | Verb-shaped string: "provides", "owns", "depends_on", "deprecates", "replaces", "integrates_with". |
| weight | Confidence × frequency. Normalised to [0, 1] per ingest. Multiple mentions of the same edge sum. |
| evidenceChunkId | The chunk that supplied this relation. Used by graphragExplain to cite the originating passage. |
| metadata | Free-form JSON: { confidence: number, model: string, extractedAt: ISO }. |
There is no global uniqueness on edges — multiple (from, to, rel) rows can coexist, each pointing at a different evidence chunk. Aggregations sum weight across rows.
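Because several rows can describe the same (from, to, rel) triple, consumers usually collapse them before display. A minimal sketch of that aggregation over rows already loaded into memory; `EdgeRow`, `AggregatedEdge`, and `aggregateEdges` are hypothetical names, not helpers from @repo/database.

```typescript
// Hypothetical subset of the GraphEdge columns needed for aggregation.
interface EdgeRow {
  fromNodeId: string;
  toNodeId: string;
  relationType: string;
  weight: number;
  evidenceChunkId: string | null;
}

interface AggregatedEdge {
  fromNodeId: string;
  toNodeId: string;
  relationType: string;
  totalWeight: number;
  evidenceChunkIds: string[];
}

// Groups rows by (from, to, rel), sums weight, and collects the surviving citations.
function aggregateEdges(rows: EdgeRow[]): AggregatedEdge[] {
  const byTriple = new Map<string, AggregatedEdge>();
  for (const row of rows) {
    const key = JSON.stringify([row.fromNodeId, row.toNodeId, row.relationType]);
    let agg = byTriple.get(key);
    if (!agg) {
      agg = {
        fromNodeId: row.fromNodeId,
        toNodeId: row.toNodeId,
        relationType: row.relationType,
        totalWeight: 0,
        evidenceChunkIds: [],
      };
      byTriple.set(key, agg);
    }
    agg.totalWeight += row.weight;
    if (row.evidenceChunkId !== null) agg.evidenceChunkIds.push(row.evidenceChunkId);
  }
  return Array.from(byTriple.values());
}

const aggregated = aggregateEdges([
  { fromNodeId: "n1", toNodeId: "n2", relationType: "provides", weight: 0.5, evidenceChunkId: "c1" },
  { fromNodeId: "n1", toNodeId: "n2", relationType: "provides", weight: 0.25, evidenceChunkId: "c2" },
  { fromNodeId: "n2", toNodeId: "n1", relationType: "provides", weight: 0.5, evidenceChunkId: null },
]);
console.log(aggregated.length, aggregated[0].totalWeight); // 2 0.75
```

Note that direction is part of the key: the reversed pair stays a separate aggregate.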
Cascading deletes
Everything cascades on KnowledgeSpace.id:
- Delete a space → drop every node and edge.
- Delete a document → its chunks cascade away, but edges that pointed at those chunks have their evidenceChunkId set to null (onDelete: SetNull). The relation survives, just without a citation. This is intentional: deleting one document shouldn't invalidate the graph derived from twenty others.
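One consequence for readers of the graph is that evidenceChunkId must always be treated as nullable. A small sketch of a display helper that handles both cases; `CitableEdge`, `citationLabel`, and the label wording are assumptions made for illustration.

```typescript
// Hypothetical view of an edge as seen by a UI or report.
interface CitableEdge {
  id: string;
  evidenceChunkId: string | null;
}

// Returns a human-readable citation, falling back gracefully when the
// originating chunk was deleted and the reference was set to null.
function citationLabel(edge: CitableEdge): string {
  return edge.evidenceChunkId === null
    ? "source passage no longer available"
    : `chunk ${edge.evidenceChunkId}`;
}

const withEvidence = citationLabel({ id: "e1", evidenceChunkId: "c42" });
const withoutEvidence = citationLabel({ id: "e2", evidenceChunkId: null });
console.log(withEvidence);    // chunk c42
console.log(withoutEvidence); // source passage no longer available
```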
How nodes and edges are produced
Driver: buildGraphFromChunks in packages/api/modules/knowledge/lib/graphrag.ts. It runs two LLM passes:
Pass 1 — entity resolution
resolveEntitiesFromChunks(chunks) in entity-resolution.ts sends each chunk to the model with a prompt that asks for:
- canonicalName: the normalised form.
- type: one of the allowed nodeType strings.
- aliases: surface forms in the chunk that map to this entity.
The pass deduplicates within a chunk and across chunks by (canonicalName, type). Pre-existing nodes for the same key are reused via upsertGraphNode. Aliases are merged into the node's metadata.aliases.
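The deduplication step can be pictured as a merge keyed on (canonicalName, type). This is a sketch under assumptions: `EntityMention` and `mergeMentions` are hypothetical names, and the real resolveEntitiesFromChunks and upsertGraphNode may behave differently in detail (for example in how aliases are ordered).

```typescript
// Hypothetical shape of one entity as returned by the model for a chunk.
interface EntityMention {
  canonicalName: string;
  type: string;
  aliases: string[];
}

// Collapses mentions that share (canonicalName, type) and unions their aliases.
function mergeMentions(mentions: EntityMention[]): EntityMention[] {
  const merged = new Map<string, { canonicalName: string; type: string; aliases: Set<string> }>();
  for (const m of mentions) {
    const key = JSON.stringify([m.canonicalName, m.type]);
    const entry =
      merged.get(key) ?? { canonicalName: m.canonicalName, type: m.type, aliases: new Set<string>() };
    for (const alias of m.aliases) entry.aliases.add(alias);
    merged.set(key, entry);
  }
  // Sorting aliases is an illustrative choice to keep the output deterministic.
  return Array.from(merged.values()).map((e) => ({
    canonicalName: e.canonicalName,
    type: e.type,
    aliases: Array.from(e.aliases).sort(),
  }));
}

const entities = mergeMentions([
  { canonicalName: "OpenAI", type: "Organization", aliases: ["OpenAI Inc."] },
  { canonicalName: "OpenAI", type: "Organization", aliases: ["OpenAI", "OpenAI Inc."] },
  { canonicalName: "OpenAI", type: "Product", aliases: [] },
]);
console.log(entities.length); // 2
```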
Pass 2 — relation typing
Only chunks with ≥ 2 distinct entities advance. extractRelationsFromChunks asks the model for each chunk:
- from, to: the canonical names already produced in pass 1.
- relationType: a verb in lower snake_case from a small typed vocabulary.
- confidence: 0..1.
Each accepted relation becomes a new GraphEdge with evidenceChunkId set. Edges below a confidence threshold (currently 0.5) are dropped.
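The two gates in this pass (at least two distinct entities, and the confidence threshold) can be sketched as below. `CandidateRelation`, `chunkAdvances`, and `acceptRelations` are hypothetical names; rejecting relations whose endpoints were not produced in pass 1 is an assumption added here, not something the source states.

```typescript
// Hypothetical shape of one relation as returned by the model.
interface CandidateRelation {
  from: string;
  to: string;
  relationType: string;
  confidence: number;
}

// Threshold quoted in the text above ("currently 0.5").
const CONFIDENCE_THRESHOLD = 0.5;

// A chunk advances to relation typing only if it mentions at least two distinct entities.
function chunkAdvances(entityNames: string[]): boolean {
  return new Set(entityNames).size >= 2;
}

// Keeps relations at or above the threshold whose endpoints are known canonical names.
// The endpoint check is an illustrative safeguard (assumption).
function acceptRelations(candidates: CandidateRelation[], knownNames: Set<string>): CandidateRelation[] {
  return candidates.filter(
    (r) => r.confidence >= CONFIDENCE_THRESHOLD && knownNames.has(r.from) && knownNames.has(r.to),
  );
}

const known = new Set<string>(["OpenAI", "GPT-4"]);
const accepted = acceptRelations(
  [
    { from: "OpenAI", to: "GPT-4", relationType: "provides", confidence: 0.9 },
    { from: "OpenAI", to: "GPT-4", relationType: "owns", confidence: 0.3 },
    { from: "OpenAI", to: "Unknown Thing", relationType: "provides", confidence: 0.8 },
  ],
  known,
);
console.log(chunkAdvances(["OpenAI", "OpenAI"]), accepted.length); // false 1
```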
Querying the graph
You can query the graph directly through Prisma helpers in @repo/database — useful for custom dashboards or evaluations:
import { db, listGraphEdgesForNodes } from "@repo/database";
// All edges originating from a node:
const outgoing = await db.graphEdge.findMany({
where: { knowledgeSpaceId: spaceId, fromNodeId: nodeId },
});
// Bidirectional fan-out from a set of anchor nodes:
const edges = await listGraphEdgesForNodes({
knowledgeSpaceId: spaceId,
nodeIds: anchorIds,
});

For multi-tenant safety, every query MUST include knowledgeSpaceId (Invariant 5 extended to Knowledge). The repository helpers enforce this; direct db.graphEdge.findMany calls without it are a bug.
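One way to make the scoping rule hard to forget in ad hoc queries is to build the where clause through a small helper. This is a sketch only: `EdgeFilter` and `scopedEdgeWhere` are hypothetical names, and excluding soft-deleted rows is an assumption made here for illustration.

```typescript
// Hypothetical optional filters on top of the mandatory tenant scope.
interface EdgeFilter {
  fromNodeId?: string;
  toNodeId?: string;
  relationType?: string;
}

// Builds a where object that always carries knowledgeSpaceId.
// The scope is spread last so a caller-supplied filter cannot override it.
function scopedEdgeWhere(knowledgeSpaceId: string, filter: EdgeFilter = {}) {
  if (knowledgeSpaceId.trim() === "") {
    throw new Error("knowledgeSpaceId is required for every graph query");
  }
  // deletedAt: null skips soft-deleted rows (assumption).
  return { ...filter, knowledgeSpaceId, deletedAt: null };
}

const where = scopedEdgeWhere("space_1", { fromNodeId: "node_1" });
console.log(where);
// Intended use (not executed here): db.graphEdge.findMany({ where })
```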
Confidence and noise
Two reasons edges look wrong:
- Hallucinated relation. The model invented a verb. confidence is usually low; filter to metadata.confidence > 0.7 if you're rendering edges directly to a user.
- Real but contested. Two chunks disagree. The weight accumulates and the graph stays, but graphragExplain will sometimes cite both. Treat this as a content-side problem to resolve, not a bug in the graph.
The community-detection job (graphragCommunities) only considers edges with weight ≥ 0.5 to keep noise out of clusters.
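Since metadata is free-form JSON, a display-side confidence filter should not assume the field is present or numeric. A minimal sketch; `readConfidence` and `displayableEdges` are hypothetical names, and treating edges with missing confidence as not displayable is an assumption.

```typescript
// Safely reads metadata.confidence from an untyped JSON value.
function readConfidence(metadata: unknown): number | null {
  if (typeof metadata !== "object" || metadata === null || Array.isArray(metadata)) return null;
  const value = (metadata as Record<string, unknown>).confidence;
  return typeof value === "number" && isFinite(value) ? value : null;
}

// Keeps only edges whose recorded confidence is strictly above the cut-off.
// 0.7 is the display threshold suggested in the text above.
function displayableEdges<T extends { metadata: unknown }>(edges: T[], minConfidence = 0.7): T[] {
  return edges.filter((edge) => {
    const confidence = readConfidence(edge.metadata);
    return confidence !== null && confidence > minConfidence;
  });
}

const shown = displayableEdges([
  { id: "e1", metadata: { confidence: 0.92, model: "m" } },
  { id: "e2", metadata: { confidence: 0.55 } },
  { id: "e3", metadata: {} },
  { id: "e4", metadata: { confidence: "high" } },
]);
console.log(shown.map((e) => e.id)); // [ 'e1' ]
```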
Schema evolution
The schema is frozen (Invariant 9). Additive evolution is expected via:
- New relationType strings: no migration required; the column is String.
- New entries in metadata JSON: additive, validated at write time.
- New nodeType strings: no migration, but consumers should treat the set as open.
Renaming or removing existing relation/node types requires explicit user approval (Gate A) and an ingest-time migration plan; do not rewrite historical edges silently.
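Treating nodeType as an open set means consumers need a fallback for values they have not seen before. A sketch of one approach; `KNOWN_NODE_TYPES` lists only the types named on this page (the real set is open-ended), and `isKnownNodeType`, `nodeTypeLabel`, and the fallback label are hypothetical.

```typescript
// Types named on this page; the actual set may grow without a migration.
const KNOWN_NODE_TYPES = ["Product", "Person", "Concept", "API", "Organization", "Document", "Event"] as const;
type KnownNodeType = (typeof KNOWN_NODE_TYPES)[number];

// Type guard that narrows a raw string to one of the known types.
function isKnownNodeType(value: string): value is KnownNodeType {
  return (KNOWN_NODE_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Renders a label without throwing on types introduced after this code was written.
function nodeTypeLabel(nodeType: string): string {
  return isKnownNodeType(nodeType) ? nodeType : `Other (${nodeType})`;
}

console.log(nodeTypeLabel("API"));        // API
console.log(nodeTypeLabel("Regulation")); // Other (Regulation)
```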
Inspecting in the dashboard
The dashboard's Knowledge → Graph tab (Beta) renders the node list, edge counts per relation type, and the detected communities. The visual graph explorer is on the roadmap.
Until then, the canonical inspection path is Postgres directly:
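-- Column names are camelCase in the Prisma model (no @map), so Postgres needs them double-quoted.
SELECT "relationType", count(*) AS edges, avg(weight) AS avg_weight
FROM graph_edge
WHERE "knowledgeSpaceId" = '…'
AND "deletedAt" IS NULL
GROUP BY 1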
ORDER BY 2 DESC;

Limits to be aware of
- Two-hop horizon. graphragExplain walks at most 2 hops by default. Beyond that, the noise outweighs the signal.
- No edge attribute search. You can't query "edges with metadata.confidence > 0.9" through the public API, only directly through Prisma.
- No write API. Edges and nodes are created exclusively by the ingest pipeline. Direct user-facing edge creation is not supported.
- No multi-graph. One graph per space. Cross-space joins are explicitly not supported.
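To make the two-hop horizon concrete, here is a sketch of a bounded breadth-first walk over edges already loaded for a single space. It is not the graphragExplain implementation: `HopEdge` and `nodesWithinHops` are hypothetical names, and following edges in both directions is an assumption.

```typescript
// Hypothetical minimal edge shape for traversal.
interface HopEdge {
  fromNodeId: string;
  toNodeId: string;
}

// Returns every node reachable within maxHops of start, mapped to its hop distance.
function nodesWithinHops(edges: HopEdge[], start: string, maxHops = 2): Map<string, number> {
  const neighbours = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbours.get(a);
    if (list) list.push(b);
    else neighbours.set(a, [b]);
  };
  // Edges are walked in both directions (assumption).
  for (const edge of edges) {
    link(edge.fromNodeId, edge.toNodeId);
    link(edge.toNodeId, edge.fromNodeId);
  }

  const distance = new Map<string, number>([[start, 0]]);
  let frontier: string[] = [start];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbour of neighbours.get(node) ?? []) {
        if (!distance.has(neighbour)) {
          distance.set(neighbour, hop);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return distance;
}

// Chain A -> B -> C -> D: D is three hops from A and falls outside the horizon.
const reach = nodesWithinHops(
  [
    { fromNodeId: "A", toNodeId: "B" },
    { fromNodeId: "B", toNodeId: "C" },
    { fromNodeId: "C", toNodeId: "D" },
  ],
  "A",
);
console.log(reach.size, reach.get("C"), reach.has("D")); // 3 2 false
```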
Related pages
- GraphRAG overview
- GraphRAG use cases
- Knowledge sources — the ingest path that populates the graph
- Knowledge evaluation — measuring whether the graph helps