GraphRAG Entity Model

The data shape of the knowledge graph — GraphNode, GraphEdge, evidence chunks, and the constraints that make multi-tenant graph queries safe.

A GraphRAG-enabled Knowledge space stores a directed labelled graph alongside its KnowledgeChunk rows. This page is the reference for the schema, the LLM passes that populate it, and the constraints to keep in mind when querying it directly.

Schema

Two tables in packages/database/prisma/schema.prisma:

GraphNode

model GraphNode {
  id               String         @id @default(cuid())
  knowledgeSpaceId String
  knowledgeSpace   KnowledgeSpace @relation(fields: [knowledgeSpaceId], references: [id], onDelete: Cascade)
  canonicalName    String
  nodeType         String
  metadata         Json           @default("{}")
  createdAt        DateTime       @default(now())
  updatedAt        DateTime       @updatedAt
  deletedAt        DateTime?

  @@unique([knowledgeSpaceId, canonicalName, nodeType])
  @@index([knowledgeSpaceId, nodeType])
  @@map("graph_node")
}

| Field | Purpose |
| --- | --- |
| canonicalName | The normalised entity label ("OpenAI", "Wallet API", "PrestaShop Module"). Resolved by the LLM at ingest. |
| nodeType | Semantic type: "Product", "Person", "Concept", "API", "Organization", "Document", "Event", … |
| metadata | Free-form JSON. Today: { aliases: string[], firstSeenChunkId: string }. Subject to additive evolution. |
| (knowledgeSpaceId, canonicalName, nodeType) | Unique key. "OpenAI/Organization" is one node; "OpenAI/Product" would be a separate node. |
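
To make the unique key concrete, here is a minimal sketch of a composite key built from the three fields in the constraint. The `NodeKeyParts` type and the `nodeKey` helper are hypothetical names for illustration and are not part of `@repo/database`.

```typescript
// Hypothetical local mirror of the three GraphNode fields in the unique constraint.
interface NodeKeyParts {
  knowledgeSpaceId: string;
  canonicalName: string;
  nodeType: string;
}

// Serialising a tuple with JSON.stringify avoids ambiguity when a name
// itself contains a separator character such as "/" or ":".
function nodeKey(parts: NodeKeyParts): string {
  return JSON.stringify([parts.knowledgeSpaceId, parts.canonicalName, parts.nodeType]);
}

const orgKey = nodeKey({ knowledgeSpaceId: "space_1", canonicalName: "OpenAI", nodeType: "Organization" });
const productKey = nodeKey({ knowledgeSpaceId: "space_1", canonicalName: "OpenAI", nodeType: "Product" });

// Same name, different nodeType: two distinct nodes.
console.log(orgKey === productKey); // false
```

A key like this is handy for in-memory lookups when joining nodes fetched from several queries.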

GraphEdge

model GraphEdge {
  id               String         @id @default(cuid())
  knowledgeSpaceId String
  knowledgeSpace   KnowledgeSpace @relation(fields: [knowledgeSpaceId], references: [id], onDelete: Cascade)
  fromNodeId       String
  toNodeId         String
  relationType     String
  weight           Float          @default(1)
  evidenceChunkId  String?
  metadata         Json           @default("{}")
  createdAt        DateTime       @default(now())
  deletedAt        DateTime?

  @@index([knowledgeSpaceId, fromNodeId, relationType])
  @@index([knowledgeSpaceId, toNodeId, relationType])
  @@map("graph_edge")
}

| Field | Purpose |
| --- | --- |
| fromNodeId / toNodeId | The directed pair. Order matters: "OpenAI provides GPT-4" is from=OpenAI, to=GPT-4, rel=provides. |
| relationType | Verb-shaped string: "provides", "owns", "depends_on", "deprecates", "replaces", "integrates_with". |
| weight | Confidence × frequency. Normalised to [0, 1] per ingest. Multiple mentions of the same edge sum. |
| evidenceChunkId | The chunk that supplied this relation. Used by graphragExplain to cite the originating passage. |
| metadata | Free-form JSON: { confidence: number, model: string, extractedAt: ISO }. |

There is no global uniqueness on edges — multiple (from, to, rel) rows can coexist, each pointing at a different evidence chunk. Aggregations sum weight across rows.
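
Because several rows can describe the same (from, to, rel) triple, consumers usually collapse them before display. The following is a minimal in-memory sketch of that aggregation; `EdgeRow`, `AggregatedEdge`, and `aggregateEdges` are hypothetical names, and the row shape is a trimmed-down assumption based on the model above.

```typescript
// Trimmed-down, hypothetical view of a graph_edge row.
interface EdgeRow {
  fromNodeId: string;
  toNodeId: string;
  relationType: string;
  weight: number;
  evidenceChunkId: string | null;
}

interface AggregatedEdge {
  fromNodeId: string;
  toNodeId: string;
  relationType: string;
  totalWeight: number;
  evidenceChunkIds: string[];
}

// Groups rows by (from, to, relationType), sums their weights,
// and collects the evidence chunks that are still present.
function aggregateEdges(rows: EdgeRow[]): AggregatedEdge[] {
  const byKey = new Map<string, AggregatedEdge>();
  for (const row of rows) {
    const key = JSON.stringify([row.fromNodeId, row.toNodeId, row.relationType]);
    let agg = byKey.get(key);
    if (!agg) {
      agg = {
        fromNodeId: row.fromNodeId,
        toNodeId: row.toNodeId,
        relationType: row.relationType,
        totalWeight: 0,
        evidenceChunkIds: [],
      };
      byKey.set(key, agg);
    }
    agg.totalWeight += row.weight;
    if (row.evidenceChunkId !== null) {
      agg.evidenceChunkIds.push(row.evidenceChunkId);
    }
  }
  return Array.from(byKey.values());
}
```

Note that a summed weight can exceed 1 even though each row is normalised to [0, 1], so any threshold applied after aggregation should account for that.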

Cascading deletes

Deletes propagate from the owning rows:

  • Delete a space → every node and edge in it is dropped (onDelete: Cascade on knowledgeSpaceId).
  • Delete a document → its chunks cascade away, but edges that pointed at those chunks survive with evidenceChunkId set to null (onDelete: SetNull). The relation remains, just without a citation. This is intentional: deleting one document shouldn't invalidate the graph derived from twenty others.
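
Any code that renders citations therefore has to tolerate a missing evidence chunk. A small sketch of that handling follows; `CitableEdge` and `citationLabel` are hypothetical names, and the label wording is only an illustration.

```typescript
// Hypothetical minimal edge shape for rendering a citation.
interface CitableEdge {
  relationType: string;
  evidenceChunkId: string | null;
}

// Returns a display label, falling back gracefully when the
// originating chunk has been deleted and the reference is null.
function citationLabel(edge: CitableEdge): string {
  return edge.evidenceChunkId === null
    ? `${edge.relationType} (source removed)`
    : `${edge.relationType} [chunk ${edge.evidenceChunkId}]`;
}

console.log(citationLabel({ relationType: "provides", evidenceChunkId: null }));
// provides (source removed)
```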

How nodes and edges are produced

Driver: buildGraphFromChunks in packages/api/modules/knowledge/lib/graphrag.ts. It runs two LLM passes:

Pass 1 — entity resolution

resolveEntitiesFromChunks(chunks) in entity-resolution.ts sends each chunk to the model with a prompt that asks for:

  • canonicalName — the normalised form.
  • type — one of the allowed nodeType strings.
  • aliases — surface forms in the chunk that map to this entity.

The pass deduplicates within a chunk and across chunks by (canonicalName, type). Pre-existing nodes for the same key are reused via upsertGraphNode. Aliases are merged into the node's metadata.aliases.
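
The deduplication step can be sketched as a merge keyed on (canonicalName, type) that unions aliases. This is an illustrative in-memory version, not the code in entity-resolution.ts; `ExtractedEntity` and `mergeEntities` are hypothetical names.

```typescript
// Hypothetical shape of one entity as returned by the model for a chunk.
interface ExtractedEntity {
  canonicalName: string;
  type: string;
  aliases: string[];
}

// Merges entities that share (canonicalName, type) and unions their aliases,
// keeping the order in which each alias was first seen.
function mergeEntities(entities: ExtractedEntity[]): ExtractedEntity[] {
  const merged = new Map<string, { canonicalName: string; type: string; aliases: Set<string> }>();
  for (const entity of entities) {
    const key = JSON.stringify([entity.canonicalName, entity.type]);
    let entry = merged.get(key);
    if (!entry) {
      entry = { canonicalName: entity.canonicalName, type: entity.type, aliases: new Set<string>() };
      merged.set(key, entry);
    }
    for (const alias of entity.aliases) {
      entry.aliases.add(alias);
    }
  }
  return Array.from(merged.values()).map((m) => ({
    canonicalName: m.canonicalName,
    type: m.type,
    aliases: Array.from(m.aliases),
  }));
}
```

In the real pipeline the merged result would then be written through upsertGraphNode, which this sketch does not model.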

Pass 2 — relation typing

Only chunks with ≥ 2 distinct entities advance. extractRelationsFromChunks asks the model for each chunk:

  • from, to — the canonical names already produced in pass 1.
  • relationType — a verb in lower snake_case from a small typed vocabulary.
  • confidence: a score in the range 0..1.

Each accepted relation becomes a new GraphEdge with evidenceChunkId set. Edges below a confidence threshold (currently 0.5) are dropped.
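
The two gates in this pass (at least two distinct entities per chunk, and a confidence floor) can be sketched as plain filters. All names below are hypothetical, and the check that both endpoints were produced in pass 1 is an assumption about how the pipeline validates model output.

```typescript
// Hypothetical shapes for the inputs and outputs of pass 2.
interface ChunkEntities {
  chunkId: string;
  entityKeys: string[]; // one key per resolved entity, duplicates allowed
}

interface CandidateRelation {
  from: string;
  to: string;
  relationType: string;
  confidence: number;
}

// The text states the threshold is currently 0.5.
const CONFIDENCE_THRESHOLD = 0.5;

// Only chunks that mention at least two distinct entities advance.
function chunksEligibleForRelations(chunks: ChunkEntities[]): ChunkEntities[] {
  return chunks.filter((chunk) => new Set(chunk.entityKeys).size >= 2);
}

// Keeps relations at or above the threshold whose endpoints are known names.
// The endpoint check is an assumption, not confirmed behaviour.
function acceptRelations(candidates: CandidateRelation[], knownNames: Set<string>): CandidateRelation[] {
  return candidates.filter(
    (r) => r.confidence >= CONFIDENCE_THRESHOLD && knownNames.has(r.from) && knownNames.has(r.to),
  );
}
```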

Querying the graph

You can query the graph directly through Prisma helpers in @repo/database — useful for custom dashboards or evaluations:

import { db, listGraphEdgesForNodes } from "@repo/database";

// All live edges originating from a node. knowledgeSpaceId is mandatory;
// deletedAt: null skips soft-deleted rows.
const outgoing = await db.graphEdge.findMany({
  where: { knowledgeSpaceId: spaceId, fromNodeId: nodeId, deletedAt: null },
});

// Bidirectional fan-out from a set of anchor nodes:
const edges = await listGraphEdgesForNodes({
  knowledgeSpaceId: spaceId,
  nodeIds: anchorIds,
});

For multi-tenant safety, every query MUST include knowledgeSpaceId (Invariant 5 extended to Knowledge). The repository helpers enforce this; direct db.graphEdge.findMany calls without it are a bug.
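
One way to make the rule hard to forget in ad hoc code is to build every where clause through a single function that requires the space id. The sketch below is a hypothetical helper, not one of the repository helpers; `EdgeFilters`, `EdgeWhere`, and `scopedEdgeWhere` are invented names, and excluding soft-deleted rows by default is an assumption.

```typescript
// Hypothetical optional filters a caller may supply.
interface EdgeFilters {
  fromNodeId?: string;
  toNodeId?: string;
  relationType?: string;
}

// The resulting where clause always carries the tenant scope.
interface EdgeWhere extends EdgeFilters {
  knowledgeSpaceId: string;
  deletedAt: null;
}

// Builds a where clause that cannot omit knowledgeSpaceId.
// knowledgeSpaceId is spread last so a caller-supplied object can never override it.
function scopedEdgeWhere(knowledgeSpaceId: string, filters: EdgeFilters = {}): EdgeWhere {
  if (knowledgeSpaceId.trim() === "") {
    throw new Error("knowledgeSpaceId is required for every graph query");
  }
  return { ...filters, knowledgeSpaceId, deletedAt: null };
}
```

The returned object has the shape a Prisma `where` argument expects for these columns, so it could be passed as `db.graphEdge.findMany({ where: scopedEdgeWhere(spaceId, { fromNodeId }) })`.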

Confidence and noise

Edges usually look wrong for one of two reasons:

  1. Hallucinated relation. The model invented a verb. confidence is usually low; filter to metadata.confidence > 0.7 if you're rendering edges directly to a user.
  2. Real but contested. Two chunks disagree. The weight still accumulates and the edges stay in the graph, so graphragExplain will sometimes cite both. Treat this as a content-side problem to resolve, not a bug in the graph.

The community-detection job (graphragCommunities) only considers edges with weight ≥ 0.5 to keep noise out of clusters.
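
Both thresholds can be applied in memory once edges are loaded. The sketch below reads confidence defensively because metadata is free-form JSON; the helper names are hypothetical, and treating edges with no confidence value as not displayable is a design assumption.

```typescript
// Hypothetical minimal edge shape; metadata is untyped JSON.
interface NoisyEdge {
  id: string;
  weight: number;
  metadata: unknown;
}

// Safely extracts metadata.confidence, returning null when absent or malformed.
function readConfidence(metadata: unknown): number | null {
  if (typeof metadata === "object" && metadata !== null && "confidence" in metadata) {
    const value = (metadata as { confidence: unknown }).confidence;
    return typeof value === "number" ? value : null;
  }
  return null;
}

// Edges safe to show directly to a user: confidence strictly above the cut-off.
function displayableEdges(edges: NoisyEdge[], minConfidence = 0.7): NoisyEdge[] {
  return edges.filter((edge) => {
    const confidence = readConfidence(edge.metadata);
    return confidence !== null && confidence > minConfidence;
  });
}

// Mirrors the stated community-detection rule: weight at or above 0.5.
function clusterableEdges(edges: NoisyEdge[], minWeight = 0.5): NoisyEdge[] {
  return edges.filter((edge) => edge.weight >= minWeight);
}
```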

Schema evolution

The schema is frozen (Invariant 9). Additive evolution is expected via:

  • New relationType strings — no migration required; the column is String.
  • New entries in metadata JSON — additive, validated at write time.
  • New nodeType strings — no migration, but consumers should treat the set as open.

Renaming or removing existing relation/node types requires explicit user approval (Gate A) and an ingest-time migration plan; do not rewrite historical edges silently.
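
Treating the type set as open means consumer code needs a fallback for values it has never seen. A minimal sketch follows; the list contains only the node types named on this page (the real set is longer), and `isKnownNodeType` and `nodeTypeLabel` are hypothetical helpers.

```typescript
// Only the node types named in this page; the actual set is open-ended.
const KNOWN_NODE_TYPES = [
  "Product",
  "Person",
  "Concept",
  "API",
  "Organization",
  "Document",
  "Event",
] as const;

type KnownNodeType = (typeof KNOWN_NODE_TYPES)[number];

// Type guard that narrows an arbitrary string to a known node type.
function isKnownNodeType(value: string): value is KnownNodeType {
  return (KNOWN_NODE_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Never throws on an unfamiliar type; unknown values get a generic label.
function nodeTypeLabel(nodeType: string): string {
  return isKnownNodeType(nodeType) ? nodeType : `Other (${nodeType})`;
}

console.log(nodeTypeLabel("Regulation")); // Other (Regulation)
```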

Inspecting in the dashboard

The dashboard's Knowledge → Graph tab (Beta) renders the node list, edge counts per relation type, and the detected communities. The visual graph explorer is on the roadmap.

Until then, the canonical inspection path is to query Postgres directly:

-- Prisma keeps camelCase column names, so they must be double-quoted in Postgres.
SELECT "relationType", count(*) AS edges, avg(weight) AS avg_weight
FROM graph_edge
WHERE "knowledgeSpaceId" = '…'
  AND "deletedAt" IS NULL
GROUP BY 1
ORDER BY 2 DESC;

Limits to be aware of

  • Two-hop horizon. graphragExplain walks at most 2 hops by default. Beyond that, the noise outweighs the signal.
  • No edge attribute search. You can't query "edges with metadata.confidence > 0.9" through the public API — only directly through Prisma.
  • No write API. Edges and nodes are created exclusively by the ingest pipeline. Direct user-facing edge creation is not supported.
  • No multi-graph. One graph per space. Cross-space joins are explicitly not supported.
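
To illustrate what a two-hop horizon means in practice, here is a minimal breadth-first sketch over edges already loaded for a single space. It is not the graphragExplain implementation: `TraversalEdge` and `nodesWithinHops` are hypothetical names, and walking edges in both directions is an assumption.

```typescript
// Hypothetical minimal edge shape for traversal.
interface TraversalEdge {
  fromNodeId: string;
  toNodeId: string;
}

// Returns each reachable node id mapped to its hop distance from the nearest anchor,
// stopping after maxHops. Edges are treated as undirected for reachability.
function nodesWithinHops(edges: TraversalEdge[], anchors: string[], maxHops = 2): Map<string, number> {
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adjacency.get(a);
    if (list) {
      list.push(b);
    } else {
      adjacency.set(a, [b]);
    }
  };
  for (const edge of edges) {
    link(edge.fromNodeId, edge.toNodeId);
    link(edge.toNodeId, edge.fromNodeId);
  }

  const distance = new Map<string, number>();
  let frontier: string[] = [];
  for (const anchor of anchors) {
    if (!distance.has(anchor)) {
      distance.set(anchor, 0);
      frontier.push(anchor);
    }
  }

  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbour of adjacency.get(node) ?? []) {
        if (!distance.has(neighbour)) {
          distance.set(neighbour, hop);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return distance;
}
```

With a chain A → B → C → D and A as the only anchor, the result contains A, B, and C; D sits three hops away and is left out.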
