Knowledge RAG Overview

Retrieval-augmented Q&A over your own documents — uploaded files, URLs, internal knowledge bases. How spaces, sources, ingestion jobs, and the ask endpoint fit together.

The Knowledge module is the second AACSearch surface: instead of searching a catalog of structured documents, it answers natural-language questions over your own unstructured content — PDFs, DOCX files, scraped URLs, internal articles. It is intentionally separate from storefront search; the two surfaces share no indexes and serve different jobs.

Surface	Use case	Backed by
Storefront search	Find products in a catalog	`SearchIndex` + Typesense alias
Knowledge RAG	Answer a question with citations from internal documents	`KnowledgeSpace` + `KnowledgeChunk` (Postgres + vectors)
GraphRAG (extension)	Answer a multi-document / multi-concept question	`GraphNode` / `GraphEdge` over the same chunks

This page is the orientation for the Knowledge module. For ingestion details see Sources; for measuring answer quality see Evaluation.

Status

Capability	Status
File upload ingest (PDF / DOCX / TXT / Markdown)	✅ Available
URL ingest (single page)	✅ Available
Ingestion job tracking + retry	✅ Available
Chunking (fixed / semantic / markdown / code strategies)	✅ Available
Embedding (default model)	✅ Available
Embedding model selection per space	🟡 Beta — `KnowledgeSpace.ragConfig.embeddingModel`
`ask` (RAG retrieval + LLM answer, with citations)	✅ Available
`askStream` (server-sent events)	✅ Available
`graphragExplain` (graph-aware answer with paths)	✅ Available — see GraphRAG
Connector-driven sources (Confluence, Notion, GDrive)	⏳ Roadmap
Multi-modal sources (images, audio)	⏳ Roadmap
Per-tenant fine-tuned answer model	⏳ Roadmap (Enterprise)

Spaces

A Knowledge space is the unit of isolation. Each space:

Has exactly one owner: an organization (organizationId) or a user (userId), never both. This is enforced by an XOR CHECK constraint at the DB level, mirroring the Purchase model pattern.
Holds its own data sources, documents, chunks, embeddings, and graph nodes.
Has its own slug, unique per (ownerType, userId|organizationId).
Has its own RAG config — model, chunking strategy, top-k retrieval — at KnowledgeSpace.ragConfig (Beta).
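
The exclusive-owner rule can also be mirrored in application code, so an invalid write fails before it reaches the database. The following is a minimal sketch, not the module's actual code: `SpaceOwnerInput`, `SpaceOwner`, `resolveOwner`, and the `"organization"` / `"user"` literals are hypothetical names chosen for illustration. The DB-level CHECK remains the source of truth.

```typescript
// Hypothetical input shape. The field names follow the prose above;
// the type itself is illustrative only.
interface SpaceOwnerInput {
  organizationId?: string | null;
  userId?: string | null;
}

// Hypothetical result type: exactly one owner, tagged by ownerType.
type SpaceOwner =
  | { ownerType: "organization"; organizationId: string }
  | { ownerType: "user"; userId: string };

// Returns the single owner, or throws when zero or two owners are set (XOR).
function resolveOwner(input: SpaceOwnerInput): SpaceOwner {
  const hasOrg =
    typeof input.organizationId === "string" && input.organizationId.length > 0;
  const hasUser = typeof input.userId === "string" && input.userId.length > 0;

  // hasOrg === hasUser means "both set" or "neither set"; both violate XOR.
  if (hasOrg === hasUser) {
    throw new Error("A Knowledge space needs exactly one owner: organizationId XOR userId");
  }

  return hasOrg
    ? { ownerType: "organization", organizationId: input.organizationId as string }
    : { ownerType: "user", userId: input.userId as string };
}
```

Calling `resolveOwner({ organizationId: "org_1" })` yields an organization owner, while passing both IDs (or neither) throws.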

Routes:

Scope	Path
Organization	`/[orgSlug]/knowledge/[spaceSlug]`
User (personal)	`/knowledge/[spaceSlug]`

The same dashboard UI is rendered for both scopes; the ownerType is inferred from the URL.
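
As an illustration of how the owner scope could be inferred from the two route shapes above, here is a small sketch. `SpaceRoute` and `parseSpaceRoute` are hypothetical names, and the real dashboard presumably relies on its framework's router rather than manual parsing.

```typescript
// Hypothetical parsed-route type for the two scopes in the routes table.
type SpaceRoute =
  | { ownerType: "organization"; orgSlug: string; spaceSlug: string }
  | { ownerType: "user"; spaceSlug: string };

// Maps a pathname to a scope, or null when it matches neither route shape.
function parseSpaceRoute(pathname: string): SpaceRoute | null {
  const segments = pathname.split("/").filter((segment) => segment.length > 0);
  const [first, second, third] = segments;

  // /knowledge/[spaceSlug] -> personal (user) scope
  if (segments.length === 2 && first === "knowledge" && second !== undefined) {
    return { ownerType: "user", spaceSlug: second };
  }

  // /[orgSlug]/knowledge/[spaceSlug] -> organization scope
  if (
    segments.length === 3 &&
    second === "knowledge" &&
    first !== undefined &&
    third !== undefined
  ) {
    return { ownerType: "organization", orgSlug: first, spaceSlug: third };
  }

  return null;
}
```

The segment count is what separates the two scopes, so an organization whose slug happens to be `knowledge` still resolves to the organization route.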

Data model

Model	Purpose
`KnowledgeSpace`	Top-level container; XOR owner (user or org).
`DataSource`	Source descriptor (file upload, URL, future connector). Holds sync config and credential ref.
`IngestionJob`	One ingest run. Tracks `QUEUED → RUNNING → SUCCEEDED / FAILED`, processed/failed counts, error message.
`KnowledgeDocument`	A parsed source (one PDF, one URL, one DOCX). Stores extracted text + checksum.
`KnowledgeChunk`	A retrievable text slice — `chunkIndex`, `text`, `tokenCount`, `embedding` (Json vector).
`GraphNode`	An entity / concept extracted from chunks for GraphRAG. Carries `canonicalName` and `nodeType`.
`GraphEdge`	A relation between two `GraphNode`s with `relationType`, `weight`, and `evidenceChunkId`.

All seven models are namespaced by knowledgeSpaceId, with `KnowledgeSpace` itself as the root of that namespace. Every retrieval scopes to a single space; cross-space reads are not allowed (Invariant 5 extended to Knowledge).
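
The `IngestionJob` lifecycle listed in the table above (`QUEUED → RUNNING → SUCCEEDED / FAILED`) can be pictured as a small state machine. This is a sketch under assumptions: `allowedTransitions` and `canTransition` are hypothetical names, and modelling retry as `FAILED → QUEUED` is a guess at how retry works, not documented behavior.

```typescript
type IngestionStatus = "QUEUED" | "RUNNING" | "SUCCEEDED" | "FAILED";

// Allowed next states for each status.
const allowedTransitions: Record<IngestionStatus, IngestionStatus[]> = {
  QUEUED: ["RUNNING"],
  RUNNING: ["SUCCEEDED", "FAILED"],
  SUCCEEDED: [], // terminal
  FAILED: ["QUEUED"], // assumption: a retry re-queues the failed job
};

// True when a job may move from one status to the other.
function canTransition(from: IngestionStatus, to: IngestionStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}
```

Guarding status writes with a check like this keeps a job from jumping straight from `QUEUED` to `SUCCEEDED` or from leaving a terminal state unexpectedly.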

End-to-end flow

DataSource (file or URL)
     │
     ▼
IngestionJob ──► parsers ──► KnowledgeDocument ──► chunker ──► KnowledgeChunk (text + embedding)
                                                                      │
                                                  ┌───────────────────┴─────────┐
                                                  ▼                             ▼
                                          ask / askStream                buildGraphFromChunks
                                          (retrieval → LLM)              (GraphNode + GraphEdge)

1. A DataSource is created (createSource for connectors, ingestFile / ingestUrl for ad-hoc uploads).
2. An IngestionJob is enqueued. The worker parses the source (packages/api/modules/knowledge/lib/parsers.ts), splits it into chunks (chunking.ts), embeds each chunk, and writes the rows.
3. At query time, retrieval embeds the question, fetches the top-k chunks by cosine similarity in retrieval.ts, and passes them as numbered context to the LLM in rag-pipeline.ts.
4. The answer includes sources: the documents the cited chunks came from.
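
The retrieval step can be sketched in a few lines. This is an in-memory illustration of top-k cosine similarity and numbered context, not the contents of retrieval.ts or rag-pipeline.ts. `ChunkLike`, `cosineSimilarity`, `topKChunks`, and `buildNumberedContext` are hypothetical names, and the `[n]` numbering format is an assumption.

```typescript
// Hypothetical minimal chunk shape for the sketch.
interface ChunkLike {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the query embedding and keeps the k best.
function topKChunks(queryEmbedding: number[], chunks: ChunkLike[], k: number): ChunkLike[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Formats the retrieved chunks as numbered context lines for the prompt.
function buildNumberedContext(chunks: ChunkLike[]): string {
  return chunks.map((chunk, i) => `[${i + 1}] ${chunk.text}`).join("\n");
}
```

A linear scan like this is only reasonable for small spaces; it is meant to show the ranking logic, and the numbering is what lets the model's citations be mapped back to chunks and then to documents.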

For GraphRAG, step 2 also runs buildGraphFromChunks (LLM-based entity resolution + relation typing). Retrieval at step 3 then has the option of traversing the graph instead of (or in addition to) vector similarity.
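
When graph traversal is used in addition to vector similarity, the two hit lists have to be combined somehow. One simple option is sketched below: take the union, keep the higher score for chunks found by both, and re-rank. `RankedHit` and `mergeHits` are hypothetical names, and the sketch assumes the two score scales are comparable, which the source does not state.

```typescript
// Hypothetical hit shape: a chunk reference with a relevance score.
interface RankedHit {
  chunkId: string;
  score: number;
}

// Union of vector and graph hits. When a chunk appears in both lists,
// its higher score wins. The result is sorted by score and cut to topK.
function mergeHits(vectorHits: RankedHit[], graphHits: RankedHit[], topK: number): RankedHit[] {
  const best: { [chunkId: string]: RankedHit } = {};
  for (const hit of vectorHits.concat(graphHits)) {
    const existing = best[hit.chunkId];
    if (existing === undefined || hit.score > existing.score) {
      best[hit.chunkId] = hit;
    }
  }
  return Object.keys(best)
    .map((chunkId) => best[chunkId] as RankedHit)
    .sort((left, right) => right.score - left.score)
    .slice(0, topK);
}
```

Deduplicating by chunk ID matters here because graph edges point back at evidence chunks, so the same chunk can surface through both paths.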

The `ask` endpoint

```typescript
const result = await orpc.knowledge.ask.call({
  spaceId: "ks_…",
  question: "How do I reset my password?",
  topK: 5,             // default 5
  includeGraph: false, // true to mix in GraphRAG hits
});
```

Returns { answer, sources: KnowledgeDocument[], chunks: KnowledgeChunk[] }. Cite sources in the UI; show chunks[].text snippets if you want a "view passages" affordance.
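
To make the citation and "view passages" idea concrete, here is one way the result could be turned into display text. The shapes are assumptions: `SourceLike`, `PassageLike`, `AskResultLike`, and `renderAnswer` are hypothetical names, and the `id`, `title`, and `documentId` fields are illustrative, since the source does not list the fields that link a chunk to its document.

```typescript
// Hypothetical minimal shapes. Only answer/sources/chunks and
// chunkIndex/text come from the docs; the other fields are assumed.
interface SourceLike {
  id: string;
  title: string;
}
interface PassageLike {
  documentId: string;
  chunkIndex: number;
  text: string;
}
interface AskResultLike {
  answer: string;
  sources: SourceLike[];
  chunks: PassageLike[];
}

// Renders the answer, then each source with short snippets of its passages.
function renderAnswer(res: AskResultLike, snippetLength = 80): string {
  const lines: string[] = [res.answer, "", "Sources:"];
  res.sources.forEach((source, i) => {
    lines.push(`${i + 1}. ${source.title}`);
    for (const passage of res.chunks) {
      if (passage.documentId !== source.id) continue;
      const snippet =
        passage.text.length > snippetLength
          ? passage.text.slice(0, snippetLength) + "..."
          : passage.text;
      lines.push(`   - ${snippet}`);
    }
  });
  return lines.join("\n");
}
```

Grouping passages under their source keeps the citation list short while still letting a reader check the exact text the answer was built from.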

For streaming, use askStream. It delivers the same payload as server-sent events, so you can render the answer incrementally as it is generated.
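
For readers unfamiliar with the server-sent events wire format, the sketch below parses raw stream text into events and accumulates an answer. It follows the standard SSE framing (events separated by a blank line, payload in `data:` lines) but makes one loud assumption: that each event's data is a plain-text answer fragment. The actual askStream event shape is not specified here, and `parseSse` and `accumulateAnswer` are hypothetical helpers.

```typescript
interface SseParseResult {
  events: string[]; // data payloads of the complete events found
  rest: string;     // trailing partial event, to be prepended to the next chunk
}

// Splits buffered stream text into complete SSE events plus a remainder.
function parseSse(buffer: string): SseParseResult {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // last piece is incomplete (or empty)
  const events: string[] = [];
  for (const frame of frames) {
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.indexOf("data:") !== 0) continue; // skip comments and other fields
      const value = line.slice(5);
      // Per the SSE format, a single leading space after the colon is dropped.
      dataLines.push(value.charAt(0) === " " ? value.slice(1) : value);
    }
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}

// Assumption: each event carries a text fragment of the answer.
function accumulateAnswer(streamChunks: string[]): string {
  let pending = "";
  let answer = "";
  for (const piece of streamChunks) {
    const parsed = parseSse(pending + piece);
    pending = parsed.rest;
    answer += parsed.events.join("");
  }
  return answer;
}
```

Network chunks do not align with event boundaries, which is why the parser returns the unfinished tail instead of discarding it.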

Privacy and tenant isolation

Each ask call is scoped to exactly one KnowledgeSpace.
The org / user check is done by requireKnowledgeSpaceAccess (packages/api/modules/knowledge/lib/access.ts).
Retrieval never crosses spaces, even within the same organization.
Chunks are sent to the configured LLM provider (OpenAI by default). Per-space model selection is Beta.
Source files are stored in object storage; chunks + embeddings live in Postgres alongside the rest of the schema.

For DPA / SOC 2 scope, see Security & Compliance.

When to use Knowledge RAG

Internal Q&A — onboarding docs, runbooks, policies.
Product knowledge base — FAQs, support articles, integration guides.
Compliance and audit — questions where the answer must cite a specific document.

When not to use it

Product catalog Q&A — use AI answers on top of storefront search instead. Knowledge RAG doesn't see your SearchIndex documents.
Live data — Knowledge sources are snapshots taken at ingest time. For freshness, schedule recurring ingestion jobs or wait for the connectors roadmap.
Numerical computation — RAG is a retrieval layer, not a calculator. Don't ask it for live inventory counts.
Adversarial or PII-heavy content — see Security docs before ingesting; per-tenant model isolation is on the roadmap, not shipped today.

Cost shape

Every ask, askStream, and ingest job is metered through the AI Wallet (Invariant 8). Each call follows a reserve-then-commit pattern: credits are reserved before the call runs, committed when it succeeds, and released if it fails. Rates live in packages/api/modules/entitlements/credit-rates.ts.
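
The reserve-then-commit pattern can be illustrated with an in-memory stand-in. This is a sketch only: `CreditWallet` and `meteredCall` are hypothetical names, not the AI Wallet API, and the example is synchronous for brevity where a real LLM or embedding call would be asynchronous.

```typescript
// Hypothetical in-memory wallet used only to illustrate the pattern.
class CreditWallet {
  private available: number;
  private reserved = 0;

  constructor(initialCredits: number) {
    this.available = initialCredits;
  }

  // Moves credits from available to reserved, or throws if short.
  reserve(amount: number): void {
    if (amount > this.available) throw new Error("Insufficient credits");
    this.available -= amount;
    this.reserved += amount;
  }

  // Finalizes a reservation: the credits are spent.
  commit(amount: number): void {
    this.reserved -= amount;
  }

  // Cancels a reservation: the credits become available again.
  release(amount: number): void {
    this.reserved -= amount;
    this.available += amount;
  }

  balance(): number {
    return this.available;
  }
}

// Reserves before running the work, commits on success, releases on failure.
function meteredCall<T>(wallet: CreditWallet, cost: number, work: () => T): T {
  wallet.reserve(cost);
  try {
    const value = work();
    wallet.commit(cost);
    return value;
  } catch (err) {
    wallet.release(cost);
    throw err;
  }
}
```

Reserving first means a call that cannot be paid for never starts, and releasing on failure means a provider error does not cost credits.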

Watch the Activity tab for ai_overage_reached events when running large bulk ingests — the per-document embedding cost dominates ingest spend.

Sources — file upload, URL ingest, supported types, sync behavior
Evaluation — measuring answer quality
GraphRAG — multi-document reasoning over the same chunks
AI Search — answer surface on top of storefront search
Plans and limits — AI wallet rates and quotas
Security & Compliance
