Knowledge RAG Overview
Retrieval-augmented Q&A over your own documents — uploaded files, URLs, internal knowledge bases. How spaces, sources, ingestion jobs, and the ask endpoint fit together.
The Knowledge module is the second AACSearch surface: instead of searching a catalog of structured documents, it answers natural-language questions over your own unstructured content — PDFs, DOCX files, scraped URLs, internal articles. It is intentionally separate from storefront search; the two surfaces share no indexes and serve different jobs.
| Surface | Use case | Backed by |
|---|---|---|
| Storefront search | Find products in a catalog | SearchIndex + Typesense alias |
| Knowledge RAG | Answer a question with citations from internal documents | KnowledgeSpace + KnowledgeChunk (Postgres + vectors) |
| GraphRAG (extension) | Answer a multi-document / multi-concept question | GraphNode / GraphEdge over the same chunks |
This page is the orientation for the Knowledge module. For ingestion details see Sources; for measuring answer quality see Evaluation.
Status
| Capability | Status |
|---|---|
| File upload ingest (PDF / DOCX / TXT / Markdown) | ✅ Available |
| URL ingest (single page) | ✅ Available |
| Ingestion job tracking + retry | ✅ Available |
| Chunking (fixed / semantic / markdown / code strategies) | ✅ Available |
| Embedding (default model) | ✅ Available |
| Embedding model selection per space | 🟡 Beta — KnowledgeSpace.ragConfig.embeddingModel |
| ask (RAG retrieval + LLM answer, with citations) | ✅ Available |
| askStream (server-sent events) | ✅ Available |
| graphragExplain (graph-aware answer with paths) | ✅ Available — see GraphRAG |
| Connector-driven sources (Confluence, Notion, GDrive) | ⏳ Roadmap |
| Multi-modal sources (images, audio) | ⏳ Roadmap |
| Per-tenant fine-tuned answer model | ⏳ Roadmap (Enterprise) |
Spaces
A Knowledge space is the unit of isolation. Each space:
- Has exactly one owner — an organization (organizationId) OR a user (userId). The owner type is enforced by an XOR CHECK at the DB level, mirroring the Purchase model pattern.
- Holds its own data sources, documents, chunks, embeddings, and graph nodes.
- Has its own slug, unique per (ownerType, userId|organizationId).
- Has its own RAG config — model, chunking strategy, top-k retrieval — at KnowledgeSpace.ragConfig (Beta).
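The exactly-one-owner rule can also be mirrored in application code before a row reaches the database. The following is a minimal sketch, not the module's actual validation: SpaceOwnerInput, OwnerType, and resolveOwnerType are hypothetical names, and the ownerType values are assumed. The DB-level XOR CHECK remains the authoritative guard.

```typescript
// Hypothetical input shape mirroring the two nullable owner columns.
interface SpaceOwnerInput {
  organizationId?: string | null;
  userId?: string | null;
}

// Assumed ownerType values; the real enum may be spelled differently.
type OwnerType = "organization" | "user";

// Returns the owner type when exactly one owner is set, otherwise throws.
function resolveOwnerType(input: SpaceOwnerInput): OwnerType {
  const hasOrg =
    typeof input.organizationId === "string" && input.organizationId.length > 0;
  const hasUser = typeof input.userId === "string" && input.userId.length > 0;

  // XOR: both set or both missing is invalid.
  if (hasOrg === hasUser) {
    throw new Error(
      "A knowledge space needs exactly one owner: organizationId or userId",
    );
  }
  return hasOrg ? "organization" : "user";
}
```

Failing early in application code gives a readable error message; the CHECK constraint still catches any path that bypasses this helper.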
Routes:
| Scope | Path |
|---|---|
| Organization | /[orgSlug]/knowledge/[spaceSlug] |
| User (personal) | /knowledge/[spaceSlug] |
The same dashboard UI is rendered for both scopes; the ownerType is inferred from the URL.
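Inferring the scope from the URL could look roughly like this. It is a sketch under assumptions: KnowledgeRoute and parseKnowledgeRoute are illustrative names, and the real router may resolve these segments differently (for example through framework route params).

```typescript
// Result of parsing a dashboard path; field names are illustrative.
type KnowledgeRoute =
  | { ownerType: "organization"; orgSlug: string; spaceSlug: string }
  | { ownerType: "user"; spaceSlug: string };

// Parses "/[orgSlug]/knowledge/[spaceSlug]" or "/knowledge/[spaceSlug]".
// Returns null for any path that matches neither pattern.
function parseKnowledgeRoute(pathname: string): KnowledgeRoute | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const [first, second, third] = segments;

  // Personal scope: exactly two segments, the first being "knowledge".
  if (segments.length === 2 && first === "knowledge" && second !== undefined) {
    return { ownerType: "user", spaceSlug: second };
  }
  // Organization scope: three segments with "knowledge" in the middle.
  if (
    segments.length === 3 &&
    first !== undefined &&
    second === "knowledge" &&
    third !== undefined
  ) {
    return { ownerType: "organization", orgSlug: first, spaceSlug: third };
  }
  return null;
}
```

The segment count is what distinguishes the two scopes, so a two-segment path starting with knowledge is always treated as personal.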
Data model
| Model | Purpose |
|---|---|
| KnowledgeSpace | Top-level container; XOR owner (user or org). |
| DataSource | Source descriptor (file upload, URL, future connector). Holds sync config and credential ref. |
| IngestionJob | One ingest run. Tracks QUEUED → RUNNING → SUCCEEDED / FAILED, processed/failed counts, error message. |
| KnowledgeDocument | A parsed source (one PDF, one URL, one DOCX). Stores extracted text + checksum. |
| KnowledgeChunk | A retrievable text slice — chunkIndex, text, tokenCount, embedding (Json vector). |
| GraphNode | An entity / concept extracted from chunks for GraphRAG. Carries canonicalName and nodeType. |
| GraphEdge | A relation between two GraphNodes with relationType, weight, and evidenceChunkId. |
All seven are namespaced by knowledgeSpaceId. Every retrieval scopes to a single space — cross-space reads are not allowed (Invariant 5 extended to Knowledge).
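The IngestionJob lifecycle in the table can be read as a small state machine. The sketch below is illustrative only: the status names come from the table, but the FAILED to QUEUED retry edge is an assumption (the page says retry exists, not how it is modeled), and canTransition / advance are hypothetical helpers.

```typescript
type JobStatus = "QUEUED" | "RUNNING" | "SUCCEEDED" | "FAILED";

// Allowed transitions. FAILED -> QUEUED models a retry and is an assumption.
const ALLOWED_TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  QUEUED: ["RUNNING"],
  RUNNING: ["SUCCEEDED", "FAILED"],
  SUCCEEDED: [],
  FAILED: ["QUEUED"],
};

// True when moving from `from` to `to` is a legal step.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Returns the next status, or throws on an illegal step.
function advance(current: JobStatus, next: JobStatus): JobStatus {
  if (!canTransition(current, next)) {
    throw new Error(`Illegal transition ${current} -> ${next}`);
  }
  return next;
}
```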
End-to-end flow
DataSource (file or URL)
      │
      ▼
IngestionJob ──► parsers ──► KnowledgeDocument ──► chunker ──► KnowledgeChunk (text + embedding)
                                                                     │
                                                 ┌───────────────────┴─────────┐
                                                 ▼                             ▼
                                          ask / askStream            buildGraphFromChunks
                                         (retrieval → LLM)          (GraphNode + GraphEdge)

1. DataSource is created (createSource for connectors, ingestFile / ingestUrl for ad-hoc uploads).
2. IngestionJob is enqueued. The worker parses the source (packages/api/modules/knowledge/lib/parsers.ts), splits into chunks (chunking.ts), embeds each chunk, and writes the rows.
3. Retrieval at query time: embed the question, fetch the top-k chunks by cosine similarity in retrieval.ts, and pass them as numbered context to the LLM in rag-pipeline.ts.
4. The answer includes sources — the documents the cited chunks came from.
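The query-time retrieval step can be sketched as a brute-force cosine ranking followed by numbered context formatting. This is an in-memory illustration, not the contents of retrieval.ts or rag-pipeline.ts: the types and function names are hypothetical, and a production implementation may use a vector index rather than scoring every chunk.

```typescript
// Minimal chunk shape for this sketch; the real KnowledgeChunk has more fields.
interface EmbeddedChunk {
  id: string;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: EmbeddedChunk;
  score: number;
}

// Cosine similarity of two equal-length vectors; 0 when either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force top-k: score every chunk, sort descending, keep the first k.
function topKChunks(query: number[], chunks: EmbeddedChunk[], k: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, Math.max(0, k));
}

// Formats hits as numbered context so the model can cite passages as [1], [2], ...
function buildNumberedContext(hits: ScoredChunk[]): string {
  return hits.map((hit, i) => `[${i + 1}] ${hit.chunk.text}`).join("\n\n");
}
```

Numbering the passages is what lets the answer's citations be mapped back to chunks, and from there to the documents returned as sources.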
For GraphRAG, step 2 also runs buildGraphFromChunks (LLM-based entity resolution + relation typing). Retrieval at step 3 then has the option of traversing the graph instead of (or in addition to) vector similarity.
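As a rough illustration of graph traversal at retrieval time, the sketch below expands outward from seed nodes and collects the chunks that evidence each edge it crosses. Everything here is assumed for the example: fromNodeId and toNodeId are hypothetical column names, edges are treated as undirected, and the hop limit and weight threshold are illustrative parameters rather than documented options.

```typescript
// Minimal edge shape. weight and evidenceChunkId come from the data model table;
// fromNodeId / toNodeId are hypothetical names for the two endpoints.
interface EdgeLike {
  fromNodeId: string;
  toNodeId: string;
  weight: number;
  evidenceChunkId: string;
}

// Breadth-first expansion from seed nodes for up to maxHops hops.
// Returns evidence chunk ids in first-seen order, without duplicates.
function collectEvidenceChunkIds(
  seedNodeIds: string[],
  edges: EdgeLike[],
  maxHops: number,
  minWeight = 0,
): string[] {
  const visited = new Set<string>(seedNodeIds);
  const evidence = new Set<string>();
  let frontier = new Set<string>(seedNodeIds);

  for (let hop = 0; hop < maxHops && frontier.size > 0; hop++) {
    const next = new Set<string>();
    for (const edge of edges) {
      if (edge.weight < minWeight) continue; // skip weak relations
      const touchesFrontier =
        frontier.has(edge.fromNodeId) || frontier.has(edge.toNodeId);
      if (!touchesFrontier) continue;

      evidence.add(edge.evidenceChunkId);
      // Queue any endpoint not seen before for the next hop.
      for (const nodeId of [edge.fromNodeId, edge.toNodeId]) {
        if (!visited.has(nodeId)) {
          visited.add(nodeId);
          next.add(nodeId);
        }
      }
    }
    frontier = next;
  }
  return Array.from(evidence);
}
```

The returned chunk ids could then be merged with the vector-similarity hits before building the LLM context.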
The ask endpoint
const result = await orpc.knowledge.ask.call({
  spaceId: "ks_…",
  question: "How do I reset my password?",
  topK: 5,             // number of chunks to retrieve; default 5
  includeGraph: false, // true to mix in GraphRAG hits
});

Returns { answer, sources: KnowledgeDocument[], chunks: KnowledgeChunk[] }. Cite sources in the UI; show chunks[].text snippets if you want a "view passages" affordance.
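One way to build a "view passages" affordance from that response is to group chunk snippets under the document they came from. The sketch below assumes two field names that this page does not confirm: a title on the document and a documentId on the chunk linking it back to its source. Treat both as hypothetical and check the real types.

```typescript
// Minimal shapes for this sketch. `title` and `documentId` are hypothetical
// field names, not confirmed fields of KnowledgeDocument / KnowledgeChunk.
interface SourceDoc {
  id: string;
  title: string;
}
interface PassageChunk {
  documentId: string;
  chunkIndex: number;
  text: string;
}
interface SourcePassages {
  documentId: string;
  title: string;
  snippets: string[];
}

// Shortens long passages for display, marking the cut with an ellipsis.
function truncateSnippet(text: string, maxLength: number): string {
  return text.length > maxLength ? text.slice(0, maxLength) + "…" : text;
}

// Groups chunks under their source document, in document order,
// with each document's snippets sorted by chunkIndex.
function groupPassages(
  sources: SourceDoc[],
  chunks: PassageChunk[],
  maxSnippetLength = 160,
): SourcePassages[] {
  return sources.map((doc) => ({
    documentId: doc.id,
    title: doc.title,
    snippets: chunks
      .filter((c) => c.documentId === doc.id)
      .sort((a, b) => a.chunkIndex - b.chunkIndex)
      .map((c) => truncateSnippet(c.text, maxSnippetLength)),
  }));
}
```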
For streaming, use askStream. It emits the same payload as server-sent events, so you can render the answer incrementally as it is generated.
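If you consume the stream without a client library, the server-sent events wire format has to be split into events by hand. The parser below is a generic sketch of that format (events separated by a blank line, payload carried on data: lines); SseParser is a hypothetical helper, and the shape of each askStream payload is not specified on this page, so what you do with each data string is left open.

```typescript
// Incremental parser for the text/event-stream wire format.
// Feed it decoded text fragments; it returns the data payload of each
// event completed so far. Handles LF and CRLF line endings only.
class SseParser {
  private buffer = "";

  push(fragment: string): string[] {
    // Normalize on the whole buffer so a CRLF split across fragments is handled.
    this.buffer = (this.buffer + fragment).replace(/\r\n/g, "\n");
    const events: string[] = [];

    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);

      // Keep only data lines; comments (": ...") and other fields are ignored.
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""));

      // Multiple data lines in one event are joined with newlines.
      if (dataLines.length > 0) events.push(dataLines.join("\n"));
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

Buffering is the important part: network reads do not align with event boundaries, so an event is only emitted once its terminating blank line has arrived.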
Privacy and tenant isolation
- Each ask call is scoped to exactly one KnowledgeSpace.
- The org / user check is done by requireKnowledgeSpaceAccess (packages/api/modules/knowledge/lib/access.ts).
- Retrieval never crosses spaces, even within the same organization.
- Chunks are sent to the configured LLM provider (OpenAI by default). Per-space model selection is Beta.
- Source files are stored in object storage; chunks + embeddings live in Postgres alongside the rest of the schema.
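The isolation rules above amount to two checks: the caller must own (or belong to the owner of) the space, and every read must be filtered by that one space. The sketch below illustrates the idea only. It is not the code in access.ts; canAccessSpace, scopedChunkFilter, and the CallerContext shape are all hypothetical.

```typescript
// Ownership columns of a space: exactly one of the two ids is non-null.
interface SpaceOwnership {
  id: string;
  organizationId: string | null;
  userId: string | null;
}

// Hypothetical session shape: who is calling and which orgs they belong to.
interface CallerContext {
  userId: string;
  organizationIds: string[];
}

// Personal spaces are visible to their owner only; organization spaces
// are visible to members of that organization.
function canAccessSpace(space: SpaceOwnership, caller: CallerContext): boolean {
  if (space.userId !== null) return space.userId === caller.userId;
  if (space.organizationId !== null) {
    return caller.organizationIds.includes(space.organizationId);
  }
  return false; // no owner set: deny by default
}

// Produces the filter every chunk query should carry, refusing before any read.
function scopedChunkFilter(
  space: SpaceOwnership,
  caller: CallerContext,
): { knowledgeSpaceId: string } {
  if (!canAccessSpace(space, caller)) throw new Error("Forbidden");
  return { knowledgeSpaceId: space.id };
}
```

Because the filter is derived from a single space id, a member of an organization still cannot read across two spaces of that organization in one query.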
For DPA / SOC 2 scope, see Security & Compliance.
When to use Knowledge RAG
- Internal Q&A — onboarding docs, runbooks, policies.
- Product knowledge base — FAQs, support articles, integration guides.
- Compliance and audit — questions where the answer must cite a specific document.
When not to use it
- Product catalog Q&A — use AI answers on top of storefront search instead. Knowledge RAG doesn't see your SearchIndex documents.
- Live data — Knowledge sources are snapshots taken at ingest time. For freshness, schedule recurring ingestion jobs or wait for the connectors roadmap.
- Numerical computation — RAG is a retrieval layer, not a calculator. Don't ask it for live inventory counts.
- Adversarial or PII-heavy content — see Security docs before ingesting; per-tenant model isolation is on the roadmap, not shipped today.
Cost shape
Every ask, askStream, and ingest job is metered through the AI Wallet (Invariant 8). Each call reserves credits up front and commits them when it completes; failed calls release the reservation. Rates live in packages/api/modules/entitlements/credit-rates.ts.
Watch the Activity tab for ai_overage_reached events when running large bulk ingests — the per-document embedding cost dominates ingest spend.
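The reserve-then-commit pattern used for metering can be sketched as follows. This is a toy in-memory model, not the AI Wallet implementation: CreditWallet, withMeteredCall, and the choice to charge the actual cost on commit (rather than the reserved estimate) are all assumptions made for illustration.

```typescript
// Toy in-memory wallet: credits are held at reserve time and only
// deducted from the balance when the call commits.
class CreditWallet {
  private balance: number;
  private reserved = 0;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Credits not yet spent and not held by an in-flight call.
  available(): number {
    return this.balance - this.reserved;
  }

  reserve(amount: number): void {
    if (amount > this.available()) throw new Error("Insufficient credits");
    this.reserved += amount;
  }

  // Drops the hold and charges what the call actually cost (assumption).
  commit(reservedAmount: number, actualCost: number): void {
    this.reserved -= reservedAmount;
    this.balance -= actualCost;
  }

  // Drops the hold without charging anything.
  release(reservedAmount: number): void {
    this.reserved -= reservedAmount;
  }
}

interface MeteredOutcome<T> {
  result: T;
  cost: number;
}

// Reserve before the call, commit on success, release on failure.
async function withMeteredCall<T>(
  wallet: CreditWallet,
  estimate: number,
  run: () => Promise<MeteredOutcome<T>>,
): Promise<T> {
  wallet.reserve(estimate);
  let outcome: MeteredOutcome<T>;
  try {
    outcome = await run();
  } catch (err) {
    wallet.release(estimate); // failed calls give the reservation back
    throw err;
  }
  wallet.commit(estimate, outcome.cost);
  return outcome.result;
}
```

Reserving first means concurrent calls cannot collectively overspend: each one sees the balance minus everything already held by calls still in flight.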
Related pages
- Sources — file upload, URL ingest, supported types, sync behavior
- Evaluation — measuring answer quality
- GraphRAG — multi-document reasoning over the same chunks
- AI Search — answer surface on top of storefront search
- Plans and limits — AI wallet rates and quotas
- Security & Compliance