Semantic Search

Vector-based retrieval that matches by meaning instead of exact tokens — and how to combine it with keyword search.

Semantic search retrieves documents whose vector embeddings are closest to the query embedding. It complements keyword search: where keyword search fails on synonyms or paraphrasing ("running shoes" vs "sport sneakers"), semantic search closes the gap. The two modes can also be combined into hybrid search, which usually outperforms either one alone.

How it works

Each searchable text field is embedded at ingest time. The embedding is stored as a float[] vector field on the document (see Index schema).
At query time, the user's query is embedded by the same model.
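The engine ranks document vectors by cosine distance (or inner product) to the query vector and returns the nearest hits first. Search runs over an HNSW index (see hnsw_params under Schema requirements), so the ranking is approximate rather than an exhaustive scan of every vector.
Hybrid mode runs both keyword and vector retrieval and merges the two result lists using a configurable weight (alpha, described under Querying).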
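
To make the ranking step concrete, here is a minimal brute-force sketch of nearest-neighbour ranking by cosine distance. It is illustrative only: the Doc type and the cosineSimilarity and topK helpers are hypothetical names, not part of @repo/search, and the engine itself may use an approximate index rather than this exhaustive scan.

```typescript
// Hypothetical in-memory document shape, for illustration only.
type Doc = { id: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`expected vector of length ${a.length}, got ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every document by cosine distance (1 - similarity), nearest first.
function topK(query: number[], docs: Doc[], k: number): { id: string; distance: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, distance: 1 - cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

A smaller distance means a closer match, which is why hits are sorted in ascending order.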

The underlying engine is Typesense's vector_query operator; @repo/search exposes it through formatVectorQuery() and generateEmbedding() (see packages/search/lib/embeddings.ts).

Status

Semantic search is in Beta overall; individual capabilities are at different stages:

Capability	Status
Vector field at ingest	✅ Available (declare a `float[]` field with `num_dim`)
`vector_query` at search time	✅ Available
Auto-embed on `upsertDocument` / `bulkUpsert`	🟡 Beta — controlled by an org-level flag
Hybrid (keyword + vector) ranking	🟡 Beta
Custom embedding model per Knowledge space	🟡 Beta (`KnowledgeSpace.ragConfig.embeddingModel`)
Per-organization fine-tuned model	⏳ Roadmap (Enterprise)

Treat the schema and request shape as stable; treat per-org tuning knobs as subject to change.

Schema requirements

Add a vector field to the index schema:

await orpc.search.createIndex.call({
  organizationId: "org_…",
  slug: "products",
  fields: [
    { name: "id", type: "string" },
    { name: "title", type: "string", sort: true },
    { name: "description", type: "string" },
    { name: "embedding", type: "float[]", num_dim: 1536, vec_dist: "cosine" },
  ],
});

Picking values:

num_dim must match the output dimension of the embedding model (1536 for text-embedding-3-small, 3072 for text-embedding-3-large). If the dimension is wrong, ingest fails with an error of the form "expected vector of length X, got Y".
vec_dist is "cosine" by default; switch to "ip" (inner product) only if you have a model that requires it.
hnsw_params — tune ef_construction and M only after benchmarking. Defaults are sane.

The embedding field name is conventional; the engine doesn't care what you call it as long as you reference it in vector_query.
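
Because a dimension mismatch only surfaces at ingest, it can help to check vectors before sending them. The sketch below is one way to do that; MODEL_DIMENSIONS and assertEmbeddingDim are hypothetical names, and the table contains only the two models mentioned above.

```typescript
// Output dimensions for the models named in this page (extend as needed).
const MODEL_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

// Throws if the vector length does not match the model's expected dimension.
function assertEmbeddingDim(model: string, embedding: number[]): void {
  const expected = MODEL_DIMENSIONS[model];
  if (expected === undefined) {
    throw new Error(`unknown embedding model: ${model}`);
  }
  if (embedding.length !== expected) {
    throw new Error(`expected vector of length ${expected}, got ${embedding.length}`);
  }
}
```

The value you look up here should be the same one you declare as num_dim on the vector field.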

Ingesting documents with embeddings

Two options:

Option 1 — server-side auto-embed (Beta)

Set the per-org auto-embed flag on the AI feature config; the worker calls generateEmbedding() on the configured text fields and writes the vector before forwarding the document to Typesense. This option is Beta because, for now, only the platform can configure the embedding model.

Option 2 — client-side embedding

Compute the embedding yourself and pass it as a regular field through upsertDocument / bulkUpsert:

await orpc.search.upsertDocument.call({
  organizationId: "org_…",
  indexSlug: "products",
  document: {
    id: "product-123",
    title: "Wireless Headphones",
    description: "Noise-cancelling over-ear headphones…",
    embedding: [0.0123, -0.0456, …, 0.0789],  // 1536 floats
  },
});

This goes through the same DB-first ingest path as any other document (Invariant 2). The vector is opaque to the buffer; the worker writes whatever you provided.
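
A minimal sketch of the client-side flow, assuming you already have some function that turns text into a vector. The Product type, the EmbedFn signature, and withEmbedding are hypothetical; which fields you concatenate and how you call your embedding provider are up to you.

```typescript
// Hypothetical source record and embedder signature, for illustration only.
type Product = { id: string; title: string; description: string };
type EmbedFn = (text: string) => Promise<number[]>;

// Embeds the searchable text and returns a document ready for upsertDocument.
async function withEmbedding(
  product: Product,
  embed: EmbedFn,
  numDim: number,
): Promise<Product & { embedding: number[] }> {
  // Assumption: title and description are the fields worth embedding.
  const embedding = await embed(`${product.title}\n${product.description}`);
  if (embedding.length !== numDim) {
    throw new Error(`expected vector of length ${numDim}, got ${embedding.length}`);
  }
  return { ...product, embedding };
}
```

Passing the embedder in as a parameter keeps the helper independent of any particular provider and makes it easy to stub in tests.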

Querying

Vector-only

const res = await fetch("/api/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${searchKey}`,
  },
  body: JSON.stringify({
    indexSlug: "products",
    q: "*",
    vectorQuery: "embedding:([0.01, -0.05, …], k:20)",
  }),
});

vectorQuery (Typesense's vector_query) takes the literal vector and a k value (how many nearest neighbours to consider). Use q: "*" so the keyword side is a no-op.
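
Building the query string by hand is error-prone, so a small helper can be useful. The sketch below is a hypothetical buildVectorQuery, not the formatVectorQuery() exported by @repo/search (whose signature is not shown here). It emits the same string shape as the request examples on this page; the optional alpha and distance_threshold parameters only matter in hybrid mode.

```typescript
// Hypothetical options bag; names mirror the vector_query parameters used on this page.
type VectorQueryOptions = { k: number; distanceThreshold?: number; alpha?: number };

// Serialises a vector and its parameters into a vector_query string.
function buildVectorQuery(field: string, vector: number[], opts: VectorQueryOptions): string {
  if (vector.length === 0 || !vector.every((x) => Number.isFinite(x))) {
    throw new Error("vector must be a non-empty array of finite numbers");
  }
  if (!Number.isInteger(opts.k) || opts.k < 1) {
    throw new Error("k must be a positive integer");
  }
  const params = [`k:${opts.k}`];
  if (opts.distanceThreshold !== undefined) {
    params.push(`distance_threshold:${opts.distanceThreshold}`);
  }
  if (opts.alpha !== undefined) {
    params.push(`alpha:${opts.alpha}`);
  }
  // Numbers are serialised with JavaScript's default number formatting.
  return `${field}:([${vector.join(", ")}], ${params.join(", ")})`;
}
```

For example, buildVectorQuery("embedding", [0.01, -0.05], { k: 20 }) returns "embedding:([0.01, -0.05], k:20)".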

Hybrid (keyword + vector)

{
  "indexSlug": "products",
  "q": "running shoes",
  "queryBy": "title,description",
  "vectorQuery": "embedding:([…], k:50, distance_threshold:0.7, alpha:0.7)"
}

alpha blends the two scores: 0 is pure keyword, 1 is pure vector. Common starting point: 0.4–0.6; the example above uses 0.7, which leans toward the vector side.
distance_threshold is the maximum cosine distance to consider — useful for filtering out semantically-unrelated documents.

Hybrid usually outperforms either mode alone on noisy or ambiguous queries. On exact-string queries (SKUs, brand names), keyword alone is faster and just as accurate.
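
To build intuition for what alpha does, here is a small sketch of weighted rank fusion over two result lists. This is an illustration under stated assumptions, not the engine's exact scoring formula: fuseByRank is a hypothetical helper, and the reciprocal-rank weighting is one common way to combine rankings.

```typescript
// Merges two ranked id lists. Each list contributes weight / rank to a document's score:
// the keyword list is weighted by (1 - alpha), the vector list by alpha.
function fuseByRank(keywordIds: string[], vectorIds: string[], alpha: number): string[] {
  if (alpha < 0 || alpha > 1) {
    throw new Error("alpha must be between 0 and 1");
  }
  const scores = new Map<string, number>();
  keywordIds.forEach((id, i) => {
    scores.set(id, (scores.get(id) ?? 0) + (1 - alpha) / (i + 1));
  });
  vectorIds.forEach((id, i) => {
    scores.set(id, (scores.get(id) ?? 0) + alpha / (i + 1));
  });
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

With alpha at 0 the keyword order wins outright, with alpha at 1 the vector order wins, and values in between reward documents that rank well in both lists.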

Filters

filterBy, facetBy, and sortBy work the same as keyword search. A common pattern is to use vector search to widen recall and filterBy to enforce business rules:

{
  "q": "*",
  "vectorQuery": "embedding:([…], k:100)",
  "filterBy": "availability:=in_stock && price:<100"
}

When semantic search helps

Paraphrasing. "running shoes" ↔ "sport sneakers", "wireless earbuds" ↔ "bluetooth headphones".
Multilingual catalogs. Embeddings from multilingual models bridge across locales without per-language synonym rules.
Long-form queries. Users who type a full sentence often use words that do not match the catalog vocabulary.
No-results recovery. When keyword search returns 0 hits, run a vector pass as a fallback (see No-results loop).
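
The no-results recovery pattern can be sketched as a thin wrapper around two search calls. Hit, SearchFn, and searchWithFallback are hypothetical names; in practice the two functions would wrap your keyword and vector requests to the search endpoint.

```typescript
// Hypothetical hit and search-function shapes, for illustration only.
type Hit = { id: string };
type SearchFn = (query: string) => Promise<Hit[]>;

// Runs keyword search first and only pays for a vector pass when it returns nothing.
async function searchWithFallback(
  query: string,
  keywordSearch: SearchFn,
  vectorSearch: SearchFn,
): Promise<{ mode: "keyword" | "vector"; hits: Hit[] }> {
  const keywordHits = await keywordSearch(query);
  if (keywordHits.length > 0) {
    return { mode: "keyword", hits: keywordHits };
  }
  return { mode: "vector", hits: await vectorSearch(query) };
}
```

Returning the mode alongside the hits lets the caller log how often the fallback fires, which is a useful signal for tuning synonyms or switching to hybrid by default.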

When semantic search hurts

Exact-match queries. SKUs, model numbers, brand names. Keyword search is faster, deterministic, and not subject to embedding drift.
Cold catalogs. Indexes with fewer than ~50 documents don't have enough signal for vectors to outperform keyword.
Tight latency budgets. Vector queries themselves are usually fast, but embedding the query with text-embedding-3-large adds 100–300 ms; budget for it.
Compliance lookups. When the user must see the exact source ("what does clause 4.2 say"), keyword + curation is auditable in a way vectors are not.
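
One way to act on these trade-offs is to route queries before searching. The sketch below is a rough heuristic, not a recommended rule set: pickSearchMode is a hypothetical helper, the "single token containing a digit" test is an assumption about what SKUs and model numbers look like, and brand names have to be supplied by the caller because they cannot be detected from shape alone.

```typescript
// Routes exact-looking queries to keyword search and everything else to hybrid.
function pickSearchMode(
  query: string,
  exactTerms: ReadonlySet<string> = new Set<string>(),
): "keyword" | "hybrid" {
  const q = query.trim();
  // Caller-supplied exact terms (for example brand names), compared case-insensitively.
  if (exactTerms.has(q.toLowerCase())) {
    return "keyword";
  }
  // Assumption: a single token with at least one digit looks like a SKU or model number.
  const looksLikeCode = /^[A-Za-z0-9._-]+$/.test(q) && /\d/.test(q);
  return looksLikeCode ? "keyword" : "hybrid";
}
```

Benchmark any such rule against real query logs before relying on it; a misrouted natural-language query loses the recall that hybrid search would have provided.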

Cost shape

The query-time embedding call is metered through the AI Wallet (CREDIT_RATES.embedding_query). Bulk ingest embedding has its own rate (CREDIT_RATES.embedding_ingest). Insufficient balance → 402 Payment Required (Invariant 6 still applies — the upstream embedding-provider error is mapped to a typed JSON error).

For sustained semantic load, watch the Activity tab in the dashboard for embedding_cost_exceeded events and adjust the per-org budget.
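
On the client, the 402 case is worth handling separately from other failures so the UI can degrade gracefully (for example by retrying with keyword-only search). A minimal sketch, assuming only the status code: the SearchOutcome type and readSearchResponse are hypothetical, and the shape of the typed JSON error body is not covered here, so it is deliberately left unparsed.

```typescript
// Hypothetical result type separating "out of credits" from other failures.
type SearchOutcome =
  | { kind: "ok"; body: unknown }
  | { kind: "insufficient_credits" }
  | { kind: "error"; status: number };

// Accepts anything Response-like, so it works with fetch() and with test doubles.
async function readSearchResponse(res: {
  status: number;
  ok: boolean;
  json(): Promise<unknown>;
}): Promise<SearchOutcome> {
  if (res.status === 402) {
    return { kind: "insufficient_credits" };
  }
  if (!res.ok) {
    return { kind: "error", status: res.status };
  }
  return { kind: "ok", body: await res.json() };
}
```

A caller can switch on kind and, for insufficient_credits, fall back to a request with q set and no vectorQuery, which avoids the query-time embedding call.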

Related

AI Search overview
AI answers — the answer panel built on top of search hits
Index schema — vector field declaration
Public search endpoint — keyword-side request shape
Multi-search and querying — batching semantic + keyword searches in one round-trip
