Semantic Search
Vector-based retrieval that matches by meaning instead of exact tokens — and how to combine it with keyword search.
Semantic search retrieves documents whose vector embeddings are closest to the query embedding. It complements keyword search: where keyword search fails on synonyms or paraphrasing ("running shoes" vs "sport sneakers"), semantic search closes the gap. The two modes can also be combined into hybrid search, which usually outperforms either one alone.
How it works
- Each searchable text field is embedded at ingest time. The embedding is stored as a float[] vector field on the document (see Index schema).
- At query time, the user's query is embedded by the same model.
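- The engine ranks document vectors by cosine distance (or inner product) to the query vector and returns the nearest hits first. The lookup goes through an HNSW index (see hnsw_params under Schema requirements), so results are approximate nearest neighbours rather than the output of an exhaustive scan.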
- Hybrid mode runs both keyword and vector retrieval and merges the two result lists using a weight you set per query (alpha; see Hybrid under Querying).
The underlying engine is Typesense's vector_query operator; @repo/search exposes it through formatVectorQuery() and generateEmbedding() (see packages/search/lib/embeddings.ts).
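To make the ranking step concrete, here is a minimal brute-force sketch of cosine-distance top-k over in-memory vectors. It is illustrative only: the names Doc, cosineSimilarity, and topK are hypothetical and not part of @repo/search, and a real engine answers the same question through an index rather than a full scan.

```typescript
// Hypothetical types and helpers for illustration; not the engine's implementation.
type Doc = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`expected vector of length ${a.length}, got ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Returns the k closest documents, smallest cosine distance first.
function topK(
  query: number[],
  docs: Doc[],
  k: number,
): { id: string; distance: number }[] {
  return docs
    .map((d) => ({ id: d.id, distance: 1 - cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

Cosine distance here is 1 minus cosine similarity, so identical directions score 0 and orthogonal vectors score 1.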
Status
Semantic search is Beta:
| Capability | Status |
|---|---|
| Vector field at ingest | ✅ Available (declare a float[] field with num_dim) |
| vector_query at search time | ✅ Available |
| Auto-embed on upsertDocument / bulkUpsert | 🟡 Beta, controlled by an org-level flag |
| Hybrid (keyword + vector) ranking | 🟡 Beta |
| Custom embedding model per Knowledge space | 🟡 Beta (KnowledgeSpace.ragConfig.embeddingModel) |
| Per-organization fine-tuned model | ⏳ Roadmap (Enterprise) |
Treat the schema and request shape as stable; treat per-org tuning knobs as subject to change.
Schema requirements
Add a vector field to the index schema:
await orpc.search.createIndex.call({
organizationId: "org_…",
slug: "products",
fields: [
{ name: "id", type: "string" },
{ name: "title", type: "string", sort: true },
{ name: "description", type: "string" },
{ name: "embedding", type: "float[]", num_dim: 1536, vec_dist: "cosine" },
],
});
Picking values:
- num_dim must match the embedding model (1536 for text-embedding-3-small, 3072 for text-embedding-3-large). Wrong dim → ingest fails with "expected vector of length X, got Y".
- vec_dist is "cosine" by default; switch to "ip" (inner product) only if you have a model that requires it.
- hnsw_params: tune ef_construction and M only after benchmarking. Defaults are sane.
The embedding field name is conventional; the engine doesn't care what you call it as long as you reference it in vector_query.
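Because a dimension mismatch only surfaces at ingest, it can be worth checking vectors before sending them. The sketch below is one way to do that; MODEL_DIMS and assertEmbeddingDim are hypothetical helpers, and the two dimensions are the ones listed above.

```typescript
// Hypothetical pre-flight check; dimensions taken from the list above.
const MODEL_DIMS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

function assertEmbeddingDim(model: string, vector: number[]): void {
  const expected = MODEL_DIMS[model];
  if (expected === undefined) {
    throw new Error(`unknown embedding model: ${model}`);
  }
  if (vector.length !== expected) {
    throw new Error(`expected vector of length ${expected}, got ${vector.length}`);
  }
  // NaN or Infinity usually points at a bug upstream of the index.
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
}
```

Calling this just before upsertDocument turns a server-side ingest failure into a local error that names the model and the length it expected.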
Ingesting documents with embeddings
Two options:
Option 1 — server-side auto-embed (Beta)
Set the per-org auto-embed flag on the AI feature config; the worker calls generateEmbedding() on the configured text fields and writes the vector before forwarding to Typesense. Beta because the embedding model is still configurable only by the platform.
Option 2 — client-side embedding
Compute the embedding yourself and pass it as a regular field through upsertDocument / bulkUpsert:
await orpc.search.upsertDocument.call({
organizationId: "org_…",
indexSlug: "products",
document: {
id: "product-123",
title: "Wireless Headphones",
description: "Noise-cancelling over-ear headphones…",
embedding: [0.0123, -0.0456, …, 0.0789], // 1536 floats
},
});
Same DB-first ingest path (Invariant 2). The vector is opaque to the buffer; the worker writes whatever you provided.
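How you compute the vector depends on your provider. As a sketch, assuming you call the OpenAI embeddings REST endpoint directly (an assumption; this page does not say which provider the platform uses), a small helper could look like this. embedText and HttpPost are hypothetical names, and the HTTP function is passed in so the helper can be exercised without network access.

```typescript
// ASSUMPTION: the OpenAI embeddings REST endpoint is the provider.
// embedText and HttpPost are illustrative names, not part of @repo/search.
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function embedText(
  text: string,
  apiKey: string,
  post: HttpPost,
  model = "text-embedding-3-small",
): Promise<number[]> {
  const res = await post("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) {
    throw new Error(`embedding request failed with status ${res.status}`);
  }
  // The response carries one entry per input; a single input means index 0.
  const payload = (await res.json()) as { data?: { embedding?: number[] }[] };
  const vector = payload.data?.[0]?.embedding;
  if (!Array.isArray(vector)) {
    throw new Error("embedding response did not contain a vector");
  }
  return vector;
}
```

In a runtime with a global fetch, passing fetch as the post argument should satisfy this shape. Use the same model for documents and queries, since vectors from different models are not comparable.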
Querying
Vector-only
const res = await fetch("/api/search", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${searchKey}`,
},
body: JSON.stringify({
indexSlug: "products",
q: "*",
vectorQuery: "embedding:([0.01, -0.05, …], k:20)",
}),
});
vector_query takes the literal vector and a k value (how many nearest neighbours to consider). Use q: "*" so the keyword side is a no-op.
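Assembling that string by hand is error-prone with 1536 floats. A minimal builder might look like the following; buildVectorQuery is a hypothetical helper written for this page, not the formatVectorQuery() exported by @repo/search, and it also accepts the optional alpha and distance_threshold parameters used by hybrid queries.

```typescript
// Hypothetical helper; output follows the string format shown in the examples on this page.
interface VectorQueryOptions {
  k: number;
  alpha?: number;
  distanceThreshold?: number;
}

function buildVectorQuery(
  field: string,
  vector: number[],
  opts: VectorQueryOptions,
): string {
  if (!Number.isInteger(opts.k) || opts.k <= 0) {
    throw new Error("k must be a positive integer");
  }
  if (opts.alpha !== undefined && (opts.alpha < 0 || opts.alpha > 1)) {
    throw new Error("alpha must be between 0 and 1");
  }
  const params = [`k:${opts.k}`];
  if (opts.distanceThreshold !== undefined) {
    params.push(`distance_threshold:${opts.distanceThreshold}`);
  }
  if (opts.alpha !== undefined) {
    params.push(`alpha:${opts.alpha}`);
  }
  return `${field}:([${vector.join(", ")}], ${params.join(", ")})`;
}
```

For example, buildVectorQuery("embedding", [0.01, -0.05], { k: 20 }) yields embedding:([0.01, -0.05], k:20). Note that JavaScript prints very small magnitudes in exponent notation (such as 1e-7), so confirm the engine accepts that form before relying on it.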
Hybrid (keyword + vector)
{
"indexSlug": "products",
"q": "running shoes",
"queryBy": "title,description",
"vectorQuery": "embedding:([…], k:50, distance_threshold:0.7, alpha:0.7)"
}
- alpha blends the two scores: 0 is pure keyword, 1 is pure vector. Common starting point: 0.4–0.6.
- distance_threshold is the maximum cosine distance to consider; useful for filtering out semantically unrelated documents.
Hybrid usually beats either mode alone on noisy or ambiguous queries. On exact-string queries (SKUs, brand names), keyword alone is faster and just as accurate.
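To build intuition for what alpha does, the sketch below blends two ranked lists with a weighted reciprocal-rank score. This is an illustration of the weighting idea only, under the assumption that each list contributes 1/rank; it is not a description of how the engine computes its own fusion score, and fuseRankings is a hypothetical name.

```typescript
// Illustrative weighted reciprocal-rank blend; NOT the engine's actual formula.
// Assumes each id appears at most once per list.
function fuseRankings(
  keywordIds: string[],
  vectorIds: string[],
  alpha: number,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  // The keyword side is weighted by (1 - alpha); rank 1 contributes the most.
  keywordIds.forEach((id, i) => {
    scores.set(id, (scores.get(id) ?? 0) + (1 - alpha) / (i + 1));
  });
  // The vector side is weighted by alpha.
  vectorIds.forEach((id, i) => {
    scores.set(id, (scores.get(id) ?? 0) + alpha / (i + 1));
  });
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With alpha at 0 the keyword order wins outright, with alpha at 1 the vector order does, and values in between reward documents that rank well in both lists.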
Filters
filterBy, facetBy, and sortBy work the same as keyword search. A common pattern is to use vector search to widen recall and filterBy to enforce business rules:
{
"q": "*",
"vectorQuery": "embedding:([…], k:100)",
"filterBy": "availability:=in_stock && price:<100"
}
When semantic search helps
- Paraphrasing. "running shoes" ↔ "sport sneakers", "wireless earbuds" ↔ "bluetooth headphones".
- Multilingual catalogs. Embeddings from multilingual models bridge across locales without per-language synonym rules.
- Long-form queries. Users who type a full sentence often use words that do not match the catalog vocabulary.
- No-results recovery. When keyword search returns 0 hits, run a vector pass as a fallback (see No-results loop).
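The no-results fallback in the last bullet can be sketched as a small wrapper. The two search functions are injected, so this makes no assumption about the request shape; searchWithFallback, Hit, and SearchFn are hypothetical names.

```typescript
// Hypothetical wrapper: keyword first, vector pass only when keyword finds nothing.
type Hit = { id: string };
type SearchFn = (query: string) => Promise<Hit[]>;

async function searchWithFallback(
  query: string,
  keywordSearch: SearchFn,
  vectorSearch: SearchFn,
): Promise<{ mode: "keyword" | "vector"; hits: Hit[] }> {
  const keywordHits = await keywordSearch(query);
  if (keywordHits.length > 0) {
    return { mode: "keyword", hits: keywordHits };
  }
  // Zero keyword hits: pay for the embedding call and try semantic recall.
  const vectorHits = await vectorSearch(query);
  return { mode: "vector", hits: vectorHits };
}
```

Returning the mode alongside the hits lets the UI or analytics tell a direct match apart from a recovered one, and the embedding cost is only incurred on the fallback path.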
When semantic search hurts
- Exact-match queries. SKUs, model numbers, brand names. Keyword search is faster, deterministic, and not subject to embedding drift.
- Cold catalogs. Indexes with fewer than ~50 documents don't have enough signal for vectors to outperform keyword.
- Tight latency budgets. Vector queries are usually fast, but text-embedding-3-large query-time embeddings add 100–300 ms; budget for it.
- Compliance lookups. When the user must see the exact source ("what does clause 4.2 say"), keyword + curation is auditable in a way vectors are not.
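One way to act on the exact-match guidance is to route identifier-like queries to keyword search and skip the embedding call. The heuristic below is a deliberately crude assumption (a single token that contains a digit), not a rule from the platform, and it will not catch brand names; pickSearchMode is a hypothetical name.

```typescript
// Hypothetical routing heuristic; tune or replace it for your own catalog.
type SearchMode = "keyword" | "hybrid";

function pickSearchMode(query: string): SearchMode {
  const trimmed = query.trim();
  // One token made of letters, digits, and common SKU separators...
  const singleToken = /^[A-Za-z0-9][A-Za-z0-9._\/-]*$/.test(trimmed);
  // ...that contains at least one digit looks like a SKU or model number.
  const hasDigit = /\d/.test(trimmed);
  return singleToken && hasDigit ? "keyword" : "hybrid";
}
```

Anything that does not look like an identifier falls through to hybrid, so a misclassification costs some latency rather than recall.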
Cost shape
The query-time embedding call is metered through the AI Wallet (CREDIT_RATES.embedding_query). Bulk ingest embedding has its own rate (CREDIT_RATES.embedding_ingest). Insufficient balance → 402 Payment Required (Invariant 6 still applies — the upstream embedding-provider error is mapped to a typed JSON error).
For sustained semantic load, watch the Activity tab in the dashboard for embedding_cost_exceeded events and adjust the per-org budget.
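On the client, it helps to separate an exhausted wallet from other failures so the UI can fall back to keyword search or prompt for a top-up. The sketch below only relies on the 402 status described above; the shape of the typed JSON error is not specified on this page, so the body is passed through untouched. classifySearchResponse and SearchOutcome are hypothetical names.

```typescript
// Hypothetical client-side classification; the error body shape is not assumed.
type ResponseLike = { ok: boolean; status: number; json(): Promise<unknown> };

type SearchOutcome =
  | { kind: "ok"; body: unknown }
  | { kind: "insufficient_credits"; details: unknown }
  | { kind: "error"; status: number; details: unknown };

async function classifySearchResponse(res: ResponseLike): Promise<SearchOutcome> {
  // A body that fails to parse is reported as null rather than thrown.
  const body = await res.json().catch(() => null);
  if (res.status === 402) {
    return { kind: "insufficient_credits", details: body };
  }
  if (!res.ok) {
    return { kind: "error", status: res.status, details: body };
  }
  return { kind: "ok", body };
}
```

The Response returned by fetch should satisfy ResponseLike, so the result of the fetch calls shown under Querying can be passed in directly.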
Related pages
- AI Search overview
- AI answers — the answer panel built on top of search hits
- Index schema — vector field declaration
- Public search endpoint — keyword-side request shape
- Multi-search and querying — batching semantic + keyword searches in one round-trip