AACsearch
Migrationsanleitungen

From database search (SQL LIKE / FTS)

Migrating from `LIKE %query%`, MySQL fulltext, PostgreSQL `tsvector`, or any other database-native search to AACsearch.

From database search

If your search today is WHERE name LIKE '%query%', MySQL fulltext indexes, or PostgreSQL tsvector, this is your guide. The pattern below is the most general migration template — most other migration guides on this site are specializations of it.

This isn't a marketing page. The honest reasons to move:

  • LIKE %query% doesn't use an index. Once you have more than ~100 k rows, search becomes the slowest query in your application.
  • MySQL fulltext is decent but has limited tokenizer control and no typo tolerance.
  • PostgreSQL tsvector is good for static corpora but rewriting an index on every update gets expensive.
  • All three have effectively no facets, no curated rankings, no analytics out of the box.

If your corpus is small (< 10 k rows) and you have no plans to grow, don't migrate. Database search is fine. Add a GIN index on the tsvector column and call it a day. AACsearch is for when you've outgrown the database's ability to be fast, helpful, and observable about search.

Mapping

The general shape:

DB table row     →   Document
DB column        →   Field
DB index/FTS     →   `query_by` (which columns to search)
WHERE clauses    →   `filter_by`
ORDER BY         →   `sort_by` / `default_sorting_field`

A worked example for a products table:

DB columnTypeAACsearch field
idbigintid (string — Typesense uses string ids)
namevarcharname (string, query_by)
descriptiontextdescription (string, query_by)
tags (jsonb)jsonbtags (string[], query_by, also for filter)
price_centsbigintprice (int64, sort_by, filter_by)
category_idbigintcategoryId (string, filter_by)
in_stockboolinStock (bool, filter_by)
created_attimestampcreatedAt (int64 epoch, sort_by)
tenant_idbiginttenantId (string, always in filter_by)

Two general rules:

  1. IDs are strings in Typesense. Cast your numeric ids to strings.
  2. Time is integers, not strings. Epoch seconds or milliseconds; never ISO string. Filter and sort by time only works with numeric.

Schema definition

import { client } from "@repo/api/client";

await client.searchIndex.create.call({
	slug: "products",
	fields: [
		{ name: "name", type: "string" },
		{ name: "description", type: "string", optional: true },
		{ name: "tags", type: "string[]", facet: true },
		{ name: "price", type: "int64" },
		{ name: "categoryId", type: "string", facet: true },
		{ name: "inStock", type: "bool", facet: true },
		{ name: "createdAt", type: "int64" },
		{ name: "tenantId", type: "string", facet: true },
	],
	defaultSortingField: "createdAt",
	tokenSeparators: ["-"], // optional — matches your data
});

The tenantId field is the one your application will filter by and the one your scoped tokens will narrow on. Don't skip it.

Export

A simple PostgreSQL export:

\copy (
  SELECT
    id::text                            AS id,
    name,
    description,
    tags,
    price_cents                         AS price,
    category_id::text                   AS "categoryId",
    in_stock                            AS "inStock",
    EXTRACT(EPOCH FROM created_at)::bigint AS "createdAt",
    tenant_id::text                     AS "tenantId"
  FROM products
  WHERE deleted_at IS NULL
)
TO '/tmp/products.ndjson'
WITH (FORMAT csv, HEADER false, DELIMITER '\t');

For a JSON-per-line export (NDJSON), use row_to_json:

\copy (
  SELECT row_to_json(t) FROM (
    SELECT
      id::text AS id,
      name,
      description,
      tags,
      price_cents AS price,
      category_id::text AS "categoryId",
      in_stock AS "inStock",
      EXTRACT(EPOCH FROM created_at)::bigint AS "createdAt",
      tenant_id::text AS "tenantId"
    FROM products
    WHERE deleted_at IS NULL
  ) t
)
TO '/tmp/products.ndjson';

You'll want NDJSON for the ingest path below.

Ingest

Stream the NDJSON through your ingest endpoint in batches:

import { client } from "@repo/api/client";
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

const BATCH = 500;
const stream = createInterface({ input: createReadStream("/tmp/products.ndjson") });
let batch: Record<string, unknown>[] = [];

for await (const line of stream) {
	batch.push(JSON.parse(line));
	if (batch.length >= BATCH) {
		await client.searchDocument.bulkUpsert.call({ indexSlug: "products", documents: batch });
		batch = [];
	}
}
if (batch.length > 0) {
	await client.searchDocument.bulkUpsert.call({ indexSlug: "products", documents: batch });
}

The ingest endpoint enqueues into the buffer; the worker flushes to Typesense. You'll see ingest lag rise during the migration — that's expected. Watch the monitoring signals; lag should drain within minutes after the export completes.

Query translation

Most database queries map to AACsearch in a straightforward way:

DatabaseAACsearch
`WHERE name ILIKE '%'
WHERE to_tsvector(name) @@ to_tsquery($1)q=$1, query_by=name (with typo tolerance bonus)
WHERE tenant_id = $1filter_by=tenantId:=$1
WHERE in_stock = true AND price < 5000filter_by=inStock:=true && price:&lt;5000
ORDER BY created_at DESC LIMIT 20sort_by=createdAt:desc, per_page=20
`WHERE name ILIKE '%'

The big differences are:

  • Pagination. Use page and per_page instead of OFFSET/LIMIT. Performance is the same up to page=250; for deep pagination, use a sort-key cursor instead.
  • Typo tolerance. On by default. Set num_typos=0 if you actually want exact-match behaviour.
  • Highlighting. Add highlight_full_fields=name,description to get <mark>-wrapped match snippets back from the server. The database does not give you this; you wrote it client-side.

Real-time updates

Database search updates the index when the row updates (either through triggers or through next-query reads). For AACsearch, you push updates through the ingest pipeline.

The patterns that work in practice:

  1. Outbox-based. Your application writes to the database and an outbox table in the same transaction. A worker reads the outbox and calls searchDocument.bulkUpsert. Latency: typically 1–5 seconds.
  2. Change Data Capture (Debezium, pgstream). A CDC pipeline reads the WAL and ships changes to AACsearch. Latency: typically sub-second. More moving parts.
  3. Synchronous on write. Your application writes to AACsearch in the same handler as the DB write. Simplest, but every search-engine outage becomes a DB-write outage. Avoid for high-volume systems.

For e-commerce / CMS / CRM, the outbox pattern is usually the right starting point.

Validation

After the export and re-ingest:

  1. Doc count check. Compare the row count from your SELECT count(*) FROM products WHERE deleted_at IS NULL to searchIndex.getHealth.actualDocCount. They should match.
  2. Spot-check 100 queries from your existing logs. For each: top-5 hits before, top-5 hits after. Classify identical / mostly-same / drift.
  3. Test edge cases. Empty query (should return facets without hits), unicode in the query, very long queries, queries with operators (-not_this, "phrase").
  4. Test the tenant filter. Search as one tenant, confirm no other tenant's documents appear.

Keep both paths running in parallel for at least 7 days after the migration. After that:

  1. Remove the LIKE or tsvector index (it's no longer in the query path; the index is dead weight).
  2. Remove the search columns from your SELECT lists in the application.
  3. Keep the export tooling around — you'll want it for a backfill the first time you change the schema.

Common mistakes

  • Importing without tenantId. Single-tenant index for what should be multi-tenant data. Plan the tenant field before the first import.
  • Casting IDs as ints. Typesense expects string ids. id: "42" not id: 42.
  • Forgetting time fields. createdAt: "2026-05-01T00:00:00Z" is invalid. createdAt: 1714521600 is correct.
  • Skipping query validation. "It's just full-text search, how different can it be" — different enough that you'll get user complaints if you ship without validating.

See also

On this page