Migrating from `LIKE %query%`, MySQL fulltext, PostgreSQL `tsvector`, or any other database-native search to AACsearch.

From database search

If your search today is WHERE name LIKE '%query%', MySQL fulltext indexes, or PostgreSQL tsvector, this is your guide. The pattern below is the most general migration template — most other migration guides on this site are specializations of it.

Why move off of database search

This isn't a marketing page. The honest reasons to move:

LIKE %query% doesn't use an index. Once you have more than ~100 k rows, search becomes the slowest query in your application.
MySQL fulltext is decent but has limited tokenizer control and no typo tolerance.
PostgreSQL tsvector is good for static corpora but rewriting an index on every update gets expensive.
All three have effectively no facets, no curated rankings, no analytics out of the box.

If your corpus is small (< 10 k rows) and you have no plans to grow, don't migrate. Database search is fine. Add a GIN index on the tsvector column and call it a day. AACsearch is for when you've outgrown the database's ability to be fast, helpful, and observable about search.

Mapping

The general shape:

DB table row     →   Document
DB column        →   Field
DB index/FTS     →   `query_by` (which columns to search)
WHERE clauses    →   `filter_by`
ORDER BY         →   `sort_by` / `default_sorting_field`

A worked example for a products table:

DB column	Type	AACsearch field
`id`	`bigint`	`id` (string — Typesense uses string ids)
`name`	`varchar`	`name` (string, `query_by`)
`description`	`text`	`description` (string, `query_by`)
`tags` (jsonb)	`jsonb`	`tags` (string[], `query_by`, also for filter)
`price_cents`	`bigint`	`price` (int64, `sort_by`, `filter_by`)
`category_id`	`bigint`	`categoryId` (string, `filter_by`)
`in_stock`	`bool`	`inStock` (bool, `filter_by`)
`created_at`	`timestamp`	`createdAt` (int64 epoch, `sort_by`)
`tenant_id`	`bigint`	`tenantId` (string, always in `filter_by`)

Two general rules:

IDs are strings in Typesense. Cast your numeric ids to strings.
Time is integers, not strings. Epoch seconds or milliseconds; never ISO string. Filter and sort by time only works with numeric.

Schema definition

import { client } from "@repo/api/client";

await client.searchIndex.create.call({
	slug: "products",
	fields: [
		{ name: "name", type: "string" },
		{ name: "description", type: "string", optional: true },
		{ name: "tags", type: "string[]", facet: true },
		{ name: "price", type: "int64" },
		{ name: "categoryId", type: "string", facet: true },
		{ name: "inStock", type: "bool", facet: true },
		{ name: "createdAt", type: "int64" },
		{ name: "tenantId", type: "string", facet: true },
	],
	defaultSortingField: "createdAt",
	tokenSeparators: ["-"], // optional — matches your data
});

The tenantId field is the one your application will filter by and the one your scoped tokens will narrow on. Don't skip it.

Export

A simple PostgreSQL export:

\copy (
  SELECT
    id::text                            AS id,
    name,
    description,
    tags,
    price_cents                         AS price,
    category_id::text                   AS "categoryId",
    in_stock                            AS "inStock",
    EXTRACT(EPOCH FROM created_at)::bigint AS "createdAt",
    tenant_id::text                     AS "tenantId"
  FROM products
  WHERE deleted_at IS NULL
)
TO '/tmp/products.ndjson'
WITH (FORMAT csv, HEADER false, DELIMITER '\t');

For a JSON-per-line export (NDJSON), use row_to_json:

\copy (
  SELECT row_to_json(t) FROM (
    SELECT
      id::text AS id,
      name,
      description,
      tags,
      price_cents AS price,
      category_id::text AS "categoryId",
      in_stock AS "inStock",
      EXTRACT(EPOCH FROM created_at)::bigint AS "createdAt",
      tenant_id::text AS "tenantId"
    FROM products
    WHERE deleted_at IS NULL
  ) t
)
TO '/tmp/products.ndjson';

You'll want NDJSON for the ingest path below.

Ingest

Stream the NDJSON through your ingest endpoint in batches:

import { client } from "@repo/api/client";
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

const BATCH = 500;
const stream = createInterface({ input: createReadStream("/tmp/products.ndjson") });
let batch: Record<string, unknown>[] = [];

for await (const line of stream) {
	batch.push(JSON.parse(line));
	if (batch.length >= BATCH) {
		await client.searchDocument.bulkUpsert.call({ indexSlug: "products", documents: batch });
		batch = [];
	}
}
if (batch.length > 0) {
	await client.searchDocument.bulkUpsert.call({ indexSlug: "products", documents: batch });
}

The ingest endpoint enqueues into the buffer; the worker flushes to Typesense. You'll see ingest lag rise during the migration — that's expected. Watch the monitoring signals; lag should drain within minutes after the export completes.

Query translation

Most database queries map to AACsearch in a straightforward way:

Database	AACsearch
`WHERE name ILIKE '%'
`WHERE to_tsvector(name) @@ to_tsquery($1)`	`q=$1`, `query_by=name` (with typo tolerance bonus)
`WHERE tenant_id = $1`	`filter_by=tenantId:=$1`
`WHERE in_stock = true AND price < 5000`	`filter_by=inStock:=true && price:<5000`
`ORDER BY created_at DESC LIMIT 20`	`sort_by=createdAt:desc`, `per_page=20`
`WHERE name ILIKE '%'

The big differences are:

Pagination. Use page and per_page instead of OFFSET/LIMIT. Performance is the same up to page=250; for deep pagination, use a sort-key cursor instead.
Typo tolerance. On by default. Set num_typos=0 if you actually want exact-match behaviour.
Highlighting. Add highlight_full_fields=name,description to get <mark>-wrapped match snippets back from the server. The database does not give you this; you wrote it client-side.

Real-time updates

Database search updates the index when the row updates (either through triggers or through next-query reads). For AACsearch, you push updates through the ingest pipeline.

The patterns that work in practice:

Outbox-based. Your application writes to the database and an outbox table in the same transaction. A worker reads the outbox and calls searchDocument.bulkUpsert. Latency: typically 1–5 seconds.
Change Data Capture (Debezium, pgstream). A CDC pipeline reads the WAL and ships changes to AACsearch. Latency: typically sub-second. More moving parts.
Synchronous on write. Your application writes to AACsearch in the same handler as the DB write. Simplest, but every search-engine outage becomes a DB-write outage. Avoid for high-volume systems.

For e-commerce / CMS / CRM, the outbox pattern is usually the right starting point.

Validation

After the export and re-ingest:

Doc count check. Compare the row count from your SELECT count(*) FROM products WHERE deleted_at IS NULL to searchIndex.getHealth.actualDocCount. They should match.
Spot-check 100 queries from your existing logs. For each: top-5 hits before, top-5 hits after. Classify identical / mostly-same / drift.
Test edge cases. Empty query (should return facets without hits), unicode in the query, very long queries, queries with operators (-not_this, "phrase").
Test the tenant filter. Search as one tenant, confirm no other tenant's documents appear.

Decommissioning the database search

Keep both paths running in parallel for at least 7 days after the migration. After that:

Remove the LIKE or tsvector index (it's no longer in the query path; the index is dead weight).
Remove the search columns from your SELECT lists in the application.
Keep the export tooling around — you'll want it for a backfill the first time you change the schema.

Common mistakes

Importing without tenantId. Single-tenant index for what should be multi-tenant data. Plan the tenant field before the first import.
Casting IDs as ints. Typesense expects string ids. id: "42" not id: 42.
Forgetting time fields. createdAt: "2026-05-01T00:00:00Z" is invalid. createdAt: 1714521600 is correct.
Skipping query validation. "It's just full-text search, how different can it be" — different enough that you'll get user complaints if you ship without validating.

From database search (SQL LIKE / FTS)