Server-side helpers

Batching, idempotency, retry strategy, and webhook signature verification — the patterns every server-side SDK consumer should follow when ingesting documents and receiving webhooks from AACsearch.

The browser SDK is a thin wrapper around the search API. Server-side SDK consumption — ingesting documents, running CMS connectors, receiving webhooks — needs more discipline. This page collects the four patterns you should always follow.

Batching

Single-document writes are slow and expensive. Always batch.

Endpoint	Max batch size	Recommended size
`documents:batch` (upsert)	1000	100–500
`documents:batchdelete`	1000	100–1000
`sync/full`	1000	500
`sync/delta`	1000	100–500
`events:batch`	1000	100

The batch endpoints return per-row error arrays — partial-success is the norm.
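
Because partial success is the norm, it can help to split each submitted batch into rows that landed and rows that need attention. The following is a minimal sketch, not part of the SDK: the function name `split_batch_result` is hypothetical, and it assumes each error entry carries an `id` equal to the failing document's `external_id`. The error code in the sample data is made up for illustration.

```python
from typing import Any


def split_batch_result(
    batch: list[dict[str, Any]],
    errors: list[dict[str, str]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split a submitted batch into (succeeded, failed) documents.

    Assumption: each error entry has an "id" matching a document's "external_id".
    """
    failed_ids = {e["id"] for e in errors}
    succeeded = [d for d in batch if d["external_id"] not in failed_ids]
    failed = [d for d in batch if d["external_id"] in failed_ids]
    return succeeded, failed


batch = [
    {"external_id": "product-1", "title": "Shoes"},
    {"external_id": "product-2", "title": "Socks"},
]
# Illustrative error row; the code and message are invented for this example.
errors = [{"id": "product-2", "error": "validation_failed", "message": "price is required"}]

succeeded, failed = split_batch_result(batch, errors)
```

The failed rows can then be logged, fixed, and re-submitted on their own without replaying the rows that already succeeded.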

Node helper

import { AdminClient } from "@aacsearch/client";

const admin = new AdminClient({
	baseUrl: process.env.AACSEARCH_BASE_URL!,
	apiKey: process.env.AACSEARCH_ADMIN_KEY!,
	projectId: process.env.AACSEARCH_ORG_ID!,
});

async function bulkUpsert(indexId: string, docs: Document[], batchSize = 500) {
	const errors: Array<{ id: string; error: string; message: string }> = [];

	for (let i = 0; i < docs.length; i += batchSize) {
		const batch = docs.slice(i, i + batchSize);
		const result = await admin.batchUpsertDocuments(indexId, batch);
		errors.push(...(result.errors ?? []));
	}

	return { total: docs.length, succeeded: docs.length - errors.length, errors };
}

Python helper

import os

from aacsearch import AdminClient

admin = AdminClient(
    base_url=os.environ["AACSEARCH_BASE_URL"],
    api_key=os.environ["AACSEARCH_ADMIN_KEY"],
    project_id=os.environ["AACSEARCH_ORG_ID"],
)

def bulk_upsert(index_id, docs, batch_size=500):
    errors = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        result = admin.batch_upsert_documents(index_id, batch)
        errors.extend(result.get("errors", []))
    return {"total": len(docs), "succeeded": len(docs) - len(errors), "errors": errors}

PHP (CMS connector)

<?php
function bulkUpsert(string $baseUrl, string $token, string $projectId, array $products, int $batchSize = 500): array {
    $errors = [];
    foreach (array_chunk($products, $batchSize) as $batch) {
        $ch = curl_init("$baseUrl/api/projects/$projectId/sync/delta");
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_POST => true,
            CURLOPT_HTTPHEADER => [
                "Authorization: Bearer $token",
                'Content-Type: application/json',
            ],
            CURLOPT_POSTFIELDS => json_encode(['products' => $batch]),
        ]);
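        $raw = curl_exec($ch);
        if ($raw === false) {
            // curl_exec returns false on transport failure; surface it instead of decoding it
            $message = curl_error($ch);
            curl_close($ch);
            throw new RuntimeException("sync/delta request failed: $message");
        }
        curl_close($ch);
        $res = json_decode($raw, true);
        $errors = array_merge($errors, $res['errors'] ?? []);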
    }
    return ['total' => count($products), 'errors' => $errors];
}

For PrestaShop / Bitrix-specific batching, see Connector API lifecycle.

Idempotency

Document IDs are derived deterministically from external_id, so re-pushing the same document is safe: it overwrites the existing document instead of creating a duplicate. This is what makes ingest idempotent.

// Run this script as many times as you want — same outcome
await admin.batchUpsertDocuments(indexId, [
	{ external_id: "product-123", title: "Shoes", price: 49.99 },
]);

The same is true for sync/full and sync/delta — the connector can replay any batch without creating duplicates.
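
The replay property can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above (an upsert keyed by `external_id`), not how the server stores documents; `apply_upsert` is a hypothetical name.

```python
def apply_upsert(index: dict[str, dict], docs: list[dict]) -> None:
    """Toy model of an upsert keyed by external_id (not the real server)."""
    for doc in docs:
        index[doc["external_id"]] = doc  # overwrite in place, never append


index: dict[str, dict] = {}
batch = [{"external_id": "product-123", "title": "Shoes", "price": 49.99}]

apply_upsert(index, batch)
state_after_first = dict(index)

apply_upsert(index, batch)  # replay the same batch
```

After the replay, `index` still holds exactly one entry and equals `state_after_first`, which is the property a connector relies on when it re-runs a failed batch.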

Why this matters

CMS connectors run in unreliable environments (cron timeouts, network blips, server restarts mid-job). With idempotency, the recovery story is "just re-run the failed batch." Without it, you would have to track which documents in each batch had already landed and replay only the ones that had not.

When idempotency does NOT cover you

Deletes racing with creates. If one job deletes product-123 while a separate job re-creates it under the same external_id, the final state depends on which request lands last, so replaying a delete can remove the newly created document.
Schema changes between runs. Re-running an old payload after a schema migration may fail per-row validation.
events:track is at-least-once, not exactly-once. Track an event_id (UUID) on the connector side and dedupe in your analytics pipeline.
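
For the at-least-once case, deduplication on the consumer side can be as simple as remembering event IDs for a fixed window. The sketch below is a hypothetical in-memory helper (`EventDeduper` is not an SDK class); a real analytics pipeline would more likely use a shared store, and the window length here is an assumption.

```python
import time
from typing import Optional


class EventDeduper:
    """Remember event IDs for a fixed window and report repeats."""

    def __init__(self, window_seconds: float = 24 * 60 * 60) -> None:
        self.window_seconds = window_seconds
        self._seen: dict[str, float] = {}  # event_id -> time first seen

    def is_duplicate(self, event_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop entries that have aged out of the window.
        self._seen = {
            eid: seen_at
            for eid, seen_at in self._seen.items()
            if now - seen_at < self.window_seconds
        }
        if event_id in self._seen:
            return True
        self._seen[event_id] = now
        return False


deduper = EventDeduper(window_seconds=60)
first = deduper.is_duplicate("evt-a", now=0)        # first sighting
repeat = deduper.is_duplicate("evt-a", now=30)      # inside the window
after_window = deduper.is_duplicate("evt-a", now=61)  # window has expired
```

Passing `now` explicitly keeps the helper easy to test; in production code the default `time.time()` is used.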

Idempotency-key header (for events)

For analytics events where idempotency matters, include an Idempotency-Key header:

await fetch(`${BASE}/api/events/track`, {
	method: "POST",
	headers: {
		Authorization: `Bearer ${KEY}`,
		"Content-Type": "application/json",
		"Idempotency-Key": eventId, // UUIDv4 from your side
	},
	body: JSON.stringify({ event: "result_click", properties: { ... } }),
});

Duplicate events with the same Idempotency-Key within a 24h window are deduped server-side.
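
Server-side deduplication only helps if every retry of the same event carries the same key. One way to guarantee that is to generate the key once, when the event is created, rather than when the request is sent. A minimal sketch, assuming the header names shown above; `PendingEvent` and the sample property are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PendingEvent:
    """An analytics event whose idempotency key is fixed at creation time."""

    name: str
    properties: dict
    idempotency_key: str = field(default_factory=lambda: str(uuid.uuid4()))

    def headers(self, api_key: str) -> dict[str, str]:
        # The same key is returned on every call, so retries are deduped.
        return {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": self.idempotency_key,
        }


event = PendingEvent("result_click", {"position": 3})  # sample property, invented
first_attempt = event.headers("test-key")
second_attempt = event.headers("test-key")  # a retry reuses the same key
```

If the event is persisted to a queue before sending, store `idempotency_key` alongside it so that a restarted worker also reuses the key.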

Retry strategy

Different errors need different retry policies. Get this wrong and you either give up too early (data loss) or hammer the server (rate limit cascade).

HTTP / error	Retry?	Strategy
`4xx` except 429	No	Fix the request
`429 rate_limit_exceeded`	Yes	Wait `Retry-After` seconds
`429 quota_exceeded`	No	Upgrade plan or wait for monthly reset
`502 search_failed`	Yes	1 retry after 1s; if fails, escalate
`502 ingest_failed`	Yes	Exponential backoff: 1s → 2s → 4s
`503 service_unavailable`	Yes	Exponential backoff with jitter
Network error	Yes	Exponential backoff
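
The table can be read as a small decision function. The sketch below encodes it in Python under a few assumptions: the function name and the returned labels are hypothetical, `status` is `None` for network errors, and 5xx responses not listed in the table are treated as retryable with backoff.

```python
from typing import Optional


def retry_decision(status: Optional[int], code: Optional[str]) -> str:
    """Map a failed request to "abort", "wait_retry_after", "retry_once", or "backoff".

    Call this only for failures. status is None when no HTTP response arrived.
    """
    if status is None:
        return "backoff"               # network error
    if status == 429:
        if code == "quota_exceeded":
            return "abort"             # upgrade plan or wait for monthly reset
        return "wait_retry_after"      # rate limited: honor Retry-After
    if 400 <= status < 500:
        return "abort"                 # fix the request
    if status == 502 and code == "search_failed":
        return "retry_once"            # one retry after 1s, then escalate
    return "backoff"                   # 502 ingest_failed, 503, other 5xx (assumed)


decisions = {
    "bad_request": retry_decision(400, "invalid_request"),
    "rate_limited": retry_decision(429, "rate_limit_exceeded"),
    "quota": retry_decision(429, "quota_exceeded"),
    "network": retry_decision(None, None),
}
```

Keeping the classification separate from the sleeping and looping makes the policy easy to unit-test without a live server.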

Node helper with `p-retry`

import pRetry, { AbortError } from "p-retry";
import { AacSearchError } from "@aacsearch/client";

async function withRetry<T>(fn: () => Promise<T>): Promise<T> {
	return pRetry(
		async () => {
			try {
				return await fn();
			} catch (err) {
				if (err instanceof AacSearchError) {
					// quota_exceeded is also a 429, but waiting will not help: do not retry
					if (err.code === "quota_exceeded") {
						throw new AbortError(err);
					}
					if (err.status === 429) {
						// rate limited: honor Retry-After (seconds) before the next attempt
						const retryAfter = Number(err.response?.headers.get("Retry-After") ?? 5);
						await new Promise((r) => setTimeout(r, retryAfter * 1000));
					} else if (err.status >= 400 && err.status < 500) {
						throw new AbortError(err); // other 4xx: fix the request instead
					}
				}
				throw err;
			}
		},
		{
			retries: 3,
			factor: 2,
			minTimeout: 1000,
			maxTimeout: 30_000,
			randomize: true, // jitter
		},
	);
}

// Use it:
const result = await withRetry(() => admin.batchUpsertDocuments(indexId, batch));

Python helper

import time, random
from aacsearch import SdkError

def with_retry(fn, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except SdkError as e:
            # quota_exceeded is also a 429, but waiting will not help
            if e.code == "quota_exceeded":
                raise
            rate_limited = e.status == 429
            if e.status and 400 <= e.status < 500 and not rate_limited:
                raise  # other 4xx: fix the request
            if attempt == max_retries:
                raise  # out of attempts: surface the error instead of returning None
            if rate_limited:
                time.sleep(int(e.response_headers.get("Retry-After", 5)))
            else:
                time.sleep((2 ** attempt) + random.random())
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep((2 ** attempt) + random.random())

Why jitter

Without jitter, every client that hit 429 at the same moment will retry at the same moment, creating a thundering herd that triggers another 429. Jitter spreads them out.
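
To make the effect concrete, here is a small sketch of a jittered delay calculation. It uses the "full jitter" variant (a uniform random delay between zero and the exponential ceiling), which differs from the additive jitter in the Python helper above; the function name, base, and cap are assumptions chosen for illustration.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a delay in seconds, uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0, ceiling)


# Two clients that failed at the same moment pick different delays.
client_a = random.Random(1)
client_b = random.Random(2)
delays_a = [backoff_delay(n, rng=client_a) for n in range(6)]
delays_b = [backoff_delay(n, rng=client_b) for n in range(6)]
```

The seeded generators only make the example reproducible; real clients would use the default unseeded source.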

Webhook signature verification

AACsearch signs every outgoing webhook with HMAC-SHA256. Verify the signature before trusting any payload — anyone with your endpoint URL can POST garbage otherwise.

How signing works

signature = HMAC-SHA256(secret, request_body_bytes)
header X-AACSearch-Signature-256: sha256=<hex>

The secret is the one you configured in Search → Webhooks → Endpoint → "Signing secret".
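
When testing a handler locally it is useful to produce a valid signature yourself. The sketch below follows the scheme described above using only the Python standard library. The function names are hypothetical, the secret and body are sample values, and it assumes the secret is used as its UTF-8 bytes.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Return the header value for a webhook body: "sha256=<hex digest>"."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_payload(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received header against the expected value in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(header_value.encode(), expected.encode())


body = b'{"id":"evt_1"}'  # sample body, invented for this example
header = sign_payload("test-secret", body)
ok = verify_payload("test-secret", body, header)
tampered = verify_payload("test-secret", body + b" ", header)
```

Even a single added space in the body invalidates the signature, which is why verification must run on the exact bytes received.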

Node verification

import crypto from "node:crypto";
import { Hono } from "hono";

const app = new Hono();

app.post("/webhooks/aacsearch", async (c) => {
	const rawBody = await c.req.text();
	const signature = c.req.header("X-AACSearch-Signature-256") ?? "";

	const expected = "sha256=" + crypto
		.createHmac("sha256", process.env.AACSEARCH_WEBHOOK_SECRET!)
		.update(rawBody)
		.digest("hex");

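	const received = Buffer.from(signature);
	const expectedBuf = Buffer.from(expected);
	// timingSafeEqual throws if the lengths differ, so compare lengths first
	if (received.length !== expectedBuf.length || !crypto.timingSafeEqual(received, expectedBuf)) {
		return c.text("invalid signature", 401);
	}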

	const event = JSON.parse(rawBody);
	// safe to process
	await handleEvent(event);
	return c.text("ok", 200);
});

timingSafeEqual prevents timing attacks. Do not use === for signature comparison.

Python verification

import hashlib
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)

@app.route("/webhooks/aacsearch", methods=["POST"])
def aacsearch_webhook():
    raw_body = request.get_data()
    signature = request.headers.get("X-AACSearch-Signature-256", "")

    expected = "sha256=" + hmac.new(
        os.environ["AACSEARCH_WEBHOOK_SECRET"].encode(),
        raw_body,
        hashlib.sha256,
    ).hexdigest()

    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        abort(401)

    event = request.get_json()
    handle_event(event)
    return "ok", 200

PHP verification

<?php
$rawBody = file_get_contents('php://input');
$signature = $_SERVER['HTTP_X_AACSEARCH_SIGNATURE_256'] ?? '';

$expected = 'sha256=' . hash_hmac('sha256', $rawBody, getenv('AACSEARCH_WEBHOOK_SECRET'));

if (!hash_equals($signature, $expected)) {
    http_response_code(401);
    exit('invalid signature');
}

$event = json_decode($rawBody, true);
handleEvent($event);
http_response_code(200);

hash_equals is the PHP equivalent of timingSafeEqual.

Replay protection

The signature alone does not protect against replay (a recorded valid request resubmitted later). For replay protection, check the event's timestamp field and reject anything older than 5 minutes:

const event = JSON.parse(rawBody);
const eventTime = new Date(event.timestamp).getTime();
if (Math.abs(Date.now() - eventTime) > 5 * 60 * 1000) {
	return c.text("event too old", 401);
}

Delivery is at-least-once: AACsearch retries on 5xx, so you may receive the same event twice. To process each event exactly once, dedupe on event.id in your handler.
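
The freshness check and the dedupe can be combined into one gate in front of the handler. The following Python sketch assumes `timestamp` is an ISO 8601 string and keeps seen IDs in an in-memory set, which only works for a single process; both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)


def is_fresh(timestamp: str, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True if an ISO 8601 timestamp is within max_age of now, in either direction."""
    event_time = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if event_time.tzinfo is None:
        event_time = event_time.replace(tzinfo=timezone.utc)  # assume UTC if naive
    return abs(now - event_time) <= max_age


def should_process(event: dict, seen_ids: set, now: datetime) -> bool:
    """Reject stale events and events whose id has already been handled."""
    if not is_fresh(event["timestamp"], now):
        return False
    if event["id"] in seen_ids:
        return False
    seen_ids.add(event["id"])
    return True


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
seen: set = set()
event = {"id": "evt_1", "timestamp": "2024-01-01T11:58:00Z"}
first_delivery = should_process(event, seen, now)
redelivery = should_process(event, seen, now)
```

With several handler instances, the set would need to be replaced by a shared store with an expiry at least as long as the retry window.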

Read the raw request body for HMAC verification, not the parsed JSON. Re-serializing JSON can change key order, whitespace, and string escaping, so the signature will not match.
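
A short demonstration of why the raw bytes matter: parsing and re-serializing a body yields equivalent JSON but different bytes, and therefore a different HMAC. The secret and payload below are sample values.

```python
import hashlib
import hmac
import json

SECRET = b"test-secret"  # sample value


def signature(body: bytes) -> str:
    """Hex HMAC-SHA256 of the given bytes."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


# Compact JSON as it might arrive on the wire.
raw_body = b'{"id":"evt_1","title":"Shoes"}'

# json.dumps inserts spaces after ":" and "," by default, so the bytes change.
reserialized = json.dumps(json.loads(raw_body)).encode()

same_data = json.loads(reserialized) == json.loads(raw_body)
same_bytes = reserialized == raw_body
same_signature = signature(reserialized) == signature(raw_body)
```

Here `same_data` is true while `same_bytes` and `same_signature` are both false.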

Bulk import (initial sync)

For the very first ingest of a large catalog (10k–10M documents), the right tool is the export → import pattern, not many small batches.

// 1. Export your source-of-truth catalog as JSONL
// 2. Stream it through batchUpsert in 500-doc chunks
import { createReadStream } from "node:fs";
import readline from "node:readline";

async function bulkImport(path: string, indexId: string) {
	const stream = readline.createInterface({
		input: createReadStream(path),
		crlfDelay: Infinity,
	});

	let batch: Document[] = [];
	let total = 0;
	const errors: any[] = [];

	for await (const line of stream) {
		batch.push(JSON.parse(line));
		if (batch.length >= 500) {
			const result = await withRetry(() => admin.batchUpsertDocuments(indexId, batch));
			errors.push(...(result.errors ?? []));
			total += batch.length;
			batch = [];
			if (total % 10_000 === 0) console.log(`imported ${total}`);
		}
	}
	if (batch.length) {
		const result = await withRetry(() => admin.batchUpsertDocuments(indexId, batch));
		errors.push(...(result.errors ?? []));
		total += batch.length;
	}
	console.log(`done: ${total} docs, ${errors.length} errors`);
}

Throughput on a typical Pro plan: ~1000 docs/sec sustained. A 1M-document catalog imports in ~15–20 minutes.
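
The estimate follows from simple arithmetic, which can be adapted to other catalog sizes. A small sketch, where the function name is hypothetical and the throughput is the approximate figure quoted above rather than a guarantee:

```python
import math


def estimate_import(total_docs: int, docs_per_second: float, batch_size: int = 500):
    """Return (number_of_batches, estimated_minutes) for a bulk import."""
    batches = math.ceil(total_docs / batch_size)
    minutes = total_docs / docs_per_second / 60
    return batches, minutes


# 1M documents at ~1000 docs/sec in 500-doc batches:
# 2000 batches and roughly 16.7 minutes.
batches, minutes = estimate_import(1_000_000, 1000)
```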

Delta sync

After the initial bulk import, switch to delta sync. The connector tracks the last-modified cursor and pushes only changes:

async function deltaSync(indexId: string, since: Date) {
	const changed = await db.products.findMany({
		where: { updatedAt: { gt: since } },
		orderBy: { updatedAt: "asc" }, // oldest first, so the last row is a valid cursor
		take: 1000,
	});

	if (!changed.length) return since;

	await withRetry(() =>
		admin.batchUpsertDocuments(
			indexId,
			changed.map((p) => ({ external_id: p.id, ...mapToDocument(p) })),
		),
	);

	return changed[changed.length - 1].updatedAt;
}

For deletions, query separately for "soft-deleted since":

const deleted = await db.products.findMany({
	where: { deletedAt: { gt: since } },
	select: { id: true },
});
if (deleted.length) {
	await admin.batchDeleteDocuments(
		indexId,
		deleted.map((p) => p.id),
	);
}
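
The cursor logic is easy to get subtly wrong, so it is worth exercising against in-memory data. The Python sketch below models the paging loop only; `next_page`, `drain`, and the row shape are hypothetical, and `push` stands in for the batch upsert call.

```python
from datetime import datetime


def next_page(rows, since, page_size):
    """Return up to page_size rows changed after `since`, oldest first."""
    changed = sorted(
        (r for r in rows if r["updated_at"] > since),
        key=lambda r: r["updated_at"],
    )
    return changed[:page_size]


def drain(rows, since, page_size, push):
    """Push every change after `since` page by page; return the final cursor."""
    while True:
        page = next_page(rows, since, page_size)
        if not page:
            return since
        push(page)
        since = page[-1]["updated_at"]  # advance only after a successful push


# Rows stored newest first, to show that ordering is applied by next_page.
rows = [{"id": i, "updated_at": datetime(2024, 1, 1, 0, 0, i)} for i in reversed(range(5))]
pushed = []
cursor = drain(rows, datetime(2024, 1, 1, 0, 0, 0), 2, pushed.append)
```

One caveat with a strict "greater than" cursor: if several rows share the same timestamp and straddle a page boundary, the later ones are skipped. Using a compound cursor (timestamp plus ID) avoids that.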

For PrestaShop, Bitrix, and other CMS-specific connectors, the lifecycle endpoints (sync/full, sync/delta) replace direct batchUpsert calls. See Connector API lifecycle.

Node.js SDK reference — AdminClient API
Python SDK reference
Connector API lifecycle — full sync / delta / heartbeat
Webhooks overview — outgoing event reference
Errors and rate limits — error code matrix
Ingest failures — debug a stalled or failing ingest
