Server-side helpers
Batching, idempotency, retry strategy, and webhook signature verification — the patterns every server-side SDK consumer should follow when ingesting documents and receiving webhooks from AACsearch.
The browser SDK is a thin wrapper around the search API. Server-side SDK consumption — ingesting documents, running CMS connectors, receiving webhooks — needs more discipline. This page collects the four patterns you should always follow.
Batching
Single-document writes are slow and expensive. Always batch.
| Endpoint | Max batch size | Recommended size |
|---|---|---|
| documents:batch (upsert) | 1000 | 100–500 |
| documents:batchdelete | 1000 | 100–1000 |
| sync/full | 1000 | 500 |
| sync/delta | 1000 | 100–500 |
| events:batch | 1000 | 100 |
The batch endpoints return per-row error arrays — partial-success is the norm.
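Because partial success is the norm, it helps to separate the rows that landed from the rows that were rejected before deciding what to re-send. The sketch below is illustrative only: it assumes each error row carries the failing document's `id` and that this `id` equals the `external_id` you submitted. The function name `split_by_row_errors` and the sample error code are hypothetical.

```python
from typing import Any


def split_by_row_errors(
    batch: list[dict[str, Any]],
    errors: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split a submitted batch into (accepted, rejected) documents.

    Assumption: error rows look like {"id", "error", "message"} and
    "id" matches the external_id of the document that failed.
    """
    failed_ids = {row["id"] for row in errors}
    accepted = [doc for doc in batch if doc["external_id"] not in failed_ids]
    rejected = [doc for doc in batch if doc["external_id"] in failed_ids]
    return accepted, rejected


# Made-up sample data: one valid document, one that fails validation.
batch = [
    {"external_id": "product-1", "title": "Shoes"},
    {"external_id": "product-2", "title": None},
]
errors = [{"id": "product-2", "error": "validation_failed", "message": "title is required"}]

accepted, rejected = split_by_row_errors(batch, errors)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Rejected rows usually need a data fix rather than a blind retry, so logging them with their `message` is often more useful than re-queuing them.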
Node helper
import { AdminClient } from "@aacsearch/client";
const admin = new AdminClient({
baseUrl: process.env.AACSEARCH_BASE_URL!,
apiKey: process.env.AACSEARCH_ADMIN_KEY!,
projectId: process.env.AACSEARCH_ORG_ID!,
});
async function bulkUpsert(indexId: string, docs: Document[], batchSize = 500) {
const errors: Array<{ id: string; error: string; message: string }> = [];
for (let i = 0; i < docs.length; i += batchSize) {
const batch = docs.slice(i, i + batchSize);
const result = await admin.batchUpsertDocuments(indexId, batch);
errors.push(...(result.errors ?? []));
}
return { total: docs.length, succeeded: docs.length - errors.length, errors };
}
Python helper
import os
from aacsearch import AdminClient
admin = AdminClient(
    base_url=os.environ["AACSEARCH_BASE_URL"],
    api_key=os.environ["AACSEARCH_ADMIN_KEY"],
    project_id=os.environ["AACSEARCH_ORG_ID"],
)
def bulk_upsert(index_id, docs, batch_size=500):
    errors = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        result = admin.batch_upsert_documents(index_id, batch)
        # Collect per-row errors; a batch can partially succeed
        errors.extend(result.get("errors", []))
    return {"total": len(docs), "succeeded": len(docs) - len(errors), "errors": errors}
PHP (CMS connector)
<?php
function bulkUpsert(string $baseUrl, string $token, string $projectId, array $products, int $batchSize = 500): array {
$errors = [];
foreach (array_chunk($products, $batchSize) as $batch) {
$ch = curl_init("$baseUrl/api/projects/$projectId/sync/delta");
curl_setopt_array($ch, [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_POST => true,
CURLOPT_HTTPHEADER => [
"Authorization: Bearer $token",
'Content-Type: application/json',
],
CURLOPT_POSTFIELDS => json_encode(['products' => $batch]),
]);
$res = json_decode(curl_exec($ch), true);
curl_close($ch);
$errors = array_merge($errors, $res['errors'] ?? []);
}
return ['total' => count($products), 'errors' => $errors];
}
For PrestaShop / Bitrix-specific batching, see Connector API lifecycle.
Idempotency
Document IDs are derived deterministically from external_id, so re-pushing the same document is safe: it overwrites the existing copy. This is what makes ingest idempotent.
// Run this script as many times as you want — same outcome
await admin.batchUpsertDocuments(indexId, [
{ external_id: "product-123", title: "Shoes", price: 49.99 },
]);
The same is true for sync/full and sync/delta: the connector can replay any batch without creating duplicates.
Why this matters
CMS connectors run in unreliable environments (cron timeouts, network blips, server restarts mid-job). With idempotency, the recovery story is "just re-run the failed batch." Without it, you would have to track which rows of each batch landed and replay only the ones that did not.
When idempotency does NOT cover you
- Deletes are not idempotent against newly-created docs with the same external_id. If your job deletes product-123 and a separate job creates it, the order matters.
- Schema changes between runs. Re-running an old payload after a schema migration may fail per-row validation.
- events:track is at-least-once, not exactly-once. Track an event_id (UUID) on the connector side and dedupe in your analytics pipeline.
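The dedupe step mentioned in the last point can be sketched as a small filter in the analytics pipeline. This is a minimal in-memory illustration, assuming every event carries the connector-assigned `event_id`. The function name `dedupe_events` and the sample events are hypothetical, and a real pipeline would keep the seen IDs in a durable store rather than a Python set.

```python
from typing import Any, Iterable, Iterator


def dedupe_events(events: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield each event once, keyed on its connector-assigned event_id."""
    seen: set[str] = set()
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            continue  # duplicate delivery of an event we already handled
        seen.add(event_id)
        yield event


# Made-up sample: the first event was delivered twice.
delivered = [
    {"event_id": "a1", "event": "result_click"},
    {"event_id": "a1", "event": "result_click"},
    {"event_id": "b2", "event": "search"},
]
unique = list(dedupe_events(delivered))
print([e["event_id"] for e in unique])
```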
Idempotency-key header (for events)
For analytics events where idempotency matters, include an Idempotency-Key header:
await fetch(`${BASE}/api/events/track`, {
method: "POST",
headers: {
Authorization: `Bearer ${KEY}`,
"Content-Type": "application/json",
"Idempotency-Key": eventId, // UUIDv4 from your side
},
body: JSON.stringify({ event: "result_click", properties: { ... } }),
});
Duplicate events with the same Idempotency-Key within a 24h window are deduped server-side.
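For the server-side dedupe to help, every retry of one logical event has to carry the same key. The sketch below builds the request in Python without sending it and shows the key being generated once and reused. The helper name `build_track_request`, the base URL, the API key, and the event properties are placeholders, not part of the AACsearch SDK.

```python
import json
import urllib.request
import uuid


def build_track_request(
    base_url: str, api_key: str, event: dict, event_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/events/track with an Idempotency-Key."""
    return urllib.request.Request(
        url=f"{base_url}/api/events/track",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": event_id,
        },
    )


# Generate the key once per logical event, then reuse it for every attempt.
event_id = str(uuid.uuid4())
event = {"event": "result_click", "properties": {"position": 1}}  # made-up payload

first_attempt = build_track_request("https://search.example.com", "test-key", event, event_id)
retry_attempt = build_track_request("https://search.example.com", "test-key", event, event_id)
```

Generating a fresh UUID inside the retry loop would defeat the purpose, since each attempt would then look like a new event.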
Retry strategy
Different errors need different retry policies. Get this wrong and you either give up too early (data loss) or hammer the server (rate limit cascade).
| HTTP / error | Retry? | Strategy |
|---|---|---|
| 4xx except 429 | No | Fix the request |
| 429 rate_limit_exceeded | Yes | Wait Retry-After seconds |
| 429 quota_exceeded | No | Upgrade plan or wait for monthly reset |
| 502 search_failed | Yes | 1 retry after 1s; if fails, escalate |
| 502 ingest_failed | Yes | Exponential backoff: 1s → 2s → 4s |
| 503 service_unavailable | Yes | Exponential backoff with jitter |
| Network error | Yes | Exponential backoff |
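The table can be read as a pure decision function, which is easy to unit-test separately from any HTTP client. The sketch below uses the status codes and code strings exactly as they appear in the table. The function name `retry_decision` and its return labels are hypothetical, and treating anything not listed in the table as non-retryable is a conservative assumption.

```python
from typing import Optional


def retry_decision(status: Optional[int], code: Optional[str]) -> str:
    """Map an error to 'no_retry', 'wait_retry_after', 'retry_once', or 'backoff'."""
    if status is None:
        return "backoff"  # network error: there was no HTTP response at all
    if status == 429:
        # Both 429 variants share a status; only the code tells them apart.
        return "wait_retry_after" if code == "rate_limit_exceeded" else "no_retry"
    if 400 <= status < 500:
        return "no_retry"  # the request itself is wrong; retrying cannot help
    if status == 502 and code == "search_failed":
        return "retry_once"
    if status in (502, 503):
        return "backoff"
    return "no_retry"  # not covered by the table: assumed non-retryable


for status, code in [(400, "bad_request"), (429, "quota_exceeded"), (503, "service_unavailable")]:
    print(status, code, "->", retry_decision(status, code))
```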
Node helper with p-retry
import pRetry, { AbortError } from "p-retry";
import { AacSearchError } from "@aacsearch/client";
async function withRetry<T>(fn: () => Promise<T>): Promise<T> {
return pRetry(
async () => {
try {
return await fn();
} catch (err) {
if (err instanceof AacSearchError) {
if (err.status >= 400 && err.status < 500 && err.code !== "rate_limit") {
throw new AbortError(err); // do not retry 4xx
}
if (err.code === "quota_exceeded") {
throw new AbortError(err);
}
if (err.code === "rate_limit") {
const retryAfter = Number(err.response?.headers.get("Retry-After") ?? 5);
await new Promise((r) => setTimeout(r, retryAfter * 1000));
}
}
throw err;
}
},
{
retries: 3,
factor: 2,
minTimeout: 1000,
maxTimeout: 30_000,
randomize: true, // jitter
},
);
}
// Use it:
const result = await withRetry(() => admin.batchUpsertDocuments(indexId, batch));
Python helper
import random
import time
from aacsearch import SdkError
def with_retry(fn, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except SdkError as e:
            # Any 4xx other than a rate limit is a caller error: do not retry
            if e.status and 400 <= e.status < 500 and e.code != "rate_limit":
                raise
            if e.code == "quota_exceeded":
                raise
            # Out of attempts: surface the error instead of silently returning None
            if attempt == max_retries:
                raise
            if e.code == "rate_limit":
                retry_after = int(e.response_headers.get("Retry-After", 5))
                time.sleep(retry_after)
            else:
                # Exponential backoff plus up to 1s of jitter
                time.sleep((2 ** attempt) + random.random())
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep((2 ** attempt) + random.random())
Why jitter
Without jitter, every client that hit 429 at the same moment will retry at the same moment, creating a thundering herd that triggers another 429. Jitter spreads them out.
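To make the effect concrete, here is a small sketch of one common jitter variant, often called "full jitter", where each delay is drawn uniformly between zero and the exponential ceiling. This is not the exact formula used by the helpers above, which add up to one second of randomness to a fixed exponential delay. The function name and default values are assumptions for illustration.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Two clients that were throttled at the same moment pick different delays
# because each draws from its own random source.
client_a = random.Random(1)
client_b = random.Random(2)
for attempt in range(4):
    a = backoff_delay(attempt, rng=client_a)
    b = backoff_delay(attempt, rng=client_b)
    print(f"attempt {attempt}: client A waits {a:.2f}s, client B waits {b:.2f}s")
```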
Webhook signature verification
AACsearch signs every outgoing webhook with HMAC-SHA256. Verify the signature before trusting any payload — anyone with your endpoint URL can POST garbage otherwise.
How signing works
signature = HMAC-SHA256(secret, request_body_bytes)
header X-AACSearch-Signature-256: sha256=<hex>
The secret is the one you configured in Search → Webhooks → Endpoint → "Signing secret".
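Computing the header value yourself is a handy way to build fixtures for testing a webhook handler. The sketch below follows the scheme described above: HMAC-SHA256 over the raw body bytes, hex-encoded, with a `sha256=` prefix. The function name `sign_body`, the secret, and the sample payload are made up for illustration.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Return the header value for a body: 'sha256=' + hex HMAC-SHA256 digest."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


# Made-up payload and secret, for local testing only.
body = b'{"id":"evt_1","type":"example"}'
header_value = sign_body("test-secret", body)
print("X-AACSearch-Signature-256:", header_value)
```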
Node verification
import crypto from "node:crypto";
import { Hono } from "hono";
const app = new Hono();
app.post("/webhooks/aacsearch", async (c) => {
const rawBody = await c.req.text();
const signature = c.req.header("X-AACSearch-Signature-256") ?? "";
const expected = "sha256=" + crypto
.createHmac("sha256", process.env.AACSEARCH_WEBHOOK_SECRET!)
.update(rawBody)
.digest("hex");
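const signatureBuf = Buffer.from(signature);
const expectedBuf = Buffer.from(expected);
// timingSafeEqual throws if the lengths differ, so check the length first
if (signatureBuf.length !== expectedBuf.length || !crypto.timingSafeEqual(signatureBuf, expectedBuf)) {
return c.text("invalid signature", 401);
}
const event = JSON.parse(rawBody);
// safe to process
await handleEvent(event);
return c.text("ok", 200);
});
timingSafeEqual prevents timing attacks. Do not use === for signature comparison.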
Python verification
import hashlib
import hmac
import os
from flask import Flask, abort, request
app = Flask(__name__)
@app.route("/webhooks/aacsearch", methods=["POST"])
def aacsearch_webhook():
    raw_body = request.get_data()  # raw bytes, exactly as they were signed
    signature = request.headers.get("X-AACSearch-Signature-256", "")
    expected = "sha256=" + hmac.new(
        os.environ["AACSEARCH_WEBHOOK_SECRET"].encode(),
        raw_body,
        hashlib.sha256,
    ).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        abort(401)
    event = request.get_json()
    handle_event(event)
    return "ok", 200
PHP verification
<?php
$rawBody = file_get_contents('php://input');
$signature = $_SERVER['HTTP_X_AACSEARCH_SIGNATURE_256'] ?? '';
$expected = 'sha256=' . hash_hmac('sha256', $rawBody, getenv('AACSEARCH_WEBHOOK_SECRET'));
if (!hash_equals($signature, $expected)) {
http_response_code(401);
exit('invalid signature');
}
$event = json_decode($rawBody, true);
handleEvent($event);
http_response_code(200);
hash_equals is the PHP equivalent of timingSafeEqual.
Replay protection
The signature alone does not protect against replay (a recorded valid request resubmitted later). For replay protection, check the event's timestamp field and reject anything older than 5 minutes:
const event = JSON.parse(rawBody);
const eventTime = new Date(event.timestamp).getTime();
if (Math.abs(Date.now() - eventTime) > 5 * 60 * 1000) {
return c.text("event too old", 401);
}
For exactly-once processing, dedupe on event.id in your handler: AACsearch retries on 5xx, so you may receive the same event twice.
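The same two checks can be sketched in Python. This is an illustration under assumptions: it treats `timestamp` as an ISO 8601 string (the Node snippet above would also accept other formats), and the `SeenEvents` class is a hypothetical in-memory stand-in for whatever shared store your handlers use.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_SKEW = timedelta(minutes=5)


def is_fresh(timestamp: str, now: Optional[datetime] = None) -> bool:
    """True if an ISO 8601 timestamp is within MAX_SKEW of `now`."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    event_time = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if event_time.tzinfo is None:
        event_time = event_time.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now or datetime.now(timezone.utc)
    return abs(now - event_time) <= MAX_SKEW


class SeenEvents:
    """In-memory record of processed event IDs (use a shared store in production)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, event_id: str) -> bool:
        """Return True the first time an ID is offered, False on every repeat."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


seen = SeenEvents()
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh("2024-01-01T11:58:00Z", now), is_fresh("2024-01-01T11:50:00Z", now))
print(seen.first_time("evt_1"), seen.first_time("evt_1"))
```

Run the freshness check only after the signature has been verified, since the timestamp is part of the payload and is untrusted until then.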
Read the raw request body for HMAC verification, not the parsed JSON. Re-serializing JSON can change key order, whitespace, and character escaping, so the signature will not match.
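A short demonstration of why this matters: parsing a body and serializing it again with Python's default `json.dumps` settings produces different bytes, and therefore a different HMAC. The secret and payload here are made up.

```python
import hashlib
import hmac
import json


def hmac_hex(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of a body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


secret = b"test-secret"
# Bytes as they might arrive on the wire: compact JSON with a UTF-8 character.
raw_body = '{"b":1,"a":"caf\u00e9"}'.encode("utf-8")
# Round-trip through the parser: json.dumps adds spaces and escapes non-ASCII.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

print(raw_body)
print(reserialized)
print("signatures match:", hmac_hex(secret, raw_body) == hmac_hex(secret, reserialized))
```

Both byte strings decode to the same JSON value, yet only the first one matches what the sender signed.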
Bulk import (initial sync)
For the very first ingest of a large catalog (10k–10M documents), the right tool is the export → import pattern, not many small batches.
// 1. Export your source-of-truth catalog as JSONL
// 2. Stream it through batchUpsert in 500-doc chunks
import { createReadStream } from "node:fs";
import readline from "node:readline";
async function bulkImport(path: string, indexId: string) {
const stream = readline.createInterface({
input: createReadStream(path),
crlfDelay: Infinity,
});
let batch: Document[] = [];
let total = 0;
const errors: any[] = [];
for await (const line of stream) {
batch.push(JSON.parse(line));
if (batch.length >= 500) {
const result = await withRetry(() => admin.batchUpsertDocuments(indexId, batch));
errors.push(...(result.errors ?? []));
total += batch.length;
batch = [];
if (total % 10_000 === 0) console.log(`imported ${total}`);
}
}
if (batch.length) {
const result = await withRetry(() => admin.batchUpsertDocuments(indexId, batch));
errors.push(...(result.errors ?? []));
total += batch.length;
}
console.log(`done: ${total} docs, ${errors.length} errors`);
}
Throughput on a typical Pro plan: ~1000 docs/sec sustained. A 1M-document catalog imports in ~15–20 minutes.
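The chunking half of the import loop above can be sketched in Python as a generator that turns JSONL lines into batches. This is an illustration only: the name `iter_batches` is hypothetical, and each yielded batch would still need to be passed to your upsert call wrapped in a retry helper.

```python
import json
from typing import Any, Iterable, Iterator


def iter_batches(
    lines: Iterable[str], batch_size: int = 500
) -> Iterator[list[dict[str, Any]]]:
    """Group JSONL lines into lists of at most batch_size parsed documents."""
    batch: list[dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


# Made-up sample: three documents and one blank line, in batches of two.
sample = [
    '{"external_id": "p-1"}\n',
    "\n",
    '{"external_id": "p-2"}\n',
    '{"external_id": "p-3"}\n',
]
sizes = [len(b) for b in iter_batches(sample, batch_size=2)]
print(sizes)
```

Because an open file object is itself an iterable of lines, `iter_batches(open(path, encoding="utf-8"))` streams the export without loading it into memory.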
Delta sync
After the initial bulk import, switch to delta sync. The connector tracks the last-modified cursor and pushes only changes:
async function deltaSync(indexId: string, since: Date) {
const changed = await db.products.findMany({
where: { updatedAt: { gt: since } },
orderBy: { updatedAt: "asc" }, // oldest first, so the last row is a valid cursor
take: 1000,
});
if (!changed.length) return since;
await withRetry(() =>
admin.batchUpsertDocuments(
indexId,
changed.map((p) => ({ external_id: p.id, ...mapToDocument(p) })),
),
);
return changed[changed.length - 1].updatedAt;
}
For deletions, query separately for "soft-deleted since":
const deleted = await db.products.findMany({
where: { deletedAt: { gt: since } },
select: { id: true },
});
if (deleted.length) {
await admin.batchDeleteDocuments(
indexId,
deleted.map((p) => p.id),
);
}
For PrestaShop, Bitrix, and other CMS-specific connectors, the lifecycle endpoints (sync/full, sync/delta) replace direct batchUpsert calls. See Connector API lifecycle.
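One detail worth making explicit in any delta sync is when the cursor moves: it should advance only after the push has succeeded, so a failed run is simply repeated from the old cursor. The Python sketch below illustrates that ordering with injected callables. `fetch_changed`, `push`, and the row shape are hypothetical stand-ins for your database query and your upsert call.

```python
from datetime import datetime
from typing import Any, Callable

Row = dict[str, Any]


def delta_sync(
    fetch_changed: Callable[[datetime], list[Row]],
    push: Callable[[list[Row]], None],
    since: datetime,
) -> datetime:
    """Push rows changed after `since` and return the cursor for the next run.

    fetch_changed must return rows sorted by updated_at, oldest first.
    """
    changed = fetch_changed(since)
    if not changed:
        return since  # nothing new: keep the old cursor
    push(changed)  # raises on failure, so the cursor is never advanced early
    return changed[-1]["updated_at"]


# Made-up rows and fakes standing in for the database and the search API.
rows = [
    {"id": "p-1", "updated_at": datetime(2024, 1, 1, 10, 0)},
    {"id": "p-2", "updated_at": datetime(2024, 1, 1, 11, 0)},
]
pushed: list[Row] = []
cursor = delta_sync(
    fetch_changed=lambda since: [r for r in rows if r["updated_at"] > since],
    push=pushed.extend,
    since=datetime(2024, 1, 1, 9, 0),
)
print(cursor, len(pushed))
```

Persist the returned cursor only after the function returns; thanks to idempotent upserts, re-running from a stale cursor is harmless.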
Related
- Node.js SDK reference (AdminClient API)
- Python SDK reference
- Connector API lifecycle — full sync / delta / heartbeat
- Webhooks overview — outgoing event reference
- Errors and rate limits — error code matrix
- Ingest failures — debug a stalled or failing ingest