Operations
Running AACsearch in production — reindexing, monitoring, rate limits, incident response, and support escalation.
This section is for the operator on duty. If you're integrating AACsearch from an application, you probably want the Search API or SDKs sections. If something is on fire, jump straight to Troubleshooting.
Where to start
| If you need to… | Go to |
|---|---|
| Understand what changes when you reindex | Reindexing |
| Check whether AACsearch is healthy right now | Status and incidents |
| Know what we back up and how to restore | Backups and retention |
| Wire AACsearch metrics into your observability | Monitoring |
| Understand a 429, or plan headroom | Rate limits and quotas |
| Diagnose a specific failure | Troubleshooting |
| Open a support ticket that doesn't bounce | Support escalation |
The operational model in one diagram
┌──────────────────────────────────────────────────────────────────────────┐
│                               CLIENT EDGE                                │
│  Browser / mobile / connector ──► search.aacsearch.com (Hono + oRPC)     │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
           ┌─────────────────────────┼──────────────────────────┐
           ▼                         ▼                          ▼
   AUTH & RATE LIMIT         INGEST BUFFER             SEARCH CLUSTER
   - API key hash            - PostgreSQL queue        - Typesense
   - origin check            - Worker flushes          - Versioned aliases
   - per-key bucket          - Retry on failure        - Alias-swap reindex
           │                         │                          │
           └──── audit log ──────────┴──── analytics ───────────┘
                                     ▼
                       POSTGRESQL (residency region)
                                     │
                                     ▼
                      BACKUPS (WAL-G → S3, encrypted)

Three things are worth memorizing:
- Writes don't go straight to the search engine. Public writes enqueue into `SearchIngestBuffer` in PostgreSQL. A worker picks them up and flushes to Typesense. If Typesense is unreachable, the row stays in the buffer with a `failedAt` and gets retried. This is Invariant 2; see `agents.md`.
- Reindex is alias-swap, not in-place. We build a new physical collection (`org_abc__products__v4`), backfill, and atomically point the alias at it. No request sees a half-built index. See Reindexing.
- Per-key rate limit, not per-IP. A 429 means the key's bucket overflowed, not your IP. See Rate limits and quotas.
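
To make the alias-swap point concrete, here is a minimal in-memory sketch of the idea. It is a toy model, not the AACsearch implementation and not the Typesense API: `InMemoryCluster`, `reindex`, and `next_collection_name` are hypothetical names, and the `<alias>__v<N>` naming scheme is only inferred from the `org_abc__products__v4` example above.

```python
import re

# Assumed naming scheme, inferred from "org_abc__products__v4": <alias>__v<integer>.
_VERSIONED = re.compile(r"^(?P<alias>.+)__v(?P<version>\d+)$")


def next_collection_name(current):
    """Return the name of the next physical collection, e.g. ...__v4 -> ...__v5."""
    match = _VERSIONED.match(current)
    if match is None:
        raise ValueError(f"not a versioned collection name: {current!r}")
    return f"{match['alias']}__v{int(match['version']) + 1}"


class InMemoryCluster:
    """Toy stand-in for a search cluster: physical collections plus alias pointers."""

    def __init__(self):
        self.collections = {}  # physical collection name -> list of documents
        self.aliases = {}      # alias name -> physical collection name

    def search(self, alias):
        # Readers always resolve through the alias, never a physical name.
        return self.collections[self.aliases[alias]]


def reindex(cluster, alias, documents):
    """Build a new collection, backfill it, then repoint the alias in one step."""
    old_name = cluster.aliases[alias]
    new_name = next_collection_name(old_name)

    cluster.collections[new_name] = []
    for doc in documents:
        # During backfill the alias still points at old_name, so searches
        # never observe the half-built collection.
        cluster.collections[new_name].append(doc)

    # The swap is a single pointer update: readers see either the old
    # collection or the complete new one.
    cluster.aliases[alias] = new_name
    return new_name


cluster = InMemoryCluster()
cluster.collections["org_abc__products__v4"] = [{"id": "1"}]
cluster.aliases["org_abc__products"] = "org_abc__products__v4"
reindex(cluster, "org_abc__products", [{"id": "1"}, {"id": "2"}])
```

The design point is that clients only ever address the alias, so the cost of a reindex is paid entirely in the new collection and the cutover is one pointer update.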
SLOs
The shared cluster's SLO is 99.9 % search-query availability per calendar month, measured by synthetic probes from at least three geographic regions per cluster region. Backups are continuous (WAL-G) for PostgreSQL and every 6 hours for Typesense snapshots. The recovery time objective (RTO) is 1 hour; the recovery point objective (RPO) is 15 minutes.
Enterprise customers can negotiate tighter SLOs against a dedicated cluster — see Dedicated cluster.
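
For planning purposes it can help to translate the 99.9 % figure into minutes. The sketch below does that arithmetic per calendar month. It is a simplification: it treats availability as wall-clock uptime, whereas the SLO itself is measured from synthetic probe results. The function name `downtime_budget_minutes` is ours, not part of any AACsearch tooling.

```python
import calendar


def downtime_budget_minutes(year, month, slo=0.999):
    """Minutes of unavailability a monthly SLO allows, treating it as uptime."""
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - slo)


# A 30-day month has 43,200 minutes, so 99.9 % leaves about 43.2 minutes.
print(round(downtime_budget_minutes(2025, 6), 2))  # 43.2
# A 31-day month has 44,640 minutes, so the budget is about 44.64 minutes.
print(round(downtime_budget_minutes(2025, 7), 2))  # 44.64
```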
A two-minute health check
Before opening a ticket, please do this loop:
- Open status.aacsearch.com — is there an incident for your region?
- In the dashboard, Project → Indexes: are health badges green, yellow, or red? Yellow = ingest lag or error-rate over threshold; red = drift over 5 %. See Monitoring.
- Dashboard → Diagnostics: does the most recent flush carry an error message?
- If everything looks fine on our side, capture a request ID from your application logs and open a ticket per Support escalation.
This loop solves the majority of tickets without anyone on either side waiting.
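
If you mirror the health badges in your own alerting, the badge rule from step 2 can be sketched as a small function. Only the 5 % drift threshold comes from this page. The lag and error-rate thresholds below are placeholder values, the field names are hypothetical, the definition of drift as a fraction is an assumption, and so is the choice to let red take precedence over yellow.

```python
from dataclasses import dataclass

DRIFT_RED_THRESHOLD = 0.05  # from this page: red = drift over 5 %


@dataclass
class IndexHealth:
    ingest_lag_seconds: float  # age of the oldest unflushed write (assumed metric)
    error_rate: float          # fraction of failed requests, 0.0 to 1.0
    drift: float               # source/index mismatch as a fraction (assumed definition)


def badge(health, max_lag_seconds=60.0, max_error_rate=0.01):
    """Classify an index as green, yellow, or red.

    max_lag_seconds and max_error_rate are placeholders; the real
    thresholds are documented under Monitoring.
    """
    if health.drift > DRIFT_RED_THRESHOLD:
        return "red"  # assumption: red outranks yellow
    if health.ingest_lag_seconds > max_lag_seconds or health.error_rate > max_error_rate:
        return "yellow"
    return "green"


print(badge(IndexHealth(ingest_lag_seconds=5, error_rate=0.0, drift=0.0)))    # green
print(badge(IndexHealth(ingest_lag_seconds=300, error_rate=0.0, drift=0.0)))  # yellow
print(badge(IndexHealth(ingest_lag_seconds=5, error_rate=0.0, drift=0.08)))   # red
```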
See also
- DR recovery runbook — backup restore procedures for our infrastructure
- Security overview — controls referenced from this section
- Enterprise overview — when you need a custom SLA or dedicated infrastructure