Operations

Running AACsearch in production — reindexing, monitoring, rate limits, incident response, and support escalation.

This section is for the operator on duty. If you're integrating AACsearch from an application, you probably want the Search API or SDKs sections. If something is on fire, jump straight to Troubleshooting.

Where to start

If you need to…                                      Go to
Understand what changes when you reindex             Reindexing
Check whether AACsearch is healthy right now         Status and incidents
Know what we back up and how to restore              Backups and retention
Wire AACsearch metrics into your observability       Monitoring
Understand a 429, or plan headroom                   Rate limits and quotas
Diagnose a specific failure                          Troubleshooting
Open a support ticket that doesn't bounce            Support escalation

The operational model in one diagram

┌──────────────────────────────────────────────────────────────────────────┐
│                              CLIENT EDGE                                  │
│  Browser / mobile / connector  ──►  search.aacsearch.com (Hono + oRPC)   │
└──────────────────────────────────────────────────────────────────────────┘

        ┌─────────────────────────┼──────────────────────────┐
        ▼                         ▼                          ▼
   AUTH & RATE LIMIT       INGEST BUFFER             SEARCH CLUSTER
   - API key hash          - PostgreSQL queue        - Typesense
   - origin check          - Worker flushes          - Versioned aliases
   - per-key bucket        - Retry on failure        - Alias-swap reindex
        │                         │                          │
        └──── audit log ──────────┴──── analytics ───────────┘

                            POSTGRESQL (residency region)


                            BACKUPS (WAL-G → S3, encrypted)

Three things are worth memorizing:

  1. Writes don't go straight to the search engine. Public writes enqueue into SearchIngestBuffer in PostgreSQL. A worker picks them up and flushes to Typesense. If Typesense is unreachable, the row stays in the buffer with a failedAt and gets retried. This is Invariant 2 — see agents.md.

  2. Reindex is alias-swap, not in-place. We build a new physical collection (org_abc__products__v4), backfill, and atomically point the alias at it. No request sees a half-built index. See Reindexing.

  3. Rate limits are per key, not per IP. A 429 means the key's bucket overflowed, not that your IP was throttled. See Rate limits and quotas.
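
The buffered write path from item 1 can be sketched as follows. This is a minimal in-memory model, not the production worker: only the SearchIngestBuffer name and the failedAt field come from the text above. The row shape, the attempts counter, the SearchEngine interface, and the method names are hypothetical, and the real buffer is a PostgreSQL table rather than an array.

```typescript
// Hypothetical row shape; only `failedAt` is named in the docs.
interface BufferRow {
  id: number;
  document: Record<string, unknown>;
  failedAt: Date | null; // stamped when the most recent flush attempt failed
  attempts: number;      // illustrative counter, not from the docs
}

// Assumed stand-in for the search engine client (Typesense in production).
interface SearchEngine {
  upsert(document: Record<string, unknown>): Promise<void>;
}

class IngestBuffer {
  private rows: BufferRow[] = [];
  private nextId = 1;

  // Public writes only enqueue; they never call the search engine directly.
  enqueue(document: Record<string, unknown>): number {
    const id = this.nextId++;
    this.rows.push({ id, document, failedAt: null, attempts: 0 });
    return id;
  }

  pending(): readonly BufferRow[] {
    return this.rows;
  }

  // One worker pass: rows that flush are removed, rows that fail stay in
  // the buffer with failedAt set so that a later pass retries them.
  async flush(
    engine: SearchEngine,
    now: () => Date = () => new Date(),
  ): Promise<{ flushed: number; failed: number }> {
    const batch = this.rows.splice(0); // take a snapshot; new writes keep enqueueing
    const remaining: BufferRow[] = [];
    let flushed = 0;
    for (const row of batch) {
      try {
        await engine.upsert(row.document);
        flushed++;
      } catch {
        remaining.push({ ...row, failedAt: now(), attempts: row.attempts + 1 });
      }
    }
    // Failed rows go back in front of anything enqueued during the pass.
    this.rows = [...remaining, ...this.rows];
    return { flushed, failed: remaining.length };
  }
}
```

The point of the design is that an engine outage delays indexing but does not lose writes: a failed pass leaves every row in place, and the next pass after recovery drains them.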

SLOs

The shared cluster's SLO is 99.9% search-query availability per calendar month, measured by synthetic probes from at least three geographic regions per cluster region. PostgreSQL is backed up continuously (WAL-G); Typesense snapshots are taken every 6 hours. The recovery time objective (RTO) is 1 hour; the recovery point objective (RPO) is 15 minutes.
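
For planning purposes, the 99.9% figure can be converted into a monthly error budget. The helper below is an illustrative calculation only: the function name is hypothetical, and it treats availability as wall-clock minutes, which is an approximation because the SLO itself is measured by synthetic probes.

```typescript
// Minutes of allowed unavailability for a given SLO over one calendar month.
function errorBudgetMinutes(sloPercent: number, daysInMonth: number): number {
  const totalMinutes = daysInMonth * 24 * 60;
  const budget = (totalMinutes * (100 - sloPercent)) / 100;
  return Math.round(budget * 100) / 100; // round to two decimals to hide float noise
}

console.log(errorBudgetMinutes(99.9, 30)); // 43.2
console.log(errorBudgetMinutes(99.9, 31)); // 44.64
```

In other words, a 30-day month at 99.9% leaves roughly 43 minutes of budget, which is less than the 1-hour RTO target.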

Enterprise customers can negotiate tighter SLOs against a dedicated cluster — see Dedicated cluster.

A two-minute health check

Before opening a ticket, run through this loop:

  1. Open status.aacsearch.com — is there an incident for your region?
  2. In the dashboard, Project → Indexes: are health badges green, yellow, or red? Yellow = ingest lag or error-rate over threshold; red = drift over 5 %. See Monitoring.
  3. Dashboard → Diagnostics: does the most recent flush carry an error message?
  4. If everything looks fine on our side, capture a request ID from your application logs and open a ticket per Support escalation.

This loop solves the majority of tickets without anyone on either side waiting.
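
If you mirror the badge colors from step 2 in your own alerting, the rule can be sketched as a small function. The 5% drift cutoff comes from the text above; the lag and error-rate thresholds are not stated here, so they are passed in as placeholders. The type names, field names, and the choice to let red take precedence over yellow are assumptions for illustration.

```typescript
type Badge = "green" | "yellow" | "red";

// Hypothetical field names for the signals behind a health badge.
interface IndexHealth {
  driftPercent: number;     // drift, in percent
  ingestLagSeconds: number; // age of the oldest unflushed write
  errorRate: number;        // fraction of failed requests, 0..1
}

// Placeholder thresholds; the actual values are documented under Monitoring.
interface Thresholds {
  maxIngestLagSeconds: number;
  maxErrorRate: number;
}

function badgeFor(health: IndexHealth, thresholds: Thresholds): Badge {
  // Red: drift over 5% (assumed to outrank yellow).
  if (health.driftPercent > 5) return "red";
  // Yellow: ingest lag or error rate over its threshold.
  if (
    health.ingestLagSeconds > thresholds.maxIngestLagSeconds ||
    health.errorRate > thresholds.maxErrorRate
  ) {
    return "yellow";
  }
  return "green";
}
```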
