Rate limits and quotas
How AACsearch limits requests per API key, what a 429 means, and how to recover.
AACsearch limits two things separately:
- Rate limits — requests per minute, per API key. This is a per-key bucket that resets every minute.
- Quotas — total search units per month, per organization. This is a billing-tier ceiling.
A 429 response from the public API is always a rate limit. Quota overruns return a different code — see Quota responses below.
Rate limit model
Every persisted API key has a rateLimitPerMinute field. The default is 600 (10 requests / second sustained). On every request, the public auth layer increments a counter for that key in PostgreSQL; if the counter exceeds the limit, the request is rejected with:
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limited",
  "limit": 600
}
```

The counter resets at the start of each clock minute. There is no leaky-bucket smoothing: if you burst all 600 requests in the first second, the rest of the minute is rejected, and at the top of the next minute you get a fresh 600.
Yes, this means the first second of every minute can absorb a sudden burst. It also means a steady 10 req/s (exactly 600 per minute) usually stays within the limit, while a single 600-request burst is partly rejected if earlier requests in the same clock minute have already used part of the bucket. Spread your retries.
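To make the fixed-window behavior concrete, here is a minimal in-memory sketch. It is not the service's implementation: a Map stands in for the PostgreSQL counter, FixedWindowLimiter is a hypothetical name, and whether rejected requests also increment the real counter is an assumption (this sketch counts them).

```javascript
// Minimal fixed-window (per clock minute) counter, for illustration only.
// Assumption: a Map replaces the PostgreSQL-backed counter described above.
class FixedWindowLimiter {
  constructor(limitPerMinute = 600) {
    this.limit = limitPerMinute;
    this.counters = new Map(); // apiKey -> { minute, count }
  }

  // Returns true if the request is allowed, false if it would get a 429.
  // nowMs is a Unix timestamp in milliseconds.
  allow(apiKey, nowMs = Date.now()) {
    const minute = Math.floor(nowMs / 60_000); // index of the current clock minute
    let entry = this.counters.get(apiKey);
    if (!entry || entry.minute !== minute) {
      // New clock minute: the bucket starts fresh, with no smoothing.
      entry = { minute, count: 0 };
      this.counters.set(apiKey, entry);
    }
    entry.count += 1; // rejected requests still count in this sketch
    return entry.count <= this.limit;
  }
}
```

Because windows are aligned to clock minutes, 600 requests at 12:00:59 and another 600 at 12:01:00 are both accepted, so up to twice the limit can pass within a couple of seconds around a boundary.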
Setting the limit
From the dashboard, API keys → Edit → Rate limit per minute. Reasonable defaults:
| Key role | Suggested limit / minute |
|---|---|
| Public browser search (your widget) | 600 – 1 500 |
| Server-side ingest worker | 300 – 600 |
| Admin / dashboard (low volume) | 60 |
| Connector key (CMS plugin) | 1 000 |
| Synthetic probe | 30 |
Defaults are a starting point. If you watch the Usage chart for a week and see a 99th-percentile minute below half the limit, you can lower it. If you see sustained 80 %+ usage, you need to either raise the limit or split traffic across two keys.
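The sizing rule of thumb can be checked against a week of per-minute request counts that you export yourself (an assumption; the Usage chart shows the same data visually). Because "sustained" is not defined precisely, this sketch treats it as the median minute being at or above 80 % of the limit. suggestLimitChange and its return labels are hypothetical.

```javascript
// Nearest-rank percentile on a sorted copy; expects a non-empty array.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Applies the heuristic above to per-minute request counts for one key.
// Assumption: "sustained 80 %+" means the median minute is >= 80 % of the limit.
function suggestLimitChange(perMinuteCounts, limit) {
  const p50 = percentile(perMinuteCounts, 50);
  const p99 = percentile(perMinuteCounts, 99);
  if (p50 >= 0.8 * limit) return "raise-or-split"; // raise the limit or split keys
  if (p99 < 0.5 * limit) return "can-lower";       // busiest minutes use under half
  return "keep";
}
```

For example, with a limit of 600, a week in which every minute sees 100 requests returns "can-lower", because the 99th-percentile minute is well under 300.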
Recovering from a 429
If your client gets a 429:
- Don't retry immediately. A retry inside the same clock minute will fail again.
- Wait until the next clock minute. Optionally use the Retry-After header if present.
- Add jitter. Clients that all retry at exactly the start of the next minute create a synchronized stampede.
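The first two points translate into a small delay calculation. This sketch prefers the server's Retry-After value (in seconds), falls back to the time left in the current clock minute, and adds random jitter on top. It assumes the client clock is reasonably close to the server's; retryDelayMs and the 2-second jitter ceiling are illustrative, not part of the API.

```javascript
// How long to wait after a 429 before retrying, in milliseconds.
// `random` is injectable so the jitter can be made deterministic in tests.
function retryDelayMs(retryAfterHeader, nowMs = Date.now(), maxJitterMs = 2_000, random = Math.random) {
  const seconds = Number(retryAfterHeader);
  const hasRetryAfter = Boolean(retryAfterHeader) && Number.isFinite(seconds) && seconds >= 0;
  const base = hasRetryAfter
    ? seconds * 1000                // server told us how long to wait
    : 60_000 - (nowMs % 60_000);    // otherwise wait for the next clock minute
  return base + random() * maxJitterMs; // jitter spreads clients apart
}
```

With no Retry-After header at 12:00:59.000 and zero jitter, the function returns 1000, so the retry lands at the top of the next minute.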
A backoff loop that works:
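```javascript
// Retries a search on 429, honoring Retry-After when present, else backing off 1 s, 2 s, 4 s...
// Adds up to 1 s of jitter and caps each wait at 60 s. `payload` is a plain object sent as JSON.
async function searchWithRetry(payload, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch("/api/v1/.../search", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    if (res.status !== 429) return res;
    if (attempt === maxAttempts - 1) break; // don't sleep before giving up
    const retryAfter = Number(res.headers.get("retry-after")); // 0 or NaN when absent
    const base = retryAfter > 0 ? retryAfter * 1000 : 1000 * 2 ** attempt;
    const ms = base + Math.random() * 1000;
    await new Promise((r) => setTimeout(r, Math.min(ms, 60_000)));
  }
  throw new Error("Rate limited after retries");
}
```

Browser clients should also degrade gracefully: show "Searching…" with a small delay rather than spamming retries while the user types.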
Why you might be over the limit unexpectedly
| Cause | Diagnosis |
|---|---|
| Same key in multiple workers. | Each worker contributes to the same bucket. The dashboard shows usage spikes after a deploy. |
| Search-as-you-type without debouncing. | Browser fires one search per keystroke. Debounce 150–250 ms. |
| Webhook fan-out triggering reindex. | A backlog of webhook deliveries can fire many writes per minute. |
| A bot. | Look at the API keys → Usage → User-Agent breakdown. If most of the traffic comes from one UA you don't recognize, suspect a scraper and rotate the key. A Content Security Policy on your site will not stop a scraper that calls the API directly with a copied key. |
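For the search-as-you-type row, a minimal debounce sketch: only the last call inside the wait window actually runs. debounce is a hypothetical helper here (libraries such as Lodash ship a fuller version), and the 200 ms default is simply a value inside the 150 to 250 ms range above.

```javascript
// Returns a wrapper that delays calling `fn` until `waitMs` have passed
// without another call; earlier pending calls are cancelled.
function debounce(fn, waitMs = 200) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

In a browser, wrap the handler attached to the search input's input event, for example debounce((e) => runSearch(e.target.value), 200), where runSearch stands for your own search function.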
Per-organization quotas
In addition to the per-key rate limit, your organization has a monthly quota of search units. A search unit is one search request to the public API (suggest, multi-search, federated, geo, etc.).
The quota is enforced after the rate-limit gate but before the search runs. As your organization approaches and then exceeds its monthly quota:
- Search continues to work until you exceed your plan + overage budget.
- The dashboard surfaces a quota warning at 80 % and a quota-exhausted banner once the budget is gone.
- If you have a wallet balance and overage-bypass is enabled, search units are deducted from the wallet and the search continues.
See Billing wallet for how overage works (Enterprise customers may have flat-rate quotas with no overage).
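One possible reading of the order of checks above, written as a sketch: plan quota first, then the overage budget, then the wallet when overage-bypass is enabled, and otherwise the 402 quota_exhausted response. Every field name here (unitsUsedThisMonth, planUnits, overageUnits, overageBypass, walletUnits) is hypothetical, and walletUnits assumes the wallet balance has already been converted to search units; Billing wallet describes how overage actually works.

```javascript
// Hypothetical quota gate, run after the rate-limit check and before the search.
// Field names are assumptions, not AACsearch's schema. Nothing is deducted here.
function quotaDecision(org, units = 1) {
  const used = org.unitsUsedThisMonth + units;
  if (used <= org.planUnits) return { allow: true, source: "plan" };
  if (used <= org.planUnits + org.overageUnits) return { allow: true, source: "overage" };
  // Past plan + overage: fall back to the wallet only when overage-bypass is on.
  if (org.overageBypass && org.walletUnits >= units) {
    return { allow: true, source: "wallet" }; // units would be deducted from the wallet
  }
  return { allow: false, status: 402, error: "quota_exhausted" };
}
```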
Quota responses
When the wallet/overage path also runs out:
```http
HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": "quota_exhausted",
  "resetAt": "2026-06-01T00:00:00Z",
  "topUpUrl": "https://app.aacsearch.com/organization/billing/wallet"
}
```

402 Payment Required is a deliberate choice: 429 is for "you're going too fast", 402 is for "you've used what you paid for". Treat them differently in your client.
Headers we send
| Header | Meaning |
|---|---|
| x-ratelimit-limit | Your key's rateLimitPerMinute. |
| x-ratelimit-remaining | Requests remaining in the current clock minute. |
| x-ratelimit-reset | UTC timestamp when the bucket resets. |
| retry-after | Seconds to wait, set on 429 responses. |
The x-request-id header is also returned on every request; capture it in your logs — it's the single most useful piece of information for support escalation.
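A client can read these headers after every call to slow down before it ever sees a 429, and keep the request id for support. This sketch accepts a fetch Response's headers (or any object with a get(name) method); readRateLimit and the 20 % threshold are illustrative choices, and the reset value is passed through untouched because its exact format is not specified here.

```javascript
// Parses a header value as a number; missing or empty values become NaN.
const toNumber = (v) => (v == null || v === "" ? NaN : Number(v));

// Reads the rate-limit headers listed above. Names and threshold are illustrative.
function readRateLimit(headers, warnBelow = 0.2) {
  const limit = toNumber(headers.get("x-ratelimit-limit"));
  const remaining = toNumber(headers.get("x-ratelimit-remaining"));
  return {
    limit,
    remaining,
    reset: headers.get("x-ratelimit-reset") ?? null, // passed through; format not assumed
    requestId: headers.get("x-request-id") ?? null,  // log this for support escalation
    // True when less than `warnBelow` of this minute's budget is left.
    nearLimit: limit > 0 && remaining / limit < warnBelow,
  };
}
```

After a fetch, call readRateLimit(res.headers) and pause or shed low-priority work while nearLimit is true.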
What does not count toward the rate limit
- GET /api/v1/health: health probe.
- GET /scim/v2/ServiceProviderConfig: SCIM capability discovery.
- Webhooks from AACsearch to your endpoint (you set the rate at which we deliver, not the other way around).
Common mistakes
- Setting a key's rate limit to "very high" so we never see 429s. The limit exists to protect you from a stampede that costs you money. Set it where you think traffic should be.
- Sharing one key across teams. Two teams sharing a key cannot diagnose each other's load. Use one key per service.
- Implementing exponential backoff against the wrong response. 5xx deserves backoff; 4xx (including 429) means the client has to change something. The only 4xx codes worth retrying are 429 (after slowing down) and 408.
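The retry rules above, together with the 402 versus 429 distinction from Quota responses, fit in one small function. classifyResponse and its action strings are hypothetical labels for whatever your client does next.

```javascript
// Maps an HTTP status to the client's next step, following the rules above.
function classifyResponse(status) {
  if (status < 400) return "ok";
  if (status === 429) return "retry-after-reset";  // too fast: wait for the bucket, then retry
  if (status === 408) return "retry";              // request timeout: safe to retry
  if (status === 402) return "stop-quota";         // quota exhausted: top up or wait for resetAt
  if (status >= 500) return "retry-with-backoff";  // server side: exponential backoff
  return "fix-request";                            // other 4xx: retrying will not help
}
```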
See also
- Monitoring — how to alert before you hit the limit
- Status and incidents — when the cluster, not your key, is the issue
- Best practices — including key-per-service hygiene