How AACsearch limits requests per API key, what a 429 means, and how to recover.

Rate limits and quotas

AACsearch limits two things separately:

Rate limits — requests per minute, per API key. This is a per-key bucket that resets every minute.
Quotas — total search units per month, per organization. This is a billing-tier ceiling.

A 429 response from the public API is always a rate limit. Quota overruns return a different code — see Quota responses below.

Rate limit model

Every persisted API key has a rateLimitPerMinute field. The default is 600 (10 requests / second sustained). On every request, the public auth layer increments a counter for that key in PostgreSQL; if the counter exceeds the limit, the request is rejected with:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limited",
  "limit": 600
}

The counter resets at the start of each clock minute. There is no leaky-bucket smoothing — if you burst all 600 requests in the first second, the rest of the minute is rejected, and at the top of the next minute you get a fresh 600.

Two consequences follow. The first second of every minute can absorb a sudden burst, and a steady 10 req/s rarely hits the limit. A 600-request burst is partly rejected only when other traffic has already used part of that minute's bucket; a burst that straddles a minute boundary is split across two buckets. Spread your retries.
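The fixed-window behaviour described above can be illustrated with a minimal in-memory sketch. This is not AACsearch's implementation (the real counter lives in PostgreSQL); the class name `FixedWindowLimiter` and the key ID `key_a` are hypothetical, and only the windowing logic is shown.

```javascript
// In-memory fixed-window limiter. Hypothetical sketch: the real counter is
// kept in PostgreSQL; only the clock-minute windowing is illustrated here.
class FixedWindowLimiter {
	constructor(limitPerMinute) {
		this.limitPerMinute = limitPerMinute;
		this.counters = new Map(); // keyId -> { window, count }
	}

	// Returns true if the request is allowed, false if it would get a 429.
	allow(keyId, nowMs = Date.now()) {
		const window = Math.floor(nowMs / 60_000); // index of the clock minute
		const entry = this.counters.get(keyId);
		if (!entry || entry.window !== window) {
			// First request of a new clock minute: start a fresh counter.
			this.counters.set(keyId, { window, count: 1 });
			return true;
		}
		entry.count += 1;
		return entry.count <= this.limitPerMinute;
	}
}

const limiter = new FixedWindowLimiter(600);
const minuteStart = 60_000 * 1_000; // an arbitrary clock-minute boundary
let allowed = 0;
for (let i = 0; i < 700; i++) {
	if (limiter.allow("key_a", minuteStart + i)) allowed++;
}
console.log(allowed); // 600: the remaining 100 requests are rejected
console.log(limiter.allow("key_a", minuteStart + 60_000)); // true: new minute
```

Because nothing smooths the window, the 700 requests sent within one second use up the whole minute, and the next request succeeds only once the clock minute changes.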

Setting the limit

From the dashboard, API keys → Edit → Rate limit per minute. Reasonable defaults:

Key role	Suggested limit / minute
Public browser search (your widget)	600 – 1 500
Server-side ingest worker	300 – 600
Admin / dashboard (low volume)	60
Connector key (CMS plugin)	1 000
Synthetic probe	30

Defaults are a starting point. If you watch the Usage chart for a week and see a 99th-percentile minute below half the limit, you can lower it. If you see sustained 80 %+ usage, you need to either raise the limit or split traffic across two keys.
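As a rough illustration of that rule of thumb, the sketch below takes a list of per-minute request counts (for example, exported from the Usage chart) and suggests a direction. The function names are hypothetical, and "sustained 80 %+ usage" is interpreted here as the median minute reaching 80 % of the limit, which is an assumption rather than a definition from AACsearch.

```javascript
// Nearest-rank percentile of a non-empty array of numbers (0 < p <= 100).
function percentile(values, p) {
	const sorted = [...values].sort((a, b) => a - b);
	const rank = Math.ceil((p / 100) * sorted.length);
	return sorted[Math.max(rank, 1) - 1];
}

// requestsPerMinute: one count per observed minute; limit: rateLimitPerMinute.
function suggestLimitChange(requestsPerMinute, limit) {
	const p99 = percentile(requestsPerMinute, 99);
	const median = percentile(requestsPerMinute, 50);
	// Assumption: "sustained" means the median minute is at 80 % or more.
	if (median >= 0.8 * limit) return "raise-or-split";
	if (p99 < 0.5 * limit) return "can-lower";
	return "keep";
}

console.log(suggestLimitChange([100, 120, 90, 200], 600)); // "can-lower"
console.log(suggestLimitChange([500, 520, 490, 510], 600)); // "raise-or-split"
```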

Recovering from a 429

If your client gets a 429:

Don't retry immediately. A retry inside the same clock minute will fail again.
Wait until the next clock minute. If the response carries a Retry-After header, wait that many seconds instead.
Add jitter. Clients all retrying at exactly the start of the next minute create a synchronized stampede.
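The three steps above can be sketched as a small delay calculator. The helper name `retryDelayMs` and the 2-second jitter ceiling are illustrative choices; the sketch also assumes that Retry-After carries a whole number of seconds and that the client clock is roughly in sync with the server's.

```javascript
// Milliseconds to wait before retrying after a 429.
// retryAfterHeader: value of the Retry-After header, or null if absent
// (assumed to be a whole number of seconds).
function retryDelayMs(retryAfterHeader, nowMs = Date.now(), maxJitterMs = 2_000, random = Math.random) {
	const seconds = Number.parseInt(retryAfterHeader ?? "", 10);
	const base = Number.isFinite(seconds) && seconds >= 0
		? seconds * 1_000
		: 60_000 - (nowMs % 60_000); // time left in the current clock minute
	// Jitter spreads clients out so they don't all retry at second zero.
	return base + Math.floor(random() * maxJitterMs);
}

// 45 s into a minute, no header, jitter fixed at 0 for the example:
console.log(retryDelayMs(null, 5 * 60_000 + 45_000, 2_000, () => 0)); // 15000
```

The `random` parameter exists only to make the function deterministic in tests; in normal use the default `Math.random` is enough.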

A backoff loop that works:

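async function searchWithRetry(payload, maxAttempts = 5) {
	for (let attempt = 0; attempt < maxAttempts; attempt++) {
		const res = await fetch("/api/v1/.../search", { method: "POST", body: payload });
		if (res.status !== 429) return res;
		if (attempt === maxAttempts - 1) break; // out of attempts: don't sleep again
		// Prefer the server's Retry-After (seconds); otherwise back off exponentially.
		const retryAfter = Number.parseInt(res.headers.get("retry-after") ?? "", 10);
		const base = Number.isNaN(retryAfter) ? 1000 * 2 ** attempt : retryAfter * 1000;
		const ms = base + Math.random() * 1000; // jitter avoids synchronized retries
		await new Promise((r) => setTimeout(r, Math.min(ms, 60_000)));
	}
	throw new Error("Rate limited after retries");
}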

Browser clients should also degrade gracefully: show "Searching…" with a small delay rather than spamming retries while the user types.

Why you might be over the limit unexpectedly

Cause	Diagnosis
Same key in multiple workers.	Each worker contributes to the same bucket. The dashboard shows usage spikes after a deploy.
Search-as-you-type without debouncing.	Browser fires one search per keystroke. Debounce 150–250 ms.
Webhook fan-out triggering reindex.	A backlog of webhook deliveries can fire many writes per minute.
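A bot.	Look at the API keys → Usage → User-Agent breakdown. If most of the traffic comes from one UA you don't recognize, suspect a scraper. A CSP only governs what your own pages load and will not stop it; block the UA at your edge (for example, a CDN or WAF rule) and consider rotating the key.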
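For the search-as-you-type case in the table above, a generic debounce wrapper is usually enough. This is a minimal sketch: `runSearch` is a hypothetical stand-in for your own search call, and 200 ms is simply a value inside the 150–250 ms range suggested above.

```javascript
// Generic debounce: runs fn only after waitMs have passed without a new call.
function debounce(fn, waitMs) {
	let timer = null;
	return (...args) => {
		if (timer !== null) clearTimeout(timer);
		timer = setTimeout(() => {
			timer = null;
			fn(...args);
		}, waitMs);
	};
}

// Hypothetical usage: runSearch stands in for your own search request.
const queries = [];
const runSearch = (query) => queries.push(query);
const debouncedSearch = debounce(runSearch, 200);

// Simulates typing "shoes" quickly; only the final value is searched.
for (const value of ["s", "sh", "sho", "shoe", "shoes"]) debouncedSearch(value);
```

Five keystrokes become one request, which counts once against both the per-minute bucket and the monthly quota.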

Per-organization quotas

In addition to the per-key rate limit, your organization has a monthly quota of search units. A search unit is one search request to the public API (suggest, multi-search, federated, geo, etc.).

The quota is enforced after the rate-limit gate but before the search runs. When your organization exceeds its monthly quota:

Search continues to work until you exceed your plan + overage budget.
The dashboard surfaces a quota warning at 80 % and a quota-exhausted banner once the budget is gone.
If you have a wallet balance and overage-bypass is enabled, search units are deducted from the wallet and the search continues.

See Billing wallet for how overage works (Enterprise customers may have flat-rate quotas with no overage).

Quota responses

When the wallet/overage path also runs out, the API responds with:

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": "quota_exhausted",
  "resetAt": "2026-06-01T00:00:00Z",
  "topUpUrl": "https://app.aacsearch.com/organization/billing/wallet"
}

402 Payment Required is a deliberate choice — 429 is for "you're going too fast", 402 is for "you've used what you paid for". Treat them differently in your client.
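One way to keep the two cases apart in a client is to map each status to an action before deciding whether to retry. The function and action names below are hypothetical; the sketch only encodes the distinction described above, plus the usual handling of 408 and 5xx.

```javascript
// Maps an HTTP status to what the client should do next.
function classifyStatus(status) {
	if (status === 429) return "wait-then-retry"; // going too fast: wait for the next window
	if (status === 402) return "stop-and-top-up"; // quota exhausted: retrying will not help
	if (status === 408 || status >= 500) return "retry-with-backoff";
	if (status >= 400) return "fix-request"; // any other 4xx is a client error
	return "ok";
}

// Builds a user-facing message from a 402 body shaped like the example above.
function describeQuotaError(body) {
	return `Search quota exhausted until ${body.resetAt}. Top up at ${body.topUpUrl}`;
}

console.log(classifyStatus(429)); // "wait-then-retry"
console.log(classifyStatus(402)); // "stop-and-top-up"
```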

Headers we send

Header	Meaning
`x-ratelimit-limit`	Your key's `rateLimitPerMinute`.
`x-ratelimit-remaining`	Requests remaining in the current clock minute.
`x-ratelimit-reset`	UTC timestamp when the bucket resets.
`retry-after`	Seconds to wait, set on 429 responses.

The x-request-id header is also returned on every request; capture it in your logs — it's the single most useful piece of information for support escalation.
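A small helper can collect these headers in one place for logging or client-side throttling. This is a sketch under assumptions: the helper name is hypothetical, it expects a Fetch API `Headers` object (as returned by `fetch`), and it leaves `x-ratelimit-reset` as a raw string because the exact timestamp format is not specified here.

```javascript
// Extracts the rate-limit headers from a fetch Response's headers.
function readRateLimitHeaders(headers) {
	const toInt = (name) => {
		const raw = headers.get(name);
		if (raw == null) return null; // header absent
		const n = Number.parseInt(raw, 10);
		return Number.isNaN(n) ? null : n;
	};
	return {
		limit: toInt("x-ratelimit-limit"),
		remaining: toInt("x-ratelimit-remaining"),
		reset: headers.get("x-ratelimit-reset"), // kept as a string: format not assumed
		retryAfterSeconds: toInt("retry-after"), // only set on 429 responses
		requestId: headers.get("x-request-id"), // log this for support escalation
	};
}

// Example with made-up values:
const info = readRateLimitHeaders(new Headers({
	"x-ratelimit-limit": "600",
	"x-ratelimit-remaining": "42",
	"x-request-id": "req_example",
}));
console.log(info.remaining); // 42
```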

What does not count toward the rate limit

GET /api/v1/health — health probe.
GET /scim/v2/ServiceProviderConfig — SCIM capability discovery.
Webhooks from AACsearch to your endpoint (you set the rate at which we deliver, not the other way around).

Common mistakes

Setting a key's rate limit to "very high" so we never see 429s. The limit exists to protect you from a stampede that costs you money. Set it where you think traffic should be.
Sharing one key across teams. Two teams sharing a key cannot diagnose each other's load. Use one key per service.
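Implementing exponential backoff against the wrong response. 5xx deserves backoff; most 4xx responses mean the request itself is wrong and will fail the same way on retry, so the client has to fix it. The only 4xx responses worth retrying are 429 and 408. A 402 is not retryable until the quota resets or the wallet is topped up.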
