Analytics Export
Exporting analytics data as CSV or JSON for BI tools, custom dashboards, and long-term retention.
The dashboard and the oRPC procedures cover most analytics workflows, but two cases need an export:
- BI / warehouse integration. Plug AACSearch into Looker, Metabase, BigQuery, Snowflake — wherever your other product analytics already lives.
- Long-term retention. Default retention is 90 days on paid plans (30 days on free); for compliance or trend analysis beyond that, pull the data out and store it yourself.
Export endpoints
| Format | Endpoint | Use case |
|---|---|---|
| CSV (period) | GET /v1/projects/{projectId}/analytics?format=csv | Spreadsheets, ad-hoc analysis |
| JSON (period) | GET /v1/projects/{projectId}/analytics?format=json | API consumers, custom dashboards |
| Webhook | Configured per org — analytics events streamed live | Real-time ingestion into a data warehouse |
The CSV and JSON endpoints are bounded by period (?period=24h|7d|30d|90d). For a full backfill, page through the history with from / to timestamps. The webhook path streams events live; choose it when you need data fresher than a periodic pull provides.
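The following TypeScript sketch builds pull-export URLs and splits a long backfill into bounded from / to windows. It assumes that from and to accept ISO 8601 timestamps and that each window is half-open; this page specifies neither, so check the API reference. buildExportUrl and backfillWindows are hypothetical helper names.

```typescript
type Period = "24h" | "7d" | "30d" | "90d";
type ExportFormat = "csv" | "json";

interface ExportParams {
  format: ExportFormat;
  period?: Period;
  // ASSUMPTION: from/to are ISO 8601 query parameters; verify before use.
  from?: Date;
  to?: Date;
}

// Hypothetical helper: builds a URL such as
// https://your-app.com/v1/projects/proj_123/analytics?period=30d&format=csv
function buildExportUrl(baseUrl: string, projectId: string, params: ExportParams): string {
  const url = new URL(`/v1/projects/${encodeURIComponent(projectId)}/analytics`, baseUrl);
  if (params.period !== undefined) url.searchParams.set("period", params.period);
  if (params.from !== undefined) url.searchParams.set("from", params.from.toISOString());
  if (params.to !== undefined) url.searchParams.set("to", params.to.toISOString());
  url.searchParams.set("format", params.format);
  return url.toString();
}

// Hypothetical helper: splits [from, to) into consecutive windows of at most
// `days` days, so a long backfill becomes a series of bounded requests.
function backfillWindows(from: Date, to: Date, days: number): Array<[Date, Date]> {
  const stepMs = days * 24 * 60 * 60 * 1000;
  const end = to.getTime();
  const windows: Array<[Date, Date]> = [];
  for (let start = from.getTime(); start < end; start += stepMs) {
    windows.push([new Date(start), new Date(Math.min(start + stepMs, end))]);
  }
  return windows;
}
```

If the server treats window edges as inclusive, adjacent windows may return the same event twice; deduplicating on event_id at load time makes that harmless.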
CSV export
```bash
curl -H "Authorization: Bearer <admin-key>" \
  "https://your-app.com/v1/projects/proj_123/analytics?period=30d&format=csv" \
  > aacsearch-events-30d.csv
```

Columns (stable, additive):
| Column | Type | Description |
|---|---|---|
| event_id | string | Stable id; safe to dedupe on. |
| created_at | ISO 8601 | UTC timestamp. |
| organization_id | string | Tenant id (always equals the caller's org). |
| index_slug | string | The index the event was on (null for cross-index events). |
| event_type | string | search_query, result_click, conversion, zero_results, … |
| query | string | Normalised query string (lowercased, trimmed). |
| query_id | string | Joins search → click → conversion chains. |
| session_id | string | Pseudonymous client session. |
| anonymous_user_id | string | Optional persistent pseudonymous id. |
| position | int | For result_click: rank of the clicked result (1-indexed). |
| product_id | string | The document id touched by the event. |
| filters_json | string | The filterBy expression at the time of the event. |
| sort | string | The sortBy at the time of the event. |
| locale | string | Locale code from the event metadata. |
| referrer | string | Sanitised: host + path, no query string, no fragment. |
| latency_ms | int | Latency in milliseconds for search_query events. |
| found | int | Hit count for search_query events. |
| conversion_type | string | For conversion events: "purchase" / "signup" / … |
| value_minor | bigint | metadata.value in BigInt minor units (kopecks/cents). Numeric in CSV. |
| currency | string | Currency code. |
Column order is append-only: new columns are added on the right, never inserted in the middle. Reader code can therefore rely on positional CSV parsing, provided it ignores any extra trailing columns.
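Positional parsing can be sketched in TypeScript as follows. The column list mirrors the table above, and the parser is a minimal RFC 4180 reader (quoted fields, doubled quotes, commas and line breaks inside quotes). This page does not say whether the export starts with a header row, so the sketch skips the first row only if it begins with event_id. parseCsv, toEvent and parseEvents are illustrative names, not part of any AACSearch SDK.

```typescript
// Column order from the table above; new columns are only ever appended.
const COLUMNS = [
  "event_id", "created_at", "organization_id", "index_slug", "event_type",
  "query", "query_id", "session_id", "anonymous_user_id", "position",
  "product_id", "filters_json", "sort", "locale", "referrer",
  "latency_ms", "found", "conversion_type", "value_minor", "currency",
] as const;

type Column = (typeof COLUMNS)[number];
type CsvEvent = Record<Column, string>;

// Minimal RFC 4180 reader: quoted fields, "" escapes, and commas or line
// breaks inside quotes. Blank lines are skipped.
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text.charAt(i);
    if (inQuotes) {
      if (c === '"' && text.charAt(i + 1) === '"') { field += '"'; i++; }
      else if (c === '"') inQuotes = false;
      else field += c;
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field);
      field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text.charAt(i + 1) === "\n") i++;
      if (row.length > 0 || field !== "") rows.push([...row, field]);
      row = [];
      field = "";
    } else {
      field += c;
    }
  }
  if (row.length > 0 || field !== "") rows.push([...row, field]);
  return rows;
}

// Positional mapping: known columns by index; extra trailing columns ignored.
function toEvent(row: string[]): CsvEvent {
  const event = {} as CsvEvent;
  COLUMNS.forEach((name, i) => {
    event[name] = row[i] ?? "";
  });
  return event;
}

// ASSUMPTION: the export may begin with a header row; skip it if so.
function parseEvents(text: string): CsvEvent[] {
  const rows = parseCsv(text);
  const body = rows[0]?.[0] === "event_id" ? rows.slice(1) : rows;
  return body.map(toEvent);
}
```

Convert value_minor with BigInt() rather than Number(), so large minor-unit amounts keep full precision.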
JSON export
```bash
curl -H "Authorization: Bearer <admin-key>" \
  -H "Accept: application/json" \
  "https://your-app.com/v1/projects/proj_123/analytics?period=30d&format=json"
```

```json
{
  "events": [
    {
      "eventId": "evt_…",
      "createdAt": "2025-…",
      "type": "search_query",
      "query": "wireless headphones",
      "queryId": "qry_…",
      "indexSlug": "products",
      "found": 47,
      "latencyMs": 38,
      "filters": { "brand": ["Sony"] },
      "metadata": { … }
    },
    …
  ],
  "page": 1,
  "pageSize": 1000,
  "total": 12440
}
```

Same fields as CSV, plus the raw filters and metadata expanded as nested JSON. Page size is 1000 (max); request further pages with ?page=2, ?page=3, and so on.
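A pagination loop over ?page=N might look like this sketch. It stops once total events have been read or a page comes back empty. The EventsPage type mirrors the sample response; event fields stay loosely typed because this page does not list all of their JSON names. fetchAllEvents and httpFetchPage are hypothetical helpers, and httpFetchPage assumes Node 18+ for the global fetch.

```typescript
// Shape of one response page, mirroring the JSON sample above.
interface EventsPage {
  events: Array<Record<string, unknown>>;
  page: number;
  pageSize: number;
  total: number;
  truncated?: boolean;
}

// Hypothetical helper: reads ?page=1, 2, ... until `total` events are in hand
// or a page comes back empty. fetchPage is injected so the loop is testable.
async function fetchAllEvents(
  fetchPage: (page: number) => Promise<EventsPage>,
): Promise<Array<Record<string, unknown>>> {
  const all: Array<Record<string, unknown>> = [];
  for (let page = 1; ; page++) {
    const res = await fetchPage(page);
    all.push(...res.events);
    if (res.events.length === 0 || all.length >= res.total) break;
  }
  return all;
}

// Hypothetical fetchPage for the live endpoint (Node 18+ global fetch).
function httpFetchPage(baseUrl: string, projectId: string, adminKey: string, period: string) {
  return async (page: number): Promise<EventsPage> => {
    const url = new URL(`/v1/projects/${encodeURIComponent(projectId)}/analytics`, baseUrl);
    url.searchParams.set("period", period);
    url.searchParams.set("format", "json");
    url.searchParams.set("page", String(page));
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${adminKey}`, Accept: "application/json" },
    });
    if (!res.ok) throw new Error(`analytics export failed: HTTP ${res.status}`);
    return (await res.json()) as EventsPage;
  };
}
```

If page requests count toward the export rate limit (see Rate limits), pace them: the 12,440-event sample above needs 13 pages at 1000 events per page.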
Both CSV and JSON enforce the same plan-based retention window. If you ask for period=90d on a plan with 30-day retention, the response includes only the available 30 days plus a truncated: true flag in JSON (or a header X-Truncated: true in CSV).
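A scheduled pull should notice truncation rather than silently store a partial period. A small check covering both signals, as a sketch: isTruncated is an illustrative name, and the headers argument accepts anything with a get() method, such as the headers of a fetch Response.

```typescript
// Anything with a get() lookup, e.g. a fetch Response's headers.
interface HeaderLookup {
  get(name: string): string | null;
}

// Hypothetical helper: true when the server trimmed the requested period to the
// plan's retention window. JSON reports it in the body, CSV in a header.
function isTruncated(
  format: "csv" | "json",
  headers: HeaderLookup,
  jsonBody?: { truncated?: boolean },
): boolean {
  if (format === "json") return jsonBody?.truncated === true;
  return headers.get("X-Truncated")?.trim().toLowerCase() === "true";
}
```

A truncated response is worth logging or alerting on: the missing days are already outside the retention window and cannot be pulled again.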
Webhook streaming
The dashboard's Integrations → Analytics webhook panel configures a destination URL. The system POSTs batched events to it within ~30 seconds of receipt:
```http
POST <your-webhook-url>
Content-Type: application/json
X-AACSearch-Signature: t=…,v1=hmacSha256(…)

{
  "organizationId": "org_…",
  "batchId": "batch_…",
  "events": [
    { "eventId": "…", "type": "search_query", … },
    …
  ]
}
```

Verify the HMAC signature exactly as you would a payment-provider webhook. Replay protection is the receiver's job: check the t= timestamp and reject anything older than 5 minutes.
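A verification sketch for Node.js using node:crypto. The header shape t=…,v1=… comes from the example above, but the signed payload is elided there, so this sketch ASSUMES the common payment-provider convention: HMAC-SHA256 over the timestamp, a dot, and the raw request body, with v1 as a hex digest and a per-webhook signing secret. Confirm the real scheme in Webhooks before deploying. verifySignature is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_AGE_SECONDS = 5 * 60; // reject deliveries older than 5 minutes

// Hypothetical verifier for "t=<unix seconds>,v1=<hex digest>".
// ASSUMPTION: signed payload is `${t}.${rawBody}`; check the Webhooks docs.
function verifySignature(
  header: string,
  rawBody: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const parts = new Map<string, string>();
  for (const piece of header.split(",")) {
    const eq = piece.indexOf("=");
    if (eq > 0) parts.set(piece.slice(0, eq).trim(), piece.slice(eq + 1).trim());
  }
  const t = Number(parts.get("t"));
  const v1 = parts.get("v1");
  if (!Number.isInteger(t) || v1 === undefined) return false;

  // Replay protection; also rejects timestamps far in the future.
  if (Math.abs(nowSeconds - t) > MAX_AGE_SECONDS) return false;

  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`).digest();
  const received = Buffer.from(v1, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Compute the HMAC over the request body exactly as received; re-serialising parsed JSON can change whitespace or key order and break the comparison. Rejecting future timestamps is a common hardening choice, not something this page requires.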
Retries: failed POSTs (non-2xx responses) are retried with exponential backoff for up to 24 hours, then dropped. The dashboard's webhook panel shows deliveries and failures; see Webhooks for the full delivery model.
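Because failed deliveries are retried, the same events can arrive more than once, for example when your endpoint stored a batch but its 2xx response was lost. Keying ingestion on eventId (event_id in the column table, documented as safe to dedupe on) makes redelivery harmless. A sketch with an in-memory set standing in for a warehouse unique key; EventSink is a hypothetical name.

```typescript
interface WebhookEvent {
  eventId: string;
  type: string;
  [key: string]: unknown;
}

interface WebhookBatch {
  organizationId: string;
  batchId: string;
  events: WebhookEvent[];
}

// Hypothetical sink: the in-memory Set stands in for a unique key or MERGE on
// event_id in the warehouse. Redelivered events are skipped.
class EventSink {
  private readonly seen = new Set<string>();
  readonly rows: WebhookEvent[] = [];

  // Returns how many events in the batch were new.
  ingest(batch: WebhookBatch): number {
    let added = 0;
    for (const event of batch.events) {
      if (this.seen.has(event.eventId)) continue;
      this.seen.add(event.eventId);
      this.rows.push(event);
      added++;
    }
    return added;
  }
}
```

Deduplicating per event rather than per batchId also covers the case where a retry regroups events into a different batch, which this page neither confirms nor rules out.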
Authentication for export
- CSV / JSON pull: a search admin API key (ss_search_* with admin scope). Read-only ss_search_* keys cannot pull analytics.
- Webhook: configured server-side; no caller credential.
The admin key is the same one used for POST /v1/projects/.../keys rotation — see API keys.
Retention
| Surface | Default retention |
|---|---|
| Dashboard (24h / 7d / 30d periods) | 90 days for paid plans, 30 days for free, 7 days for 24h-only events |
| SearchUsageEvent raw rows | 90 days (paid), 30 days (free) |
| SearchActivityEvent | 365 days |
| Webhook delivery log | 30 days |
For longer retention, pull the data regularly and store it yourself. Upgrading the plan tier extends retention, but for most teams it costs more than self-archiving.
Privacy and PII at export
- referrer is sanitised: host + path only. No query strings. No fragments.
- No raw IP. No email. No phone.
- anonymous_user_id and session_id are pseudonymous identifiers; treat them as PII for retention purposes per your DPA.
- For right-to-be-forgotten, you must run the same erasure on the warehouse copy. AACSearch can erase by anonymous_user_id on request; see Data privacy.
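A sketch of the warehouse-side erasure, with rows shaped like the CSV columns. Whether rows that share a session_id with the erased user must also go depends on your DPA; the sketch removes them by default and lets you opt out. eraseAnonymousUser is a hypothetical helper; in a real warehouse it would be a DELETE statement with the same predicate.

```typescript
interface StoredEvent {
  event_id: string;
  anonymous_user_id: string | null;
  session_id: string | null;
}

// Hypothetical helper: drops every row for one pseudonymous user. When
// includeLinkedSessions is true, rows sharing a session_id with that user are
// dropped too (a policy choice; check your DPA).
function eraseAnonymousUser<T extends StoredEvent>(
  rows: T[],
  anonymousUserId: string,
  includeLinkedSessions = true,
): { kept: T[]; erased: number } {
  const linkedSessions = new Set<string>();
  if (includeLinkedSessions) {
    for (const r of rows) {
      // Empty strings (as in CSV loads) are treated as "no session".
      if (r.anonymous_user_id === anonymousUserId && r.session_id) linkedSessions.add(r.session_id);
    }
  }
  const kept = rows.filter(
    (r) => r.anonymous_user_id !== anonymousUserId && !(r.session_id && linkedSessions.has(r.session_id)),
  );
  return { kept, erased: rows.length - kept.length };
}
```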
BI integration patterns
The pattern most teams converge on:
- A nightly cron job pulls the previous 24 h with period=24h&format=json.
- It lands the rows in an aacsearch_events table in the warehouse.
- dbt / Looker / Metabase models compute the rest (top queries, CTR, conversion rate, period-over-period diffs) against that table.
- AACSearch data older than the retention window (90 days on paid plans) survives in the warehouse, queryable indefinitely.
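Landing JSON rows in the warehouse table mostly means renaming camelCase fields to the CSV column names, so CSV and JSON loads share one schema. The sketch below covers only the fields shown in the JSON sample; the JSON names of the remaining columns are not listed on this page, so they are left out rather than guessed. WarehouseRow, JsonEvent and toWarehouseRow are hypothetical names.

```typescript
// One row of the aacsearch_events table, using the CSV column names.
interface WarehouseRow {
  event_id: string;
  created_at: string;
  event_type: string;
  index_slug: string | null;
  query: string | null;
  query_id: string | null;
  found: number | null;
  latency_ms: number | null;
  filters_json: string | null;
}

// Only the fields shown in the JSON sample above.
interface JsonEvent {
  eventId: string;
  createdAt: string;
  type: string;
  indexSlug?: string | null;
  query?: string;
  queryId?: string;
  found?: number;
  latencyMs?: number;
  filters?: Record<string, unknown>;
}

function toWarehouseRow(e: JsonEvent): WarehouseRow {
  return {
    event_id: e.eventId,
    created_at: e.createdAt,
    event_type: e.type,
    index_slug: e.indexSlug ?? null,
    query: e.query ?? null,
    query_id: e.queryId ?? null,
    found: e.found ?? null,
    latency_ms: e.latencyMs ?? null,
    // Nested filters serialised for a JSON/VARIANT column. The CSV export's
    // filters_json holds the filterBy expression, so shapes may differ.
    filters_json: e.filters === undefined ? null : JSON.stringify(e.filters),
  };
}
```

Loading with a MERGE or upsert keyed on event_id lets the nightly job re-pull overlapping windows without creating duplicates.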
The webhook is a better fit if your warehouse can ingest streams natively (BigQuery streaming inserts, Snowflake Snowpipe, Redshift streaming ingestion from Kinesis). Streaming wins on freshness; nightly pulls win on simplicity.
Rate limits
CSV and JSON exports are rate-limited to 6 requests per minute per organization. For larger backfills, paginate within a single export rather than firing many small calls.
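To stay under the limit, a client can pace itself. A sliding-window sketch allowing at most 6 requests in any rolling 60 seconds; this page does not say whether the server counts fixed or rolling windows, so treat it as a conservative approximation. ExportThrottle is a hypothetical name.

```typescript
// Hypothetical client-side pacer: at most `limit` requests in any rolling
// window of `windowMs` milliseconds.
class ExportThrottle {
  private readonly sent: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 6, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Milliseconds to wait before a request may be sent at time `now`.
  delayBeforeNext(now: number): number {
    // Forget requests that have left the rolling window.
    while (this.sent.length > 0 && now - (this.sent[0] as number) >= this.windowMs) {
      this.sent.shift();
    }
    if (this.sent.length < this.limit) return 0;
    return (this.sent[0] as number) + this.windowMs - now;
  }

  record(now: number): void {
    this.sent.push(now);
  }

  // Sleep until a slot is free, then claim it. The final check and the record
  // run in the same tick, so concurrent callers cannot claim the same slot.
  async acquire(): Promise<void> {
    for (;;) {
      const delay = this.delayBeforeNext(Date.now());
      if (delay <= 0) break;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
    this.record(Date.now());
  }
}
```

Call await throttle.acquire() before each export request, sharing one instance across all jobs for the same organization.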