Overview
A second is a long time
An interactive dashboard at 5-second latency feels broken; at 500 ms it feels alive. Hitting 500 ms over a 10 TB fact table is practically impossible without pre-aggregation: the engine builds a small rollup table, keyed by dimensions, on a schedule, then transparently substitutes it for the raw table whenever a query matches.
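The build-then-substitute idea can be illustrated in a few lines of Python. This is a minimal sketch with hypothetical in-memory fact rows standing in for the 10 TB table, and hypothetical function names. It works because `SUM` is additive: a per-(day, country) rollup can be re-aggregated into per-country totals without touching the fact rows. Non-additive measures such as `COUNT DISTINCT` cannot be re-aggregated this way.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, country, product, revenue).
FACTS = [
    ("2024-01-01", "DE", "widget", 10.0),
    ("2024-01-01", "DE", "gadget", 5.0),
    ("2024-01-01", "FR", "widget", 7.0),
    ("2024-01-02", "DE", "widget", 3.0),
    ("2024-01-02", "FR", "gadget", 4.0),
]
DIMS = ("day", "country", "product")  # positions of the dimension columns


def build_rollup(facts, keep):
    """Pre-aggregate SUM(revenue), keyed by the dimensions in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    rollup = defaultdict(float)
    for row in facts:
        rollup[tuple(row[i] for i in idx)] += row[-1]
    return dict(rollup)


def revenue_by_raw(facts, dim):
    """Baseline path: scan every fact row."""
    i = DIMS.index(dim)
    out = defaultdict(float)
    for row in facts:
        out[row[i]] += row[-1]
    return dict(out)


def revenue_by_rollup(rollup, keep, dim):
    """Substituted path: re-aggregate the (much smaller) rollup instead."""
    i = keep.index(dim)
    out = defaultdict(float)
    for key, revenue in rollup.items():
        out[key[i]] += revenue
    return dict(out)


ROLLUP_DIMS = ("day", "country")
rollup = build_rollup(FACTS, ROLLUP_DIMS)
print(len(FACTS), "fact rows ->", len(rollup), "rollup rows")  # 5 fact rows -> 4 rollup rows
print(revenue_by_raw(FACTS, "country"))                    # {'DE': 18.0, 'FR': 11.0}
print(revenue_by_rollup(rollup, ROLLUP_DIMS, "country"))   # {'DE': 18.0, 'FR': 11.0}
```

Both paths return the same totals. Here the rollup saves only one row, but the saving grows with the number of fact rows that share each (day, country) key.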
The contract a pre-aggregation makes
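- Match condition: the live query's measures and dimensions (including any dimensions it filters on) must be a subset of the rollup's, and its time grain must be equal to or coarser than the rollup's.
- Refresh policy: for example every 30 minutes, every hour, or an hourly rebuild plus a real-time delta over the newest data.
- Storage target: for example a Snowflake table, Cube Store, ClickHouse, or Iceberg tables on S3.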
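The match condition and refresh policy can be sketched as two predicates. This is a minimal sketch, assuming set-valued measures and dimensions, a hypothetical linear grain hierarchy, and hypothetical `Rollup`/`Query` types. It ignores measure additivity and time-range filters, which a real engine must also check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Time grains ordered fine -> coarse. Each grain nests exactly inside the next, so a
# rollup stored at one grain can be re-aggregated to any coarser grain. (Weeks are
# omitted because they do not nest inside months.)
GRAIN_ORDER = ["minute", "hour", "day", "month", "year"]


@dataclass(frozen=True)
class Rollup:
    measures: frozenset
    dimensions: frozenset
    time_grain: str
    refresh_every: timedelta  # refresh policy, e.g. timedelta(minutes=30)
    last_refreshed: datetime


@dataclass(frozen=True)
class Query:
    measures: frozenset
    dimensions: frozenset         # GROUP BY columns
    filter_dimensions: frozenset  # columns referenced in WHERE
    time_grain: str


def matches(q: Query, r: Rollup) -> bool:
    """True if the query can be answered from the rollup alone."""
    return (
        q.measures <= r.measures
        # A filter can only be applied if its column survived the rollup.
        and (q.dimensions | q.filter_dimensions) <= r.dimensions
        # The query must ask for the rollup's grain or a coarser one.
        and GRAIN_ORDER.index(q.time_grain) >= GRAIN_ORDER.index(r.time_grain)
    )


def is_fresh(r: Rollup, now: datetime) -> bool:
    """True if the rollup is still inside its refresh window."""
    return now - r.last_refreshed <= r.refresh_every


daily = Rollup(
    measures=frozenset({"revenue", "order_count"}),
    dimensions=frozenset({"country", "product"}),
    time_grain="day",
    refresh_every=timedelta(hours=1),
    last_refreshed=datetime(2024, 1, 1, 12, 0),
)
monthly_revenue = Query(frozenset({"revenue"}), frozenset({"country"}),
                        frozenset({"product"}), "month")
hourly_revenue = Query(frozenset({"revenue"}), frozenset({"country"}),
                       frozenset(), "hour")

print(matches(monthly_revenue, daily))  # True
print(matches(hourly_revenue, daily))   # False: hour is finer than day
print(is_fresh(daily, datetime(2024, 1, 1, 12, 45)))  # True
```

What to do when `is_fresh` returns false (fall back to the raw table, or serve the stale rollup) is a policy choice this sketch leaves open.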
Why this is a semantic-layer concern, not a warehouse concern
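Materialised views in the warehouse are a flat list: each one is defined and maintained on its own, with no knowledge of how metrics relate. The semantic layer knows the metric tree, so it can choose a small set of rollups whose match conditions cover, for example, 90% of real queries with 5% of the storage. Because that choice is driven by instrumentation (the log of queries users actually run), it is a measured optimisation, not a guess.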
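Choosing which rollups to maintain from a query log can be framed as a budgeted coverage problem. The sketch below uses a greedy heuristic (repeatedly pick the candidate that covers the most still-uncovered queries per GB) as a stand-in; it is a swapped-in technique, not necessarily what any particular semantic layer does. The query log, candidate names, and sizes are invented for illustration, and matching is simplified to dimensions only.

```python
# Hypothetical query log: the set of dimensions each logged query needed.
QUERY_LOG = (
    [frozenset({"country"})] * 6
    + [frozenset({"country", "product"})] * 3
    + [frozenset({"customer_id"})]
)

# Hypothetical candidate rollups: name -> (dimensions kept, estimated size in GB).
CANDIDATES = {
    "by_country": (frozenset({"country"}), 1.0),
    "by_country_product": (frozenset({"country", "product"}), 4.0),
    "by_customer": (frozenset({"customer_id"}), 50.0),
}


def choose_rollups(log, candidates, budget_gb):
    """Greedy budgeted max-coverage: repeatedly add the candidate that covers the
    most still-uncovered queries per GB, until nothing else fits or helps."""
    chosen, used = [], 0.0
    uncovered = list(log)
    while uncovered:
        best, best_ratio = None, 0.0
        for name, (dims, size_gb) in candidates.items():
            if name in chosen or used + size_gb > budget_gb:
                continue
            gain = sum(1 for q in uncovered if q <= dims)  # all of q's dims kept
            if gain / size_gb > best_ratio:
                best, best_ratio = name, gain / size_gb
        if best is None:
            break  # remaining candidates are over budget or cover nothing new
        dims, size_gb = candidates[best]
        chosen.append(best)
        used += size_gb
        uncovered = [q for q in uncovered if not q <= dims]
    coverage = 1 - len(uncovered) / len(log) if log else 1.0
    return chosen, used, coverage


chosen, used, coverage = choose_rollups(QUERY_LOG, CANDIDATES, budget_gb=10.0)
print(chosen, used, f"{coverage:.0%}")
# ['by_country', 'by_country_product'] 5.0 90%
```

Here the 50 GB `by_customer` rollup does not fit the 10 GB budget, so one logged query in ten stays on the raw table. Greedy selection by ratio is fast but not guaranteed to find the best set.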