If a query asks for `revenue BY country, MONTH` and a rollup exists at `BY country, productLine, DAY`, what happens?

Cube uses the rollup: country is a subset of {country, productLine}, and DAY-grain rolls up to MONTH-grain

Cube creates a new rollup automatically

Cube falls back to the live SQL because dimensions don't match exactly

Cube Pre-aggregations Deep Dive — Semantic Web Academy

Overview

How Cube actually picks a rollup

Pre-aggregations are not magic — there is a deterministic matching algorithm. For an incoming query Q, Cube looks at every pre-aggregation P on the cubes Q touches and asks:

Are Q's measures a subset of P's measures? No → skip.
Are Q's dimensions a subset of P's dimensions? No → skip.
Is Q's time grain the same as or coarser than P's grain? No → skip. (A day-grain rollup can answer a month-grain query, but not the reverse.)
Are Q's filters compatible with P's filter constraints? No → skip.

First match wins. If nothing matches, Cube falls back to the live SQL.
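To make the four checks concrete, here is a minimal sketch in plain JavaScript. It is not Cube's actual implementation: the object shapes, the `CAN_DERIVE` table, and the names `matches` and `pickPreAggregation` are all hypothetical, the filter check is simplified to "every filtered member must be stored in the rollup", and all measures are assumed to be additive (sums, counts) so that re-aggregating a finer rollup is safe.

```javascript
// Illustrative sketch only; names and data shapes are assumptions, not Cube APIs.

// Which query grains a rollup stored at a given grain can answer.
// Weeks do not align with months, so a week rollup only answers week queries.
const CAN_DERIVE = {
  hour: ["hour", "day", "week", "month", "quarter", "year"],
  day: ["day", "week", "month", "quarter", "year"],
  week: ["week"],
  month: ["month", "quarter", "year"],
  quarter: ["quarter", "year"],
  year: ["year"],
};

const isSubset = (needed, available) =>
  needed.every((member) => available.includes(member));

function matches(query, preAgg) {
  // 1. Measures: everything the query asks for must be stored in the rollup.
  if (!isSubset(query.measures, preAgg.measures)) return false;
  // 2. Dimensions: same subset rule.
  if (!isSubset(query.dimensions, preAgg.dimensions)) return false;
  // 3. Grain: the query grain must be derivable from the rollup grain.
  const derivable = CAN_DERIVE[preAgg.granularity] ?? [];
  if (!derivable.includes(query.granularity)) return false;
  // 4. Filters (simplified assumption): filtered members must be rollup dimensions.
  return isSubset(query.filterMembers ?? [], preAgg.dimensions);
}

// First match wins; null means "fall back to live SQL".
function pickPreAggregation(query, preAggs) {
  return preAggs.find((preAgg) => matches(query, preAgg)) ?? null;
}

const rollups = [
  {
    name: "by_country_product_day",
    measures: ["revenue"],
    dimensions: ["country", "productLine"],
    granularity: "day",
  },
];
const query = { measures: ["revenue"], dimensions: ["country"], granularity: "month" };

console.log(pickPreAggregation(query, rollups)?.name); // prints "by_country_product_day"
```

Because the list is scanned in order, the order in which rollups are declared matters in this sketch: put the narrowest, cheapest rollup first so that it wins when several would match.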

The three pre-aggregation kinds

Kind	What it stores	When to use
`rollup`	Pre-aggregated measures by dims+grain	95% of cases
`original_sql`	A snapshot of the base SQL	When you can't change the warehouse but want a refresh boundary
`rollup_join`	Rollup of multiple cubes joined first	High-cardinality joins that always co-occur

Refresh strategies

Scheduled: `refresh_key: { every: '30 minute' }` (a cron expression also works). Simple, predictable.
SQL key: refresh when the result of a query such as `MAX(updated_at)` changes. Cheap when the source table has a watermark column.
Incremental: refresh only the latest partition. Mandatory at TB scale.
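As a rough illustration, the three strategies above might look like this in a JavaScript data model file. Treat it as a sketch under assumptions: the cube name, table, columns, and pre-aggregation names are invented for the example, and the intervals are arbitrary. `cube` and `CUBE` are globals supplied by Cube when it loads the model, so this file is not meant to run on its own.

```javascript
// Hypothetical cube: table, member, and pre-aggregation names are illustrative.
cube(`orders`, {
  sql_table: `public.orders`,

  measures: {
    revenue: { sql: `amount`, type: `sum` },
  },

  dimensions: {
    country: { sql: `country`, type: `string` },
    product_line: { sql: `product_line`, type: `string` },
    created_at: { sql: `created_at`, type: `time` },
  },

  pre_aggregations: {
    // 1. Scheduled: rebuild on a fixed interval.
    revenue_scheduled: {
      measures: [CUBE.revenue],
      dimensions: [CUBE.country, CUBE.product_line],
      time_dimension: CUBE.created_at,
      granularity: `day`,
      refresh_key: { every: `30 minute` },
    },

    // 2. SQL key: rebuild only when the watermark query returns a new value.
    revenue_sql_key: {
      measures: [CUBE.revenue],
      dimensions: [CUBE.country, CUBE.product_line],
      time_dimension: CUBE.created_at,
      granularity: `day`,
      refresh_key: { sql: `SELECT MAX(updated_at) FROM public.orders` },
    },

    // 3. Incremental: partition by month and refresh only recent partitions.
    revenue_incremental: {
      measures: [CUBE.revenue],
      dimensions: [CUBE.country, CUBE.product_line],
      time_dimension: CUBE.created_at,
      granularity: `day`,
      partition_granularity: `month`,
      refresh_key: { every: `1 hour`, incremental: true, update_window: `7 day` },
    },
  },
});
```

In practice you would keep only one of these per query shape. Three are shown side by side purely to compare the `refresh_key` options.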

Pre-organised library shelves

Pre-aggregation matching is search routing in a library. A patron asks for 'all 19th-century French novels'; the librarian checks: do we have a pre-organised shelf for 19th-century European novels by country? Yes — pull the shelf, hand the patron the France slice. Subset of dimensions, compatible grain — served instantly. No matching shelf? Walk the stacks (scan the fact table). Pre-aggregations are pre-organised shelves your engine maintains overnight.

Reflect

The instrumentation move that turns pre-aggregation design from guesswork into a science: log every query Cube serves as a (measures, dimensions, grain, was-rollup-hit?) tuple. After a week, a small set of shapes usually dominates; as a rule of thumb, the top 20 unique tuples cover roughly 80% of traffic. Build pre-aggregations for those 20 and walk away from the rest.
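Once those tuples are logged, ranking them takes a few lines. The sketch below assumes a hypothetical log format, an array of `{ measures, dimensions, granularity, rollupHit }` records, and the names `shapeKey` and `topShapes` are invented for the example. How you capture the records from your own Cube deployment is left open.

```javascript
// Illustrative sketch; the log record shape is an assumption, not a Cube format.

// Normalise a query into an order-independent key so that
// [country, productLine] and [productLine, country] count as the same shape.
function shapeKey(q) {
  return JSON.stringify([
    [...q.measures].sort(),
    [...q.dimensions].sort(),
    q.granularity ?? null,
  ]);
}

// Rank shapes by frequency and report how much traffic the top n cover.
function topShapes(log, n) {
  const counts = new Map();
  for (const q of log) {
    const key = shapeKey(q);
    const entry = counts.get(key) ?? { key, count: 0, hits: 0 };
    entry.count += 1;
    if (q.rollupHit) entry.hits += 1;
    counts.set(key, entry);
  }
  const ranked = [...counts.values()].sort((a, b) => b.count - a.count);
  const top = ranked.slice(0, n);
  const covered = top.reduce((sum, entry) => sum + entry.count, 0);
  return { top, coverage: log.length ? covered / log.length : 0 };
}

const log = [
  { measures: ["revenue"], dimensions: ["country", "productLine"], granularity: "day", rollupHit: true },
  { measures: ["revenue"], dimensions: ["productLine", "country"], granularity: "day", rollupHit: true },
  { measures: ["revenue"], dimensions: ["country", "productLine"], granularity: "day", rollupHit: false },
  { measures: ["orders"], dimensions: ["city"], granularity: "month", rollupHit: false },
];

const report = topShapes(log, 1);
console.log(report.coverage); // 0.75
```

Shapes with a high `count` but low `hits` are the ones to target first: they are popular and currently falling through to live SQL.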

Do you currently log query shape and rollup-hit-rate? If not, that's the cheapest performance win available.
What is the smallest set of pre-aggregations that would cover 80% of your slowest dashboards?
