Dimensional Modelling — Star, Snowflake, OBT

Facts and dimensions, grain, additivity. Kimball's 50-year legacy explained.

0/2 done

Theory

Fact + Dimension = star schema

Kimball's dimensional model is simple and shockingly durable:

  • A fact table stores measurements of a business process (e.g., an order line, a user click, a temperature reading). Every row has a grain — the precise level of detail (one row per order line, per user login). Get the grain wrong and every metric will lie.
  • A dimension table stores the who/what/where/when/why context (customer, product, date, geographical region). Dimensions are denormalised on purpose — the goal is query speed (fewer joins), not storage efficiency.
  • A star schema is one central fact table surrounded by its dimensions. A snowflake normalises those dimensions further; an OBT ('one big table') joins everything ahead of time for columnar engines.

Use Case Example: Imagine a fct_sales table (grain: one row per item sold). It has columns like amount, customer_id, product_id. It joins to a dim_customer table to get the customer's demographics, and a dim_product table to get the product categories. This star-like layout allows analysts to effortlessly slice 'sales amount' by 'customer region' and 'product category'.

Analogy

A star schema is an IKEA showroom. The fact table is the central warehouse aisle stacked with the actual flat-pack boxes you buy (the measurable events — each sale, each unit). The dimensions are the themed room displays around it — 'living room', 'kitchen', 'by colour', 'by price' — each one a different way to describe and slice the same products. You never confuse a sofa (a fact you can count and sum) with the 'Scandinavian style' label (a dimension you filter by). Get that separation right — countable measures in the centre, descriptive context around the edge — and every business question becomes a short walk from the aisle to a display.

Star schema — fct_orders

Click a node to focus its neighbourhood · drag to pan · scroll to zoom

A star schema, drawn

One fact (orders) in the centre, dimensions around it. Every analytical question becomes a join from fct_orders to one or more dimensions, plus an aggregation.

Reflect

Kimball is older than most data engineers reading this. It is durable because it encodes a physical truth — the cost of scanning a dimension vs a fact — that no amount of cloud has erased.

  • Which of your fact tables has a fuzzy grain right now?
  • Where would adopting OBT cut query cost — and where would it cost you flexibility you actually use?

Reading in progress · 0 of 2 activities done