Theory
Fact + Dimension = star schema
Kimball's dimensional model is simple and shockingly durable:
- A fact table stores measurements of a business process (e.g., an order line, a user click, a temperature reading). Every row has a grain — the precise level of detail (one row per order line, per user login). Get the grain wrong and every metric will lie.
- A dimension table stores the who/what/where/when/why context (customer, product, date, geographical region). Dimensions are denormalised on purpose — the goal is query speed (fewer joins), not storage efficiency.
- A star schema is one central fact table surrounded by its dimensions. A snowflake normalises those dimensions further; an OBT ('one big table') joins everything ahead of time for columnar engines.
Use Case Example: Imagine a fct_sales table (grain: one row per item sold). It has columns like amount, customer_id, product_id. It joins to a dim_customer table to get the customer's demographics, and a dim_product table to get the product categories. This star-like layout allows analysts to effortlessly slice 'sales amount' by 'customer region' and 'product category'.