SCDs and Data Vault — Modelling History

How dimensions change over time, and when to reach for Data Vault instead.

0/2 done

Theory

Dimensions change. Your model decides who notices.

When a customer moves house, do reports about last year's orders show the old city (truth-as-of-then) or the new one (truth-as-of-now)? That choice is encoded as an SCD type:

  • SCD Type 1 — overwrite. History lost. Cheapest.
  • SCD Type 2 — new row with valid_from / valid_to / is_current. The industry default for analytics.
  • SCD Type 3 — keep a previous_value column. Useful for one specific change, not general history.

Data Vault is the heavier alternative — Hubs (business keys), Links (relationships), Satellites (descriptive + historical attributes) — designed for regulated, multi-source warehouses where auditability and source-system independence matter more than query simplicity. Reach for it when Kimball starts feeling brittle under merger/acquisition data churn.

Analogy

An SCD-1 dimension is your phone's contacts app — you don't remember your friend's previous phone number. An SCD-2 dimension is a person's CV — every employer they ever had, with start and end dates. An SCD-3 is the 'former name' field on a passport — useful but narrow. Data Vault is the full archive a biographer would use — every claim, every source, every date it was true. Pick the artefact that matches what your readers will ask.

Reflect

SCD choice is a product decision, not just a modelling one. The analyst asking 'how many customers moved house this year' silently assumes SCD-2; the analyst asking 'what's their current country' silently assumes SCD-1. Get the conversation in the open before you build.

  • Which of your dimensions are *secretly* SCD-1 when they should be SCD-2?
  • What's the smallest Data Vault you would deploy — and what's the trigger to graduate to it?

Reading in progress · 0 of 2 activities done