Lineage and Impact Analysis

Track where every value came from and which downstream consumers depend on it.

Overview

Why it matters

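Lineage answers the question 'if I change or break this upstream column, what breaks downstream?' Tools such as OpenLineage, dbt, and Apache Atlas extract it automatically, for example by parsing SQL, reading a project's model dependencies, or collecting events from the systems that run the jobs.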
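To make the SQL-parsing idea concrete, here is a deliberately naive sketch that pulls table-level edges out of a single statement with regular expressions. The table names and the table_lineage helper are hypothetical, and this is not how any of the tools above work internally: production extractors use a full SQL parser, and this toy version would mistake CTE names for source tables and ignores subqueries and quoting.

```python
import re

# Toy patterns (assumption): targets are "INSERT INTO <t>" or "CREATE TABLE <t>",
# sources are "FROM <t>" or "JOIN <t>". Real tools use a proper SQL parser.
TARGET_RE = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE)\s+([\w.]+)",
    re.IGNORECASE,
)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> set[tuple[str, str]]:
    """Return (source_table, target_table) edges for one SQL statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        # A bare SELECT writes nowhere, so it contributes no edges.
        return set()
    return {(source, target.group(1)) for source in SOURCE_RE.findall(sql)}


# Hypothetical statement that converts order amounts with an FX rate.
sql = """
INSERT INTO analytics.daily_revenue
SELECT o.order_date, SUM(o.amount * r.currency_rate) AS revenue_usd
FROM raw.orders AS o
JOIN raw.fx_rates AS r ON o.currency = r.currency
GROUP BY o.order_date
"""

edges = table_lineage(sql)
for source, target in sorted(edges):
    print(f"{source} -> {target}")
```

Each edge reads "source feeds target". Collecting these edges across every statement in a repository gives the graph that impact analysis walks.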

Going deeper

Three lineage layers worth distinguishing:

  • Static lineage: parsed from SQL and dbt DAGs. Cheap, and it covers every statement in the codebase, but it cannot tell which CASE WHEN branch actually produced a value, and it misses dynamic SQL assembled at runtime.
  • Runtime lineage: emitted as jobs run (for example, OpenLineage events from Airflow, Spark, or dbt). Captures what actually ran, but only for jobs that have executed.
  • BI / consumer lineage: extends lineage past the warehouse into Looker, Tableau, and ML feature stores. The hardest tier, and the one that closes the loop from source column to end consumer.
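One way to see why the static and runtime layers complement each other is to diff their edge sets. The sketch below assumes both layers have already been reduced to (upstream, downstream) pairs. All dataset names are made up for illustration.

```python
# Hypothetical edges as (upstream, downstream) dataset names.
static_edges = {
    ("raw.orders", "analytics.daily_revenue"),
    ("raw.fx_rates", "analytics.daily_revenue"),
    ("raw.refunds", "analytics.daily_revenue"),
}
runtime_edges = {
    ("raw.orders", "analytics.daily_revenue"),
    ("raw.fx_rates", "analytics.daily_revenue"),
    ("staging.manual_adjustments", "analytics.daily_revenue"),
}

# Observed at runtime but absent from parsed code: often dynamic SQL.
runtime_only = runtime_edges - static_edges
# Declared in code but never observed: a branch that did not run, or dead code.
static_only = static_edges - runtime_edges
# Present in both layers: the edges you can trust most.
confirmed = static_edges & runtime_edges

print("runtime only:", sorted(runtime_only))
print("static only: ", sorted(static_only))
print("confirmed:   ", sorted(confirmed))
```

Edges that appear in only one layer are the ones worth investigating first, because they mark where one technique is blind.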

Analogy

Lineage is the supply-chain bill of materials for a number on a dashboard.

Imagine a chocolate bar in a supermarket. The packaging tells you the brand; the receipt tells you the shop. Lineage is the bill of materials underneath — which cocoa farm, which roasting batch, which factory line, which truck. When a contamination alert fires for one cocoa batch, the supply chain instantly knows every bar that contains it. Column-level data lineage does the same: when an upstream currency_rate column changes meaning, you can list every dashboard, alert, and ML feature that consumed it.
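The "list every consumer" step is a graph traversal. As a minimal sketch, assume column-level lineage is stored as an adjacency mapping from each node to the nodes it feeds. The column names, the dashboard:/alert:/feature: prefixes, and the downstream helper are all hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the nodes in its list.
LINEAGE = {
    "raw.fx_rates.currency_rate": ["analytics.daily_revenue.revenue_usd"],
    "raw.orders.amount": ["analytics.daily_revenue.revenue_usd"],
    "raw.orders.order_date": ["analytics.daily_revenue.order_date"],
    "analytics.daily_revenue.revenue_usd": [
        "dashboard:finance_overview",
        "alert:revenue_drop",
        "feature:customer_ltv",
    ],
}


def downstream(graph, start):
    """Breadth-first walk returning every node reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


impacted = downstream(LINEAGE, "raw.fx_rates.currency_rate")
# Assumption: end consumers are tagged with a "kind:" prefix.
consumers = sorted(node for node in impacted if ":" in node)
print(consumers)
```

Reversing the edges and running the same walk from a dashboard answers the opposite question: which columns feed it.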

Make it stick

Use the prompts below to anchor lineage and impact analysis to something you actually own.

  • Pick the most-viewed dashboard you own. Can you list every column that feeds it within 5 minutes? If not, that gap *is* your lineage problem.
  • Which breaking change in the last quarter could lineage have prevented? Estimate the hours lost.
  • Where in your stack does lineage *end* today (warehouse? BI? ML?) — and what's the next mile to close?
