Theory
Lineage is the platform's memory
Lineage maps which upstreams produced each table/column and which downstreams depend on it. The mature use cases:
- Impact analysis before a schema change: 'which dashboards, ML features, reverse-ETL syncs read this column?'
- Incident triage when a metric looks wrong: walk the lineage upstream until you find the broken hop.
- Compliance: prove that a PII field never reached an uncontrolled downstream.
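All three use cases above reduce to graph reachability. A minimal sketch, assuming lineage is held as a plain mapping from each upstream column to its direct downstreams; the column names and the `EDGES` structure are hypothetical and are not the data model of any particular catalog:

```python
from collections import deque

# Hypothetical column-level lineage: upstream -> set of direct downstreams.
EDGES = {
    "raw.orders.amount": {"staging.orders.amount_usd"},
    "staging.orders.amount_usd": {
        "marts.revenue.daily_total",
        "ml.features.avg_order_value",
    },
    "marts.revenue.daily_total": {
        "dashboard.exec_revenue",
        "reverse_etl.crm_sync",
    },
}


def reachable(start, edges):
    """Breadth-first walk: every node reachable from start by following edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Flip every edge so that walks go upstream instead of downstream."""
    inverted = {}
    for src, dsts in edges.items():
        for dst in dsts:
            inverted.setdefault(dst, set()).add(src)
    return inverted


def downstream_of(node):
    """Impact analysis: everything that transitively reads this column."""
    return reachable(node, EDGES)


def upstream_of(node):
    """Incident triage: every hop that could have broken this node."""
    return reachable(node, invert(EDGES))


def never_reaches(source, sink):
    """Compliance check: True if source does not flow into sink."""
    return sink not in downstream_of(source)
```

Calling `downstream_of("staging.orders.amount_usd")` lists what a schema change would touch, `upstream_of("dashboard.exec_revenue")` lists the candidates to inspect during triage, and `never_reaches(pii_column, sink)` is the compliance question phrased as a boolean.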
OpenLineage is the open standard for lineage events (it grew out of the Marquez project and is now hosted by LF AI & Data). Integrations exist for Airflow, Dagster, dbt, Spark, and Flink, so these tools can emit lineage events as their jobs run. Centralising those events in a Marquez, DataHub, or OpenMetadata backend gives you a platform-wide graph without maintaining it by hand.
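To make the event format concrete, here is a rough sketch of an OpenLineage run event built with only the standard library. The top-level field names (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) follow the OpenLineage spec; the helper `build_run_event`, the namespaces, the job name, the producer URL, and the exact spec version in `schemaURL` are assumptions for illustration and should be checked against the spec version your backend expects.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal OpenLineage-style run event as a plain dict.

    inputs and outputs are lists of (namespace, name) tuples.
    This is a hypothetical helper, not part of any OpenLineage client.
    """

    def dataset(ref):
        namespace, name = ref
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(ref) for ref in inputs],
        "outputs": [dataset(ref) for ref in outputs],
        # Placeholder producer URI (assumption): identifies the emitting code.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; verify before relying on it.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


event = build_run_event(
    "COMPLETE",
    job_namespace="analytics",
    job_name="build_daily_revenue",
    inputs=[("warehouse", "staging.orders")],
    outputs=[("warehouse", "marts.revenue")],
)

# Serialised body that would be sent to the lineage backend's HTTP endpoint.
payload = json.dumps(event)
```

In practice the integrations build and send these events for you; a hand-rolled event like this is mostly useful for jobs that have no integration, or for testing that a backend ingests events correctly.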