Theory
A contract turns 'oh sorry, we changed it' into a build failure
A data contract is a producer-owned, machine-readable declaration of:
- Schema — columns, types, nullability, semantic meaning.
- Quality expectations — uniqueness, ranges, distributions, freshness SLA.
- Ownership — a team, an on-call rotation, a Slack channel.
- Versioning policy — what counts as a breaking change, and how long the deprecation window lasts.
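To make the four parts concrete, here is a minimal sketch of a contract written as plain Python dataclasses. It is one possible shape, not a standard format: every class, field, and value below (Column, Contract, the team and channel names, the 90-day window) is hypothetical and chosen only for illustration, and distribution checks are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Column:
    """Schema plus per-column quality expectations (hypothetical shape)."""
    name: str
    dtype: str                      # logical type, e.g. "string" or "int64"
    nullable: bool
    meaning: str                    # semantic meaning, in plain language
    unique: bool = False            # quality expectation: no duplicates
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass(frozen=True)
class Contract:
    """A producer-owned declaration covering the four parts listed above."""
    dataset: str
    version: str
    columns: Tuple[Column, ...]     # schema
    freshness_sla_minutes: int      # quality: how stale the data may get
    owner_team: str                 # ownership
    oncall_rotation: str
    slack_channel: str
    breaking_changes: Tuple[str, ...]   # versioning policy
    deprecation_window_days: int

    def column(self, name: str) -> Column:
        """Look up a column by name; raise KeyError if it is not declared."""
        for col in self.columns:
            if col.name == name:
                return col
        raise KeyError(name)


# Illustrative values only; none of these come from a real system.
users_contract = Contract(
    dataset="prod_users",
    version="2.0.0",
    columns=(
        Column("user_id", "int64", nullable=False,
               meaning="Stable internal user identifier", unique=True),
        Column("email_v2", "string", nullable=False,
               meaning="Verified contact email"),
        Column("age", "int64", nullable=True,
               meaning="Self-reported age in years",
               min_value=0, max_value=130),
    ),
    freshness_sla_minutes=60,
    owner_team="identity-platform",
    oncall_rotation="identity-oncall",
    slack_channel="#identity-data",
    breaking_changes=("column removed", "type changed", "became nullable"),
    deprecation_window_days=90,
)
```

Because the declaration is an ordinary object in the producer's repository, it can be reviewed, versioned, and tested like any other code. In practice many teams keep the same information in YAML or JSON so that non-Python tooling can read it.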
Enforcement happens in two places: in CI, where the producer's pipeline is tested against the contract before merge, and at the boundary, where consumers validate incoming data with tools such as Great Expectations, Soda, or dbt tests. The cultural shift is the hard part: contracts move responsibility for analytics-grade data from the data team to the team that owns the source system. Done right, they make the 'who changed prod_users.email_v2?' phone call rare, because a breaking change fails the producer's build before it ever reaches a consumer.
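The two enforcement points can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not how Great Expectations, Soda, or dbt work internally: the schema is reduced to a dict of column name to (type, nullable), the function names are hypothetical, and the breaking-change rules (column removed, type changed, column became nullable) are one assumed policy that a real contract would spell out itself.

```python
from typing import Dict, List, Optional, Tuple

# Assumed, simplified schema shape: column name -> (dtype, nullable)
Schema = Dict[str, Tuple[str, bool]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """CI side: compare the proposed schema with the published one."""
    problems = []
    for name, (old_type, old_nullable) in old.items():
        if name not in new:
            problems.append(f"column removed: {name}")
            continue
        new_type, new_nullable = new[name]
        if new_type != old_type:
            problems.append(f"type changed: {name} {old_type} -> {new_type}")
        if new_nullable and not old_nullable:
            problems.append(f"became nullable: {name}")
    # Columns that exist only in `new` are additive, so they are not flagged.
    return problems


def ci_gate(old: Schema, new: Schema) -> int:
    """Return a process exit code: 0 lets the merge through, 1 fails the build."""
    problems = breaking_changes(old, new)
    for problem in problems:
        print(f"CONTRACT VIOLATION: {problem}")
    return 1 if problems else 0


def null_violations(rows: List[Dict[str, Optional[object]]],
                    schema: Schema) -> List[str]:
    """Boundary side: flag null or missing values in non-nullable columns."""
    errors = []
    for i, row in enumerate(rows):
        for name, (_dtype, nullable) in schema.items():
            if not nullable and row.get(name) is None:
                errors.append(f"row {i}: {name} is null or missing")
    return errors


# Illustrative schemas: the proposed change renames email_v2 to email.
published = {"user_id": ("int64", False), "email_v2": ("string", False)}
proposed = {"user_id": ("int64", False), "email": ("string", False)}
```

Here ci_gate(published, proposed) returns a non-zero code because email_v2 disappeared, which is the 'build failure' from the tagline: the rename is caught in the producer's pull request rather than in a consumer's dashboard. A producer's CI job would pass that return value to sys.exit.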