Overview
Data Observability & Anomaly Detection
Volume, freshness, schema-drift, and distribution checks: monitoring datasets the way SRE teams monitor services.
Why it matters
Data observability is the operational layer that catches a silent break (a pipeline that keeps running but delivers late, missing, or wrong data) before it shows up on a dashboard.
Going deeper
The four pillars, with the most common silent failure each catches:
| Pillar | Catches | Typical alert |
|---|---|---|
| Freshness | Upstream job hung / cron skipped | `max(updated_at) < now() - 1h` |
| Volume | Filter regression dropping 90% of rows | row count outside ±3σ of the 7-day rolling mean |
| Schema | Column rename, type change, dropped field | hash of the table's `information_schema` column metadata changed |
| Distribution | Locale bug filling a column with NULLs or garbage values | null rate, cardinality, or mean/p99 outside the expected band |
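A minimal Python sketch of the four checks in the table, written as pure functions over values a scheduler would already have fetched from the warehouse. The function names and default thresholds (1 hour lag, 3σ, 5% nulls, at least 2 distinct values) are illustrative assumptions, not any product's API; real deployments tune them per table.

```python
import hashlib
import statistics
from datetime import datetime, timedelta, timezone


def freshness_ok(max_updated_at: datetime, now: datetime,
                 max_lag: timedelta = timedelta(hours=1)) -> bool:
    """Freshness: the newest row must be no older than max_lag (assumed 1h)."""
    return max_updated_at >= now - max_lag


def volume_ok(todays_count: int, last_7_counts: list[int], k: float = 3.0) -> bool:
    """Volume: today's row count must lie within k sample standard deviations
    of the mean of the previous 7 daily counts."""
    mean = statistics.mean(last_7_counts)
    sd = statistics.stdev(last_7_counts)
    if sd == 0:  # perfectly flat history: any change counts as a deviation
        return todays_count == mean
    return abs(todays_count - mean) <= k * sd


def schema_fingerprint(columns: list[tuple[str, str]]) -> str:
    """Schema: hash of (column_name, data_type) pairs in ordinal order,
    e.g. as read from information_schema.columns. Compare against the last run."""
    canonical = "|".join(f"{name}:{dtype}" for name, dtype in columns)
    return hashlib.sha256(canonical.encode()).hexdigest()


def distribution_ok(values: list, max_null_rate: float = 0.05,
                    min_cardinality: int = 2) -> bool:
    """Distribution: null rate and distinct-value count must stay in band.
    Both thresholds are hypothetical defaults."""
    if not values:
        return False
    non_null = [v for v in values if v is not None]
    null_rate = (len(values) - len(non_null)) / len(values)
    return null_rate <= max_null_rate and len(set(non_null)) >= min_cardinality


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
history = [1000, 1020, 980, 1010, 995, 1005, 990]  # mean 1000, stdev ~13.2

print(freshness_ok(now - timedelta(minutes=30), now))  # True
print(freshness_ok(now - timedelta(hours=3), now))     # False: job hung
print(volume_ok(1015, history))                        # True
print(volume_ok(100, history))                         # False: ~90% drop
print(schema_fingerprint([("id", "bigint"), ("email", "text")])
      == schema_fingerprint([("id", "bigint"), ("email_addr", "text")]))  # False: rename
print(distribution_ok(["DE", "FR", None, "DE"]))       # False: 25% nulls
```

Keeping each check a pure function separates "what is healthy" from "how the numbers are fetched", so the same rules can run against any warehouse client.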
All four can be implemented as cheap SQL or as a managed product (Monte Carlo, Bigeye, Soda, Datafold, Elementary). The decision isn't whether to instrument; it's who maintains the rules and how alert fatigue is kept under control.
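The "cheap SQL" route can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the fixed clock are hypothetical. SQLite has no `information_schema`, so this sketch fingerprints the schema from `PRAGMA table_info` instead, which plays the same role; on a real warehouse the queries would use its own date functions and catalog views.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for a production dataset.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders (country, updated_at) VALUES (?, ?)",
    [("DE", "2024-05-01 11:30:00"), ("FR", "2024-05-01 11:45:00"),
     (None, "2024-05-01 11:50:00"), ("DE", "2024-05-01 11:55:00")],
)

NOW = "2024-05-01 12:00:00"  # fixed clock so the example is reproducible

# Freshness: newest row no older than one hour (ISO strings compare correctly).
FRESHNESS_SQL = "SELECT MAX(updated_at) >= datetime(?, '-1 hour') FROM orders"

# Distribution: null rate and distinct count (COUNT(DISTINCT) ignores NULLs).
DISTRIBUTION_SQL = """
SELECT AVG(CASE WHEN country IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate,
       COUNT(DISTINCT country) AS cardinality
FROM orders
"""


def schema_hash(conn: sqlite3.Connection, table: str) -> str:
    """Hash of (name, declared type) per column; table name must be trusted."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # table_info rows are (cid, name, type, notnull, dflt_value, pk)
    canonical = "|".join(f"{r[1]}:{r[2]}" for r in rows)
    return hashlib.sha256(canonical.encode()).hexdigest()


fresh = bool(conn.execute(FRESHNESS_SQL, (NOW,)).fetchone()[0])
null_rate, cardinality = conn.execute(DISTRIBUTION_SQL).fetchone()
baseline_schema = schema_hash(conn, "orders")

print(fresh)                   # True
print(null_rate, cardinality)  # 0.25 2

# Simulate schema drift: any added, dropped, renamed, or retyped column changes the hash.
conn.execute("ALTER TABLE orders ADD COLUMN note TEXT")
schema_changed = schema_hash(conn, "orders") != baseline_schema
print(schema_changed)          # True
```

Each check reduces to one aggregate query per table, which is why they are cheap to schedule; the recurring cost is deciding the thresholds and who owns them.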