Inputs change vs world changes
Two drifts, two responses
- Data drift (covariate shift) — the distribution of inputs changes; the relationship
P(y|x)is unchanged. Example: more users from a new country. Fix: retrain on fresh data. - Concept drift — the relationship changes; same
x, differenty. Example: post-COVID, the same browsing pattern no longer predicts the same purchase. Fix: rethink features, often re-design the model.
Common detectors
| Detector | What it measures | Good for |
|---|---|---|
| PSI (Population Stability Index) | Distributional change per feature | Tabular, monthly |
| KS test | Difference between two empirical distributions | Numeric features |
| Chi-square | Categorical distribution change | Categorical features |
| Model performance | Direct outcome metric vs labels | When labels arrive fast enough |
Always pair distribution metrics with outcome metrics — drift without performance loss is sometimes irrelevant; performance loss without drift is the interesting problem.