Data Drift vs Concept Drift

Two failure modes you must monitor, each with a different fix.


Inputs change vs world changes

Two drifts, two responses

  • Data drift (covariate shift): the input distribution P(x) changes while the relationship P(y|x) stays the same. Example: more users from a new country. Fix: retrain on fresh data.
  • Concept drift: the relationship P(y|x) itself changes, so the same x now maps to a different y. Example: post-COVID, the same browsing pattern no longer predicts the same purchase. Fix: rethink features, and often redesign the model.
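The difference between the two drifts can be made concrete with a small simulation. The sketch below is a toy under stated assumptions: a single numeric feature, a threshold rule standing in for P(y|x), and a hypothetical `model` that learned the original rule exactly. None of these names come from the lesson. Because the toy model is exactly right, data drift costs it nothing here; a real model that only approximates the rule in the region it was trained on can still degrade when inputs move.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N = 10_000

def true_label(x, threshold):
    # Stand-in for P(y|x): y is 1 when x exceeds the threshold.
    return int(x > threshold)

def model(x):
    # Hypothetical model that learned the original rule (threshold 0.0).
    return int(x > 0.0)

def accuracy(xs, threshold):
    # Share of inputs where the model agrees with the current true rule.
    return sum(model(x) == true_label(x, threshold) for x in xs) / len(xs)

def mean(xs):
    return sum(xs) / len(xs)

# Reference period: inputs centred on 0, rule threshold at 0.0.
x_ref = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Data drift: inputs move to a new region, the rule is unchanged.
x_data_drift = [rng.gauss(1.5, 1.0) for _ in range(N)]

# Concept drift: inputs look the same, the rule threshold moves to 1.0.
x_concept_drift = [rng.gauss(0.0, 1.0) for _ in range(N)]

acc_ref = accuracy(x_ref, 0.0)
acc_data_drift = accuracy(x_data_drift, 0.0)
acc_concept_drift = accuracy(x_concept_drift, 1.0)

for name, xs, acc in [
    ("reference", x_ref, acc_ref),
    ("data drift", x_data_drift, acc_data_drift),
    ("concept drift", x_concept_drift, acc_concept_drift),
]:
    print(f"{name:14s} mean x = {mean(xs):+.2f}   accuracy = {acc:.2f}")
```

In the data drift case the input mean moves but accuracy does not; in the concept drift case the inputs are statistically unchanged and accuracy falls, which is why an input-only monitor would miss it.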

Common detectors

Detector                         | What it measures                                | Good for
-------------------------------- | ----------------------------------------------- | -------------------------------
PSI (Population Stability Index) | Distributional change per feature               | Tabular, monthly
KS test                          | Difference between two empirical distributions  | Numeric features
Chi-square                       | Categorical distribution change                 | Categorical features
Model performance                | Direct outcome metric vs labels                 | When labels arrive fast enough
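The first three detectors can be sketched in plain Python to show what each one computes. This is a minimal illustration, not a production implementation: the function names, the equal-width binning for PSI, the `eps` floor for empty bins, and the synthetic samples are all assumptions made for the example. The sketch returns test statistics only; in practice a statistics library would also supply p-values.

```python
import math
import random
from bisect import bisect_right
from collections import Counter

def psi(expected, actual, bins=10, eps=1e-6):
    """PSI with equal-width bins over the expected sample's range.

    Assumes `expected` contains at least two distinct values.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )

def chi_square_statistic(ref_cats, cur_cats):
    """Pearson chi-square statistic for a 2 x k table of category counts."""
    ref_counts, cur_counts = Counter(ref_cats), Counter(cur_cats)
    n_ref, n_cur = len(ref_cats), len(cur_cats)
    total = n_ref + n_cur
    stat = 0.0
    for cat in set(ref_counts) | set(cur_counts):
        col_total = ref_counts[cat] + cur_counts[cat]
        for observed, row_total in ((ref_counts[cat], n_ref), (cur_counts[cat], n_cur)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Synthetic data: one numeric feature and one categorical feature.
rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

channels = ["web", "ios", "android"]
ref_cats = rng.choices(channels, weights=[5, 3, 2], k=5000)
same_cats = rng.choices(channels, weights=[5, 3, 2], k=5000)
changed_cats = rng.choices(channels, weights=[2, 3, 5], k=5000)

print("PSI  same / shifted:", psi(reference, same), psi(reference, shifted))
print("KS   same / shifted:", ks_statistic(reference, same), ks_statistic(reference, shifted))
print("Chi2 same / changed:", chi_square_statistic(ref_cats, same_cats),
      chi_square_statistic(ref_cats, changed_cats))
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift, but these cut-offs are conventions rather than guarantees and are worth calibrating per feature.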

Always pair distribution metrics with outcome metrics. Drift without performance loss is sometimes irrelevant; performance loss without input drift is the interesting case, because it suggests the relationship itself has changed (concept drift).
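One way to picture this pairing is a small triage function that reads one distribution metric and one outcome metric together. The function name, the choice of PSI and accuracy as the two signals, and both thresholds are hypothetical and chosen only for illustration.

```python
def triage(psi_value, baseline_accuracy, current_accuracy,
           psi_alert=0.25, max_accuracy_drop=0.05):
    """Combine one distribution metric with one outcome metric.

    psi_alert and max_accuracy_drop are illustrative thresholds, not standards.
    """
    input_drift = psi_value >= psi_alert
    performance_drop = (baseline_accuracy - current_accuracy) > max_accuracy_drop

    if input_drift and performance_drop:
        return "inputs drifted and performance fell: retrain on fresh data, then re-check"
    if input_drift:
        return "inputs drifted but performance holds: log it and keep watching"
    if performance_drop:
        return "performance fell with stable inputs: suspect concept drift"
    return "stable: no action"

print(triage(psi_value=0.40, baseline_accuracy=0.90, current_accuracy=0.89))
print(triage(psi_value=0.02, baseline_accuracy=0.90, current_accuracy=0.70))
```

If retraining on fresh data does not restore performance in the first case, concept drift may be present alongside the input shift.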

Analogy

Data drift is a new neighbourhood moving in: the customers look different, but the link between who they are and what they want still holds. Concept drift is the same neighbourhood changing its preferences: it looks identical but behaves differently. The survey responses are the same, yet the right predictions are now different.
