Training-serving skew most often happens because…

Training and serving compute the same feature with different code

Feature Stores: One Source of Truth for Features — Semantic Web Academy

The problem they solve

A feature like user_avg_basket_30d is computed:

in training, by a batch SQL job over Parquet,
in serving, by a Python function over Redis.

If those two implementations drift by even a small amount, you have training-serving skew: the model behaves differently in production than in evaluation, often silently. The accuracy gap doesn't show up in your test set, because the test set is built by the same training pipeline and shares its version of the feature.
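To make the drift concrete, here is a minimal sketch of two implementations of user_avg_basket_30d that disagree only on where the 30-day window starts. The order data, the function names, and the specific window rules are all hypothetical, chosen for illustration; real pipelines drift in many other ways (null handling, time zones, rounding).

```python
from datetime import datetime, timedelta

# Hypothetical order history for one user: (timestamp, basket value).
ORDERS = [
    (datetime(2024, 3, 1, 9, 0), 10.0),
    (datetime(2024, 3, 15, 12, 0), 20.0),
    (datetime(2024, 3, 30, 18, 0), 60.0),
]


def avg_basket_training(orders, as_of):
    """Batch-style logic (assumed): the window starts at midnight, 30 days back."""
    start = (as_of - timedelta(days=30)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    values = [value for ts, value in orders if start <= ts <= as_of]
    return sum(values) / len(values) if values else 0.0


def avg_basket_serving(orders, as_of):
    """Serving-style logic (assumed): a rolling window of exactly 30 * 24 hours."""
    start = as_of - timedelta(days=30)
    values = [value for ts, value in orders if start < ts <= as_of]
    return sum(values) / len(values) if values else 0.0


as_of = datetime(2024, 3, 31, 10, 0)
train_value = avg_basket_training(ORDERS, as_of)  # includes the 1 March order
serve_value = avg_basket_serving(ORDERS, as_of)   # excludes the 1 March order

print(f"training: {train_value}, serving: {serve_value}")
```

With this data the training path averages three orders and the serving path averages two, so the model sees a different value in production than it saw during training, even though both functions are "correct" under their own definition of the window.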

A feature store (Feast, Tecton, Vertex AI Feature Store, SageMaker Feature Store, Databricks Feature Engineering in Unity Catalog) imposes:

A single definition of each feature (for example, a FeatureView in Feast).
Dual storage: an offline store for training joins and a low-latency online store for serving.
Point-in-time joins, so training samples never use feature values from the future.
Versioning and lineage for the features themselves.
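The point-in-time join is the least intuitive item in this list, so here is a minimal sketch of the idea in plain Python. It is not how any particular feature store implements it; the feature log, the labels, and the helper name point_in_time_value are hypothetical. For each training label, the join picks the latest feature value computed at or before the label's event time.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical offline feature log:
# user_id -> list of (computed_at, value), sorted by computed_at.
FEATURE_LOG = {
    "u1": [
        (datetime(2024, 3, 1), 25.0),
        (datetime(2024, 3, 10), 30.0),
        (datetime(2024, 3, 20), 45.0),
    ],
}

# Hypothetical training labels: (user_id, event_time, label).
LABELS = [
    ("u1", datetime(2024, 3, 5), 0),
    ("u1", datetime(2024, 3, 15), 1),
    ("u1", datetime(2024, 2, 20), 0),  # before any feature value exists
]


def point_in_time_value(user_id, event_time):
    """Return the latest feature value computed at or before event_time, else None."""
    history = FEATURE_LOG.get(user_id, [])
    timestamps = [ts for ts, _ in history]
    # bisect_right counts the entries with timestamp <= event_time.
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx > 0 else None


training_rows = [
    (user_id, event_time, point_in_time_value(user_id, event_time), label)
    for user_id, event_time, label in LABELS
]

for row in training_rows:
    print(row)
```

The label on 15 March gets the value computed on 10 March, not the newer value from 20 March, which would leak the future into training. The label on 20 February gets no value at all, because nothing had been computed yet. On DataFrames, pandas.merge_asof with direction="backward" expresses the same kind of join.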
