Inference logs → next training
The runtime → training feedback loop
MLflow tracks training runs and model versions; it does not monitor production inference for you. Wire your serving layer to log:
- request payload (optionally sampled / hashed for privacy),
- prediction,
- ground truth as soon as it arrives,
- system metrics (p50/p95 latency, error rate, GPU util).
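As a rough illustration of the list above, here is a minimal sketch of one log record built in the serving layer. Everything in it is an assumption for illustration: the field names, the `build_log_record` and `attach_ground_truth` helpers, and the 10% raw-payload sample rate are hypothetical, not part of MLflow or any monitoring tool.

```python
import hashlib
import json
import random
import time
import uuid


def hash_payload(payload: dict) -> str:
    """Return a SHA-256 digest of the payload, stable across key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_log_record(payload, prediction, latency_ms, model_version,
                     sample_rate=0.1, rng=random):
    """Build one inference log record (hypothetical schema).

    The raw payload is kept only for a sampled fraction of requests;
    every record carries the payload hash.
    """
    keep_raw = rng.random() < sample_rate
    return {
        "request_id": str(uuid.uuid4()),  # join key for late ground truth
        "timestamp": time.time(),
        "model_version": model_version,
        "payload_hash": hash_payload(payload),
        "payload": payload if keep_raw else None,
        "prediction": prediction,
        "ground_truth": None,  # filled in when the label arrives
        "latency_ms": latency_ms,
    }


def attach_ground_truth(record, label):
    """Return a copy of the record with the delayed label attached."""
    return {**record, "ground_truth": label}
```

Per-request latency is logged raw here; p50/p95 and error rate would then be aggregated downstream, and GPU utilization would typically come from a separate host-level collector.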
Feed those into your drift / performance dashboards (Evidently, WhyLabs, custom). When a dashboard signals that drift or degraded accuracy warrants retraining, the next training run lands back in MLflow, closing the loop and starting the next version cycle.
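To make "the dashboard says retrain" concrete, the sketch below computes a Population Stability Index (PSI) for one numeric feature and combines it with live accuracy into a retrain decision. It is a hand-rolled stand-in for what a drift tool would report, not the API of Evidently or WhyLabs; the `psi` and `should_retrain` helpers and the thresholds (0.2 PSI, 0.9 accuracy) are illustrative assumptions to tune for your own data.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference range; current values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def should_retrain(psi_value, live_accuracy, psi_limit=0.2, min_accuracy=0.9):
    """Hypothetical trigger: retrain on input drift or on degraded accuracy."""
    return psi_value > psi_limit or live_accuracy < min_accuracy
```

For example, `should_retrain(psi(train_feature, last_week_feature), live_accuracy)` could gate a scheduled retraining job, where `train_feature` and `last_week_feature` are whatever samples you pull from the training set and the inference logs.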
Key habit: in every retraining run, call mlflow.set_tag('parent_version', '7'), where '7' stands for the model version being replaced, so lineage is preserved across cycles.