Post-deployment: Closing the Loop

MLflow's job ends at serve — but the next training cycle starts here.

0/1 done

Inference logs → next training

The runtime → training feedback loop

MLflow does not monitor inference. Wire your serving layer to log:

  • request payload (optionally sampled / hashed for privacy),
  • prediction,
  • ground truth as soon as it arrives,
  • system metrics (p50/p95 latency, error rate, GPU util).

Feed those into your drift / performance dashboards (Evidently, WhyLabs, custom). When the dashboard says retrain, the next training run lands back in MLflow — closing the loop and starting the next version cycle.

Key habit: always log mlflow.set_tag('parent_version', '7') on a retrained model so lineage is preserved across cycles.

Analogy

Treat the model lifecycle as a water cycle: train → registry → serve → observe → drift signals → retrain. MLflow handles the clouds (the catalog and the rain schedule); the rest of your stack handles the ocean and the rivers.

Reflect

Plan your next retraining trigger.

  • Which signal will tell you it's time to retrain?
  • Who owns acting on that signal?
  • How will the new run be linked back to the version it replaces?

Reading in progress · 0 of 1 activity done