Reproducible Training in CI/CD

An MLproject that any runner — laptop, GitHub Actions, Airflow — executes identically.

Gate merges on model quality

The project is the CI contract

Level 2 gave you an MLproject file and a locked environment. The payoff is CI: a pull request can train and evaluate the model on a clean runner, log everything to a remote tracking server, and gate the merge on a metric threshold. No more 'works on my laptop'.

The pattern has three moving parts:

  1. mlflow run . -P data=s3://... — the entrypoint the runner calls.
  2. A remote MLFLOW_TRACKING_URI so CI runs are visible to everyone.
  3. A post-train check that fails the job if the new metric regresses against the current Production model.
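The three parts above can be sketched as one small Python script that a CI step might call. This is a minimal illustration under stated assumptions, not a definitive implementation: the environment variable names TRAIN_DATA_URI, GATED_MODEL_NAME and GATE_METRIC, the metric key "auc", and the helper names are all hypothetical. It also assumes the baseline is looked up through the Model Registry's "Production" stage, which newer MLflow releases deprecate in favor of aliases. MLflow reads MLFLOW_TRACKING_URI from the environment on its own, so the script never sets it.

```python
import os


def passes_gate(candidate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Return True when the candidate metric has not regressed past the tolerance."""
    return candidate >= baseline - tolerance


def train(data_uri: str) -> str:
    """Run the MLproject in the current directory and return its MLflow run ID."""
    import mlflow  # lazy import: the gate logic stays testable without MLflow installed

    # Same effect as `mlflow run . -P data=<data_uri>`; blocks until the run finishes.
    submitted = mlflow.run(uri=".", parameters={"data": data_uri})
    return submitted.run_id


def run_metric(run_id: str, key: str) -> float:
    """Read one logged metric from a finished run on the tracking server."""
    from mlflow.tracking import MlflowClient

    return MlflowClient().get_run(run_id).data.metrics[key]


def production_metric(model_name: str, key: str) -> float:
    """Read the same metric from the run behind the current Production version."""
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    # Assumption: stage-based registry lookup (deprecated in newer MLflow versions).
    versions = client.get_latest_versions(model_name, stages=["Production"])
    if not versions:
        raise LookupError(f"No Production version registered for {model_name!r}")
    return client.get_run(versions[0].run_id).data.metrics[key]


def main() -> int:
    """Train, compare against Production, and return a process exit code."""
    data_uri = os.environ["TRAIN_DATA_URI"]        # hypothetical variable name
    model_name = os.environ["GATED_MODEL_NAME"]    # hypothetical variable name
    key = os.environ.get("GATE_METRIC", "auc")     # hypothetical variable name

    candidate = run_metric(train(data_uri), key)
    baseline = production_metric(model_name, key)
    print(f"{key}: candidate={candidate:.4f} baseline={baseline:.4f}")

    # A non-zero exit code is what turns the CI job red.
    return 0 if passes_gate(candidate, baseline) else 1
```

A CI step would finish with `raise SystemExit(main())` so the exit code reaches the runner. Keeping passes_gate free of MLflow calls means the comparison rule itself can be unit-tested without a tracking server.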

Analogy

Unit tests gate code on 'does it still work?'. A training CI job gates the model on 'is it still good enough?'. Same red/green discipline, applied to AUC instead of assertions.
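To make the analogy concrete, here is a toy side-by-side in Python. Both functions are hypothetical illustrations: slugify stands in for ordinary application code, and the metrics dictionary with an "auc" key stands in for whatever the training run logged.

```python
def slugify(title: str) -> str:
    """Toy function standing in for ordinary application code."""
    return "-".join(title.lower().split())


def test_slugify() -> None:
    # Code gate: does it still work?
    assert slugify("Hello World") == "hello-world"


def check_model_quality(metrics: dict, min_auc: float) -> None:
    # Model gate: is it still good enough?
    auc = metrics["auc"]
    assert auc >= min_auc, f"AUC {auc:.3f} is below the required {min_auc:.3f}"
```

The unit test compares against an exact expected value, while the model gate compares against a threshold, because a retrained model rarely reproduces a metric to the last digit.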

Reflect

Picture adding a model-quality gate to your repo.

  • What's the minimum metric you'd block a merge on?
  • Against which baseline — a fixed number, or the live Production model?
  • Who gets paged when the gate fails: the PR author or the ML platform team?
