Shared infra + shared vocabulary
Two upgrades that matter
- Remote tracking server: a single MLflow or W&B instance shared by all teams, backed by a real database (e.g. Postgres) and an object store (e.g. S3).
- Naming + tagging conventions: agreed at team level, not invented per experiment. Without conventions, search becomes archaeology.
A minimal convention
experiment: <business_goal>-<quarter>
tags: owner=<team>, model_family=<xgb|nn|linear>, dataset_version=<hash>
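A convention only helps if it is enforced, so here is a minimal Python sketch that builds the experiment name and tags from one place and rejects values outside the convention. It assumes MLflow as the tracking server. The helper names (experiment_name, run_tags, start_tagged_run), the YYYYQn quarter format, and the lowercase rule for business_goal are illustrative assumptions, not part of the convention itself.

```python
import re

# Allowed values taken from the convention above; extend as the team agrees.
MODEL_FAMILIES = {"xgb", "nn", "linear"}


def experiment_name(business_goal: str, quarter: str) -> str:
    """Build '<business_goal>-<quarter>', e.g. 'fraud-2024Q2'."""
    # Assumption: goals are lowercase slugs and quarters look like 2024Q2.
    if not re.fullmatch(r"[a-z0-9_]+", business_goal):
        raise ValueError(f"business_goal must be a lowercase slug: {business_goal!r}")
    if not re.fullmatch(r"\d{4}Q[1-4]", quarter):
        raise ValueError(f"quarter must look like 2024Q2: {quarter!r}")
    return f"{business_goal}-{quarter}"


def run_tags(owner: str, model_family: str, dataset_version: str) -> dict[str, str]:
    """Return the three required tags, rejecting unknown model families."""
    if model_family not in MODEL_FAMILIES:
        raise ValueError(f"model_family must be one of {sorted(MODEL_FAMILIES)}")
    if not owner or not dataset_version:
        raise ValueError("owner and dataset_version are required")
    return {
        "owner": owner,
        "model_family": model_family,
        "dataset_version": dataset_version,
    }


def start_tagged_run(tracking_uri: str, business_goal: str, quarter: str,
                     owner: str, model_family: str, dataset_version: str):
    """Start an MLflow run on the shared server with the conventional name and tags."""
    # Validate first so a bad name never reaches the shared server.
    name = experiment_name(business_goal, quarter)
    tags = run_tags(owner, model_family, dataset_version)
    import mlflow  # imported lazily so the helpers above work without MLflow installed
    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(name)
    return mlflow.start_run(tags=tags)
```

A training script would then call something like start_tagged_run("http://mlflow.internal:5000", "fraud", "2024Q2", owner="risk", model_family="xgb", dataset_version="a1b2c3") instead of naming the experiment by hand. The URL and values here are placeholders.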
Six months later, anyone can find 'the best fraud model from Q2 by the risk team' in seconds.
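As a rough illustration of that lookup, the sketch below turns the tags into an MLflow filter string and asks the shared server for the top run. It assumes MLflow, that runs logged a metric named "auc" where higher is better, and an experiment named "fraud-2024Q2". The helper names (tag_filter, best_run), the metric, and the experiment name are hypothetical.

```python
def tag_filter(**tags: str) -> str:
    """Build an MLflow filter string, e.g. "tags.owner = 'risk'"."""
    clauses = []
    for key, value in tags.items():
        # Keep the sketch simple: refuse values that would need quote escaping.
        if "'" in value:
            raise ValueError(f"tag value may not contain a single quote: {value!r}")
        clauses.append(f"tags.{key} = '{value}'")
    return " and ".join(clauses)


def best_run(tracking_uri: str, experiment: str, metric: str = "auc", **tags: str):
    """Return a one-row DataFrame with the highest-scoring run matching the tags."""
    import mlflow  # imported lazily so tag_filter works without MLflow installed
    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.search_runs(
        experiment_names=[experiment],
        filter_string=tag_filter(**tags),
        order_by=[f"metrics.{metric} DESC"],  # assumes higher is better
        max_results=1,
    )
```

With the convention in place, "the best fraud model from Q2 by the risk team" becomes roughly best_run(uri, "fraud-2024Q2", owner="risk"), where uri is the address of the shared tracking server.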