The 5% model, the 95% plumbing
The 'last mile' problem
A successful notebook proves the model can work. Shipping asks a different question: can it work predictably, repeatedly and accountably, under traffic you don't control, on data that will drift, with on-call humans who didn't write the notebook?
Concretely, a production ML system needs to handle:
- Data plumbing — features must be computed the same way at train and serve time, on bounded compute and within SLOs.
- Versioning — code, data, model, environment, all together.
- Deployment — rolling, rollback, canary, shadow, multi-region.
- Monitoring — latency, errors, data drift, concept drift, fairness.
- Governance — who can ship, who signed off, what is the audit trail.
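To make the data plumbing item concrete, here is a minimal sketch of one common way to keep features identical at train and serve time: both paths import a single feature function, and a parity check runs in CI. The names `RawEvent`, `compute_features`, `build_training_rows` and `serve_features`, as well as the two example features, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical raw input record; the field names are illustrative."""
    amount: float
    account_age_days: int


def compute_features(event: RawEvent) -> list[float]:
    """Single source of truth for feature logic, shared by both paths."""
    return [
        # Compress heavy-tailed amounts; negative amounts are treated as 0.
        math.log1p(max(event.amount, 0.0)),
        # Clip the age to [0, 3650] days and scale it to [0, 1].
        min(max(event.account_age_days, 0), 3650) / 3650,
    ]


def build_training_rows(events: list[RawEvent]) -> list[list[float]]:
    """Offline path: batch over historical events."""
    return [compute_features(e) for e in events]


def serve_features(payload: dict) -> list[float]:
    """Online path: parse one request payload, then reuse the same function."""
    event = RawEvent(float(payload["amount"]), int(payload["account_age_days"]))
    return compute_features(event)


# Parity check: the online path must reproduce the offline features exactly.
history = [RawEvent(120.0, 400), RawEvent(5.5, 9000)]
offline = build_training_rows(history)
online = [
    serve_features({"amount": e.amount, "account_age_days": e.account_age_days})
    for e in history
]
assert offline == online
```

The design choice is that neither path is allowed to reimplement the transformation. In a real system the shared function might live in a versioned package or a feature store, but the principle of one definition with two callers stays the same.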
The modelling code is often less than 5% of what runs in production. The well-known paper by Sculley et al. ('Hidden Technical Debt in Machine Learning Systems', 2015) visualised this as a small box of ML code surrounded by much larger boxes of supporting infrastructure.
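As one illustration of what sits in those surrounding boxes, the data drift check from the monitoring item can be sketched with the Population Stability Index (PSI), one of several possible drift statistics. This is a sketch under stated assumptions, not a prescribed method: the function name `psi`, the equal-width binning over the reference range, the `1e-6` floor for empty bins, and the 0.2 alert threshold (a commonly quoted rule of thumb) are all assumptions, and both samples are assumed to be non-empty.

```python
import math


def psi(reference: list[float], live: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference feature is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # assumed floor so empty bins do not produce log(0)
        return [max(c / len(sample), eps) for c in counts]

    p, q = proportions(reference), proportions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Illustrative data: the live sample is shifted towards the upper half.
reference = [i / 100 for i in range(100)]
live = [0.5 + i / 200 for i in range(100)]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level, not a standard
drifted = psi(reference, live) > DRIFT_THRESHOLD
```

In practice a check like this would run on a schedule per feature and feed an alerting system. Concept drift needs labels or proxies for them and is not covered by this sketch.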