The Sculley et al. 'Hidden Technical Debt' paper's central claim is…

ML code is a tiny fraction of what runs in production — the rest is plumbing

Deep learning is faster than gradient boosting

You should never use Python in production

From Notebook to Production — Semantic Web Academy

The 'last mile' problem

A successful notebook proves the model can work. Shipping asks a different question: can it work predictably, repeatedly and accountably, under traffic you don't control, on data that will drift, with on-call humans who didn't write the notebook?

Concretely, a production ML system needs to handle:

Data plumbing — features must be computed the same way at train and serve time, on bounded compute and within SLOs.
Versioning — code, data, model, environment, all together.
Deployment — rolling, rollback, canary, shadow, multi-region.
Monitoring — latency, errors, data drift, concept drift, fairness.
Governance — who can ship, who signed off, what is the audit trail.
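Two of these items, data plumbing and monitoring, can be made concrete with a small sketch. It is an illustration under stated assumptions rather than a reference implementation. The feature names (`amount`, `n_items`) and the function names are hypothetical, and the Population Stability Index (PSI) is swapped in here as one common drift score among several. The point is that training and serving both import the same `compute_features` function, and that a drift check compares a serving-time sample against the training-time distribution.

```python
import math
from typing import Dict, List, Sequence


def compute_features(raw: Dict[str, float]) -> List[float]:
    """Single source of truth for feature logic.

    Both the training job and the serving handler should call this
    function, so the two paths cannot silently diverge.
    The fields 'amount' and 'n_items' are hypothetical.
    """
    amount = raw["amount"]
    n_items = raw["n_items"]
    return [math.log1p(amount), amount / max(n_items, 1.0)]


def psi(expected: Sequence[float], actual: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index between a training-time sample
    (expected) and a serving-time sample (actual) of one feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the training range; fall back to width 1.0
    # if the training sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job could call `psi(train_sample, recent_serving_sample)` per feature on a schedule and alert above a threshold. Values around 0.1 to 0.25 are often quoted as rule-of-thumb warning levels, but the right threshold depends on the feature and should be tuned rather than assumed.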

The modelling code is often less than 5% of what runs in production. The widely cited paper by Sculley et al., 'Hidden Technical Debt in Machine Learning Systems' (2015), visualised this as a tiny box of ML code surrounded by much larger boxes of plumbing.
