From Notebook to Production

Why the jump is 10× harder than the modelling makes it look.


The 5% model, the 95% plumbing

The 'last mile' problem

A successful notebook proves the model can work. Shipping asks a different question: can it work predictably, repeatedly and accountably, under traffic you don't control, on data that will drift, with on-call humans who didn't write the notebook?

Concretely, a production ML system needs to handle:

  • Data plumbing: features must be computed the same way at training and serving time, on bounded compute and within SLOs.
  • Versioning: code, data, model and environment, versioned together.
  • Deployment: rolling updates, rollback, canary and shadow releases, multi-region rollout.
  • Monitoring: latency, errors, data drift, concept drift, fairness.
  • Governance: who may ship, who signed off, and where the audit trail lives.
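The first two bullets often collide in one place: the feature code. Below is a minimal Python sketch, assuming a single shared `featurize` function and a hand-maintained version tag. `FEATURE_VERSION`, `ModelBundle`, `package` and `predict` are hypothetical names, not any library's API, and real systems usually get this guarantee from a feature store or model registry. It shows serving refusing a model whose training features came from different code.

```python
from dataclasses import dataclass

# Hypothetical version tag for the feature code; bump it whenever featurize() changes.
FEATURE_VERSION = "v3"


def featurize(raw: dict) -> list[float]:
    """One feature function, imported by BOTH the training job and the online server.

    Keeping a single definition avoids train/serve skew: there is no second,
    hand-ported copy of the logic that can silently disagree.
    """
    amount = float(raw.get("amount", 0.0))
    n_items = int(raw.get("n_items", 0))
    avg_price = amount / n_items if n_items else 0.0  # guard against division by zero
    return [amount, float(n_items), avg_price]


@dataclass(frozen=True)
class ModelBundle:
    """What training hands to serving: weights plus the feature version they assume."""
    weights: tuple[float, ...]
    feature_version: str


def package(weights: list[float]) -> ModelBundle:
    # Called at the end of training; stamps the feature version used to build the training set.
    return ModelBundle(weights=tuple(weights), feature_version=FEATURE_VERSION)


def predict(bundle: ModelBundle, raw: dict) -> float:
    # Refuse to serve if the deployed feature code differs from what the model was trained on.
    if bundle.feature_version != FEATURE_VERSION:
        raise RuntimeError(
            f"feature version mismatch: model={bundle.feature_version}, "
            f"server={FEATURE_VERSION}"
        )
    x = featurize(raw)
    # Toy linear model: dot product of weights and features.
    return sum(w * xi for w, xi in zip(bundle.weights, x))


bundle = package([0.5, 1.0, 0.0])
print(predict(bundle, {"amount": 200.0, "n_items": 4}))  # 104.0
```

The design choice is to fail loudly at serving time rather than return predictions computed from mismatched features, which would otherwise look healthy on latency and error dashboards.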

The modelling code is often less than 5% of what runs in production. The well-known paper by Sculley et al., 'Hidden Technical Debt in Machine Learning Systems' (2015), visualised this as a tiny box of ML code surrounded by far larger boxes of plumbing.

Analogy

Compare a home cook with a restaurant chain. Both can make a great risotto. The home cook can improvise, substitute, and start over. The chain has to deliver an identical risotto in 1,200 locations, at predictable cost, with allergens disclosed and a recall process in place. The recipe is the easy part.

Visualization

Interactive system map (node types: input, data, ml, ops).

A simplified system map: the model sits inside a much larger graph of feature pipelines, monitoring, registry, policy and on-call.
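To make the map concrete, here is a hedged Python sketch of a promotion gate that a model registry might run before a model goes live. `RegistryEntry`, `promotion_blockers`, the field names and the 1% canary threshold are illustrative assumptions, not any real registry's schema. Notice that almost every check concerns the boxes around the model rather than the model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryEntry:
    """Hypothetical registry record: the trained model is just one field among many."""
    model_uri: str
    code_commit: Optional[str] = None
    data_snapshot: Optional[str] = None
    feature_version: Optional[str] = None
    approvals: List[str] = field(default_factory=list)
    monitoring_configured: bool = False
    canary_error_rate: Optional[float] = None


def promotion_blockers(
    entry: RegistryEntry,
    max_canary_error: float = 0.01,  # assumed error budget for the canary
    required_approvals: int = 1,
) -> List[str]:
    """Return the reasons this entry cannot be promoted (empty list means go)."""
    blockers: List[str] = []
    # Versioning: every input needed to reproduce the model must be pinned.
    for name in ("code_commit", "data_snapshot", "feature_version"):
        if getattr(entry, name) is None:
            blockers.append(f"missing {name}")
    # Governance: someone accountable has signed off.
    if len(entry.approvals) < required_approvals:
        blockers.append("needs sign-off")
    # Monitoring: dashboards and alerts exist before traffic arrives.
    if not entry.monitoring_configured:
        blockers.append("no monitoring configured")
    # Deployment: a canary has run and stayed within its error budget.
    if entry.canary_error_rate is None:
        blockers.append("canary not run")
    elif entry.canary_error_rate > max_canary_error:
        blockers.append(
            f"canary error rate {entry.canary_error_rate:.3f} > {max_canary_error:.3f}"
        )
    return blockers


# A freshly trained model with nothing around it is blocked six times over.
print(promotion_blockers(RegistryEntry(model_uri="models/churn/7")))
```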

Reflect

Audit a model you shipped (or wish you had).

  • How much of the work was actually modelling?
  • Where did you spend most of your time and stress?
  • Which of the surrounding boxes is the weakest link today?
