Hyperparameter Optimisation at Scale

Optuna, Ray Tune and how not to burn the GPU bill.

0/2 done

Search smarter, not longer

Tools at a glance

ToolWhen to pick it
OptunaSingle-node or simple distributed search; great Python ergonomics.
Ray TuneHeavy distributed search; integrates with PBT, ASHA, BOHB.
HyperoptClassic Bayesian search; mostly legacy.
Vendor (Vertex Vizier, SageMaker AMT)When you're deep into that cloud.

Cost discipline

  • Use early stopping — ASHA / Hyperband stop bad trials early. 10–50× cheaper than naive grid.
  • Budget by trial, not by hour — pin a max trial count and a max wall-clock.
  • Track every trial — losing the trial history defeats the point.
  • Beware the noise floor — if your metric varies by ±0.5 between seeds, treat sub-0.5 'improvements' as ties.

Analogy

Naive grid search is fishing with dynamite: you catch everything, including a huge cloud bill. Bayesian search + early stopping is a trained falconer: a few targeted casts, dropping the unpromising chases early.

Reading in progress · 0 of 2 activities done