Why is early stopping (ASHA / Hyperband) a budget multiplier?

It stops unpromising trials before they spend their full compute budget, so the same total compute covers many more configurations.
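
To make the "multiplier" concrete, here is a minimal sketch of synchronous successive halving, the building block that Hyperband and ASHA extend. It is an illustration under stated assumptions, not how any particular library implements it: the function name, the eta=3 reduction factor, the budget units, and the toy scoring function are all hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, max_budget=27, eta=3):
    """Keep the top 1/eta of trials at each rung and give survivors eta times more budget.

    evaluate(config, budget) must return a score where higher is better.
    Returns (best_config, total_budget_spent).
    """
    survivors = list(configs)
    budget = min_budget
    spent = 0
    while True:
        scored = [(evaluate(c, budget), c) for c in survivors]
        # Assumption: each rung is charged in full, as if trained from scratch.
        spent += budget * len(survivors)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if budget >= max_budget or len(scored) == 1:
            return scored[0][1], spent
        keep = max(1, len(scored) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget = min(budget * eta, max_budget)


# Toy problem (hypothetical): each config is just its own "quality" in [0, 1),
# and the score improves with budget while preserving the ranking.
configs = [i / 27 for i in range(27)]
best, spent = successive_halving(configs, lambda q, b: q * b / (b + 1))
full_cost = len(configs) * 27  # every config trained to the maximum budget
print(spent, full_cost)  # 108 729
```

In this toy run the search spends 108 budget units instead of 729. The real saving depends on eta, the number of rungs, and how well early scores predict final ones, so treat the ratio as illustrative rather than as a measurement.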

Hyperparameter Optimisation at Scale — Semantic Web Academy

Tool	When to pick it
Optuna	Single-node or simple distributed search; great Python ergonomics.
Ray Tune	Heavy distributed search; integrates with PBT, ASHA, BOHB.
Hyperopt	Classic Bayesian search; mostly legacy.
Vendor (Vertex Vizier, SageMaker AMT)	When you're deep into that cloud.

Use early stopping: ASHA / Hyperband halt unpromising trials early, which can make a search 10–50× cheaper than a naive grid search.
Budget by trial, not by hour: pin both a maximum trial count and a maximum wall-clock time.
Track every trial: losing the trial history defeats the point.
Beware the noise floor: if your metric varies by ±0.5 between seeds, treat sub-0.5 'improvements' as ties.
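
The last three tips can be sketched together in plain Python. This is a hedged illustration, not a recommended framework: run_search, is_real_improvement, pick_best, and the sample_config / evaluate callables are hypothetical names, and the search strategy is plain random search.

```python
import time


def run_search(sample_config, evaluate, max_trials=50, max_seconds=60.0):
    """Run trials until either cap is hit, recording every trial in a history list."""
    history = []
    start = time.monotonic()
    for trial_id in range(max_trials):
        if time.monotonic() - start > max_seconds:
            break  # wall-clock cap reached before the trial cap
        config = sample_config()
        score = evaluate(config)
        history.append({"trial": trial_id, "config": config, "score": score})
    return history


def is_real_improvement(new_score, best_score, noise_floor):
    """A gain counts only if it exceeds the seed-to-seed noise; smaller gains are ties."""
    return new_score - best_score > noise_floor


def pick_best(history, noise_floor):
    """Keep the earliest trial unless a later one beats it by more than the noise floor."""
    best = None
    for trial in history:
        if best is None or is_real_improvement(trial["score"], best["score"], noise_floor):
            best = trial
    return best
```

With a noise floor of 0.5, a trial scoring 90.3 does not displace an incumbent at 90.0, while one scoring 91.0 does. In a real search the history would normally be written to durable storage rather than kept only in memory.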