GPU Scheduling Basics

Why your $40 000 GPU sits at 4% utilisation, and what to do about it.

Utilisation is a discipline

Three knobs

  1. Right-size the request: asking for 4 GPUs when you use 1 leaves 75% of the allocation idle. Profile first.
  2. Use a scheduler with bin-packing: Kubernetes + Kubeflow, Ray, Slurm. Don't hand out GPUs by Slack DM.
  3. Share GPUs with MIG / fractional GPUs: A100 and H100 can be partitioned into up to seven isolated instances, and many inference workloads fit in 1/7th of a GPU.
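
To make the first and third knobs concrete, here is a minimal sketch of the arithmetic behind right-sizing and of bin-packing fractional GPU requests. It uses first-fit decreasing, a textbook heuristic chosen only for illustration; it is not the algorithm used by Kubernetes, Ray, or Slurm. The job names and sizes are hypothetical, and the sketch ignores real constraints such as fixed MIG profile sizes, GPU memory, and node topology.

```python
from fractions import Fraction


def wasted_fraction(requested_gpus, used_gpus):
    """Share of an allocation that sits idle."""
    return 1 - Fraction(used_gpus, requested_gpus)


def first_fit_decreasing(requests, capacity=Fraction(1)):
    """Pack fractional GPU requests onto as few GPUs as this heuristic finds.

    requests: dict mapping a (hypothetical) job name to the fraction of
    one GPU it needs. Returns a list of GPUs, each a list of job names.
    """
    gpus = []  # each entry is [remaining_capacity, [job names]]
    # Place the largest requests first; smaller ones then fill the gaps.
    ordered = sorted(requests.items(), key=lambda kv: kv[1], reverse=True)
    for job, need in ordered:
        if need > capacity:
            raise ValueError(f"{job} needs more than one GPU")
        for gpu in gpus:
            if gpu[0] >= need:  # first GPU with enough room wins
                gpu[0] -= need
                gpu[1].append(job)
                break
        else:
            # No existing GPU had room, so open a new one.
            gpus.append([capacity - need, [job]])
    return [jobs for _, jobs in gpus]


if __name__ == "__main__":
    # Illustrative workload: one full-GPU training job, seven small inference jobs.
    jobs = {"train": Fraction(1)}
    jobs.update({f"infer-{i}": Fraction(1, 7) for i in range(7)})
    print(wasted_fraction(4, 1))            # 3/4
    print(len(first_fit_decreasing(jobs)))  # 2
```

Handing each of these eight jobs its own GPU would take eight devices; packing the seven 1/7th-sized jobs together brings it down to two.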

Cheap wins

  • Spot/preemptible instances for non-urgent training (typically 50–80% cheaper than on-demand).
  • Checkpoint every N steps: losing 12 hours to a preemption is a lesson you only need once.
  • Profile before buying bigger hardware.
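
The "checkpoint every N steps" advice can be sketched as a resumable loop. This is a simplified illustration, not a real training script: the state is a small JSON dictionary, the "training step" is a stand-in computation, and the `stop_after` parameter is a hypothetical hook that simulates a preemption. A real job would save model and optimizer state with its framework's own checkpoint utilities.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a preemption mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic when both paths share a filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path, total_steps, every_n, stop_after=None):
    """Run (or resume) training, checkpointing every `every_n` steps."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if stop_after is not None and step >= stop_after:
            return state  # simulated preemption: unsaved progress is lost
        # Stand-in for one real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % every_n == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final checkpoint at the end of the run
    return state
```

With `every_n=10`, a preemption at step 37 costs only the 7 steps since the checkpoint at step 30; calling `train` again with the same path resumes from there.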

Analogy

A GPU cluster without a scheduler is a gym with no booking system: the strongest person grabs the squat rack and nobody else can use it. A scheduler is the booking app — slots, fairness, queues — and suddenly the same gym serves 5× more people.
