Cheapest way to halve the bill for non-urgent training?

Use spot/preemptible GPUs and checkpoint regularly

GPU Scheduling Basics — Semantic Web Academy

Right-size the request — asking for 4 GPUs when you use 1 wastes 75% capacity. Profile first.
Use a scheduler with bin-packing — Kubernetes + KubeFlow, Ray, Slurm. Don't 'hand out' GPUs by Slack DM.
Time-share with MIG / fractional GPUs — A100/H100 can be sliced; many inference workloads fit in 1/7th of a GPU.