Theory
The shift you are living through
Ten years ago a data team ran its own Hadoop / on-prem MPP cluster: fixed nodes, a sizing committee, a 3am pager when a disk filled. Today most analytics runs on a managed cloud platform you never SSH into. Three forces drove that:
- Elasticity — On a fixed cluster, the month-end report and the idle Sunday cost the same. Managed platforms let compute scale to zero when nobody queries, and burst for the heavy job — you pay for work done, not hardware owned.
- Separation of storage and compute — The single biggest architectural idea in this track. On Hadoop, data lived on the compute nodes (HDFS), so to store more you bought more CPU you didn't need (and vice-versa). Cloud platforms put data in cheap object storage (S3 / ADLS / GCS) and spin compute up next to it on demand. Storage and compute now scale — and bill — independently.
- Operability — No cluster to patch, no rebalancing, no JVM tuning. The platform owns uptime; you own SQL and cost.
Use Case Example: A retailer's BI dashboards are idle overnight but hammered 9am–6pm. On a fixed cluster they paid for peak 24/7. On Snowflake/BigQuery/Databricks, compute auto-suspends at night and auto-scales at the open — same workload, a fraction of the bill — while the data sits untouched in object storage either way.