Theory
The lake's two enemies: too many tiny files, wrong partitions
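Partitioning physically separates rows by the value of a column, typically one directory per value (event_date=2026-05-27/...). Done right, a query that filters on that column reads only the matching partitions and skips the rest; a single-day filter over a year of daily partitions skips over 99% of the data. Done wrong (a high-cardinality column like user_id), you create millions of microscopic files and the engine spends its life opening them instead of reading data.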
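To make the skipping concrete, here is a minimal sketch of partition pruning over Hive-style paths. It is an illustration only, not how any particular engine implements pruning: the events/ prefix, the part-NNNNN.parquet file names, and the four-files-per-day figure are all assumptions made up for the example.

```python
from datetime import date, timedelta


def partition_paths(start, days, files_per_day=4):
    """Build hypothetical file paths, one directory per event_date value."""
    paths = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        for part in range(files_per_day):
            paths.append(
                f"events/event_date={day.isoformat()}/part-{part:05d}.parquet"
            )
    return paths


def prune(paths, wanted_date):
    """Keep only the files whose partition directory matches the filter."""
    token = f"/event_date={wanted_date.isoformat()}/"
    return [p for p in paths if token in p]


# One year of daily partitions, then a query filtering on a single day.
all_files = partition_paths(date(2026, 1, 1), days=365)
selected = prune(all_files, date(2026, 5, 27))
skipped_fraction = 1 - len(selected) / len(all_files)

print(f"files in table: {len(all_files)}")
print(f"files read:     {len(selected)}")
print(f"skipped:        {skipped_fraction:.1%}")
```

The filter never opens a file to decide whether it is relevant; the directory name alone is enough, which is why the partition column has to be one the queries actually filter on.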
Compaction (OPTIMIZE in Delta, the rewrite_data_files procedure in Iceberg) merges those microscopic files into ~128MB–1GB chunks, the sweet spot for object storage and columnar reads.
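The core idea behind compaction can be sketched as a bin-packing plan: group small files until each group reaches a target size, then rewrite each group as one file. The function below is a simplified, assumed version of that planning step (greedy packing, sizes only); it is not the actual algorithm used by Delta or Iceberg, and plan_compaction and its 128 MiB default are names and values chosen for this example.

```python
MIB = 1024 ** 2


def plan_compaction(file_sizes, target_bytes=128 * MIB):
    """Greedily group file sizes into bins that do not exceed target_bytes.

    A single file larger than the target ends up in a bin of its own.
    """
    bins = []
    current, current_size = [], 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current bin when the next file would overflow it.
        if current and current_size + size > target_bytes:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


# Hypothetical input: 1,000 files of 1 MiB each.
small_files = [1 * MIB] * 1000
plan = plan_compaction(small_files)

print(f"files before: {len(small_files)}")
print(f"files after:  {len(plan)}")
print(f"largest output: {max(sum(b) for b in plan) // MIB} MiB")
```

Each bin in the plan becomes one output file, so a thousand opens shrink to a handful while the total bytes stay the same.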
Rules of thumb: partition on a low-cardinality column the queries actually filter on (usually a date); run compaction on a schedule; never partition on something with millions of distinct values.
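These rules of thumb can be turned into a rough sanity check before committing to a partition column. The sketch below is only a heuristic under stated assumptions: check_partition_column is a hypothetical helper, the 128 MiB floor is borrowed from the low end of the compaction range above, and data is assumed to be spread evenly across partition values.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def check_partition_column(distinct_values, total_bytes, filtered_by_queries,
                           min_partition_bytes=128 * MIB):
    """Return a list of warnings for a candidate partition column.

    Assumes data is spread evenly, so the average partition size is
    total_bytes / distinct_values.
    """
    problems = []
    if not filtered_by_queries:
        problems.append("queries do not filter on this column, so nothing is pruned")
    avg_bytes = total_bytes / max(distinct_values, 1)
    if avg_bytes < min_partition_bytes:
        problems.append(
            f"average partition is about {avg_bytes / MIB:.2f} MiB, "
            "too small: expect many tiny files"
        )
    return problems


# Hypothetical 1 TiB table with two candidate columns.
date_problems = check_partition_column(
    distinct_values=365, total_bytes=TIB, filtered_by_queries=True)
user_problems = check_partition_column(
    distinct_values=5_000_000, total_bytes=TIB, filtered_by_queries=True)

print("event_date:", date_problems or "looks fine")
print("user_id:   ", user_problems)
```

Real tables are rarely evenly distributed, so a check like this is a first filter, not a substitute for looking at actual file sizes after a few loads.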