Theory
Layout is destiny
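A row-oriented file stores every field of row 1 (id, name, email, country, amount), then every field of row 2, then row 3, and so on down the disk. That is great for 'give me this whole user' and terrible for 'sum amount across 100M rows': the fields are interleaved, so you read every column you don't need just to reach the one you do.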
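To make the row layout concrete, here is a minimal Python sketch under loud assumptions: fixed-width binary records packed with the standard `struct` module (real row stores add pages, headers, and variable-length fields), and hypothetical helpers `write_rows` and `sum_amount_row_store`. Because each record interleaves all five fields, the sum walks every byte of the buffer even though only the 8-byte amount matters.

```python
import struct

# Hypothetical fixed-width record (not a real file format):
# id int64, name 16 bytes, email 32 bytes, country 2 bytes, amount float64.
# "<" means little-endian with no padding, so each record is 66 bytes.
ROW = struct.Struct("<q16s32s2sd")


def write_rows(rows):
    """Row-oriented layout: all fields of row 1, then all fields of row 2, ..."""
    buf = bytearray()
    for row in rows:
        buf += ROW.pack(*row)
    return bytes(buf)


def sum_amount_row_store(buf):
    """SUM(amount): fields are interleaved, so we walk every whole record."""
    total = 0.0
    bytes_read = 0
    for _id, _name, _email, _country, amount in ROW.iter_unpack(buf):
        bytes_read += ROW.size  # the whole 66-byte record, not just 8 bytes
        total += amount
    return total, bytes_read


if __name__ == "__main__":
    rows = [(i, b"user%d" % i, b"user%d@example.com" % i, b"US", float(i))
            for i in range(1000)]
    buf = write_rows(rows)
    total, bytes_read = sum_amount_row_store(buf)
    print(total, bytes_read, len(buf))
```

Only 8 of every 66 bytes read are amount values; the rest belong to columns the query never asked for.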
A columnar file stores all id values together, then all name values, and so on through all amount values. The analytics query reads only the amount column, which in a wide table is often just 1–5% of the bytes, and compression is dramatically better because adjacent values share a type and are often similar (the same country code repeated thousands of times, for instance).
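The same data laid out by column, again as a minimal Python sketch with hypothetical helpers (`write_columns`, `sum_amount_column_store`, `compressed_size`). zlib stands in here for the encodings and codecs a real columnar format would use. The sum now touches only the amount bytes, and a column of identical country codes collapses to a handful of bytes once compressed.

```python
import struct
import zlib

# Same hypothetical fixed-width fields as the row sketch, now stored per column.
FIELD_FORMATS = {"id": "q", "name": "16s", "email": "32s",
                 "country": "2s", "amount": "d"}


def write_columns(rows):
    """Columnar layout: all ids together, then all names, ..., then all amounts."""
    columns = {}
    for name, values in zip(FIELD_FORMATS, zip(*rows)):
        # Repeat the per-value format once per row, e.g. "<16s16s16s..."
        fmt = "<" + FIELD_FORMATS[name] * len(values)
        columns[name] = struct.pack(fmt, *values)
    return columns


def sum_amount_column_store(columns):
    """SUM(amount): read only the amount column chunk and ignore the rest."""
    chunk = columns["amount"]
    n = len(chunk) // 8  # float64 values are 8 bytes each
    total = sum(struct.unpack(f"<{n}d", chunk))
    return total, len(chunk)


def compressed_size(chunk):
    """zlib is a stand-in; Parquet offers codecs such as Snappy, GZIP and ZSTD."""
    return len(zlib.compress(chunk))


if __name__ == "__main__":
    rows = [(i, b"user%d" % i, b"user%d@example.com" % i, b"US", float(i))
            for i in range(1000)]
    cols = write_columns(rows)
    total, bytes_read = sum_amount_column_store(cols)
    print(total, bytes_read)
    for name, chunk in cols.items():
        print(name, len(chunk), compressed_size(chunk))
```

Printing the raw and compressed size of each column shows the effect the paragraph describes: homogeneous, repetitive columns such as country shrink far more than the mixed bytes of a row record would.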
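Parquet (and ORC) are the industry-standard columnar file formats. Combined with predicate pushdown, which checks per-row-group min/max statistics and skips whole row groups that cannot match a filter, they are the reason modern warehouses can query petabyte-scale tables for cents: a well-filtered query reads only a sliver of the bytes.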
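Predicate pushdown can be sketched in a few lines of Python. This is a toy model under stated assumptions: the hypothetical `RowGroup` class keeps each group's values next to its min/max statistics in memory, whereas a real Parquet reader checks statistics held in file metadata and never reads the skipped column chunks at all. `make_row_groups` and `sum_where_greater` are likewise illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical in-memory model of one row group's amount column chunk."""
    values: list[float]
    min_value: float  # min/max statistics: checked without touching `values`
    max_value: float


def make_row_groups(values, rows_per_group):
    """Split a column into row groups and record min/max for each one."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def sum_where_greater(groups, threshold):
    """SUM(amount) WHERE amount > threshold, pruning row groups by their stats."""
    total = 0.0
    scanned = 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # no row here can match, so the chunk is never scanned
        scanned += 1
        total += sum(v for v in group.values if v > threshold)
    return total, scanned


if __name__ == "__main__":
    amounts = [float(i) for i in range(1000)]
    groups = make_row_groups(amounts, rows_per_group=100)
    total, scanned = sum_where_greater(groups, threshold=899.0)
    print(total, scanned, len(groups))
```

Pruning works this well here because the values are sorted, so each group's min/max range is narrow and nine of ten groups are skipped. On randomly ordered data most groups would straddle the threshold and be read anyway, which is why sorting or clustering on frequently filtered columns matters.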