Theory
A 'table format' is a metadata layer above Parquet
Plain Parquet on S3 has no concept of a transaction. Two writers stomp on each other; a partial failure leaves orphaned files; you cannot UPDATE or DELETE a row.
Iceberg / Delta Lake / Hudi solve this by maintaining a manifest (a JSON / Avro log of which Parquet files compose the current table snapshot). They give you:
- ACID transactions on object storage (S3, GCS, ADLS).
- Time travel — query the table as of a previous snapshot for debugging, audit, or rollback.
- Schema evolution — add, drop, rename columns without rewriting data.
- Hidden partitioning (Iceberg) — the query doesn't need to know the partition column.
This is the foundation of the lakehouse: warehouse-class guarantees on cheap object storage, queryable by Spark, Trino, Snowflake, Flink and DuckDB.