Pick by query pattern
Four formats, four trade-offs
| Format | Layout | Schema | Best for |
|---|---|---|---|
| CSV | row, text | none | One-off exchange, small files, human inspection |
| JSON | row, text | self-describing | API payloads, nested records, log lines |
| Avro | row, bin | embedded (container files) | Streaming + schema evolution (Kafka) |
| Parquet | column, bin | embedded | Analytics: scan a few columns over many rows |
Rule of thumb: OLTP / streaming → row. OLAP / analytics → column. Anything crossing trust boundaries → schema-embedded.
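
The row-versus-column rule can be seen in a toy sketch. It uses only the Python standard library, and JSON strings stand in for on-disk bytes; this is not how Avro or Parquet encode data. The sample records and the helper names `sum_amount_rows` and `sum_amount_columns` are made up for illustration.

```python
import json

# Hypothetical sample data: three small records.
records = [
    {"id": 1, "country": "DE", "amount": 12.5},
    {"id": 2, "country": "FR", "amount": 7.0},
    {"id": 3, "country": "DE", "amount": 30.0},
]

# Row layout: one serialized record per entry (JSON Lines style).
row_store = [json.dumps(r) for r in records]

# Column layout: one serialized array per column
# (a toy stand-in for a column chunk).
column_store = {
    name: json.dumps([r[name] for r in records])
    for name in records[0]
}

def sum_amount_rows(store):
    """Decode every full record just to read one field."""
    size = sum(len(line) for line in store)
    total = sum(json.loads(line)["amount"] for line in store)
    return total, size

def sum_amount_columns(store):
    """Decode only the 'amount' column; other columns are never touched."""
    blob = store["amount"]
    return sum(json.loads(blob)), len(blob)

row_total, row_size = sum_amount_rows(row_store)
col_total, col_size = sum_amount_columns(column_store)

print(f"row layout:    total={row_total}, characters decoded={row_size}")
print(f"column layout: total={col_total}, characters decoded={col_size}")
```

Both layouts return the same total, but the column layout decodes only the one column the query needs. Fetching a single complete record shows the reverse: the row layout reads one entry, while the column layout has to touch every column.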