Schema on Read vs Schema on Write

Pay the schema cost up-front, or pay it on every read.

0/2 done

Where the cost lands

When do you pay the schema tax?

Schema-on-write (RDBMS, warehouse): the schema must exist before you insert. Writes reject bad data; reads are fast and predictable.
Schema-on-read (data lake, JSON store): write anything, infer / project a schema only when querying. Writes never fail on shape; reads carry the parsing cost forever.

Heuristic: the more downstream consumers, the more schema-on-write pays off. The more exploratory the data, the more schema-on-read pays off.

TSA vs lost-and-found

Schema-on-write is TSA security at the airport: every passenger gets checked once, on entry. Slow gate, fast plane.

Schema-on-read is the lost-and-found bin: throw anything in, sort it out when someone comes looking. Fast bin, slow search.

Where does the tax land?

Locate the schema tax in your own stack.

›Which dataset in your team is currently schema-on-read but *should* be schema-on-write because it has many trusting consumers?
›Which dataset is currently schema-on-write but *should* be schema-on-read because the data is exploratory and the producer's schema keeps changing?
›Where in your pipeline is the schema cost being paid twice (once on ingest, once on every query) without anyone noticing?

Reading in progress · 0 of 2 activities done