Three shapes, three storage tiers
Every byte your company owns falls into one of three buckets:
- Structured — rigid schema, rows and columns. CSV, RDBMS tables, Parquet columns. Cheap to query, expensive to change.
- Semi-structured — self-describing records with optional / nested fields. JSON, XML, Avro, log lines, document stores. Flexible to write, slower to query in aggregate.
- Unstructured — free-form: text, images, audio, video, PDFs. Value lives inside; you need parsing, OCR, ASR, embeddings or an LLM to get it out.
The shape of the source drives the shape of the store, the query engine, the tooling and the cost model.
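To make the three shapes concrete, here is a minimal sketch that answers the same question ("what is the total order amount?") against each one, using only the Python standard library. The sample data, field names and function names are all invented for illustration, and the regular expression is a stand-in for the heavier extraction step (parsing, OCR, ASR, embeddings or an LLM) that real unstructured data would need.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row carries the same fields.
# (Hypothetical sample data.)
CSV_TEXT = "order_id,customer,amount\n1001,acme,250.00\n1002,globex,99.50\n"

# Semi-structured: self-describing records; fields may be nested or missing.
JSON_LINES = [
    '{"order_id": 1003, "customer": {"name": "initech"}, "amount": 40.0}',
    '{"order_id": 1004, "customer": {"name": "umbrella", "tier": "gold"}}',
]

# Unstructured: free text; the value has to be extracted before it can be used.
EMAIL_TEXT = "Hi, please confirm order 1005 for Hooli. Total comes to $310.25. Thanks!"


def total_structured(text):
    """The schema is known up front, so the column can be read directly."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(float(row["amount"]) for row in reader)


def total_semi_structured(lines):
    """Each record describes itself, so a missing field must be handled."""
    total = 0.0
    for line in lines:
        record = json.loads(line)
        total += record.get("amount", 0.0)  # "amount" is optional here
    return total


def total_unstructured(text):
    """No schema at all: a regex stands in for a real extraction step."""
    matches = re.findall(r"\$(\d+(?:\.\d+)?)", text)
    return sum(float(m) for m in matches)


if __name__ == "__main__":
    print(total_structured(CSV_TEXT))
    print(total_semi_structured(JSON_LINES))
    print(total_unstructured(EMAIL_TEXT))
```

The amount of code between the raw data and the answer grows with each shape: a direct column read, then a read with a fallback for absent fields, then a pattern match that silently returns nothing if the wording changes.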