The Three Shapes of Data

Structured / semi-structured / unstructured — pick the right tool by shape.

Three shapes, three storage tiers

Every byte your company owns falls into one of three buckets:

  • Structured — rigid schema, rows and columns. CSV, RDBMS tables, Parquet columns. Cheap to query, expensive to change.
  • Semi-structured — self-describing records with optional / nested fields. JSON, XML, Avro, log lines, document stores. Flexible to write, slower to query in aggregate.
  • Unstructured — free-form: text, images, audio, video, PDFs. Value lives inside; you need parsing, OCR, ASR, embeddings or an LLM to get it out.

The shape of the source drives the shape of the store, the query engine, the tooling and the cost model.
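To make the three shapes concrete, here is a minimal sketch that carries roughly the same order information in each shape. The sample data, field names, and regular expression are invented for illustration; real unstructured sources usually need far more than a regex (OCR, ASR, embeddings or an LLM, as noted above).

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
csv_text = "order_id,customer,total\n1001,Ana,42.50\n1002,Ben,17.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
structured_total = sum(float(r["total"]) for r in rows)

# Semi-structured: self-describing records; fields may be nested or missing.
json_lines = [
    '{"order_id": 1001, "customer": {"name": "Ana"}, "total": 42.5}',
    '{"order_id": 1002, "total": 17.0, "coupon": "SPRING"}',
]
records = [json.loads(line) for line in json_lines]
# Every access has to tolerate absent fields.
customer_names = [r.get("customer", {}).get("name") for r in records]

# Unstructured: the value is buried in free text and must be extracted.
email = "Hi, I'm Ana. My order 1001 came to $42.50 but arrived damaged."
match = re.search(r"order (\d+) came to \$(\d+\.\d{2})", email)
extracted = (int(match.group(1)), float(match.group(2))) if match else None

print(structured_total)  # 59.5
print(customer_names)    # ['Ana', None]
print(extracted)         # (1001, 42.5)
```

The structured case aggregates in one line because the schema is guaranteed. The semi-structured case needs defensive lookups, and the unstructured case only works because the sentence happens to match the hypothetical pattern.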

Spice rack, pantry, fridge

Think of a kitchen:

  • Structured = a labelled spice rack. Every jar in its slot, you grab the cumin in 2 seconds. Reorganising the rack means moving every jar.
  • Semi-structured = the pantry — boxes and bottles of varying shapes, most labelled, some not. Findable, just slower.
  • Unstructured = the fridge — leftovers in random containers. You'll find the lasagna eventually, but you might have to open every Tupperware.

The data-management map

[Interactive diagram. Node types: shape, store, discipline, architecture.]

The full landscape: data shapes on the left, stores in the middle, governance disciplines on the right.

Audit your sources

Pick three data sources in your current company.

  • What shape is each one: structured, semi-structured, or unstructured?
  • What's the natural store for that shape?
  • Where is the team currently forcing one shape into a store designed for another?
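One way to run this audit is as a small script. The sketch below is only a starting point: the `audit` function and both lookup tables are hypothetical, the format lists follow the examples earlier in this lesson, and the "typical store" for each shape is an assumed default rather than a rule.

```python
# Hypothetical lookup tables. Formats follow the examples in this lesson;
# the store suggestions are assumed defaults, not rules.
SHAPE_BY_FORMAT = {
    "csv": "structured", "parquet": "structured", "rdbms": "structured",
    "json": "semi-structured", "xml": "semi-structured",
    "avro": "semi-structured", "log": "semi-structured",
    "text": "unstructured", "pdf": "unstructured", "image": "unstructured",
    "audio": "unstructured", "video": "unstructured",
}
TYPICAL_STORE = {
    "structured": "relational database or warehouse",
    "semi-structured": "document store",
    "unstructured": "object storage",
}

def audit(sources):
    """Return (name, shape, typical store, mismatch flag) for each source."""
    report = []
    for name, fmt, current_store in sources:
        shape = SHAPE_BY_FORMAT.get(fmt.lower(), "unknown")
        typical = TYPICAL_STORE.get(shape, "unknown")
        # Crude check: flag any source whose current store differs
        # from the assumed typical store for its shape.
        mismatch = shape != "unknown" and current_store != typical
        report.append((name, shape, typical, mismatch))
    return report

# Three invented sources, matching the exercise above.
sources = [
    ("billing", "rdbms", "relational database or warehouse"),
    ("clickstream", "json", "relational database or warehouse"),
    ("contracts", "pdf", "object storage"),
]
for row in audit(sources):
    print(row)
```

A flagged mismatch is a prompt for discussion, not a verdict: plenty of teams keep JSON in a warehouse on purpose.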
