An e-commerce platform logs every page view as a JSON event. Which shape is this?

Semi-structured — JSON, with optional / nested fields and schema drift over time

Unstructured — JSON isn't tabular

Structured — every event has the same fields


The Three Shapes of Data — Semantic Web Academy

Every byte your company owns falls into one of three buckets:

Structured — rigid schema, rows and columns. CSV, RDBMS tables, Parquet columns. Cheap to query, expensive to change.
Semi-structured — self-describing records with optional / nested fields. JSON, XML, Avro, log lines, document stores. Flexible to write, slower to query in aggregate.
Unstructured — free-form: text, images, audio, video, PDFs. Value lives inside; you need parsing, OCR, ASR, embeddings or an LLM to get it out.
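To make the three buckets concrete, here is a minimal Python sketch that pulls the same fact (which URL was viewed) out of each shape. The sample data, field names and the regular expression are all invented for illustration; the regex stands in for the heavier extraction tools an unstructured source would normally need.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields in the same order.
csv_text = "user_id,url,duration_ms\n42,/products/7,1830\n"
row = next(csv.DictReader(io.StringIO(csv_text)))
structured_url = row["url"]

# Semi-structured: a self-describing record. "referrer" is optional and
# "device" is nested, so the reader must tolerate missing keys.
event = json.loads('{"user_id": 42, "url": "/products/7", "device": {"os": "iOS"}}')
semi_url = event["url"]
referrer = event.get("referrer")               # None: absent in this event
device_os = event.get("device", {}).get("os")  # safe lookup in a nested field

# Unstructured: free text. The value has to be extracted; a regex is used here
# as a stand-in for parsing, OCR, ASR, embeddings or an LLM.
note = "Customer 42 said the page /products/7 loaded slowly on her phone."
match = re.search(r"/\S+", note)
unstructured_url = match.group(0) if match else None

print(structured_url, semi_url, unstructured_url)
```

All three lookups return the same URL, but the effort rises from a column name, to a key lookup with a fallback, to pattern matching that can silently fail.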

The shape of the source drives the shape of the store, the query engine, the tooling and the cost model.
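One way to see this is to load semi-structured events into a structured table: every optional or nested field forces a decision about the target schema. The sketch below assumes hypothetical page-view events and a hypothetical four-column schema; it is an illustration, not a prescribed pipeline.

```python
import json

# Hypothetical page-view events; the second carries fields the first lacks,
# which is the kind of schema drift a JSON event log accumulates over time.
raw_events = [
    '{"user_id": 1, "url": "/home"}',
    '{"user_id": 2, "url": "/cart", "device": {"os": "Android"}, "referrer": "ad"}',
]

# Assumed target schema for the structured store.
COLUMNS = ["user_id", "url", "device_os", "referrer"]


def flatten(event: dict) -> dict:
    """Map one nested event onto the fixed columns; absent fields become None."""
    device = event.get("device") or {}
    return {
        "user_id": event.get("user_id"),
        "url": event.get("url"),
        "device_os": device.get("os"),
        "referrer": event.get("referrer"),
    }


rows = [flatten(json.loads(line)) for line in raw_events]
for r in rows:
    print([r[c] for c in COLUMNS])
```

The flexibility of the source does not disappear; it moves into the `flatten` step, which has to be revisited whenever the events gain a new field.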

Think of a kitchen:

Structured = a labelled spice rack. Every jar has its slot, so you grab the cumin in 2 seconds. Reorganising the rack means moving every jar.
Semi-structured = the pantry — boxes and bottles of varying shapes, most labelled, some not. Findable, just slower.
Unstructured = the fridge — leftovers in random containers. You'll find the lasagna eventually, but you might have to open every Tupperware.


The full landscape: data shapes on the left, stores in the middle, governance disciplines on the right.

Pick three data sources in your current company.
