The Modern Data Stack — a Map, not a Religion

Ingest · Store · Transform · Serve · Observe — and the vendors in each box.

Theory

Five boxes you will always recognise

Every data platform — whether you call it ETL, ELT, lakehouse, or warehouse — reduces to five boxes:

  1. Ingest — Moving raw data from sources. Tools: Fivetran, Airbyte, Debezium, Kafka.
  2. Store — Landing the data. Object storage (S3/GCS) for raw files, or warehouses (Snowflake/BigQuery) for structured analytics.
  3. Transform — Cleaning and joining. Turning raw records into clean, query-ready tables. Tools: dbt, Spark, SQL pipelines.
  4. Serve — Delivering the insights out of the warehouse. Tools: BI dashboards (Looker/Preset), reverse-ETL, or API endpoints.
  5. Observe — Monitoring the pipes. Tools: Monte Carlo and Soda for schema-drift and freshness alerts, OpenLineage for lineage metadata.
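To make the map concrete, here is a minimal sketch that stores the five boxes and their example tools as plain data. The names `Box`, `CATALOG`, and `boxes_for` are made up for illustration, and the catalog holds only the tools named in the list above, so treat it as a starting point rather than a complete registry.

```python
from enum import Enum


class Box(Enum):
    INGEST = "Ingest"
    STORE = "Store"
    TRANSFORM = "Transform"
    SERVE = "Serve"
    OBSERVE = "Observe"


# Illustrative only: the example tools from the list above, not an exhaustive registry.
CATALOG = {
    Box.INGEST: {"Fivetran", "Airbyte", "Debezium", "Kafka"},
    Box.STORE: {"S3", "GCS", "Snowflake", "BigQuery"},
    Box.TRANSFORM: {"dbt", "Spark"},
    Box.SERVE: {"Looker", "Preset"},
    Box.OBSERVE: {"Monte Carlo", "Soda", "OpenLineage"},
}


def boxes_for(tool: str) -> list[Box]:
    """Return every box whose example tools include `tool` (hypothetical helper)."""
    return [box for box in Box if tool in CATALOG[box]]


print(boxes_for("dbt"))
print(boxes_for("SomeNewVendor"))  # not in the catalog, so no box is returned
```

An empty result does not mean the tool is useless; it only means nobody has placed it on the map yet.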

Use Case Example (E-commerce): You use Airbyte (Ingest) to pull daily Shopify sales records into Snowflake (Store). A dbt (Transform) job cleans the data and calculates 'Customer Lifetime Value'. The Marketing team then logs into Looker (Serve) to see whether their recent ad campaign brought in high-value users. If the Airbyte sync fails overnight, your Soda (Observe) checks alert you via Slack before the Marketing team wakes up!
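The Transform and Observe steps of this example can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how dbt or Soda work internally: the rows in `RAW_ORDERS` are invented, lifetime value is simplified to total historical spend per customer, and the 24-hour freshness threshold is arbitrary.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical raw rows as an ingest tool might land them: every value is a string.
RAW_ORDERS = [
    {"customer_id": "c1", "amount": "40.00", "loaded_at": "2024-05-01T02:00:00+00:00"},
    {"customer_id": "c2", "amount": "15.50", "loaded_at": "2024-05-01T02:00:00+00:00"},
    {"customer_id": "c1", "amount": "60.00", "loaded_at": "2024-05-02T02:00:00+00:00"},
]


def lifetime_value(raw_orders):
    """Transform: cast amounts to numbers and sum them per customer.

    Assumption: lifetime value is simplified to total historical spend.
    """
    totals = defaultdict(float)
    for row in raw_orders:
        totals[row["customer_id"]] += float(row["amount"])
    return dict(totals)


def is_fresh(raw_orders, now, max_age=timedelta(hours=24)):
    """Observe: True if the newest load is at most `max_age` old."""
    newest = max(datetime.fromisoformat(r["loaded_at"]) for r in raw_orders)
    return now - newest <= max_age


print(lifetime_value(RAW_ORDERS))
# Two days after the last load, the freshness check fails and an alert would fire.
print(is_fresh(RAW_ORDERS, now=datetime(2024, 5, 4, 8, tzinfo=timezone.utc)))
```

In a real stack the transform would live in a SQL model and the freshness rule in the observability tool's own configuration; the point here is only that the two concerns are separate functions over the same stored data.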

Analogy

The modern data stack is a restaurant kitchen. Ingest is the delivery dock where raw produce arrives from suppliers. Store is the walk-in fridge and pantry. Transform is the prep line where raw ingredients become clean, chopped, ready-to-cook components. Serve is the pass where finished dishes go out to diners. Observe is the health inspector and the order tickets — checking that nothing is spoiled and every plate left on time. Swap a supplier (Fivetran→Airbyte) or a stove (Spark→dbt) and the stations stay exactly the same. That's why the map outlives every tool that fills it.
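In code, "swap the supplier, keep the station" looks like programming against an interface. The sketch below is an illustration only: `Ingestor`, `FakeFivetran`, and `FakeAirbyte` are hypothetical stand-ins and do not reflect either vendor's real API.

```python
from typing import Protocol


class Ingestor(Protocol):
    """The station: anything that can pull raw rows from a source."""

    def pull(self) -> list[dict]: ...


class FakeFivetran:
    """Hypothetical stand-in, not the real Fivetran API."""

    def pull(self) -> list[dict]:
        return [{"order_id": "1", "amount": "40.00"}]


class FakeAirbyte:
    """Hypothetical stand-in, not the real Airbyte API."""

    def pull(self) -> list[dict]:
        return [{"order_id": "1", "amount": "40.00"}]


def run_pipeline(ingestor: Ingestor) -> float:
    """Downstream steps are unchanged whichever ingestor is passed in."""
    stored = list(ingestor.pull())                     # Store
    revenue = sum(float(r["amount"]) for r in stored)  # Transform
    return revenue                                     # Serve


print(run_pipeline(FakeFivetran()))
print(run_pipeline(FakeAirbyte()))
```

Both calls go through the same `run_pipeline`; only the object filling the Ingest box changes.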

The five boxes, visualised

Every modern platform fits this skeleton. The arrows are data flow; the dashed arrow is metadata flow (lineage, quality, freshness) — a property of mature platforms only.

Reflect

The map is durable; the tools are not. If a vendor pitch doesn't fit cleanly into a box, that's information about the pitch — either it's a new box (rare) or it's three boxes pretending to be one (common).

  • Which boxes are *missing* on your platform today, and which are over-tooled?
  • What's the cost of vendor lock-in in each box — and which box has the cheapest exit?
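One way to start on the first question is to write down your tools, tag each with a box, and count. The sketch below assumes a hand-maintained `inventory` mapping and an arbitrary threshold of two tools per box before a box counts as over-tooled; both are illustrative choices, not a standard.

```python
from collections import Counter

BOXES = ["Ingest", "Store", "Transform", "Serve", "Observe"]


def audit(inventory: dict[str, str], max_tools_per_box: int = 2):
    """Return (missing boxes, over-tooled boxes) for a tool -> box inventory."""
    counts = Counter(inventory.values())
    missing = [b for b in BOXES if counts[b] == 0]
    over_tooled = [b for b in BOXES if counts[b] > max_tools_per_box]
    return missing, over_tooled


# Hypothetical platform: three ingest tools and nothing watching the pipes.
inventory = {
    "Fivetran": "Ingest",
    "Airbyte": "Ingest",
    "Debezium": "Ingest",
    "Snowflake": "Store",
    "dbt": "Transform",
    "Looker": "Serve",
}

missing, over_tooled = audit(inventory)
print("missing:", missing)
print("over-tooled:", over_tooled)
```

The count says nothing about lock-in cost; that second question still needs a per-box judgment about how hard each tool would be to replace.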
