Theory
Five boxes you will always recognise
Every data platform — whether you call it ETL, ELT, lakehouse, or warehouse — reduces to five boxes:
- Ingest — Moving raw data from sources. Tools: Fivetran, Airbyte, Debezium, Kafka.
- Store — Landing the data. Object storage (S3/GCS) for raw files, or warehouses (Snowflake/BigQuery) for structured analytics.
- Transform — Cleaning and joining. Converting raw records into clean, query-ready SQL tables. Tools: dbt, Spark, SQL pipelines.
- Serve — Delivering the insights out of the warehouse. Tools: BI dashboards (Looker/Preset), reverse-ETL, or API endpoints.
- Observe — Monitoring the pipes. Tools: Monte Carlo and Soda for schema-drift and freshness alerts, OpenLineage for lineage tracking.
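The Observe box is the easiest one to leave abstract, so here is a minimal sketch of the two checks it names: freshness and schema drift. The function names, the 24-hour threshold, and the column names are all hypothetical, chosen for illustration; real tools such as Soda or Monte Carlo express these checks in their own configuration rather than in hand-written Python.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at, now, max_age=timedelta(hours=24)):
    """Return True when the most recent load is no older than max_age."""
    return now - last_loaded_at <= max_age


def check_schema_drift(expected_columns, actual_columns):
    """Return (missing, unexpected) column names, each as a sorted list."""
    expected, actual = set(expected_columns), set(actual_columns)
    return sorted(expected - actual), sorted(actual - expected)


# Hypothetical timestamps: the last load finished 28 hours before "now".
now = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
last_load = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)

is_fresh = check_freshness(last_load, now)
missing, unexpected = check_schema_drift(
    ["order_id", "amount"],            # columns the Transform step expects
    ["order_id", "total", "amount_usd"],  # columns that actually arrived
)

print("fresh:", is_fresh)
print("missing columns:", missing)
print("unexpected columns:", unexpected)
```

In practice the result of each check would feed an alert channel instead of `print`; the point is only that "observing" a pipeline comes down to comparing what arrived against what was expected.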
Use Case Example (E-commerce): You use Airbyte (Ingest) to pull daily Shopify sales records into Snowflake (Store). You run a dbt (Transform) job to clean the data and calculate 'Customer Lifetime Value'. Finally, the Marketing team logs into Looker (Serve) to see whether their recent ad campaign brought in high-value users. If the Airbyte sync fails overnight, your Soda (Observe) freshness checks alert you via Slack before the Marketing team wakes up!
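To make the flow in this use case concrete, the sketch below walks the same Ingest, Store, Transform, and Serve steps at toy scale. It rests on several assumptions: an in-memory SQLite database stands in for Snowflake, a hard-coded list stands in for the Airbyte sync, plain SQL stands in for the dbt model, and 'Customer Lifetime Value' is simplified to total historical spend per customer. The table names, function names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw export: every field arrives as text, including amounts.
RAW_ORDERS = [
    ("1001", "alice@example.com", "120.50"),
    ("1002", "bob@example.com", "35.00"),
    ("1003", "alice@example.com", "79.50"),
    ("1004", "bob@example.com", "not-a-number"),  # dirty row
]


def ingest_and_store(conn, rows):
    """Ingest + Store: land the raw rows unchanged in a raw table."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform: cast amounts to numbers, drop bad rows, sum per customer.

    In SQLite, CAST of non-numeric text to REAL yields 0.0, so the
    `> 0` filter removes the dirty row (and any zero-value orders).
    """
    conn.execute(
        """
        CREATE TABLE customer_ltv AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS lifetime_value
        FROM raw_orders
        WHERE CAST(amount AS REAL) > 0
        GROUP BY customer
        """
    )


def serve(conn, min_value):
    """Serve: the kind of query a dashboard might run for high-value users."""
    cursor = conn.execute(
        "SELECT customer, lifetime_value FROM customer_ltv "
        "WHERE lifetime_value >= ? ORDER BY lifetime_value DESC",
        (min_value,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
ingest_and_store(conn, RAW_ORDERS)
transform(conn)
high_value = serve(conn, 100.0)
print(high_value)
```

Note that the raw table is kept as-is and the cleaned table is built alongside it. Keeping the two separate is what lets you rerun the Transform step after fixing a bug without ingesting again.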