ETL vs ELT

Transform-on-write (ETL) vs transform-in-warehouse (ELT) — and why ELT won in the cloud.

0/2 done

Overview

ETL vs ELT

Transform-on-write (ETL) vs transform-in-warehouse (ELT) — and why ELT won in the cloud.

Why it matters

Cheap warehouse compute flipped the cost calculus: load raw, transform in SQL inside the warehouse, version the transformations in git (dbt, SQLMesh).

Going deeper

The paradigm shift introduced by ELT and tools like dbt:

  1. Idempotent Transformations: In ELT, if a business rule changes (e.g. how 'revenue' is calculated), you just tweak the SQL and rebuild the view on top of the raw data. In ETL, you often had to re-extract the old source data and run it through the external processor again.
  2. Analytics Engineering: ELT allowed data analysts who only knew SQL to become 'Analytics Engineers'. The transformation layer moved out of complex GUI tools (Informatica) and Java/Scala code, into version-controlled SQL.
  3. Data Freshness: Loading raw data immediately means analysts have access to the absolute newest events, even if the transformed, polished 'gold' tables aren't fully baked yet.

Analogy

ETL vs ELT is like cooking at home vs ordering a meal kit.

  • ETL (Extract, Transform, Load) is like the old way: you buy ingredients, prep the veggies, cook the meal at the grocery store, and carry the finished hot plate home. You had to do the work externally because your home kitchen was tiny.
  • ELT (Extract, Load, Transform): You throw raw ingredients into a box (Extract & Load), bring it into your massive, modern home kitchen (Cloud Warehouse), and cook it there (Transform).

Because modern cloud warehouses like Snowflake and BigQuery have nearly infinite, cheap compute, it's way faster and more flexible to dump the raw JSON in the warehouse first, and then write SQL to transform it locally.

Make it stick

Use the prompts below to anchor etl vs elt to something you actually own.

  • Look at your team's current data ingestion process. Are transformations happening in an external tool, or inside the data warehouse via SQL?
  • If a business definition changes tomorrow (e.g. 'Active User' now means logging in twice, not once), how long would it take to backfill historical data in your current setup?
  • What is the biggest source of pipeline failures in your workflow: the Extraction from APIs, or the Transformation logic?

Reading in progress · 0 of 2 activities done