Anatomy of a dbt Semantic Model

Entities, measures, dimensions and the YAML that ties dbt models to MetricFlow.


Overview

Five blocks, one file

A dbt semantic model is a YAML object that sits next to a dbt model and declares five things:

  • Name & description — the business name (orders).
  • Model reference — the dbt model being described: model: ref('fct_orders').
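  • Entities — the keys this model can join on. Each entity has a type: primary (the table's primary key), unique, foreign (a key pointing at an entity on another semantic model), or natural.
  • Measures — numerical aggregates: a name, an optional expr (SQL; defaults to the column with the measure's name), and an agg (sum, max, min, count, count_distinct, sum_boolean, average, median, percentile). percentile additionally takes agg_params.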
  • Dimensions — categorical or time columns the metric can be sliced by. Time dimensions also declare type_params: { time_granularity: day }.
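Put together, the five blocks for the orders example might look like the sketch below. The column names (amount, customer_id, ordered_at, order_status) and the choice of ordered_at as the default aggregation time dimension are assumptions about what fct_orders contains, not part of the original example.

```yaml
# Sketch only: column names are assumed, not taken from a real fct_orders model.
semantic_models:
  - name: orders
    description: One row per order.
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: ordered_at   # time axis used when measures are aggregated over time
    entities:
      - name: order_id
        type: primary                  # exactly one row per order_id
      - name: customer_id
        type: foreign                  # joins to a model whose primary entity is customer_id
    measures:
      - name: order_total
        expr: amount                   # assumed column holding order value
        agg: sum
      - name: order_count
        expr: 1                        # counting rows by summing 1
        agg: sum
      - name: distinct_customers
        expr: customer_id
        agg: count_distinct
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
      - name: order_status
        type: categorical
```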

Why this shape is the right shape

It maps cleanly to dimensional modelling (facts ↔ measures, dims ↔ dimensions) but with added type-safety: the engine knows which entities are primary vs foreign, so it constructs join paths automatically and fails loudly when a request asks for a join that does not exist.
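To see a join path in practice, here is a hedged sketch of a second semantic model. It assumes the orders semantic model declares customer_id as a foreign entity, and that a dbt model named dim_customers (hypothetical) holds one row per customer with a country column. Because the entity names match, MetricFlow can resolve a request for an orders measure grouped by customer_id__country (the entity__dimension form) by joining orders to customers on customer_id. Without this model, no semantic model supplies country for that entity, so the same request has no join path and is rejected.

```yaml
# Sketch only: dim_customers and its columns are hypothetical.
semantic_models:
  - name: customers
    description: One row per customer.
    model: ref('dim_customers')
    entities:
      - name: customer_id
        type: primary        # same name as the foreign customer_id entity on orders, so the two can join
    dimensions:
      - name: country
        type: categorical
      - name: signup_channel
        type: categorical
```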

Once a semantic model exists, metrics are defined separately (conventionally in their own file, such as metrics.yml) and reference measures by name. That separation is the dbt Semantic Layer's biggest win: the semantic model is the stable artefact, and metrics are the changeable, opinionated view on top.
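A minimal metrics file might look like this, assuming measures named order_total and order_count exist on an orders semantic model. The metric names and labels are illustrative. Simple metrics point at a measure, while the ratio metric points at two metrics rather than at measures.

```yaml
# Sketch only: metric names and labels are illustrative.
metrics:
  - name: revenue
    label: Revenue
    description: Sum of order value.
    type: simple
    type_params:
      measure: order_total        # measure declared on the semantic model
  - name: total_orders
    label: Total orders
    type: simple
    type_params:
      measure: order_count
  - name: average_order_value
    label: Average order value
    type: ratio
    type_params:
      numerator: revenue          # ratio inputs are metrics, not measures
      denominator: total_orders
```

Changing how average order value is defined touches only this file; the semantic model underneath stays the same.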

Typed signatures for fact tables

A dbt semantic model is the typed signature of a function over your fact table. Without it, every consumer guesses the function's argument types and occasionally segfaults the dashboard. With it, the compiler (MetricFlow) checks every call: revenue BY country is valid because country is declared as a dimension; revenue BY rocket_id fails immediately, not ten minutes later in a stale dashboard.
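The "revenue BY country" call can be written down as a saved query. This is a sketch assuming a revenue metric built on an orders model with a foreign customer_id entity, plus a customers model that declares country; all names are illustrative.

```yaml
# Sketch only: assumes a revenue metric and a country dimension reachable via customer_id.
saved_queries:
  - name: revenue_by_country
    description: Revenue sliced by customer country.
    query_params:
      metrics:
        - revenue
      group_by:
        # entity__dimension: country is reached through the customer_id entity
        - "Dimension('customer_id__country')"
```

Swapping the group-by for Dimension('customer_id__rocket_id') names a dimension that no semantic model declares, so it should be rejected when the project is validated or the query is compiled, rather than surfacing later in a dashboard. The same request can also be issued ad hoc with MetricFlow's CLI, for example mf query --metrics revenue --group-by customer_id__country.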

Reflect

If dbt is already in your stack, picking the right first semantic model matters more than tooling. The orders fact (or its equivalent: events, sessions, transactions) is almost always the right starting point because every high-leverage metric (revenue, conversion, retention) is ultimately rooted there.

  • Which fact table in your warehouse, modelled as a semantic model, would unlock the most KPIs?
  • What entities does that table need to join to — and are those joins currently inconsistent across dashboards?
