Anatomy of a Cloud Data Platform

The three layers every platform shares — and the words each vendor uses for them.

Theory

Same three layers, three vocabularies

Snowflake, Databricks and BigQuery look wildly different in their docs, but under the marketing they share the same three-layer anatomy. Learn the layers once and every vendor's jargon becomes translation, not new learning:

  • Storage layer: where bytes durably live, in an open or proprietary columnar format on object storage. Snowflake: proprietary micro-partitions in its managed storage. Databricks: Delta Lake (Parquet + a transaction log) in your lake. BigQuery: the Capacitor columnar format in Google-managed Colossus storage.
  • Compute layer: the engines that scan storage. Snowflake: virtual warehouses (T-shirt-sized clusters). Databricks: clusters / SQL warehouses running Spark + the Photon engine. BigQuery: slots (serverless virtual compute units drawn from a shared or reserved pool).
  • Services / metadata layer: the brain, covering the query optimiser, security & RBAC, transactions and the catalogue. Snowflake: the Cloud Services layer. Databricks: the control plane + Unity Catalog. BigQuery: Dremel's query planning and scheduling + IAM + the BigQuery metadata service.

Use Case Example: 'Grant analysts read on the sales table' is the services layer everywhere — only the syntax differs (GRANT SELECT vs Unity Catalog grants vs IAM roles). 'Make this query faster' is the compute layer — resize the warehouse, scale the cluster, or buy more slots. Knowing which layer a problem lives in tells you which knob to turn on any of the three platforms.
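
The "which layer, which knob" reasoning above can be sketched as a small lookup. This is only an illustration: the symptom labels and the `which_knob` helper are hypothetical names invented for this sketch, and the knob descriptions are the ones named in the example, not real API calls.

```python
from enum import Enum


class Layer(Enum):
    STORAGE = "storage"
    COMPUTE = "compute"
    SERVICES = "services"


# Hypothetical symptom labels, mapped to the layer the text assigns them to.
SYMPTOM_TO_LAYER = {
    "grant_read_access": Layer.SERVICES,
    "slow_query": Layer.COMPUTE,
}

# Knobs as worded in the use case example, keyed by (layer, vendor).
KNOBS = {
    (Layer.SERVICES, "snowflake"): "GRANT SELECT",
    (Layer.SERVICES, "databricks"): "Unity Catalog grant",
    (Layer.SERVICES, "bigquery"): "IAM role",
    (Layer.COMPUTE, "snowflake"): "resize the warehouse",
    (Layer.COMPUTE, "databricks"): "scale the cluster",
    (Layer.COMPUTE, "bigquery"): "buy more slots",
}


def which_knob(symptom: str, vendor: str) -> tuple[Layer, str]:
    """Return the layer a symptom lives in and the vendor's knob for it."""
    layer = SYMPTOM_TO_LAYER[symptom]
    return layer, KNOBS[(layer, vendor.lower())]


if __name__ == "__main__":
    for vendor in ("Snowflake", "Databricks", "BigQuery"):
        layer, knob = which_knob("slow_query", vendor)
        print(f"{vendor}: {layer.value} layer -> {knob}")
```

The point of the design is that the symptom decides the layer once, and only the final lookup is vendor-specific.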

Analogy

A cloud data platform is a restaurant. The storage layer is the walk-in fridge — ingredients kept cold and safe. The compute layer is the line cooks — add cooks for the dinner rush, send them home when it's quiet. The services layer is the head chef + front-of-house — taking orders, deciding who's allowed in the kitchen, and routing each ticket efficiently. A slow kitchen on a busy night is a cooks problem (compute), not a fridge problem (storage) — and that's true whether the sign outside says Snowflake, Databricks or BigQuery.

Three layers · three vocabularies

The shared three-layer anatomy, with each vendor's word

Read each row as one concept in three dialects.

Vocabulary translator

Vocabulary field guide: same ideas, vendor words

When teams say these platforms are confusing, they often mean the same concept has three names. Use this table as your translator:

| Durable idea | Snowflake word | Databricks word | BigQuery word | Intuition |
| --- | --- | --- | --- | --- |
| Compute you pay for | Virtual warehouse | Cluster / SQL warehouse / job compute | Slot pool / reservation / on-demand slots | The workers scanning data right now. |
| Stored table bytes | Managed micro-partitions | Parquet + Delta log in object storage | Capacitor files in Colossus | The shelves where columns live. |
| Metadata + permissions | Cloud Services + RBAC | Unity Catalog | IAM + datasets + policy tags | The librarian deciding who can find and read data. |
| Point-in-time recovery | Time Travel | Delta time travel | Table snapshots / time travel | Rewind the table clock after a mistake. |
| Copy without copying bytes | Zero-copy clone | Shallow clone / Delta clone patterns | Table clone / snapshot patterns | A branch pointer, not a full duplicate. |
| Cost meter | Credits per warehouse-second | DBUs + cloud VM/storage cost | Bytes scanned or slot-hours | The unit finance sees on the bill. |
| Layout tuning | Clustering keys / pruning | OPTIMIZE, Z-ORDER, partitioning | Partitioning, clustering, column selection | Help the engine skip irrelevant bytes. |
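
To make the "translator" idea concrete, here is a minimal sketch that treats the table above as a lookup structure. The `VOCAB` dictionary copies the table's wording; the `translate` function and its substring matching are assumptions made for illustration, not part of any vendor tooling.

```python
# Rows copied from the vocabulary table: durable idea -> vendor -> vendor's word.
VOCAB = {
    "compute you pay for": {
        "snowflake": "virtual warehouse",
        "databricks": "cluster / SQL warehouse / job compute",
        "bigquery": "slot pool / reservation / on-demand slots",
    },
    "stored table bytes": {
        "snowflake": "managed micro-partitions",
        "databricks": "Parquet + Delta log in object storage",
        "bigquery": "Capacitor files in Colossus",
    },
    "point-in-time recovery": {
        "snowflake": "Time Travel",
        "databricks": "Delta time travel",
        "bigquery": "table snapshots / time travel",
    },
    "copy without copying bytes": {
        "snowflake": "zero-copy clone",
        "databricks": "shallow clone / Delta clone patterns",
        "bigquery": "table clone / snapshot patterns",
    },
}


def translate(term: str, source: str, target: str) -> tuple[str, str]:
    """Map a vendor term to (durable idea, the target vendor's wording).

    Matching is a case-insensitive substring check against the source
    vendor's column; the first matching row wins.
    """
    needle = term.lower()
    for idea, words in VOCAB.items():
        if needle in words[source].lower():
            return idea, words[target]
    raise KeyError(f"no {source} term matching {term!r}")


if __name__ == "__main__":
    idea, word = translate("zero-copy clone", "snowflake", "databricks")
    print(f"{idea}: {word}")
```

Routing every lookup through the durable idea, rather than mapping vendor to vendor directly, mirrors how the table is meant to be read: learn the concept once, then look up the dialect.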

Vocabulary trap to avoid: do not compare 'warehouse' literally. In Snowflake, a warehouse is compute only. In generic data architecture, a data warehouse is the whole analytical system. A Snowflake virtual warehouse is closer to a Databricks SQL warehouse or a BigQuery slot reservation than to the database itself.

Production examples

Most-used production examples

These are the patterns you will see repeatedly in real companies. The vendor changes; the pattern survives.

| Pattern | Typical shape | Snowflake implementation | Databricks implementation | BigQuery implementation |
| --- | --- | --- | --- | --- |
| Executive BI dashboards | Curated marts refreshed hourly/daily | BI warehouse over star-schema tables | SQL Warehouse over gold Delta tables | Looker/BI over partitioned marts |
| ELT transformations | Raw -> staged -> marts | dbt on a separate ELT warehouse | Workflows/dbt/DLT over bronze/silver/gold | Dataform/dbt scheduled queries |
| CDC and upserts | Source DB changes applied continuously | Streams/tasks or MERGE on warehouse | Auto Loader + Delta MERGE | Datastream/Dataflow + MERGE |
| ML feature tables | Training and scoring features | Snowpark + feature store integrations | Spark/MLflow/Feature Store on Delta | BigQuery ML / Vertex AI features |
| Cost governance | Weekly owner-ranked bill review | QUERY_HISTORY by warehouse/team | System tables by cluster/tag/job | INFORMATION_SCHEMA.JOBS by user/label |
| Recovery from bad writes | Rewind or clone before repair | Time Travel + zero-copy clone | Delta VERSION AS OF / RESTORE | Table snapshots / time travel |
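
As an illustration of the cost governance pattern, the sketch below ranks owners by total spend once usage has been exported from whichever history view the platform provides. The `JobRecord` shape, the owner names and the numbers are all made up for this example; real exports will have different columns and units (credits, DBUs or slot-hours).

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class JobRecord(NamedTuple):
    """One exported usage row (hypothetical shape)."""
    owner: str         # team, user or label the cost is attributed to
    cost_units: float  # credits, DBUs or slot-hours, depending on platform


def rank_owners(records: Iterable[JobRecord]) -> list[tuple[str, float]]:
    """Sum cost per owner and return owners sorted by spend, highest first."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.owner] += record.cost_units
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        JobRecord("finance_bi", 12.5),
        JobRecord("ml_team", 40.0),
        JobRecord("finance_bi", 7.5),
        JobRecord("elt", 30.0),
    ]
    for owner, total in rank_owners(sample):
        print(f"{owner}: {total}")
```

The ranking logic is the same on every platform; only the query that produces the input rows changes.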

Rule of thumb: if the workload is mostly SQL dashboards and finance reports, start your mental model with Snowflake or BigQuery. If the workload mixes engineering, streaming, notebooks and ML over open files, start with Databricks. Then check cloud, skills, governance and cost.

Reflect

The fastest way to look senior on any of these platforms is to stop arguing vendor features and start asking 'which layer?'. Most heated tool debates are really two teams talking past each other: one is talking about compute, the other about storage.

  • Take your last platform incident — which of the three layers did it actually live in?
  • Which layer does your team understand worst today?