Databricks — Unity Catalog, Photon & Workflows

Governance, the vectorised engine, and how production pipelines actually ship.


Theory

The platform around the lake

Delta makes the data trustworthy; three more pieces make Databricks a platform you can run a company on:

  • Unity Catalog — One governance layer shared by every workspace attached to a metastore (typically one metastore per cloud region). It provides a three-level namespace catalog.schema.table, central grant-based access control (GRANT SELECT ON ...), column- and row-level security (column masks, row filters), plus automatic data lineage and discovery. It replaces the legacy per-workspace Hive metastore permissions with a single source of truth for who can see what.
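  • Photon — A C++ vectorised query engine that accelerates Spark SQL and DataFrame operations without code changes. It runs the operators it supports natively and falls back to the standard Spark engine for the rest, so gains vary by workload (often 2–3×). It is a large part of why Databricks SQL Warehouses can compete with Snowflake/BigQuery on BI latency.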
  • Workflows & Delta Live Tables (DLT) — Workflows is the native scheduler/orchestrator for jobs (tasks, dependencies, retries). DLT provides declarative pipelines: you describe the tables and their data-quality expectations, and Databricks manages execution, incremental processing and quality enforcement.
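To make the three-level namespace and grant model concrete, here is a toy Python model of how a read on catalog.schema.table might be authorized. It assumes a simplified version of Unity Catalog's rules: reading a table needs USE CATALOG on the catalog, USE SCHEMA on the schema (or inherited from the catalog), and SELECT granted on the table or inherited from its schema or catalog. The Grants class, the principal analysts and the table names are hypothetical; ownership, admin roles, row filters and column masks are left out, and this is not how Unity Catalog is implemented.

```python
"""Toy model of Unity Catalog-style read checks (illustrative, heavily simplified)."""
from dataclasses import dataclass, field


@dataclass
class Grants:
    # securable name ("main", "main.gold", "main.gold.daily_revenue")
    #   -> principal -> set of privileges granted directly on that securable
    acl: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def grant(self, privilege: str, securable: str, principal: str) -> None:
        """Record the equivalent of GRANT <privilege> ON <securable> TO <principal>."""
        self.acl.setdefault(securable, {}).setdefault(principal, set()).add(privilege)

    def _granted_on_any(self, privilege: str, securables: tuple[str, ...], principal: str) -> bool:
        # True if the privilege was granted on any of the given securables.
        return any(privilege in self.acl.get(s, {}).get(principal, set()) for s in securables)

    def can_select(self, principal: str, full_name: str) -> bool:
        # Three-level namespace; unpacking raises ValueError unless there are exactly three parts.
        catalog, schema, _table = full_name.split(".")
        schema_fqn = f"{catalog}.{schema}"
        # Simplified rule: USE CATALOG on the catalog, USE SCHEMA on the schema
        # (or inherited from the catalog), and SELECT on the table or a parent.
        return (
            self._granted_on_any("USE CATALOG", (catalog,), principal)
            and self._granted_on_any("USE SCHEMA", (schema_fqn, catalog), principal)
            and self._granted_on_any("SELECT", (full_name, schema_fqn, catalog), principal)
        )


g = Grants()
g.grant("USE CATALOG", "main", "analysts")
g.grant("USE SCHEMA", "main.gold", "analysts")
g.grant("SELECT", "main.gold", "analysts")  # schema-level grant, inherited by its tables

print(g.can_select("analysts", "main.gold.daily_revenue"))  # True
print(g.can_select("analysts", "main.bronze.raw_orders"))   # False: no USE SCHEMA on main.bronze
```

Granting SELECT once at the schema level, rather than table by table, is what lets a single rule cover every current and future table in that layer.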
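The orchestration idea behind Workflows (run each task only after its upstream tasks succeed, and retry transient failures) can be sketched with the standard library's graphlib. This is a hypothetical, sequential toy: run_job, the task names and the max_retries parameter are illustrative and not the Databricks Jobs API, and a real job would run independent tasks in parallel on clusters.

```python
"""Toy task orchestrator: dependency order plus per-task retries (illustrative only)."""
from graphlib import TopologicalSorter
from typing import Callable


def run_job(tasks: dict[str, Callable[[], None]],
            depends_on: dict[str, set[str]],
            max_retries: int = 2) -> list[str]:
    """Run tasks so each one starts only after its upstream tasks have succeeded.

    Each task gets 1 + max_retries attempts. If every attempt fails, the last
    exception propagates and no downstream task runs.
    Returns a log of attempts as "task:attempt:outcome" strings.
    """
    log: list[str] = []
    graph = {name: depends_on.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                log.append(f"{name}:{attempt}:ok")
                break
            except Exception:
                log.append(f"{name}:{attempt}:failed")
                if attempt == max_retries + 1:
                    raise
    return log


calls = {"clean": 0}

def flaky_clean() -> None:
    # Fails on the first attempt only, like a transient cluster error.
    calls["clean"] += 1
    if calls["clean"] == 1:
        raise RuntimeError("transient cluster error")

log = run_job(
    tasks={"ingest": lambda: None, "clean": flaky_clean, "aggregate": lambda: None},
    depends_on={"clean": {"ingest"}, "aggregate": {"clean"}},
)
print(log)  # ['ingest:1:ok', 'clean:1:failed', 'clean:2:ok', 'aggregate:1:ok']
```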

Use Case Example: A team declares a DLT pipeline: bronze (raw ingest) → silver (cleaned, with an expectation such as EXPECT (order_total > 0) whose violating rows are dropped or routed to a quarantine table) → gold (aggregated marts). Unity Catalog governs who reads each layer and shows lineage from the gold marts back to the raw source; Photon makes the analysts' SQL on the gold tables fast. It is all one platform with one permission model.
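The silver-layer expectation in this use case can be sketched in plain Python: rows that fail the rule are set aside in a quarantine list tagged with the rule they broke, and only passing rows feed the gold aggregate. Everything here (the sample rows, the expect helper, the valid_total rule name) is hypothetical. In DLT itself, expectations are declared on the table (for example with the Python decorators @dlt.expect_or_drop or @dlt.expect_or_fail), and a quarantine is usually built as a separate table that keeps the rows matching the inverted condition.

```python
"""Toy bronze -> silver -> gold flow with a quarantining expectation (illustrative only)."""
from collections import defaultdict

bronze = [  # raw ingest, kept exactly as received
    {"order_id": 1, "region": "EU", "order_total": 120.0},
    {"order_id": 2, "region": "US", "order_total": -5.0},  # violates the expectation
    {"order_id": 3, "region": "EU", "order_total": 30.0},
    {"order_id": 4, "region": "US", "order_total": 50.0},
]


def expect(rows, name, predicate):
    """Split rows into (passed, quarantined); quarantined rows record the failed rule."""
    passed, quarantined = [], []
    for row in rows:
        if predicate(row):
            passed.append(row)
        else:
            quarantined.append({**row, "_failed_expectation": name})
    return passed, quarantined


# Silver: cleaned rows only; bad rows go to quarantine instead of being silently lost.
silver, quarantine = expect(bronze, "valid_total", lambda r: r["order_total"] > 0)

# Gold: an aggregated mart (revenue per region) built only from silver.
gold = defaultdict(float)
for row in silver:
    gold[row["region"]] += row["order_total"]

print(dict(gold))                              # {'EU': 150.0, 'US': 50.0}
print([r["order_id"] for r in quarantine])     # [2]
```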

Analogy

Think of an airport. Delta Lake is the well-organised cargo (every crate sealed and tracked). Unity Catalog is airport security + the manifest system — one set of rules deciding who boards which area, and a paper trail of where every crate came from (lineage). Photon is the high-speed baggage belt that moves everything faster without changing the crates. Workflows/DLT is air-traffic control — scheduling every flight, retrying the delayed ones, refusing to launch a plane that fails its safety checks (quality expectations).

License & edition notes

Databricks, Delta Lake and licensing clarifications

Databricks is a commercial managed platform built around Apache Spark, Delta Lake, notebooks, jobs, SQL Warehouses and Unity Catalog. Its licensing story mixes open-source components with paid managed services.

  • Apache Spark is open source — Spark itself is Apache-licensed and can run outside Databricks. Databricks packages Spark with managed clusters, notebooks, jobs, governance, performance engines and support.
  • Delta Lake has an open-source core — Delta Lake's table format and transaction log are open-source, but Databricks adds managed runtime features, performance optimizations and platform integrations around it.
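  • Databricks bills DBUs plus cloud resources — On classic compute, a workload consumes Databricks Units (DBUs), billed by Databricks, and the underlying cloud VMs, storage and network, billed by your cloud provider. Serverless compute bundles the compute into the DBU price, but your cloud storage is still billed separately. Forgetting the VM side is a common cost-estimation mistake.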
  • Photon, Unity Catalog and DLT are platform features — Treat them as Databricks service capabilities. Availability can depend on cloud, workspace configuration, pricing tier and product packaging.
  • Open storage reduces lock-in, not migration work — Parquet/Delta in your lake is portable, but notebooks, jobs, permissions, cluster policies and optimized runtimes still need migration planning.
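A quick back-of-the-envelope estimate shows why the VM side matters. The sketch assumes classic compute billed per node-hour; every rate in it (DBUs per node-hour, price per DBU, VM price) is a made-up placeholder, since real rates depend on cloud, region, pricing tier and compute type.

```python
"""Back-of-the-envelope cost for one job run on classic compute (placeholder rates)."""


def estimate_run_cost(hours: float, nodes: int, dbu_per_node_hour: float,
                      usd_per_dbu: float, vm_usd_per_node_hour: float) -> dict[str, float]:
    """Return the DBU charge (Databricks), the VM charge (cloud provider) and their total."""
    node_hours = hours * nodes
    dbu_cost = node_hours * dbu_per_node_hour * usd_per_dbu
    vm_cost = node_hours * vm_usd_per_node_hour
    return {"dbu": dbu_cost, "vm": vm_cost, "total": dbu_cost + vm_cost}


# Hypothetical inputs: a 2-hour run on 4 nodes; all rates are placeholders, not real prices.
cost = estimate_run_cost(hours=2, nodes=4, dbu_per_node_hour=1.5,
                         usd_per_dbu=0.30, vm_usd_per_node_hour=0.50)
print(f"DBU ${cost['dbu']:.2f} + VM ${cost['vm']:.2f} = ${cost['total']:.2f}")
# DBU $3.60 + VM $4.00 = $7.60
```

With these placeholder numbers the VM bill is larger than the DBU bill, so a DBU-only estimate would cover less than half of the real cost.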

Procurement intuition: Databricks is like paying for a managed factory around open machinery. Some machines are open standards; the factory floor, scheduling system, safety controls and high-performance belts are paid services.

Reflect

Databricks' bet is that open formats + a strong governance and engine layer beats a closed warehouse. Whether that's true for you depends on how much you value not being locked into one vendor's storage — the next level makes you weigh exactly that.

  • Would your analysts notice or care that the data underneath is open Parquet rather than a proprietary store?
  • Is governance in your stack one model, or a different one per tool/cloud today?
