Theory
The platform around the lake
Delta makes the data trustworthy; three more pieces make Databricks a platform you can run a company on:
- Unity Catalog — One governance layer across all workspaces and clouds. A three-level namespace (catalog.schema.table), central RBAC (GRANT SELECT ON ...), column- and row-level security, plus automatic data lineage and discovery. It replaces the old per-workspace, per-cloud permission sprawl with a single source of truth for who can see what.
- Photon — A C++ vectorised query engine that transparently accelerates Spark SQL and DataFrame ops (often 2–3×) without rewriting code. It's why Databricks SQL Warehouses can compete with Snowflake/BigQuery on BI latency.
- Workflows & Delta Live Tables (DLT) — Workflows is the native scheduler/orchestrator for jobs (tasks, dependencies, retries). DLT is declarative pipelines: you describe the tables and quality expectations, and Databricks manages the execution, incremental processing and data-quality enforcement.
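To make the three-level namespace and central grants concrete, here is a minimal sketch in plain Python that only builds the SQL text; it does not talk to Databricks. The helper names (qualified_name, grant_select) and the catalog, schema, table and group names are hypothetical, and restricting identifiers to unquoted letters, digits and underscores is a simplifying assumption.

```python
import re

# Assumption: only simple unquoted identifiers (letters, digits, underscores).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels into catalog.schema.table."""
    for part in (catalog, schema, table):
        if not _IDENT.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"


def grant_select(catalog: str, schema: str, table: str, principal: str) -> str:
    """Render a GRANT SELECT statement for one table and one principal."""
    if "`" in principal:
        raise ValueError("principal must not contain backticks")
    target = qualified_name(catalog, schema, table)
    return f"GRANT SELECT ON TABLE {target} TO `{principal}`"


# Hypothetical names: catalog "main", schema "sales", table "orders_gold", group "analysts".
statement = grant_select("main", "sales", "orders_gold", "analysts")
print(statement)
```

Because every table is addressed by the same catalog.schema.table path from any workspace, a single statement like this is enough to describe who can read it; there is no per-workspace copy of the permission to keep in sync.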
Use Case Example: A team declares a DLT pipeline: bronze (raw ingest) → silver (cleaned, with quality expectations such as EXPECT (order_total > 0) that drop bad rows or route them to a quarantine table) → gold (aggregated marts). Unity Catalog governs who reads each layer and shows lineage from the gold marts back to the raw source, and Photon makes the analysts' SQL on the gold tables fast: one platform, one permission model.
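The bronze → silver → gold flow can be sketched in a few lines of plain Python. This is an illustration of the data flow only, not the DLT API: the rows, the customer names and the functions to_silver and to_gold are hypothetical, and in a real pipeline the rule would be declared as an expectation and executed by Databricks.

```python
from collections import defaultdict

# Hypothetical raw rows standing in for the bronze layer.
bronze = [
    {"order_id": 1, "customer": "acme", "order_total": 120.0},
    {"order_id": 2, "customer": "acme", "order_total": -5.0},   # violates the rule
    {"order_id": 3, "customer": "globex", "order_total": 40.0},
    {"order_id": 4, "customer": "globex", "order_total": 0.0},  # violates the rule
]


def expect_order_total_positive(row):
    """The quality rule from the example: order_total > 0."""
    return row["order_total"] > 0


def to_silver(rows, rule):
    """Split rows into those that pass the rule and those set aside."""
    passed, quarantined = [], []
    for row in rows:
        (passed if rule(row) else quarantined).append(row)
    return passed, quarantined


def to_gold(rows):
    """Aggregate clean rows into a per-customer revenue mart."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["order_total"]
    return dict(totals)


silver, quarantine = to_silver(bronze, expect_order_total_positive)
gold = to_gold(silver)

print(gold)                                      # {'acme': 120.0, 'globex': 40.0}
print([row["order_id"] for row in quarantine])   # [2, 4]
```

The point of the declarative version is that the team writes only the equivalent of the rule and the table definitions; ordering the steps, processing new data incrementally and tracking how many rows failed the rule are left to the platform.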