Theory
On-demand BigQuery: every query has a price tag in bytes
Under on-demand pricing, BigQuery charges by bytes scanned, not rows returned or time taken. Because storage is columnar, a query only pays for the columns and the partitions it actually touches. Your data layout is, quite literally, your bill.
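To make "your data layout is your bill" concrete, here is a minimal toy model in Python, assuming a columnar, partitioned table whose storage is tracked per (partition, column) pair. A query reads only the pairs it names, so its bytes scanned are the sum over the named columns in the touched partitions. The partition keys, column names, and byte counts are hypothetical. This is a sketch of the principle, not BigQuery's billing logic, and it ignores billing details such as rounding and minimum charges.

```python
# Toy model of a columnar, partitioned table (all names and sizes are hypothetical).
# Storage is tracked per (partition, column), so a query reads only the pairs it names.
Sizes = dict[str, dict[str, int]]  # partition -> column -> bytes stored


def bytes_scanned(sizes: Sizes, columns: list[str], partitions: list[str]) -> int:
    """Bytes a query reads: the named columns, in the touched partitions only."""
    return sum(sizes[p][c] for p in partitions for c in columns)


sizes: Sizes = {
    "2026-06-01": {"event_id": 100, "customer_id": 100, "payload": 800},
    "2026-06-02": {"event_id": 100, "customer_id": 100, "payload": 800},
}

# SELECT * with no partition filter: every column in every partition.
full = bytes_scanned(sizes, ["event_id", "customer_id", "payload"], list(sizes))
# One named column with WHERE event_date = '2026-06-01': one column, one partition.
pruned = bytes_scanned(sizes, ["customer_id"], ["2026-06-01"])
print(full, pruned)  # 2000 100
```

The same 2,000 bytes of storage can cost a query anywhere from 100 to 2,000 bytes scanned, depending only on which columns and partitions it touches.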
- SELECT * is the cardinal sin: selecting all columns scans all columns. On a wide table, naming only the columns you need can cut a query's cost by 10× (for example, reading 5 of 50 equally sized columns). There's no row-store fallback to save you.
- Partitioning: partition a table by a date/timestamp column (or an integer range). A filter such as WHERE event_date = '2026-06-01' then scans one partition, not the whole table. On huge tables, require a partition filter to stop accidental full scans.
- Clustering: within partitions, cluster by up to 4 columns (e.g. customer_id) so filtered and aggregated queries can skip blocks. This is BigQuery's analogue to Snowflake clustering keys, and it is free to maintain.
- Reservations & BI Engine: heavy, steady workloads can move to capacity pricing (a flat pool of slots and a predictable bill). BI Engine is an in-memory layer that can make dashboard queries sub-second.
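The block skipping behind clustering can be sketched with a toy "zone map" model in Python. Rows in a partition are sorted by the clustering column and cut into fixed-size blocks, and each block is tagged with its min/max key. A filter on that column then reads only the blocks whose range could contain the value. This illustrates the idea only. BigQuery's actual storage format and block sizes are not modeled, and the rows, block size, and function names here are hypothetical.

```python
from dataclasses import dataclass

# Conceptual sketch of clustering's block skipping (not BigQuery's storage format).
# Rows are (customer_id, payload); names, sizes, and block size are hypothetical.
Row = tuple[int, str]


@dataclass
class Block:
    min_key: int     # smallest clustering-key value in the block
    max_key: int     # largest clustering-key value in the block
    rows: list[Row]


def build_blocks(rows: list[Row], rows_per_block: int, cluster: bool) -> list[Block]:
    """Chunk rows into blocks with min/max metadata, sorting by key first if clustered."""
    ordered = sorted(rows, key=lambda r: r[0]) if cluster else list(rows)
    blocks = []
    for i in range(0, len(ordered), rows_per_block):
        chunk = ordered[i:i + rows_per_block]
        keys = [r[0] for r in chunk]
        blocks.append(Block(min(keys), max(keys), chunk))
    return blocks


def scan_for_key(blocks: list[Block], key: int) -> tuple[list[Row], int]:
    """Return rows matching key and the number of blocks that had to be read."""
    matches: list[Row] = []
    blocks_read = 0
    for b in blocks:
        if b.min_key <= key <= b.max_key:  # metadata can't rule this block out
            blocks_read += 1
            matches.extend(r for r in b.rows if r[0] == key)
    return matches, blocks_read


# One partition of 1,000 rows: customer_ids 0..99, 10 rows each, arriving interleaved.
rows = [(i % 100, f"event-{i}") for i in range(1000)]
for clustered in (True, False):
    blocks = build_blocks(rows, rows_per_block=50, cluster=clustered)
    matches, read = scan_for_key(blocks, key=42)
    print(clustered, len(matches), read, "of", len(blocks))
# True 10 1 of 20
# False 10 10 of 20
```

Both layouts return the same 10 rows. Sorting by customer_id makes each block's key range narrow, so the clustered layout reads 1 block instead of 10.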
Use Case Example: A dashboard query ran SELECT * FROM events with no date filter and scanned 8 TB, costing $40 on every refresh. Fix: partition events by event_date, cluster it by customer_id, and rewrite the query to name only the 6 columns it needs, with WHERE event_date >= CURRENT_DATE - 7. It now scans ~30 GB, a ~250× cost cut for the same answer.
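The use-case arithmetic can be checked directly. This Python sketch uses decimal units (1 TB = 10^12 bytes) and an assumed rate of $5 per TB, chosen only because it reproduces the $40 figure. BigQuery quotes on-demand prices per TiB and the rate varies by location, so substitute the current price for your region.

```python
# Back-of-envelope check of the dashboard example, in decimal units.
TB = 10**12
GB = 10**9
USD_PER_TB = 5.0  # assumption: reproduces the $40 figure; not a quoted list price


def query_cost_usd(bytes_scanned: float, usd_per_tb: float = USD_PER_TB) -> float:
    """On-demand cost of one run: bytes scanned times a flat per-TB rate."""
    return bytes_scanned / TB * usd_per_tb


before = query_cost_usd(8 * TB)  # SELECT * FROM events, no date filter
after = query_cost_usd(30 * GB)  # 6 named columns, last 7 days of partitions
print(f"before ${before:.2f}, after ${after:.2f}, ratio {before / after:.0f}x")
# before $40.00, after $0.15, ratio 267x
```

The exact ratio, 8 TB / 30 GB ≈ 267, is in line with the ~250× quoted above, since ~30 GB is itself approximate. The ratio does not depend on the rate, and the saving repeats on every dashboard refresh.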