BigQuery — Partitioning, Clustering & the Bytes-Scanned Bill

On-demand pricing bills by bytes read — so data layout *is* the cost model.

0/2 done

Theory

On-demand BigQuery: every query has a price tag in bytes

Under on-demand pricing, BigQuery charges by bytes scanned, not rows returned or time taken. Because storage is columnar, a query only pays for the columns and the partitions it actually touches. Your data layout is, quite literally, your bill.

  • SELECT * is the cardinal sin — Selecting all columns scans all columns. Naming only the columns you need can cut a query's cost by 10× on a wide table. There's no row-store fallback to save you.
  • Partitioning — Partition a table by a date/timestamp (or integer range). A WHERE event_date = '2026-06-01' then scans one partition, not the whole table. Require a partition filter on huge tables to stop accidental full scans.
  • Clustering — Within partitions, cluster by up to 4 columns (e.g. customer_id) so filtered/aggregated queries skip blocks — BigQuery's analogue to Snowflake clustering keys, and free to maintain.
  • Reservations & BI Engine — Heavy, steady workloads move to capacity pricing (flat slot pool, predictable bill). BI Engine is an in-memory layer that makes dashboard queries sub-second.

Use Case Example: A dashboard query did SELECT * FROM events with no date filter and scanned 8 TB ($40 each run, every refresh). Fix: partition events by event_date, cluster by customer_id, and change the query to name 6 columns with a WHERE event_date >= CURRENT_DATE - 7. It now scans ~30 GB — a ~250× cost cut for the same answer.

Analogy

On-demand BigQuery is a library that charges by the shelf-metre you walk past, not the books you check out. SELECT * strolls the entire building; naming your columns walks one aisle; a partition filter takes you straight to the right floor. The bill isn't about what you read — it's about how much shelving you made the system scan to find it. Good partitioning and column selection are just walking a shorter route to the same book.

License & pricing notes

BigQuery licensing and pricing clarifications

BigQuery is a proprietary Google Cloud managed service. You do not license a database binary; you enable the service in a GCP project and pay for storage, query processing and optional capacity/features.

  • On-demand pricing bills bytes scanned — This is the simplest mode: each query has a byte estimate and cost follows scanned bytes. Partition filters and column selection become licensing-cost controls in practice.
  • Capacity pricing bills slot commitments/editions — For steady usage, teams buy reservations or editions that provide a pool of slots. This makes spend more predictable but requires capacity management.
  • Storage is separate — Active/long-term table storage, streaming inserts, extracts and cross-region movement can affect the bill outside query processing.
  • IAM is the license boundary you feel daily — Permissions live in GCP IAM plus BigQuery dataset/table/policy-tag controls. Most access problems are IAM design problems, not SQL problems.
  • Adjacent tools may have separate meters — Looker, Dataform, Datastream, Dataflow, Vertex AI and BI Engine are related but can have their own pricing, quotas and product terms.

Procurement intuition: BigQuery on-demand is a toll road priced by distance scanned; reservations are a monthly transit pass. The right choice depends on whether your travel is occasional and spiky or steady and predictable.

Reflect

BigQuery makes the cost of bad data modelling visible per query. That transparency is a gift: the byte estimate turns every analyst into a cost-aware engineer, if you teach them to read it.

  • Does anyone on your team look at the bytes-processed estimate before hitting Run?
  • Which of your biggest tables is unpartitioned and routinely SELECT *'d?

Reading in progress · 0 of 2 activities done