In Snowflake, the primary mechanism that lets a filtered query scan only a fraction of a table is…

Micro-partition pruning using per-partition min/max metadata

B-tree indexes the user creates

Micro-partition pruning using per-partition min/max metadata

Manually created date partition folders

Caching the whole table in the warehouse

Snowflake — Micro-Partitions, Pruning & Clustering — Semantic Web Academy

You don't manage partitions — Snowflake does

Unlike a Hive/Spark lake where you design partition folders, Snowflake automatically slices every table into immutable micro-partitions (~50–500 MB of compressed columnar data each). For every micro-partition it keeps metadata: the min/max of each column, distinct counts, nulls.

Pruning — When you run WHERE order_date = '2026-06-01', Snowflake reads the metadata first and skips every micro-partition whose min/max range can't contain that date. A well-pruned query scans 1% of the table and bills accordingly. Pruning, not indexes, is how Snowflake goes fast.
Natural clustering — Data tends to land roughly ordered by load time, so date-filtered queries prune well for free. You usually don't need to do anything.
Clustering keys — For very large tables (TB+) queried on a column other than load order, you can define a clustering key so Snowflake's background service co-locates rows with similar values, restoring good pruning. This costs compute to maintain — only worth it when pruning is measurably bad.

Use Case Example: A 4 TB events table is usually filtered by event_date (prunes great, naturally). But a fraud team always filters by account_id, scanning the whole table each time. Adding CLUSTER BY (account_id) re-organises micro-partitions so those queries prune to a sliver — trading some background re-clustering cost for far cheaper fraud queries.

Snowflake — Micro-Partitions, Pruning & Clustering

Theory

You don't manage partitions — Snowflake does

Analogy

Reflect