Theory
You don't manage partitions — Snowflake does
Unlike a Hive/Spark lake where you design partition folders, Snowflake automatically slices every table into immutable micro-partitions (~50–500 MB of compressed columnar data each). For every micro-partition it keeps metadata: the min/max of each column, distinct counts, nulls.
- Pruning — When you run
WHERE order_date = '2026-06-01', Snowflake reads the metadata first and skips every micro-partition whose min/max range can't contain that date. A well-pruned query scans 1% of the table and bills accordingly. Pruning, not indexes, is how Snowflake goes fast. - Natural clustering — Data tends to land roughly ordered by load time, so date-filtered queries prune well for free. You usually don't need to do anything.
- Clustering keys — For very large tables (TB+) queried on a column other than load order, you can define a clustering key so Snowflake's background service co-locates rows with similar values, restoring good pruning. This costs compute to maintain — only worth it when pruning is measurably bad.
Use Case Example: A 4 TB events table is usually filtered by event_date (prunes great, naturally). But a fraud team always filters by account_id, scanning the whole table each time. Adding CLUSTER BY (account_id) re-organises micro-partitions so those queries prune to a sliver — trading some background re-clustering cost for far cheaper fraud queries.