Theory
A query that costs $50 every refresh is a bug
Cloud warehouses bill compute in their own units (Snowflake credits, BigQuery slot-hours, Databricks DBUs). Storage is cheap, so compute is the dominant line item. The data engineering (DE) platform owns this cost. Four habits keep it under control:
- Tag everything. Warehouse, query, dbt model, dashboard: all tagged with cost_centre, pipeline, owner. You can't cut what you can't attribute.
- Audit partition pruning. Most expensive queries scan data they didn't need. Run a weekly 'top N most-scanned tables' report; fix the partition layout or the WHERE clause.
- Size warehouses for workload, not for peace of mind. Turn on auto-suspend, use multi-cluster scaling for bursts instead of a permanently oversized warehouse, and give each environment its own warehouse.
- Lifecycle storage. Move cold partitions to S3 Glacier, BigQuery long-term storage, or Snowflake external stages. The transition is usually automatic once a policy is in place, and frequently forgotten.
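The first two habits can be sketched in a few lines of Python. This is a minimal illustration, not a vendor integration: the QueryRecord shape, the sample records, and the ASSUMED_USD_PER_TIB price are all hypothetical stand-ins for whatever your warehouse's query history exposes. It shows how the tags make cost attributable, how untagged queries surface, and how a 'top N most-scanned tables' report falls out of the same log.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical price per TiB scanned; real rates depend on vendor and contract.
ASSUMED_USD_PER_TIB = 5.0
TIB = 1024 ** 4

# The three tags named in the list above.
REQUIRED_TAGS = ("cost_centre", "pipeline", "owner")


@dataclass
class QueryRecord:
    """One row of a (hypothetical) query-history export."""
    table: str
    bytes_scanned: int
    tags: dict = field(default_factory=dict)


def untagged(records):
    """Return the records that are missing at least one required tag."""
    return [r for r in records if any(not r.tags.get(k) for k in REQUIRED_TAGS)]


def top_scanned_tables(records, n=3):
    """Sum bytes scanned per table and return the n largest, biggest first."""
    totals = defaultdict(int)
    for r in records:
        totals[r.table] += r.bytes_scanned
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def cost_by_tag(records, tag):
    """Attribute estimated scan cost to the value of one tag."""
    totals = defaultdict(float)
    for r in records:
        key = r.tags.get(tag) or "UNATTRIBUTED"
        totals[key] += r.bytes_scanned / TIB * ASSUMED_USD_PER_TIB
    return dict(totals)


# Made-up sample data for illustration only.
records = [
    QueryRecord("events", 4 * TIB,
                {"cost_centre": "growth", "pipeline": "daily_kpis", "owner": "ana"}),
    QueryRecord("events", 2 * TIB,
                {"cost_centre": "growth", "pipeline": "funnel", "owner": "ben"}),
    QueryRecord("orders", 1 * TIB,
                {"cost_centre": "finance", "pipeline": "revenue", "owner": "ana"}),
    QueryRecord("sessions", 3 * TIB),  # no tags: cannot be attributed
]

if __name__ == "__main__":
    print("Top scanned tables:", top_scanned_tables(records, n=2))
    print("Cost by owner:", cost_by_tag(records, "owner"))
    print("Untagged queries:", [r.table for r in untagged(records)])
```

In the sample, the untagged sessions query lands in the UNATTRIBUTED bucket, which is the point of the first habit: the spend is visible but nobody can be asked to cut it. In practice the records would come from the warehouse's own query history rather than a hard-coded list.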