The Missing Manual - SELECT - Data Council
● 🌮 Tacos
● End of the longest bull run in history
● Data teams are increasingly being asked to better understand, monitor and reduce their warehouse spend
● Snowflake is the market leader, with many cost and performance levers available
● Next steps
select.dev/posts/snowflake-warehouse-sizing
Warehouse Sizing
● Larger warehouses improve performance at low additional cost – up to a point
How to lower costs
1. Micro-partitions
2. Pruning
3. Clustering
Micro-partitions
● Tables are stored in cloud storage as micro-partitions
● Micro-partitions are a proprietary, closed-source file format created by Snowflake
● Heavily compressed and ~16MB each
● DML operations (inserts/updates/deletes) add or remove entire files, because micro-partitions are immutable
select.dev/posts/introduction-to-snowflake-micro-partitions
Micro-partition metadata
Snowflake stores column-level statistics for each micro-partition in the cloud services layer
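To make the idea of column-level statistics concrete, here is a minimal Python sketch that splits rows into fixed-size chunks and records the min, max and distinct count per column for each chunk. It is only a toy model: the `ColumnStats` class, the `build_partition_metadata` helper and the sample rows are hypothetical, and Snowflake's real metadata format is proprietary.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Toy stand-in for the per-column statistics kept for one micro-partition."""
    min_value: object
    max_value: object
    distinct_count: int


def build_partition_metadata(rows, partition_size):
    """Split rows into fixed-size chunks and record stats for every column.

    Assumption: each chunk plays the role of one micro-partition. Real
    micro-partitions are sized by compressed bytes, not by row count.
    """
    metadata = []
    for start in range(0, len(rows), partition_size):
        chunk = rows[start:start + partition_size]
        stats = {}
        for column in chunk[0]:
            values = [row[column] for row in chunk]
            stats[column] = ColumnStats(min(values), max(values), len(set(values)))
        metadata.append(stats)
    return metadata


# Hypothetical sample table, stored in insertion order.
rows = [
    {"order_id": 1, "order_date": "2023-01-01"},
    {"order_id": 2, "order_date": "2023-01-01"},
    {"order_id": 3, "order_date": "2023-01-02"},
    {"order_id": 4, "order_date": "2023-01-03"},
]

metadata = build_partition_metadata(rows, partition_size=2)
for partition_id, stats in enumerate(metadata):
    print(partition_id, stats["order_date"])
```

Because these statistics live apart from the data files, a query planner can consult them without reading any micro-partition.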
Optimizing performance
Pruning and clustering
Pruning - every fast query’s secret
select.dev/posts/introduction-to-snowflake-micro-partitions
Check for pruning using the Query Profile
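As a rough illustration of what the Query Profile's "Partitions scanned" versus "Partitions total" figures reflect, the Python sketch below skips any partition whose min/max range cannot overlap a filter range. The `prune` helper and the partition statistics are hypothetical; this is a simplified model of range-based pruning, not Snowflake's implementation.

```python
def prune(partitions, column, low, high):
    """Return ids of partitions whose [min, max] range overlaps [low, high].

    Partitions that cannot contain a matching row are skipped without
    being read, which is the essence of pruning.
    """
    return [
        p["id"]
        for p in partitions
        if p["stats"][column][0] <= high and p["stats"][column][1] >= low
    ]


# Hypothetical (min, max) statistics for order_date in four partitions.
# ISO-formatted date strings compare correctly as plain strings.
partitions = [
    {"id": 0, "stats": {"order_date": ("2023-01-01", "2023-01-31")}},
    {"id": 1, "stats": {"order_date": ("2023-02-01", "2023-02-28")}},
    {"id": 2, "stats": {"order_date": ("2023-03-01", "2023-03-31")}},
    {"id": 3, "stats": {"order_date": ("2023-01-15", "2023-03-15")}},
]

# Filter equivalent to: order_date BETWEEN '2023-02-10' AND '2023-02-20'
to_scan = prune(partitions, "order_date", "2023-02-10", "2023-02-20")
print(f"Partitions scanned: {len(to_scan)} of {len(partitions)}")
```

Partition 3 has to be scanned even though it may hold no matching rows, because its wide range overlaps the filter. Wide, overlapping ranges like this are what clustering aims to reduce.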
Clustering
select.dev/posts/introduction-to-snowflake-clustering
Clustering methods
● Natural Clustering
○ Leverage wherever possible
● Automatic Clustering Service
○ Use where a table is commonly filtered by a column which isn’t the ‘natural’ clustering key
● Manual Sorting
○ Useful for one-off clustering at lowest cost
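The effect of manual sorting on pruning can be sketched in a few lines of Python. The example stores the same twelve values in an interleaved order and in sorted order, chunks each into partitions of three, and counts how many partitions a point lookup would have to scan. The helper names and data are hypothetical, and the model ignores everything except min/max ranges.

```python
def partition_ranges(values, partition_size):
    """Chunk values in storage order and return each chunk's (min, max)."""
    return [
        (min(values[i:i + partition_size]), max(values[i:i + partition_size]))
        for i in range(0, len(values), partition_size)
    ]


def partitions_scanned(ranges, target):
    """Count partitions whose range could contain the target value."""
    return sum(1 for low, high in ranges if low <= target <= high)


# Twelve hypothetical customer ids stored in an interleaved (unclustered) order.
unclustered = [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]
# Manual sorting: rewrite the table ordered by the filter column.
clustered = sorted(unclustered)

before = partitions_scanned(partition_ranges(unclustered, 3), target=6)
after = partitions_scanned(partition_ranges(clustered, 3), target=6)
print(f"Partitions scanned before sorting: {before}, after: {after}")
```

In this toy case every partition overlaps the lookup before sorting, and only one does afterwards. Natural clustering gives the same benefit for free when data arrives already ordered by the column you filter on, such as a load timestamp.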
Finding good candidates for clustering
select.dev/posts/introduction-to-snowflake-clustering
Optimizing performance
Query design
select.dev/posts/should-you-use-ctes-in-snowflake
select.dev/posts/cost-per-query
github.com/get-select/dbt-snowflake-monitoring
Use SELECT
Lower Costs
Save Time
Optimize Performance
select.dev/posts/snowflake-warehouse-sizing
Impact of warehouse size on query execution time
● Compute, memory, and disk space (cache size + space available for local spillage) double with each size increase
● Generally speaking, query execution time will also halve with each size increase, until…
○ A certain point where performance either stops improving (Snowflake won't parallelize further) or gets worse, because the added communication costs outweigh the performance benefits
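The arithmetic behind "low additional cost, up to a point" can be sketched as follows. Credits per hour double with each warehouse size (1 for X-Small up to 16 for X-Large), so a query that runs in half the time on the next size up costs the same. The runtimes below are hypothetical, chosen to halve until Large and then plateau, and the sketch ignores the 60-second minimum charge and idle time before auto-suspend.

```python
# Credits consumed per hour by each warehouse size (doubles with every step up).
CREDITS_PER_HOUR = {
    "X-Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "X-Large": 16,
}


def query_cost_credits(size, runtime_seconds):
    """Credits attributable to one query, assuming pure per-second billing."""
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600


# Hypothetical runtimes: halving up to Large, no further speed-up on X-Large.
runtimes = {"X-Small": 480, "Small": 240, "Medium": 120, "Large": 60, "X-Large": 60}

costs = {size: query_cost_credits(size, seconds) for size, seconds in runtimes.items()}
for size, seconds in runtimes.items():
    print(f"{size:8} {seconds:4d}s  {costs[size]:.3f} credits")
```

Under these assumptions the query costs the same on every size from X-Small to Large while finishing eight times sooner, and the cost only doubles once the runtime stops improving.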
Before you start, can you reduce the frequency?
● Yes
● CTEs are computed once in Snowflake
● In certain scenarios where a CTE is referenced more than once, it can be faster to repeat the logic in subqueries than to use a CTE
select.dev/posts/should-you-use-ctes-in-snowflake