
Micro-partitions in Snowflake are immutable storage units of roughly 50–500 MB (of uncompressed data) that hold table data; an optional clustering key organizes the data within them for efficient querying. When data is inserted, updated, or deleted, new micro-partitions are created, and query pruning improves performance by limiting the number of partitions scanned based on the clustering key. Clustering is recommended for large tables to improve performance, although it incurs additional cost and requires careful selection of cluster keys.

Uploaded by

ashokkumar vadla


SNOWFLAKE PERFORMANCE MANAGEMENT

MICRO PARTITIONS
What are micro partitions?
Table data is stored in contiguous storage units called micro-partitions: files of roughly 50–500 MB of uncompressed data.

Snowflake maps the logical structure of a table into this physical structure, organizing rows across micro-partitions according to the clustering key, which Snowflake maintains automatically.


Here is the example from the Snowflake documentation: once you add a clustering key on the columns ‘date’ and ‘type’, Snowflake sorts the data by those columns and creates a new version of the micro-partitions in the order of the sorted data, as shown.

Micro-partitions are immutable: they can’t be modified once created, whether you insert, update, or delete.
If you insert data, new micro-partitions get created.
If you update a row, a new micro-partition gets created containing the updated data; Snowflake has to find the previous version of the micro-partition that held that row and replace it.
If you delete a row, a new micro-partition gets created with the same data, minus that particular row.
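This copy-on-write behavior can be sketched in Python. This is a toy model only, not Snowflake’s actual storage engine; the `MicroPartition` and `Table` classes are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen ~ immutable, like a micro-partition
class MicroPartition:
    rows: tuple  # each row is a (key, value) pair in this toy model

class Table:
    def __init__(self):
        self.partitions = []

    def insert(self, rows):
        # inserts never touch existing partitions; they add a new one
        self.partitions.append(MicroPartition(tuple(rows)))

    def update(self, key, new_row):
        # find the partition holding the row, replace it with a new version
        for i, p in enumerate(self.partitions):
            if any(r[0] == key for r in p.rows):
                new_rows = tuple(new_row if r[0] == key else r for r in p.rows)
                self.partitions[i] = MicroPartition(new_rows)  # brand-new object
                return

    def delete(self, key):
        # rewrite the affected partition without the deleted row
        for i, p in enumerate(self.partitions):
            if any(r[0] == key for r in p.rows):
                remaining = tuple(r for r in p.rows if r[0] != key)
                self.partitions[i] = MicroPartition(remaining)
                return
```

Note that an update or delete never mutates the old partition object; it builds a new one, which mirrors why Snowflake keeps previous micro-partition versions around (and why Time Travel is possible).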
Query pruning:
In simple words, if you don’t have a clustering key and run a query, all the partitions may be scanned; once you add a clustering key, the micro-partition scan is limited based on the filter conditions you give in your query.
Take any example, add a clustering key to the table, and run a query on it.
(Image from the documentation shown here.)

The theory below makes clear how Snowflake selects which partitions get scanned.
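A minimal Python sketch of how pruning works, assuming (as the Snowflake documentation describes) that each micro-partition keeps min/max metadata per column; the data and ranges below are made up for illustration:

```python
def prune(partitions, lo, hi):
    """partitions: list of (min_val, max_val) metadata per micro-partition.
    Return the indexes of partitions whose value range overlaps [lo, hi];
    only these need to be scanned for a range filter."""
    return [i for i, (pmin, pmax) in enumerate(partitions)
            if not (pmax < lo or pmin > hi)]

# Without a clustering key, value ranges tend to overlap heavily, so most
# partitions survive pruning; after clustering, ranges are nearly disjoint
# and only the matching partitions are scanned.
unclustered = [(1, 90), (5, 95), (2, 99)]    # wide, overlapping ranges
clustered   = [(1, 30), (31, 60), (61, 99)]  # sorted, disjoint ranges

print(prune(unclustered, 40, 50))  # all three partitions must be scanned
print(prune(clustered, 40, 50))    # only one partition must be scanned
```

The same filter (values between 40 and 50) scans every partition in the unclustered layout but a single partition in the clustered one, which is exactly the effect pruning statistics in the query profile show.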
What are overlapping micro partitions and overlap depth??
In the image below, across the different cases, data overlaps between micro-partitions; that’s why the terms overlapping micro-partitions and overlap depth (the number of micro-partitions that overlap) come into the picture.
Take case 2 in the image: to get the data between k and z, Snowflake has to scan only the 3 overlapping micro-partitions, which makes execution faster than scanning all the micro-partitions.
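The overlap count and depth in cases like these can be modeled with a short Python sketch; the key ranges below are toy values invented for illustration:

```python
def overlapping(partitions, lo, hi):
    # partitions whose [min, max] key range intersects the query range
    return [p for p in partitions if not (p[1] < lo or p[0] > hi)]

def max_depth(partitions):
    # sweep over range endpoints: +1 at each range start, -1 after each end;
    # the running count's peak is the overlap depth
    events = []
    for lo, hi in partitions:
        events.append((lo, 1))
        events.append((hi, -1))
    depth = best = 0
    for _, delta in sorted(events, key=lambda e: (e[0], -e[1])):
        depth += delta
        best = max(best, depth)
    return best

# Three micro-partitions with partially overlapping key ranges
parts = [("a", "m"), ("k", "t"), ("r", "z")]
print(len(overlapping(parts, "k", "z")))  # 3 partitions must be scanned
print(max_depth(parts))                   # at most 2 overlap at any key
```

The lower the depth, the better clustered the table: a perfectly clustered table has disjoint ranges and a depth of 1, so any point lookup touches a single micro-partition.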
These terms were also shown in the previous post, snowflake_11.

Let’s discuss each term:

Cluster_by_keys – the clustering keys defined on the table, which you create as per your requirement

Total_partition_count – the total number of micro-partitions in the table (not the number scanned by a query)

Average_overlaps – for each micro-partition, the average number of other micro-partitions its value range overlaps

Average_depth – for each micro-partition, the average overlap depth

Partition_depth_histogram – how many micro-partitions have a depth of 0, 1, 2, 3, 4, etc.
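These fields come back from Snowflake’s SYSTEM$CLUSTERING_INFORMATION function as a JSON string, which you can inspect programmatically. A sketch, where the field names follow the Snowflake documentation but the sample numbers are invented:

```python
import json

# In Snowflake you would run something like:
#   SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(date, type)');
# and get back a JSON string. The values below are made up to show the shape.
sample = """{
  "cluster_by_keys": "LINEAR(date, type)",
  "total_partition_count": 100,
  "average_overlaps": 4.2,
  "average_depth": 3.1,
  "partition_depth_histogram": {"00000": 0, "00001": 40, "00002": 30,
                                "00003": 20, "00004": 10}
}"""

info = json.loads(sample)
# an average_depth close to 1 means the table is well clustered
print(info["average_depth"])
# the histogram buckets account for every micro-partition in the table
print(sum(info["partition_depth_histogram"].values()))
```

Tracking average_depth over time is a practical way to decide whether a table’s clustering is degrading as new data arrives.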

Why is clustering needed??
Clustering keeps related rows co-located in the same micro-partitions, so query pruning can skip most of the table instead of scanning it; as tables grow, this is what keeps range and point queries fast.

How to check if clustering is needed??
Consider clustering if performance degrades for large (multi-TB) tables. Clustering is an option, but it comes with a cost, since Snowflake keeps re-sorting newly arriving data in the background.
Which columns should be used as cluster keys??
Prefer columns that are most frequently used in WHERE filters and join conditions, with enough distinct values for effective pruning but not so many (e.g. not a unique key) that maintaining the clustering becomes expensive.

What are clustering guidelines and best practices?
Cluster only large (multi-TB) tables, keep the number of cluster key columns small (typically 3 or 4 at most), order the keys from lower to higher cardinality, and monitor clustering depth and credit consumption over time.
