Caching in The Snowflake Cloud Data Platform
This article explains how each layer of caching works in Snowflake while a query is executed.
In terms of performance tuning in Snowflake, there are very few options available. However, it is worth understanding how the Snowflake architecture includes various levels of caching to help speed up your queries. This article provides an overview of the techniques used, and some best practice tips on how to maximize system performance using caching.
Before starting, it's worth considering the underlying Snowflake architecture, and explaining when Snowflake caches data. The diagram below illustrates the levels at which data and results are cached for subsequent use. These are:-
1. Result Cache: Which holds the results of every query executed in the past 24 hours. These are available across virtual warehouses, so query results returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not changed.
2. Local Disk Cache: Which is used to cache data used by SQL queries. Whenever data is needed for a given query it's retrieved from the Remote Disk storage, and
cached in SSD and memory.
3. Remote Disk: Which holds the long term storage. This level is responsible for data resilience, which in the case of Amazon Web Services means 99.999999999% durability, even in the event of an entire data centre failure.
Snowflake Benchmark Performance
Every Snowflake account is delivered with a pre-built and populated set of Transaction Processing Performance Council (TPC) benchmark tables. To test the effect of caching, I set up a series of test queries against a small sub-set of the data, which is illustrated below.
All the queries were executed on a MEDIUM sized virtual warehouse (a 4-node cluster), and joined the benchmark tables exactly as delivered, without any performance tuning.
The following query was executed multiple times, and the elapsed time and query plan were recorded each time.
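The exact benchmark query isn't reproduced here, but the sketch below shows a query of the same shape (a join summarised by region and nation) written against the TPC-H sample schema shared with Snowflake accounts. The database, schema and column names follow the standard TPC-H layout, and are illustrative assumptions rather than the precise query used in the tests:

    -- Illustrative only: summarise order counts and value by region and nation,
    -- using the standard TPC-H sample tables.
    SELECT r.r_name            AS region,
           n.n_name            AS nation,
           COUNT(*)            AS order_count,
           SUM(o.o_totalprice) AS total_value
    FROM   snowflake_sample_data.tpch_sf100.orders   o
    JOIN   snowflake_sample_data.tpch_sf100.customer c ON c.c_custkey   = o.o_custkey
    JOIN   snowflake_sample_data.tpch_sf100.nation   n ON n.n_nationkey = c.c_nationkey
    JOIN   snowflake_sample_data.tpch_sf100.region   r ON r.r_regionkey = n.n_regionkey
    GROUP BY r.r_name, n.n_name
    ORDER BY r.r_name, n.n_name;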
The screenshot below illustrates the results of the query, which summarises the data by Region and Country. In total, the SQL queried, summarised and counted over 1.5 billion rows. The query was executed under three conditions (the session settings used to reproduce each are sketched after this list):-
1. Run from cold: Which meant starting a new virtual warehouse (with no local disk caching), and executing the query.
2. Run from warm: Which meant disabling the result caching, and repeating the query. This makes use of the local disk caching, but not the result cache.
3. Run from hot: Which again repeated the query, but with the result caching switched on.
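For reference, these three conditions can be reproduced with a couple of warehouse and session level controls. A minimal sketch, assuming an illustrative warehouse named TEST_WH:

    -- Cold run: suspend and resume the warehouse, so it starts with an
    -- empty local disk (SSD) cache. TEST_WH is an illustrative name.
    ALTER WAREHOUSE test_wh SUSPEND;
    ALTER WAREHOUSE test_wh RESUME;

    -- Warm run: keep the local disk cache, but switch off result re-use
    -- for this session before repeating the query.
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;

    -- Hot run: re-enable the result cache (the default) and repeat the query.
    ALTER SESSION SET USE_CACHED_RESULT = TRUE;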
Each query ran against 60Gb of data, although as Snowflake returns only the columns queried and automatically compresses the data, the actual data transfers were around 12Gb. As Snowflake is a columnar data warehouse, it automatically returns only the columns needed rather than the entire row, to further help maximize query performance.
This query returned in around 20 seconds, and the query profile demonstrates it scanned around 12Gb of compressed data, with 0% from the local disk cache. This means it had no benefit from disk caching.
The bar chart above demonstrates around 50% of the time was spent on local or remote disk I/O, and only 2% on actually processing the data. Clearly, any design changes we can make to reduce the disk I/O will improve query performance. The results also demonstrate the query was unable to perform any partition pruning, which might otherwise improve performance. We'll cover the effect of partition pruning and data clustering in a later article.
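You can check where a query's data actually came from after the event. The sketch below assumes access to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (which can lag real time by up to 45 minutes); the filter on the query text is purely illustrative:

    -- List recent queries with elapsed time, bytes scanned, and the
    -- percentage of data served from the local disk cache.
    SELECT query_id,
           total_elapsed_time / 1000 AS elapsed_seconds,
           bytes_scanned,
           percentage_scanned_from_cache
    FROM   snowflake.account_usage.query_history
    WHERE  query_text ILIKE '%group by r_name%'   -- illustrative filter
    ORDER BY start_time DESC
    LIMIT  10;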
This query was executed immediately afterwards, but with the result cache disabled, and it completed in 1.2 seconds, around 16 times faster. In this case, the Local Disk cache (which is actually SSD on Amazon Web Services) was used to return results, and disk I/O was no longer a concern.
In the above case, the disk I/O was reduced to around 11% of the total elapsed time, and 99% of the data came from the (local disk) cache: an impressive result while querying 1.5 billion rows.
This query returned results in milliseconds, and involved re-executing the query, but this time with the result cache enabled. Normally, this is the default situation, but it was deliberately disabled for the earlier tests. Many BI and reporting applications involve repeatedly refreshing a series of screens and dashboards by re-executing the same SQL. In these cases, the results are returned in milliseconds.
Although more information is available in the Snowflake Documentation, a series of tests demonstrated the result cache will be reused unless the underlying data (or the SQL query) has changed. Additional tests demonstrated that inserts, updates and deletes which don't affect the data being queried are ignored, and the result cache is still used, provided the data needed by the query has not changed.
Finally, results are normally retained for 24 hours, although the clock is reset every time the query is re-executed, up to a limit of 31 days, after which the query must be re-executed against the data on remote disk.
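As an aside, a cached result set can also be queried directly, which is handy for post-processing results without re-running the original query. A minimal sketch using the RESULT_SCAN table function:

    -- Re-use the result set of the previous query in this session,
    -- without re-executing it against the underlying tables.
    SELECT *
    FROM   TABLE(RESULT_SCAN(LAST_QUERY_ID()));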
Snowflake Performance Summary
The sequence of tests was designed purely to illustrate the effect of data caching on Snowflake. The tests included:-
Raw Data: Including over 1.5 billion rows of TPC generated data, a total of over 60Gb of raw data
Initial Query: Took 20 seconds to complete, and ran entirely from the remote disk. Quite impressive.
Second Query: Was 16 times faster at 1.2 seconds and used the Local Disk (SSD) cache.
Result Set Query: Returned results in 130 milliseconds from the result cache (which was intentionally disabled on the prior query).
To put the above results in context, I repeatedly ran the same query on an Oracle 11g production database server for a tier-one investment bank, and it took over 22 minutes to complete.
Finally, unlike Oracle, where additional care and effort must be taken to ensure correct partitioning, indexing, statistics gathering and data compression, Snowflake caching is entirely automatic, and available by default. Absolutely no effort was made to tune either the queries or the underlying design, although there are a small number of options available, which I'll discuss in the next article.
Clearly, data caching makes a massive difference to Snowflake query performance, but what can you do to maintain that performance when you cannot change the cache?
Auto-Suspend: By default, Snowflake will auto-suspend a virtual warehouse (the compute resources with the SSD cache) after 10 minutes of idle time. Best practice? Leave this alone. Keep in mind, you should be trying to balance the cost of providing compute resources with fast query performance. To illustrate the point, consider these two extremes (a sketch of the relevant settings follows this list):
1. Suspend after 60 seconds: When the warehouse is re-started, it will (most likely) start with a clean cache, and it will take a few queries before the relevant data is held in memory. (Note: Snowflake will try to restore the same cluster, with the cache intact, but this is not guaranteed.)
2. Suspend Never: And your cache will always be warm, but you will pay for compute resources, even if nobody is running any queries. However, provided you set up a script to shut down the warehouse when it's not being used, this may make sense.
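Both behaviours are controlled by warehouse parameters. A minimal sketch, again assuming an illustrative warehouse named TEST_WH:

    -- AUTO_SUSPEND is specified in seconds: 600 = suspend after 10 minutes idle.
    -- Setting it to 0 disables auto-suspend entirely ("Suspend Never").
    ALTER WAREHOUSE test_wh SET AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;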
Scale up for large data volumes: If you have a sequence of large queries to perform against massive (multi-terabyte) data volumes, you can improve query performance by scaling up. Simply execute a SQL statement to increase the virtual warehouse size, and new queries will start on the larger (faster) cluster. While this will start with a clean (empty) cache, you should normally find performance doubles at each size, and this extra performance boost will more than outweigh the cost of refreshing the cache.
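The resize itself is a one-line statement; a sketch with the same illustrative warehouse name:

    -- Resize the warehouse; queries already running finish on the existing
    -- resources, while new queries start on the larger cluster.
    ALTER WAREHOUSE test_wh SET WAREHOUSE_SIZE = 'XLARGE';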
Scale down - but not too soon: Once your large task has completed, you could reduce costs by scaling down or even suspending the virtual warehouse. Be aware again, however, that the cache will start clean on the smaller cluster. By all means tune the warehouse size dynamically, but don't keep adjusting it, or you'll lose the benefit.
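And the corresponding wind-down, using the same illustrative warehouse name:

    -- Scale back down once the heavy workload has finished...
    ALTER WAREHOUSE test_wh SET WAREHOUSE_SIZE = 'MEDIUM';

    -- ...or suspend the warehouse entirely if it's no longer needed.
    ALTER WAREHOUSE test_wh SUSPEND;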