Snowflake Question
INTRODUCTION
Section 0: About the exam & course setup

About the SnowPro Core Certification
The Definitive Preparation Course
Why get certified?
✓ Impactful way to advance your career
✓ Positions you as an expert
✓ Future-proof skills + great job opportunities

Passing Score
✓ 750 / 1000
✓ Goal: achieve a score of 900+
How to master the exam
Exam Topics
DOMAIN WEIGHT
1.0 Snowflake Data Cloud Features & Architecture 25%
Book Exam
Confident & Prepared
Final Tips
Resources
Q&A Section
Reviews
Self-managed
1 No Hardware – no hardware to select, install, configure, or manage.
2 No Software – no software needs to be installed, configured, or managed.
3 No Maintenance – no downtime; early access for Enterprise Edition accounts on request.
Cloud
Completely cloud-native.
1 Designed for Cloud – built for the cloud from scratch.
2 Runs in the Cloud – all components run completely in the cloud; cannot be installed on-premises.
3 Cloud optimized – storage and compute scale independently and elastically.
Data Platform
One single platform around data.
1 Data Warehouse – modern data warehouse with advanced features and performance.
2 Data Lake – stores structured, semi-structured and unstructured data.
3 Data Science – connect ML tools, run ML models in Snowflake, use the language of your choice with Snowpark.
What is Snowflake?
✓ Self-managed: No Software, No Hardware, No Maintenance.
✓ Cloud: Designed for Cloud, Runs in the Cloud, Optimized for Cloud.
✓ Data Platform: Data Warehousing, Data Engineering, Data Applications.
⇒ Snowflake is a self-managed Cloud Data Platform.
Multi-cluster shared-data architecture

Traditional architectures:
1 shared-disk – central data storage, accessible from all compute nodes (each node with its own processor and memory).
  PROs: Simplicity. CONs: Limited scalability.
2 shared-nothing – massively parallel processing compute clusters; each node is independent and stores a portion of the data locally.
  PROs: Scalability. CONs: Storage and compute are tightly coupled.

Snowflake: multi-cluster shared-data architecture – combines the advantages of both traditional approaches.
Three distinct layers
▪ Database Storage – compressed columnar storage.
▪ Query Processing (Compute) – the "muscle of the system"; queries are processed using virtual warehouses.
▪ Cloud Services – coordinates and manages the platform.
Snowflake Editions
▪ Standard – introductory level.
▪ Enterprise – additional features/services for larger organizations.
▪ Business Critical – even higher data protection for organizations with extremely sensitive data.
▪ Virtual Private Snowflake (VPS) – highest level of security.
Features and pricing differ per edition.
Snowflake Editions – Features

Standard:
✓ Complete DWH
✓ Automatic data encryption
✓ Broad support for standard and special data types
✓ Time travel up to 1 day
✓ Disaster recovery for 7 days beyond time travel
✓ Network policies
✓ Secure data share
✓ Federated authentication

Enterprise:
✓ All Standard features
✓ Multi-cluster warehouse
✓ Time travel up to 90 days
✓ Materialized views
✓ Search Optimization
✓ Column-level security
✓ 24-hour early access to weekly new releases

Business Critical:
✓ All Enterprise features
✓ Additional security features such as customer-managed encryption
✓ Support for data-specific regulation
✓ Database failover/failback

Virtual Private Snowflake (VPS):
✓ All Business Critical features
✓ Dedicated virtual servers and a completely separate Snowflake environment
✓ Dedicated metadata store
⇒ Isolated from all other Snowflake accounts
Snowflake Pricing
Compute:
▪ Standard – (active) warehouses, query processing.
▪ Serverless – e.g. Search Optimization, Snowpipe; compute is automatically resized.
Storage:
▪ Billed per TB per month.
Snowflake Pricing – Compute
Credits consumed per hour of runtime, depending on warehouse size:
XS 1 | S 2 | M 4 | L 8 | XL 16 | … | 4XL 128
Credits consumed are billed in $/€.

Snowflake Pricing – Storage
e.g. $40 per TB / per month (Region: EU (Frankfurt), Platform: AWS)
Storage
▪ On Demand Storage – pay only for what you use (e.g. $40/TB).
▪ Capacity Storage – pay only for defined capacity upfront (e.g. $23/TB); suitable when you can estimate your needs, e.g. "we think we need 1 TB of storage".
▪ Prices depend on region and platform, e.g. $45/TB on demand vs. $24.50/TB capacity for Region EU (Frankfurt) on AWS.
▪ Data Transfer is charged separately.
Monitoring storage
▪ TABLE_STORAGE_METRICS view in INFORMATION_SCHEMA – most detailed:
  Active (ACTIVE_BYTES column)
  Time Travel (TIME_TRAVEL_BYTES column)
  Fail-safe (FAILSAFE_BYTES column)
  SELECT * FROM DB_NAME.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
▪ TABLE_STORAGE_METRICS view in ACCOUNT_USAGE:
  SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
Virtual warehouses (Standard Edition and up)
▪ Credits are consumed and billed in a defined cycle.
▪ An account can have multiple virtual warehouses.
▪ Track the usage of cloud services needed.
Virtual warehouse types
▪ Standard – most suitable in most use cases.
▪ Snowpark-optimized – recommended for memory-intensive workloads such as ML training.

Credits per hour:
Standard:           XS 1 | S 2 | M 4 | L 8 | XL 16 | … | 4XL 128
Snowpark-optimized: M 6 | L 12 | XL 24 | … | 6XL 768
Multi-cluster warehouses
More queries than the warehouse can process are placed in a queue; additional clusters can take over.

Modes:
▪ Maximized – min # clusters = max # clusters.
▪ Auto-scale – min # clusters ≠ max # clusters.

Scaling policies (auto-scale mode):
▪ Standard – favors starting additional warehouses.
▪ Economy – favors conserving credits rather than starting additional warehouses.
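The auto-scale settings above can be sketched in DDL; the warehouse name and size are assumptions, not from the course:

```sql
-- Sketch: auto-scale multi-cluster warehouse with the Economy scaling policy
CREATE WAREHOUSE my_mc_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1      -- min # clusters
  MAX_CLUSTER_COUNT = 3      -- max # clusters (≠ min ⇒ auto-scale mode)
  SCALING_POLICY = 'ECONOMY';
```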
Databases & Schemas
▪ Schemas – to organize a database.

SnowSQL setup:
▪ Download SnowSQL
▪ Install SnowSQL
Stages
External stages – cloud provider storage; upload the file before the load:
▪ AWS S3 (AWS)
▪ Google Cloud Storage (GCP)
▪ Azure Container (Azure)
Internal stages:
▪ User stages
▪ Table stages
▪ Internal Named Stages
Stages – loading & unloading

Loading via an internal stage:
▪ PUT – upload a file into the internal stage.
  • Data will be compressed (.gz file ending)
  • Automatically encrypted (128-bit or 256-bit keys)
▪ COPY INTO <table> – load from the stage into a table.

Unloading via an internal stage:
▪ COPY INTO <location> – unload from a table into the internal stage.
▪ GET – download the file from the internal stage.

Internal stage types: User Stages, Table Stages, Internal Named Stages.
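The PUT / COPY INTO flow above can be sketched end to end; the stage, file, and table names are assumptions:

```sql
-- Upload a local file into a named internal stage (compressed & encrypted automatically)
PUT file:///tmp/orders.csv @my_int_stage;

-- Load the staged file into a table
COPY INTO orders
FROM @my_int_stage/orders.csv.gz
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Unload back into the stage and download
COPY INTO @my_int_stage/export/ FROM orders;
GET @my_int_stage/export/ file:///tmp/export/;
```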
Table Stage

Copy to the stage:
COPY INTO @STAGE_NAME
FROM TABLE_NAME;

Query from the stage:
SELECT $1, $2, $3
FROM @STAGE_NAME;
BULK LOADING vs. CONTINUOUS LOADING (Snowpipe)

Continuous loading: event notification → queue storage → load into the Snowflake DB (e.g. Snowpipe for Azure).

Pause / resume:
ALTER PIPE … SET PIPE_EXECUTION_PAUSED = TRUE;

Default ON_ERROR behavior:
▪ Snowpipe: SKIP_FILE – skip the file if errors are found.
▪ Bulk load: ABORT_STATEMENT – aborts the load if an error is found.
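A continuous-loading pipe driven by event notifications can be sketched as below; the pipe, table, and stage names are assumptions:

```sql
-- Sketch: a pipe that auto-ingests files on cloud event notifications
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'CSV');
```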
Copy Options – SIZE_LIMIT

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
SIZE_LIMIT = <num>

Example: SIZE_LIMIT = 25 MB with four 10 MB files. Before each file, the amount already loaded is compared against the limit:
▪ File 1 – 0 MB loaded so far ✓
▪ File 2 – 10 MB loaded so far ✓
▪ File 3 – 20 MB loaded so far ✓ ⇒ 3 files loaded with 30 MB
▪ File 4 – 30 MB loaded so far ✗ (limit exceeded)
Copy Options – PURGE

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
PURGE = TRUE | FALSE

Further copy options (with defaults):
▪ PURGE – remove files from the stage after a successful load; default FALSE.
▪ Load metadata – by default, files are not loaded when they have been loaded before.
▪ RETURN_FAILED_ONLY – return only files that have failed to load; TRUE | FALSE, default FALSE.
▪ LOAD_UNCERTAIN_FILES – load files even if the load status is unknown; TRUE | FALSE, default FALSE.
▪ VALIDATION_MODE = RETURN_<n>_ROWS, e.g. RETURN_5_ROWS – validates <n> rows (returns errors or the rows).
Unloading data

▪ COPY INTO <location> – unload from a table into a stage.
▪ GET – download from the internal stage.

Options:
▪ MAX_FILE_SIZE = <num> – default 16777216 (16 MB); can be increased up to 5 GB.
▪ A SELECT statement can be used in the COPY statement (unload using SELECT).
▪ Output files are named automatically, e.g. data_0_1_0.csv.gz.
▪ Filter with WHERE; truncate long values (TRUNCATECOLUMNS).
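Unloading with a SELECT and the options above can be sketched as follows; the stage, table, and column names are assumptions:

```sql
-- Sketch: unload a filtered query result to an internal stage
COPY INTO @my_int_stage/unload/
FROM (SELECT id, amount FROM orders WHERE amount > 100)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
MAX_FILE_SIZE = 104857600;  -- 100 MB per output file
```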
Functions
Supports most standard SQL functions defined in SQL:1999.

▪ Scalar functions – return one value per invocation (one value per row):
  SELECT DAYNAME('2023-12-31');
  SELECT DAYNAME("effective_date") FROM LOAN_PAYMENT;
▪ Aggregate functions – return one value for a group of rows:
  SELECT MAX(amount) FROM orders;
▪ System functions:
  SELECT SYSTEM$TYPEOF('abc');

https://docs.snowflake.com/en/sql-reference/intro-summary-operators-functions
Estimation functions

Cardinality estimation:
▪ Situation: large input for COUNT(DISTINCT column1, …) and an average error is acceptable (observed errors e.g. 1.62338% and -1.244%).

Frequent values (Top-K):
▪ Function: APPROX_TOP_K(column)
▪ The estimate is more accurate when count >> k and count is large.
Example (SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE_SALES, 28.8B rows):
SELECT SS_CUSTOMER_SK, COUNT(SS_CUSTOMER_SK)
FROM STORE_SALES
GROUP BY SS_CUSTOMER_SK;
Percentile estimation
The idea: the t-Digest algorithm is used to estimate percentile values.
Function: APPROX_PERCENTILE(column, <percentile>) – returns the percentile value.

SELECT APPROX_PERCENTILE(O_TOTALPRICE, 0.5)
FROM ORDERS;
Similarity estimation (MinHash)

1 MINHASH(k, …) returns a MinHash state; k is the number of hash functions – the larger k, the more accurate.
  SELECT MINHASH(100, *) AS mh FROM mhtab1;
  SELECT MINHASH(100, *) AS mh FROM mhtab2;

  Example state:
  SELECT MINHASH(7, O_ORDERKEY) AS mh FROM ORDERS;
  {
    "state": [
      2200169610250,
      22818457966550,
      2507497641893,
      12337014946743,
      5083517324927,
      1039435359430,
      967271249674
    ],
    "type": "minhash",
    "version": 1
  }

2 Estimate the similarity of the MinHash states with APPROXIMATE_SIMILARITY().
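Step 2 can be sketched by feeding both MinHash states into APPROXIMATE_SIMILARITY(); the table names follow the examples above:

```sql
-- Sketch: combine two MinHash states into a Jaccard similarity estimate
SELECT APPROXIMATE_SIMILARITY(mh) AS similarity
FROM (
  SELECT MINHASH(100, *) AS mh FROM mhtab1
  UNION ALL
  SELECT MINHASH(100, *) AS mh FROM mhtab2
);
```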
UDFs (User-Defined Functions)
✓ Schema-level object
Supported languages:
▪ Snowflake Scripting (Snowflake SQL + procedural logic)
▪ JavaScript
▪ Snowpark API (Python, Scala, Java)
Call: select add_two(3);

External functions
Examples:
▪ AWS Lambda function
▪ Microsoft Azure function
▪ HTTPS server
Limitations:
▪ Must be scalar
▪ Slower performance (overhead + fewer optimizations)
▪ Not sharable
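A hypothetical definition of the add_two UDF called above, as a plain SQL UDF:

```sql
-- Sketch: SQL UDF returning its argument plus two (name mirrors the call above)
CREATE OR REPLACE FUNCTION add_two(x INTEGER)
  RETURNS INTEGER
AS
$$
  x + 2
$$;

SELECT add_two(3);  -- returns 5
```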
Stored procedures
Call (returns a result):
call update_table('new_value', 'table_name');

Sequences
Use-case: generating unique, incrementing numbers.
SELECT my_seq.nextval;
SELECT my_seq.nextval;
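Creating and using a sequence can be sketched as below; the sequence and table names are assumptions:

```sql
-- Sketch: create and consume a sequence
CREATE OR REPLACE SEQUENCE my_seq START = 1 INCREMENT = 1;

SELECT my_seq.nextval;  -- 1
SELECT my_seq.nextval;  -- 2

-- Typical use: default value for a surrogate key column
CREATE OR REPLACE TABLE customers (
  id INTEGER DEFAULT my_seq.nextval,
  name STRING
);
```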
Semi-structured data
▪ no fixed schema
▪ contains tags/labels and a nested structure

(What is structured data? Data that has a well-defined structure.)

Semi-structured formats: JSON, XML, PARQUET, ORC, Avro.

Example JSON:
{
  "courses": [
    { "topic": "Snowflake", "level": "All levels" },
    { "topic": "SQL", "language": ["English", "German"] },
    { "topic": "Azure", "level": "Beginner" }
  ]
}
Data types for semi-structured data:
▪ OBJECT – key-value pairs, e.g. { "topic": "Snowflake", "level": "All levels" }.
▪ ARRAY – consists of 0 or more pieces of data.
▪ VARIANT – universal type that can hold a value of any other type, including OBJECT and ARRAY; typical target column for COPY INTO … with semi-structured files.
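Loading the example JSON into a VARIANT column and flattening its nested array can be sketched as below; the table, stage, and column names are assumptions:

```sql
-- Sketch: load JSON into a VARIANT column and flatten the nested array
CREATE OR REPLACE TABLE raw_courses (v VARIANT);

COPY INTO raw_courses
FROM @my_int_stage/courses.json
FILE_FORMAT = (TYPE = 'JSON');

SELECT c.value:topic::STRING AS topic,
       c.value:level::STRING AS level
FROM raw_courses,
     LATERAL FLATTEN(input => v:courses) c;
```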
Unstructured data
Examples: audio files, documents.
Snowflake supports internal & external stages for unstructured files (they are not loaded with COPY INTO …).

Directory tables:
▪ Layered on a stage.
▪ Manual refresh, or automatic refresh using event notifications.
Why Sampling?
Develop and test queries on a small sample (e.g. 500 GB instead of 10 TB) – faster and cheaper.

SAMPLE:
▪ Percentage of rows.
▪ Reproducible results (using a seed).
▪ ROW (BERNOULLI) method – every row is chosen with percentage p.
▪ BLOCK (SYSTEM) method – every block is chosen with percentage p.
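Both sampling methods can be sketched as below; the table name is an assumption:

```sql
-- Sketch: row-level and block-level sampling
SELECT * FROM orders SAMPLE ROW (10);            -- ~10% of rows
SELECT * FROM orders SAMPLE ROW (10) SEED (42);  -- reproducible result
SELECT * FROM orders SAMPLE BLOCK (10);          -- ~10% of blocks (faster on large tables)
```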
Tasks
✓ Schema-level object
✓ Can be cloned
ALTER TASK my_task RESUME;
ALTER TASK my_task SUSPEND;

Trees of tasks (Task A → Task B) are limited to:
▪ 1000 tasks in total
▪ 100 child tasks

CREATE TASK my_task
WAREHOUSE = my_wh
AFTER my_task_a
AS
…;
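A small task tree with a scheduled root can be sketched as below; the task, warehouse, and table names are assumptions:

```sql
-- Sketch: a scheduled root task feeding a child task
CREATE OR REPLACE TASK my_task_a
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO staging_table SELECT * FROM raw_table;

CREATE OR REPLACE TASK my_task
  WAREHOUSE = my_wh
  AFTER my_task_a
AS
  INSERT INTO target_table SELECT * FROM staging_table;

-- Tasks are created suspended; resume the child before the root
ALTER TASK my_task RESUME;
ALTER TASK my_task_a RESUME;
```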
Streams
✓ Schema-level object
✓ Can be cloned

A stream object records DML changes (INSERT, UPDATE, DELETE) made to a table.

Metadata columns:
▪ METADATA$ACTION
▪ METADATA$ISUPDATE
▪ METADATA$ROW_ID

Consuming a stream (e.g. an INSERT into a target table) empties its records.

Typical use: ETL from data sources (e.g. sales data, HR data) into target tables.
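Creating and consuming a stream in an ETL step can be sketched as below; the stream and table names are assumptions:

```sql
-- Sketch: create a stream and consume it
CREATE OR REPLACE STREAM my_stream ON TABLE source_table;

-- Consuming the stream in a DML statement advances its offset (empties it)
INSERT INTO target_table
SELECT id, amount
FROM my_stream
WHERE METADATA$ACTION = 'INSERT';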
Streams can also be created on views; INSERT, UPDATE and DELETE changes on the underlying tables are tracked.
Connecting to Snowflake
▪ Snowsight (web interface)
▪ SnowSQL (command line tool)
▪ Drivers & connectors: Python, Go, JDBC, ODBC, .NET, Node.js, PHP PDO, Kafka, Spark
Snowflake Scripting
Procedural code, most commonly used in stored procedures (but can also run outside of them):
▪ variables
▪ cursors
▪ resultsets
▪ if/case: IF/ELSE, CASE
▪ loops: FOR, REPEAT, WHILE, LOOP

Blocks (BEGIN … END) group statements, minimizing confusion:
BEGIN
  CREATE TABLE employee (id INTEGER, …);
  CREATE TABLE store (id INTEGER, …);
END;
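A block combining a variable with an IF branch can be sketched as below; the variable and table names are assumptions (the employee table follows the block above):

```sql
-- Sketch: Snowflake Scripting block with a variable and an IF branch
DECLARE
  row_cnt INTEGER;
BEGIN
  SELECT COUNT(*) INTO :row_cnt FROM employee;
  IF (row_cnt > 0) THEN
    RETURN 'employee has rows';
  ELSE
    RETURN 'employee is empty';
  END IF;
END;
```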
Snowpark
Instead of building applications and querying data outside the system, Python code runs against Snowflake and is converted to SQL – no need to move data!
Time Travel
▪ Create clones of tables, schemas and databases from a previous state.
▪ DATA_RETENTION_TIME_IN_DAYS = 2 (DEFAULT = 1) – configurable for table, schema, database and account.
▪ Standard Edition: time travel up to 1 day (all accounts). Enterprise, Business Critical, VPS: time travel up to 90 days.
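Querying and cloning a previous state can be sketched as below; the table names and offsets are assumptions:

```sql
-- Sketch: query and restore previous states with Time Travel
-- State of the table 10 minutes ago
SELECT * FROM my_table AT (OFFSET => -600);

-- State before a specific statement ran
SELECT * FROM my_table BEFORE (STATEMENT => '<query_id>');

-- Clone a table from a previous state
CREATE TABLE my_table_restored CLONE my_table AT (OFFSET => -600);
```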
Fail Safe
Disaster? Data moves from current data storage through Time Travel (0 – 90 days, currently available to users) into Fail-safe:
▪ permanent tables: 7 days
▪ transient tables: 0 days
Zero-copy cloning
▪ Cloning is a metadata operation in the Cloud Services layer – the clone references the original micro-partitions.
▪ Updates and new data on the clone create new micro-partitions.
▪ Privileges of child objects are inherited.
▪ Privilege required on the source object to clone it:
  Pipe, Stream, Task: OWNERSHIP
  All other objects: USAGE
Data Sharing
▪ Account 1 (Provider) – owns the storage; data is synchronized, not copied.
▪ Account 2 (Consumer) – queries the share using its own compute resources.
▪ Shared data is read-only and cannot be modified by the consumer.
▪ An account can be both provider and consumer at the same time.

4. Import share: ACCOUNTADMIN role or IMPORT SHARE / CREATE DATABASE privileges required.
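The provider/consumer flow above can be sketched as below; the share, database, and account identifiers are assumptions:

```sql
-- Sketch: provider creates and grants a share, consumer imports it
-- Provider account:
CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.public TO SHARE my_share;
GRANT SELECT ON TABLE my_db.public.orders TO SHARE my_share;
ALTER SHARE my_share ADD ACCOUNTS = consumer_account;

-- Consumer account (ACCOUNTADMIN or IMPORT SHARE / CREATE DATABASE privileges):
CREATE DATABASE shared_db FROM SHARE provider_account.my_share;
```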
▪ Secure UDFs can be shared.
▪ Privileges on shared objects are granted to the share.

Reader Accounts
▪ For non-Snowflake users.
▪ Created & managed by the provider account.
▪ The provider is responsible for all costs (compute resources).
✓ Each account can share and consume – even its own share can be consumed.
✓ Sharing across regions / cloud providers must be enabled; it is done via replication ("cross-region sharing").
▪ Provider Account 1 holds the primary database in Region 1; data and objects are synchronized periodically across regions and across cloud providers.

-- Enable replication for each source and target account in your organization
select system$global_account_set_parameter('<organization_name>.<account_name>',
  'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');
Access Control
A user creates a securable object; the creating role owns it.

GRANT <privilege> ON <object> TO <role>;
GRANT <role> TO <user>;
Roles
✓ Hierarchy of roles – privileges are inherited upwards through the hierarchy.
✓ System-defined roles: ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN, PUBLIC.
✓ Best practice: custom roles are assigned to SYSADMIN.
✓ Account administration capabilities include: creating accounts, viewing all accounts, and viewing account usage information.
Privileges
▪ OWNERSHIP – full control over an object.
▪ MANAGE GRANTS – global privilege to grant or revoke privileges.

GRANT <privilege> ON <object> TO <role>;
REVOKE <privilege> ON <object> FROM <role>;
e.g. GRANT SELECT ON my_table TO <role>;

Selected privileges per object type:
▪ Warehouse – OPERATE: enables changing the state of a warehouse (e.g. suspend and resume).
▪ Database – USAGE: enables using the database and executing the SHOW DATABASES command.
▪ Database – REFERENCE_USAGE: enables using an object (shared secure view) to reference another object in a different database.
▪ Stages – READ: enables operations that require reading (e.g. GET, LIST, COPY INTO table) from internal stages; not applicable to external stages.
▪ Stages – WRITE: enables writing to an internal stage (PUT, REMOVE, COPY INTO location); not applicable to external stages.
▪ Tables – INSERT: enables inserting values into the table and manually reclustering tables.
Authentication
Authentication is proving that you are who you say you are.

Key pair authentication:
▪ Minimum: 2048-bit RSA key pair.
▪ One or two key pairs can be assigned to a user (allows rotation).
▪ Used when connecting via Snowflake clients (SnowSQL etc.).
Dynamic Data Masking (Enterprise Edition)
A masking policy masks column values at query time, based on conditions such as the current role.

Define policy:
CREATE MASKING POLICY my_policy
AS (val varchar) RETURNS varchar ->
CASE
  WHEN current_role() IN (role_name) THEN val  -- original column value
  ELSE '##-##'                                 -- masking value
END;

Apply policy:
ALTER TABLE my_table MODIFY COLUMN phone
SET MASKING POLICY my_policy;

Remove policy:
ALTER TABLE my_table MODIFY COLUMN phone
UNSET MASKING POLICY my_policy;
External Tokenization (Enterprise Edition)
Tokenized data is loaded into Snowflake and detokenized at query runtime via external functions.

Row Access Policies (Enterprise Edition)
Rows are filtered at runtime based on a condition (e.g. user or role).

Apply policy:
ALTER TABLE my_table ADD ROW ACCESS POLICY my_policy
ON (column1);
Encryption
▪ AES 256-bit encryption of tables and internal stages with Snowflake-managed keys (Standard Edition and up).
▪ Key rotation in a defined cycle; periodic re-keying every year, if enabled (Enterprise Edition).
▪ End-to-end encryption: TLS 1.2 for data in transit.
▪ PUT to an internal stage encrypts files automatically; for external stages, client-side encryption is optional and the master key can be customer-managed or Snowflake-managed.
ACCOUNT_USAGE & INFORMATION_SCHEMA

The SNOWFLAKE database is a shared, read-only database; by default ACCOUNTADMIN can view everything. It contains object metadata and usage data, including reader accounts.

INFORMATION_SCHEMA:
▪ Historical usage data and object metadata; output depends on privileges.
▪ Exists in each database (parent DB) + account-level views.
▪ No latency (real-time).
▪ Does not include dropped objects.
▪ Shorter retention (7 days – 6 months).
▪ Unselective queries can fail: "Information schema query returned too much data. Please repeat query with more selective predicates."

ACCOUNT_USAGE:
▪ Long-term historical usage data.
▪ Not real-time: 45 min – 3 hours latency.
▪ Includes dropped objects.
▪ Retention: 365 days.
When to use? (Enterprise Edition) – consider the impact on your monthly workload.
Query Profile statistics include: Bytes Scanned, Percentage Scanned from Cache, and Data Spilling.

QUERY_HISTORY table function in INFORMATION_SCHEMA:
select * from table(information_schema.query_history())
order by start_time;

QUERY_HISTORY view in the ACCOUNT_USAGE schema:
select * from snowflake.account_usage.query_history;
Caching

Result Cache (Cloud Services layer):
▪ Same queries can use the cache in the future – very fast results (persisted query results), avoids re-execution.
▪ Used only if: table data has not changed, micro-partitions have not changed, the query doesn't include UDFs or external functions, and sufficient privileges exist & results are still available.
▪ Can be disabled using the USE_CACHED_RESULT parameter.

Metadata Cache (Cloud Services layer):
▪ Properties for query optimization and processing, e.g. the range of values in each micro-partition.

Data Cache (Query Processing / virtual warehouse layer):
▪ Local cache of data read from storage; cannot be shared with other warehouses.

Storage layer: remote disk.
Micro-partitions
▪ Partitioning is automatically performed and can't be disabled.
▪ Hundreds of millions of micro-partitions allow very granular partition pruning.
▪ Additional properties are stored for query optimization.
▪ Data is stored in columnar format; unnecessary columns are eliminated when querying.
▪ Stored on external cloud provider storage.
▪ Immutable – micro-partitions can't be changed; new data = new micro-partitions.
▪ Without clustering keys, partitioning follows the order in which data is created.
Clustering Keys

Example: a table with columns Sales_Date, Name and Amount (rows such as 2023-06-01 | Sunglasses TR-7 | $25, 2023-06-01 | Chocolate bar 70% cacao | $3, 2023-06-03 | Oat meal biscuits | $4, 2023-06-05 | Sunglasses TR-7 | $25) is divided into micro-partitions sorted by Sales_Date. For each micro-partition, metadata stores the range of values per column – here both micro-partitions cover Amount $3 - $25, i.e. they overlap on Amount.

Overlapping micro-partitions & clustering depth:
▪ Clustering depth measures how much the value ranges of micro-partitions overlap (e.g. ranges A - F, A - C and B - D overlap; A - B, C - D and F do not).
▪ Well-clustered data ⇒ improved query performance through partition pruning.
▪ Reclustering is automatic and performed by Cloud Services (serverless); it causes credit consumption and additional storage costs (new micro-partitions are written).

CREATE TABLE t1 CLUSTER BY (c1, c5);
ALTER TABLE t1 DROP CLUSTERING KEY;
Search Optimization Service (Enterprise Edition)
▪ Similar to the secondary-index concept.
▪ Helps substring and regular expression searches (e.g. LIKE or ILIKE) and searches on VARIANT columns, especially when such selective queries are slow.
▪ Costs: serverless credit consumption + additional storage needed.
▪ Resource monitors can't control Snowflake-managed (serverless) warehouses.
Limitations:
• Query only 1 table (no joins)