
Snowflake Question

The document provides an overview of the SnowPro Core Certification, including exam topics, preparation strategies, and Snowflake's architecture. It outlines the exam structure, passing score, and various Snowflake editions with their features and pricing. Additionally, it discusses Snowflake's pricing model for compute, storage, and data transfer, emphasizing the flexibility and scalability of the platform.

Not provided by, affiliated with, or sponsored by Snowflake Inc.

INTRODUCTION
Section 0: About the exam & course setup
About the SNOWPRO CORE CERTIFICATION
The Definite Preparation Course
Why get certified?
✓ Impactful way to advance career
✓ Positioning as an expert
✓ Future proof + great job opportunities.

What is covered?
✓ SnowPro Core Certification
✓ https://www.snowflake.com/certifications/
✓ Topics are always kept up-to-date.

Demos
✓ Not needed for the exam.
✓ Help with memorizing.
✓ Give you practical foundation.

Goal
✓ Clear exam with ease.
✓ Knowledge for working with Snowflake.

Passing Score
✓ 750 / 1000
✓ Goal: Achieve a score of 900+
How to Master the exam
Exam Topics
DOMAIN                                               WEIGHT
1.0 Snowflake Data Cloud Features & Architecture     25%
2.0 Account Access and Security                      20%
3.0 Performance Concepts                             15%
4.0 Data Loading and Unloading                       10%
5.0 Data Transformations                             20%
6.0 Data Protection and Data Sharing                 10%

Master the exam
Free Trial Account
Not needed for the exam.
Help with memorizing.
Give you practical knowledge.

Exam Overview: https://www.snowflake.com/certifications/

Exam Duration
❑ Time: 115 min
❑ 100 questions
❑ Multiple Select, Multiple Choice


Exam Questions
What are the features about? How do they work?

What is not a system role in Snowflake?
❑ SECURITYADMIN
❑ NETWORKADMIN
❑ ORGADMIN
❑ ACCOUNTADMIN

What is the mechanism to eliminate micro-partitions during query runtime?
❑ Pruning ✓
❑ Caching
❑ Clustering
❑ Flattening
Recipe to clear the exam
Lectures: Step-by-step incl. demos, ~ 30-60 min / day
Quizzes: Practice and test your knowledge
Slides: Repeat and go through important points
Practice Test: Evaluate knowledge & weaknesses, eliminate weaknesses
Book Exam: Confident & prepared
Final Tips

Resources

Q&A Section

Reviews

Connect & Congratulate


Snowflake Architecture
Domain 1.0: Data Cloud Features & Architecture
What is Snowflake?

1 Self-managed    2 Cloud    3 Data Platform

Self-managed service
1 No Hardware: No need to select, install, configure, or manage.
2 No Software: No software needs to be installed, configured, or managed.
3 No Maintenance: Transparent weekly releases. No downtime. Early access for Enterprise Edition accounts on request.
Cloud
Completely cloud-native.
1 Designed for Cloud: Built from scratch for the cloud.
2 Runs in the Cloud: All components run completely in the cloud. Cannot be installed on-premises.
3 Cloud optimized: Storage and compute scale independently and elastically.
Data Platform
One single platform around data.
1 Data Warehouse: Modern Data Warehouse with advanced features and performance.
2 Data Lake: Stores structured, semi-structured and unstructured data.
3 Data Science: Connect ML tools. Run ML models in Snowflake. Language of choice with Snowpark.
What is Snowflake?

1 Self-managed: No Hardware, No Software, No Maintenance.
2 Cloud: Designed for Cloud, Runs in the Cloud, Optimized for Cloud.
3 Data Platform: Data Warehousing, Data Engineering, Data Applications.
Multi-cluster shared-data


Traditional architectures

shared-disk: Central data storage, accessible from all compute nodes
PROs: Simplicity, Data management
CONs: Limited scalability, Network bottleneck, Single point of failure

shared-nothing: Each node is independent (processor, memory, disk)
PROs: Scalability, High availability
CONs: Expensive, Management more difficult


Traditional architectures vs. Multi-cluster shared-data

Multi-cluster shared-data
1 Central data repository (from shared-disk)
2 Massively parallel processing compute clusters (from shared-nothing); each node stores a portion of the data locally

PROs
shared-disk: Data management simplicity
shared-nothing: Performance, scale-out benefits
Three distinct layers

CLOUD SERVICES
QUERY PROCESSING (COMPUTE): Virtual Warehouse, Virtual Warehouse, Virtual Warehouse
DATABASE STORAGE

Three distinct layers
DATABASE STORAGE
- Compressed Columnar Storage -
Decoupled Compute & Storage
Blob Storage (AWS, Azure, GCP)
Snowflake manages all aspects of storage
Optimized for OLAP / analytical purposes

Three distinct layers
QUERY PROCESSING (COMPUTE)
- "Muscle of the system" -
Queries are processed using virtual warehouses
Warehouse = MPP compute cluster (multiple compute nodes)
Provides resources: CPU, memory, and temporary storage

Three distinct layers
CLOUD SERVICES
- "Brain of the system" -
Collection of services to coordinate & manage the components
Also run on compute instances of cloud provider
✓ Authentication
✓ Access control
✓ Metadata management
✓ Query parsing and optimization
✓ Infrastructure management
Snowflake Editions

Features and pricing increase from edition to edition; each edition adds features/services:
Standard: Introductory level
Enterprise: Additional features for the needs of large-scale enterprises
Business Critical: Even higher data protection for organizations with extremely sensitive data
Virtual Private: Highest level of security

Snowflake Editions

Standard
✓ Complete DWH
✓ Automatic data encryption
✓ Broad support for standard and special data types
✓ Time travel up to 1 day
✓ Disaster recovery for 7 days beyond time travel
✓ Network policies
✓ Secure data share
✓ Federated authentication & SSO
✓ Premier support 24/7

Enterprise
✓ All Standard features
✓ Multi-cluster warehouse
✓ Time travel up to 90 days
✓ Materialized views
✓ Search Optimization
✓ Column-level security
✓ 24-hour early access to weekly new releases

Business Critical
✓ All Enterprise features
✓ Additional security features such as customer-managed encryption
✓ Support for data specific regulation
✓ Database failover/failback (disaster recovery)

Virtual Private
✓ All Business Critical features
✓ Dedicated virtual servers and completely separate Snowflake environment
✓ Dedicated metadata store
⇒ Isolated from all other Snowflake accounts
Compute Cost

Snowflake Pricing

Compute | Data Transfer | Storage

✓ Compute and storage costs decoupled
✓ Pay only for what you need
✓ Scalable and at affordable cloud price
✓ Pricing depends on the region/cloud provider


Snowflake Pricing

Compute
(Active) Warehouses: Standard; Query Processing
Cloud Services: Behind-the-scenes cloud service tasks; only charged if it exceeds 10% of warehouse consumption
Serverless: Search Optimization, Snowpipe; automatically resized
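
The 10% rule for cloud services can be illustrated with a few lines of Python. This is a minimal sketch, not Snowflake's billing logic: it assumes both credit totals cover the same billing period and that only the portion of cloud services usage above the 10% allowance is billed. The function name `billable_cloud_services` is hypothetical.

```python
def billable_cloud_services(warehouse_credits: float, cloud_services_credits: float) -> float:
    """Return the cloud services credits billed after the 10% allowance.

    Assumption: both totals cover the same billing period, and only the
    part of cloud services usage above 10% of warehouse usage is billed.
    """
    allowance = warehouse_credits / 10  # 10% of warehouse consumption
    return max(0.0, cloud_services_credits - allowance)


if __name__ == "__main__":
    # Cloud services stay below the allowance: nothing is billed.
    print(billable_cloud_services(100.0, 8.0))
    # Cloud services exceed the allowance: only the excess is billed.
    print(billable_cloud_services(100.0, 15.0))
```

With 100 warehouse credits the allowance is 10 credits, so 8 cloud services credits cost nothing extra while 15 leave 5 billable.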
Snowflake Pricing

Compute

✓ Charged for active warehouses per hour


✓ Billed by second (minimum of 1min)
✓ Depending on the size of the warehouse

Time / active warehouses / size

✓ Charged in Snowflake credits



Snowflake Pricing

Compute: Credits consumed → $/€


Virtual Warehouse Sizes (credits per hour)

XS    1
S     2
M     4
L     8
XL    16

4XL   128
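
Per-second billing with a one-minute minimum, combined with the size table, can be sketched as follows. This is an illustration under stated assumptions: a single-cluster warehouse that runs once without interruption, and only the sizes shown on the slide. The names `CREDITS_PER_HOUR` and `credits_consumed` are hypothetical.

```python
# Credits per hour for the sizes listed on the slide (standard warehouses).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "4XL": 128}

# "Billed by second (minimum of 1min)"
MINIMUM_BILLED_SECONDS = 60


def credits_consumed(size: str, seconds_running: int) -> float:
    """Credits for one uninterrupted run of a single-cluster warehouse."""
    billed_seconds = max(seconds_running, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


if __name__ == "__main__":
    print(credits_consumed("XS", 3600))  # one full hour on XS
    print(credits_consumed("L", 1800))   # half an hour on L
    print(credits_consumed("M", 30))     # 30 s is billed as 60 s
```

Multiplying the result by the per-credit price of the edition and region gives the cost in $/€.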


Snowflake Pricing

Region: US East (Ohio), Platform: AWS

Standard:          $2 / Credit
Enterprise:        $3 / Credit
Business Critical: $4 / Credit
Virtual Private:   Contact Snowflake


Snowflake Pricing

Region: EU (Frankfurt), Platform: AWS

Standard:          $2.70 / Credit
Enterprise:        $4 / Credit
Business Critical: $5.40 / Credit
Virtual Private:   Contact Snowflake


Storage & Data Transfer



Snowflake Pricing

Storage
$40 per TB per month
Region: US East (Ohio), Platform: AWS

✓ Monthly storage fees
✓ Based on average storage used per month
✓ Depends on the cloud provider
✓ Cost calculated after compression


Snowflake Pricing

Storage
Region: US East (Northern Virginia), Platform: AWS

On Demand Storage
✓ Pay only for what you use
✓ $40/TB

Capacity Storage
✓ Pay upfront for a defined capacity
✓ $23/TB
Snowflake Pricing

Storage
Region: EU (Frankfurt), Platform: AWS

On Demand Storage
✓ Pay only for what you use
✓ $45/TB

Capacity Storage
✓ Pay upfront for a defined capacity
✓ $24.50/TB
Snowflake Pricing

✓ We think we need 1 TB of storage
Region: US East (Northern Virginia), Platform: AWS

On Demand Storage
❖ Scenario 1: 100GB of storage used: 0.1 TB x $40 = $4
❖ Scenario 2: 800GB of storage used: 0.8 TB x $40 = $32

Capacity Storage
❖ Scenario 1: 100GB of storage used: 1 TB x $23 = $23
❖ Scenario 2: 800GB of storage used: 1 TB x $23 = $23
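
The two scenarios can be reproduced with a short calculation. This is a minimal sketch using the US East (Northern Virginia) prices from the slide and the slide's convention that 1 TB = 1000 GB. The function names are hypothetical, and the sketch assumes usage never exceeds the committed capacity.

```python
def on_demand_cost(used_tb: float, price_per_tb: float = 40.0) -> float:
    """On Demand: pay for the storage actually used."""
    return used_tb * price_per_tb


def capacity_cost(committed_tb: float, price_per_tb: float = 23.0) -> float:
    """Capacity: pay for the committed capacity, regardless of usage.

    Assumption: usage stays within the committed capacity.
    """
    return committed_tb * price_per_tb


if __name__ == "__main__":
    committed_tb = 1.0  # "We think we need 1 TB of storage"
    for used_gb in (100, 800):
        used_tb = used_gb / 1000
        print(used_gb, round(on_demand_cost(used_tb), 2), round(capacity_cost(committed_tb), 2))
```

At these prices the break-even point is 23 / 40 = 0.575 TB: below that usage On Demand is cheaper, above it Capacity is.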
Snowflake Pricing

On Demand Storage vs. Capacity Storage

✓ Start with On Demand
✓ Once you are sure about your usage, use Capacity storage


Snowflake Pricing

Data Transfer
✓ Data ingress: FREE
✓ Data egress: CHARGED (transfer/replicate data to a different account in a different region and/or cloud provider)
✓ Depends on the cloud storage provider


Storage Monitoring

Individual Table Storage

SHOW TABLES;
Statistics for table storage and properties

TABLE_STORAGE_METRICS view in INFORMATION_SCHEMA
SELECT * FROM DB_NAME.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;

TABLE_STORAGE_METRICS view in ACCOUNT_USAGE
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;

Most detailed (TABLE_STORAGE_METRICS):
Active (ACTIVE_BYTES column)
Time Travel (TIME_TRAVEL_BYTES column)
Fail-safe (FAILSAFE_BYTES column)


Resource Monitors

Control and monitor credit usage of warehouses and account

Standard Edition

Set credit limits in a defined cycle

Can be set on:
Account
Virtual warehouse
Multiple virtual warehouses

Actions when the limit is reached:
Notify
Suspend and notify
Suspend immediately and notify

Created by ACCOUNTADMIN
Privileges on resource monitor: MONITOR, MODIFY

Warehouses can be suspended if the limit is reached
Usage of the cloud services needed by the warehouses is tracked as well
Can NOT prevent cloud service usage
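
How a resource monitor maps credit usage to actions can be sketched as below. This is an illustration only: the percentage thresholds, the action labels, and the function name `triggered_actions` are assumptions introduced for the example, since the slides only list the three kinds of action.

```python
def triggered_actions(credits_used: float, credit_quota: float, triggers: list) -> list:
    """Return the actions whose threshold (in percent of the quota) is reached.

    `triggers` is a list of (percent_threshold, action) pairs; both the
    thresholds and the action labels are illustrative assumptions.
    """
    if credit_quota <= 0:
        raise ValueError("credit_quota must be positive")
    percent_used = 100 * credits_used / credit_quota
    return [action for threshold, action in sorted(triggers) if percent_used >= threshold]


if __name__ == "__main__":
    triggers = [
        (50, "NOTIFY"),
        (90, "SUSPEND"),             # suspend after running queries finish, and notify
        (100, "SUSPEND_IMMEDIATE"),  # suspend immediately, and notify
    ]
    print(triggered_actions(40, 100, triggers))
    print(triggered_actions(95, 100, triggers))
```

Whatever the thresholds, the monitor acts on warehouses only; as noted above, it cannot stop cloud services usage.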


Warehouses & Multi-Clustering

Warehouses

What is a virtual warehouse?
Provides compute resources to execute queries and operations

Properties: Type, Size, Multi-cluster


Warehouses

Standard: Most suitable in most use cases
Snowpark-optimized: Recommended for memory-intensive workloads such as ML training

Virtual Warehouse Sizes (credits per hour)

XS    1
S     2
M     4
L     8
XL    16

4XL   128

Virtual Warehouse Sizes (Snowpark-optimized)

M     6
L     12
XL    24

6XL   768


Multi-Clustering

[Diagram: more queries arrive than a single S warehouse can process, so queries queue. With auto-scaling, additional S clusters are started.]

Great for more concurrent users!
Not ideal for more complex workloads!


Mode

Maximized: Min # clusters = Max # clusters; static workload
Auto-scale: Min # clusters ≠ Max # clusters; dynamic workload
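
The rule for the two modes can be written down directly. This is a minimal sketch; the function name `multi_cluster_mode` is hypothetical, and treating a maximum of one cluster as a plain single-cluster warehouse is an assumption added for completeness.

```python
def multi_cluster_mode(min_clusters: int, max_clusters: int) -> str:
    """Classify a warehouse by its minimum and maximum cluster counts."""
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    if max_clusters == 1:
        return "single-cluster"  # assumption: not a multi-cluster warehouse
    if min_clusters == max_clusters:
        return "maximized"       # all clusters always run: static workload
    return "auto-scale"          # clusters start and stop with the load


if __name__ == "__main__":
    print(multi_cluster_mode(3, 3))
    print(multi_cluster_mode(1, 4))
```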


Multi-Clustering

Auto-Scaling: When to start an additional cluster?

Scaling policy

Standard: Favors starting additional warehouses
Economy: Favors conserving credits rather than starting additional warehouses

Scaling policy

Standard (default)
Description: Prevents/minimizes queuing by favoring starting additional clusters over conserving credits.
Cluster starts: Immediately when either a query is queued or the system detects that there are more queries than can be executed by the currently available clusters.
Cluster shuts down: After 2 to 3 consecutive successful checks (performed at 1 minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters.

Economy
Description: Conserves credits by favoring keeping running clusters fully-loaded rather than starting additional clusters. Result: May result in queries being queued and taking longer to complete.
Cluster starts: Only if the system estimates there’s enough query load to keep the cluster busy for at least 6 minutes.
Cluster shuts down: After 5 to 6 consecutive successful checks …


Snowflake Objects

Objects in Snowflake

Organization: Centralized view of accounts in organization (e.g. billing); managed by ORGADMIN
  Account 1, Account 2
    Users, Roles, Databases, Warehouses, Other account objects
      Schemas: To organize a database
        UDFs, Views, Tables, Stages, Other database objects


SnowSQL



SnowSQL
1 What is SnowSQL?

2 Download SnowSQL

3 Install SnowSQL

4 Connect to your Snowflake Account

5 Run a query in SnowSQL on your account


SnowSQL
Connect to Snowflake through the command line

Execute queries, load, and unload data

Perform all DDL & DML operations

Available for Windows, Linux, macOS


Data Loading & Unloading
Domain 4.0: Data Loading and Unloading
Internal Stages

Stages

Stages in Snowflake are locations used to store data.
✓ Location where data is loaded FROM/TO
✓ Not to be confused with data warehouse stages

Two kinds of stage sit in front of the database: Internal Stage and External Stage
Stages

Internal Stage: Snowflake managed
▪ Cloud provider storage
▪ Upload file before load
▪ User stages
▪ Table stages
▪ Internal Named Stages

External Stage: External cloud provider
▪ AWS S3 (AWS)
▪ Google Cloud Storage (GCP)
▪ Azure Container (Azure)

Stages

Internal Stage
Uploading: PUT
• Data will be compressed (.gz file ending)
• Automatically encrypted (128-bit or 256-bit keys)
Loading: COPY INTO … (stage → tables)

Unloading: COPY INTO location (tables → stage)
Downloading: GET

Internal Stage types: User Stages, Table Stages, Internal Named Stages


Internal stages

User Stages
• Tied to a user
• Cannot be accessed by other users
• Every user has default stage
• Cannot be altered or dropped
• Put files to that stage before loading
• Explicitly remove files again
• Loading to multiple tables
• Referred to with '@~'

Table Stages
• Automatically created with a table
• Can only be accessed by one table
• Cannot be altered or dropped
• Load to one table
• Referred to with '@%TABLE_NAME'

Named Stages
• CREATE STAGE …
• Snowflake database object
• Everyone with privileges can access it
• Most flexible
• Referred to with '@STAGE_NAME'
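
The three reference prefixes are easy to mix up, so here is a small helper that builds them. It is a sketch for memorizing the notation only; the function name `stage_reference` and the `kind` labels are hypothetical, and the helper does no quoting or validation of identifiers.

```python
from typing import Optional


def stage_reference(kind: str, name: Optional[str] = None) -> str:
    """Build the reference string for a user, table, or named stage."""
    if kind == "user":
        return "@~"  # the current user's stage needs no name
    if name is None:
        raise ValueError("table and named stages need a name")
    if kind == "table":
        return f"@%{name}"  # name of the table that owns the stage
    if kind == "named":
        return f"@{name}"   # name given in CREATE STAGE
    raise ValueError(f"unknown stage kind: {kind}")


if __name__ == "__main__":
    print(stage_reference("user"))
    print(stage_reference("table", "ORDERS"))
    print(stage_reference("named", "MY_STAGE"))
```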


Use cases

Load: Local System → User Stage → Multiple Snowflake tables

1. Connect to SnowSQL
2. PUT files to stage
3. COPY INTO tables
4. Process data
5. COPY INTO stage
6. GET files from stage


External Stages

Stages

External Stage
• CREATE STAGE …
• Snowflake database object
• Everyone with privileges can access it
• Referred to with '@STAGE_NAME'
• References external storage location

External cloud provider
▪ AWS S3 (AWS)
▪ Google Cloud Storage (GCP)
▪ Azure Container (Azure)

CREATE STAGE <stage_name>
  URL = 's3://bucket/path/'

CREATE STAGE <stage_name>
  URL = 'azure://account.blob.core.windows.net/container/path/'

CREATE STAGE <stage_name>
  URL = 's3://bucket/path/'
  CREDENTIALS = …
  FILE_FORMAT = …

CREATE STAGE <stage_name>
  URL = 's3://bucket/path/'
  STORAGE_INTEGRATION = …
  FILE_FORMAT = …


LIST
List all files and additional properties

LIST @STAGE_NAME;           External Stage / Internal Named Stage
LIST @~;                    User Stage
LIST @%TABLE_STAGE_NAME;    Table Stage


Referencing stages

Copy FROM stage
COPY INTO TABLE_NAME
FROM @STAGE_NAME;

Copy TO stage
COPY INTO @STAGE_NAME
FROM TABLE_NAME;

Query from stage
SELECT * FROM @STAGE_NAME;

Table Stage
SELECT
  $1,
  $2,
  $3
FROM @STAGE_NAME;


COPY INTO

Copy FROM stage
COPY INTO TABLE_NAME
FROM @STAGE_NAME;
▪ Loads data into a table ⇒ Bulk Loading
▪ Files must be already staged in:
  • Internal Stage
  • External Stage
▪ Warehouses are needed
▪ Data Transfer costs may apply

File formats:
• CSV (Default)
• JSON
• AVRO
• ORC
• PARQUET
• XML

Copy TO stage
COPY INTO @STAGE_NAME
FROM TABLE_NAME;
▪ Unloads data into a file


COPY INTO

Load specific files
COPY INTO TABLE_NAME
FROM @STAGE_NAME
FILES = ('file_name1', …);

Load with pattern
COPY INTO TABLE_NAME
FROM @STAGE_NAME
PATTERN = '.*sales.*';

Load with Copy Options
COPY INTO TABLE_NAME
FROM @STAGE_NAME
FILES = ('file_name1', …)
CopyOptions;

COPY INTO

COPY INTO TABLE_NAME
FROM @STAGE_NAME
FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' |
                TYPE = CSV | JSON | AVRO | ORC | PARQUET | XML);


File Format

COPY INTO table_name
FROM @stagename
FILE_FORMAT = (TYPE = CSV …);

FILE FORMAT

CREATE FILE FORMAT <fileformatname>
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

CREATE STAGE <stagename>
  URL = '<location>'
  FILE_FORMAT = (FORMAT_NAME = <fileformatname>);

COPY INTO table_name
FROM @stagename;

COPY INTO table_name
FROM @stagename
FILE_FORMAT = (FORMAT_NAME = <fileformatname>);


Storage Integration

Integration object

Storage Integration
✓ Stores a generated identity for external cloud storage

Create Snowflake object: ✓ Contains ALLOWED_LOCATIONS
Grant permission in Cloud Provider: ✓ Allow it as Enterprise Application
Assign Role in Cloud Provider: ✓ Grant permissions
Use it in Stage: ✓ Connect it to stage object


Snowpipe

Loading Data

BULK LOADING: ✓ Manual execution
CONTINUOUS LOADING: ✓ Snowpipe

What is Snowpipe?

✓ Loads data immediately when a file appears in a blob storage
✓ According to defined COPY statement
✓ If data needs to be available immediately for analysis
✓ Snowpipe uses serverless features instead of warehouses
⇒ No user-created warehouse is needed!

Snowpipe

[Diagram: S3 bucket / Container → Cloud notification → Serverless load (COPY) → Snowflake DB]

Snowpipe for Azure

[Diagram: Azure Container → Event Notification → Queue Storage → Notification Integration → Load → Snowflake DB]
Snowpipe for Azure

CREATE PIPE <name>
  AUTO_INGEST = TRUE | FALSE
  INTEGRATION = '<string>'
  COMMENT = '<string_literal>'
  AS <copy_statement>

✓ PIPE contains COPY statement

CREATE PIPE <name>
  AUTO_INGEST = TRUE | FALSE
  INTEGRATION = '<string>'
  COMMENT = '<string_literal>'
  AS COPY INTO table_name
     FROM @stage_name


Setting up Snowpipe

Storage Integration: ✓ Connection details to container ✓ Grant permissions
Create Stage: ✓ Location to container
Queue + Notification: ✓ To trigger Snowpipe
Notification Integration: ✓ Notification can be received by Snowflake ✓ Grant permissions
Create Pipe: ✓ Create pipe as object with COPY command
Test COPY command: ✓ To make sure it works

Snowpipe
Snowpipe Methods

Cloud messaging
✓ Uses event notifications
✓ External stages

REST API
✓ Calls REST API Endpoints
✓ Internal stages & External stages

Snowpipe

Serverless: No dedicated warehouse
Cost: Per-second/per-core granularity
Load Time: Typically within 1 minute
File Size: Ideally 100MB – 250MB (or more)

Snowpipe

Load History: 14 Days
Location: Stored in schema
Pause / Resume: ALTER PIPE <name> SET PIPE_EXECUTION_PAUSED = TRUE


Copy Options

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = (FORMAT_NAME = '<file_format_name>')
copyOptions

How data is copied
Unloading vs. Loading

Copy Options
Stage properties

CREATE STAGE <stage_name>
  URL = 's3://bucket/path/'
  STORAGE_INTEGRATION = …
  FILE_FORMAT = …
  COPY_OPTIONS = ( copyOptions )

If copy options are specified in the COPY command, they override the stage copy options


Copy Options – ON_ERROR
Only with Data Loading

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
ON_ERROR = CONTINUE

CONTINUE            Continue loading file if errors are found
SKIP_FILE           Skip file if errors are found (DEFAULT for Snowpipe)
SKIP_FILE_<num>     e.g. SKIP_FILE_10: Skip file when # errors >= 10
SKIP_FILE_<num>%    e.g. SKIP_FILE_10%: Skip file when # errors >= 10%
ABORT_STATEMENT     Aborts load if error is found (DEFAULT for bulk load)
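
The decision each ON_ERROR value leads to for a single file can be sketched as follows. This is a simplified model, not Snowflake's implementation: it follows the ">=" thresholds as stated above, and the function name `load_outcome` and its three return labels are hypothetical.

```python
import re


def load_outcome(on_error: str, error_rows: int, total_rows: int) -> str:
    """Return "load", "skip_file" or "abort" for one file.

    "load" means the valid rows of the file are loaded. Thresholds follow
    the slide (>= num errors, >= num percent of rows).
    """
    option = on_error.upper()
    if error_rows == 0 or option == "CONTINUE":
        return "load"
    if option == "ABORT_STATEMENT":
        return "abort"
    if option == "SKIP_FILE":
        return "skip_file"
    match = re.fullmatch(r"SKIP_FILE_(\d+)(%?)", option)
    if match is None:
        raise ValueError(f"unsupported ON_ERROR value: {on_error}")
    limit = int(match.group(1))
    if match.group(2):  # percentage form, e.g. SKIP_FILE_10%
        reached = 100 * error_rows >= limit * total_rows
    else:               # absolute form, e.g. SKIP_FILE_10
        reached = error_rows >= limit
    return "skip_file" if reached else "load"


if __name__ == "__main__":
    for option in ("CONTINUE", "SKIP_FILE", "SKIP_FILE_10", "SKIP_FILE_10%", "ABORT_STATEMENT"):
        print(option, load_outcome(option, error_rows=5, total_rows=100))
```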
Copy Options – SIZE_LIMIT

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
SIZE_LIMIT = <num>

SIZE_LIMIT = <num>    Specifies max. size of data loaded; the next file will be loaded until <num> (in bytes) is exceeded
DEFAULT: null

Copy Options – SIZE_LIMIT

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
SIZE_LIMIT = 25000000

Limit: 25 MB

File size    Already loaded    Load next file?
10MB         0 MB              ✓
10MB         10 MB             ✓
10MB         20 MB             ✓
10MB         30 MB             X

⇒ 3 files loaded with 30MB
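
The 25 MB example can be replayed with a short simulation. This is a sketch of the rule as stated on the slide (keep loading whole files until the limit is exceeded); the function name `files_loaded` is hypothetical, and the behavior when the running total equals the limit exactly is an assumption.

```python
from typing import List, Optional


def files_loaded(file_sizes: List[int], size_limit: Optional[int] = None) -> List[int]:
    """Return the sizes of the files that get loaded, in order.

    A file is loaded as long as the total loaded so far has not exceeded
    size_limit (in bytes). None means no limit.
    """
    loaded = []
    total = 0
    for size in file_sizes:
        if size_limit is not None and total > size_limit:
            break  # limit already exceeded: stop before this file
        loaded.append(size)
        total += size
    return loaded


if __name__ == "__main__":
    staged = [10_000_000] * 4  # four 10 MB files
    result = files_loaded(staged, size_limit=25_000_000)
    print(len(result), sum(result))
```

The third file is still loaded because only 20 MB had been loaded before it, which is why the total ends up above the limit.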
Copy Options – PURGE

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
PURGE = TRUE | FALSE

TRUE               Remove files (if possible) from stage after successful load
FALSE (DEFAULT)    Leave files in stage

Copy Options – MATCH_BY_COLUMN_NAME

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE

Load semi-structured data columns by matching field names

CASE_SENSITIVE      Matches case-sensitive
CASE_INSENSITIVE    Matches case-insensitive
NONE (DEFAULT)      Loads data in variant column or based on COPY statement
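
The difference between the two matching modes can be illustrated on a single record. This is a simplified sketch: the function name `match_columns` is hypothetical, and filling unmatched table columns with `None` (standing in for NULL) is an assumption of the example.

```python
def match_columns(record: dict, table_columns: list, mode: str = "CASE_SENSITIVE") -> dict:
    """Map fields of a semi-structured record onto table columns by name."""
    if mode == "CASE_SENSITIVE":
        lookup = dict(record)
        key = lambda name: name
    elif mode == "CASE_INSENSITIVE":
        lookup = {field.lower(): value for field, value in record.items()}
        key = str.lower
    else:
        raise ValueError("NONE does no name matching; use the COPY statement instead")
    # Assumption: columns without a matching field receive None (NULL).
    return {column: lookup.get(key(column)) for column in table_columns}


if __name__ == "__main__":
    record = {"ID": 1, "name": "a"}
    columns = ["ID", "NAME"]
    print(match_columns(record, columns, "CASE_SENSITIVE"))
    print(match_columns(record, columns, "CASE_INSENSITIVE"))
```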


Copy Options – ENFORCE_LENGTH

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
ENFORCE_LENGTH = TRUE | FALSE

TRUE (DEFAULT)     Produces error if loaded string length is exceeded
FALSE              Strings are automatically truncated

Copy Options – TRUNCATECOLUMNS

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
TRUNCATECOLUMNS = TRUE | FALSE

TRUE               Strings are automatically truncated
FALSE (DEFAULT)    Produces error if loaded string length is exceeded
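
ENFORCE_LENGTH and TRUNCATECOLUMNS describe the same behavior with opposite logic, which a small sketch makes explicit. The function name `load_string` is hypothetical, and raising `ValueError` stands in for the load error.

```python
def load_string(value: str, max_length: int, truncate_columns: bool = False) -> str:
    """Apply the target column length to one string value.

    truncate_columns=True  corresponds to TRUNCATECOLUMNS = TRUE  / ENFORCE_LENGTH = FALSE
    truncate_columns=False corresponds to TRUNCATECOLUMNS = FALSE / ENFORCE_LENGTH = TRUE
    """
    if len(value) <= max_length:
        return value
    if truncate_columns:
        return value[:max_length]  # silently cut to the column length
    raise ValueError(f"string of length {len(value)} exceeds column length {max_length}")


if __name__ == "__main__":
    print(load_string("snowflake", 4, truncate_columns=True))
    try:
        load_string("snowflake", 4)
    except ValueError as error:
        print("error:", error)
```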


Copy Options – FORCE

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FORCE = TRUE | FALSE

TRUE               Load files even if they have been loaded before
FALSE (DEFAULT)    Don't load files that have been loaded before
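
The effect of FORCE can be sketched as a filter over the staged files. This is a deliberately simplified model that identifies files by name only; the function name `files_to_load` and the example file names are hypothetical.

```python
def files_to_load(staged_files: list, already_loaded: set, force: bool = False) -> list:
    """Return the staged files a COPY would pick up.

    Simplification: a file counts as "loaded before" if its name is in
    already_loaded.
    """
    if force:
        return list(staged_files)  # FORCE = TRUE: reload everything
    return [name for name in staged_files if name not in already_loaded]


if __name__ == "__main__":
    staged = ["sales_01.csv", "sales_02.csv", "sales_03.csv"]
    history = {"sales_01.csv", "sales_02.csv"}
    print(files_to_load(staged, history))
    print(files_to_load(staged, history, force=True))
```

Reloading with FORCE = TRUE can duplicate rows in the target table, since the same files are loaded again.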


Copy Options – LOAD_UNCERTAIN_FILES

COPY INTO <table_name>
FROM @externalStage
FILES = ( '<file_name>' ,'<file_name2>')
LOAD_UNCERTAIN_FILES = TRUE | FALSE

TRUE               Load file if load status is unknown
FALSE (DEFAULT)    Don't load file if load status is unknown


Copy Options

CopyOption              Description                                                                   Values                                                                     Default
ON_ERROR                Specifies the error handling for the load operation                           CONTINUE | SKIP_FILE | SKIP_FILE_num | 'SKIP_FILE_num%' | ABORT_STATEMENT  ABORT_STATEMENT
SIZE_LIMIT              Specifies the maximum size (in bytes) of data to be loaded                    <num>                                                                      null (no limit)
PURGE                   Remove files after successful load                                            TRUE | FALSE                                                               FALSE
RETURN_FAILED_ONLY      Return only files that have failed to load                                    TRUE | FALSE                                                               FALSE
MATCH_BY_COLUMN_NAME    Load semi-structured data into columns matching the column names              CASE_SENSITIVE | CASE_INSENSITIVE | NONE                                   NONE
ENFORCE_LENGTH          Produce an error if text strings exceed the target column length              TRUE | FALSE                                                               TRUE
TRUNCATECOLUMNS         Truncate text strings that exceed the target column length                    TRUE | FALSE                                                               FALSE
FORCE                   Load files even if loaded before                                              TRUE | FALSE                                                               FALSE
LOAD_UNCERTAIN_FILES    Load files even if load status unknown                                        TRUE | FALSE                                                               FALSE


VALIDATION_MODE



VALIDATION_MODE
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS

VALIDATE THE DATA INSTEAD OF LOADING

RETURN_n_ROWS e.g. RETURN_5_ROWS: Validates <n> rows (returns error or rows)

RETURN_ERRORS Returns all errors across all files



VALIDATE



VALIDATE

VALIDATE( <table_name> , JOB_ID => { '<query_id>' | '_last' } )

Validates the files loaded in a past execution of the COPY INTO <table> command and returns all the errors encountered during the load

Doesn't return anything if ON_ERROR = ABORT_STATEMENT

SELECT * FROM TABLE(VALIDATE( ORDERS , JOB_ID => '_last' ))



Unloading



Unloading

[Diagram: Tables -> Unload with COPY INTO location -> Stage -> Download; files in an internal stage are downloaded with GET]


Unloading
COPY INTO @stage_name
FROM table_name

External & Internal Stages
Download from an external stage via the cloud provider's interface, from an internal stage via the GET command:

GET @internal_stage file://<path_to_file>/<filename>

File formats:

Structured:
• CSV (Default), TSV, etc.

Semi-structured:
• JSON
• PARQUET
• XML


Unloading
COPY INTO @stage_name
FROM table_name

SINGLE = FALSE (DEFAULT) Split into multiple files

SINGLE = TRUE Everything in one file


Unloading
COPY INTO @stage_name
FROM table_name

MAX_FILE_SIZE = <num>
DEFAULT = 16777216 (16 MB)
Can be increased to 5 GB


Unloading
COPY INTO @stage_name
FROM (SELECT id, name, start_date
FROM table_name)

Unload using SELECT SELECT statement can be used for the COPY statement

Transformations Transformations can be used



Unloading
COPY INTO @stage_name
FROM (SELECT id, name, start_date
FROM table_name)
FILE_FORMAT=(TYPE=JSON)

FILE_FORMAT File format can be set in the COPY command



Unloading
COPY INTO @stage_name
FROM (SELECT id, name, start_date
FROM table_name)
FILE_FORMAT=(TYPE=CSV)
HEADER = TRUE

FILE_FORMAT File format can be set in the COPY command

HEADER = TRUE Will include a header in the output files



Unloading
COPY INTO @stage_name/myfile
FROM (SELECT id, name, start_date
FROM table_name)
FILE_FORMAT=(TYPE=CSV)
HEADER = TRUE

Prefix If not specified, 'data_' is used as the prefix

Suffix A suffix is added to ensure each file name is unique

e.g. data_0_1_0.csv.gz

e.g. myfile_0_0_1.csv.gz


Data Transformations
Domain 5.0: Data Transformations
Transformations & Functions


Data Transformations
Data can be transformed when loading data

Simplify ETL pipeline

Supported                      Not Supported
Column reordering              FLATTEN function
Cast data types                Aggregation functions
Remove columns                 GROUP BY
Truncate (TRUNCATECOLUMNS)     Filter with WHERE
Subset of SQL functions        JOINs


Functions
Supports most standard SQL functions defined in SQL:1999 and
parts of SQL:2003 extensions

Scalar functions Returns one value per invocation (one value per row)

SELECT DAYNAME('2023-12-31')

SELECT DAYNAME("effective_date")
FROM LOAN_PAYMENT



Functions
Aggregate functions Mathematical calculations such as max & min across rows

SELECT MAX(amount)
FROM orders



Functions
Window functions Aggregate functions that operate on a subset of rows

SELECT ORDER_ID, SUBCATEGORY,
MAX(AMOUNT) OVER (PARTITION BY SUBCATEGORY)
FROM ORDERS
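
As a rough illustration of what the window function above computes, the following Python sketch attaches the per-subcategory maximum to every row without collapsing the rows, which is the difference to a plain aggregate. The sample rows and the helper name `max_over_partition` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the ORDERS table.
orders = [
    {"order_id": 1, "subcategory": "Chairs", "amount": 120},
    {"order_id": 2, "subcategory": "Chairs", "amount": 300},
    {"order_id": 3, "subcategory": "Tables", "amount": 80},
]


def max_over_partition(rows, partition_key, value_key):
    """Return one output row per input row, each carrying its partition's maximum."""
    partition_max = defaultdict(lambda: float("-inf"))
    for row in rows:
        key = row[partition_key]
        partition_max[key] = max(partition_max[key], row[value_key])
    return [
        (row["order_id"], row[partition_key], partition_max[row[partition_key]])
        for row in rows
    ]


result = max_over_partition(orders, "subcategory", "amount")
```

Three input rows produce three output rows; an aggregate with GROUP BY would return only one row per subcategory.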



Functions
Table functions Return a set of rows for each input row; used to obtain information about Snowflake features

SELECT * FROM TABLE(VALIDATE( ORDERS , JOB_ID => '_last' ))



Functions
System functions Control & information functions; usually SYSTEM$..., e.g. SYSTEM$CANCEL_ALL_QUERIES

SELECT SYSTEM$TYPEOF('abc');
Functions
UDFs + External functions Define your own functions (UDFs), or store & execute them outside of Snowflake (external functions)

https://docs.snowflake.com/en/sql-reference/intro-summary-operators-functions
Estimation functions



Estimating functions
The idea:
Exact calculations on very large tables can be very compute- and memory-intensive.

Mathematical algorithms can approximate the exact number. The approximation might be good enough and requires fewer resources.


Estimating functions
1 Number of Distinct Values: HLL()

2 Frequent Values: APPROX_TOP_K()

3 Percentile Values: APPROX_PERCENTILE()

4 Similarity of Two or More Sets: MINHASH() + APPROXIMATE_SIMILARITY()




Number of Distinct Values
The idea:
HyperLogLog (cardinality estimation
algorithm) is used to estimate the number
of distinct values.

Situation:
• Large input
• Average error is acceptable (1.62338%)

Instead of COUNT(DISTINCT (COLUMN1,…)) use:
HLL(COLUMN1,…) or APPROX_COUNT_DISTINCT(COLUMN1,…)


Number of Distinct Values
SELECT HLL(C_NAME) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER;

-1.244%

SELECT COUNT(DISTINCT (C_NAME)) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER;





Frequent Values
The idea:
Space-Saving algorithm is used to estimate
the most frequent values along with their
frequency.

Function:

APPROX_TOP_K (COLUMN)



Frequent Values
APPROX_TOP_K (COLUMN) k = 1

APPROX_TOP_K (COLUMN,<k>) k = No. of values whose frequency should be approximated

APPROX_TOP_K (COLUMN,<k>, <counters>) counters = Max. no. of distinct values that can be tracked

counters >> k

counters large ⇒ more accurate
Frequent Values
SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE_SALES 28.8B rows

SELECT SS_CUSTOMER_SK,COUNT(SS_CUSTOMER_SK)
FROM STORE_SALES
GROUP BY SS_CUSTOMER_SK

SELECT APPROX_TOP_K (SS_CUSTOMER_SK,5,20)
FROM STORE_SALES
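
To make the role of k and counters more concrete, here is a minimal Python sketch of the Space-Saving idea. It is a simplified illustration under stated assumptions, not Snowflake's implementation; the function names and the sample customer ids are made up.

```python
def space_saving(stream, counters):
    """Track at most `counters` distinct values and return {value: estimated_count}."""
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < counters:
            counts[item] = 1
        else:
            # All counters are in use: evict the value with the smallest count
            # and let the newcomer inherit that count plus one.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


def approx_top_k(stream, k=1, counters=10):
    """Return the k values with the highest estimated frequency."""
    counts = space_saving(stream, counters)
    return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))[:k]


# Hypothetical customer ids; 5 distinct values, so 5 counters track them exactly.
customer_ids = [7, 3, 7, 5, 7, 3, 9, 7, 3, 1]
top2 = approx_top_k(customer_ids, k=2, counters=5)
```

When there are fewer counters than distinct values, evictions start to inflate the counts, which is why counters should be much larger than k.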





Percentile Values

The idea:
t-Digest algorithm is used to estimate
percentile values.

Function:

APPROX_PERCENTILE (COLUMN,<percentile>) Returns the percentile value


Percentile Values
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000 1.5B rows

SELECT APPROX_PERCENTILE(O_TOTALPRICE,0.5)
FROM ORDERS;

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY O_TOTALPRICE)
FROM ORDERS;




Similarity of Two or More Sets
The idea:
Uses MinHash to estimate the similarity between two or more data sets.

Typically the Jaccard similarity coefficient is used to compare similarity:

J(A,B) = |A ∩ B| / |A ∪ B|

Computing it exactly is computationally expensive!

MinHash can estimate J(A,B) quickly.


Similarity of Two or More Sets
2-step-process

1 MINHASH
Returns a MinHash state

SELECT MINHASH(100, *) AS mh FROM mhtab1;
SELECT MINHASH(100, *) AS mh FROM mhtab2;

k is # of hash functions
the larger k, the more accurate

SELECT MINHASH(7, O_ORDERKEY) AS mh FROM ORDERS;

MH
{
"state": [
2200169610250,
22818457966550,
2507497641893,
12337014946743,
5083517324927,
1039435359430,
967271249674
],
"type": "minhash",
"version": 1
}




Similarity of Two or More Sets
2-step-process

1 MINHASH
Returns a MinHash state

SELECT MINHASH(100, *) AS mh FROM mhtab1;
SELECT MINHASH(100, *) AS mh FROM mhtab2;

k is # of hash functions
the larger k, the more accurate

2 APPROXIMATE_SIMILARITY
Use MinHash states to calculate similarity with APPROXIMATE_SIMILARITY().
Returns a value between 0 and 1.

SELECT APPROXIMATE_SIMILARITY(mh)
FROM
((SELECT MINHASH(100, *) AS mh FROM mhtab1)
UNION ALL
(SELECT MINHASH(100, *) AS mh FROM mhtab2));
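
The two steps can be sketched in Python as follows. This is a simplified illustration, not Snowflake's implementation: the seeded SHA-256 hash, the helper names and the two integer sets are assumptions, and the similarity function here compares exactly two states.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def _hash(value, seed):
    # One seeded hash function per seed (an assumption made for this sketch).
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(values, k):
    """Step 1: MinHash state = minimum hash value for each of k hash functions."""
    return [min(_hash(v, seed) for v in values) for seed in range(k)]


def approximate_similarity(state_a, state_b):
    """Step 2: share of hash functions on which both states agree (0 to 1)."""
    matches = sum(x == y for x, y in zip(state_a, state_b))
    return matches / len(state_a)


set_a = set(range(0, 100))
set_b = set(range(50, 150))
exact = jaccard(set_a, set_b)  # 50 shared values out of 150 distinct values
estimate = approximate_similarity(minhash(set_a, 100), minhash(set_b, 100))
```

The estimate only needs the two small states, not the full sets, and a larger k moves it closer to the exact coefficient on average.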





User-defined functions (UDFs)
Way of extending functionality with additional functions.

Supported languages:
o SQL
o Python
o Java
o JavaScript

create function add_two(n int)
returns int
as
$$
n+2
$$;

select add_two(3);


User-defined functions (UDFs)
Way of extending functionality with additional functions.

Supported languages:
o SQL
o Python
o Java
o JavaScript

create function add_two(n int)
returns int
language python
runtime_version = '3.8'
handler = 'add_two'
as
$$
def add_two(n):
    return n+2
$$;


User-defined functions



User-defined functions (UDFs)

Scalar functions Returns one output row per input row

Tabular functions Returns a tabular value for each input row

Securable schema-level object



External Functions



External functions
User-defined functions that are
stored and executed outside of Snowflake.

Calls code that is executed outside of Snowflake.

✓ No code is stored in function definition

✓ Reference third-party libraries, services and data

create external function my_az_funct(string_col VARCHAR)
returns variant
api_integration = azure_external_api_integration
as 'https://my-api-management-svc.azure-api.net/my-api-url/my_http_function'


External functions
User-defined functions that are
stored and executed outside of Snowflake.

Calls code that is executed outside of Snowflake.

✓ No code is stored in function definition

✓ Reference third-party libraries, services and data

✓ Remotely executed code ⇒ "remote service"

✓ Security-related information is stored in the API integration

✓ Schema-level object



External functions
User-defined functions that are
stored and executed outside of Snowflake.

Calls code that is executed outside of Snowflake.

Examples:
o AWS Lambda function
o Microsoft Azure function
o HTTPS server



Advantages & Limitations
Advantages:

✓ Additional languages including Go and C#
✓ Accessing 3rd-party libraries such as machine learning scoring libraries
✓ Can be called from Snowflake and from other software

Limitations:

• Must be scalar
• Slower performance (overhead + fewer optimizations)
• Not sharable



Stored Procedures



Stored Procedures
Way of extending functionality with additional functions.

Stored Procedure
• Typically performs database operations - usually administrative operations like DELETE, UPDATE or INSERT.
• Doesn't need to return a value
• Runs with caller's or owner's rights

UDF
• Typically calculates and returns a value.
• Needs to return a value
• No need to have access to the objects referenced in the function.


Stored Procedures
Way of extending functionality with additional functions.

Supported languages:
o Snowflake Scripting
(Snowflake SQL + procedural logic)

o JavaScript
o Snowpark API
(Python, Scala, Java)



Stored Procedures
Creating a procedure
Securable objects

create procedure find_min(n1 int, n2 int)


returns int
language sql
as
BEGIN
IF (n1 < n2)
THEN RETURN n1;
ELSE RETURN n2;
END IF;
END;



Stored Procedures
Calling a procedure

Result

call find_min(5, 7);



Stored Procedures
Define one or multiple operations.

create procedure update_test_table()


returns varchar
language sql
as
BEGIN
UPDATE manage_db.public.test1
SET test_col = 3;
END;



Stored Procedures
Define one or multiple operations.

create procedure update_test_table()


returns varchar
language sql
as
BEGIN
UPDATE manage_db.public.test1
SET test_col = 3;
UPDATE manage_db.public.test1
SET test_col2 = 4;
END;



Stored Procedures
If argument is used in SQL
statement ⇒ use :argument

create procedure update_test_table(new_value varchar)


returns int
language sql
as
BEGIN
UPDATE manage_db.public.test1
SET test_col = :new_value;
END;



Stored Procedures
If an argument refers to an object (e.g. a table name) in a SQL statement
⇒ use IDENTIFIER(:argument)

create procedure update_table(new_value varchar, table_name varchar)
returns varchar
language sql
as
BEGIN
UPDATE IDENTIFIER(:table_name)
SET test_col = :new_value;
RETURN 'Successfully updated table';
END;


Stored Procedures
Calling a stored procedure

call update_table('new_value','table_name');



Privileges
Runs either with caller's or owner's rights

Caller's rights
• Runs with the caller's privileges
• Can make use of user information (e.g. session variables)

Owner's rights
• Runs with the owner's privileges
• Delegation of administrative tasks


Stored Procedures
When creating, specify rights
⇒ use execute as caller/owner (DEFAULT: OWNER)

create procedure update_table(new_value varchar, table_name varchar)
returns int
language sql
execute as caller
as
BEGIN
UPDATE IDENTIFIER(:table_name)
SET test_col = :new_value;
END;


Secure UDFs &
Stored Procedures



Secure UDFs & Stored Procedures
DESC FUNCTION MANAGE_DB.PUBLIC.ADD_TWO(NUMBER);

Secure UDFs & Stored Procedures

✓ Hide certain information, e.g. definition

✓ Prevent users from seeing underlying data

CREATE SECURE FUNCTION <function_name> …



Secure UDFs & Stored Procedures
Trade-off:

Reduced query performance ⇔ Security

Use-case:

• Consider purpose of UDF / Stored Procedure

• Do NOT make it secure if it is defined just for convenience

• Make it secure when the data is sensitive enough


Sequences
CREATE SEQUENCE my_seq
START = 1
INCREMENT = 1;

START and INCREMENT both default to 1.

SELECT my_seq.nextval;

SELECT my_seq.nextval;

✓ Are securable objects

✓ Typically created for default values


Sequences
CREATE TABLE sequence_test(
id int DEFAULT my_seq.nextval,
first_name varchar);

✓ Not guaranteed to be gap-free

INSERT INTO sequence_test(first_name)
VALUES
('Maria'),('Frank');


Semi-structured data



Semi-structured data
What is semi-structured data?
▪ no fixed schema
▪ contains tags/labels and a nested structure

What is structured data?
Data has a well-defined structure

JSON

{
"courses":[
{
"topic":"Snowflake",
"level":"All levels"
},
{
"topic":"SQL",
"language":["English","German"]
},
{
"topic":"Azure",
"level":"Beginner"
}
]
}


Semi-structured Data Types
Supported formats:

▪ JSON

▪ XML

▪ PARQUET

▪ ORC

▪ Avro



Data Types in Snowflake
OBJECT
Unordered set of name-value pairs.

{
"topic":"Snowflake",
"level":"All levels"
}

ARRAY
Consists of 0 or more pieces of data.

["USA", "India", "Canada"]


Data Types in Snowflake
VARIANT
• Can store values of any other data type including ARRAY and OBJECT.
• Suitable to store and query semi-structured data.

Use-cases
• Explicitly define hierarchy of ARRAYs and OBJECTs
• Let Snowflake convert semi-structured data into hierarchy of ARRAY, OBJECT, and VARIANT data stored into VARIANT

CREATE FILE FORMAT my_fileformat
TYPE = {JSON | AVRO | XML | PARQUET | ORC}


Data Types in Snowflake
VARIANT
• Null values in the source data are stored as JSON null (or "VARIANT null"), i.e. the string "null", not as SQL NULL
• Non-native values (e.g. dates) are stored as strings
• Maximum length is 16 MB (uncompressed, per row)


Loading semi-structured data
VARIANT "Native support for semi-structured data"

Load the data as it is and transform it later. ELT approach

Extract & Load raw data Analyze / Parse Flatten

VARIANT

COPY INTO …



Query
Semi-structured data



Query semi-structured data
To access elements of VARIANT column use :

SELECT raw_column:courses
FROM variant_table

{"courses": {
"snowflake": {
"module1": {
"topic": "Introduction",
"difficulty": "All levels"
},
"module2": {
"topic": "Loading data",
"formats": [
"video lectures"
],
"difficulty": "All levels"
}
},
"azure": {
"module1": {
"topic": "Introduction",
"formats": [
"video lectures",
"hands-on",
"quizzes"]
}
}
}
}


Query semi-structured data
To access elements of VARIANT column use :

SELECT $1:courses
FROM variant_table


Query semi-structured data
To access elements of VARIANT column use :
Use . to access subsequent elements

SELECT column_name:courses.snowflake
FROM variant_table


Query semi-structured data
To access elements of VARIANT column use :
Use [] to access array elements

SELECT column_name:courses.azure.module1.formats[1]
FROM variant_table


Query Semi-structured data
To access elements of VARIANT column use :
Use [] to access array elements
Use :: to cast the result to a data type

SELECT column_name:courses.azure.module1.formats[1]::VARCHAR
FROM variant_table
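
The path notation can be mirrored in Python to check which element a path selects. The sketch below is only an analogy under stated assumptions: `get_path` is a hypothetical helper, and returning None for a missing step loosely stands in for a path that yields no value.

```python
import json

# The example document from the slides.
raw = json.loads("""
{"courses": {
  "snowflake": {
    "module1": {"topic": "Introduction", "difficulty": "All levels"},
    "module2": {"topic": "Loading data", "formats": ["video lectures"],
                "difficulty": "All levels"}
  },
  "azure": {
    "module1": {"topic": "Introduction",
                "formats": ["video lectures", "hands-on", "quizzes"]}
  }
}}
""")


def get_path(document, *steps):
    """Follow keys (str) and array positions (int); return None if a step is missing."""
    current = document
    for step in steps:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# Counterpart of column_name:courses.azure.module1.formats[1]
second_format = get_path(raw, "courses", "azure", "module1", "formats", 1)
```

Array positions start at 0, so index 1 selects the second entry of the formats array.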



Flatten hierarchical data



Flatten data
FLATTEN(INPUT => <expression> )
Used to convert semi-structured data into relational table view

SELECT * FROM TABLE(FLATTEN(INPUT => [2,4,6]))

Cannot be used in COPY command!

Produces a lateral view:
Contains references to other tables in FROM clause
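
A rough Python analogy of flattening, assuming two hypothetical helpers and showing only an INDEX and a VALUE field per output row:

```python
def flatten(values):
    """Yield one row per array element with its position and value."""
    for index, value in enumerate(values):
        yield {"INDEX": index, "VALUE": value}


def lateral_flatten(rows, array_key):
    """Pair each source row with every element of its array column (lateral view)."""
    for row in rows:
        other_columns = {k: v for k, v in row.items() if k != array_key}
        for element in flatten(row[array_key]):
            yield {**other_columns, **element}


rows = list(flatten([2, 4, 6]))

# Hypothetical source row with an array column.
courses = [{"topic": "Azure", "formats": ["video lectures", "hands-on"]}]
expanded = list(lateral_flatten(courses, "formats"))
```

One input row with a two-element array becomes two output rows, each repeating the other columns of the source row.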



Unstructured data



Unstructured data support
What is unstructured data?

▪ Does not fit into any pre-defined data model


o video files

o audio files

o documents

Snowflake supports:
Internal & External Stages

o Access files through URL in cloud storage.


o Share file access URLs.



Unstructured data support
Stage URL types

Scoped URL
• Encoded URL with temporary access to a file (no access to the stage)
• Expires when the persisted query result period ends (i.e. results cache expires) - currently 24 hours.
• Returned by calling the BUILD_SCOPED_FILE_URL function

File URL
• Permits prolonged access to a specified file
• Does not expire
• Returned by calling the BUILD_STAGE_FILE_URL function

Pre-signed URL
• HTTPS URL used to access a file via a web browser
• The expiration time for the access token is configurable.
• Returned by calling the GET_PRESIGNED_URL function


SQL File Functions
Scoped URL: returned by calling the BUILD_SCOPED_FILE_URL function
File URL: returned by calling the BUILD_STAGE_FILE_URL function
Pre-signed URL: returned by calling the GET_PRESIGNED_URL function

SELECT BUILD_SCOPED_FILE_URL (@stage_azure,'Logo.png')

SELECT BUILD_STAGE_FILE_URL (@stage_azure,'Logo.png')

SELECT GET_PRESIGNED_URL (@stage_azure,'Logo.png',60)



Directory Tables



Directory Tables
What is a directory table?

Stores metadata of staged files

o Layered on a stage

o Can be queried with sufficient privileges (on stage)

o Retrieve file URLs to access files

Needs to be enabled for stages

CREATE STAGE stage_azure
URL = <'url'>
STORAGE_INTEGRATION = integration
DIRECTORY = (ENABLE = TRUE)

DEFAULT: FALSE

ALTER STAGE stage_azure
SET DIRECTORY = (ENABLE = TRUE)


Directory Tables
CREATE STAGE stage_azure
URL = <'url'>
STORAGE_INTEGRATION = integration
DIRECTORY = (ENABLE = TRUE)

SELECT * FROM DIRECTORY(@stage_azure)

Scoped URL: BUILD_SCOPED_FILE_URL function

Manual refresh:
ALTER STAGE stage_azure REFRESH;

Automatic refresh using event notification


Data Sampling



Data Sampling

Why Sampling?

[Diagram: SAMPLE reduces a 10 TB table to a 500 GB sample]


Data Sampling

Why Sampling?

- Use-cases: Query development, data analysis etc.

- Faster & more cost efficient (less compute resources)



Data Sampling Methods

ROW or BERNOULLI method

SELECT * FROM table
SAMPLE ROW (<p>) SEED(15)

BLOCK or SYSTEM method

SELECT * FROM table
SAMPLE SYSTEM (<p>) SEED(15)

<p> = Percentage of rows
SEED = Reproducible results


Data Sampling Methods
ROW or BERNOULLI method
• Every row is chosen with percentage p
• More "randomness"
• Smaller tables

BLOCK or SYSTEM method
• Every block is chosen with percentage p
• More effective processing
• Larger tables
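
The difference between the two methods, and the effect of a seed, can be sketched in Python. This is an illustration only: the block size of 100 rows and the helper names are assumptions, not how Snowflake organises its storage.

```python
import random


def sample_rows(rows, p, seed=None):
    """ROW / BERNOULLI: keep each row independently with a probability of p percent."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p / 100]


def sample_blocks(blocks, p, seed=None):
    """BLOCK / SYSTEM: keep or drop whole blocks with a probability of p percent."""
    rng = random.Random(seed)
    kept = [block for block in blocks if rng.random() < p / 100]
    return [row for block in kept for row in block]


rows = list(range(1000))
# Hypothetical storage layout: 10 blocks of 100 rows each.
blocks = [rows[i:i + 100] for i in range(0, 1000, 100)]

row_sample = sample_rows(rows, 10, seed=15)
block_sample = sample_blocks(blocks, 10, seed=15)
```

Row sampling makes one random decision per row, block sampling only one per block, which is cheaper but less random. Using the same seed again returns the same sample.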



Tasks



Tasks
Used to schedule execution of SQL statements / stored procedures

Often combined with streams to set up continuous ETL workflows

CREATE TASK my_task
WAREHOUSE = my_wh
SCHEDULE = '15 MINUTE'
AS
INSERT INTO my_table(time_col) VALUES(CURRENT_TIMESTAMP);

ALTER TASK my_task RESUME;
ALTER TASK my_task SUSPEND;

• Compute: user-managed warehouse (WAREHOUSE = …) or Snowflake-managed compute
• Runs using the privileges of the task OWNER
• Schema-level object
• Can be cloned

Required privileges:
• EXECUTE MANAGED TASK on Account
• CREATE TASK on Schema
• USAGE on Warehouse


Directed Acyclic Graph (DAG)
[Diagram: Root task at the top, Task A and Task B below it, Task C, Task D, Task E and Task F on the lowest level]

Limited to
• 1000 tasks in total
• 100 child tasks


Tree of Tasks
Limited to
• 1000 tasks in total
• 100 child tasks

CREATE TASK my_task
WAREHOUSE = my_wh
AFTER my_task_a
AS
…;

[Diagram: Root task at the top, Task A and Task B below it, Task C, Task D, Task E and Task F on the lowest level]
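
The AFTER clause defines the predecessor of a task, so a tree of tasks can be read as a dependency graph. The Python sketch below derives one valid run order with the standard library; which child belongs to which parent is an assumption, since the slide only shows the three levels.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of predecessor tasks named in its AFTER clause.
# The assignment of Task C to F to Task A and Task B is assumed for illustration.
after = {
    "root": set(),
    "task_a": {"root"},
    "task_b": {"root"},
    "task_c": {"task_a"},
    "task_d": {"task_a"},
    "task_e": {"task_b"},
    "task_f": {"task_b"},
}

# A valid execution order: every task appears after its predecessor.
run_order = list(TopologicalSorter(after).static_order())
```

Only the root task carries a schedule; child tasks start once their predecessor has finished.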



Streams
Record (DML-)changes made to a table

[Diagram: data sources (sales data, HR data) -> ETL]

Schema-level object

Can be cloned


Streams



Streams

[Diagram: DELETE, INSERT and UPDATE operations on a table are captured by the stream object]

Records (DML-)changes made to a table

This process is called change data capture (CDC)




Streams

Table -> Stream object

Metadata columns of the stream:
METADATA$ACTION
METADATA$ISUPDATE
METADATA$ROW_ID


Streams

Create Stream

CREATE STREAM my_stream
ON TABLE my_table

We can query from stream

SELECT * FROM my_stream




Streams
Consuming a stream

Using the stream in a DML statement (e.g. INSERT) empties its records
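
A toy Python model of this behaviour, assuming a stream is just an offset into a table's change log. The class and its method names are hypothetical and only illustrate the difference between querying and consuming.

```python
class Stream:
    """Toy change-tracking stream: remembers an offset into a table's change log."""

    def __init__(self, change_log):
        self._log = change_log
        self._offset = len(change_log)  # a new stream starts without pending changes

    def has_data(self):
        return self._offset < len(self._log)

    def read(self):
        """Like SELECT on the stream: shows pending changes, offset stays put."""
        return self._log[self._offset:]

    def consume(self, target):
        """Like using the stream in a DML statement: copy changes, advance the offset."""
        target.extend(self.read())
        self._offset = len(self._log)


change_log = []
stream = Stream(change_log)
change_log.append({"METADATA$ACTION": "INSERT", "id": 1})
change_log.append({"METADATA$ACTION": "INSERT", "id": 2})

target_table = []
pending_before = len(stream.read())
stream.consume(target_table)
```

Reading leaves the pending records in place, while consuming moves the offset forward so the stream appears empty afterwards.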





Types of streams
STANDARD
• Records: INSERT, UPDATE, DELETE
• Supported on: Tables, Directory tables, Views

APPEND-ONLY
• Records: INSERT
• Supported on: Standard tables, Directory tables, Views

INSERT-ONLY
• Records: INSERT
• Supported on: External tables only


Staleness
Stream becomes stale when offset is outside the data retention period of source table (DATA_RETENTION_TIME_IN_DAYS)

Unconsumed change records won't be accessible anymore.

How frequently should a stream be consumed?
• DESCRIBE STREAM or SHOW STREAMS command
• STALE_AFTER: column indicating when the stream is predicted to become stale

Stream extends retention to 14 days (default), regardless of Snowflake edition
MAX_DATA_EXTENSION_TIME_IN_DAYS (DEFAULT = 14)




Combining Streams & Tasks

CREATE TASK my_task
WAREHOUSE = my_wh
SCHEDULE = '15 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('MY_STREAM')
AS
INSERT INTO my_table(time_col) VALUES(CURRENT_TIMESTAMP);


Additional Tools, Drivers & Connectors
Domain 1.0: Snowflake Data Cloud Features & Architecture
Transformations & Functions


Connectors & Drivers
WebUI: Snowsight
Command Line Tool: SnowSQL
Drivers: Go, JDBC, .NET, Node.js, ODBC, PHP PDO
Connectors: Python, Kafka, Spark


Partner Connect
Create trial accounts with partners and integrate them with Snowflake

Data Integration: Informatica, Talend, Matillion, Stitch, Qlik
ML & Data Science: Dataiku, Alteryx, DataRobot
CI/CD: SqlDBM, DataOps
Security & Governance: Immuta, Hunters, Alation
Business Intelligence: Sisense, Domo, Sigma


Snowflake Scripting



Snowflake Scripting
Extension to Snowflake SQL with added support for procedural logic

Procedural code:
• variables
• cursors
• if/case
• resultsets
• loops

Where it is used:
• Most commonly in stored procedures
• Outside of stored procedures


Snowflake Scripting
Extension to Snowflake SQL with added support for procedural logic.

if/case: IF/ELSE, CASE
loops: FOR, REPEAT, WHILE, LOOP
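A small sketch of how a counter-based FOR loop and an IF statement might be combined in an anonymous block; the variable names are illustrative. The block adds up the even numbers from 1 to 5 and returns 6.

```sql
DECLARE
  total INTEGER DEFAULT 0;
BEGIN
  -- Counter-based FOR loop over 1..5
  FOR i IN 1 TO 5 DO
    -- Only add even values of the counter
    IF (MOD(i, 2) = 0) THEN
      total := total + i;
    END IF;
  END FOR;
  RETURN total;
END;
```

In SnowSQL or the Classic UI the same block would need to be wrapped in EXECUTE IMMEDIATE $$ … $$.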



Snowflake Scripting
How to write procedural code in a block

A block consists of sections:

DECLARE
  <variable & cursor declarations>
BEGIN
  <Scripting and SQL statements>
  …
EXCEPTION
  <Handling exceptions>
END;


Snowflake Scripting
How to write procedural code in a block

The DECLARE and EXCEPTION sections of a block are optional; only BEGIN … END; is required.
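To illustrate the block sections together, here is a sketch that declares a custom exception, raises it and handles it. The exception name, number and message are made up for the example.

```sql
DECLARE
  -- Custom exception with a hypothetical number and message
  my_exception EXCEPTION (-20002, 'Raised my_exception.');
BEGIN
  RAISE my_exception;
EXCEPTION
  -- Handle the custom exception and return its message
  WHEN my_exception THEN
    RETURN 'Caught: ' || SQLERRM;
END;
```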



Snowflake Scripting
How to write procedural code in a block

Object created in block can be


used outside the block

BEGIN
Minimizing confusion CREATE TABLE employee (id INTEGER,…);
CREATE TABLE store (id INTEGER, …); block
END;



Snowflake Scripting
How to write procedural code in a block

Variables created in a block can be used only inside the block.

Snowsight:

CREATE PROCEDURE calc_area()
RETURNS float
LANGUAGE SQL
AS
DECLARE
  length_a float;
  area float;
BEGIN
  length_a := 4;
  area := length_a * length_a;
  RETURN area;
END;


Snowflake Scripting
How to write procedural code in a block

Variables created in a block can be used only inside the block.

Classic UI and SnowSQL (body wrapped in $$ delimiters):

CREATE PROCEDURE calc_area()
RETURNS float
LANGUAGE SQL
AS
$$
DECLARE
  length_a float;
  area float;
BEGIN
  length_a := 4;
  area := length_a * length_a;
  RETURN area;
END;
$$


Snowpark



Snowpark
Snowpark API provides support for three programming languages

Python code → converted to SQL → Snowflake

Build applications and query data from outside the system. No need to move data!
Process at scale with the serverless Snowflake engine.


Snowpark
Snowpark API provides support for three programming languages

Python Java Scala



Snowpark

Lazy evaluation Expression is not evaluated until it is needed

Pushdown Code is pushed down to Snowflake and executed there

UDFs inline Created functions can be executed in UDFs



Data Protection
Domain 6.0:
Data Protection and Data Sharing
Time Travel



Data Protection Lifecycle
Access and query data etc.
Current
Data Storage SELECT * FROM table



Data Protection Lifecycle
Drop database or table accidentally? Truncate or update table accidentally?
Current
Data Storage
DROP DATABASE prod_db; TRUNCATE TABLE prod_table;

Time Travel enables accessing historical data.



Time Travel

What is possible with Time Travel?


- Query deleted or updated data

- Restore tables, schemas and databases that have been dropped

- Create clones of tables, schemas and databases from a previous state


Time Travel SQL
Query historic data within retention period.

1 TIMESTAMP
SELECT * FROM table AT (TIMESTAMP => timestamp)

2 OFFSET
SELECT * FROM table AT (OFFSET => -10*60)

3 QUERY
SELECT * FROM table BEFORE (STATEMENT => query_id)
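The three variants could be written out as follows. The table name, the timestamp value and the query ID placeholder are assumptions for illustration only.

```sql
-- 1. State of the table at a specific point in time (string cast to a timestamp)
SELECT * FROM my_table
AT (TIMESTAMP => '2024-05-01 08:00:00'::TIMESTAMP_LTZ);

-- 2. State of the table 10 minutes ago (offset in seconds)
SELECT * FROM my_table
AT (OFFSET => -10*60);

-- 3. State of the table just before a given statement ran
--    (replace the placeholder with a real query ID)
SELECT * FROM my_table
BEFORE (STATEMENT => '<query_id>');
```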



Time Travel SQL
Recover objects that have been dropped within retention period.

1 Table
UNDROP TABLE table_name;

2 Schema
UNDROP SCHEMA schema_name;

3 Database
UNDROP DATABASE database_name;


Considerations
UNDROP fails if an object with the same name already exists.

OWNERSHIP privileges are needed for an object to be restored.
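One way to handle the name conflict, sketched with placeholder table names: rename the object that currently holds the name, then restore the dropped one.

```sql
-- A new table named my_table already exists, so UNDROP would fail.
-- Move the current table out of the way first.
ALTER TABLE my_table RENAME TO my_table_current;

-- Now the dropped version can be restored under its original name.
UNDROP TABLE my_table;
```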



Retention period



Retention period
Number of days for which historical data is preserved and Time Travel can be applied.

Configurable for table, schema, database and account, e.g. DATA_RETENTION_TIME_IN_DAYS = 2

DEFAULT = 1 (for all accounts)

Retention period of 0 "disables" Time Travel.


Retention period

Create a table that overrides the default retention period:

CREATE TABLE table_name (
  column1 int,
  column2 varchar)
DATA_RETENTION_TIME_IN_DAYS = 0;

Alter a table's retention period:

ALTER TABLE table_name
SET DATA_RETENTION_TIME_IN_DAYS = 0;

Alter the account's default retention period:

ALTER ACCOUNT SET
DATA_RETENTION_TIME_IN_DAYS = 2;

Set a minimum retention period:

ALTER ACCOUNT SET
MIN_DATA_RETENTION_TIME_IN_DAYS = 2;
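To verify which retention period is in effect, a check along these lines may help; my_table is a placeholder.

```sql
-- Effective retention period for one table (shows value, default and level)
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE my_table;

-- Account-level default
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN ACCOUNT;

-- Remove the table-level override so the table inherits again
ALTER TABLE my_table UNSET DATA_RETENTION_TIME_IN_DAYS;
```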



Retention period

Standard: Time Travel up to 1 day
Enterprise: Time Travel up to 90 days
Business Critical: Time Travel up to 90 days
Virtual Private: Time Travel up to 90 days


Fail Safe



Continuous Data Protection Lifecycle

Current Data Storage (currently available)
✓ Access and query data etc.

Time Travel (0 – 90 days)
✓ SELECT … AT | BEFORE
✓ UNDROP

Disaster?


Continuous Data Protection Lifecycle

Current Data Storage (currently available)
✓ Access and query data etc.

Time Travel (0 – 90 days)
✓ SELECT … AT | BEFORE
✓ UNDROP

Fail Safe (permanent: 7 days, transient: 0 days)
✓ Recovery beyond Time Travel
✓ Non-configurable
✓ No user operations/queries
✓ Restoring only by Snowflake Support


Fail Safe

✓ Protection of historical data in case of disaster

✓ No user interaction & recoverable only by Snowflake

✓ Non-configurable 7-day period for permanent tables

✓ Period starts immediately after Time Travel period ends

✓ Contributes to storage cost



Table Types



Table types

Permanent (CREATE TABLE)
  Permanent data
  Time Travel retention period: 0 – 90 days
  Fail Safe: ✓
  Auto-Drop: until dropped

Transient (CREATE TRANSIENT TABLE)
  Only for data that does not need to be protected
  Time Travel retention period: 0 or 1 day
  Fail Safe: ×
  Auto-Drop: until dropped

Temporary (CREATE TEMPORARY TABLE)
  Non-permanent data
  Time Travel retention period: 0 or 1 day
  Fail Safe: ×
  Auto-Drop: only in session


Managing Storage Cost: Table types

Permanent (CREATE TABLE)
  Permanent data
  Time Travel retention period: 0 – 90 days
  Fail Safe: ✓
  Persistence: until dropped

Transient (CREATE TRANSIENT TABLE)
  Large tables that do not need to be protected
  Time Travel retention period: 0 or 1 day
  Fail Safe: ×
  Persistence: until dropped

Temporary (CREATE TEMPORARY TABLE)
  Non-permanent data
  Time Travel retention period: 0 or 1 day
  Fail Safe: ×
  Persistence: only in session
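A sketch of how the non-permanent table types could be created to keep storage cost down. The table and column names are hypothetical.

```sql
-- Transient table without Time Travel: no Fail Safe, minimal storage overhead
CREATE TRANSIENT TABLE staging_events (
  id INT,
  payload VARIANT)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- Temporary table: exists only for the duration of the current session
CREATE TEMPORARY TABLE session_scratch (id INT);

-- The "kind" column of the output shows TABLE, TRANSIENT or TEMPORARY
SHOW TABLES LIKE 'STAGING_EVENTS';
```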



Table types notes
✓ Types are also available for other database objects: table, stage, schema, database
  If a database is transient, all included objects are transient.

✓ For temporary tables there are no naming conflicts with permanent/transient tables
  Other tables with the same name will be effectively hidden! Relevant for time travel.
  Not visible to other users!

✓ Not possible to change the type of an existing object


Zero-Copy Cloning

Cloning syntax:

CREATE TABLE table_new      (new table name)
CLONE table_source          (source for clone)

Objects that can be cloned: Database, Schema, Table, Stream, File Format, Sequence, Stage, Task, Pipe

Cloning a database or schema will clone all contained objects.
Stage: named internal stages are not cloned.
Pipe: cloned only for external stages.


Zero-Copy
Cloning
Domain 6.0:
Data Protection and Data Sharing
Zero-Copy Cloning



Zero-Copy Cloning

✓ Create copies of a database, a schema or a table

Cloning is a metadata operation in the Cloud Service Layer: the copy initially points to a snapshot of the original's data, so SELECT * FROM table_copy reads the same storage. Updates and new data in the copy are stored separately.


Zero-Copy Cloning
✓ Create copies of a database, a schema or a table

✓ Cloned object is independent from original table

✓ Easy to copy all metadata & improved storage management

✓ Creating backups for development purposes

✓ Typically combined with time travel



How about privileges?

Cloning a database: privileges on the contained schemas and tables are inherited, but NOT privileges on the database itself!

Cloning a schema: privileges on the contained tables are inherited, but NOT privileges on the schema itself!

Privileges are only ever inherited by child objects, never by the cloned source object itself.


What privileges are needed?
Table: SELECT
Pipe, Stream, Task: OWNERSHIP
All other objects: USAGE


Additional Considerations
✓ Load history metadata is not copied
  Loaded data can be loaded again

✓ Cloning from a specific point in time is possible:

CREATE TABLE table_new
CLONE table_source
BEFORE (TIMESTAMP => 'timestamp')
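Two further cloning patterns, sketched with placeholder object names: cloning a whole database for development, and combining a clone with a Time Travel offset.

```sql
-- Clone an entire database, including its schemas and tables
CREATE DATABASE dev_db CLONE prod_db;

-- Clone a table as it existed one hour ago (offset in seconds)
CREATE TABLE orders_restored
CLONE orders
AT (OFFSET => -3600);
```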



Data Sharing



Data Sharing

Usually this can be a rather complicated process…

✓ Sharing without actually copying data

✓ Data is always up-to-date

✓ Compute paid by consumer



Data Sharing (Standard Edition)

Account 1 (Provider): Storage
Cloud Service Layer: data is synchronized
Account 2 (Consumer): Compute Resources; shared data is read-only and cannot be modified!


Data Sharing Standard Edition

Read-only

Account 1
Account 1 Provider & Consumer
Storage

Cloud Service Layer

Account 2
Account 2 Provider & Consumer
Compute Resources Read-only



Setting up share
1. Create share (ACCOUNTADMIN role or CREATE SHARE privilege required)

CREATE SHARE my_share;

2. Grant privileges to share

GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.my_schema TO SHARE my_share;
GRANT SELECT ON TABLE my_db.my_schema.my_table TO SHARE my_share;

3. Add consumer account(s)

ALTER SHARE my_share ADD ACCOUNTS = bl67131;

4. Import share (ACCOUNTADMIN role or IMPORT SHARE / CREATE DATABASE privileges required)

CREATE DATABASE my_db FROM SHARE <provider_account>.my_share;

5. Grant privileges
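On the consumer side, step 5 might look like the sketch below. The role name analyst_role is an assumption; my_db is the database created from the share.

```sql
-- List inbound and outbound shares visible to the account
SHOW SHARES;

-- Give a consumer role access to all objects of the imported database.
-- Privileges on shared databases are granted as a whole, not per object.
GRANT IMPORTED PRIVILEGES ON DATABASE my_db TO ROLE analyst_role;
```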
What can be shared
Tables, external tables, secure views, secure materialized views, secure UDFs

Share: Database → Schema → Objects, with privileges granted to the share and account(s) added
Best practice: Share: Database → Schema → Secure views, with privileges granted to the share and account(s) added


Data Sharing with Non-Snowflake Users

Provider Account (ACCOUNTADMIN): Storage and Compute Resources
Reader Account: created & managed by the provider for non-Snowflake users
Provider is responsible for all costs




Data Sharing Considerations
✓ Share becomes immediately visible once shared
  New objects added are immediately visible as well

✓ Each account can share and consume
  Even own share can be consumed

✓ Sharing across region / cloud provider must be enabled
  Done via replication

✓ Virtual Private Edition doesn't allow sharing
  Dedicated compute and metadata storage

✓ Marketplace: Find & import third-party datasets
  ACCOUNTADMIN role or IMPORT SHARE privileges required

✓ Data Exchange: Private hub for sharing data
  Members can be invited
  Needs to be enabled by reaching out to Snowflake Support


Sharing with Non-Snowflake users
New Reader Account: ✓ Independent instance with own URL & own compute resources
Share data: ✓ Share database & table
Create users: ✓ As administrator, create users & roles
Create database: ✓ In reader account, create database from share
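The first two steps could be sketched as follows on the provider side. The account name, admin name and password are placeholders, and the account locator has to be taken from the output of the CREATE statement.

```sql
-- Create the reader account (run as ACCOUNTADMIN in the provider account)
CREATE MANAGED ACCOUNT reader_acct
  ADMIN_NAME = reader_admin,
  ADMIN_PASSWORD = '<strong_password>',
  TYPE = READER;

-- List reader accounts together with their locator and URL
SHOW MANAGED ACCOUNTS;

-- Add the reader account to an existing share
ALTER SHARE my_share ADD ACCOUNTS = <reader_account_locator>;
```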



Database replication



Database replication
Replicates a database between accounts within the same organization.
Snowflake Standard feature. Must be enabled first.

Provider (Account 1, Region 1): Primary Database
Consumer (Account 2, Region 2): Secondary Database (read-only replica)

Works across regions and across cloud providers ("cross-region sharing").
Data and objects are synchronized periodically.


Database replication
Step 1: Enable replication for source account with ORGADMIN role

show organization accounts;

-- Enable replication for each source and target account in your organization
select system$global_account_set_parameter('<organization_name>.<account_name>',
'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');

Step 2: Promote a Local Database to Primary Database with ACCOUNTADMIN role

ALTER DATABASE my_db1 ENABLE REPLICATION TO ACCOUNTS myorg.account2, myorg.account3;

Step 3: Create replica in consumer account

CREATE DATABASE my_db1 AS REPLICA OF myorg.account1.my_db1;



Database replication
Step 4: Refresh database

ALTER DATABASE my_db1 REFRESH;

Ownership privileges are needed

A task can be scheduled with this command
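A sketch of such a scheduled refresh in the target account. The task name, warehouse and 60-minute interval are assumptions.

```sql
-- Refresh the secondary database once per hour
CREATE TASK refresh_my_db1
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  ALTER DATABASE my_db1 REFRESH;

-- Tasks are created suspended and must be resumed before they run
ALTER TASK refresh_my_db1 RESUME;
```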



Database replication
✓ Privileges must be granted separately (not inherited)

✓ All objects in database are replicated apart from:


Temporary tables Stages Pipes

✓ Data transfer costs apply according to cloud provider

✓ Compute costs apply

✓ Data is actually physically replicated


Account &
Security
Domain 2.0:
Account Access and Security
Access Control


Access Control
Two aspects

Discretionary Access Control (DAC)
  Each object has an owner
  Owner can grant access to that object

Role-based Access Control (RBAC)
  Privileges → Roles → Users


Access Control
Key Concepts

Securable object: Object to which access can be granted; access is denied unless granted
Privilege: Defined level of access
Role: Entity to which privileges are granted; will be assigned to users … or other roles
User: Identity associated with a person or program


Access Control

A role creates and owns a securable object.
Privileges on the object are granted to roles:

GRANT <privilege>
ON <object>
TO <role>

Roles are granted to users:

GRANT <role>
TO <user>


Roles Hierarchy
Role
ACCOUNTADMIN

SECURITYADMIN SYSADMIN

USERADMIN Custom Role 1 Custom Role 2

Custom Role 3

PUBLIC



Objects in Snowflake

Securable object

USAGE

USAGE

SELECT



Objects in Snowflake
✓ Every securable object is owned by one single role (OWNERSHIP privilege)
  • All privileges per default
  • Including GRANT and REVOKE
  • Owner is the active role in the session that created the object

✓ Ownership can be transferred


Roles



Roles

Privileges are granted to and revoked from roles (GRANT / REVOKE), e.g. SELECT on my_table:

GRANT <privilege>
ON <object>
TO <role>


Roles
✓ Roles are assigned to users
✓ Multiple roles can be assigned

GRANT <role>
TO <user>


Roles
Role 1 ✓ "Current role" in every session

User Role 2
⇒ Primary role



System-defined Roles
ACCOUNTADMIN, SECURITYADMIN, SYSADMIN, USERADMIN, PUBLIC

✓ Can't be dropped
✓ Privileges can be added but not revoked


Roles
ACCOUNTADMIN ✓ Roles can be assigned to roles

✓ Hierarchy of roles
SECURITYADMIN SYSADMIN
✓ Privileges are inherited
USERADMIN

PUBLIC



Roles
ACCOUNTADMIN ✓ Custom roles

✓ Best practice:
SECURITYADMIN SYSADMIN Assigned to SYSADMIN

USERADMIN Custom Role 1 Custom Role 2

Custom Role 3

PUBLIC



Roles
ORGADMIN
Manages actions on organizational level.
✓ Create accounts
✓ View all accounts
✓ View account usage information


Roles
ACCOUNTADMIN
Top-level role
✓ Should be granted to a limited number of users
✓ Contains SECURITYADMIN & SYSADMIN
✓ Can manage all objects in account
✓ Incl. shares and reader accounts
✓ Modify account-level parameters
✓ Manage billing & resource monitors


Roles
SECURITYADMIN
Manage any object grant globally
✓ MANAGE GRANTS privilege
✓ Create, monitor, and manage users & roles
✓ Inherits USERADMIN privileges


Roles
SYSADMIN
Create warehouses, databases & other objects
✓ All custom roles should be assigned to SYSADMIN
✓ Can grant privileges on warehouses, databases, and other objects


Roles
USERADMIN
Dedicated to user and role management
✓ CREATE USER & CREATE ROLE privileges
✓ Can manage users and roles that it owns


Roles
PUBLIC
Automatically granted to every user and role per default
✓ Used when no access control is needed
✓ Objects can be owned by PUBLIC but are then available to everyone


Roles
CUSTOM
✓ Created by USERADMIN or higher, or by any role with the CREATE ROLE privilege
✓ Should be assigned to SYSADMIN
  Otherwise, SYSADMIN won't be able to manage objects created by these roles
✓ Custom database roles can be created by the database owner
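A sketch of the recommended pattern for a custom role. The role, user and object names are placeholders, and it is assumed that SYSADMIN owns the database objects.

```sql
-- Create the custom role and attach it to the hierarchy below SYSADMIN
USE ROLE USERADMIN;
CREATE ROLE analyst;
GRANT ROLE analyst TO ROLE SYSADMIN;

-- Assign the role to a user
GRANT ROLE analyst TO USER my_user;

-- Grant object privileges (run by the role that owns the objects)
USE ROLE SYSADMIN;
GRANT USAGE ON DATABASE my_db TO ROLE analyst;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE analyst;
GRANT SELECT ON TABLE my_db.my_schema.my_table TO ROLE analyst;
```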



Privileges



Privileges
Define granular level of access

Privileges can be granted and revoked by the role with OWNERSHIP of the object or by a role with the global MANAGE GRANTS privilege.

GRANT <privilege>
ON <object>
TO <role>

REVOKE <privilege>
ON <object>
FROM <role>

Example: the privilege SELECT on my_table is granted to a role.


Privileges
Important privileges

Global privileges
CREATE SHARE: Enables a provider to create a share.
IMPORT SHARE: Enables a consumer to create a database from a share.
APPLY MASKING POLICY: Enables setting masking policies.


Privileges
Important privileges

Virtual Warehouse
MODIFY: Enables altering properties of a warehouse, e.g. resizing.
MONITOR: Enables viewing queries executed by the warehouse.
OPERATE: Enables changing the state of a warehouse (e.g. suspend and resume).
USAGE: Enables using the warehouse and executing queries.
OWNERSHIP: Full control over the warehouse.
ALL: All privileges apart from OWNERSHIP.
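For illustration, warehouse privileges might be granted and checked like this; my_wh and analyst are placeholder names.

```sql
-- Let the role run queries on the warehouse and suspend/resume it
GRANT USAGE, OPERATE ON WAREHOUSE my_wh TO ROLE analyst;

-- Let the role see queries executed on the warehouse
GRANT MONITOR ON WAREHOUSE my_wh TO ROLE analyst;

-- Take one privilege away again
REVOKE OPERATE ON WAREHOUSE my_wh FROM ROLE analyst;

-- Review which roles hold which privileges on the warehouse
SHOW GRANTS ON WAREHOUSE my_wh;
```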



Privileges
Important privileges

Databases
MODIFY: Enables altering properties and settings of a database.
MONITOR: Enables performing the DESCRIBE command.
USAGE: Enables using the database and executing the SHOW DATABASES command.
REFERENCE_USAGE: Enables using an object (shared secure view) to reference another object in a different database.
CREATE SCHEMA: Enables creating a schema in the database.
ALL: All privileges apart from OWNERSHIP.
OWNERSHIP: Full control over the database.


Privileges
Important privileges

Stages
READ: Enables operations that require reading from internal stages (e.g. GET, LIST, COPY INTO table); not applicable to external stages.
USAGE: Enables using an external stage; not applicable to internal stages.
WRITE: Enables writing to internal stages (PUT, REMOVE, COPY INTO location); not applicable to external stages.
ALL: All privileges apart from OWNERSHIP.
OWNERSHIP: Full control over the stage.


Privileges
Important privileges

Tables
SELECT: Using SELECT to query the table.
INSERT: Inserting values into the table and manually reclustering tables.
UPDATE: Using the UPDATE command on a table.
TRUNCATE: Using the TRUNCATE command on a table.
DELETE: Using the DELETE command on a table.
ALL: All privileges apart from OWNERSHIP.
OWNERSHIP: Full control over the table.


Multi-factor
authentication



Multi-Factor Authentication

Authentication is proving that you are who you say you are.

Multi-Factor Authentication (MFA) provides additional login security. Available in Standard Edition.

Powered by Duo Security but managed by Snowflake. No sign-up, only installation.


Multi-Factor Authentication

Per default enabled for accounts but requires users to enroll. Strongly recommended for ACCOUNTADMIN.

SECURITYADMIN (or ACCOUNTADMIN) can disable MFA for a user.

Fully supported by the web interface, SnowSQL, the Snowflake ODBC and JDBC drivers, and the Python Connector.


Multi-Factor Authentication

MFA token caching can reduce the number of prompts during authentication.

Needs to be enabled first.
MFA token is valid for up to 4 hours.

Supported with:
ODBC driver version 2.23.0 (or later).
JDBC driver version 3.12.16 (or later).
Python Connector for Snowflake version 2.3.7 (or later).
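Enabling token caching is an account-level setting. A minimal sketch, assuming it is run by an account administrator:

```sql
-- Allow supported clients to cache the MFA token
ALTER ACCOUNT SET ALLOW_CLIENT_MFA_CACHING = TRUE;

-- Check the current value
SHOW PARAMETERS LIKE 'ALLOW_CLIENT_MFA_CACHING' IN ACCOUNT;
```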



Federated Authentication
(SSO)



Federated Authentication (SSO)
Enables users to log in via SSO. Covers: ✓ Login ✓ Logout ✓ Time out due to inactivity

Federated environment:
Service provider (Snowflake)
External identity provider (IdP): maintaining credentials, authenticating users

✓ Most SAML 2.0-compliant vendors are supported as an IdP
✓ Native support for Okta and Microsoft AD FS


SSO-Login Workflow
Snowflake-initiated

1. User navigates to Snowflake WebUI.
2. Chooses login via configured IdP (e.g. Okta or AD FS).
3. Authenticates via IdP credentials (e.g. email and password).
4. External identity provider (IdP) sends a SAML response to Snowflake.
5. Snowflake opens a session.


SSO-Login Workflow
IdP-initiated

1. User navigates to IdP.
2. Authenticates with credentials.
3. Selects Snowflake as application.
4. External identity provider (IdP) sends a SAML response to Snowflake.
5. Snowflake opens a session.


SCIM support
Snowflake is compatible with SCIM 2.0

SCIM is an open standard for automating user provisioning.

Create user in IdP → user is provisioned in Snowflake


Key Pair Authentication



Key Pair Authentication
Enhanced security as an alternative to basic username/password.

Public key(s): one or two, assigned to the user in Snowflake
Private key: used when connecting via Snowflake clients (SnowSQL etc.)
Minimum: 2048-bit RSA key pair


Key Pair Authentication
Enhanced security as an alternative to basic username/password.

1. Generate private key
2. Generate public key
3. Store the keys locally
4. Assign public key to user

ALTER USER my_user SET
RSA_PUBLIC_KEY = 'FGKdojfeFdD…';

5. Configure client to use key pair authentication
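Because a user can hold two public keys, keys can be rotated without downtime. A sketch of that rotation, with the new key value left as a placeholder:

```sql
-- Assign the new public key to the second slot
ALTER USER my_user SET RSA_PUBLIC_KEY_2 = '<new_public_key>';

-- Verify the key fingerprints (RSA_PUBLIC_KEY_FP, RSA_PUBLIC_KEY_2_FP)
DESC USER my_user;

-- Once all clients use the new private key, remove the old public key
ALTER USER my_user UNSET RSA_PUBLIC_KEY;
```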



Column-level security



Column-level Security
Column-level security masks data in tables and views, enforced on columns. Enterprise Edition.

Dynamic Data Masking
Masking policy: data is masked at runtime based on role


Column-level Security
Column-level security masks data in tables and views, enforced on columns.

Dynamic Data Masking

Define policy (val is the original column value, '##-##' is the masking value)

CREATE MASKING POLICY my_policy
AS (val varchar) RETURNS varchar ->
CASE
  WHEN current_role() IN ('ROLE_NAME')
  THEN val
  ELSE '##-##'
END;

Apply policy

ALTER TABLE my_table MODIFY COLUMN phone
SET MASKING POLICY my_policy;


Column-level Security
Column-level security masks data in tables and views, enforced on columns.

Dynamic Data Masking

Remove policy

ALTER TABLE my_table MODIFY COLUMN phone
UNSET MASKING POLICY;


Column-level Security
Column-level security masks data in tables and views, enforced on columns. Enterprise Edition.

External Tokenization
✓ Data is tokenized pre-load and detokenized at query runtime
✓ Analytical value preserved
✓ Sensitive information protected


Row-level security



Row-Level Security
Supported through row access policies that determine which rows are returned. Enterprise Edition.

Row Access Policies
Rows are filtered at runtime based on a condition (e.g. the user's role)




Row-Level Security
Define policy (column1 is the signature column)

CREATE ROW ACCESS POLICY my_policy
AS (column1 varchar) RETURNS boolean ->
CASE
  WHEN 'ROLE_NAME' = current_role()
   AND 'value1' = column1 THEN true
  ELSE false
END;

Apply policy

ALTER TABLE my_table ADD ROW ACCESS POLICY my_policy
ON (column1);


Row-Level Security
Remove policy

ALTER TABLE my_table DROP ROW ACCESS POLICY my_policy;


Network policies



Network policies
Allow restricting access to an account based on user IP addresses. Standard Edition.

Allowed IP addresses: IP_address1, IP_address2, IP_address3
Blocked IP addresses: IP_address2 (has priority)


Network policies
Allow restricting access to an account based on user IP addresses. Standard Edition.

Allowed IP addresses: 192.168.1.0/24 (CIDR notation for 192.168.1.0 - 192.168.1.255)
Blocked IP addresses: 192.168.1.75, 192.168.1.15 (has priority)


Network policies
Allow restricting access to an account based on user IP addresses. Standard Edition.

Allowed IP addresses: 192.168.1.95, 192.168.1.113
Blocked IP addresses: N/A


Network policies
Create Network Policy (SECURITYADMIN or global CREATE NETWORK POLICY privilege)

CREATE NETWORK POLICY my_network_policy
ALLOWED_IP_LIST = ('192.168.1.95', '192.168.1.113');


Network policies
Create Network Policy (SECURITYADMIN or global CREATE NETWORK POLICY privilege)

CREATE NETWORK POLICY my_network_policy
ALLOWED_IP_LIST = ('192.168.1.95', '192.168.1.113')
BLOCKED_IP_LIST = ('192.168.1.95');


Network policies
Apply Network Policy (SECURITYADMIN or global CREATE NETWORK POLICY privilege)

ALTER ACCOUNT SET NETWORK_POLICY = my_network_policy;


Network policies
Remove Network Policy (SECURITYADMIN or global CREATE NETWORK POLICY privilege)

ALTER ACCOUNT UNSET NETWORK_POLICY;


Network policies
Apply Network Policy to USER (OWNERSHIP of user & network policy required)

ALTER USER my_user SET NETWORK_POLICY = my_network_policy;
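To check what is currently in place, something along these lines could be used; the policy and user names are the placeholders from the slides.

```sql
-- List all network policies in the account
SHOW NETWORK POLICIES;

-- Show the allowed and blocked IP lists of one policy
DESC NETWORK POLICY my_network_policy;

-- Which policy is active at account level?
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN ACCOUNT;

-- Remove the user-level policy again
ALTER USER my_user UNSET NETWORK_POLICY;
```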



Data Encryption



Data Encryption
All data is encrypted at rest and in transit. Standard Edition, automatically by default.

Encryption at rest
Tables and internal stages
AES 256-bit encryption
Snowflake-managed keys
Key rotation every 30 days; old keys will be destroyed
Re-keying every year, if enabled (Enterprise Edition)


Data Encryption
All data is encrypted at rest and in transit. Standard Edition, automatically by default.

Data in Transit
WebUI, SnowSQL, JDBC, ODBC, Python Connector: TLS 1.2
End-to-End encryption


End-to-End Encryption
All data is encrypted at rest and in transit

Internal Stage (PUT): encrypted on the user machine → encrypted in the stage → encrypted in transit → encrypted at rest
External Stage: client-side encryption


Tri-Secret Secure
Enables customer to use own keys. Business Critical Edition; enabled via Snowflake Support.

Customer-managed key (e.g. Azure Key Vault) + Snowflake-managed key = composite master key


Account Usage &
Information Schema



Account Usage and Information Schema

Query object metadata and historical usage data



Account Usage

Shared database
ACCOUNTADMIN can view everything

• Object metadata
• Long-term historical usage data
• Reader accounts: object metadata and historical usage data
• Data provided is not real-time: 45 min – 3 hours latency
• Retention: 365 days


Account Usage vs. Information Schema

ACCOUNT_USAGE
• Long-term historical usage data
• Object metadata
• Not real-time: 45 min – 3 hours latency
• Includes dropped objects
• Retention: 365 days

INFORMATION_SCHEMA
• Automatically created
• Read-only
• Table functions and views
• Historical usage data
• Output depends on privileges
• Parent DB + account-level
• Object metadata
• No latency
• Does not include dropped objects
• Shorter retention (7 days – 6 months)
• To prevent performance issues, queries that return too much data fail with:
  "Information schema query returned too much data. Please repeat query with more selective predicates."


Release Process



Releases
How are releases deployed?

Weekly new releases
Seamless - No downtime

Full Releases
• New features
• Enhancements or Updates
• Fixes
• Behavior changes (monthly, impact on workload)

Patch Releases
• Fixes


Releases: Three stage approach
Helps to monitor and react to issues

Day 1: Early Access - Designated Enterprise (or higher) accounts
Day 1 or 2: Regular Access - Standard accounts
Day 2: Final Access - Enterprise (or higher) accounts


Domain 3.0: Performance Concepts
Query Profile & History




Query Profile
Provide execution details for a query

When to use?

Understand mechanics of a query

Performance and behavior of a query

Identify performance bottlenecks

Available for all queries (Completed, Failed, Running)

Percentage: Percentage of time this operator needed
Nodes: Building blocks
Operator Tree: Graphical representation
Operator Types: Aspect of query processing
Data Flow: No. of records processed


Overview: Where time was spent
Statistics: Bytes Scanned, Scanned from Cache
Data Spilling


What is spilling?
Data doesn't fit in memory of the warehouse
⇒ Data spilled to local storage (performance decreases)
⇒ Data spilled to remote cloud storage (performance decreases further)

How to avoid it?
• Reduce the amount of data processed
• Increase size of warehouse (e.g. S ⇒ M)
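
The spilling order can be illustrated with a small Python sketch. The capacities below are made-up numbers chosen only for the illustration (Snowflake does not publish them here), and `spill_location` is a hypothetical helper.

```python
# Hypothetical capacities in GB, chosen only to illustrate the order of spilling.
WAREHOUSES = {
    "S": {"memory_gb": 16, "local_disk_gb": 64},
    "M": {"memory_gb": 32, "local_disk_gb": 128},
}

def spill_location(intermediate_gb, warehouse):
    """Return where a query's intermediate data ends up in this simplified model."""
    caps = WAREHOUSES[warehouse]
    if intermediate_gb <= caps["memory_gb"]:
        return "memory"
    if intermediate_gb <= caps["memory_gb"] + caps["local_disk_gb"]:
        return "local storage"
    return "remote storage"

print(spill_location(24, "S"))   # local storage
print(spill_location(24, "M"))   # memory
print(spill_location(100, "S"))  # remote storage
```

In this model the same query stops spilling once it runs on the larger warehouse, which mirrors the two remedies listed above.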



Query History
• Query History in Snowsight
• QUERY_HISTORY table function in INFORMATION_SCHEMA:
  select * from table(information_schema.query_history()) order by start_time;
• QUERY_HISTORY view in ACCOUNT_USAGE schema:
  select * from snowflake.account_usage.query_history;


Caching

CLOUD SERVICES
• Metadata Cache: Statistics for tables and columns
• Result Cache: Cached copy of the query result

QUERY PROCESSING (COMPUTE)
• Virtual Warehouse Cache (Data Cache): Locally caches data of the query, Local Disk I/O

STORAGE
• Remote Disk


Result Cache
• Stores the results of a query (Cloud Services)
• Same queries can use that cache in the future if:
    • Table data has not changed
    • Micro-partitions have not changed
    • Query doesn't include UDFs or external functions
    • Sufficient privileges & results are still available
• Very fast result (persisted query result)
• Avoids re-execution
• Can be disabled by using the USE_CACHED_RESULT parameter
• If query is not re-used: purged after 24 hours
• If query is re-used: can be stored up to 31 days
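
The two retention rules can be combined into a short Python sketch. It assumes that every re-use restarts the 24-hour window and that the result is never kept beyond 31 days after the first execution. The function name `result_cache_expiry` is hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)    # window restarted by each re-use
MAX_LIFETIME = timedelta(days=31)  # upper bound counted from the first execution

def result_cache_expiry(first_run, reuse_times):
    """Return when the cached result is purged, given the times it was re-used."""
    hard_limit = first_run + MAX_LIFETIME
    expiry = first_run + RETENTION
    for reuse in sorted(reuse_times):
        if reuse > expiry:
            break  # the result was already purged before this re-use
        expiry = min(reuse + RETENTION, hard_limit)
    return expiry

first = datetime(2023, 6, 1, 9, 0)
print(result_cache_expiry(first, []))                             # 2023-06-02 09:00:00
print(result_cache_expiry(first, [first + timedelta(hours=20)]))  # 2023-06-03 05:00:00

# Re-used every 23 hours: the result still expires 31 days after the first run.
frequent = [first + timedelta(hours=23 * i) for i in range(1, 40)]
print(result_cache_expiry(first, frequent))                       # 2023-07-02 09:00:00
```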



Data Cache
• Local SSD cache
• Cannot be shared with other warehouses
• Improve performance of subsequent queries that use the same data
• Purged if warehouse is suspended or resized
• Queries with similar data ⇒ same warehouse
• Size depends on warehouse size


Metadata Cache
• Stores statistics and metadata about objects
• Properties for query optimization and processing
• Range of values in micro-partition
• Count rows, count distinct values, max/min value
• Without using virtual warehouse
• DESCRIBE + system-defined functions
• "Metadata store"
• Virtual Private Edition: Dedicated metadata store
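
Why such queries need no virtual warehouse can be sketched in Python: row counts and min/max values are read from per-partition statistics instead of from the table data. The `PartitionStats` shape and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PartitionStats:
    """Statistics kept per micro-partition (hypothetical shape)."""
    row_count: int
    min_amount: int
    max_amount: int

# Two micro-partitions with four rows each; values are illustrative only.
metadata = [
    PartitionStats(row_count=4, min_amount=3, max_amount=25),
    PartitionStats(row_count=4, min_amount=3, max_amount=25),
]

def count_rows(stats):
    """COUNT(*) answered from metadata alone."""
    return sum(p.row_count for p in stats)

def min_max_amount(stats):
    """MIN(amount) and MAX(amount) answered from metadata alone."""
    return min(p.min_amount for p in stats), max(p.max_amount for p in stats)

print(count_rows(metadata))      # 8
print(min_max_amount(metadata))  # (3, 25)
```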





Micro-partitions
External Cloud Provider (STORAGE)

• Automatically performed, can't be disabled
• Very small: a table can have hundreds of millions of micro-partitions
• Allow very granular partition pruning: eliminate unnecessary partitions
• Contain 50-500 MB of uncompressed data
• Actual size is less ⇒ data is compressed automatically
• Most efficient compression algorithm found independently
• Data is stored in columnar format
• Unnecessary columns are eliminated when querying
• Can overlap!
• Metadata per micro-partition: range of values, no. of distinct values, additional properties for query optimization
• Immutable: can't be changed
• New data = new micro-partitions
• Created in the order in which the data is created
• Alternative: CLUSTERING KEYS
Clustering Keys

Clustering a table on a specific column redistributes the data in the micro-partitions.
⇒ Improves access to this column
⇒ Optimized partition pruning


Clustering Keys
How is data stored in micro-partitions?

Sales_Date    Name                       Amount
2023-06-02    Sunglasses TR-7            $25
2023-06-01    Chocolate bar 70% cacao    $3
2023-06-02    Sunglasses TR-7            $25
2023-06-03    Oat meal biscuits          $4
2023-06-02    Chocolate bar 70% cacao    $3
2023-06-03    Oat meal biscuits          $4
2023-06-02    Oat meal biscuits          $4
2023-06-05    Sunglasses TR-7            $25


The rows are stored column by column in two micro-partitions, sorted by Sales_Date:

Micro-partition 1 (Row 1-4)
  Sales_Date: 2023-06-01, 2023-06-01, 2023-06-02, 2023-06-03
  Name:       Chocolate bar 70% cacao, Sunglasses TR-7, Sunglasses TR-7, Oat meal biscuits
  Amount:     $3, $25, $25, $4

Micro-partition 2 (Row 5-8)
  Sales_Date: 2023-06-02, 2023-06-02, 2023-06-02, 2023-06-03
  Name:       Chocolate bar 70% cacao, Oat meal biscuits, Oat meal biscuits, Sunglasses TR-7
  Amount:     $3, $4, $4, $25




SELECT * FROM SALES
WHERE SALES_DATE = '2023-06-01'

Sales_Date range per micro-partition:
Micro-partition 1: 2023-06-01 - 2023-06-03
Micro-partition 2: 2023-06-02 - 2023-06-03
⇒ Only Micro-partition 1 can contain 2023-06-01, Micro-partition 2 is pruned
⇒ Well-clustered (natural order of data)


SELECT * FROM SALES
WHERE AMOUNT BETWEEN 3 AND 4

Amount range per micro-partition:
Micro-partition 1: $3 - $25
Micro-partition 2: $3 - $25
⇒ Both micro-partitions have to be scanned
⇒ Not clustered
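
The pruning decision for both example queries can be sketched in Python using only the min/max ranges shown on the slides. This is an illustration of range-based pruning, not Snowflake's actual optimizer; `partitions_to_scan` is a hypothetical helper.

```python
from datetime import date

# Min/max ranges per micro-partition, taken from the example slides.
partitions = {
    "Micro-partition 1": {"sales_date": (date(2023, 6, 1), date(2023, 6, 3)), "amount": (3, 25)},
    "Micro-partition 2": {"sales_date": (date(2023, 6, 2), date(2023, 6, 3)), "amount": (3, 25)},
}

def partitions_to_scan(column, low, high):
    """Keep partitions whose [min, max] range overlaps the filter range [low, high]."""
    return [
        name
        for name, ranges in partitions.items()
        if ranges[column][0] <= high and low <= ranges[column][1]
    ]

# WHERE SALES_DATE = '2023-06-01'
print(partitions_to_scan("sales_date", date(2023, 6, 1), date(2023, 6, 1)))
# ['Micro-partition 1']

# WHERE AMOUNT BETWEEN 3 AND 4
print(partitions_to_scan("amount", 3, 4))
# ['Micro-partition 1', 'Micro-partition 2']
```

The date filter skips one partition because the date ranges differ, while the amount filter cannot skip anything because both partitions cover the same amount range.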



Metadata stored
• Number of micro-partitions in table
• Overlapping micro-partitions
• Clustering depth


Clustering Keys
When is a table well-clustered?

Overlapping micro-partitions
Number of partitions that overlap.

Clustering depth
Average depth of the overlapping micro-partitions for a specific column.
In how many micro-partitions a value occurs.

Examples (value ranges of Micro-partition 1, 2, 3):
• A - F, A - F, B - L: Overlapping micro-partitions: 3
• A - D, A - C, G - H: Overlapping micro-partitions: 2
• Worst: A - F, A - F, A - F: Overlapping micro-partitions: 3, Average Depth: 3
• Better but not ideal: A - C, A - C, D - F: Overlapping micro-partitions: 2, Average Depth: 2
• Better but not ideal: A - C, B - D, D - F: Overlapping micro-partitions: 3, Average Depth: 2
• Ideal (constant state): A - B, C - D, F: Overlapping micro-partitions: 0, Average Depth: 1
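
The numbers on these slides can be reproduced with a short Python sketch. It is a simplified reading of the slides, not Snowflake's exact formula: "overlapping" counts the partitions whose range intersects at least one other range, and "depth" is the largest number of partitions that contain the same value. The function name `overlap_stats` is hypothetical.

```python
def overlap_stats(ranges):
    """ranges: one (low, high) tuple with inclusive bounds per micro-partition."""
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    # Partitions that intersect at least one other partition.
    overlapping = sum(
        any(overlaps(r, other) for j, other in enumerate(ranges) if j != i)
        for i, r in enumerate(ranges)
    )
    # The deepest stack always starts at some range's lower bound,
    # so checking those points is enough.
    depth = max(sum(low <= r[0] <= high for low, high in ranges) for r in ranges)
    return overlapping, depth

print(overlap_stats([("A", "F"), ("A", "F"), ("A", "F")]))  # (3, 3)
print(overlap_stats([("A", "C"), ("A", "C"), ("D", "F")]))  # (2, 2)
print(overlap_stats([("A", "C"), ("B", "D"), ("D", "F")]))  # (3, 2)
print(overlap_stats([("A", "B"), ("C", "D"), ("F", "F")]))  # (0, 1)
```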



Defining Clustering Keys
Similar rows in similar micro-partitions
• More beneficial distribution of rows in micro-partitions
• Improved Query Performance
• Better Scan Efficiency
• Better Column Compression, especially when columns are similar
• No Future Maintenance: fully managed by Snowflake


Defining Clustering Keys



Reclustering
• After defining a Clustering Key, rows are not always updated immediately, only after periodic reclustering
• Clustering Key is used for reclustering
• Reclustering is automatic (Automatic Reclustering): Cloud Services (Serverless)
• Only adjusts micro-partitions that benefit

Before (Micro-partition 1, 2, 3): A - F, A - F, A - F
After (Micro-partition 1, 2, 3): A - B, C - D, F


Clustering Keys
Clustering is not for all tables: Query performance vs. Cost

Serverless Costs
• Credit consumption of reclustering

Storage Costs
• New partitions are created
• Old partitions are marked as deleted
• Old partitions are maintained (Time Travel)


Clustering Keys
On which columns is a clustering key most effective?

Large number of micro-partitions
• Very large tables
• Multiple terabytes of data
• Non-ideal pruning, e.g. SELECT * FROM SALES WHERE AMOUNT BETWEEN 3 AND 4

Frequently used column in WHERE / JOIN / (ORDER BY)
• Selective queries and sorting of columns ⇒ most performance improvement

High enough cardinality
• Too high cardinality ⇒ no efficient grouping, overhead for micro-partitioning
• Too low cardinality ⇒ no effective pruning (e.g. a column that only contains M / F)


Clustering Keys
Clustering Key SQL commands

Create table with cluster key
Cluster key can be defined in table definition.
CREATE TABLE t1 (<column definitions>) CLUSTER BY (c1, c5);

Adding cluster key on one or multiple columns
Cluster key can be added at any time.
Order of columns: Low ⇒ High Cardinality
ALTER TABLE t1 CLUSTER BY (c1, c5);

Expression
Clustering key on expression.
ALTER TABLE t1 CLUSTER BY (DATE(timestamp));

Removing clustering key
Cluster key can be removed.
ALTER TABLE t1 DROP CLUSTERING KEY;
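
Why a clustering key is often defined on an expression such as DATE(timestamp) can be shown with a small Python sketch: truncating timestamps to dates lowers the cardinality of the key. The sample data is hypothetical.

```python
from datetime import datetime, timedelta

start = datetime(2023, 6, 1)
# Hypothetical sample: one event every 90 minutes over three days.
timestamps = [start + timedelta(minutes=90 * i) for i in range(48)]

distinct_timestamps = len(set(timestamps))               # cardinality of the raw column
distinct_dates = len({ts.date() for ts in timestamps})   # cardinality of DATE(timestamp)

print(distinct_timestamps, distinct_dates)  # 48 3
```

Every raw timestamp is unique, so it groups no rows together, whereas the date expression yields only three distinct values that rows can be grouped by.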



System functions
for Clustering Keys



System Functions on Clustering

Information on Clustering Depth


Find out more clustering information.

SYSTEM$CLUSTERING_INFORMATION ('table_name', ['(columns/expression)'])



Returning clustering information

SYSTEM$CLUSTERING_INFORMATION ('table_name','(col1,col3)')
SYSTEM$CLUSTERING_INFORMATION ('table_name')

+--------------------------------------------------------------+
| SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)')
|--------------------------------------------------------------
| {
|   "cluster_by_keys" : "(COL1, COL3)",
|   "total_partition_count" : 1156,
|   "total_constant_partition_count" : 0,
|   "average_overlaps" : 117.5484,
|   "average_depth" : 64.0701,
|   "partition_depth_histogram" : {
|     "00000" : 0,
|     "00001" : 0,
|     "00002" : 3,
|     "00003" : 3,
|     "00004" : 4,
|     "00005" : 6,
|     "00006" : 3,
|     "00007" : 5,
|     "00008" : 10,
|     "00009" : 5,
|     "00010" : 7,
|     "00011" : 6,
|     "00012" : 8,
|     "00013" : 8,
|     "00014" : 9,
|     "00015" : 8,
|     "00016" : 6,
|     "00032" : 98,
|     "00064" : 269,
|     "00128" : 698
|   }
| }
+--------------------------------------------------------------+

Not well partitioned!
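
One way to read such a result is to parse the JSON and check how many partitions fall into the deep histogram buckets. The Python sketch below uses the values from the output above. The threshold of 32 is an arbitrary choice for this illustration, not a Snowflake rule.

```python
import json

# Values copied from the example output (cluster_by_keys omitted for brevity).
raw = """
{
  "total_partition_count": 1156,
  "average_overlaps": 117.5484,
  "average_depth": 64.0701,
  "partition_depth_histogram": {
    "00000": 0, "00001": 0, "00002": 3, "00003": 3, "00004": 4,
    "00005": 6, "00006": 3, "00007": 5, "00008": 10, "00009": 5,
    "00010": 7, "00011": 6, "00012": 8, "00013": 8, "00014": 9,
    "00015": 8, "00016": 6, "00032": 98, "00064": 269, "00128": 698
  }
}
"""

info = json.loads(raw)
histogram = info["partition_depth_histogram"]
total = info["total_partition_count"]

# The histogram buckets add up to the total number of micro-partitions.
print(sum(histogram.values()) == total)  # True

# Share of partitions in buckets of depth 32 or more (arbitrary threshold).
deep = sum(count for bucket, count in histogram.items() if int(bucket) >= 32)
print(round(deep / total, 3))  # 0.921
```

Most partitions sit in the deepest buckets and none has depth 1, which matches the "not well partitioned" verdict on the slide.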



Returning clustering information

SYSTEM$CLUSTERING_INFORMATION ('CUSTOMER','(C_NAME)')

+--------------------------------------------------------------+
| SYSTEM$CLUSTERING_INFORMATION('CUSTOMER', '(C_NAME)')
|--------------------------------------------------------------
| {
|   "cluster_by_keys" : "LINEAR(C_NAME)",
|   "notes" : "Clustering key columns contain high cardinality key C_NAME which
|   might result in expensive re-clustering. Consider reducing the cardinality of
|   clustering keys. Please refer to
|   https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html
|   for more information.",
|   "total_partition_count" : 16,
|   "total_constant_partition_count" : 16,
|   "average_overlaps" : 0.0,
|   "average_depth" : 1.0,
|   "partition_depth_histogram" : {
|     "00000" : 0,
|     "00001" : 16,
|     "00002" : 0,
|     "00003" : 0,
|     "00004" : 0,
|     "00005" : 0,
|     "00006" : 0,
|     "00007" : 0,
|     "00008" : 0,
|     "00009" : 0,
|     "00010" : 0,
|     "00011" : 0,
|     "00012" : 0,
|     "00013" : 0,
|     "00014" : 0,
|     "00015" : 0,
|     "00016" : 0
|   }
| }
+--------------------------------------------------------------+


Returning clustering information

Average Depth of Table
Average depth of the table according to the specified column or clustering key.

SELECT SYSTEM$CLUSTERING_DEPTH ('orders', '(amount)');


Search Optimization Service



Search Optimization (Enterprise Edition)
Can improve performance of certain types of lookup and analytical queries
• Many predicates for filtering
• Search Access Path
• Similar to secondary index concept

Beneficial queries
• Selective point look-up: returns only one or very few rows
• Equality predicates (=) or IN predicates, e.g. WHERE AMOUNT = 1
• Substring and regular expression searches (e.g. LIKE or ILIKE), VARIANT column
• Selective geospatial functions with GEOGRAPHY values


Search Access Path
• Maintained by Search Optimization Service
• Serverless Costs: Credit consumption
• Storage Costs: Additional Storage needed


Add Search Optimization to table
ALTER TABLE mytable ADD SEARCH OPTIMIZATION;
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY (*);

Add Search Optimization to column
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON GEO(mycol);

Remove Search Optimization
ALTER TABLE mytable DROP SEARCH OPTIMIZATION;

Required privileges: OWNERSHIP on the table and ADD SEARCH OPTIMIZATION on the schema


Materialized Views (Enterprise Edition)
Method to handle performance issues of views

• View on a frequently run query (SELECT …;): compute-intensive?
• Materialized View: pre-computed and physically stored ⇒ Performance
• Updated automatically when the base table changes
• For queries that are frequently run and sufficiently complex
• Serverless Costs: Credit consumption
• Storage Costs: Additional Storage needed


Materialized Views: Costs
• Start slow
• Serverless Costs: Credit consumption
• Storage Costs: Additional Storage needed
• Resource monitors can't control Snowflake-managed warehouses

SELECT * FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY());


Materialized Views
Create Materialized View
Use CREATE MATERIALIZED VIEW statement

CREATE MATERIALIZED VIEW v_1 AS
SELECT * FROM table1 WHERE c1 = 200;

ALTER MATERIALIZED VIEW v_1 SUSPEND;
ALTER MATERIALIZED VIEW v_1 RESUME;
DROP MATERIALIZED VIEW v_1;

Limitations
• Query only 1 table (no joins)
• No views / materialized views as source
• No window functions, UDFs, HAVING
• Only some aggregate functions

Can be created on external tables


Warehouse Considerations

Resizing
• Warehouses can be resized even when a query is running or when suspended
  ⇒ Impacts only future queries, not the running one

Scale up vs. Scale out
• Scale up (resize): More complex queries
• Scale out: More users (more queries)

Dedicated warehouse
• Isolate workload of specific user
• Different type of workload ⇒ different warehouse
• Enable auto-suspend and auto-resume (available for all warehouses)
