Snowflake Question
INTRODUCTION
Section 0: About the exam & course setup

About the SnowPro Core Certification
The Definitive Preparation Course
Why get certified?
✓ Impactful way to advance your career
✓ Positions you as an expert
✓ Future-proof skills + great job opportunities

Passing Score
✓ 750 / 1000
✓ Goal: achieve a score of 900+
How to master the exam
Exam Topics
DOMAIN WEIGHT
1.0 Snowflake Data Cloud Features & Architecture 25%
Book Exam
Confident & Prepared
Final Tips
Resources
Q&A Section
Reviews
Self-managed
1 No Hardware – no hardware to select, install, configure, or manage.
2 No Software – no software needs to be installed, configured, or managed.
3 No Maintenance – no downtime; early access for Enterprise Edition accounts on request.
Cloud
Completely cloud-native.
1 Designed for Cloud – built for the cloud from scratch.
2 Runs in the Cloud – all components run completely in the cloud; cannot be installed on-premises.
3 Cloud optimized – storage and compute scale independently and elastically.
Data Platform
One single platform around data.
1 Data Warehouse – modern data warehouse with advanced features and performance.
2 Data Lake – stores structured, semi-structured and unstructured data.
3 Data Science – connect ML tools, run ML models in Snowflake, use the language of your choice with Snowpark.
What is Snowflake?
✓ Self-managed: No Software, No Hardware, No Maintenance.
✓ Cloud: Designed for Cloud, Runs in the Cloud, Optimized for Cloud.
✓ Data Platform: Data Warehousing, Data Engineering, Data Applications.
⇒ Snowflake is a self-managed Cloud Data Platform.
Multi-cluster shared-data architecture

Traditional architectures:
1 shared-disk – central data storage, accessible from all compute nodes (each node with its own processor and memory).
  PROs: Simplicity. CONs: Limited scalability.
2 shared-nothing – massively parallel processing compute clusters; each node is independent and stores a portion of the data locally.
  PROs: Scalability. CONs: Storage and compute are tightly coupled.

Snowflake: multi-cluster shared-data architecture – combines the advantages of both traditional approaches.
Three distinct layers
▪ Database Storage – compressed columnar storage.
▪ Query Processing (Compute) – the "muscle of the system"; queries are processed using virtual warehouses.
▪ Cloud Services – coordinates and manages the platform.
Snowflake Editions
▪ Standard – introductory level.
▪ Enterprise – additional features/services for larger organizations.
▪ Business Critical – even higher data protection for organizations with extremely sensitive data.
▪ Virtual Private Snowflake (VPS) – highest level of security.
Features and pricing differ per edition.
Snowflake Editions – Features

Standard:
✓ Complete DWH
✓ Automatic data encryption
✓ Broad support for standard and special data types
✓ Time travel up to 1 day
✓ Disaster recovery for 7 days beyond time travel
✓ Network policies
✓ Secure data share
✓ Federated authentication

Enterprise:
✓ All Standard features
✓ Multi-cluster warehouse
✓ Time travel up to 90 days
✓ Materialized views
✓ Search Optimization
✓ Column-level security
✓ 24-hour early access to weekly new releases

Business Critical:
✓ All Enterprise features
✓ Additional security features such as customer-managed encryption
✓ Support for data-specific regulation
✓ Database failover/failback

Virtual Private Snowflake (VPS):
✓ All Business Critical features
✓ Dedicated virtual servers and a completely separate Snowflake environment
✓ Dedicated metadata store
⇒ Isolated from all other Snowflake accounts
Snowflake Pricing
Compute:
▪ Standard – (active) warehouses, query processing.
▪ Serverless – e.g. Search Optimization, Snowpipe; compute is automatically resized.
Storage:
▪ Billed per TB per month.
Snowflake Pricing – Compute
Credits consumed per hour of runtime, depending on warehouse size:
XS 1 | S 2 | M 4 | L 8 | XL 16 | … | 4XL 128
Credits consumed are billed in $/€.

Snowflake Pricing – Storage
e.g. $40 per TB / per month (Region: EU (Frankfurt), Platform: AWS)
Storage
▪ On Demand Storage – pay only for what you use (e.g. $40/TB).
▪ Capacity Storage – pay only for defined capacity upfront (e.g. $23/TB); suitable when you can estimate your needs, e.g. "we think we need 1 TB of storage".
▪ Prices depend on region and platform, e.g. $45/TB on demand vs. $24.50/TB capacity for Region EU (Frankfurt) on AWS.
▪ Data Transfer is charged separately.
Monitoring storage
▪ TABLE_STORAGE_METRICS view in INFORMATION_SCHEMA – most detailed:
  Active (ACTIVE_BYTES column)
  Time Travel (TIME_TRAVEL_BYTES column)
  Fail-safe (FAILSAFE_BYTES column)
  SELECT * FROM DB_NAME.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
▪ TABLE_STORAGE_METRICS view in ACCOUNT_USAGE:
  SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
Virtual warehouses (Standard Edition and up)
▪ Credits are consumed and billed in a defined cycle.
▪ An account can have multiple virtual warehouses.
▪ Track the usage of cloud services needed.
Virtual warehouse types
▪ Standard – most suitable in most use cases.
▪ Snowpark-optimized – recommended for memory-intensive workloads such as ML training.

Credits per hour:
Standard:           XS 1 | S 2 | M 4 | L 8 | XL 16 | … | 4XL 128
Snowpark-optimized: M 6 | L 12 | XL 24 | … | 6XL 768
Multi-cluster warehouses
More queries than the warehouse can process are placed in a queue; additional clusters can take over.

Modes:
▪ Maximized – min # clusters = max # clusters.
▪ Auto-scale – min # clusters ≠ max # clusters.

Scaling policies (auto-scale mode):
▪ Standard – favors starting additional warehouses.
▪ Economy – favors conserving credits rather than starting additional warehouses.
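The auto-scale settings above can be sketched in DDL; the warehouse name and size are assumptions, not from the course:

```sql
-- Sketch: auto-scale multi-cluster warehouse with the Economy scaling policy
CREATE WAREHOUSE my_mc_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1      -- min # clusters
  MAX_CLUSTER_COUNT = 3      -- max # clusters (≠ min ⇒ auto-scale mode)
  SCALING_POLICY = 'ECONOMY';
```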
Databases & Schemas
▪ Schemas – to organize a database.

SnowSQL setup:
▪ Download SnowSQL
▪ Install SnowSQL
Stages
External stages – cloud provider storage; upload the file before the load:
▪ AWS S3 (AWS)
▪ Google Cloud Storage (GCP)
▪ Azure Container (Azure)
Internal stages:
▪ User stages
▪ Table stages
▪ Internal Named Stages
Stages – loading & unloading

Loading via an internal stage:
▪ PUT – upload a file into the internal stage.
  • Data will be compressed (.gz file ending)
  • Automatically encrypted (128-bit or 256-bit keys)
▪ COPY INTO <table> – load from the stage into a table.

Unloading via an internal stage:
▪ COPY INTO <location> – unload from a table into the internal stage.
▪ GET – download the file from the internal stage.

Internal stage types: User Stages, Table Stages, Internal Named Stages.
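The PUT / COPY INTO flow above can be sketched end to end; the stage, file, and table names are assumptions:

```sql
-- Upload a local file into a named internal stage (compressed & encrypted automatically)
PUT file:///tmp/orders.csv @my_int_stage;

-- Load the staged file into a table
COPY INTO orders
FROM @my_int_stage/orders.csv.gz
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Unload back into the stage and download
COPY INTO @my_int_stage/export/ FROM orders;
GET @my_int_stage/export/ file:///tmp/export/;
```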
Table Stage

Copy to the stage:
COPY INTO @STAGE_NAME
FROM TABLE_NAME;

Query from the stage:
SELECT $1, $2, $3
FROM @STAGE_NAME;
BULK LOADING vs. CONTINUOUS LOADING (Snowpipe)

Continuous loading: event notification → queue storage → load into the Snowflake DB (e.g. Snowpipe for Azure).

Pause / resume:
ALTER PIPE … SET PIPE_EXECUTION_PAUSED = TRUE;

Default ON_ERROR behavior:
▪ Snowpipe: SKIP_FILE – skip the file if errors are found.
▪ Bulk load: ABORT_STATEMENT – aborts the load if an error is found.
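A continuous-loading pipe driven by event notifications can be sketched as below; the pipe, table, and stage names are assumptions:

```sql
-- Sketch: a pipe that auto-ingests files on cloud event notifications
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'CSV');
```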
Copy Options – SIZE_LIMIT

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
SIZE_LIMIT = <num>

Example: SIZE_LIMIT = 25 MB with four 10 MB files. Before each file, the amount already loaded is compared against the limit:
▪ File 1 – 0 MB loaded so far ✓
▪ File 2 – 10 MB loaded so far ✓
▪ File 3 – 20 MB loaded so far ✓ ⇒ 3 files loaded with 30 MB
▪ File 4 – 30 MB loaded so far ✗ (limit exceeded)
Copy Options – PURGE

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
PURGE = TRUE | FALSE

Further copy options (with defaults):
▪ PURGE – remove files from the stage after a successful load; default FALSE.
▪ Load metadata – by default, files are not loaded when they have been loaded before.
▪ RETURN_FAILED_ONLY – return only files that have failed to load; TRUE | FALSE, default FALSE.
▪ LOAD_UNCERTAIN_FILES – load files even if the load status is unknown; TRUE | FALSE, default FALSE.
▪ VALIDATION_MODE = RETURN_<n>_ROWS, e.g. RETURN_5_ROWS – validates <n> rows (returns errors or the rows).
Unloading data

▪ COPY INTO <location> – unload from a table into a stage.
▪ GET – download from the internal stage.

Options:
▪ MAX_FILE_SIZE = <num> – default 16777216 (16 MB); can be increased up to 5 GB.
▪ A SELECT statement can be used in the COPY statement (unload using SELECT).
▪ Output files are named automatically, e.g. data_0_1_0.csv.gz.
▪ Filter with WHERE; truncate long values (TRUNCATECOLUMNS).
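Unloading with a SELECT and the options above can be sketched as follows; the stage, table, and column names are assumptions:

```sql
-- Sketch: unload a filtered query result to an internal stage
COPY INTO @my_int_stage/unload/
FROM (SELECT id, amount FROM orders WHERE amount > 100)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
MAX_FILE_SIZE = 104857600;  -- 100 MB per output file
```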
Functions
Supports most standard SQL functions defined in SQL:1999.

▪ Scalar functions – return one value per invocation (one value per row):
  SELECT DAYNAME('2023-12-31');
  SELECT DAYNAME("effective_date") FROM LOAN_PAYMENT;
▪ Aggregate functions – return one value for a group of rows:
  SELECT MAX(amount) FROM orders;
▪ System functions:
  SELECT SYSTEM$TYPEOF('abc');

https://docs.snowflake.com/en/sql-reference/intro-summary-operators-functions
Estimation functions

Cardinality estimation:
▪ Situation: large input for COUNT(DISTINCT column1, …) and an average error is acceptable (observed errors e.g. 1.62338% and -1.244%).

Frequent values (Top-K):
▪ Function: APPROX_TOP_K(column)
▪ The estimate is more accurate when count >> k and count is large.
Example (SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE_SALES, 28.8B rows):
SELECT SS_CUSTOMER_SK, COUNT(SS_CUSTOMER_SK)
FROM STORE_SALES
GROUP BY SS_CUSTOMER_SK;
Percentile estimation
The idea: the t-Digest algorithm is used to estimate percentile values.
Function: APPROX_PERCENTILE(column, <percentile>) – returns the percentile value.

SELECT APPROX_PERCENTILE(O_TOTALPRICE, 0.5)
FROM ORDERS;
Similarity estimation (MinHash)

1 MINHASH(k, …) returns a MinHash state; k is the number of hash functions – the larger k, the more accurate.
  SELECT MINHASH(100, *) AS mh FROM mhtab1;
  SELECT MINHASH(100, *) AS mh FROM mhtab2;

  Example state:
  SELECT MINHASH(7, O_ORDERKEY) AS mh FROM ORDERS;
  {
    "state": [
      2200169610250,
      22818457966550,
      2507497641893,
      12337014946743,
      5083517324927,
      1039435359430,
      967271249674
    ],
    "type": "minhash",
    "version": 1
  }

2 Estimate the similarity of the MinHash states with APPROXIMATE_SIMILARITY().
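Step 2 can be sketched by feeding both MinHash states into APPROXIMATE_SIMILARITY(); the table names follow the examples above:

```sql
-- Sketch: combine two MinHash states into a Jaccard similarity estimate
SELECT APPROXIMATE_SIMILARITY(mh) AS similarity
FROM (
  SELECT MINHASH(100, *) AS mh FROM mhtab1
  UNION ALL
  SELECT MINHASH(100, *) AS mh FROM mhtab2
);
```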
UDFs (User-Defined Functions)
✓ Schema-level object
Supported languages:
▪ Snowflake Scripting (Snowflake SQL + procedural logic)
▪ JavaScript
▪ Snowpark API (Python, Scala, Java)
Call: select add_two(3);

External functions
Examples:
▪ AWS Lambda function
▪ Microsoft Azure function
▪ HTTPS server
Limitations:
▪ Must be scalar
▪ Slower performance (overhead + fewer optimizations)
▪ Not sharable
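A hypothetical definition of the add_two UDF called above, as a plain SQL UDF:

```sql
-- Sketch: SQL UDF returning its argument plus two (name mirrors the call above)
CREATE OR REPLACE FUNCTION add_two(x INTEGER)
  RETURNS INTEGER
AS
$$
  x + 2
$$;

SELECT add_two(3);  -- returns 5
```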
Stored procedures
Call (returns a result):
call update_table('new_value', 'table_name');

Sequences
Use-case: generating unique, incrementing numbers.
SELECT my_seq.nextval;
SELECT my_seq.nextval;
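Creating and using a sequence can be sketched as below; the sequence and table names are assumptions:

```sql
-- Sketch: create and consume a sequence
CREATE OR REPLACE SEQUENCE my_seq START = 1 INCREMENT = 1;

SELECT my_seq.nextval;  -- 1
SELECT my_seq.nextval;  -- 2

-- Typical use: default value for a surrogate key column
CREATE OR REPLACE TABLE customers (
  id INTEGER DEFAULT my_seq.nextval,
  name STRING
);
```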
Semi-structured data
▪ no fixed schema
▪ contains tags/labels and a nested structure

(What is structured data? Data that has a well-defined structure.)

Semi-structured formats: JSON, XML, PARQUET, ORC, Avro.

Example JSON:
{
  "courses": [
    { "topic": "Snowflake", "level": "All levels" },
    { "topic": "SQL", "language": ["English", "German"] },
    { "topic": "Azure", "level": "Beginner" }
  ]
}
Data types for semi-structured data:
▪ OBJECT – key-value pairs, e.g. { "topic": "Snowflake", "level": "All levels" }.
▪ ARRAY – consists of 0 or more pieces of data.
▪ VARIANT – universal type that can hold a value of any other type, including OBJECT and ARRAY; typical target column for COPY INTO … with semi-structured files.
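Loading the example JSON into a VARIANT column and flattening its nested array can be sketched as below; the table, stage, and column names are assumptions:

```sql
-- Sketch: load JSON into a VARIANT column and flatten the nested array
CREATE OR REPLACE TABLE raw_courses (v VARIANT);

COPY INTO raw_courses
FROM @my_int_stage/courses.json
FILE_FORMAT = (TYPE = 'JSON');

SELECT c.value:topic::STRING AS topic,
       c.value:level::STRING AS level
FROM raw_courses,
     LATERAL FLATTEN(input => v:courses) c;
```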
Unstructured data
Examples: audio files, documents.
Snowflake supports internal & external stages for unstructured files (they are not loaded with COPY INTO …).

Directory tables:
▪ Layered on a stage.
▪ Manual refresh, or automatic refresh using event notifications.
Why Sampling?
Develop and test queries on a small sample (e.g. 500 GB instead of 10 TB) – faster and cheaper.

SAMPLE:
▪ Percentage of rows.
▪ Reproducible results (using a seed).
▪ ROW (BERNOULLI) method – every row is chosen with percentage p.
▪ BLOCK (SYSTEM) method – every block is chosen with percentage p.
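Both sampling methods can be sketched as below; the table name is an assumption:

```sql
-- Sketch: row-level and block-level sampling
SELECT * FROM orders SAMPLE ROW (10);            -- ~10% of rows
SELECT * FROM orders SAMPLE ROW (10) SEED (42);  -- reproducible result
SELECT * FROM orders SAMPLE BLOCK (10);          -- ~10% of blocks (faster on large tables)
```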
Tasks
✓ Schema-level object
✓ Can be cloned
ALTER TASK my_task RESUME;
ALTER TASK my_task SUSPEND;

Trees of tasks (Task A → Task B) are limited to:
▪ 1000 tasks in total
▪ 100 child tasks

CREATE TASK my_task
WAREHOUSE = my_wh
AFTER my_task_a
AS
…;
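A small task tree with a scheduled root can be sketched as below; the task, warehouse, and table names are assumptions:

```sql
-- Sketch: a scheduled root task feeding a child task
CREATE OR REPLACE TASK my_task_a
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO staging_table SELECT * FROM raw_table;

CREATE OR REPLACE TASK my_task
  WAREHOUSE = my_wh
  AFTER my_task_a
AS
  INSERT INTO target_table SELECT * FROM staging_table;

-- Tasks are created suspended; resume the child before the root
ALTER TASK my_task RESUME;
ALTER TASK my_task_a RESUME;
```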
Streams
✓ Schema-level object
✓ Can be cloned

A stream object records DML changes (INSERT, UPDATE, DELETE) made to a table.

Metadata columns:
▪ METADATA$ACTION
▪ METADATA$ISUPDATE
▪ METADATA$ROW_ID

Consuming a stream (e.g. an INSERT into a target table) empties its records.

Typical use: ETL from data sources (e.g. sales data, HR data) into target tables.
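Creating and consuming a stream in an ETL step can be sketched as below; the stream and table names are assumptions:

```sql
-- Sketch: create a stream and consume it
CREATE OR REPLACE STREAM my_stream ON TABLE source_table;

-- Consuming the stream in a DML statement advances its offset (empties it)
INSERT INTO target_table
SELECT id, amount
FROM my_stream
WHERE METADATA$ACTION = 'INSERT';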
Streams can also be created on views; INSERT, UPDATE and DELETE changes on the underlying tables are tracked.
Connecting to Snowflake
▪ Snowsight (web interface)
▪ SnowSQL (command line tool)
▪ Drivers & connectors: Python, Go, JDBC, ODBC, .NET, Node.js, PHP PDO, Kafka, Spark
Snowflake Scripting
Procedural code, most commonly used in stored procedures (but can also run outside of them):
▪ variables
▪ cursors
▪ resultsets
▪ if/case: IF/ELSE, CASE
▪ loops: FOR, REPEAT, WHILE, LOOP

Blocks (BEGIN … END) group statements, minimizing confusion:
BEGIN
  CREATE TABLE employee (id INTEGER, …);
  CREATE TABLE store (id INTEGER, …);
END;
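A block combining a variable with an IF branch can be sketched as below; the variable and table names are assumptions (the employee table follows the block above):

```sql
-- Sketch: Snowflake Scripting block with a variable and an IF branch
DECLARE
  row_cnt INTEGER;
BEGIN
  SELECT COUNT(*) INTO :row_cnt FROM employee;
  IF (row_cnt > 0) THEN
    RETURN 'employee has rows';
  ELSE
    RETURN 'employee is empty';
  END IF;
END;
```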
Snowpark
Instead of building applications and querying data outside the system, Python code runs against Snowflake and is converted to SQL – no need to move data!
Time Travel
▪ Create clones of tables, schemas and databases from a previous state.
▪ DATA_RETENTION_TIME_IN_DAYS = 2 (DEFAULT = 1) – configurable for table, schema, database and account.
▪ Standard Edition: time travel up to 1 day (all accounts). Enterprise, Business Critical, VPS: time travel up to 90 days.
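Querying and cloning a previous state can be sketched as below; the table names and offsets are assumptions:

```sql
-- Sketch: query and restore previous states with Time Travel
-- State of the table 10 minutes ago
SELECT * FROM my_table AT (OFFSET => -600);

-- State before a specific statement ran
SELECT * FROM my_table BEFORE (STATEMENT => '<query_id>');

-- Clone a table from a previous state
CREATE TABLE my_table_restored CLONE my_table AT (OFFSET => -600);
```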
Fail Safe
Disaster? Data moves from current data storage through Time Travel (0 – 90 days, currently available to users) into Fail-safe:
▪ permanent tables: 7 days
▪ transient tables: 0 days
Zero-copy cloning
▪ Cloning is a metadata operation in the Cloud Services layer – the clone references the original micro-partitions.
▪ Updates and new data on the clone create new micro-partitions.
▪ Privileges of child objects are inherited.
▪ Privilege required on the source object to clone it:
  Pipe, Stream, Task: OWNERSHIP
  All other objects: USAGE
Data Sharing
▪ Account 1 (Provider) – owns the storage; data is synchronized, not copied.
▪ Account 2 (Consumer) – queries the share using its own compute resources.
▪ Shared data is read-only and cannot be modified by the consumer.
▪ An account can be both provider and consumer at the same time.

4. Import share: ACCOUNTADMIN role or IMPORT SHARE / CREATE DATABASE privileges required.
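The provider/consumer flow above can be sketched as below; the share, database, and account identifiers are assumptions:

```sql
-- Sketch: provider creates and grants a share, consumer imports it
-- Provider account:
CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.public TO SHARE my_share;
GRANT SELECT ON TABLE my_db.public.orders TO SHARE my_share;
ALTER SHARE my_share ADD ACCOUNTS = consumer_account;

-- Consumer account (ACCOUNTADMIN or IMPORT SHARE / CREATE DATABASE privileges):
CREATE DATABASE shared_db FROM SHARE provider_account.my_share;
```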
▪ Secure UDFs can be shared.
▪ Privileges on shared objects are granted to the share.

Reader Accounts
▪ For non-Snowflake users.
▪ Created & managed by the provider account.
▪ The provider is responsible for all costs (compute resources).
✓ Each account can share and consume – even its own share can be consumed.
✓ Sharing across regions / cloud providers must be enabled; it is done via replication ("cross-region sharing").
▪ Provider Account 1 holds the primary database in Region 1; data and objects are synchronized periodically across regions and across cloud providers.

-- Enable replication for each source and target account in your organization
select system$global_account_set_parameter('<organization_name>.<account_name>',
  'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');
Access Control
A user creates a securable object; the creating role owns it.

GRANT <privilege> ON <object> TO <role>;
GRANT <role> TO <user>;
Roles
✓ Hierarchy of roles – privileges are inherited upwards through the hierarchy.
✓ System-defined roles: ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN, PUBLIC.
✓ Best practice: custom roles are assigned to SYSADMIN.
✓ Account administration capabilities include: creating accounts, viewing all accounts, and viewing account usage information.
Privileges
▪ OWNERSHIP – full control over an object.
▪ MANAGE GRANTS – global privilege to grant or revoke privileges.

GRANT <privilege> ON <object> TO <role>;
REVOKE <privilege> ON <object> FROM <role>;
e.g. GRANT SELECT ON my_table TO <role>;

Selected privileges per object type:
▪ Warehouse – OPERATE: enables changing the state of a warehouse (e.g. suspend and resume).
▪ Database – USAGE: enables using the database and executing the SHOW DATABASES command.
▪ Database – REFERENCE_USAGE: enables using an object (shared secure view) to reference another object in a different database.
▪ Stages – READ: enables operations that require reading (e.g. GET, LIST, COPY INTO table) from internal stages; not applicable to external stages.
▪ Stages – WRITE: enables writing to an internal stage (PUT, REMOVE, COPY INTO location); not applicable to external stages.
▪ Tables – INSERT: enables inserting values into the table and manually reclustering tables.
Authentication
Authentication is proving that you are who you say you are.

Key pair authentication:
▪ Minimum: 2048-bit RSA key pair.
▪ One or two key pairs can be assigned to a user (allows rotation).
▪ Used when connecting via Snowflake clients (SnowSQL etc.).
Dynamic Data Masking (Enterprise Edition)
A masking policy masks column values at query time, based on conditions such as the current role.

Define policy:
CREATE MASKING POLICY my_policy
AS (val varchar) RETURNS varchar ->
CASE
  WHEN current_role() IN (role_name) THEN val  -- original column value
  ELSE '##-##'                                 -- masking value
END;

Apply policy:
ALTER TABLE my_table MODIFY COLUMN phone
SET MASKING POLICY my_policy;

Remove policy:
ALTER TABLE my_table MODIFY COLUMN phone
UNSET MASKING POLICY my_policy;
External Tokenization (Enterprise Edition)
Tokenized data is loaded into Snowflake and detokenized at query runtime via external functions.

Row Access Policies (Enterprise Edition)
Rows are filtered at runtime based on a condition (e.g. user or role).

Apply policy:
ALTER TABLE my_table ADD ROW ACCESS POLICY my_policy
ON (column1);
Encryption
▪ AES 256-bit encryption of tables and internal stages with Snowflake-managed keys (Standard Edition and up).
▪ Key rotation in a defined cycle; periodic re-keying every year, if enabled (Enterprise Edition).
▪ End-to-end encryption: TLS 1.2 for data in transit.
▪ PUT to an internal stage encrypts files automatically; for external stages, client-side encryption is optional and the master key can be customer-managed or Snowflake-managed.
ACCOUNT_USAGE & INFORMATION_SCHEMA

The SNOWFLAKE database is a shared, read-only database; by default ACCOUNTADMIN can view everything. It contains object metadata and usage data, including reader accounts.

INFORMATION_SCHEMA:
▪ Historical usage data and object metadata; output depends on privileges.
▪ Exists in each database (parent DB) + account-level views.
▪ No latency (real-time).
▪ Does not include dropped objects.
▪ Shorter retention (7 days – 6 months).
▪ Unselective queries can fail: "Information schema query returned too much data. Please repeat query with more selective predicates."

ACCOUNT_USAGE:
▪ Long-term historical usage data.
▪ Not real-time: 45 min – 3 hours latency.
▪ Includes dropped objects.
▪ Retention: 365 days.
When to use? (Enterprise Edition) – consider the impact on your monthly workload.
Query Profile statistics include: Bytes Scanned, Percentage Scanned from Cache, and Data Spilling.

QUERY_HISTORY table function in INFORMATION_SCHEMA:
select * from table(information_schema.query_history())
order by start_time;

QUERY_HISTORY view in the ACCOUNT_USAGE schema:
select * from snowflake.account_usage.query_history;
Caching

Result Cache (Cloud Services layer):
▪ Same queries can use the cache in the future – very fast results (persisted query results), avoids re-execution.
▪ Used only if: table data has not changed, micro-partitions have not changed, the query doesn't include UDFs or external functions, and sufficient privileges exist & results are still available.
▪ Can be disabled using the USE_CACHED_RESULT parameter.

Metadata Cache (Cloud Services layer):
▪ Properties for query optimization and processing, e.g. the range of values in each micro-partition.

Data Cache (Query Processing / virtual warehouse layer):
▪ Local cache of data read from storage; cannot be shared with other warehouses.

Storage layer: remote disk.
Micro-partitions
▪ Partitioning is automatically performed and can't be disabled.
▪ Hundreds of millions of micro-partitions allow very granular partition pruning.
▪ Additional properties are stored for query optimization.
▪ Data is stored in columnar format; unnecessary columns are eliminated when querying.
▪ Stored on external cloud provider storage.
▪ Immutable – micro-partitions can't be changed; new data = new micro-partitions.
▪ Without clustering keys, partitioning follows the order in which data is created.
Clustering Keys

Example: a table with columns Sales_Date, Name and Amount (rows such as 2023-06-01 | Sunglasses TR-7 | $25, 2023-06-01 | Chocolate bar 70% cacao | $3, 2023-06-03 | Oat meal biscuits | $4, 2023-06-05 | Sunglasses TR-7 | $25) is divided into micro-partitions sorted by Sales_Date. For each micro-partition, metadata stores the range of values per column – here both micro-partitions cover Amount $3 - $25, i.e. they overlap on Amount.

Overlapping micro-partitions & clustering depth:
▪ Clustering depth measures how much the value ranges of micro-partitions overlap (e.g. ranges A - F, A - C and B - D overlap; A - B, C - D and F do not).
▪ Well-clustered data ⇒ improved query performance through partition pruning.
▪ Reclustering is automatic and performed by Cloud Services (serverless); it causes credit consumption and additional storage costs (new micro-partitions are written).

CREATE TABLE t1 CLUSTER BY (c1, c5);
ALTER TABLE t1 DROP CLUSTERING KEY;
Search Optimization Service (Enterprise Edition)
▪ Similar to the secondary-index concept.
▪ Helps substring and regular expression searches (e.g. LIKE or ILIKE) and searches on VARIANT columns, especially when such selective queries are slow.
▪ Costs: serverless credit consumption + additional storage needed.
▪ Resource monitors can't control Snowflake-managed (serverless) warehouses.
Limitations:
• Query only 1 table (no joins)