Snowflake

What is Snowflake

● Snowflake is a massively popular cloud-based data warehouse management platform.
● It has a unique architecture that separates the storage and compute layers, allowing it to be incredibly flexible and scalable.

1. Storage Layer:
● Responsible for storing data in an efficient and scalable manner.
● Cloud-based: integrates with major cloud providers such as AWS, GCP, and Microsoft Azure.
● Columnar format: Snowflake stores data in a columnar format, which is optimized for analytical queries and well-suited for data aggregation.
● Micro-partitioning: Snowflake uses a technique called micro-partitioning, which stores table data in small contiguous units of storage. Queries become faster because partitions that are not needed can be skipped.
● Zero-copy cloning: Snowflake has a unique feature that allows it to create virtual clones of data, which consume no additional storage unless changes are made to the clone.
● The storage layer scales horizontally, which means it can handle increasing data volumes by adding more servers to distribute the load.
2. Compute Layer:
The compute layer is the engine that executes your queries.
● Virtual warehouses: You can think of virtual warehouses as teams of computers (compute nodes) designed to handle query processing.
● Virtual warehouses come in different sizes, and consequently at different prices (the sizes include XS, S, M, L, and XL).
● Multi-cluster, multi-node architecture: The compute layer uses multiple clusters with multiple nodes for high concurrency, allowing several users to access and query the data simultaneously.
● Automatic query optimization: Snowflake's system analyzes queries and identifies patterns in historical data to optimize execution.
● Results cache: The compute layer includes a cache that stores the results of frequently executed queries.

3. Cloud Services Layer:
● Security and access control: This layer enforces security measures, including authentication, authorization, and encryption.
● Data sharing: This layer implements secure data-sharing protocols across different accounts and even third-party organizations.
● Semi-structured data support: Another unique benefit of Snowflake is its ability to handle semi-structured data, such as JSON and Parquet, despite being a data warehouse management platform.

We must manually define a file format and name it, because Snowflake cannot always infer the schema and structure of data files such as CSV, JSON, or XML.

A stage in Snowflake is a storage area where you can upload your local files. These can
be structured or semi-structured data files. For example, we can create a stage named
my_local_files.
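The original example code was shown as an image and is missing. A minimal sketch of defining a named file format and a stage (the stage name my_local_files comes from the text; the file format name and options are assumptions):

```sql
-- Define a named file format for CSV files (name and options are illustrative)
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- Create an internal stage for uploading local files
CREATE OR REPLACE STAGE my_local_files
  FILE_FORMAT = my_csv_format;

-- Upload a local file to the stage (PUT is run from SnowSQL, not the web UI)
PUT file:///tmp/data.csv @my_local_files;
```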
Approach for Data Modelling (designing and maintaining data
structures in Snowflake)

1. Designing Data Structures

a. Schema Design

● Use Snowflake's Schema-on-Read Approach: Snowflake's unique architecture allows you to design flexible schemas. Utilize this by designing schemas that accommodate changes and evolution in data requirements.
● Normalized vs. Denormalized Schemas: Depending on your use case, choose
between normalized schemas (for OLTP-like operations) and denormalized
schemas (for OLAP and reporting). Snowflake can handle both efficiently.
● Star and Snowflake Schemas: For data warehousing and analytics, consider
using star or snowflake schemas. Star schemas are typically simpler and faster
for querying, while snowflake schemas normalize data into multiple related
tables, reducing redundancy.

b. Data Types

● Appropriate Data Types: Choose the correct data types for columns to optimize
storage and performance. Snowflake supports a wide range of data types,
including STRING, NUMBER, BOOLEAN, VARIANT (for semi-structured data like
JSON), and more.
● Variable vs. Fixed Data Types: Use variable-length data types (e.g., VARCHAR)
where the length of the data varies significantly, and fixed-length types (e.g.,
CHAR) when the length is consistent.

2. Loading Data

a. Data Ingestion

● COPY Command: Use the COPY INTO command for bulk loading data from
various sources (e.g., S3, Azure Blob Storage, local files). Snowflake
automatically handles parallel loading and ensures efficient ingestion.
● File Formats: Snowflake supports multiple file formats like CSV, JSON, AVRO,
ORC, and PARQUET. Choose the format that best suits your data characteristics
and performance requirements.
● Staging: Stage your data files before loading them into Snowflake tables. This
allows for data validation and quality checks before they are made available for
querying.
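The bulk-loading pattern described above can be sketched as follows (the table and stage names are assumptions, not from the original):

```sql
-- Bulk-load staged CSV files into a target table; Snowflake parallelizes the load
COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'ABORT_STATEMENT';  -- fail fast so bad files can be fixed before loading
```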

3. Partitioning and Clustering

a. Micro-Partitions

● Automatic Partitioning: Snowflake automatically partitions data into micro-partitions, which optimizes query performance. Ensure your data model and queries leverage this feature.
● Clustering Keys: For large tables, define clustering keys to improve the
performance of selective queries. Clustering keys help Snowflake maintain data
locality and reduce the need for full table scans.
4. Performance Optimization

a. Query Performance

● Result Caching: Utilize Snowflake's result caching to speed up repeated queries. Snowflake automatically caches query results for 24 hours by default.
● Materialized Views: Use materialized views to reuse precomputed query results without re-running the underlying queries.
● Query Optimization: Use query profiling and optimization techniques like proper
JOINs, filtering early in the query, and avoiding unnecessary complex operations.

b. Resource Management

● Warehouses: Configure virtual warehouses based on workload requirements. Scale warehouses up or down to match the query load, ensuring optimal performance and cost management.
● Auto-Suspend and Auto-Resume: Enable auto-suspend and auto-resume
features for virtual warehouses to minimize costs by suspending idle
warehouses and resuming them automatically when new queries are submitted.
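The auto-suspend and auto-resume settings above can be sketched as a warehouse definition (the name and exact values are illustrative):

```sql
-- Create a warehouse that suspends after 60 seconds of inactivity
-- and resumes automatically when a new query arrives
CREATE OR REPLACE WAREHOUSE my_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;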

5. Security and Governance

a. Access Control

● Role-Based Access Control (RBAC): Implement RBAC to manage permissions and access to data and resources in Snowflake. Define roles and assign them to users or groups based on their responsibilities.
● Data Masking: Use dynamic data masking to protect sensitive data by obscuring
it from unauthorized users, while still allowing authorized users to access the full
data.
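A minimal RBAC sketch for the pattern described above (all role, database, and user names are assumptions):

```sql
-- Create a role and grant it read access to a schema
CREATE ROLE IF NOT EXISTS analyst_role;
GRANT USAGE ON DATABASE my_db TO ROLE analyst_role;
GRANT USAGE ON SCHEMA my_db.public TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA my_db.public TO ROLE analyst_role;

-- Assign the role to a user based on their responsibilities
GRANT ROLE analyst_role TO USER some_user;
```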

b. Data Governance

● Data Lineage: Implement data lineage tracking to understand the flow and
transformation of data within Snowflake. This aids in compliance and auditing.
● Metadata Management: Use Snowflake's INFORMATION_SCHEMA to manage and query metadata about your data structures, which helps with ongoing maintenance.
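For example, a typical metadata query against INFORMATION_SCHEMA (the schema name here is an assumption) looks like:

```sql
-- List tables in a schema along with row counts and storage size
SELECT table_name, row_count, bytes
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'PUBLIC'
ORDER BY bytes DESC;
```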
6. Monitoring and Maintenance

a. Monitoring

● Snowflake's Monitoring Tools: Utilize Snowflake's built-in monitoring tools and dashboards to track the performance and health of your data structures and queries.
● Third-Party Tools: Integrate third-party monitoring tools, such as AWS CloudWatch or Azure Monitor, for more comprehensive monitoring and alerting.

b. Maintenance

● Regular Audits: Conduct regular audits of your data structures and usage
patterns to ensure they still meet business requirements and perform optimally.
● Automated Maintenance: Leverage Snowflake's automated maintenance features, such as automatic clustering, to keep your data structures optimized without manual intervention.

Important Points:

1) Schema-on-Read in Snowflake

Schema-on-read is a data management pattern, supported by Snowflake for designing data structures, where the data schema is applied at the time of reading the data rather than when the data is ingested or stored.
E.g., suppose we have a CSV file. We stage the file and upload it, then create a file format, and finally fetch the data directly from the stage where the file is stored, without defining a table schema up front, setting the schema on read instead.
Note: t.$1, t.$2, etc. are used to refer to the columns in the CSV file, with $1 corresponding to the first column (id), $2 to the second column (name), and so on.
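The example code was shown as images and is missing from the text. A sketch of the steps (the file, stage, and format names are assumptions; the columns id and name come from the note above):

```sql
-- 1. Create a stage and upload the local CSV file (PUT is run from SnowSQL)
CREATE OR REPLACE STAGE my_stage;
PUT file:///tmp/employees.csv @my_stage;

-- 2. Create a file format describing the CSV layout
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1;

-- 3. Query the staged file directly, applying the schema on read
SELECT t.$1 AS id, t.$2 AS name
FROM @my_stage (FILE_FORMAT => 'my_csv_format') t;
```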

Benefits
● Flexibility: You can change the data structure without requiring changes to the storage format.
● Adaptability: It is particularly useful for handling semi-structured and
unstructured data, where the schema might not be known upfront or might
change frequently.
● Efficiency: It enables efficient querying and transformation of raw data without
the need for upfront schema definitions, simplifying the process of integrating
and analyzing diverse data sources.

2) Clustering Keys in Snowflake

● used to optimize the physical layout of data in large tables, improving query
performance by minimizing the amount of data scanned during query execution.
● Clustering keys define one or more columns that are used to sort and organize
the data within micro-partitions, enhancing data locality and reducing the need
for full table scans.

Benefits
● Improved Query Performance
● Efficient Storage:
● Better maintenance

For clustering, always choose a column that is frequently used to filter the data.

You can define clustering keys when creating a table, add clustering keys to an existing table, or check the clustering information for a table.
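The original example queries were shown as images; a sketch of each operation (the table and column names are assumptions):

```sql
-- Create a table with a clustering key
CREATE OR REPLACE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER(10,2)
) CLUSTER BY (order_date);

-- Add a clustering key to an existing table
ALTER TABLE orders CLUSTER BY (order_date);

-- Check clustering information for the table
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');
```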

3) Snowflake tasks vs streams

● Streams capture changes to tables (inserts, updates, and deletes) and provide the change data to consumers in near real-time; they record changes continuously.
● Tasks run asynchronous pieces of code, such as ETL transformations, on a defined schedule or when a predecessor task completes, rather than continuously.
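A common pattern combines the two: a stream tracks changes on a source table, and a scheduled task consumes them (all object names and the schedule below are assumptions):

```sql
-- Create a stream that tracks changes on a source table
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Create a task that runs every 5 minutes, but only when the stream has data
CREATE OR REPLACE TASK process_orders
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_history
  SELECT * FROM orders_stream;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK process_orders RESUME;
```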


4) Staging in Snowflake

Staging in Snowflake refers to the process of temporarily storing data files before
loading them into database tables.
It is a step where you can validate, transform, and manage data before it is ingested into
your Snowflake tables.

Types of Stages

1. Internal Stage: Managed entirely by Snowflake.
2. External Stage: Uses external cloud storage locations.

E.g., consider we have a CSV file. We create an internal stage, upload the CSV file to it, create a table, and then load the data from the stage into the table.

Here, the file format is specified directly in the load query without defining it beforehand. We can also define the file format separately and then reference it.
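The original screenshots are missing; a sketch of the full sequence (the file, stage, table, and column names are assumptions):

```sql
-- 1. Create an internal stage
CREATE OR REPLACE STAGE my_internal_stage;

-- 2. Upload the CSV file to the stage (PUT is run from SnowSQL)
PUT file:///tmp/employees.csv @my_internal_stage;

-- 3. Create the target table
CREATE OR REPLACE TABLE employees (
  id   NUMBER,
  name STRING
);

-- 4. Load from the stage, specifying the file format inline
COPY INTO employees
FROM @my_internal_stage/employees.csv
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```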

5) Data Lineage in Snowflake

● Data lineage is the ability to track and visualize the flow of data through the various stages of its lifecycle, from initial ingestion through transformation to its use in reports and analyses.
● It helps in understanding where the data comes from, how it is transformed, and where it goes.
E.g., suppose you have a CSV file "employees.csv". You stage and load this data into a table named "employees_raw", then transform it from the raw table into a more refined table, "employees_cleaned". Finally, a table for reporting purposes is created from the cleaned table.

The steps go: put the data into a stage, load it into the raw table, transform it into the cleaned table, and then build the table used for reporting.

1. Source Data: employees.csv file.
2. Stage and Load: Data loaded into the employees_raw table.
3. Transformation: Data transformed into the employees_cleaned table.
4. Reporting: Summary data generated in the department_summary table.
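The SQL for these steps was shown as images; a sketch using the table names above (the column names and transformation logic are assumptions):

```sql
-- 1-2. Stage and load the raw data (PUT is run from SnowSQL)
CREATE OR REPLACE STAGE emp_stage;
PUT file:///tmp/employees.csv @emp_stage;

CREATE OR REPLACE TABLE employees_raw (
  id NUMBER, name STRING, department STRING, salary NUMBER
);
COPY INTO employees_raw
FROM @emp_stage/employees.csv
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- 3. Transform the raw data into a cleaned table
CREATE OR REPLACE TABLE employees_cleaned AS
SELECT id, INITCAP(name) AS name, UPPER(department) AS department, salary
FROM employees_raw
WHERE id IS NOT NULL;

-- 4. Build the reporting table
CREATE OR REPLACE TABLE department_summary AS
SELECT department, COUNT(*) AS employee_count, AVG(salary) AS avg_salary
FROM employees_cleaned
GROUP BY department;
```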
6) Data Masking in Snowflake

Data masking is a security feature that allows you to hide sensitive information in your database, so that unauthorized users see masked data instead of the actual values.

E.g., here we create a table with sensitive information such as SSNs (social security numbers).

We then create a masking policy to mask the sensitive data. In the policy:

● CURRENT_ROLE() is a function that returns the role of the current user.
● If the current role of the user is 'AUTHORIZED_ROLE', then the policy returns the
original value (val).
● 'XXX-XX-' is a string that represents the masked part of the SSN.
● RIGHT(val, 4) is a function that takes the last four characters of the input string
val.
● || is the concatenation operator used to combine the masked part with the last
four characters of the SSN.
Finally, we apply the masking policy to the table.
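The screenshots with the actual SQL are missing; a sketch consistent with the bullet points above (the table, column, and policy names are assumptions; AUTHORIZED_ROLE comes from the text):

```sql
-- Table containing sensitive SSNs
CREATE OR REPLACE TABLE employees_pii (
  id  NUMBER,
  ssn STRING
);

-- Masking policy: the authorized role sees the real value;
-- everyone else sees 'XXX-XX-' followed by the last four digits
CREATE OR REPLACE MASKING POLICY ssn_mask AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'AUTHORIZED_ROLE' THEN val
    ELSE 'XXX-XX-' || RIGHT(val, 4)
  END;

-- Apply the masking policy to the column
ALTER TABLE employees_pii
  MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;
```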
