Snowflake Architecture - Concepts

Uploaded by

vr.sf99

Introduction

• Snowflake was founded in 2012 in California.

• It was founded by Benoit Dageville and Thierry Cruanes, who previously worked as data architects at Oracle Corporation.

• The Snowflake Data Warehouse was publicly launched in 2014.


Data

• Data is information about an object.

• Data is one of the most valuable assets in today's world.

• No business can run without data.

• Data may be numbers, characters, symbols, images, etc.
Database
• A database is a collection of information.
• Data has to be stored somewhere; the place where it is stored is called a database.
• Unless data is stored in a database, it cannot be reused again and again.

• Databases:
• Oracle
• SQL Server
• DB2
• Teradata
• MongoDB
• Snowflake
• Ingres
• MySQL
Database vs Data Warehouse

Database
• A database is a collection of data or information.
• Databases are used for Online Transaction Processing (OLTP), which means they handle day-to-day transactions on current data.
• Normalized architecture, which avoids data redundancy (junk data, duplicates, etc.).

Data Warehouse
• A data warehouse is a system that stores highly structured information from various sources.
• Data warehouses are used for Online Analytical Processing (OLAP), which means they keep years of historical data.
• Denormalized architecture, which means storing complex tables that may repeat data to make analysis faster.
ETL
• ETL means Extract, Transform, and Load.

• It extracts data from different sources, transforms the data according to the business logic, and loads it into another database.

• ETL Tools:
• Informatica PowerCenter
• Talend
• Oracle Data Integrator (ODI)
• DataStage
• SSIS
• Ab Initio
• Pentaho
• Big Data
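
The three ETL steps can be sketched in a few lines of Python. This is only an illustration, not how any of the tools listed above work internally: the source rows, the `sales` table, and the business rules (tidying names, converting amounts to numbers) are all made-up assumptions, and an in-memory SQLite database stands in for the target database.

```python
import sqlite3

# Hypothetical rows standing in for two different source systems.
crm_rows = [{"id": 1, "name": " alice ", "amount": "120.50"},
            {"id": 2, "name": "BOB", "amount": "80"}]
erp_rows = [{"id": 3, "name": "carol", "amount": "15.25"}]

def extract(*sources):
    """Extract: gather raw rows from every source."""
    for source in sources:
        yield from source

def transform(rows):
    """Transform: apply example business rules (tidy names, numeric amounts)."""
    for row in rows:
        yield (row["id"], row["name"].strip().title(), float(row["amount"]))

def load(rows, conn):
    """Load: write the cleaned rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

# In-memory SQLite plays the role of the target database here.
target = sqlite3.connect(":memory:")
load(transform(extract(crm_rows, erp_rows)), target)
loaded = target.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Each stage is a separate function, so a different source or a different business rule can be swapped in without touching the other two steps.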
Data Warehouse Architecture

[Figure: data warehouse architecture, showing sources (DB2, Oracle, CRM), ETL, the database (OLTP), the data warehouse (OLAP), and reporting, visualization, and BI.]
Generations of Data Warehouses

• 1st Gen: Oracle SQL, MySQL (on-premises)
• 2nd Gen: Teradata, Vertica (on-premises)
• 3rd Gen: Big Data
• 4th Gen: Redshift (Platform-as-a-Service)
• 5th Gen: Snowflake (Software-as-a-Service)


Why Snowflake?
What is Snowflake?
• Snowflake is a cloud-based data warehousing solution.
• Snowflake offers data storage and analytics services.
• Snowflake does not have its own infrastructure.
• It runs on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
• Snowflake runs completely on cloud infrastructure.
• It is available as Software-as-a-Service (SaaS).
Why Snowflake?
• Pay-for-what-you-use model.
• It is a cloud platform, so there is no infrastructure cost.
• Snowflake is more than a data warehouse.
• It also helps with some transformations, creating data pipes, creating visual dashboards, etc.
• High scalability.
• Data recovery, backup, sharing, and masking.
• Can analyze data present in external files.
• Easy integration with data visualization and reporting tools.
Traditional WH Vs Snowflake
Feature | Traditional WH | Snowflake
Infrastructure cost | Yes | No infrastructure cost
Handling semi-structured data | Needs ETL tools | Snowflake can process it
Data loading and unloading | Needs ETL tools | Can be done by using “COPY”
Scalability | Not an easy task | Highly scalable (supports scale-up and scale-out)
Database administration | Highly required | In-built performance optimization with its micro-partitions and clustering keys
Traditional WH Vs Snowflake
Feature | Traditional WH | Snowflake
Data backup | Needs additional storage | Easy and no cost with “Cloning”
Data recovery | Difficult | Very easy with “Time Travel”
Data sharing | Difficult | Easy with the Data Sharing feature
Change data capture | Needs ETL tools | Can be done by using “Streams”
Scheduling | Tools required | Can be scheduled by using “Tasks”


Snowflake Architecture

• What is Snowflake

• Snowflake is an analytic cloud data warehouse provided as Software-as-a-Service (SaaS).

• Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.


• Data Warehouse as a Cloud Service
• There is no hardware (virtual or physical) for you to select, install, configure, or manage.
• There is no software for you to install, configure, or manage.
• Ongoing maintenance, management, and tuning are handled by Snowflake.
• Snowflake runs completely on cloud infrastructure.
• Snowflake manages all aspects of software installation and updates.
• SaaS – Software as a Service
• Snowflake cannot be run on private cloud infrastructures (on-premises or hosted).
• Snowflake is not a packaged software offering that can be installed by a user.
Traditional Architectures vs Snowflake

• Shared-disk (traditional): shared storage, single cluster
• Shared-nothing (traditional): decentralized, local storage; single cluster
• Snowflake (multi-cluster, shared data): centralized, scale-out storage; multiple, independent compute clusters
• Snowflake’s architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures.
• Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse.
• Similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters, where each node in the cluster stores a portion of the entire data set locally.
• This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
Snowflake’s unique architecture consists of three key layers:
• Database Storage
• Query Processing
• Cloud Services
Database Storage:
• When data is loaded into Snowflake, Snowflake reorganizes it into its internal optimized, compressed, columnar format and stores this optimized data in cloud storage.
• This layer stores table data and query results.
• Data is stored in columnar format, in micro-partitions.
• The data objects stored by Snowflake are neither directly visible nor accessible by customers.
• They are only accessible through SQL query operations run using Snowflake.
• Snowflake manages all aspects of how this data is stored: the data organization, file size, structure, compression, metadata, and statistics.

We can define clustering keys on large tables for better performance.
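
To make the idea of columnar micro-partitions with metadata more concrete, here is a toy Python model. It is a sketch under stated assumptions and not Snowflake's actual storage format: the `MicroPartition` class, the `build_partitions` and `scan_equal` helpers, and the tiny partition size are all hypothetical. It shows why keeping per-column minimum and maximum values lets a query skip partitions that cannot contain a match, and why data that is well ordered on the filtered column (the goal of a clustering key) makes that skipping more effective.

```python
from dataclasses import dataclass

@dataclass
class MicroPartition:
    columns: dict  # column name -> list of values (columnar layout)
    stats: dict    # column name -> (min, max), kept as metadata

def build_partitions(rows, rows_per_partition):
    """Split rows into fixed-size chunks and store each chunk column by column."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        partitions.append(MicroPartition(columns, stats))
    return partitions

def scan_equal(partitions, column, value):
    """Return (matching row count, number of partitions actually read)."""
    matches, scanned = 0, 0
    for part in partitions:
        low, high = part.stats[column]
        if value < low or value > high:
            continue  # pruned using metadata only, the data is never read
        scanned += 1
        matches += part.columns[column].count(value)
    return matches, scanned

# Twelve made-up rows, already ordered by order_day.
rows = [{"order_day": day, "amount": day * 10} for day in range(1, 13)]
parts = build_partitions(rows, rows_per_partition=4)
result = scan_equal(parts, "order_day", 6)
print(result)
```

With the rows ordered by `order_day`, only one of the three partitions has a range that covers the value 6, so the other two are skipped without being read.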


Query Processing:
• Query execution is performed by the query processing layer.
• Snowflake processes queries using "virtual warehouses".
• Warehouses are required for queries, as well as all DML operations, including loading data into tables.
• A warehouse is defined by its size.
• Each virtual warehouse is composed of multiple compute nodes allocated by Snowflake from a cloud provider.
• On AWS they are a group of EC2 instances, and on Azure a group of virtual machines.
• Compute cost is calculated from the time virtual warehouses spend running, so a suspended warehouse incurs no compute cost.
• Virtual warehouses are considered the muscle of the system.

• They can be scaled up and down easily.

• Auto-suspend and auto-resume are available.


• Increasing the size of a warehouse does not always improve data loading performance.
• Data loading performance is influenced more by the number of files being loaded (and the size of each file) than by the size of the warehouse.
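
The arithmetic behind compute cost can be sketched as follows. The credit rates and the 60-second minimum below are assumptions taken from Snowflake's commonly published figures for standard warehouses, not from these slides, so check the current documentation before relying on them; the `credits_used` helper is hypothetical.

```python
# Assumed credits per hour for standard warehouses (each size doubles the rate).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Assumed minimum billed time each time a warehouse starts or resumes.
MINIMUM_BILLED_SECONDS = 60

def credits_used(size, running_seconds):
    """Estimate credits for one continuous run of a warehouse of the given size."""
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600

one_hour_medium = credits_used("MEDIUM", 3600)
short_run_small = credits_used("SMALL", 30)  # billed as 60 seconds
print(one_hour_medium, round(short_run_small, 4))
```

Under these assumptions, auto-suspend saves credits because a suspended warehouse accrues no running time, while very short bursts still pay the minimum.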

• What is a Multi-cluster Warehouse?

o By default, a virtual warehouse consists of a single cluster of compute resources available to the warehouse for executing queries.
o As queries are submitted to a warehouse, the warehouse allocates resources to each query and begins executing the queries.
o If sufficient resources are not available to execute all the queries submitted to the warehouse, Snowflake queues the additional queries until the necessary resources become available.
o With multi-cluster warehouses, Snowflake supports allocating, either statically or dynamically, additional clusters to make a larger pool of compute resources available.
o A multi-cluster warehouse is defined by specifying the following properties:
• Maximum number of clusters, greater than 1 (up to 10).
• Minimum number of clusters, equal to or less than the maximum (up to 10).
o If the minimum and maximum number of clusters are the same, the warehouse runs in Maximized mode.
o If the minimum and maximum number of clusters are different, the warehouse runs in Auto-scale mode.
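
The rules for minimum and maximum cluster counts can be written down as a small check. This is a sketch, not a Snowflake API: the `multi_cluster_mode` function is hypothetical, and the upper limit of 10 clusters is taken from the slide above.

```python
def multi_cluster_mode(min_clusters, max_clusters, limit=10):
    """Classify a warehouse from its minimum and maximum cluster counts."""
    if not 1 <= min_clusters <= max_clusters <= limit:
        raise ValueError(
            f"need 1 <= min_clusters <= max_clusters <= {limit}, "
            f"got min={min_clusters}, max={max_clusters}"
        )
    if max_clusters == 1:
        return "single-cluster"  # the default, not a multi-cluster warehouse
    if min_clusters == max_clusters:
        return "maximized"       # all clusters run whenever the warehouse runs
    return "auto-scale"          # clusters are added and removed with load

print(multi_cluster_mode(1, 1), multi_cluster_mode(3, 3), multi_cluster_mode(1, 4))
```

Maximized suits a steady, known level of concurrency, while auto-scale suits workloads where the number of concurrent queries varies.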
Cloud Services Layer:

1. A collection of services that coordinate activities across Snowflake
2. This is the brain of Snowflake
3. Authentication and access control
4. Infrastructure management
5. Metadata management
6. Security
7. Manages all serverless features such as Snowpipe, Tasks, and materialized view maintenance

Note:
• Queries that only read metadata do not need a running warehouse, so they incur no warehouse compute charge.
• DDL statements also incur no warehouse compute charge.
• Cloud Services
• Among the services in this layer:
• Authentication
• Infrastructure management
• Metadata management
• Query parsing and optimization
• Access control

• Connecting to Snowflake
• Web UI – web interface
• CLI – SnowSQL, the command-line utility
• ODBC and JDBC drivers
• Native connectors – Java, Python
• Third-party connectors – ETL tools such as Informatica, Talend, Matillion
Connecting to Snowflake

Snowflake supports multiple ways of connecting to the service:

• A web-based user interface from which all aspects of managing and using Snowflake can
be accessed.
• Command line clients (e.g. SnowSQL) which can also access all aspects of managing
and using Snowflake.
• ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect
to Snowflake.
• Native connectors available in ETL tools (e.g. DataStage, Informatica).
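
As one example of connecting programmatically, the sketch below uses the Snowflake Connector for Python. The account identifier, user, password, and warehouse name are placeholders, and `build_connection_params` and `open_connection` are hypothetical helpers written for this example; only `snowflake.connector.connect` comes from the connector package, which must be installed separately (`pip install snowflake-connector-python`).

```python
def build_connection_params(account, user, password,
                            warehouse=None, database=None, schema=None, role=None):
    """Collect keyword arguments for the connector, skipping unset options."""
    params = {"account": account, "user": user, "password": password}
    optional = {"warehouse": warehouse, "database": database,
                "schema": schema, "role": role}
    params.update({key: value for key, value in optional.items() if value is not None})
    return params

def open_connection(params):
    """Open a session; needs the third-party snowflake-connector-python package."""
    import snowflake.connector  # imported lazily so the helper above works without it
    return snowflake.connector.connect(**params)

# Placeholder values only; replace them with real account details before connecting.
params = build_connection_params("myorg-myaccount", "DEMO_USER", "change-me",
                                 warehouse="DEMO_WH")
print(sorted(params))
```

Calling `open_connection(params)` with real credentials returns a connection whose cursor can execute SQL, in the same way a worksheet in the web interface does.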
Snowflake Editions

Snowflake WebUI
• With the Snowflake web-based graphical interface, you can create and manage all Snowflake objects:
• Databases
• Virtual warehouses
• All database objects (schemas, tables, stages, etc.)
• Load a limited amount of data into tables
• Execute queries, DDL, and DML operations
• Depending on the role (privileges) you have, you can perform admin actions such as creating and managing users
• Logging into Snowflake
• Open the URL that you received by email for the trial Snowflake account you requested
• WebUI Page Details
• Databases Page
• Warehouses Page
• Worksheet Page
• History Page
• Help Menu
• User Preferences Menu
• Databases Page
• Shows information about the databases that you have created or have access to.
• You can create, clone, transfer ownership of, or drop a database.
• You can also navigate to the objects within a database.
• Warehouses Page
• You can view the virtual warehouses that you have created or have access to.
• You can perform the following actions on the Warehouses page:
• Create or drop a warehouse.
• Suspend or resume a warehouse.
• Configure a warehouse.
• Transfer ownership of a warehouse to a different role.
• Worksheet Page
• This page lets us run SQL, create procedures, and load tables.
• We can create up to 16 worksheets, and each worksheet can connect to a different database with a different role.
• All worksheets are saved by default, so they are intact the next time you connect to Snowflake.
• Worksheet Page – Notes Details
• 1 – Database Navigator
• 2 – Create New Worksheet
• 3 – Open Existing Worksheet
• 4 – Context Sensitive Menu – Virtual Warehouse
• 5 – Load a Script, Delete a Worksheet, Highlight Code
• 6 – Queries or SQL text that can be executed or run
• 7 – Download the SQL Query Result Set
• 8 – Copy the SQL Query Result Set
• 9 – Expand the columns with Result set
• 10 – Select the columns in the History Details for viewing
