Snowflake Architecture - Concepts
• Snowflake was founded by Benoit Dageville and Thierry Cruanes, who previously worked as data architects at Oracle Corporation
• Databases:
• Oracle
• SQL Server
• DB2
• Teradata
• MongoDB
• Snowflake
• Ingres
• MySQL
Database vs. Data Warehouse
• Database: used for Online Transaction Processing (OLTP), which means it handles the day-to-day, current data transactions.
• Data warehouse: used for Online Analytical Processing (OLAP), which means it keeps years of historical data.
• ETL (Extract, Transform, Load) extracts data from different sources, transforms it according to the
business logic, and loads it into another database
• ETL Tools:
• Informatica PowerCenter
• Talend
• Oracle Data Integrator (ODI)
• DataStage
• SSIS
• Ab Initio
• Pentaho
• Big Data
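As a rough illustration of the extract, transform, and load steps described above, the following Python sketch uses in-memory lists as stand-ins for the source systems and the target table. The function names, record fields, and the business rule (drop cancelled orders, convert cents to dollars) are all hypothetical and chosen only for the example.

```python
# Toy ETL pipeline. All names and the business rule are illustrative assumptions.

def extract(sources):
    """Extract: gather raw rows from several source systems."""
    rows = []
    for source in sources:
        rows.extend(source)
    return rows


def transform(rows):
    """Transform: apply the (hypothetical) business logic."""
    cleaned = []
    for row in rows:
        if row["status"] == "cancelled":
            continue  # assumed rule: cancelled orders are not reported
        cleaned.append({
            "order_id": row["order_id"],
            "amount_usd": row["amount_cents"] / 100,
        })
    return cleaned


def load(rows, target):
    """Load: append the transformed rows to the target table."""
    target.extend(rows)
    return len(rows)


crm_orders = [{"order_id": 1, "amount_cents": 1250, "status": "paid"}]
web_orders = [
    {"order_id": 2, "amount_cents": 990, "status": "cancelled"},
    {"order_id": 3, "amount_cents": 400, "status": "paid"},
]
warehouse_table = []
loaded = load(transform(extract([crm_orders, web_orders])), warehouse_table)
print(loaded, warehouse_table)
```

Dedicated ETL tools add scheduling, connectors, and error handling on top of this basic three-step shape.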
Data Warehouse Architecture
[Figure: data warehouse architecture. Recoverable labels: Source (DB2, CRM), BI, Reporting.]
Generations of Data Warehouses
[Figure: generations of data warehouses. Recoverable labels: 1st Gen (Oracle, SQL, MySQL); 2nd Gen (Teradata, Vertica); On-Premises.]
Capability                     Traditional data warehouses    Snowflake
Handle semi-structured data    Need ETL tools                 Snowflake can process it
Data loading and unloading     Need ETL tools                 Can be done by using “COPY”
Change data capture            Need ETL tools                 Can be done by using “Streams”
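The “COPY” and “Streams” features mentioned above correspond to the Snowflake SQL commands COPY INTO and CREATE STREAM. The sketch below only assembles the statement text in Python so the shape of each command is visible; it does not connect to Snowflake. The helper function names and the table, stage, and stream names are hypothetical.

```python
# Builds Snowflake SQL text only. Object names are placeholders, and the
# helpers are illustrative, not part of any Snowflake library.

def copy_into_table(table, stage, file_type="CSV"):
    """Loading: copy staged files into a table."""
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = (TYPE = '{file_type}')"


def copy_into_stage(stage, table, file_type="CSV"):
    """Unloading: copy table rows out to files in a stage."""
    return f"COPY INTO @{stage} FROM {table} FILE_FORMAT = (TYPE = '{file_type}')"


def create_stream(stream, table):
    """Change data capture: a stream records changes made to a table."""
    return f"CREATE STREAM {stream} ON TABLE {table}"


load_sql = copy_into_table("orders", "orders_stage")
unload_sql = copy_into_stage("orders_stage", "orders")
stream_sql = create_stream("orders_changes", "orders")
for statement in (load_sql, unload_sql, stream_sql):
    print(statement + ";")
```

Plain string formatting is acceptable here because the names are fixed in the example; it should not be used with untrusted input.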
• What is Snowflake?
• Snowflake provides a data warehouse that is faster, easier to use, and far more
flexible than traditional data warehouse offerings.
Architecture comparison:
• Shared-disk: shared storage, single cluster
• Shared-nothing: decentralized, local storage; single cluster
• Snowflake (multi-cluster, shared data): centralized, scale-out storage; multiple, independent compute clusters
Snowflake Architecture
• Snowflake’s architecture is a hybrid of traditional shared-disk database
architectures and shared-nothing database architectures.
• Similar to shared-disk architectures, Snowflake uses a central data repository for
persisted data that is accessible from all compute nodes in the data warehouse
• Similar to shared-nothing architectures, Snowflake processes queries using MPP
(massively parallel processing) compute clusters where each node in the cluster
stores a portion of the entire data set locally
• This approach offers the data management simplicity of a shared-disk architecture,
but with the performance and scale-out benefits of a shared-nothing architecture
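The hybrid idea described above can be pictured with a small Python toy model: one central store that every cluster reads, and clusters that split each query across their own nodes. This is only a mental model under simplifying assumptions, not how Snowflake is implemented; the class name, table name, and node counts are hypothetical.

```python
# Toy model: shared central storage plus independent MPP-style clusters.

# Central repository: one copy of the data, visible to every cluster.
SHARED_STORAGE = {"sales_amounts": list(range(1, 101))}


class ComputeCluster:
    """A hypothetical compute cluster with a fixed number of nodes."""

    def __init__(self, name, node_count):
        self.name = name
        self.node_count = node_count

    def total(self, table):
        data = SHARED_STORAGE[table]  # read from the shared repository
        # Each node works on its own slice of the data (the MPP part).
        slices = [data[i::self.node_count] for i in range(self.node_count)]
        partial_sums = [sum(node_slice) for node_slice in slices]
        return sum(partial_sums)  # combine the per-node results


reporting = ComputeCluster("reporting", node_count=2)
loading = ComputeCluster("loading", node_count=8)
print(reporting.total("sales_amounts"), loading.total("sales_amounts"))
```

Both clusters return the same total because they read the same stored data; only the degree of parallelism differs.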
Snowflake’s unique architecture consists of three key layers:
• Database Storage
• Query Processing
• Cloud Services
Database Storage:
• This layer stores table data and query results.
• Whenever data is loaded into Snowflake, Snowflake reorganizes it into its internal optimized,
compressed, columnar format.
• Snowflake stores this optimized data in cloud storage.
• Data is stored in columnar format.
• Data is stored in micro-partitions.
• The data objects stored by Snowflake are not directly visible nor accessible by
customers.
• They are only accessible through SQL query operations run using Snowflake.
Snowflake manages all aspects of how this data is stored, i.e. the data organization, file size,
structure, compression, metadata, and statistics.
Note:
• Snowflake does not charge for querying metadata information.
• Snowflake does not charge for DDL statements either.
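To make the terms "columnar format", "micro-partitions", and "metadata/statistics" more concrete, here is a heavily simplified Python sketch. It splits rows into small partitions, stores each partition column by column, and keeps per-column min/max statistics. Using those statistics to skip partitions, or to answer a MAX query without reading data, is shown as one plausible use of such metadata; the partition size of three rows, the function names, and the data are assumptions made for illustration, and the real storage format is internal to Snowflake.

```python
# Toy columnar micro-partitions with min/max metadata (illustrative only).

def build_micro_partitions(rows, rows_per_partition):
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        # Columnar layout: one list of values per column.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        metadata = {
            name: {"min": min(values), "max": max(values)}
            for name, values in columns.items()
        }
        partitions.append({"columns": columns, "metadata": metadata})
    return partitions


def find_equal(partitions, column, value):
    """Return matching rows and how many partitions had to be scanned."""
    matches, scanned = [], 0
    for partition in partitions:
        stats = partition["metadata"][column]
        if value < stats["min"] or value > stats["max"]:
            continue  # metadata shows the value cannot be in this partition
        scanned += 1
        columns = partition["columns"]
        for i, candidate in enumerate(columns[column]):
            if candidate == value:
                matches.append({name: vals[i] for name, vals in columns.items()})
    return matches, scanned


def max_from_metadata(partitions, column):
    """Answer MAX(column) from the statistics alone, without reading data."""
    return max(p["metadata"][column]["max"] for p in partitions)


rows = [{"id": i, "amount": i * 10} for i in range(1, 10)]
partitions = build_micro_partitions(rows, rows_per_partition=3)
matches, scanned = find_equal(partitions, "id", 5)
print(len(partitions), matches, scanned, max_from_metadata(partitions, "amount"))
```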
Snowflake on GCP
• Cloud Services
• The services in this layer include:
• Authentication
• Infrastructure management
• Metadata management
• Query parsing and optimization
• Access control
• Connecting to Snowflake
• Web UI – Web Interface
• CLI – SnowSQL – Command Line Utility
• ODBC and JDBC drivers
• Native Connectors – Java, Python
• Third-Party Connectors – ETL Tools – Informatica, Talend, Matillion
Connecting to Snowflake
• A web-based user interface from which all aspects of managing and using Snowflake can
be accessed.
• Command line clients (e.g. SnowSQL) which can also access all aspects of managing
and using Snowflake.
• ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect
to Snowflake.
• Native connectors available in ETL tools (e.g. DataStage, Informatica).
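As one example of the programmatic connection options listed above, the following Python sketch uses the Snowflake Connector for Python (the `snowflake-connector-python` package, whose `snowflake.connector.connect` function accepts `account`, `user`, `password`, and optional `warehouse`, `database`, `schema`, and `role` arguments). The environment variable names and the helper functions are assumptions chosen for this example, not a Snowflake convention. The connector is imported inside `run_query` so that the parameter helper can be used without the package installed.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
REQUIRED = {"account": "SNOWFLAKE_ACCOUNT", "user": "SNOWFLAKE_USER",
            "password": "SNOWFLAKE_PASSWORD"}
OPTIONAL = {"warehouse": "SNOWFLAKE_WAREHOUSE", "database": "SNOWFLAKE_DATABASE",
            "schema": "SNOWFLAKE_SCHEMA", "role": "SNOWFLAKE_ROLE"}


def connection_params(env=None):
    """Collect connection settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if var not in env]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    params = {key: env[var] for key, var in REQUIRED.items()}
    # Optional context settings are passed only when they are provided.
    params.update({key: env[var] for key, var in OPTIONAL.items() if var in env})
    return params


def run_query(sql, env=None):
    """Open a connection, run one statement, and return all rows."""
    import snowflake.connector  # requires: pip install snowflake-connector-python

    connection = snowflake.connector.connect(**connection_params(env))
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()

# Example call (needs a real account and the connector installed):
#     run_query("SELECT CURRENT_VERSION()")
```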
Snowflake Editions
Snowflake WebUI
• From the Snowflake web-based graphical interface, you can create and manage all
Snowflake objects
• Databases
• Virtual Warehouses
• All Database Objects (Schemas, Tables, Stages, etc.)
• Load a Limited Amount of Data into Tables
• Execute Queries, DDL and DML Operations
• Depending on your role (privileges), you can perform admin actions such as creating and
managing users
• Logging into Snowflake
• Open the URL that you received by email for the trial Snowflake account you
requested
• WebUI Page Details
• Databases Page
• Warehouses Page
• Worksheet Page
• History Page
• Help Menu
• User Preferences Menu
• Databases Page
• Shows information about the databases that you have created or have access to.
• You can create, clone, transfer ownership of, or drop a database.
• You can also navigate to the objects inside a database.
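The database actions listed for this page (create, clone, transfer ownership, drop) also exist as SQL commands. The short Python sketch below only prints the statement text; the database and role names are hypothetical, and the helper function is an illustration rather than a Snowflake API.

```python
# SQL equivalents of the Databases page actions (statement text only).

def database_statements(name, clone_name, new_owner_role):
    return [
        f"CREATE DATABASE {name}",
        f"CREATE DATABASE {clone_name} CLONE {name}",
        f"GRANT OWNERSHIP ON DATABASE {name} TO ROLE {new_owner_role}",
        f"DROP DATABASE {clone_name}",
    ]


db_sql = database_statements("sales_db", "sales_db_dev", "analyst_role")
for statement in db_sql:
    print(statement + ";")
```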
• Warehouses Page
• You can view the virtual warehouses that you have created or have access to
• You can perform the following actions on the Warehouses page
• Create or drop a warehouse.
• Suspend or resume a warehouse.
• Configure a warehouse.
• Transfer ownership of a warehouse to a different role.
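Each warehouse action above has a SQL counterpart that can be run from a worksheet. The Python sketch below maps the actions to statement text without executing anything. The warehouse name, role name, sizes, and the 300-second auto-suspend value are assumptions picked for the example, and the helper function is hypothetical.

```python
# SQL equivalents of the Warehouses page actions (statement text only).

def warehouse_statements(name, new_owner_role):
    return {
        "create": (
            f"CREATE WAREHOUSE {name} WITH WAREHOUSE_SIZE = 'XSMALL' "
            "AUTO_SUSPEND = 300 AUTO_RESUME = TRUE"
        ),
        "suspend": f"ALTER WAREHOUSE {name} SUSPEND",
        "resume": f"ALTER WAREHOUSE {name} RESUME",
        # "Configure" is shown here as a resize; other properties work similarly.
        "configure": f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = 'SMALL'",
        "transfer_ownership": (
            f"GRANT OWNERSHIP ON WAREHOUSE {name} TO ROLE {new_owner_role}"
        ),
        "drop": f"DROP WAREHOUSE {name}",
    }


wh_sql = warehouse_statements("demo_wh", "sysadmin")
for action, statement in wh_sql.items():
    print(f"{action}: {statement};")
```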
• Worksheet Page
• This page lets us run SQL statements, create procedures, and load tables.
• We can create up to 16 worksheets, and each worksheet can connect to a different database with a different role.
• All worksheets are saved by default, so they are intact the next time you connect to Snowflake.
• Worksheet Page – Notes Details
• 1 – Database Navigator
• 2 – Create New Worksheet
• 3 – Open Existing Worksheet
• 4 – Context Sensitive Menu – Virtual Warehouse
• 5 – Load a Script, Delete a Worksheet, Highlight Code
• 6 – Queries or SQL text that can be executed or run
• 7 – Download the SQL Query Result Set
• 8 – Copy the SQL Query Result Set
• 9 – Expand the columns with Result set
• 10 – Select the columns in the History Details for viewing