4.1 Snowflake
Agenda
● Business Scenarios
● Components
● Know Snowflake
● SnowSQL CLI
● Loading Data
● 3rd Party Connections
Snowflake Usage
Business Scenarios
Dataflow
Snowflake is used as the data warehouse in a BI platform.
Components in Snowflake
Components
● Database
○ A database holds an organization's data sets. Dev and Prod use different databases.
● Schema
○ A schema presents a logical view of a database. It holds tables and views and represents a set of data for a certain purpose.
● Tables and views
○ Contain structured data.
● Worksheet
○ The editor used to write and run SQL.
● Warehouse
○ The compute engine that runs queries.
● Roles
○ Groups of access and operation permissions.
● History
○ Historical queries.
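Most of these objects can be created directly with SQL from a worksheet. A minimal sketch, reusing the DEMO_DB and COMPUTE_WH names that appear later in these slides (adjust to your own environment):

-- create the core objects referenced in this deck
CREATE DATABASE IF NOT EXISTS DEMO_DB;

CREATE WAREHOUSE IF NOT EXISTS COMPUTE_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60          -- suspend after 60 seconds idle to save credits
  AUTO_RESUME = TRUE;

USE ROLE ACCOUNTADMIN;       -- role used in the loading examples later
USE WAREHOUSE COMPUTE_WH;
USE DATABASE DEMO_DB;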
Database
Components
Definition: A database holds an organization's data sets. Dev and Prod use different databases.
Clone Structures
Try in system
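Cloning in Snowflake is zero-copy, so duplicating a whole database (or a single schema or table) is cheap. A minimal sketch, where DEMO_DB_DEV and SCHMA_DEV are placeholder names for the clones:

-- clone an entire database structure (zero-copy)
CREATE DATABASE DEMO_DB_DEV CLONE DEMO_DB;

-- the same pattern works one level down
CREATE SCHEMA DEMO_DB.SCHMA_DEV CLONE DEMO_DB.SCHMA;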
Schema
Components
Creation
● Console
● Query
CREATE SCHEMA IF NOT EXISTS DEMO_DB.SCHMA;
Try in system
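A quick way to verify the result from a worksheet, sketched with standard Snowflake commands:

-- confirm the schema exists and switch the session to it
SHOW SCHEMAS IN DATABASE DEMO_DB;
USE SCHEMA DEMO_DB.SCHMA;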
Schema
Components
● Create
CREATE TABLE IF NOT EXISTS enterprise.table
AS (SELECT col_1, col_2 FROM landing.table);
● Insert
INSERT INTO enterprise.table (col_1, col_2)
(SELECT col_1, col_2 FROM landing.table);
Try in system
Tables and Views
Components
● Regular Table
○ Definition: A regular table is a collection that physically contains the data, meaning the data takes storage space on hardware.
○ Usage: Holds the formal tables in the database.
● Transient Table
○ Definition: Transient tables are similar to regular tables, with the main difference that they have no Fail-safe period.
○ Usage: Mainly used as staging tables in the ETL process.
● Temporary Table
○ Definition: A temporary table is a physical table, but it exists only within the session in which it was created and persists only for the remainder of that session. It is not visible to other users or sessions.
○ Usage: Temp tables are rarely used in the ETL process. They are used only when you want a simple, temporary staging table in ETL, or when you need to do some testing without impacting the rest of the database. [Try in system]
● View
○ Definition: A view allows the result of a query to be accessed as if it were a table.
○ Usage: Used when you only need the query result and do not need the data stored permanently. Mostly used in the ETL process.
● Materialized View
○ Definition: A materialized view is a pre-computed data set derived from a query and stored for later use. Querying a materialized view is faster than querying a regular view.
○ Usage: It is more expensive than a regular view, so it is used when you need faster, significant processing; when the result is a small data set; or when the aggregates take a long time to calculate.
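A minimal sketch of how each type is created, using standard Snowflake DDL; the table, view, and column names below are placeholders (note that materialized views are an Enterprise-edition feature):

-- orders / stg_orders / tmp_orders / order_summary are illustrative names
CREATE TABLE orders (order_id INT, amount NUMBER(10,2));          -- regular
CREATE TRANSIENT TABLE stg_orders (order_id INT, amount NUMBER);  -- no Fail-safe
CREATE TEMPORARY TABLE tmp_orders (order_id INT);                 -- session-scoped

CREATE VIEW v_orders AS
    SELECT order_id, amount FROM orders;

CREATE MATERIALIZED VIEW order_summary AS
    SELECT order_id, SUM(amount) AS total FROM orders GROUP BY order_id;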
Regular, Transient, Temporary Table
Components
● Table Information
❖ Show all the tables in a schema
● Table Creation
❖ Create a table
CREATE [TRANSIENT | TEMPORARY] TABLE .....
❖ Create a table by cloning another table entirely
● Table Modification
❖ Change the name
ALTER TABLE [IF EXISTS] <table name> RENAME TO <new table name>;
● Drop Table
DROP TABLE [IF EXISTS] <table name>;
● View Information
❖ Show all the views in a schema
● View Creation
❖ Create a view with SELECT
❖ Drop View
(Queries for the bullets without code are sketched below.)
Try in system
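A minimal sketch of the commands behind the bullets above, using standard Snowflake SQL; city_backup and v_city are placeholder names alongside the city table used later in this deck:

-- show all tables / views in a schema
SHOW TABLES IN SCHEMA DEMO_DB.SCHMA;
SHOW VIEWS IN SCHEMA DEMO_DB.SCHMA;

-- create a table by cloning another table entirely (zero-copy)
CREATE TABLE city_backup CLONE city;

-- create a view from a SELECT, then drop it
CREATE VIEW v_city AS SELECT * FROM city;
DROP VIEW IF EXISTS v_city;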
Tables – Data type
Components
● INT
○ Definition: Integer
○ Example: product_key, is_flag (converted), qty (sometimes)
Try in system
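As an illustration, a sketch of a table definition that uses INT alongside a few other standard Snowflake data types; the table and column names are placeholders:

CREATE TABLE IF NOT EXISTS enterprise.product (
    product_key  INT,            -- integer surrogate key
    product_name VARCHAR(100),   -- text
    unit_price   NUMBER(10,2),   -- fixed-point decimal
    is_flag      BOOLEAN,        -- true/false flag
    created_date DATE
);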
Worksheet
Components
Try in system
Role
Components
Try in system
Snowflake Architecture
Know Snowflake
Snowflake Storage
Know Snowflake
❖ The freedom to store your data
Try in system
Snowflake Pricing
Know Snowflake
● Storage
○ $23 USD per compressed TB per month for data stored in the US.
● Computing
○ Compute costs are $0.00056 per second for each credit consumed on Snowflake Standard Edition.
Example: an X-Large warehouse consumes 16 credits per hour, so one hour costs
16 x 3600 x 0.00056 = $32.256
Installation on Linux (Ubuntu)
SnowSQL CLI
1. Go to the Downloads folder (where the SnowSQL installer was downloaded)
cd ~/Downloads
2. Run the installer, then check the version
snowsql -v
Useful Link: https://docs.snowflake.com/en/user-guide/snowsql-install-config.html#installing-snowsql-on-linux-using-the-rpm-package
Installation on Mac
SnowSQL CLI
1. Connect with your Account Name and User Name
2. Input the password when prompted
SnowSQL Config
SnowSQL CLI
Why config? The config file stores default and named connection settings, so you do not have to type account, user, and other options on every call.
● Connection config file
○ Location: ~/.snowsql/config
○ Open: nano ~/.snowsql/config
➔ Default connection setting: connect with snowsql ………
➔ Specific (named) connection setting: connect with snowsql -c wcd
Try in system
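A minimal sketch of what ~/.snowsql/config could contain, assuming a named connection called wcd; the account and user values are placeholders, while the database, schema, and warehouse names reuse the ones shown later in this deck:

[connections]
# default connection, used by a bare `snowsql`
accountname = <your_account>
username = <your_user>
password = <your_password>

[connections.wcd]
# named connection, used by `snowsql -c wcd`
accountname = <your_account>
username = <your_user>
password = <your_password>
dbname = walmart_dev
schemaname = enterprise
warehousename = COMPUTE_WH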
SnowSQL Command
SnowSQL CLI
➔ Example command:
snowsql -c wcd -o variable_substitution=true -D q_table=enterprise.city -D cty_id=100001 -f script.sql
➔ Options:
-c Connection (named connection from the config file)
-q Query to run directly
-o Option. It must include variable_substitution=true, otherwise -D will not work.
-D Variable passed into the script file
-f Script file to run
Try in system
Useful Link: https://docs.snowflake.com/en/user-guide/snowsql-start.html#d-variable
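A minimal sketch of what script.sql could look like for the command above; &q_table and &cty_id are replaced by the -D values at run time, and the city_id column name is an assumption about the enterprise.city table:

-- script.sql (sketch)
SELECT *
FROM &q_table             -- substituted to: enterprise.city
WHERE city_id = &cty_id;  -- substituted to: 100001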
Load Data with Query – Steps
Loading data
Step 1
Upload (i.e. stage) one or more data files to a Snowflake stage (a named internal stage or a table/user stage) using the PUT command.
Step 2
Use the COPY INTO <table> command to load the contents of the staged file(s) into a Snowflake database table.
Before a file can be loaded into a table, it must first be loaded into a staging area. There are 2 important types of staging areas — User Stages and Named Stages:
User Stages
By default, each user and table in Snowflake is automatically allocated an internal stage for staging data files to be loaded. To list the files under the user stage, use 'LIST @~;'.
Named Stages
Named stages are database objects that provide the greatest degree of flexibility for data loading. Named stages are optional but recommended when you plan regular data loads that could involve multiple users and/or tables. Usually these are stages that integrate with another data lake, such as S3.
1. Load data into the staging area (this step must be done in the CLI)
snowsql -c wcd -q "PUT file://countries.csv @~"
---------------------------
@~ is the user stage.
file://countries.csv is the path of the file on the local system.
2. Copy data from the staging area into the table (this step works in either the editor or the CLI)
COPY INTO schma.countries FROM @~/countries.csv [FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ','
SKIP_HEADER = 1)];
COPY INTO schma.countries FROM (SELECT $1, $2 FROM @~/countries.csv);
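A quick check of both steps, sketched here with the same schma.countries table as above:

-- confirm the file landed in the user stage, then confirm the rows loaded
LIST @~;
SELECT COUNT(*) FROM schma.countries;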
Load data from S3 via a named stage (the steps below assume the storage integration S3_INT_WCD_LECT1 has already been created):
1. In Snowflake, create a file format called CSV_COMMA (an example query is sketched after this list). This step tells Snowflake that the data will be loaded as CSV.
2. Grant stage and integration privileges on the schema with the following queries:
● GRANT CREATE STAGE ON SCHEMA ENTERPRISE TO ROLE accountadmin;
● GRANT USAGE ON INTEGRATION S3_INT_WCD_LECT1 TO ROLE accountadmin;
3. Create the stage with this query:
CREATE OR REPLACE STAGE WCD_LECT1_STAGE
STORAGE_INTEGRATION = S3_INT_WCD_LECT1
URL='s3://your bucket name'
FILE_FORMAT = CSV_COMMA;
4. Upload the city.csv file to the bucket.
5. Check that the file is in the stage WCD_LECT1_STAGE with the query:
LIST @WCD_LECT1_STAGE;
6. Copy the file into the city table from the stage @WCD_LECT1_STAGE with the query:
COPY INTO walmart_dev.enterprise.city FROM @WCD_LECT1_STAGE/city.csv
FILE_FORMAT = CSV_COMMA;
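A minimal sketch of what the CSV_COMMA file format in step 1 could look like, assuming the same comma delimiter and header-skip settings as the inline FILE_FORMAT used for the user-stage load above:

-- sketch: a named CSV file format matching the inline options used earlier
CREATE OR REPLACE FILE FORMAT CSV_COMMA
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;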
3rd Party tools Connection
Loading data
1. DBeaver
2. PowerBI
3rd Party tools Connection
Loading data
3. Python
pip install snowflake-connector-python

import snowflake.connector as sf
import pandas as pd

# make changes as per your credentials
user = 'snowflake050701'
password = 'Code123456'
account = 'ozb44782.us-east-1'
database = 'walmart_dev'
warehouse = 'COMPUTE_WH'
schema = 'enterprise'
role = 'accountadmin'

# open a connection with the credentials above
conn = sf.connect(
    user=user, password=password, account=account,
    database=database, warehouse=warehouse,
    schema=schema, role=role
)

def run_query(connection, query):
    # execute a single statement on an open connection
    cursor = connection.cursor()
    cursor.execute(query)
    cursor.close()