Snowflake Notes
https://Snowflake.com/en/data-cloud/pricing-options
Creating Warehouse
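A minimal sketch of creating a warehouse (the name, size, and timeouts are illustrative, not from the notes):

```sql
-- XS warehouse that suspends after 5 minutes of inactivity
CREATE OR REPLACE WAREHOUSE COMPUTE_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300        -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE        -- wakes up automatically when a query arrives
  INITIALLY_SUSPENDED = TRUE;
```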
INFORMATION_SCHEMA and PUBLIC are the two default schemas created with a new database.
-- Alternative: reference a named file format
COPY INTO EXERCISE_DB.PUBLIC.CUSTOMERS
FROM @aws_stage
FILE_FORMAT = (FORMAT_NAME = 'EXERCISE_DB.PUBLIC.AWS_FILEFORMAT');
Load History
Loaded files can be identified from the snowflake.account_usage.LOAD_HISTORY table or
<db>.INFORMATION_SCHEMA.LOAD_HISTORY, including loads run with ON_ERROR = CONTINUE.
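A sketch of querying the load history view (the table name and time window are illustrative):

```sql
-- Files loaded into CUSTOMERS during the last 24 hours
SELECT file_name, table_name, row_count, error_count, last_load_time
FROM EXERCISE_DB.INFORMATION_SCHEMA.LOAD_HISTORY
WHERE table_name = 'CUSTOMERS'
  AND last_load_time > DATEADD(hour, -24, CURRENT_TIMESTAMP());
```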
Snowflake Editions
Standard Edition
- Automatic data encryption
- Broad support for standard and special data types
- Time Travel up to 1 day
- Disaster recovery for 7 days beyond Time Travel
- Network policies
- Secure data share
- Federated authentication & SSO
- Premier support 24/7
Enterprise Edition – Multi-cluster
- Multi-cluster warehouses
- Time Travel up to 90 days
- Materialized views
- Search Optimization
- Column-level security
- 24 hours early access to weekly new releases
Business Critical – for higher security with extremely sensitive data
- Additional security features such as customer-managed encryption
- Support for data-specific regulation
- Database failover/failback (disaster recovery)
Virtual Private – highest level of security
- Dedicated virtual servers and a separate Snowflake environment
- Dedicated metadata store
- Isolated from all other Snowflake accounts
CTE:
A CTE is typically used to simplify complex queries and make them more
readable.
It does not involve recursion or repeated self-referencing.
It is evaluated once and does not iterate.
Recursive CTE:
A recursive CTE is used to work with hierarchical data, such as organizational
charts or tree-like structures.
It involves recursion, meaning the CTE references itself.
It consists of two parts: the anchor member and the recursive member.
The anchor member is the base result set, and the recursive member
repeatedly executes based on the previous result.
A Recursive CTE is useful for hierarchical data, where you need to traverse
multiple levels, such as finding all employees under a specific manager, no
matter how deep the hierarchy goes.
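The manager/employee case described above can be sketched as follows (the EMPLOYEES table with ID, NAME, and MANAGER_ID columns is an assumption for illustration):

```sql
-- All employees under manager id 1, however deep the hierarchy goes
WITH RECURSIVE subordinates AS (
    -- anchor member: direct reports of manager 1
    SELECT id, name, manager_id
    FROM employees
    WHERE manager_id = 1
    UNION ALL
    -- recursive member: reports of everyone found so far
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
```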
Resource Monitors can be set at the account level, on a single virtual warehouse, or on
multiple virtual warehouses.
Three types of actions can be implemented:
1. Suspend immediately and notify when this % of credit is used.
2. Suspend and notify when this % of credit is used.
3. Notify when this % of credit is used.
Resource monitors can be created by the ACCOUNTADMIN role.
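A sketch combining the three trigger actions above (the monitor name, quota, and thresholds are illustrative):

```sql
-- Monthly quota of 100 credits with all three action types
CREATE RESOURCE MONITOR monthly_limit
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75  PERCENT DO NOTIFY             -- notify only
           ON 90  PERCENT DO SUSPEND            -- suspend after running queries finish
           ON 100 PERCENT DO SUSPEND_IMMEDIATE; -- suspend and cancel running queries

-- Attach the monitor to a warehouse
ALTER WAREHOUSE COMPUTE_WH SET RESOURCE_MONITOR = monthly_limit;
```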
Types of Loading: bulk loading (COPY INTO using a virtual warehouse) and continuous loading (Snowpipe).
Creating Stage
create or replace stage EXERCISE_DB.external_stages.aws_stage
url='s3://bucketsnowflakes3'
credentials=(aws_key_id='ABCD_DUMMY_ID',aws_secret_key='1234abcd_key'
);
LIST @aws_stage;
Time Travel: recover objects that have been dropped within the retention period, and query
data as it existed at an earlier point in time.
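Two quick sketches of Time Travel (the table name and offset are illustrative):

```sql
-- Restore a dropped table while the retention period is still active
UNDROP TABLE EXERCISE_DB.PUBLIC.CUSTOMERS;

-- Query the table as it looked 10 minutes ago
SELECT * FROM EXERCISE_DB.PUBLIC.CUSTOMERS AT (OFFSET => -60 * 10);
```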
Table Types: permanent, transient, and temporary (transient and temporary tables have at most 1 day of Time Travel and no fail-safe).
Snowpipe
1. Enables loading as soon as a file appears in a bucket.
2. Useful when data needs to be available immediately for analysis.
3. Snowpipe uses serverless compute instead of a user-managed virtual warehouse.
Configuration
1. Create a stage.
2. Test the COPY command.
3. Create a pipe: a pipe object wrapping the COPY command.
4. Set up the S3 event notification.
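The steps above can be sketched as follows (the pipe, stage, and table names are illustrative):

```sql
-- Pipe wrapping the tested COPY command; AUTO_INGEST fires on the S3 notification
CREATE OR REPLACE PIPE EXERCISE_DB.PUBLIC.customers_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO EXERCISE_DB.PUBLIC.CUSTOMERS
FROM @EXERCISE_DB.external_stages.aws_stage
FILE_FORMAT = (TYPE = 'CSV');
```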
Refresh pipe
ALTER PIPE <pipename> REFRESH;
Pause pipe
ALTER PIPE <pipename> SET PIPE_EXECUTION_PAUSED = TRUE;
Azure Integration
LIST @azure_stage;
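A sketch of wiring up an Azure stage via a storage integration (the integration name, tenant id, and container path are placeholders):

```sql
-- Integration granting Snowflake access to the Azure container
CREATE OR REPLACE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant_id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/');

-- Stage that uses the integration instead of raw credentials
CREATE OR REPLACE STAGE azure_stage
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer/'
  STORAGE_INTEGRATION = azure_int;
```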
How can you load historical data files from external storage using Snowpipe?
ALTER PIPE <pipename> REFRESH loads files staged within the last 7 days; older files must be
loaded with a manual COPY INTO.
Performance Tuning
Roles
CREATE USER ds1 PASSWORD = 'DS1' LOGIN_NAME = 'DS1'
  DEFAULT_ROLE = 'DATA_ENGINEER' DEFAULT_WAREHOUSE = 'DS_WH'
  MUST_CHANGE_PASSWORD = FALSE;
CREATE ROLE DATA_ENGINEER;
GRANT USAGE ON WAREHOUSE COMPUTE_DW TO ROLE DATA_ENGINEER;
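The role only takes effect once it is granted to the user (names follow the CREATE USER example above):

```sql
-- Complete the chain: privileges -> role -> user
GRANT ROLE DATA_ENGINEER TO USER ds1;
```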
Scaling up: increasing the size of a virtual warehouse for more complex queries.
Scaling out: adding warehouses (multi-cluster warehouses) for more concurrent users or
queries.
Clustering: used only on very large tables.
- Columns that are frequently used in WHERE clauses
- Columns that are frequently used in joins
CREATE TABLE <name> ... CLUSTER BY (<columns>);
ALTER TABLE <name> DROP CLUSTERING KEY;
ALTER TABLE <name> CLUSTER BY (<columns>);
ALTER TABLE ... SWAP WITH swaps the metadata (like table names and properties) between the two tables.
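A minimal sketch of the swap (the table names are illustrative):

```sql
-- Atomic swap: the staging table instantly becomes the live table, and vice versa
ALTER TABLE customers_staging SWAP WITH customers;
```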
select
    $1:__index_level_0__::int as id
    ,$1:"__index_level_0__"
    ,$1:"cat_id"
    ,$1:"date"
    ,DATE($1:date::int) as date
    ,METADATA$FILENAME as filename
    ,METADATA$FILE_ROW_NUMBER as rownumber
    ,TO_TIMESTAMP_NTZ(current_timestamp) as load_date
from @public.parquetstage limit 100;
Tree of Tasks
CREATE TASK <child_task>
  AFTER <parent_task>
AS <sql_statement>;

ALTER TASK <child_task> ADD AFTER <parent_task>;
Access Control
DAC (Discretionary Access Control): each object has an owner, who can grant access to that
object.
RBAC (Role-Based Access Control): privileges are assigned to roles, and roles are assigned
to users.
Stored Procedure
declare
    v_string varchar(50); -- 10 was too small for the 30-character string below
begin
    select 'Welcome to snowflake scripting'
    into v_string;
    return v_string;
exception
    when expression_error then return 'sorry my bad';
end;
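The same scripting block can be wrapped in a named stored procedure (the procedure name is illustrative):

```sql
CREATE OR REPLACE PROCEDURE greet()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
declare
    v_string varchar(50);
begin
    select 'Welcome to snowflake scripting' into v_string;
    return v_string;
exception
    when expression_error then return 'sorry my bad';
end;
$$;

CALL greet();
```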
Streams
CREATE STREAM <streamname> ON TABLE <tablename>;
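A sketch of consuming a stream (the table, stream, and history-table names are illustrative):

```sql
CREATE OR REPLACE STREAM customers_stream ON TABLE customers;

-- Changes on CUSTOMERS appear with METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
SELECT * FROM customers_stream;

-- Consuming the stream in a DML statement advances its offset
INSERT INTO customers_history
SELECT id, name FROM customers_stream
WHERE METADATA$ACTION = 'INSERT';
```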
Sharing
1. Create share: CREATE SHARE my_share;
2. Grant privileges to the share:
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.my_schema TO SHARE my_share;
GRANT SELECT ON TABLE my_db.my_schema.my_table TO SHARE my_share;
3. Add consumer accounts:
ALTER SHARE my_share ADD ACCOUNTS = <account_identifier>;
4. Import the share (on the consumer account):
CREATE DATABASE my_db FROM SHARE <provider_account>.my_share;
Objects that can be shared: tables, external tables, secure views, secure materialized views, secure UDFs.
Identify cloned tables (a clone's ID differs from its CLONE_GROUP_ID):
SELECT * FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS
WHERE ID <> CLONE_GROUP_ID;