
Snowflake - Data Ingestion/Loading

Diagram: data sources feeding Snowflake tables (Table-1 through Table-4): a stream of a few million transactions every hour, a stream of a few thousand transactions every minute, and a REST API from which a few thousand rows are fetched every 4-6 hours.
Snowflake - Data loading options
Batch Bulk Data Ingestion:

● Write/load the data into your staging location (S3, GCS buckets)

● Ingest the data into Snowflake in batches at frequent time intervals using:

○ Snowflake COPY commands scheduled using Snowflake tasks (see the sketch after this list)

○ COPY commands triggered from Python/Glue/Airflow jobs running at specified time intervals

Real-Time Data Ingestion:

● Write/load the data into your staging location (S3, GCS buckets) and ingest the data in near-real time using:
○ Snowpipe (continuous data ingestion)
○ Airflow S3 sensors/triggers

● Kafka-Snowflake connector for real-time data ingestion
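As a minimal sketch of the first batch option (a COPY command scheduled through a Snowflake task), the snippet below uses hypothetical names (load_wh, my_stage, orders, load_orders_task) that do not come from the slides:

    -- Hypothetical warehouse, stage, and table names
    CREATE OR REPLACE TASK load_orders_task
      WAREHOUSE = load_wh
      SCHEDULE  = '60 MINUTE'        -- run the load once per hour
    AS
      COPY INTO orders
      FROM @my_stage/orders/
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
      ON_ERROR = 'CONTINUE';

    -- Tasks are created suspended; resume the task to start the schedule
    ALTER TASK load_orders_task RESUME;

The same COPY statement could instead be issued from a Python/Glue/Airflow job running on its own schedule, which is the second batch option above.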


Snowflake - Batch Data Ingestion

Diagram: RDBMS/structured data → data extraction pipelines → load/write to staging location (AWS S3 bucket) → data ingestion pipelines (Python/Airflow/Glue) running at scheduled time intervals → load into Snowflake tables (Table-1 through Table-4).
Snowflake - Real-Time Data Ingestion

Diagram: semi-structured streaming data → load/write to staging location (AWS S3 bucket) → AWS Airflow sensors/Snowpipe objects triggered on the write event → load into Snowflake tables (Table-1 through Table-4).
Snowflake - Real-Time Data Ingestion

Diagram: semi-structured streaming data → Snowflake Kafka Connector (load/write to staging location) → load into Snowflake tables (Table-1 through Table-4).
Snowflake - Batch Data Ingestion
Diagram: REST API → Python/Airflow/Glue jobs fetching a few thousand rows every 4-6 hours → load into Snowflake tables (Table-1 through Table-4).
Snowflake - AWS Connection

● Step-1: Create an IAM role for Snowflake to access data in S3 buckets

● Step-2: Create an S3 bucket in AWS and upload sample files into the bucket

● Step-3: Create a Storage Integration object in Snowflake for authentication (using the ACCOUNTADMIN role); see the SQL sketch after these steps

● Step-4: Create a File Format object (using the SYSADMIN/custom role)

● Step-5: Create a Stage object referencing the location from which the data needs to be ingested (using the SYSADMIN/custom role)

● Step-6: Load the data into Snowflake tables (using the SYSADMIN/custom role)
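A minimal SQL sketch of Steps 3-5 is shown below; the role ARN, bucket URL, and object names (s3_int, my_csv_format, my_s3_stage) are placeholders rather than values taken from the slides:

    -- Step-3: integration object (run with the ACCOUNTADMIN role); ARN and bucket are placeholders
    CREATE OR REPLACE STORAGE INTEGRATION s3_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_s3_role'
      STORAGE_ALLOWED_LOCATIONS = ('s3://my-staging-bucket/');
    -- DESC INTEGRATION s3_int;  -- copy the AWS IAM user/external ID into the IAM role trust policy

    -- Step-4: file format object (run with the SYSADMIN/custom role)
    CREATE OR REPLACE FILE FORMAT my_csv_format
      TYPE = CSV
      FIELD_DELIMITER = ','
      SKIP_HEADER = 1;

    -- Step-5: stage object pointing at the S3 location through the integration
    CREATE OR REPLACE STAGE my_s3_stage
      STORAGE_INTEGRATION = s3_int
      URL = 's3://my-staging-bucket/data/'
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');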
Snowflake - What have we done so far ?

● Created a Storage Integration Object and authenticated Snowflake to read data from S3

● Created a Stage Object which refers to the Integration Object (one stage object per table)

● Executed COPY commands manually to ingest data into the respective tables (an example follows below)
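For example, a manual load of one table might look like the following, reusing the hypothetical stage and file format names from the previous sketch:

    -- Manually ingest all staged files under the orders/ prefix into the ORDERS table
    COPY INTO orders
      FROM @my_s3_stage/orders/
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
      ON_ERROR = 'ABORT_STATEMENT';  -- default behaviour: fail the load if any file has bad records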
Snowflake - Batch Data Ingestion
Diagram: AWS S3 staging bucket → COPY command to ingest data → Snowflake tables (Country, Orders, Lineitem, PARTSUPP).
Snowflake - Batch Data Ingestion

Diagram: new batch of data arriving every X hours → AWS S3 staging bucket → Snowpipe automatically executes COPY commands to ingest the data into the Snowflake tables (Country, Orders, Lineitem, PARTSUPP).
Snowflake - Continuous Data Ingestion

● Snowpipe is Snowflake’s continuous data ingestion service

● Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion.

● Snowpipe loads data from staged files in micro-batches, rather than relying on manually executed COPY statements run on a schedule to load larger batches (a minimal example follows below).
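A minimal Snowpipe sketch is shown below, reusing the hypothetical stage, table, and file format names from the earlier steps; AUTO_INGEST = TRUE assumes the S3 bucket's event notifications are routed to the pipe's SQS queue:

    -- Continuously ingest new files landing under orders/ into the ORDERS table
    CREATE OR REPLACE PIPE orders_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO orders
      FROM @my_s3_stage/orders/
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

    -- DESC PIPE orders_pipe;  -- shows the notification_channel (SQS ARN) to configure on the S3 bucket events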
Data Ingestion - Key Considerations

● Recommended approach to ensure data is ingested in parallel (concurrently):

○ Avoid loading data as a single large file
○ Split large files into smaller files (~100-250 MB compressed)

● Prefer Snowpipe for continuous loads: it ingests staged files in micro-batches instead of relying on large, manually scheduled COPY batches.
