Snowflake - Data Ingestion - Loading
[Diagram: Table-1 through Table-4 as ingestion targets, with a few thousand transactions arriving every minute and a REST API as one of the sources.]
Snowflake - Data loading options
Batch / Bulk Data Ingestion:
● Write/load the data into your staging location (S3 / GCS buckets)
● Ingest the data into Snowflake in batches at frequent, scheduled time intervals using COPY commands
Near-Real-Time Data Ingestion:
● Write/load the data into your staging location (S3 / GCS buckets) and ingest the data in near real time using:
○ Snowpipe (continuous data ingestion)
○ Airflow S3 sensors/triggers
[Diagram: batch ingestion — data extraction pipelines (Python/Airflow/Glue) pull from RDBMS/structured data sources and load/write to a staging location (AWS S3 bucket); data ingestion pipelines running at scheduled time intervals then load the data into Snowflake tables Table-1 through Table-4.]
Snowflake - Real-Time Data Ingestion
[Diagram: real-time ingestion — data loaded/written to the staging location (AWS S3 bucket) is ingested into Snowflake tables Table-1 through Table-4 as it arrives; the example workload fetches a few thousand rows every 4-6 hours.]
Snowflake - AWS Connection
● Step-2: Create an S3 bucket in AWS and upload sample files into the bucket
● Step-5: Create a stage object referencing the location from which the data needs to be ingested (using Sysadmin/custom role)
● Step-6: Load the data into Snowflake tables (using Sysadmin/custom role)
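These steps (and the storage-integration step they build on) can be sketched in Snowflake SQL. All object names here — s3_int, my_db.my_schema.orders_stage, the bucket my-staging-bucket, and the IAM role ARN — are placeholder assumptions:

```sql
-- Storage integration: authenticates Snowflake to read from S3
-- (requires ACCOUNTADMIN; the role ARN below is a placeholder)
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-staging-bucket/');

-- Step-5: stage object referencing the S3 location (one stage per table)
CREATE STAGE my_db.my_schema.orders_stage
  URL = 's3://my-staging-bucket/orders/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```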
Snowflake - What have we done so far?
● Created a Storage Integration Object and authenticated Snowflake to read data from S3
● Created a Stage Object which refers to the Integration Object (one stage object per table)
● Executed copy commands manually to ingest data into the respective tables
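The manual ingestion in the last step is a plain COPY command; the table and stage names below are placeholders and would match your own objects:

```sql
-- Manually executed COPY: loads matching staged files not yet ingested
COPY INTO my_db.my_schema.orders
  FROM @my_db.my_schema.orders_stage
  PATTERN = '.*orders.*[.]csv'
  ON_ERROR = 'SKIP_FILE';
```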
Snowflake - Batch Data Ingestion
[Diagram: COPY commands ingest data from the AWS S3 staging bucket into the Snowflake tables Country, Orders, Lineitem, and PARTSUPP.]
Snowflake - Batch Data Ingestion
[Diagram: the same flow, with Snowpipe automatically executing the COPY commands to ingest data from the AWS S3 staging bucket into Country, Orders, Lineitem, and PARTSUPP.]
Snowflake - Continuous Data Ingestion
● Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion.
● Snowpipe loads data from staged files in micro-batches, rather than requiring COPY statements to be executed manually on a schedule to load larger batches.
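As a sketch, the same COPY statement can be wrapped in a Snowpipe pipe so it runs automatically on S3 event notifications; orders_pipe and the table/stage names are placeholders:

```sql
-- Snowpipe: auto-ingests new files as S3 event notifications arrive
CREATE PIPE my_db.my_schema.orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_db.my_schema.orders
  FROM @my_db.my_schema.orders_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- SHOW PIPES exposes the notification_channel (SQS ARN) to configure
-- on the S3 bucket's event notifications
SHOW PIPES;
```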
Data Ingestion - Key Considerations
● Snowpipe loads data from staged files in micro-batches, removing the need to manually execute scheduled COPY statements for larger batches.