AWS Athena Serverless Querying
[Diagram: Athena, S3 Bucket, Query Output]
@thenischald
ELT workflow
[Diagram: https://ksbcl.karnataka.gov.in, EventBridge, S3 file tracker, Status, Data Transformation, SNS]
ELT
Extract:
- A Lambda function is scheduled to run every 2 days.
- It downloads an XLS file from the website.
- The file is uploaded to an S3 bucket.
Load:
- The raw XLS file is stored in S3 before transformation.
- S3 event notifications trigger another Lambda function.
Transform:
- The triggered Lambda function cleans and processes the data.
- The Lambda’s execution status is sent via SNS email notifications.
- After transformation, the cleaned data is saved back to S3.
- The CSV output file is named by the invoice date.
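
The Load and Transform steps above can be sketched roughly as follows. This is a minimal illustration, not the author's actual Lambda code: the `clean/` prefix and the function names `uploaded_objects` and `output_key` are hypothetical, and the real cleaning logic and SNS call are left out. The 2-day schedule could be expressed in EventBridge as `rate(2 days)`.

```python
from datetime import date
from urllib.parse import unquote_plus

# Hypothetical prefix; the source does not name its buckets or keys.
CLEAN_PREFIX = "clean/"


def uploaded_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+').
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def output_key(invoice_date: date) -> str:
    """Name the cleaned CSV after the invoice date, in ISO format."""
    return f"{CLEAN_PREFIX}{invoice_date.isoformat()}.csv"
```

A transform Lambda handler would call `uploaded_objects(event)` to find the raw XLS file, clean it, and write the result to `output_key(...)` for that file's invoice date.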
I executed the DDL above in Athena to query
the CSV file in S3 without loading the data
into a traditional database.
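
The DDL itself is not reproduced in this text, so here is a sketch of what such a statement might look like, built as a Python string. The table name, column names, and S3 location in the usage line are placeholders, not the author's schema; `csv_table_ddl` is a hypothetical helper.

```python
def csv_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Build an Athena CREATE EXTERNAL TABLE statement for CSV files in S3.

    `location` should be an S3 folder (ending in '/'), not a single file.
    """
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        # Skip the CSV header row so it is not returned as data.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Placeholder schema and bucket, for illustration only.
print(csv_table_ddl("sales", {"invoice_date": "string", "amount": "double"},
                    "s3://example-bucket/clean/"))
```

The table is only metadata: Athena reads the CSV files in place at query time. If the CSV contains quoted fields with embedded commas, a CSV SerDe would likely be a better fit than plain delimited rows.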
I converted the CSV to Parquet for faster
querying and lower cost. Athena charges by the
amount of data scanned, and Parquet's columnar
layout lets a query read only the columns it needs.
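
The text does not say how the conversion was done. One possible approach, shown here only as an assumption, is an Athena CTAS (CREATE TABLE AS SELECT) statement that rewrites the CSV-backed table as Parquet. The helper `parquet_ctas` and the table and bucket names are hypothetical.

```python
def parquet_ctas(source_table: str, target_table: str, external_location: str) -> str:
    """Build an Athena CTAS statement that copies a table into Parquet files."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{external_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


# Placeholder names, for illustration only.
print(parquet_ctas("sales", "sales_parquet", "s3://example-bucket/parquet/"))
```

Athena expects the target location to be empty when a CTAS statement runs, so each conversion would need its own folder.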
I ran the same query on the CSV and the Parquet
file to compare the data scanned by each.
I had a small dataset, but imagine the
difference when handling a larger one.
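
To see how the gap scales, here is a back-of-the-envelope sketch. It assumes a price of 5 USD per TB scanned (check current Athena pricing for your region), treats 1 TB as 1024^4 bytes, and ignores any per-query minimum. The byte counts in the usage lines are made-up inputs, not measurements from this project.

```python
TB = 1024 ** 4  # bytes per terabyte, assuming binary units


def scan_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate query cost from bytes scanned; the default price is an assumption."""
    return bytes_scanned / TB * usd_per_tb


# Hypothetical inputs: a full scan of a 1 TB CSV versus a Parquet query
# that only needs to read a tenth as much data.
csv_cost = scan_cost(TB)
parquet_cost = scan_cost(TB // 10)
print(f"CSV: ${csv_cost:.2f}  Parquet: ${parquet_cost:.2f}")
```

Because cost is linear in bytes scanned, whatever scan ratio shows up on a small dataset carries over directly to the bill on a large one.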