AWS Athena Serverless Querying

The document outlines an ELT workflow that involves a Lambda function running every two days to download an XLS file, which is then uploaded to an S3 bucket for storage. The data is transformed by another Lambda function triggered by S3 event notifications, with execution status communicated via SNS. Additionally, the document discusses querying the transformed data in CSV and Parquet formats using Athena for optimized performance and cost efficiency.

Uploaded by suresh

[Diagram: S3 Bucket; Glue Data Catalog for metadata management; Athena; query output to S3 Bucket]

@thenischald
ELT workflow
https://ksbcl.karnataka.gov.in

[Diagram components: EventBridge; Lambda (runs once every 2 days); S3; file tracker; DynamoDB; S3 event notification; Lambda (data transformation); status via SNS]

ELT

Extract:
- A Lambda function is scheduled to run every 2 days.
- It downloads an XLS file from the website.
- The file is uploaded to an S3 bucket.
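
The extract step above could be sketched roughly as follows. This is a minimal sketch, not the author's code: the bucket name, key layout, and source URL are hypothetical placeholders, and the schedule itself is assumed to come from an EventBridge rule with an expression such as `rate(2 days)`. The S3 client and the download function are injectable so the handler can be exercised without AWS access.

```python
import datetime
import urllib.request

# Assumptions (not from the original slides): bucket name, key layout, source URL.
RAW_BUCKET = "example-raw-bucket"               # hypothetical
SOURCE_URL = "https://example.com/report.xls"   # hypothetical placeholder


def build_raw_key(run_date: datetime.date) -> str:
    """Return the S3 key under which the raw XLS for this run is stored."""
    return f"raw/{run_date.isoformat()}.xls"


def handler(event, context, s3_client=None, fetch=None):
    """Download the XLS file and upload it unchanged to S3."""
    if s3_client is None:
        import boto3  # imported lazily so the helpers run without AWS installed
        s3_client = boto3.client("s3")
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url, timeout=30).read()

    body = fetch(SOURCE_URL)
    key = build_raw_key(datetime.date.today())
    s3_client.put_object(Bucket=RAW_BUCKET, Key=key, Body=body)
    return {"bucket": RAW_BUCKET, "key": key, "bytes": len(body)}
```

Keeping the file untouched at this stage matches the ELT order: the raw copy lands in S3 first and is cleaned later.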

Load:
- The raw XLS file is stored in S3 before transformation.
- S3 event notifications trigger another Lambda function.
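
When S3 triggers the second Lambda function, the handler receives an event whose records name the bucket and object key. A minimal sketch of reading them is shown here; the function name `extract_s3_objects` is hypothetical, while the `Records`, `s3.bucket.name`, and `s3.object.key` fields follow the standard S3 event notification structure.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Decoding the key matters for file names that contain spaces or special characters; without it, a later `get_object` call may fail to find the object.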

Transform:
- The triggered Lambda function cleans and processes the data.
- The Lambda’s execution status is sent via SNS email notifications.
- After transformation, the cleaned data is saved back to S3.
- The CSV output file is named by the invoice date.
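
Two small pieces of the transform step lend themselves to a sketch: naming the output file by invoice date and composing the SNS status message. The helper names, the `clean/` prefix, and the message wording below are assumptions for illustration; the cleaning logic itself is not shown in the slides and is left out.

```python
import datetime


def output_key(invoice_date: datetime.date, prefix: str = "clean/") -> str:
    """Name the cleaned CSV after its invoice date (prefix is hypothetical)."""
    return f"{prefix}{invoice_date.isoformat()}.csv"


def status_message(key: str, ok: bool, detail: str = "") -> dict:
    """Build the Subject and Message arguments for an SNS publish call."""
    state = "SUCCEEDED" if ok else "FAILED"
    return {
        "Subject": f"Transform {state}: {key}",
        "Message": detail or f"Transformation {state.lower()} for {key}.",
    }
```

With boto3, the result could be passed along as `sns.publish(TopicArn=topic_arn, **status_message(key, ok))`, where `topic_arn` is whatever topic the email subscription is attached to.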

I executed the above DDL in Athena to query the CSV file directly in S3, without loading the data into a traditional database.
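
The DDL itself is not reproduced in this text, so the following is only an illustrative sketch of what an Athena table definition over CSV files in S3 can look like. The column names, database, table, and bucket path are all hypothetical; the statement is built as a Python string so it could be submitted through the console or an API call.

```python
def csv_table_ddl(database: str, table: str, location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for comma-separated files.

    The three columns are hypothetical placeholders, not the author's schema.
    `location` must be an S3 folder path ending in '/'.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        "    invoice_date string,\n"
        "    item_name string,\n"
        "    quantity int\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )
```

The table is only metadata in the Glue Data Catalog; the CSV files stay where they are in S3, and dropping the table does not delete them.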

I converted the CSV to Parquet for faster querying and lower cost, since Athena charges based on the amount of data scanned.
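
The slides do not say how the conversion was done. One common option, sketched here as an assumption rather than the author's method, is an Athena CTAS (CREATE TABLE AS SELECT) statement that rewrites an existing CSV-backed table as Parquet. All table names and the S3 location are hypothetical.

```python
def parquet_ctas(database: str, source_table: str,
                 target_table: str, location: str) -> str:
    """Build a CTAS statement that copies a table into Parquet files.

    `location` is the S3 folder that will hold the new Parquet files.
    """
    return (
        f"CREATE TABLE {database}.{target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')\n"
        f"AS SELECT * FROM {database}.{source_table}"
    )
```

Parquet is columnar and compressed, so a query that touches a few columns reads only those columns instead of every byte of every row.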

I ran the same query against the CSV file and the Parquet file and compared the data scanned by each. My dataset was small, but the difference matters far more with a larger one.
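
To relate data scanned to cost, a rough estimate can be sketched as below. The price of $5 per TB and the 10 MB per-query minimum are assumptions based on commonly published Athena pricing; actual rates vary by region and should be checked. The bytes-scanned figure for a finished query is reported under `QueryExecution.Statistics.DataScannedInBytes` in the `get_query_execution` response.

```python
PRICE_PER_TB_USD = 5.0               # assumption: varies by region
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_BYTES = 10 * 1024 ** 2    # assumption: 10 MB minimum per query


def estimated_cost_usd(bytes_scanned: int) -> float:
    """Estimate the charge for one query from the bytes it scanned."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


def scan_reduction(csv_bytes: int, parquet_bytes: int) -> float:
    """Fraction of scanned bytes saved by querying Parquet instead of CSV."""
    if csv_bytes <= 0:
        raise ValueError("csv_bytes must be positive")
    return 1 - parquet_bytes / csv_bytes
```

Because of the per-query minimum, a small dataset shows little or no cost difference; the saving only becomes visible once scans exceed that floor.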
