Class Activity

This document outlines building a climate data analysis pipeline using AWS services including S3, Athena, and Glue to upload, query, transform, and analyze a climate dataset to generate insights on global temperature trends over time.

Uploaded by

samreen
Activity Overview: Climate Data Analysis Pipeline

Objective: Build a data analysis pipeline that uploads, queries, and transforms a climate
dataset to generate insights on global temperature trends.
Duration: Approximately 3-4 hours
Prerequisites:
• Basic understanding of AWS services (S3, Athena, and Glue)
• AWS account and the AWS CLI installed
• Basic knowledge of SQL
Step 1: Setup and Data Preparation
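1. Create an S3 Bucket: Follow Lab 7 to create an S3 bucket named climate-data-bucket.
S3 bucket names are globally unique, so if this name is already taken, append a suffix
(such as your initials) and use that name throughout. Ensure the bucket has encryption
enabled and public access blocked.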
2. Upload Dataset: Obtain a climate dataset, preferably in CSV format, that contains
daily temperature records. The dataset should have columns for Date, Temperature,
City, and Country. Upload this dataset to climate-data-bucket.
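Step 1 can also be scripted instead of done in the console. The following is a minimal sketch using boto3, not the method from Lab 7. The local file name, the object key under raw/, and the default region are assumptions made for illustration; the raw/ prefix is used only so the original file stays separate from the transformed folder created later.

```python
"""Sketch of Step 1 with boto3: create the bucket, lock it down, upload the CSV."""

BUCKET = "climate-data-bucket"        # name from the activity; must be globally unique
LOCAL_CSV = "temperatures.csv"        # hypothetical local file name
OBJECT_KEY = "raw/temperatures.csv"   # hypothetical key; keeps raw data under its own prefix


def prepare_bucket(s3, bucket, region="us-east-1"):
    """Create the bucket, enable default encryption, and block public access."""
    if region == "us-east-1":
        # us-east-1 is the default and is not passed as a LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # Default server-side encryption with S3-managed keys (SSE-S3).
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
            ]
        },
    )
    # Turn on all four public access block settings.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )


def upload_dataset(s3, bucket, local_path, key):
    """Upload the local CSV file to the bucket under the given key."""
    s3.upload_file(local_path, bucket, key)


def main():
    """Run against a real account; needs boto3 and configured AWS credentials."""
    import boto3

    client = boto3.client("s3", region_name="us-east-1")
    prepare_bucket(client, BUCKET)
    upload_dataset(client, BUCKET, LOCAL_CSV, OBJECT_KEY)
```

Calling main() performs the real AWS calls. The helper functions take the client as a parameter so they can be tried out with a stand-in object before touching an account.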
Step 2: Data Querying with Athena
1. Query Data with Athena: Follow the guidance from Lab 8 to set up Athena. Use
Athena to create a database climate_analysis and a table temperature_records that
references the S3 location (folder) containing the CSV file. Athena tables point at a
folder rather than at a single file.
2. Perform Initial Analysis: Run SQL queries to answer the following:
• Average temperature per country.
• Top 10 hottest cities.
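The statements for Step 2 might look like the sketch below. It is one possible version, not the Lab 8 solution. The column name record_date, the column order (Date, Temperature, City, Country), the raw/ folder, and the reading of "hottest" as highest average temperature are all assumptions. The DDL strings are Athena syntax and are shown for reference only; the two analysis queries use plain SQL, so the sketch runs them locally against SQLite as a stand-in for Athena, with made-up sample rows that are not real measurements.

```python
import sqlite3

# Athena statements, shown for reference only (not executed in this sketch).
CREATE_DATABASE_DDL = "CREATE DATABASE IF NOT EXISTS climate_analysis"

CREATE_TABLE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS climate_analysis.temperature_records (
  record_date string,
  temperature double,
  city        string,
  country     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://climate-data-bucket/raw/'
TBLPROPERTIES ('skip.header.line.count' = '1')
"""

# Analysis queries; they assume climate_analysis is the selected database.
AVG_BY_COUNTRY = """
SELECT country, AVG(temperature) AS avg_temp
FROM temperature_records
GROUP BY country
ORDER BY avg_temp DESC
"""

TOP_10_CITIES = """
SELECT city, country, AVG(temperature) AS avg_temp
FROM temperature_records
GROUP BY city, country
ORDER BY avg_temp DESC
LIMIT 10
"""

# Made-up rows for a local dry run: (record_date, temperature, city, country).
SAMPLE_ROWS = [
    ("2023-07-01", 40.0, "City A", "Country X"),
    ("2023-07-02", 42.0, "City A", "Country X"),
    ("2023-07-01", 30.0, "City B", "Country X"),
    ("2023-07-01", 20.0, "City C", "Country Y"),
]


def run_locally(rows):
    """Load rows into an in-memory SQLite table and run both analysis queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE temperature_records "
            "(record_date TEXT, temperature REAL, city TEXT, country TEXT)"
        )
        conn.executemany(
            "INSERT INTO temperature_records VALUES (?, ?, ?, ?)", rows
        )
        by_country = conn.execute(AVG_BY_COUNTRY).fetchall()
        top_cities = conn.execute(TOP_10_CITIES).fetchall()
        return by_country, top_cities
    finally:
        conn.close()
```

Calling run_locally(SAMPLE_ROWS) returns the two result sets as lists of tuples. In Athena, the same query text can be pasted into the query editor once the table exists.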
Step 3: Data Transformation with AWS Glue
1. Create a Glue Crawler: Using instructions from Lab 9, set up a Glue crawler to
populate the AWS Glue Data Catalog with the climate_analysis database schema.
After the crawler runs, this database should appear in the AWS Glue Data Catalog.
2. Transform Data: Create an ETL job in AWS Glue that transforms the temperature
from Celsius to Fahrenheit (if applicable) and filters records to only include data from
the last decade.
3. Store Transformed Data: Save the transformed data back into climate-data-bucket
in a new folder named transformed.
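The transformation logic in Step 3 can be checked locally before it is built into a Glue job. The sketch below uses plain Python, not the Glue or PySpark API, and rests on several assumptions: records are dictionaries keyed by the dataset's column names, dates are ISO strings (YYYY-MM-DD), and "the last decade" means the ten years up to a reference date. In an actual Glue job the same two rules would be expressed as DataFrame operations, and the output would be written to the transformed folder.

```python
from datetime import date


def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def years_before(day, years):
    """Return the date the given number of years before day."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # day is Feb 29 and the target year is not a leap year.
        return day.replace(year=day.year - years, day=28)


def transform(records, today, years=10, source_is_celsius=True):
    """Keep records from the last `years` years and convert to Fahrenheit.

    Each record is a dict with the keys Date, Temperature, City, and Country.
    """
    cutoff = years_before(today, years)
    result = []
    for record in records:
        if date.fromisoformat(record["Date"]) < cutoff:
            continue  # older than the cutoff: drop the record
        converted = dict(record)  # copy so the input is not modified
        if source_is_celsius:
            converted["Temperature"] = celsius_to_fahrenheit(record["Temperature"])
        result.append(converted)
    return result
```

Passing source_is_celsius=False skips the conversion, which covers the "if applicable" case where the dataset is already in Fahrenheit.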
Step 4: Advanced Analysis and Visualization
1. Advanced SQL Queries: Using Athena, perform more complex queries on the
transformed dataset to uncover insights, such as:
• Yearly temperature trends.
• Comparison of temperature changes by country.
2. Visualization (Optional): Use Amazon QuickSight or a tool of your choice to
visualize the query results, showing temperature trends over time.
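The two advanced queries in Step 4 could be written along the following lines. This is a sketch under assumptions: temperature_transformed is a hypothetical Athena table over the transformed folder, record_date is an ISO date string, and "temperature change" is taken as the difference between a country's average in a chosen end year and in a chosen start year. The SQL avoids engine-specific date functions, so it is run here against SQLite as a local stand-in for Athena.

```python
import sqlite3

TABLE = "temperature_transformed"  # hypothetical table over the transformed/ folder

# Average temperature per calendar year, oldest year first.
YEARLY_TREND = f"""
SELECT substr(record_date, 1, 4) AS record_year,
       AVG(temperature) AS avg_temp
FROM {TABLE}
GROUP BY substr(record_date, 1, 4)
ORDER BY record_year
"""


def country_change_sql(start_year, end_year):
    """Build a query for each country's average change between two years."""
    start, end = int(start_year), int(end_year)  # int() rejects non-numeric input
    return f"""
SELECT country, avg_end - avg_start AS temp_change
FROM (
  SELECT country,
         AVG(CASE WHEN substr(record_date, 1, 4) = '{start}'
                  THEN temperature END) AS avg_start,
         AVG(CASE WHEN substr(record_date, 1, 4) = '{end}'
                  THEN temperature END) AS avg_end
  FROM {TABLE}
  GROUP BY country
) AS per_country
ORDER BY temp_change DESC
"""


def run_trend_queries(rows, start_year, end_year):
    """Run both queries on an in-memory SQLite copy of the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            f"CREATE TABLE {TABLE} "
            "(record_date TEXT, temperature REAL, city TEXT, country TEXT)"
        )
        conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?)", rows)
        trend = conn.execute(YEARLY_TREND).fetchall()
        change = conn.execute(country_change_sql(start_year, end_year)).fetchall()
        return trend, change
    finally:
        conn.close()
```

The yearly result has one row per year, which maps directly onto a line chart for the optional visualization step.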
Deliverables:
• A document outlining:
   • The SQL queries used and their outputs.
   • A brief analysis of the findings from the temperature data.
   • (Optional) Visualizations of the temperature trends.
Reflection:
After completing the activity, reflect on how each AWS service contributed to the data
pipeline and how this approach can be scaled or modified for different datasets or analytical
needs.
This activity provides a hands-on experience with AWS services for handling data at scale,
from storage and querying to transformation, and leverages the power of the cloud to analyze
and visualize climate data trends.
