Olympic Dataset Analysis

This document outlines a series of queries for analyzing an Olympic dataset using AWS S3, Databricks, and Snowflake. The queries include finding the team with the most gold medals, players who have won gold medals in both the Summer and Winter Olympics, and players who have won gold medals in the same event at three consecutive Summer Olympics from 2000 onward. The document also points to a GitHub link containing code to ingest the Olympic data from S3 into Databricks and connect to Snowflake.

Uploaded by Sohel Sayyad

Olympic Dataset Analysis
With AWS S3, Databricks & Snowflake

Data Quality Check

-- Check the count of rows in athlete_events

-- Check the count of rows in athletes
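
The two row-count checks can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module in place of Snowflake; the column names and the sample rows are assumptions made up for the example, not the real dataset.

```python
import sqlite3

# Hypothetical miniature of the two tables; columns and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'A', 'X'), (2, 'B', 'Y');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', 'E1', 'Gold'),
  (1, 2004, 'Summer', 'E1', 'NA'),
  (2, 2000, 'Summer', 'E2', 'Silver');
""")

# One COUNT(*) per table, mirroring the two checks listed above.
row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("athlete_events", "athletes")
}
print(row_counts)
```

In Snowflake the same `SELECT COUNT(*) FROM <table>` statements would be run directly, and the counts compared with the row counts of the source files.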


1. Which team has won the most gold medals over the years?
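
One possible approach, sketched with Python's sqlite3 module on made-up rows. The schema (athletes joined to athlete_events on athlete_id) is an assumption, and this sketch counts every gold-medal row, so a team event would be counted once per athlete; counting distinct events per year is an equally reasonable reading of the question.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Alpha'), (3, 'Cy', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', '100m', 'Gold'),
  (2, 2000, 'Summer', '200m', 'Gold'),
  (3, 2000, 'Summer', '400m', 'Gold'),
  (3, 2004, 'Summer', '400m', 'Silver');
""")

# Count gold rows per team and keep the team with the highest count.
top_team = conn.execute("""
    SELECT a.team, COUNT(*) AS gold_medals
    FROM athlete_events ae
    JOIN athletes a ON a.id = ae.athlete_id
    WHERE ae.medal = 'Gold'
    GROUP BY a.team
    ORDER BY gold_medals DESC
    LIMIT 1
""").fetchone()
print(top_team)  # ('Alpha', 2)
```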

2. For each team, print the total number of silver medals and the year in which the team won the most silver medals. Output 3 columns: team, total_silver_medals, year_of_max_silver.
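
A minimal sketch of one way to answer this, using Python's sqlite3 module and invented rows. The schema is an assumption, ties for the best year are broken by taking the earliest year (also an assumption), and teams with no silver medals do not appear in the result.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Alpha'), (3, 'Cy', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', '100m', 'Silver'),
  (1, 2004, 'Summer', '100m', 'Silver'),
  (2, 2004, 'Summer', '200m', 'Silver'),
  (3, 2000, 'Summer', '400m', 'Silver'),
  (3, 2004, 'Summer', '400m', 'Gold');
""")

# Step 1: silver count per team and year.
# Step 2: total per team plus a rank of years by silver count.
silver_summary = conn.execute("""
    WITH silver AS (
        SELECT a.team, ae.year, COUNT(*) AS silvers
        FROM athlete_events ae
        JOIN athletes a ON a.id = ae.athlete_id
        WHERE ae.medal = 'Silver'
        GROUP BY a.team, ae.year
    ),
    ranked AS (
        SELECT team, year,
               SUM(silvers) OVER (PARTITION BY team) AS total_silver_medals,
               ROW_NUMBER() OVER (PARTITION BY team
                                  ORDER BY silvers DESC, year) AS rn
        FROM silver
    )
    SELECT team, total_silver_medals, year AS year_of_max_silver
    FROM ranked
    WHERE rn = 1
    ORDER BY team
""").fetchall()
print(silver_summary)  # [('Alpha', 3, 2004), ('Beta', 1, 2000)]
```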

3. Which player has won the most gold medals among the players who have won only gold medals (never silver or bronze) over the years?
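
One way to express the "gold only" condition is a HAVING clause that rejects any player with a non-gold medal. The sketch below uses Python's sqlite3 module; the schema, the 'NA' marker for no medal, and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Alpha'), (3, 'Cy', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', '100m', 'Gold'),
  (1, 2004, 'Summer', '100m', 'Gold'),
  (1, 2008, 'Summer', '100m', 'Gold'),
  (1, 2008, 'Summer', '200m', 'Silver'),
  (2, 2000, 'Summer', '400m', 'Gold'),
  (2, 2004, 'Summer', '400m', 'Gold'),
  (2, 2008, 'Summer', '400m', 'NA'),
  (3, 2000, 'Summer', '800m', 'Gold');
""")

# Keep medal rows only, drop players with any silver or bronze,
# then take the player with the most golds.
top_gold_only = conn.execute("""
    SELECT a.name, COUNT(*) AS golds
    FROM athlete_events ae
    JOIN athletes a ON a.id = ae.athlete_id
    WHERE ae.medal IN ('Gold', 'Silver', 'Bronze')
    GROUP BY a.id, a.name
    HAVING SUM(CASE WHEN ae.medal <> 'Gold' THEN 1 ELSE 0 END) = 0
    ORDER BY golds DESC
    LIMIT 1
""").fetchone()
print(top_gold_only)  # ('Bob', 2)
```

Ann has more golds in this sample but is excluded because of her silver medal.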

4. For each year, which player won the most gold medals? Write a query to print the year, player name, and number of golds won in that year. In case of a tie, print comma-separated player names.
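
A sketch of the rank-then-aggregate pattern for this question, using Python's sqlite3 module on invented rows. The schema is an assumption. SQLite's GROUP_CONCAT stands in for Snowflake's LISTAGG here, and the order of names inside the concatenated string is not guaranteed in this sketch.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', '100m', 'Gold'),
  (2, 2000, 'Summer', '200m', 'Gold'),
  (1, 2004, 'Summer', '100m', 'Gold'),
  (1, 2004, 'Summer', '200m', 'Gold'),
  (2, 2004, 'Summer', '400m', 'Gold');
""")

# RANK() keeps every player tied for first place in a year;
# GROUP_CONCAT then folds the tied names into one row per year.
yearly_leaders = conn.execute("""
    WITH golds AS (
        SELECT ae.year, a.name, COUNT(*) AS golds
        FROM athlete_events ae
        JOIN athletes a ON a.id = ae.athlete_id
        WHERE ae.medal = 'Gold'
        GROUP BY ae.year, a.id, a.name
    ),
    ranked AS (
        SELECT year, name, golds,
               RANK() OVER (PARTITION BY year ORDER BY golds DESC) AS rnk
        FROM golds
    )
    SELECT year, GROUP_CONCAT(name, ',') AS players, golds
    FROM ranked
    WHERE rnk = 1
    GROUP BY year, golds
    ORDER BY year
""").fetchall()
for year, players, golds in yearly_leaders:
    print(year, players, golds)
```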

5. In which event and year did India win its first gold medal, first silver medal, and first bronze medal? Print 3 columns: medal, year, sport.
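
The "first medal of each kind" logic can be sketched with a window function partitioned by medal. The example uses Python's sqlite3 module; the schema is an assumption, and the rows are synthetic placeholders (SportA, SportB, SportC), not actual Olympic results.

```python
import sqlite3

# Synthetic placeholder data, NOT real results; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER,
                             sport TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'P1', 'India'), (2, 'P2', 'India'), (3, 'P3', 'Other');
INSERT INTO athlete_events VALUES
  (1, 2000, 'SportA', 'Silver'),
  (1, 2004, 'SportB', 'Gold'),
  (2, 2004, 'SportB', 'Gold'),
  (1, 2008, 'SportB', 'Gold'),
  (2, 2008, 'SportC', 'Bronze'),
  (3, 1996, 'SportA', 'Gold');
""")

# Rank India's medal rows by year within each medal type and keep the
# earliest; DISTINCT collapses team-mates who share the same first medal.
first_medals = conn.execute("""
    WITH india AS (
        SELECT ae.medal, ae.year, ae.sport,
               RANK() OVER (PARTITION BY ae.medal ORDER BY ae.year) AS rnk
        FROM athlete_events ae
        JOIN athletes a ON a.id = ae.athlete_id
        WHERE a.team = 'India'
          AND ae.medal IN ('Gold', 'Silver', 'Bronze')
    )
    SELECT DISTINCT medal, year, sport
    FROM india
    WHERE rnk = 1
    ORDER BY medal
""").fetchall()
print(first_medals)
# [('Bronze', 2008, 'SportC'), ('Gold', 2004, 'SportB'), ('Silver', 2000, 'SportA')]
```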

6. Find players who won gold medals in both the Summer and Winter Olympics.
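
A minimal sketch, assuming a season column with the values 'Summer' and 'Winter': a player qualifies when their gold-medal rows cover two distinct seasons. It uses Python's sqlite3 module, and the schema and rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Alpha'), (3, 'Cy', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', 'E1', 'Gold'),
  (1, 2002, 'Winter', 'E2', 'Gold'),
  (2, 2000, 'Summer', 'E3', 'Gold'),
  (2, 2002, 'Winter', 'E4', 'Silver'),
  (3, 2000, 'Summer', 'E5', 'Gold'),
  (3, 2004, 'Summer', 'E5', 'Gold');
""")

# Gold rows only; two distinct seasons means gold in both Summer and Winter.
both_seasons = conn.execute("""
    SELECT a.name
    FROM athlete_events ae
    JOIN athletes a ON a.id = ae.athlete_id
    WHERE ae.medal = 'Gold'
    GROUP BY a.id, a.name
    HAVING COUNT(DISTINCT ae.season) = 2
    ORDER BY a.name
""").fetchall()
print(both_seasons)  # [('Ann',)]
```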

7. Find players who won gold, silver, and bronze medals in a single Olympics. Print the player name along with the year.
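
One way to sketch this: group each player's medal rows per Games and require three distinct medal types. The example uses Python's sqlite3 module; the schema and rows are assumptions. Grouping by both year and season is a deliberate choice, since Summer and Winter Games were held in the same year before 1994.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', 'E1', 'Gold'),
  (1, 2000, 'Summer', 'E2', 'Silver'),
  (1, 2000, 'Summer', 'E3', 'Bronze'),
  (2, 2000, 'Summer', 'E1', 'Gold'),
  (2, 2000, 'Summer', 'E2', 'Silver'),
  (2, 2004, 'Summer', 'E3', 'Bronze');
""")

# A player-Games group with three distinct medal values holds all of
# gold, silver, and bronze.
full_set = conn.execute("""
    SELECT a.name, ae.year
    FROM athlete_events ae
    JOIN athletes a ON a.id = ae.athlete_id
    WHERE ae.medal IN ('Gold', 'Silver', 'Bronze')
    GROUP BY a.id, a.name, ae.year, ae.season
    HAVING COUNT(DISTINCT ae.medal) = 3
    ORDER BY a.name, ae.year
""").fetchall()
print(full_set)  # [('Ann', 2000)]
```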

8. Find players who have won gold medals in 3 consecutive Summer Olympics in the same event. Consider only Olympics from 2000 onwards. Assume the Summer Olympics happen every 4 years starting in 2000. Print the player name and event name.
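
Under the stated 4-year assumption, three consecutive wins show up as a gold year whose two preceding gold years for the same player and event are exactly 4 and 8 years earlier. The sketch below checks that with LAG, using Python's sqlite3 module; the schema and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (id INTEGER, name TEXT, team TEXT);
CREATE TABLE athlete_events (athlete_id INTEGER, year INTEGER, season TEXT,
                             event TEXT, medal TEXT);
INSERT INTO athletes VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Beta'), (3, 'Cy', 'Beta');
INSERT INTO athlete_events VALUES
  (1, 2000, 'Summer', '100m', 'Gold'),
  (1, 2004, 'Summer', '100m', 'Gold'),
  (1, 2008, 'Summer', '100m', 'Gold'),
  (2, 2000, 'Summer', '200m', 'Gold'),
  (2, 2004, 'Summer', '200m', 'Gold'),
  (2, 2012, 'Summer', '200m', 'Gold'),
  (3, 2000, 'Summer', '100m', 'Gold'),
  (3, 2004, 'Summer', '200m', 'Gold'),
  (3, 2008, 'Summer', '100m', 'Gold');
""")

# DISTINCT gives one row per player, event, and year; LAG then looks
# back one and two gold years within the same player and event.
three_in_a_row = conn.execute("""
    WITH golds AS (
        SELECT DISTINCT a.id, a.name, ae.event, ae.year
        FROM athlete_events ae
        JOIN athletes a ON a.id = ae.athlete_id
        WHERE ae.medal = 'Gold'
          AND ae.season = 'Summer'
          AND ae.year >= 2000
    ),
    seq AS (
        SELECT name, event, year,
               LAG(year, 1) OVER (PARTITION BY id, event ORDER BY year) AS prev1,
               LAG(year, 2) OVER (PARTITION BY id, event ORDER BY year) AS prev2
        FROM golds
    )
    SELECT DISTINCT name, event
    FROM seq
    WHERE prev1 = year - 4
      AND prev2 = year - 8
    ORDER BY name, event
""").fetchall()
print(three_in_a_row)  # [('Ann', '100m')]
```

Bob is excluded because of the gap between 2004 and 2012, and Cy because the three golds are not in the same event.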

THANK YOU
You can find the code to ingest the dataset from AWS S3 into Databricks and establish a connection to Snowflake at the link below:

Olympic_Dataset_Analysis_Github
