HW 1 - Version 2.ipynb - Colab
HW 1 - Version 2.ipynb - Colab https://colab.research.google.com/drive/1zJAJYEWSeDDH3W5Zkku...

Homework Assignment 1

The Art of Analyzing Big Data - The Data Scientist’s Toolbox


By Dr. Michael Fire

For this homework you will need to write code that analyzes real-world datasets. The code needs
to be written in Python using the sqlite3 package.

Please note: You need to answer only the questions that match the first digit of your ID.

1. Babies Names Dataset (35pt)

Task 1 (for everyone): Write code that uses the babies names dataset (use
NationalNames.csv) and creates a table named Names with the dataset data and the following
columns: 'State', 'Gender', 'Name', 'Count', and 'Year' (5pt). Bonus: Load the data using a batch
INSERT SQL query (2pt)
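
One possible reading of the batch-insert bonus is sketched below with sqlite3's executemany. It assumes the CSV has a header row using the five column names listed in the task; the helper name load_names and the two sample rows are made up for illustration and are not part of the assignment.

```python
import csv
import io
import sqlite3


def load_names(conn, csv_file):
    """Create the Names table and bulk-load rows from an open CSV file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Names ("
        'State TEXT, Gender TEXT, Name TEXT, "Count" INTEGER, Year INTEGER)'
    )
    reader = csv.DictReader(csv_file)
    # Assumption: the header row uses exactly these column names.
    rows = (
        (r.get("State"), r["Gender"], r["Name"], int(r["Count"]), int(r["Year"]))
        for r in reader
    )
    # executemany reuses one prepared INSERT statement for every row.
    conn.executemany("INSERT INTO Names VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


# Made-up sample standing in for the real CSV file.
sample = io.StringIO(
    "State,Gender,Name,Count,Year\n"
    "WA,F,Mary,120,2000\n"
    "WA,M,Robert,95,2000\n"
)
conn = sqlite3.connect(":memory:")
load_names(conn, sample)
row_count = conn.execute("SELECT COUNT(*) FROM Names").fetchone()[0]
print(row_count)
```

For the real file, the same helper could be called with open("NationalNames.csv", newline="") in place of the in-memory sample.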


Task 2 (for everyone): Write a query that returns the statistics for the name Mary (5pt). Use the
timeit package to measure the time it takes the query to run (5pt). Bonus: Create an index on
the Name column and use the timeit package to measure the time it takes the query to run
with the index (5pt)
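
A minimal sketch of timing a query before and after indexing is shown below. The task does not define "statistics", so the sum of counts plus first and last year is an assumption; the table contents, the function name mary_stats, and the index name idx_names_name are likewise illustrative.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Names (State TEXT, Gender TEXT, Name TEXT, "Count" INTEGER, Year INTEGER)'
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO Names VALUES (?, ?, ?, ?, ?)",
    [("WA", "F", "Mary", 10, y) for y in range(1910, 2015)]
    + [("WA", "M", "Robert", 20, y) for y in range(1910, 2015)],
)


def mary_stats():
    """Return (total count, first year, last year) for the name Mary."""
    return conn.execute(
        'SELECT SUM("Count"), MIN(Year), MAX(Year) FROM Names WHERE Name = ?',
        ("Mary",),
    ).fetchone()


# Time 200 runs without an index, then add the index and time again.
before = timeit.timeit(mary_stats, number=200)
conn.execute("CREATE INDEX IF NOT EXISTS idx_names_name ON Names (Name)")
after = timeit.timeit(mary_stats, number=200)
print(f"without index: {before:.4f}s, with index: {after:.4f}s")
```

On a table this small the two timings may be nearly identical; the difference should become visible only on the full dataset.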


Please answer only one of the following questions according to your ID number (use the formula Question = ID mod 4 + 1)

# which question to answer - put your ID number and run the code
your_id = "<fill_your_id>"
q = int(your_id) % 4 + 1
print("You need to answer question number %s" % q)

1 of 5 12/21/2024, 12:55 PM

Question 1: Write a function that returns how many female and male babies were born in a given
state in a given year. Use it to calculate the number of babies born in WA in 2000 (10pt)
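
A function of this kind could be sketched as follows, assuming the Names table from Task 1. The function name births_by_gender and the sample rows are hypothetical; the numbers are not real dataset values.

```python
import sqlite3


def births_by_gender(conn, state, year):
    """Return a dict mapping gender to total births for one state and year."""
    cursor = conn.execute(
        'SELECT Gender, SUM("Count") FROM Names '
        "WHERE State = ? AND Year = ? GROUP BY Gender",
        (state, year),  # placeholders keep user input out of the SQL text
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Names (State TEXT, Gender TEXT, Name TEXT, "Count" INTEGER, Year INTEGER)'
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO Names VALUES (?, ?, ?, ?, ?)",
    [
        ("WA", "F", "Mary", 120, 2000),
        ("WA", "F", "Emma", 80, 2000),
        ("WA", "M", "Robert", 95, 2000),
        ("CA", "F", "Mary", 300, 2000),
        ("WA", "M", "Robert", 70, 1999),
    ],
)
result = births_by_gender(conn, "WA", 2000)
print(result)
```

Passing the state and year as query parameters, rather than formatting them into the string, avoids quoting bugs and SQL injection.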


Question 2: Write a function that returns how many female babies were born within a given
range of years. Use it to calculate how many babies were born between 1850 and 1950 (10pt)


Question 3: Write a function that returns the most common female name in a given state in a
given year. Use it to calculate the most common female name in CA in 1999 (10pt)


Question 4: Write a function that returns how many male babies named Robert were born in a
given state in a given year. Use it to find the state in which the highest number of babies named
Robert were born in 1950 (10pt)


Question (for everyone): For the state of CA write code that calculates the third most popular
female/male names in each decade (10pt). Bonus: Visualize it somehow using Matplotlib (5pt)
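
One way to approach the per-decade ranking is a window function, sketched below. It assumes SQLite 3.25 or newer (required for ROW_NUMBER) and the Names table from Task 1. Ties are broken alphabetically by name, which is a design choice the assignment does not specify, and all sample rows are made up.

```python
import sqlite3

# Sum counts per (decade, gender, name), rank within each (decade, gender),
# and keep the name in third position.
QUERY = """
WITH totals AS (
    SELECT (Year / 10) * 10 AS Decade, Gender, Name, SUM("Count") AS Total
    FROM Names
    WHERE State = ?
    GROUP BY Decade, Gender, Name
),
ranked AS (
    SELECT Decade, Gender, Name,
           ROW_NUMBER() OVER (
               PARTITION BY Decade, Gender ORDER BY Total DESC, Name
           ) AS Position
    FROM totals
)
SELECT Decade, Gender, Name FROM ranked WHERE Position = 3
ORDER BY Decade, Gender
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Names (State TEXT, Gender TEXT, Name TEXT, "Count" INTEGER, Year INTEGER)'
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO Names VALUES (?, ?, ?, ?, ?)",
    [
        ("CA", "F", "Mary", 100, 1991),
        ("CA", "F", "Emma", 90, 1992),
        ("CA", "F", "Anna", 80, 1995),
        ("CA", "F", "Zoe", 10, 1999),
        ("CA", "M", "Robert", 50, 1990),
        ("CA", "M", "John", 40, 1993),
        ("CA", "M", "Alan", 30, 1998),
        ("WA", "F", "Ida", 999, 1994),
    ],
)
third_by_decade = conn.execute(QUERY, ("CA",)).fetchall()
print(third_by_decade)
```

The integer division (Year / 10) * 10 maps every year to the first year of its decade, so 1991 and 1999 both fall under 1990.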


2. Flavors of Cacao Dataset (15pt)

Using the Flavors of Cacao dataset, answer the following questions:

Question 1: Write a function that returns the number of bars manufactured where the bars'
BroadBean Origin is a given country. Use the function to calculate the number of bars where
BroadBean Origin is 'Ecuador' (15pt)


Question 2: Write a function that returns the maximal and average cocoa percentage in a bar
manufactured by a company in a specific country. Use the function to calculate the minimal and
average cocoa percentage in bars manufactured by a Swiss company (15pt).


Question 3: Calculate the second most common bean type(s) and the rarest bean type(s) (15pt)
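
Because the question allows for several bean types sharing a rank, a tie-aware ranking may help. The sketch below uses DENSE_RANK, assuming SQLite 3.25 or newer; the table name Cacao, the column name BeanType, the helper bean_types_at_rank, and all rows are hypothetical and would need to match the actual dataset schema.

```python
import sqlite3


def bean_types_at_rank(conn, rank, rarest=False):
    """Return all bean types tied at the given rank by frequency.

    With rarest=False, rank 1 is the most common type;
    with rarest=True, rank 1 is the least common type.
    """
    order = "ASC" if rarest else "DESC"  # fixed keywords, never user input
    query = f"""
        WITH counts AS (
            SELECT BeanType, COUNT(*) AS N FROM Cacao GROUP BY BeanType
        ),
        ranked AS (
            SELECT BeanType, DENSE_RANK() OVER (ORDER BY N {order}) AS R
            FROM counts
        )
        SELECT BeanType FROM ranked WHERE R = ? ORDER BY BeanType
    """
    return [row[0] for row in conn.execute(query, (rank,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cacao (Company TEXT, BeanType TEXT)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO Cacao VALUES (?, ?)",
    [
        ("A", "Trinitario"), ("B", "Trinitario"), ("C", "Trinitario"),
        ("A", "Criollo"), ("B", "Criollo"),
        ("A", "Forastero"), ("C", "Forastero"),
        ("D", "Blend"),
        ("E", "Nacional"),
    ],
)
second_most_common = bean_types_at_rank(conn, 2)
rarest_types = bean_types_at_rank(conn, 1, rarest=True)
print(second_most_common, rarest_types)
```

DENSE_RANK assigns tied types the same rank without leaving gaps, so rank 2 always means the second-highest frequency even when several types share first place.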


Question 4: Calculate the number of reviews and the average rating in each year. Calculate the
number of reviews and the average rating of each company in each year (15pt)


3. Kickstarter Projects Dataset (25pt)

Using the Kickstarter Projects Dataset, answer the following questions:

Task 1 (for everyone): Load the dataset into a SQLite DB using PonyORM (10pt)


Please answer only one of the following questions according to your ID number (use the formula Question = ID mod 3 + 1)

# which question to answer - put your ID number and run the code
your_id = "<fill_your_id>"
q = int(your_id) % 3 + 1
print("You need to answer question number %s" % q)

Question 1: On average, which project category received the highest number of backers? (15pt)


Question 2: On average, which project category received the highest pledged USD? (15pt)


Question 3: In which month did the highest number of projects occur? (15pt)


4. Oscars Datasets (10pt)

Using the Oscars Dataset, please answer only one of the following questions (you can choose):

Question 1: Who is the male actor with the most Oscar nominations? (10pt)


Question 2: Who is the female director with the most Oscar nominations? (10pt)


Question 3: Which 10 movies received the highest number of Oscar nominations? (10pt)


Question 4: Write a function that receives an actor's name and returns the actor's number of
Oscar nominations. Use the function to calculate the number of times Leonardo DiCaprio was
nominated (10pt)
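
A lookup function of this shape could be sketched as follows. The table name Oscars, its columns, the function name count_nominations, and the placeholder people and films are all hypothetical; the real dataset's schema and values will differ.

```python
import sqlite3


def count_nominations(conn, person_name):
    """Return how many rows in the Oscars table list the given person."""
    row = conn.execute(
        "SELECT COUNT(*) FROM Oscars WHERE Name = ?",
        (person_name,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Oscars (Year INTEGER, Category TEXT, Name TEXT, Film TEXT, Winner INTEGER)"
)
# Placeholder rows; these are not real nomination records.
conn.executemany(
    "INSERT INTO Oscars VALUES (?, ?, ?, ?, ?)",
    [
        (2001, "Actor", "Person A", "Film 1", 0),
        (2005, "Actor", "Person A", "Film 2", 0),
        (2010, "Actor", "Person A", "Film 3", 1),
        (2010, "Actor", "Person B", "Film 4", 0),
    ],
)
nominations = count_nominations(conn, "Person A")
print(nominations)
```

If the dataset stores one row per nomination, winners included, the plain row count is the nomination count; otherwise the WHERE clause would need an extra condition.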


5. Cool Bonus: LLMs & Stable Diffusion (10pt)


Using GPT-2 (or any other LLM), write simple code that generates a bedtime story with
10 pages of related images.
