HW 1 - Version 2.ipynb - Colab
Homework Assignment 1
For this homework you will need to write code that analyzes real-world datasets. The code needs
to be written in Python using the sqlite3 package.
Please note: you need to answer only the questions that match the first digit of your ID.
Task 1 (for everyone): Write code that uses the baby names dataset (use
NationalNames.csv) and creates a table named Names with the dataset's data and the following
columns: 'State', 'Gender', 'Name', 'Count', and 'Year' (5pt). Bonus: Load the data using a batch
INSERT SQL query (2pt)
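A minimal sketch of Task 1 using only the standard library. It assumes the CSV has a header row containing at least the columns State, Gender, Name, Count and Year (any extra columns are ignored); the function name `load_names` is hypothetical. For the bonus, all rows are sent in a single `executemany` call, which is one common way to do a batch INSERT with sqlite3.

```python
import csv
import sqlite3


def load_names(conn: sqlite3.Connection, csv_path: str) -> int:
    """Create the Names table, bulk-load it from a CSV file, and return its row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Names (
               State  TEXT,
               Gender TEXT,
               Name   TEXT,
               Count  INTEGER,
               Year   INTEGER
           )"""
    )
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # Lazily build tuples in the table's column order; other CSV columns are skipped.
        rows = (
            (r["State"], r["Gender"], r["Name"], int(r["Count"]), int(r["Year"]))
            for r in reader
        )
        # Batch insert: one executemany call instead of one INSERT statement per row.
        conn.executemany(
            "INSERT INTO Names (State, Gender, Name, Count, Year) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM Names").fetchone()[0]
```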
Task 2 (for everyone): Write a query that returns the statistics for the name Mary (5pt). Use the
timeit package to measure the time it takes the query to run (5pt). Bonus: Create an index on
the Name column and use the timeit package to measure the time it takes the query to run
with the index (5pt)
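One way to approach Task 2, assuming the Names table from Task 1. The assignment does not define "statistics", so the query below (total count, number of rows, first and last year) is only an illustrative choice, and the helper names and the index name `idx_names_name` are hypothetical. `timeit.timeit` accepts a zero-argument callable, so the same lambda can be timed before and after the index exists.

```python
import sqlite3
import timeit

# Illustrative "statistics" for one name: total births, number of rows, first and last year.
STATS_SQL = """
    SELECT SUM(Count), COUNT(*), MIN(Year), MAX(Year)
    FROM Names
    WHERE Name = ?
"""


def name_stats(conn: sqlite3.Connection, name: str) -> tuple:
    """Run the statistics query for a single name."""
    return conn.execute(STATS_SQL, (name,)).fetchone()


def time_name_stats(conn: sqlite3.Connection, name: str, number: int = 10) -> float:
    """Return the average seconds per run of the statistics query over `number` runs."""
    total = timeit.timeit(lambda: name_stats(conn, name), number=number)
    return total / number


def add_name_index(conn: sqlite3.Connection) -> None:
    """Index the Name column so WHERE Name = ? can avoid a full table scan."""
    conn.execute("CREATE INDEX IF NOT EXISTS idx_names_name ON Names (Name)")
    conn.commit()
```

To compare, call `time_name_stats(conn, "Mary")` once, then `add_name_index(conn)`, then time it again. On a tiny table the difference may be negligible; on the full dataset the indexed lookup is typically faster.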
# which question to answer - put your ID number and run the code
your_id = "<fill_your_id>"
q = int(your_id) % 4 + 1
print("You need to answer question number %s" % q)
1 of 5 12/21/2024, 12:55 PM
HW 1 - Version 2.ipynb - Colab https://colab.research.google.com/drive/1zJAJYEWSeDDH3W5Zkku...
Question 1: Write a function that returns how many female and male babies were born in a given
state in a given year. Use it to calculate the number of babies born in WA in 2000 (10pt)
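A possible sketch for Question 1, assuming the Names table from Task 1 with Gender coded as 'F' and 'M'; the function name `births_by_gender` is hypothetical. The query sums Count per gender with a parameterized filter, and genders with no rows default to 0.

```python
import sqlite3


def births_by_gender(conn: sqlite3.Connection, state: str, year: int) -> dict:
    """Return {'F': total, 'M': total} births for one state and year."""
    rows = conn.execute(
        """SELECT Gender, SUM(Count)
           FROM Names
           WHERE State = ? AND Year = ?
           GROUP BY Gender""",
        (state, year),
    ).fetchall()
    totals = {"F": 0, "M": 0}  # default to 0 when a gender has no matching rows
    totals.update(dict(rows))
    return totals
```

For WA in 2000, `births_by_gender(conn, "WA", 2000)` gives both totals, and `sum(...)` over its values gives the overall number.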
Question 2: Write a function that returns how many female babies were born within a given
range of years. Use it to calculate how many female babies were born between 1850 and 1950 (10pt)
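A minimal sketch for Question 2, assuming the same Names table and 'F' for female; the function name is hypothetical. SQL `BETWEEN` includes both endpoints, and `COALESCE` turns the empty-range `NULL` into 0.

```python
import sqlite3


def female_births_between(conn: sqlite3.Connection, start_year: int, end_year: int) -> int:
    """Total female births with start_year <= Year <= end_year."""
    (total,) = conn.execute(
        """SELECT COALESCE(SUM(Count), 0)
           FROM Names
           WHERE Gender = 'F' AND Year BETWEEN ? AND ?""",
        (start_year, end_year),
    ).fetchone()
    return total
```

Usage: `female_births_between(conn, 1850, 1950)`.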
Question 3: Write a function that returns the most common female name in a given state in a given
year. Use it to calculate the most common female name in CA in 1999 (10pt)
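A sketch for Question 3, assuming the Names table from Task 1; the function name is hypothetical. It ranks names by their summed Count and, as an arbitrary tie-breaker, picks the alphabetically first name when two totals are equal.

```python
import sqlite3


def most_common_female_name(conn: sqlite3.Connection, state: str, year: int):
    """Return (name, total) for the top female name, or None if nothing matches."""
    return conn.execute(
        """SELECT Name, SUM(Count) AS total
           FROM Names
           WHERE Gender = 'F' AND State = ? AND Year = ?
           GROUP BY Name
           ORDER BY total DESC, Name
           LIMIT 1""",
        (state, year),
    ).fetchone()
```

Usage: `most_common_female_name(conn, "CA", 1999)`.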
Question 4: Write a function that returns how many male babies named Robert were born in a
given state in a given year. Use it to find the state in which the highest number of babies named
Robert were born in 1950 (10pt)
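A sketch for Question 4 that follows the assignment's wording literally: one function counts Roberts for a state and year, and a second calls it for every state present in that year. Both function names are hypothetical, and 'M' is assumed to mean male. A single `GROUP BY State` query would be faster; the loop is kept only to reuse the required function.

```python
import sqlite3


def robert_births(conn: sqlite3.Connection, state: str, year: int) -> int:
    """Number of male babies named Robert born in one state and year."""
    (total,) = conn.execute(
        """SELECT COALESCE(SUM(Count), 0)
           FROM Names
           WHERE Name = 'Robert' AND Gender = 'M' AND State = ? AND Year = ?""",
        (state, year),
    ).fetchone()
    return total


def top_robert_state(conn: sqlite3.Connection, year: int) -> str:
    """State with the most Roberts in `year` (raises ValueError if the year has no data)."""
    states = [
        s
        for (s,) in conn.execute(
            "SELECT DISTINCT State FROM Names WHERE Year = ? ORDER BY State", (year,)
        )
    ]
    return max(states, key=lambda s: robert_births(conn, s, year))
```

Usage: `top_robert_state(conn, 1950)`.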
Question (for everyone): For the state of CA, write code that calculates the third most popular
female and male names in each decade (10pt). Bonus: Visualize the results using Matplotlib (5pt)
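A sketch of the decade question, assuming the Names table from Task 1. The decade is computed as `(Year / 10) * 10`, which is integer division in SQLite once Year is cast to INTEGER. Ranking is done in Python with `itertools.groupby` so no SQL window functions are needed; ties are broken alphabetically, and groups with fewer than three names are skipped. The function name is hypothetical.

```python
import sqlite3
from itertools import groupby


def third_most_popular_by_decade(conn: sqlite3.Connection, state: str = "CA") -> dict:
    """Return {(decade, gender): (name, total)} for the 3rd-ranked name in each group."""
    rows = conn.execute(
        """SELECT (CAST(Year AS INTEGER) / 10) * 10 AS decade, Gender, Name, SUM(Count) AS total
           FROM Names
           WHERE State = ?
           GROUP BY decade, Gender, Name
           ORDER BY decade, Gender, total DESC, Name""",
        (state,),
    ).fetchall()
    result = {}
    # Rows arrive sorted by (decade, gender), so groupby yields one group per pair.
    for (decade, gender), group in groupby(rows, key=lambda r: (r[0], r[1])):
        ranked = list(group)
        if len(ranked) >= 3:
            _, _, name, total = ranked[2]
            result[(decade, gender)] = (name, total)
    return result
```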
Question 1: Write a function that returns the number of bars whose BroadBean Origin is a given
country. Use the function to calculate the number of bars whose BroadBean Origin is 'Ecuador'
(15pt)
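A minimal sketch, assuming the chocolate dataset has been loaded into a hypothetical SQLite table `Bars` with a `BroadBeanOrigin` column (both names are placeholders for whatever your own loader creates). `TRIM` strips leading and trailing spaces that raw CSV values sometimes carry.

```python
import sqlite3


def bars_from_origin(conn: sqlite3.Connection, country: str) -> int:
    """Count bars whose broad bean origin equals `country`."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM Bars WHERE TRIM(BroadBeanOrigin) = ?", (country,)
    ).fetchone()
    return n
```

Usage: `bars_from_origin(conn, "Ecuador")`.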
Question 2: Write a function that returns the maximal and average cocoa percentage in bars
manufactured by companies located in a specific country. Use the function to calculate the maximal and
average cocoa percentage in bars manufactured by Swiss companies (15pt).
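A sketch assuming the same hypothetical `Bars` table with `CocoaPercent` and `CompanyLocation` columns. Cocoa percentages often arrive as text such as `70%`, so the query strips the `%` and casts to REAL (this also works if the value is already numeric). The sketch returns the minimum, maximum and average together so that any of the requested values can be read off; the function name is hypothetical.

```python
import sqlite3


def cocoa_stats_for_location(conn: sqlite3.Connection, location: str) -> tuple:
    """Return (min, max, average) cocoa percentage for makers located in `location`."""
    return conn.execute(
        """SELECT MIN(pct), MAX(pct), AVG(pct)
           FROM (SELECT CAST(REPLACE(CocoaPercent, '%', '') AS REAL) AS pct
                 FROM Bars
                 WHERE CompanyLocation = ?)""",
        (location,),
    ).fetchone()
```

Usage: `cocoa_stats_for_location(conn, "Switzerland")`, assuming that is how the location is spelled in the data.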
Question 3: Calculate the second most common bean type(s) and the rarest bean type(s) (15pt)
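A sketch assuming a hypothetical `BeanType` column in the `Bars` table. Because the question allows several answers, ties are kept: types are grouped by their distinct frequencies, the second-highest frequency defines "second most common", and the lowest defines "rarest". Missing or blank bean types are excluded, which is an assumption about how they should be treated.

```python
import sqlite3


def bean_type_extremes(conn: sqlite3.Connection) -> tuple:
    """Return (second_most_common_types, rarest_types), each a sorted list."""
    counts = conn.execute(
        """SELECT BeanType, COUNT(*) AS n
           FROM Bars
           WHERE BeanType IS NOT NULL AND TRIM(BeanType) <> ''
           GROUP BY BeanType"""
    ).fetchall()
    if not counts:
        return [], []
    # Distinct frequencies, highest first.
    levels = sorted({n for _, n in counts}, reverse=True)
    second = sorted(t for t, n in counts if n == levels[1]) if len(levels) > 1 else []
    rarest = sorted(t for t, n in counts if n == levels[-1])
    return second, rarest
```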
Question 4: Calculate the number of reviews and the average rating in each year. Calculate the
number of reviews and the average rating of each company in each year (15pt)
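Both parts of Question 4 are plain `GROUP BY` aggregations. The sketch assumes hypothetical `Company`, `ReviewDate` (the review year) and `Rating` columns in the `Bars` table.

```python
import sqlite3


def reviews_per_year(conn: sqlite3.Connection) -> list:
    """Rows of (year, number_of_reviews, average_rating)."""
    return conn.execute(
        """SELECT ReviewDate, COUNT(*), AVG(Rating)
           FROM Bars
           GROUP BY ReviewDate
           ORDER BY ReviewDate"""
    ).fetchall()


def reviews_per_company_year(conn: sqlite3.Connection) -> list:
    """Rows of (company, year, number_of_reviews, average_rating)."""
    return conn.execute(
        """SELECT Company, ReviewDate, COUNT(*), AVG(Rating)
           FROM Bars
           GROUP BY Company, ReviewDate
           ORDER BY Company, ReviewDate"""
    ).fetchall()
```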
Task 1 (for everyone): Load the dataset into an SQLite database using PonyORM (10pt)
# which question to answer - put your ID number and run the code
your_id = "<fill_your_id>"
q = int(your_id) % 3 + 1
print("You need to answer question number %s" % q)
Question 1: On average, which project category received the highest number of backers? (15pt)
Question 2: On average, which project category received the highest pledged amount in USD? (15pt)
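Questions 1 and 2 have the same shape (average one numeric column per category, then take the largest), so a single helper can sketch both. It queries the SQLite file directly with sqlite3 rather than through PonyORM, and assumes a table named `Projects` with columns `category`, `backers` and `usd_pledged`; all of these names are hypothetical and should be adjusted to the schema your PonyORM entity creates. Column names cannot be bound as `?` parameters, so the column is checked against a whitelist before being placed in the SQL string.

```python
import sqlite3

# Numeric columns that may be averaged (hypothetical names).
ALLOWED_COLUMNS = {"backers", "usd_pledged"}


def top_category_by_average(conn: sqlite3.Connection, column: str) -> tuple:
    """Return (category, average) for the category with the highest average `column`."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    return conn.execute(
        f"""SELECT category, AVG({column}) AS avg_value
            FROM Projects
            GROUP BY category
            ORDER BY avg_value DESC
            LIMIT 1"""
    ).fetchone()
```

Usage: `top_category_by_average(conn, "backers")` for Question 1 and `top_category_by_average(conn, "usd_pledged")` for Question 2.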
Question 3: In which month did the highest number of projects occur? (15pt)
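A possible sketch for Question 3, assuming the same hypothetical `Projects` table with a `launched` column stored as ISO-8601 text (for example `2015-08-11 12:12:28`). "Occurred" is interpreted here as the launch date and "month" as the calendar month across all years; using `'%Y-%m'` in `strftime` would instead rank individual year-month pairs.

```python
import sqlite3


def busiest_launch_month(conn: sqlite3.Connection):
    """Return (month as 'MM', number_of_projects) for the busiest calendar month."""
    return conn.execute(
        """SELECT strftime('%m', launched) AS month, COUNT(*) AS n
           FROM Projects
           WHERE launched IS NOT NULL
           GROUP BY month
           ORDER BY n DESC, month
           LIMIT 1"""
    ).fetchone()
```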
Using the Oscars dataset, please answer only one of the following questions (you can choose):
Question 1: Who is the male actor with the most Oscar nominations? (10pt)
Question 2: Who is the female director with the most Oscar nominations? (10pt)
Question 3: Which 10 movies received the highest number of Oscar nominations? (10pt)
Question 4: Write a function that receives an actor's name and returns the actor's number of
Oscar nominations. Use the function to calculate the number of times Leonardo DiCaprio was
nominated (10pt)
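A sketch for Questions 1, 3 and 4, assuming the Oscars data sits in a hypothetical SQLite table `Oscars` where each row is one nomination with `category`, `name` and `film` columns, and that male acting categories start with `ACTOR` (SQLite's `LIKE` is case-insensitive for ASCII, and `ACTRESS` does not match). Question 2 is not covered because it needs the director's gender, which this assumed schema does not contain. Grouping by film title alone would merge different films that share a title.

```python
import sqlite3


def nominations_for(conn: sqlite3.Connection, person: str) -> int:
    """Question 4: number of nominations recorded for `person`."""
    (n,) = conn.execute("SELECT COUNT(*) FROM Oscars WHERE name = ?", (person,)).fetchone()
    return n


def top_nominated_actor(conn: sqlite3.Connection):
    """Question 1: (name, nominations) for the most-nominated male actor."""
    return conn.execute(
        """SELECT name, COUNT(*) AS n
           FROM Oscars
           WHERE category LIKE 'ACTOR%'
           GROUP BY name
           ORDER BY n DESC, name
           LIMIT 1"""
    ).fetchone()


def top_nominated_films(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Question 3: (film, nominations) rows for the most-nominated films."""
    return conn.execute(
        """SELECT film, COUNT(*) AS n
           FROM Oscars
           WHERE film IS NOT NULL
           GROUP BY film
           ORDER BY n DESC, film
           LIMIT ?""",
        (limit,),
    ).fetchall()
```

Usage: `nominations_for(conn, "Leonardo DiCaprio")`.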
Using GPT-2 (or any other LLM), write simple code that generates a bedtime story with 10 pages
of related images.