TASK2

Uploaded by merrtatay
Copyright © All Rights Reserved
Please read this section and post your replies to Questions 4, 5, 6 and 7.

Below is a dataset of daily vaccination numbers per country. Please implement a
small program that reads the input file and produces the desired outputs for the
questions below, optimized for speed of operation (preferred language: Python or C#).

For the code implementation tasks, please answer with a link to your code, shared via
pastebin, GitHub, Bitbucket, coderpad.io, dotnetfiddle, jsfiddle, or any other
public website.

Dataset link:
https://www.piworks.net/Upload/Document/Original/country_vaccination_stats.csv

4- Code Implementation Task: Implement code to fill (impute) the missing data in the
daily_vaccinations column per country with the minimum daily vaccination number of the
relevant country.
Note: If a country does not have any valid vaccination number yet, fill it with “0”
(zero).
Please provide the link to your code as the answer to this question.

5- Code Implementation Task: Implement code to list the top-3 countries with the
highest median daily vaccination numbers, using the imputed version of the dataset.
Please provide the link to your code as the answer to this question.

6- What is the total number of vaccinations done on 1/6/2021 (MM/DD/YYYY), using the
imputed version of the dataset? Please provide just the number as the answer.

7- Code Implementation Task: If this list were a database table, please provide an
SQL query to fill in the missing daily vaccination numbers with the discrete median of
each country, similar to Question 4.
Please provide the link to your code as the answer to this question.
Note: This time the SQL equivalent is requested, and the imputation value is the median
of each country, not the minimum. Please remember to fill countries with zero if they do
not have any valid daily_vaccinations records, like Kuwait.
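For Question 7, here is a minimal sketch of one way to write the query. It runs through Python's built-in sqlite3 module so that it can be tried on toy rows. The table name `vaccinations`, its column types, and the helper `impute_with_median` are hypothetical.

SQLite has no PERCENTILE_DISC, so the discrete median is emulated with the ROW_NUMBER() and COUNT(*) window functions, which require SQLite 3.25 or later. For a country with n valid values, the query picks the value at sorted position (n + 1) / 2 using integer division. When n is even, that is the lower median. A country with no valid value, like Kuwait, gets no median row, so COALESCE falls back to 0.

```python
import sqlite3

# Discrete median per country: among that country's non-NULL values, take the
# one at 1-based sorted position (n + 1) / 2 (integer division), i.e. the
# lower median when n is even. Missing values fall back to that median, or 0.
IMPUTE_SQL = """
WITH ranked AS (
    SELECT country,
           daily_vaccinations,
           ROW_NUMBER() OVER (PARTITION BY country ORDER BY daily_vaccinations) AS rn,
           COUNT(*)     OVER (PARTITION BY country) AS n
    FROM vaccinations
    WHERE daily_vaccinations IS NOT NULL
),
medians AS (
    SELECT country, daily_vaccinations AS median_dv
    FROM ranked
    WHERE rn = (n + 1) / 2
)
SELECT v.country,
       v.date,
       COALESCE(v.daily_vaccinations, m.median_dv, 0) AS daily_vaccinations
FROM vaccinations AS v
LEFT JOIN medians AS m ON m.country = v.country
ORDER BY v.rowid
"""


def impute_with_median(rows):
    """Hypothetical helper: load (country, date, daily_vaccinations) tuples,
    with None for missing values, into an in-memory SQLite table and return
    them in insertion order with the gaps filled by IMPUTE_SQL."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE vaccinations (country TEXT, date TEXT, daily_vaccinations INTEGER)"
    )
    con.executemany("INSERT INTO vaccinations VALUES (?, ?, ?)", rows)
    result = con.execute(IMPUTE_SQL).fetchall()
    con.close()
    return result
```

On PostgreSQL, the `medians` CTE could instead use `PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY daily_vaccinations)` grouped by country. The query above is a read-only SELECT and leaves the table unchanged.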

https://drive.google.com/file/d/13vQzDAzRt9pJpQHb-kswHk8QSHLXqRjU/view

import pandas as pd

# read the csv file
df = pd.read_csv('https://www.piworks.net/Upload/Document/Original/country_vaccination_stats.csv')

# minimum valid daily vaccinations per country, broadcast back to every row
# (a built-in aggregation is faster than a Python lambda per group)
country_min = df.groupby('country')['daily_vaccinations'].transform('min')

# fill the missing values with the minimum of the relevant country; a country
# with no valid value at all still has NaN after this step, so fill it with 0
df['daily_vaccinations'] = df['daily_vaccinations'].fillna(country_min).fillna(0)

# save the output to a csv file
df.to_csv('country_vaccination_stats_imputed.csv', index=False)
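The same rule can be checked without pandas or the download. Below is a minimal pure-Python sketch. The first pass records each country's minimum valid value, and the second pass fills the gaps. A country with no valid value, like Kuwait, falls back to 0. The helper name `impute_with_min` and the (country, value) tuple layout are hypothetical, and the sample rows in the usage line are made up.

```python
def impute_with_min(rows):
    """Hypothetical helper. rows: list of (country, daily_vaccinations)
    tuples, with None marking a missing value. Returns a new list where
    each None is replaced by the minimum valid value of the same country,
    or 0 if that country has no valid value at all."""
    # First pass: minimum of the valid (non-None) values per country.
    country_min = {}
    for country, value in rows:
        if value is not None:
            current = country_min.get(country)
            if current is None or value < current:
                country_min[country] = value
    # Second pass: keep valid values, fill gaps with the country minimum or 0.
    return [
        (country, value if value is not None else country_min.get(country, 0))
        for country, value in rows
    ]
```

For example, `impute_with_min([("A", None), ("A", 7), ("A", 3), ("Kuwait", None)])` yields `[("A", 3), ("A", 7), ("A", 3), ("Kuwait", 0)]`.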

import pandas as pd

# read the imputed csv file
df = pd.read_csv('country_vaccination_stats_imputed.csv')

# group the dataframe by country
grouped = df.groupby('country')

# calculate the median daily vaccinations per country
median_daily_vaccinations = grouped['daily_vaccinations'].median()

# get the top-3 countries with highest median daily vaccinations
top_countries = median_daily_vaccinations.nlargest(3)

print(top_countries)

https://pastebin.pl/view/15ed4355
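For Question 6, here is a minimal sketch of how the number could be computed from the imputed file written by the Question 4 script. It sums daily_vaccinations over all countries for 6 January 2021.

The sketch makes two assumptions. First, the CSV has a `date` column in month/day/year form, as the question's MM/DD/YYYY hint suggests. Second, every value has already been imputed. The helper `total_on` is hypothetical. Parsing the date, rather than comparing strings, lets "1/6/2021" and "01/06/2021" both match.

```python
import csv
from datetime import date, datetime


def total_on(lines, day, date_format="%m/%d/%Y"):
    """Hypothetical helper: sum daily_vaccinations over all countries for
    one calendar day. `lines` is any iterable of CSV text lines whose header
    row has (assumed) 'date' and 'daily_vaccinations' columns, such as an
    open handle on country_vaccination_stats_imputed.csv."""
    total = 0.0
    for row in csv.DictReader(lines):
        # strptime's %m and %d accept both "1" and "01"
        if datetime.strptime(row["date"], date_format).date() == day:
            # pandas writes the imputed column as floats, e.g. "150.0"
            total += float(row["daily_vaccinations"])
    return total
```

For example: `with open('country_vaccination_stats_imputed.csv', newline='') as f: print(total_on(f, date(2021, 1, 6)))`.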
