LAB 3

Uploaded by

Farhan Khurshid
Copyright
© All Rights Reserved
Lab Assignment 3

Due Date: 26/10/2024 at 11:00 PM


Maximum Points: 500
Section: BSCS 5 (M,E), BSIT 3
Deadline is Deadline!!!
Note: Plagiarism (copy-paste, group copying, sharing) will lead to zero marks in all
assignments. No submissions will be accepted by email under any circumstances.
Submissions will be accepted only through Google Classroom.
The file name must be your AG number: 2020_AG_XXXX_Assignment3.ipynb. The file
extension must not be changed.
Q1: Apply data preprocessing techniques to the given Covid-19 dataset. Follow these
steps:

 Load dataset
 Print dataset
 Find shape of dataset
 Print information about the given data (info())
 Data Cleaning
o Print the column names
 Find null values and replace them with representative values in columns that
contain more than 200000 samples
o Find the columns containing null values with isna()
o Apply isna().sum()
o Apply isna().sum().sum()
o Replace all null values with the mean or median in numerical
columns (only in columns that contain more than 200000 samples)
o Drop the columns that contain null values as well as categorical values in
the dataset
 The final shape of the dataset must be (401236, 33), and info() must show the
following details:

Data columns (total 33 columns):

 #   Column                                      Non-Null Count   Dtype
---  ------                                      --------------   -----
 0   iso_code                                    401236 non-null  object
 1   continent                                   401236 non-null  object
 2   location                                    401236 non-null  object
 3   date                                        401236 non-null  object
 4   total_cases                                 401236 non-null  float64
 5   new_cases                                   401236 non-null  float64
 6   new_cases_smoothed                          401236 non-null  float64
 7   total_deaths                                401236 non-null  float64
 8   new_deaths                                  401236 non-null  float64
 9   new_deaths_smoothed                         401236 non-null  float64
10   total_cases_per_million                     401236 non-null  float64
11   new_cases_per_million                       401236 non-null  float64
12   new_cases_smoothed_per_million              401236 non-null  float64
13   total_deaths_per_million                    401236 non-null  float64
14   new_deaths_per_million                      401236 non-null  float64
15   new_deaths_smoothed_per_million             401236 non-null  float64
16   new_vaccinations_smoothed_per_million       401236 non-null  float64
17   new_people_vaccinated_smoothed              401236 non-null  float64
18   new_people_vaccinated_smoothed_per_hundred  401236 non-null  float64
19   stringency_index                            401236 non-null  float64
20   population_density                          401236 non-null  float64
21   median_age                                  401236 non-null  float64
22   aged_65_older                               401236 non-null  float64
23   aged_70_older                               401236 non-null  float64
24   gdp_per_capita                              401236 non-null  float64
25   extreme_poverty                             401236 non-null  float64
26   cardiovasc_death_rate                       401236 non-null  float64
27   diabetes_prevalence                         401236 non-null  float64
28   handwashing_facilities                      401236 non-null  float64
29   hospital_beds_per_thousand                  401236 non-null  float64
30   life_expectancy                             401236 non-null  float64
31   human_development_index                     401236 non-null  float64
32   population                                  401236 non-null  int64
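
The inspection and cleaning steps above could be sketched as follows. This is only a minimal illustration on a tiny hypothetical DataFrame, not the expected solution: the column values, the `min_samples` threshold (4 here, standing in for 200000), and the reading of "samples" as the non-null count of a column are all assumptions. The real data would be loaded with pd.read_csv(), but the assignment does not give a file path.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Covid-19 data (the real file path is not given).
df = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C", "C"],
    "tests_units": ["tests", None, None, "samples", None, None],
    "total_cases": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "icu_patients": [np.nan, np.nan, np.nan, 3.0, np.nan, np.nan],
})

print(df.shape)                # (rows, columns)
df.info()                      # dtypes and non-null counts
print(list(df.columns))        # column names
print(df.isna().sum())         # null count per column
print(df.isna().sum().sum())   # total null count in the whole frame

# Assumption: "samples greater than 200000" refers to the non-null count of
# a column. The toy frame uses 4 in place of 200000.
min_samples = 4

# Fill nulls with the median, but only in well-populated numeric columns.
for col in df.select_dtypes(include="number").columns:
    if df[col].count() > min_samples:
        df[col] = df[col].fillna(df[col].median())

# Assumption: every column that still has nulls is dropped (sparse numeric
# columns and categorical columns with missing values).
df = df.dropna(axis=1)

print(df.shape)
```

Using the median rather than the mean is a design choice in this sketch; the assignment allows either, and the median is less sensitive to extreme values.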

 Then apply df["continent"].value_counts(); you will get the following counts:


 Africa 95419
 Europe 91031
 Asia 82525
 North America 68638
 Oceania 40183
 South America 23440
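
The six counts listed above add up to 401236, the required number of rows, so summing the result of value_counts() is a quick consistency check. A minimal sketch on a hypothetical miniature frame (the rows are invented for illustration):

```python
import pandas as pd

# Hypothetical miniature frame; only the "continent" column matters here.
df = pd.DataFrame(
    {"continent": ["Africa", "Europe", "Africa", "Asia", "Africa", "Europe"]}
)

# value_counts() returns one count per distinct value, largest first.
counts = df["continent"].value_counts()
print(counts)

# With no null continents left, the counts add up to the number of rows.
print(counts.sum() == len(df))

# The same check applied to the counts listed in the assignment.
expected = [95419, 91031, 82525, 68638, 40183, 23440]
print(sum(expected))
```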

 Now filter and print the data on the basis of the above continents, i.e. Africa, Europe,
Asia, North America, Oceania, and South America, separately.

 Print the info of each continent's data separately, on the basis of the continent name.

 Example: for Africa

 0   iso_code     95419 non-null  object
 1   continent    95419 non-null  object
 2   location     95419 non-null  object
 3   date         95419 non-null  object
 4   total_cases  95419 non-null  float64
 5   new_cases    95419 non-null  float64
and so on.
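
One possible way to filter by continent and print the info of each subset is boolean-mask indexing in a loop. The sketch below uses a hypothetical four-row frame; the locations and values are invented for illustration, and the `subsets` dictionary is just one convenient way to keep the results.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the cleaned dataset.
df = pd.DataFrame({
    "continent": ["Africa", "Europe", "Africa", "Asia"],
    "location": ["Kenya", "France", "Ghana", "Japan"],
    "total_cases": [10.0, 20.0, 30.0, 40.0],
})

# Build one sub-frame per continent name with a boolean mask.
subsets = {}
for name in df["continent"].unique():
    subsets[name] = df[df["continent"] == name]

# Print the rows and the info() summary of each continent separately.
for name, subset in subsets.items():
    print(f"--- {name} ---")
    print(subset)
    subset.info()
```

df.groupby("continent") would give the same partition; the explicit mask is shown because it mirrors the "filter on the basis of continent" wording of the task.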

☺ HAPPY LEARNING ☺