Exp 3 Data Wrangling SDK Ok

The document outlines a Python experiment focusing on data wrangling using the Titanic dataset. It includes examples of importing libraries, handling missing values, replacing NaN values, and changing data types. The document demonstrates various data manipulation techniques such as dropping missing values, calculating means, and renaming columns.

Uploaded by

gmranuj

Experiment No:03

#Name : ____________________
#Class :SE Branch : E&TC
#Roll no :_________ Subject: Python for Data Analytics
# Data Wrangling
#Example no.01
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
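kashti=sns.load_dataset('titanic') # load the Titanic dataset that ships with seaborn
ks1=kashti # note: ks1 is a second name for the same DataFrame, not a copy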
#ks2=kashti
#ks=sns.load_dataset('titanic')
kashti.head()
Output:-
   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third
1         1       1  female  38.0      1      0  71.2833        C  First
2         1       3  female  26.0      0      0   7.9250        S  Third
3         1       1  female  35.0      1      0  53.1000        S  First
4         0       3    male  35.0      0      0   8.0500        S  Third

     who  adult_male deck  embark_town alive  alone
0    man        True  NaN  Southampton    no  False
1  woman       False    C    Cherbourg   yes  False
2  woman       False  NaN  Southampton   yes   True
3  woman       False    C  Southampton   yes  False
4    man        True  NaN  Southampton    no   True
#Example no.2
(kashti['age']+1).head(10)
Output:-
0 23.0
1 39.0
2 27.0
3 36.0
4 36.0
5 NaN
6 55.0
7 3.0
8 28.0
9 15.0
Name: age, dtype: float64
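Row 5 in the output above stays NaN because arithmetic on a column is applied element by element and a missing value remains missing. A minimal sketch of that behaviour, using a small hand-made Series (hypothetical values, not the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical ages; the middle one is missing
ages = pd.Series([22.0, np.nan, 54.0], name="age")

older = ages + 1   # adds 1 to every element; NaN + 1 is still NaN

print(older)
print(ages)        # the original Series is not modified by the expression
```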
#Example no.3
# dealing with missing values
# where exactly are the missing values?
kashti.isnull().sum()
Output:-
survived 0
pclass 0
sex 0
age 177
sibsp 0
parch 0
fare 0
embarked 2
class 0
who 0
adult_male 0
deck 688
embark_town 2
alive 0
alone 0
dtype: int64
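Raw counts are easier to judge next to the share of rows they represent. The sketch below is one possible way to get both, assuming a small hand-made DataFrame (the values are made up for illustration): isnull() gives a True/False table, sum() counts the True cells and mean() gives their fraction.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few Titanic columns
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": ["C", None, None, None],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

missing_count = toy.isnull().sum()          # number of missing cells per column
missing_share = toy.isnull().mean() * 100   # percentage of missing cells per column

print(missing_count)
print(missing_share)
```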
#Example no.4
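# use the dropna method
kashti.dropna(subset=['deck'],axis=0,inplace=True) # drops only the rows where 'deck' is missing
# inplace=True modifies the DataFrame itself instead of returning a new one
print(kashti.shape) # shape after the rows without a deck value are removed
kashti.isnull().sum() # count the null values that remain
# dropna() with no arguments returns a copy without any row that still has a missing value
kashti.dropna().isnull().sum()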
Output:-
(203, 15)
survived 0
pclass 0
sex 0
age 0
sibsp 0
parch 0
fare 0
embarked 0
class 0
who 0
adult_male 0
deck 0
embark_town 0
alive 0
alone 0
dtype: int64
#Example no.05

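# dropna() on its own only returns a cleaned copy; kashti itself is unchanged
kashti.dropna()
# assign the result back to update the main DataFrame
# (kashti now refers to a new DataFrame; ks1 still refers to the old one)
kashti=kashti.dropna()
kashti.dropna().isnull().sum()
kashti.shape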
Output:-(182, 15)
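The dropna variants used in Examples 4 and 5 can be compared side by side on a small frame. This is a sketch with made-up values, not the Titanic data; it only illustrates which rows or columns each call removes and that the original is untouched when inplace is not used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names borrowed from the Titanic examples
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "deck": ["C", "E", None, "B"],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

rows_with_deck = toy.dropna(subset=["deck"])  # drop rows only where 'deck' is missing
complete_rows = toy.dropna()                  # drop rows with a missing value in any column
complete_cols = toy.dropna(axis=1)            # drop columns that contain any missing value

print(toy.shape)                    # unchanged, because inplace was not used
print(rows_with_deck.shape)
print(complete_rows.shape)
print(list(complete_cols.columns))
```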

#Example no.6
ks1.isnull().sum()
Output:-
survived 0
pclass 0
sex 0
age 19
sibsp 0
parch 0
fare 0
embarked 2
class 0
who 0
adult_male 0
deck 0
embark_town 2
alive 0
alone 0
dtype: int64
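ks1 shows no missing deck values here even though dropna was only ever called on kashti. The reason is that ks1=kashti gives the same DataFrame a second name, so the inplace drop in Example 4 is visible through both names, while the later kashti=kashti.dropna() only rebinds kashti. A minimal sketch of the difference between an alias and a copy, using made-up data and hypothetical variable names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
original = pd.DataFrame({
    "age": [22.0, np.nan, np.nan],
    "deck": ["C", None, "E"],
})

alias = original            # second name for the same object
snapshot = original.copy()  # independent copy of the data

# An inplace operation changes the shared object, so the alias sees it too
original.dropna(subset=["deck"], inplace=True)
print(alias is original, len(alias), len(snapshot))

# Assigning the result rebinds only the name 'original'; the alias keeps the old object
original = original.dropna()
print(alias is original, len(original), len(alias), len(snapshot))
```

If an untouched version of the data is needed later, taking a .copy() before any inplace step is the usual way to keep one.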
#Example no.7
# replacing missing values with the average of that column
# finding an average (mean)
mean=ks1['age'].mean()
mean
Output:- 35.77945652173913

#Example no.8
# replace NaN with the column mean and write the result back to the column
ks1['age']=ks1['age'].replace(np.nan,mean)
ks1['age']
ks1.isnull().sum()
Output:-
survived 0
pclass 0
sex 0
age 0
sibsp 0
parch 0
fare 0
embarked 2
class 0
who 0
adult_male 0
deck 0
embark_town 2
alive 0
alone 0
dtype: int64
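The same mean imputation can also be written with fillna, which is the method pandas provides specifically for missing values. The sketch below uses a small hand-made column (hypothetical values) and checks that both spellings give the same result; mean() skips NaN by default, so the average is taken over the known ages only.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries
raw = pd.Series([20.0, np.nan, 40.0, np.nan], name="age")

mean_age = raw.mean()                     # NaN values are ignored when averaging

via_fillna = raw.fillna(mean_age)         # fill every missing entry with the mean
via_replace = raw.replace(np.nan, mean_age)  # the spelling used in Example 8

print(mean_age)
print(via_fillna.tolist())
print(via_fillna.equals(via_replace))
```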
kashti.dtypes
Output:-
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object

ks1.dtypes
Output:-
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object
# use astype to convert a column from one data type to another
kashti['survived']=kashti['survived'].astype('int64')
kashti.dtypes
Output:-
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object
kashti['survived']=kashti['survived'].astype('float64')
kashti.dtypes
Output:-
survived float64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object
kashti['survived']=kashti['survived'].astype('int64')
kashti.dtypes
Output:-
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object
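The conversions above work because 'survived' has no missing values. A column that still contains NaN cannot be converted to the plain int64 type; pandas offers the nullable 'Int64' type (capital I) for that case. The following is a sketch on made-up data, and the flag name conversion_failed is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'age' still has a missing value
toy = pd.DataFrame({"survived": [0, 1, 1], "age": [22.0, np.nan, 26.0]})

# Round trip int64 -> float64 -> int64 on a column without NaN
toy["survived"] = toy["survived"].astype("float64")
dtype_as_float = str(toy["survived"].dtype)
toy["survived"] = toy["survived"].astype("int64")
dtype_as_int = str(toy["survived"].dtype)

# Plain int64 cannot hold NaN, so this conversion raises an error
try:
    toy["age"].astype("int64")
    conversion_failed = False
except (ValueError, TypeError):
    conversion_failed = True

# The nullable integer type keeps the gap as <NA>
nullable_age = toy["age"].astype("Int64")

print(dtype_as_float, dtype_as_int, conversion_failed)
print(nullable_age)
```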
kashti['age']=ks1['age']*365 # age in days, stored in kashti only (rows are matched on the index)
ks1.head(10)
Output:-
    survived  pclass     sex        age  sibsp  parch      fare embarked  \
1          1       1  female  38.000000      1      0   71.2833        C
3          1       1  female  35.000000      1      0   53.1000        S
6          0       1    male  54.000000      0      0   51.8625        S
10         1       3  female   4.000000      1      1   16.7000        S
11         1       1  female  58.000000      0      0   26.5500        S
21         1       2    male  34.000000      0      0   13.0000        S
23         1       1    male  28.000000      0      0   35.5000        S
27         0       1    male  19.000000      3      2  263.0000        S
31         1       1  female  35.779457      1      0  146.5208        C
52         1       1  female  49.000000      1      0   76.7292        C

     class    who  adult_male deck  embark_town alive  alone
1    First  woman       False    C    Cherbourg   yes  False
3    First  woman       False    C  Southampton   yes  False
6    First    man        True    E  Southampton    no   True
10   Third  child       False    G  Southampton   yes  False
11   First  woman       False    C  Southampton   yes   True
21  Second    man        True    D  Southampton   yes   True
23   First    man        True    A  Southampton   yes   True
27   First    man        True    C  Southampton    no  False
31   First  woman       False    B    Cherbourg   yes  False
52   First  woman       False    D    Cherbourg   yes  False
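# rename a column after transforming it, so that the name reflects the new unit
# note: this renames the column in ks1, whose ages are still in years;
# kashti, which holds the ages in days, keeps the column name 'age'
ks1.rename(columns={"age":"age in days"},inplace=True)
ks1.head()
kashti.head()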
Output:-
    survived  pclass     sex      age  sibsp  parch     fare embarked  class  \
1          1       1  female  13870.0      1      0  71.2833        C  First
3          1       1  female  12775.0      1      0  53.1000        S  First
6          0       1    male  19710.0      0      0  51.8625        S  First
10         1       3  female   1460.0      1      1  16.7000        S  Third
11         1       1  female  21170.0      0      0  26.5500        S  First

      who  adult_male deck  embark_town alive  alone
1   woman       False    C    Cherbourg   yes  False
3   woman       False    C  Southampton   yes  False
6     man        True    E  Southampton    no   True
10  child       False    G  Southampton   yes  False
11  woman       False    C  Southampton   yes   True
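Converting a unit and renaming the column are easiest to follow when both steps are applied to the same DataFrame. A minimal sketch under that assumption, with three ages taken from the output above and a hypothetical column name age_in_days:

```python
import pandas as pd

# Three ages in years, taken from the first rows shown above
toy = pd.DataFrame({"age": [38.0, 35.0, 4.0]})

toy["age"] = toy["age"] * 365                      # years -> days
toy = toy.rename(columns={"age": "age_in_days"})   # name now matches the unit

print(toy)
```

Using rename without inplace and assigning the result back keeps it clear which DataFrame carries the new column name.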
