Prac3 23bme053

The document details the use of Pandas for reading and manipulating data from a CSV file containing passenger information. It includes various operations such as removing null values, inserting constants, and filling missing data based on statistical measures. The document also highlights the structure of the DataFrame and provides insights into the data's characteristics.

Uploaded by

Mazin Vora


2/21/24, 11:35 PM PRAC3_23BME053

23BME053 MAZIN VORA P3 PRACTICAL - 3

In [ ]: USING PANDAS FOR READING DATA FROM CSV FILE

In [3]: import pandas as pd

df = pd.read_csv(r'C:\Users\mazin\Downloads\train.csv')
print(df.head(7))

PassengerId Survived Pclass \

0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
5 6 0 3
6 7 0 1

Name Sex Age SibSp \

0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
5 Moran, Mr. James male NaN 0
6 McCarthy, Mr. Timothy J male 54.0 0

Parch Ticket Fare Cabin Embarked

0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
5 0 330877 8.4583 NaN Q
6 0 17463 51.8625 E46 S
C:\Users\mazin\AppData\Local\Temp\ipykernel_22848\1674290458.py:1: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

  import pandas as pd
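The cell above depends on a local copy of train.csv. As a minimal sketch of the same read step that runs without that file, the example below parses a small in-memory CSV. The text in `csv_text` is made up for illustration and is not taken from the dataset. It shows that `pd.read_csv` turns empty fields into missing values by default.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for train.csv (illustrative values only).
csv_text = "PassengerId,Age,Cabin\n1,22,\n2,,C85\n"

# read_csv accepts any file-like object; empty fields are parsed as NaN.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy)
print(toy.isna())
```

Because one Age field is empty, the Age column is read as float64 rather than int64, which matches the Age dtype reported by df.info() in this practical.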

In [4]: df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
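The Non-Null Count column above shows that Age, Cabin and Embarked are the columns with gaps. A direct way to get the number of missing values per column is sketched below, assuming a small hypothetical frame named `toy` in place of the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv (illustrative values only).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# isna() gives a boolean frame; summing counts the True values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the real DataFrame the equivalent call would be df.isna().sum().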

In [5]: print(df.shape)

(891, 12)

In [6]: print(df.describe())

PassengerId Survived Pclass Age SibSp \

count 891.000000 891.000000 891.000000 714.000000 891.000000
mean 446.000000 0.383838 2.308642 29.699118 0.523008
std 257.353842 0.486592 0.836071 14.526497 1.102743
min 1.000000 0.000000 1.000000 0.420000 0.000000
25% 223.500000 0.000000 2.000000 20.125000 0.000000
50% 446.000000 0.000000 3.000000 28.000000 0.000000
75% 668.500000 1.000000 3.000000 38.000000 1.000000
max 891.000000 1.000000 3.000000 80.000000 8.000000

Parch Fare
count 891.000000 891.000000
mean 0.381594 32.204208
std 0.806057 49.693429
min 0.000000 0.000000
25% 0.000000 7.910400
50% 0.000000 14.454200
75% 0.000000 31.000000
max 6.000000 512.329200

In [ ]: MAKING COPIES OF INITIAL DATAFRAME FOR APPLYING DIFFERENT TYPES OF METHOD FOR FIL

In [7]: df1 = df.copy()

df2 = df.copy()
df3 = df.copy()
df4 = df.copy()
df5 = df.copy()

In [ ]: REMOVING ROWS HAVING NULL VALUES

In [8]: df1 = df1.dropna(axis=0)

print(df1.shape)
print(df.shape)


(183, 12)
(891, 12)
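Dropping every row that has any null leaves only 183 of 891 rows, largely because Cabin has just 204 non-null entries. A less aggressive variant is sketched below on a hypothetical frame `toy`: the `subset` and `thresh` parameters of `dropna` limit which gaps cause a row to be removed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (illustrative values only).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", None],
})

# Drop a row only when Age is missing; gaps in other columns are ignored.
age_known = toy.dropna(subset=["Age"])

# Keep a row only when it has at least 2 non-null values.
mostly_filled = toy.dropna(thresh=2)

print(age_known)
print(mostly_filled)
```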

In [ ]: REMOVING COLUMNS HAVING NULL VALUES

In [9]: df2 = df2.dropna(axis=1)

print(df2.shape)
print(df.shape)

(891, 9)
(891, 12)
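The shape change from 12 to 9 columns does not say which columns were removed. A short sketch for listing them before dropping, again on a hypothetical frame `toy`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (illustrative values only).
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, np.nan, 26.0],
    "Fare": [7.25, 71.28, 7.92],
})

# isna().any() is True for each column that holds at least one missing value.
cols_with_nulls = toy.columns[toy.isna().any()].tolist()

# axis=1 drops those same columns.
reduced = toy.dropna(axis=1)

print(cols_with_nulls)
print(reduced.columns.tolist())
```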

In [10]: df3.loc[1,"Age"]

Out[10]: 38.0

In [ ]: INSERTING A CONSTANT IN

In [11]: df3.loc[df3.loc[:,"Age"].isna(),"Age"] = 21
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

In [12]: df3.loc[df3["Embarked"].isna(),"Embarked"]='S'
df3.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 891 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

In pandas, .isna() is a method of DataFrame and Series that tests each element for a missing value (NaN, None or NaT). It returns a boolean DataFrame or Series of the same shape, where True marks a missing element and False marks a present one.
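A minimal sketch of that boolean mask and of how it selects rows inside .loc, using a hypothetical three-element Series `ages`:

```python
import numpy as np
import pandas as pd

# Hypothetical Series (illustrative values only).
ages = pd.Series([22.0, np.nan, 26.0], name="Age")

# True where the value is missing, False elsewhere.
mask = ages.isna()

# Assign a constant only at the positions where the mask is True.
ages_filled = ages.copy()
ages_filled.loc[mask] = 21

print(mask.tolist())
print(ages_filled.tolist())
```

This is the same pattern as df3.loc[df3["Age"].isna(), "Age"] = 21, reduced to a single column.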

In [13]: df3.loc[df3["Cabin"].isna(),"Cabin"]='C85'
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 891 non-null object
11 Embarked 891 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
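The three constant fills above can also be written as one call. The sketch below assumes a hypothetical frame `toy` and passes `fillna` a dictionary that maps each column to its constant. It is an alternative to the .loc assignments, not the method used in this practical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (illustrative values only).
toy = pd.DataFrame({
    "Age": [22.0, np.nan],
    "Embarked": ["S", None],
    "Cabin": [None, "E46"],
})

# One constant per column; columns not named in the dict are left untouched.
filled = toy.fillna({"Age": 21, "Embarked": "S", "Cabin": "C85"})

print(filled)
```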

In [15]: import numpy as np
import statistics as stats

# Mean of the known ages (Series.mean skips NaN), rounded to a whole number
meanage = np.round(df["Age"].mean())
df4.loc[df4["Age"].isna(), "Age"] = meanage
# Most frequent port among the known Embarked values
modeembarked = stats.mode(df["Embarked"].dropna())
df4.loc[df4["Embarked"].isna(), "Embarked"] = modeembarked
df4.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 891 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
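The same mean and mode fill can be done with pandas methods alone. This is a sketch on a hypothetical frame `toy`, using `Series.mean` and `Series.mode` in place of the NumPy and statistics calls. Both skip missing values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (illustrative values only).
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 30.0, np.nan],
    "Embarked": ["S", None, "S", "C"],
})

# Mean of the known ages, rounded to a whole number.
mean_age = float(round(toy["Age"].mean()))

# mode() returns a Series of the most frequent values; take the first one.
mode_embarked = toy["Embarked"].mode()[0]

filled = toy.fillna({"Age": mean_age, "Embarked": mode_embarked})
print(filled)
```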

In [21]: meanage0 = df.loc[df["Survived"] == 0, "Age"].mean()

meanage1 = df.loc[df["Survived"] == 1, "Age"].mean()
df5.loc[df5["Age"].isna() & (df5["Survived"] == 0), "Age"] = meanage0
df5.loc[df5["Age"].isna() & (df5["Survived"] == 1), "Age"] = meanage1
df5.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
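The cell above fills Age with a separate mean for each value of Survived, and leaves Embarked at 889 non-null because only Age is touched. The same idea extends to any number of groups with groupby and transform. The sketch below uses a hypothetical frame `toy` and is an alternative formulation, not the code used in this practical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (illustrative values only).
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 1],
    "Age": [20.0, 40.0, np.nan, 10.0, np.nan, 30.0],
})

# transform("mean") returns a Series aligned with toy, holding each row's group mean.
group_mean = toy.groupby("Survived")["Age"].transform("mean")

# fillna with an aligned Series fills each gap from the matching position.
toy["Age"] = toy["Age"].fillna(group_mean)

print(toy)
```

With this form there is no need for one line per group, which helps when the grouping column has many distinct values, such as Pclass combined with Sex.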

In [ ]:
