0% found this document useful (0 votes)
0 views

Python pandas library

The document provides a tutorial on using the Pandas library in Python, specifically focusing on reading and manipulating a CSV file containing Titanic passenger data. It demonstrates how to read the data, display the first and last few rows, summarize statistics, and modify the DataFrame by adding and removing columns. Key operations include reading CSV files, displaying data, and performing basic data analysis with Pandas.

Uploaded by

Nikhil Dayma
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
0 views

Python pandas library

The document provides a tutorial on using the Pandas library in Python, specifically focusing on reading and manipulating a CSV file containing Titanic passenger data. It demonstrates how to read the data, display the first and last few rows, summarize statistics, and modify the DataFrame by adding and removing columns. Key operations include reading CSV files, displaying data, and performing basic data analysis with Pandas.

Uploaded by

Nikhil Dayma
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

Exercise No. 12 : To study Pandas library.

In [1]: #Import the library

import numpy as np
import pandas as pd

1. Read csv
In [2]: # Reading data from a CSV file

data = pd.read_csv('F:\\MP\\ML\\ML Program Notebooks\\titanic_train.csv')


data.drop('Name', axis = 1, inplace = True)

In [ ]:

2. Display first and last five elements of


data
In [3]: # display first five elements of data
data.head()

Out[3]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [4]: # display first ten elements of data


data.head(10)
Out[4]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

5 6 0 3 male NaN 0 0 330877 8.4583 NaN Q

6 7 0 1 male 54.0 0 0 17463 51.8625 E46 S

7 8 0 3 male 2.0 3 1 349909 21.0750 NaN S

8 9 1 3 female 27.0 0 2 347742 11.1333 NaN S

9 10 1 2 female 14.0 1 0 237736 30.0708 NaN C

In [5]: # display last five elements of data


data.tail()

Out[5]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

886 887 0 2 male 27.0 0 0 211536 13.00 NaN S

887 888 1 1 female 19.0 0 0 112053 30.00 B42 S

W./C.
888 889 0 3 female NaN 1 2 23.45 NaN S
6607

889 890 1 1 male 26.0 0 0 111369 30.00 C148 C

890 891 0 3 male 32.0 0 0 370376 7.75 NaN Q

In [6]: # display last ten elements of data


data.tail(10)
Out[6]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Emba

881 882 0 3 male 33.0 0 0 349257 7.8958 NaN

882 883 0 3 female 22.0 0 0 7552 10.5167 NaN

C.A./SOTON
883 884 0 2 male 28.0 0 0 10.5000 NaN
34068

SOTON/OQ
884 885 0 3 male 25.0 0 0 7.0500 NaN
392076

885 886 0 3 female 39.0 0 5 382652 29.1250 NaN

886 887 0 2 male 27.0 0 0 211536 13.0000 NaN

887 888 1 1 female 19.0 0 0 112053 30.0000 B42

888 889 0 3 female NaN 1 2 W./C. 6607 23.4500 NaN

889 890 1 1 male 26.0 0 0 111369 30.0000 C148

890 891 0 3 male 32.0 0 0 370376 7.7500 NaN

In [7]: data

Out[7]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embark

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN

... ... ... ... ... ... ... ... ... ... ...

886 887 0 2 male 27.0 0 0 211536 13.0000 NaN

887 888 1 1 female 19.0 0 0 112053 30.0000 B42

W./C.
888 889 0 3 female NaN 1 2 23.4500 NaN
6607

889 890 1 1 male 26.0 0 0 111369 30.0000 C148

890 891 0 3 male 32.0 0 0 370376 7.7500 NaN

891 rows × 11 columns

In [ ]:

3. Describe a summary of data statistics


In [8]: data.describe()
Out[8]: PassengerId Survived Pclass Age SibSp Parch Fare

count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000

mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208

std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429

min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000

25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400

50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200

75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000

max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200

In [9]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Sex 891 non-null object
4 Age 714 non-null float64
5 SibSp 891 non-null int64
6 Parch 891 non-null int64
7 Ticket 891 non-null object
8 Fare 891 non-null float64
9 Cabin 204 non-null object
10 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(4)
memory usage: 76.7+ KB

In [ ]:

4. Insert a new column to data


In [10]: data['new_column'] = 1

data.head()

Out[10]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S


In [ ]:

5. Delete a column from a dataframe


In [11]: data.drop('new_column', axis = 1).head()

Out[11]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [12]: data

Out[12]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embark

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN

... ... ... ... ... ... ... ... ... ... ...

886 887 0 2 male 27.0 0 0 211536 13.0000 NaN

887 888 1 1 female 19.0 0 0 112053 30.0000 B42

W./C.
888 889 0 3 female NaN 1 2 23.4500 NaN
6607

889 890 1 1 male 26.0 0 0 111369 30.0000 C148

890 891 0 3 male 32.0 0 0 370376 7.7500 NaN

891 rows × 12 columns

In [13]: data.drop('new_column', axis = 1, inplace = True)

In [14]: data.head()
Out[14]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [ ]:

6. Set column as index


In [15]: # set column 'PassengerId' as index

data.set_index('PassengerId')

Out[15]: Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

PassengerId

A/5
1 0 3 male 22.0 1 0 7.2500 NaN S
21171

2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

4 1 1 female 35.0 1 0 113803 53.1000 C123 S

5 0 3 male 35.0 0 0 373450 8.0500 NaN S

... ... ... ... ... ... ... ... ... ... ...

887 0 2 male 27.0 0 0 211536 13.0000 NaN S

888 1 1 female 19.0 0 0 112053 30.0000 B42 S

W./C.
889 0 3 female NaN 1 2 23.4500 NaN S
6607

890 1 1 male 26.0 0 0 111369 30.0000 C148 C

891 0 3 male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 10 columns

In [16]: data.head()
Out[16]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [ ]:

7. Sort values along an axis (ascending


order)
In [17]: data.sort_values(by='Pclass')

Out[17]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embark

445 446 1 1 male 4.0 0 2 33638 81.8583 A34

310 311 1 1 female 24.0 0 0 11767 83.1583 C54

PC
309 310 1 1 female 30.0 0 0 56.9292 E36
17485

PC
307 308 1 1 female 17.0 1 0 108.9000 C65
17758

306 307 1 1 female NaN 0 0 17421 110.8833 NaN

... ... ... ... ... ... ... ... ... ... ...

379 380 0 3 male 19.0 0 0 347069 7.7750 NaN

381 382 1 3 female 1.0 0 2 2653 15.7417 NaN

STON/O
382 383 0 3 male 32.0 0 0 2. 7.9250 NaN
3101293

371 372 0 3 male 18.0 1 0 3101267 6.4958 NaN

890 891 0 3 male 32.0 0 0 370376 7.7500 NaN

891 rows × 11 columns

In [18]: # sort values in a descending order

data.sort_values(by='Pclass', ascending = False)


Out[18]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Emba

0 1 0 3 male 22.0 1 0 A/5 21171 7.2500 NaN

SOTON/OQ
511 512 0 3 male NaN 0 0 8.0500 NaN
3101316

500 501 0 3 male 17.0 0 0 315086 8.6625 NaN

501 502 0 3 female 21.0 0 0 364846 7.7500 NaN

502 503 0 3 female NaN 0 0 330909 7.6292 NaN

... ... ... ... ... ... ... ... ... ... ...

102 103 0 1 male 21.0 0 1 35281 77.2875 D26

710 711 1 1 female 24.0 0 0 PC 17482 49.5042 C90

711 712 0 1 male NaN 0 0 113028 26.5500 C124

712 713 1 1 male 48.0 1 0 19996 52.0000 C126

445 446 1 1 male 4.0 0 2 33638 81.8583 A34

891 rows × 11 columns

In [19]: data.head()

Out[19]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

A/5
0 1 0 3 male 22.0 1 0 7.2500 NaN S
21171

1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C

STON/O2.
2 3 1 3 female 26.0 0 0 7.9250 NaN S
3101282

3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S

4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [ ]:

8. Description of Rows and Columns


In [20]: # Retrieve rows and columns count

data.shape

(891, 11)
Out[20]:

In [21]: # Display list of column names

data.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch',


Out[21]:
'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
In [22]: # Display list of row names (indexes)

data.index

RangeIndex(start=0, stop=891, step=1)


Out[22]:

In [ ]:

9. Retrieve number of non-NA and missing


values
In [23]: # display the count of non-NA values in each column

data.count()

PassengerId 891
Out[23]:
Survived 891
Pclass 891
Sex 891
Age 714
SibSp 891
Parch 891
Ticket 891
Fare 891
Cabin 204
Embarked 889
dtype: int64

In [24]: # Display missing value status for each data point

data.isna()

Out[24]: PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked

0 False False False False False False False False False True False

1 False False False False False False False False False False False

2 False False False False False False False False False True False

3 False False False False False False False False False False False

4 False False False False False False False False False True False

... ... ... ... ... ... ... ... ... ... ... ...

886 False False False False False False False False False True False

887 False False False False False False False False False False False

888 False False False False True False False False False True False

889 False False False False False False False False False False False

890 False False False False False False False False False True False

891 rows × 11 columns

In [25]: # display the count of missing values in each column


data.isna().sum()

PassengerId 0
Out[25]:
Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64

In [ ]:

10. Display datatype of each column


In [26]: data.dtypes

PassengerId int64
Out[26]:
Survived int64
Pclass int64
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object

In [ ]:

You might also like