13-9-23 Data Pre-Processing - Jupyter Notebook

Uploaded by

Vidisha Arvind

In [1]:

import pandas as pd
import matplotlib.pyplot as plt

In [35]:

# Read the dataset
data = pd.read_csv("Datanew1.csv")
data

Out[35]:

    Country   Age   Salary  Buy
0    France  44.0  72000.0   No
1     Spain  27.0  48000.0  Yes
2   Germany  30.0  54000.0  NaN
3       NaN  38.0  61000.0   No
4   Germany  40.0      NaN  Yes
5    France  35.0  58000.0  Yes
6     Spain   NaN  52000.0   No
7    France  48.0  79000.0  NaN
8   Germany  50.0  83000.0   No
9    France  37.0  67000.0  Yes
10   France  44.0  72000.0   No
11    Spain  27.0  48000.0  Yes
12  Germany  30.0  54000.0  NaN
13      NaN  38.0  61000.0   No
14  Germany  40.0      NaN  Yes
15   France  35.0  58000.0  Yes
16    Spain   NaN  52000.0   No
17   France  48.0  79000.0  NaN
18  Germany  50.0  83000.0   No
19   France  37.0  67000.0  Yes
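
The `NaN` entries above are how pandas represents fields that are empty in the CSV file. Since `Datanew1.csv` itself is not included here, the following is only a minimal sketch: the CSV text is an assumption reconstructed from rows 0 to 6 of the displayed output, and the names `csv_text` and `sample` are hypothetical. It suggests why `Age` and `Salary` show up as floats: a numeric column that contains a blank is read as `float64` so that it can hold `NaN`.

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring rows 0-6 of the output above.
# The real Datanew1.csv is not available, so this content is an assumption.
csv_text = """Country,Age,Salary,Buy
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,
,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
"""

# Empty fields are parsed as NaN by default
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.dtypes)
```

Reading from `io.StringIO` keeps the example independent of any file on disk; with the real file, `pd.read_csv("Datanew1.csv")` is used exactly as in the cell above.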

In [4]:

type(data)

Out[4]:

pandas.core.frame.DataFrame
In [5]:

# Operations on a DataFrame

# head() returns the first 5 rows by default
data.head()

Out[5]:

   Country   Age   Salary  Buy
0   France  44.0  72000.0   No
1    Spain  27.0  48000.0  Yes
2  Germany  30.0  54000.0  NaN
3      NaN  38.0  61000.0   No
4  Germany  40.0      NaN  Yes

In [6]:

# head(n) returns the first n rows
data.head(2)

Out[6]:

  Country   Age   Salary  Buy
0  France  44.0  72000.0   No
1   Spain  27.0  48000.0  Yes

In [7]:

# tail() returns the last 5 rows by default
data.tail()

Out[7]:

    Country   Age   Salary  Buy
15   France  35.0  58000.0  Yes
16    Spain   NaN  52000.0   No
17   France  48.0  79000.0  NaN
18  Germany  50.0  83000.0   No
19   France  37.0  67000.0  Yes


In [8]:

# Check the shape of the dataset as (rows, columns)
data.shape

Out[8]:

(20, 4)

In [36]:

# Display the column names
data.columns

Out[36]:

Index(['Country', 'Age', 'Salary', 'Buy'], dtype='object')

In [37]:

# Rename the columns by assigning a full list of names ('Age' becomes 'AGE')
data.columns = ['Country', 'AGE', 'Salary', 'Buy']
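
Assigning to `data.columns` requires listing every column name, in order. When only one name changes, `DataFrame.rename` with a mapping may be less error-prone. The sketch below uses a small stand-in frame (`df` and `renamed` are hypothetical names, not part of the notebook) rather than the original data.

```python
import pandas as pd

# Small stand-in frame with the same column names as the notebook's data
df = pd.DataFrame(
    {
        "Country": ["France", "Spain"],
        "Age": [44.0, 27.0],
        "Salary": [72000.0, 48000.0],
        "Buy": ["No", "Yes"],
    }
)

# rename() returns a new DataFrame; only the mapped column changes
renamed = df.rename(columns={"Age": "AGE"})

print(df.columns.tolist())       # original frame is left untouched
print(renamed.columns.tolist())
```

Because `rename` returns a copy by default, the result has to be assigned back (for example `data = data.rename(columns={"Age": "AGE"})`) to get the same effect as the assignment above.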


In [38]:

data

Out[38]:

    Country   AGE   Salary  Buy
0    France  44.0  72000.0   No
1     Spain  27.0  48000.0  Yes
2   Germany  30.0  54000.0  NaN
3       NaN  38.0  61000.0   No
4   Germany  40.0      NaN  Yes
5    France  35.0  58000.0  Yes
6     Spain   NaN  52000.0   No
7    France  48.0  79000.0  NaN
8   Germany  50.0  83000.0   No
9    France  37.0  67000.0  Yes
10   France  44.0  72000.0   No
11    Spain  27.0  48000.0  Yes
12  Germany  30.0  54000.0  NaN
13      NaN  38.0  61000.0   No
14  Germany  40.0      NaN  Yes
15   France  35.0  58000.0  Yes
16    Spain   NaN  52000.0   No
17   France  48.0  79000.0  NaN
18  Germany  50.0  83000.0   No
19   France  37.0  67000.0  Yes

In [39]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Country  18 non-null     object
 1   AGE      18 non-null     float64
 2   Salary   18 non-null     float64
 3   Buy      16 non-null     object
dtypes: float64(2), object(2)
memory usage: 768.0+ bytes
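
The non-null counts reported by `info()` can also be obtained as values with `count()`, which is convenient when they are needed in later code. This is a sketch under assumptions: the frame `df` is rebuilt by hand from the rows displayed earlier (the same 10 rows appear twice), because the original CSV file is not available.

```python
import numpy as np
import pandas as pd

# The 10 distinct rows shown in the notebook output (an assumed reconstruction)
rows = [
    ("France", 44.0, 72000.0, "No"),
    ("Spain", 27.0, 48000.0, "Yes"),
    ("Germany", 30.0, 54000.0, np.nan),
    (np.nan, 38.0, 61000.0, "No"),
    ("Germany", 40.0, np.nan, "Yes"),
    ("France", 35.0, 58000.0, "Yes"),
    ("Spain", np.nan, 52000.0, "No"),
    ("France", 48.0, 79000.0, np.nan),
    ("Germany", 50.0, 83000.0, "No"),
    ("France", 37.0, 67000.0, "Yes"),
]

# The displayed table repeats those rows, giving 20 rows in total
df = pd.DataFrame(rows * 2, columns=["Country", "AGE", "Salary", "Buy"])

# count() gives the number of non-missing values per column
non_null = df.count()
print(non_null)

# dtypes gives the per-column types that info() also reports
print(df.dtypes)
```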
In [41]:

data.index

Out[41]:

RangeIndex(start=0, stop=20, step=1)

In [42]:

# T returns the transpose: rows become columns and columns become rows
data.T

Out[42]:

              0      1        2      3        4       5      6       7        8       9
Country  France  Spain  Germany    NaN  Germany  France  Spain  France  Germany  France
AGE          44     27       30     38       40      35    NaN      48       50      37
Salary    72000  48000    54000  61000      NaN   58000  52000   79000    83000   67000
Buy          No    Yes      NaN     No      Yes     Yes     No     NaN       No     Yes

Performing DataFrame operations

Inspection: head(), tail(), shape, columns, index, T, info()

Summary statistics: mean(), median(), mode(), std(), min(), max()
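
The statistical methods listed above are not run in the notebook, so here is a brief sketch of how they might be applied. It assumes a frame `df` rebuilt by hand from the displayed rows (the original CSV file is not available), and restricts the numeric statistics to the `AGE` and `Salary` columns. Missing values are skipped by these methods by default.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the displayed data: 10 distinct rows, repeated twice
rows = [
    ("France", 44.0, 72000.0, "No"),
    ("Spain", 27.0, 48000.0, "Yes"),
    ("Germany", 30.0, 54000.0, np.nan),
    (np.nan, 38.0, 61000.0, "No"),
    ("Germany", 40.0, np.nan, "Yes"),
    ("France", 35.0, 58000.0, "Yes"),
    ("Spain", np.nan, 52000.0, "No"),
    ("France", 48.0, 79000.0, np.nan),
    ("Germany", 50.0, 83000.0, "No"),
    ("France", 37.0, 67000.0, "Yes"),
]
df = pd.DataFrame(rows * 2, columns=["Country", "AGE", "Salary", "Buy"])

# Numeric statistics only make sense for the numeric columns
numeric = df[["AGE", "Salary"]]

print(numeric.mean())    # arithmetic mean, ignoring NaN
print(numeric.median())  # middle value, ignoring NaN
print(numeric.std())     # sample standard deviation (ddof=1)
print(numeric.min())
print(numeric.max())

# mode() also works on text columns and returns the most frequent value(s)
country_mode = df["Country"].mode()
print(country_mode)
```

Note that `mode()` returns a Series rather than a single value, because several values can be tied for most frequent.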

In [44]:

# Count the missing values in each column
data.isnull().sum()

Out[44]:

Country    2
AGE        2
Salary     2
Buy        4
dtype: int64
In [45]:

# isna() is an alias of isnull(), so the counts are identical
data.isna().sum()

Out[45]:

Country    2
AGE        2
Salary     2
Buy        4
dtype: int64
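
The two cells above give the same result because `isnull()` and `isna()` are aliases. The sketch below checks that on a hand-rebuilt copy of the displayed data (an assumption, since the original CSV file is not available) and also derives two related figures: the total number of missing cells and the number of rows containing at least one missing value. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the displayed data: 10 distinct rows, repeated twice
rows = [
    ("France", 44.0, 72000.0, "No"),
    ("Spain", 27.0, 48000.0, "Yes"),
    ("Germany", 30.0, 54000.0, np.nan),
    (np.nan, 38.0, 61000.0, "No"),
    ("Germany", 40.0, np.nan, "Yes"),
    ("France", 35.0, 58000.0, "Yes"),
    ("Spain", np.nan, 52000.0, "No"),
    ("France", 48.0, 79000.0, np.nan),
    ("Germany", 50.0, 83000.0, "No"),
    ("France", 37.0, 67000.0, "Yes"),
]
df = pd.DataFrame(rows * 2, columns=["Country", "AGE", "Salary", "Buy"])

# Both methods produce the same boolean mask of missing cells
same_mask = df.isnull().equals(df.isna())

# Summing the per-column counts gives the total number of missing cells
total_missing = int(df.isna().sum().sum())

# any(axis=1) flags rows that contain at least one missing value
rows_with_missing = int(df.isna().any(axis=1).sum())

print(same_mask, total_missing, rows_with_missing)
```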

In [ ]:
