LP Practical ! Jupyter Notebook

Uploaded by

xifavo8319

In [1]: import os

os.getcwd()

Out[1]: 'C:\\Users\\kunal'

In [2]: import pandas as pd  # pandas is used to work with data in table format

In [5]: # load the dataset from a CSV file


df = pd.read_csv('Heart.csv')

In [6]: df.head()  # df stands for DataFrame; head() shows the first 5 rows of the dataset

Out[6]:
   Unnamed: 0  Age  Sex     ChestPain  RestBP  Chol  Fbs  RestECG  MaxHR  ExAng  Oldpeak  Slope   Ca        Thal  AHD
0           1   63    1       typical     145   233    1        2    150      0      2.3      3  0.0       fixed   No
1           2   67    1  asymptomatic     160   286    0        2    108      1      1.5      2  3.0      normal  Yes
2           3   67    1  asymptomatic     120   229    0        2    129      1      2.6      2  2.0  reversable  Yes
3           4   37    1    nonanginal     130   250    0        0    187      0      3.5      3  0.0      normal   No
4           5   41    0    nontypical     130   204    0        2    172      0      1.4      1  0.0      normal   No

In [7]: # shape gives the number of rows and columns


df.shape

Out[7]: (303, 15)

In [8]: # Finding missing values


df.isnull()

Out[8]:
Unnamed: 0 Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope Ca Thal AHD

0 False False False False False False False False False False False False False False False

1 False False False False False False False False False False False False False False False

2 False False False False False False False False False False False False False False False

3 False False False False False False False False False False False False False False False

4 False False False False False False False False False False False False False False False

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

298 False False False False False False False False False False False False False False False

299 False False False False False False False False False False False False False False False

300 False False False False False False False False False False False False False False False

301 False False False False False False False False False False False False False False False

302 False False False False False False False False False False False False True False False

303 rows × 15 columns


In [9]: # summary view: False counts as 0 and True as 1, so summing each column gives its number of missing values
df.isnull().sum()

Out[9]: Unnamed: 0 0
Age 0
Sex 0
ChestPain 0
RestBP 0
Chol 0
Fbs 0
RestECG 0
MaxHR 0
ExAng 0
Oldpeak 0
Slope 0
Ca 4
Thal 2
AHD 0
dtype: int64

In [10]: # alternative method: count() gives the number of non-null values in each column
df.count()

Out[10]: Unnamed: 0 303
Age 303
Sex 303
ChestPain 303
RestBP 303
Chol 303
Fbs 303
RestECG 303
MaxHR 303
ExAng 303
Oldpeak 303
Slope 303
Ca 299
Thal 301
AHD 303
dtype: int64
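The two counts above are complementary: in every column, the missing values plus the non-null values add up to the number of rows (for example 4 + 299 = 303 for Ca). A minimal sketch of that check on a small made-up frame; the values in `toy` are hypothetical and only the column names are borrowed from the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Heart.csv (values are made up).
toy = pd.DataFrame({
    "Age": [63, 67, 37, 41],
    "Ca": [0.0, 3.0, np.nan, 0.0],
    "Thal": ["fixed", None, "normal", "normal"],
})

missing = toy.isnull().sum()   # True counts as 1, so this is the number of missing cells per column
present = toy.count()          # number of non-null cells per column

# Every cell is either missing or present, so the two add up to the row count.
totals = missing + present
print(missing.to_dict())
print(present.to_dict())
print(bool((totals == len(toy)).all()))
```

If the last line prints True, the two methods agree and either one can be used to locate missing data.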

In [11]: # find the data type of each column; dtypes is an attribute, not a method, so no parentheses
df.dtypes

Out[11]: Unnamed: 0 int64
Age int64
Sex int64
ChestPain object
RestBP int64
Chol int64
Fbs int64
RestECG int64
MaxHR int64
ExAng int64
Oldpeak float64
Slope int64
Ca float64
Thal object
AHD object
dtype: object
In [12]: # find where the zeros are: cells equal to 0 are marked True
df==0

Out[12]:
Unnamed: 0 Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope Ca Thal AHD

0 False False False False False False False False False True False False True False False

1 False False False False False False True False False False False False False False False

2 False False False False False False True False False False False False False False False

3 False False False False False False True True False True False False True False False

4 False False True False False False True False False True False False True False False

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

298 False False False False False False True True False True False False True False False

299 False False False False False False False True False True False False False False False

300 False False False False False False True True False False False False False False False

301 False False True False False False True False False True True False False False False

302 False False False False False False True True False True True False False False False

303 rows × 15 columns

In [13]: # to highlight the zeros: keep them and turn every other cell into NaN


df[df ==0]

Out[13]:
Unnamed: 0 Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope Ca Thal AHD

0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN 0.0 NaN NaN

1 NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN

2 NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN

3 NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN 0.0 NaN NaN 0.0 NaN NaN

4 NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN 0.0 NaN NaN 0.0 NaN NaN

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

298 NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN 0.0 NaN NaN 0.0 NaN NaN

299 NaN NaN NaN NaN NaN NaN NaN 0.0 NaN 0.0 NaN NaN NaN NaN NaN

300 NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN

301 NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN 0.0 0.0 NaN NaN NaN NaN

302 NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN 0.0 0.0 NaN NaN NaN NaN

303 rows × 15 columns

In [14]: # count the number of zeros in each column (count() ignores the NaN cells)


df[df==0].count()

Out[14]: Unnamed: 0 0
Age 0
Sex 97
ChestPain 0
RestBP 0
Chol 0
Fbs 258
RestECG 151
MaxHR 0
ExAng 204
Oldpeak 99
Slope 0
Ca 176
Thal 0
AHD 0
dtype: int64
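Masking with `df[df == 0]` and then calling `count()` works because every non-zero cell becomes NaN and `count()` skips NaN. Summing the boolean mask directly should give the same numbers in one step. A small sketch with hypothetical values (the frame `toy` is made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the dataset.
toy = pd.DataFrame({
    "Sex": [1, 1, 0, 0],
    "Fbs": [1, 0, 0, 0],
    "Oldpeak": [2.3, 0.0, 1.4, 0.0],
})

via_mask = toy[toy == 0].count()   # the approach used in the notebook
via_sum = (toy == 0).sum()         # True counts as 1, so this counts the zeros directly

print(via_mask.tolist())
print(via_sum.tolist())
```

Both approaches count the zeros per column; the second one avoids building the intermediate NaN-filled frame.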
In [15]: # to find the mean age from the Age column, first list all the column names
df.columns

Out[15]: Index(['Unnamed: 0', 'Age', 'Sex', 'ChestPain', 'RestBP', 'Chol', 'Fbs',
'RestECG', 'MaxHR', 'ExAng', 'Oldpeak', 'Slope', 'Ca', 'Thal', 'AHD'],
dtype='object')

In [16]: # select the Age column by its label (label-based selection) and call .mean() on it to get the mean
df['Age'].mean()

Out[16]: 54.43894389438944

In [21]: # extract only the given columns; to select more than one column, use double brackets (a list of labels)
newdf = df[['Age', 'Sex', 'ChestPain', 'Chol']]

In [22]: # the selection above is stored in the variable newdf; display it


newdf

Out[22]:
     Age  Sex     ChestPain  Chol
0     63    1       typical   233
1     67    1  asymptomatic   286
2     67    1  asymptomatic   229
3     37    1    nonanginal   250
4     41    0    nontypical   204
...  ...  ...           ...   ...
298   45    1       typical   264
299   68    1  asymptomatic   193
300   57    1  asymptomatic   131
301   57    0    nontypical   236
302   38    1    nonanginal   175

303 rows × 4 columns
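The difference between single and double brackets is easiest to see from the type that comes back. A minimal sketch, using a made-up frame `toy` with hypothetical values:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the dataset.
toy = pd.DataFrame({"Age": [63, 67, 37], "Sex": [1, 1, 1], "Chol": [233, 286, 250]})

one_column = toy["Age"]         # single brackets with one label -> Series
as_frame = toy[["Age"]]         # double brackets (a list of labels) -> DataFrame with one column
subset = toy[["Age", "Chol"]]   # several labels -> DataFrame with those columns, in the order given

print(type(one_column).__name__, one_column.shape)
print(type(as_frame).__name__, as_frame.shape)
print(list(subset.columns))
```

The inner brackets are just an ordinary Python list, which is why a list of several labels is accepted there.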

In [24]: # hold-out validation (a single train/test split, not k-fold cross-validation): 75% of the data is used for training
# train_test_split is a function in the sklearn.model_selection module
from sklearn.model_selection import train_test_split

In [26]: train, test = train_test_split(df, random_state=0, test_size=0.25)  # random_state can be any integer; it fixes the shuffle so the split is reproducible
# test_size=0.25 is also the default, so the split is 75% and 25% either way

In [27]: train.shape

Out[27]: (227, 15)

In [28]: test.shape

Out[28]: (76, 15)
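The shapes above follow from how the split is sized: 25% of 303 rows is 75.75, which is not a whole number. The 76 and 227 rows seen here are consistent with the test portion being rounded up and the remainder going to training. A sketch that checks this on a placeholder array of the same length; the array is a hypothetical stand-in for the real rows:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 303                      # same number of rows as df
placeholder = np.arange(n_rows)   # hypothetical stand-in for the real rows

train_part, test_part = train_test_split(placeholder, random_state=0, test_size=0.25)

expected_test = math.ceil(n_rows * 0.25)   # 75.75 rounded up
expected_train = n_rows - expected_test    # whatever is left goes to training

print(len(train_part), len(test_part))
print(expected_train, expected_test)
```

Every row lands in exactly one of the two parts, so the sizes always add up to the original 303.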

In [29]: import numpy as np  # numpy is used to create arrays; we make up some labels to test the metrics on

In [30]: actual = list(np.ones(45)) + list(np.zeros(55))  # actual labels: np.ones(45) gives an array of 45 ones (1, 1, 1, ...)
# np.zeros(55) gives zeros for the remaining 55 values
In [31]: np.array(actual)

Out[31]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [32]: predicted=list(np.ones(40)) + list(np.zeros(52)) + list(np.ones(8))


np.array(predicted)

Out[32]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1.])

In [33]: from sklearn.metrics import ConfusionMatrixDisplay

In [36]: ConfusionMatrixDisplay.from_predictions(actual,predicted)

Out[36]: <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1f7fe3394f0>
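Only the object representation of the plot survives in this output, so the four counts behind the matrix can be worked out by hand. A minimal sketch in plain Python, assuming the usual layout with actual labels as rows and predicted labels as columns, both in the order 0, 1:

```python
# Same labels as in the notebook, written as plain Python lists.
actual = [1.0] * 45 + [0.0] * 55
predicted = [1.0] * 40 + [0.0] * 52 + [1.0] * 8

# Count how often each (actual, predicted) pair occurs.
counts = {(a, p): 0 for a in (0.0, 1.0) for p in (0.0, 1.0)}
for a, p in zip(actual, predicted):
    counts[(a, p)] += 1

# Rows are actual labels, columns are predicted labels, both in the order 0, 1.
matrix = [
    [counts[(0.0, 0.0)], counts[(0.0, 1.0)]],
    [counts[(1.0, 0.0)], counts[(1.0, 1.0)]],
]
print(matrix)  # [[47, 8], [5, 40]]
```

The diagonal (47 and 40) holds the correct predictions; the 8 and the 5 are the zeros predicted as one and the ones predicted as zero.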

In [38]: # in the matrix above, 47 zeros and 40 ones are predicted correctly (the diagonal entries)


from sklearn.metrics import classification_report

In [39]: print(classification_report(actual, predicted))

              precision    recall  f1-score   support

         0.0       0.90      0.85      0.88        55
         1.0       0.83      0.89      0.86        45

    accuracy                           0.87       100
   macro avg       0.87      0.87      0.87       100
weighted avg       0.87      0.87      0.87       100

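In [40]: # recall is the accuracy within one class: of the 55 actual zeros (47 + 8), 47 are predicted correctly, hence 47/55 = 0.85
# for class 1: 40/45 = 0.89

# precision for class 0: 47 + 5 = 52 samples are predicted as 0 and 47 of them are correct, hence 47/52 = 0.90

# f1-score is the harmonic mean of precision (0.90) and recall (0.85), which gives 0.88 for class 0

# accuracy can also be computed directly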
from sklearn.metrics import accuracy_score
accuracy_score(actual , predicted)

Out[40]: 0.87
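The numbers in the report can be reproduced with plain arithmetic. A minimal sketch that recounts the four outcomes from the same labels and applies the standard formulas; the helper name `f1` is introduced here only for illustration:

```python
# Same labels as in the notebook, written as plain Python lists.
actual = [1] * 45 + [0] * 55
predicted = [1] * 40 + [0] * 52 + [1] * 8
pairs = list(zip(actual, predicted))

tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # ones predicted as one
fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # ones predicted as zero
tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # zeros predicted as zero
fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # zeros predicted as one


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Class 0: "correct" means an actual zero predicted as zero.
precision_0 = tn / (tn + fn)   # 47 / 52
recall_0 = tn / (tn + fp)      # 47 / 55

# Class 1: "correct" means an actual one predicted as one.
precision_1 = tp / (tp + fp)   # 40 / 48
recall_1 = tp / (tp + fn)      # 40 / 45

accuracy = (tp + tn) / len(pairs)   # 87 / 100

print(round(precision_0, 2), round(recall_0, 2), round(f1(precision_0, recall_0), 2))
print(round(precision_1, 2), round(recall_1, 2), round(f1(precision_1, recall_1), 2))
print(accuracy)
```

The f1-score uses the unrounded precision and recall, which is why class 0 comes out at 0.88 rather than the 0.87 that the rounded 0.90 and 0.85 would give.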
