Data Science Practical 9

The document details a practical exercise using Jupyter Notebook to analyze a diabetes dataset with 768 entries and 9 columns. It imports the dataset, explores its structure, and fits four regressors to predict the 0/1 diabetes outcome: linear regression, K-nearest neighbors, a decision tree, and a random forest. Scored as (1 - MAE) * 100 on a 30% hold-out set, linear regression and the decision tree tie for the best result at approximately 72.73%.


Practical 9
In [1]: import pandas as pd

In [2]: df=pd.read_csv('diabetes.csv')

In [3]: df.head(6)

Out[3]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1
5            5      116             74              0        0  25.6                     0.201   30        0

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

In [5]: df.describe()

Out[5]:
       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin         BMI  DiabetesPedigreeFunction         Age
count   768.000000  768.000000     768.000000     768.000000  768.000000  768.000000                768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479   31.992578                  0.471876   33.240885
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.884160                  0.331329   11.760232
min       0.000000    0.000000       0.000000       0.000000    0.000000    0.000000                  0.078000   21.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000   27.300000                  0.243750   24.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000   32.000000                  0.372500   29.000000
75%       6.000000  140.250000      80.000000      32.000000  127.250000   36.600000                  0.626250   41.000000
max      17.000000  199.000000     122.000000      99.000000  846.000000   67.100000                  2.420000   81.000000
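
A note on the summary above: several predictors have a minimum of 0 (Glucose, BloodPressure, SkinThickness, Insulin, BMI), which is physiologically implausible and is commonly read as missing data in this dataset. A minimal sketch of an optional cleaning step, not applied in this practical (the column list is taken from the zeros visible above):

    import numpy as np

    # Optional: treat implausible zeros as missing values.
    # Running this would change every result that follows.
    zero_as_missing = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
    df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)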

In [6]: df.columns

Out[6]: Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
               'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
              dtype='object')

In [7]: y=df['Outcome']


In [8]: x=df[['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
              'BMI', 'DiabetesPedigreeFunction', 'Age']]

In [9]: df.shape

Out[9]: (768, 9)

In [10]: x.shape,y.shape

Out[10]: ((768, 8), (768,))

In [16]: from sklearn.model_selection import train_test_split

         x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3,random_state=23)

In [17]: x_train.shape, x_test.shape, y_train.shape, y_test.shape

Out[17]: ((537, 8), (231, 8), (537,), (231,))

In [18]: from sklearn.linear_model import LinearRegression

In [19]: model = LinearRegression()

In [20]: model.fit(x_train, y_train)

Out[20]: LinearRegression()

In [21]: y_pred=model.predict(x_test)

In [22]: y_pred

Out[22]: array([ 0.3902873 ,  0.98050369,  0.69860448,  0.38370667,  0.16942134,
         0.54341467,  0.6634484 ,  0.28851541,  0.29446341,  0.99211187,
0.3336832 , 0.24153111, 0.61408084, 1.01913506, 0.16896795,
0.77058446, -0.04324103, 0.00513359, 0.21999668, 0.37967129,
0.15905319, 0.300937 , 0.29683961, 0.38047096, 0.63895148,
0.27249391, 0.07122218, -0.0620718 , 0.20315493, 0.41660028,
-0.00273878, 0.06634673, 0.49635303, 0.19501587, 0.43415791,
0.28588567, 0.20406664, 0.46073746, 0.18716791, 0.2959408 ,
0.37639319, 0.69704002, 0.32831732, 0.31933481, 0.19180343,
0.83446784, 0.2299272 , 0.5873763 , 0.66018584, 0.41992143,
0.48173494, 0.24970499, 0.07600902, 1.04108796, 0.34173773,
0.04193416, 0.41188794, 0.66538149, 0.16732282, 0.15636314,
0.23514829, 0.25434215, 0.12914508, 0.37635347, 0.10745146,
0.66049595, 0.4298873 , 0.37888447, 0.10599681, 0.14341 ,
0.32689914, 0.20404038, 0.44111383, 0.20009057, 0.22284185,
0.91910781, 0.54231851, 0.44719395, 0.41212563, 0.59358623,
0.02489535, 0.11258134, 0.27834427, 0.87620394, 0.25705258,
0.06886981, 0.42871676, 0.27730822, 0.01694301, 0.03091513,
0.70605075, 0.19074226, 0.25812647, 0.58653571, -0.06595021,
         0.67410775,  0.30546431,  0.31538989,  0.39650717,  0.91346971,
         ...])
In [23]: y_test

Out[23]: 93 1
228 0
424 1
635 1
684 0
..
271 0
46 0
476 1
130 1
359 1
Name: Outcome, Length: 231, dtype: int64


In [24]: from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

In [77]: e1 =mean_absolute_error(y_test,y_pred)

In [26]: per_e= mean_absolute_percentage_error(y_test,y_pred)

In [27]: per_e

Out[27]: 773763381247726.2

(The MAPE explodes because many true Outcome values are 0 and percentage error divides by the true value, so it is not a meaningful metric for this 0/1 target.)

In [78]: accuracy =(1-e1)*100

In [79]: accuracy

Out[79]: 72.72727272727273

In [31]: mean_squared_error(y_test,y_pred)

Out[31]: 0.17082809946294492
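
A side note on the score above: Outcome is a 0/1 label, and (1 - MAE) * 100 is a nonstandard way to score a regressor on it. A more conventional check is to threshold the continuous predictions and use classification accuracy; a minimal sketch, assuming a 0.5 cutoff (the cutoff is an assumption, not part of the original notebook):

    from sklearn.metrics import accuracy_score

    # Threshold the regression outputs at 0.5 (assumed cutoff) to get 0/1 labels
    y_pred_labels = (y_pred >= 0.5).astype(int)
    accuracy_score(y_test, y_pred_labels) * 100  # percent of correctly labelled rows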

Model 2

In [32]: from sklearn.neighbors import KNeighborsRegressor

In [33]: model2 = KNeighborsRegressor()

In [34]: model2.fit(x_train,y_train)

Out[34]: KNeighborsRegressor()

In [35]: y_pred2 =model2.predict(x_test)

In [36]: y_pred2

Out[36]: array([0.2, 0.6, 0. , 0.6, 0.4, 0.4, 0.8, 0.2, 0.4, 1. , 0. , 0.2, 0.4,
1. , 0.2, 0.8, 0. , 0. , 0.4, 0.8, 0.2, 0.6, 0. , 0.4, 0.6, 0.2,
0. , 0. , 0.4, 0.4, 0.2, 0.2, 0.6, 0.2, 0.4, 0.4, 0. , 0.6, 0.2,
0.4, 0.4, 0.8, 0.6, 0.4, 0.2, 1. , 0.2, 0.6, 1. , 0.4, 0.4, 0. ,
0. , 0.8, 0.6, 0. , 0.2, 0.6, 0.4, 0. , 0.2, 0.2, 0.2, 0.4, 0. ,
0.6, 0.6, 0.4, 0.2, 0. , 0. , 0.2, 0.6, 0.6, 0.4, 1. , 0.8, 0.2,
0.6, 0.6, 0.2, 0.2, 0. , 0.6, 0.2, 0.2, 0.6, 0.6, 0.2, 0. , 0.6,
0.4, 0.2, 0.8, 0. , 0.6, 0.4, 0.4, 0.4, 0.8, 0. , 0.6, 0.2, 0.6,
0.8, 0. , 1. , 0.4, 0.8, 0. , 0. , 0. , 0.6, 0.2, 0.2, 0.8, 0.8,
0. , 0. , 0. , 0. , 0.8, 0. , 0.6, 0. , 0.4, 0.6, 1. , 0.6, 1. ,
0.2, 0.2, 0. , 1. , 0. , 0.2, 0.4, 0. , 0.6, 0.6, 0. , 0.2, 0. ,
0.4, 1. , 1. , 0. , 0.4, 1. , 0.4, 0. , 0.6, 0.4, 0.8, 0.6, 0. ,
0.2, 1. , 0.6, 0.2, 0.4, 0.2, 0.2, 0.8, 0. , 0.4, 0.6, 0.6, 0.2,
0. , 0.2, 0.2, 0.2, 0.2, 0.2, 1. , 0. , 0. , 0.2, 0.2, 0.6, 0.2,
0.4, 0.4, 0.2, 0. , 0.6, 0.4, 1. , 0.4, 0.6, 1. , 0. , 0.2, 0.2,
0.8, 0.2, 0. , 0.4, 0.4, 0.6, 0.4, 0.2, 0. , 0.6, 0.2, 0.4, 0. ,
0.4, 0.6, 0.4, 0.2, 0.4, 0.4, 0.6, 0.2, 0. , 0. , 0.2, 0. , 0.8,
0.4, 0.2, 1. , 0.6, 0. , 0.2, 0.8, 0.2, 0.8, 0.6])


In [37]: y_test

Out[37]: 93 1
228 0
424 1
635 1
684 0
..
271 0
46 0
476 1
130 1
359 1
Name: Outcome, Length: 231, dtype: int64

In [80]: error2 =mean_absolute_error(y_test,y_pred2)

In [81]: error2

Out[81]: 0.3341991341991342

In [82]: accuracy2 =(1-error2)*100

In [83]: accuracy2

Out[83]: 66.58008658008659
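
KNN scores lowest of the four, plausibly because it is distance-based and the features sit on very different scales (Insulin reaches 846 while DiabetesPedigreeFunction stays below 2.5), so large-scale features dominate the distance. A hedged sketch of a standardised variant, an illustrative alternative rather than the notebook's method:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Standardise each feature before the distance computation, then fit KNN as before
    knn_scaled = make_pipeline(StandardScaler(), KNeighborsRegressor())
    knn_scaled.fit(x_train, y_train)
    error2_scaled = mean_absolute_error(y_test, knn_scaled.predict(x_test))
    accuracy2_scaled = (1 - error2_scaled) * 100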

Model 3

In [43]: from sklearn.tree import DecisionTreeRegressor

In [49]: model3 = DecisionTreeRegressor()

In [50]: model3.fit(x_train,y_train)

Out[50]: DecisionTreeRegressor()

In [54]: y_pred3 = model3.predict(x_test)

In [55]: y_pred3

Out[55]: array([0., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 1., 0., 0., 0.,
0., 0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0.,
0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1.,
0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 1.,
0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0., 0.,
0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.,
0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0.,
1., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.,
1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1., 1.,
0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0.,
1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.,
1., 1., 1., 1., 0., 0., 0., 1., 0., 1.])


In [56]: y_test

Out[56]: 93 1
228 0
424 1
635 1
684 0
..
271 0
46 0
476 1
130 1
359 1
Name: Outcome, Length: 231, dtype: int64

In [84]: error3 = mean_absolute_error(y_test,y_pred3)

In [85]: error3

Out[85]: 0.2727272727272727

In [86]: accuracy3 =(1-error3)*100

In [87]: accuracy3

Out[87]: 72.72727272727273
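
The match with linear regression's 72.73% is coincidental, but for the tree the number has a clean reading: a fully grown DecisionTreeRegressor fitted to a 0/1 target predicts only 0.0 or 1.0 (as Out[55] shows), so its MAE is exactly the fraction of misclassified rows and (1 - MAE) * 100 equals ordinary classification accuracy. A quick check:

    from sklearn.metrics import accuracy_score

    # y_pred3 is already hard 0/1, so classification accuracy gives the same number
    accuracy_score(y_test, y_pred3.astype(int)) * 100  # matches accuracy3 above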

Model 4

In [61]: from sklearn.ensemble import RandomForestRegressor

In [64]: model4 = RandomForestRegressor()

In [65]: model4.fit(x_train,y_train)

Out[65]: RandomForestRegressor()

In [66]: y_pred4=model4.predict(x_test)

In [67]: y_pred4

Out[67]: array([0.03, 0.6 , 0.68, 0.25, 0.18, 0.53, 0.65, 0.21, 0.21, 0.97, 0.52,
0.09, 0.66, 0.96, 0.06, 0.61, 0.13, 0.03, 0.06, 0.33, 0.3 , 0.61,
0.38, 0.69, 0.69, 0.15, 0.02, 0. , 0.02, 0.6 , 0.03, 0.11, 0.61,
0.11, 0.14, 0.35, 0.05, 0.54, 0.34, 0.36, 0.13, 0.7 , 0.37, 0.23,
0.21, 0.94, 0.1 , 0.77, 0.7 , 0.26, 0.42, 0.24, 0.01, 0.61, 0.15,
0.02, 0.21, 0.67, 0.01, 0.02, 0.21, 0.21, 0.1 , 0.3 , 0.23, 0.9 ,
0.66, 0.63, 0.19, 0.06, 0.02, 0.1 , 0.11, 0.05, 0.14, 0.98, 0.76,
0.67, 0.46, 0.69, 0.12, 0.1 , 0.11, 0.88, 0.34, 0.3 , 0.77, 0.4 ,
0.08, 0.28, 0.62, 0.04, 0.09, 0.64, 0.14, 0.76, 0.32, 0.42, 0.21,
0.52, 0.03, 0.69, 0. , 0.6 , 0.88, 0.36, 0.94, 0.57, 0.88, 0.01,
0.2 , 0.18, 0.39, 0.2 , 0.15, 0.69, 0.2 , 0.01, 0.29, 0.03, 0.06,
0.5 , 0.02, 0.26, 0. , 0.26, 0.23, 0.85, 0.66, 0.8 , 0.02, 0.23,
0.34, 0.91, 0. , 0.36, 0.37, 0.07, 0.46, 0.52, 0.44, 0.07, 0. ,
0.15, 0.89, 0.92, 0. , 0.67, 0.92, 0.05, 0. , 0.61, 0.24, 0.78,
0.5 , 0.2 , 0.02, 0.51, 0.43, 0.31, 0.47, 0.42, 0. , 0.91, 0.18,
0.66, 0.1 , 0.6 , 0.39, 0.04, 0.29, 0.16, 0.64, 0.07, 0.5 , 0.69,
0.22, 0. , 0.48, 0.06, 0.35, 0.52, 0.7 , 0.4 , 0.09, 0.28, 0.95,
0.2 , 0.75, 0.69, 0.64, 0.94, 0.23, 0.29, 0.1 , 0.48, 0.25, 0.03,
0.31, 0.61, 0.6 , 0.27, 0.45, 0.2 , 0.82, 0.3 , 0.42, 0. , 0.4 ,
0.58, 0.07, 0.25, 0.38, 0.44, 0.38, 0.03, 0.01, 0.04, 0.43, 0. ,
0.16, 0.64, 0.17, 0.95, 0.71, 0.31, 0.05, 0.57, 0.28, 0.72, 0.93])


In [68]: y_test

Out[68]: 93 1
228 0
424 1
635 1
684 0
..
271 0
46 0
476 1
130 1
359 1
Name: Outcome, Length: 231, dtype: int64

In [88]: error4 = mean_absolute_error(y_test,y_pred4)

In [89]: accuracy4 =(1-error4)*100

In [90]: accuracy4

Out[90]: 68.58008658008657

Let's compare the accuracy of all four models

LinearRegression

In [91]: accuracy

Out[91]: 72.72727272727273

KNeighborsRegressor

In [92]: accuracy2

Out[92]: 66.58008658008659

DecisionTreeRegressor

In [93]: accuracy3

Out[93]: 72.72727272727273

RandomForestRegressor

In [94]: accuracy4

Out[94]: 68.58008658008657
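
Finally, a convenience sketch (not from the original notebook) collecting the four scores into one frame for side-by-side reading:

    # Gather the (1 - MAE) * 100 scores computed above into a single table
    pd.DataFrame({
        'model': ['LinearRegression', 'KNeighborsRegressor',
                  'DecisionTreeRegressor', 'RandomForestRegressor'],
        'score_%': [accuracy, accuracy2, accuracy3, accuracy4],
    })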

In [ ]:
