Unit5 - Logistic Regression

28/06/2023, 00:19 Logistic Regression

Logistic Regression
Steps of Development of ML in Python

Importing necessary packages
Data preparation and preprocessing
Segregation of Data (Independent and Dependents)
Splitting the dataset into train data and test data
Choosing the model
Training the model
Testing the model
Evaluation of the model
Prediction

Importing necessary packages


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

Data preparation and preprocessing


In [7]:
dataset = pd.read_csv("diabetes.csv")
dataset

Out[7]:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  A
0              6      148             72             35        0  33.6                     0.627
1              1       85             66             29        0  26.6                     0.351
2              8      183             64              0        0  23.3                     0.672
3              1       89             66             23       94  28.1                     0.167
4              0      137             40             35      168  43.1                     2.288
...          ...      ...            ...            ...      ...   ...                       ...
763           10      101             76             48      180  32.9                     0.171
764            2      122             70             27        0  36.8                     0.340
765            5      121             72             23      112  26.2                     0.245
766            1      126             60              0        0  30.1                     0.349
767            1       93             70             31        0  30.4                     0.315

768 rows × 9 columns

localhost:8888/nbconvert/html/ML/Logistic Regression.ipynb?download=false 1/4



In [8]:
print(dataset.columns)
print(dataset.shape)
print(dataset.info())
print(dataset.isnull().sum())

Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
      dtype='object')
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
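
`isnull()` reports no missing values, yet the preview above contains `SkinThickness` and `Insulin` entries equal to 0. It is an assumption, not something the notebook states, that such zeros stand in for measurements that were never recorded. A minimal sketch of how they could be counted, using only the first five rows visible in the preview (`preview` and `suspect_columns` are illustrative names):

```python
import pandas as pd

# First five rows of the preview shown earlier (visible columns only).
preview = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0],
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Assumption: a zero in these columns may mean "not measured".
suspect_columns = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Count zeros per column; isnull() cannot flag them because 0 is a valid number.
zero_counts = (preview[suspect_columns] == 0).sum()
print(zero_counts)
```

The same expression applied to the full `dataset` would show how widespread the zeros are before deciding whether to treat them as missing.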

Segregation of Data (Independent and Dependents)


In [9]:
# Features: every column except the target
X = dataset.drop('Outcome', axis=1)
# Target: the Outcome column
Y = dataset['Outcome']

Splitting the dataset into train data and test data


In [10]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; the split is random on every run
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
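
Because the split above is random, the metrics further down will differ from run to run. One common way to make the split repeatable, and to keep the class ratio the same in both parts, is to pass `random_state` and `stratify`. The sketch below uses synthetic stand-in data rather than the diabetes file, and the value 42 is an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 2 features, 35% positive class.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 65 + [1] * 35)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

With the notebook's own variables this would correspond to adding `random_state=...` and `stratify=Y` to the existing call.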

Choosing the model


In [11]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

Training the model


In [12]:
model.fit(X_train,Y_train)


C:\Users\chinu\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Out[12]: LogisticRegression()
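
The warning suggests two remedies: raise `max_iter` or scale the features. A minimal sketch that does both by chaining `StandardScaler` and `LogisticRegression` in a pipeline. The data here is synthetic (generated with `make_classification` and stretched to mimic columns on very different scales), and `max_iter=1000` is an arbitrary safety margin, not a tuned value:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes features: 8 columns on very different scales.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_demo = X_demo * np.array([1, 100, 50, 20, 200, 10, 0.5, 30])

# Standardize each feature, then fit the classifier on the scaled values.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

# n_iter_ reports how many iterations the solver actually used.
iterations = scaled_model.named_steps["logisticregression"].n_iter_[0]
print(iterations)
```

A pipeline like this can be passed to `fit` and `predict` in place of the bare `LogisticRegression()`, so the scaling learned on the training data is reused on the test data.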

Testing the model
In [13]:
predictions = model.predict(X_test)
print(predictions)

[0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0
1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1
0 1 0 0 0 0]
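
`model.predict` returns hard 0/1 labels. For a binary logistic regression, these labels come from an estimated probability, the logistic (sigmoid) function of a linear score, compared against 0.5. A minimal sketch of that rule with made-up scores; `sigmoid` and `to_label` are hypothetical helpers, not scikit-learn functions:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def to_label(z, threshold=0.5):
    """Return 1 if the estimated probability exceeds the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0

# Made-up linear scores (intercept plus weighted sum of the features).
for score in (-2.0, 0.0, 1.5):
    print(score, round(sigmoid(score), 3), to_label(score))
```

On the fitted model, `model.predict_proba(X_test)[:, 1]` returns the estimated probability of class 1 for each test row, which is useful when a threshold other than 0.5 is wanted.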

Evaluation of the model
In [25]:
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score
print(accuracy_score(Y_test,predictions))
print(recall_score(Y_test,predictions))
print(precision_score(Y_test,predictions))
print(confusion_matrix(Y_test,predictions))

0.7662337662337663
0.4807692307692308
0.7352941176470589
[[93  9]
 [27 25]]
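
The three scores can be checked by hand from the confusion matrix. In scikit-learn's layout the rows are the actual classes and the columns the predicted classes, so the four cells read as true negatives, false positives, false negatives and true positives. A short sketch of the arithmetic:

```python
# Confusion matrix from the output above: rows are actual, columns are predicted.
tn, fp = 93, 9
fn, tp = 27, 25

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
recall = tp / (tp + fn)                      # share of actual positives that were found
precision = tp / (tp + fp)                   # share of predicted positives that are correct

print(accuracy, recall, precision)
```

The recall of about 0.48 means that 27 of the 52 actual positives in the test set were predicted as negative, which the accuracy figure alone does not reveal.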

In [29]:
sns.distplot(predictions,hist=False, color = 'r', label = 'Predicted Values')
sns.distplot(Y_test, hist=False, color = 'b', label = 'Actual Values')
plt.legend(loc = "upper left")
plt.show()

C:\Users\chinu\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).
  warnings.warn(msg, FutureWarning)
C:\Users\chinu\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).
  warnings.warn(msg, FutureWarning)


Predictions
In [51]:
# Feature values for one patient, in the same column order used for training
Pregnancies = 8
Glucose = 183
BloodPressure = 64
SkinThickness = 0
Insulin = 0
BMI = 23.3
DiabetesPedigreeFunction = 0.672
Age = 32

# predict expects a 2-D input: one inner list per sample
new_data = [[Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI,
             DiabetesPedigreeFunction, Age]]
prediction = model.predict(new_data)

print(prediction)

[1]
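
Passing a plain list works here, but scikit-learn 1.0 and later emit a warning when a model fitted on a DataFrame receives input without column names. Wrapping the new sample in a one-row DataFrame avoids that, and `predict_proba` shows how confident the model is. The sketch below trains a toy model on made-up values for two features only, so the numbers it prints say nothing about the real dataset:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training data: made-up values, two features only, for illustration.
train = pd.DataFrame({
    "Glucose": [80, 90, 95, 100, 150, 180, 140, 160],
    "BMI": [25.0, 28.0, 30.0, 33.0, 34.0, 24.0, 42.0, 35.0],
    "Outcome": [0, 0, 0, 0, 1, 1, 1, 1],
})
feature_names = ["Glucose", "BMI"]
toy_model = LogisticRegression(max_iter=1000)
toy_model.fit(train[feature_names], train["Outcome"])

# One new sample as a DataFrame so the column names match the training data.
new_patient = pd.DataFrame([{"Glucose": 183, "BMI": 23.3}], columns=feature_names)

label = toy_model.predict(new_patient)[0]        # hard 0/1 label
proba = toy_model.predict_proba(new_patient)[0]  # [P(class 0), P(class 1)]
print(label, proba)
```

For the notebook's model, the equivalent would be a one-row DataFrame built with `columns=X.columns`.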
