Documentation Code

Uploaded by Dharani Dharani

Copyright © All Rights Reserved

Description:

The objective is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. The dataset consists of several medical predictor variables and one target variable, Outcome. The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

Using the logistic regression, SVM and random forest algorithms, we compare their accuracy scores and choose the best algorithm for the subsequent deployment step. The whole process is carried out in a Jupyter notebook in Visual Studio.

Step 0: Import libraries and Dataset

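# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

# Loading the dataset (the same diabetes.csv used by the deployment script)
dataset = pd.read_csv('diabetes.csv')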
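Later steps select features by position (for example `iloc[:, [1, 4, 5, 7]]`), so they break silently if the CSV's columns arrive in a different order. A minimal sketch of a defensive loader, assuming the file uses the nine column names reported by `dataset.info()` in Step 1; `load_diabetes_csv` and `EXPECTED_COLUMNS` are hypothetical names, not part of the original notebook:

```python
import pandas as pd

# Hypothetical constant: the nine columns reported by dataset.info(), in order
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes_csv(path="diabetes.csv"):
    """Load the CSV, fail early on missing columns, and return the columns
    in the expected order so positional indexing picks the intended features."""
    df = pd.read_csv(path)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return df[EXPECTED_COLUMNS]


# Usage in the notebook: dataset = load_diabetes_csv("diabetes.csv")
```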

Step 1: Descriptive Statistics


# Preview data
dataset.head()
Output:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1

# Dataset dimensions - (rows, columns)

dataset.shape

o/p
(768, 9)

# Features data-type
dataset.info()

o/p: <class 'pandas.core.frame.DataFrame'>


RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

# Count of null values
dataset.isnull().sum()

o/p: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
Observations:
1. There are a total of 768 records and 9 features in the dataset (8 predictors plus the Outcome target).
2. Each feature is of either integer or float data type.
3. Some features, namely Glucose, BloodPressure, SkinThickness, Insulin and BMI, contain zero values, which are not physiologically plausible and therefore represent missing data.
4. There are no NaN values in the dataset, so the missing data is hidden behind these zeros.
5. In the Outcome column, 1 represents diabetes positive and 0 represents diabetes negative.
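Observation 3 can be checked directly by counting zeros per column. A minimal sketch, run here on the five rows printed by `dataset.head()`; the helper name `count_zero_placeholders` is hypothetical. On the full dataset these counts equal the numbers that `isnull().sum()` reports after the zero-to-NaN replacement in Step 3, because the raw data contains no NaN values.

```python
import pandas as pd

# Columns where 0 is not a plausible measurement, i.e. a missing-value placeholder
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def count_zero_placeholders(df, cols=ZERO_AS_MISSING):
    """Return the number of zero entries per column (hidden missing values)."""
    return (df[cols] == 0).sum()


# The five rows shown by dataset.head() above
head = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})
print(count_zero_placeholders(head))
```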

Step 2: Data Visualization


# Outcome countplot
sns.countplot(x = 'Outcome', data = dataset)
plt.show()

# Correlation heatmap
sns.heatmap(dataset.corr(), annot = True)
plt.show()
Observations:
1. The countplot shows that the dataset is imbalanced: the number of patients who don't have diabetes is greater than the number who do.
2. The correlation heatmap shows a positive correlation between Outcome and [Glucose, BMI, Age, Insulin]. We can select these features to accept input from the user and predict the outcome.
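To make the feature choice reproducible rather than read off the heatmap by eye, the correlations with Outcome can be ranked numerically. A minimal sketch with a hypothetical helper `rank_by_correlation`, shown on a toy frame; on the real data you would call `rank_by_correlation(dataset)`. Pearson correlation only measures linear association, so this is a screening aid rather than a guarantee of predictive value.

```python
import pandas as pd


def rank_by_correlation(df, target="Outcome"):
    """Return each feature's Pearson correlation with `target`,
    ordered from strongest to weakest absolute correlation."""
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy frame for illustration only (not the diabetes data)
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [4, 3, 2, 2],
    "c": [1, 1, 2, 1],
    "Outcome": [0, 0, 1, 1],
})
print(rank_by_correlation(toy))
```

Sorting by absolute value keeps strongly negative correlations (like `b` here) near the top, since they are just as informative as positive ones.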

Step 3: Data Preprocessing

# Work on a copy so the original dataset keeps its raw values
dataset_new = dataset.copy()

# Replacing zero values with NaN
zero_cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
dataset_new[zero_cols] = dataset_new[zero_cols].replace(0, np.nan)
# Count of NaN

dataset_new.isnull().sum()

o/p: Pregnancies 0
Glucose 5
BloodPressure 35
SkinThickness 227
Insulin 374
BMI 11
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64

# Replacing NaN with mean values (assigning back avoids chained inplace updates)
for col in ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]:
    dataset_new[col] = dataset_new[col].fillna(dataset_new[col].mean())
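Mean imputation is sensitive to outliers; the deployment script later in this document uses the median for SkinThickness, Insulin and BMI instead. A minimal sketch of a helper that applies a per-column strategy so both variants share one code path; `impute` and its strategy dictionary are hypothetical, and the Insulin values in the example are illustrative, not taken from the dataset:

```python
import numpy as np
import pandas as pd


def impute(df, strategy):
    """Return a copy of df with NaNs filled per column.

    strategy maps a column name to "mean" or "median"; the statistic is
    computed from the non-missing values of that column.
    """
    out = df.copy()
    for col, how in strategy.items():
        if how == "mean":
            fill_value = out[col].mean()
        elif how == "median":
            fill_value = out[col].median()
        else:
            raise ValueError(f"Unknown strategy {how!r} for column {col!r}")
        out[col] = out[col].fillna(fill_value)
    return out


# Illustrative values: one large outlier (543) pulls the mean far above the median
demo = pd.DataFrame({"Insulin": [np.nan, 94.0, 168.0, 88.0, 543.0]})
print(impute(demo, {"Insulin": "mean"})["Insulin"].iloc[0])    # 223.25
print(impute(demo, {"Insulin": "median"})["Insulin"].iloc[0])  # 131.0
```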

# Feature scaling using MinMaxScaler

from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range = (0, 1))

dataset_scaled = sc.fit_transform(dataset_new)

dataset_scaled = pd.DataFrame(dataset_scaled)
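MinMaxScaler rescales each column with x' = (x − min) / (max − min). Because the scaler above is fitted on all 768 rows before the train/test split, the test rows influence the minimum and maximum used for training, a mild form of data leakage. A minimal NumPy sketch of the same formula with the statistics learned from training rows only; `minmax_fit` and `minmax_transform` are hypothetical helpers, and the sample values are Glucose and Age from the `dataset.head()` rows:

```python
import numpy as np


def minmax_fit(X_train):
    """Learn per-column minimum and range (max - min) from training rows only."""
    X_train = np.asarray(X_train, dtype=float)
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant column: avoid division by zero
    return col_min, col_range


def minmax_transform(X, col_min, col_range):
    """Apply x' = (x - min) / (max - min) using the training statistics."""
    return (np.asarray(X, dtype=float) - col_min) / col_range


# Glucose and Age from rows 0-2 (train) and rows 3-4 (test) of dataset.head()
train = [[148, 50], [85, 31], [183, 32]]
test = [[89, 21], [137, 33]]

col_min, col_range = minmax_fit(train)
print(minmax_transform(train, col_min, col_range))  # each column spans exactly [0, 1]
print(minmax_transform(test, col_min, col_range))   # Age 21 maps below 0
```

In scikit-learn terms this corresponds to `sc.fit(X_train)` followed by `sc.transform(X_test)` instead of `fit_transform` on the whole dataset; test values outside the training range simply map outside [0, 1].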

# Selecting features - [Glucose, Insulin, BMI, Age]

X = dataset_scaled.iloc[:, [1, 4, 5, 7]].values

Y = dataset_scaled.iloc[:, 8].values

# Splitting X and Y
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20, random_state = 42, stratify =
dataset_new['Outcome'] )

# Checking dimensions

print("X_train shape:", X_train.shape)

print("X_test shape:", X_test.shape)

print("Y_train shape:", Y_train.shape)

print("Y_test shape:", Y_test.shape)


o/p: X_train shape: (614, 4)
X_test shape: (154, 4)
Y_train shape: (614,)
Y_test shape: (154,)
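These shapes follow from how scikit-learn sizes a split when `test_size` is a float: the test set gets ceil(0.20 × 768) = ceil(153.6) = 154 rows and the training set gets the remaining 614. Because `stratify` is set, both parts keep roughly the same share of Outcome = 1 as the full dataset. A small sketch reproducing the arithmetic; `split_sizes` is a hypothetical helper, not a scikit-learn function:

```python
import math


def split_sizes(n_samples, test_size):
    """Row counts for a float test_size: the test count is rounded up and
    the training set receives the remainder."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(768, 0.20))  # (614, 154)
```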

Step 4: Data Modelling


# Logistic Regression Algorithm

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(random_state = 42)

logreg.fit(X_train, Y_train)

o/p: LogisticRegression(random_state=42)
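Logistic regression scores each row with a linear function z = w · x + b and turns it into a probability with the sigmoid 1 / (1 + e^(−z)); `predict` labels a row 1 when that probability exceeds 0.5, i.e. when z > 0. A sketch that checks this against scikit-learn's `coef_`, `intercept_` and `predict_proba`, using synthetic data from `make_classification` as an assumed stand-in for the four scaled features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in with 4 features, like Glucose, Insulin, BMI and Age above
X_demo, y_demo = make_classification(n_samples=200, n_features=4, n_informative=3,
                                     n_redundant=0, random_state=42)
model = LogisticRegression(random_state=42).fit(X_demo, y_demo)

# Linear score z = w . x + b for every row
z = X_demo @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))            # sigmoid
p_sklearn = model.predict_proba(X_demo)[:, 1]   # probability of class 1

print(np.allclose(p_manual, p_sklearn))
print(np.array_equal(model.predict(X_demo), (z > 0).astype(int)))
```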

# Plotting test accuracy against n_neighbors for K-Nearest Neighbors

from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

X_axis = list(range(1, 31))
acc = []  # Series.append was removed in pandas 2.0, so collect scores in a list

for i in X_axis:
    knn_model = KNeighborsClassifier(n_neighbors = i)
    knn_model.fit(X_train, Y_train)
    prediction = knn_model.predict(X_test)
    acc.append(metrics.accuracy_score(Y_test, prediction))

plt.plot(X_axis, acc)
plt.xticks(X_axis)
plt.title("Finding best value for n_neighbors")
plt.xlabel("n_neighbors")
plt.ylabel("Accuracy")
plt.grid()
plt.show()

print('Highest value: ', max(acc))

o/p: Highest value: 0.7857142857142857
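The loop above picks n_neighbors by looking at test-set accuracy, which makes the reported 0.786 optimistic and leaves no untouched data for the final comparison. A common alternative is to choose k by cross-validation on the training set only. A minimal sketch with a hypothetical helper `best_k_by_cv`, using synthetic data as a stand-in for `X_train` and `Y_train`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def best_k_by_cv(X_train, y_train, k_values=range(1, 31), cv=5):
    """Return the k with the highest mean cross-validated accuracy on the
    training data, plus the score for every k tried."""
    scores = {}
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(knn, X_train, y_train, cv=cv,
                                    scoring="accuracy").mean()
    best_k = max(scores, key=scores.get)  # on ties, the smallest k wins
    return best_k, scores


# Synthetic stand-in; in the notebook you would pass X_train, Y_train
X_demo, y_demo = make_classification(n_samples=300, n_features=4, n_informative=3,
                                     n_redundant=0, random_state=0)
best_k, scores = best_k_by_cv(X_demo, y_demo)
print("Best k:", best_k, "CV accuracy:", round(scores[best_k], 3))
```

The chosen k is then fitted once on the full training set and evaluated a single time on `X_test`.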


# Support Vector Classifier Algorithm

from sklearn.svm import SVC

svc = SVC(kernel = 'linear', random_state = 42)

svc.fit(X_train, Y_train)
o/p:SVC(kernel='linear', random_state=42)
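With `kernel = 'linear'` the SVC learns an explicit hyperplane f(x) = w · x + b, exposed as `coef_` and `intercept_`; for two classes, a positive f(x) means `classes_[1]` is predicted. A sketch that checks this on synthetic data (an assumed stand-in for the diabetes features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=200, n_features=4, n_informative=3,
                                     n_redundant=0, random_state=0)
svc_demo = SVC(kernel="linear", random_state=42).fit(X_demo, y_demo)

# Signed score of each row relative to the separating hyperplane
f_manual = X_demo @ svc_demo.coef_.ravel() + svc_demo.intercept_[0]
pred_manual = np.where(f_manual > 0, svc_demo.classes_[1], svc_demo.classes_[0])

print(np.allclose(f_manual, svc_demo.decision_function(X_demo)))
print(np.array_equal(pred_manual, svc_demo.predict(X_demo)))
```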

# Random Forest Algorithm


from sklearn.ensemble import RandomForestClassifier
ranfor = RandomForestClassifier(n_estimators = 11, criterion =
'entropy', random_state = 42)
ranfor.fit(X_train, Y_train)

o/p:RandomForestClassifier(criterion='entropy', n_estimators=11,
random_state=42)
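With `criterion = 'entropy'`, each tree chooses splits that most reduce the Shannon entropy H = −Σ p_c log2 p_c of the Outcome labels (the information gain), and `n_estimators = 11` sets how many such trees vote. A pure-Python sketch of the two quantities; the functions are illustrative, not scikit-learn internals, and the example split uses Glucose and Outcome from the five `dataset.head()` rows:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H = -sum(p_c * log2(p_c)) over the classes in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


# Outcome of the five head() rows, split on Glucose >= 137
outcome = [1, 0, 1, 0, 1]   # Glucose: 148, 85, 183, 89, 137
high_glucose = [1, 1, 1]    # rows with Glucose 148, 183, 137
low_glucose = [0, 0]        # rows with Glucose 85, 89
print(round(entropy(outcome), 3))                                       # 0.971
print(round(information_gain(outcome, high_glucose, low_glucose), 3))  # 0.971
```

Here the split separates the classes perfectly, so the gain equals the parent entropy.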

# Making predictions on test dataset


Y_pred_logreg = logreg.predict(X_test)

Y_pred_svc = svc.predict(X_test)

Y_pred_ranfor = ranfor.predict(X_test)

Step 5: Model Evaluation


# Evaluating using accuracy_score metric
from sklearn.metrics import accuracy_score
accuracy_logreg = accuracy_score(Y_test, Y_pred_logreg)
accuracy_svc = accuracy_score(Y_test, Y_pred_svc)

accuracy_ranfor = accuracy_score(Y_test, Y_pred_ranfor)

# Accuracy on test set


print("Logistic Regression: " + str(accuracy_logreg * 100))
print("Support Vector Classifier: " + str(accuracy_svc * 100))
print("Random Forest: " + str(accuracy_ranfor * 100))

o/p: Logistic Regression: 72.07792207792207


Support Vector Classifier: 73.37662337662337
Random Forest: 75.97402597402598
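Accuracy is simply correct predictions divided by test rows. With 154 test rows, the printed percentages correspond to 111, 113 and 117 correct predictions, so each additional correct row moves accuracy by 1/154 ≈ 0.65 percentage points and the gap between Random Forest and SVC amounts to four rows. Because the classes are imbalanced, a majority-class baseline is also a useful reference point. A small sketch; the helper names are hypothetical:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Correct-prediction counts implied by the percentages printed above
for name, correct in [("Logistic Regression", 111),
                      ("Support Vector Classifier", 113),
                      ("Random Forest", 117)]:
    print(f"{name}: {100 * correct / 154:.2f}%")
```

Calling `majority_baseline(Y_test)` in the notebook shows how much of each model's accuracy comes from simply favouring the larger class.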

From the above comparison, Random Forest achieves the highest test accuracy of the three models compared: 75.97%.

So the Random Forest algorithm is used for deployment with the Flask web framework.

Now create a separate script, Diabetes Predictor - Deployment.py, in Visual Studio for the Random Forest model, since it gave the best accuracy.

# Importing essential libraries

import numpy as np

import pandas as pd

import pickle

# Loading the dataset

df = pd.read_csv('diabetes.csv')

# Renaming DiabetesPedigreeFunction as DPF

df = df.rename(columns={'DiabetesPedigreeFunction':'DPF'})

# Replacing the 0 values in ['Glucose','BloodPressure','SkinThickness','Insulin','BMI'] by NaN
df_copy = df.copy(deep=True)
zero_cols = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
df_copy[zero_cols] = df_copy[zero_cols].replace(0, np.nan)

# Replacing NaN values by the mean or median, depending on each feature's distribution
df_copy['Glucose'] = df_copy['Glucose'].fillna(df_copy['Glucose'].mean())
df_copy['BloodPressure'] = df_copy['BloodPressure'].fillna(df_copy['BloodPressure'].mean())
df_copy['SkinThickness'] = df_copy['SkinThickness'].fillna(df_copy['SkinThickness'].median())
df_copy['Insulin'] = df_copy['Insulin'].fillna(df_copy['Insulin'].median())
df_copy['BMI'] = df_copy['BMI'].fillna(df_copy['BMI'].median())

# Model Building (train on the imputed copy, not the raw df)
from sklearn.model_selection import train_test_split

X = df_copy.drop(columns='Outcome')
y = df_copy['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Creating Random Forest Model
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators=20)
classifier.fit(X_train, y_train)

# Creating a pickle file for the classifier (the with-block closes the file)
filename = 'diabetes-prediction-rfc-model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(classifier, f)
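Before wiring the model into Flask, it is worth confirming that the pickled file reproduces the in-memory model's predictions. A minimal sketch on synthetic data; the file name `demo-model.pkl` is hypothetical. The scikit-learn documentation recommends unpickling with the same scikit-learn version that saved the model, and, as the Python documentation warns, only files from a trusted source should be unpickled.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 8-feature stand-in for the deployment script's X
X_demo, y_demo = make_classification(n_samples=100, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

with open("demo-model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("demo-model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored forest should give identical predictions
print(np.array_equal(model.predict(X_demo), restored.predict(X_demo)))
```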

Because the classifier is serialized with the pickle module, the script writes a binary file (diabetes-prediction-rfc-model.pkl). Next, create app.py for the Flask web framework.

app.py

from flask import Flask, render_template, request
import pickle
import numpy as np

# Load the trained Random Forest saved by the deployment script
filename = 'diabetes-prediction-rfc-model.pkl'
with open(filename, 'rb') as f:
    classifier = pickle.load(f)

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    if request.method == 'POST':
        preg = int(request.form['pregnancies'])
        glucose = int(request.form['glucose'])
        bp = int(request.form['bloodpressure'])
        st = int(request.form['skinthickness'])
        insulin = int(request.form['insulin'])
        bmi = float(request.form['bmi'])
        dpf = float(request.form['dpf'])
        age = int(request.form['age'])

        # Order must match the training columns:
        # Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DPF, Age
        data = np.array([[preg, glucose, bp, st, insulin, bmi, dpf, age]])
        my_prediction = classifier.predict(data)

        # Pass a plain int (0 or 1) so the template comparison is unambiguous
        return render_template('result.html', prediction=int(my_prediction[0]))

if __name__ == '__main__':
    app.run(debug=True)
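`classifier.predict` relies on column position, so the form values must be assembled in the training column order (Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DPF, Age). A minimal sketch of a helper that does this and raises an error naming the offending field instead of a bare `ValueError` from `int()`; `build_feature_row`, `FEATURE_ORDER` and `FORM_FIELDS` are hypothetical names, and unlike the original `predict()` it parses every field as a float. Inside `predict()` it could replace the eight conversions with `data = build_feature_row(request.form)`.

```python
import numpy as np

# Training column order of X in the deployment script (after the DPF rename)
FEATURE_ORDER = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DPF", "Age"]

# Mapping from model feature to the input name used in index.html
FORM_FIELDS = {
    "Pregnancies": "pregnancies", "Glucose": "glucose",
    "BloodPressure": "bloodpressure", "SkinThickness": "skinthickness",
    "Insulin": "insulin", "BMI": "bmi", "DPF": "dpf", "Age": "age",
}


def build_feature_row(form):
    """Convert submitted form strings into a 1 x 8 float array in training order.

    Raises ValueError naming the field if a value is missing or not numeric.
    """
    values = []
    for feature in FEATURE_ORDER:
        field = FORM_FIELDS[feature]
        raw = (form.get(field) or "").strip()
        try:
            values.append(float(raw))
        except ValueError:
            raise ValueError(f"Invalid value for {field!r}: {raw!r}") from None
    return np.array([values])
```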

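Create a templates folder next to app.py and place the HTML files in it; Flask's render_template looks for templates there by default. Files referenced through url_for('static', ...) (css/style.css, the background images and the .webp images) go in a static folder.

index.html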

<!DOCTYPE html>

<html >

<!--From https://codepen.io/frytyler/pen/EGdtg-->

<head>

<meta charset="UTF-8">

<title>Diabetes Predictor</title>

<link href='https://fonts.googleapis.com/css?family=Pacifico' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Arimo' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Hind:300' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300' rel='stylesheet' type='text/css'>

<link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">

<link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />

<script defer src="https://pyscript.net/latest/pyscript.js"></script>

<style>
@import url(https://fonts.googleapis.com/css?family=Open+Sans);

.btn { display: inline-block; *display: inline; *zoom: 1; padding: 4px 10px 4px; margin-bottom: 0;
font-size: 13px; line-height: 18px; color: #333333; text-align: center;
text-shadow: 0 1px 1px rgba(255, 255, 255, 0.75); vertical-align: middle; background-color: #f5f5f5;
background-image: -moz-linear-gradient(top, #ffffff, #e6e6e6);
background-image: -ms-linear-gradient(top, #ffffff, #e6e6e6);
background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#ffffff), to(#e6e6e6));
background-image: -webkit-linear-gradient(top, #ffffff, #e6e6e6);
background-image: -o-linear-gradient(top, #ffffff, #e6e6e6);
background-image: linear-gradient(top, #ffffff, #e6e6e6); background-repeat: repeat-x;
filter: progid:dximagetransform.microsoft.gradient(startColorstr=#ffffff, endColorstr=#e6e6e6, GradientType=0);
border-color: #e6e6e6 #e6e6e6 #e6e6e6; border-color: rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.25);
border: 1px solid #e6e6e6; -webkit-border-radius: 4px; -moz-border-radius: 4px; border-radius: 4px;
-webkit-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
-moz-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
cursor: pointer; *margin-left: .3em; }

.btn:hover, .btn:active, .btn.active, .btn.disabled, .btn[disabled] { background-color: #e6e6e6; }

.btn-large { padding: 9px 14px; font-size: 15px; line-height: normal; -webkit-border-radius: 5px;
-moz-border-radius: 5px; border-radius: 5px; }

.btn:hover { color: #333333; text-decoration: none; background-color: #e6e6e6; background-position: 0 -15px;
-webkit-transition: background-position 0.1s linear; -moz-transition: background-position 0.1s linear;
-ms-transition: background-position 0.1s linear; -o-transition: background-position 0.1s linear;
transition: background-position 0.1s linear; }

.btn-primary, .btn-primary:hover { text-shadow: 0 -1px 0 rgba(0, 0, 0, 0.25); color: #ffffff; }

.btn-primary.active { color: rgba(255, 255, 255, 0.75); }

.btn-primary { background-color: #4a77d4;
background-image: -moz-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -ms-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#6eb6de), to(#4a77d4));
background-image: -webkit-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -o-linear-gradient(top, #6eb6de, #4a77d4);
background-image: linear-gradient(top, #6eb6de, #4a77d4); background-repeat: repeat-x;
filter: progid:dximagetransform.microsoft.gradient(startColorstr=#6eb6de, endColorstr=#4a77d4, GradientType=0);
border: 1px solid #3762bc; text-shadow: 1px 1px 1px rgba(0,0,0,0.4);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.5); }

.btn-primary:hover, .btn-primary:active, .btn-primary.active, .btn-primary.disabled,
.btn-primary[disabled] { filter: none; background-color: #4a77d4; }

.btn-block { width: 100%; display:block; }

body {
  width: 100%;
  height: auto;
  font-family: 'Open Sans', sans-serif;
  color: #fff;
  font-size: 18px;
  text-align: center;
  letter-spacing: 1.2px;
  background-image: url("../static/s.jpg");
}

.login {
  text-align: center;
  display: flex;
  justify-content: center;
  align-items: center;
  margin-left: auto;
  margin-right: auto;
  margin-bottom: 50px;
}

h1 {
  text-align: center;
  color: white;
  text-transform: uppercase;
  font-size: 40px;
  text-shadow: 2px 2px 4px black;
  animation: bounceIn 2s infinite alternate;
  font-family: Arial, Helvetica, sans-serif;
}

@keyframes bounceIn {
  0% { transform: scale(0.1); opacity: 0; }
  60% { transform: scale(1.2); opacity: 1; }
  100% { transform: scale(1); }
}

input {
  width: 500px;
  margin-bottom: 10px;
  background: rgba(0,0,0,0.7);
  border: none;
  outline: none;
  padding: 15px;
  font-size: 13px;
  color: #fff;
  text-shadow: 1px 1px 1px rgba(0,0,0,0.3);
  border: 3px solid greenyellow;
  border-radius: 20px;
  box-shadow: inset 0 -5px 45px rgba(100,100,100,0.2), 0 1px 1px rgba(255,255,255,0.2);
  -webkit-transition: box-shadow .5s ease;
  -moz-transition: box-shadow .5s ease;
  -o-transition: box-shadow .5s ease;
  -ms-transition: box-shadow .5s ease;
  transition: box-shadow .5s ease;
}

input:hover {
  background: rgba(0,0,0,1);
  font-size: 15px;
  border: #670d10 5px solid;
}

input:focus { box-shadow: inset 0 -5px 45px rgba(100,100,100,0.4), 0 1px 1px rgba(255,255,255,0.2); }

</style>

</head>

<body>

<h1> Diabetes Predictor </h1>

<div class="login">

<form action="{{ url_for('predict') }}" method="post">
<input class="form-input" type="text" name="pregnancies" placeholder="Number of Pregnancies e.g. 0"><br>
<input class="form-input" type="text" name="glucose" placeholder="Glucose (mg/dL) e.g. 80"><br>
<input class="form-input" type="text" name="bloodpressure" placeholder="Blood Pressure (mmHg) e.g. 80"><br>
<input class="form-input" type="text" name="skinthickness" placeholder="Skin Thickness (mm) e.g. 20"><br>
<input class="form-input" type="text" name="insulin" placeholder="Insulin Level (μU/mL) e.g. 80"><br>
<input class="form-input" type="text" name="bmi" placeholder="Body Mass Index (kg/m²) e.g. 23.1"><br>
<input class="form-input" type="text" name="dpf" placeholder="Diabetes Pedigree Function e.g. 0.52"><br>
<input class="form-input" type="text" name="age" placeholder="Age (years) e.g. 34"><br>
<button type="submit" class="btn btn-primary btn-block btn-large">Predict</button>
</form>

</div>

</body>

</html>

result.html

<!DOCTYPE html>

<html lang="en" dir="ltr">

<head>

<meta charset="utf-8">

<title>Diabetes Predictor</title>

<link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />

<script defer src="https://pyscript.net/latest/pyscript.js"></script>

<style>

@import url(https://fonts.googleapis.com/css?family=Open+Sans);

.btn {
  display: inline-block;
  *display: inline;
  *zoom: 1;
  padding: 4px 10px 4px;
  margin-bottom: 0;
  font-size: 13px;
  line-height: 18px;
  color: #333333;
  text-align: center;
  text-shadow: 0 1px 1px rgba(255, 255, 255, 0.75);
  vertical-align: middle;
  background-color: #f5f5f5;
  background-image: -moz-linear-gradient(top, #ffffff, #e6e6e6);
  background-image: -ms-linear-gradient(top, #ffffff, #e6e6e6);
  background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#ffffff), to(#e6e6e6));
  background-image: -webkit-linear-gradient(top, #ffffff, #e6e6e6);
  background-image: -o-linear-gradient(top, #ffffff, #e6e6e6);
  background-image: linear-gradient(top, #ffffff, #e6e6e6);
  background-repeat: repeat-x;
  filter: progid:dximagetransform.microsoft.gradient(startColorstr=#ffffff, endColorstr=#e6e6e6, GradientType=0);
  border-color: #e6e6e6 #e6e6e6 #e6e6e6;
  border-color: rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.25);
  border: 1px solid #e6e6e6;
  -webkit-border-radius: 4px;
  -moz-border-radius: 4px;
  border-radius: 4px;
  -webkit-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
  -moz-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
  box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
  cursor: pointer;
  *margin-left: .3em;
}

.btn:hover, .btn:active, .btn.active, .btn.disabled, .btn[disabled] { background-color: #e6e6e6; }

.btn-large { padding: 9px 14px; font-size: 15px; line-height: normal; -webkit-border-radius: 5px;
-moz-border-radius: 5px; border-radius: 5px; }

.btn:hover { color: #333333; text-decoration: none; background-color: #e6e6e6; background-position: 0 -15px;
-webkit-transition: background-position 0.1s linear; -moz-transition: background-position 0.1s linear;
-ms-transition: background-position 0.1s linear; -o-transition: background-position 0.1s linear;
transition: background-position 0.1s linear; }

.btn-primary, .btn-primary:hover { text-shadow: 0 -1px 0 rgba(0, 0, 0, 0.25); color: #ffffff; }

.btn-primary.active { color: rgba(255, 255, 255, 0.75); }

.btn-primary { background-color: #4a77d4;
background-image: -moz-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -ms-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#6eb6de), to(#4a77d4));
background-image: -webkit-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -o-linear-gradient(top, #6eb6de, #4a77d4);
background-image: linear-gradient(top, #6eb6de, #4a77d4); background-repeat: repeat-x;
filter: progid:dximagetransform.microsoft.gradient(startColorstr=#6eb6de, endColorstr=#4a77d4, GradientType=0);
border: 1px solid #3762bc; text-shadow: 1px 1px 1px rgba(0,0,0,0.4);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.5); }

.btn-primary:hover, .btn-primary:active, .btn-primary.active, .btn-primary.disabled,
.btn-primary[disabled] { filter: none; background-color: #4a77d4; }

.btn-block { width: 100%; display:block; }

body {
  width: 100%;
  height: auto;
  font-family: 'Open Sans', sans-serif;
  color: #fff;
  font-size: 18px;
  text-align: center;
  letter-spacing: 1.2px;
  background-image: url("../static/wallpaper.jpg");
}

.results {
  margin-top: 150px;
}

</style>

</head>

<body>
<!-- Result -->

<div class="results">

{% if prediction == 1 %}

<h1 style="color: red">Oops! You have DIABETES.</h1>

<p style="text-align: center"><img class="gif" src="{{ url_for('static', filename='diabetes.webp') }}" alt="Diabetes Image"></p>

{% elif prediction == 0 %}

<h1 style="color: red">Hurrah!!! You DON'T have diabetes.</h1>

<p style="text-align: center"><img class="gif1" src="{{ url_for('static', filename='no-diabetes.webp') }}" alt="Not Diabetes Image"></p>

{% endif %}

</div>

</body>

</html>

1.7 METHODOLOGY

The purpose of the project is to help doctors detect diabetes early so that it can be treated in time. To execute this project, we completed these nine steps:

1. Learnt about diabetes by reading research papers.
2. Understood the problem posed by this disease.
3. Learnt about the existing model and its disadvantages.
4. Gained knowledge about different algorithms.
5. Selected the desired and efficient algorithm.
6. Developed an action plan.
7. Collected the data to implement.
8. Presented key findings and recommendations.
9. Submitted the final report.
Tools used: Visual Studio, Jupyter notebooks in Visual Studio
Technology used: Flask, Python 3.11.1
Libraries: NumPy, pandas, scikit-learn, Matplotlib, seaborn
