Documentation Code
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Loading the dataset (file name as used in the training script later in this document)
dataset = pd.read_csv('diabetes.csv')
dataset.head()
o/p:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1
dataset.shape
o/p
(768, 9)
# Features data-type
dataset.info()
# Count of NaN values per feature
dataset.isnull().sum()
o/p: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
Observations:
1. There are a total of 768 records and 9 features in the dataset.
2. Each feature is of either integer or float data type.
3. Some features, such as Glucose, BloodPressure, Insulin and BMI, contain
zero values, which represent missing data.
4. There are zero NaN values in the dataset.
5. In the outcome column, 1 represents diabetes positive and 0 represents
diabetes negative.
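Observation 3 can be checked directly. The sketch below is a minimal illustration, assuming a small hand-made DataFrame (`toy`) in place of the real dataset and a hypothetical list `zero_as_missing` naming the columns where a zero cannot be a real measurement. It marks those zeros as NaN so that `isnull().sum()` reports them.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the diabetes dataset (values are illustrative, not real records)
toy = pd.DataFrame({
    "Pregnancies": [6, 1, 0],
    "Glucose": [148, 0, 137],
    "Insulin": [0, 0, 168],
    "BMI": [33.6, 26.6, 0.0],
    "Outcome": [1, 0, 1],
})

# Hypothetical helper list: columns in which a zero cannot be a real measurement
zero_as_missing = ["Glucose", "Insulin", "BMI"]

# Work on a copy so the raw data stays untouched
cleaned = toy.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)

# Number of missing values per column after the replacement
missing_counts = cleaned.isnull().sum()
print(missing_counts)
```

Zeros in Pregnancies and Outcome are left alone because they are legitimate values there.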
Heatmap
plt.show()
Observations:
1. The countplot shows that the dataset is imbalanced: patients who do not
have diabetes outnumber those who do.
2. From the correlation heatmap, we can see that Outcome is most strongly
correlated with Glucose, BMI, Age and Insulin. We can select these
features to accept input from the user and predict the outcome.
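The numbers behind a countplot and a correlation heatmap can also be inspected without plotting. The following is a rough sketch on made-up data (the `toy` frame and its values are assumptions, not the project's dataset); it counts the records per class and ranks the features by their Pearson correlation with Outcome.

```python
import pandas as pd

# Made-up records for illustration only
toy = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "Age": [50, 31, 32, 21, 33, 30],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Class balance: number of records per outcome label
class_counts = toy["Outcome"].value_counts()

# Pearson correlation of every feature with the target, strongest first
corr_with_outcome = (
    toy.corr()["Outcome"].drop("Outcome").sort_values(ascending=False)
)

print(class_counts)
print(corr_with_outcome)
```

On the real dataset, the same two calls would show how uneven the classes are and which features are worth keeping as model inputs.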
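dataset_new = dataset
# Replacing zero values with NaN (this step is implied by the NaN counts that follow)
dataset_new[["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]] = dataset_new[["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]].replace(0, np.nan)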
dataset_new.isnull().sum()
o/p: Pregnancies 0
Glucose 5
BloodPressure 35
SkinThickness 227
Insulin 374
BMI 11
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
# Replacing NaN with mean values (assumption: all five columns are imputed the same way; only two appear in the listing)
for col in ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]:
    dataset_new[col] = dataset_new[col].fillna(dataset_new[col].mean())
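from sklearn.preprocessing import MinMaxScaler  # assumption: the scaler type is not shown in the listing
sc = MinMaxScaler()
dataset_scaled = sc.fit_transform(dataset_new)
dataset_scaled = pd.DataFrame(dataset_scaled)
X = dataset_scaled.iloc[:, [1, 4, 5, 7]].values  # assumption: Glucose, Insulin, BMI, Age
Y = dataset_scaled.iloc[:, 8].values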
# Splitting X and Y
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20, random_state = 42, stratify = dataset_new['Outcome'])
# Checking dimensions
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(random_state = 42)
logreg.fit(X_train, Y_train)
o/p: LogisticRegression(random_state=42)
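# Finding the best number of neighbours for KNN
# (assumption: the loop and the classifier construction are missing from the listing)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
x = range(1, 31)
acc = []
for k in x:
    knn_model = KNeighborsClassifier(n_neighbors = k)
    knn_model.fit(X_train, Y_train)
    prediction = knn_model.predict(X_test)
    acc.append(accuracy_score(Y_test, prediction))
plt.plot(x, acc)
plt.xticks(x)
plt.xlabel("n_neighbors")
plt.ylabel("Accuracy")
plt.grid()
plt.show()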
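from sklearn.svm import SVC
svc = SVC(kernel = 'linear', random_state = 42)
svc.fit(X_train, Y_train)
o/p: SVC(kernel='linear', random_state=42)
from sklearn.ensemble import RandomForestClassifier
ranfor = RandomForestClassifier(n_estimators = 11, criterion = 'entropy', random_state = 42)
ranfor.fit(X_train, Y_train)
o/p: RandomForestClassifier(criterion='entropy', n_estimators=11, random_state=42)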
Y_pred_svc = svc.predict(X_test)
Y_pred_ranfor = ranfor.predict(X_test)
From the above comparison, we can observe that the Random Forest algorithm
achieves the highest accuracy, 75.97%.
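A comparison of this kind can be sketched as follows. The labels and the model names `model_a` and `model_b` are invented for illustration and are not the project's results; the sketch only shows how `accuracy_score` from scikit-learn turns predictions into a percentage and how the best model is picked.

```python
from sklearn.metrics import accuracy_score

# Illustrative labels only; these are not the project's test-set results
y_true = [1, 0, 0, 1, 0, 1, 0, 0]

# Hypothetical predictions from two hypothetical models
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 0],
    "model_b": [1, 1, 0, 0, 0, 0, 1, 0],
}

# Accuracy = correctly classified samples / all samples, expressed in percent
accuracy = {
    name: accuracy_score(y_true, y_pred) * 100
    for name, y_pred in predictions.items()
}

# The model with the highest accuracy
best_model = max(accuracy, key=accuracy.get)
print(accuracy, best_model)
```

Because the dataset is imbalanced, accuracy alone can flatter a model that favours the majority class, so a confusion matrix is a useful companion check.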
import numpy as np
import pandas as pd
import pickle
df = pd.read_csv('diabetes.csv')
df = df.rename(columns={'DiabetesPedigreeFunction':'DPF'})
df_copy = df.copy(deep=True)
# Replacing zero values with NaN in the columns where zero is not a valid measurement
zero_cols = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI']
df_copy[zero_cols] = df_copy[zero_cols].replace(0, np.nan)
# Replacing NaN with the mean or median of each column
df_copy['Glucose'] = df_copy['Glucose'].fillna(df_copy['Glucose'].mean())
df_copy['BloodPressure'] = df_copy['BloodPressure'].fillna(df_copy['BloodPressure'].mean())
df_copy['SkinThickness'] = df_copy['SkinThickness'].fillna(df_copy['SkinThickness'].median())
df_copy['Insulin'] = df_copy['Insulin'].fillna(df_copy['Insulin'].median())
df_copy['BMI'] = df_copy['BMI'].fillna(df_copy['BMI'].median())
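# Model Building
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
X = df_copy.drop(columns='Outcome')  # train on the cleaned copy
y = df_copy['Outcome']
# Assumption: the split parameters are missing from the listing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
classifier = RandomForestClassifier(n_estimators=20)
classifier.fit(X_train, y_train)
filename = 'diabetes-prediction-rfc-model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(classifier, f)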
Because the trained model is dumped with the pickle module, a binary file is generated.
Next, create app.py for the Flask web framework.
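The dump-and-load round trip can be tried in isolation. This is a minimal sketch using only the standard library; the dictionary `model_stub` stands in for the trained classifier and the temporary path is hypothetical, chosen so that no file is left behind.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier; any picklable Python object works the same way
model_stub = {"name": "rfc", "n_estimators": 20}

# Hypothetical temporary location, removed automatically at the end of the block
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialise the object to a binary file
    with open(path, "wb") as f:
        pickle.dump(model_stub, f)

    # Read the raw bytes to confirm the file is binary, not text
    with open(path, "rb") as f:
        raw = f.read()

    # Deserialise the object again, as a web app would do at start-up
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(type(raw).__name__, restored == model_stub)
```

A pickle file should only be loaded when it comes from a trusted source, because unpickling can execute arbitrary code.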
App.py
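from flask import Flask, render_template, request
import pickle
import numpy as np

# Load the Random Forest classifier saved by the training script
filename = 'diabetes-prediction-rfc-model.pkl'
with open(filename, 'rb') as f:
    classifier = pickle.load(f)

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    if request.method == 'POST':
        preg = int(request.form['pregnancies'])
        glucose = int(request.form['glucose'])
        bp = int(request.form['bloodpressure'])
        st = int(request.form['skinthickness'])
        insulin = int(request.form['insulin'])
        bmi = float(request.form['bmi'])
        dpf = float(request.form['dpf'])
        age = int(request.form['age'])
        # The feature order must match the column order used for training
        data = np.array([[preg, glucose, bp, st, insulin, bmi, dpf, age]])
        my_prediction = classifier.predict(data)
        # Assumption: the return statement is missing from the listing; result.html reads `prediction`
        return render_template('result.html', prediction=int(my_prediction[0]))

if __name__ == '__main__':
    app.run(debug=True)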
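The form parsing in predict() can be factored into a helper that fails with a clear message on bad input. The sketch below is an assumption-laden illustration: the field names follow the request.form keys in app.py, but the helper `form_to_features` and the `FIELDS` table are hypothetical and not part of the original code.

```python
import numpy as np

# Field names follow the request.form keys used in app.py;
# the table and the helper below are hypothetical additions
FIELDS = [
    ("pregnancies", int), ("glucose", int), ("bloodpressure", int),
    ("skinthickness", int), ("insulin", int), ("bmi", float),
    ("dpf", float), ("age", int),
]

def form_to_features(form):
    """Convert a mapping of form strings into a (1, 8) feature array."""
    values = []
    for name, cast in FIELDS:
        try:
            values.append(cast(form[name]))
        except (KeyError, ValueError) as exc:
            raise ValueError(f"invalid or missing field: {name}") from exc
    # A 2-D array with one row, the shape that predict() expects
    return np.array([values], dtype=float)

# Example input, shaped like the strings a browser form would submit
example_form = {
    "pregnancies": "2", "glucose": "120", "bloodpressure": "70",
    "skinthickness": "25", "insulin": "80", "bmi": "30.5",
    "dpf": "0.45", "age": "35",
}
features = form_to_features(example_form)
print(features.shape)
```

In a Flask route, `request.form` could be passed to such a helper directly, and a caught `ValueError` could be turned into an error page instead of a server error.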
Index.html
<!DOCTYPE html>
<html>
<!--From https://codepen.io/frytyler/pen/EGdtg-->
<head>
<meta charset="UTF-8">
<title>Diabetes Predictor</title>
<style>
@import url(https://fonts.googleapis.com/css?family=Open+Sans);
.btn { display: inline-block; *display: inline; *zoom: 1; padding: 4px 10px 4px; margin-bottom: 0;
font-size: 13px; line-height: 18px; color: #333333; text-align: center;
text-shadow: 0 1px 1px rgba(255, 255, 255, 0.75); vertical-align: middle; background-color: #f5f5f5;
background-image: -moz-linear-gradient(top, #ffffff, #e6e6e6);
background-image: -ms-linear-gradient(top, #ffffff, #e6e6e6);
background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#ffffff), to(#e6e6e6));
background-image: -webkit-linear-gradient(top, #ffffff, #e6e6e6);
background-image: -o-linear-gradient(top, #ffffff, #e6e6e6);
background-image: linear-gradient(top, #ffffff, #e6e6e6); background-repeat: repeat-x;
filter: progid:dximagetransform.microsoft.gradient(startColorstr=#ffffff, endColorstr=#e6e6e6, GradientType=0);
border-color: #e6e6e6 #e6e6e6 #e6e6e6;
border-color: rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.25); border: 1px solid #e6e6e6;
-webkit-border-radius: 4px; -moz-border-radius: 4px; border-radius: 4px;
-webkit-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
-moz-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
cursor: pointer; *margin-left: .3em; }
.btn-large { padding: 9px 14px; font-size: 15px; line-height: normal;
-webkit-border-radius: 5px; -moz-border-radius: 5px; border-radius: 5px; }
body {
width: 100%;
height:auto;
font-size: 18px;
text-align:center;
letter-spacing:1.2px;
background-image: url("../static/s.jpg");
}
.login {
text-align: center;
display: flex;
justify-content: center;
align-items: center;
margin-left: auto;
margin-right: auto;
margin-bottom: 50px;
}
h1 {
text-align: center;
color: white;
text-transform: uppercase;
font-size: 40px;
}
@keyframes bounceIn {
0% {
transform: scale(0.1);
opacity: 0;
}
60% {
transform: scale(1.2);
opacity: 1;
}
100% {
transform: scale(1);
}
}
input {
width: 500px;
margin-bottom: 10px;
background: rgba(0,0,0,0.7);
border: none;
outline: none;
padding: 15px;
font-size: 13px;
color: #fff;
border-radius: 20px;
}
input:hover{
background: rgba(0,0,0,1);
font-size: 15px;
}
input:focus {
}
</style>
</head>
<body>
<div class="login">
<form action="/predict" method="post">
</form>
</div>
</body>
</html>
Result.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Diabetes Predictor</title>
<style>
@import url(https://fonts.googleapis.com/css?family=Open+Sans);
.btn {
display: inline-block;
*display: inline;
*zoom: 1;
font-size: 13px;
line-height: 18px;
color: #333333;
text-align: center;
vertical-align: middle;
background-color: #f5f5f5;
background-repeat: repeat-x;
filter: progid:dximagetransform.microsoft.gradient(startColorstr=#ffffff,
endColorstr=#e6e6e6, GradientType=0);
-webkit-border-radius: 4px;
-moz-border-radius: 4px;
border-radius: 4px;
-webkit-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
-moz-box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05);
}
body {
width: 100%;
height:auto;
color: #fff;
font-size: 18px;
text-align:center;
letter-spacing:1.2px;
background-image: url("../static/wallpaper.jpg");
}
.results{
margin-top:150px;
}
</style>
</head>
<body>
<!-- Result -->
<div class="results">
{% if prediction==1 %}
{% elif prediction==0 %}
{% endif %}
</div>
</body>
</html>
1.7 METHODOLOGY
The purpose of the project is to help doctors detect Parkinson’s disease
early so that it can be treated. To carry out this project, we completed the
following nine steps: