
DIABETES DISEASE PREDICTION USING

MACHINE LEARNING ALGORITHMS

A Minor-Project Report
Submitted in partial fulfillment of the requirements for the award of the
degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by
K. VISHNU VARDHAN REDDY- (21BT5A0508)
D.SUDHEER PATNAIK-(20BT1A0507)
CH. NAVEEN- (21BT5A0502)
S. ANVESH NAIDU -(20BT1A0525)
M.AKHIL-(21BT5A0509)

Under the Guidance of


Mr. JAIPAL
Assistant Professor (CSE)

VISVESVARAYA COLLEGE OF ENGINEERING & TECHNOLOGY


Approved by AICTE, New Delhi & Govt. of T.S., Accredited with NAAC 'A' Grade, Affiliated to JNTUH, Hyderabad
Sponsored by: Jawaharlal Educational Society, an ISO 9001: 2018 and ISO 14001: 2015 Certified Institution
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the Mini-project entitled “DIABETES DISEASE PREDICTION
USING MACHINE LEARNING ALGORITHMS” submitted by
K. VISHNU VARDHAN REDDY (21BT5A0508), D. SUDHEER PATNAIK (20BT1A0507),
CH. NAVEEN (21BT5A0502), S. ANVESH NAIDU (20BT1A0525), M. AKHIL (21BT5A0509)
in partial fulfilment for the award of BACHELOR OF TECHNOLOGY IN
COMPUTER SCIENCE AND ENGINEERING of Jawaharlal Nehru Technological
University Hyderabad, during the academic year 2021-2024, is a record of bonafide work
carried out by them under my guidance and supervision. The contents of this project
have not been submitted and will not be submitted, either in part or in full, for the award
of any other degree or diploma in this institute.

PROJECT GUIDE HEAD OF THE DEPARTMENT


Mr. JAIPAL Mrs. RAMYASRI
M. Tech M. Tech

Internal Examiner External Examiner


DECLARATION

We hereby declare that the Mini-project report entitled “DIABETES DISEASE PREDICTION
USING MACHINE LEARNING ALGORITHMS” submitted by K. VISHNU VARDHAN REDDY
(21BT5A0508), D. SUDHEER PATNAIK (20BT1A0507), CH. NAVEEN (21BT5A0502),
S. ANVESH NAIDU (20BT1A0525), M. AKHIL (21BT5A0509) to VCET, JNTUH in partial
fulfilment of the award of the degree of Bachelor of Technology in Computer Science and
Engineering is a record of bonafide project work carried out by us under
the guidance of Mr. JAIPAL. We further declare that the work reported has not been
submitted and will not be submitted, either in part or in full, for the award of any degree
or diploma in this institute or any other institute or university.

Place:

Date:

Signature of the candidates


K. VISHNU VARDHAN REDDY
D.SUDHEER PATNAIK
CH. NAVEEN
S. ANVESH NAIDU
M. AKHIL
ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the report of the project undertaken
during B. Tech. We would like to express our special thanks to our Principal Dr.
D. RAMESH for his moral support, and to the College Management of Visvesvaraya
College of Engineering & Technology, Hyderabad, for providing us the infrastructure
to complete the project.

We thank Mrs. RAMYASRI, Head of the Department of Computer Science &
Engineering, for her constant support and cooperation.

We owe a special debt of gratitude to our internal guide Mr. JAIPAL, Assistant
Professor, Visvesvaraya College of Engineering & Technology, Hyderabad, for his
guidance throughout the course of our work. It is only through his cognizant efforts that
our endeavours have seen the light of day.

We do not want to miss the opportunity to acknowledge the contribution of all
faculty members of the department for their kind assistance and cooperation
during the development of our project. Last but not least, we acknowledge our
friends for their contribution to the completion of the project.

Submitted by

-K. Vishnu Vardhan Reddy

- D. Sudheer Patnaik

-CH. Naveen

-S. Anvesh Naidu

-M.Akhil
ABSTRACT

This report deals with the prediction of diabetes disease by performing an
analysis of five supervised machine learning algorithms: K-Nearest Neighbours, Naïve
Bayes, Decision Tree Classifier, Random Forest and Support Vector Machine. Further, by
incorporating all the risk factors present in the dataset, we observed a stable accuracy
after classifying and performing cross-validation. We achieved a stable, highest
accuracy of 76% with the KNN classifier, and all remaining classifiers also gave a
stable accuracy above 70%. We analysed why specific machine learning classifiers
do not yield stable and good accuracy by visualising the training and testing accuracy and
examining model overfitting and underfitting. The main goal of this work is to
find the most optimal results in terms of accuracy and computational time for diabetes
disease prediction.
TABLE OF CONTENTS
S.NO CONTENT PAGE NO

CHAPTER 1 INTRODUCTION 1

CHAPTER 2 LITERATURE SURVEY 3

2.1 EXISTING SYSTEM 3

2.2 DISADVANTAGES OF EXISTING SYSTEM 3

2.3 PROPOSED SYSTEM 4

2.4 ADVANTAGES OF PROPOSED SYSTEM 5

2.5 FEASIBILITY STUDY 5

2.5.1 OPERATIONAL FEASIBILITY 5

2.5.2 ECONOMIC FEASIBILITY 5

2.5.3 TECHNICAL FEASIBILITY 6

CHAPTER 3 SYSTEM ANALYSIS 7

3.1 HARDWARE REQUIREMENTS 7

3.2 SOFTWARE REQUIREMENTS 7

3.3 MODULES 8

CHAPTER 4 ALGORITHMS 9

4.1 K-NEAREST NEIGHBOURS (KNN) 9

4.2 NAIVE BAYES 10

4.3 RANDOM FOREST 11

4.4 DECISION TREE CLASSIFIERS 11

4.5 SUPPORT VECTOR MACHINE 12

CHAPTER 5 DESIGN 13

5.1 ARCHITECTURE DIAGRAM 13


5.2 DATA FLOW DIAGRAM 14

5.3 CLASS DIAGRAM 15

5.4 FLOW CHART DIAGRAMS 16

5.5 USE CASE DIAGRAM 18

5.6 SEQUENCE DIAGRAM 19

CHAPTER 6 IMPLEMENTATION 20

CHAPTER 7 TESTING 26

7.1 TESTING METHODOLOGIES 26

7.1.1 UNIT TESTING 26

7.1.2 INTEGRATION TESTING 27

7.1.3 USER ACCEPTANCE TESTING 28

7.1.4 OUTPUT TESTING 28

7.1.5 VALIDATION CHECKING 28

7.2 USER TRAINING 30

7.3 MAINTENANCE 31

7.4 BLACK BOX TESTING 31

7.5 WHITE BOX TESTING 31

7.6 TESTING STRATEGY 32

CHAPTER 8 EXECUTION SLIDES 33


8.1-8.10

CHAPTER 9 CONCLUSION 38

CHAPTER 10 FUTURE SCOPE 39

CHAPTER 11 BIBLIOGRAPHY 40
LIST OF FIGURES

FIGURE NO NAME OF THE FIGURE PAGE NO
2.3 DIFFERENT PHASES OF OUR EXPERIMENT 4

5.1 ARCHITECTURE DIAGRAM 13

5.2 DATA FLOW DIAGRAM 14

5.3 CLASS DIAGRAM 15

5.4 FLOW CHART DIAGRAMS 16

5.5 USE CASE DIAGRAM 18

5.6 SEQUENCE DIAGRAM 19

8.1 STARTING THE SERVER IN XAMPP CONTROL PANEL 33

8.2 USERS, SERVICE PROVIDERS LOGIN AND REGISTRATION 33
8.3 USER REGISTRATION PAGE 34

8.4 USER DATASET UPLOADING PAGE 34

8.5 SERVICE PROVIDER LOGIN PAGE 35

8.6 DIABETES STATUS FROM DATASET DETAILS 35

8.7 VIEW TRAINED AND TESTED DATA IN PIE CHART 36

8.8 VIEW TRAINED AND TESTED DATA IN BAR CHART 36
8.9 ALGORITHMS ACCURACY 37

8.10 VIEW ALL REMOTE USERS 37

10 FUTURE SCOPE 39
CHAPTER 1
INTRODUCTION

In this day and age, one of the most notorious diseases to have taken the world by
storm is diabetes, a disease which causes an increase in blood glucose levels as
a result of the absence or low levels of insulin. Due to the many criteria to be taken into
consideration for an individual to harbour this disease, its detection and prediction can
be tedious or sometimes inconclusive. Nevertheless, it is not impossible to detect it, even
at an early stage. According to the International Diabetes Federation (IDF), 79% of the
adult population with diabetes were living in low- and middle-income countries, and it is
estimated that by the year 2045 approximately 700 million people will have diabetes (IDF).

Diabetes is increasing day by day across the world because of environmental and genetic
factors. The numbers are rising rapidly due to several factors, including unhealthy
diets, physical inactivity and many more. Diabetes is a hormonal disorder in which the
inability of the body to produce insulin causes the metabolism of sugar in the body to be
abnormal, thereby raising the blood glucose levels of a particular individual.
Intense hunger, thirst and frequent urination are some of the observable characteristics.
Certain risk factors such as age, BMI, glucose level, blood pressure, etc., play an
important role in the development of the disease.

At present, the number of cases is rising every year, with no slowdown in active cases.
This is a serious concern, as diabetes has become one of the most dangerous and
fastest-growing diseases, taking the lives of many individuals around the globe.

Machine Learning is very popular these days as it is used everywhere a
large amount of data is present and we need to extract some knowledge from it. Generally, we can
categorise machine learning algorithms into two types (but not limited to):
• Unsupervised Learning: In unsupervised learning, the data is not labelled
and no labels are used in training. Here, we simply put the data in action to find patterns, if possible.

• Supervised Learning: In supervised learning, we train the model based on the labels
attached to the data, and based on that we classify or test new data against those labels.

With the rise of machine learning and its related algorithms, it has come to light
that the significant problems and hindrances faced earlier in detection can now be
eased with much simplicity while still giving a detailed and accurate outcome. In the
modern day, machine learning has become even more effective
and helpful in collaboration with the domain of medicine. Early determination of a
disease can be made possible through machine learning by studying the characteristics of
an individual. Such early attempts can lead to inhibition of the disease and prevent
it from reaching a critical stage. The work described in
this report is to perform diabetes disease prediction using machine learning algorithms
for early care of an individual.

CHAPTER 2
LITERATURE SURVEY

2.1 EXISTING SYSTEM

In previous work, the authors used the WEKA tool for data analytics for diabetes disease
prediction on big healthcare data. They used a publicly available dataset from UCI
and applied different machine learning classifiers on it. The classifiers they
incorporated are Naive Bayes, Support Vector Machine, Random Forest and Simple
CART.
Their approach starts with accessing the dataset, preprocessing it in the WEKA tool and
then performing a 70:30 train/test split before applying the different machine learning
algorithms. They did not perform the cross-validation step, even though it is important
for obtaining optimal and accurate results.
Other authors also used the publicly available dataset named the Pima Indians
Diabetes Database for performing their experiment. Their framework for performing the
prediction starts with dataset selection and then data pre-processing. Once the
data was preprocessed, they applied three classification algorithms, i.e., Naïve Bayes,
SVM and Decision Tree. As they incorporated different evaluation metrics, they
compared the different performance measures and comparatively analysed the accuracy.
The highest accuracy achieved in their experiment was 76.30%. Likewise, they also did
not practise cross-validation.
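For illustration, the sketch below contrasts the single 70:30 hold-out split used in the surveyed work with stratified cross-validation. It is a minimal sketch, not the surveyed authors' code; the file name "diabetes.csv" and the "Outcome" label column are assumptions based on the standard Pima dataset layout.

# Minimal sketch (assumed file/column names): 70:30 hold-out vs. stratified k-fold CV.
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

df = pd.read_csv("diabetes.csv")              # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Single 70:30 hold-out estimate (what the surveyed work reports).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
holdout_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

# Stratified 5-fold cross-validation gives a more stable estimate of accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=cv)

print(f"70:30 hold-out accuracy: {holdout_acc:.3f}")
print(f"5-fold CV accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")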

2.2 DISADVANTAGES OF EXISTING SYSTEM

1) The existing system provides no techniques and models for analysing large-scale datasets.
2) The existing system has no diabetes dataset built in collaboration with a hospital or a
medical institute, which could help achieve better results.

2.3 PROPOSED SYSTEM

To perform our experiment, we have used a publicly available dataset named the Pima
Indians Diabetes Database [4]. This dataset includes various diagnostic measures of
diabetes disease. The dataset was originally from the National Institute of Diabetes and
Digestive and Kidney Diseases. All the recorded instances are of patients above 21 years
of age. Our proposed model consists of 5 phases, which are shown in the following figure.

Fig 2.3 Different phases of our Experiment

2.4 ADVANTAGES OF PROPOSED SYSTEM

➢ The system is more effective because the dataset is fitted to different ML models by
applying machine learning algorithms.
➢ Early determination of a disease can be made possible through machine
learning by studying the characteristics of an individual in the proposed system.

2.5. FEASIBILITY STUDY

An important outcome of preliminary investigation is the determination that the


system request is feasible. This is possible only if it is feasible within limited resources
and time. The different feasibilities that have to be analysed are:
• Operational Feasibility
• Economic Feasibility
• Technical Feasibility

2.5.1 Operational Feasibility-
Operational feasibility deals with the study of the prospects of the system to be
developed. This system operationally eliminates all the tensions of the admin and helps
him in effectively tracking the project progress. This kind of automation will surely
reduce the time and energy which were previously consumed in manual work. Based on
this study, the system is proved to be operationally feasible.

2.5.2 Economic Feasibility-

Economic feasibility, or cost-benefit analysis, is an assessment of the economic
justification for a computer-based project. As the hardware was installed from the beginning
and serves many purposes, the hardware cost of the project is low. Since the system is
network based, any number of employees connected to the LAN within the organization
can use this tool at any time. The system is to be developed using
the existing resources of the organization. So, the project is economically feasible.

2.5.3 Technical Feasibility


According to Roger S. Pressman, technical feasibility is the assessment of the
technical resources of the organization. The organization needs IBM-compatible
machines with a graphical web browser connected to the Internet and intranet. The system
is developed for a platform-independent environment. Java Server Pages, JavaScript,
HTML, SQL Server and WebLogic Server are used to develop the system. The technical
feasibility study has been carried out, and the system is technically feasible for development.

CHAPTER 3
SYSTEM ANALYSIS

3.1 HARDWARE REQUIREMENTS-

➢ Processor - Pentium –IV


➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Key Board - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

3.2 SOFTWARE REQUIREMENTS-

❖ Operating system : Windows 7 Ultimate.

❖ Coding Language : Python.

❖ Front-End : Python.

❖ Back-End : Django-ORM

❖ Designing : HTML, CSS, JavaScript.

❖ Data Base : MySQL (WAMP Server).

3.3 MODULES

(i)Service Provider-

In this module, the Service Provider has to log in using a valid username
and password. After a successful login, he can perform operations such as:
Train and Test Data Sets, View Trained and Tested Accuracy in Bar Chart, View
Trained and Tested Accuracy Results, Find Diabetic Status from Data Set Details,
Find Diabetic Ratio on Data Sets, View All Emergency for Diabetic Treatment,
Download Trained Data Sets, View Diabetic Ratio Results, and View All Remote
Users.
View and Authorize Users:
In this module, the admin can view the list of users who have registered. The
admin can view user details such as username, email and address, and the
admin authorizes the users.

(ii)Remote User-

In this module, n number of users are present. A user should
register before performing any operations. Once a user registers, their details are
stored in the database. After successful registration, the user has to log in using the
authorized username and password. Once login is successful, the user can perform
operations such as Post Diabetic Data Sets, Search and Predict Diabetic Status,
and View Your Profile.

CHAPTER 4
ALGORITHMS

The five supervised machine learning algorithms used in our project are:
1) K-NEAREST NEIGHBOURS (KNN)
2) NAIVE BAYES
3) RANDOM FOREST
4) DECISION TREE CLASSIFIERS
5) SUPPORT VECTOR MACHINE

4.1 K-NEAREST NEIGHBOURS (KNN)-

K-Nearest Neighbours (KNN) is a simple and intuitive machine
learning algorithm used for both classification and regression tasks. It is a type of
instance-based learning, also known as lazy learning, as it doesn't create a model during
the training phase. Instead, it memorizes the training data and makes predictions based
on the similarity of new instances to known instances.
Here's a breakdown of how the KNN algorithm works:
Initialization:
Store the training dataset. Each data point in the dataset is associated with a class
label (in the case of classification) or a numerical value (in the case of regression).
Input Data:
When a new, unseen data point is given for prediction, the algorithm identifies
its k-nearest neighbors from the training dataset.
Distance Metric:
The distance between data points is typically measured using metrics such as
Euclidean distance, Manhattan distance, or other distance measures, depending on the
nature of the data.
Voting (for Classification) or Averaging (for Regression):
For classification, the algorithm counts the occurrences of each class among the
k-nearest neighbors and assigns the class label with the majority vote to the new data
point.

For regression, it calculates the average of the target values of the k-nearest
neighbors and assigns this average as the predicted value for the new data point.
Choice of 'k':
The value of 'k' is a crucial parameter that needs to be specified. It represents the
number of nearest neighbors to consider. A small 'k' may lead to noisy predictions,
while a large 'k' might lead to oversmoothed predictions.
Key Characteristics:
• KNN is a non-parametric algorithm, meaning it doesn't make any assumptions
about the underlying data distribution.
• It's sensitive to outliers in the data.
• The computational cost of making predictions can be high, especially for large
datasets.
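As an illustration of the points above, particularly the choice of 'k', the following sketch fits KNN classifiers with several values of k on a Pima-style dataset. The file name "diabetes.csv" and column names are assumptions; this is not the project's exact code.

# Hedged sketch (assumed file/columns): effect of 'k' on KNN accuracy.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("diabetes.csv")                      # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Feature scaling matters for distance-based methods such as KNN.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

for k in (1, 5, 11, 21):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr_s, y_tr)
    # Small k tends to give noisy (overfit) predictions; large k oversmooths them.
    print(f"k={k:2d}  train={knn.score(X_tr_s, y_tr):.3f}  test={knn.score(X_te_s, y_te):.3f}")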

4.2 NAIVE BAYES-

The naive Bayes approach is a supervised learning method based on a
simplistic hypothesis: it assumes that the presence (or absence) of a particular feature of
a class is unrelated to the presence (or absence) of any other feature.
Yet, despite this, it appears robust and efficient. Its performance is comparable to
other supervised learning techniques. Various reasons have been advanced in the
literature; here we highlight an explanation based on the representation bias.
The naive Bayes classifier is a linear classifier, as are linear discriminant analysis,
logistic regression and the linear SVM (support vector machine). The difference lies in the
method of estimating the parameters of the classifier (the learning bias).

While the naive Bayes classifier is widely used in the research world, it is not
widespread among practitioners who want to obtain usable results. On the one hand,
researchers find it very easy to program and implement, its parameters
are easy to estimate, learning is very fast even on very large databases, and its accuracy is
reasonably good in comparison to other approaches. On the other hand, the final users
do not obtain a model that is easy to interpret and deploy, and they do not understand the
interest of such a technique.

Thus, a new presentation of the results of the learning process can be introduced, so that
the classifier is easier to understand and its deployment is also made easier. Some
theoretical aspects of the naive Bayes classifier are presented first, and the approach is
then implemented on a dataset with Tanagra. The obtained results (the parameters of the
model) are compared to those obtained with other linear approaches such as logistic
regression, linear discriminant analysis and the linear SVM. We note that the results are
highly consistent, which largely explains the good performance of the method in
comparison to others. Various tools can also be used on the same dataset (Weka 3.6.0,
R 2.9.2, Knime 2.1.1, Orange 2.0b and RapidMiner 4.6.0), above all to understand the
obtained results.
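A minimal Gaussian naive Bayes sketch on the same assumed Pima-style dataset is shown below; the file and column names are assumptions, and the project itself uses its own training pipeline.

# Hedged sketch (assumed file/columns): Gaussian naive Bayes on a Pima-style dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("diabetes.csv")                      # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Each feature is treated as conditionally independent given the class.
nb = GaussianNB().fit(X_tr, y_tr)
print("Naive Bayes test accuracy:", round(nb.score(X_te, y_te), 3))
# Class-conditional feature means: the learned "parameters" discussed above.
print(nb.theta_)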

4.3 RANDOM FOREST-

Random forests or random decision forests are an ensemble learning method for
classification, regression and other tasks that operates by constructing a multitude of
decision trees at training time. For classification tasks, the output of the random forest is
the class selected by most trees. For regression tasks, the mean or average prediction of
the individual trees is returned. Random decision forests correct for decision trees’ habit
of overfitting to their training set. Random forests generally outperform decision trees,
but their accuracy is lower than gradient boosted trees. However, data characteristics can
affect their performance.
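A hedged random forest sketch on the same assumed dataset follows; it also reports which risk factors the ensemble finds most informative. File and column names are assumptions.

# Hedged sketch (assumed file/columns): random forest on a Pima-style dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("diabetes.csv")                      # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# The forest averages many decorrelated trees to curb single-tree overfitting.
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
print("Random forest test accuracy:", round(rf.score(X_te, y_te), 3))

# Impurity-based importances hint at which risk factors drive the predictions.
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:<26s} {imp:.3f}")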

4.4 DECISION TREE CLASSIFIERS-

Decision tree classifiers are used successfully in many diverse areas. Their most
important feature is the capability of capturing descriptive decision-making knowledge
from the supplied data. A decision tree can be generated from training sets. The procedure
for such generation, based on a set of objects (S), each belonging to one of the classes
C1, C2, …, Ck, is as follows:

Step 1. If all the objects in S belong to the same class, for example Ci, the decision tree
for S consists of a leaf labeled with this class
Step 2. Otherwise, let T be some test with possible outcomes O1, O2…, On. Each object
in S has one outcome for T so the test partitions S into subsets S1, S2… Sn where each
object in Si has outcome Oi for T. T becomes the root of the decision tree and for each
outcome Oi, we build a subsidiary decision tree by invoking the same procedure
recursively on the set Si.
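The two steps above can be sketched directly in code. The sketch below is a simplified illustration for categorical attributes and a naive choice of test; it is not the classifier used in the project (which relies on scikit-learn's DecisionTreeClassifier), and the example data are made up.

# Hedged sketch of the recursive tree-growing procedure described in Steps 1-2.
def build_tree(objects, attributes):
    """objects: list of (features_dict, class_label); attributes: names still usable as tests."""
    classes = {label for _, label in objects}
    # Step 1: all objects in S belong to the same class -> a leaf labelled with that class.
    if len(classes) == 1:
        return classes.pop()
    if not attributes:                       # no test left: fall back to the majority class
        labels = [label for _, label in objects]
        return max(set(labels), key=labels.count)
    # Step 2: pick a test T, partition S by its outcomes, and recurse on each subset Si.
    test = attributes[0]                     # naive choice; ID3/C4.5 would use information gain
    tree = {"test": test, "branches": {}}
    outcomes = {feats[test] for feats, _ in objects}
    for outcome in outcomes:
        subset = [(f, c) for f, c in objects if f[test] == outcome]
        tree["branches"][outcome] = build_tree(subset, attributes[1:])
    return tree

# Tiny worked example with two categorical risk factors (values are made up).
data = [({"glucose": "high", "bmi": "high"}, "diabetic"),
        ({"glucose": "high", "bmi": "normal"}, "diabetic"),
        ({"glucose": "normal", "bmi": "high"}, "healthy"),
        ({"glucose": "normal", "bmi": "normal"}, "healthy")]
print(build_tree(data, ["glucose", "bmi"]))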

4.5 SUPPORT VECTOR MACHINE-

In classification tasks a discriminant machine learning technique aims at finding,


based on an independent and identically distributed (iid) training dataset, a discriminant
function that can correctly predict labels for newly acquired instances. Unlike generative
machine learning approaches, which require computations of conditional probability
distributions, a discriminant classification function takes a data point x and assigns it to
one of the different classes that are a part of the classification task. Less powerful than
generative approaches, which are mostly used when prediction involves outlier detection,
discriminant approaches require fewer computational resources and less training data,
especially for a multidimensional feature space and when only posterior probabilities are
needed. From a geometric perspective, learning a classifier is equivalent to finding the
equation for a multidimensional surface that best separates the different classes in the
feature space.

SVM is a discriminant technique and, because it solves the convex optimization
problem analytically, it always returns the same optimal hyperplane parameters, in
contrast to genetic algorithms (GAs) or perceptrons, both of which are widely used for
classification in machine learning. For a specific kernel that transforms the data from the
input space to the feature space, training returns uniquely defined SVM model parameters
for a given training set, whereas the perceptron and GA classifier models are different
each time training is initialized. The aim of GAs and perceptrons is only to minimize error
during training, which translates into several hyperplanes meeting this requirement.
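A hedged SVC sketch on the same assumed dataset is given below; scaling plus an RBF kernel is one common configuration, not necessarily the project's exact one, and the file and column names are assumptions.

# Hedged sketch (assumed file/columns): support vector classification on a Pima-style dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("diabetes.csv")                      # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# The RBF kernel maps the data to a feature space where a separating hyperplane is sought;
# the same training set always yields the same hyperplane (convex optimization).
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_tr, y_tr)
print("SVM test accuracy:", round(svm.score(X_te, y_te), 3))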

CHAPTER 5

DESIGN

5.1 ARCHITECTURE DIAGRAM-

Fig 5.1 Architecture Diagram

5.2 DATA FLOW DIAGRAM-

Fig 5.2 Data flow diagram

5.3 CLASS DIAGRAMS-

Fig 5.3 Class diagram

5.4 FLOW CHART DIAGRAMS-

(i)Remote users-

Fig 5.4.1 Flow chart of Remote users

(ii)Servicer provider-

Fig 5.4.2 Flow chart of Service Provider

5.5 USE CASE DIAGRAM-

Fig 5.5 Use Case Diagram

5.6 SEQUENCE DIAGRAM-

Fig 5.6 Sequence Diagram

CHAPTER 6

IMPLEMENTATION

from django.db.models import Count


from django.db.models import Q
from django.shortcuts import render, redirect, get_object_or_404
import datetime
import openpyxl
# Create your views here.
from Remote_User.models import (ClientRegister_Model, diabetes_disease_model,
                                diabetes_disease_prediction, detection_results_model)
def login(request):
    if request.method == "POST" and 'submit1' in request.POST:
        username = request.POST.get('username')
        password = request.POST.get('password')
        try:
            enter = ClientRegister_Model.objects.get(username=username, password=password)
            request.session["userid"] = enter.id
            return redirect('Add_DataSet_Details')
        except Exception:
            # invalid credentials fall through to the login page
            pass
    return render(request, 'RUser/login.html')
def Add_DataSet_Details(request):
    if "GET" == request.method:
        return render(request, 'RUser/Add_DataSet_Details.html', {})
    else:
        excel_file = request.FILES["excel_file"]
        # you may put validations here to check extension or file size
        wb = openpyxl.load_workbook(excel_file)
        # Reconstruction of the loop elided in the report: iterate over the active sheet
        # row by row (row 1 is assumed to be the header) and store each record.
        active_sheet = wb.active
        excel_data = []
        for r in range(2, active_sheet.max_row + 1):
            excel_data.append([active_sheet.cell(r, c).value for c in range(1, 9)])
            diabetes_disease_model.objects.create(
                Pregnancies=active_sheet.cell(r, 1).value,
                Glucose=active_sheet.cell(r, 2).value,
                BloodPressure=active_sheet.cell(r, 3).value,
                SkinThickness=active_sheet.cell(r, 4).value,
                Insulin=active_sheet.cell(r, 5).value,
                BMI=active_sheet.cell(r, 6).value,
                DiabetesPedigreeFunction=active_sheet.cell(r, 7).value,
                Age=active_sheet.cell(r, 8).value
            )
        return render(request, 'RUser/Add_DataSet_Details.html', {"excel_data": excel_data})
def Register1(request):
    if request.method == "POST":
        username = request.POST.get('username')
        email = request.POST.get('email')
        password = request.POST.get('password')
        phoneno = request.POST.get('phoneno')
        country = request.POST.get('country')
        state = request.POST.get('state')
        city = request.POST.get('city')
        ClientRegister_Model.objects.create(username=username, email=email,
                                            password=password, phoneno=phoneno,
                                            country=country, state=state, city=city)
        return render(request, 'RUser/Register1.html')
    else:
        return render(request, 'RUser/Register1.html')
def ViewYourProfile(request):
    userid = request.session['userid']
    obj = ClientRegister_Model.objects.get(id=userid)
    return render(request, 'RUser/ViewYourProfile.html', {'object': obj})
def Search_Predict_Diabetic_DataSets(request):
    if request.method == "POST":
        kword = request.POST.get('keyword')
        print(kword)
        obj = diabetes_disease_prediction.objects.all().filter(Prediction__contains=kword)
        return render(request, 'RUser/Search_Predict_Diabetic_DataSets.html', {'objs': obj})
    return render(request, 'RUser/Search_Predict_Diabetic_DataSets.html')

from django.db.models import Count, Avg
from django.db.models import Q
from django.shortcuts import render, redirect
import datetime
import xlwt
from django.http import HttpResponse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statistics import mean, stdev
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import confusion_matrix, accuracy_score, mean_squared_error
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
import warnings

# Create your views here.

from Remote_User.models import (ClientRegister_Model, diabetes_disease_model,
                                diabetes_disease_prediction, detection_results_model,
                                detection_ratio_model)

def train_model(request):
    # Note: the report excerpts this view; the code that loads the dataset, fits the
    # classifiers and computes the per-record prediction ("type") and "status" is elided here.
    obj = ''
    diabetes_disease_prediction.objects.create(Pregnancies=Pregnancies,
                                               Glucose=Glucose,
                                               BloodPressure=BloodPressure,
                                               SkinThickness=SkinThickness,
                                               Insulin=Insulin,
                                               BMI=BMI,
                                               DiabetesPedigreeFunction=DiabetesPedigreeFunction,
                                               Age=Age,
                                               Prediction=type,
                                               Status=status)

    obj = diabetes_disease_prediction.objects.all()
    return render(request, 'SProvider/Find_Diabetic_Status_Details.html', {'list_objects': obj})

def likeschart(request, like_chart):
    charts = detection_results_model.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart.html", {'form': charts, 'like_chart': like_chart})

def Download_Trained_DataSets(request):
    response = HttpResponse(content_type='application/ms-excel')
    # decide file name
    response['Content-Disposition'] = 'attachment; filename="TrainedData.xls"'
    # creating workbook
    wb = xlwt.Workbook(encoding='utf-8')
    # adding sheet
    ws = wb.add_sheet("sheet1")
    # Sheet header, first row
    row_num = 0
    font_style = xlwt.XFStyle()
    # headers are bold
    font_style.font.bold = True
    # writer = csv.writer(response)
    obj = diabetes_disease_prediction.objects.all()
    data = obj  # dummy method to fetch data.
    for my_row in data:
        row_num = row_num + 1
        ws.write(row_num, 0, my_row.Pregnancies, font_style)
        ws.write(row_num, 1, my_row.Glucose, font_style)
        ws.write(row_num, 2, my_row.BloodPressure, font_style)
        ws.write(row_num, 3, my_row.SkinThickness, font_style)
        ws.write(row_num, 4, my_row.Insulin, font_style)
        ws.write(row_num, 5, my_row.BMI, font_style)
        ws.write(row_num, 6, my_row.DiabetesPedigreeFunction, font_style)
        ws.write(row_num, 7, my_row.Age, font_style)
        ws.write(row_num, 8, my_row.Prediction, font_style)
        ws.write(row_num, 9, my_row.Status, font_style)
    wb.save(response)
    return response

# Excerpt from the model-training view: train_model1 (defined elsewhere in the project)
# fits each classifier and records its accuracy.
models = {}
models['Logistic Regression'] = LogisticRegression(random_state=12345)
models['K Nearest Neighbour'] = KNeighborsClassifier()
models['Decision Tree'] = DecisionTreeClassifier(random_state=12345)
models['Random Forest'] = RandomForestClassifier(random_state=12345)
models['SVM'] = SVC(gamma='auto', random_state=12345)
models['XGB'] = GradientBoostingClassifier(random_state=12345)

for key, values in models.items():
    train_model1(key, values)

obj = detection_results_model.objects.all()
return render(request, 'SProvider/train_model.html', {'objs': obj})
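The imports above include StratifiedKFold and cross_val_score. A hedged sketch of how a train_model1-style helper might evaluate each classifier is given below; the helper name, dataset path and column names are assumptions, not the project's actual implementation.

# Hedged sketch (assumed names): stratified k-fold evaluation of one classifier.
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_model(name, estimator, X, y):
    """Return the mean cross-validated accuracy for one classifier."""
    pipe = make_pipeline(StandardScaler(), estimator)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring='accuracy')
    print(f"{name:<22s} {scores.mean():.3f} (std {scores.std():.3f})")
    return scores.mean()

# Usage sketch with the models dict defined above:
# df = pd.read_csv("diabetes.csv")                     # hypothetical file name
# X, y = df.drop(columns=["Outcome"]), df["Outcome"]
# for key, estimator in models.items():
#     evaluate_model(key, estimator, X, y)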

CHAPTER 7

TESTING

7.1 TESTING METHODOLOGIES-

o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.

7.1.1 UNIT TESTING-

Unit testing focuses verification effort on the smallest unit of software design,
that is, the module. Unit testing exercises specific paths in a module's control structure
to ensure complete coverage and maximum error detection. This test focuses on each
module individually, ensuring that it functions properly as a unit; hence the name,
unit testing.

During this testing, each module is tested individually and the module interfaces
are verified for consistency with the design specification. All important processing paths
are tested for the expected results. All error-handling paths are also tested.
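As an illustration only (the report does not list its test code), a unit test for a hypothetical prediction helper might look like the sketch below; the helper name and labels are assumptions.

# Hedged sketch: unit test for a hypothetical helper that maps a model's 0/1 output
# to the "Diabetic" / "Not Diabetic" status shown in the application.
import unittest

def status_label(prediction: int) -> str:
    """Hypothetical helper: convert a 0/1 model prediction into a display label."""
    return "Diabetic" if prediction == 1 else "Not Diabetic"

class StatusLabelTest(unittest.TestCase):
    def test_positive_prediction(self):
        self.assertEqual(status_label(1), "Diabetic")

    def test_negative_prediction(self):
        self.assertEqual(status_label(0), "Not Diabetic")

if __name__ == "__main__":
    unittest.main()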

7.1.2 INTEGRATION TESTING-

Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated, a set of
high-order tests is conducted. The main objective in this testing process is to take
unit-tested modules and build the program structure that has been dictated by design.

The following are the types of Integration Testing:

1. Top-Down Integration

This method is an incremental approach to the construction of program structure.


Modules are integrated by moving downward through the control hierarchy, beginning
with the main program module. The module subordinates to the main program module
are incorporated into the structure in either a depth first or breadth first manner.
In this method, the software is tested from main module and individual stubs are
replaced when the test proceeds downwards.

2. Bottom-up Integration

This method begins the construction and testing with the modules at the lowest
level in the program structure. Since the modules are integrated from the bottom up,
processing required for modules subordinate to a given level is always available and the
need for stubs is eliminated. The bottom up integration strategy may be implemented with
the following steps:

▪ The low-level modules are combined into clusters that perform a specific
software sub-function.
▪ A driver (i.e.) the control program for testing is written to coordinate test
case input and output.
▪ The cluster is tested.
▪ Drivers are removed and clusters are combined moving upward in the
program structure

The bottom-up approach tests each module individually; each module is then
integrated with a main module and tested for functionality.

7.1.3 User Acceptance Testing-

User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch
with the prospective system users at the time of developing and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.

7.1.4 Output Testing-

After performing the validation testing, the next step is output testing of the
proposed system, since no system could be useful if it does not produce the required
output in the specified format. Asking the users about the format required by them tests
the outputs generated or displayed by the system under consideration. Hence the output
format is considered in 2 ways – one is on screen and another in printed format.

7.1.5 Validation Checking-


Validation checks are performed on the following fields.

Text Field:

The text field can contain only a number of characters less than or equal to its
size. The text fields are alphanumeric in some tables and alphabetic in other tables.
An incorrect entry always flashes an error message.

Numeric Field:

The numeric field can contain only numbers from 0 to 9. An entry of any other
character flashes an error message. The individual modules are checked for accuracy and
for what they have to perform. Each module is subjected to a test run along with sample data.
The individually tested modules are integrated into a single system. Testing involves
executing the program with real data; the existence of any program defect is inferred
from the output. The testing should be planned so that all the requirements are
individually tested.

A successful test is one that brings out the defects for inappropriate data and
produces an output revealing the errors in the system.

Preparation of Test Data

Taking various kinds of test data does the above testing. Preparation of test data
plays a vital role in the system testing. After preparing the test data the system under study
is tested using that test data. While testing the system by using test data errors are again
uncovered and corrected by using above testing steps and corrections are also noted for
future use.

Using Live Test Data:

Live test data are those that are actually extracted from organization files. After a
system is partially constructed, programmers or analysts often ask users to key in a set of
data from their normal activities. Then, the systems person uses this data as a way to
partially test the system. In other instances, programmers or analysts extract a set of live
data from the files and have them entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive testing.


And, although it is realistic data that will show how the system will perform for the typical

processing requirement, assuming that the live data entered are in fact typical, such data
generally will not test all combinations or formats that can enter the system. This bias
toward typical values then does not provide a true systems test and in fact ignores the
cases most likely to cause system failure.

Using Artificial Test Data:

Artificial test data are created solely for test purposes, since they can be generated
to test all combinations of formats and values. In other words, the artificial data, which
can quickly be prepared by a data-generating utility program in the information systems
department, make possible the testing of all logic and control paths through the program.

The most effective test programs use artificial test data generated by persons other
than those who wrote the programs. Often, an independent team of testers formulates a
testing plan, using the systems specifications.

The developed package has satisfied all the requirements specified in the software
requirement specification and was accepted.

7.2 USER TRAINING

Whenever a new system is developed, user training is required to educate them


about the working of the system so that it can be put to efficient use by those for whom
the system has been primarily designed. For this purpose the normal working of the
project was demonstrated to the prospective users. Its working is easily understandable
and since the expected users are people who have good knowledge of computers, the use
of this system is very easy.

7.3 MAINTENANCE

This covers a wide range of activities including correcting code and design errors.
To reduce the need for maintenance in the long run, we have more accurately defined the
user’s requirements during the process of system development. Depending on the
requirements, this system has been developed to satisfy the needs to the largest possible
extent. With development in technology, it may be possible to add many more features
based on the requirements in future. The coding and designing is simple and easy to
understand which will make maintenance easier.

7.4 BLACK BOX TESTING-

Black box testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is a testing in which the software under test is treated as a
black box: you cannot “see” into it. The test provides inputs and responds to outputs
without considering how the software works.

7.5 WHITE BOX TESTING-

White box testing is a testing in which the software tester has knowledge
of the inner workings, structure and language of the software, or at least its purpose. It is
used to test areas that cannot be reached from a black box level.

7.6 TESTING STRATEGY –

A strategy for system testing integrates system test cases and design techniques into a
well-planned series of steps that results in the successful construction of software. The
testing strategy must incorporate test planning, test case design, test execution, and the
resultant data collection and evaluation. A strategy for software testing must
accommodate low-level tests that are necessary to verify that a small source code
segment has been correctly implemented, as well as high-level tests that validate
major system functions against user requirements.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing presents an interesting
anomaly for the software engineer. Thus, a series of tests is performed on the proposed
system before the system is ready for user acceptance testing.

CHAPTER 8
EXECUTION SLIDES

8.1 STARTING THE SERVER IN XAMPP CONTROL PANEL-

Fig 8.1 Starting the server in XAMPP control panel

8.2 USERS, SERVICE PROVIDERS LOGIN AND REGISTRATION-

Fig 8.2 Users,Service Providers Login and Registration

8.3 USER REGISTRATION PAGE-

Fig 8.3 User Registration Page

8.4 USER DATASET UPLOADING PAGE-

Fig 8.4 User Dataset Uploading page

8.5 SERVICE PROVIDER LOGIN PAGE-

Fig 8.5 Service Provider Login Page

8.6 DIABETES STATUS FROM DATASET DETAILS-

fig 8.6 Diabetes status from Dataset details

8.7 VIEW TRAINED AND TESTED DATA IN PIE CHART-

Fig 8.7 View trained and tested data in Pie chart

8.8 VIEW TRAINED AND TESTED DATA IN BAR CHART-

Fig 8.8 View trained and tested data in bar chart

8.9 ALGORITHMS ACCURACY-

Fig 8.9 Algorithms Accuracy

8.10 VIEW ALL REMOTE USERS-

Fig 8.10 View all Remote Users

CHAPTER 9

CONCLUSION

One of the significant impediments with the progression of technology and medicine is
the early detection of a disease, which in this case is diabetes. In this study,
systematic efforts were made in designing a model which is accurate enough in
determining the onset of the disease. With the experiments conducted on the Pima Indians
Diabetes Database, we have readily predicted this disease. Moreover, the results achieved
proved the adequacy of the system, with an accuracy of 76% using the K-Nearest
Neighbours classifier. With this being said, it is hoped that we can extend this
model into a system to predict other deadly diseases as well. There is room for further
improvement in the automation of the analysis of diabetes or any other disease in the
future.

In future, we will try to create a diabetes dataset in collaboration with a hospital
or a medical institute and will try to achieve better results. We will also incorporate more
machine learning and deep learning models to achieve better results.

CHAPTER 10

FUTURE SCOPE

Fig 10 Future Scope

Expanding the scope of diabetes prediction by incorporating deep learning techniques
for more accurate and timely predictions, and developing a user-friendly application
for widespread accessibility and usability.

In future, we will try to create a diabetes dataset in collaboration with a hospital or a


medical institute and will try to achieve better results.

CHAPTER 11

BIBLIOGRAPHY

[1] P. Saeedi, I. Petersohn, P. Salpea, B. Malanda, S. Karuranga,


N. Unwin, S. Colagiuri, L. Guariguata, A. A. Motala, K. Ogurtsova,
J. E. Shaw, D. Bright, and R.Williams, “Global and regional diabetes
prevalence estimates for 2019 and projections for 2030 and 2045:
Results from the international diabetes federation diabetes atlas,
9th edition,” Diabetes Research and Clinical Practice, vol. 157, p.
107843, 2019.
[2] A. Mir and S. N. Dhage, “Diabetes disease prediction using machine
learning on big data of healthcare,” in 2018 Fourth International
Conference on Computing Communication Control and Automation
(ICCUBEA), 2018, pp. 1–6.
[3] D. Sisodia and D. S. Sisodia, “Prediction of diabetes using
classification algorithms,” Procedia Computer Science, vol.
132, pp. 1578 – 1585, 2018, international Conference on
Computational Intelligence and Data Science. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1877050918308548
[4] J. Smith, J. Everhart, W. Dickson, W. Knowler, and R. Johannes,
“Using the ADAP learning algorithm to forecast the onset of diabetes
mellitus,” Proceedings - Annual Symposium on Computer Applications
in Medical Care, vol. 10, Nov. 1988.
[5] P. S. Kohli and S. Arora, “Application of machine learning in disease
prediction,” in 2018 4th International Conference on Computing
Communication and Automation (ICCCA), 2018, pp. 1–4.
[6] Wes McKinney, “Data Structures for Statistical Computing in
Python,” in Proceedings of the 9th Python in Science Conference,
Stéfan van der Walt and Jarrod Millman, Eds., 2010, pp. 56–61.
[7] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers,
P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J.
Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk,
M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson,
P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser,
H. Abbasi, C. Gohlke, and T. E. Oliphant, “Array programming
with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020.
[Online]. Available: https://doi.org/10.1038/s41586-020-2649-2
[8] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion,
O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg,
J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and
Édouard Duchesnay, “Scikit-learn: Machine Learning in Python,”
Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825–2830,
2011.

