Predictive Model For Diabetes Using Machine Learning

This project aims to develop a machine learning model to predict diabetes using a dataset of patients' features. Various machine learning algorithms like Support Vector Machine, Logistic Regression, Random Forest, Naive Bayes and Artificial Neural Network will be trained on the dataset and their performance will be evaluated based on parameters like sensitivity, specificity and accuracy. The model with the best performance will help in early detection of diabetes and improve patients' life.

Predictive model for Diabetes using Machine Learning

Project report submitted in fulfillment of the requirement for the degree of
Bachelor of Technology
in
Computer Science and Engineering/Information Technology

Himanshu Nadda (161366)
Piyush Thakur (161373)

Under the supervision of

Dr. Ekta Gandotra

to

Department of Computer Science & Engineering and Information Technology

Jaypee University of Information Technology, Waknaghat, Solan-173234, Himachal Pradesh

Candidate’s Declaration

I hereby declare that the work presented in this report entitled “Diabetes Prediction System
Using Machine Learning”, in fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering/Information Technology,
submitted in the Department of Computer Science & Engineering and Information
Technology, Jaypee University of Information Technology, Waknaghat, is an authentic record
of the work carried out over a period from July 2019 to July 2020 under the supervision of
Dr. Ekta Gandotra (Assistant Professor, Computer Science and Engineering). The matter
embodied in the report has not been submitted for the award of any other degree or diploma.

Himanshu Nadda(161366)

Piyush Thakur (161373)

This is to certify that the above statement made by the candidates is true to the best of my
knowledge.

Dr. Ekta Gandotra

Assistant Professor
Department of Computer Science & Engineering
Dated:
ACKNOWLEDGEMENT

We would like to express our special thanks and gratitude to our project guide, Dr. Ekta
Gandotra, who helped us in conceptualizing the project and in building the procedures used
to complete it. We would also like to thank our Head of Department for providing us
this golden opportunity to work on a project like this, which led us to do a lot of
research and to learn about many new things.

Thanking you,

Himanshu Nadda (161366)

Piyush Thakur(161373)
Table of Contents
Abstract
1. Introduction
   1.1 Introduction
   1.2 Problem Statement
   1.3 Objective
   1.4 Methodology
2. Literature Survey
   2.1 Introduction
   2.2 Research about diabetes
   2.3 Conclusion
3. System Development
   3.1 Dataset
   3.2 Data Preprocessing
   3.3 Conclusion
4. Algorithm
   4.1 Support Vector Machine Algorithm
   4.2 Logistic Regression
   4.3 Random Forest Classifier
   4.4 Naïve Bayes Classifier
   4.5 Artificial Neural Network
   4.6 Feature Importance
   4.7 Conclusion
5. Result and Evaluation
   5.1 Evaluation parameters used
       5.1.1 Confusion Matrix
       5.1.2 Sensitivity
       5.1.3 Specificity
       5.1.4 Accuracy
   5.2 Result Analysis
   5.3 Conclusion
6. Conclusion and Future Work
References
List of Abbreviations

ACEI    Angiotensin-converting-enzyme inhibitor
ANN     Artificial Neural Networks
BUN     Blood urea nitrogen
CAD     Coronary Artery Disease
CKD     Chronic Kidney Disease
COPD    Chronic obstructive pulmonary disease
DL      Deep Learning
DM      Diabetes Mellitus
FN      False negatives
FP      False positives
Hb      Hemoglobin
HTN     Hypertension
IHD     Ischemic heart disease
LR      Logistic Regression
ML      Machine Learning
NaN     Not-A-Number
PCO     Polycystic Ovary Syndrome
RFC     Random Forest Classifier
SES     Socioeconomic Status
SVM     Support Vector Machine
T3      Triiodothyronine
TN      True negatives
TP      True positives
WHO     World Health Organization
List of Figures

Figure 1    Diabetic rate among adults
Figure 2    Statistics of type 1, type 2 and type 3 diabetes among people
Figure 3    Correlation matrix
Figure 4    Showing a type of SVM in 2-D
Figure 5    Sigmoid function
Figure 6    Classification in random forest
Figure 7    Gaussian naïve Bayes normal distribution
Figure 8    Decision tree
Figure 9    Structure of a basic ANN
Figure 10   Showing importance of all features
Figure 11   Showing importance of all features
Figure 12   Comparison of machine learning algorithms on the basis of sensitivity,
            specificity and accuracy
List of Tables

Table 1   Features of patients
Table 2   Correlation table
Table 3   A table about confusion matrix
Table 4   Comparison between confusion matrices
Table 5   Comparison of sensitivity, specificity, accuracy

Abstract

Diabetes has become a common disease affecting people of all ages, from the young to the
old. The number of diabetic patients is increasing day by day for various reasons, such as
obesity, poor diet, autoimmune reaction, changes in lifestyle and eating habits, and
environmental pollution. Hence, early prediction of diabetes is essential to protect human
life. Data analytics is a branch of computer science concerned with examining large
datasets to find useful hidden patterns and to draw conclusions from those patterns. In
health care, this analytical process is carried out using machine learning algorithms,
which analyse large volumes of medical data to build models that support medical
diagnosis. This project presents a diabetes prediction system for diagnosing diabetes.
Early detection of diabetes is possible with the help of this model.
Chapter-1
Introduction

1.1 Introduction
Diabetes is caused by an increased level of sugar (glucose) in the blood. Diabetes can be
of two types: type 1 diabetes and type 2 diabetes. Type 1 diabetes is an autoimmune
disease in which the body destroys the cells that are essential for producing insulin to
absorb sugar. This type can occur regardless of obesity, where obesity means a body mass
index (BMI) above the normal level for an individual. Type 1 diabetes can be found in
children and, at times, in adults. Type 2 diabetes predominantly affects adults who are
obese, and mostly middle-aged people. Diabetes is a major cause of other ailments such as
heart disease, stroke, kidney disease, eye problems, dental disease, nerve damage, and
foot problems. Symptoms of diabetes, which include excessive excretion of urine
(polyuria), thirst, constant hunger, weight loss, vision changes, and fatigue, can occur
suddenly. [1]

1.2 Problem Statement

Diabetes is a serious disease that kills a great many people throughout the world. At the
same time, advances in technology are improving human life, so it is natural to ask why
these advances should not be used for the future of human life and health care. Various
machine learning and deep learning algorithms are used for many kinds of prediction. Often
these algorithms are used by business giants to increase sales. The question here is how
the same technologies can be used for human benefit. The algorithms we have used and
learned are to be tested on a prediction task whose expertise otherwise lies only in the
hands of specialists. To learn the complexities of the various biomechanical features of
the human body and to predict complicated health problems, the machine must be trained
with the mindset of a doctor, using the features and external factors provided by a valid
dataset.

1.3 Objectives

The main objective of this prediction system for diabetic patients is to find a useful
model to serve humanity. It can be understood through the following points:

a) To implement the fundamentals of machine learning.

b) To find the relationship between a diabetic patient and the various factors that
influence the disease.

c) To compare the performance of all algorithms and, in the end, use the most efficient model.

1.4 Methodology

In the medical field, data is classified into different classes using various
classification techniques, chosen according to certain constraints, rather than relying on
an individual classifier. Diabetes affects the ability of the body to produce the hormone
insulin, which raises the level of glucose in the blood and in turn makes the metabolism
of carbohydrates abnormal. A person with diabetes generally has high blood sugar.
Intensified thirst, intensified hunger, and frequent urination are some of the symptoms
caused by high blood sugar. Many complications occur if diabetes remains untreated;
diabetic ketoacidosis and nonketotic hyperosmolar coma are among the major ones. Many
researchers are conducting experiments on diagnosing diabetes using various machine
learning classification algorithms such as Naive Bayes, Decision Tree, and Random Forest.
In machine learning we can train the computer to learn from datasets. By applying
different algorithms and minimizing a cost function, machine learning can be used to build
a predictive model that captures the relationship between these variables. [2]

Chapter-2

Literature survey

2.1 Introduction

Diabetes is a disease associated with many complications. A World Health Organization
(WHO) survey found that 1.2 million people died due to this chronic disease, and a further
2.2 million people suffered the same fate due to cardiovascular diseases. The risk of
stroke and other heart diseases increases with the level of glucose. Type 2 diabetes
patients are prone to hypertension, chest pain, obesity, and similar conditions, which
makes them more vulnerable to this disease. Blood pressure, cholesterol, triglycerides,
obesity, a sedentary lifestyle, abnormal sugar levels, and smoking are risk factors
shared by both of these chronic diseases. An abundant amount of research has been done in
the past 10 years, and it has created an opportunity to develop an important tool for
future reference.

2.2 Research about Diabetes

Diabetes is a chronic disease in which blood sugar (glucose) levels are very unstable.
Several illnesses result from this instability, and occasionally these health problems can
also cause sudden death. Diabetes is a disease that results from a disorder of metabolism.
It can be classified into three types.

Many people suffer from this disease, and their number is increasing day by day. A recent
survey found that one in 11 adults suffers from this disease. That is a seriously alarming
figure for a disease to spread at. [3]

Figure 1: Diabetic rate among adults [12]

Type 1:

The body produces very little insulin or none at all, which can lead to many
complications. The pancreas of a person with type 1 diabetes is at great risk. A recent
study shows that type 1 diabetes generally occurs in the age group of 1-20. [3]

Type 2:

The body resists the insulin it produces, so the insulin is effectively unavailable to the
body. Type 2 diabetic patients are more prone to heart-related ailments. According to a
recent World Health Organization (WHO) survey, the majority of patients suffer from type 2
diabetes.

Type 3:

It is a rare type of diabetes that can cause serious damage to a person's brain, and it is
commonly known as gestational diabetes.

Figure 2: Main symptoms of diabetes

Treatment for low blood sugar is, in most cases, the same for type 1 and type 2. Most cases
are mild and are not considered medical emergencies, with effects such as a feeling of
unease, sweating, and trembling. More dangerous effects, such as aggressiveness, permanent
brain damage, and death, occur in severe cases.

Figure 3: Statistics of type 1, type 2 and type 3 diabetes among people

2.3 Conclusion

Diabetes is a harmful disease with interrelated effects on the human body. There can be
many features that are common to both diseases. From those features, it can be established
that a meaningful prediction can be made from the relevant data.

Chapter -3

System Development

We have implemented various methods to use our data systematically and in a synchronized
way for the development of our model. Moreover, the test plan follows our model and will
be helpful if we want to make further improvements and developments to it.

3.1 Data

This dataset describes the medical records of Pima Indians and whether each patient will
have an onset of diabetes within five years.

The dataset has data on 768 women with 8 characteristics:

i) Number of times pregnant (NTP)
ii) Plasma glucose concentration
iii) Diastolic blood pressure (mm Hg)
iv) Skin fold thickness (mm)
v) 2-hour serum insulin (mu U/ml)
vi) Body mass index (weight in kg/(height in m)^2)
vii) Diabetes pedigree function
viii) Age (years)

Attribute no.   Attribute
1               No. of times pregnant (NTP)
2               Plasma glucose concentration (PGC)
3               Diastolic blood pressure (mm Hg) (DBP)
4               Triceps skin-fold thickness (mm) (TSFT)
5               2-h serum insulin (mu U/mL) (H2SI)
6               Body mass index (kg/m^2) (BMI)
7               Diabetes pedigree function (DPF)
8               Age
9               Outcome

A brief description of all eight features is given in Table 1.

Table-1: Features of patients in dataset [17]

3.2 Data Preprocessing

If missing values are not handled properly, we may end up drawing an incorrect inference
about the data. Not all columns will be useful for the model, and the available dataset
may not be in a form that can be used for training. In all of these cases, data
pre-processing is an important factor in giving the model a sound start. Data
pre-processing is the process of turning raw data into a usable format, and it is one of
the most important steps required for training the model. It includes checking for null
values; where null values are found, they are replaced by the mean of the entire column.
In data pre-processing, categorical data can also be converted into numerical data:
label_encoder is an object that helps us convert categorical data into numerical data.
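
The mean-replacement step described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming missing entries are represented as None; the function name impute_column_mean and the sample readings are hypothetical and are not taken from the project code.

```python
from statistics import mean


def impute_column_mean(column):
    """Return a copy of the column with None entries replaced by the column mean."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill_value = mean(observed)
    return [fill_value if value is None else value for value in column]


# Hypothetical BMI readings with two missing entries.
bmi = [33.6, None, 23.3, 28.1, None]
bmi_filled = impute_column_mean(bmi)
```

Only the observed values enter the mean, so the filled column keeps the same average as the observed data.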

Correlation shows the strength and direction of the linear relationship between two
quantitative variables. The correlation coefficient r takes values between -1 and +1. A
positive value of r indicates a positive association, and a negative value of r indicates
a negative association.
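
As a small worked illustration of the coefficient r, the sketch below computes the Pearson correlation between two columns in plain Python. The function name pearson_r and the sample glucose and outcome values are assumptions made for the example, not values from the report's correlation table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical glucose readings and diabetes outcomes (1 = diabetic).
glucose = [85, 89, 116, 137, 148]
outcome = [0, 0, 0, 1, 1]
r = pearson_r(glucose, outcome)
```

Here higher glucose values go with outcome 1, so r comes out positive; reversing one of the columns would flip its sign.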

The last step in data pre-processing is splitting the data into training and testing
data. In our ML model we used the train_test_split function from the cross_validation
module of the sklearn library.
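
The idea behind the split can be sketched without any library. The following is a plain-Python stand-in for a shuffled hold-out split, not the sklearn implementation; the function name holdout_split, the 30% test fraction, and the seed are assumptions chosen for illustration.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and hold out a fraction of the rows for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_indices = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test


# 768 placeholder rows stand in for the 768 patient records.
rows = list(range(768))
train, test = holdout_split(rows)
```

Fixing the seed makes the split reproducible, which matters when several classifiers are compared on the same test rows.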

Table-2: Correlation table

Figure 4: Correlation matrix

3.3 Conclusion

We have pre-processed our data and made it usable for further implementation. Missing
values were replaced, several columns were deleted, and the remaining data was converted
into numerical values so that it has a positive impact on the model.

Chapter -4

Algorithms

Method

In this section, various ML and DL algorithms are used to predict diabetes on our dataset.
We use algorithms such as Logistic Regression, Random Forest, Decision Tree, and
Artificial Neural Networks to make predictions, analyse the results, and compare the
algorithms on the basis of performance.

4.1 Support Vector Machine (SVM)

It is a supervised machine learning algorithm that can be used for regression as well as
classification, but it is mostly used for classification, as in our project. In this
algorithm we plot each data item as a point in an n-dimensional space, where n is the
number of features. The points are separated into two classes by finding a hyper-plane:
one side of the hyper-plane belongs to one class and the other side belongs to the other
class. This algorithm is very effective when the number of dimensions is high. It
identifies the extreme points, also known as support vectors, in order to find the
hyper-plane. Every data point belongs to either one class or the other.

An SVM graph usually looks like the following figure.

This algorithm does not just draw a simple line between two different categories; it also
maintains a region of a certain width around the line. We fit an SVM classifier to our
pre-processed data.
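
To make the hyper-plane and the region around it concrete, here is a minimal sketch of how a linear SVM classifies a point once a hyper-plane w·x + b = 0 is known. The weights and bias below are hypothetical and hand-picked, not trained on the dataset, and the margin formula 2/||w|| assumes the usual scaling in which the support vectors satisfy |w·x + b| = 1.

```python
import math


def decision_value(weights, bias, point):
    """Signed score w·x + b; its sign tells which side of the hyper-plane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(weights, bias, point):
    """Return +1 for points on the positive side of the hyper-plane, -1 otherwise."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1


def margin_width(weights):
    """Width of the region between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))


# Hypothetical hyper-plane in a 2-feature space (not learned from data).
weights, bias = [3.0, 4.0], -12.0
label_far = classify(weights, bias, [4.0, 4.0])
label_near = classify(weights, bias, [1.0, 1.0])
width = margin_width(weights)
```

Training an SVM amounts to choosing the weights and bias so that this width is as large as possible while the training points stay on their correct sides.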

4.2 Logistic Regression

It is a supervised machine learning algorithm. It is chiefly used when the target
variable is categorical. It uses the logistic function, also called the sigmoid function,
which takes a value between 0 and 1, on the basis of which we can predict the class of the
target variable. The sigmoid function is:

$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$

Figure 6: Sigmoid function [11]
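
The sigmoid and the class decision built on it can be written out directly. This is a small sketch in plain Python; the 0.5 threshold and the function name predict_class are assumptions for illustration, and the input t stands for the linear score computed from the features.

```python
import math


def sigmoid(t):
    """Logistic function 1 / (1 + e^-t), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    exp_t = math.exp(t)
    return exp_t / (exp_t + 1.0)


def predict_class(t, threshold=0.5):
    """Map the sigmoid output to class 1 or class 0 using a probability threshold."""
    return 1 if sigmoid(t) >= threshold else 0
```

The two branches are the two equal forms of the formula above; each is used on the side where its exponential cannot overflow.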

4.3 Random Forest Classifier

The random forest classifier falls in the category of supervised machine learning
algorithms. It creates many decision trees (DTs) and in the end merges them together to
give the final result. The many conditional statements associated with multiple decision
trees help achieve high accuracy. Because of the large number of random DTs, over-fitting
can be reduced.

The working of this algorithm can be better understood with the help of the following
steps:

Step 1 - Randomly select samples from the given data set.

Step 2 - Construct a DT for every sample and get a prediction from each DT.

Step 3 - Vote on every predicted result.

Step 4 - Give the most voted prediction as the final result.

Figure 7: Classification in random forest [10]
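
The four steps above can be sketched as follows. This is a deliberately simplified stand-in, not the classifier used in the project: each "tree" is a one-level decision stump on a single feature, and the glucose values, labels, number of trees, and seed are all hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Step 1: draw a random sample of the rows, with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Step 2: fit a one-level tree that predicts 1 when the feature exceeds a threshold."""
    best_errors, best_threshold = None, None
    for threshold, _ in rows:
        errors = sum((x > threshold) != bool(label) for x, label in rows)
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return best_threshold


def fit_forest(rows, n_trees=25, seed=0):
    """Build one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(forest, x):
    """Steps 3 and 4: collect one vote per tree and return the most voted class."""
    votes = Counter(int(x > threshold) for threshold in forest)
    return votes.most_common(1)[0][0]


# Hypothetical (glucose, outcome) pairs; outcome 1 means diabetic.
data = [(85, 0), (89, 0), (99, 0), (110, 0), (137, 1), (148, 1), (166, 1), (183, 1)]
forest = fit_forest(data)
```

A full random forest grows deeper trees and also samples a random subset of features at each split; the sketch keeps only the sampling and voting structure.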

4.4 Naive Bayes

Naive Bayes is a probabilistic algorithm based on Bayes' theorem. This algorithm performs
very well on classification problems, under the assumption that every feature is equally
important and independent of the others. The mathematical representation of Bayes'
theorem is as follows:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

This is the probability of event A, given that event B is already true.
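
A short numeric example may help in reading the formula. The sketch below applies Bayes' theorem to a single binary feature, expanding P(B) over the two cases A and not-A; the function name posterior and all probabilities are hypothetical and are not estimated from the dataset.

```python
def posterior(prior_a, prob_b_given_a, prob_b_given_not_a):
    """P(A | B) from Bayes' theorem, with P(B) expanded over A and not-A."""
    evidence = prob_b_given_a * prior_a + prob_b_given_not_a * (1.0 - prior_a)
    if evidence == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return prob_b_given_a * prior_a / evidence


# Hypothetical inputs: P(diabetic) = 0.35, P(high glucose | diabetic) = 0.8,
# P(high glucose | not diabetic) = 0.2.
p_diabetic_given_high_glucose = posterior(0.35, 0.8, 0.2)
```

With these assumed numbers, observing the feature raises the probability of the class above its prior of 0.35. Naive Bayes repeats this update for every feature, treating the features as independent.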

Figure 8: Gaussian naïve Bayes normal distribution [11]

Each feature is given the same weight (importance). For instance, knowing only the
temperature and humidity cannot predict the outcome accurately. All attributes are equally
relevant and are assumed to contribute equally to the outcome.

4.5 Decision Tree

A DT is a preferred tool for classification and prediction. A decision tree looks like a
tree in which the internal nodes contain tests on attributes and the leaf nodes contain
class labels.

Decision trees classify instances by sorting them down the tree from the root to some leaf
node, which provides the classification of the instance. An instance is classified by
starting at the root node of the tree, testing the attribute specified by this node, and
then moving down the branch corresponding to the value of that attribute, as shown in the
figure below. This process is then repeated for the subtree rooted at the new node.

Figure 9: Decision tree [14]
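
The root-to-leaf traversal described above can be sketched with a nested dictionary. The tree below is hand-written for illustration: its features, thresholds, and labels are hypothetical and were not learned from the dataset.

```python
def classify(node, sample):
    """Walk from the root to a leaf, following the branch chosen by each attribute test."""
    while isinstance(node, dict):
        passed = sample[node["feature"]] > node["threshold"]
        node = node["yes"] if passed else node["no"]
    return node  # a leaf holds the class label


# Hypothetical tree: internal nodes test an attribute, leaves hold class labels.
tree = {
    "feature": "glucose",
    "threshold": 127,
    "yes": {
        "feature": "bmi",
        "threshold": 30,
        "yes": "diabetic",
        "no": "non-diabetic",
    },
    "no": "non-diabetic",
}

label_a = classify(tree, {"glucose": 150, "bmi": 35.0})
label_b = classify(tree, {"glucose": 150, "bmi": 24.0})
label_c = classify(tree, {"glucose": 100, "bmi": 35.0})
```

Each loop iteration corresponds to one step of the description: test the attribute at the current node, then continue with the subtree on the chosen branch.
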
4.6 Artificial Neural Network (ANN)

This algorithm is loosely modelled on the human brain, and works in a way similar to how
humans make decisions under different conditions. Decision making is done at the various
internal nodes of the ANN. A structural representation of an ANN is shown in the figure
below.

Figure 10: Structure of a basic ANN [16]

Each input value has a specific weight associated with it for the prediction of the output
parameter. These weighted inputs are passed from the input nodes to the hidden nodes. In
the hidden layer, summation and activation functions work to predict the final output. An
ANN basically works on a feedback method, where the network is rewarded for a good
prediction and penalized for a bad one.
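
The summation and activation steps can be sketched as a forward pass through one hidden layer. This is a minimal illustration, assuming a sigmoid activation and hand-picked weights; the layer sizes, weights, and inputs are hypothetical, and the feedback step that adjusts the weights after each prediction is not shown.

```python
import math


def sigmoid(t):
    """Activation function that squashes a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Pass the inputs through the hidden layer, then through a single output node."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)[0]


# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
y = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.4, -0.6], [0.3, 0.8]],
    hidden_biases=[0.1, -0.2],
    output_weights=[[1.2, -0.7]],
    output_biases=[0.05],
)
```

The output y can be read as a score between 0 and 1; training would compare it with the true label and adjust the weights accordingly.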

4.7 Feature Importance

Here we have 8 criteria that can be used to estimate and predict diabetes. These criteria
are the number of pregnancies, the plasma glucose concentration, the blood pressure, the
skin thickness, the serum insulin level, the body mass index (BMI), the diabetes pedigree
function, and the age. Collectively, these features predict diabetes in a person and
provide crucial information for its estimation.

All the features of our dataset play an important role in the prediction of diabetes.
Calculating the importance of each feature helps us find how relevant each feature is in
determining the output of our model.

Figure 11: Showing importance of all features [17]

4.8 Conclusion

We have gone through many different kinds of models, and our analysis shows that the most
efficient algorithm among them so far is random forest, which produces the best results of
all the models tried.

Chapter -5

Result and Evaluation

We arrived at suitable approaches by studying a variety of literature surveys and books
related to the project, and here we implement various models for this task and compare
them on accuracy and other parameters. Different approaches and classifiers from AI, deep
CNN, and machine learning are implemented, and a confusion matrix and graph are obtained
for each of them. These are used to find the accuracy, specificity, sensitivity, and
overall strength of performance, and finally to determine which model is closest to the
real data output. At last, we plot a graph to identify which model is closest to the
original data by applying the least squares method (LSM) and other distance formulas. [6]

5.1 Evaluation Parameters Used

5.1.1 Confusion Matrix

A confusion matrix is a matrix that consists of four elements: TP, FP, FN, and TN. The
classifier's predictions are checked against the actual classes, and each of the four
entries is filled in according to whether the predictions were correct or not. When the
matrix is normalized by row, each row sums to 1 and each entry is a proportion between 0
and 1. This is how the confusion matrix is formed.

TP: The actual value is positive and it is also predicted as positive.

FP: The actual value is negative but it is predicted as positive.

FN: The actual value is positive but it is predicted as negative.

TN: The actual value is negative and it is also predicted as negative.

                           Predictions
                           Class Positive   Class Negative
Actual   Class Positive    TP               FN
         Class Negative    FP               TN

Table-3: A table about confusion matrix

5.1.2 Sensitivity

Sensitivity is the proportion of actual positives that the model correctly predicts as
positive, out of all actual positives. The mathematical expression for sensitivity is
given below.

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

5.1.3 Specificity

Specificity is the proportion of actual negatives that the model correctly predicts as
negative, out of all actual negatives. The mathematical expression for specificity is
given below.

$$\text{Specificity} = \frac{TN}{TN + FP}$$

5.1.4 Accuracy

Accuracy tells us about the correctly predicted results. It represents the extent to which
the predictions made by a model are correct, and can be found using the expression given
below.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$
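
The three measures can be computed directly from the four confusion-matrix counts. The sketch below is a plain-Python illustration; the function name evaluate is hypothetical, and the counts are those of the Logistic Regression row of Table 4, read in the Table 3 layout (first row TP and FN, second row FP and TN), which is an assumption about how that table is arranged.

```python
def evaluate(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from the four confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts assumed to follow the Table 3 layout: [[TP, FN], [FP, TN]].
metrics = evaluate(tp=48, fn=32, fp=23, tn=128)
```

Sensitivity looks only at the row of actual positives and specificity only at the row of actual negatives, which is why the two can differ sharply for the same classifier.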

5.2 Result Analysis

We applied the Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, Support
Vector Machine, and Artificial Neural Network algorithms to the pre-processed data. The
confusion matrices obtained are shown in Table 4.

Classifier                               Confusion Matrix

Decision Tree                            55    25
                                         44   107

Support Vector Machine                   46    34
                                         27   124

Artificial Neural Network                40    40
                                         29   122

Logistic Regression - Cross Validation   53    27
                                         42   109

Logistic Regression                      48    32
                                         23   128

Random Forest Classifier                 48    32
                                         28   123

Naive Bayes                              51    27
                                         32   119

Table 4: Confusion matrices obtained

Further, from the above confusion matrices we can deduce the values of sensitivity,
specificity, etc. In the medical field, we will always choose an ML model with good
sensitivity and specificity.

Figure 12: Comparison of machine learning algorithms (Logistic Regression, Random Forest
Classifier, Naive Bayes, Decision Tree, Support Vector Machine, Artificial Neural Network)
on the basis of sensitivity, specificity and accuracy; the vertical axis runs from 0 to 0.9.

Figure 12 shows the visualization of the results. It is clear from the figure that the
random forest algorithm has the highest accuracy, i.e. 86%, and the Naïve Bayes algorithm
has the lowest accuracy, i.e. 80%.

5.3 Conclusion
We have used six machine learning algorithms, namely Random Forest, Decision Tree,
Logistic Regression, Naïve Bayes, Support Vector Machine, and Artificial Neural Network,
on the pre-processed data. It is clear from the results and all the evaluation parameters
used that the Artificial Neural Network has the highest accuracy, i.e. 0.80, and the
Decision Tree algorithm has the lowest accuracy.

Chapter-6
Conclusion and Future Work

6.1 Conclusion

Here we applied different machine learning and deep learning techniques to construct a
diabetes classifier. We achieved the best accuracy, i.e. 0.80, with the Artificial Neural
Network classifier. One of the important real-world medical problems is the detection of
diabetes at its early stage. In this study, systematic efforts are made in designing a
system that predicts a disease like diabetes. During this work, six machine learning
classification algorithms are studied and evaluated on various measures. Experiments are
performed on the Pima Indians Diabetes database.

6.2 Future Work

References

[1] Journal of Computational Intelligence in Bioinformatics, ISSN 0973-385X, Volume 10,
Number 1 (2017), pp. 1-8, © Research Foundation, http://www.rfgindia.com
[2] A Method for Classification Using Machine Learning Technique for Diabetes, Aishwarya,
School of Computing Science and Engineering, VIT University, Vellore – 632014, Tamil
Nadu, India.
[3] R.R. Bourne, G.A. Stevens, R.A. White, J.L. Smith, S.R. Flaxman, H. Price et al.,
Causes of vision loss worldwide, 1990-2010: a systematic analysis, Lancet Global
Health 1 (2013) e339-e349.
[4] A. Nather, C.S. Bee, C.Y. Huak, J.L.L. Chew, C.B. Lin, S. Neo, E.Y. Sim,
Epidemiology of diabetic foot problems and predictive factors for limb loss,
J. Diab. Complic. 22 (2) (2008) 77–82.
[5] Shiliang Sun, A survey of multi-view machine learning, Neural Comput. Applic.
23 (7–8) (2013) 2031–2038.
[6] R. Alizadehsani, M.H. Zangooei, M.J. Hosseini, J. Habibi, A. Khosravi, M. Roshanzamir,
F. Khozeimeh, N. Sarrafzadegan, S. Nahavandi, Coronary artery disease detection using
computational intelligence methods, Knowledge-Based Systems 109 (2016) 187–197.
[7] P. Sattigeri, J.J. Thiagarajan, M. Shah, K.N. Ramamurthy, A. Spanias, A scalable
feature learning and tag prediction framework for natural environment sounds, Signals Syst.
and Computers, 48th Asilomar Conference on Signals, Systems and Computers, 2014,
1779–1783.
[8] M.W. Libbrecht, W.S. Noble, Machine learning applications in genetics and genomics,
Nat. Rev. Genet. 16 (6) (2015) 321–332.
[9] K. Kourou, T.P. Exarchos, K.P. Exarchos, M.V. Karamouzis, D.I. Fotiadis, Machine
learning applications in cancer prognosis and prediction, Comput. Struct. Biotechnol. J. 13
(2015) 8–17.
[10] E.M. Hashem, M.S. Mabrouk, A study of support vector machine algorithm for
liver disease diagnosis, Am. J. Intell. Sys. 4 (1) (2014) 9–14.
[11] W. Mumtaz, S. Saad Azhar Ali, M. Azhar, M. Yasin, A. Saeed Malik, A machine
learning framework involving EEG-based functional connectivity to diagnose major
depressive disorder (MDD), Med. Biol. Eng. Comput. (2017) 1–14.
[12] D.K. Chaturvedi, Soft computing techniques and their applications, in: Mathematical
Models, Methods and Applications, 31–40, Springer Singapore, 2015.
[13] A. Tettamanzi, M. Tomassini, Soft computing: integrating evolutionary, neural,
and fuzzy systems, Springer Science & Business Media (2013).
[14] M.A. Hearst, S.T. Dumais, E. Osuna, J. Platt, B. Scholkopf, Support vector
machines, IEEE Intell. Syst. Appl. 13 (4) (1998) 18–28.
[15] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: theory and
applications, Neurocomputing 70 (1) (2006) 489–501.
[16] S.A. Dudani, The distance-weighted k-nearest-neighbor rule, IEEE Trans. Syst.
Man Cybernet. SMC-6 (4) (1976) 325–327.
[17] Zhi-Hua Zhou and Yuan Jiang, NeC4.5: Neural Ensemble Based C4.5, IEEE Trans.
Knowl. Data Eng. 16 (2004).

JAYPEE UNIVERSITY OF INFORMATION TECHNOLOGY, WAKNAGHAT
PLAGIARISM VERIFICATION REPORT
Date: 7/15/2020
Type of Document (Tick): PhDThesis M.Tech Dissertation/ Report B.Tech Project Report Paper

Name:_Himanshu Nadda,_Piyush Thakur__Department:_Computer Science_Enrolment No _161366,

161373__Contact No.___9418197210,_8350833681___ [email protected],
[email protected]___________Name of the Supervisor: __Dr. Ekta Gandotra________
Title of the Thesis/Dissertation/Project Report/Paper (In Capital letters): ______PREDICTIVE MODEL FOR
DIABETES USING MACHINE LEARNING ______________________________________________________
________________________________________________________________________________________________________

UNDERTAKING
I undertake that I am aware of the plagiarism-related norms/regulations. If I am found guilty of any plagiarism
or copyright violation in the above thesis/report, even after the award of the degree, the University reserves the
right to withdraw/revoke my degree/report. Kindly allow me to avail the plagiarism verification report for the
document mentioned above.
Total No. of Pages = 36
Total No. of Preliminary pages = 10
Total No. of pages accommodating bibliography/references = 2

(Signature of Student)
FOR DEPARTMENT USE
We have checked the thesis/report as per norms and found Similarity Index at ……………….. (%). Therefore, we are
forwarding the complete thesis/report for final plagiarism check. The plagiarism verification report may be
handed over to the candidate.

(Signature of Guide/Supervisor)    (Signature of HOD)

FOR LRC USE

The above document was scanned for plagiarism check. The outcome of the same is reported below:
Copy Received on:
Report Generated on:
Excluded: All Preliminary Pages; Bibliography/Images/Quotes; 14 Words String
Similarity Index (%): 14%
Submission ID:
Abstract & Chapters Details:
  Word Counts:
  Character Counts:
  Page Counts:
  File Size:
Checked by

Name & Signature Librarian

Please send your complete Thesis/Report in (PDF) & DOC (Word File) through your Supervisor/Guide at
[email protected]
