

ISSN (Online) 2394-6849

International Journal of Engineering Research in Computer Science and Engineering

(IJERCSE)
Vol 7, Issue 10, October 2020

Predictive Modelling and Analytics for Diabetes using a Machine Learning Approach

[1] Prateek Mishra, [2] Dr. Anurag Sharma, [3] Dr. Abhishek Badholi
[1][2][3] Computer Science and Engineering, MATS University, Raipur, India

Abstract: Diabetes is a major disorder which can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. All over the world, many people are suffering from this disease. Early detection of diabetes is very important for maintaining a healthy life. The disease is a cause of worldwide concern because cases of diabetes are rising rapidly. Machine learning (ML) is a computational method for automatic learning from experience that improves performance to make more accurate predictions. In the current research we have applied machine learning techniques to the Pima Indian diabetes dataset to develop trends and detect patterns with risk factors using the R data manipulation tool. To classify the patients as diabetic or non-diabetic, we have developed and analyzed five different predictive models in R. For this purpose, we used the supervised machine learning algorithms linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbor (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).
Keywords: Machine learning, Multifactor dimensionality reduction (MDR), Support vector machine (SVM), k-nearest neighbor (k-NN), Artificial neural network (ANN)

I. INTRODUCTION

Diabetes is a very common metabolic disease. The onset of diabetes usually happens in middle age and sometimes in adulthood, but nowadays incidences of this disease are reported in children as well. There are several factors for developing diabetes, such as genetic susceptibility, weight, food habits and a sedentary lifestyle. Undiagnosed diabetes may result in a very high blood glucose level, referred to as hyperglycemia, which can cause complications such as diabetic retinopathy, nephropathy, neuropathy, cardiac stroke and foot ulcers. So, early detection of diabetes is very important for improving the quality of life of patients and increasing their life expectancy [1].
Machine learning is concerned with the development of algorithms and techniques that allow computers to learn and gain intelligence based on past experience. It is a branch of artificial intelligence (AI) and is closely related to statistics. By learning we mean that the system is able to identify and understand the input data, so that it can make decisions and predictions based on it [2]. The learning process starts with the gathering of data by different means from various resources. The next step is to prepare the data, that is, to pre-process it in order to fix data-related issues and to reduce the dimensionality of the space by removing irrelevant data (or selecting the data of interest) [3]. Since the amount of data used for learning is large, it is difficult for the system to make decisions, so algorithms are designed using logic, probability, statistics, control theory, etc. to analyze the data and retrieve knowledge from past experiences [4]. The next step is testing the model to calculate the accuracy and performance of the system. The final step is optimization of the system, i.e. improving the model by using new rules or data sets [5]. The techniques of machine learning are used for classification, prediction and pattern recognition. Machine learning can be applied in various areas such as search engines, website ranking, email filtering, face tagging and recognition, related advertisements, character recognition, gaming, robotics, disease prediction and traffic management [6]. Figure 1 shows the essential learning process to develop a predictive model.
Nowadays, machine learning algorithms are used for automatic analysis of high-dimensional biomedical data [7]. Diagnosis of disease, skin lesions, cancer classification, risk assessment for disorders and analysis of genetic and genomic data are some examples of biomedical applications of ML [8,9]. The SVM algorithm has been successfully implemented for disease diagnosis [10]. To diagnose major depressive disorder (MDD) based on an EEG dataset, classification models such as support vector machine (SVM), logistic regression (LR) and Naïve Bayes (NB) have been used [11]. Our novel model is implemented using supervised machine learning techniques in R for the Pima Indian diabetes dataset to understand patterns for the knowledge discovery process in diabetes. This dataset describes the Pima Indian population's medical history regarding the onset of diabetes. It includes several independent variables and one dependent class variable indicating diabetes as 0 or 1. In this work, we have studied the performance of five different models based on the linear kernel support vector machine (SVM-linear), radial basis kernel support vector machine (SVM-RBF), k-nearest neighbor (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR) algorithms to detect diabetes in female patients [12].
II. Related Material and Method
The dataset of female patients of the Pima Indian population, with a minimum age of twenty-one years, has been taken from the UCI machine learning repository. This dataset is originally owned by the National Institute of Diabetes and Digestive and Kidney Diseases. It contains a total of 768 instances classified into two classes, diabetic and non-diabetic, with eight different risk factors: number of times pregnant, plasma glucose concentration at two hours in an oral glucose tolerance test, diastolic blood pressure, triceps skin fold thickness, two-hour serum insulin, body mass index, diabetes pedigree function and age [13].
We have investigated this diabetes dataset using the powerful R data manipulation tool. Feature engineering is a crucial step in applications of the machine learning process. Modern data sets are described with many attributes for practical machine learning model building, and usually most of the attributes are irrelevant to the supervised classification task. The preprocessing phase of the data involved feature selection, removal of outliers and k-NN imputation to predict the missing values [14].
There are various methods for handling irrelevant and inconsistent data. In this work, we have selected the attributes containing highly correlated data. This step is implemented by a feature selection method, which can be done either manually or with the Boruta wrapper algorithm. The Boruta package provides stable and unbiased selection of important features from an information system, whereas the manual method is error prone. So, feature selection has been done with the help of Boruta, which is available as an R package [15]. This package provides a convenient interface for machine learning algorithms. Boruta is designed as a wrapper built around the random forest classification algorithm implemented in R. The Boruta wrapper was run on the Pima Indian dataset with all the attributes and it yielded four attributes as important. With these attributes, the accuracy, precision, recall and other parameters are calculated [16].

Figure 1: Essential learning process to develop a predictive model.

There are several machine learning techniques which can be used to implement the machine learning process. Supervised and unsupervised learning are the most widely used. Supervised learning is employed when historical data is available for a particular problem. The system is trained with the inputs and their respective responses and is then used to predict the response for new data [17]. Common supervised approaches include artificial neural networks, back propagation, decision trees, support vector machines and the Naïve Bayes classifier. Unsupervised learning is employed when the available training data is unlabeled. The system is not given any prior information or training [18]. The algorithm has to explore and identify the patterns in the available data in order to make decisions or predictions. Common unsupervised approaches include k-means clustering, hierarchical clustering, principal component analysis and the hidden Markov model [19].
Supervised machine learning algorithms are selected to perform binary classification of the Pima Indian diabetes dataset. For predicting whether a patient is diabetic or not, we have used five different algorithms in our predictive models: linear kernel and radial basis function (RBF) kernel support vector machine (SVM), k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR). Their details are given below.
A. Support Vector Machine
The support vector machine (SVM) is used in both classification and regression. In an SVM model, the data points are represented in a space and categorized into groups, so that points with similar properties fall into the same group.
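As a rough illustration of how a linear SVM assigns a point to a group, the minimal Python sketch below (the study itself was carried out in R) labels a point by the sign of w · x − b, i.e. by the side of a separating hyper-plane on which it falls. The weight vector `w`, the offset `b` and the sample points are hypothetical values chosen for illustration, not parameters fitted in this study, and no training is performed.

```python
import math

def decision_value(w, x, b):
    """Return w . x - b, the signed score of point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

def classify(w, x, b):
    """Label +1 on the positive side of the hyper-plane, -1 otherwise."""
    return 1 if decision_value(w, x, b) >= 0 else -1

def distance_to_plane(w, x, b):
    """Perpendicular distance from x to the hyper-plane w . x - b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, x, b)) / norm

# Hypothetical 2-D example: the hyper-plane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], 3.0
print(classify(w, [3.0, 2.0], b))           # point on the positive side
print(classify(w, [0.5, 1.0], b))           # point on the negative side
print(distance_to_plane(w, [3.0, 3.0], b))  # distance of a point from the plane
```

Training an SVM amounts to choosing w and b so that the smallest such distance over the training points is as large as possible.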


Figure 2: Representation of Support Vector Machine

In a linear SVM, the given data set is considered as a set of p-dimensional vectors that can be separated by a maximum of p-1 planes, called hyper-planes [20]. These planes separate the data space or set the boundaries among the data groups for classification or regression problems, as in Figure 2. The best hyper-plane can be selected among the candidate hyper-planes on the basis of the distance between the two classes it separates. The plane that has the maximum margin between the two classes is called the maximum-margin hyper-plane [21].
The n data points are defined as:
$(X_1, Y_1), \ldots, (X_n, Y_n)$ (1)
where $X_i$ is a real vector and $Y_i$ is either 1 or -1, representing the class to which $X_i$ belongs.
A hyper-plane constructed so as to maximize the distance between the two classes $Y = 1$ and $Y = -1$ is defined as:
$W \cdot X - b = 0$ (2)
where W is the normal vector and b is the offset of the hyper-plane along the normal vector W.
B. Radial Basis Function (RBF) Kernel Support Vector Machine
The support vector machine has proven its efficiency on both linear and nonlinear data. The radial basis function has been implemented with this algorithm to classify nonlinear data [21].

Figure 3: Representation of Radial Basis Function (RBF) kernel support vector machine.

The kernel function plays a very important role in mapping data into the feature space. Mathematically, the kernel trick (K) is defined as:
[Equation (3) is missing from the extracted source.]
A Gaussian function is also known as the radial basis function (RBF) kernel. In Figure 3, the input space is separated by the feature map (Φ). By applying equations 1 and 2 we get:
[Equation (4) is missing from the extracted source.]
By applying equation 3 in 4 we get a new function, where N represents the trained data:
[Equation (5) is missing from the extracted source.]
C. k-Nearest Neighbour (k-NN)
k-nearest neighbour is a simple algorithm that nevertheless yields excellent results. It is a lazy, nonparametric, instance-based learning algorithm that can be used in both classification and regression problems. In classification, k-NN is applied to find the class to which a new unlabeled object belongs. For this, a value of k (the number of neighbours to be considered) is set, usually odd, and the distance between the new object and the data points nearest to it is calculated by measures such as Euclidean distance, Hamming distance, Manhattan distance or Minkowski distance. After calculating the distances, the k nearest neighbours are selected and the class of the new object is determined on the basis of the votes of the neighbours. k-NN predicts the result with high accuracy [22].
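The voting procedure described for k-NN can be sketched in a few lines of Python (the study itself used R). This is a minimal illustration under stated assumptions: the function names are hypothetical, and the toy (glucose, BMI) points and labels are invented for the example and are not rows from the Pima Indian dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (glucose, BMI) pairs with labels 0 / 1.
train_x = [(85, 22.0), (90, 24.5), (100, 26.0),
           (150, 33.0), (165, 35.5), (180, 38.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (160, 34.0), k=3))
print(knn_predict(train_x, train_y, (95, 25.0), k=3, distance=manhattan))
```

An odd k avoids tied votes in a two-class problem. In practice the features would usually be scaled first, since otherwise the attribute with the largest numeric range dominates the distance.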
D. Artificial neural network (ANN)
An artificial neural network mimics the functionality of the human brain. It can be seen as a set of nodes called artificial neurons. All of these nodes can transmit information to one another. The neurons can be represented by some state (0 or 1), and every node can also have a weight assigned to it that defines its strength or importance within the system. The structure of an ANN is divided into layers of multiple nodes; the data travels from the first
layer (input layer) and, after passing through the middle layers (hidden layers), reaches the output layer. Every layer transforms the data into some relevant information and eventually gives the desired output [23]. Transfer and activation functions play an important role in the functioning of neurons. The transfer function sums up all the weighted inputs as:
$z = \sum_{i=1}^{n} w_i x_i + b$ (6)
where $x_i$ are the inputs, $w_i$ the corresponding weights and b is the bias value, which is usually 1.
The activation function basically flattens the output of the transfer function to a specific range. It can be either linear or nonlinear. The simplest activation function is:
$f(z) = z$ (7)
Since this function does not provide any limits to the data, the sigmoid function is used, which can be expressed as:
$f(z) = \frac{1}{1 + e^{-z}}$ (8)
E. Multifactor Dimensionality Reduction (MDR)
Multifactor dimensionality reduction is an approach for finding and representing combinations of independent variables that may influence the dependent variable. It is designed to find the interactions between variables that can affect the output of the system. It does not depend on parameters or on the type of model being used, which makes it better than other traditional systems. It takes two or more attributes and converts them into one. This conversion changes the space representation of the data and improves the performance of the system in predicting the class variable. Several extensions of MDR are used in machine learning, among them fuzzy methods, odds ratios, risk scores, covariates and many more [24].
III. Predictive Model
In our proposed predictive model (Figure 4), we have applied data preprocessing and different feature engineering techniques to get better results. Preprocessing involved removal of outliers and k-NN imputation to predict the missing values. The Boruta wrapper algorithm is used for feature selection because it provides an unbiased separation of important and unimportant features in an information system. Training on the data after feature engineering plays a significant role in supervised learning. We have used highly correlated variables for better outcomes [25]. The input data here refers to the test data used for prediction and for the confusion matrix.
Early diagnosis of diabetes can help improve the quality of life of patients and increase their life expectancy. Supervised algorithms are used to develop different models for diabetes detection. Table 2 gives a view of the machine learning models trained on the Pima Indian diabetes dataset with optimized tuning parameters. All classification techniques were run in the R programming studio. The data set is partitioned into two parts (training and testing): we trained our models on 70% of the data and tested them on the remaining 30%. Five different models are developed using supervised learning to detect whether the patient is diabetic or non-diabetic. For this purpose, linear kernel support vector machine (SVM-linear), radial basis
function (RBF) kernel support vector machine, k-NN, ANN and MDR algorithms are used. To diagnose diabetes in the Pima Indian population, the performance of all five models is evaluated on parameters such as precision, recall, area under the curve (AUC) and F1 score. To avoid the problems of overfitting and underfitting, tenfold cross-validation is performed. Accuracy indicates how often our classifier is correct in diagnosing whether a patient is diabetic or not. Precision is used to determine the classifier's ability to give correct positive predictions of diabetes. Recall, or sensitivity, is used in our work to find the proportion of actual positive cases of diabetes correctly identified by the classifier. Specificity is used to compute the classifier's capability of identifying negative cases of diabetes. The F1 score is the harmonic mean of precision and recall, so it takes both into account. Classifiers with an F1 score near 1 are considered the best [18]. The receiver operating characteristic (ROC) curve is a well-known tool for assessing the performance of a binary classifier [19]. It is a plot of the true positive rate against the false positive rate as the threshold for assigning observations to a given class is varied. The AUC value of a classifier may lie between 0.5 and 1. Values below 0.50 indicate a set of random data for which true and false cannot be distinguished. An optimal classifier has an AUC near 1.0. If it is near 0.5, the classifier is comparable to random guessing [20].

Figure 4: Framework for evaluating Predictive Model.

From Table 2, which presents the different parameters for evaluating all the models, it is found that the accuracy of the linear kernel SVM model is 0.89. For the radial basis function kernel SVM, accuracy is 0.84. For the k-NN model accuracy is 0.88, while for ANN it is 0.86. The accuracy of the MDR-based model is 0.83. Recall, or sensitivity, which indicates the correctly identified proportion of actual positive diabetic cases, is 0.87 for the SVM-linear model and 0.83 for SVM-RBF. For the k-NN, ANN and MDR-based models the recall values are 0.90, 0.88 and 0.87 respectively. The precision of the SVM-linear, SVM-RBF, k-NN, ANN and MDR models is 0.88, 0.85, 0.87, 0.85 and 0.82 respectively. The F1 score of the SVM-linear, SVM-RBF, k-NN, ANN and MDR models is 0.87, 0.83, 0.88, 0.86 and 0.84 respectively. We have calculated the area under the curve (AUC) to measure the performance of our models. The AUC of the SVM-linear model is 0.90, while for the SVM-RBF, k-NN, ANN and MDR models the values are 0.85, 0.92, 0.88 and 0.89 respectively. From the above, it can be said that on the basis of all the parameters SVM-linear and k-NN are the two best models for determining whether a patient is diabetic or not. Further, the accuracy and precision of the SVM-linear model are higher than those of the k-NN model, but the recall and F1 score of the k-NN model are higher than those of the SVM-linear model. If we examine our diabetes dataset carefully, it is found to be
an example of an imbalanced class distribution, with 500 negative instances and 268 positive instances giving an imbalance ratio of 1.87. Accuracy alone may not provide a good indication of the performance of a binary classifier when the classes are imbalanced. The F1 score gives better insight into classifier performance in the case of an uneven class distribution because it balances precision and recall [21, 25]. So, in this case the F1 score should also be taken into account. Further, the AUC values of the SVM-linear and k-NN models are 0.90 and 0.92 respectively.
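The point about accuracy under class imbalance can be checked with simple arithmetic. The Python sketch below (the study itself used R) computes the metrics from confusion-matrix counts, using the class counts reported in the text, for a deliberately trivial baseline that labels every patient as non-diabetic. The function name and the baseline are illustrative assumptions and are not one of the five models evaluated in this study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

negatives, positives = 500, 268          # class counts reported in the text
imbalance_ratio = negatives / positives  # majority / minority

# Trivial baseline: every patient is predicted to be non-diabetic.
baseline = confusion_metrics(tp=0, fp=0, fn=positives, tn=negatives)

print(round(imbalance_ratio, 2))
print(round(baseline["accuracy"], 3), baseline["recall"], baseline["f1"])
```

With these counts the baseline reaches an accuracy of about 0.65 although its recall and F1 score are both zero, which is why accuracy has to be read together with the F1 score on this dataset.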
IV. Patient demographics
The dataset has been taken from the UCI machine learning repository. It consists of 768 female patients of Pima Indian heritage, at least 21 years old, each with a diabetes diagnosis (diabetic or control). There were 268 diabetic patients and 500 control patients. The dataset contains 9 variables: (1) number of times pregnant, (2) plasma glucose concentration at two hours in an oral glucose tolerance test, (3) diastolic blood pressure (mm Hg), (4) triceps skin fold thickness (mm), (5) 2-hour serum insulin (µU/ml), (6) body mass index (weight in kg/(height in m)^2), (7) diabetes pedigree function, (8) age (in years), (9) class variable (diabetic or control). In this dataset five patients have a zero blood glucose level, diastolic blood pressure is zero for 35 patients, 27 patients have zero body mass index, 227 patients have zero skin fold thickness and 374 patients have a zero serum insulin level. However, these zero values are meaningless.
Attribute No.   Attribute                     Variable Type   Range
A1              Pregnancy                     Integer         0-17
A2              Glucose                       Real            0-199
A3              Blood pressure                Real            0-122
A4              Skin thickness                Real            0-99
A5              Insulin                       Real            0-846
A6              Body mass index (BMI)         Real            0-67.1
A7              Diabetes pedigree function    Real            0.078-2.42
A8              Age                           Integer         21-81
-               Class                         Binary          1 = tested positive for diabetes, 0 = tested negative for diabetes

Table 1: Parameters of the dataset
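Because a zero in the measurement columns is not a plausible reading, one common first step is to flag such entries as missing before any imputation. The Python sketch below (the study itself used R) illustrates this under stated assumptions: the column names are hypothetical labels for the attributes in Table 1, the two records are invented for the example, and the paper does not state that this exact procedure was used.

```python
# Columns where, per the text, a recorded zero is not a meaningful value.
ZERO_MEANS_MISSING = ("glucose", "blood_pressure", "skin_thickness",
                      "insulin", "bmi")

def mark_missing(record):
    """Return a copy of the record with meaningless zeros replaced by None."""
    return {key: (None if key in ZERO_MEANS_MISSING and value == 0 else value)
            for key, value in record.items()}

def count_missing(records):
    """Count None entries per column after marking."""
    counts = {}
    for record in map(mark_missing, records):
        for key, value in record.items():
            if value is None:
                counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical records for illustration only; not rows from the dataset.
rows = [
    {"pregnancies": 0, "glucose": 120, "blood_pressure": 70,
     "skin_thickness": 0, "insulin": 0, "bmi": 31.2,
     "pedigree": 0.35, "age": 29, "outcome": 0},
    {"pregnancies": 3, "glucose": 0, "blood_pressure": 0,
     "skin_thickness": 25, "insulin": 90, "bmi": 0,
     "pedigree": 0.60, "age": 41, "outcome": 1},
]
print(count_missing(rows))
```

Zeros in the pregnancy count and in the class variable are legitimate values, so those columns are deliberately left untouched.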
V. RESULT
Table 2 gives a view of the machine learning models trained on the Pima Indian diabetes dataset with optimized tuning parameters.
We handled the missing value problem using the median approach, which kept the process simple within our classification paradigm. Note that there are several methods for approaching this issue; within the present scope of this paper, we have simplified it using the median-based approach. Note also that the choice depends on the data types and the density of the data. Since our data is simple, our strategy yields results which are comparable to the existing approaches, while the comprehensive analysis is the novelty of the system. Some statistical information on the variables of the data is also presented.
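A median-based treatment of missing values can be sketched as follows in Python (the study itself used R). This is a minimal illustration, not the authors' implementation: the function name is hypothetical and the insulin readings are invented for the example, with `None` standing for a missing entry.

```python
from statistics import median

def impute_with_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical insulin readings with missing entries; not data from the study.
insulin = [94, None, 168, 88, None, 543]
print(impute_with_median(insulin))
```

The median is less sensitive than the mean to extreme readings such as the 543 in this example, which is one reason it is a common simple choice for skewed clinical measurements.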

Table 2(a): Experiment Predictive Modelling and Analytics for Diabetes

Table 2(b): Predictive Modelling and Analytics for Diabetes

Table 2(c): Predictive Modelling and Analytics for Diabetes

Figure 5: Predictive Modelling and Analytics for Diabetes. (Plot titled "Analytics for Diabetes using a Machine Learning" over the variables Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age and Outcome; axis scale from -0.2 to 1.)

VI. CONCLUSION AND FUTURE WORK

We have developed five different models to detect diabetes using linear kernel support vector machine (SVM-linear), radial basis kernel support vector machine (SVM-RBF), k-NN, ANN and MDR algorithms. Feature selection on the dataset is done with the help of the Boruta wrapper algorithm, which provides unbiased selection of important features. All the models are evaluated on the basis of different parameters: accuracy, recall, precision, F1 score and AUC. The experimental results suggest that all the models achieved good results; the SVM-linear model provides the best accuracy (0.89) and precision (0.88) for prediction of diabetes as compared to the other models used. On the other hand, the k-NN model provided the best recall (0.90) and F1 score (0.88). As our dataset is an example of an imbalanced class distribution, the F1 score may provide better insight into the performance of our models, since it balances precision and recall. Further, the AUC values of the SVM-linear and k-NN models are 0.90 and 0.92 respectively. Such high AUC values indicate that both SVM-linear and k-NN are optimal classifiers for the diabetes dataset. So, from the above studies, it can be said that on the basis of all the parameters the linear kernel support vector machine (SVM-linear) and k-NN are the two best models for determining whether a patient is diabetic or not. This work also suggests that the Boruta wrapper algorithm can be used for feature selection. The experimental results indicate that using the Boruta wrapper feature selection algorithm is better than choosing the attributes manually with limited medical domain knowledge. Thus, with a limited number of parameters, through the Boruta feature selection algorithm we have achieved higher accuracy and precision.

References
[1] D. Soumya and B. Srilatha, Late stage complications of diabetes and insulin resistance, J. Diabetes Metab. 2(167) (2011) 2-7.
[2] K. Papatheodorou, M. Banach, M. Edmonds, N. Papanas, D. Papazoglou, Complications of diabetes, J. of Diabetes Res. 2015 (2015) 1-5.
[3] L. Mamykina, et al., Personal discovery in diabetes self-management: Discovering cause and effect using self-monitoring data, J. Biomed. Informat. 76 (2017) 1-8.
[4] A. Nather, C. S. Bee, C. Y. Huak, J. L. L. Chew, C. B. Lin, S. Neo, E. Y. Sim, Epidemiology of diabetic foot problems and predictive factors for limb loss, J. Diab. and its Complic. 22 (2) (2008) 77-82.
[5] Shiliang Sun, A survey of multi-view machine learning, Neural Comput. & Applic. 23 (7-8) (2013) 2031-2038.
[6] M. I. Jordan, M. Mitchell, Machine learning: Trends, perspectives, and prospects, Science 349 (6245) (2015) 255-260.
[7] P. Sattigeri, J. J. Thiagarajan, M. Shah, K. N. Ramamurthy, A. Spanias, A scalable feature learning and tag prediction framework for natural environment sounds, 48th Asilomar Conference on Signals, Systems and Computers (2014) 1779-1783.
[8] M. W. Libbrecht, W. S. Noble, Machine learning applications in genetics and genomics, Nature Reviews Genetics 16 (6) (2015) 321-332.
[9] K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, D. I. Fotiadis, Machine learning applications in cancer prognosis and prediction, Computation. and Struct. Biotech. J. 13 (2015) 8-17.
[10] E. M. Hashem, M. S. Mabrouk, A study of support vector machine algorithm for liver disease diagnosis, Amer. J. of Intell. Sys. 4 (1) (2014) 9-14.
[11] W. Mumtaz, S. Saad Azhar Ali, M. Azhar, M. Yasin, A. Saeed Malik, A machine learning framework involving EEG-based functional connectivity to diagnose major depressive disorder (MDD), Medical & Biological Engineering & Computing (2017) 114.
[12] D. K. Chaturvedi, Soft Computing Techniques and Their Applications, in Mathematical Models, Methods and Applications, 31-40, Springer Singapore, 2015.
[13] A. Tettamanzi, M. Tomassini, Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems, Springer Science & Business Media, 2013.
[14] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, B. Scholkopf, Support vector machines, IEEE Intell. Syst. and their Appl. 13 (4) (1998) 18-28.
[15] G. B. Huang, Q. Y. Zhu, C. K. Siew, Extreme learning machine: theory and applications, Neurocomput. 70 (1) (2006) 489-501.
[16] S. A. Dudani, The distance-weighted k-nearest-neighbor rule, IEEE Trans. on Syst., Man, and Cybernet. SMC-6 (4) (1976) 325-327.
[17] T. Kohonen, An introduction to neural computing,

Neural Networks 1 (1) (1988) 3-16.
[18] Z. C. Lipton, C. Elkan, B. Naryanaswamy, Optimal thresholding of classifiers to maximize F1 measure, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg (2014) 225-239.
[19] L. B. Ware, et al., Biomarkers of lung epithelial injury and inflammation distinguish severe sepsis patients with acute respiratory distress syndrome, Crit. Care 17 (5) (2013) 1-7.
[20] M. E. Rice, G. T. Harris, Comparing effect sizes in follow-up studies: ROC Area, Cohen's d, and r, Law Hum. Behav. 29 (5) (2005) 615-620.
[21] A. Ali, S. M. Shamsuddin, A. L. Ralescu, Classification with class imbalance problem: A review, Int. J. Advan. Soft Compu. Appl. 5 (3) (2013) 176-204.
[22] S. Park, D. Choi, M. Kim, W. Cha, C. Kim, I. C. Moon, Identifying prescription patterns with a topic model of diseases and medications, J. of Biomed. Informat. 75 (2017) 35-47.
[23] H. Kaur, E. Lechman, A. Marszk, Catalyzing Development through ICT Adoption: The Developing World Experience, Springer Publishers, Switzerland, 2017.
[24] H. Kaur, R. Chauhan, Z. Ahmed, Role of data mining in establishing strategic policies for the efficient management of healthcare system–a case study from Washington DC area using retrospective discharge data, BMC Health Services Research 12 (S1) (2012) P12.
[25] J. Li, O. Arandjelovic, Glycaemic index prediction: A pilot study of data linkage challenges and the application of machine learning, in: IEEE EMBS Int. Conf. on Biomed. & Health Informat. (BHI), Orlando, FL (2017) 357-360.
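
Supplementary note. The conclusion's remark that the F1 score may give better insight than accuracy on an imbalanced dataset can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not results from this study; the helper names `precision_recall_f1` and `accuracy` are introduced here only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 100 patients, 20 of them diabetic.
# The classifier labels almost everyone as non-diabetic.
tp, fn = 5, 15   # only 5 of the 20 diabetic patients are detected
fp, tn = 1, 79   # 79 of the 80 non-diabetic patients are correct

acc = accuracy(tp, tn, fp, fn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"accuracy={acc:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
# prints: accuracy=0.84 precision=0.83 recall=0.25 F1=0.38
```

In this made-up case accuracy looks acceptable because the majority class dominates, while the low recall pulls the F1 score down, which is the behaviour the conclusion relies on when comparing models.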

