Volume 8, Issue 11, November – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Diabetes Prediction using Machine Learning


Sahil Kumar Suman 1, Natasha Sharma 2, Udeshna Saikia 3, Dhiti 4, Rahul Chauhan 5, Nandini Singh 6
Department of Computer Science and Engineering, Chandigarh University, Punjab, India

Abstract:- Diabetes has been recognized as a serious global health issue. It is a long-term metabolic disease that occurs when blood glucose levels are elevated in the human body. Early and accurate diagnosis is essential for managing the condition precisely and for preventing complications. This study proposes a comprehensive and effective machine-learning method for detecting diabetes and supporting its treatment. The dataset used contains many clinical and demographic variables, such as age, BMI, family history and various blood test results. The data are first preprocessed to handle missing values and normalize features, after which a strict feature selection process identifies the most relevant variables. Several machine learning algorithms, including Support Vector Machines (SVM), Random Forest and Logistic Regression, are employed to train and validate the model. The performance of each algorithm is assessed using metrics such as accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve. An ensemble approach is also explored to combine the benefits of multiple models and increase overall predictive power. The recommended model is tested on various datasets to assess its generalizability. The main purpose of the project is to create a robust and trustworthy diabetes detection tool that can be used in clinical settings to aid medical professionals with advanced diagnosis and individualized treatment planning. The results demonstrate improved performance and the potential for machine learning to increase diabetes detection accuracy. The sensitivity of the proposed model to subtle patterns in diverse patient datasets suggests that it could apply to a wide range of demographics. This work lays the groundwork for future research into enhancing and expanding the capabilities of diabetes detection models, advancing ongoing efforts to apply machine learning to healthcare.

Keywords:- Diabetes, Early Diagnosis, Machine Learning, Accuracy, Healthcare Applications.

I. INTRODUCTION

Diabetes mellitus, a chronic metabolic disease marked by persistent hyperglycemia, adversely affects millions of people worldwide and has become a major public health concern. Timely and accurate diabetes prediction enables early intervention and tailored treatment plans, and is needed for proactive healthcare management [1]. Machine learning, a rapidly evolving field of artificial intelligence, has shown promising results in healthcare, especially in diagnosing and predicting complicated medical conditions. To support a more proactive and preventive approach to healthcare, this study focuses on using machine learning techniques to forecast the onset of diabetes.

Traditional diabetes analysis techniques often depend on clinical risk factors and statistical models. These methods, however, might not have the sensitivity and specificity required to accurately identify people who are at risk [2]. Machine learning offers a promising alternative, because it can identify patterns and relationships in large and varied datasets. Machine learning models can analyze a wide variety of patient-related features, such as clinical history, laboratory results, and demographic data, in settings such as RTML, using readily accessible and well-developed algorithms.

The purpose of this research is to develop a trustworthy and precise diabetes prediction model that will assist in recognizing the disease early and shed light on its main contributing factors. By allowing healthcare professionals to apply interventions and preventive decisions customized to each patient's unique profile, such models have the potential to transform the way healthcare is provided.

This research attempts to address the limitations of current prediction models and to contribute to ongoing efforts to improve the accuracy and reliability of early diabetes detection [3]. The ultimate objective is a better future for patients everywhere.
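The abstract evaluates each algorithm with accuracy and the area under the ROC curve, and the introduction stresses sensitivity and specificity. A minimal sketch of how these quantities can be computed from true labels, hard predictions and predicted probabilities follows. It assumes 1 = diabetic and 0 = non-diabetic, a 0.5 decision threshold, and invented toy scores rather than results from this study; the function names are hypothetical. ROC AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
"""Evaluation metrics for a binary diabetes classifier (illustrative sketch).

Assumed conventions: label 1 = diabetic, 0 = non-diabetic; `scores` are
predicted probabilities of class 1. Standard library only.
"""
from __future__ import annotations


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity (recall), specificity and precision."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true-negative rate
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve as the probability that a random positive
    is scored above a random negative (ties count one half).
    O(P*N), which is fine for small illustrative data."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up scores (not results from the study):
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
print(threshold_metrics(y_true, y_pred))  # accuracy 4/6; sensitivity, specificity, precision 2/3
print(roc_auc(y_true, scores))            # 8/9
```

Accuracy and the threshold-based rates change with the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.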

IJISRT23NOV2342 www.ijisrt.com 2134


II. MACHINE LEARNING TECHNIQUES

Machine learning methodologies have benefited diabetes detection in unprecedented ways. They can analyze large datasets containing many types of information, such as insulin and non-insulin measurements and family history. The ML techniques most widely used for disease prediction worldwide are described below:

A. Logistic Regression:
This binary classification algorithm models the probability that an instance belongs to a particular class. It is frequently used in diabetes detection to estimate a person's risk of developing diabetes from a variety of input features.

B. Decision Trees:
Decision trees are tree-structured models in which each internal node represents a decision based on a specific feature [4]. In diabetes detection they are used to establish classification rules, which make the decision-making process easy to interpret and understand.

C. Random Forests:
During training, the Random Forest technique builds multiple decision trees and outputs the mode of the classes predicted by the individual trees. It is known for being flexible and able to handle sizable datasets with a variety of properties, which makes it suitable for diabetes prediction.

D. Support Vector Machines (SVM):
SVM achieves strong classification performance on both linearly and non-linearly separable data (the latter through kernel functions). To detect diabetes, SVM looks for the hyperplane that best separates data points into distinct classes according to their features.

E. Neural Networks:
The use of deep learning, especially neural networks, has grown in popularity for the diagnosis of diabetes. Deep neural networks and multi-layer perceptrons (MLPs) can capture intricate relationships and thereby identify complex patterns in data that may be linked to an increased risk of diabetes.

F. K-Nearest Neighbors (KNN):
KNN is a non-parametric, instance-based learning algorithm that classifies a new instance according to the majority class of its k nearest neighbors in feature space. KNN predicts an individual's diabetes status by identifying the most similar individuals.

G. Naive Bayes:
Naive Bayes is a probabilistic algorithm founded on Bayes' theorem. It is computationally efficient because it assumes that features are conditionally independent given the class [5]. In diabetes detection it is used to calculate the probability that an individual has the illness based on observed characteristics.

H. Ensemble Learning:
Ensemble approaches improve performance by combining predictions from several models. Applying strategies such as bagging and boosting to different base models can improve the accuracy and generalizability of a diabetes detection system.

I. Feature Selection Techniques:
Recursive feature elimination (RFE) and feature importance from tree-based models are two feature selection techniques that help determine the most pertinent features for diabetes prediction, thereby lowering dimensionality and possibly enhancing model performance.

Depending on the objectives of the diabetes detection task and the characteristics of the dataset, these machine-learning techniques can be used singly or in combination. The choice of algorithm is influenced by factors such as the type of data, interpretability requirements, and the intended trade-off between sensitivity and specificity in diabetes prediction.

III. IMPORTANCE OF FEATURE SPECIFICATION

Feature specification is crucial for effective machine learning-based diabetes detection. Characterization specific to the targeted disease is necessary to differentiate between healthy and diseased states [6]. The model's capacity for accurate prediction is directly affected by the discriminative strength of its features, and meticulous feature selection enhances robustness and interpretability while reducing dimensionality. Additionally, when building healthcare applications, ethical considerations and potential bias against individuals must always be kept in mind; careful feature definition helps lessen possible inequalities in prediction [7]. To achieve all the critical properties of a good model, the chosen machine learning techniques must also satisfy these qualities. The following are the machine learning techniques we use for diabetes detection in our study:

A. Random Forest:
Random Forest is an ensemble learning approach that increases overall forecast accuracy and resilience by merging the predictions of numerous decision trees, a large number of which are generated during the training phase [8]. To provide variation among the trees, a randomly chosen subset of the dataset's features is used in constructing each tree. This randomness promotes generalization, reduces overfitting, and captures different facets of the data [9]. In a Random Forest model, determining each input feature's relevance to the prediction task is a necessary step in the feature specification process. Furthermore, Random Forest is a suitable option for a variety of datasets because it can handle both numerical and categorical features without extensive preprocessing.
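The random-forest procedure described above (many decision trees, each grown with injected randomness, combined by a majority vote) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function and class names are hypothetical, labels are assumed to be 0/1, splits use Gini impurity, each tree is trained on a bootstrap sample of the rows, and, following the common formulation rather than the per-tree feature subset described in the text, a fresh random subset of about sqrt(number of features) is drawn at every split. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
"""Random-forest sketch: bagged decision trees with random feature subsets.

Assumptions (not from the paper): numeric features, 0/1 labels, Gini
impurity for splits, sqrt(#features) candidate features per split.
All names are hypothetical; standard library only.
"""
from __future__ import annotations

import math
import random
from collections import Counter


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest weighted Gini impurity,
    or None if no split improves on the parent node."""
    best, best_score = None, gini(y)
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset at this split
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, thr = split
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    left = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_sub, rng)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_sub, rng)
    return (f, thr, left, right)


def predict_tree(node, row):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


class RandomForest:
    def __init__(self, n_trees: int = 25, max_depth: int = 4, seed: int = 0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees: list = []

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        n_sub = max(1, int(math.sqrt(d)))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_sub, self.rng))
        return self

    def predict(self, X):
        # Each tree votes; the forest returns the mode of the votes.
        return [Counter(predict_tree(t, row) for t in self.trees).most_common(1)[0][0]
                for row in X]


# Toy usage: [glucose (mg/dL), BMI]; invented values, not study data.
X_demo = [[85, 22.0], [90, 24.0], [95, 21.5], [100, 23.0],
          [170, 33.0], [185, 36.5], [195, 34.0], [178, 40.0]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
forest = RandomForest(n_trees=25, max_depth=3, seed=1).fit(X_demo, y_demo)
print(forest.predict([[80, 20.0], [200, 41.0]]))  # [0, 1] on this toy data
```

On this toy data either feature separates the classes, so each tree typically needs a single split; real clinical data would call for deeper trees, more of them, and evaluation on held-out records.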

B. Linear Regression:
Although it is a fundamental statistical technique, linear regression has particular advantages for diabetes prediction. Because interpretability is frequently as important as predictive accuracy in healthcare, the simplicity of linear regression lets healthcare professionals easily understand how each feature affects the predicted outcome [10]. Each feature's coefficient can be readily interpreted and shows the direction and strength of that feature's influence on diabetes prediction. Linear regression is therefore a useful tool in healthcare settings where transparent decision-making is essential, especially when the goal is a clear understanding of how individual features affect diabetes risk.

IV. USING RTML DATASET

To the best of our knowledge, as of January 2022, no well-defined or widely recognized dataset called "RTML" had been created specifically to diagnose diabetes. Other datasets may have been produced since then, or RTML may refer to a particular dataset within a specific context or organization. Several popular datasets are commonly used in research and development for machine learning-based diabetes assessment. Some of these include:

A. Pima Indians Diabetes Database:
This dataset, which contains details such as age, blood pressure, BMI, and other medical measurements, is frequently used to predict diabetes. It originated from research on Pima Indian women.

B. UCI Diabetes Dataset:
A diabetes dataset available from the UCI Machine Learning Repository includes information on age, sex, BMI, average blood pressure, and six blood serum measurements.

C. Diabetes Dataset from Kaggle:
Kaggle, a platform that hosts data science competitions, maintains datasets for identifying hyperglycemia [11]. These datasets may vary in size and in the attributes they cover.

D. National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Dataset:
NIDDK provides access to datasets from diabetes research, including clinical information suitable for machine learning applications.

If "RTML" refers to a specific collection of data released after that date, or within a certain context or organization, the most recent sources and papers on the topic, or the platform or organization associated with "RTML", should be consulted for up-to-date and trustworthy data [12]. In addition, a range of datasets is available on websites such as Kaggle and the UCI Machine Learning Repository; searching these sources may provide datasets appropriate for diabetes diagnosis [13].

V. RESULT AND ANALYSIS

Decision trees, SVM, random forests, and KNN were among the machine learning techniques we employed for our diabetes forecasting model. The key to an accurate model is machine learning with well-chosen features; such a model can help health professionals in their work and contribute to a healthier world with less disease and better ways to treat it.
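KNN is among the techniques listed in Section V, and Section II describes it as assigning a new instance the majority class of its k nearest neighbors. Because KNN compares raw distances, features on large scales (such as plasma glucose in mg/dL) can dominate those on small scales unless the data are standardized first. The sketch below assumes numeric features, 0/1 labels, Euclidean distance and z-score standardization fitted on the training rows; the helper names and toy values are hypothetical, not taken from the study.

```python
"""k-nearest-neighbors sketch with z-score standardization.

Assumptions: numeric features, 0/1 labels, Euclidean distance; scaling
statistics come from the training rows only. Names are hypothetical;
standard library only (math.dist requires Python 3.8+).
"""
from __future__ import annotations

import math
from collections import Counter


def fit_standardizer(X: list[list[float]]) -> list[tuple[float, float]]:
    """Per-feature (mean, standard deviation) of the training data."""
    stats = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std or 1.0))  # avoid dividing by zero for constant features
    return stats


def standardize(X, stats):
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def knn_predict(X_train, y_train, X_query, k: int = 3) -> list[int]:
    stats = fit_standardizer(X_train)
    train = standardize(X_train, stats)
    preds = []
    for q in standardize(X_query, stats):
        # Indices of the k training rows closest to the query point.
        nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], q))[:k]
        # Majority class among those neighbors (an odd k avoids ties for 2 classes).
        preds.append(Counter(y_train[i] for i in nearest).most_common(1)[0][0])
    return preds


# Toy usage: [plasma glucose (mg/dL), BMI]; values are invented for illustration.
X_train = [[85, 22.0], [90, 24.5], [95, 23.0], [160, 33.0], [170, 35.5], [180, 31.0]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[88, 23.0], [175, 34.0]]))  # [0, 1]
```

The choice of k trades noise sensitivity (small k) against blurring class boundaries (large k), so it is usually tuned on validation data rather than fixed in advance.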

Fig 1 Dataset Graph using RTML

In the above figure, we show the output for a dataset of people with various kinds of diabetes who were tested with RTML. It shows the number of people in each predicted diabetes category, which we use in further analysis to determine the accuracy of our research.

VI. WITHOUT USING RTML DATASET

Detecting diabetes with machine learning without real-time monitoring means building a prediction model from previous clinical information. First, a dataset is selected that includes conventional attributes such as age, BMI, blood pressure, and cholesterol, along with labels indicating whether diabetes is present. The dataset is divided into training and testing sets, and data preparation operations are applied, such as handling missing values and normalizing numerical features [14]. To identify patterns and correlations between diabetes outcomes and feature sets, machine learning methods such as logistic regression or decision trees are trained on the training data.

After training, the model is evaluated for accuracy, precision and recall. Hyperparameter tuning may be carried out to improve the model's predictive power. Once the model has been validated, it can be used to predict diabetes in new instances, which makes it a beneficial tool in healthcare. Routine maintenance and monitoring keep the model reliable and make it possible to update it in response to changing population health characteristics or evolving datasets [15]. Because it is rooted in historical data, this machine learning technique can detect diabetes without requiring real-time monitoring, allowing prompt and precise adjustments to healthcare procedures.
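The workflow described in Section VI (split the labelled historical data, handle missing values, normalize features, train a classifier, then check accuracy, precision and recall) can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: logistic regression fitted by batch gradient descent stands in for "logistic regression or decision trees", missing values (None) are mean-imputed, features are min-max scaled, and all preprocessing statistics are computed on the training split only so that no test information leaks into training. The records, function names and hyperparameters (learning rate, epochs, 25% test fraction) are hypothetical.

```python
"""Offline diabetes-prediction pipeline sketch (no real-time monitoring).

Split -> impute missing values -> min-max scale -> train logistic
regression -> evaluate. Names, data and hyperparameters are hypothetical;
standard library only.
"""
from __future__ import annotations

import math
import random


def train_test_split(X, y, test_frac=0.25, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [X[i] for i in te], [y[i] for i in tr], [y[i] for i in te]


def fit_preprocessor(X):
    """Per-feature (mean, min, max) over non-missing (None) training values."""
    stats = []
    for col in zip(*X):
        present = [v for v in col if v is not None]
        stats.append((sum(present) / len(present), min(present), max(present)))
    return stats


def transform(X, stats):
    out = []
    for row in X:
        new_row = []
        for v, (mean, lo, hi) in zip(row, stats):
            v = mean if v is None else v  # mean imputation of missing values
            new_row.append((v - lo) / (hi - lo) if hi > lo else 0.0)  # min-max scaling
        out.append(new_row)
    return out


def sigmoid(z):
    if z >= 0:  # numerically stable form for large |z|
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimize mean log-loss with full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            gw = [g + err * xi for g, xi in zip(gw, row)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b, threshold=0.5):
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= threshold else 0
            for row in X]


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    return acc, (tp / (tp + fp) if tp + fp else 0.0), (tp / (tp + fn) if tp + fn else 0.0)


def run_demo():
    # Invented toy records: [glucose (mg/dL), BMI]; None marks a missing value.
    X = [[85, 22.0], [90, 24.0], [95, 21.5], [88, None], [100, 23.0], [92, 25.0],
         [170, 33.0], [185, 36.5], [195, 34.0], [None, 38.0], [178, 40.0], [190, 35.0]]
    y = [0] * 6 + [1] * 6
    X_tr, X_te, y_tr, y_te = train_test_split(X, y)
    stats = fit_preprocessor(X_tr)  # fitted on the training split only
    w, b = fit_logistic(transform(X_tr, stats), y_tr)
    train_metrics = evaluate(y_tr, predict(transform(X_tr, stats), w, b))
    test_metrics = evaluate(y_te, predict(transform(X_te, stats), w, b))
    return len(X_tr), len(X_te), train_metrics, test_metrics


print(run_demo())  # (train size, test size, train metrics, test metrics)
```

Hyperparameter tuning, mentioned in the text, would wrap `fit_logistic` in a search over `lr` and `epochs` scored on a separate validation split, keeping the test split untouched until the end.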

Fig 2 Dataset Graph without RTML.

In the above figure, we show the output for a dataset of people with various kinds of diabetes who were tested without RTML. It shows the number of people in each predicted diabetes category, which we use in further analysis to determine the accuracy of our research.


Fig 3 Accuracy Levels using these ML Techniques.

In this figure, we record the accuracies we obtained for predicting diabetes using various ML techniques, namely Decision Tree, Random Forest, SVM and KNN.

VII. CONCLUSION

In conclusion, this paper aims to contribute to a better future for humankind through the early detection of diseases such as diabetes, which results from elevated blood sugar levels in the human body. The disease depends on a wide range of factors, including BMI, age, family history, and eating habits. However, its harsh outcomes can be prevented through early detection and timely treatment. With evolving technologies, mortality and disease rates may decline in the coming years. Machine learning techniques aid in creating such prediction models and can be considered a boon to healthcare. This research paper was written with the aim of bringing positive change for humankind and working toward a disease-free world.

REFERENCES

[1]. Y. Dubey, P. Wankhede, T. Borkar, A. Borkar and K. Mitra, "Diabetes Prediction and Classification using Machine Learning Algorithms," 2021 IEEE International Conference on Biomedical Engineering, Computer and Information Technology for Health (BECITHCON), Dhaka, Bangladesh, 2021, pp. 60-63, doi: 10.1109/BECITHCON54710.2021.9893653.
[2]. S. A. Shampa, M. S. Islam and A. Nesa, "Machine Learning-based Diabetes Prediction: A Cross-Country Perspective," 2023 International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM), Gazipur, Bangladesh, 2023, pp. 1-6, doi: 10.1109/NCIM59001.2023.10212596.
[3]. E. Daniel, J. Johnson, U. A. Victor, G. V. Aditya and S. A. Sibby, "An Efficient Diabetes Prediction Model using Machine Learning," 2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2023, pp. 1202-1208, doi: 10.1109/ICESC57686.2023.10193277.
[4]. S. S et al., "A Comparative Analysis of Diabetes Prediction Models using Machine Learning Algorithms," 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 2022, pp. 261-265, doi: 10.1109/ICACCS54159.2022.9785280.
[5]. C. Charitha, A. Devi Chaitrasree, P. C. Varma and C. Lakshmi, "Type-II Diabetes Prediction Using Machine Learning Algorithms," 2022 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 2022, pp. 1-5, doi: 10.1109/ICCCI54379.2022.9740844.
[6]. S. Samet, M. R. Laouar and I. Bendib, "Use of Machine Learning Techniques to Predict Diabetes at an Early Stage," 2021 International Conference on Networking and Advanced Systems (ICNAS), Annaba, Algeria, 2021, pp. 1-6, doi: 10.1109/ICNAS53565.2021.9628903.
[7]. S. Mahajan, P. K. Sarangi, A. K. Sahoo and M. Rohra, "Diabetes Mellitus Prediction using Supervised Machine Learning Techniques," 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2023, pp. 587-592, doi: 10.1109/InCACCT57535.2023.10141734.
[8]. M. Pal, S. Parija and G. Panda, "Improved Prediction of Diabetes Mellitus using Machine Learning Based Approach," 2021 2nd International Conference on Range Technology (ICORT), Chandipur, Balasore, India, 2021, pp. 1-6, doi: 10.1109/ICORT52730.2021.9581774.
[9]. P. Tumuluru, L. R. Burra, K. K. Sushanth, S. N. Vali, C. H. M. H. SaiBaba and P. Yellamma, "DPMLT: Diabetes Prediction Using Machine Learning Techniques," 2022 International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2022, pp. 1127-1133, doi: 10.1109/ICEARS53579.2022.9751944.
[10]. L. H.N., A. S. Reddy and K. Naidu, "Analysis of Diabetic Prediction Using Machine Learning Algorithms on BRFSS Dataset," 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2023, pp. 1024-1028, doi: 10.1109/ICOEI56765.2023.10125804.
[11]. S. Samet, M. R. Laouar and I. Bendib, "Diabetes mellitus early-stage risk prediction using machine learning algorithms," 2021 International Conference on Networking and Advanced Systems (ICNAS), Annaba, Algeria, 2021, pp. 1-6, doi: 10.1109/ICNAS53565.2021.9628955.
[12]. M. Paliwal and P. Saraswat, "Research on Diabetes Prediction Method Based on Machine Learning," 2022 2nd International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 2022, pp. 415-419, doi: 10.1109/ICTACS56270.2022.9988050.
[13]. K. Sidana, "Prediction of Diabetes using Machine Learning Algorithms," 2023 11th International Conference on Internet of Everything, Microwave Engineering, Communication and Networks (IEMECON), Jaipur, India, 2023, pp. 1-6, doi: 10.1109/IEMECON56962.2023.10092335.
[14]. B. Rathi and F. Madeira, "Early Prediction of Diabetes Using Machine Learning Techniques," 2023 Global Conference on Wireless and Optical Technologies (GCWOT), Malaga, Spain, 2023, pp. 1-7, doi: 10.1109/GCWOT57803.2023.10064682.
[15]. P. Dalve, D. Bobby, A. Marathe, A. Dusane and S. Daga, "Comparison of Performance of Machine Learning Algorithms for Diabetes Detection," 2023 Third International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 2023, pp. 1-7, doi: 10.1109/ICAECT57570.2023.10118315.
