Proposal
1.0 Introduction:
Earlier studies on diabetes diagnosis have approached the problem using machine
learning. Diabetes results from either inadequate insulin synthesis by the pancreas or
inappropriate insulin usage by body cells, yet its specific cause is still unknown.
Machine learning is especially useful for finding patterns in vast volumes of data and for
predictive analysis. It also extends decision support tools for risk management and alerts,
which can enhance patient safety and healthcare quality. To reduce healthcare
expenditures and progress toward personalized treatment, the healthcare industry
must overcome challenges in electronic record administration, information integration,
computer-aided analysis, and the early diagnosis of diseases and complications. Machine
learning offers a variety of precise tools, approaches, and systems to deal with these issues.
The global diabetes burden is projected to increase from 380 million people in 2013
to 590 million by 2035. Patients living with diabetes have a higher risk for acute and long-
term complications, such as hyperglycemia, nervous system damage, kidney disease, eye
damage, and cardiovascular events, than the general population. Furthermore, treatments for
diabetes complications are a major contributor to the healthcare costs attributable to diabetes,
particularly due to hospitalizations and emergency department visits (Ravaut et al., 2021).
Diabetes Mellitus (DM), often simply called diabetes, is a term for a variety of
disorders that affect how the body converts food into energy. When a person consumes food,
the body converts it into a sugar called glucose and releases it into the bloodstream. The
pancreas produces insulin, a hormone that helps transfer glucose from the blood to the cells
that use it for energy (Chaki et al., 2020).
In addition, diabetes mellitus has been an increasing concern owing to its high
morbidity, and the average age of individuals affected by this disease
has now decreased to the mid-twenties. Given the high prevalence, it is necessary to address
this problem effectively (Sharma et al., 2021).
Diabetes mellitus has become a global public health issue. In 2019, the International
Diabetes Federation estimated the number of people living with diabetes worldwide at 463
million and the expected growth at 51% by the year 2045. Moreover, it is estimated that
there is one undiagnosed person for each person diagnosed with diabetes
(Fregoso-Aparicio et al., 2021).
Machine learning has demonstrated powerful predictive capabilities and
parallel processing capabilities for handling large numbers of variables. Furthermore,
machine learning provides variable screening mechanisms that can detect and interpret
complex relationships between variables (Qin et al., 2022). Machine learning algorithms
have also been embedded into data mining pipelines, which can combine them with classical
statistical strategies to extract knowledge from data (Dagliati et al., 2018).
Data mining is one of the most essential components of medical science research,
which unavoidably generates enormous volumes of data due to the significant societal impact
of the condition under study. Applying machine learning and data mining techniques to the
investigation might be a crucial strategy for making use of the vast amounts of
diabetes-related data that are already accessible. When it comes to decision-making,
administration, and other associated aspects of healthcare organizations, a mix of machine
learning and data mining methodologies may be quite valuable.
Diabetes can cause numerous health complications, including cardiovascular disease,
stroke, chronic kidney disease, foot ulcers, damage to the nerves, harm to the eyes,
cognitive impairment, and death. Given these severe complications, the detection of
diabetes is of great importance. There have been plenty of research studies on diabetes
identification, many of which are based on the Pima Indian diabetes data set. This data set
comes from a study of women in the Pima Indian population that started in 1965, where the
onset rate for diabetes is comparatively high. Most earlier research studies focused on
one or two particular complex techniques to test the data, while comprehensive research
covering many common techniques is missing (Wei et al., 2018).
One of the most significant research applications is the prognosis and treatment of
life-threatening diseases such as Diabetes Mellitus (DM). DM is among the most widespread
diseases among the elderly (World Health Organization, 2020). In 2017, 451 million
individuals globally were diabetic, as reported by the International Diabetes Federation.
This figure is expected to rise to 693 million people over the next 26 years. The
primary cause of DM remains unclear, but researchers believe that both environmental and
genetic factors play an important role in DM (Chaki et al., 2020).
Diabetes increases the risks of early kidney disease, loss of sight, nerve injury, and blood
vessel damage, and it contributes to heart disease. The cause of diabetes remains
unclear, although both genetics and environmental factors such as obesity and lack of
exercise appear to play a role (Chitra et al., 2015).
4.0 Literature Review:
The following are discussions of earlier work on data mining and machine learning
techniques:
According to Sharma et al. (2021), several scientists and medical professionals have
created artificial intelligence-based detection tools to more effectively address issues
that are overlooked as a result of human error. Examples of the data mining techniques and
algorithms implemented by different practitioners include machine vision systems that
learn from facial images to obtain better features for model training, and detection of
the disease through iris patterns via the presentation of iridocyclitis.
According to research by Ahmad et al. (2015), data mining has huge potential to help
healthcare systems use data more effectively and efficiently. The authors found that
classification is one of the most often utilized data mining techniques in the healthcare
industry. They cite data mining as being crucial in the identification of fraud and abuse,
the provision of better medical care at affordable costs, intelligent healthcare decision
support systems, and the early detection of diseases like diabetes, heart disease, lung
cancer, thyroid disease, dengue, Alzheimer's disease, and others. The authors also discussed
the difficulties researchers have encountered when using healthcare data for data mining,
difficulties that might pose severe obstacles to drawing the right conclusions. They
recommended combining several data mining approaches in order to improve survival rates
for conditions with high mortality and to increase the accuracy of illness prognosis.
Data mining techniques have been extensively employed in constructing decision
support systems for illness prediction using a collection of medical datasets, according to
research by Mehrbakhsh et al. (2017). The authors suggested integrating clustering, noise
reduction, and prediction approaches to create a new knowledge-based system for illness
prediction. They recommended the Classification and Regression Trees (CART) method
to create the fuzzy rules employed in the knowledge-based system.
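As a rough illustration of how a CART-style tree yields rules, the sketch below searches for the single threshold that minimizes the weighted Gini impurity and prints the result as an if-then rule. The glucose readings, labels, and function names are hypothetical and chosen only for illustration; this is not the cited authors' implementation, and it produces a crisp rule rather than a fuzzy one.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: plasma glucose readings and diabetes labels (1 = diabetic).
glucose = [85, 90, 99, 110, 140, 150, 165, 180]
diabetic = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(glucose, diabetic)
print(f"IF glucose <= {threshold} THEN non-diabetic ELSE diabetic "
      f"(weighted Gini = {impurity:.2f})")
```

A full CART tree applies this search recursively to each resulting subset and over every feature; converting the crisp thresholds into fuzzy membership functions is a further step that this sketch does not cover.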
Chitra et al. (2015) performed a comprehensive review with the goal of assisting
researchers in creating ensemble learning approaches to aid in the early identification and
diagnosis of diabetes. Support Vector Machines (SVMs) have demonstrated strong
performance in a variety of application domains, according to the authors. The suggested
method by the authors is focused on identifying people who are at risk for developing pre-
diabetes or undetected diabetes and helping them decide whether to visit a doctor for
additional testing. According to the authors, the proposed system has the capability to study
and evaluate the diabetes diagnosis with high levels of accuracy, indicating the model's
capacity for diagnosis.
Zou et al. (2018) reported that, given the rising morbidity rates of the past
several years, there will be 642 million diabetic patients worldwide in 2040, or one in every
ten persons. Diabetes can cause chronic damage to and malfunction of many tissues, including
the eyes, kidneys, heart, blood vessels, and nerves. Type 1 diabetes (T1D) and type 2
diabetes (T2D) are the two kinds of diabetes identified by the authors. Most
type 1 diabetes patients are under 30 years old. To predict diabetes mellitus, the authors
built machine learning models using decision trees, random forests, and neural networks.
According to the authors, random forests outperform other classifiers in several
applications.
Shafi et al. (2022) studied the prevalence and early prediction of
diabetes in North Kashmir using machine learning algorithms. According to estimates cited
by the authors, 285 million individuals worldwide (6.4 percent of adults) had diabetes in
2010, and this number is expected to reach 552 million by 2030. Based on the disease's
estimated growth rate, one in ten persons is expected to have diabetes by 2040. In order to
predict a patient's diabetes status at the earliest practical stage,
the authors proposed a variety of classification models based on machine learning
techniques.
According to research by Katherine et al. (2021) and the IDF chart book published in
2017, approximately 424.9 million people worldwide between the ages of 20 and 79 have
diabetes. Of them, 95% have Type 2 Diabetes Mellitus (T2DM). By 2045, this number is
expected to reach 628.6 million. Diabetes can bring on numerous complications, including
nephropathy, cardiovascular disease, retinal disease, neuropathy, and many more.
The study explains how machine learning may be applied to clinical
diagnostics to create frameworks that make use of patient-specific data to predict the
likelihood of complications caused by diabetes.
In the future, the datasets used for illness classification and prediction with incremental
machine learning algorithms need to be given greater thought. It is therefore necessary to
evaluate this technique on more datasets, especially very large ones, in order to determine
its suitability for large-scale data processing. In addition, the suggested approach may be
broadened to make it suitable for various types of medical datasets.
The aim of this research work is to develop and use a novel system based on
clustering and classification that relies on a hybrid machine learning approach for the
diagnosis of diabetes using genomic databases. The main objectives (illness, strategy,
outcome, accuracy) of diverse research efforts, as well as how they applied their techniques
or methods, will be emphasized.
1. To cluster the data in order to assess its patterns and to classify the
content according to behavior informatics.
1. What features make up the database used to generate the model, specifically?
3. What are the ideal validation measures to assess the effectiveness of the models?
The major focus of this study is the knowledge-based methodology for diagnosing
diabetes using a genomic database. It does clustering, noise reduction, forecasting, and
classification using a rule-based decision tree approach in combination with a deep belief
network configuration.
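The clustering algorithm is not named at this point in the proposal. Purely as an illustration of the clustering stage, the sketch below assumes plain k-means on two hypothetical features, glucose and BMI. The records, the choice of k = 2, and the initial centroids are all made up for the example and do not come from the study.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute means."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(point, centroids[k]))
            clusters[nearest].append(point)
        # Recompute each centroid as the mean of its cluster;
        # keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical (glucose, BMI) records.
records = [(85, 22.0), (90, 24.0), (95, 23.0),
           (160, 33.0), (170, 35.0), (180, 34.0)]

# Deterministic start: the first and last records serve as initial centroids.
centroids, clusters = kmeans(records, centroids=[records[0], records[-1]])
print(centroids)
```

In practice the features would be rescaled first, since an unscaled glucose value dominates the Euclidean distance, and the number of clusters would be chosen from the data rather than fixed in advance.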
The research implementation flow takes three steps. First, existing variables in the
database are used to forecast unknown or future values of interest. Second, patterns that
describe the data are developed and presented for user interpretation. Third,
classification and a rule-based decision tree are used to create fuzzy rules that may be
utilized inside the knowledge-based framework.
6.1.1 Pre-processing:
Data Reduction: Combines noise and repetition removal with the float location
module to make models easier to use.
Normalization: To remove artifacts from the data, we use low-pass
filtering, removal of missing values (recorded as zero or negative), data review, outlier
detection and removal, and statistical calculations of maximum, minimum, mean, median,
mode, standard deviation, and range, so that a normalized data set is available for the
remainder of the study.
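Part of this pre-processing step can be sketched with the Python standard library alone. The example below treats zero or negative readings as missing, removes outliers, computes summary statistics, and rescales the remaining values. The z-score cutoff, the min-max scaling, the function name, and the sample readings are assumptions made for illustration; the proposal does not specify them, and low-pass filtering is left out.

```python
import statistics


def preprocess(values, z_cutoff=2.0):
    """Drop missing values and outliers, then min-max normalize to [0, 1]."""
    # Zero or negative readings are treated as missing values.
    valid = [v for v in values if v > 0]

    # Assumed outlier rule: keep values within z_cutoff standard deviations.
    mean = statistics.mean(valid)
    stdev = statistics.stdev(valid)
    kept = [v for v in valid if abs(v - mean) <= z_cutoff * stdev]

    lowest, highest = min(kept), max(kept)
    summary = {
        "min": lowest,
        "max": highest,
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "stdev": statistics.stdev(kept),
        "range": highest - lowest,
    }

    # Min-max scaling; guard against a constant column.
    span = highest - lowest
    normalized = [(v - lowest) / span if span else 0.0 for v in kept]
    return normalized, summary


# Hypothetical glucose readings: 0 and -1 mark missing values, 400 is an outlier.
readings = [0, 90, 100, 110, 95, 105, -1, 400]
normalized, summary = preprocess(readings)
print(normalized)
print(summary)
```

With so few samples a single extreme value inflates the standard deviation, so a robust rule such as one based on the interquartile range may be preferable on real data.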
6.1.2 Classification:
To account for the attributes of the convolutional layers and fully connected layers
across class modifications, the proposed study incorporates a discretization procedure into
the Convolutional Neural Network (CNN) based classifier for class information.
We design a Deep Belief Network (DBN) that takes into account the slow learning and
overfitting phenomena observed on training datasets, in order to address issues with
traditional neural networks in deep layered networks.
To prevent the error from being back-propagated through time and layers, a rule-based
decision tree strategy is used, which makes errors less likely to accumulate as the
model learns across several time steps.
6.1.3 Evaluation:
The performance of the classifier is assessed on three measures: sensitivity,
specificity, and accuracy.
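Sensitivity measures the proportion of actual positive cases that the classifier
identifies correctly. In other words, sensitivity reveals how many of the true
positives are accurately identified. Specificity is the corresponding proportion
of actual negative cases that are correctly identified.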
\[
\text{Sensitivity (\%)} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \times 100
\]
\[
\text{Specificity (\%)} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}} \times 100
\]
Accuracy describes the relationship between the predicted value and the actual value;
it measures how close the predicted values are to the real values.
Table 1 provides definitions for the abbreviations used for the performance parameters:
True Positive (TP), True Negative (TN), False Positive (FP), and False
Negative (FN).
Table 1:
Abbreviation    Explanation
TP              True Positive
TN              True Negative
FP              False Positive
FN              False Negative
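The three evaluation measures can be computed directly from the four counts TP, TN, FP, and FN. The sketch below is a minimal illustration: the function name and the label vectors are hypothetical, and accuracy is taken in its standard form, (TP + TN) divided by the total number of cases, which the text describes only in words.

```python
def evaluate(actual, predicted):
    """Return sensitivity, specificity, and accuracy in percent for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / len(pairs),
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

scores = evaluate(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.2f}%")
```

The function assumes that both classes are present in the actual labels; otherwise one of the denominators is zero and the corresponding measure is undefined.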
References:
Ahmad, Parvez & Qamar, Saqib & Rizvi, Syed. (2015). Techniques of Data Mining In
Healthcare: A Review. International Journal of Computer Applications. 120. 38-50.
10.5120/21307-4126.
https://research.ijcaonline.org/volume120/number15/pxc3904126.pdf
Chaki, J., Thillai Ganesh, S., Cidham, S.K, Ananda Theertan, S., Machine Learning and
Artificial Intelligence based Diabetes Mellitus Detection and Self-Management: A
Systematic Review, Journal of King Saud University - Computer and Information
Sciences (2020),
https://www.sciencedirect.com/science/article/pii/S1319157820304134
Chitra Arjun, Mr. Anto S. Diagnosis of Diabetes Using Support Vector Machine and
Ensemble Learning Approach (2015). International Journal of Engineering and
Applied Sciences. 2394-3661, 2 (11).
https://www.ijeas.org/download_data/IJEAS0211027.pdf
Dagliati, A., Marini, S., Sacchi, L., Cogni, G., Teliti, M., Tibollo, V., De Cata, P., Chiovato,
L., & Bellazzi, R. (2018). Machine Learning Methods to Predict Diabetes
Complications. Journal of Diabetes Science and Technology, 12(2), 295-302.
https://doi.org/10.1177/1932296817706375
Fregoso-Aparicio, L., Noguez, J., Montesinos, L. et al. Machine learning and deep learning
predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 13,
148 (2021). https://doi.org/10.1186/s13098-021-00767-9
Katherine Ogurtsova, Leonor Guariguata, Noël C. Barengo, Paz Lopez-Doriga Ruiz, Julian
W. Sacre, Suvi Karuranga, Hong Sun, Edward J. Boyko, Dianna J. Magliano, IDF
diabetes Atlas: Global estimates of undiagnosed diabetes in adults for 2021, Diabetes
Research and Clinical Practice, Volume 183, 2022.
https://www.sciencedirect.com/science/article/pii/S0168822721004770
Mehrbakhsh Nilashi, Othman bin Ibrahim, Hossein Ahmadi, Leila Shahmoradi, An analytical
method for diseases prediction using machine learning techniques, Computers &
Chemical Engineering, Volume 106, 2017, Pages 212-223,
https://doi.org/10.1016/j.compchemeng.2017.06.011.
Pronab Ghosh, Sami Azam, Asif Karim, Mehedi Hassan, Kuber Roy, Mirjam Jonkman, A
Comparative Study of Different Machine Learning Tools in Detecting Diabetes,
Procedia Computer Science, Volume 192, 2021, Pages 467-477,
https://doi.org/10.1016/j.procs.2021.08.048.
Models for Data-Driven Prediction of Diabetes by Lifestyle Type. Int J Environ Res
Public Health. 2022 Nov 15;19(22):15027. doi: 10.3390/ijerph192215027.
https://www2.mdpi.com/1660-4601/19/22/15027
Rahman Tahsinur, Farzana Sheikh Mastura & Khanom Aniqa Zaida (2018). Prediction of
Diabetes Induced Complications Using Different Machine Learning Algorithms.
Thesis: Department of Computer Science and Engineering, BRAC University.
http://dspace.bracu.ac.bd/xmlui/bitstream/handle/10361/10945/15101128_CSE.pdf?sequence=1&isAllowed=y
Ravaut, M., Sadeghi, H., Leung, K.K. et al. Predicting adverse outcomes due to diabetes
complications with machine learning using administrative health data. npj Digit.
Med. 4, 24 (2021). https://doi.org/10.1038/s41746-021-00394-8
Shafi, Salliah & Selvam, Venkatesan & Ansari, Gufran & Ansari, Mohd Dilshad & Rahman,
Md Habibur. (2022). Prevalence and Early Prediction of Diabetes Using Machine
Learning in North Kashmir: A Case Study of District Bandipora. Computational
Intelligence and Neuroscience. 2022. 1-12. 10.1155/2022/2789760.
Tomar, Divya. (2013). A survey on Data Mining approaches for Healthcare. International
Journal of Bio - Science and Bio - Technology. 5. 241-266.
http://dx.doi.org/10.14257/ijbsbt.2013.5.5.25
Wei, Sidong & Xuejiao, Zhao & Miao, Chunyan. (2018). A comprehensive exploration to the
machine learning techniques for diabetes identification. 291-295.
https://hdl.handle.net/10356/89478
Zou, Quan & Qu, Kaiyang & Luo, Yamei & Yin, Dehui & Ju, Ying & Tang, Hua. (2018).
Predicting Diabetes Mellitus With Machine Learning Techniques. Frontiers in
Genetics. 9. 10.3389/fgene.2018.00515.
https://www.frontiersin.org/articles/10.3389/fgene.2018.00515/full