Human Diseases Detection Based On Machine Learning Algorithms: A Review
Abstract:
Human healthcare is one of society's most significant concerns. Patients need the best and most
robust disease diagnosis so that they can receive the care they need as soon as possible. Because
this recognition is often complicated, healthcare research needs other fields, such as statistics
and computer science. These disciplines face the challenge of developing new approaches that go
beyond the conventional ones. The current number of new techniques makes it possible to provide
a broad overview that does not dwell on particular aspects. To this end, we present a systematic
analysis of machine learning applied to human diseases. This research concentrates on existing
machine learning techniques applied to the diagnosis of human illnesses in the medical field to
discover exciting trends,
IJSB
Literature review
Accepted 19 January 2021
Published 25 January 2021
DOI: 10.5281/zenodo.4462858
IJSB Volume: 5, Issue: 2 Year: 2021 Page: 102-113
Introduction
In human society, healthcare is one of the most urgent issues, as people's quality of life relies
directly on it (Bagga & Hans, 2015). The healthcare area, however, is exceedingly varied, broadly
dispersed, and fragmented. From a clinical perspective, delivering adequate patient care requires
access to appropriate patient information, which is rarely available when needed (Grimson et al.,
2001; Zeebaree et al., 2019). In addition, the large variation in how tests are ordered for
diagnostic purposes indicates the need for an adequate and suitable selection of tests (Daniels &
Schroeder, 1977; Wennberg, 1984; Zeebaree et al., 2019). Smellie et al. (2002) extended this claim,
suggesting that the significant differences found in general-practice pathology requests arise
primarily from individual variation in clinical practice and are therefore likely to be reduced
through more transparent and better-informed decision-making by physicians (Stuart et al., 2002).
Moreover, medical data consist of many heterogeneous variables obtained from various sources,
such as demographics, history of illness, medications, allergies, biomarkers, medical images, or
genetic markers, each of which offers a different partial view of the patient's condition. The
statistical properties of these sources are also fundamentally different.
Researchers and practitioners face two challenges when analyzing such data: the curse of
dimensionality (the number of samples needed to cover the feature space grows exponentially with
the number of dimensions) and the heterogeneity of feature sources and their statistical
properties (Pölsterl et al., 2016). These factors contribute to delays and inaccuracies in
diagnosis, so patients may not receive adequate care. There is therefore a strong need for an
appropriate, systematic approach that enables early detection of disease and can serve as a
decision-making aid for physicians (Zhuang et al., 2009). Consequently, the medical, computer, and
statistical fields face the challenge of exploring new strategies for modeling disease prognosis
and diagnosis, as conventional paradigms struggle to handle all of this information (Huang et al.,
2007). Today, machine learning (ML) offers many essential resources for intelligent data analysis,
and the technology is now well adapted to the study of medical data. In particular, a wide variety
of medical diagnostic work has been carried out on small, specialized diagnostic problems, where
the first ML applications were found (Bargarai et al., 2020; Kononenko, 2001). ML classifiers have
been used successfully, for example, to differentiate between stable patients and those with
Parkinson's disease (Sriram et al., 2016; Zebari et al., 2020), which is a valuable tool in
clinical diagnosis. Indeed, most ML algorithms perform very well on a wide range of significant
problems.
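The curse of dimensionality can be made concrete with a small calculation: if every feature is split into a fixed number of bins, the number of grid cells grows exponentially with the number of features, so a fixed set of patient records spreads ever more thinly. The sketch below assumes 10 bins per feature and 1,000 samples; both are arbitrary illustrative values, not figures from the cited work.

```python
def grid_cells(bins_per_feature, n_features):
    """Number of cells when every feature axis is split into the same number of bins."""
    return bins_per_feature ** n_features


def samples_per_cell(n_samples, bins_per_feature, n_features):
    """Average samples per cell if the samples were spread evenly over the grid."""
    return n_samples / grid_cells(bins_per_feature, n_features)


# Illustrative assumption: 1,000 records, each feature discretised into 10 bins.
for d in (1, 2, 3, 6):
    print(d, grid_cells(10, d), samples_per_cell(1000, 10, d))
# 1 10 100.0
# 2 100 10.0
# 3 1000 1.0
# 6 1000000 0.001
```

With only six features, most cells are already empty, which is why high-dimensional medical data usually calls for feature selection or dimensionality reduction.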
2. Background Theory
This section briefly introduces machine learning, its types, and the techniques most used in the
literature, and compares studies and research on machine learning.
2.1 ML
ML is a branch of artificial intelligence that enables computers to learn from experience, much
as human beings do, and to make decisions without human intervention. Owing to the rapid growth
of artificial intelligence, ML has made considerable progress in detecting various forms of
disease. ML algorithms can also provide more precise predictions and better performance
(Shaheamlung et al., 2020). ML is commonly divided into several types, as shown in Figure 1.
a) SUPERVISED LEARNING
In this type of ML, the algorithm is given a training data set in which each input is paired with
its correct output (target). Based on this training set, the algorithm generalizes so as to
respond correctly to all feasible inputs. Supervised learning is often referred to as learning
from examples (Hashem et al., 2018; Sadeeq & Abdulazeez, 2018; Shi & Malik, 2000; Zebari et al.,
2020). Regression (predicting a continuous value) and classification (predicting a discrete class)
are the two forms of supervised machine learning.
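The two forms of supervised learning can be sketched side by side: a least-squares line for regression and a k-nearest-neighbour (KNN) vote for classification, KNN being one of the algorithms used in several studies reviewed below. All data points are hypothetical toy values.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = a*x + b to a continuous target."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def knn_predict(train, query, k=3):
    """Classification: majority label among the k training points nearest to `query`."""
    nearest = sorted(
        train, key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], query))
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)  # ties would be broken arbitrarily


# Hypothetical labelled examples: (features, label).
train = [((1.0, 1.0), "healthy"), ((1.2, 0.8), "healthy"), ((0.9, 1.1), "healthy"),
         ((3.0, 3.2), "disease"), ((3.1, 2.9), "disease"), ((2.8, 3.0), "disease")]

print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
print(knn_predict(train, (1.1, 1.0)))         # healthy
print(knn_predict(train, (3.0, 3.0)))         # disease
```

In both cases the model is derived only from labelled examples; what differs is whether the target is a number or a class.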
b) UNSUPERVISED LEARNING
Right answers or targets are not given. Instead, unsupervised learning techniques try to identify
similarities between the inputs so that inputs that have something in common are grouped together,
revealing the structure of the data. The statistical approach to this type of learning is known
as density estimation. Grouping (clustering) is a typical unsupervised task (Jahwar, 2021; Najim
Adeen et al., 2020; Pan & Tompkins, 1985).
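Grouping unlabelled inputs can be illustrated with k-means (Lloyd's algorithm), the clustering method reviewed by Jahwar (2021). The points and the two starting centroids below are hypothetical; practical implementations usually choose starting centroids randomly or with k-means++.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid where it was
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabelled 2-D measurements; no "right answers" are supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
labels, centroids = kmeans(points, [(0.0, 0.0), (6.0, 6.0)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The algorithm never sees a label; the two groups emerge purely from the similarity (distance) between inputs.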
d) REINFORCEMENT LEARNING
This form of ML is inspired by behaviorist psychology. The algorithm is told when an answer is
incorrect, but not how to correct it, so it has to explore and try out different possibilities
until it finds the right answer. It is sometimes called learning with a critic, because the
feedback scores the answer but does not suggest improvements.
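The trial-and-error loop described above can be sketched as an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning settings: the agent observes only a reward for the action it chose, never which action was correct. The success probabilities, epsilon, step count, and seed are hypothetical choices.

```python
import random


def epsilon_greedy(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best from reward feedback alone."""
    rng = random.Random(seed)  # fixed seed for a reproducible run
    n_actions = len(success_probs)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions  # running average reward of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit best so far
        # The environment only scores the chosen action; it never reveals the correct one.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # incremental mean
    return estimates, counts


estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])  # hypothetical success rates
print([round(e, 2) for e in estimates], counts)
```

Most pulls end up on the third action, even though the agent was never told it was the right one.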
2.2.1. Support Vector Machines (SVM)
The support vector machine (SVM) was developed in the 1990s. It is a prominent and relatively
straightforward tool for ML tasks. Given a set of labelled training samples, an SVM finds the
boundary (hyperplane) that separates the categories with the largest margin; the training samples
closest to that boundary are the support vectors. The SVM is used primarily for classification
and regression problems (Murphy, 2012; Zeebaree et al., 2019).
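An SVM separates two classes with the hyperplane that leaves the widest margin between them. A minimal sketch of that idea is a linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss. This is a deliberately simple substitute for the quadratic-programming solvers used by libraries such as LIBSVM; it has no kernel (so no RBF), and the six 2-D points are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable training data with labels +1 / -1.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Only points with margin below 1 contribute to the update, which is why the final boundary is determined by the samples nearest to it.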
models (i.e., models with fewer parameters). This particular form of regression is well suited to
models with high levels of multicollinearity, or to cases where certain parts of model selection,
such as variable selection/parameter elimination, are to be automated.
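The beginning of the passage above is missing from this copy, but its wording matches the usual description of LASSO regression, which the Discussion also lists among the algorithms reviewed. Assuming LASSO is the method meant, the sketch below fits it by cyclic coordinate descent with soft-thresholding. The data are hypothetical, and the columns are centred so that no intercept is needed.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly zero."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso(X, y, alpha, sweeps=100):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + alpha*||w||_1.

    Assumes the columns of X and y are centred, so no intercept is fitted.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Partial residual: what is left of y after the other features' contributions.
            r = [y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Hypothetical data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [3 * a + 0.1 * b for a, b in X]
print(lasso(X, y, alpha=0.5))  # about [2.75, 0.0]
```

Ordinary least squares on the same data gives 3.0 and 0.1 (the two columns are orthogonal), so the L1 penalty both shrinks the strong coefficient and removes the weak one entirely: the parameter-elimination behaviour described above.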
3. Related work
There is a large body of research and related work on this topic. Ramana et al. (2011) used two
separate input datasets and found that the AP dataset gave better results than the UCLA dataset
for all of the chosen algorithms. In their classification, KNN, backpropagation, and SVM gave
better outcomes; C4.5, backpropagation, Naïve Bayes, SVM, and KNN achieved accuracies of 95.07%,
96.27%, 96.93%, 97.47%, and 97.07%, respectively. Kousarrizi et al. (2012) focused on two thyroid
disease databases: the first was taken from the UCI machine learning repository, and the second
consists of real data gathered from Imam Khomeini Hospital by the Intelligent Device Laboratory of
K. N. Toosi University of Technology. Using SVM, they obtained a classification accuracy of 98.62%
on the first dataset, the highest accuracy reported at the time. Chitra et al. (2018) used an SVM
with a radial basis function (RBF) kernel for classification; its high classification accuracy,
sensitivity, and specificity made the RBF-kernel SVM a suitable choice for the task. Fan et al.
(2013) extracted twelve morphological features from the ST segment to detect myocardial ischemia
episodes and, using an SVM classifier, obtained 95.20% sensitivity, 93.29% specificity, and 93.63%
accuracy. Hariharan et al. (2014) combined neural networks and the SVM algorithm to diagnose
Parkinson's disease; their experiments on a Parkinson's dataset show that combining feature
preprocessing, feature reduction/selection, and classification gives a maximum classification
accuracy of 100%.
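Because the studies above report results as accuracy, sensitivity, specificity, or precision, it helps to see how each is computed from a binary confusion matrix; the four are distinct quantities and are not interchangeable. The counts below are hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from a 2x2 confusion matrix (positive = diseased)."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate (recall)
        "specificity": tn / (tn + fp),               # true negative rate
        "precision": tp / (tp + fp),                 # positive predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # share of all cases classified correctly
    }


# Hypothetical test set: 50 diseased and 50 healthy patients.
m = binary_metrics(tp=45, fn=5, tn=40, fp=10)
print({name: round(value, 3) for name, value in m.items()})
# {'sensitivity': 0.9, 'specificity': 0.8, 'precision': 0.818, 'accuracy': 0.85}
```

A classifier can have high accuracy yet low sensitivity when diseased cases are rare, which is why studies such as Fan et al. (2013) report all three.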
Senturk and Kara (2014) aimed to contribute to the early diagnosis of breast cancer and provided an
analysis of breast cancer diagnosis in patients. Seven different algorithms were used to predict
the diagnosis of other patients and to measure their accuracy. Using patient data from the UCI ML
repository, they applied the selected algorithms with the data mining tool RapidMiner 5.0 during
the prediction process.
Vijayarani and Dhayanand (2015) compared two classification algorithms, SVM and ANN, for
predicting chronic kidney disease (CKD) based on their respective accuracy and execution time; the
algorithm with higher accuracy and better timing was chosen. Hashem et al. (2017) surveyed
different data mining classification methods for classifying liver disease: the AP liver dataset
gave better results than the UCLA dataset, and C4.5 achieved better results than the other
algorithms. Ko et al. (2017) demonstrated the performance of a CNN approach to skin cancer
classification using dermoscopic and clinical images. However, because datasets are limited,
training a network from scratch to detect skin cancer is usually not viable, so most researchers
either fine-tune a model or use pre-trained models.
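C4.5 (of which Weka's J48, used in a study summarized below, is an implementation) grows a decision tree by choosing, at each node, the attribute with the highest gain ratio: information gain divided by split information. A minimal sketch for categorical attributes follows; the attributes and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for a categorical attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(v, []).append(lab)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many small groups.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


outcome = ["sick", "sick", "healthy", "healthy"]
severity = ["high", "high", "normal", "normal"]  # hypothetical attribute that separates the classes
weekday = ["mon", "tue", "mon", "tue"]           # hypothetical attribute unrelated to the label
print(gain_ratio(severity, outcome))  # 1.0
print(gain_ratio(weekday, outcome))   # 0.0
```

The tree would split on `severity` first; the gain-ratio normalization is what distinguishes C4.5 from its predecessor ID3, which uses raw information gain.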
Cinarer and Emiroglu (2019) analyzed the performance of tumour classification techniques that
classify MR brain image characteristics as n/a, gliomatosis, multifocal, or multicentric, testing
the KNN, RF, LDA, and SVM machine learning algorithms. SVM performed best, with a 90% precision
rate. Javeed et al. (2019) developed a model to improve heart disease prediction while addressing
overfitting, that is, a mismatch between a model's accuracy on training data and its accuracy on
testing data. Their aim was a model that gives high accuracy on both training and testing data.
The model combines two algorithms: a random search algorithm (RSA) and a random forest (RF), which
is used for prediction. The proposed model performed better on both the training data and the
testing data.
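Random search, the idea behind the random search algorithm used by Javeed et al. (2019), samples candidate configurations at random, scores each one, and keeps the best. The source does not describe exactly what their RSA searched over, so the sketch below assumes a hypothetical random-forest hyperparameter grid and replaces real validation accuracy with a made-up scoring function.

```python
import random


def random_search(evaluate, space, n_trials=20, seed=0):
    """Try n_trials randomly sampled settings from `space` and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    """Hypothetical stand-in for the validation accuracy of a random forest."""
    return 0.9 - abs(params["n_trees"] - 100) / 1000 - abs(params["max_depth"] - 6) / 100


space = {"n_trees": [10, 50, 100, 200, 500], "max_depth": [2, 4, 6, 8, 10]}  # hypothetical grid
best, score = random_search(toy_validation_score, space, n_trials=15)
print(best, round(score, 3))
```

Scoring candidates on held-out validation data rather than on the training data is what guards against the overfitting discussed above.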
Because intracerebral hemorrhage (ICH) causes a high mortality rate, Liu et al. (2019) used
multivariate analysis to predict hematoma expansion in spontaneous ICH by applying SVM to routinely
available data, and reported 83. A randomized search approach was used in this study for parameter
tuning, and recursive feature elimination was used for feature selection. Patient selection for
thrombolytic procedures is another significant factor. Rustam et al. (2020) made three types of
COVID-19 forecast with each model: the number of newly infected cases, the number of deaths, and
the number of recoveries over the next ten days. Their results indicate that these methods are a
promising mechanism in the current COVID-19 pandemic scenario. Of all the models used, exponential
smoothing (ES) performed best, followed by LASSO and linear regression (LR), which performed well in
forecasting new cases, the death rate, and the recovery rate, whereas SVM did not perform well in
the prediction scenarios given the available dataset. Tanveer et al. (2020) analyzed 165 articles
from 2005 to 2019 that used different feature extraction and machine learning techniques. They
studied three key categories of ML techniques: SVM, ANN and DL, and ensemble methods.
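Exponential smoothing, the best forecaster in Rustam et al. (2020), updates a level as a weighted average of the newest observation and the previous level. The source does not state which ES variant was used (simple, Holt's trend method, or another), so the sketch below assumes simple exponential smoothing; the daily counts are made up.

```python
def ses_level(series, alpha):
    """Simple exponential smoothing: level_t = alpha * x_t + (1 - alpha) * level_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def ses_forecast(series, alpha, horizon):
    """SES has no trend term, so every future step receives the same forecast."""
    return [ses_level(series, alpha)] * horizon


daily_new_cases = [10, 12, 14]  # hypothetical counts
print(ses_forecast(daily_new_cases, alpha=0.5, horizon=3))  # [12.5, 12.5, 12.5]
```

The flat forecast lags behind a rising series like this one; trend-aware variants such as Holt's method add a second smoothed component for the slope.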
Reference | Disease | Dataset | Algorithm | Performance | Research objective
(Javeed et al., 2019) | Heart disease | Cleveland heart failure (Meng et al., 2018) | RSA, RF | 93.33% (RSA+RF) | Develop an intelligent system that shows good performance on both training and testing data for the diagnosis of heart failure.
(Cinarer & Emiroglu, 2019) | Brain tumour | TCIA (Scarpace et al., 2015) | KNN, RF, SVM and LDA | SVM: 90% | The goal of the best ML classification algorithms is to learn automatically from training data and ultimately make sound decisions with high accuracy.
(Durai et al., 2019) | Liver disease | UCI (Shi & Malik, 2000) | J48, SVM & NB | 95.04%; the J48 algorithm has a better choice of features | To predict the definitive result, compare algorithm techniques to find the one with the higher accuracy rate for detecting liver disease.
(Ahmed et al., 2019) | Alzheimer's disease | ADNI | CNN | 90.05% | Increase accuracy to a level comparable to state-of-the-art techniques, address the problem of overfitting, and examine validated brain technologies that include noticeable AD diagnostic features.
(Zeebaree et al., 2018) | Cancer | Different cancer datasets | CNN | 100% | Apply DL algorithms to diagnose the disease based on gene expression data.
(Acharya et al., 2017) | Myocardial infarction (MI) | Control: 40, CHD: 7 (Pan & Tompkins, 1985; Singh & Tiwari, 2006) | CNN | 98.99% | Diagnose MI automatically using an 11-layer deep CNN on two separate databases (with and without noise).
(Kulkarni and Bairagi, 2017) | Alzheimer's disease | 100 (50 CN, 50 AD) (Kulkarni & Bairagi, 2017) | SVM | 96% | Examine various characteristics for Alzheimer's disease diagnosis that could serve as potential biomarkers to differentiate AD subjects from normal subjects.
(Senturk et al., 2014) | Breast cancer | UCI | SVM, NB, KNN and DT | K-NN: 95.15%, SVM: 96.40% | Determine the best approaches leading to early breast cancer detection; an overview of breast cancer diagnosis in patients is given.
(Hariharan et al., 2014) | Parkinson's disease | PD dataset from UCI | SVM | 100% | Find the best integrated approach to improve the accuracy of Parkinson's disease detection.
(Kumari and Chitra, 2013) | Diabetes | UCI | SVM | 78% | Determine the best approaches leading to early breast cancer detection; an overview of breast cancer diagnosis in patients is given.
(Kousarrizi et al., 2011) | Thyroid disease | UCI | SVM | 98.62% | Choose the best feature selection and classification methods for thyroid disease diagnosis, one of the most critical classification problems.
Naqi et al. (2020) focused on 3D properties in the feature extraction process. Recent developments
in deep learning have been a breakthrough in image processing, and the emphasis of automated
diagnostic systems has shifted from traditional handcrafted features to features learned
automatically by deep networks. This helps to better identify and classify nodular objects in CT
images. An autoencoder and a softmax classifier are considered useful tools for feature reduction
and classification, respectively. Kumar et al. (2020) employed DL techniques, namely CNNs, in a
model that eliminates the errors of the manual process. The model, trained on images of cells,
first preprocesses the images and extracts the best features. An optimized dense convolutional
neural network (called DCNN) is then trained and finally predicts the type of cancer present in
the cells. The model correctly reproduced all measurements and correctly recalled the samples 94
times out of 100. Its overall accuracy was 97.2%, better than conventional techniques such as SVM,
DT, RF, and NB. The DCNN model performed similarly to established CNN architectures on the
retrieved dataset while using far fewer parameters and less computation time. The model can
therefore be used effectively to identify the type of cancer in the bone marrow.
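To illustrate how a CNN turns an image into class probabilities, the sketch below runs one convolution filter, a ReLU, global max pooling, and a softmax over two class scores on a tiny hypothetical image. The filter and the output weights are hand-picked assumptions; in a real CNN such as the DCNN described above they are learned from data, and many filters and layers are stacked.

```python
import math


def conv2d_valid(image, kernel):
    """2-D cross-correlation, no padding, stride 1 (what most DL libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Zero out negative responses."""
    return [[max(0.0, float(v)) for v in row] for row in feature_map]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


image = [[0, 0, 1, 1] for _ in range(4)]        # hypothetical 4x4 image with a vertical edge
edge_kernel = [[-1, 1], [-1, 1]]                # hand-picked vertical-edge filter
fmap = relu(conv2d_valid(image, edge_kernel))   # each row is [0.0, 2.0, 0.0]
pooled = max(max(row) for row in fmap)          # global max pooling -> 2.0
scores = [1.5 * pooled, 0.0]                    # hypothetical weights: [edge class, no-edge class]
probs = softmax(scores)
print(probs)                                    # about [0.953, 0.047]
```

The filter responds only where the edge is, and the softmax layer (the same kind of output layer mentioned for Naqi et al.) converts the pooled response into class probabilities.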
Discussion
This paper discusses various tools and methods commonly used in medicine and healthcare. These
tools belong to ML and serve its main aim: finding useful patterns in databases, explaining the
data, and making non-trivial predictions about it. Table 1 summarizes the technical details
(references, year, diseases, dataset, performance, and research objective) of the studies
discussed in the previous section. As Table 1 shows, some researchers used DL algorithms to achieve
higher detection rates and to improve precision, reliability, and performance. Five studies (Kumar
et al., 2020; Naqi et al., 2020; Ahmed et al., 2019; Zeebaree et al., 2018; Acharya et al., 2017)
focused on DL algorithms for detecting diseases such as blood cancer, lung cancer, Alzheimer's
disease, cancer, and myocardial infarction, and the performance column shows that the accuracy of
CNNs for cancer is higher than for the other diseases. Classification is the process of finding a
model or function that describes and distinguishes data classes or concepts, so that the model can
be used to predict the class of objects whose class label is unknown. In classification, software
is built that learns how data objects can be categorized, and the derived model can be presented
as classification rules. Many researchers have used different algorithms to help healthcare
practitioners diagnose diseases with greater precision. The studies reviewed here used many
classification algorithms to detect disease (LR, LASSO, SVM, KNN, RF, LDA, NB, J48, RSA, and DT), as
shown in Table 1. In Liu et al. (2019), Cinarer & Emiroglu (2019), Kulkarni and Bairagi (2017),
Senturk et al. (2014), Hariharan et al. (2014), Kumari and Chitra (2013), and Kousarrizi et al.
(2011), SVM achieved higher accuracy than the other classification algorithms for disease
detection. However, Rustam et al. (2020) found that, given the available dataset, SVM performed
poorly in all prediction scenarios, and Durai et al. (2019) reported that the J48 algorithm
performed better with respect to feature selection, with an accuracy of 95.04%.
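A classifier derived from labelled data and presented as a rule can be sketched with a one-level decision tree (a decision stump) that outputs a single IF-THEN rule. The feature name, values, and labels below are hypothetical toy data, not a clinical cutoff and not taken from any reviewed study.

```python
def fit_stump(values, labels):
    """Find the threshold t that minimizes errors for the rule: IF x > t THEN 1 ELSE 0."""
    xs = sorted(set(values))
    best_t, best_errors = None, len(labels) + 1
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
        errors = sum((x > t) != bool(y) for x, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


glucose = [85, 90, 95, 130, 140, 150]  # hypothetical measurements (toy values)
diabetic = [0, 0, 0, 1, 1, 1]          # hypothetical labels
t, errors = fit_stump(glucose, diabetic)
print(f"IF glucose > {t} THEN diabetic ELSE not diabetic  (training errors: {errors})")
```

Full decision-tree learners such as J48 apply the same idea recursively, producing a set of nested rules instead of one.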
Conclusion
Intelligent data processing has become a social necessity for detecting disease early and
reliably, so that patients can receive appropriate treatment within the shortest possible time. In
recent decades, such detection has been achieved by identifying interesting patterns in databases.
This paper gives an overview of intelligent data analysis tools in the medical sector. It also
presents examples of algorithms used in medical applications, examining potential patterns
according to the target sought, the methodology used, and the application field. Given the pace at
which new work appears in this emerging field, a systematic review such as this one may become
outdated within a short period. For this reason, we consider that Table 1 should be revised after a
careful search of the new scientific literature, since in the short term further research is more
likely to apply established techniques in this field than to propose genuinely novel techniques
that do more than enhance or modify existing ones.
References
Acharya, U. R., Fujita, H., Oh, S. L., Hagiwara, Y., Tan, J. H., & Adam, M. (2017). Application of deep convolutional
neural network for automated detection of myocardial infarction using ECG signals. Information Sciences,
415–416, 190–198. https://doi.org/10.1016/j.ins.2017.06.027
Ahmed, S., Choi, K. Y., Lee, J. J., Kim, B. C., Kwon, G. R., Lee, K. H., & Jung, H. Y. (2019). Ensembles of Patch-Based
Classifiers for Diagnosis of Alzheimer Diseases. IEEE Access, 7, 73373–73383.
https://doi.org/10.1109/ACCESS.2019.2920011
Al-Zebari, A., & Sengur, A. (2019). Performance Comparison of Machine Learning Techniques on Diabetes
Disease Detection. 1st International Informatics and Software Engineering Conference: Innovative
Technologies for Digital Transformation, IISEC 2019 - Proceedings, 2–5.
https://doi.org/10.1109/UBMYK48245.2019.8965542
Bagga, P., & Hans, R. (2015). Applications of mobile agents in healthcare domain: A literature survey.
International Journal of Grid and Distributed Computing, 8(5), 55–72.
https://doi.org/10.14257/ijgdc.2015.8.5.05
Bargarai, F. A. M., Abdulazeez, A. M., Tiryaki, V. M., & Zeebaree, D. Q. (2020). Management of wireless
communication systems using artificial intelligence-based software defined radio. International Journal of
Interactive Mobile Technologies, 14(13), 107–133. https://doi.org/10.3991/ijim.v14i13.14211
Chitra, K. and. (2018). Classification Of Diabetes Disease Using Support Vector Machine. 3(2), 1797–1801.
https://www.researchgate.net/publication/320395340
Cinarer, G., & Emiroglu, B. G. (2019). Classificatin of Brain Tumors by Machine Learning Algorithms. 3rd
International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2019 -
Proceedings. https://doi.org/10.1109/ISMSIT.2019.8932878
Daniels, M., & Schroeder, S. A. (1977). Variation among physicians in use of laboratory tests II. Relation to clinical
productivity and outcomes of care. Medical Care, 15(6), 482–487. https://doi.org/10.1097/00005650-197706000-00004
Durai, V. (n.d.). Liver disease prediction using machine learning. 5(2), 1584–1588.
Fan, C. H., Hsu, Y., Yu, S. N., & Lin, J. W. (2013). Detection of myocardial ischemia episode using morphological
features. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and
Biology Society, EMBS, 7334–7337. https://doi.org/10.1109/EMBC.2013.6611252
Grimson, J., Stephens, G., Jung, B., Grimson, W., Berry, D., & Pardon, S. (2001). Sharing healthcare records over the
internet. IEEE Internet Computing, 5(3), 49–58. https://doi.org/10.1109/4236.935177
Hariharan, M., Polat, K., & Sindhu, R. (2014). A new hybrid intelligent system for accurate detection of
Parkinson's disease. Computer Methods and Programs in Biomedicine, 113(3), 904–913.
https://doi.org/10.1016/j.cmpb.2014.01.004
Hashem, S., Esmat, G., Elakel, W., Habashy, S., Raouf, S. A., ElHefnawi, M., Eladawy, M., & ElHefnawi, M. (2018).
Comparison of Machine Learning Approaches for Prediction of Advanced Liver Fibrosis in Chronic
Hepatitis C Patients. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(3), 861–868.
https://doi.org/10.1109/TCBB.2017.2690848
Hazra, A., Kumar, S., & Gupta, A. (2016). Study and Analysis of Breast Cancer Cell Detection using Naïve Bayes,
SVM and Ensemble Algorithms. International Journal of Computer Applications, 145(2), 39–45.
https://doi.org/10.5120/ijca2016910595
Huang, F. J., & LeCun, Y. (2006). Large-scale learning with SVM and convolutional nets for generic object
categorization. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 1(July 2006), 284–291. https://doi.org/10.1109/CVPR.2006.164
Huang, M. J., Chen, M. Y., & Lee, S. C. (2007). Integrating data mining with case-based reasoning for chronic
diseases prognosis and diagnosis. Expert Systems with Applications, 32(3), 856–867.
https://doi.org/10.1016/j.eswa.2006.01.038
Iswanto, I., Laxmi Lydia, E., Shankar, K., Nguyen, P. T., Hashim, W., & Maseleno, A. (2019). Identifying diseases
and diagnosis using machine learning. International Journal of Engineering and Advanced Technology, 8(6
Special Issue 2), 978–981. https://doi.org/10.35940/ijeat.F1297.0886S219
Jahwar, A. F. (2021). Meta-heuristic algorithms for K-means clustering: A review. 17(7), 1–20.
Javeed, A., Zhou, S., Yongjian, L., Qasim, I., Noor, A., & Nour, R. (2019). An Intelligent Learning System Based on
Random Search Algorithm and Optimized Random Forest Model for Improved Heart Disease Detection.
IEEE Access, 7, 180235–180243. https://doi.org/10.1109/ACCESS.2019.2952107
Ko, J., Swetter, S. M., Blau, H. M., Esteva, A., Kuprel, B., Novoa, R. A., & Thrun, S. (2017). Dermatologist-level
classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.
https://doi.org/10.1038/nature21056
Kononenko, I. (2001). Machine learning for medical diagnosis: History, state of the art and perspective. Artificial
Nareen O. M. Salim & Adnan Mohsin Abdulazeez (2021). Human Diseases Detection Based
On Machine Learning Algorithms: A Review. International Journal of Science and Business,
5(2), 102-113. doi: https://doi.org/10.5281/zenodo.4462858