Review
Machine Learning (ML) in Medicine: Review, Applications,
and Challenges
Amir Masoud Rahmani 1,† , Efat Yousefpoor 2 , Mohammad Sadegh Yousefpoor 2 , Zahid Mehmood 3 ,
Amir Haider 4,† , Mehdi Hosseinzadeh 5, * and Rizwan Ali Naqvi 4, *
1 Future Technology Research Center, National Yunlin University of Science and Technology,
Douliou 64002, Taiwan; [email protected]
2 Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful 73210, Iran;
[email protected] (E.Y.); [email protected] (M.S.Y.)
3 Department of Computer Engineering, University of Engineering and Technology, Taxila 47050, Pakistan;
[email protected]
4 School of Intelligent Mechatronics Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu,
Seoul 05006, Korea; [email protected]
5 Pattern Recognition and Machine Learning Lab, Gachon University, 1342 Seongnamdaero, Sujeanggu,
Seongnam 13120, Korea
* Correspondence: [email protected] (M.H.); [email protected] (R.A.N.)
† Amir Masoud Rahmani and Amir Haider have contributed equally to this work.
Abstract: Today, artificial intelligence (AI) and machine learning (ML) have dramatically advanced
in various industries, especially medicine. AI describes computational programs that mimic and
simulate human intelligence, for example, a person's behavior in solving problems or his ability
for learning. Furthermore, ML is a subset of artificial intelligence. It extracts patterns from raw
data automatically. The purpose of this paper is to help researchers gain a proper understanding
of machine learning and its applications in healthcare. In this paper, we first present a classification
of machine learning-based schemes in healthcare. According to our proposed taxonomy,
machine learning-based schemes in healthcare are categorized based on data pre-processing methods
(data cleaning methods, data reduction methods), learning methods (unsupervised learning,
supervised learning, semi-supervised learning, and reinforcement learning), evaluation methods
(simulation-based evaluation and practical implementation-based evaluation in a real environment),
and applications (diagnosis, treatment). According to our proposed classification, we review some
studies presented in machine learning applications for healthcare. We believe that this review paper
helps researchers to familiarize themselves with the newest research on ML applications in medicine,
recognize their challenges and limitations in this area, and identify future research directions.

Keywords: artificial intelligence (AI); machine learning (ML); diagnosis; treatment; medicine

Citation: Rahmani, A.M.; Yousefpoor, E.; Yousefpoor, M.S.; Mehmood, Z.; Haider, A.; Hosseinzadeh, M.; Ali Naqvi, R. Machine Learning (ML) in Medicine: Review, Applications, and Challenges. Mathematics 2021, 9, 2970. https://doi.org/10.3390/math9222970

Academic Editors: Bo-Hao Chen and Amir Mosavi

Received: 3 October 2021; Accepted: 17 November 2021; Published: 21 November 2021
to be explicitly programmed [9,10]. In other words, the learning model learns based on
samples, whereas explicit programming follows rules or a limited hypothesis [11,12]. ML
improves efficiency and reliability and reduces costs in computational processes. Moreover,
it can accurately and rapidly generate models through data analysis. Machine learning
presents tools that can process a large amount of data, the volume of which is far beyond
human understanding. For example, health data may include demographic data, images,
laboratory results, genomic data, medical records, and data obtained from sensors. Various
platforms are used to generate or collect these data samples, for example, network servers,
electronic health records (EHRs), genomic data, personal computers, smartphones, mobile
applications, sensors [13,14], and wearable devices [15,16]. Figure 1 represents various data
generation resources in healthcare.
Papers  Description

[27]  Miotto et al. [27] first briefly introduced the deep learning framework and expressed its superiority compared to
traditional learning methods. Then, they examined some research studies related to the use of deep learning in the
healthcare field, specifically its applications in medical imaging, electronic health records, genomics, and mobile apps.
They also present challenges and opportunities for deep learning in healthcare.

[28]  Alafif et al. [28] reviewed ML applications for COVID-19 diagnosis and treatment. In this paper, they presented new
ML-based methods to diagnose and treat COVID-19. Moreover, they introduced tools and available datasets in this area,
along with some challenges and future research directions. The authors of [28] stated that machine learning can be used
for diagnosis, treatment recommendations for controlling the disease, drug production, and vaccines. They categorized
ML-based methods into two classes: (1) diagnostic methods, which include medical image analysis, non-invasive
measurements, and sound analysis; (2) treatment-based methods, which include drug development and vaccine
development. For more details, please refer to [28].

[29]  Tayarani [29] provided various applications of artificial intelligence-based methods for diagnosis, treatment,
monitoring patients, detecting disease severity, digital image processing, drug production, and tracking the outbreak of
COVID-19. The proposed classification in this paper includes five sections: (1) clinical applications of machine
learning-based techniques for the diagnosis, treatment, and monitoring of COVID-19 patients; (2) ML applications for
chest image processing; (3) machine learning-based methods for studying Coronavirus and its specifications;
(4) machine learning-based schemes for modeling the COVID-19 outbreak, including epidemic prediction, pandemic
monitoring, and pandemic control and management; (5) an investigation of the datasets available in this area. In this
paper, the authors attempted to cover all research works provided in this area. This review paper helps researchers to
better manage this disease.

[30]  Smiti [30] examined the main concepts of machine learning in healthcare. In this paper, in the first step,
the healthcare process and its various phases are briefly described. According to [30], the healthcare process has four
parts: prevention, detection, diagnosis, and treatment. Then, the machine learning process is briefly explained,
and various machine learning algorithms, including supervised learning, unsupervised learning, semi-supervised
learning, and reinforcement learning, are presented. Then, the author investigated ML applications for identifying
diseases, producing drugs, performing robot-assisted surgery, and analyzing medical data. In this article, the author
specifically focuses on medical data analysis and its challenges in this area.

[31]  Shouval et al. [31] provided various tools for physicians and researchers to achieve a better understanding of
machine learning and its applications to hematology. In this regard, they presented some guidelines for designing
machine learning-based methods and studied a number of machine learning applications in hematology. Then, the
authors introduced types of learning methods, including supervised learning, unsupervised learning, and reinforcement
learning. In this paper, the authors presented a standard framework for designing machine learning-based models. This
framework includes six steps: problem understanding, data understanding, data preparation, data modeling, evaluation,
and deployment. Finally, they expressed challenges to and restrictions on machine learning in the medical field and
specifically hematology.

[32]  Olsen et al. [32] examined machine learning algorithms and their applications to heart failure. For this purpose,
the authors briefly introduced machine learning and its applications in healthcare. They also presented some important
points for designing machine learning-based models. In this paper, machine learning-based methods are divided into
three categories based on the learning model: supervised learning, unsupervised learning, and deep learning. Then,
machine learning methods are divided into three main categories based on application: diagnosis, classification,
and heart failure prediction. Finally, the authors presented challenges and obstacles of machine learning in medicine.
Table 2. Comparison between our review paper and other review papers.
We believe that this review paper helps AI researchers to familiarize themselves with
the latest research on ML-based approaches in healthcare, recognize the challenges and
limitations in this area, and become aware of future research directions. In this review
paper, we focus on a number of papers related to machine learning in healthcare published
in 2017–2021. We also reviewed and studied various review papers, book chapters, research
papers, and conference papers from different publishers, such as Springer, Elsevier, IEEE,
Wiley, Taylor & Francis, Nature, ACM, and MDPI. Because the number of papers published
in the healthcare field is very large, we cannot study all of them within the limited volume
of this review paper. As a result, among papers with the same concept, we have selected
those that have recently been published in the healthcare field, provide a more detailed
evaluation, and use a larger dataset; the remaining papers were excluded. We use
Google Scholar to find these papers and search various phrases such as “Machine learning”,
“Artificial intelligence in medicine”, “Machine learning applications in medicine”, “Intelligent
medicine”, “Supervised learning in healthcare”, “Unsupervised learning in healthcare”, “Semi-
supervised learning in healthcare”, “Reinforcement learning in healthcare”, “Deep learning”,
and “Future hospitals”.
The rest of this paper is organized as follows: in Section 2, machine learning and
its applications in healthcare are expressed. In Section 3, we present the general framework
for designing a learning model in the medical field. In Section 4, our proposed classification
is introduced. In Section 5, we study some ML-based methods in healthcare in accordance
with the classification provided in this paper. In Section 6, we summarize discussions about
the ML-based methods examined in this paper. In Section 7, we describe some challenges
and restrictions on the use of machine learning in medicine briefly. Finally, the conclusion
of the paper is presented in Section 8.
2. Machine Learning
Empowering machines to learn like humans has long seemed like a dream because
machines are not inherently intelligent [16,18]. There are several differences between humans
and machines in how they perform tasks; one of these differences is intelligence. This
means that humans can learn from their previous experiences, but machines do not have
this ability. In fact, they must be programmed to follow certain instructions [25,33]. Today,
machine learning allows computers to learn from experience. In the past, traditional
computational algorithms consisted of a set of explicitly programmed instructions, which is
called "hard coding". Computers used these instructions to solve a problem, while today,
machine learning helps computers to learn decision-making rules, so that there is no need
for programmers to develop these rules manually [34,35]. This is called "soft coding".
Machine learning is a subset of artificial intelligence (AI). ML-based machines are more
intelligent and do not need human intervention. In fact, the term “smart machine” is a
symbol [36]. It refers to machine learning and its goals. In 1950, Alan Turing posed the
question for the first time: "Can a machine think?". He introduced a test called the "Turing
Test", which evaluates a machine based on its intelligence [37,38]. Today, there are various
definitions of machine learning. For example, Arthur Samuel defines machine learning
as “a study field that allows computers to learn without explicit programming” [39]. Ethem
Alpaydin also defines machine learning as “an area for programming computers based on data
samples or experience to improve a performance criterion” [40]. In the phrase “machine learning”,
“learning” represents the search process in the possible representation space to create the
best representation based on available data [41,42]. Furthermore, “machine” refers to an
algorithm that performs search operations. This algorithm is a combination of mathematics
and logic [41,42]. In general, the purpose of machine learning is to answer the question:
“How can a computer program be made using historical data to solve a problem and automatically
improve the performance of the program using experience?” [43,44]. In fact, machine learning
is a technology for designing computational algorithms that imitate human intelligence
and learn from the surrounding environment. In machine learning, a system is made and
trained using a large amount of data (millions of data samples) to manage very complicated
tasks. The purpose of this model is to decide, predict or perform tasks without explicit
programming. When this model takes inputs, it must be able to produce the desired output.
Sometimes, humans can easily understand this model. However, in some cases, it is similar
to a black box. This means that humans cannot easily understand this model. In fact, this
model approximates the process, which must be imitated by a machine [20,45].
ML Applications in Healthcare
Machine learning has many applications in healthcare. It can facilitate time-consuming
and complex tasks in this area. Today, the rapid and significant progress in machine
learning (ML), designing faster processors, and accessing digital health data have created
opportunities to improve the healthcare process. These new technologies reduce costs,
accelerate proper drug discovery, and improve the therapeutic results. Today, machine
learning is attracting investors and the main players in the healthcare field [46]. In general,
ML applications in the medical area can be divided into three categories:
• First Category-Improving Available Medical Structures: These applications are the
simplest ML applications in the medical domain. They improve the performance of ex-
isting structures [47,48]. These ML-based technologies define specific and rule-based
tasks for common applications such as simulation and data confirmation. Classifying
digital medical images in healthcare services is one of these machine learning applica-
tions. It improves the accuracy of traditional image processing techniques. Machine
learning can also be used to analyze radiological images to predict whether there is
a particular disease or not. Moreover, ML can be used to evaluate retinal images to
determine whether patients are subject to visual threats or not. For example, Aindra
is a medical company based on artificial intelligence and machine learning. It uses an
ML-based platform to classify medical images. Its purpose is to diagnose cancers in a
more accurate and faster manner.
• Second Category-Upgrading Medical Structures: In this category, machine learning
applications provide structures with new abilities. They move towards personaliza-
tion. Precision medicine is one of these ML applications [8,49]. It is a kind of medical
treatment that targets the specific needs of a person based on her or his character-
istics (for example, the genetic arrangement of the person). For example, iCarbonx
is moving towards personalized healthcare services. For this, it uses large datasets,
biotechnology, and artificial intelligence.
• Third Category-Independent Medical Structures: This category of ML applications has
been expanding recently. These applications create ML-based models that perform their
actions independently based on pre-defined goals [11]. For example, one of the future applications
in the healthcare field is to build a hospital without physicians [37,38]. As a result, we
must prepare ourselves for a robotic future based on machine learning and artificial
intelligence. Therefore, we must plan the role of robots in future hospitals. In the
near future, robots will carry out all healthcare processes from diagnosis to surgery.
Today, in developed countries such as China, Korea, and the United States, robots help
surgeons to perform surgery in the operating room [50,51]. This new technology still has
some weaknesses and imperfections, but it is rapidly advancing and continues to be
developed. For example, the Mayo Clinic is moving towards a hospital without doctors;
currently, its components are being designed, but they must still be sufficiently tested
against various standards. Today, surgeons use robots to improve the surgical process [52,53].
Problem Definition. When designing a learning model in the healthcare field, we must
first answer the question: “What is the purpose of designing this learning model?” To design a
useful model, the first step is to identify problems and challenges in the healthcare field.
Researchers should also analyze exactly how to improve medical services using machine
learning. In addition, they should examine the existing solutions presented in this area
so far [31]. In the first phase, a key point is to review data availability. This means that
researchers should be aware of existing data sources because data should be sufficiently
available for developing the learning model and evaluating this model. In the healthcare
field, the lack of data can be due to a lack of digital data, patient privacy, commercial issues,
or rare diseases.
Database. When designing a learning model in the healthcare field, datasets are used
for training, validating, and testing. Healthcare datasets may include demographic infor-
mation, images, laboratory results, genomic data, and data obtained from sensors [54,55].
Various platforms are used to produce or collect these data, for example network servers,
e-health records, genome data, personal computers, smartphones, mobile applications,
and wearable devices [56,57]. Today, the Internet and cloud-based technology could
improve global connections [58,59]. As a result, data availability has become easier. Before
developing a learning model in the healthcare field, it is necessary to design an appropriate
mechanism for evaluating the learning model, because it is not enough for the designer to
simply claim that the learning model has a high performance and is very desirable.
ML-based models are data-centric. Therefore, they may be faced with a problem called
overfitting or underfitting [60,61]. An efficient learning model should make a tradeoff
between overfitting and underfitting. This means that it must have an appropriate bias and
proper variance. Underfitting occurs when we design a very simple
learning model relative to the complexity of the problem and the size of the dataset. This
learning model has a weak performance on both training sets and testing sets. This means
that it has a lot of bias. On the other hand, overfitting occurs when the learning model
is very complex and has a large number of parameters relative to the complexity of the problem
and the size of the dataset. In this case, this model has a good performance on the training
dataset, whereas it has a weak performance for the testing set. In this case, it has a high
variance. In general, a proper learning model should have low bias and low variance.
Figure 4 describes the overfitting and underfitting problems.
In order to prevent overfitting, a common solution is that the dataset is divided into
two parts: training set and testing set. The “training set” indicates a dataset used for training
the learning model and adjusting its parameters. The “testing set” also indicates a dataset
used for evaluating the performance of the learning model. Usually, the training set is
larger than the testing set, for example, in a ratio of 70 to 30. One solution for selecting the
training set and the testing set is to randomly divide the dataset into two parts. Another
important point is that, sometimes, the dataset is small. Therefore, it is not possible to
assign a part of the dataset only for testing. In this case, the K-Fold Cross-Validation
technique is used [62,63]. In this technique, the dataset is divided into k sections. Then,
a section is used for testing and k − 1 sections are used for training. This process is repeated
k times so that, in each step, a new section is used for testing. Then, we must evaluate the
performance of this learning model in each step. Finally, the overall performance of the
learning model is equal to the average performance in k steps. K-Fold Cross-Validation is
shown in Figure 5.
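The holdout split and K-Fold Cross-Validation procedures described above can be sketched in pure Python. This is an illustrative sketch, not code from the paper: the function names, the default 70/30 ratio, and the `train_and_score` callback (which stands for training a model on the given indices and returning its test score) are assumptions of this example.

```python
import random

def holdout_split(n_samples, test_ratio=0.3, seed=0):
    """Randomly divide sample indices into a training set and a testing set
    (e.g., the 70-to-30 ratio mentioned above)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(n_samples * test_ratio)
    return idx[n_test:], idx[:n_test]          # (train indices, test indices)

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, rem = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < rem else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_validate(n_samples, k, train_and_score):
    """Rotate each fold through the testing role and average the k scores."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

Note that every index appears in the testing role exactly once across the k rotations, which is what makes the averaged score an estimate of performance on unseen data.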
Data Pre-Processing. When designing a learning model in the healthcare field, one
of the most challenging issues is data preprocessing because a machine learning model
requires high-quality data to achieve a higher quality in the training process and a more
suitable performance in terms of accuracy. In general, data pre-processing is a process for
investigating noisy data, missing values, duplicate data, and contradictory data. The pur-
pose of this process is to increase the quality of the database before creating the learning
model. Therefore, in data pre-processing, we may need to filter outliers or estimate missing
values. If data also have high dimensions, some data reduction methods, such as feature
selection [64,65] or feature extraction [66], can be used. Feature selection selects the best
subset of features. On the other hand, feature extraction finds a new dataset with lower
dimensions based on the initial data set.
ML Model Development. When designing a learning model in the healthcare field,
we must consider the database size, type of learning scheme, and model inference time.
We determine the complexity of a learning model based on the database size to avoid
overfitting or underfitting. Considering the training time of a learning model is also very
important. Learning models with more parameters can produce more accurate results;
however, in this case, these models perform more computational operations and
need a longer time for training. As a result, they cannot be used for real-time applications.
Therefore, lightweight architectures are more appropriate for designing a learning
model. Considering the type of learning scheme is also very important when developing
ML models [67,68]. In general, there are four main learning methods, including
supervised learning, unsupervised learning, semi-supervised learning, and reinforcement
learning [69,70]. We describe these techniques more accurately in Section 4.
Evaluation. Evaluating a machine learning-based system means executing various op-
erations to detect differences between the current behavior of the system and the expected
behavior [71]. After designing a learning model in healthcare, the necessary evaluations
should be performed to determine an answer to the question, “Does this model have the
deployment conditions in real environments?” In the evaluation process, designers use various
scales to examine the performance of the learning model. This evaluation determines its
strengths and weaknesses. In addition, after deploying the learning model in real environ-
ments, we must re-examine the performance of the learning model to evaluate its behavior
when interacting with real users [72,73]. Different evaluation aspects of a machine learning
system include: evaluating the data used to build the final learning model, evaluating the
learning algorithms used to design the final model, and evaluating the performance of the
final model. In the following, we explain these aspects more precisely:
• Evaluating the data used to build the final learning model: The performance of
learning models depends highly on data. Any error in the data can negatively affect
the final model and weaken its performance. In the data evaluation process, it is
necessary to answer different questions. For example, are there enough data to train
and test the model? Can the existing data be considered representative of all real data
for a specific area? Are the available data balanced? Is there any hostile or false information
in the data?
• Evaluating the learning algorithms used to design the final model: At this step, learn-
ing algorithms used for creating the final learning model must be carefully evaluated
* Accuracy: This scale is very important. Usually, classifiers are evaluated based
  on this scale. It is defined as the percentage of samples that have been truly
  classified by the classifier. It is calculated as follows:

  Accuracy = (TP + TN) / (TP + TN + FP + FN). (5)
* Matthews correlation coefficient (MCC): It is defined as the correlation coefficient
  between the predicted result and the corresponding ground truth. It has
  a value between +1 and −1. If MCC = +1, the classifier predicts the result
  perfectly. If MCC = 0, the classifier cannot predict the result better than a
  random guess. If MCC = −1, there is a full contradiction between the predicted
  result and the corresponding ground truth. The MCC scale is calculated as follows:

  MCC = (TP · TN − FP · FN) / √((TP + FP) · (TP + FN) · (TN + FP) · (TN + FN)). (6)
* False discovery rate (FDR): This scale evaluates the ratio of samples that are
  falsely predicted as positive to all samples that are classified as positive.
  The FDR scale is calculated as follows:

  FDR = FP / (FP + TP). (7)
* AU-ROC: This scale is another important criterion used for evaluating
  classifiers. It is calculated based on the area under the receiver operating
  characteristic (ROC) curve. Note that ROC is drawn based on TPR and
  FPR. This scale is calculated as follows:

  AU-ROC = (1/2) · (TP / (TP + FN) + TN / (TN + FP)). (8)
* F1-Score: This scale combines two scales, precision and sensitivity, and is
  defined as their weighted average. F1-Score = 1 is the best value, while
  F1-Score = 0 is considered the worst value. This scale is calculated as follows:

  F1-Score = 2 × (Precision × Recall) / (Precision + Recall). (9)
* Receiver operating characteristic (ROC) curve: This curve is a method for drawing,
  organizing, and selecting classifiers based on their performance. ROC is a
  two-dimensional graph whose vertical axis represents sensitivity (TPR) and whose
  horizontal axis indicates 1 − specificity (FPR). A new scale is defined based on
  ROC, called the area under ROC (AUC), which is used for comparing the
  performance of classifiers. In practice, it takes a value between 0.5 and one;
  if AUC is close to 0.5, the classifier performs no better than random guessing.
Note that other evaluation criteria can also be used based on applications [74,75].
For example, ML techniques can be used in applications to automatize tasks
such as medical image segmentation. In this case, other scales, such as the Dice
coefficient and Jaccard index, can be used to evaluate machine learning models.
For more details, refer to [76].
– Model Relevance: This parameter is used to evaluate mismatches between the model
  and the data, which refers to overfitting and underfitting. If the available data are
  not enough, a mismatch arises between the data and the model. A useful
  solution to this issue is cross-validation. However, we do not know exactly
  how much overfitting is allowable for the learning model. Suitable methods
  for detecting overfitting have been presented in [77,78].
– Efficiency: It represents the prediction speed and the learning speed in a learning
model. The efficiency problem occurs when the machine learning-based system
conducts the learning or prediction processes very slowly. As a result, ML
designers should consider the runtime of learning algorithms.
– Interpretability: Sometimes, learning models are used to decide on medical
treatment. As a result, humans must understand the logic and reason behind
the decisions taken by these models to trust their decisions so that the final
models are socially acceptable. However, it is difficult to define interpretability in
terms of mathematics. To understand the interpretability of the ML model, refer
to [79]. According to [80], interpretability means the user’s understanding of the
decisions taken by ML. Various solutions have also been presented in [81–84] to
evaluate the interpretability of a machine learning-based system.
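The evaluation scales of Equations (5)–(9) above, together with the Dice coefficient and Jaccard index mentioned for segmentation tasks, follow directly from the confusion-matrix counts. The sketch below is illustrative, not code from the paper; the function names and dictionary keys are this example's own choices, and it assumes non-degenerate counts (no zero denominators).

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Evaluation scales of Equations (5)-(9) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # sensitivity (TPR)
    specificity = tn / (tn + fp)                  # TNR
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),               # Eq. (5)
        "mcc": (tp * tn - fp * fn) / math.sqrt(                    # Eq. (6)
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "fdr": fp / (fp + tp),                                     # Eq. (7)
        "au_roc": 0.5 * (recall + specificity),                    # Eq. (8)
        "f1": 2 * precision * recall / (precision + recall),       # Eq. (9)
    }

def dice_jaccard(pred, truth):
    """Overlap scales for segmentation masks, given as sets of pixel coordinates."""
    inter = len(pred & truth)
    dice = 2 * inter / (len(pred) + len(truth))
    jaccard = inter / len(pred | truth)
    return dice, jaccard
```

For a balanced classifier with TP = TN = 40 and FP = FN = 10, for instance, accuracy, F1, and AU-ROC all equal 0.8 while MCC is 0.6, illustrating that MCC penalizes errors more sharply than accuracy.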
• Data cleaning methods: Some ML-based methods presented in healthcare use data
cleaning methods to eliminate contradictions, such as missing data or noisy data,
because such problems are common in the health datasets. These problems have
several reasons: (1) Data collection devices are not accurate in the healthcare field.
As a result, some data may be missing due to the hardware constraints of these devices,
or some data may be mistakenly recorded; (2) Some data samples are manually produced
by physicians or treatment staff. Therefore, they may incorrectly be recorded due to
human errors; (3) Some patients inadvertently or deliberately do not express proper
information about their illness. This causes errors when recording data. In general,
there are several data cleaning methods, including missing value management, noisy
data management, and data normalization [18,20].
– Missing value management: There are two main approaches for managing the
missing values in the healthcare field: (1) Removing the data with missing values.
Note that if the number of the data with missing values is very high in the dataset,
then this approach is not practical; (2) Estimating missing values. Note that if the
method used for estimating the missing values is not accurate, then it reduces
the accuracy of the learning model.
– Noisy data management: Filtering methods are used to remove noise in health
datasets. This improves the accuracy of the learning model. However, the detec-
tion of noisy data is not easy. A solution is to examine the database by profession-
als and physicians to improve its quality. This causes more accurate modeling
and reduces its error. However, this work is costly and time-consuming.
– Data normalization: Usually, health data are expressed on different scales
  (e.g., age, gender). We cannot directly compare such data samples with each other.
  To solve this problem, a suitable solution is to use data normalization methods,
  such as the Min–Max method, to map data into the range [0, 1].
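The missing-value estimation and Min–Max normalization steps above can be sketched as follows. This is an illustrative sketch: mean imputation is only one of several possible estimators, and the use of `None` to mark missing entries is an assumption of this example, not a prescription from the paper.

```python
def impute_mean(column):
    """Estimate missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_normalize(values):
    """Map numeric values into the range [0, 1] (Min-Max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

As the text notes, if missing values are numerous or the estimator is inaccurate, such imputation can itself reduce the accuracy of the learning model, so removal of incomplete records remains the alternative.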
• Data reduction methods: Often, health data have high dimensions. This weakens the
  performance of machine learning algorithms because it reduces the quality of the
training process and the accuracy of the learning model. Dimensionality reduction
means that health data are presented in a compressed form. As a result, this process
causes the loss of some information. An appropriate dimensionality reduction scheme
in the healthcare field should maintain useful features. Data reduction methods are
divided into two main categories: feature selection and feature extraction.
– Feature selection: In this process, a subset of features is selected from the health
database to be used in the learning process. The feature selection process is
done automatically or semi-automatically [64,65]. Decision-making to remove or
maintain a feature is based on the desired application. In general, we categorize
feature selection methods into three groups:
* Wrapper methods: In these methods, we consider the ML-based model as
a black box. Then, we feed this model with different subsets of features.
Next, we evaluate its performance for each subset to determine its efficiency.
Finally, the best subset of the features is selected. Two common wrapper
approaches are forward selection and backward selection. In the forward
selection process, we start with an empty subset. Then, we select a feature
of the health database, insert it into the subset, and evaluate the
performance of the ML-based model. If this feature reduces the system error
more than the other candidate features, it is added to the final subset.
This process continues until the error rate no longer decreases. The
backward selection methods are similar to the forward selection approaches,
with one difference: we start with a subset containing all features and,
in each step, select a feature of this subset and remove it. This process
continues until the error rate of the learning model no longer
decreases [64,65].
* Embedded methods: In these methods, the feature selection process is a
component of the learning model. For more details, please refer to [64,65];
* Filtering methods: These methods are independent of the learning model.
A prioritization test is performed on each feature of the database, so
the features are ranked based on a specific criterion. Then, the user
chooses the top-ranked features [64,65];
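To make the wrapper idea concrete, the forward selection loop described above can be sketched as follows. For brevity, this sketch scores each candidate subset by its least-squares training error rather than by a full model-validation step, and the data are synthetic:

```python
import numpy as np

def forward_selection(X, y, max_features=None):
    """Greedy wrapper-style forward selection. Each candidate feature is
    scored by the least-squares training error of the model built on the
    currently selected features plus that candidate."""
    n, d = X.shape
    remaining, selected = list(range(d)), []
    best_err = np.inf
    while remaining and len(selected) < (max_features or d):
        scored = []
        for j in remaining:
            A = X[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            scored.append((np.mean((A @ coef - y) ** 2), j))
        err, j = min(scored)
        if err >= best_err:            # stop once the error no longer decreases
            break
        best_err = err
        selected.append(j)
        remaining.remove(j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 1] - 2 * X[:, 4]          # only features 1 and 4 matter
print(forward_selection(X, y, max_features=2))   # [1, 4]
```

In practice, the scoring step would use cross-validated error of the actual ML model, which is exactly why wrapper methods are more expensive than filter methods.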
– Feature extraction: These methods are used for compressing health data that
have high dimensions [66]. This maintains the main features of the database
and removes its noise and correlations. This will accelerate the learning process
and produce more accurate results. Below, we introduce some of the most
important feature extraction schemes:
* Principal components analysis (PCA): PCA is a multivariate and unsupervised
technique [18,66]. It analyzes the data to extract useful information
and represents this information as a set of new orthogonal variables,
called the principal components;
* Linear discriminant analysis (LDA): It is a supervised learning method [18,66].
Its purpose is to find a linear combination of features that separates
two or more classes. This method tries to maximize the separation
between classes and accurately generate linear discriminant functions;
* Singular value decomposition (SVD): It is an unsupervised learning technique [18,66].
It is closely related to PCA; in fact, SVD is a generalized version
of PCA. It is a matrix factorization method and an efficient
scheme for reducing data dimensions. SVD gives an optimal approximation
of the initial matrix using a low-rank matrix.
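As an illustration of how PCA and SVD relate, the following sketch computes a principal-component projection and the corresponding optimal low-rank approximation via NumPy's SVD (the data are synthetic):

```python
import numpy as np

# Synthetic elongated data cloud: most variance lies along the first axis.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])

# PCA via SVD: center the data, then decompose.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1
scores = Xc @ Vt[:k].T        # projection onto the first principal component
X_approx = scores @ Vt[:k]    # optimal rank-k approximation of the centered data

retained = s[0] ** 2 / np.sum(s ** 2)   # fraction of variance kept
print(f"variance retained by one component: {retained:.2f}")
```

Keeping only the top components compresses the data while retaining almost all of its variance, which is precisely the behavior a dimensionality reduction scheme in healthcare should have.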
Figure 8. Different learning schemes: (a) Unsupervised learning, (b) Supervised learning, (c) Semi-
supervised learning, (d) Reinforcement learning.
• Supervised learning: In this learning scheme, there is a set of inputs and outputs
(labeled databases) [33]. The purpose of this learning technique is to discover the
relationship between inputs and outputs in the training process [85,86]. This algorithm
produces a function that maps data to labels. Then, it is used to predict the label of
unlabeled data. Supervised learning is used when there are outputs (labels) for the
training set. In the following, we introduce the most important supervised learning
schemes. We express their advantages and disadvantages in Table 4.
Scheme | Advantages | Disadvantages
Random forest (RF) | Ability to manage noisy data, high classification speed, suitable for large and heterogeneous databases | Difficult understanding by humans, difficult implementation, medium accuracy, low learning speed, low ability to manage missing values, low ability to manage overfitting, low ability to manage data with high correlation
• Unsupervised learning: In this technique, the dataset includes data samples whose
relevant output is not clear [16,33]. This means that data are unlabeled. This learning
scheme tries to discover patterns and relationships in the data. In unsupervised
learning, data are compared based on a similarity scale to be categorized into
groups. In the following, we introduce some unsupervised learning methods. We also
express their advantages and disadvantages in Table 5.
– K-means clustering: It is a simple clustering method. The purpose of K-means
is to group n data samples to k clusters, so that each cluster is known based on
its center. This method is an iteration-based technique [97]. Initially, k random
cluster centers are chosen, and all data points are linked to their closest cluster
center. Once clusters are established, so that every data point in the database
belongs to one of the clusters, a new center is re-calculated for each cluster.
This means that cluster centers are updated in each iteration. The algorithm is
repeated until the cluster centers no longer change.
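The K-means iteration described above can be sketched as follows (the two-blob data are synthetic, and the empty-cluster guard is a practical addition not discussed in the text):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal K-means: assign each point to its nearest center, then
    recompute each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):    # stop when centers no longer change
            break
        centers = new
    return labels, centers

# Two well-separated synthetic blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
```

Note that the result depends on the random initial centers; production implementations typically restart from several initializations and keep the best clustering.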
– Hierarchical clustering: This clustering scheme aims to group data points to
clusters, so that cluster members (data points in a cluster) have the highest
similarity to each other compared to data points in other clusters [97]. This
process is carried out based on two techniques: top-down (divisive clustering)
and bottom-up (agglomerative clustering). In divisive clustering, all data
points are first placed in one group. Then, this group is divided into smaller
groups. This process continues until each sample is placed in its own group. In
agglomerative clustering, each sample is first placed in its own cluster. Then,
similar groups are merged to establish larger groups. This process continues until
all data points are placed in one group. The hierarchical clustering method needs
no prior information about the number of clusters and is simple to implement.
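A minimal sketch of the bottom-up (agglomerative) technique described above, using SciPy's hierarchical clustering routines on synthetic data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated synthetic blobs.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(4, 0.2, (20, 2))])

Z = linkage(X, method="average")                  # bottom-up merges of closest groups
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the merge tree into 2 clusters
```

Although no cluster count is needed to build the tree, a count (or a distance threshold) is still required at the point where the tree is cut into flat clusters.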
– Fuzzy-c-means (FCM): It is a clustering method based on fuzzy logic. In this
method, each sample can belong to one or more clusters [97]. FCM determines
clusters based on different similarity scales, such as distance. Note that one
or more similarity scales may be used in the clustering process; this depends
on the application or the dataset. The clustering process is repeated to find the
best cluster centers. Similar to the K-means clustering method, FCM must know
the number of clusters in advance.
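The FCM update loop can be sketched as follows; this is the textbook formulation with fuzzifier m = 2, not the exact variant of any particular paper, and the data are synthetic:

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, iters=300, seed=0):
    """Minimal fuzzy c-means sketch: every sample receives a degree of
    membership in each cluster instead of a hard assignment."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per sample
    for _ in range(iters):
        W = U ** m                               # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        ratio = d[:, :, None] / d[:, None, :]    # d_ij / d_il for every pair (j, l)
        U_new = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.allclose(U_new, U):                # stop when memberships settle
            return U_new, centers
        U = U_new
    return U, centers

# Two well-separated synthetic blobs.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
U, centers = fuzzy_c_means(X, k=2)
```

The soft memberships are especially useful for samples near cluster boundaries, where a hard assignment (as in K-means) would hide the ambiguity.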
other. However, there is one main difference between SARSA and Q-Learning.
SARSA is an on-policy method. In contrast, Q-Learning is an off-policy method.
On-policy means that SARSA follows the existing policy to select actions and then
updates the Q-value in the Q-table. In contrast, an off-policy scheme, like
Q-Learning, does not follow the existing policy; it chooses actions in a greedy
manner to maximize the Q-value in the Q-table [12,24].
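The on-policy versus off-policy distinction shows up directly in the two update rules, sketched here for a tabular Q-table (the learning rate and discount values are illustrative):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the current policy actually took."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy (maximum-value) next action."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

Q = np.zeros((3, 2))                       # 3 states, 2 actions
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])   # 0.1 = 0.1 * (1.0 + 0.9 * 0 - 0)
```

The only difference is the bootstrapping term: SARSA uses the next action actually chosen by the behavior policy, while Q-Learning always uses the maximizing action.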
– Deep reinforcement learning (DRL): It is a combination of deep learning and
reinforcement learning. This scheme can be used to solve many complex issues.
It helps the agents to become more intelligent. This improves their ability to opti-
mize the policy. Reinforcement learning is a machine learning technique, which
can operate without any database. Therefore, in DRL, agents can first produce
the dataset through interaction with the environment. Then, this database is used
to train deep networks in DRL [12,24].
scales analyze the expected profits or losses. For example, if the death risk caused
by surgery is more than the death risk without surgery, the surgeon may not
perform this surgery and abandon it.
• Practical implementation-based evaluation: It is very important to evaluate ML-based
models in healthcare using their practical implementation because it allows us to
evaluate and analyze learning models in real environments. However, it is very
costly because we usually deal with hardware complexities for designing learning
models. Repeating scenarios and performing various experiments is also very difficult.
In practical implementation, we must evaluate the learning model in a real-time
manner and continuously update and re-validate this model. Some important
scales during the practical implementation of learning models in healthcare include
their generalizability to new data, user feedback, the medical community's trust in
the designed model, comparing model performance with an expert in the relevant area,
and comparing model performance with other existing models.
4.4. Applications
In our proposed classification, ML-based methods in healthcare are divided into two
main categories based on application: diagnosis and treatment.
• Diagnosis: It is a very important stage in the medical field. Machine learning can
be used in this area to help physicians and detect the disease in the early stages,
and reduce the detection time. For example, machine learning can be used for im-
proving medical images, analyzing laboratory results, segmenting and identifying
elements in images, detecting disease, identifying the degree of disease, analyzing
signals of devices such as electrocardiography (ECG) for detecting heart failure or
electroencephalography (EEG) for evaluating brain activity.
• Treatment: Some ML-based methods can help with the treatment of diseases. For ex-
ample, machine learning can be used to diagnose suitable doses, personalized therapy,
monitoring the treatment procedure, and predicting the progression of the disease.
These methods reduce treatment costs, reduce costs related to drug production, im-
prove the treatment procedure, save time to discover appropriate drugs, and solve
problems caused by the lack of specialist physicians. Machine learning can also
support surgical operations, facilitating highly complex surgeries that are
difficult for humans to perform.
model is described in detail. Tables 8 and 9 present the most important characteristics of
this ML-based model and its weaknesses and strengths, respectively.
Problem definition. Chronic kidney disease (CKD) is a serious disease, which can
threaten general health. ML-based methods can help us to timely and accurately diagnose
this disease. In the real world, most medical datasets have many missing values. In [101],
the authors believe that existing CKD diagnosis methods have low accuracy, or they used
a constrained and weak technique to estimate the missing values. Therefore, the authors
of [101] provided an ML-based model for CKD diagnosis. The purpose of this learning
method is to increase diagnostic accuracy and improve practical applicability.
Dataset. In [101], the CKD database available in University of California Irvine (UCI)
machine learning repository is used. In this database, there are 400 data points. These
data points have 24 features, including 11 numerical features and 13 nominal features.
Moreover, there are two final labels, including CKD (In this dataset, there are 250 CKD
patients) and NOTCKD (In this dataset, there are 150 data points, which are known as
NOTCKD). Note that this dataset is relatively small, and this issue limits the performance
of this method in terms of generalizability.
Data pre-processing method. In [101], the KNN Imputation method is applied for
estimating the missing values in the database. This method selects the k data points
without missing values that are closest to the data point containing the missing
value; the similarity scale is the Euclidean distance. There are two cases. If the
missing value is a numerical variable, it is estimated as the median of the k data
points. If it is a nominal variable, it is obtained by majority voting. In
addition, this learning model uses a
feature selection method based on the optimal subset regression and RF to select the most
beneficial features.
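The KNN imputation rule described above (median for numerical variables, majority vote for nominal ones) can be sketched as follows; the toy data and the helper name are illustrative, not taken from [101]:

```python
import numpy as np
from collections import Counter

def knn_impute(complete, row, missing_col, k=3, nominal=False):
    """Estimate one missing cell: find the k complete rows closest to `row`
    on the observed columns (Euclidean distance), then take the median of
    their values in the missing column (numerical) or the majority vote
    (nominal)."""
    obs = [c for c in range(complete.shape[1]) if c != missing_col]
    d = np.linalg.norm(complete[:, obs] - row[obs], axis=1)
    neighbors = complete[np.argsort(d)[:k], missing_col]
    if nominal:
        return Counter(neighbors).most_common(1)[0][0]
    return float(np.median(neighbors))

# Toy data: four complete records, one record missing its second value.
complete = np.array([[1.0, 10.0], [1.1, 12.0], [0.9, 11.0], [8.0, 50.0]])
row = np.array([1.0, np.nan])
print(knn_impute(complete, row, missing_col=1))   # 11.0
```

Because the estimate comes from the most similar patients rather than a global mean, KNN imputation tends to preserve local structure in the dataset.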
ML model development. In [101], a supervised learning scheme is used for predicting
CKD disease. In the classification process, various classifiers are examined. The purpose
is that classifiers with the best performance are selected for designing the final model.
These learning models include: (1) Logistic regression (LOG); (2) Random forest (RF);
(3) Support vector machine (SVM); (4) K nearest neighbor (KNN); (5) Naïve Bayes (NB);
(6) Feed forward neural network (FNN). Then, they evaluate the performance of different
models based on several parameters, such as accuracy, number of misjudgments, and
runtime. Finally, RF and LOG are selected to build the final integration model.
Evaluation. This method uses a simulation-based evaluation. For this, the authors used
R 3.5.2 software for simulating the CKD prediction model. To evaluate the learning model,
the 4-fold cross-validation method is used. Finally, this learning model has been evaluated
according to various criteria such as accuracy, sensitivity, specificity, and F1 score.
5.2. FCMIM-SVM
Li et al. [102] provided an ML-based system for detecting the heart failure disease.
They proposed a feature selection method called FCMIM. In addition, the authors ex-
amined different learning techniques, such as artificial neural networks (ANN), support
vector machine (SVM), decision tree (DT), Naïve Bayes (NB), K nearest neighbor (KNN),
and Logistic regression (LR), for developing the final learning model. Finally, they created
the final learning system called FCMIM-SVM. In the following, we describe this ML-based
method in detail. Tables 8 and 9 summarize the most important characteristics of this
ML-based method and its weaknesses and strengths, respectively.
Problem definition. Heart disease is known to be a serious disease. It can threaten
the lives of many people in the world. Traditional methods for detecting this disease
are time-consuming, expensive, and inefficient. Therefore, ML-based methods can be
very effective because they can detect heart disease using a fast, accurate, and low-cost
scheme. In addition, the performance of an ML-based scheme can be improved when a
balanced database and an efficient feature selection scheme are used. Regarding the issues
mentioned, the authors of [102] have provided an ML-based method and a feature selection
approach to detect heart disease rapidly and accurately.
Dataset. FCMIM-SVM uses a heart disease dataset related to Cleveland. This dataset
includes 303 data points. Each data point also has 75 features. There are six data points
with missing values. In the pre-processing process, these data points have been removed.
Furthermore, there are two classes for the final label: HD or Not-HD.
Data pre-processing method. FCMIM-SVM applies different data pre-processing tech-
niques. For example, it removes data points with missing values from the dataset. It
also performs some normalization operations such as Standard Scalar (SS) and Min–Max
Scalar on the dataset. Furthermore, FCMIM-SVM designs a feature selection method called
FCMIM for reducing dimensionality. Additionally, various feature selection algorithms,
such as Relief [103], mRMR [104], LASSO [105] and LLBFS [106], are reviewed.
ML model development. In [102], the authors have first assessed different classifiers like
ANN, SVM, DT, NB, KNN, and LR to select the appropriate classifiers for developing the
final learning model. Finally, the SVM classifier has been selected by the authors because
it has the highest accuracy (i.e., Accuracy = 92.37%). Therefore, the final learning model,
called FCMIM-SVM, has been created.
Evaluation. FCMIM-SVM has been evaluated using a simulation-based scheme,
implemented in Python. This method also uses leave-one-subject-out
cross-validation (LOSO) as the evaluation technique. In the evaluation process, the per-
formance of FCMIM is compared with several feature selection approaches. According to
the experimental results, the authors believe that FCMIM has a good performance. Then,
FCMIM-SVM is evaluated based on various scales such as accuracy, specificity, sensitivity,
MCC, and processing time.
Scheme | Target | Data Preprocessing Technique | Learning Model | Evaluation Criteria | Simulator
[101] | Diagnosing CKD disease | KNN imputation for estimating missing values; optimal subset regression and RF for selecting useful features | Integrating LOG and RF | Accuracy: 99.83%, Sensitivity: 99.84%, Specificity: 99.80%, F1 Score: 99.86% | R (version 3.5.2)
Table 9. The most important strengths and weaknesses of supervised learning-based models.
5.3. CWV-BANN-SVM
Abdar and Makarenkov [107] offered an expert system for detecting breast cancer.
This method uses an ensemble learning technique based on support vector machine and
artificial neural network. In this method, the optimal parameters of SVM are determined
via different experiments. This ensemble system includes two SVMs, multi-layer percep-
tron (MLP), and radial basis function (RBF) neural network. The performance of neural
networks is also improved using a boosting technique. In the following, we describe this
learning model in detail. In addition, Tables 8 and 9 express the main characteristics of the
CWV-BANN-SVM method and its advantages and disadvantages, respectively.
Problem definition. Breast cancer is the most common cancer in the world. This disease
requires high costs for treatment. Therefore, ML-based solutions can reduce these costs
and increase the accuracy of diagnosis. In general, learning methods reduce the diagnosis
time and increase its accuracy. As a result, in [107], an ensemble learning method has been
developed to timely and accurately diagnose breast cancer.
Database. In [107], the authors used the Wisconsin breast cancer dataset (WBCD).
WBCD has 699 data points. There are two labels for output result, including benign and
malignant. Each data point has 10 features. There are 452 data points belonging to the
benign class and there are 241 data points belonging to the malignant class.
Data pre-processing method. In the dataset, there are 16 data points with missing values
that are removed in the data pre-processing process.
ML model development. To develop the learning model, first, the authors tested a simple
SVM with different parameters to find its most appropriate parameters. These parameters
include regularization parameter (C), gamma parameter (γ), and e. The authors believe that
this improves the accuracy of the learning model and prevents overfitting. For designing
the final learning model, the authors performed four main steps. First, they tested six classi-
fiers: simple SVM, polynomial SVM, simple MLP, simple RBF, boosting MLP, boosting RBF.
According to the experimental results, the authors selected two polynomial SVMs, boosting
MLP, and boosting RBF to design the final ensemble model. They also applied SVM-CPG to
determine the importance of each feature in the database for detecting breast cancer. In the
second step, a data pre-processing process is performed for removing data with missing values.
In the third step, the selected classifiers are re-evaluated on the modified database. In the final
step, the authors created an ensemble classifier using two SVMs, boosting MLP, and boosting
RBF. This ensemble system uses the confidence-weighted Voting (CWV) technique.
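A minimal sketch of confidence-weighted voting: each base learner contributes its class confidences, and the class with the largest total wins. The four confidence vectors below are hypothetical, and the exact weighting scheme used in [107] may differ:

```python
import numpy as np

def confidence_weighted_vote(probas):
    """Sum each base learner's class confidences; the class with the
    largest total confidence wins."""
    total = probas.sum(axis=0)            # one summed confidence per class
    return int(np.argmax(total))

# Hypothetical confidences of four base learners for [benign, malignant].
probas = np.array([
    [0.60, 0.40],   # polynomial SVM 1
    [0.20, 0.80],   # polynomial SVM 2
    [0.30, 0.70],   # boosted MLP
    [0.55, 0.45],   # boosted RBF
])
print(confidence_weighted_vote(probas))   # 1, i.e., malignant
```

Unlike simple majority voting, a confident minority can outvote an unconfident majority, which is the point of weighting votes by confidence.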
Evaluation. The CWV-BANN-SVM method uses a simulation-based evaluation. This
scheme is simulated in IBM SPSS Modeler 14.2 software. The dataset is divided into two
parts, so that 50% is used for training and 50% is applied for testing. In the evaluation
process, various criteria such as accuracy, sensitivity, specificity, precision, FPR, FNR, F1
Score, AUC, and Gini Index are considered.
5.5. HMANN
Ma et al. [109] suggested an improved neural network called HMANN. This scheme
is used for detecting, segmenting, and identifying chronic renal failure. HMANN is
implemented on the Internet of Medical Things (IoMT) platform. This method combines support
vector machine (SVM), multi-layer perceptron (MLP), and backpropagation algorithm
(BP). In the following, we explain HMANN in detail. Moreover, Table 8 provides the most
important characteristics of HMANN and Table 9 expresses its weaknesses and strengths.
Problem definition. When kidneys do not work well, this issue can threaten human
life. Therefore, it is very important to timely detect kidney stones. Often, digital images
have low contrast. They are also highly noisy. Therefore, it is very difficult to use these
images for detecting kidney abnormalities. Artificial neural networks are one of the most
common tools for solving this problem because they are fault-tolerant, can be
generalized easily, and have a suitable learning ability. Therefore, in [109],
a neural network-based system has been developed.
Database. The authors use images in the UCI chronic kidney disease dataset to train
and test HMANN. However, this method provides no description of this dataset; the authors
do not mention the number of images in the dataset or their type.
Data pre-processing method. As mentioned earlier, digital images often have noise and
low contrast. Their evaluation is difficult. In HMANN, the authors have reduced noise
using threshold wavelet coefficients. In general, a pre-processing process is performed on
these images to overcome the low contrast and noise. The data pre-processing process
includes three steps: (1) Rebuilding images using a level set method; (2) Sharpening or
smoothing using a Gabor filter; (3) Improving contrast using a histogram equalization
process. In addition, a specialist physician manually performs the segmentation process
on normal and abnormal digital images. Then, HMANN uses a feature extraction process
called the gray-level co-occurrence matrix (GLCM) on these segmented regions to extract
features related to this disease. These features include adaptive, Haralick, and histogram
features. Then, a feature selection process is performed for selecting nine features.
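The core of GLCM feature extraction is counting co-occurring gray-level pairs. A minimal sketch for horizontally adjacent pixels, with the Haralick contrast feature computed from the matrix (the tiny image is illustrative):

```python
import numpy as np

def glcm_horizontal(img, levels):
    """Gray-level co-occurrence matrix for horizontally adjacent pixels:
    G[i, j] counts how often level i appears immediately left of level j."""
    G = np.zeros((levels, levels), dtype=int)
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        G[a, b] += 1
    return G

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [0, 1, 1]])
G = glcm_horizontal(img, levels=3)

# Haralick contrast: sum of (i - j)^2 weighted by the normalized matrix.
i, j = np.indices(G.shape)
contrast = ((i - j) ** 2 * G / G.sum()).sum()
print(contrast)   # 0.5
```

Full GLCM pipelines (e.g., scikit-image's implementation) also vary the pixel offset distance and angle; this sketch fixes both to a single horizontal neighbor for clarity.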
ML model development. In [109], the final learning model is built based on three main
components, including SVM, MLP, and BP. The final learning model is called HMANN.
The purpose of HMANN is to classify digital images modified in the previous step, identify
kidney stones, and accurately detect their location.
Evaluation. HMANN uses simulation-based evaluation. This method is simulated and
evaluated through various experiments to determine its efficiency. However, the authors
do not explain the simulation tool, training set, testing set, and other simulation parameters.
HMANN is evaluated based on various criteria such as prediction rate, AUC, accuracy,
computational time, and ROC.
5.6. SRL-RNN
Wang et al. [110] proposed an ML-based model called SRL-RNN. This scheme uses
reinforcement learning and recurrent neural network (RNN). The purpose of SRL-RNN is
to solve the dynamic treatment regime (DTR) problem. The main idea of this method is to
combine two signals, including indicator and evaluation simultaneously. In the following,
we describe SRL-RNN in detail. The most important features of SRL-RNN are represented
in Table 10. Furthermore, Table 11 expresses its strengths and weaknesses.
Problem definition. Many researchers have studied drug recommendation systems that help
physicians make better decisions. These systems can be designed using supervised
or reinforcement learning algorithms. Supervised systems utilize similarities between
patients to produce recommendations. However, these methods cannot directly learn
the relationship between illness and drugs. They depend on the ground truth, yet there
is no answer to the question of how this ground truth is created. In this case,
they work based on the indicator signal. Reinforcement learning-based systems do
not have this problem; however, they may present treatment recommendations that
differ strongly from the prescription recommended by the physician, because no
supervisor controls them. This problem can increase the treatment risk. In fact, they
work based on the evaluation signal. Therefore, the authors of [110] combine supervised
learning and reinforcement learning to produce a new model called SRL-RNN. This method
can avoid unauthorized risks and deduce optimal and dynamic treatment.
Database. The authors utilize a large and publicly available database called MIMIC-III v1.4
to evaluate SRL-RNN. This database includes information about 43,000 patients in
intensive care units (ICU). This information has been collected from 2001 to 2012. It
contains information about 6695 specific diseases and 4127 drugs.
Data preprocessing method. In [110], when a data point has many missing values (more
than 10 features), it is removed from the database. On the other hand, when a data
point has a small number of missing values, these missing values are estimated using
the KNN method.
ML model development. In [110], the authors presented a deep architecture called SRL-
RNN for managing a DTR, including several diseases and different prescriptions. The aim
is to learn the prescriptive policy by combining the index signal and the evaluation signal.
SRL-RNN includes three main networks: (1) Actor network for producing drugs in a
time-variant manner based on the dynamic status of patients. In this process, doctor’s
decisions play the role of an indicator signal. This means that there is a supervisor to ensure
safe actions and speed up the learning process; (2) Critic network for assessing the action
related to the actor network to reward or penalize the recommended treatment; (3) LSTM
network for developing SRL-RNN to manage a partially-observed Markov decision process
(POMDP). It summarizes the observations to produce a more complete observation. Note
that LSTM is one of the most famous recurrent neural networks (RNNs). It is known as a
deep neural network.
Evaluation. SRL-RNN uses both evaluation methods i.e., simulation-based and practi-
cal implementation-based. In the practical implementation, the prescriptions produced by
this method are evaluated for two patients in ICU. Note that the authors do not mention the
software used to simulate this method. The dataset is divided into three groups, including
the training set (80% of the dataset), validation set (10% of the dataset), and testing set
(10% of the dataset). In [110], the mortality rate is considered as an evaluation scale to
evaluate the effect of this method for reducing mortality. The Jaccard coefficient has been
used to measure the compatibility between prescriptions recommended by SRL-RNN and
prescriptions produced by the physician.
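The Jaccard coefficient itself is simply the ratio of shared to total items; for two prescription sets (the drug names are hypothetical):

```python
def jaccard(a, b):
    """Jaccard coefficient: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical prescription sets for one patient.
model_rx = {"aspirin", "heparin", "furosemide"}
doctor_rx = {"aspirin", "heparin", "metoprolol"}
print(jaccard(model_rx, doctor_rx))   # 0.5
```

A value of 1 means the recommended and prescribed drug sets agree completely, and 0 means they share no drugs at all.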
parts, including regulating network and decoding network. The regulating network is
tasked to show the effect of treatment on the health status. Furthermore, the decoding
network is tasked to transform a space with low dimensions (i.e., the health status) into
a space with high dimensions. In [111], LSTM has been used as a deep learning method
for simulating the human body. In [111], the conceptual alignment deep auto-encoder
(CADAE) has been used as a decoding network. The second component, i.e., the treatment
part, is responsible for receiving observations and producing therapeutic recommendations.
This component dynamically interacts with the simulated body. It has two main parts:
disease diagnosis and proper therapeutic recommendation. In [111], the authors used a deep
a deep Q-network (DQN) for discrete space and the deep deterministic policy gradient
(DDPG) for continuous space.
Evaluation. This method uses a simulation-based evaluation. Therefore, this scheme
is simulated using TensorFlow installed on Python. The simulated body is trained using
CADAE. This method is evaluated in terms of convergence rate and misdiagnosis rate.
Note that this method presents the experimental results in graphical form; as a result,
we do not present numerical results for this scheme.
scale. Note that the evaluation process uses a 10-Fold Cross-Validation method. Then,
the final learning model is implemented in TensorFlow. As mentioned earlier, there are
114 data samples in the database. Then, GAN uses this database to produce artificial
data. After executing this process, 4000 artificial data samples are produced. As a result,
the number of data samples (real data and artificial data) is equal to 4114. Then, the DNN
algorithm is trained according to this new database. In this case, the evaluation criterion is
the average accuracy. Then, the DQN algorithm is executed on 34 patients in the UMCC
protocol. In this case, the root mean square error (RMSE) is considered an evaluation scale,
which is approximately 0.76.
5.9. HQLA
Khalilpourazari and Hashemi [113] offered a reinforcement learning-based algorithm
called HQLA. This algorithm uses the Quebec database to predict the Coronavirus preva-
lence. In this algorithm, the authors utilize two techniques, including reinforcement
learning and evolutionary algorithms. In the following, we describe this method in detail.
Table 10 represents the most important features of this method in summary. Furthermore,
Table 11 expresses its advantages and disadvantages.
Problem definition. Modeling and predicting the COVID-19 epidemic process can help
specialists in the healthcare field to end its spread. However, it is very challenging to
predict the COVID-19 prevalence due to its unclear and complex nature. Metaheuristic
algorithms are very flexible and efficient. They can solve many problems in healthcare
because they reduce computational costs and time complexity. They can also efficiently
search for optimal solutions. In addition, reinforcement learning algorithms can solve
many issues in the real world, especially in healthcare. Accordingly, in [113],
the authors combine metaheuristic algorithms and reinforcement learning to predict
the coronavirus pandemic.
Database. Quebec is one of Canada's provinces. The dataset includes data samples
related to COVID-19 and the mortality rate recorded from 25 June to 19 July 2020. This
database includes 63,713 data samples related to COVID-19 patients and 5770 data samples
related to individuals who died due to COVID-19.
Data pre-processing method. In [113], there is no data pre-processing process.
ML model development. This method (HQLA) combines reinforcement learning and
evolutionary algorithms. This scheme can solve complex optimization problems in a
short time. HQLA uses various evolutionary algorithms, such as GWO [114],
SCA [115], MFO [116], PSO [117], WCA [118], and SFS [119], to update the particle position
in the solution space. Q-Learning is used to select the best operator (evolutionary
algorithm) in the optimization process to obtain the best efficiency. Q-Learning starts
with several random operations. Then, it evaluates the efficiency of each operator in
each step. This helps Q-Learning to learn the best operations for getting the best
response. If an operator improves the final response quality, Q-Learning rewards this
operator. Otherwise, it penalizes the current operator.
Evaluation. HQLA uses simulation-based evaluation. Note that the authors do
not mention the software used to implement this method. In the evaluation process,
the mean square error is considered as the objective function. Its optimal value is equal
to 6.26 × 10^-6. The authors also presented several graphs, including the convergence rate
and a comparison between predicted data and actual data. The evolutionary algorithms have
been evaluated in terms of various parameters; this is outside the scope of this paper. For
more details, please refer to [113].
5.10. tVAE
Baucum et al. [120] introduced the transitional variational auto-encoder (tVAE). It
tries to learn the disease progression procedure to map a patient's current status to
their state at the next time point. In the following, we present this method in detail.
In Table 10, some features of tVAE are expressed. Table 11 presents its advantages
and disadvantages.
Mathematics 2021, 9, 2970 30 of 52
Table 10. The most important characteristics of reinforcement learning-based models.

| Scheme | Target | Data Preprocessing Technique | Learning Model | Evaluation Criteria | Simulator |
|---|---|---|---|---|---|
| [110] | Generating treatment recommendations for DTR | Removing some data points with high missing values and estimating some data points with a small number of missing values | Deep reinforcement learning and recurrent neural network | Jaccard Coefficient: 0.409; Mortality Rate: 0.157 | − |
| [111] | Designing a virtual body and a virtual doctor | − | Deep reinforcement learning and recurrent neural network | − | Python |
| [112] | Designing a virtual radiotherapy environment and determining the appropriate radiation dose for treating lung cancer | Bayesian network graph theory for selecting useful features | Deep reinforcement learning and recurrent neural network | Accuracy: 100%; RMSE: 0.76 | TensorFlow |
| [113] | Predicting the COVID-19 epidemic process | − | Reinforcement learning | MSE: 6.29 × 10⁻⁶ | − |
| [120] | Simulating artificial patients and simulating the virtual treatment policy | Estimating missing values using the sample-and-hold interpolation method and an artificial neural network | Deep reinforcement learning and artificial neural network | MAE: 12.15 | TensorFlow |
Table 11. The most important strengths and weaknesses of reinforcement learning models.
5.11. TE-DLSTM
Zhu et al. [121] presented a semi-supervised learning method called TE-DLSTM to
identify body activities using inertial sensors. This method uses a deep long short-term
memory network (DLSTM) to extract high-level features. In the following, we explain
TE-DLSTM in detail. Tables 12 and 13 represent the most important characteristics of this
method and its advantages and disadvantages, respectively.
Problem definition. Human activity recognition (HAR) is a very important issue for
informatics applications, especially healthcare. For example, when users use smartphone
applications, HAR helps us to understand their behavior. In fact, HAR discovers their
health status and presents high-quality health recommendations. However, a challenging
issue is that we deal with unlabeled data when designing a HAR system. One effective
solution for this issue is semi-supervised learning. Today, many methods use semi-supervised
learning techniques to identify body activities. However, they can only extract low-level,
simple features and do not achieve acceptable performance. Accordingly, in [121],
a DLSTM-based method is presented for designing a HAR system that extracts high-level features.
Database. In [121], the authors used the UCI database, which includes time-series
samples collected from 30 people aged between 19 and 48 years. Each time-series
sample is sampled using an overlapping window frame of 2.56 s.
The total number of samples is 10,000. Note that, in this database, each data sample has
561 features.
Data pre-processing method. In [121], the authors perform a simple feature extraction
process on the database to extract some simple statistical features, such as the maximum,
minimum, mean, and variance. Then, these low-level features feed the neural network to learn
high-level features. Note that the final learning model is also a feature extraction method for
extracting high-level features from the database.
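This low-level feature extraction can be sketched briefly (our illustration, not the authors' code; the 8-sample window is a placeholder, while the 50% overlap mirrors the UCI setup):

```python
import statistics

def window_features(signal, window_size=8, overlap=0.5):
    """Slice a 1-D inertial signal into overlapping windows and compute the
    simple statistical features named above for each window."""
    step = int(window_size * (1 - overlap))
    features = []
    for start in range(0, len(signal) - window_size + 1, step):
        w = signal[start:start + window_size]
        features.append({
            "max": max(w),
            "min": min(w),
            "mean": statistics.fmean(w),
            "variance": statistics.pvariance(w),
        })
    return features

sig = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
feats = window_features(sig)
print(len(feats), feats[0]["max"], feats[0]["mean"])  # → 2 7.0 3.5
```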
ML model development. The database used for designing the learning model includes
both labeled and unlabeled data. For developing the learning model, in the first step,
an augmentation technique enlarges the database; this technique acts as a regularizer by
injecting randomness. Then, the authors extract simple features from the dataset, and DLSTM
is trained based on these low-level features. A dropout network then acts as a regularizer
to enhance the generalization ability of DLSTM. In the next step, the cross-entropy method
is used for measuring the supervised learning loss; it analyzes the difference between the
ground truth and the predicted label. The square loss method is used for measuring the
unsupervised learning loss, so that the predicted output is compared with the previous
ensemble output. Finally, the final loss is calculated as a combination of the supervised
and unsupervised learning losses, and the deep learning parameters are obtained using
the back-propagation method.
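The loss combination described above can be sketched as follows (our simplification, not the TE-DLSTM code; the mixing weight `w` is a hypothetical constant, whereas in practice such a weight is usually ramped up during training):

```python
import math

def cross_entropy(pred, label):
    # pred: predicted probability vector; label: ground-truth class index.
    return -math.log(pred[label])

def square_loss(pred, ensemble_pred):
    # Consistency between the current prediction and the previous ensemble output.
    return sum((p - q) ** 2 for p, q in zip(pred, ensemble_pred))

def total_loss(pred, ensemble_pred, label=None, w=0.5):
    # Labeled samples contribute both terms; unlabeled samples only the second.
    supervised = cross_entropy(pred, label) if label is not None else 0.0
    unsupervised = square_loss(pred, ensemble_pred)
    return supervised + w * unsupervised

labeled_loss = total_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], label=0)
unlabeled_loss = total_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
print(round(labeled_loss, 4), round(unlabeled_loss, 4))
```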
Table 12. The most important characteristics of semi-supervised learning-based models.

| Scheme | Target | Data Preprocessing Technique | Learning Model | Evaluation Criteria | Simulator |
|---|---|---|---|---|---|
| [121] | Extracting high-level features using a semi-supervised learning technique | Feature extraction method | Semi-supervised learning method and deep neural network | Accuracy: 97.21%; RunTime: 2.118 s | Python |
| [122] | Extracting ADR mentions from Twitter | Data normalization method | Semi-supervised learning method and deep neural network | F1-Score: 75.1%; Precision: 73.1%; Recall: 77.4% | Python |
| [123] | Detecting normal beats, SVEB, and VEB based on the unlabeled dataset | Data normalization method | Semi-supervised learning method and CNN | SVEB (%): Accuracy: 97.4%; Sensitivity: 93.38%; Specificity: 97.2%; PPR: 59%; F1-Score: 72.5%. VEB (%): Accuracy: 98.6%; Sensitivity: 87.5%; Specificity: 99.4%; PPR: 90.9%; F1-Score: 89.2% | MATLAB |
| [124] | Segmentation of retinal fundus images | Data normalization method, increasing data samples | Semi-supervised learning method, deep neural network, transfer learning | DRISHTI dataset: DSC: 0.967; Accuracy: 0.9957; Jaccard: 0.9314; Sensitivity: 0.9539; Specificity: 0.9993. RIM-ONE dataset: DSC: 0.902; Accuracy: 0.9945; Jaccard: 0.8824; Sensitivity: 0.873; Specificity: 0.9981 | TensorFlow tool in Python |
Table 13. The most important strengths and weaknesses of semi-supervised learning-based models.
5.12. SS-BLSTM
Gupta et al. [122] presented a recurrent neural network-based method called SS-BLSTM.
The purpose of this semi-supervised approach is to extract mentions related to
adverse drug reactions (ADRs) from Twitter. In the following, we explain this method.
Tables 12 and 13 represent the most important features of the SS-BLSTM method and its
weaknesses and strengths, respectively.
Problem definition. Due to easy and broad access, social networks are known as
a beneficial platform for sharing health information and are an appropriate option for
monitoring health status. In [122], the authors try to discover mentions related to ADR
from Twitter. This is very challenging because these texts are informal and brief. Many
supervised learning methods are presented for this purpose. However, their performance
is not desirable because enough labeled data samples are not available. Recently, new
methods have used deep neural networks, especially LSTM to solve this issue. However,
they need a large database for the training process to avoid overfitting. Accordingly,
in [122], the authors presented a semi-supervised method, which uses both labeled and
unlabeled data.
Database. In [122], the authors used the ADR dataset collected from Twitter for the
supervised learning phase. This database was collected between 2007 and 2010, and the
tweets mention 81 drugs. The database includes 645 tweets. The unlabeled dataset was
produced using Twitter’s Search API and includes 0.1 million tweets.
Data pre-processing method. In [122], a data normalization process is performed on the
dataset to remove some words, symbols, and spaces.
ML model development. SS-BLSTM has two main steps: (1) the unsupervised learning
step, whose main task is to extract the drug name from tweets using an unsupervised learning
scheme. For this, a bi-LSTM is trained and its weights are updated; these weights are kept
for the second step; (2) the supervised learning phase, whose main task is to extract ADR
mentions from tweets using a supervised method. In this phase, the bi-LSTM model, which
was trained in the first step, is trained again to learn the labels mentioned
in the tweet text.
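The two-step idea — train, keep the weights, then fine-tune on the target task — can be illustrated with a toy model. This is not SS-BLSTM: a single logistic unit trained by gradient descent stands in for the bi-LSTM, and both tasks use made-up data.

```python
import math

def train(xs, ys, w=None, epochs=200, lr=0.5):
    # Gradient-descent training of a bias-free logistic unit.
    w = list(w) if w else [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

# Step 1: auxiliary task (a proxy for the unsupervised drug-name step).
aux_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
aux_y = [1, 0, 1, 0]
w_pretrained = train(aux_x, aux_y)

# Step 2: fine-tune on a small labeled target task, starting from the
# weights kept from step 1 instead of from scratch.
target_x = [[1.0, 0.2], [0.1, 1.0]]
target_y = [1, 0]
w_final = train(target_x, target_y, w=w_pretrained, epochs=50)
print(round(w_final[0], 2), round(w_final[1], 2))
```

With far fewer target-task samples than a model trained from scratch would need, the fine-tuned weights keep the decision direction learned in step 1.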
Evaluation. SS-BLSTM uses simulation-based evaluation. It is implemented in Python
software. To evaluate the performance of this method, the labeled database is divided into
two sets, including training (470 tweets) and testing (170 tweets). In the evaluation process,
various parameters including F1-Score, precision, and recall are used.
The purpose of this system is to automatically detect the optic disc (OD) to provide proper
and timely treatment services. Today, deep learning models, especially artificial neural
networks such as CNNs, have been used for this task. These networks have a very good
learning ability; however, they need a large database for training to avoid overfitting.
On the other hand, the available databases of retinal images are very small. In [124],
the authors attempt to overcome these problems using semi-supervised learning and transfer learning.
Dataset. In [124], the authors use various databases: (1) Kaggle’s diabetic retinopathy
database. The authors employ this dataset for training the auto-encoder network. It includes
88,702 retinal images; (2) the DRISHTI GS1 database. The authors use this dataset for the
segmentation network. It includes 101 retinal images, which the authors divide into a
training set (50 images) and a testing set (51 images); (3) the RIM-ONE database. This
database includes 159 retinal images. Experts segment these images and determine the OD
in them. The segmentation network also utilizes this dataset.
Data pre-processing method. In the first step, the auto-encoder network and the
segmentation network perform a two-phase data pre-processing scheme. In the first phase,
the images are resized; the purpose of this phase is to normalize the images and adjust their
size. In the second phase, data augmentation is performed to increase the number of
instances. This is done by applying different transformations to the input image.
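The augmentation phase can be sketched roughly as follows; this is our illustration, and the actual transformations used in [124] may differ. The image is represented as a nested list of pixel values.

```python
def hflip(img):
    # Horizontal flip: reverse each row.
    return [row[::-1] for row in img]

def rot90(img):
    # Rotate 90 degrees clockwise: transpose, then reverse each row.
    return [list(row)[::-1] for row in zip(*img)]

def augment(img):
    out = [img, hflip(img)]
    r = img
    for _ in range(3):  # 90-, 180-, and 270-degree rotations
        r = rot90(r)
        out.append(r)
    return out

img = [[1, 2],
       [3, 4]]
samples = augment(img)
print(len(samples))  # one original image becomes five training instances
```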
ML model development. In the first step, a deep neural network called a convolutional
auto-encoder (CAE) is employed. This network is trained on the unlabeled database; the aim
is to learn the features of the input images in order to rebuild them at the output.
Then, a convolutional layer is added to the trained CAE, converting it into the segmentation
network. In this step, transfer learning is used: the weights are initialized according to
the trained CAE model. The segmentation network is then trained again using the labeled
dataset. Finally, this model can be used to detect the OD in retinal images.
Evaluation. This method uses simulation-based evaluation. It is simulated using the
TensorFlow tool in Python. The evaluation scales are DSC, the Jaccard index, accuracy,
sensitivity, and specificity. Note that the times required for training the CAE network and
the segmentation network are 10 h 26 min and 1 h 31 min, respectively. The times required
for testing on the DRISHTI and RIM-ONE datasets are 1.19 and 1.4 s, respectively.
Furthermore, each data sample has between 3 and 30 features in these datasets.
Additionally, the cerebral stroke database has been used to evaluate the performance of the
learning method. This dataset includes 11,039 data samples, each with 33 features. It
contains both labeled data (100 data samples) and unlabeled data (10,939 data samples).
Data pre-processing method. In [125], the authors designed a data pre-processing module
that modifies the dataset with unbalanced classes. This module increases the size of
a small labeled dataset using a GAN. Then, a feature selection process is performed on
the dataset. Note that the authors do not describe this module and the feature selection
process exactly.
ML model development. In the first step, the GAN receives the labeled dataset as input
to produce a number of artificial data samples. The purpose of this work is to enlarge
the labeled dataset and correct the class imbalance. Then, the authors train two basic
learning algorithms, namely a support vector machine (SVM) and K-nearest neighbors (KNN),
using both the labeled dataset and the artificial data samples. The purpose of these
algorithms is to predict the labels of the unlabeled data samples. The data samples with
predicted labels are then added to the labeled dataset. In the next step, the GAN is applied
again to this dataset to produce artificial data samples; the number of these artificial
data samples equals the size of the dataset. Finally, the authors train the final classifier
(i.e., an SVM) using both real and artificial data samples to perform the classification task.
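The pseudo-labeling core of this pipeline can be sketched as follows. This is a deliberate simplification: the GAN stage is omitted, and a 1-nearest-neighbor rule stands in for the SVM/KNN classifiers; the feature vectors and class names are made up.

```python
def dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pseudo_label(labeled, unlabeled):
    # Each unlabeled sample receives the label of its closest labeled sample,
    # and the labeled set grows accordingly.
    grown = list(labeled)
    for u in unlabeled:
        _, label = min(labeled, key=lambda item: dist(item[0], u))
        grown.append((u, label))
    return grown

labeled = [([0.0, 0.0], "no stroke"), ([5.0, 5.0], "stroke")]
unlabeled = [[0.5, 0.1], [4.8, 5.2]]
dataset = pseudo_label(labeled, unlabeled)
print([lbl for _, lbl in dataset])
```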
Evaluation. This scheme uses simulation-based evaluation. It is implemented using
MATLAB software. Note that each dataset is divided into two sections, including the
training set (70% of data samples) and the testing set (30% of data samples). The evaluation
scale for this method is accuracy.
Table 15. The most important strengths and weaknesses of unsupervised learning-based models.

| Scheme | Strengths | Weaknesses |
|---|---|---|
| [129] | Designing a suitable data pre-processing scheme, reducing computational time, reducing error rate, high accuracy | Not testing the learning model with other available databases, insufficient experiments to evaluate the final model |
| [130] | Designing a suitable data pre-processing scheme for estimating missing values, reducing error rate | Not testing the learning model with large datasets, not calculating runtime |
ML model development. In [126], the authors used the fuzzy clustering (FC) technique
to segment MR images. The purpose of fuzzy clustering is to group the m data samples of
a brain slide into k clusters. After the clustering process, each data sample obtains a
membership degree for each cluster, so that the data sample closest to a cluster center
has the highest membership degree. Then, each cluster center is calculated as the mean
of the data samples, weighted by their membership degrees. In the next step, the membership
degree of each data sample is updated. This process continues until the total distance
between the data samples and their cluster centers is minimized or no better result is
achieved. This process segments the brain structure. Note that determining the number of
clusters is very important in the clustering process; in [126], this is done using the
silhouette score. In the next step, the extracted structures are improved through
morphological operations to determine the boundaries between clusters. Finally, the authors
perform some post-processing techniques to extract the desired area
(i.e., the tumor) from the brain slides.
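The alternating update loop can be sketched with a generic one-dimensional fuzzy c-means; this is our illustration, not the exact hybrid variant of [126], and the fuzzifier m = 2 and the deterministic initialization are our own choices.

```python
def fcm(data, k=2, m=2.0, iters=50):
    centers = [data[0], data[-1]]  # simple deterministic initialization
    u = [[0.0] * k for _ in data]
    for _ in range(iters):
        # Membership update: samples closer to a center get a higher degree.
        for i, x in enumerate(data):
            d = [abs(x - c) + 1e-9 for c in centers]
            for j in range(k):
                u[i][j] = 1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(k))
        # Center update: membership-weighted mean of the data samples.
        for j in range(k):
            den = sum(u[i][j] ** m for i in range(len(data)))
            centers[j] = sum((u[i][j] ** m) * x for i, x in enumerate(data)) / den
    return centers, u

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, u = fcm(data)
print([round(c, 1) for c in sorted(centers)])  # → [1.0, 9.0]
```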
Evaluation. This scheme uses both simulation-based evaluation and practical
implementation-based evaluation. It is implemented in Python software. The evaluation
criteria include the Peak Signal-to-Noise Ratio (PSNR), Normalized Cross-Correlation
(NCC), Normalized Absolute Error (NAE), and Structural Similarity Index (SSIM). The
performance of the hybrid fuzzy clustering is also evaluated based on similarity criteria
such as Dice and Jaccard. Note that, for the practical evaluation, this method is applied
to the brain MR images of a particular patient.
using least-squares and back-propagation methods. ANFIS has five layers; the first layer
is the input layer, and the final layer is the output layer.
Evaluation. This method uses simulation-based evaluation. Note that the authors do
not describe the simulator used. The five-fold cross-validation technique
validates this scheme. Evaluation criteria include accuracy, sensitivity, and specificity.
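The five-fold cross-validation protocol can be sketched generically (not tied to the ANFIS implementation): each fold serves once as the test set while the remaining folds form the training set.

```python
def k_fold_splits(n_samples, k=5):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    splits = []
    for f in range(k):
        # Fold f is held out for testing; everything else is used for training.
        test = indices[f * fold_size:(f + 1) * fold_size]
        train = [i for i in indices if i not in test]
        splits.append((train, test))
    return splits

# Ten samples, five folds: each fold of two samples is tested exactly once.
splits = k_fold_splits(10, k=5)
print(len(splits), splits[0][1])  # → 5 [0, 1]
```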
5.18. AFGC
Huang [128] suggested an adaptive fast generalized fuzzy C-means clustering (AFGC)
algorithm. The purpose of this method is to segment the thyroid nodule images in a noisy
environment to accurately detect malignant thyroid tumors. In the following, we describe
this method in detail. Table 14 summarizes the specifications of this method, and
Table 15 presents its strengths and weaknesses.
Problem definition. The most common malignant thyroid cancer is papillary thyroid
carcinoma (PTC), which must be treated in time to stop or control the disease. Usually,
ultrasound images are applied for detecting this disease. However, interpreting these
images is very difficult, time-consuming, and requires expertise. Therefore, computer-based
systems are very beneficial for analyzing ultrasound images. The existing clustering
methods for segmenting ultrasound images have poor performance and are not sufficiently
accurate because these images are highly noisy. In [128], a suitable segmentation
model has been proposed based on the AFGC clustering method.
Database. In [128], the authors used the Jinshan Hospital database, which includes thyroid
nodule images. The PACS system was used to collect these images from January 2014 to April
2016. In general, there are 610 thyroid nodule images related to 543 patients. These images
are divided into two classes, namely benign (403 images) and malignant (207 images).
This dataset is used as the training set. In addition, the testing set includes the thyroid
nodule images collected from May 2016 to September 2016; it covers 45 patients and
50 thyroid nodule images.
Data pre-processing method. In [128], the authors did not perform any data pre-
processing scheme on the database.
ML model development. In [128], the authors presented an AFGC-based segmentation
algorithm to accurately segment the thyroid nodule images. In the first step, the authors
determine a balance scale, which is calculated based on the noise probability of non-local
pixels. This helps the scheme to determine the structural information in the image exactly.
In the second step, the AFGC algorithm and the weighted image are merged together, taking
the balance scale into account. This operation produces a filtered image. The scheme
performs the filtering process dynamically: if the image has high noise, the scheme
increases the filtering degree; otherwise, it reduces it.
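The noise-dependent filtering idea can be illustrated on a one-dimensional signal; this is a rough analogy, not the actual AFGC formulation, with the local standard deviation standing in for the balance scale.

```python
import statistics

def adaptive_filter(signal, radius=1):
    out = []
    for i, x in enumerate(signal):
        nb = signal[max(0, i - radius):i + radius + 1]
        local_mean = statistics.fmean(nb)
        noise = statistics.pstdev(nb)       # stand-in for the balance scale
        degree = noise / (noise + 1.0)      # more noise -> stronger filtering
        out.append((1 - degree) * x + degree * local_mean)
    return out

noisy = [1.0, 1.0, 9.0, 1.0, 1.0]
filtered = adaptive_filter(noisy)
print([round(v, 2) for v in filtered])
```

Flat regions (zero local deviation) pass through unchanged, while the outlier at the center is pulled strongly toward its neighborhood mean.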
Evaluation. This scheme uses simulation-based evaluation. It is simulated using
MATLAB software. Two evaluation scales, including segmentation accuracy (SA) and
comparison scores (CS), have been used to evaluate this method.
5.19. UDR-RC
Janarthanan et al. [129] offered the unsupervised deep learning assisted reconstructed
coder (UDR-RC). The purpose of this method is to present a data pre-processing scheme
to optimize the dataset. In the following, we explain this method in detail. Moreover, we
represent the main specifications of the UDR-RC method in Table 14. Table 15 expresses its
advantages and disadvantages.
Problem definition. Human activity recognition (HAR) has created opportunities for
designing e-health methods. It uses wearable sensors to recognize different body activities.
These sensors are very important for detecting different diseases and selecting a suitable
treatment policy. Their output is a signal, which must be analyzed using deep learning
approaches such as DCNNs. For analyzing these signals, existing models have high
computational time and high error rates, meaning that they are not sufficiently accurate.
Therefore, in [129], the UDR-RC method is presented to solve these problems.
Dataset. UDR-RC employs the WISDM database. Wearable sensors sense these data samples,
which represent six human activities: walking, running, going upstairs, going downstairs,
sitting, and standing.
Data pre-processing method. UDR-RC is a data pre-processing method, including fea-
ture selection and feature extraction. It reduces computational time and the error rate,
and enhances accuracy.
ML model development. UDR-RC is designed to automatically extract high-level features.
This process includes several steps. In the first step, the data samples are analyzed;
the purpose of this step is to represent the data samples analytically and to reduce their
noise. The data samples are signals in the time and frequency domains, and in [129],
the Fourier transform (FT) is used to analyze them. In this scheme, a long signal is broken
into smaller parts; these time series are divided using a time window of constant size.
In the second step, feature extraction is performed. This step is the core of the UDR-RC
method. For this purpose, the coder architecture and the Z-Layer method are merged to create
a deep learning framework. The coder architecture is an encoder-decoder architecture, which
processes the input signal to extract its features using the Z-Layer method. In the third
step, UDR-RC performs a feature selection process to select the most suitable features for
HAR. Finally, an artificial neural network (ANN) is used for classifying human activities.
It includes an input layer, an output layer, and three hidden layers.
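The first two steps — fixed-size time windows plus a Fourier transform — can be sketched as follows; this is our illustration of the generic idea, not the coder/Z-Layer architecture of UDR-RC.

```python
import cmath, math

def dft_magnitudes(window):
    # Normalized magnitudes of the first n//2 + 1 DFT bins of one window.
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))) / n
            for k in range(n // 2 + 1)]

def spectral_features(signal, window_size=8):
    # Non-overlapping constant-size windows, one feature vector per window.
    return [dft_magnitudes(signal[s:s + window_size])
            for s in range(0, len(signal) - window_size + 1, window_size)]

# A pure tone spanning one 8-sample window: its energy lands in bin 1.
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
feats = spectral_features(signal)
print([round(m, 3) for m in feats[0]])  # → [0.0, 0.5, 0.0, 0.0, 0.0]
```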
Evaluation. UDR-RC uses simulation-based evaluation. However, the authors do not
mention the software used for implementing this method. In this scheme, evaluation scales
include accuracy, MSE, and runtime.
5.20. CLUSTIMP
Shobha and Savarimuthu [130] presented a clustering-based imputation technique
called CLUSTIMP. In the following, we describe this method in detail. Furthermore,
Table 14 expresses the most important characteristics of the CLUSTIMP method. Table 15
presents its advantages and disadvantages.
Problem definition. Healthcare datasets contain useful information. However, they often
include many missing values, unbalanced classes, and other problems. Missing values
are known as a serious problem in these datasets. This problem can be solved using
two schemes: (1) marginalization, in which data samples with missing values are
removed from the dataset; (2) imputation, in which the missing values are estimated.
The marginalization method can cause a class imbalance problem, while the imputation
method does not. Therefore, in [130], an unsupervised learning algorithm
is provided for estimating these missing values.
Dataset. In [130], the authors used two databases, namely the mammographic mass
dataset and the HCC dataset. The mammographic mass dataset was obtained from
the UCI repository. It includes 961 data samples, each with six features;
162 data samples have missing values. Furthermore, the HCC database includes
information about 165 patients. Each data sample has 50 features. In this dataset, there
are missing values (10.22% of data samples).
Data pre-processing method. CLUSTIMP is a data pre-processing scheme for estimating
missing values.
ML model development. In [130], the authors presented a clustering-based imputation
algorithm called CLUSTIMP. This imputation model employs ART2 for creating clusters.
ART2 is an unsupervised learning algorithm rooted in the ART scheme, and it works with
continuous features. After creating the clusters, each cluster has two types of data
samples: complete data samples and data samples with missing values. In the next step,
the cluster members are divided into two groups, namely group 1 (complete data samples)
and group 2 (data samples with missing values). Then, the missing values are estimated
using two methods, namely Expectation Maximization (EM) and J48 (a decision tree). Note
that numerical missing values are imputed using EM and categorical
missing values are imputed using J48.
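The per-cluster imputation rule can be illustrated with a simplified sketch in which a cluster mean (numeric features) and a cluster mode (categorical features) stand in for the actual EM and J48 estimators; the sample values are made up.

```python
import statistics

def impute_cluster(samples):
    # samples: rows of one cluster; None marks a missing value.
    filled = [list(s) for s in samples]
    for j in range(len(samples[0])):
        known = [s[j] for s in samples if s[j] is not None]
        if all(isinstance(v, (int, float)) for v in known):
            estimate = statistics.fmean(known)   # numeric feature -> mean
        else:
            estimate = statistics.mode(known)    # categorical feature -> mode
        for s in filled:
            if s[j] is None:
                s[j] = estimate
    return filled

cluster = [[1.0, "benign"], [3.0, "benign"], [None, None]]
result = impute_cluster(cluster)
print(result)  # → [[1.0, 'benign'], [3.0, 'benign'], [2.0, 'benign']]
```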
Evaluation. CLUSTIMP uses simulation-based evaluation. It is implemented in Python
2.7. Evaluation criteria include the error rate, accuracy, and root mean squared
error (RMSE).
6. Discussion
In this section, we provide some points about the ML-based methods in healthcare
according to the learning models examined in Section 5. Note that real-world datasets
in the healthcare field often suffer from various problems, such as missing values, noisy
data, and high data dimensionality (a high number of features), among others. These
problems reduce the quality of datasets and negatively affect the performance of ML-based
models. According to the research done in this paper, we deduce that most ML-based methods
in medicine consider data pre-processing methods.

Data with missing values is the most common problem in healthcare datasets. Based on the
ML-based methods studied in this paper, we find that there are two main strategies for
solving this problem: (1) deleting data with missing values; (2) estimating missing values.
Qin et al. in [101], Wang et al. in [110], Baucum et al. in [120], and Savarimuthu and
Shobha in [130] offered various designs for estimating missing values. Li et al. [102],
Abdar and Makarenkov [107], and Wang et al. in [110] removed data with missing values from
datasets. This is a simple approach; however, it can lead to a new problem called
imbalanced classes, which negatively affects the performance of learning models. Therefore,
methods that impute missing values provide a more appropriate solution. However, when
designing such a method, it is very important to estimate the missing values accurately;
otherwise, the learning model does not have acceptable performance. Wang et al. in [110]
provided a hybrid method: data samples with many missing values are removed from the
dataset, while data samples with few missing values are imputed.

In addition, most ML-based methods consider the data normalization process. The purpose of
data normalization is to standardize variables with different scales into a certain range,
for example [0, 1], so that they have the same effect on the learning model. For example,
Li et al. in [102], Baucum et al. in [120], Gupta et al. in [122], Zhai et al. in [123],
Bengani et al. in [124], Kanniappan et al. in [126], Fathi et al. in [127], and
Janarthanan et al. in [129] used data normalization methods.

Noise is another problem in healthcare datasets. It reduces the accuracy of learning models
and increases their error. Therefore, it is very important to design approaches to remove
noisy data to improve the performance of ML-based models. Data has different types, for
example digital images, numerical data, and qualitative data, and the noise removal process
varies according to the data type. In this paper, we examined different methods for removing
different types of noise in various datasets. For example, Ma et al. in [109], Fathi et al.
in [127], Huang in [128], and Janarthanan et al. in [129] provided various approaches to
remove noise from data. We examined these methods in Section 5.

Another important point is that healthcare datasets often have high dimensions, meaning
that data samples have many features. This can increase the model complexity, boost
learning time, and lead to overfitting. The appropriate solution is to use dimensionality
reduction methods such as feature selection and feature extraction. Some research works
have focused on these techniques. For example, Qin et al. in [101], Li et al. in [102],
Abdar et al. in [108], Ma et al. in [109], Tseng et al. in [112], Zhu et al. in [121],
Yang et al. in [125], Fathi et al. in [127], and Janarthanan et al. in [129] provided
approaches for reducing dimensionality. However, some of the methods studied in this paper
do not explain the method used for reducing dimensionality. This is an important weakness
because we cannot validate the presented results to review the effect of the feature
selection method on performance. For example, Abdar et al. in [108] and Yang et al.
in [125] did not provide any explanation about the feature selection process. Table 16
categorizes the ML-based methods based on data pre-processing methods.
Table 16. Categorization of the ML-based methods based on data pre-processing methods.

| Number | Scheme | Missing Value Management | Noisy Data Management | Data Normalization | Feature Selection | Feature Extraction |
|---|---|---|---|---|---|---|
| 1 | [101] | ✓ | × | × | ✓ | × |
| 2 | [102] | ✓ | × | ✓ | ✓ | × |
| 3 | [107] | ✓ | × | × | × | × |
| 4 | [108] | × | × | × | ✓ | × |
| 5 | [109] | × | ✓ | × | ✓ | ✓ |
| 6 | [110] | ✓ | × | × | × | × |
| 7 | [111] | × | × | × | × | × |
| 8 | [112] | × | × | × | ✓ | × |
| 9 | [113] | × | × | × | × | × |
| 10 | [120] | ✓ | × | ✓ | × | × |
| 11 | [121] | × | × | × | × | ✓ |
| 12 | [122] | × | × | ✓ | × | × |
| 13 | [123] | × | × | ✓ | × | × |
| 14 | [124] | × | × | ✓ | × | × |
| 15 | [125] | × | × | × | ✓ | × |
| 16 | [126] | × | × | ✓ | × | × |
| 17 | [127] | × | ✓ | ✓ | ✓ | × |
| 18 | [128] | × | ✓ | × | × | × |
| 19 | [129] | × | ✓ | ✓ | ✓ | ✓ |
| 20 | [130] | ✓ | × | × | × | × |
Another important point about ML-based models is the type of learning algorithm used
for their development. According to our reviews in this paper, unsupervised
learning-based methods are often used for data pre-processing applications.
For example, Fathi et al. in [127] used the self-organizing map (SOM) for detecting noise.
Janarthanan et al. in [129] presented an unsupervised deep learning method for feature
extraction, feature selection, and noise removal to reduce computational time. Savarimuthu
and Shobha in [130] provided an unsupervised neural network for estimating missing values
in the dataset. In contrast, supervised learning methods are often used to diagnose and
classify a disease; examples include the learning approaches provided by Qin et al. [101],
Li et al. [102], Abdar and Makarenkov [107], Abdar et al. [108], and Ma et al. [109].
Today, deep learning
methods are also used to design treatment recommendation systems. However, an impor-
tant problem in these methods is that their performance depends on the labeled database.
A supervised learning algorithm has good performance when enough labeled data are
available for training and testing this model. However, in the healthcare field, we often do
not access large labeled datasets. This can lead to an overfitting problem. This reduces the
generalizability of the learning model and increases its error. Furthermore, some authors
have provided solutions to solve this issue. One such solution is to use reinforcement
learning. For example, Wang et al. in [110], Dai et al. in [111], Tseng et al. in [112],
Khalilpourazari and Hashemi in [113], and Baucum et al. in [120] employed reinforcement
learning for designing the learning models. However, the most important problem when
using this technique in healthcare is that a reinforcement learning method should track the
patient’s health status continuously to learn the optimal treatment strategy. As stated
above, tracking the patient’s health status is very difficult; moreover, researchers cannot
perform unauthorized tests on the patient’s body. A solution for these problems is to create
an artificial environment for reinforcement learning-based models. For example, Dai et al.
in [111], Tseng et al. in [112], and Baucum et al. in [120] designed an artificial
environment using deep learning techniques to interact with reinforcement learning-based
models. Another solution to data unavailability is to produce artificial data samples.
For example, Tseng et al. in [112] and Yang et al. [125] used a deep neural network called
a GAN to produce artificial data samples and enlarge the initial dataset. Another
solution for data unavailability is to use semi-supervised learning methods. These methods
use a combination of labeled data and unlabeled data for designing the learning model.
Moreover, these methods use both learning techniques, including supervised learning and
unsupervised learning. For example, Zhu et al. in [121], Gupta et al. in [122], Zhai et al.
in [123], Bengani et al. in [124], and Yang et al. in [125] used semi-supervised learning for
designing the learning model. Table 17 categorizes the ML-based methods in the healthcare
field in terms of various learning techniques.
When examining the ML-based methods in healthcare, another point is that researchers
often evaluate the performance of their learning models using simulation software.
Although this evaluation method is important, we believe that it is not enough, because
ML-based methods in healthcare should also be analyzed in real environments and evaluated
by physicians and specialists in this area to identify their weaknesses. In the research
done in this paper, only Wang et al. in [110] and Kanniappan et al. in [126] examined their
methods in a real environment, and even then in a highly limited way. Note that the
practical implementation of learning models in healthcare is very costly; researchers deal
with hardware complexities when implementing ML-based models, and it is very difficult to
repeat different scenarios. These problems are often important obstacles for artificial
intelligence researchers because they need to evaluate their models in order to update them
continuously. In Table 18, the ML-based methods in healthcare are categorized in terms of
evaluation methods.
Table 18. The ML-based methods in healthcare, categorized by evaluation method.

| Number | Scheme | Simulation-Based Evaluation | Practical Implementation-Based Evaluation |
|--------|--------|-----------------------------|-------------------------------------------|
| 1 | [101] | ✓ | × |
| 2 | [102] | ✓ | × |
| 3 | [107] | ✓ | × |
| 4 | [108] | ✓ | × |
| 5 | [109] | ✓ | × |
| 6 | [110] | ✓ | ✓ |
| 7 | [111] | ✓ | × |
| 8 | [112] | ✓ | × |
| 9 | [113] | ✓ | × |
| 10 | [120] | ✓ | × |
| 11 | [121] | ✓ | × |
| 12 | [122] | ✓ | × |
| 13 | [123] | ✓ | × |
| 14 | [124] | ✓ | × |
| 15 | [125] | ✓ | × |
| 16 | [126] | ✓ | ✓ |
| 17 | [127] | ✓ | × |
| 18 | [128] | ✓ | × |
| 19 | [129] | ✓ | × |
| 20 | [130] | ✓ | × |
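The simulation-based evaluation that dominates Table 18 typically amounts to cross-validating a trained model on a recorded dataset. A minimal hypothetical sketch, using synthetic data as a stand-in for a clinical dataset (the random-forest classifier and all parameter values are our illustrative choices, not from any surveyed scheme):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a clinical dataset (features and labels are invented).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy: the typical "simulation-based" evaluation.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Such an evaluation is cheap and repeatable, which is exactly why it is so common; it is also why it cannot reveal the deployment problems (hardware constraints, clinician acceptance) that only a real environment exposes.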
The final point on the ML-based models in the healthcare field is that most ML-based methods are used to diagnose a disease. The number of papers in the treatment field that use machine learning techniques is very limited; examples include Wang et al. in [110], Dai et al. in [111], Tseng et al. in [112], and Baucum et al. in [120]. Therefore, researchers must work in this area to resolve its problems. Table 19 compares the ML-based methods in healthcare in terms of application.
Mathematics 2021, 9, 2970 45 of 52
Table 19. The ML-based methods in healthcare, categorized by application.

| Number | Scheme | Diagnosis | Treatment |
|--------|--------|-----------|-----------|
| 1 | [101] | ✓ | × |
| 2 | [102] | ✓ | × |
| 3 | [107] | ✓ | × |
| 4 | [108] | ✓ | × |
| 5 | [109] | ✓ | × |
| 6 | [110] | × | ✓ |
| 7 | [111] | ✓ | ✓ |
| 8 | [112] | × | ✓ |
| 9 | [113] | ✓ | × |
| 10 | [120] | × | ✓ |
| 11 | [121] | ✓ | × |
| 12 | [122] | ✓ | × |
| 13 | [123] | ✓ | × |
| 14 | [124] | ✓ | × |
| 15 | [125] | ✓ | × |
| 16 | [126] | ✓ | × |
| 17 | [127] | ✓ | × |
| 18 | [128] | ✓ | × |
| 19 | [129] | ✓ | × |
| 20 | [130] | ✓ | × |
• High dimensions: Real-world healthcare datasets are high-dimensional. This increases model complexity, lengthens learning time, and leads to overfitting. Therefore, ML-based methods should always consider this issue. There are effective techniques for reducing dimensionality; for example, feature selection and feature extraction are effective solutions to this problem. However, this area requires more research to provide more efficient dimensionality-reduction methods.
• Efficiency: ML-based models are beneficial in healthcare when they solve a serious problem in this area. In some cases, machine learning techniques are not really necessary, and existing methods can successfully resolve the problem. ML-based methods are justified when datasets are high-dimensional, when not all parameters are easily predictable, when inferring the correct results would otherwise take a long time, or when ordinary methods are inefficient. Therefore, researchers should apply machine learning techniques only where and when they are truly needed.
• Privacy: When designing ML-based models, we must consider privacy, because patients may be identified even from anonymized data. Patient privacy is a vital problem, and researchers should conduct more research to address it.
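The feature selection and feature extraction techniques named in the high-dimensions item above can be sketched side by side. This is an illustrative example with synthetic data standing in for a healthcare dataset; the univariate F-test selector and PCA are common representatives of the two families, not methods taken from any surveyed scheme.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# High-dimensional synthetic data standing in for a healthcare dataset.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)

# Feature selection: keep the 10 original features most associated
# with the label (the features themselves are preserved).
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Feature extraction: project onto 10 principal components
# (new features are built as combinations of the originals).
X_pca = PCA(n_components=10, random_state=0).fit_transform(X)

print(X.shape, X_sel.shape, X_pca.shape)
```

The practical difference matters in medicine: selected features remain clinically interpretable measurements, whereas extracted components trade interpretability for a more compact representation.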
8. Conclusions
In this paper, we examined ML-based methods in healthcare. For this purpose, we first briefly explained machine learning and described its applications in healthcare. Then, we introduced a general framework for designing ML-based models in medicine. We classified ML-based methods in medicine based on data pre-processing methods (data cleaning methods, data reduction methods), learning methods (unsupervised learning, supervised learning, semi-supervised learning, and reinforcement learning), evaluation methods (simulation-based evaluation and practical implementation-based evaluation in a real environment), and applications (diagnosis, treatment). Finally, we studied some ML-based methods in healthcare and discussed their strengths and weaknesses. In this paper, we seek to provide researchers with a good view of the use of machine learning in healthcare and to familiarize them with the newest research on ML applications in medicine so that they can provide new solutions to the existing problems in this area. In future work, we plan to focus on deep learning and reinforcement learning techniques because they are very powerful tools for solving problems in healthcare.
Author Contributions: Conceptualization, M.S.Y. and E.Y.; methodology, M.S.Y., E.Y. and M.H.;
validation, A.M.R., A.H. and Z.M.; investigation, A.M.R., A.H. and R.A.N.; resources, A.M.R., A.H.
and Z.M.; writing—original draft preparation, M.S.Y., E.Y. and M.H.; supervision, M.H.; project
administration, R.A.N. and M.H.; funding acquisition, R.A.N. All authors have read and agreed to
the published version of the manuscript.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Schuld, M.; Sinayskiy, I.; Petruccione, F. An introduction to quantum machine learning. Contemp. Phys. 2015, 56, 172–185.
[CrossRef]
2. Char, D.S.; Abràmoff, M.D.; Feudtner, C. Identifying ethical considerations for machine learning healthcare applications. Am. J.
Bioeth. 2020, 20, 7–17. [CrossRef] [PubMed]
3. Nordlinger, B.; Villani, C.; Rus, D. Healthcare and Artificial Intelligence; Springer Nature: Cham, Switzerland, 2020. [CrossRef]
4. Johri, S.; Goyal, M.; Jain, S.; Baranwal, M.; Kumar, V.; Upadhyay, R. A novel machine learning-based analytical framework for
automatic detection of COVID-19 using chest X-ray images. Int. J. Imaging Syst. Technol. 2021, 31, 1105–1119. [CrossRef]
5. Pattnayak, P.; Jena, O.P. Innovation on Machine Learning in Healthcare Services—An Introduction. Mach. Learn. Healthc. Appl.
2021, 1–15. [CrossRef]
6. Reig, B.; Heacock, L.; Geras, K.J.; Moy, L. Machine learning in breast mri. J. Magn. Reson. Imaging 2020, 52, 998–1018. [CrossRef]
[PubMed]
7. Demirhan, A. Neuroimage-based clinical prediction using machine learning tools. Int. J. Imaging Syst. Technol. 2017, 27, 89–97.
[CrossRef]
8. Datta, S.; Barua, R.; Das, J. Application of artificial intelligence in modern healthcare system. In Alginates: Recent Uses of This Natural
Polymer; IntechOpen: Rijeka, Croatia, 2020. [CrossRef]
9. Elsebakhi, E.; Lee, F.; Schendel, E.; Haque, A.; Kathireason, N.; Pathare, T.; Syed, N.; Al-Ali, R. Large-scale machine learning
based on functional networks for biomedical big data with high performance computing platforms. J. Comput. Sci. 2015, 11,
69–81. [CrossRef]
10. Bashir, S.; Qamar, U.; Khan, F.H.; Naseem, L. HMV: A medical decision support framework using multi-layer classifiers for
disease prediction. J. Comput. Sci. 2016, 13, 10–25. [CrossRef]
11. Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare.
Artif. Intell. Med. 2020, 104, 101822. [CrossRef]
12. Coronato, A.; Naeem, M.; Pietro, G.D.; Paragliola, G. Reinforcement learning for intelligent healthcare applications: A survey.
Artif. Intell. Med. 2020, 109, 101964. [CrossRef]
13. Yousefpoor, E.; Barati, H.; Barati, A. A hierarchical secure data aggregation method using the dragonfly algorithm in wireless
sensor networks. Peer-to-Peer Netw. Appl. 2021, 1–26. [CrossRef]
14. Yousefpoor, M.S.; Yousefpoor, E.; Barati, H.; Barati, A.; Movaghar, A.; Hosseinzadeh, M. Secure data aggregation methods and
countermeasures against various attacks in wireless sensor networks: A comprehensive review. J. Netw. Comput. Appl. 2021,
103118. [CrossRef]
15. Rong, G.; Mendez, A.; Assi, E.B.; Zhao, B.; Sawan, M. Artificial intelligence in healthcare: Review and prediction case studies.
Engineering 2020, 6, 291–301. [CrossRef]
16. Seaton, H. The Construction Technology Handbook; John Wiley & Sons: Hoboken, NJ, USA, 2021; ISBN 978-1-119-71995-3.
17. Chen, Y.; Liu, Q.; Guo, D. Emerging coronaviruses: Genome structure, replication, and pathogenesis. J. Med. Virol. 2020, 92, 418–423.
[CrossRef]
18. Mohammed, M.; Khan, M.B.; Bashier, E.B.M. Machine Learning: Algorithms and Applications; CRC Press: Boca Raton, FL, USA,
2016; ISBN 978-1-4987-0538-7.
19. Seo, H.; Khuzani, M.B.; Vasudevan, V.; Huang, C.; Ren, H.; Xiao, R.; Jia, X.; Xing, L. Machine learning techniques for biomedical
image segmentation: An overview of technical aspects and introduction to state-of-art applications. Med. Phys. 2020, 47,
e148–e167. [CrossRef] [PubMed]
20. Zhang, X.-D. Machine Learning. In A Matrix Algebra Approach to Artificial Intelligence; Springer: Singapore, 2020; pp. 223–440.
[CrossRef]
21. Chen, P.-H.C.; Liu, Y.; Peng, L. How to develop machine learning models for healthcare. Nat. Mater. 2019, 18, 410–414. [CrossRef]
22. He, J.; Baxter, S.L.; Xu, J.; Xu, J.; Zhou, X.; Zhang, K. The practical implementation of artificial intelligence technologies in
medicine. Nat. Med. 2019, 25, 30–36. [CrossRef]
23. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors 2018, 18, 2674.
[CrossRef]
24. Uprety, A.; Rawat, D.B. Reinforcement learning for iot security: A comprehensive survey. IEEE Internet Things J. 2020, 8, 8693–8706.
[CrossRef]
25. Yu, K.-H.; Beam, A.L.; Kohane, I.S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2018, 2, 719–731. [CrossRef]
26. Yousefpoor, M.S.; Barati, H. Dskms: A dynamic smart key management system based on fuzzy logic in wireless sensor networks.
Wirel. Netw. 2020, 26, 2515–2535. [CrossRef]
27. Miotto, R.; Wang, F.; Wang, S.; Jiang, X.; Dudley, J.T. Deep learning for healthcare: Review, opportunities and challenges.
Brief. Bioinform. 2018, 19, 1236–1246. [CrossRef] [PubMed]
28. Alafif, T.; Tehame, A.M.; Bajaba, S.; Barnawi, A.; Zia, S. Machine and deep learning towards covid-19 diagnosis and treatment:
Survey, challenges, and future directions. Int. J. Environ. Res. Public Health 2021, 18, 1117. [CrossRef]
29. Tayarani-N, M.-H. Applications of artificial intelligence in battling against covid-19: A literature review. Chaos Solitons Fractals
2020, 110338. [CrossRef] [PubMed]
30. Smiti, A. When machine learning meets medical world: Current status and future challenges. Comput. Sci. Rev. 2020, 37, 100280.
[CrossRef]
31. Shouval, R.; Fein, J.A.; Savani, B.; Mohty, M.; Nagler, A. Machine learning and artificial intelligence in haematology. Br. J.
Haematol. 2021, 192, 239–250. [CrossRef]
32. Olsen, C.R.; Mentz, R.J.; Anstrom, K.J.; Page, D.; Patel, P.A. Clinical applications of machine learning in the diagnosis, classification,
and prediction of heart failure. Am. Heart J. 2020, 229, 1–17. [CrossRef]
33. Berry, M.W.; Mohamed, A.; Yap, B.W. Supervised and Unsupervised Learning for Data Science; Springer: Cham, Switzerland, 2019.
[CrossRef]
34. Mabrouk, E.; Ayman, A.; Raslan, Y.; Hedar, A.R. Immune system programming for medical image segmentation. J. Comput. Sci.
2019, 31, 111–125. [CrossRef]
35. Forsch, N.; Govil, S.; Perry, J.C.; Hegde, S.; Young, A.A.; Omens, J.H.; McCulloch, A.D. Computational analysis of cardiac
structure and function in congenital heart disease: Translating discoveries to clinical strategies. J. Comput. Sci. 2021, 52, 101211.
[CrossRef]
36. Surendar, P. Diagnosis of lung cancer using hybrid deep neural network with adaptive sine cosine crow search algorithm.
J. Comput. Sci. 2021, 53, 101374. [CrossRef]
37. Saxena, A.; Chandra, S. Artificial Intelligence and Machine Learning in Healthcare; Springer: Singapore, 2021. [CrossRef]
38. Pucchio, A.; Eisenhauer, E.A.; Moraes, F.Y. Medical students need artificial intelligence and machine learning training. Nat.
Biotechnol. 2021, 39, 388–389. [CrossRef] [PubMed]
39. Samuel, A.L. Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 1959, 3, 210–229. [CrossRef]
40. Alpaydin, E. Introduction to Machine Learning, 3rd ed.; PHI Publisher: New Delhi, India, 2014.
41. Kubat, M. An Introduction to Machine Learning; Springer: Cham, Switzerland, 2017. [CrossRef]
42. Belciug, S.; Gorunescu, F. Era of intelligent systems in healthcare. In Intelligent Decision Support Systems—A Journey to Smarter
Healthcare; Springer: Cham, Switzerland, 2020; pp. 1–55. [CrossRef]
43. El Naqa, I.; Murphy, M.J. What is machine learning? In Machine Learning in Radiation Oncology; Springer: Cham, Switzerland,
2015; pp. 3–11. [CrossRef]
44. Dulhare, U.N.; Ahmad, K.; Ahmad, K.A.B. (Eds.) Machine Learning and Big Data: Concepts, Algorithms, Tools and Applications; John
Wiley & Sons: Hoboken, NJ, USA, 2020.
45. Shobha, G.; Rangaswamy, S. Machine learning. In Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 2018; Volume 38,
pp. 197–228. [CrossRef]
46. Alsuliman, T.; Humaidan, D.; Sliman, L. Machine learning and artificial intelligence in the service of medicine: Necessity or
potentiality? Curr. Res. Transl. Med. 2020, 68, 245–251. [CrossRef]
47. Kandhway, P.; Bhandari, A.K.; Singh, A. A novel reformed histogram equalization based medical image contrast enhancement
using krill herd optimization. Biomed. Signal Process. Control 2020, 56, 101677. [CrossRef]
48. Zerouaoui, H.; Idri, A. Reviewing machine learning and image processing based decision-making systems for breast cancer
imaging. J. Med. Syst. 2021, 45, 1–20. [CrossRef] [PubMed]
49. Handelman, G.; Kok, H.; Chandra, R.; Razavi, A.; Lee, M.; Asadi, H. Machine learning and the future of medicine. J. Intern. Med.
2018, 284, 603–619. [CrossRef] [PubMed]
50. Vatandsoost, M.; Litkouhi, S. The future of healthcare facilities: How technology and medical advances may shape hospitals of
the future. Hosp. Pract. Res. 2019, 4, 1–11. [CrossRef]
51. Himidan, S.; Kim, P. The evolving identity, capacity, and capability of the future surgeon. In Seminars in Pediatric Surgery; WB
Saunders, Elsevier: Amsterdam, The Netherlands, 2015; Volume 24, pp. 145–149. [CrossRef]
52. Assaf, D.; Rayman, S.; Segev, L.; Neuman, Y.; Zippel, D.; Goitein, D. Improving pre-bariatric surgery diagnosis of hiatal hernia
using machine learning models. Minim. Invasive Ther. Allied Technol. 2021, 1–7. [CrossRef] [PubMed]
53. De Bruyne, S.; Speeckaert, M.M.; Van Biesen, W.; Delanghe, J.R. Recent evolutions of machine learning applications in clinical
laboratory medicine. Crit. Rev. Clin. Lab. Sci. 2021, 58, 131–152. [CrossRef] [PubMed]
54. Rahmani, A.M.; Ali, S.; Yousefpoor, M.S.; Yousefpoor, E.; Naqvi, R.A.; Siddique, K.; Hosseinzadeh, M. An area coverage scheme
based on fuzzy logic and shuffled frog-leaping algorithm (sfla) in heterogeneous wireless sensor networks. Mathematics 2021,
9, 2251. [CrossRef]
55. Lee, S.-W.; Ali, S.; Yousefpoor, M.S.; Yousefpoor, E.; Lalbakhsh, P.; Javaheri, D.; Rahmani, A.M.; Hosseinzadeh, M. An energy-
aware and predictive fuzzy logic-based routing scheme in flying ad hoc networks (fanets). IEEE Access 2021, 9, 129977–130005.
[CrossRef]
56. Tao, W.; Concepcion, A.N.; Vianen, M.; Marijnissen, A.C.; Lafeber, F.P.; Radstake, T.R.; Pandit, A. Multiomics and machine
learning accurately predict clinical response to adalimumab and etanercept therapy in patients with rheumatoid arthritis. Arthritis
Rheumatol. 2021, 73, 212–222. [CrossRef]
57. Alizadehsani, R.; Roshanzamir, M.; Abdar, M.; Beykikhoshk, A.; Khosravi, A.; Panahiazar, M.; Koohestani, A.; Khozeimeh, F.;
Nahavandi, S.; Sarrafzadegan, N. A database for using machine learning and data mining techniques for coronary artery disease
diagnosis. Sci. Data 2019, 6, 1–13. [CrossRef]
58. Ben-Israel, D.; Jacobs, W.B.; Casha, S.; Lang, S.; Ryu, W.H.A.; de Lotbiniere-Bassett, M.; Cadotte, D.W. The impact of machine
learning on patient care: A systematic review. Artif. Intell. Med. 2020, 103, 101785. [CrossRef]
59. Yousefpoor, M.S.; Barati, H. Dynamic key management algorithms in wireless sensor networks: A survey. Comput. Commun.
2019, 134, 52–69. [CrossRef]
60. Golsorkhtabar, M.; Nia, F.K.; Hosseinzadeh, M.; Vejdanparast, Y. The novel energy adaptive protocol for heterogeneous wireless
sensor networks. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology,
Chengdu, China, 9–11 July 2010; Volume 2, pp. 178–182. [CrossRef]
61. Nikravan, M.; Movaghar, A.; Hosseinzadeh, M. A lightweight defense approach to mitigate version number and rank attacks in
low-power and lossy networks. Wirel. Pers. Commun. 2018, 99, 1035–1059. [CrossRef]
62. Zitnik, M.; Nguyen, F.; Wang, B.; Leskovec, J.; Goldenberg, A.; Hoffman, M.M. Machine learning for integrating data in biology
and medicine: Principles, practice, and opportunities. Inf. Fusion 2019, 50, 71–91. [CrossRef]
63. Stetco, A.; Dinmohammadi, F.; Zhao, X.; Robu, V.; Flynn, D.; Barnes, M.; Keane, J.; Nenadic, G. Machine learning methods for
wind turbine condition monitoring: A review. Renew. Energy 2019, 133, 620–635. [CrossRef]
64. Dhal, P.; Azad, C. A comprehensive survey on feature selection in the various fields of machine learning. Appl. Intell. 2021, 1–39.
[CrossRef]
65. Tiwari, S.R.; Rana, K.K. Feature selection in big data: Trends and challenges. In Data Science and Intelligent Applications; Springer:
Singapore, 2021; pp. 83–98. [CrossRef]
66. Guyon, I.; Elisseeff, A. An introduction to feature extraction. In Feature Extraction; Springer: Berlin/Heidelberg, Germany, 2006;
pp. 1–25. [CrossRef]
67. Xiong, Z.; Cui, Y.; Liu, Z.; Zhao, Y.; Hu, M.; Hu, J. Evaluating explorative prediction power of machine learning algorithms for
materials discovery using k-fold forward cross-validation. Comput. Mater. Sci. 2020, 171, 109203. [CrossRef]
68. Xu, Z.; Qin, W.; Tang, Q.; Jiang, D. Energy-efficient cognitive access approach to convergence communications. Sci. China Inf. Sci.
2014, 57, 1–12. [CrossRef]
69. Mandal, I. Machine learning algorithms for the creation of clinical healthcare enterprise systems. Enterp. Inf. Syst. 2017, 11,
1374–1400. [CrossRef]
70. Feldman, K.; Faust, L.; Wu, X.; Huang, C.; Chawla, N.V. Beyond volume: The impact of complex healthcare data on the machine
learning pipeline. In Towards Integrative Machine Learning and Knowledge Extraction; Springer: Berlin/Heidelberg, Germany, 2017;
pp. 150–169. [CrossRef]
71. Zhang, J.M.; Harman, M.; Ma, L.; Liu, Y. Machine learning testing: Survey, landscapes and horizons. IEEE Trans. Softw. Eng. 2020.
[CrossRef]
72. Javaheri, D.; Hosseinzadeh, M.; Rahmani, A.M. Detection and elimination of spyware and ransomware by intercepting kernel-
level system routines. IEEE Access 2018, 6, 78321–78332. [CrossRef]
73. Mesbahi, M.R.; Rahmani, A.M.; Hosseinzadeh, M. Highly reliable architecture using the 80/20 rule in cloud computing
datacenters. Future Gener. Comput. Syst. 2017, 77, 77–86. [CrossRef]
74. Wu, H.; Meng, F.J. Review on Evaluation Criteria of Machine Learning Based on Big Data. J. Phys. Conf. Ser. 2020, 1486,
052026. [CrossRef]
75. Vamplew, P.; Dazeley, R.; Berry, A.; Issabekov, R.; Dekker, E. Empirical evaluation methods for multiobjective reinforcement
learning algorithms. Mach. Learn. 2011, 84, 51–80. [CrossRef]
76. Setiawan, A.W. Image Segmentation Metrics in Skin Lesion: Accuracy, Sensitivity, Specificity, Dice Coefficient, Jaccard
Index, and Matthews Correlation Coefficient. In Proceedings of the 2020 International Conference on Computer Engineering,
Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia, 17–18 November 2020; pp. 97–102. [CrossRef]
77. Zhang, J.; Barr, E.; Guedj, B.; Harman, M.; Shawe-Taylor, J. Perturbed Model Validation: A New Framework to Validate Model
Relevance. 2019. Available online: https://hal.inria.fr/hal-02139208 (accessed on 24 August 2021).
78. Werpachowski, R.; György, A.; Szepesvári, C. Detecting overfitting via adversarial examples. arXiv 2019, arXiv:1903.02380.
79. Molnar, C. Interpretable Machine Learning. Available online: https://christophm.github.io/interpretable-ml-book (accessed on
11 September 2021).
80. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38. [CrossRef]
81. Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608.
82. Slack, D.; Friedler, S.A.; Scheidegger, C.; Roy, C.D. Assessing the local interpretability of machine learning models. arXiv 2019,
arXiv:1902.03501.
83. Zhou, Z.Q.; Sun, L.; Chen, T.Y.; Towey, D. Metamorphic relations for enhancing system understanding and use. IEEE Trans. Softw.
Eng. 2018, 46, 1120–1154. [CrossRef]
84. Chen, W.; Sahiner, B.; Samuelson, F.; Pezeshk, A.; Petrick, N. Calibration of medical diagnostic classifier scores to the probability
of disease. Stat. Methods Med Res. 2018, 27, 1394–1409. [CrossRef] [PubMed]
85. Caruana, R.; Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd
International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 161–168. [CrossRef]
86. Alias Balamurugan, A.; Rajaram, R.; Pramala, S.; Rajalakshmi, S.; Jeyendran, C.; Prakash, J.D.S. Nb+: An improved naive bayesian
algorithm. Knowl.-Based Syst. 2011, 24, 563–569. [CrossRef]
87. Ballard, Z.; Brown, C.; Madni, A.M.; Ozcan, A. Machine learning and computation-enabled intelligent sensor design. Nat. Mach.
Intell. 2021, 3, 556–565. [CrossRef]
88. Miorelli, R.; Kulakovskyi, A.; Chapuis, B.; Dalmeida, O.; Mesnil, O. Supervised learning strategy for classification and regression
tasks applied to aeronautical structural health monitoring problems. Ultrasonics 2021, 113, 106372. [CrossRef] [PubMed]
89. Dhasaradhan, K.; Jaichandran, R.; Shunmuganathan, K.; Kiruthika, S.U.; Rajaprakash, S. Hybrid machine learning model using
decision tree and support vector machine for diabetes identification. In Data Engineering and Intelligent Computing; Springer:
Singapore, 2021; pp. 293–305. [CrossRef]
90. Shrestha, Y.R.; Krishna, V.; von Krogh, G. Augmenting organizational decision-making with deep learning algorithms: Principles,
promises, and challenges. J. Bus. Res. 2021, 123, 588–603. [CrossRef]
91. Villarrubia, G.; Paz, J.F.D.; Chamoso, P.; la Prieta, F.D. Artificial neural networks used in optimization problems. Neurocomputing
2018, 272, 10–16. [CrossRef]
92. Hasan, K.Z.; Hasan, M.Z. Performance evaluation of ensemble-based machine learning techniques for prediction of chronic
kidney disease. In Emerging Research in Computing, Information, Communication and Applications; Springer: Singapore, 2019; pp.
415–426. [CrossRef]
93. Gottwald, G.A.; Reich, S. Supervised learning from noisy observations: Combining machine-learning techniques with data
assimilation. Phys. D Nonlinear Phenom. 2021, 423, 132911. [CrossRef]
94. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
95. Piccialli, F.; Di Somma, V.; Giampaolo, F.; Cuomo, S.; Fortino, G. A survey on deep learning in medicine: Why, how and when?
Inf. Fusion 2021, 66, 111–137. [CrossRef]
96. Sharma, S.; Singh, G.; Sharma, M. A comprehensive review and analysis of supervised-learning and soft computing techniques
for stress diagnosis in humans. Comput. Biol. Med. 2021, 104450. [CrossRef] [PubMed]
97. Celebi, M.E.; Aydin, K. Unsupervised Learning Algorithms; Springer: Berlin/Heidelberg, Germany, 2016.
98. Zhang, L.; Liu, P.; Zhao, L.; Wang, G.; Zhang, W.; Liu, J. Air quality predictions with a semi-supervised bidirectional lstm neural
network. Atmos. Pollut. Res. 2021, 12, 328–339. [CrossRef]
99. Bull, L.; Worden, K.; Dervilis, N. Towards semi-supervised and probabilistic classification in structural health monitoring. Mech.
Syst. Signal Process. 2020, 140, 106653. [CrossRef]
100. Xu, X.; Zuo, L.; Huang, Z. Reinforcement learning algorithms with function approximation: Recent advances and applications.
Inf. Sci. 2014, 261, 1–31. [CrossRef]
101. Qin, J.; Chen, L.; Liu, Y.; Liu, C.; Feng, C.; Chen, B. A machine learning methodology for diagnosing chronic kidney disease.
IEEE Access 2019, 8, 20991–21002. [CrossRef]
102. Li, J.P.; Haq, A.U.; Din, S.U.; Khan, J.; Khan, A.; Saboor, A. Heart disease identification method using machine learning
classification in e-healthcare. IEEE Access 2020, 8, 107562–107582. [CrossRef]
103. Urbanowicz, R.J.; Meeker, M.; Cava, W.L.; Olson, R.S.; Moore, J.H. Relief-based feature selection: Introduction and review.
J. Biomed. Inform. 2018, 85, 189–203. [CrossRef] [PubMed]
104. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and
min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [CrossRef] [PubMed]
105. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [CrossRef]
106. Sun, Y.; Todorovic, S.; Goodison, S. Local-learning-based feature selection for high-dimensional data analysis. IEEE Trans. Pattern
Anal. Mach. Intell. 2009, 32, 1610–1626. [CrossRef]
107. Abdar, M.; Makarenkov, V. Cwv-bann-svm ensemble learning classifier for an accurate diagnosis of breast cancer. Measurement
2019, 146, 557–570. [CrossRef]
108. Abdar, M.; Zomorodi-Moghadam, M.; Zhou, X.; Gururajan, R.; Tao, X.; Barua, P.D.; Gururajan, R. A new nested ensemble
technique for automated diagnosis of breast cancer. Pattern Recognit. Lett. 2020, 132, 123–131. [CrossRef]
109. Ma, F.; Sun, T.; Liu, L.; Jing, H. Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous
modified artificial neural network. Future Gener. Comput. Syst. 2020, 111, 17–26. [CrossRef]
110. Wang, L.; Zhang, W.; He, X.; Zha, H. Supervised reinforcement learning with recurrent neural network for dynamic treatment
recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,
New York, NY, USA, 19–23 August 2018; pp. 2447–2456. [CrossRef]
111. Dai, Y.; Wang, G.; Muhammad, K.; Liu, S. A closed-loop healthcare processing approach based on deep reinforcement learning.
Multimed. Tools Appl. 2020, 1–23. [CrossRef]
112. Tseng, H.-H.; Luo, Y.; Cui, S.; Chien, J.-T.; Haken, R.K.T.; Naqa, I.E. Deep reinforcement learning for automated radiation
adaptation in lung cancer. Med. Phys. 2017, 44, 6690–6705. [CrossRef]
113. Khalilpourazari, S.; Doulabi, H.H. Designing a hybrid reinforcement learning based algorithm with application in prediction of
the covid-19 pandemic in quebec. Ann. Oper. Res. 2021, 1–45. [CrossRef]
114. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [CrossRef]
115. Mirjalili, S. Sca: A sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 2016, 96, 120–133. [CrossRef]
116. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249.
[CrossRef]
117. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural
Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [CrossRef]
118. Eskandar, H.; Sadollah, A.; Bahreininejad, A.; Hamdi, M. Water cycle algorithm—A novel metaheuristic optimization method for
solving constrained engineering optimization problems. Comput. Struct. 2012, 110, 151–166. [CrossRef]
119. Salimi, H. Stochastic fractal search: A powerful metaheuristic algorithm. Knowl.-Based Syst. 2015, 75, 1–18. [CrossRef]
120. Baucum, M.; Khojandi, A.; Vasudevan, R. Improving deep reinforcement learning with transitional variational autoencoders: A
healthcare application. IEEE J. Biomed. Health Inform. 2020, 25, 2273–2280. [CrossRef]
121. Zhu, Q.; Chen, Z.; Soh, Y.C. A novel semisupervised deep learning method for human activity recognition. IEEE Trans. Ind.
Inform. 2018, 15, 3821–3830. [CrossRef]
122. Gupta, S.; Pawar, S.; Ramrakhiyani, N.; Palshikar, G.K.; Varma, V. Semi-supervised recurrent neural network for adverse drug
reaction mention extraction. BMC Bioinform. 2018, 19, 1–7. [CrossRef] [PubMed]
123. Zhai, X.; Zhou, Z.; Tin, C. Semi-supervised learning for ecg classification without patient-specific labeled data. Expert Syst. Appl.
2020, 158, 113411. [CrossRef]
124. Bengani, S.; Jothi, A.A.; Vadivel, S. Automatic segmentation of optic disc in retinal fundus images using semi-supervised deep
learning. Multimed. Tools Appl. 2021, 80, 3443–3468. [CrossRef]
125. Yang, Y.; Nan, F.; Yang, P.; Meng, Q.; Xie, Y.; Zhang, D.; Muhammad, K. Gan-based semi-supervised learning approach for clinical
decision support in health-iot platform. IEEE Access 2019, 7, 8048–8057. [CrossRef]
126. Kanniappan, S.; Samiayya, D.; Vincent, D.R.; Srinivasan, P.M.K.; Jayakody, D.N.K.; Reina, D.G.; Inoue, A. An efficient hybrid
fuzzy-clustering driven 3d-modeling of magnetic resonance imagery for enhanced brain tumor diagnosis. Electronics 2020, 9, 475.
[CrossRef]
127. Fathi, S.; Ahmadi, M.; Birashk, B.; Dehnad, A. Development and use of a clinical decision support system for the diagnosis of
social anxiety disorder. Comput. Methods Programs Biomed. 2020, 190, 105354. [CrossRef]
128. Huang, W. Segmentation and diagnosis of papillary thyroid carcinomas based on generalized clustering algorithm in ultrasound
elastography. J. Med. Syst. 2020, 44, 1–8. [CrossRef]
129. Janarthanan, R.; Doss, S.; Baskar, S. Optimized unsupervised deep learning assisted reconstructed coder in the on-nodule
wearable sensor for human activity recognition. Measurement 2020, 164, 108050. [CrossRef]
130. Shobha, K.; Savarimuthu, N. Clustering based imputation algorithm using unsupervised neural network for enhancing the
quality of healthcare data. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 1771–1781. [CrossRef]
131. Joloudari, J.H.; Hassannataj Joloudari, E.; Saadatfar, H.; Ghasemigol, M.; Razavi, S.M.; Mosavi, A.; Nabipour, N.; Shamshirband, S.;
Nadai, L. Coronary artery disease diagnosis; ranking the significant features using a random trees model. Int. J. Environ. Res.
Public Health 2020, 17, 731. [CrossRef] [PubMed]
132. Ardabili, S.F.; Mosavi, A.; Ghamisi, P.; Ferdinand, F.; Varkonyi-Koczy, A.R.; Reuter, U.; Rabczuk, T.; Atkinson, P.M. Covid-19 outbreak
prediction with machine learning. Algorithms 2020, 13, 249. [CrossRef]
133. Pinter, G.; Felde, I.; Mosavi, A.; Ghamisi, P.; Gloaguen, R. COVID-19 pandemic prediction for Hungary; a hybrid machine learning
approach. Mathematics 2020, 8, 890. [CrossRef]
134. Mahmoudi, M.R.; Baleanu, D.; Band, S.S.; Mosavi, A. Factor analysis approach to classify COVID-19 datasets in several regions.
Results Phys. 2021, 25, 104071. [CrossRef]
135. Ayoobi, N.; Sharifrazi, D.; Alizadehsani, R.; Shoeibi, A.; Gorriz, J.M.; Moosaei, H.; Khosravi, A.; Nahavandi, S.; Chofreh, A.G.;
Goni, F.A.; et al. Time Series Forecasting of New Cases and New Deaths Rate for COVID-19 using Deep Learning Methods. arXiv
2021, arXiv:2104.15007.
136. Mahmoudi, M.R.; Heydari, M.H.; Qasem, S.N.; Mosavi, A.; Band, S.S. Principal component analysis to study the relations
between the spread rates of COVID-19 in high risks countries. Alex. Eng. J. 2021, 60, 457–464. [CrossRef]
137. Mahmoudi, M.R.; Baleanu, D.; Qasem, S.N.; Mosavi, A.; Band, S.S. Fuzzy clustering to classify several time series models with
fractional Brownian motion errors. Alex. Eng. J. 2021, 60, 1137–1145. [CrossRef]
138. Ardabili, S.; Mosavi, A.; Band, S.S.; Varkonyi-Koczy, A.R. Coronavirus disease (COVID-19) global prediction using hybrid
artificial intelligence method of ANN trained with Grey Wolf optimizer. In Proceedings of the 2020 IEEE 3rd International
Conference and Workshop in Óbuda on Electrical and Power Engineering (CANDO-EPE), Budapest, Hungary, 18–19 November
2020; pp. 251–254. [CrossRef]
139. Kumar, R.L.; Khan, F.; Din, S.; Band, S.S.; Mosavi, A.; Ibeke, E. Recurrent Neural Network and Reinforcement Learning Model for
COVID-19 Prediction. Front. Public Health 2021, 9, 744100. [CrossRef]
140. Yang, F.; Moayedi, H.; Mosavi, A. Predicting the Degree of Dissolved Oxygen Using Three Types of Multi-Layer Perceptron-Based
Artificial Neural Networks. Sustainability 2021, 13, 9898. [CrossRef]
141. Qurat-Ul-Ain, F.A.; Ejaz, M.Y. A comparative analysis on diagnosis of diabetes mellitus using different approaches—A survey.
Inform. Med. Unlocked 2020, 100482. [CrossRef]
142. Nosratabadi, S.; Mosavi, A.; Duan, P.; Ghamisi, P.; Filip, F.; Band, S.S.; Reuter, U.; Gama, J.; Gandomi, A.H. Data science
in economics: Comprehensive review of advanced machine learning and deep learning methods. Mathematics 2020, 8, 1799.
[CrossRef]
143. Mosavi, A.; Faghan, Y.; Ghamisi, P.; Duan, P.; Ardabili, S.F.; Salwana, E.; Band, S.S. Comprehensive review of deep reinforcement
learning methods and applications in economics. Mathematics 2020, 8, 1640. [CrossRef]
144. Chen, H.; Heidari, A.A.; Chen, H.; Wang, M.; Pan, Z.; Gandomi, A.H. Multi-population differential evolution-assisted Harris
hawks optimization: Framework and case studies. Future Gener. Comput. Syst. 2020, 111, 175–198. [CrossRef]
145. Hu, J.; Chen, H.; Heidari, A.A.; Wang, M.; Zhang, X.; Chen, Y.; Pan, Z. Orthogonal learning covariance matrix for defects of grey
wolf optimizer: Insights, balance, diversity, and feature selection. Knowl.-Based Syst. 2021, 213, 106684. [CrossRef]
146. Zhang, Y.; Liu, R.; Heidari, A.A.; Wang, X.; Chen, Y.; Wang, M.; Chen, H. Towards augmented kernel extreme learning models for
bankruptcy prediction: Algorithmic behavior and comprehensive analysis. Neurocomputing 2021, 430, 185–212. [CrossRef]
147. Zhao, D.; Liu, L.; Yu, F.; Heidari, A.A.; Wang, M.; Liang, G.; Muhammad, K.; Chen, H. Chaotic random spare ant colony
optimization for multi-threshold image segmentation of 2D Kapur entropy. Knowl.-Based Syst. 2021, 216, 106510. [CrossRef]
Mathematics 2021, 9, 2970 52 of 52
148. Tu, J.; Chen, H.; Liu, J.; Heidari, A.A.; Zhang, X.; Wang, M.; Ruby, R.; Pham, Q.V. Evolutionary biogeography-based whale
optimization methods with communication structure: Towards measuring the balance. Knowl.-Based Syst. 2021, 212, 106642.
[CrossRef]
149. Dehghani, E.; Ranjbar, S.H.; Atashafrooz, M.; Negarestani, H.; Mosavi, A.; Kovacs, L. Introducing Copula as a Novel Statistical
Method in Psychological Analysis. Int. J. Environ. Res. Public Health 2021, 18, 7972. [CrossRef] [PubMed]
150. Shan, W.; Qiao, Z.; Heidari, A.A.; Chen, H.; Turabieh, H.; Teng, Y. Double adaptive weights for stabilization of moth flame
optimizer: Balance analysis, engineering cases, and medical diagnosis. Knowl.-Based Syst. 2021, 214, 106728. [CrossRef]