
Multimedia Tools and Applications (2024) 83:36039–36080

https://doi.org/10.1007/s11042-023-16605-1

A review of deep learning approaches in clinical and healthcare systems based on medical image analysis

Hadeer A. Helaly1,2 · Mahmoud Badawy2,3 · Amira Y. Haikal2

Received: 5 August 2022 / Revised: 19 May 2023 / Accepted: 21 August 2023 / Published online: 29 September 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
Healthcare is a high-priority sector where people expect the highest levels of care and service,
regardless of cost. That makes it distinct from other sectors. Due to the promising results of deep
learning in other practical applications, many deep learning algorithms have been proposed for
use in healthcare and to solve traditional artificial intelligence issues. The main objective of
this study is to review and analyze current deep learning algorithms in healthcare systems. In
addition, it highlights the contributions and limitations of recent research papers. It connects
deep learning methods with interpretability in human healthcare by providing insights into
deep learning applications in healthcare solutions. It first provides an overview of several deep
learning models and their most recent developments. It then briefly examines how these models
are applied in several medical practices. Finally, it summarizes current trends and issues in the
design and training of deep neural networks, as well as future directions in this field.

Keywords Deep learning · Healthcare systems · Medical image analysis · Diagnostics tools · Health data analytics

1 Introduction

Nowadays, various diseases affect the world, and many patients suffer from different disorders.
Moreover, the World Health Organization (WHO) has recorded a wave of severe infectious
disease epidemics in the twenty-first century, not least the COVID-19 pandemic. The

* Hadeer A. Helaly
[email protected]
Mahmoud Badawy
[email protected]
Amira Y. Haikal
[email protected]
1 Electrical Engineering Department, Faculty of Engineering, Damietta University, Damietta, Egypt
2 Computers and Control Systems Engineering Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
3 Department of Computer Science and Informatics, Applied College, Taibah University, Al Madinah Al Munawwarah 41461, Saudi Arabia


outbreaks of these diseases have impacted lives and livelihoods worldwide [1]. Medical data
analysis has therefore changed how medical experts recognize and analyze diseases and
identify risks and reactions to medicines throughout the treatment process. This medical
data is unstructured data created by hospitals and healthcare systems, such as medical
imaging (MI) data, genomic information, free text, and data streams from monitoring
equipment [2].
Manual medical data analysis is often time-consuming, and the chance of errors in
interpretation is not negligible. It requires professional doctors for an accurate diagnosis.
Moreover, medical data is difficult to collect, expensive, and relatively rare. As a result,
automatic analysis of medical data using computer-aided diagnosis (CAD) systems is
an accurate solution for the early detection of various diseases worldwide [3]. Computed
Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound (US), and X-rays are
the most often used medical imaging modalities. CT provides higher resolution of
high-density tissue than the other modalities, but its interpretation depends on the doctor's
skill. X-rays are convenient and inexpensive, making them ideal for first medical
examinations. However, both CT and X-rays expose the body to ionizing radiation, so
patients should not undergo them frequently. Unlike CT and X-rays, MRI does not use
ionizing radiation and can show soft tissue more clearly. However, an MRI scan takes a
long time, and some patients may not be willing to wait [2].
With the swift propagation of digital image acquisition and storage techniques, the
interpretation of images through computer programs has become an active and intriguing
subject in machine learning and application-specific research [4, 5]. Although machine
learning (ML) techniques perform well in medical applications [6], deep learning (DL) is a
more robust and reliable tool in medical and computer vision applications such as disease
diagnosis, image classification, and segmentation [7]. As a result, deep learning has entered
the healthcare sector (HCS) [8]. DL is a branch of ML that uses neural networks with
numerous layers of artificial neurons to recognize patterns in data sets [9, 10]. It has
greatly enhanced state-of-the-art performance in several applications, including speech
recognition, object detection, visual object recognition, and other fields such as drug
discovery and genomics [11, 12].
The benefits of using deep learning in the healthcare sector are as follows: it enables fast,
accurate, and efficient operations in healthcare. It reduces the cost of care and prevents
reporting delays for critical and urgent cases. It minimizes the administrative load on
healthcare professionals. It decreases errors in diagnosis by auditing prescriptions and
diagnostic results and provides faster diagnostics [13]. Transfer learning is distinguished
by efficiency, simplicity, and minimal training cost compared to training deep learning
approaches such as convolutional neural networks from scratch. It alleviates the curse of
limited datasets: a pre-trained network can be used as a basis rather than training the
network from scratch, which saves time and reduces the difficulty of training [14].
Over the past ten years, various Artificial Intelligence (AI) and DL technologies have
been utilized to analyze the enormous amounts of data in the healthcare industry [15]. The
yearly publications on DL techniques for HCS in the PubMed database over the previous
decade are shown in Fig. 1.
The motivation of this review is to summarize the most significant parts of DL in a
single review paper so that researchers and students may get a comprehensive vision of deep
learning in the healthcare sector. This review will help readers learn about recent
advancements in the field and thereby enhance DL research: researchers can decide on the
most appropriate direction of work in this field to provide more accurate solutions. The main
novelty is introducing an overview of recent deep learning algorithms in healthcare


[Figure 1 data: PubMed publication counts by year — 2012: 39, 2013: 62, 2014: 71, 2015: 111, 2016: 175, 2017: 406, 2018: 981, 2019: 1975, 2020: 3472, 2021: 5415, 2022: 6874]

Fig. 1  The yearly distribution of DL techniques in HCS in the PubMed database for the last decade

systems. Moreover, it highlights the contributions and limitations of recent research papers.
It merges deep learning methods with human healthcare interpretability by providing
comprehensive knowledge of deep learning applications in healthcare solutions. It first provides
an overview of several deep learning models and their most recent developments. Then it
briefly goes through how they are applied in several medical activities. Finally, it summarizes
current trends and issues in the design and training of deep neural networks, as well as the
future directions of this field. Table 1 summarizes the frequently used abbreviations for the
reader's convenience.
The main objective of this study is to offer an up-to-date review of deep learning
research in the healthcare sector. This paper's main contributions may be defined as follows:

• Identification of existing deep learning algorithms in healthcare and their classification.


• Providing in-depth knowledge of the accuracy and application of deep learning models
in healthcare.

Table 1  A list of frequently used abbreviations

Expression  Abbreviation
Healthcare sector HCS
Deep Learning DL
Machine Learning ML
Convolutional Neural Networks CNN
Restricted Boltzmann Machines RBMs
Computed Tomography CT
Magnetic Resonance Imaging MRI
Artificial Intelligence AI
Medical Image Segmentation MIS
Medical Image classification MIC
Dice similarity coefficient DSC


• Analyzing the fundamental technologies that have the potential to reshape deep learn-
ing techniques in healthcare.
• Outlining open issues and challenges in existing healthcare deep learning models.

This paper is structured as follows: The definition and architecture of deep learning
methods are detailed in Section 2. The applications of deep learning in healthcare sys-
tems are discussed in Section 3. Section 4 presents open challenges and deep learning
trends in healthcare systems. Finally, Section 5 concludes the paper.

2 Methods and recent developments

Deep learning has recently received much attention in medical systems [16]. As a
result, several related approaches have evolved. Figure 2 shows the classification of typical
deep learning architectures for processing HCS data and their applications in the
HCS sector, especially disease detection.
According to the core methods from which they are built, deep learning methods can
be divided into four categories: convolutional neural networks (CNNs), restricted
Boltzmann machines (RBMs), autoencoders (AEs), and sparse coding [17, 18]. DL is used
in speech recognition, image analysis, text mining, health monitoring, drug discovery,
computer vision, object recognition, and other applications [8].

[Figure 2: a two-part taxonomy. Methods — sparse coding methods (sparse coding SPM, Laplacian sparse coding, local coordinate coding, super-vector coding), autoencoder methods (sparse, denoising, and contractive autoencoders), RBM-based methods (deep belief networks, deep Boltzmann machines, deep energy models), and CNN-based methods (AlexNet, GoogLeNet, ResNet, VGG). Applications — speech recognition, drug discovery, computer vision, object detection, natural language processing, and disease detection (Parkinson's disease, brain tumor, epilepsy, breast cancer, lung cancer, heart failure, diabetes).]

Fig. 2  The taxonomy of DL architectures for processing HCS data and its applications, particularly disease
detection [8, 17]


2.1 Convolutional neural networks (CNN or ConvNet)

CNNs are among the most well-known DL techniques, in which many layers are trained
robustly. They provide excellent performance and are implemented in different computer vision
applications, including image recognition, image classification, object detection, and face
recognition. A CNN can have tens or hundreds of layers, each of which learns to detect
different features of an image [19, 20]. The whole pipeline of the CNN structure is shown in Fig. 3 [21].
Traditional machine-learning approaches have three stages: feature extraction, feature
reduction, and classification. In a conventional CNN, all of these phases are merged:
iterative learning adjusts the weights of its early layers, which function as feature
extractors. Three kinds of layers constitute a CNN: the convolution layer extracts features, the
pooling layer reduces dimensionality, and the fully connected layer flattens the
two-dimensional matrices into a one-dimensional vector and performs the classification [22].
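The three stages just described (convolution for feature extraction, pooling for dimensionality reduction, flattening plus a fully connected layer for classification) can be sketched in plain NumPy. This is an illustrative toy with random, untrained weights, a single channel, and a single filter, not a trainable CNN:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution of a 2D image with a 2D kernel (stride 1, no padding)."""
    h, w = image.shape
    fh, fw = kernel.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling that shrinks each spatial dimension by `size`."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))           # toy grayscale input
kernel = rng.random((3, 3))            # one 3x3 filter (would be learned in practice)

features = conv2d(image, kernel)       # feature extraction   -> 26 x 26
pooled = max_pool(features)            # dimensionality cut   -> 13 x 13
flat = pooled.flatten()                # 1D vector for the FC layer
weights = rng.random((10, flat.size))  # fully connected layer, 10 classes
scores = weights @ flat                # class scores

print(features.shape, pooled.shape, scores.shape)
```

With a 28 × 28 input and a 3 × 3 filter, the feature map is 26 × 26, pooling halves it to 13 × 13, and the flattened 169-element vector is mapped to 10 class scores.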

2.1.1 CNN layers

Convolution layer The convolution layer is the first layer to extract features from the input
image. It applies learnable filters that preserve the relationships between image pixels
by learning features from small squares of the input data [2].
Figure 4 depicts the convolutional operation for a 3D image of size H × W × C, where H
stands for height, W for width, and C for the channel count, using a 3D filter with
dimensions FH × FW × FC, where FH stands for filter height, FW for filter width, and FC for
filter channels. The size of the output activation map is AH × AW, where AH denotes the
activation height and AW the activation width. The values of AH and AW are calculated by
Eqs. 1 and 2 [22], where P represents the padding, S the stride, and n the number of filters.
With n filters, the activation map size becomes AH × AW × n.

AH = 1 + (H − FH + 2P) / S    (1)

AW = 1 + (W − FW + 2P) / S    (2)
The stride is the number of pixels that shift over the input matrix. If the stride is one,
the filters are moved one pixel at a time. Figure 5 shows the stride operation. Because the
filter does not perfectly fit the input image, the padding technique is applied to fit the image

Fig. 3  The full pipeline of the CNN structure [21]


Fig. 4  An illustration of the convolutional operation [22]

Fig. 5  The operation of the stride through CNN layers

Fig. 6  The convolution operation with a stride of 2

size. There are two forms of padding: zero padding, which pads the image with zeros, and
valid padding, which drops the image regions where the filter does not fit and keeps
only the valid parts. For example, Fig. 6 displays the convolution operation with a stride of
two and 3 × 3 filters with zero padding.
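Equations 1 and 2 can be checked directly with a small helper (a sketch; the floor division assumes the common convention of discarding any partial window):

```python
def activation_size(h, w, fh, fw, padding=0, stride=1):
    """Output (height, width) of a convolution layer, per Eqs. 1 and 2."""
    ah = 1 + (h - fh + 2 * padding) // stride
    aw = 1 + (w - fw + 2 * padding) // stride
    return ah, aw

# LeNet-5's first layer: a 32x32 input with a 5x5 filter, no padding, stride 1.
print(activation_size(32, 32, 5, 5))                     # -> (28, 28)
# A 5x5 input with zero padding of 1 and a stride of 2 (values chosen for illustration).
print(activation_size(5, 5, 3, 3, padding=1, stride=2))  # -> (3, 3)
```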


Fig. 7  Max-pooling illustration

Fig. 8  The structure of a fully connected layer

Pooling layer The pooling layer is also referred to as spatial pooling, subsampling, or
down-sampling. It reduces the number of parameters when the images are too large, and it
reduces the dimensionality of each feature map while preserving the essential information.
There are three forms of pooling: max pooling, average pooling, and sum pooling. Max
pooling takes the largest element of the rectified feature map; average pooling takes the
average of the feature-map elements; and sum pooling takes the sum of all feature-map
elements. Figure 7 depicts the max-pooling operation.

Fully connected layer The fully connected (FC) layer is like a traditional neural network:
the feature matrix is flattened into a vector and fed into the fully connected layer. As
illustrated in Fig. 8, all inputs from one layer are linked to each activation unit of the next layer.

Activation function The activation function transforms a neuron's input nonlinearly,
introducing the nonlinearity the network needs. The traditional activation function is the
sigmoid [23]. It maps values into the range zero to one and can be read as the probability of
a data point belonging to a class; it is defined in Eq. 3. The rectified linear unit
(ReLU), defined in Eq. 4, became a popular remedy for the sigmoid's drawbacks. However,
for negative inputs the ReLU's gradient is zero, so affected neurons can stop learning with
gradient-based techniques. The leaky ReLU (LReLU), defined in Eq. 5, is a good solution
to this problem [24]. Figure 9 shows the difference between the three activation functions.


Fig. 9  The difference among the sigmoid, ReLU, and LReLU activation functions [24]

f_Sigmoid(x) = 1 / (1 + e^(−x))    (3)

f_ReLU(x) = max(0, x)    (4)

f_LReLU(x) = { x      if x > 0
             { 0.01x  otherwise    (5)
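Equations 3–5 translate directly into code; the sketch below uses the 0.01 leak factor from Eq. 5:

```python
import math

def sigmoid(x):
    """Eq. 3: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Eq. 4: passes positives through, zeros out negatives."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Eq. 5: like ReLU, but keeps a small slope (alpha) for x < 0."""
    return x if x > 0 else alpha * x

print(sigmoid(0.0))            # -> 0.5
print(relu(-3.0), relu(2.0))   # -> 0.0 2.0
print(leaky_relu(-3.0))        # small negative value instead of a hard zero
```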

2.1.2 CNN Pretrained models

There are numerous CNN architectures, including LeNet, AlexNet, VGGNet, GoogLeNet,
ResNet, and ZFNet. These models hold the key to creating the algorithms that will soon
drive all of AI.

LeNet‑5 (1998) LeNet-5 is a seven-level convolutional network introduced in 1998 [25].
Several banks used it to classify digits and recognize handwritten numbers on digital
checks from 32 × 32 pixel greyscale input images. However, this architecture is limited by
computing resources, because processing higher-resolution images requires more and deeper
convolutional layers [26]. LeNet-5's structure is depicted in Fig. 10.

AlexNet (2012) The network architecture of AlexNet is deeper than LeNet's [27]. Additionally,
it uses more filters per layer and stacked convolutional layers. In 2012, AlexNet

Fig. 10  The LeNet-5 architecture [25]


Fig. 11  The architecture of AlexNet [27]

Fig. 12  The ZFNet architecture [28]

won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), surpassing every
competitor and significantly reducing the top-5 error from 26% to 15.3%. Figure 11 depicts
the AlexNet architecture.

ZFNet (2013) The AlexNet architecture was enhanced by the ZFNet architecture. It was
created by adding more deep-learning features while modifying the hyper-parameters of
AlexNet on the same structure. With a top-5 error rate of 14.8%, ZFNet won the ILSVRC
in 2013 [28]. ZFNet’s architecture relied on 7 × 7 kernels rather than 11 × 11 kernels to
reduce the number of weights and network parameters. Additionally, it increased recogni-
tion accuracy. The ZFNet network appears in Fig. 12.

GoogLeNet/Inception (2014) LeNet's architecture served as an inspiration for GoogLeNet
(Inception V1). Developed at Google, it won the ILSVRC 2014 competition [18] with a
top-5 error rate of 6.67%, performance quite close to that of a person. In addition,
it used a 22-layer network while decreasing the number of parameters from 60 million
(in AlexNet) to 4 million. Figure 13 shows the architecture of the inception layer.

VGGNet (2014) VGGNet, created by Simonyan et al., took second place in the ILSVRC
2014 classification task, behind GoogLeNet. It has 16 layers and a highly appealing
architecture due to its uniformity. It is similar to AlexNet but contains only 3 × 3
convolutions with many filters [29, 30].
On four GPUs, VGGNet took 2–3 weeks to train. Nevertheless, it has recently become
the community’s most popular method to extract features from images. In addition, the


Fig. 13  The inception layer architecture of the inception network [29]

The publicly released VGGNet weight configuration is an open-source feature extractor
utilized in various applications and challenges. On the other hand, VGGNet has 138 million
parameters, which can be hard to handle [31]. Figure 14 shows an example of VGGNet's structure.

ResNet (2015) Compared to VGGNet, the Residual Neural Network (ResNet) is relatively
simple and has 152 layers. It outperforms human performance with a top-5 error rate of
just 3.57%. It includes “skip connections” and features heavy batch normalization [32–34].
Figure 15 shows the structure of the ResNet model.
New CNN models are created continually; Table 2 compares recently developed CNN models.

2.2 Restricted Boltzmann machines (RBMs)

The RBM was proposed in 1986 as a generative stochastic neural network. An RBM is a
Boltzmann Machine variant with the constraint that the visible and hidden units must form
a bipartite graph. Table 3 lists the popular RBM models' characteristics, benefits, and
drawbacks as learning modules.

Fig. 14  VGGNet example architecture [30]


Fig. 15  The ResNet architecture [34]

2.3 Autoencoder‑based methods

An autoencoder is an artificial neural network that learns efficient encodings.
Instead of predicting a target value Y given inputs X, the autoencoder network
reconstructs its inputs X. Therefore, the output vectors have the same dimensions
as the input vectors. Table 4 presents a comparison of autoencoder-based methods.
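As a concrete illustration of this reconstruction objective, the sketch below trains a minimal linear autoencoder (8 inputs → 3 hidden units → 8 outputs) with plain gradient descent in NumPy. The rank-3 toy data is an assumption chosen so the bottleneck can reconstruct the input almost perfectly; none of the regularized variants from Table 4 are included:

```python
import numpy as np

rng = np.random.default_rng(42)
# Rank-3 toy data: 200 samples with 8 features that really live in 3 dimensions.
Z = rng.random((200, 3))
B = rng.normal(size=(3, 8))
X = Z @ B

W_enc = rng.normal(0, 0.1, (8, 3))    # encoder: input -> bottleneck code
W_dec = rng.normal(0, 0.1, (3, 8))    # decoder: code -> reconstructed input
lr, n = 0.05, len(X)

def mse():
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

start = mse()
for _ in range(3000):
    code = X @ W_enc                   # encode
    err = code @ W_dec - X             # decode and compare with the input itself
    g_dec = code.T @ err / n           # gradient of the squared error w.r.t. W_dec
    g_enc = X.T @ (err @ W_dec.T) / n  # gradient w.r.t. W_enc (chain rule)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print(f"reconstruction MSE: {start:.3f} -> {mse():.5f}")
```

Because the training targets are the inputs themselves, no labels are needed, which is what makes autoencoder training unsupervised.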

2.4 Sparse coding algorithm

Sparse coding describes the input data by learning an over-complete set of its
fundamental basis functions. Table 5 compares different sparse coding algorithms.
Focusing on healthcare systems, four deep learning algorithm families are considered:
CNN, Deep Belief Networks (DBN), Auto-Encoders (AE), and Recurrent Neural
Networks (RNN). These architectures are commonly employed for disease detection
applications [8]. Figure 16 depicts the distribution of the CNN, DBN, AE, and RNN deep
learning methods used in healthcare systems, and Fig. 17 illustrates disease
publications as an application of DL methods based on the PubMed database.

2.5 Hardware acceleration with GPUs

Training a neural network can involve hundreds, thousands, or millions of images. GPUs
can therefore significantly reduce the time needed to train a model when working with
large data sets and complex network architectures.

2.5.1 Training type

Training from scratch Building a network from scratch permits choosing the network
architecture. This method gives high control over the network and may yield excellent
results. However, it requires a strong knowledge of neural network architecture, the
numerous layer types, and the configuration options. While its results can sometimes be better than transfer
Table 2  Comparison of recently developed CNN models

Model               Size (MB)  Top-1 acc.  Top-5 acc.  Parameters    Depth  Time (ms) per inference step (CPU)  Time (ms) per inference step (GPU)
Xception            88         0.79        0.945       22,910,480    126    109.42                              8.06
VGG16               528        0.713       0.901       138,357,544   23     69.5                                4.16
VGG19               549        0.713       0.9         143,667,240   26     84.75                               4.38
ResNet50            98         0.749       0.921       25,636,712    –      58.2                                4.55
ResNet101           171        0.764       0.928       44,707,176    –      89.59                               5.19
ResNet152           232        0.766       0.931       60,419,944    –      127.43                              6.54
ResNet50V2          98         0.76        0.93        25,613,800    –      45.63                               4.42
ResNet101V2         171        0.772       0.938       44,675,560    –      72.73                               5.43
ResNet152V2         232        0.78        0.942       60,380,648    –      107.5                               6.64
InceptionV3         92         0.779       0.937       23,851,784    159    42.25                               6.86
InceptionResNetV2   215        0.803       0.953       55,873,736    572    130.19                              10.02
MobileNet           16         0.704       0.895       4,253,864     88     22.6                                3.44
MobileNetV2         14         0.713       0.901       3,538,984     88     25.9                                3.83
DenseNet121         33         0.75        0.923       8,062,504     121    77.14                               5.38
DenseNet169         57         0.762       0.932       14,307,880    169    96.4                                6.28
DenseNet201         80         0.773       0.936       20,242,984    201    127.24                              6.67
NASNetMobile        23         0.744       0.919       5,326,716     –      27.04                               6.7
NASNetLarge         343        0.825       0.96        88,949,818    –      344.51                              19.96
EfficientNetB0      29         –           –           5,330,571     –      46                                  4.91
EfficientNetB1      31         –           –           7,856,239     –      60.2                                5.55
EfficientNetB2      36         –           –           9,177,569     –      80.79                               6.5
EfficientNetB3      48         –           –           12,320,535    –      139.97                              8.77
EfficientNetB4      75         –           –           19,466,823    –      308.33                              15.12
EfficientNetB5      118        –           –           30,562,527    –      579.18                              25.29
EfficientNetB6      166        –           –           43,265,143    –      958.12                              40.45
EfficientNetB7      256        –           –           66,658,687    –      1578.9                              61.62

learning results, this method usually requires more training images, because the new
network needs numerous examples of the object to understand the feature variations.
Training from scratch also takes longer, and many layer choices exist when configuring a
network from scratch. When building a network and arranging its layers, it is common to
look at previous network designs to see what other researchers have found useful.

Using pre‑trained models (transfer learning) Transfer learning, which fine-tunes a
pre-trained network, is generally considerably faster and easier than training from scratch. It
requires a smaller amount of data and less computational capability. Transfer learning applies
knowledge from one type of problem to solve similar tasks. The pre-trained network has already
learned many features, which is the key benefit of transfer learning: these features can be
reused for a variety of other similar problems. For example, a network trained on millions of
images can be retrained for a new object-classification task using only a few hundred images.
The comparison between training types is shown in Table 6.
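The workflow above can be sketched end-to-end without a DL framework. Below, a frozen random projection stands in for the pre-trained backbone (purely an assumption for illustration; in practice this would be, e.g., a CNN trained on ImageNet), and only a new logistic-regression head is trained on a small labeled set for the new task:

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen "backbone": a stand-in for pre-trained feature-extraction layers.
W_backbone = rng.normal(size=(20, 40)) * 0.1
def extract_features(x):
    return np.maximum(x @ W_backbone, 0.0)  # frozen ReLU features, never updated

# Small labeled dataset for the *new* task (two classes).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train only the new head: logistic regression on the frozen features.
F = extract_features(X)
w, b, lr = np.zeros(F.shape[1]), 0.0, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # predicted class probabilities
    g = p - y                                # gradient of the logistic loss
    w -= lr * F.T @ g / len(X)
    b -= lr * g.mean()

acc = np.mean((p > 0.5) == y)
print(f"training accuracy of the new head: {acc:.2f}")
```

Only `w` and `b` are updated; the backbone weights stay fixed, which is why transfer learning needs far less data and compute than training the whole network from scratch.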

2.5.2 Hardware and software

Significantly, the growth in deep learning publications has been driven by GPUs and
GPU-computing libraries (CUDA, OpenCL). GPUs are typically 10–30 times faster than central
processing units (CPUs) [35], because GPUs are highly parallel computing devices
with a vastly larger number of execution threads than CPUs. In addition to the
hardware, the availability of open-source software packages is a crucial element
enabling efficient GPU implementations of critical neural network functions. Theano,
Torch, TensorFlow, and Caffe are the most often used packages.

3 Application of DL in HCS

3.1 Medical image analysis

Computer-Aided Detection and Diagnosis (CAD) systems depend on medical image
analysis and disease detection techniques in healthcare systems, such as classification,
segmentation, localization, and object detection. There are several types of medical images;
they are compared in Table 7 [3, 36] and shown in Fig. 18.

3.1.1 Medical image segmentation (MIS)

Image segmentation is a well-known area of study in computer vision [37] that has
recently gained attention in image processing. It divides an image into several disjoint
regions based on features such as grayscale value, color, spatial texture, and geometric
shape. In medical images, it identifies organs or lesions against the background at the
pixel level [7] and offers crucial details about their volumes and shapes [38]. It also
supports 2D or 3D image analysis and processing to segment, extract, reconstruct, and display
human organs, soft tissues, and diseased bodies in three dimensions [39, 40].
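Segmentation quality in this literature is commonly reported with the Dice similarity coefficient (DSC, listed in Table 1): twice the overlap between the predicted and ground-truth masks, divided by their total size. A minimal implementation for binary masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks (1 = organ/lesion, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

ground_truth = np.zeros((8, 8), dtype=int)
ground_truth[2:6, 2:6] = 1     # a 4x4 "lesion"
prediction = np.zeros((8, 8), dtype=int)
prediction[3:7, 3:7] = 1       # predicted mask, shifted by one pixel

print(round(dice_coefficient(prediction, ground_truth), 3))  # -> 0.562
```

A DSC of 1 means perfect overlap; the one-pixel shift above drops it to 18/32 ≈ 0.56.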

Deep learning techniques for medical image segmentation Deep learning algorithms have
lately achieved considerable advances in the field of image segmentation, and their
segmentation effectiveness has surpassed that of traditional segmentation techniques. The

Table 3  The comparison among Restricted Boltzmann Machine models [17]

1. Deep Belief Networks (DBNs)
   Characteristics: Directed connections in the lower levels, with undirected connections in the top two layers.
   Benefits: It somewhat prevents poor local optima because it initializes the network properly. No labeled data is needed; training is unsupervised.
   Drawbacks: The DBN model is computationally expensive to create.

2. Deep Boltzmann Machines (DBMs)
   Characteristics: All layers of the network have undirected connections.
   Benefits: It combines top-down feedback to deal with ambiguous inputs more effectively.
   Drawbacks: Joint optimization is time-consuming.

3. Deep Energy Models (DEMs)
   Characteristics: Deterministic hidden units are used for the lower layers, and stochastic hidden units for the top hidden layer.
   Benefits: Allowing the lower layers to adjust to the training of the higher layers creates better generative models.
   Drawbacks: The initially learned weights may have poor convergence.
Table 4  The comparison among autoencoder-based methods [17]

1. Sparse Autoencoder
   Characteristics: It applies a sparsity penalty to make the representation sparse.
   Advantages: It categorizes input data more separably; makes complex data more meaningful; compatible with the biological vision system.

2. Denoising Autoencoder
   Characteristics: It restores the proper input data from corrupted data.
   Advantages: Removes noise efficiently.

3. Contractive Autoencoder
   Characteristics: It adds an analytical contractive penalty to the reconstruction error function.
   Advantages: More accurately captures the local directions of variation determined by the data.

4. Saturating Autoencoder
   Characteristics: Reconstruction error increases when inputs are far from the data manifold.
   Advantages: Restricts the ability to rebuild inputs that are far from the data manifold.

5. Convolutional Autoencoder
   Characteristics: It preserves spatial locality while sharing weights across all input locations.
   Advantages: Utilizes the 2D image structure.

6. Zero-bias Autoencoder
   Characteristics: It uses an appropriate shrinkage function to train autoencoders without further regularization.
   Advantages: Greater ability to learn representations on data with high inherent dimensionality.
Table 5  Sparse coding algorithm comparisons

1. Sparse Coding SPM (ScSPM)
   Definition: An enhanced version of the Spatial Pyramid Matching (SPM) method.
   Pros: Compared to vector quantization (VQ), it has a less restrictive assignment constraint.
   Cons: It ignores the mutual dependence of the local features.

2. Laplacian Sparse Coding (LSC)
   Definition: Assigns identical features to properly selected cluster centers and ensures that the chosen cluster centers are similar.
   Pros: Enhances similar features to keep the mutual dependency in the sparse coding.
   Cons: Expensive computational cost.

3. Hypergraph Laplacian Sparse Coding (HLSC)
   Definition: Extends LSC to the situation in which a hypergraph identifies the similarity between the instances.
   Pros: Enhances the robustness of sparse coding.
   Cons: Ignores discriminative information.

4. Local Coordinate Coding (LCC)
   Definition: Encourages the coding to be local and theoretically shows that locality is more vital than sparsity.
   Pros: Computational advantage over classical sparse coding.
   Cons: Requires solving the L1-norm optimization problem, which is time-consuming.

5. Locality-Constrained Linear Coding (LLC)
   Definition: Replaces the L1-norm regularization with L2-norm regularization.
   Pros: Accelerates the process.
   Cons: Expensive computational cost.

6. Super-Vector Coding (SVC)
   Definition: A simple extension of VQ that expands VQ in local tangent directions.
   Pros: Represents a smoother coding scheme.
   Cons: Expensive computational cost.

7. Smooth Sparse Coding (SSC)
   Definition: Incorporates neighborhood similarity and temporal information into sparse coding.
   Pros: Lower mean-square reconstruction error.
   Cons: Expensive computational cost.

8. Deep Sparse Coding (DeepSC)
   Definition: Extends sparse coding to a multi-layer architecture.
   Pros: The best performance among the sparse coding schemes.
   Cons: High memory requirements.


Fig. 16  The distribution of CNN, DBN, AE, and RNN deep learning methods applied in the healthcare
systems

[Figure 17 data: publications per disease and year]
Year  Brain tumor  Lung cancer  Breast cancer  Heart failure  Parkinson  Eye disease  Diabetes
2012  1            1            2              1              8          1            1
2013  1            1            3              1              7          1            1
2014  1            1            3              1              14         2            2
2015  3            3            5              1              13         7            3
2016  8            3            22             1              25         13           14
2017  13           23           49             9              27         32           34
2018  48           69           100            18             50         95           63
2019  97           161          203            25             83         181          126
2020  181          276          280            44             95         338          218
2021  236          373          410            98             123        450          313
2022  331          457          559            111            152        522          411

Fig. 17  The disease publication as an application of DL methods

fully convolutional network [41] was the first to apply deep learning effectively to semantic image segmentation because it demands neither manual image feature extraction nor extensive image preprocessing. As a result, it achieved significant success as pioneering work in the field and in auxiliary diagnosis.
Several deep learning algorithms are used in image segmentation, such as DeepLab v1 [42], DeepLab v2, DeepLab v3 [43], DeepLab v3+, SegNet [44], 2D U-Net [45], 3D U-Net [46], Mask R-CNN, RefineNet, and DeconvNet, which have a strong advantage in processing fine edges. The comparison among these algorithms is shown in Table 8. In addition,

Table 6 The comparison between training types

Training from scratch:
• Pros: 1. It gives high control over the network. 2. It produces impressive results. 3. Its results can sometimes be greater than those of transfer learning.
• Cons: 1. It requires more images for training. 2. Its training times are often longer.

Training using pre-trained models (transfer learning):
• Pros: 1. It is faster and easier than training from scratch. 2. It requires a low amount of data and computational power.
• Cons: 1. It gives less control over the network.
several segmentation methods are used for medical image analysis, including Cascaded,
multi-modality, Single-modality, Patch-wise, and Semantic-wise, as shown in Table 9.
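The transfer-learning idea contrasted in Table 6 can be sketched with a toy NumPy example: a "pre-trained" feature extractor stays frozen while only a small classification head is fitted to the new, small dataset. All data and weights here are synthetic stand-ins, not any real pre-trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" feature extractor: these weights are never updated.
W_frozen = rng.normal(size=(8, 4))
W_init = W_frozen.copy()

def extract_features(x):
    return np.tanh(x @ W_frozen)  # frozen feature extractor

# Small synthetic dataset standing in for a limited medical-image dataset.
X = rng.normal(size=(20, 8))
y = (X[:, 0] > 0).astype(float)

# Only the classification head is trained (the essence of transfer learning).
w_head = np.zeros(4)
for _ in range(300):
    F = extract_features(X)
    p = 1.0 / (1.0 + np.exp(-(F @ w_head)))   # sigmoid output
    w_head -= 0.5 * F.T @ (p - y) / len(y)    # logistic-loss gradient step

pred = extract_features(X) @ w_head > 0
acc = (pred == (y > 0.5)).mean()
print(f"frozen weights unchanged: {np.allclose(W_frozen, W_init)}, train acc: {acc:.2f}")
```

Because only four head weights are optimized, far fewer labeled images are needed than when training the whole network from scratch, which is exactly the trade-off in Table 6.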

3.1.2 Anatomical application areas based on MIS

Several works have demonstrated that medical image segmentation is widely used in the healthcare system for the early detection of various diseases, such as brain tumors, lung cancer, and heart failure, based on the several DL techniques in HCS listed above [50–54]. The comparison among recent works for disease detection based on medical image segmentation is depicted in Table 10.
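Most entries in Table 10 report the Dice similarity coefficient (DSC), defined for two binary masks A and B as DSC = 2|A ∩ B| / (|A| + |B|). A minimal NumPy implementation on toy masks:

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * intersection / denom if denom else 1.0

a = np.array([[1, 1, 0], [0, 1, 0]])   # toy "predicted" mask
b = np.array([[1, 0, 0], [0, 1, 1]])   # toy "ground-truth" mask
print(dice_coefficient(a, b))  # 2*2 / (3+3) = 0.666...
```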

Brain tumor detection Deep learning methods have been extensively employed for brain image processing across many application domains. Numerous studies focus on segmenting brain tissue and anatomical features (e.g., the hippocampus) and diagnosing Alzheimer's disease (AD). Detecting and segmenting lesions (e.g., tumors, white matter lesions, lacunes, and micro-bleeds) are additional crucial areas [55].
J. Dolz et al. [56] proposed reliable 3D FCNNs for detecting brain tumors and AD through subcortical MRI brain structure segmentation. H. Allioui et al. [57] detect mental disorders with a U-Net architecture based on 2.5D MRI segmentation analysis. Jingwen et al. [58] proposed a 3D CNN based on V-Net to segment the bilateral hippocampus from 3D brain MRI scans and diagnose AD progression states. D. Chitradevi et al. [59] segment the HC area using a variety of optimization approaches, including the genetic algorithm (GA), the lion optimization algorithm (LOA), the artificial bee colony (ABC), the BAT algorithm, and particle swarm optimization (PSO). When these optimization techniques were compared, LOA outperformed the others owing to its ability to escape from local optima.
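As a rough illustration of how such swarm optimizers search a parameter space, the following is a generic 1-D particle swarm optimization (PSO) sketch on a toy quadratic objective; it is not the LOA variant or the actual MRI segmentation objective from [59]:

```python
import numpy as np

def pso(objective, bounds, particles=20, steps=100, seed=0):
    """Minimal PSO: particles move under inertia plus attraction toward
    their personal best and the swarm-wide best position."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, particles)           # positions
    v = np.zeros(particles)                      # velocities
    pbest = x.copy()                             # per-particle best positions
    gbest = x[np.argmin(objective(x))]           # swarm-wide best position
    for _ in range(steps):
        r1, r2 = rng.random(particles), rng.random(particles)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        better = objective(x) < objective(pbest)
        pbest[better] = x[better]
        gbest = pbest[np.argmin(objective(pbest))]
    return gbest

best = pso(lambda t: (t - 3.0) ** 2, bounds=(-10, 10))
print(best)  # converges near the minimum at 3.0
```

Metaheuristics like LOA differ mainly in how they balance exploration against exploitation, which is what lets them escape local optima on harder, non-convex objectives.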
Furthermore, Diedre Carmo et al. [60] suggested a hippocampus segmentation approach based on a deep learning U-Net. It employs a 2D extended multi-orientation design. The method was developed and validated using a public Alzheimer's disease hippocampus segmentation (HarP) dataset. The approach worked well, although the overall Dice and left/right Dice had a low standard deviation. Finally, Sammaneh et al. [61] created a robust automatic atlas-based CNN hippocampal segmentation method named DeepHarp for hippocampus delineation. The approach was developed and tested using the ADNI harmonized hippocampal protocol (HarP). Moreover, the left and right hippocampi are segmented by Helaly et al. [62] for early detection of AD and brain tumors based on NITRIC

Table 7 Comparison of medical image types [3]

• Magnetic resonance imaging (MRI): creates accurate and detailed images of internal organs and tissues using strong magnets and radio-frequency (RF) waves.
• T1-weighted imaging (MRI-T1WI): one of the simplest MRI pulse sequences, displaying differences in tissue T1 relaxation times.
• T2-weighted imaging (MRI-T2WI): highlights variations in tissue T2 relaxation times.
• Fluid-attenuated inversion recovery (MRI-FLAIR): an MRI sequence with an inversion recovery set to null fluids.
• Diffusion-weighted imaging (MRI-DWI): demonstrates the strength of diffusion molecular motions inside a tissue structure or at the edges of white and grey matter brain tissues and brain lesions.
• Diffusion tensor imaging (MRI-DTI): instead of simply assigning contrast or color to pixels in a cross-sectional image, DTI is a magnetic resonance imaging technique that creates neural tract images by detecting the restricted diffusion of water in tissue.
• Positron emission tomography (PET): together with diffusion weighting, it provides greater soft-tissue contrast and a way to measure cellular density.
• Computed tomography (CT): produces cross-sectional (tomographic) images using computer-processed combinations of numerous X-ray measurements obtained from different angles.
Fig. 18 From (a-g): different types of medical images [3]

and ADNI datasets. The dataset is processed using the MIPAV program and augmented by
DCGAN. The methods depend on transfer learning and U-NET architecture.

Lung Cancer detection For both males and females, lung cancer is the leading cause of
cancer-related death [63–65]. Yeganeh et al. [66] suggested a modified U-Net (Res BCDU-
Net) to automatically segment lung CT images, replacing the encoder with a pre-trained
ResNet-34 network. Kamel et al. [67] fed CT cancer images from the Task06 Lung database to an FCN. The FCN architecture was inspired by the V-Net architecture for its efficiency in selecting a region of interest (ROI) using 3D segmentation.

Table 8  Medial image deep learning segmentation techniques


Technique Description

Fully Convolutional • It is the most powerful and effective deep-learning technology for semantic
Neural Networks segmentation.
(FCN) • It is proposed by J. Zhuang et al. [41].
DeepLab v1 • It is proposed by Chen et al. [42]
• The score map result achieved is denser than that of FCN, and the size of the
pooled image is not lowered.
• The padding size was reduced from the original 100 to 1, and the pooling stride
was adjusted from the original 2 to 1.
DeepLab v2 • It is an enhanced version of DeepLab v1.
• It overcomes the segmentation challenge brought by differences in the same
object scale in the image.
DeepLab v3 • It used the ResNet-101 network.
• A cascaded or parallel atrous convolution module is developed to address the
challenge of multiscale target segmentation [43].
SegNet • To achieve end-to-end pixel-level image segmentation, SegNet [44] constructs an
encoder-decoder symmetric structure based on the semantic segmentation task of
FCN.
• It comprises two parts: an encoder that parses object information and a decoder that maps the parsed information into the final image form.
U-Net • It is proposed by Ronneberger et al. [45] to design a U-Net network for biomedi-
cal images.
• Due to its excellent performance, it is used in various sub-fields of computer
vision (CV), such as image segmentation.
• It is composed of a U-channel and skip connection.
• The U channel is similar to the encoder-decoder structure of SegNet.
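The U-Net encoder-decoder-skip structure described in Table 8 can be illustrated at the shape level only; the downsample/upsample helpers below are crude stand-ins (max-pool and nearest-neighbor repeat) for the real convolutional blocks:

```python
import numpy as np

def downsample(x):
    """Stand-in for conv + 2x2 max-pooling: halves both spatial dimensions."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).max(axis=(1, 3))

def upsample(x):
    """Stand-in for transposed convolution: doubles both spatial dimensions."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

image = np.random.default_rng(1).random((8, 8))
enc = downsample(image)          # encoder: 8x8 -> 4x4 feature map
bottleneck = downsample(enc)     # 4x4 -> 2x2
dec = upsample(bottleneck)       # decoder: 2x2 -> 4x4
skip = np.stack([enc, dec])      # skip connection: stack encoder + decoder maps
out = upsample(skip.mean(axis=0))  # fuse and restore full 8x8 resolution
print(out.shape)  # (8, 8)
```

The skip connection is what lets U-Net recover the fine edge detail that plain encoder-decoder designs lose during pooling.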

Heart failure detection Early detection of congestive heart failure (CHF), a progressive
and complicated syndrome caused by ventricular dysfunction, is difficult. Meng Lei et al.
[68] suggested heart rate variability (HRV) as a predictive biomarker for CHF. Due to the
success of 2-D UNet++ in medical image segmentation, it is demonstrated that deep learning-based HRV evaluation can be an effective tool for the early diagnosis of CHF and may assist doctors in making prompt and accurate diagnoses. More training data is required for a more robust diagnosis, particularly for many heart rhythm disorders. CardioXAttentionNet is suggested by Innat [69] to classify and localize cardiomegaly effectively.

Breast Cancer detection Breast cancer is one of the critical health issues for women worldwide and one of the most severe threats to women's health [70–74]. Successful treatment for breast cancer depends on early diagnosis of the disease [2, 75]. Based on the U-Net deep learning architecture, Rania Almajalid et al. [76] presented a novel segmentation framework for breast ultrasound images. The framework is utilized to detect and classify breast abnormalities. The dataset was very small, so data augmentation techniques were applied to expand it.

COVID-19 Coronavirus, also known as COVID-19, is a virus that was initially discovered in Wuhan, China, in December 2019 and quickly spread around the world [77–79]. Sanika et al. [80] presented a computed tomography (CT) segmentation approach for lung
Table 9  Comparison of segmentation methods [47]


Segmentation methods Description

Semantic-wise • It connects each image pixel to its corresponding class label.
• Because every pixel is predicted using data from the whole input image, it is known as dense prediction [48].
• Because the segmentation labels are mapped to the input image, it reduces the
loss function.
• With less computational complexity than other techniques, it enables the genera-
tion of segmentation maps for images of any size [49].
Patch–wise • Convolution layers, transfer functions, pooling and subsampling layers, and fully
connected layers constitute the patch-wise network design.
• It works with high-resolution images that are divided into local patches as input.
In addition, the input image is employed to extract a NN patch.
• These patches are trained and offer classification labels to distinguish between
normal and abnormal brain images.
Cascaded • To get classification results, it combines two different CNN architectures, in
which the output of the first architecture is fed into the second architecture.
• The first architecture is used to train the model with the initial prediction of class
labels, and later it is utilized for fine-tuning.
Single-modality • It refers to single-source information and is applicable in different scenarios.
• It is commonly used in the public dataset for tissue-type segmentation in brain
MRI (mainly T1-W images).
Multi-modality • It uses multi-source information, which might require more parameters than a
single modality.
• It gains valuable contrast information.
• It utilizes multi-source information and provides exact localization.

images using the U-Net architecture to detect COVID-19. The limited dataset was an obstacle, so data augmentation techniques were applied to overcome this problem.
Rachna Jain et al. [81] used U-NET, an encoder-decoder network, along with ResNet-34 to detect COVID-19. The proposed method depended on the transfer learning concept. To identify COVID-19, Abhijit Bhattacharyya et al. [82] used a conditional generative adversarial network (C-GAN) to obtain the lung images and detect the disorder. The authors put several U-Net topologies to the test. However, the C-GAN model produced the best results among the tested supervised learning methods.
Table 10 contrasts recent related works that used deep-learning approaches for healthcare, especially in the detection of several diseases. The comparison covers the used dataset, techniques, contributions, and limitations of each state of the art. Finally, Table 10 compares the results of each paper according to the different performance metrics used in each one.
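The performance metrics reported across these comparisons all derive from binary confusion-matrix counts; for reference, a small helper with illustrative (not paper-specific) numbers:

```python
def classification_metrics(tp, fp, fn, tn):
    """Common evaluation metrics from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Hypothetical counts for a binary disease-detection model.
m = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(m["accuracy"], 3))  # 0.85
```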

3.2 Medical image classification (MIC)

Medical image classification is one of the most vital concerns in image recognition. Its tar-
get is categorizing medical images into distinct groups to aid clinicians in disease diagnosis
and research [5]. It has been widely studied in healthcare systems and disease detection and
involves several issues and challenges.

Table 10 Comparison of recent works for diseases detection based on medical image segmentation

Brain Tumor:
• J. Dolz et al. 2018 [56]: Dataset: ABIDE, ISBR. Technique: 3D FCNNs. Advantages: 1. The method is robust. 2. The network is less prone to overfitting. Disadvantages: 1. High computational complexity. 2. High memory requirements. Results: DSC (Dice similarity coefficient) = 92%.
• H. Allioui et al. 2019 [57]: Dataset: OASIS. Technique: U-Net. Advantages: 1. It benefits from 3D architecture. 2. It reduces complexity and computational costs. Disadvantages: The network was trained from scratch and did not benefit from transfer learning concepts. Results: Accuracy = 92.71%, Sensitivity = 94.43%, Specificity = 91.59%.
• Jingwen et al. 2020 [58]: Dataset: ADNI. Technique: 3D CNN based on V-Net. Advantages: 1. The model performed well in the three-category classification task of pathological brain states. 2. The model segmented the bilateral hippocampus accurately. Disadvantages: 1. Applied to a small dataset. 2. Computational complexity when dealing with 3D images. 3. Sample numbers of the three AD-progression categories are unbalanced. Results: DSC = 0.9162 ± 0.023.
• D. Chitradevi et al. 2020 [59]: Dataset: hospital images. Technique: optimization techniques including GA, ABC, BAT, PSO, and LOA. Advantages: 1. The system does not include highly complex computations and hardware implementations. 2. The LOA gave higher performance than the others. Disadvantages: The system is not applied to mild cognitive impairment, which would allow the doctor to examine AD early. Results: Accuracy = 95%, Sensitivity = 94%, Specificity = 93%.
• Diedre Carmo et al. 2021 [60]: Dataset: HarP. Technique: U-Net. Advantages: It gave a precise performance on the public HarP hippocampus segmentation benchmark. Disadvantages: 1. Low standard deviation between overall Dice and left/right Dice. 2. The method was not ready to treat hippocampus resection due to epilepsy treatment. Results: DSC = 90%.
• Sammaneh et al. 2021 [61]: Dataset: HarP. Technique: DeepHarp based on CNN for hippocampus segmentation. Advantages: The method was robust and highly accurate, aiding atrophy measurements in various pathologies. Disadvantages: The model was built from scratch and did not use the transfer-learning concept. Results: DSC = 88%.
• Helaly et al. 2022 [62]: Dataset: ADNI, NITRIC. Technique: U-NET. Advantages: It offers superior accuracy, sensitivity, specificity, and Dice similarity coefficient performance compared to other models. Disadvantages: It applies binary segmentation and does not utilize multi-segmentation MRI brain features of Alzheimer's disease. Results: Accuracy = 97%, Dice similarity coefficient = 94%.

Lung cancer:
• Yeganeh et al. 2021 [66]: Dataset: LIDC-IDRI, which involves lung cancer CT scans. Technique: U-Net (Res BCDU-Net). Advantages: It produces all mask images intelligently, without needing a radiologist's expertise, saving much time. Disadvantages: The proposed method is not applied to 3D lung CT images. Results: Dice coefficient index = 97.31%.
• Kamel et al. 2021 [67]: Dataset: Task06_Lung database (96 CT images) with marked-up annotated tumors. Technique: FCN inspired by the V-Net architecture. Advantages: 1. The method is less prone to overfitting. 2. It achieves high performance in 3D lung segmentation. Disadvantages: The robustness of the 3D V-Net architecture should be extended to produce a useful clinical system for lung diseases. Results: the average DSC is 80% for the ROI and 98% for surrounding lung tissues.

Heart failure:
• Meng Lei et al. [68]: Dataset: two open-source databases. Technique: 2-D UNet++. Advantages: The proposed method has provided promising results. Disadvantages: 1. It is applied to few training data. 2. Other heart rhythm abnormalities are required to provide a more reliable diagnosis. Results: Accuracy = 85.64%, 86.65%, and 88.79% when 500, 1000, and 2000 RR intervals are utilized, respectively.

Breast cancer:
• Rania Almajalid et al. [76]: Dataset: collected by doctors from the Second Affiliated Hospital of Harbin Medical University in China. Technique: deep learning architecture U-Net. Advantages: The proposed method is robust and improves state-of-the-art performance. Disadvantages: It needs to be evaluated on new datasets. Results: Dice coefficient = 0.825, similarity rate = 0.698.

COVID-19:
• Sanika et al. [80]: Dataset: collected from various organizations such as Radiopedia (licensed under CC BY-NC-SA) and the Corona Cases Initiative. Technique: U-Net architecture. Advantages: The model did not suffer from overfitting. Disadvantages: Limited dataset. Results: DSC = 89%, Precision = 85%, Recall = 88%.
• Abhijit Bhattacharyya et al. [82]: Dataset: publicly available chest radiographs (SCR) dataset X-ray images. Technique: C-GAN. Advantages: It applied transfer learning concepts rather than training networks from scratch, which gave good outcomes. Disadvantages: The proposed method was trained and tested on a small dataset. Results: Accuracy = 96.6%.

3.2.1 Anatomical application areas based on MIC

Brain tumor detection (Alzheimer's disease) Payan et al. [83] employed a sparse autoencoder and 3D convolutional neural networks to detect brain tumors using MRI images. By fine-tuning its convolutional layers, performance is expected to increase [84]. Sarraf et al. [85] classified healthy (HC) from unhealthy brains using the LeNet-5 CNN architecture. The work provided in [83] was then developed by Hosseini et al. [86].
For AD detection, many researchers competed for the early diagnosis of the disease and the determination of its stages, such as mild cognitive impairment (MCI), early MCI (EMCI), and late MCI (LMCI). Wang et al. [24] utilized an eight-layer CNN structure: six layers served the feature extraction process, and the other two fully connected layers served the classification process. Khvostikov et al. [87] also employed a 3D Inception-based CNN with better performance than AlexNet [88]. Sahumbaiev et al. [89] also developed a HadNet design. For improved training, the collection of MRI images is spatially normalized and skull-stripped using the Statistical Parametric Mapping (SPM) toolbox. The Apolipoprotein E expression level 4 (APOe4) model was proposed by Spasov et al. [90]. The APOe4 model was fed data from MRI scans, genetic tests, and clinical evaluations.
Unique CNN architectures were proposed by Wang et al. [91], Ge et al. [92], Song et al. [93], and Liu et al. [94], based on different MRI models, to detect Alzheimer's disease and classify its stages. Based on the transfer learning concept, Khagi et al. [95] proposed shallow tuning of pre-trained models such as AlexNet, GoogLeNet, and ResNet50. Moreover, Jain et al. [96] suggested the PFSECTL mathematical model, which depends on the pre-trained VGG-16 model. Finally, a multi-task CNN and the 3D Densely Connected Convolutional Networks (3D DenseNet) models were combined to classify the disease status by Liu et al. [97].
For neurodegenerative dementia diagnosis, Impedovo et al. [98] proposed a cognitive model for assessing the link between cognitive functions and handwriting processes in healthy people and patients with cognitive impairment. Four stages of Alzheimer's disease are classified by Harshit et al. [99] using a 3D CNN architecture based on 4D fMRI images. Furthermore, Silvia et al. [100] and Dan et al. [101] detect different Alzheimer's disease stages based on novel CNN structures and 3D MRI images. In addition, Juan Ruiz et al. [102] provided 3D Densely Connected Convolutional Networks (3D DenseNets) for 4-way classification. The comparison among recent works that used MIC in HCS for brain tumor and Alzheimer's disease (AD) detection is listed in Table 11.
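Multi-way staging such as AD/EMCI/LMCI/NC is typically realized with a softmax output layer that turns class scores into probabilities. A minimal sketch with hypothetical logits, not taken from any cited model:

```python
import numpy as np

def softmax(z):
    """Convert raw class scores (logits) into a probability distribution."""
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for the four classes AD / EMCI / LMCI / NC.
logits = np.array([2.0, 0.5, 0.3, -1.0])
probs = softmax(logits)
classes = ["AD", "EMCI", "LMCI", "NC"]
print(classes[int(np.argmax(probs))])  # AD
```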

Lung Cancer detection Lung cancer is a high-risk disease that affects people all over the
world. Lung nodules are the most common early lung cancer symptom. Automatic lung
nodule detection reduces radiologists’ workload, the rate of misdiagnosis, and missed diag-
noses [103–105].
Zhiqiang Guo et al. [106] suggested a lung cancer diagnosis system based on computed tomography scan images. It consecutively employed two effective strategies to find efficient results: a CNN-based classifier and a feature-based classifier. The case study is considered healthy if the feature-based method does not detect cancer; otherwise, the case study is cancerous. Figure 19 displays various samples of the Lung CT-Diagnosis dataset.
Moreover, Ying Su et al. [107] presented a Faster R-CNN algorithm for detecting these lung nodules. Figure 20 depicts the proposed algorithm's whole pipeline. The method used the ZF and VGG16 models as the basic feature extraction networks for the training and testing steps.


Heart failure detection The electrocardiogram (ECG) is an essential non-invasive diagnostic method for interpreting and identifying various heart conditions. For example, a novel deep learning technique with excellent accuracy and minimal computing needs was developed by Ahmed S. Eltrass et al. [108] for the automated detection of Congestive Heart Failure (CHF) and Arrhythmia (ARR). It represented an ECG diagnostic method that combined the Constant-Q Non-Stationary Gabor Transform (CQ-NSGT) with a Convolutional Neural Network. In addition, it used transfer learning techniques with the AlexNet architecture.

Breast Cancer detection When breast cells become malignant, cancerous lesions form, marking the beginning stages of breast cancer. Self-tests and regular medical examinations significantly aid diagnosis, effectively enhancing survival chances [73, 109]. Jing Zheng et al. [110] suggested a Deep Learning assisted Efficient Adaboost Algorithm (DLA-EABA) for breast cancer diagnosis using advanced computational approaches. The AdaBoost technique generated the ensemble classifier's final prediction function. Figure 21 illustrates the whole suggested architecture in [110].
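AdaBoost's final prediction function is a weighted vote of weak learners, F(x) = sign(Σ_t α_t h_t(x)). The following is a generic decision-stump AdaBoost sketch on toy data; it only illustrates the boosting principle, not the DLA-EABA architecture of [110]:

```python
import numpy as np

def adaboost_train(X, y, rounds=3):
    """Minimal AdaBoost with one-feature threshold stumps; labels y in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) weak learners."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                  # per-sample weights
    learners = []
    for _ in range(rounds):
        best = None
        for j in range(d):                   # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # learner weight
        w *= np.exp(-alpha * y * pred)             # re-weight hard samples
        w /= w.sum()
        learners.append((j, thr, pol, alpha))
    return learners

def adaboost_predict(learners, X):
    score = sum(a * p * np.where(X[:, j] >= t, 1, -1)
                for j, t, p, a in learners)
    return np.sign(score)                          # F(x) = sign(sum of votes)

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_train(X, y)
print(adaboost_predict(model, X))  # [-1. -1.  1.  1.]
```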
Nur Syahmi Ismail et al. [111] employed VGG16 and ResNet50 deep learning models
to classify normal and abnormal tumors using the IRMA dataset and compare the results
of the two models. The suggested method included image preprocessing, classification, and
performance evaluation.

COVID 19 Based on a deep neural network model (ResNet-50), Walaa Gouda et al. [112]
proposed two distinct DL methods for COVID-19 prediction using chest X-ray (CXR)
images. The suggested approaches are assessed against two publicly available benchmark
datasets often used by researchers: the COVID-19 Image Data Collection (IDC) and CXR
Images(Pneumonia). Figure 22 illustrates the suggested method in [112].
A dataset of 10,040 chest X-ray (CXR) image samples, of which 2143 had COVID-19, 3674 had pneumonia (but not COVID-19), and 4223 were normal (neither COVID-19 nor pneumonia), was used by Somenath Chakraborty et al. [113] to detect COVID-19. The method enabled radiologists to filter potential candidates in a time-effective manner to detect COVID-19.
Table 12 compares recent works published on heart failure, lung cancer, and breast disease using DL based on medical image classification. The comparison focuses on the used dataset, techniques, advantages, and constraints of each state of the art. Finally, the table compares the results of each paper according to the different performance metrics used in each one, such as accuracy, recall, sensitivity, precision, F1-score, and specificity.
Figure 23 compares the recent works published on heart failure, lung cancer, breast disease, and COVID-19 based on MIC according to the accuracy metric.

3.3 Object detection and localization

Object detection is distinct from but closely related to the image classification task. For image classification, the whole image is utilized as the input, and the class label of objects within the image is predicted. Object detection, besides reporting the presence of a given class, estimates the position of the instance (or instances).

Table 11 Comparison of recent works that used MIC in HCS for AD detection

• Payan et al. [83]: Technique: sparse autoencoders and 3D-CNN. Advantages: It combines sparse autoencoders and convolutional neural networks. Drawbacks: 1. Computational complexity at the training stage. 2. It was pre-trained but was not fine-tuned; fine-tuning is predicted to enhance performance. Results: AD vs. MCI: 86.84%; HC vs. MCI: 92.11%; AD vs. EMC vs. HC: 89.47%; AD vs. HC: 95.39%.
• Sarraf et al. [85]: Technique: CNN and LeNet-5. Advantages: 1. It successfully classified Alzheimer's subjects from normal controls with high accuracy. 2. Its unique architecture allows researchers to perform feature selection and classification. Drawbacks: This method is not generalized to predict different stages of Alzheimer's disease for different age groups. Results: AD vs. HC: 98.84%.
• Hosseini-Asl et al. [86]: Technique: 3D-CNN built upon a 3D convolutional autoencoder. Advantages: The method provided high robustness and confidence for the AD predictions. Drawbacks: 1. It employs only a single imaging modality (sMRI). 2. It performs no prior skull-stripping preprocessing. Results: AD/MCI: 95%; MCI/NC: 90.8%; AD + MCI/NC: 90.3%; AD/NC: 97.6%; AD vs. EMC vs. HC: 89.1%.
• Korolev et al. [20]: Technique: residual and plain 3D-CNN. Advantages: 1. It proved how similar performance could be achieved by skipping feature extraction steps. 2. Ease of use. 3. No need for handcrafted feature generation. Drawbacks: This method is not generalized to predict multi-classification stages of Alzheimer's disease for different age groups. Results: LMCI vs. NC: 61%; LMCI vs. EMCI: 52%; EMCI vs. NC: 56%; AD vs. NC: 80%; AD vs. EMCI: 63%; AD vs. LMCI: 59%.
• Wang et al. [24]: Technique: CNN. Advantages: Compared to state-of-the-art techniques, the classification accuracy improved by around 5%. Drawbacks: 1. It needs to apply transfer learning because it can handle a small-size dataset more efficiently. 2. The hyperparameters were obtained by experience, so a random search method must be tested to optimize them. Results: AD/NC: 97.65%.
• Khvostikov et al. [87]: Technique: 3D Inception-based CNN. Advantages: The 3D Inception-based CNN performs much better than the traditional AlexNet-based network utilizing data from the ADNI dataset. Drawbacks: 1. It concentrated exclusively on the ROI of the hippocampus biomarker. 2. It is important to include other ROIs that deteriorate from AD. Results: MCI vs. NC: 73.3%; AD vs. MCI vs. NC: 68.9%; AD vs. NC: 93.3%; AD vs. MCI: 86.7%.
• Sahumbaiev et al. [89]: Technique: 3D CNN. Advantages: 1. The trained classifier gives promising classification results to distinguish between AD, MCI, and HC. 2. Applying the Bayesian optimization process boosted hyperparameter and activation-function tuning. Drawbacks: 1. The developed classifier used the whole MR image based on learned features and did not use segmented brain regions. 2. The sensitivity and specificity of HadNet are predicted to improve. Results: AD/MCI/NC: 88.31%.
• Spasov et al. [90]: Technique: the APOe4 CNN model. Advantages: 1. It is effective in reducing overfitting, computational complexity, and memory requirements and in improving prototyping speed through its use of parameters. 2. Since it only takes 20 to 30 seconds for each epoch on an Nvidia PASCAL TITAN X GPU, it is less prone to overfitting and quick to fine-tune and prototype. Drawbacks: This method is not generalized to predict different stages of Alzheimer's disease for other age groups. Results: AD/NC: 99%.
• Yan Wang et al. [91]: Technique: a multimodal deep learning framework based on a CNN. Advantages: 1. The method combines automatic hippocampal segmentation and AD classification using structural MRI data. 2. It achieved a higher classification accuracy. Drawbacks: Instead of using the 2D approach, it needs to try 3D neural convolution, which could produce improved performance. Results: AD/aMCI/NC: 92.06%.
• Khagi et al. [95]: Technique: shallow tuning and fine-tuning of the pre-trained models (AlexNet, GoogLeNet, and ResNet50). Advantages: 1. The results show that performance is better when tuning most layers. 2. Increasing the depth of the learning model does not always result in a good performance. Drawbacks: Training time increases with the number of layers to tune. Results: AD/NC: 98.51%.
• Jain et al. [96]: Technique: the PFSECTL mathematical model. Advantages: 1. It achieves high accuracy for the three-way classification. 2. It uses the transfer learning concept. Drawbacks: 1. MCI is the most difficult class to classify since it is an intermediate stage between AD and CN. 2. Overall performance can also be improved by fine-tuning. 3. The data used is not a sufficient amount. Results: AD vs. MCI: 99.30%; MCI vs. CN: 99.22%; AD vs. MCI vs. NC: 95.73%; AD vs. CN: 99.14%.
• Manhua et al. [97]: Technique: multi-model deep CNNs for jointly learning hippocampus segmentation and disease classification, evaluated on structural MRI data. Advantages: 1. It achieved promising performance. 2. The framework outputs the disease status and provides the hippocampus segmentation result. Drawbacks: Computational complexity. Results: DSC = 87.0% for hippocampal segmentation; area under the curve (ROC) = 92.5% for classifying AD vs. NC subjects.
• Ge, C., & Qu, Q. et al. [92]: Technique: a multiscale deep learning architecture for learning AD features. Advantages: 1. It proposes a feature fusion and enhancement strategy for multiscale features. 2. The method is effective and achieves high accuracy. Drawbacks: 1. The dataset is small; it is suggested to use large datasets from augmented or measured data to improve performance further. 2. The method is only applied to the NC and AD classes and was not used for MCI. Results: AD/NC: 98.80%.
• Song et al. [93]: Technique: a multi-class graph convolutional neural network (GCNN) classifier. Advantages: 1. The method is implemented in four classes of the AD spectrum. 2. The GCNN classifier outperforms SVM by margins reliant on the disease category. Drawbacks: 1. The dataset is small; it is suggested to use large datasets from augmented or measured data to improve performance further. 2. It does not apply transfer learning techniques. Results: AD/EMCI/LMCI/NC: 89%.
• Silvia et al. [100]: Technique: a deep learning algorithm for predicting the individual diagnosis of AD based on CNN. Advantages: 1. It distinguishes AD, c-MCI, and s-MCI with high performance. 2. High levels of accuracy were achieved in all the classifications. Drawbacks: This method does not apply multi-class classification and only applies binary classification. Results: AD vs. c-MCI: 75.4%; AD vs. s-MCI: 85.9%; c-MCI vs. s-MCI: 75.1%; AD vs. HC: 99.2%; c-MCI vs. HC: 87.1%; s-MCI vs. HC: 76.1%.
• Harshit et al. [99]: Technique: a modified 3D CNN applied to resting-state fMRI data for feature extraction and classification of AD. Advantages: 1. The method is simple and accurate. 2. It uses the 4D fMRI data with much less preprocessing, preserving spatial and temporal information. Drawbacks: Computational complexity and memory requirements for training and processing 4D fMRI. Results: AD/EMCI/LMCI/NC: 93%.

Fig. 19  Some samples of the Lung CT-Diagnosis dataset in [106]

Fig. 20  Faster R-CNN detection process for lung cancer detection in [107]

Object detection and localization have been widely studied in healthcare systems and
disease detection, and it involves several issues and challenges. The comparison among
Classification, Localization, Detection, and Segmentation is shown in Table 13 and Fig. 24.
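Localization quality is commonly scored with the intersection-over-union (IoU) between a predicted and a ground-truth bounding box; a self-contained example:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)        # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Ground-truth lesion box vs. a predicted box.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.1428...
```

Detectors such as Faster R-CNN use an IoU threshold (often 0.5) to decide whether a predicted box counts as a correct detection.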

4 Trends and challenges

Although deep learning has outperformed machine learning in the medical and health fields, some obstacles and issues remain. The following subsections highlight these problems, along with some solutions.

4.1 Data insufficiency

Deep learning is a data-driven method. In general, neural networks have many parameters that must be learned, updated, and refined from data. Many applications, such as image classification, natural language processing, and computer vision, have achieved impressive results with deep learning. However, medical databases are frequently small and unbalanced, which makes applying deep learning in the healthcare field a challenge.

Fig. 21  Breast cancer detection and classification proposed architecture in [110]
A lack of data constrains deep learning parameter optimization and results in overfitting: the learned model works well on the training set but performs badly on data it has never seen, limiting its power to generalize. Common solutions to the overfitting problem are dropout and regularization. Data augmentation techniques such as translation, rotation, cropping, scaling, and contrast changes can also be applied to generate new images and expand the dataset. A third effective solution is transfer learning.
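The augmentation idea can be sketched with NumPy. The function below (its name and parameter ranges are illustrative assumptions, not the pipeline of any paper reviewed here) produces flipped, rotated, and contrast-adjusted variants of a 2D grayscale image with intensities in [0, 1]:

```python
import numpy as np

def augment_image(image, rng):
    """Expand a dataset by generating simple variants of a 2D image:
    horizontal/vertical flips, a 90-degree rotation, and a random
    contrast change (intensities rescaled around the image mean)."""
    variants = [
        np.fliplr(image),                 # horizontal flip
        np.flipud(image),                 # vertical flip
        np.rot90(image),                  # 90-degree rotation
    ]
    factor = rng.uniform(0.8, 1.2)        # mild random contrast factor
    mean = image.mean()
    variants.append(np.clip((image - mean) * factor + mean, 0.0, 1.0))
    return variants
```

In practice each variant keeps the original label, so a small medical dataset can be enlarged severalfold before training.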
Multimodal learning is designed to learn from several sorts of data simultaneously, exploiting the properties of the various data types, such as electronic health records, medical images, and genetic data. As a result, multimodal learning improves the model's abilities.

4.2 Model interpretability

Deep learning is considered a "black box" because its models cannot explain their own decisions. This lack of interpretability may not be a concern in some applications, such as image recognition. In healthcare, however, model interpretability is vital: doctors will trust a model's results and prescribe wise and effective treatments only if the model supplies enough trustworthy information. Furthermore, a fully interpretable model can offer a thorough comprehension of patients.
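One simple, model-agnostic way to add interpretability is occlusion sensitivity: mask each region of the input in turn and record how much the model's score changes, so the regions the prediction depends on can be shown to a clinician. The sketch below assumes only that `predict` returns a scalar score for an image; both names and the toy model are hypothetical:

```python
import numpy as np

def occlusion_map(image, predict, patch=4):
    """Score each patch by how much masking it changes the model output.
    Larger values mark regions the prediction depends on."""
    base = predict(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0   # mask one region
            heat[i // patch, j // patch] = abs(base - predict(occluded))
    return heat

# toy "model": the score is the mean intensity of the top-left quadrant
def predict(image):
    return float(image[:4, :4].mean())
```

The resulting heat map can be overlaid on the input image, giving the clinician a visual account of which anatomy drove the prediction.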

4.3 Privacy and ethical issues

In the medical and health fields, data privacy is crucial; misuse, abuse, or incorrect usage of patient data would have disastrous consequences. Deep learning training needs large, representative datasets. These databases are helpful, but they may also be extremely sensitive.


Fig. 22  The suggested method in [112]

Many researchers in the computational medicine field have generated and publicly published deep learning models for others to utilize. These models may have parameters that encode sensitive data. People with malicious motives may methodically create strategies to attack such models: they can infer the parameters from the deep learning model and sensitive data from the dataset, infringing on the privacy of the model and of the patients.
Some users upload their data to the cloud to solve the privacy problem, making it accessible to any researcher. However, this presents difficulties for deep learning when cloud computing is used to process data from several data owners.
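A common mitigation, separate from the cloud route above, is differential privacy: noise calibrated to a query's sensitivity is added before any statistic or model update leaves the data owner, so individual patients cannot be inferred from what is released. A minimal Laplace-mechanism sketch follows; the function name, epsilon value, and data are illustrative assumptions, not taken from the reviewed papers:

```python
import numpy as np

def private_mean(values, lo, hi, epsilon, rng):
    """Release the mean of bounded values with epsilon-differential privacy.

    Clipping to [lo, hi] bounds each record's influence; for the mean of
    n clipped records the sensitivity is (hi - lo) / n, and the Laplace
    noise scale is sensitivity / epsilon.
    """
    clipped = np.clip(np.asarray(values, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise
```

Smaller epsilon means stronger privacy but a noisier released value; the same mechanism can be applied to gradients during distributed training.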

4.4 Heterogeneity

Data in the healthcare field is highly heterogeneous, which hinders the generation of an effective deep learning model. Furthermore, healthcare data is often noisy, high-dimensional, and of poor quality.
There are two types of data: unstructured and structured. Neural network input data must be processed and translated into numerical values. Therefore, when training neural networks, researchers must address how to manipulate and preprocess structured and unstructured biomedical data effectively; processing these data remains a barrier for deep learning in the medical field.
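For instance, a structured clinical record mixing categorical and numeric fields must be turned into a fixed-length numeric vector before it can enter a neural network. A minimal hand-rolled sketch is below; the field names, vocabulary, and scaling bound are invented for illustration:

```python
def encode_record(record, sex_vocab=("F", "M"), max_age=120.0):
    """Turn a mixed clinical record into a numeric feature vector:
    one-hot encode the categorical field, min-max scale the numeric one."""
    one_hot = [1.0 if record["sex"] == v else 0.0 for v in sex_vocab]
    age_scaled = record["age"] / max_age          # simple min-max scaling
    return one_hot + [age_scaled]
```

Unstructured inputs (free-text notes, images) need their own encoders, which is exactly why heterogeneous biomedical data is hard to feed into a single model.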

4.5 Explainable artificial intelligence (XAI)

Explainable artificial intelligence (XAI) is a set of procedures and techniques. It enables users of explainable artificial intelligence to comprehend and trust the outcomes and results
Table 12  The recent works published in heart failure, lung cancer, breast disease, and COVID-19 using DL based on MIC

Disease: Lung cancer. Study: Zhiqiang Guo et al. [106]
Dataset: Lung CT-Diagnosis dataset
Technique: CNN
Advantages: The method achieves promising performance.
Drawbacks: Other types of cancer scans, such as MRI and X-ray images, are not considered.
Results: Accuracy: 95.96%, Recall: 97.10%, F1-score: 97.10%, ROC: 97%

Disease: Lung cancer. Study: Ying Su et al. [107]
Dataset: LIDC-IDRI database [1018 patients with CT images]
Technique: R-CNN
Advantages: The proposed method optimizes and enhances a faster R-CNN model.
Drawbacks: Parameter optimization is required to enhance the model.
Results: Accuracy: 91.2%

Disease: Heart failure. Study: Ahmed S. Eltrass et al. [108]
Dataset: MIT-BIH ARR, MIT-BIH NSR, and BIDMC CHF databases
Technique: AlexNet
Advantages: • It achieves highly accurate ECG multi-class classification with low to medium hardware requirements. • It uses low computational power.
Drawbacks: • A small dataset is applied. • No data augmentation techniques are applied.
Results: Accuracy: 98.82%, sensitivity: 98.87%, specificity: 99.21%, precision: 99.20%

Disease: Breast cancer. Study: Jing Zheng et al. [110]
Dataset: Conducted on the most available data from the Internet
Technique: CNN
Advantages: The method accurately detects breast cancer mass and increases the patient's survival rate.
Drawbacks: Limited collected dataset.
Results: Accuracy: 97.2%, sensitivity: 98.3%, specificity: 96.5%

Disease: Breast cancer. Study: Nur Syahmi Ismail et al. [111]
Dataset: IRMA dataset
Technique: Transfer learning (VGG16, ResNet50)
Advantages: It provides promising accuracy.
Drawbacks: The abnormal images were not classified as malignant or benign tumors.
Results: VGG16 accuracy = 94%, ResNet50 accuracy = 91.7%

Disease: COVID-19. Study: Walaa Gouda et al. [112]
Dataset: COVID-19 Image Data Collection (IDC) and CXR Images (Pneumonia)
Technique: ResNet-50
Advantages: High reliability and performance of the method.
Drawbacks: The effectiveness of the suggested approach has to be tested on a large and difficult dataset that contains many COVID-19 cases.
Results: Accuracy = 99.63%, precision = 100%, recall = 98.89%, F1-score = 99.44%, AUC = 100%

Disease: COVID-19. Study: Somenath Chakraborty et al. [113]
Dataset: Collected from the Internet, including the Kaggle and GitHub websites
Technique: VGG16
Advantages: • It enables radiologists to filter potential candidates in a time-effective manner to detect COVID-19. • High-performance model.
Drawbacks: • Unbalanced collected dataset. • There is a lack of validation using the program in a different setting or context.
Results: Accuracy = 96.43%, sensitivity = 93.68%, ROC curve = 99% for COVID-19, 97% for Pneumonia (but not COVID-19 positive), and 98% for normal cases

Fig. 23  Comparison between the recent works published in heart failure, lung cancer, breast disease, and COVID-19 based on MIC according to the accuracy metric

Table 13  The comparison among Classification, Localization, Detection, and Segmentation

Classification: It recognizes which objects or structures are present in the image.
Localization: It detects which objects or structures are in the image and identifies their location by outputting a bounding box for a single object in the image.
Object Detection: Localization and detection are similar tasks; however, object detection handles several objects in the image.
Segmentation: Clustering together or isolating parts of an image that belong to the same object [48]. It is also called pixel-wise classification.
Fig. 24  Comparison of Classification, Localization, Detection, and Segmentation [102]

generated by machine learning algorithms. Implementing deep learning models with XAI in distributed systems remains an open issue. Accountability, transparency, outcome monitoring, and model improvement in healthcare can all be achieved if XAI is applied in analyzing and diagnosing health data using AI-based systems [114].

4.6 Hyperparameter optimization

Hyperparameter optimization, or tuning, aims to choose the best set of hyperparameters for a learning algorithm. A hyperparameter is a parameter value used to manage the learning process, and tuning it is a challenge in deep learning. Experiments show that better results can be achieved when more time is spent optimizing the hyperparameters. However, exhaustive hyperparameter optimization requires enormous time and high CPU power. Genetic algorithm (GA) methods are therefore a good way to optimize the hyperparameters for better results [115].
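As a hedged sketch of the GA idea (not the specific method of [115]), the loop below evolves a population of candidate hyperparameters through selection, crossover, and mutation; here the single hyperparameter is a learning rate, and the toy fitness function and its optimum are invented for illustration:

```python
import random

def ga_tune(fitness, lo, hi, pop_size=20, generations=30, seed=0):
    """Minimal genetic algorithm over one real-valued hyperparameter.
    Each generation keeps the fitter half of the population, then refills
    it with mutated averages (crossover) of random surviving parents."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # selection: best first
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = (a + b) / 2.0                  # crossover: blend parents
            child += rng.gauss(0.0, (hi - lo) * 0.05)  # mutation
            children.append(min(max(child, lo), hi))
        pop = survivors + children
    return max(pop, key=fitness)

# toy objective: validation score peaks at a learning rate of 0.01
def score(lr):
    return -(lr - 0.01) ** 2

best = ga_tune(score, lo=1e-4, hi=1.0)
```

In a real setting, `fitness` would train the model with the candidate hyperparameters and return a validation score, which is what makes exhaustive tuning so expensive and population-based search attractive.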

5 Conclusion

The healthcare sector is distinct from other sectors. It is a high-priority area where people expect the highest levels of care and service, regardless of cost. Deep learning provides promising and accurate results that solve traditional artificial intelligence issues; therefore, numerous deep learning algorithms have been suggested for use in healthcare. Our review offers a comprehensive overview of deep learning in the healthcare sector. It highlights the contributions and limitations of recent research papers in this sector. Moreover, it overviews several deep learning models and their most recent developments, and it goes through how deep learning is applied in several medical activities.

6 List of limitations

• The review provides a comprehensive view of deep-learning techniques in the healthcare sector only, avoiding discussion of other sectors.
• The datasets used in the review depend only on medical image data and neglect other types of datasets.

Data availability Data sharing does not apply to this article as no datasets were generated or analyzed dur-
ing the current study.

Declarations
Conflict of interest The authors certify that they have NO affiliations with or involvement in any organiza-
tion or entity with any financial or non-financial interest in the subject matter or materials discussed in this
manuscript.

Ethical approval This article contains no studies with human participants or animals performed by authors.


References
1. Baker RE et al. (2021) “Infectious disease in an era of global change,” Nat Rev Microbiol, vol.
0123456789, https://​doi.​org/​10.​1038/​s41579-​021-​00639-z
2. Wang J, Zhu H, Wang SH, Zhang YD (2021) A review of deep learning on medical image analy-
sis. Mob Networks Appl 26(1):351–380. https://​doi.​org/​10.​1007/​s11036-​020-​01672-7
3. Segato A, Marzullo A, Calimeri F, De Momi E (2020) Artificial intelligence for brain diseases: a
systematic review. APL Bioeng 4(4). https://​doi.​org/​10.​1063/5.​00116​97
4. Dev A, Sharma A, Agarwal SS (2021) Artificial intelligence and speech Technology https://​doi.​
org/​10.​1201/​97810​03150​664
5. Lai Z, Deng H (2018) “Medical image classification based on deep features extracted by deep
model and statistic feature fusion with multi-layer perceptron,” Comput Intell Neurosci, vol. 2018,
https://​doi.​org/​10.​1155/​2018/​20615​16
6. Coan LJ et al (2023) Automatic detection of glaucoma via fundus imaging and artificial intelli-
gence: a review. Surv Ophthalmol 68(1):17–41. https://​doi.​org/​10.​1016/j.​survo​phthal.​2022.​08.​005
7. Hesamian MH, Jia W, He X, Kennedy P (2019) Deep learning techniques for medical image seg-
mentation: achievements and challenges. J Digit Imaging 32(4):582–596. https://​doi.​org/​10.​1007/​
s10278-​019-​00227-x
8. Shamshirband S, Fathi M, Dehzangi A, Chronopoulos AT, Alinejad-Rokny H (2021) A review
on deep learning approaches in healthcare systems: Taxonomies, challenges, and open issues. J
Biomed Inform 113(August 2020):103627. https://​doi.​org/​10.​1016/j.​jbi.​2020.​103627
9. Shen D, Wu G, Suk H-I (2017) Deep learning in medical image analysis. Annu Rev Biomed Eng
19:221–248. https://​doi.​org/​10.​1146/​annur​ev-​bioeng-​071516-​044442
10. Ogrean V, Dorobantiu A, Remus B (2021) Deep learning architectures and techniques for multi-
organ segmentation. Int J Adv Comput Sci Appl 12(1). https://​doi.​org/​10.​3791/​1700
11. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://​doi.​org/​
10.​1038/​natur​e14539
12. Goebel R (2022) Series Editors. https://​doi.​org/​10.​5771/​97837​48924​418-​207
13. Hatcher WG, Yu W (2018) A Survey of Deep Learning: Platforms, Applications and Emerging
Research Trends. IEEE Access 6(c):24411–24432. https://​doi.​org/​10.​1109/​ACCESS.​2018.​28306​
61
14. Lin CL, Wu KC (2023) Development of revised ResNet-50 for diabetic retinopathy detection.
BMC Bioinf 24(1):157. https://​doi.​org/​10.​1186/​s12859-​023-​05293-1
15. Hassan E, Shams MY, Hikal NA, Elmougy S (2022) The effect of choosing optimizer algorithms
to improve computer vision tasks: a comparative study. Multimed Tools Appl https://​doi.​org/​10.​
1007/​s11042-​022-​13820-0
16. Squires M et al. (2023) “Deep learning and machine learning in psychiatry: a survey of current
progress in depression detection, diagnosis and treatment,” Brain Inf, vol. 10, no. 1, https://​doi.​
org/​10.​1186/​s40708-​023-​00188-6
17. Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2016) Deep learning for visual understand-
ing: a review. Neurocomputing 187:27–48. https://​doi.​org/​10.​1016/j.​neucom.​2015.​09.​116
18. Mufti M, Kaiser MS, McGinnity TM, Hussain A. Deep learning in mining biological data. Cognit Comput 1:3. https://​doi.​org/​10.​1007/​s12559-​020-​09773-x
19. Coulibaly S, Kamsu-Foguem B, Kamissoko D, Traore D (2019) Deep neural networks with trans-
fer learning in millet crop images. Comput Ind 108:115–120. https://​doi.​org/​10.​1016/j.​compi​nd.​
2019.​02.​003
20. Korolev S, Safiullin A, Belyaev M, Dodonova Y (2017) “Residual and plain convolutional neural
networks for 3D brain MRI Classification Sergey Korolev Amir Safiullin Mikhail Belyaev Skolk-
ovo Institute of Science and Technology Institute for Information Transmission Problems,” 2017
IEEE 14th Int. Symp. Biomed. Imaging (ISBI 2017), pp. 835–838, https://​doi.​org/​10.​1109/​ISBI.​
2017.​79506​47.
21. Lei L, Yuan Y, Vu TX, Chatzinotas S, Ottersten B (2019) “Learning-Based Resource Allocation:
Efficient Content Delivery Enabled by Convolutional Neural Network,” IEEE Work Signal Process
Adv Wirel Commun SPAWC, vol. 2019, https://​doi.​org/​10.​1109/​SPAWC.​2019.​88154​47
22. Helaly HA, Badawy M, Haikal AY (2021) “Deep learning approach for early detection of Alzhei-
mer’s disease,” Cogn Comput, no. August 2020, https://​doi.​org/​10.​1007/​s12559-​021-​09946-2
23. Zhang YD et al (2018) Voxelwise detection of cerebral microbleed in CADASIL patients by leaky
rectified linear unit and early stopping. Multimed Tools Appl 77(17):21825–21845. https://​doi.​
org/​10.​1007/​s11042-​017-​4383-9


24. Wang SH, Phillips P, Sui Y, Liu B, Yang M, Cheng H (2018) Classification of Alzheimer’s disease
based on eight-layer convolutional neural network with leaky rectified linear unit and max pool-
ing. J Med Syst 42(5):85. https://​doi.​org/​10.​1007/​s10916-​018-​0932-7
25. Kuo CCJ (2016) Understanding convolutional neural networks with a mathematical model. J Vis
Commun Image Represent 41:406–413. https://​doi.​org/​10.​1016/j.​jvcir.​2016.​11.​003
26. Choi KS, Shin JS, Lee JJ, Kim YS, Kim SB, Kim CW (2005) In vitro trans-differentiation of rat
mesenchymal cells into insulin-producing cells by rat pancreatic extract. Biochem Biophys Res
Commun 330(4):1299–1305. https://​doi.​org/​10.​1016/j.​bbrc.​2005.​03.​111
27. Tu F, Yin S, Ouyang P, Tang S, Liu L, Wei S (2017) Deep Convolutional Neural Network Archi-
tecture with Reconfigurable Computation Patterns. IEEE Trans Very Large Scale Integr Syst
25(8):2220–2233. https://​doi.​org/​10.​1109/​TVLSI.​2017.​26883​40
28. Singh AV (2015) “Content-Based Image Retrieval using Deep Learning,” no. July, https://​doi.​org/​
10.​13140/​RG.2.​2.​29510.​16967.
29. Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Van Esesn BC, Awwal AA,
Asari VK (2018) The history began from alexnet: a comprehensive survey on deep learning
approaches. arXiv preprint arXiv:1803.01164. https://​doi.​org/​10.​48550/​arXiv.​1803.​01164
30. Khan HA, Jue W, Mushtaq M, Mushtaq MU (2020) Brain tumor classification in MRI image using
convolutional neural network. Math Biosci Eng 17(5):6203–6216. https://​doi.​org/​10.​3934/​MBE.​
20203​28
31. Simonyan K, Zisserman A (2015) “Very deep convolutional networks for large-scale image recog-
nition,” 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp 1–14.https://​doi.​org/​
10.​48550/​arXiv.​1409.​1556
32. Targ S, Almeida D, Lyman K (2016) Resnet in resnet: generalizing residual architectures. pp 1–7.
arXiv preprint. http://​arxiv.​org/​abs/​1603.​08029
33. Alzubaidi L et al. (2021) Review of deep learning: concepts, CNN architectures, challenges, appli-
cations, future directions, vol. 8, no. 1. Springer International Publishing, https://​doi.​org/​10.​1186/​
s40537-​021-​00444-8
34. Chen J, Zhou M, Zhang D, Huang H, Zhang F (2021) “Quantification of water inflow in rock tun-
nel faces via convolutional neural network approach,” Autom Constr, vol. 123, no. January, https://​
doi.​org/​10.​1016/j.​autcon.​2020.​103526
35. G. Litjens et al., “A survey on deep learning in medical image analysis,” Med Image Anal, vol. 42,
no. December 2012, pp. 60–88, 2017, https://​doi.​org/​10.​1016/j.​media.​2017.​07.​005.
36. Zhou T, Canu S, Ruan S (2020) “A review: Deep learning for medical image segmentation using
multi-modality fusion,” arXiv, vol. 4, no. July, https://​doi.​org/​10.​1016/j.​array.​2019.​100004
37. Liu X, Song L, Liu S, Zhang Y (2021) A review of deep-learning-based medical image segmenta-
tion methods. Sustain 13(3):1–29. https://​doi.​org/​10.​3390/​su130​31224
38. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y
(2014) Generative adversarial nets. Adv Neural Inf Proces Syst 27:1–9. https://​doi.​org/​10.​1002/​
14651​858.​CD013​788.​pub2
39. An FP, Liu JE (2021) “Medical image segmentation algorithm based on multi-layer boundary per-
ception-self attention deep learning model,” Multimed Tools Appl, pp. 15017–15039, https://​doi.​
org/​10.​1007/​s11042-​021-​10515-w
40. Shirokikh B et al. (2021) “Accelerating 3d medical image segmentation by adaptive small-scale
target localization,” J Imaging, vol. 7, no. 2, https://​doi.​org/​10.​3390/​jimag​ing70​20035
41. Zhuang J, Yang J, Gu L, Dvornek N (2019) “Shelfnet for fast semantic segmentation,” Proc. -
2019 Int. Conf. Comput. Vis. Work. ICCVW 2019, pp. 847–856, https://​doi.​org/​10.​1109/​ICCVW.​
2019.​00113
42. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image
segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE
Trans Pattern Anal Mach Intell 40(4):834–848. https://​doi.​org/​10.​1109/​TPAMI.​2017.​26991​84
43. Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic
image segmentation. arXiv preprint arXiv:1706.05587. https://​doi.​org/​10.​48550/​arXiv.​1706.​05587
44. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder
architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495.
https://​doi.​org/​10.​1109/​TPAMI.​2016.​26446​15
45. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image
segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioin-
formatics) 9351:234–241. https://​doi.​org/​10.​1007/​978-3-​319-​24574-4_​28


46. Ourselin S, Joskowicz L, Eds W. W, Hutchison D (2016) Medical Image Computing and Com-
puter-Assisted Intervention – MICCAI’2016, vol. Proceeding. [Online]. Available: https://​doi.​org/​
10.​1007/​10704​282
47. Yamanakkanavar N, Choi JY, Lee B (2020) MRI segmentation and classification of human brain
using deep learning for diagnosis of alzheimer’s disease: a survey. Sensors (Switzerland) 20(11):1–
31. https://​doi.​org/​10.​3390/​s2011​3243
48. Liu X, Deng Z, Yang Y (2019) Recent progress in semantic image segmentation. Artif Intell Rev
52(2):1089–1106. https://​doi.​org/​10.​1007/​s10462-​018-​9641-3
49. Russakovsky O et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis
115(3):211–252. https://​doi.​org/​10.​1007/​s11263-​015-​0816-y
50. Chung YW, Choi IY (2023) Detection of abnormal extraocular muscles in small datasets of computed
tomography images using a three-dimensional variational autoencoder. Sci Rep 13(1):1–10. https://​
doi.​org/​10.​1038/​s41598-​023-​28082-5
51. Kabir S, Farrokhvar L, Dabouei A (2023) A weakly supervised approach for thoracic diseases detec-
tion. Expert Syst Appl 213, no. PB:118942. https://​doi.​org/​10.​1016/j.​eswa.​2022.​118942
52. Al Duhayyim M et al (2023) Sailfish Optimization with Deep Learning Based Oral Cancer Classifica-
tion Model. Comput Syst Sci Eng 45(1):753–767. https://​doi.​org/​10.​32604/​csse.​2023.​030556
53. Umer MJ, Sharif M, Alhaisoni M, Tariq U, Kim YJ, Chang B (2023) A Framework of Deep Learn-
ing and Selection-Based Breast Cancer Detection from Histopathology Images. Comput Syst Sci Eng
45(2):1001–1016. https://​doi.​org/​10.​32604/​csse.​2023.​030463
54. Asiri AA et al (2023) Machine Learning-Based Models for Magnetic Resonance Imaging (MRI)-
Based Brain Tumor Classification. Intell Autom Soft Comput 36(1):299–312. https://​doi.​org/​10.​
32604/​iasc.​2023.​032426
55. Klingenberg M, Eitel F, Habes M, Ritter K (2022) Higher performance for women than men in
MRI-based Alzheimer’s disease detection. Alzheimers Res Ther:1–13. https://​doi.​org/​10.​1186/​
s13195-​023-​01225-6
56. Dolz J, Desrosiers C, Ben Ayed I (2018) 3D fully convolutional networks for subcortical segmen-
tation in MRI: a large-scale study. Neuroimage 170:456–470. https://​doi.​org/​10.​1016/j.​neuro​image.​
2017.​04.​039
57. Allioui H, Sadgal M, Elfazziki A (2019) Deep MRI segmentation: A convolutional method applied
to alzheimer disease detection. Int J Adv Comput Sci Appl 10(11):365–371. https://​doi.​org/​10.​14569/​
IJACSA.​2019.​01011​51
58. Sun J, Yan S, Song C, Han B (2020) Dual-functional neural network for bilateral hippocampi seg-
mentation and diagnosis of Alzheimer’s disease. Int J Comput Assist Radiol Surg 15(3):445–455.
https://​doi.​org/​10.​1007/​s11548-​019-​02106-w
59. Chitradevi D, Prabha S, Prabhu AD (2020) “Diagnosis of Alzheimer disease in MR brain images
using optimization techniques,” Neural Comput & Applic, vol. 7, https://​doi.​org/​10.​1007/​
s00521-​020-​04984-7
60. Carmo D, Silva B, Yasuda C, Rittner L, Lotufo R (2021) Hippocampus segmentation on epilepsy
and Alzheimer’s disease studies with multiple convolutional neural networks. Heliyon 7(2):e06226.
https://​doi.​org/​10.​1016/j.​heliy​on.​2021.​e06226
61. Nobakht S, Schaeffer M, Forkert ND, Nestor S, Black SE, Barber P (2021) “Combined atlas and
convolutional neural network-based segmentation of the hippocampus from mri according to the adni
harmonized protocol,” Sensors, vol. 21, no. 7, https://​doi.​org/​10.​3390/​s2107​2427
62. Helaly HA, Badawy M, Haikal AY (2021) “Toward deep MRI segmentation for Alzheimer’s disease
detection,” Neural Comput & Applic, vol. 8, https://​doi.​org/​10.​1007/​s00521-​021-​06430-8
63. Dodia S, Annappa B, Mahesh PA (2022) Recent advancements in deep learning based lung cancer
detection: a systematic review. Eng Appl Artif Intell 116(September):105490. https://​doi.​org/​10.​
1016/j.​engap​pai.​2022.​105490
64. Zheng S et al (2023) Survival prediction for stage I-IIIA non-small cell lung cancer using deep learn-
ing. Radiother Oncol 180:109483. https://​doi.​org/​10.​1016/j.​radonc.​2023.​109483
65. Shao J et al. (2022) “Deep learning empowers lung Cancer screening based on Mobile low-dose com-
puted tomography in resource-constrained sites,” Front Biosci - Landmark, vol. 27, no. 7, https://​doi.​
org/​10.​31083/j.​fbl27​07212
66. Jalali Y, Fateh M, Rezvani M, Abolghasemi V, Anisi MH (2021) ResBCDU-net: a deep learning
framework for lung CT image segmentation. Sensors (Switzerland) 21(1):1–24. https://​doi.​org/​10.​
3390/​s2101​0268
67. Mohammed KK, Hassanien AE, Afify HM (2021) A 3D image segmentation for lung cancer using
v.net architecture based deep convolutional networks. J Med Eng Technol 45(5):337–343. https://​doi.​
org/​10.​1080/​03091​902.​2021.​19058​95


68. Lei M, Li J, Li M, Zou L, Yu H (2021) An improved unet++ model for congestive heart failure diag-
nosis using short-term rr intervals. Diagnostics 11(3):1–14. https://​doi.​org/​10.​3390/​diagn​ostic​s1103​
0534
69. Innat M, Hossain MF, Mader K, Kouzani AZ (2023) A convolutional attention mapping deep
neural network for classification and localization of cardiomegaly on chest X-rays. Sci Rep
13(1):6247. https://​doi.​org/​10.​1038/​s41598-​023-​32611-7
70. Agarap AFM (2018) On breast cancer detection: An application of machine learning algorithms
on the Wisconsin diagnostic dataset. ACM Int Conf Proceeding Ser 1:5–9. https://​doi.​org/​10.​1145/​
31840​66.​31840​80
71. Dar RA, Rasool M, Assad A (2022) Breast cancer detection using deep learning: datasets, meth-
ods, and challenges ahead. Comput Biol Med 149(August):106073. https://​doi.​org/​10.​1016/j.​
compb​iomed.​2022.​106073
72. Aljuaid H, Alturki N, Alsubaie N, Cavallaro L, Liotta A (2022) Computer-aided diagnosis for
breast cancer classification using deep neural networks and transfer learning. Comput Methods
Prog Biomed 223:106951. https://​doi.​org/​10.​1016/j.​cmpb.​2022.​106951
73. Raaj RS (2023) Breast cancer detection and diagnosis using hybrid deep learning architecture.
Biomed Signal Process Control 82(August 2022):104558. https://​doi.​org/​10.​1016/j.​bspc.​2022.​
104558
74. Koh J, Yoon Y, Kim S, Han K, Kim EK (2022) Deep learning for the detection of breast cancers
on chest computed tomography. Clin Breast Cancer 22(1):26–31. https://​doi.​org/​10.​1016/j.​clbc.​
2021.​04.​015
75. Tariq M, Iqbal S, Ayesha H, Abbas I, Ahmad KT, Niazi MFK (2021) Medical image based breast
cancer diagnosis: state of the art and future directions. Expert Syst Appl 167:114095. https://​doi.​
org/​10.​1016/j.​eswa.​2020.​114095
76. Almajalid R, Shan J, Du Y, Zhang M (2019) “Development of a Deep-Learning-Based Method for
Breast Ultrasound Image Segmentation,” Proc. - 17th IEEE Int Conf Mach Learn Appl ICMLA
2018, pp. 1103–1108, https://​doi.​org/​10.​1109/​ICMLA.​2018.​00179.
77. Ghayvat H et al. (2022) “AI-enabled radiologist in the loop: novel AI-based framework to augment
radiologist performance for COVID-19 chest CT medical image annotation and classification from
pneumonia,” Neural Comput & Applic, vol. 1, https://​doi.​org/​10.​1007/​s00521-​022-​07055-1
78. Subramanian N, Elharrouss O, Al-Maadeed S, Chowdhury M (2022) A review of deep learning-
based detection methods for COVID-19. Comput Biol Med 143:105233. https://​doi.​org/​10.​1016/j.​
compb​iomed.​2022.​105233
79. Aggarwal P, Mishra NK, Fatimah B, Singh P, Gupta A, Joshi SD (2022) COVID-19 image classi-
fication using deep learning: advances, challenges and opportunities. Elsevier Ltd, https://​doi.​org/​
10.​1016/j.​compb​iomed.​2022.​105350.
80. Walvekar S, Shinde S (2021) “Efficient medical image segmentation of COVID-19 Chest CT
images based on deep learning techniques,” 2021 Int Conf Emerg Smart Comput Informatics,
ESCI 2021, pp. 203–206, https://​doi.​org/​10.​1109/​ESCI5​0559.​2021.​93970​43
81. Jain R, Singh S, Swami S, Kumar S (2021) Deep learning-based techniques to identify COVID-
19 patients using medical image segmentation. In: Manocha AK, Jain S, Singh M, Paul S (eds)
Computational intelligence in healthcare. Springer International Publishing, Cham, pp 327–342.
https://​doi.​org/​10.​1007/​978-3-​030-​68723-6_​18
82. Bhattacharyya A, Bhaik D, Kumar S, Thakur P, Sharma R, Pachori RB (2022) A deep learning
based approach for automatic detection of COVID-19 cases using chest X-ray images. Biomed
Signal Process Control 71, no. PB:103182. https://​doi.​org/​10.​1016/j.​bspc.​2021.​103182
83. Payan A, Montana G (2015) “Predicting Alzheimer’s disease: a neuroimaging study with 3D con-
volutional neural networks,” pp. 1–9, https://​doi.​org/​10.​1613/​jair.​301
84. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) “What is the best multi-stage architecture
for object recognition?,” Proc IEEE Int Conf Comput Vis, pp. 2146–2153, https://​doi.​org/​10.​1109/​
ICCV.​2009.​54594​69
85. Sarraf S, Tofighi G (2016) Classification of alzheimer’s disease structural MRI data by deep
learning convolutional neural networks. arXiv preprint, pp 8–12. http://​arxiv.​org/​abs/​1607.​
06583, https://​doi.​org/​10.​1097/​IAE.​00000​00000​001460
86. Hosseini-Asl E, Keynton R, El-Baz A (2016) Alzheimer’s disease diagnostics by adaptation of 3D
convolutional network. In: 2016 IEEE International Conference On Image Processing (ICIP), (vol.
502, pp 126–130). IEEE. https://​doi.​org/​10.​1109/​TNNLS.​2015.​24792​23
87. Khvostikov A, Aderghal K, Krylov A (2018) “3D Inception-based CNN with sMRI and MD-DTI
data fusion for Alzheimer’s Disease diagnostics,” no. July, https://​doi.​org/​10.​13140/​RG.2.​2.​30737.​
28006.


88. Kahramanli H (2012) A modified cuckoo optimization algorithm for engineering optimization. Int
J Futur Comput Commun 1(2):199
89. Sahumbaiev I, Popov A, Ramirez J, Gorriz JM, Ortiz A (2018) 3D-CNN HadNet classification
of MRI for Alzheimer’s Disease diagnosis. In: 2018 IEEE Nuclear Science Symposium and Medi-
cal Imaging Conference Proceedings (NSS/MIC). IEEE, pp 1–4. https://​doi.​org/​10.​1109/​NSSMIC.​
2018.​88243​17
90. Spasov SE et al. (2018) A Multimodal Convolutional Neural Network Framework for the Predic-
tion of Alzheimer ’ s Disease, pp 1271–1274. https://​doi.​org/​10.​1109/​EMBC.​2018.​85124​68
91. Wang Y, Yang Y, Guo X, Ye C, Gao N, Fang Y, Ma HT (2018) A novel multimodal MRI analysis
for Alzheimer’s disease based on convolutional neural network. In: 2018 40th Annual Interna-
tional Conference of the IEEE Engineering In Medicine and Biology Society (EMBC). IEEE, pp
754–757. https://​doi.​org/​10.​1109/​EMBC.​2018.​85123​72
92. Ge C, Qu Q (2019) Multiscale deep convolutional networks for characterization and detection of
alzheimer ’ s disease using mr images Dept. of Electrical Engineering, Chalmers University of
Technology, Sweden Inst. of Neuroscience and Physiology, Sahlgrenska Academy. IEEE Int Conf
Image Process, pp 789–793. https://​doi.​org/​10.​1109/​ICIP.​2019.​88037​31
93. Song T et al (2019) Graph convolutional neural networks for alzheimer ’ s disease. In: 2019 IEEE
16th Int Symp Biomed Imaging (ISBI 2019), no. Isbi, pp 414–417. https://​doi.​org/​10.​1109/​ISBI.​
2019.​87595​31
94. Liu L, Zhao S, Chen H, Wang A (2020) A new machine learning method for identifying Alz-
heimer’s disease. Simul Model Pract Theory 99:102023. https://​doi.​org/​10.​1016/j.​simpat.​2019.​
102023
95. Khagi B, Lee B, Pyun JY, Kwon GR (2019) CNN Models performance analysis on MRI images
of OASIS dataset for distinction between healthy and alzheimer’s patient. In: 2019 International
Conference on Electronics, Information, and Communication (ICEIC). IEEE, pp 1–4. https://​doi.​
org/​10.​23919/​ELINF​OCOM.​2019.​87063​39
96. Jain R, Jain N, Aggarwal A, Hemanth DJ (2019) Convolutional neural network based Alzheimer's disease classification from magnetic resonance brain images. Cogn Syst Res 57:147–159. https://doi.org/10.1016/j.cogsys.2018.12.015
97. Liu M et al (2020) A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer's disease. NeuroImage 208:116459. https://doi.org/10.1016/j.neuroimage.2019.116459
98. Impedovo D, Pirlo G, Vessio G, Angelillo MT (2019) A handwriting-based protocol for assessing neurodegenerative dementia. Cogn Comput 11(4):576–586. https://doi.org/10.1007/s12559-019-09642-2
99. Parmar H, Nutter B, Long R, Antani S, Mitra S (2020) Spatiotemporal feature extraction and classification of Alzheimer's disease using deep learning 3D-CNN for fMRI data. J Med Imaging 7(5):1–14. https://doi.org/10.1117/1.jmi.7.5.056001
100. Basaia S et al (2019) Automated classification of Alzheimer's disease and mild cognitive impairment using a single MRI and deep neural networks. NeuroImage Clin 21:101645. https://doi.org/10.1016/j.nicl.2018.101645
101. Pan D, Zeng A, Jia L, Huang Y, Frizzell T, Song X (2020) Early detection of Alzheimer's disease using magnetic resonance imaging: a novel approach combining convolutional neural networks and ensemble learning. Front Neurosci 14:1–19. https://doi.org/10.3389/fnins.2020.00259
102. Vassanelli S, Kaiser MS, Eds NZ, Goebel R (2020) Series Editors. https://doi.org/10.1007/978-3-030-59277-6
103. Gumma LN, Thiruvengatanadhan R, Kurakula L, Sivaprakasam T (2022) A survey on convolutional neural network (deep-learning technique)-based lung cancer detection. SN Comput Sci 3(1):1–7. https://doi.org/10.1007/s42979-021-00887-z
104. She Y et al (2022) Deep learning for predicting major pathological response to neoadjuvant chemoimmunotherapy in non-small cell lung cancer: a multicentre study. eBioMedicine 86:104364. https://doi.org/10.1016/j.ebiom.2022.104364
105. Siegel RL, Miller KD, Fuchs HE, Jemal A (2022) Cancer statistics, 2022. CA Cancer J Clin 72(1):7–33. https://doi.org/10.3322/caac.21708
106. Guo Z, Xu L, Si Y, Razmjooy N (2021) Novel computer-aided lung cancer detection based on convolutional neural network-based and feature-based classifiers using metaheuristics. Int J Imaging Syst Technol 31(4):1954–1969. https://doi.org/10.1002/ima.22608
107. Su Y, Li D, Chen X (2021) Lung nodule detection based on faster R-CNN framework. Comput Methods Prog Biomed 200:105866. https://doi.org/10.1016/j.cmpb.2020.105866
108. Eltrass AS, Tayel MB, Ammar AI (2021) A new automated CNN deep learning approach for identification of ECG congestive heart failure and arrhythmia using constant-Q non-stationary Gabor transform. Biomed Signal Process Control 65:102326. https://doi.org/10.1016/j.bspc.2020.102326
109. Balcha AA, Woldie SA (2023) Impact of genetic algorithm for the diagnosis of breast cancer: literature review. https://doi.org/10.4236/aid.2023.131005
110. Zheng J, Lin D, Gao Z, Wang S, He M, Fan J (2020) Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis. IEEE Access 8:96946–96954. https://doi.org/10.1109/ACCESS.2020.2993536
111. Ismail NS, Sovuthy C (2019) Breast cancer detection based on deep learning technique. In: 2019 Int UNIMAS STEM 12th Engineering Conference (EnCon 2019) Proceedings, pp 89–92. https://doi.org/10.1109/EnCon.2019.8861256
112. Gouda W, Almurafeh M, Humayun M, Jhanjhi NZ (2022) Detection of COVID-19 based on chest X-rays using deep learning. Healthcare 10(2):1–19. https://doi.org/10.3390/healthcare10020343
113. Chakraborty S, Murali B, Mitra AK (2022) An efficient deep learning model to detect COVID-19 using chest X-ray images. Int J Environ Res Public Health 19(4). https://doi.org/10.3390/ijerph19042013
114. Pawar U, O'Shea D, Rea S, O'Reilly R (2020) Explainable AI in healthcare. In: 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA 2020). IEEE. https://doi.org/10.1109/CyberSA49311.2020.9139655
115. Xiao X, Yan M, Basodi S, Ji C, Pan Y (2020) Efficient hyperparameter optimization in deep learning using a variable length genetic algorithm. arXiv preprint arXiv:2006.12703. http://arxiv.org/abs/2006.12703

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
