A Review of Deep Learning Approaches in Clinical and Healthcare Systems Based On Medical Image Analysis
https://doi.org/10.1007/s11042-023-16605-1
Abstract
Healthcare is a high-priority sector where people expect the highest levels of care and service,
regardless of cost. That makes it distinct from other sectors. Due to the promising results of deep
learning in other practical applications, many deep learning algorithms have been proposed for
use in healthcare and to solve traditional artificial intelligence issues. The main objective of
this study is to review and analyze current deep learning algorithms in healthcare systems. In
addition, it highlights the contributions and limitations of recent research papers. It combines
deep learning methods with the interpretability of human healthcare by providing insights into
deep learning applications in healthcare solutions. It first provides an overview of several deep
learning models and their most recent developments. It then briefly examines how these models
are applied in several medical practices. Finally, it summarizes current trends and issues in the
design and training of deep neural networks besides the future direction in this field.
1 Introduction
Nowadays, various diseases affect the world, and many patients suffer from different disorders. Moreover, the World Health Organization (WHO) has recorded a wave of severe infectious disease epidemics in the twenty-first century, not least the COVID-19 pandemic. The
* Hadeer A. Helaly
[email protected]
Mahmoud Badawy
[email protected]
Amira Y. Haikal
[email protected]
1 Electrical Engineering Department, Faculty of Engineering, Damietta University, Damietta, Egypt
2 Computers and Control Systems Engineering Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
3 Department of Computer Science and Informatics, Applied College, Taibah University, Al Madinah Al Munawwarah 41461, Saudi Arabia
Multimedia Tools and Applications (2024) 83:36039–36080
outbreaks of these diseases have impacted lives and livelihoods worldwide [1]. Consequently, medical data analysis has changed how medical experts recognize, analyze, and identify risks and reactions to medicines throughout the treatment of disease. This medical data is unstructured data created by hospitals and healthcare systems, such as medical imaging (MI) data, genomic information, free text, and data streams from monitoring equipment [2].
Manual medical data analysis is often time-consuming, and the chance of errors in interpretation is not negligible. It requires professional doctors for an accurate diagnosis. Moreover, medical data is difficult to collect, expensive, and relatively rare. As a result, automatic analysis of medical data using computer-aided diagnosis (CAD) systems is an effective solution for the early detection of various diseases worldwide [3]. Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound (US), and X-Rays are
the most often used medical imaging modalities. CT provides a higher resolution on high-
density tissue than the other medical imaging modalities, but it depends on the doctor’s
skill. X-Rays are convenient and inexpensive, making them ideal for first medical examinations. However, CT and X-Rays expose the body to ionizing radiation, so patients should not undergo them frequently. Unlike CT and X-Rays, MRI does not use ionizing radiation and can show
soft tissue more clearly. However, MRI takes a long time, and some patients may not be
willing to wait [2].
With the swift propagation of digital image acquisition and storage techniques, the interpretation of images through computer programs has become an active and appealing subject in machine learning and application-specific research [4, 5]. Although machine learning
(ML) techniques perform well in medical applications [6], deep learning (DL) is a robust
and reliable tool in medical and computer vision applications such as disease diagnosis,
image classification, and segmentation [7]. As a result, deep learning is involved in the
healthcare sector (HCS) [8]. DL is a branch of ML that uses neural networks with numer-
ous layers of artificial neurons to recognize patterns from the data sets [9, 10]. As a result,
it greatly enhanced state-of-the-art performance in several applications, including speech
recognition, object detection, visual object recognition, and other fields such as drug dis-
covery and genomics [11, 12].
The benefits of using deep learning in the healthcare sector are as follows: it enables fast, accurate, and efficient operations in healthcare; it reduces the cost of care and prevents reporting delays about critical and urgent cases; it minimizes the administrative load on healthcare professionals; and it decreases diagnostic errors by auditing prescriptions and diagnostic results while providing faster diagnostics [13]. Transfer learning is distinguished by efficiency, simplicity, and minimal training cost compared to deep learning approaches such as convolutional neural networks trained from scratch. It mitigates the curse of limited datasets: a pre-trained network can be used as a basis rather than training the network from scratch, which decreases the time and reduces the difficulty of training [14].
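As a rough illustration of this idea, the following toy NumPy sketch freezes a stand-in "pre-trained" feature extractor (a fixed random projection, purely hypothetical) and trains only a small classifier head with logistic-loss gradient descent. It is not any specific method from the reviewed papers; it only shows the frozen/trainable split that makes transfer learning cheap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained backbone: a fixed ("frozen") random
# projection followed by tanh. In real transfer learning these would be
# convolutional layers pre-trained on a large dataset such as ImageNet.
W_frozen = rng.normal(size=(20, 16)) / np.sqrt(20)

def features(x):
    # Frozen feature extractor: its weights are never updated.
    return np.tanh(x @ W_frozen)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary-classification data (label depends on the first two inputs).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the small classifier "head" is trained, which is why transfer learning
# needs far less data and compute than training a full network from scratch.
F = features(X)
w_head, b_head, lr = np.zeros(16), 0.0, 0.5
for _ in range(300):
    p = sigmoid(F @ w_head + b_head)
    w_head -= lr * F.T @ (p - y) / len(y)   # logistic-loss gradient step
    b_head -= lr * np.mean(p - y)

acc = np.mean((sigmoid(F @ w_head + b_head) > 0.5) == (y > 0.5))
```

Even with the backbone untouched, the 17 trainable head parameters are enough to fit the toy task well above chance, mirroring why fine-tuning only the last layers often suffices.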
Over the past ten years, various Artificial Intelligence (AI) and DL technologies have been utilized to analyze the enormous amounts of data in the healthcare industry [15]. The yearly publications of DL techniques for HCS in the PubMed database over the previous decade are shown in Fig. 1.
The motivation of this review is to summarize the most significant parts of DL in a sin-
gle review paper so that researchers and students may get a comprehensive vision of deep
learning in the healthcare sector. This review will assist individuals in learning more about
recent advancements in the fields, enhancing DL research. Researchers could decide on the
most appropriate work direction in this field to provide more accurate solutions. The main
novelty is introducing an overview of the recent deep learning algorithms in healthcare
Fig. 1 The yearly distribution of DL techniques in HCS in the PubMed database for the last decade (counts per year: 2012: 39; 2013: 62; 2014: 71; 2015: 111; 2016: 175; 2017: 406; 2018: 981; 2019: 1975; 2020: 3472; 2021: 5415; 2022: 6874)
systems. Moreover, it highlights the contributions and limitations of recent research papers.
It merges deep learning methods with human healthcare interpretability by providing com-
prehensive knowledge of deep learning applications in healthcare solutions. It first provides
an overview of several deep learning models and their most recent developments. Then it briefly goes through how they are applied in several medical activities. Finally, it summarizes
current trends and issues in the design and training of deep neural networks, besides the
future directions of this field. Table 1 summarizes the frequently used abbreviations for the
reader’s convenience.
The main objective of this study is to offer an up-to-date review of deep learning research in the healthcare sector. This paper's main contributions may be defined as follows:
• Analyzing the fundamental technologies that have the potential to reshape deep learn-
ing techniques in healthcare.
• Outlining open issues and challenges in existing healthcare deep learning models.
This paper is structured as follows: The definition and architecture of deep learning
methods are detailed in Section 2. The applications of deep learning in healthcare sys-
tems are discussed in Section 3. Section 4 presents open challenges and deep learning
trends in healthcare systems. Finally, Section 5 concludes the paper.
2 Deep learning methods

Deep learning has recently received much attention in the medical system [16]. As a
result, several related approaches have evolved. Figure 2 shows the classification of typ-
ical deep learning architectures for processing HCS data and their applications in the
HCS sector, especially disease detection.
According to the core methods in which they are formed, deep learning methods can
be divided into four categories: convolutional neural networks (CNNs), restricted Boltz-
mann machines (RBMs), autoencoders (AE), and sparse coding [17, 18]. DL is used
in speech recognition, image analysis, text mining, health monitoring, drug discovery,
computer vision, object recognition, and other applications [8].
Fig. 2 The taxonomy of DL architectures for processing HCS data and their applications, particularly disease detection [8, 17]. Sparse coding methods: Sparse Coding SPM, Laplacian Sparse Coding, Local Coordinate Coding, and Super-Vector Coding. CNN-based methods: AlexNet, GoogLeNet, ResNet, and VGG. Applications: speech recognition, drug discovery, computer vision, object detection, natural language processing, and disease detection (Parkinson's disease, brain tumor, epilepsy, breast cancer, lung cancer, heart failure, diabetes)
2.1 CNN-based methods

CNNs are among the most well-known DL techniques, in which many layers are trained robustly. They provide excellent performance and are implemented in different computer vision applications, including image recognition, image classification, object detection, and face recognition. A CNN can have tens or hundreds of layers, each of which learns to detect different features of an image [19, 20]. The whole pipeline of the CNN structure is shown in Fig. 3 [21].
Traditional machine-learning approaches have three stages: feature extraction, feature reduction, and classification. A conventional CNN merges all of these phases: iterative learning tunes the weights of its early layers, which therefore function as feature extractors. Three layer types constitute a CNN: the convolution layer extracts features, the pooling layer decreases dimensionality, and the fully connected layer flattens the two-dimensional feature maps into a one-dimensional vector and performs the classification [22].
2.1.1 CNN layers
Convolution layer The convolution layer is the first layer to extract features from the input
image. It is also a learnable filter that preserves the relationships between image pixels
when learning features from small squares in the input data [2].
Figure 4 depicts the convolutional operation for a 3D image of size H × W × C, where H stands for height, W for width, and C for channel count, using a 3D filter with the dimensions FH × FW × FC, where FH stands for filter height, FW for filter width, and FC for filter channels. As a result, the size of the output activation map should be AH × AW, where AH denotes the activation height and AW the activation width. The values of AH and AW are calculated by Eqs. 1 and 2 [22], where P represents padding, S represents the stride, and n is the number of filters. With n filters, the full activation volume is AH × AW × n.
AH = 1 + (H − FH + 2P) / S    (1)

AW = 1 + (W − FW + 2P) / S    (2)
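Eqs. 1 and 2 can be sanity-checked with a small helper function; this is an illustrative sketch, and the AlexNet-like numbers below are only an example.

```python
def conv_output_size(in_size, filter_size, padding=0, stride=1):
    """Activation-map size along one dimension, per Eqs. (1)-(2):
    A = 1 + (I - F + 2P) / S."""
    out = 1 + (in_size - filter_size + 2 * padding) / stride
    if not float(out).is_integer():
        raise ValueError("the filter does not tile the padded input evenly")
    return int(out)

# A 227x227 input with 11x11 filters and stride 4 (AlexNet-like first layer):
conv_output_size(227, 11, padding=0, stride=4)   # 55
# A 5x5 input with a 3x3 filter, padding 1, stride 1 keeps the size:
conv_output_size(5, 3, padding=1, stride=1)      # 5
```

The `ValueError` branch catches the case discussed below, where the filter does not fit the input evenly and padding must be adjusted.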
The stride is the number of pixels by which the filter shifts over the input matrix. If the stride is one, the filters move one pixel at a time. Figure 5 shows the stride operation. Because the filter does not always fit the input image perfectly, padding is applied to match the image size. There are two forms of padding: zero padding, which pads the image with zeros, and valid padding, which drops the positions where the filter does not fit and keeps only the valid parts. For example, Fig. 6 displays the convolution operation with a stride of two and 3 × 3 filters with zero padding.
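The stride-and-padding mechanics can be sketched for a single channel and a single filter. This is an illustrative NumPy example, not from the paper; like most deep-learning "convolution" layers it actually computes cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel convolution (cross-correlation) with zero padding."""
    if padding:
        image = np.pad(image, padding)          # zero padding on all sides
    H, W = image.shape
    FH, FW = kernel.shape
    AH = 1 + (H - FH) // stride                 # Eq. (1), padding already applied
    AW = 1 + (W - FW) // stride                 # Eq. (2)
    out = np.zeros((AH, AW))
    for i in range(AH):
        for j in range(AW):
            patch = image[i*stride:i*stride+FH, j*stride:j*stride+FW]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply-accumulate
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))
out = conv2d(img, k, stride=2, padding=1)       # 4x4 input -> 2x2 activation map
```

With stride two, 3 × 3 filters, and zero padding of one pixel, the 4 × 4 input yields a 2 × 2 activation map, matching the formula above.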
Pooling layer The pooling layer is also referred to as spatial pooling, subsampling, or down-sampling. When the feature maps are large, it reduces the number of parameters. Additionally, it reduces each map's dimensionality while preserving essential information. There are three forms of pooling: maximum pooling, average pooling, and sum pooling. Max pooling takes the largest element of the rectified feature map; average pooling takes the average of the feature-map elements; and sum pooling takes their sum. Figure 7 depicts the max-pooling operation.
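The max-pooling operation can be sketched in the same style (illustrative NumPy code, not from the paper):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Max pooling: keep only the largest element of each window."""
    H, W = feature_map.shape
    AH = 1 + (H - size) // stride
    AW = 1 + (W - size) // stride
    out = np.zeros((AH, AW))
    for i in range(AH):
        for j in range(AW):
            out[i, j] = feature_map[i*stride:i*stride+size,
                                    j*stride:j*stride+size].max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
pooled = max_pool(fmap)          # 2x2 result: [[6, 8], [3, 4]]
```

Replacing `.max()` with `.mean()` or `.sum()` gives average and sum pooling, respectively.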
Fully connected layer The fully connected (FC) layer is like a traditional neural network
that flattens the matrix into a vector. Then it feeds it into a fully connected layer. As illus-
trated in Fig. 8, all inputs from one layer are linked to each activation unit of the next layer.
Activation function The activation function applies a nonlinear transformation to the neuron's inputs, giving the network its nonlinearity. The sigmoid function is the traditional activation function [23]. It provides probabilities for a data point belonging to each class, with values ranging from zero to one, and is defined in Eq. 3. The rectified linear unit (ReLU) became a popular remedy for the sigmoid's drawbacks and is defined in Eq. 4. However, for negative inputs the ReLU gradient is zero, so the affected neurons can stop learning under gradient-based training. The leaky ReLU (LReLU), defined in Eq. 5, is applied as a good solution to this problem [24]. Figure 9 shows the difference between the three activation functions.
Fig. 9 The difference among the sigmoid, ReLU, and LReLU activation functions [24]
fSigmoid(x) = 1 / (1 + e^(−x))    (3)

fReLU(x) = max(0, x)    (4)

fLReLU(x) = x if x > 0, 0.01x otherwise    (5)
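The three activation functions translate directly into a short NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # Eq. (3): outputs in (0, 1)

def relu(x):
    return np.maximum(0.0, x)              # Eq. (4): zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # Eq. (5): small slope for x < 0
```

Note how `leaky_relu` keeps a small nonzero gradient `alpha` for negative inputs, which is exactly what lets gradient-based training continue where plain ReLU would stall.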
There are numerous CNN architectures, including LeNet, AlexNet, VGGNet, GoogLeNet,
ResNet, and ZFNet. These models hold the key to creating the algorithms that will soon
drive all of AI.
LeNet‑5 (1998) LeNet-5 is a convolutional network with seven layers, introduced in 1998 [25]. Several banks have used it to classify digits and recognize handwritten numbers on digital checks from 32 × 32 pixel greyscale input images. However, this architecture is limited by computing resources, because high-resolution image processing requires deeper networks with more convolutional layers [26]. LeNet-5's structure is depicted in Fig. 10.
AlexNet (2012) The network architecture of AlexNet is deeper than LeNet [27]. Addition-
ally, each layer has additional filters and stacks of convolutional layers. In 2012, AlexNet
won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), surpassing every
competitor and significantly reducing the top-5 error from 26% to 15.3%. Figure 11 depicts
the AlexNet architecture.
ZFNet (2013) The ZFNet architecture enhanced AlexNet by tuning its hyper-parameters while keeping the same overall structure. With a top-5 error rate of 14.8%, ZFNet won the ILSVRC in 2013 [28]. ZFNet relied on 7 × 7 kernels rather than 11 × 11 kernels to reduce the number of weights and network parameters, which also increased recognition accuracy. The ZFNet network appears in Fig. 12.
VGGNet (2014) VGGNet was a top performer in the ILSVRC 2014 competition. Created by Simonyan and Zisserman, it has 16 weight layers. Its uniformity makes the architecture highly appealing: like AlexNet it is a plain stack of convolutions, but it uses only 3 × 3 kernels with many filters [29, 30]. VGGNet took 2-3 weeks to train on four GPUs. Nevertheless, it has since become the community's most popular method to extract features from images.
ResNet (2015) The Residual Neural Network (ResNet) reaches 152 layers while remaining relatively simple compared to VGGNet. It surpasses human-level performance on ImageNet with a top-5 error rate of just 3.57%. It introduces "skip connections" and features heavy batch normalization [32–34]. Figure 15 shows the structure of the ResNet model.
Researchers create several CNN models daily; the comparison among recently devel-
oped CNN models is discussed in Table 2.
2.3 Autoencoder‑based methods

Autoencoder-based methods learn a compact representation of the input data by reconstructing it from an encoded form; Table 4 compares autoencoder-based methods.

2.4 Sparse coding

Sparse coding is utilized to describe the input data through learning an over-complete set of its fundamental basis functions. Table 5 illustrates the comparison among different sparse coding algorithms.
Focusing on healthcare systems, this review considers four deep learning algorithms: CNN, Deep Belief Networks (DBN), Auto-Encoders (AE), and Recurrent Neural Networks (RNN). These architectures are commonly employed for disease detection applications [8]. Figure 16 depicts the distribution of deep learning methods used in healthcare systems, including CNN, DBN, AE, and RNN. Figure 17 illustrates disease publications as an application of DL methods, based on the PubMed database.
2.5 Training deep neural networks

Hundreds, thousands, or millions of images are used to train a neural network. Thus, GPUs may significantly reduce the time needed to train a model when working with large data and complex network architectures.
2.5.1 Training type
Training from scratch Building a network from scratch permits choosing the network shape. This method gives high control over the network and may yield impressive results. However, it necessitates a strong knowledge of neural network architecture, the numerous layer types, and the configuration options. While its results can sometimes be better than transfer
Table 2 Comparison of recently developed CNN models

Model / Size (MB) / Top-1 Accuracy / Top-5 Accuracy / Parameters / Depth / Time (ms) per inference step (CPU) / Time (ms) per inference step (GPU)
Xception 88 0.79 0.945 22,910,480 126 109.42 8.06
VGG16 528 0.713 0.901 138,357,544 23 69.5 4.16
VGG19 549 0.713 0.9 143,667,240 26 84.75 4.38
ResNet50 98 0.749 0.921 25,636,712 – 58.2 4.55
ResNet101 171 0.764 0.928 44,707,176 – 89.59 5.19
ResNet152 232 0.766 0.931 60,419,944 – 127.43 6.54
ResNet50V2 98 0.76 0.93 25,613,800 – 45.63 4.42
ResNet101V2 171 0.772 0.938 44,675,560 – 72.73 5.43
ResNet152V2 232 0.78 0.942 60,380,648 – 107.5 6.64
InceptionV3 92 0.779 0.937 23,851,784 159 42.25 6.86
InceptionResNetV2 215 0.803 0.953 55,873,736 572 130.19 10.02
MobileNet 16 0.704 0.895 4,253,864 88 22.6 3.44
MobileNetV2 14 0.713 0.901 3,538,984 88 25.9 3.83
DenseNet121 33 0.75 0.923 8,062,504 121 77.14 5.38
DenseNet169 57 0.762 0.932 14,307,880 169 96.4 6.28
DenseNet201 80 0.773 0.936 20,242,984 201 127.24 6.67
NASNetMobile 23 0.744 0.919 5,326,716 – 27.04 6.7
NASNetLarge 343 0.825 0.96 88,949,818 – 344.51 19.96
EfficientNetB0 29 – – 5,330,571 – 46 4.91
EfficientNetB1 31 – – 7,856,239 – 60.2 5.55
EfficientNetB2 36 – – 9,177,569 – 80.79 6.5
EfficientNetB3 48 – – 12,320,535 – 139.97 8.77
EfficientNetB4 75 – – 19,466,823 – 308.33 15.12
EfficientNetB5 118 – – 30,562,527 – 579.18 25.29
EfficientNetB6 166 – – 43,265,143 – 958.12 40.45
EfficientNetB7 256 – – 66,658,687 – 1578.9 61.62
learning results, this method usually necessitates more images for training, because the new network needs numerous examples of the object to learn the feature variations. Training from scratch also takes longer, and many layer configurations are possible when building a network from scratch. When designing a network and arranging its layers, it is common to consult previous network designs to see what other researchers have found useful.
Notably, the growth in deep learning publications has been driven by GPUs and GPU-computing libraries (CUDA, OpenCL). GPUs are typically 10-30 times faster than central processing units (CPUs) [35] because they are highly parallel computing devices with a vastly larger number of execution threads than CPUs. In addition to the hardware, the availability of open-source software packages is a crucial element that enables efficient GPU implementations of critical neural network functions. Theano, Torch, TensorFlow, and Caffe are the most often used packages.
3 Application of DL in HCS
Computer-Aided Detection and Diagnosis (CAD) systems depend on medical image analy-
sis and disease detection techniques in healthcare systems, such as classification, segmen-
tation, localization, and object detection. There are several types of medical images; the
comparison is conducted in Table 7 [3, 36] and shown in Fig. 18.
Image segmentation is a well-known area of study in computer vision [37] that has recently gained attention in image processing. It divides the image into several disjoint parts based on features such as grayscale, color, spatial texture, and geometric shape. Additionally, it identifies organs or lesions in medical images at the pixel level [7] and offers crucial details about their volumes and shapes [38]. Besides, it supports 2D or 3D image analysis and processing to segment, extract, reconstruct, and display human organs, soft tissues, and diseased regions in three dimensions [39, 40].
Deep learning techniques for medical image segmentation In the image segmentation field, deep learning algorithms have lately achieved considerable advances, and their segmentation effectiveness has surpassed that of traditional segmentation techniques. The
Table 3 The comparison among Restricted Boltzmann Machine models [17]

1. Deep Belief Networks (DBNs). Characteristics: directed connections in the lower layers, with undirected connections in the top two layers. Benefits: it somewhat prevents poor local optima because it initializes the network properly; training is unsupervised, so no labeled data is needed. Drawbacks: the DBN model is computationally expensive to create.
2. Deep Boltzmann Machines (DBMs). Characteristics: all layers of the network have undirected connections. Benefits: it combines top-down feedback to deal with ambiguous inputs more effectively. Drawbacks: joint optimization is time-consuming.
3. Deep Energy Models (DEMs). Characteristics: deterministic hidden units are used for the lower layers, and stochastic hidden units for the top hidden layer. Benefits: allowing the lower layers to adjust to the training of the higher layers creates better generative models. Drawbacks: the initially learned weights may converge poorly.
Table 4 The comparison among autoencoder-based methods [17]

1. Sparse autoencoder: applies a sparsity penalty to make the representation sparse. Advantages: it categorizes input data more separably; makes complex data more meaningful; compatible with the biological vision system.
2. Denoising autoencoder: restores the proper input data from corrupted data. Advantage: removes noise efficiently.
3. Contractive autoencoder: gives the reconstruction error function an analytical contractive penalty. Advantage: more accurately depicts the local directions of variation determined by the data.
4. Saturating autoencoder: reconstruction error increases when inputs are far from the data manifold. Advantage: restricts the ability to rebuild inputs that are far from the data manifold.
5. Convolutional autoencoder: preserves spatial locality while sharing weights across all input locations. Advantage: utilizes the 2D image structure.
6. Zero-bias autoencoder: utilizes an appropriate shrinkage function to train autoencoders without further regularization. Advantage: greater ability to learn representations on data with high inherent dimensionality.
Table 5 Sparse coding algorithm comparisons

1. Sparse coding SPM (ScSPM): an enhanced version of the Spatial Pyramid Matching (SPM) method. Pros: compared to vector quantization (VQ), it has a less restrictive assignment. Cons: it ignores the mutual dependence of the local features.
2. Laplacian Sparse Coding (LSC): assigns identical features to properly selected cluster centers and ensures that the chosen cluster centers are similar. Pros: enhances similar features to keep the mutual dependency in the sparse coding. Cons: expensive computational cost.
3. Hypergraph Laplacian Sparse Coding (HLSC): extends LSC to the situation in which a hypergraph identifies the similarity between the instances. Pros: it enhances the robustness of sparse coding. Cons: it ignores discriminative information.
4. Local Coordinate Coding (LCC): encourages the coding to be local and theoretically shows that locality is more vital than sparsity. Pros: it has a computational advantage over classical sparse coding. Cons: it needs to solve the L1-norm optimization problem, which is time-consuming.
5. Locality-Constrained Linear Coding (LLC): replaces the L1-norm regularization with L2-norm regularization. Pros: accelerates the process. Cons: expensive computational cost.
6. Super-Vector Coding (SVC): a simple extension of VQ along local tangent directions. Pros: represents a smoother coding scheme. Cons: expensive computational cost.
7. Smooth Sparse Coding (SSC): incorporates neighborhood similarity and temporal information into sparse coding. Pros: lower mean-square reconstruction error. Cons: expensive computational cost.
8. Deep Sparse Coding (DeepSC): extends sparse coding to a multi-layer architecture. Pros: it has the best performance among the sparse coding schemes. Cons: high memory requirements.
Fig. 16 The distribution of CNN, DBN, AE, and RNN deep learning methods applied in the healthcare systems (PubMed, 2012–2022)
Fig. 17 Disease publications as an application of DL methods, based on the PubMed database. Yearly counts (brain tumor / lung cancer / breast cancer / heart failure / Parkinson / eye disease / diabetes):
2012: 1 / 1 / 2 / 1 / 8 / 1 / 1
2013: 1 / 1 / 3 / 1 / 7 / 1 / 1
2014: 1 / 1 / 3 / 1 / 14 / 2 / 2
2015: 3 / 3 / 5 / 1 / 13 / 7 / 3
2016: 8 / 3 / 22 / 1 / 25 / 13 / 14
2017: 13 / 23 / 49 / 9 / 27 / 32 / 34
2018: 48 / 69 / 100 / 18 / 50 / 95 / 63
2019: 97 / 161 / 203 / 25 / 83 / 181 / 126
2020: 181 / 276 / 280 / 44 / 95 / 338 / 218
2021: 236 / 373 / 410 / 98 / 123 / 450 / 313
2022: 331 / 457 / 559 / 111 / 152 / 522 / 411
fully convolutional network [41] was the first to effectively apply deep learning to semantic image segmentation, because it demands neither manual image feature extraction nor extensive image preprocessing. As a result, it has achieved significant success as pioneering work in the field and in auxiliary diagnosis.
Several deep learning algorithms are used in image segmentation, such as DeepLab v1 [42], DeepLab v2, DeepLab v3 [43], DeepLab v3+, SegNet [44], 2D U-Net [45], 3D U-Net [46], Mask R-CNN, RefineNet, and DeconvNet, which have a strong advantage in processing fine edges. The comparison among these algorithms is shown in Table 8. In addition,
Training from scratch: Pros: 1. it gives high control over the network; 2. it produces impressive results; 3. its results can sometimes be greater than those of transfer learning. Cons: 1. it requires more images for training; 2. its training times are often longer.
Transfer learning: Pros: 1. it is faster and easier than training from scratch; 2. it requires a small amount of data and computational power. Cons: 1. it gives less control over the network.
several segmentation methods are used for medical image analysis, including cascaded, multi-modality, single-modality, patch-wise, and semantic-wise approaches, as shown in Table 9.
Several works have demonstrated that medical image segmentation is widely used in the healthcare system for the early detection of various diseases, such as brain tumors, lung cancer, and heart failure, based on the DL techniques in HCS listed above [50–54]. A comparison among recent works on disease detection based on medical image segmentation is depicted in Table 10.
Brain tumor detection Deep learning methods have been extensively employed for brain image processing across many application domains. Numerous studies focus on segmenting brain tissue and anatomical structures (e.g., the hippocampus) and diagnosing Alzheimer's disease (AD). Detecting and segmenting lesions (e.g., tumors, white matter lesions, lacunes, and micro-bleeds) are additional crucial areas [55].
J. Dolz et al. [56] proposed reliable 3D FCNNs for subcortical MRI brain structure segmentation, supporting the detection of brain tumors and AD. H. Allioui et al. [57] detect mental disorders with a U-Net architecture by segmenting 2.5D MRI scans. Jingwen et al. [58] proposed a 3D CNN based on V-Net to segment the bilateral hippocampus from 3D brain MRI scans and diagnose AD progression states. D. Chitradevi et al. [59] segment the hippocampus (HC) area using a variety of optimization approaches, including the genetic algorithm (GA), the lion optimization algorithm (LOA), the artificial bee colony (ABC), the BAT algorithm, and particle swarm optimization (PSO). When these optimization techniques were compared, LOA exceeded the others owing to its ability to escape from local optima.
Furthermore, Diedre Carmo et al. [60] suggested a hippocampus segmentation approach based on a deep learning U-Net, employing a 2D extended multi-orientation design. The method was developed and validated using a public Alzheimer's disease hippocampus segmentation (HarP) dataset. The approach worked well, with a low standard deviation in the overall and left/right Dice scores. Finally, Sammaneh et al. [61] created a robust automatic atlas-based CNN hippocampal segmentation method named DeepHarp for hippocampus delineation, developed and tested on the ADNI harmonized hippocampal protocol (HarP). Moreover, the left and right hippocampi are segmented by Helaly et al. [62] for early detection of AD and brain tumors based on NITRIC
Magnetic resonance imaging (MRI): creates accurate and detailed images of internal organs and tissues using strong magnets and radio-frequency (RF) waves.
T1-weighted imaging (MRI-T1WI): one of the simplest MRI pulse sequences; it displays differences in tissue T1 relaxation times.
T2-weighted imaging (MRI-T2WI): highlights variations in tissue T2 relaxation times.
Fluid-attenuated inversion recovery (MRI-FLAIR): an MRI sequence with an inversion recovery set to null fluids.
Diffusion-weighted imaging (MRI-DWI): demonstrates the strength of diffusion molecular motions inside a tissue structure or at the edges of white and grey matter brain tissues and brain lesions.
Diffusion tensor imaging (MRI-DTI): rather than simply assigning contrast or color to pixels in a cross-sectional image, DTI creates neural-tract images by detecting the restricted diffusion of water in tissue.
Positron emission tomography (PET): together with diffusion weighting, it provides greater soft-tissue contrast and a way to measure cellular density.
Computed tomography (CT): produces cross-sectional (tomographic) images using computer-processed combinations of numerous X-ray measurements obtained from different angles.
and ADNI datasets. The dataset is processed using the MIPAV program and augmented with DCGAN. The methods depend on transfer learning and the U-Net architecture.
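The Dice overlap used to evaluate the hippocampus segmentations discussed above can be computed with a few lines of NumPy. This is an illustrative sketch of the standard metric, not the implementation used by any of the cited works.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0                          # both masks empty: perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Toy 2x3 predicted and ground-truth masks:
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(a, b)                    # 2*2 / (3+3) = 0.666...
```

A Dice score of 1.0 means perfect overlap between the predicted and ground-truth segmentations, and 0.0 means no overlap at all.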
Lung cancer detection For both males and females, lung cancer is the leading cause of cancer-related death [63–65]. Yeganeh et al. [66] suggested a modified U-Net (Res BCDU-Net) to automatically segment lung CT images, replacing the encoder with a pre-trained ResNet-34 network. Kamel et al. [67] applied an FCN to CT cancer images from the Task06 Lung database. The FCN architecture was inspired by the V-Net architecture for its efficiency in selecting a region of interest (ROI) using 3D segmentation.
Fully Convolutional Neural Networks (FCN)
• It is the most powerful and effective deep-learning technology for semantic segmentation.
• It is proposed by J. Zhuang et al. [41].

DeepLab v1
• It is proposed by Chen et al. [42].
• The resulting score map is denser than that of FCN, and the size of the pooled image is not reduced.
• The padding size was reduced from the original 100 to 1, and the pooling stride was adjusted from the original 2 to 1.

DeepLab v2
• It is an enhanced version of DeepLab v1.
• It overcomes the segmentation challenge caused by scale differences of the same object within an image.

DeepLab v3
• It uses the ResNet-101 network.
• A cascaded or parallel atrous convolution module is developed to address the challenge of multiscale target segmentation [43].

SegNet
• To achieve end-to-end pixel-level image segmentation, SegNet [44] constructs an encoder-decoder symmetric structure based on the semantic segmentation task of FCN.
• It comprises two parts: an encoder that parses object information and a decoder that maps the parsed information onto the final image form.

U-Net
• It is proposed by Ronneberger et al. [45] as a network for biomedical images.
• Due to its excellent performance, it is used in various sub-fields of computer vision (CV), such as image segmentation.
• It is composed of a U-channel and skip connections.
• The U-channel is similar to the encoder-decoder structure of SegNet.
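The U-channel with skip connections described above can be sketched in plain numpy. This is a toy illustration with hypothetical helper names; a real U-Net uses learned convolutions at every stage, not the fixed pooling and nearest-neighbour upsampling shown here:

```python
import numpy as np

def downsample(x):
    """2x2 max pooling over a (C, H, W) feature map (contracting path)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map (expanding path)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_like(x):
    """Toy U-channel: encode, decode, then concatenate the skip connection."""
    skip = x                    # encoder feature map kept for the skip path
    bottleneck = downsample(x)  # contracting path halves the spatial size
    up = upsample(bottleneck)   # expanding path restores the spatial size
    # skip connection: channel-wise concatenation, as in U-Net
    return np.concatenate([skip, up], axis=0)

feat = np.random.rand(8, 64, 64)  # 8 channels, 64x64 feature map
out = unet_like(feat)
print(out.shape)                  # (16, 64, 64): skip channels + upsampled channels
```

The channel-wise concatenation is the key design choice: it lets the decoder reuse fine spatial detail from the encoder that pooling would otherwise discard.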
Heart failure detection Early detection of congestive heart failure (CHF), a progressive and complicated syndrome caused by ventricular dysfunction, is difficult. Meng Lei et al. [68] suggested heart rate variability (HRV) as a predictive biomarker for CHF. Building on the success of 2-D UNet++ in medical image segmentation, they demonstrated that deep learning-based HRV evaluation can be an effective tool for the early diagnosis of CHF and may assist doctors in making prompt and accurate diagnoses. More training data is required for a more robust diagnosis, particularly for the many heart rhythm disorders. CardioXAttentionNet is suggested by Innat [69] to classify and localize cardiomegaly effectively.
Breast Cancer detection Breast cancer is one of the critical health issues for women worldwide and one of the most severe threats to women's health [70–74]. Successful treatment of breast cancer depends on early diagnosis of the disease [2, 75]. Based on the U-Net deep learning architecture, Rania Almajalid et al. [76] presented a novel segmentation framework for breast ultrasound images. The framework is utilized to detect and classify breast abnormalities. The dataset was very small, so data augmentation techniques were applied to expand it.
COVID‑19 Coronavirus disease, also known as COVID-19, was initially identified in Wuhan, China, in December 2019 and quickly spread around the world [77–79]. Sanika et al. [80] presented a computed tomography (CT) segmentation approach for lung images using the U-Net architecture to detect COVID-19. The limited dataset was an obstacle, so data augmentation techniques were applied to overcome this problem.
Rachna Jain et al. [81] used U-Net, an encoder-decoder network, along with ResNet-34 to detect COVID-19. The proposed method depended on the transfer learning concept. To identify COVID-19, Abhijit Bhattacharyya et al. [82] used a conditional generative adversarial network (C-GAN) to obtain the lung images and detect the disorder. The authors put several U-Net topologies to the test; however, the C-GAN model produced the best results among the tested supervised learning methods.
Table 10 contrasts recent related works that used deep-learning approaches for healthcare, especially the detection of several diseases. The comparison covers the dataset, techniques, contributions, and limitations of each state-of-the-art work. Finally, Table 10 compares the results of each paper according to the different performance metrics used in each one.
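The performance metrics reported throughout these comparisons, such as accuracy, sensitivity, specificity, and the Dice similarity coefficient, can all be computed directly from binary predictions and ground truth. A minimal numpy sketch (function names are illustrative, not from any cited work):

```python
import numpy as np

def dice(pred, target):
    """Dice similarity coefficient (DSC) between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())

def confusion_metrics(pred, target):
    """Accuracy, sensitivity (recall), and specificity from binary labels."""
    tp = np.sum((pred == 1) & (target == 1))
    tn = np.sum((pred == 0) & (target == 0))
    fp = np.sum((pred == 1) & (target == 0))
    fn = np.sum((pred == 0) & (target == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

pred = np.array([1, 1, 0, 0, 1, 0])     # toy predictions
target = np.array([1, 0, 0, 0, 1, 1])   # toy ground truth
print(round(dice(pred, target), 3))     # 0.667
acc, sens, spec = confusion_metrics(pred, target)
```

Note that Dice weights true positives twice and ignores true negatives, which is why segmentation papers favour it over plain accuracy when the foreground region is small.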
Medical image classification is one of the most vital concerns in image recognition. Its target is categorizing medical images into distinct groups to aid clinicians in disease diagnosis and research [5]. It has been widely studied in healthcare systems and disease detection and involves several issues and challenges.
Table 10 Comparison of recent works for disease detection based on medical image segmentation

Brain tumor:
• J. Dolz et al. 2018 [56]. Datasets: ABIDE, ISBR. Technique: 3D FCNNs. Contributions: the method is robust; the network is less prone to overfitting. Limitations: high computational complexity; high memory requirements. Results: DSC (Dice similarity coefficient) = 92%.
• H. Allioui et al. 2019 [57]. Dataset: OASIS. Technique: U-Net. Contributions: it benefits from a 3D architecture; it reduces complexity and computational costs. Limitations: the network was trained from scratch and did not benefit from transfer-learning concepts. Results: Accuracy = 92.71%, Sensitivity = 94.43%, Specificity = 91.59%.
• Jingwen et al. 2020 [58]. Dataset: ADNI. Technique: 3D CNN based on V-Net. Contributions: the model performed well in the three-category classification task of pathological brain states; it segmented the bilateral hippocampus accurately. Limitations: applied to a small dataset; computational complexity when dealing with 3D images; sample numbers of the three AD-progression categories are unbalanced. Results: DSC = 0.9162 ± 0.023.
• D. Chitradevi et al. 2020 [59]. Dataset: hospital images. Technique: optimization techniques including GA, ABC, BAT, PSO, and LOA. Contributions: the system does not involve highly complex computations or hardware implementations; the LOA gave higher performance than the others. Limitations: the system is not applied to mild cognitive impairment, which would allow the doctor to detect AD early. Results: Accuracy = 95%, Sensitivity = 94%, Specificity = 93%.
• Diedre Carmo et al. 2021 [60]. Dataset: HarP. Technique: U-Net. Contributions: it gave a precise performance on the public HarP hippocampus segmentation benchmark. Limitations: low standard deviation between overall Dice and left/right Dice; the method was not ready to treat hippocampus resection due to epilepsy treatment. Results: DSC = 90%.
• Samaneh et al. 2021 [61]. Dataset: HarP. Technique: DeepHarp, based on a CNN for hippocampus segmentation. Contributions: the method was robust and highly accurate, aiding atrophy measurements in various pathologies. Limitations: the model was built from scratch and did not use the transfer-learning concept. Results: DSC = 88%.
• Helaly et al. 2022 [62]. Datasets: ADNI, NITRIC. Technique: U-Net. Contributions: it offers superior accuracy, sensitivity, specificity, and Dice-similarity-coefficient performance. Limitations: it applies binary segmentation and does not utilize multi-class segmentation of MRI brain features of Alzheimer's disease. Results: Accuracy = 97%, Dice similarity coefficient = 94%.

Lung cancer:
• Yeganeh et al. 2021 [66]. Dataset: LIDC-IDRI, which involves lung cancer CT scans. Technique: Res BCDU-Net (modified U-Net). Contributions: it produces all mask images intelligently, without needing a radiologist's expertise, saving much time. Limitations: the proposed method is not applied to 3D lung CT images. Results: Dice coefficient index = 97.31%.
• Kamel et al. 2021 [67]. Dataset: Task06_Lung database (96 CT images with marked-up annotated tumors). Technique: FCN inspired by the V-Net architecture. Contributions: the method is less prone to overfitting; it achieves high performance in 3D lung segmentation. Limitations: the robustness of the 3D V-Net architecture should be extended to produce a useful clinical system for lung diseases. Results: the average DSC is 80% for the ROI and 98% for the surrounding lung tissues.

Heart failure:
• Meng Lei et al. [68]. Dataset: two open-source databases. Technique: 2-D UNet++. Contributions: the proposed method has provided promising results. Limitations: it is applied to few training data; other heart rhythm abnormalities are required for a more reliable diagnosis. Results: Accuracy = 85.64%, 86.65%, and 88.79% when 500, 1000, and 2000 RR intervals are utilized, respectively.

Breast cancer:
• Rania Almajalid et al. [76]. Dataset: collected by doctors from the Second Affiliated Hospital of Harbin Medical University in China. Technique: U-Net deep learning architecture. Contributions: the proposed method is robust and improves state-of-the-art performance. Limitations: the technique needs to be evaluated on new datasets. Results: Dice coefficient = 0.825, similarity rate = 0.698.

COVID-19:
• Sanika et al. [80]. Dataset: collected from various organizations, such as Radiopedia (licensed under CC BY-NC-SA) and the Corona Cases Initiative. Technique: U-Net architecture. Contributions: the model did not suffer from overfitting. Limitations: limited dataset. Results: DSC = 89%, Precision = 85%, Recall = 88%.
• Abhijit Bhattacharyya et al. [82]. Dataset: publicly available chest radiograph (SCR) X-ray images. Technique: C-GAN. Contributions: it applied transfer-learning concepts rather than training networks from scratch, which gave good outcomes. Limitations: the proposed method was trained and tested on a small dataset. Results: Accuracy = 96.6%.
Brain tumor detection (Alzheimer’s disease) Payan et al. [83] employed a sparse autoencoder and 3D convolutional neural networks to detect brain tumors using MRI images. Performance is expected to increase by fine-tuning its convolutional layers [84]. Sarraf et al. [85] classified healthy (HC) from unhealthy brains using the LeNet-5 CNN architecture. The work provided in [83] was then developed further by Hosseini et al. [86].
For AD detection, many researchers competed for the early diagnosis of the disease and the determination of its stages, such as mild cognitive impairment (MCI), early MCI (EMCI), and late MCI (LMCI). Wang et al. [24] utilized an eight-layer CNN structure: six layers served the feature-extraction process, and the other two fully connected layers served the classification process. Khvostikov et al. [87] employed a 3D Inception-based CNN with better performance than AlexNet [88]. Sahumbaiev et al. [89] developed the HadNet design. For improved training, the collection of MRI images is spatially normalized and skull-stripped using the Statistical Parametric Mapping (SPM) toolbox. The Apolipoprotein E expression level 4 (APOe4) model was proposed by Spasov et al. [90]. The APOe4 model was fed data from MRI scans, genetic tests, and clinical evaluations.
Unique CNN architectures were proposed by Wang et al. [91], Ge et al. [92], Song et al. [93], and Liu et al. [94], based on different MRI models, to detect Alzheimer’s disease and classify its stages. Based on the transfer learning concept, Khagi et al. [95] proposed shallow tuning of pre-trained models such as AlexNet, GoogLeNet, and ResNet50. Moreover, Jain et al. [96] suggested the PFSECTL mathematical model, which depends on VGG-16 pre-trained models. Finally, a multi-task CNN and the 3D Densely Connected Convolutional Networks (3D DenseNet) models were combined to classify the disease status by Liu et al. [97].
For neurodegenerative dementia diagnosis, Impedovo et al. [98] proposed a cognitive
model for assessing the link between cognitive functions and handwriting processes in
healthy people and patients with cognitive impairment. Four stages of Alzheimer’s disease
are classified by Harshit et al. [99] using 3D CNN architecture based on 4D FMRI images.
Furthermore, Silvia et al. [100] and Dan et al. [101] detected different Alzheimer’s disease stages based on novel CNN structures and 3D MRI images. In addition, Juan Ruiz et al. [102] provided 3D Densely Connected Convolutional Networks (3D DenseNets) for 4-way classification. The comparison among recent works that used MIC in HCS for brain tumor and Alzheimer’s disease (AD) detection is listed in Table 11.
Lung Cancer detection Lung cancer is a high-risk disease that affects people all over the
world. Lung nodules are the most common early lung cancer symptom. Automatic lung
nodule detection reduces radiologists’ workload, the rate of misdiagnosis, and missed diag-
noses [103–105].
Zhiqiang Guo et al. [106] suggested a lung cancer diagnosis system based on computed tomography scan images. It consecutively employed two effective strategies to find efficient results: a CNN-based classifier and a feature-based classifier. The case study is healthy if the feature-based method does not detect cancer; otherwise, the case study is cancerous, as shown in Fig. 19, which displays various samples of the Lung CT-Diagnosis dataset.
Moreover, Ying Su et al. [107] presented a Faster R-CNN algorithm for detecting these lung nodules. Figure 20 depicts the proposed algorithm’s whole pipeline. The method used the ZF and VGG16 models as the basic feature extractors for the training and testing steps.
Breast Cancer detection Breast cancer begins when breast cells become malignant and form cancerous lesions. Self-tests and regular medical examinations significantly aid diagnosis, effectively enhancing survival chances [73, 109]. Jing Zheng et al. [110] suggested a Deep Learning-assisted Efficient Adaboost Algorithm (DLA-EABA) for breast cancer diagnosis using advanced computational approaches. The AdaBoost technique generated the ensemble classifier’s final prediction function. Figure 21 illustrates the whole suggested architecture in [110].
Nur Syahmi Ismail et al. [111] employed the VGG16 and ResNet50 deep learning models to classify normal and abnormal tumors using the IRMA dataset and compared the results of the two models. The suggested method included image preprocessing, classification, and performance evaluation.
COVID-19 Based on a deep neural network model (ResNet-50), Walaa Gouda et al. [112] proposed two distinct DL methods for COVID-19 prediction using chest X-ray (CXR) images. The suggested approaches are assessed against two publicly available benchmark datasets often used by researchers: the COVID-19 Image Data Collection (IDC) and CXR Images (Pneumonia). Figure 22 illustrates the suggested method in [112].
A dataset of 10,040 chest X-ray (CXR) image samples, of which 2143 had COVID-19, 3674 had pneumonia (but not COVID-19), and 4223 were normal (neither COVID-19 nor pneumonia), was used by Somenath Chakraborty et al. [113] to detect COVID-19. The method enabled radiologists to filter potential candidates in a time-effective manner to detect COVID-19.
Table 12 compares recent works published on heart failure, lung cancer, breast disease, and COVID-19 using DL based on medical image classification. The comparison focuses on the dataset, techniques, advantages, and constraints of each state-of-the-art work. Finally, the table compares the results of each paper according to the different performance metrics used in each one, such as accuracy, recall, sensitivity, precision, F1-score, and specificity.
Figure 23 compares the recent works published on heart failure, lung cancer, breast disease, and COVID-19 based on MIC according to the accuracy metric.
Object detection is distinct from, but closely related to, the image classification task. In image classification, the whole image is utilized as the input, and the class label of the objects within the image is predicted. Object detection, besides reporting the presence of a given class, estimates the position of the instance (or instances).
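A standard way to score such position estimates is the intersection over union (IoU) between a predicted bounding box and a ground-truth annotation. A minimal sketch with toy coordinates (the function name is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) axis-aligned boxes."""
    # coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# e.g. a predicted nodule box vs. a ground-truth annotation (toy values)
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
```

Detectors such as Faster R-CNN typically count a prediction as correct only when its IoU with the ground truth exceeds a threshold (often 0.5).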
Table 11 Comparison of recent works that used MIC in HCS for AD detection

• Payan et al. [83]. Technique: sparse autoencoders and 3D-CNN. Advantages: it combines sparse autoencoders and convolutional neural networks. Drawbacks: computational complexity at the training stage; it was pre-trained but not fine-tuned, and fine-tuning is predicted to enhance performance. Results: AD vs. MCI: 86.84%; HC vs. MCI: 92.11%; AD vs. EMC vs. HC: 89.47%; AD vs. HC: 95.39%.
• Sarraf et al. [85]. Technique: CNN and LeNet-5. Advantages: it successfully classified Alzheimer’s subjects from normal controls with high accuracy; its unique architecture allows researchers to perform feature selection and classification. Drawbacks: the method is not generalized to predict different stages of Alzheimer’s disease for different age groups. Results: AD vs. HC: 98.84%.
• Hosseini-Asl et al. [86]. Technique: 3D-CNN built upon a 3D convolutional autoencoder. Advantages: the method provided high robustness and confidence for the AD predictions. Drawbacks: it employs only a single imaging modality (sMRI); it performs no prior skull-stripping preprocessing. Results: AD/MCI: 95%; MCI/NC: 90.8%; AD + MCI/NC: 90.3%; AD/NC: 97.6%; AD vs. EMC vs. HC: 89.1%.
• Korolev et al. [20]. Technique: residual and plain 3D-CNN. Advantages: it proved how similar performance could be achieved while skipping feature-extraction steps; ease of use; no need for handcrafted feature generation. Drawbacks: the method is not generalized to predict multi-classification stages of Alzheimer’s disease for different age groups. Results: LMCI vs. NC: 61%; LMCI vs. EMCI: 52%; EMCI vs. NC: 56%; AD vs. NC: 80%; AD vs. EMCI: 63%; AD vs. LMCI: 59%.
• Wang et al. [24]. Technique: CNN. Advantages: compared to state-of-the-art techniques, the classification accuracy improved by around 5%. Drawbacks: it needs to apply transfer learning, which can handle a small dataset more efficiently; the hyperparameters were obtained by experience, so a random search method should be tested to optimize them. Results: AD/NC: 97.65%.
• Khvostikov et al. [87]. Technique: 3D Inception-based CNN. Advantages: the 3D Inception-based CNN performs much better than the traditional AlexNet-based network on data from the ADNI dataset. Drawbacks: it concentrated exclusively on the ROI of the hippocampus biomarker; it is important to include other ROIs that deteriorate from AD. Results: MCI vs. NC: 73.3%; AD vs. MCI vs. NC: 68.9%; AD vs. NC: 93.3%; AD vs. MCI: 86.7%.
• Sahumbaiev et al. [89]. Technique: 3D CNN. Advantages: the trained classifier gives promising classification results in distinguishing between AD, MCI, and HC; applying Bayesian optimization boosted hyperparameter and activation-function tuning. Drawbacks: the developed classifier used the whole MR image based on learned features and did not use segmented brain regions; the sensitivity and specificity of HadNet are predicted to improve. Results: AD/MCI/NC: 88.31%.
• Spasov et al. [90]. Technique: the APOe4 CNN model. Advantages: it is effective in reducing overfitting, computational complexity, memory requirements, and prototyping time through its use of parameters; since each epoch takes only 20 to 30 seconds on an Nvidia PASCAL TITAN X GPU, it is less prone to overfitting and quick to fine-tune and prototype. Drawbacks: the method is not generalized to predict different stages of Alzheimer’s disease for other age groups. Results: AD/NC: 99%.
• Yan Wang et al. [91]. Technique: a multimodal deep learning framework based on a CNN. Advantages: the method combines automatic hippocampal segmentation and AD classification using structural MRI data; it achieved a higher classification accuracy. Drawbacks: instead of the 2D approach, 3D convolution should be tried, since it could produce improved performance. Results: AD/aMCI/NC: 92.06%.
• Khagi et al. [95]. Technique: shallow tuning and fine-tuning of pre-trained models (AlexNet, GoogLeNet, and ResNet50). Advantages: the results show that performance is better when tuning most layers; increasing the depth of the learning model does not always result in good performance. Drawbacks: training time increases with the number of layers to tune. Results: AD/NC: 98.51%.
• Jain et al. [96]. Technique: the PFSECTL mathematical model. Advantages: it achieves high accuracy for the three-way classification; it uses the transfer-learning concept. Drawbacks: MCI is the most difficult class to classify, since it is an intermediate stage between AD and CN; overall performance could be improved by fine-tuning; the amount of data used is not sufficient. Results: AD vs. MCI: 99.30%; MCI vs. CN: 99.22%; AD vs. MCI vs. NC: 95.73%; AD vs. CN: 99.14%.
• Manhua et al. [97]. Technique: multi-model deep CNNs for jointly learning hippocampus segmentation and disease classification, evaluated on structural MRI data. Advantages: it achieved promising performance; the framework outputs the disease status and provides the hippocampus segmentation result. Drawbacks: computational complexity. Results: DSC = 87.0% for hippocampal segmentation; area under the ROC curve = 92.5% for classifying AD vs. NC subjects.
• Ge, C., & Qu, Q. et al. [92]. Technique: a multiscale deep learning architecture for learning AD features. Advantages: it proposes a feature fusion and enhancement strategy for multiscale features; the method is effective and achieves high accuracy. Drawbacks: the dataset is small, and it is suggested to use large datasets from augmented or measured data to further improve performance; the method was applied only to the NC and AD classes, not to MCI. Results: AD/NC: 98.80%.
• Song et al. [93]. Technique: a multi-class graph convolutional neural network (GCNN) classifier. Advantages: the method is implemented on four classes of the AD spectrum; the GCNN classifier outperforms SVM by margins reliant on the disease category. Drawbacks: the dataset is small, and it is suggested to use large datasets from augmented or measured data to further improve performance; transfer-learning techniques are not applied. Results: AD/EMCI/LMCI/NC: 89%.
• Silvia et al. [100]. Technique: a CNN-based deep learning algorithm for predicting the individual diagnosis of AD. Advantages: it distinguishes AD, c-MCI, and s-MCI with high performance; high levels of accuracy were achieved in all the classifications. Drawbacks: the method applies only binary classification, not multi-class classification. Results: AD vs. c-MCI: 75.4%; AD vs. s-MCI: 85.9%; c-MCI vs. s-MCI: 75.1%; AD vs. HC: 99.2%; c-MCI vs. HC: 87.1%; s-MCI vs. HC: 76.1%.
• Harshit et al. [99]. Technique: a modified 3D CNN applied to resting-state fMRI data for feature extraction and classification of AD. Advantages: the method is simple and accurate; it uses the 4D fMRI data with much less preprocessing, preserving spatial and temporal information. Drawbacks: computational complexity and memory requirements for training on and processing 4D fMRI. Results: AD/EMCI/LMCI/NC: 93%.
Fig. 20 Faster R-CNN detection process for lung cancer detection in [107]
Object detection and localization have been widely studied in healthcare systems and disease detection, and they involve several issues and challenges. The comparison among classification, localization, detection, and segmentation is shown in Table 13 and Fig. 24.
Although deep learning has outperformed machine learning in the medical and health fields, some obstacles and issues remain. The following problems and obstacles are highlighted, along with some solutions.
4.1 Data insufficiency
Deep learning is a data-driven method. In general, neural networks have many parameters that must be learned, updated, and refined from data. Many data-rich applications, such as image classification, natural language processing, and computer vision, have achieved impressive results with deep learning. However, medical databases in the healthcare field are frequently limited and unbalanced. Therefore, deep learning applications in this field remain a challenge.
A lack of data constrains deep learning parameter optimization and results in overfitting: the learned model works well on the training set but performs badly on data it has never seen before. As a result, the model’s power of generalization is limited. The common solutions to the overfitting problem are dropout and regularization methods. Moreover, data augmentation techniques such as translation, rotation, clipping, scaling, and contrast changes can be applied to generate new images and expand the dataset. A third effective solution is applying transfer learning.
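A minimal numpy sketch of such augmentation, assuming image intensities scaled to [0, 1] (the helper name and the exact set of transforms are illustrative choices, not from any cited work):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Generate simple label-preserving variants of one 2D image."""
    return [
        np.fliplr(image),                                             # horizontal flip
        np.flipud(image),                                             # vertical flip
        np.rot90(image),                                              # 90-degree rotation
        np.clip(image * 1.2, 0.0, 1.0),                               # contrast increase
        np.clip(image + rng.normal(0, 0.01, image.shape), 0.0, 1.0),  # mild noise
    ]

scan = rng.random((64, 64))   # stand-in for one grayscale slice
augmented = augment(scan)
print(len(augmented) + 1)     # 6: each original image yields five extra samples
```

For medical images, transforms must preserve diagnostic content: a horizontal flip of a symmetric brain slice is usually safe, while aggressive cropping or rotation of an annotated lesion may not be.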
Multimodal learning is designed to simultaneously learn several sorts of data based on
the properties of various data types, such as electronic health records, medical images, and
genetic data. As a result, the model’s abilities are improved via multimodal learning.
4.2 Model interpretability
Because deep learning cannot explain itself, it is considered a “black box.” The lack of interpretability may not be a concern in some applications, such as image recognition. However, model interpretability is vital in healthcare: doctors will trust a model’s results and make wise and effective treatment decisions only if the model supplies enough trustworthy information. Furthermore, a fully interpretable model can offer a thorough comprehension of patients.
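One simple post-hoc probe of such a black-box image model is occlusion sensitivity: mask one patch of the input at a time and record how much the model’s score drops, so regions with large drops are the ones the model relies on. A toy sketch (the “model” here is a hypothetical stand-in scoring function, not a trained network):

```python
import numpy as np

def occlusion_map(model, image, patch=8):
    """Score drop when each patch is zeroed; a larger drop means the patch matters more."""
    base = model(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0  # mask one region
            heat[i // patch, j // patch] = base - model(occluded)
    return heat

def toy_model(img):
    """Stand-in scorer: mean intensity of the centre region of the image."""
    return img[24:40, 24:40].mean()

img = np.ones((64, 64))
heat = occlusion_map(toy_model, img)
# patches overlapping the centre region show the largest score drops
```

The same loop works with a real classifier by replacing `toy_model` with the network’s predicted probability for the class of interest; the resulting heat map can be overlaid on the scan as visual evidence for clinicians.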
4.3 Data privacy

In the medical and health fields, data privacy is crucial. Patient data misuse, abuse, and incorrect usage would have disastrous consequences. Deep learning training needs a large number of representative datasets. These databases are helpful, but they may also be extremely sensitive.
Many researchers in the computational medicine field have generated and publicly published deep learning models for others to utilize. These models may have many parameters that encode sensitive data. People with malicious motives may methodically create strategies to attack these models: they can infer the model’s parameters and sensitive data from the dataset, infringing on the model’s and the patients’ privacy.
Some users upload their data to the cloud to address the privacy problem, making it accessible to any researcher. However, this presents difficulties for deep learning when using cloud computing to process data from several data owners.
4.4 Heterogeneity
The data in the healthcare field is widely heterogeneous, hindering the generation of an effective deep-learning model. Furthermore, data in the healthcare field is often noisy, high-dimensional, and of poor quality.
There are two types of data: structured and unstructured. Neural network input data must be processed and translated into numerical values. Therefore, when training neural networks, researchers should address how to manipulate and preprocess structured and unstructured biomedical data effectively. As a result, deep learning in the medical field faces a barrier in processing these data.
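A minimal sketch of that preprocessing step, turning mixed structured records into a numeric matrix via one-hot encoding of categorical fields and min-max scaling of numeric fields (the field names and records are hypothetical):

```python
import numpy as np

# hypothetical patient records mixing categorical and numeric fields
records = [
    {"sex": "F", "age": 71, "scan": "MRI"},
    {"sex": "M", "age": 64, "scan": "CT"},
    {"sex": "F", "age": 58, "scan": "CT"},
]

def encode(records, categorical, numeric):
    """One-hot encode categorical fields and min-max scale numeric fields."""
    vocab = {f: sorted({r[f] for r in records}) for f in categorical}
    rows = []
    for r in records:
        row = []
        for f in categorical:   # one-hot part: one column per observed value
            row += [1.0 if r[f] == v else 0.0 for v in vocab[f]]
        for f in numeric:       # scaled numeric part: map range to [0, 1]
            lo = min(x[f] for x in records)
            hi = max(x[f] for x in records)
            row.append((r[f] - lo) / (hi - lo))
        rows.append(row)
    return np.array(rows)

X = encode(records, categorical=["sex", "scan"], numeric=["age"])
print(X.shape)   # (3, 5): two sex values + two scan values + one scaled numeric
```

Real pipelines must also handle missing values and unseen categories at inference time, which is where the noise and poor quality of clinical data become a practical obstacle.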
Table 12 The recent works published in heart failure, lung cancer, breast disease, and COVID-19 using DL based on MIC

Lung cancer:
• Zhiqiang Guo et al. [106]. Dataset: Lung CT-Diagnosis. Technique: CNN. Advantages: the method achieves promising performance. Drawbacks: other types of cancer scans, such as MRI and X-ray images, are not considered. Results: Accuracy = 95.96%, Recall = 97.10%, F1-score = 97.10%, ROC = 97%.
• Ying Su et al. [107]. Dataset: LIDC-IDRI database (1018 patients with CT images). Technique: R-CNN. Advantages: the proposed method optimizes and enhances a faster R-CNN model. Drawbacks: parameter optimization is required to enhance the model. Results: Accuracy = 91.2%.

Heart failure:
• Ahmed S. Eltrass et al. [108]. Datasets: MIT-BIH ARR, MIT-BIH NSR, and BIDMC CHF. Technique: AlexNet. Advantages: it achieves highly accurate ECG multi-class classification using low-to-medium hardware requirements; it uses low computational power. Drawbacks: a small dataset was applied; no data augmentation techniques were used. Results: Accuracy = 98.82%, Sensitivity = 98.87%, Specificity = 99.21%, Precision = 99.20%.

Breast cancer:
• Jing Zheng et al. [110]. Dataset: conducted on the most available data from the Internet. Technique: CNN. Advantages: the method accurately detects breast cancer masses and increases the patient’s survival rate. Drawbacks: limited collected dataset. Results: Accuracy = 97.2%, Sensitivity = 98.3%, Specificity = 96.5%.
• Nur Syahmi Ismail et al. [111]. Dataset: IRMA. Technique: transfer learning (VGG16, ResNet50). Advantages: it provides promising accuracy. Drawbacks: the abnormal images were not classified into malignant or benign tumors. Results: VGG16 accuracy = 94%, ResNet50 accuracy = 91.7%.

COVID-19:
• Walaa Gouda et al. [112]. Datasets: COVID-19 Image Data Collection (IDC) and CXR Images (Pneumonia). Technique: ResNet-50. Advantages: high reliability and performance of the method. Drawbacks: the effectiveness of the suggested approach has to be tested on a large and difficult dataset containing many COVID-19 cases. Results: Accuracy = 99.63%, Precision = 100%, Recall = 98.89%, F1-score = 99.44%, AUC = 100%.
• Somenath Chakraborty et al. [113]. Dataset: collected from the Internet, including the Kaggle and GitHub websites. Technique: VGG16. Advantages: it enables radiologists to filter potential candidates in a time-effective manner to detect COVID-19; high-performance model. Drawbacks: unbalanced collected dataset; lack of validation of the program in a different setting or context. Results: Accuracy = 96.43%, Sensitivity = 93.68%; ROC curve = 99% for COVID-19, 97% for pneumonia (but not COVID-19 positive), and 98% for normal cases.
Fig. 23 Comparison between the recent works published in heart failure, lung cancer, breast disease, and COVID-19 based on MIC according to the accuracy metric
4.6 Hyperparameter optimization
5 Conclusion
The healthcare sector is distinct from other sectors. It is a high-priority area where people expect the highest levels of care and service, regardless of cost. Deep learning provides promising and accurate results that solve traditional artificial intelligence issues. Therefore, numerous deep learning algorithms have been suggested for use in healthcare. Our review offers a comprehensive overview of deep learning in the healthcare sector. It highlights the contributions and limitations of recent research papers in this sector. Moreover, it overviews several deep learning models and their most recent developments. It also reviews how deep learning is applied in several medical activities.
6 List of limitations
Data availability Data sharing does not apply to this article as no datasets were generated or analyzed dur-
ing the current study.
Declarations
Conflict of interest The authors certify that they have NO affiliations with or involvement in any organiza-
tion or entity with any financial or non-financial interest in the subject matter or materials discussed in this
manuscript.
Ethical approval This article contains no studies with human participants or animals performed by authors.
References
1. Baker RE et al. (2021) “Infectious disease in an era of global change,” Nat Rev Microbiol, vol.
0123456789, https://doi.org/10.1038/s41579-021-00639-z
2. Wang J, Zhu H, Wang SH, Zhang YD (2021) A review of deep learning on medical image analy-
sis. Mob Networks Appl 26(1):351–380. https://doi.org/10.1007/s11036-020-01672-7
3. Segato A, Marzullo A, Calimeri F, De Momi E (2020) Artificial intelligence for brain diseases: a
systematic review. APL Bioeng 4(4). https://doi.org/10.1063/5.0011697
4. Dev A, Sharma A, Agarwal SS (2021) Artificial intelligence and speech Technology https://doi.
org/10.1201/9781003150664
5. Lai Z, Deng H (2018) “Medical image classification based on deep features extracted by deep
model and statistic feature fusion with multi-layer perceptron,” Comput Intell Neurosci, vol. 2018,
https://doi.org/10.1155/2018/2061516
6. Coan LJ et al (2023) Automatic detection of glaucoma via fundus imaging and artificial intelli-
gence: a review. Surv Ophthalmol 68(1):17–41. https://doi.org/10.1016/j.survophthal.2022.08.005
7. Hesamian MH, Jia W, He X, Kennedy P (2019) Deep learning techniques for medical image seg-
mentation: achievements and challenges. J Digit Imaging 32(4):582–596. https://doi.org/10.1007/
s10278-019-00227-x
8. Shamshirband S, Fathi M, Dehzangi A, Chronopoulos AT, Alinejad-Rokny H (2021) A review
on deep learning approaches in healthcare systems: Taxonomies, challenges, and open issues. J
Biomed Inform 113(August 2020):103627. https://doi.org/10.1016/j.jbi.2020.103627
9. Shen D, Wu G, Suk H-I (2017) Deep learning in medical image analysis. Annu Rev Biomed Eng
19:221–248. https://doi.org/10.1146/annurev-bioeng-071516-044442
10. Ogrean V, Dorobantiu A, Remus B (2021) Deep learning architectures and techniques for multi-
organ segmentation. Int J Adv Comput Sci Appl 12(1). https://doi.org/10.3791/1700
11. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/
10.1038/nature14539
12. Goebel R (2022) Series Editors. https://doi.org/10.5771/9783748924418-207
13. Hatcher WG, Yu W (2018) A survey of deep learning: platforms, applications and emerging research trends. IEEE Access 6:24411–24432. https://doi.org/10.1109/ACCESS.2018.2830661
14. Lin CL, Wu KC (2023) Development of revised ResNet-50 for diabetic retinopathy detection.
BMC Bioinf 24(1):157. https://doi.org/10.1186/s12859-023-05293-1
15. Hassan E, Shams MY, Hikal NA, Elmougy S (2022) The effect of choosing optimizer algorithms to improve computer vision tasks: a comparative study. Multimed Tools Appl. https://doi.org/10.1007/s11042-022-13820-0
16. Squires M et al (2023) Deep learning and machine learning in psychiatry: a survey of current progress in depression detection, diagnosis and treatment. Brain Inf 10(1). https://doi.org/10.1186/s40708-023-00188-6
17. Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2016) Deep learning for visual understand-
ing: a review. Neurocomputing 187:27–48. https://doi.org/10.1016/j.neucom.2015.09.116
18. Mahmud M, Kaiser MS, McGinnity TM, Hussain A (2021) Deep learning in mining biological data. Cognit Comput. https://doi.org/10.1007/s12559-020-09773-x
19. Coulibaly S, Kamsu-Foguem B, Kamissoko D, Traore D (2019) Deep neural networks with trans-
fer learning in millet crop images. Comput Ind 108:115–120. https://doi.org/10.1016/j.compind.
2019.02.003
20. Korolev S, Safiullin A, Belyaev M, Dodonova Y (2017) Residual and plain convolutional neural networks for 3D brain MRI classification. 2017 IEEE 14th Int Symp Biomed Imaging (ISBI 2017), pp 835–838. https://doi.org/10.1109/ISBI.2017.7950647
21. Lei L, Yuan Y, Vu TX, Chatzinotas S, Ottersten B (2019) Learning-based resource allocation: efficient content delivery enabled by convolutional neural network. IEEE Work Signal Process Adv Wirel Commun (SPAWC). https://doi.org/10.1109/SPAWC.2019.8815447
22. Helaly HA, Badawy M, Haikal AY (2021) Deep learning approach for early detection of Alzheimer's disease. Cogn Comput. https://doi.org/10.1007/s12559-021-09946-2
23. Zhang YD et al (2018) Voxelwise detection of cerebral microbleed in CADASIL patients by leaky rectified linear unit and early stopping. Multimed Tools Appl 77(17):21825–21845. https://doi.org/10.1007/s11042-017-4383-9
24. Wang SH, Phillips P, Sui Y, Liu B, Yang M, Cheng H (2018) Classification of Alzheimer’s disease
based on eight-layer convolutional neural network with leaky rectified linear unit and max pool-
ing. J Med Syst 42(5):85. https://doi.org/10.1007/s10916-018-0932-7
25. Kuo CCJ (2016) Understanding convolutional neural networks with a mathematical model. J Vis
Commun Image Represent 41:406–413. https://doi.org/10.1016/j.jvcir.2016.11.003
26. Choi KS, Shin JS, Lee JJ, Kim YS, Kim SB, Kim CW (2005) In vitro trans-differentiation of rat
mesenchymal cells into insulin-producing cells by rat pancreatic extract. Biochem Biophys Res
Commun 330(4):1299–1305. https://doi.org/10.1016/j.bbrc.2005.03.111
27. Tu F, Yin S, Ouyang P, Tang S, Liu L, Wei S (2017) Deep Convolutional Neural Network Archi-
tecture with Reconfigurable Computation Patterns. IEEE Trans Very Large Scale Integr Syst
25(8):2220–2233. https://doi.org/10.1109/TVLSI.2017.2688340
28. Singh AV (2015) Content-based image retrieval using deep learning. https://doi.org/10.13140/RG.2.2.29510.16967
29. Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Van Esesn BC, Awwal AA,
Asari VK (2018) The history began from alexnet: a comprehensive survey on deep learning
approaches. arXiv preprint arXiv:1803.01164. https://doi.org/10.48550/arXiv.1803.01164
30. Khan HA, Jue W, Mushtaq M, Mushtaq MU (2020) Brain tumor classification in MRI image using convolutional neural network. Math Biosci Eng 17(5):6203–6216. https://doi.org/10.3934/MBE.2020328
31. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. 3rd Int Conf Learn Represent (ICLR 2015), pp 1–14. https://doi.org/10.48550/arXiv.1409.1556
32. Targ S, Almeida D, Lyman K (2016) Resnet in resnet: generalizing residual architectures. pp 1–7.
arXiv preprint. http://arxiv.org/abs/1603.08029
33. Alzubaidi L et al (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8(1):53. https://doi.org/10.1186/s40537-021-00444-8
34. Chen J, Zhou M, Zhang D, Huang H, Zhang F (2021) Quantification of water inflow in rock tunnel faces via convolutional neural network approach. Autom Constr 123:103526. https://doi.org/10.1016/j.autcon.2020.103526
35. Litjens G et al (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88. https://doi.org/10.1016/j.media.2017.07.005
36. Zhou T, Canu S, Ruan S (2019) A review: deep learning for medical image segmentation using multi-modality fusion. Array 3–4:100004. https://doi.org/10.1016/j.array.2019.100004
37. Liu X, Song L, Liu S, Zhang Y (2021) A review of deep-learning-based medical image segmenta-
tion methods. Sustain 13(3):1–29. https://doi.org/10.3390/su13031224
38. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Proces Syst 27:1–9
39. An FP, Liu JE (2021) Medical image segmentation algorithm based on multi-layer boundary perception-self attention deep learning model. Multimed Tools Appl:15017–15039. https://doi.org/10.1007/s11042-021-10515-w
40. Shirokikh B et al (2021) Accelerating 3D medical image segmentation by adaptive small-scale target localization. J Imaging 7(2). https://doi.org/10.3390/jimaging7020035
41. Zhuang J, Yang J, Gu L, Dvornek N (2019) ShelfNet for fast semantic segmentation. Proc 2019 Int Conf Comput Vis Workshops (ICCVW), pp 847–856. https://doi.org/10.1109/ICCVW.2019.00113
42. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image
segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE
Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184
43. Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic
image segmentation. arXiv preprint arXiv:1706.05587. https://doi.org/10.48550/arXiv.1706.05587
44. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder
architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495.
https://doi.org/10.1109/TPAMI.2016.2644615
45. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image
segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioin-
formatics) 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28
46. Ourselin S, Joskowicz L, Wells W, Hutchison D (eds) (2016) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. https://doi.org/10.1007/10704282
47. Yamanakkanavar N, Choi JY, Lee B (2020) MRI segmentation and classification of human brain
using deep learning for diagnosis of alzheimer’s disease: a survey. Sensors (Switzerland) 20(11):1–
31. https://doi.org/10.3390/s20113243
48. Liu X, Deng Z, Yang Y (2019) Recent progress in semantic image segmentation. Artif Intell Rev
52(2):1089–1106. https://doi.org/10.1007/s10462-018-9641-3
49. Russakovsky O et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis
115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
50. Chung YW, Choi IY (2023) Detection of abnormal extraocular muscles in small datasets of computed tomography images using a three-dimensional variational autoencoder. Sci Rep 13(1):1–10. https://doi.org/10.1038/s41598-023-28082-5
51. Kabir S, Farrokhvar L, Dabouei A (2023) A weakly supervised approach for thoracic diseases detection. Expert Syst Appl 213:118942. https://doi.org/10.1016/j.eswa.2022.118942
52. Al Duhayyim M et al (2023) Sailfish Optimization with Deep Learning Based Oral Cancer Classifica-
tion Model. Comput Syst Sci Eng 45(1):753–767. https://doi.org/10.32604/csse.2023.030556
53. Umer MJ, Sharif M, Alhaisoni M, Tariq U, Kim YJ, Chang B (2023) A Framework of Deep Learn-
ing and Selection-Based Breast Cancer Detection from Histopathology Images. Comput Syst Sci Eng
45(2):1001–1016. https://doi.org/10.32604/csse.2023.030463
54. Asiri AA et al (2023) Machine learning-based models for magnetic resonance imaging (MRI)-based brain tumor classification. Intell Autom Soft Comput 36(1):299–312. https://doi.org/10.32604/iasc.2023.032426
55. Klingenberg M, Eitel F, Habes M, Ritter K (2022) Higher performance for women than men in
MRI-based Alzheimer’s disease detection. Alzheimers Res Ther:1–13. https://doi.org/10.1186/
s13195-023-01225-6
56. Dolz J, Desrosiers C, Ben Ayed I (2018) 3D fully convolutional networks for subcortical segmen-
tation in MRI: a large-scale study. Neuroimage 170:456–470. https://doi.org/10.1016/j.neuroimage.
2017.04.039
57. Allioui H, Sadgal M, Elfazziki A (2019) Deep MRI segmentation: A convolutional method applied
to alzheimer disease detection. Int J Adv Comput Sci Appl 10(11):365–371. https://doi.org/10.14569/
IJACSA.2019.0101151
58. Sun J, Yan S, Song C, Han B (2020) Dual-functional neural network for bilateral hippocampi seg-
mentation and diagnosis of Alzheimer’s disease. Int J Comput Assist Radiol Surg 15(3):445–455.
https://doi.org/10.1007/s11548-019-02106-w
59. Chitradevi D, Prabha S, Prabhu AD (2020) Diagnosis of Alzheimer disease in MR brain images using optimization techniques. Neural Comput Applic. https://doi.org/10.1007/s00521-020-04984-7
60. Carmo D, Silva B, Yasuda C, Rittner L, Lotufo R (2021) Hippocampus segmentation on epilepsy
and Alzheimer’s disease studies with multiple convolutional neural networks. Heliyon 7(2):e06226.
https://doi.org/10.1016/j.heliyon.2021.e06226
61. Nobakht S, Schaeffer M, Forkert ND, Nestor S, Black SE, Barber P (2021) Combined atlas and convolutional neural network-based segmentation of the hippocampus from MRI according to the ADNI harmonized protocol. Sensors 21(7). https://doi.org/10.3390/s21072427
62. Helaly HA, Badawy M, Haikal AY (2021) Toward deep MRI segmentation for Alzheimer's disease detection. Neural Comput Applic. https://doi.org/10.1007/s00521-021-06430-8
63. Dodia S, Annappa B, Mahesh PA (2022) Recent advancements in deep learning based lung cancer detection: a systematic review. Eng Appl Artif Intell 116:105490. https://doi.org/10.1016/j.engappai.2022.105490
64. Zheng S et al (2023) Survival prediction for stage I-IIIA non-small cell lung cancer using deep learn-
ing. Radiother Oncol 180:109483. https://doi.org/10.1016/j.radonc.2023.109483
65. Shao J et al (2022) Deep learning empowers lung cancer screening based on mobile low-dose computed tomography in resource-constrained sites. Front Biosci Landmark 27(7). https://doi.org/10.31083/j.fbl2707212
66. Jalali Y, Fateh M, Rezvani M, Abolghasemi V, Anisi MH (2021) ResBCDU-net: a deep learning framework for lung CT image segmentation. Sensors 21(1):1–24. https://doi.org/10.3390/s21010268
67. Mohammed KK, Hassanien AE, Afify HM (2021) A 3D image segmentation for lung cancer using v.net architecture based deep convolutional networks. J Med Eng Technol 45(5):337–343. https://doi.org/10.1080/03091902.2021.1905895
68. Lei M, Li J, Li M, Zou L, Yu H (2021) An improved UNet++ model for congestive heart failure diagnosis using short-term RR intervals. Diagnostics 11(3):1–14. https://doi.org/10.3390/diagnostics11030534
69. Innat M, Hossain MF, Mader K, Kouzani AZ (2023) A convolutional attention mapping deep
neural network for classification and localization of cardiomegaly on chest X-rays. Sci Rep
13(1):6247. https://doi.org/10.1038/s41598-023-32611-7
70. Agarap AFM (2018) On breast cancer detection: An application of machine learning algorithms
on the Wisconsin diagnostic dataset. ACM Int Conf Proceeding Ser 1:5–9. https://doi.org/10.1145/
3184066.3184080
71. Dar RA, Rasool M, Assad A (2022) Breast cancer detection using deep learning: datasets, methods, and challenges ahead. Comput Biol Med 149:106073. https://doi.org/10.1016/j.compbiomed.2022.106073
72. Aljuaid H, Alturki N, Alsubaie N, Cavallaro L, Liotta A (2022) Computer-aided diagnosis for
breast cancer classification using deep neural networks and transfer learning. Comput Methods
Prog Biomed 223:106951. https://doi.org/10.1016/j.cmpb.2022.106951
73. Raaj RS (2023) Breast cancer detection and diagnosis using hybrid deep learning architecture. Biomed Signal Process Control 82:104558. https://doi.org/10.1016/j.bspc.2022.104558
74. Koh J, Yoon Y, Kim S, Han K, Kim EK (2022) Deep learning for the detection of breast cancers on chest computed tomography. Clin Breast Cancer 22(1):26–31. https://doi.org/10.1016/j.clbc.2021.04.015
75. Tariq M, Iqbal S, Ayesha H, Abbas I, Ahmad KT, Niazi MFK (2021) Medical image based breast cancer diagnosis: state of the art and future directions. Expert Syst Appl 167:114095. https://doi.org/10.1016/j.eswa.2020.114095
76. Almajalid R, Shan J, Du Y, Zhang M (2019) Development of a deep-learning-based method for breast ultrasound image segmentation. Proc 17th IEEE Int Conf Mach Learn Appl (ICMLA 2018), pp 1103–1108. https://doi.org/10.1109/ICMLA.2018.00179
77. Ghayvat H et al (2022) AI-enabled radiologist in the loop: novel AI-based framework to augment radiologist performance for COVID-19 chest CT medical image annotation and classification from pneumonia. Neural Comput Applic. https://doi.org/10.1007/s00521-022-07055-1
78. Subramanian N, Elharrouss O, Al-Maadeed S, Chowdhury M (2022) A review of deep learning-based detection methods for COVID-19. Comput Biol Med 143:105233. https://doi.org/10.1016/j.compbiomed.2022.105233
79. Aggarwal P, Mishra NK, Fatimah B, Singh P, Gupta A, Joshi SD (2022) COVID-19 image classification using deep learning: advances, challenges and opportunities. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2022.105350
80. Walvekar S, Shinde S (2021) Efficient medical image segmentation of COVID-19 chest CT images based on deep learning techniques. 2021 Int Conf Emerg Smart Comput Informatics (ESCI 2021), pp 203–206. https://doi.org/10.1109/ESCI50559.2021.9397043
81. Jain R, Singh S, Swami S, Kumar S (2021) Deep learning-based techniques to identify COVID-
19 patients using medical image segmentation. In: Manocha AK, Jain S, Singh M, Paul S (eds)
Computational intelligence in healthcare. Springer International Publishing, Cham, pp 327–342.
https://doi.org/10.1007/978-3-030-68723-6_18
82. Bhattacharyya A, Bhaik D, Kumar S, Thakur P, Sharma R, Pachori RB (2022) A deep learning based approach for automatic detection of COVID-19 cases using chest X-ray images. Biomed Signal Process Control 71:103182. https://doi.org/10.1016/j.bspc.2021.103182
83. Payan A, Montana G (2015) Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks, pp 1–9
84. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? Proc IEEE Int Conf Comput Vis, pp 2146–2153. https://doi.org/10.1109/ICCV.2009.5459469
85. Sarraf S, Tofighi G (2016) Classification of Alzheimer's disease structural MRI data by deep learning convolutional neural networks. arXiv preprint, pp 8–12. http://arxiv.org/abs/1607.06583
86. Hosseini-Asl E, Keynton R, El-Baz A (2016) Alzheimer’s disease diagnostics by adaptation of 3D
convolutional network. In: 2016 IEEE International Conference On Image Processing (ICIP), (vol.
502, pp 126–130). IEEE. https://doi.org/10.1109/TNNLS.2015.2479223
87. Khvostikov A, Aderghal K, Krylov A (2018) 3D Inception-based CNN with sMRI and MD-DTI data fusion for Alzheimer's disease diagnostics. https://doi.org/10.13140/RG.2.2.30737.28006
88. Kahramanli H (2012) A modified cuckoo optimization algorithm for engineering optimization. Int
J Futur Comput Commun 1(2):199
89. Sahumbaiev I, Popov A, Ramirez J, Gorriz JM, Ortiz A (2018) 3D-CNN HadNet classification of MRI for Alzheimer's disease diagnosis. In: 2018 IEEE Nuclear Science Symposium and Medical Imaging Conference Proceedings (NSS/MIC). IEEE, pp 1–4. https://doi.org/10.1109/NSSMIC.2018.8824317
90. Spasov SE et al (2018) A multimodal convolutional neural network framework for the prediction of Alzheimer's disease, pp 1271–1274. https://doi.org/10.1109/EMBC.2018.8512468
91. Wang Y, Yang Y, Guo X, Ye C, Gao N, Fang Y, Ma HT (2018) A novel multimodal MRI analysis
for Alzheimer’s disease based on convolutional neural network. In: 2018 40th Annual Interna-
tional Conference of the IEEE Engineering In Medicine and Biology Society (EMBC). IEEE, pp
754–757. https://doi.org/10.1109/EMBC.2018.8512372
92. Ge C, Qu Q (2019) Multiscale deep convolutional networks for characterization and detection of Alzheimer's disease using MR images. IEEE Int Conf Image Process, pp 789–793. https://doi.org/10.1109/ICIP.2019.8803731
93. Song T et al (2019) Graph convolutional neural networks for Alzheimer's disease. In: 2019 IEEE 16th Int Symp Biomed Imaging (ISBI 2019), pp 414–417. https://doi.org/10.1109/ISBI.2019.8759531
94. Liu L, Zhao S, Chen H, Wang A (2020) A new machine learning method for identifying Alzheimer's disease. Simul Model Pract Theory 99:102023. https://doi.org/10.1016/j.simpat.2019.102023
95. Khagi B, Lee B, Pyun JY, Kwon GR (2019) CNN models performance analysis on MRI images of OASIS dataset for distinction between healthy and Alzheimer's patient. In: 2019 International Conference on Electronics, Information, and Communication (ICEIC). IEEE, pp 1–4. https://doi.org/10.23919/ELINFOCOM.2019.8706339
96. Jain R, Jain N, Aggarwal A, Hemanth DJ (2019) Convolutional neural network based Alzheimer's disease classification from magnetic resonance brain images. Cogn Syst Res 57:147–159. https://doi.org/10.1016/j.cogsys.2018.12.015
97. Liu M et al (2020) A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer's disease. Neuroimage 208:116459. https://doi.org/10.1016/j.neuroimage.2019.116459
98. Impedovo D, Pirlo G, Vessio G, Angelillo MT (2019) A handwriting-based protocol for assessing neu-
rodegenerative dementia. Cogn Comput 11(4):576–586. https://doi.org/10.1007/s12559-019-09642-2
99. Parmar H, Nutter B, Long R, Antani S, Mitra S (2020) Spatiotemporal feature extraction and clas-
sification of Alzheimer’s disease using deep learning 3D-CNN for fMRI data. J Med Imaging
7(05):1–14. https://doi.org/10.1117/1.jmi.7.5.056001
100. Basaia S et al (2019) Automated classification of Alzheimer's disease and mild cognitive impairment using a single MRI and deep neural networks. NeuroImage Clin 21:101645. https://doi.org/10.1016/j.nicl.2018.101645
101. Pan D, Zeng A, Jia L, Huang Y, Frizzell T, Song X (2020) Early detection of Alzheimer’s disease
using magnetic resonance imaging: a novel approach combining convolutional neural networks
and ensemble learning. Front Neurosci 14(May):1–19. https://doi.org/10.3389/fnins.2020.00259
102. Vassanelli S, Kaiser MS, Eds NZ, Goebel R (2020) Series Editors https://doi.org/10.1007/
978-3-030-59277-6
103. Gumma LN, Thiruvengatanadhan R, Kurakula L, Sivaprakasam T (2022) A survey on convolu-
tional neural network (deep-learning technique) -based lung Cancer detection. SN Comput Sci
3(1):1–7. https://doi.org/10.1007/s42979-021-00887-z
104. She Y et al (2022) Deep learning for predicting major pathological response to neoadjuvant chem-
oimmunotherapy in non-small cell lung cancer: a multicentre study. eBioMedicine 86:104364.
https://doi.org/10.1016/j.ebiom.2022.104364
105. Siegel RL, Miller KD, Fuchs HE, Jemal A (2022) Cancer statistics, 2022. CA Cancer J Clin
72(1):7–33. https://doi.org/10.3322/caac.21708
106. Guo Z, Xu L, Si Y, Razmjooy N (2021) Novel computer-aided lung cancer detection based on con-
volutional neural network-based and feature-based classifiers using metaheuristics. Int J Imaging
Syst Technol 31(4):1954–1969. https://doi.org/10.1002/ima.22608
107. Su Y, Li D, Chen X (2021) Lung nodule detection based on faster R-CNN framework. Comput Methods Prog Biomed 200:105866. https://doi.org/10.1016/j.cmpb.2020.105866
108. Eltrass AS, Tayel MB, Ammar AI (2021) A new automated CNN deep learning approach for iden-
tification of ECG congestive heart failure and arrhythmia using constant-Q non-stationary Gabor
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.