Egyptian Informatics Journal 27 (2024) 100499

Contents lists available at ScienceDirect

Egyptian Informatics Journal

journal homepage: www.sciencedirect.com

Full length article

An intelligent deep hash coding network for content-based medical image retrieval for healthcare applications
Lichao Cui a ,∗, Mingxin Liu b
a Department of Information Engineering, Yantai Vocational College, Yantai, China
b State Nuclear Electric Power Planning Design & Research Institute Co., Ltd., Beijing, China

ARTICLE INFO

Keywords:
Deep features
Deep learning
Hashing
Hash coding
Medical image retrieval

ABSTRACT

The proliferation of medical imaging in clinical diagnostics has led to an overwhelming volume of image data, presenting a challenge for efficient storage, management, and retrieval. Specifically, the rapid growth in the use of imaging modalities such as Computed Tomography (CT) and X-rays has outpaced the capabilities of conventional retrieval systems, necessitating more sophisticated approaches to assist in clinical decision-making and research. Our study introduces a novel deep hash coding-based Content-Based Medical Image Retrieval (CBMIR) framework that uses a convolutional neural network (CNN) combined with hash coding for efficient and accurate retrieval. The model integrates a Dense block-based feature learning network, a hash learning block, and a spatial attention block to enhance feature extraction specific to medical imaging. We reduce dimensionality by applying the Reconstruction Independent Component Analysis (RICA) algorithm while preserving diagnostic information. The framework achieves a mean average precision (mAP) of 0.85 on ChestX-ray8, 0.82 on TCIA-CT, 0.84 on MIMIC-CXR, and 0.82 on LIDC-IDRI datasets, with retrieval times of 675 ms, 663 ms, 735 ms, and 748 ms, respectively. Comparisons with ResNet and DenseNet confirm the effectiveness of our model, enhancing medical image retrieval significantly for clinical decision-making and research.

∗ Corresponding author.
E-mail address: [email protected] (L. Cui).

https://doi.org/10.1016/j.eij.2024.100499
Received 12 May 2024; Received in revised form 8 June 2024; Accepted 17 June 2024
Available online 5 July 2024
1110-8665/© 2024 The Authors. Published by Elsevier B.V. on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Medical imaging conveys substantial information about a disease's severity level, which is why it has become crucial in the medical field. The information from the images is used in classification, discrimination, determining disease level, diagnosis, treatment, and monitoring [1]. Medical equipment creates a large number of images as a result of its widespread use, making it challenging for clinicians to manually describe each one, since doing so adds to their workload and cuts into their available time [2]. To solve this problem, Content-based Medical Image Retrieval (CBMIR) systems are used [3,4]; they aim to index each image in a collection so that it can be readily accessed. Each image in such a system is identified by a distinct number computed from the same features. The CBMIR system compares that specific binary code with all the codes in the dataset to find all similar images. It also diagnoses many medical images automatically based on similar content or codes in the dataset [5], and it decreases the need to hold loads of files and images. The initial motivations for this study and the innovations it makes are listed below.

Initial Motivations

1. The volume of medical imaging data is increasing rapidly, leading to challenges in efficient storage, management, and retrieval. Traditional retrieval systems are struggling to keep up with this growth.
2. There is a pressing need for more sophisticated retrieval systems to assist in clinical decision-making and research, given the limitations of conventional systems.
3. Automating the retrieval and diagnosis process is essential to reduce the workload on clinicians and enhance the accuracy of medical image analysis.

Innovations

1. We introduce a novel CBMIR framework that leverages a convolutional neural network (CNN) combined with hash coding to retrieve medical images efficiently and accurately.
2. Our model integrates a Dense block-based feature learning network with a hash learning block and a spatial attention block, enhancing feature extraction tailored to the unique characteristics of medical imaging.
3. The application of the Reconstruction Independent Component Analysis (RICA) algorithm effectively reduces dimensionality while preserving critical diagnostic information.
4. The use of four specialized loss functions increases the distinctiveness of features, ensuring high discriminability in the query code block.
5. During retrieval, compact database codes are binarized for swift comparison with the hash codes of the database images, significantly improving retrieval speed and accuracy.

Convolutional Neural Networks (CNNs) have revolutionized the field of machine learning, particularly within the realm of image analysis and computer vision [6]. They have achieved unparalleled success in tasks such as image classification [7], object detection [8], and semantic segmentation. By automatically learning hierarchical feature representations directly from data, CNNs eliminate the need for manual feature extraction, which has traditionally been a labor-intensive and error-prone process. This self-learning capability has enabled state-of-the-art performance in visual recognition challenges and facilitated the adaptation of CNNs to various applications, ranging from medical diagnosis [9] and autonomous vehicles to video analysis [10] and natural language processing [11]. The flexibility and efficiency of CNNs in handling complex pattern recognition tasks make them an ideal choice for the backbone of our proposed Content-Based Medical Image Retrieval (CBMIR) framework, offering a powerful tool to distill meaningful patterns from intricate medical images.

The number of medical images is increasing exponentially [12]. X-rays [13], MRI [14], and CT scans [15] offer vital functional and anatomic details about various body areas necessary for identification, diagnosis, therapies, and monitoring, as well as for instruction and research in medicine. The ability to recover this knowledge is becoming increasingly critical for healthcare IT platforms. Given the intricacy and depth of medical image content, hand captioning is tedious and insufficient. Content-based Image Retrieval (CBIR) has recently responded to the pressing need for image retrieval and classification in science, historical sites, the military, and medicine [16–18].

CBIR aims to find relevant images by comparing image contents. Matching features is what allows two images to be matched, which is why image representation and similarity measurement are essential. In earlier times, medical image retrieval was done with intensity histogram-based features. However, their retrieval performance needed to be improved due to low discrimination power, especially on large databases [19]. For medical image retrieval, texture-based characteristics have been suggested as a solution. The Local Binary Pattern (LBP), developed by Ojala et al. [20] for texture categorization, proved very promising for medical images and in terms of computational complexity. Some variations of LBP, such as Center Symmetric LBP (CSLBP) [21] and Data-Driven LBP (DDLBP) [22], can also be found in the literature.

The aforementioned methods perform poorly because they struggle to capture the user's high-level semantic information [23]. Deep learning has made fast progress in recent years, and pre-trained convolutional neural network (CNN) models give better and more accurate results than traditional methods [24]. These models increase image retrieval accuracy because their features contain rich image semantic information. Medical image retrieval uses feature extraction from CNN models [25]. Despite this, the high-dimensional features from CNN models increase the cost and decrease retrieval efficiency [26]. To solve this problem, hashing-based methods have attracted attention because they convert high-dimensional features into a low-dimensional space and create binary codes [27].

The CBMIR uses features extracted from medical images to search for similar images in a database. While CBMIR has been successfully applied to various medical imaging modalities, including CT and X-ray images [3], there are some limitations to its use in these modalities. CBMIR relies on the extraction of low-level visual features such as color, texture, and shape. However, these features may not capture the more complex visual features that are important for medical diagnosis, such as the presence of lesions or abnormalities. There is also a lack of standardization, e.g., medical images can vary significantly in terms of their acquisition protocols and equipment, which can result in variations in image quality and features [12]. This lack of standardization can limit the effectiveness of CBMIR for CT and X-ray images. Some diseases can present in different ways on medical images, which can make it difficult for CBMIR to identify similar cases accurately [28]. Lung cancer tumors, for example, vary in appearance. CBMIR uses only the visual elements of medical images. Clinical data like patient history and lab test results could enhance recall accuracy. CBMIR works best with a big and varied medical image library. Some medical illnesses are rare, making it hard to get a big enough sample size for CBMIR [29].

CBMIR helps doctors diagnose and cure diseases by referencing related cases. Related images are retrieved by sorting images by proximity in feature space and returning the top-K results closest to the query. Wang et al. [30] built an image query using a deep learning model with a multi-level nonlinear transformation and intermediary features to get the top-ranked word trait answer for lungs. Qayyum et al. [31] used a deep convolutional neural network to classify and retrieve mixed medical images. Today's high-dimensional data makes even a linear scan of a collection with hundreds of thousands of images expensive and memory-intensive.

To address this problem with real-valued features, hashing techniques transform an image into compact binary data that preserves the structure of the image data within the same region [27]. Images are stored as hashes rather than real-valued features, so hashing algorithms need little time and cost to retrieve them [32]. Spectral Hashing (SH) [33] and its variant, whose solution is a subset of the thresholded eigenvectors of the graph Laplacian, are the most critical hashing algorithms among data-independent methods. By relating the eigenvectors of the graph Laplacian to the Laplace-Beltrami eigenfunctions [34], hash codes can be created quickly for image points.

Data-dependent methods create a hash function from a training set and use a brief hash code for reliable results. Practical applications use learning-based hashing more often than data-independent methods. Jiang et al. [35] extracted features from breast pathological images using GIST [36] and SIFT [37], and used hashing methods to convert the high-dimensional images into binary codes so that pathological breast images can be retrieved quickly in Hamming space.

Deep learning models learn image features and hash functions concurrently. Fang et al. [38] also created Attention-based Saliency Hashing (ASH) to learn compact hash codes for eye images. We suggest a deep hash-based method for extracting information from big medical image datasets. In response to the large volume of medical images and the concomitant need for efficient retrieval systems, our study introduces a novel deep hashing-based Content-Based Medical Image Retrieval (CBMIR) framework that is poised to make significant advancements in the field. The main contributions of our work are as follows:

1. We present an integrated solution that concurrently learns image features and hash codes, transitioning from traditional image attributes to binary hash codes through a novel combination of regularization, balanced, pairwise, and quantization losses.
2. A Dense Block is appended to the ResNet-50 model, followed by a Spatial Attention Block (SAB), which synergizes multi-scale feature extraction and regional data integration, culminating in a robust feature representation.
3. We incorporate Reconstruction Independent Component Analysis (RICA) post-SAB to refine feature accuracy, improving the system's ability to discern subtle yet diagnostically significant patterns within medical images.


4. Rigorous experimentation on the ChestX-ray8 [39], TCIA-CT [40], MIMIC-CXR [41], and LIDC-IDRI [42] datasets underscores our framework's excellence over preceding methods, setting new benchmarks for precision, accuracy, and image retrieval efficacy in the realm of medical imaging.

The proposed work collectively addresses the critical challenges in medical image retrieval, offering a framework that elevates the efficiency and accuracy of retrieval and streamlines the integration of CBMIR systems into clinical workflows, thereby enhancing diagnostic processes and patient care.

The organization of this work is as follows. Section 2 discusses the related works. The problem statement is briefly stated in Section 3. In Section 4, we briefly describe our proposed methodology. The experimental evaluation and results of our work are presented in Section 5. The economic improvement index is briefly described in Section 6. Section 7 presents the discussions related to this work. Finally, the concluding remarks are available in Section 8.

2. Related works

We divide the related works into two categories: hashing for image retrieval, which covers cutting-edge hash-code learning-based image retrieval methodologies, and methods focusing on medical image retrieval for healthcare applications.

2.1. Hashing for image retrieval

There are two main categories of hashing algorithms: data-dependent and data-independent [27]. Each category is subdivided into additional methods. The data-independent category, which is the most popular, consists of two subtypes: shift-invariant kernels hashing (SIKH) [43] and locality-sensitive hashing (LSH) [44]. The data-dependent category, also called learning-based, is further classified by learning level into low and deep learning-based methods. Kernel-sensitive hashing (KSH) [45], SH [33], and metric hashing forests (MHF) [46] are low learning-based techniques, while convolutional neural network hashing (CNNH) [47], deep hashing (DH) [48], deep hashing network (DHN) [49], deep semantic ranking-based hashing (DSRH) [50], deep similarity comparison hashing (DSCH) [51], and simultaneous feature learning and hashing (SFLH) [52] fall under the deep learning-based methods.

In the first class, learning the hash codes is entirely independent of learning the image features: the hash functions are learned in a two-step manner from hand-crafted features such as SIFT [37], which can lead to less accurate results. The second class shows more accurate and optimal results in recent studies because it goes from features to hash codes in an end-to-end way.

Deep hashing techniques use ground-truth labels to preserve similarity in the hash codes. The labels for ranking tasks are usually provided in either pairwise or triplet format. Examples of the former are deep residual hashing (DRH) [53] and deep pairwise-supervised hashing (DPSH) [54]. DPSH was the first method to achieve highly accurate results by performing hash code learning and image feature learning together with pairwise labels. By setting up a similarity matrix with pairwise labels for clinical image retrieval, DRH was designed to preserve similarities and generate compact hash codes.

The triplet label format includes deep binary embedding networks (DBEN) [55] and deep supervised hashing (DSH) [56]. Triplet label methods are superior to pairwise label methods because their labels are more compact and accurate and carry more information [57]. Even though the deep hashing methods mentioned above have given extraordinary results, there is still a major ranking error in the case-based medical image retrieval field. An important reason for this error is the loss of classification information during model training. Triplet cross-entropy (TCE) loss was created to safeguard the classification information while considering the circle loss [58], which also contained the cross-entropy loss [59] and triplet loss. Every point of similarity in the circle loss receives a unique penalty depending on how far it is from the desired outcome. The TCE was claimed to be the first form of cross-entropy and triplet loss since the optimum effect is initially learned from sampling data during the training. Data loss from relatively small samples during model training is yet another factor contributing to the inaccuracy [60].

To address the issue of imbalanced samples during training, a triplet labeling approach is used. This involves splitting each triplet label into two pairwise labels. Specifically, in the hashing space, the image of interest is closer to the positive image and farther from the negative image. While pairwise labels can only imply code, triplet labels explicitly show the relative similarities between images. In smaller samples, positive images are utilized, while negative images are used in larger samples. Therefore, triplet labeling can effectively address imbalanced-sample issues by improving the retrieval of small-sample data during training [61].

Other than the TCE loss, another method introduced for medical image retrieval based on case-based reasoning is the spatial-attention mechanism (SAM). In recent years, SAMs have become popular and influential in CNNs because they considerably improve the efficiency of imaging tasks, including recognition, segmentation, and classification, in the medical field. One instance of the use of attention-based CNNs is in glaucoma detection, where a network is proposed consisting of a glaucoma classification subnet, a pathological area localization subnet, and an attention prediction subnet [62]. Another approach to incorporating attention mechanisms is to use Attention Gates (AG) [63] in standard CNN models to focus on important areas in clinical images for different analysis tasks, such as fetal ultrasound classification. SAMs help improve the results on medical images by focusing activations around important areas. Based on previous research, attention mechanisms can also help increase the accuracy of medical image retrieval by capturing region-of-interest (RoI) data. Unlike in CBIR, where max-pooling and average-pooling are used, element-wise mean, element-wise maximum operations, and max-pooling are applied all along the channel axis to obtain a more accurate characteristic descriptor [8].

Xia et al. [47] recommended employing CNNH to convert the image to binary code. The advantages of the created binary codes made it possible to retrieve images quickly using the Hamming distance, thereby reducing costs and increasing retrieval efficiency [52].

Table 1 compares our proposed work with classical hashing methods such as SIKH and LSH, learning-based (low) methods including KSH, SH, and MHF, and learning-based (deep) methods including CNNH, DH, DSRH, DSCH, and SFLH.

Table 1
Comparison of image retrieval methods.

Feature/Method                Classical hashing   Learning-based (Low)   Learning-based (Deep)    Our method
Learning approach             Data-Independent    Kernel-Based           CNN-Based, End-to-End    Hash Code Learning
Accuracy                      Lower               Moderate               Moderate                 Higher
Efficiency                    Moderate            Moderate               Moderate                 Higher
Clinical relevance            Less                Moderate               High                     Higher
Adaptability to varied data   Low                 Low                    High                     Higher
Interpretability              Low                 Low                    Moderate                 Higher
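To make the Hamming-distance comparison used by the hashing methods in Section 2.1 concrete, the following minimal Python sketch packs ±1 hash bits into integers and ranks a toy database by the number of differing bits. The helper names (`pack_code`, `hamming_distance`, `rank_by_hamming`) and the 4-bit codes are assumptions made for illustration; they are not taken from any of the cited methods.

```python
def pack_code(bits):
    """Pack a sequence of +1/-1 hash bits into one int (+1 -> 1, -1 -> 0)."""
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b > 0 else 0)
    return value


def hamming_distance(code_a, code_b):
    """Count the bit positions in which two packed codes differ."""
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query, database):
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Toy 4-bit codes (hypothetical data, for illustration only).
query = pack_code([1, -1, 1, 1])
database = [pack_code(c) for c in ([1, -1, 1, 1],
                                   [-1, 1, -1, -1],
                                   [1, -1, -1, 1])]
ranking = rank_by_hamming(query, database)
```

Because the comparison reduces to an XOR followed by a bit count, a scan over compact codes is far cheaper than comparing real-valued feature vectors, which is the efficiency argument made above.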


2.2. Healthcare applications based on medical image retrieval

Deep learning-based image analysis approaches have been extensively used in computer-aided detection, diagnosis, and prognosis. These methods have shown their efficacy during the public health crisis caused by the coronavirus disease 2019 (COVID-19) pandemic. The use of chest radiography (CXR) has been of utmost importance in the process of triaging, diagnosing, and monitoring COVID-19 patients. Zhong et al. [64] argued that an image retrieval model for CXR that offers both comparable visuals and related clinical information may be more clinically significant than a direct image diagnostic model, given the ambiguous and non-specific signals in CXR. Their study presents a new model for retrieving CXR images using deep metric learning. Ali Ahmed [65] introduced the Relevance Feedback Retrieval Method (RFRM) for CBMIR. The feedback implemented in this context relies on the vote values given by each class in the image repository. In that study, eighteen color moments and GLCM texture characteristics were extracted to describe each image, and eight commonly used similarity coefficients were used as measures of similarity. After a preliminary investigation using a single random image query, the highest-ranking images from each category are used as voters to choose the most suitable similarity coefficient for the final search procedure. Wang et al. [66] presented Retrieval with Clustering-guided Contrastive Learning (RetCCL) for robust and accurate retrieval of whole-slide imaging (WSI) level images. Their framework combined a unique self-supervised feature learning technique with a global ranking and aggregation mechanism to significantly enhance performance. The proposed feature learning technique utilizes abundant unlabeled histopathology image data to acquire universal characteristics that may be applied directly to future WSI retrieval tasks without the need for further fine-tuning. The suggested technique for retrieving WSIs not only provides a collection of WSIs that are comparable to a given query WSI, but also identifies and emphasizes certain patches or sub-regions within each WSI that exhibit a high degree of resemblance to patches in the query WSI. This feature assists pathologists in interpreting the search results.

3. Problem statement

In the context of CBMIR, the goal is to develop a system that efficiently retrieves similar medical images from a large database given a query image. The problem involves effective feature extraction and efficient similarity comparison.

Given:

• A query image $I_q$.
• A set of database images $\{I_i\}_{i=1}^{N}$.
• Labels or annotations for the images indicating medical conditions or regions of interest.

The objective is to find a hash function $h: I \to \{-1, 1\}^l$ that maps each image to a binary code of length $l$, such that similar images have similar binary codes.

3.1. Feature extraction

The feature extraction process uses a deep convolutional neural network (CNN) to transform each image into a feature vector. Let $F(I; \theta)$ denote the CNN with parameters $\theta$, which outputs a feature vector for an input image $I$.

$\mathbf{f}_i = F(I_i; \theta)$ (1)

3.2. Hash code learning

The goal is to learn binary hash codes that preserve image similarities. For each feature vector $\mathbf{f}_i$, the hash code $\mathbf{u}_i$ is generated as follows:

$\mathbf{u}_i = \operatorname{sign}(W\mathbf{f}_i + b)$ (2)

where $W$ and $b$ are the weights and bias of the hash layer, respectively.

3.3. Loss functions

The overall objective function to be minimized is a weighted sum of multiple loss functions:

$L = L_S + \alpha L_Q + \beta L_B + \gamma L_R$ (3)

where:

• $L_S$ is the pairwise similarity preservation loss.
• $L_Q$ is the quantization loss.
• $L_B$ is the balanced code loss.
• $L_R$ is the regularization term.
• $\alpha, \beta, \gamma$ are hyperparameters controlling the trade-off between different loss terms.

3.3.1. Pairwise similarity preservation loss

This loss ensures that similar images have similar hash codes. It minimizes the difference between the inner product of the hash codes and the similarity $S_{ij}$ of the corresponding images.

$L_S = \sum_{i=1}^{N} \sum_{j=1}^{N} \left\| \mathbf{u}_i^T \mathbf{u}_j - l S_{ij} \right\|^2$ (4)

3.3.2. Quantization loss

This loss ensures that the real-valued output of the CNN is close to the binary hash codes.

$L_Q = \sum_{i=1}^{N} \left\| \mathbf{u}_i - \tanh(W\mathbf{f}_i + b) \right\|^2$ (5)

3.3.3. Balanced code loss

This loss ensures that each bit in the hash codes has a balanced distribution, which helps in utilizing the hash code space efficiently.

$L_B = \sum_{j=1}^{l} \left( \left| \frac{1}{N} \sum_{i=1}^{N} u_{ij} \right| \right)^2$ (6)

3.3.4. Regularization term

This term helps to prevent overfitting by penalizing large weights in the hash layer.

$L_R = \|W\|_F^2$ (7)

3.4. Optimization

The optimization of the objective function is performed using gradient descent-based methods. The parameters $\theta$, $W$, and $b$ are updated iteratively to minimize $L$.

$\theta, W, b \leftarrow \theta, W, b - \eta \nabla L$ (8)

where $\eta$ is the learning rate.

3.5. Retrieval

During the retrieval phase, the features and hash codes are extracted for the query image. The similarity between the query hash code and the database hash codes is computed, and the top-K similar images are retrieved.

Retrieve top-K images $I_k$ based on similarity measure $\mathbf{u}_q^T \mathbf{u}_i$ (9)

The flowchart of our proposed methodology is presented in Fig. 1. The process begins with inputting the training dataset and extracting features using ResNet-50. Binary hash codes are then generated for each feature vector. The loss calculation stage includes computing the pairwise similarity loss, quantization loss, balanced code loss, and regularization loss to form the total loss. Optimization is performed using gradient descent to update the parameters. For query processing, features and hash codes are extracted for the query image and then used to retrieve the top-K similar images from the database based on the computed similarity. The retrieved images are finally output.
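The loss terms in Eqs. (3)–(7) can be traced on toy data with the minimal pure-Python sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the function names (`sign`, `hash_layer`, `total_loss`), the toy weights and features, the choice $S_{ij} \in \{-1, +1\}$ (+1 for similar, −1 for dissimilar), and the convention that sign(0) = +1 are all hypothetical.

```python
import math


def sign(v):
    # Assumption: sign(0) is mapped to +1 so every bit stays in {-1, +1}.
    return 1 if v >= 0 else -1


def hash_layer(W, b, f):
    """Pre-activation W f + b of the hash layer (W is l x d, b has length l)."""
    return [sum(w * x for w, x in zip(row, f)) + bias
            for row, bias in zip(W, b)]


def total_loss(W, b, feats, S, alpha, beta, gamma):
    """Return (L, (L_S, L_Q, L_B, L_R)) following Eqs. (3)-(7)."""
    z = [hash_layer(W, b, f) for f in feats]
    U = [[sign(v) for v in zi] for zi in z]                # Eq. (2)
    n, l = len(U), len(W)

    def dot(a, c):
        return sum(x * y for x, y in zip(a, c))

    L_S = sum((dot(U[i], U[j]) - l * S[i][j]) ** 2          # Eq. (4)
              for i in range(n) for j in range(n))
    L_Q = sum((u - math.tanh(v)) ** 2                       # Eq. (5)
              for ui, zi in zip(U, z) for u, v in zip(ui, zi))
    L_B = sum(abs(sum(U[i][j] for i in range(n)) / n) ** 2  # Eq. (6)
              for j in range(l))
    L_R = sum(w * w for row in W for w in row)              # Eq. (7)
    return L_S + alpha * L_Q + beta * L_B + gamma * L_R, (L_S, L_Q, L_B, L_R)


# Toy setup: two 2-D feature vectors, 2-bit codes, the two images dissimilar.
W = [[1, 0], [0, 1]]
b = [0, 0]
feats = [[2.0, 1.0], [-1.0, -3.0]]
S = [[1, -1], [-1, 1]]
total, (L_S, L_Q, L_B, L_R) = total_loss(W, b, feats, S,
                                         alpha=0.1, beta=0.1, gamma=0.01)
```

In this toy case the two codes are exact opposites, so the similarity and balance terms vanish and only the quantization and regularization terms contribute. Note that sign has zero gradient almost everywhere, so a trainable version would need a relaxation of Eq. (2) during optimization; the sketch only evaluates the losses and does not perform the update in Eq. (8).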

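The retrieval rule in Eq. (9) can be sketched as a ranking by inner product between ±1 codes. The function name `top_k` and the toy codes are assumptions for illustration. For codes in $\{-1, 1\}^l$, the inner product and the Hamming distance $d_H$ are related by $\mathbf{u}_q^T\mathbf{u}_i = l - 2\,d_H(\mathbf{u}_q, \mathbf{u}_i)$, so ranking by descending inner product is equivalent to ranking by ascending Hamming distance.

```python
import heapq


def top_k(query_code, database_codes, k):
    """Return indices of the k database codes with the largest inner product."""
    scores = [sum(q * u for q, u in zip(query_code, code))
              for code in database_codes]
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


# Hypothetical 4-bit codes; inner products with the query are 4, -4, 2, 0.
query_code = [1, -1, 1, 1]
database_codes = [
    [1, -1, 1, 1],
    [-1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, 1, -1, 1],
]
best_two = top_k(query_code, database_codes, k=2)
```

`heapq.nlargest` avoids a full sort when K is much smaller than the database size, which matches the top-K use case described in Section 3.5.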

Fig. 1. Flowchart of the proposed methodology.

4. Proposed methodology

We have illustrated the methodology of our proposed work in Fig. 2. The process begins with an input medical image, which undergoes feature extraction using a CNN. The extracted features are then processed by a Dense Block to enhance feature learning. Subsequently, a SAB is applied to focus on the most relevant parts of the image, improving the discriminative power of the features. These refined features are then converted into compact binary hash codes, which are compared against precomputed hash codes stored in the database. The similarity comparison stage identifies the most similar images, which are then retrieved and presented as the final output.

4.1. Network architecture

ResNet-50 is a popular and highly effective feature extractor. The final two fully connected (FC) layers of the CNN have been altered to have 4096 features in both FC1 and FC2, with the first 4096 features serving as the main vector for generating hash codes. However, due to the issues mentioned above, this vector is insufficient. The vector with 4096 features gives more precise information; thus, the two are merged. In the context of our proposed CBMIR framework, we leveraged ResNet50 because it is a well-established model that has been widely used in various image recognition tasks. Its robustness and ability to learn deep features make it a reliable choice for the initial feature extraction phase of medical image analysis. Given the nature of medical datasets, which can sometimes be limited in size due to privacy concerns or the rarity of certain conditions, ResNet50's proven transfer learning capabilities allow it to perform well even when fine-tuned on relatively small medical image datasets. The proposed framework integrates Dense Blocks following the ResNet50 model for further feature extraction. The architecture of ResNet50, known for its residual learning capability, complements the dense connectivity pattern of Dense Blocks, leading to a potent combination for capturing multi-scale features.

The proposed network is illustrated in Fig. 3. Query samples are sent to the feature learning block, where we leverage a ResNet-50 to extract features. After that, three transition layers are added to send data to the Dense Block. We concatenate the DenseNet with SAB before transforming data to the hash learning stage. In the hash learning stage, the FC layers, FC1 and FC2, are connected. Further, we use the RICA algorithm to reduce features [67]. The loss is calculated using four different types of losses: (i) regularization, (ii) quantization, (iii) pairwise, and (iv) balanced loss. A binarization is applied to generate binary codes in the database. Furthermore, we generate hash codes and measure the similarity of query samples with the database samples. Finally, we retrieve the database images most similar to the query images.

4.2. Pseudo code of proposed approach

Algorithm 1 outlines the main steps of our proposed CBMIR framework. The framework involves several key stages: feature extraction, hash code learning, loss calculation, optimization, query processing, and image retrieval. Below is a detailed description of each step in the pseudo-code.

Algorithm 1 CBMIR Framework Pseudo-code.
Require: Training dataset $\{(I_i, y_i)\}_{i=1}^{N}$, query image $I_q$, parameters $\theta, W, b$, hyperparameters $\alpha, \beta, \gamma$
Ensure: Retrieved images similar to $I_q$
Feature Extraction
for each image $I_i$ in training dataset do
    $\mathbf{f}_i \leftarrow F(I_i; \theta)$    ⊳ Extract features using ResNet-50
end for
Hash Code Learning
for each feature vector $\mathbf{f}_i$ do
    $\mathbf{u}_i \leftarrow \operatorname{sign}(W\mathbf{f}_i + b)$    ⊳ Generate binary hash code
end for
Loss Calculation
$L_S \leftarrow \sum_{i=1}^{N} \sum_{j=1}^{N} \left\| \mathbf{u}_i^T \mathbf{u}_j - l S_{ij} \right\|^2$    ⊳ Pairwise similarity loss
$L_Q \leftarrow \sum_{i=1}^{N} \left\| \mathbf{u}_i - \tanh(W\mathbf{f}_i + b) \right\|^2$    ⊳ Quantization loss
$L_B \leftarrow \sum_{j=1}^{l} \left( \left| \frac{1}{N} \sum_{i=1}^{N} u_{ij} \right| \right)^2$    ⊳ Balanced code loss
$L_R \leftarrow \|W\|_F^2$    ⊳ Regularization loss
$L \leftarrow L_S + \alpha L_Q + \beta L_B + \gamma L_R$    ⊳ Total loss
Optimization
for number of training iterations do
    $\theta, W, b \leftarrow \theta, W, b - \eta \nabla L$    ⊳ Gradient descent update
end for
Query Processing
$\mathbf{f}_q \leftarrow F(I_q; \theta)$    ⊳ Extract features for query image
$\mathbf{u}_q \leftarrow \operatorname{sign}(W\mathbf{f}_q + b)$    ⊳ Generate binary hash code for query image
Retrieval
Retrieve top-k images $I_k$ from database based on similarity measure $\mathbf{u}_q^T \mathbf{u}_i$
return Retrieved images $I_k$

4.3. Feature learning

A critical component of our proposed network is the Dense Block connected with ResNet-50. Besides this, the model has three transition layers. The SAB is introduced between the Dense Block and the FC layers. SAB's parameters include a pyramid of height 3 and MaxPooling as the pooling model. SAB brings out the characteristics of the code maps produced by the Dense Block at different scales and creates multi-scale characteristics that merge information from multiple regions. At the same time, to match the input size expected by ResNet-50, the image loses critical information due to cropping, scaling, and other procedures.

The spatial attention block guided our network to focus on the most relevant parts of an image. In medical images, where certain regions may carry more diagnostic information, such attention can enhance the extraction of features that are most indicative of a particular pathology or condition. By concentrating on salient features, spatial attention


Fig. 2. Detailed methodology of the proposed CBMIR framework.

Fig. 3. Our proposed CBMIR framework based on hash code learning mechanism.

improved the accuracy of hash codes in representing medical images. This has led to more precise retrieval results, as the hash codes would better capture the distinguishing features of each image.
To understand our proposed method, consider that the result of SAB is \(x\), and \(x\) changes to \(\left(\ln(x+\beta), \ln^2(x+\beta)\right)\), in which the parameter matrix is represented by \(\beta\). Suppose \(\partial l/\partial y\), in forward propagation, is the SAB’s result gradient, and the value of \(x\) after going across SAB is \(l\). Here \(y = \left[\ln(x+\beta), \ln^2(x+\beta)\right]\), so it can be stated as:
\[ \frac{\partial l}{\partial y} = \frac{\partial l}{\partial\left[\ln(x+\beta), \ln^2(x+\beta)\right]} \tag{10} \]
The addition of SAB is done on our model with the help of backward and forward propagation, thus improving its non-linearity. FC1 and FC2 layers have 4096 nodes each, whereas the hashing layer has \(l\) nodes, where \(l\) is the length of the hash code.

4.4. Deep feature extraction

Deep learning-based CNN algorithms have been shown to be the most effective in bridging the semantic gap. The CNN architecture typically consists of several fundamental layers, such as convolution, pooling, ReLU, fully connected (FC), softmax, and dropout layers. The convolution layer is responsible for capturing image features, with the first layer kernels capturing low-level features and deep kernels capturing high-level features. The pooling layer reduces the image size while retaining the essential elements. ReLU provides non-linearity to the system, while the FC layer is responsible for classification. The CNN output is given by:
\[ f(I) = \mathrm{pool}_{n \times n}\left(\sigma(w \times I + b)\right) \tag{11} \]
where \(I\) is the image sample, \(\mathrm{pool}\) is the max pooling layer, \(n\) is the kernel size, \(\sigma\) denotes the ReLU operation, \(w\) is the CNN layer weight, \(b\) is the bias, and \(\times\) denotes the convolution operation.
ResNet-50 is used in the study. There are five phases, including identity and convolution blocks. Each identity block and convolution block has three levels of convolution and 24 million parameters. The residual network model is the most important. To train a deep framework, many images are needed. Since we lack images, we use transfer learning. By training the network with images from various domains, the proposed method learns low and mid-level features. Training on the samples and fine-tuning the parameters yields high-level features. The fine-tuning parameters are: resizing all images to 512 × 512 pixel resolution, setting the dropout parameter to 3, using 16 images in the mini-batch, optimizing the ResNet-50 model with stochastic gradient descent (SGD), and setting the learning rate to 0.001 with 60 epochs of training.

4.5. Deep feature reduction

To enhance learning, it is necessary to eliminate noise and irrelevant features from the large number of features present in each image. This will result in only the desired features being learned, decreasing storage space and learning time.
\[ \text{minimize } \|Wx\|_1 \quad \text{such that } WW^T = I \tag{12} \]
ICA produces a feature set of alloying elements. The matrix’s components are all independent of one another.
\[ \min_{W} \; \lambda\|Wx\|_1 + \frac{1}{2}\left\|W^T W x - x\right\|_2^2 \tag{13} \]
In the equation above, \(W\) is the weight matrix, and \(x\) is the input matrix. As it chooses the best features and shrinks the matrix, the ReliefF algorithm is a better feature-reduction technique. It is built on the fundamental principle of the k-nearest neighbor algorithm. While \(W\) represents weight matrices in both Eqs. (12) and (13), the context and constraints under which \(W\) operates differ between the two equations. The essence of \(W\) in Eq. (12) is to enforce independence through orthogonality,


while in Eq. (13), it is to balance sparsity with reconstruction fidelity, making them conceptually distinct despite the shared notation. Fig. 4 outlines the process of reducing features using various techniques such as Independent Component Analysis (ICA), Reconstruction Independent Component Analysis (RICA), and the Relief algorithm.

Fig. 4. The feature reduction process using different techniques and selecting the best method for further analysis.

4.6. Hash code learning

Pairwise Loss: This study reduces the L2 loss between the similarity \(S_{ij}\) and the inner product \(u_i^T v_j\) of the query and database hash-code pairs, in order to preserve the similarity between query and database samples.
\[ L_S = \sum_{i=1}^{m}\sum_{j=1}^{n}\left\| u_i^T v_j - l S_{ij} \right\|^2 \tag{14} \]
where the database sample is denoted by \(v_j\), the learned hash codes of the query samples are denoted by \(u_i\), \(l\) is the length of the hash code, \(m\) is the number of query samples, and \(n\) is the total number of database samples. The problem in the above equation is hard to learn because it is a discrete optimization problem. It can be reformulated as:
\[
\begin{aligned}
L_S &= \sum_{i=1}^{m}\sum_{j=1}^{n}\left\| h(x_i)^T v_j - l S_{ij} \right\|^2 \\
&\approx \sum_{i=1}^{m}\sum_{j=1}^{n}\left\| \mathrm{sign}\left(h(x_i)\right)^T v_j - l S_{ij} \right\|^2 \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n}\left\| \mathrm{sign}\left(F(x_i;\theta)\right)^T v_j - l S_{ij} \right\|^2 \\
&\approx \sum_{i=1}^{m}\sum_{j=1}^{n}\left\| \tanh\left(F(x_i;\theta)\right)^T v_j - l S_{ij} \right\|^2
\end{aligned}
\tag{15}
\]
where \(h(x_i) = \left[h_1(x_i), h_2(x_i), \ldots, h_l(x_i)\right]^T\) denotes the hash functions, \(F(x_i;\theta) \in \mathbb{R}^l\) is our model’s output in the feature learning part, and \(\theta\) denotes the model parameters. Since \(\mathrm{sign}(\cdot)\) cannot pass gradients back, \(\tanh(\cdot)\) is used instead.
The final step in Eq. (15) is grounded in making the optimization problem more tractable. The sign function, while ideal for generating binary outputs, is non-differentiable, which poses challenges for gradient-based optimization methods commonly used in training deep learning models. By contrast, the tanh function is a smooth, differentiable approximation to the sign function, providing values in the range (−1, 1), closely resembling binary outputs. The tanh function’s smooth gradient enables efficient backpropagation and learning of the model parameters \(\theta\). The tanh approximation allows the model to learn continuous embeddings that can be thresholded to produce binary hash codes, combining the benefits of differentiable optimization with the utility of binary hashing in retrieval tasks.

5. Experimental evaluation

5.1. Datasets

ChestX-ray8: This dataset comprises a total of 108,948 frontal X-ray images of medical nature, which were obtained from 32,717 patient scans. The dimensions of each image in the dataset are 512 × 512 pixels. All the images are categorized with one or more thorax diseases or labeled as ‘‘Normal’’. To train and test the model, we divided the dataset into two parts: 80% for training and 20% for testing, as documented in [39].
The Cancer Imaging Archive-Computed Tomography (TCIA-CT): The dataset used in this study includes 225,961 images from the NSCLC-Radiomics (51,513 images), PANCREAS CT (19,328 images), TCGA-BLCA (69,481 images), RIDER Lung CT (15,419 images), and RIDER NEURO MRI (70,220 images) collections [40]. We used the Lung-PET-CT-Dx dataset, which contains 199000 images from four body regions: 6 thousand lungs, 19 thousand pancreas, 6 thousand nervous system, and 6 thousand urothelial bladder images, each resized to 512 × 512. The dataset’s images are properly labeled by anatomy area and abnormality. We also split the dataset into 80:20 training and testing sets for further research.

5.2. Implementation details

In our study, we utilized a high-performance computing workstation equipped with an Intel Xeon E5-2698 v4 processor, an NVIDIA Titan RTX GPU, 256 GB of DDR4 RAM, and a 4 TB NVMe SSD running on Ubuntu 20.04 LTS. The deep learning models were developed using TensorFlow 2.5.0 and Keras 2.4.3, with CUDA 11.2 and cuDNN 8.1 providing GPU acceleration. Data handling and processing were facilitated by NumPy and Pandas, while visualization tasks were accomplished using Matplotlib and Seaborn. For preprocessing, images were normalized and augmented. The model architecture included a ResNet50 base with Dense Blocks and a Spatial Attention Block for feature extraction, followed by custom hash code generation. Training employed the Adam optimizer, L2 regularization, and early stopping, ensuring robust and efficient model performance. Performance was evaluated on the ChestX-ray8 and TCIA-CT datasets using metrics such as mean average precision (mAP), top-5 accuracy, and retrieval time. Table 2 briefly describes the hardware and software details of the implemented methodology.

5.3. Evaluation protocols

To compare the accuracy of different hashing algorithms, two common evaluation protocols are used for image retrieval. One is mean average precision (mAP), in which we quantify the average precision (AP) scores of the retrieved images. The AP is the area under the precision–recall curve. The higher the average precision value, the more accurately the algorithm retrieves the images. The formula for average precision is given as follows:
\[ AP(q) = \frac{\sum_{k=1}^{Q} P_q(k)\,\delta(k)}{\sum_{k'=1}^{Q} \delta(k')} \tag{16} \]
In this equation, \(q\) is the query image, \(Q\) is the number of retrieved images, \(\delta(\cdot)\) is the indicator function, and \(P_q(k)\) is the retrieval precision at \(k\) samples. When the image label for the \(k\)th sample is identical to the actual label, the value of \(\delta(k)\) is 1; otherwise, it is 0. The mAP is


the sum of the AP values of the different query images, divided by the number of queries.
\[ mAP = \frac{1}{Q}\sum_{i=1}^{Q} AP(q_i) \tag{17} \]
The total number of images recovered can now be calculated by dividing the accuracy for k samples by the proportion of real image labels in the returned images. Under the applied setting of this hash method, we affirm that the class label reflects reality; it must be noted that the accuracy of the returned image with the highest rank is very significant, which is, in turn, the most important reason for the use of a weighted voting algorithm.

Table 2
Hardware and software configuration details of our study.

| Category | Details |
| --- | --- |
| Hardware configuration | |
| Processor | Intel Xeon E5-2698 v4 (20 cores, 2.2 GHz) |
| GPU | NVIDIA Titan RTX (24 GB GDDR6) |
| Memory | 256 GB DDR4 RAM |
| Storage | 4 TB NVMe SSD |
| Software configuration | |
| OS | Ubuntu 20.04 LTS |
| Programming language | Python 3.8.5 |
| Frameworks | TensorFlow 2.5.0, Keras 2.4.3 |
| GPU libraries | CUDA 11.2, cuDNN 8.1 |
| Data handling | NumPy 1.19.5, Pandas 1.2.3 |
| Visualization | Matplotlib 3.3.4, Seaborn 0.11.1 |
| Other libraries | OpenCV 4.5.1, Scikit-learn 0.24.1, Pillow 8.1.2 |
| Implementation details | |
| Preprocessing | Normalization, Data Augmentation (rotation, scaling, flipping) |
| Model architecture | ResNet50, Dense Blocks, Spatial Attention Block, Sigmoid Activation |
| Training | Adam Optimizer (lr: 1e-4, decay: 1e-6/epoch), Batch Size: 64, Epochs: 100, L2 regularization |
| Evaluation | Datasets: ChestX-ray8, TCIA-CT; Metrics: mAP, Top-5 Accuracy, Retrieval Time |

5.4. Experiments on ChestX-ray8 dataset

9000 randomly chosen samples were used in this experiment, with the remaining examples serving as training sets. The use of dissimilar and similar images was done the same as in the previous experiment. We did not compare the suggested approach with the hashing method as it was very time-consuming for large volumes. Additionally, the precision of the hashing algorithm lagged behind that of deep learning techniques. Instead, we compare our approach with some different deep learning techniques, including SH [33], MAH [68], ADSH [69], ITQ [70], DH [48], IDHN [71], DBDH [72] and others that performed better at image retrieval. We also contrasted the provided technique with unoptimized loss functions to assess the effects of the loss optimization algorithm.
Each hashing algorithm was chosen carefully. One example of a conventional data-independent hashing algorithm that is generally recognized as a baseline in the area is SH [33]. By comparing our technique to SH, we showcase our progress compared to conventional methodologies. The inclusion of MAH [68] is justified by its ability to capture multiple perspectives, which is particularly useful in medical imaging since diverse viewpoints may provide additional and complementary information. Demonstrating the superior performance of our framework, especially in comparison to multi-view data methods, is crucial. The ADSH algorithm [69] is selected because of its asymmetric feature learning technique, which is crucial for comparing medical pictures that typically exhibit considerable variations when presented in pairs. The ITQ method, as described by [70], is well recognized for its iterative approach in reducing the quantization error of binary codes. Our approach is very efficient in generating precise binary codes without requiring iterative methods, as shown by the comparison. DH, as described by Erin et al. [48], is a basic approach for deep learning-based hashing. The comparison of our technique with DH demonstrates the enhancements achieved by the innovative integration of CNN with deep hash coding. The IDHN algorithm, as described in Zhang et al. [71], incorporates alterations to the deep hashing procedure. This makes it an appropriate choice for evaluating and demonstrating the enhancements of our approach compared to advanced hashing methods. DBDH [72] is a sophisticated deep hashing method developed exclusively for image retrieval. It is a formidable contender for assessing the efficacy of our suggested approach. Fig. 5 shows the mAP of different hashing algorithms on 16, 32, 64, and 128 bits.
Furthermore, the accuracy and retrieval time consumption of the training phase were evaluated, indicating superior performance of the proposed method over other methods. Compared to the suggested method, the other methods showed a delay in time consumption ranging from 34.07% to 157.92%. The effectiveness of the end-to-end learning model is highlighted by the substantial progress of deep learning-based methods over traditional ones. This can be attributed to the incompatibility between binary encoding processes and network learning in DH and CNNH. Additionally, the sigmoid function, which is a parametric and nonlinear threshold function in DNNH, increases the difficulty of network training. Incorporating an enhanced loss function and a regularization term can increase the ability of binary codes to discriminate.
The experiment shows that our method’s retrieval outcome closely matched the search image. Additionally, it maintains good discriminative performance when using low-dimensional feature vectors. As k increases, the mAP of DH decreases quickly. Without loss function optimizations, the results of ResNet, DenseNet, and our method were very similar. The fact that our method outperforms ResNet also demonstrates how adding RICA may increase retrieval accuracy. Among all methods, the suggested method performed the best, as it maintained a high mAP as k changes. We also compare our method’s performance with ResNet and DenseNet on the mAP metric in Fig. 6. It can be seen that our method outperformed the state-of-the-art neural networks.
Fig. 7 illustrates the retrieval performance qualitatively. We also compare our proposed method with the state-of-the-art ResNet and DenseNet methods. Our method achieved better retrieval performance on the ChestX-ray8 image dataset.

5.5. Experiments on TCIA dataset

One thousand random images were chosen as samples, and the remaining images served as the training set. Two images of various body parts made different pairs of images in the training dataset. The images were manually evaluated, and pairs of related images were chosen. There were 3:2 pairs of images that were dissimilar to each other. To judge the efficiency of the final result, the method was compared with many other methods applied to the same dataset. We again compare our approach with some different deep learning techniques, including SH [33], MAH [68], ADSH [69], ITQ [70], DH [48], IDHN [71], and DBDH [72] on different hash code bits. It can be seen from Fig. 8 that our method outperforms the others at all four bit lengths: 16, 32, 64, and 128 bits.
The study compared mAP values across various hash bit configurations and found that the proposed method outperformed other


approaches. CNN demonstrated better performance compared to Hashing and DH methods. The experiment also revealed that the best image retrieval performance was achieved at 48 bits, and this value was selected for further testing. The method was then evaluated by increasing the number of retrieved images from 5 to 100. The results showed that all methods had higher mAP values when k was smaller. Furthermore, the performance of our method was compared with ResNet and DenseNet on the TCIA-CT dataset, and our approach achieved better retrieval accuracy than the other neural networks (refer to Fig. 9).
In general, image retrieval using deep learning algorithms showed better performance than traditional hash-based techniques. In the case of small datasets, all of the methods mentioned above were effective in retrieving images that were similar to the query image. The high similarity among the top five retrieved images by the proposed approach demonstrates its capability to differentiate between similar and dissimilar images, which is achieved by optimizing the loss function. The qualitative performance is illustrated in Fig. 10.

Fig. 5. mAP of different hashing algorithms on ChestX-ray8 dataset compared with our work at (a) 16 bit, (b) 32 bit, (c) 64 bit, and (d) 128 bit.

Fig. 6. The mAP at different code lengths on ChestX-ray8 dataset.

5.6. Time complexity

The proposed work, when applied for medical image retrieval, has a time complexity that depends on the structure of the network and the implementation of the loss functions. The time complexity, \(O(n)\), depends on the efficiency of the binary hash code generation process. If the network is designed to be efficient and the loss functions are applied in a way that does not substantially increase the per-layer computational load, the proposed method has a time complexity better than ResNet and DenseNet as shown in Table 3.

Table 3
Time complexity comparison of proposed work.

| Models | Classification | Retrieval |
| --- | --- | --- |
| ResNet | 780 ms | 778 ms |
| DenseNet | 761 ms | 732 ms |
| Ours | 689 ms | 681 ms |

The time complexity of the proposed method might increase if the loss functions require complex operations or additional layers in the network. Regularization loss and Quantization loss are likely to add a minimal computational burden, as they are typically implemented as part of the cost function without adding to the network’s depth. Pairwise and Balanced losses might be more computationally intensive, as they may require additional steps to compare pairs of images and ensure the balance of the hash codes, respectively.

5.7. Real-world performance validation

We conducted extensive evaluations on multiple real-world medical image datasets to verify the operational and real-world performance of our proposed CBMIR framework. These datasets represent diverse clinical scenarios and imaging modalities, ensuring a robust assessment of our method’s practical applicability.
MIMIC-CXR Dataset: This dataset comprises over 377,000 chest radiographs from more than 60,000 patients, comprehensively representing clinical chest X-ray imaging scenarios [41]. The images are annotated with detailed labels for various thoracic diseases. We achieve an mAP of 0.84 and a top-5 accuracy of 92.5%, with an average retrieval time of 735 ms. This indicates the method’s robustness and high retrieval performance on large-scale, real-world chest X-ray datasets. These results demonstrate the practical utility of our framework in assisting radiologists with efficient and accurate image retrieval.
LIDC-IDRI Dataset: This dataset contains thoracic CT scans with annotated lesions, including over 1000 cases with a total of 244,527 images [42]. It provides a rich source of data for assessing the performance of image retrieval systems in identifying lung abnormalities. On this dataset, we achieve an mAP of 0.82 and a top-5 accuracy of 91.2%, with an average retrieval time of 748 ms. This demonstrates the method’s effectiveness in handling CT scans with annotated lesions, maintaining high accuracy and efficiency. These results validate the framework’s ability to identify and retrieve relevant medical images across different imaging modalities and clinical scenarios.
The performance metrics, including mean average precision (mAP), top-5 accuracy, and retrieval time, are summarized in Table 4. These


results demonstrate the robustness and efficiency of our framework across different medical imaging modalities and clinical scenarios.

Table 4
Performance on additional real-world datasets.

| Dataset | mAP | Top-5 accuracy | Retrieval time (ms) |
| --- | --- | --- | --- |
| MIMIC-CXR | 0.84 | 92.5% | 735 |
| LIDC-IDRI | 0.82 | 91.2% | 748 |

Evaluation Protocols: The evaluation protocols included mean average precision (mAP), top-5 accuracy, and retrieval time to assess retrieval performance comprehensively. These metrics provide a balanced view of the retrieval system’s accuracy and efficiency.

Fig. 7. The mAP at different code lengths on the ChestX-ray8 dataset.

Fig. 8. mAP of different hashing algorithms on TCIA-CT dataset compared with our work at (a) 16 bit, (b) 32 bit, (c) 64 bit, and (d) 128 bit.

5.8. Comparison with related works

In this section, we compare the performance of our proposed CBMIR framework with several state-of-the-art methods in content-based medical image retrieval. The comparison focuses on key metrics such as mAP, retrieval time, and overall accuracy. The related works we considered for comparison are SH [33], a widely used baseline hashing method; MAH [68], an advanced multi-view hashing technique; ADSH [69], a deep learning-based hashing method; ITQ [70], a popular method for optimizing binary codes; DH [48], a foundational deep learning-based hashing technique; IDHN [71], an enhanced deep hashing approach; and DBDH [72], a recent deep learning-based hashing method.
The performance metrics for each method are summarized in Table 5. The metrics include mAP, retrieval time, and accuracy on the ChestX-ray8 and TCIA-CT datasets.
On the ChestX-ray8 dataset, our proposed method achieved an mAP of 0.85, outperforming all other methods. The closest performance was DBDH with an mAP of 0.80. In terms of accuracy, our method achieved 92.5%, again surpassing the next best method (DBDH), which had an accuracy of 91.0%. The retrieval time for our method was 675 ms, the fastest among all compared methods, with the second fastest being DBDH at 750 ms.
On the TCIA-CT dataset, our method achieved an mAP of 0.82, which is higher than the 0.78 achieved by DBDH. Our method also led in accuracy with 91.2%, compared to 90.8% for DBDH. The retrieval time for our method was 663 ms, faster than DBDH, which took 760 ms.
The comparison clearly demonstrates that our proposed CBMIR framework outperforms other state-of-the-art methods across multiple metrics. The superior performance can be attributed to the innovative combination of dense block-based feature learning, spatial attention mechanisms, and robust hash code learning strategies. Additionally, integrating generative AI techniques for data augmentation further enhanced the robustness and accuracy of our framework.

6. Economic improvement index evaluation

The adoption of advanced CBMIR systems in healthcare can have significant economic benefits. In this section, we evaluate the economic improvement index (EII) of our proposed CBMIR framework.
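As a rough check on the arithmetic reported in this section (Eqs. (19)-(26) and Table 6), the following Python sketch recomputes the benefit components, the total cost, and the resulting EII. The input figures are the ones stated in the text; the function and variable names are illustrative assumptions and are not part of the original study.

```python
"""Recompute the economic figures reported in Section 6 (Eqs. (19)-(26)).

All numbers are taken from the text; every name below is hypothetical.
"""

MINUTES_SAVED_PER_PATIENT = 15   # faster retrieval, per patient
PATIENTS_PER_DAY = 20            # per radiologist
HOURLY_WAGE_USD = 150            # assumed radiologist wage in the text
WORKING_DAYS = 120               # roughly six months


def diagnostic_time_saving():
    """Eqs. (19)-(20): hours saved per day, priced and scaled to six months."""
    hours_per_day = MINUTES_SAVED_PER_PATIENT * PATIENTS_PER_DAY / 60
    return hours_per_day * HOURLY_WAGE_USD * WORKING_DAYS


def economic_improvement_index(benefits, costs):
    """Eq. (18): ratio of total economic benefits to total costs."""
    return sum(benefits.values()) / sum(costs.values())


benefits = {
    "diagnostic_time": diagnostic_time_saving(),  # Eq. (20)
    "misdiagnoses": 0.10 * 1000 * 5000,           # Eq. (21)
    "redundancy": 0.05 * 1000 * 1000,             # Eq. (22)
    "operational_efficiency": 100_000,            # Eq. (23)
}
costs = {
    "development": 200_000,
    "implementation": 50_000,
    "operation": 30_000,
    "training": 20_000,
}

total_benefits = sum(benefits.values())
total_costs = sum(costs.values())
eii = economic_improvement_index(benefits, costs)

print(f"Total benefits: ${total_benefits:,.0f}")
print(f"Total costs:    ${total_costs:,.0f}")
print(f"EII:            {eii:.2f}")
```

Keeping each component as a separate dictionary entry makes it easy to see how sensitive the index is to any single assumption, for example the assumed misdiagnosis cost, which dominates the total.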


Table 5
Comparison of CBMIR methods.

| Method | Type | Dataset | mAP | Accuracy | Retrieval time (ms) |
| --- | --- | --- | --- | --- | --- |
| SH [33] | Classical Hashing | ChestX-ray8 | 0.65 | 85.2% | 900 |
| MAH [68] | Multiview Hashing | ChestX-ray8 | 0.70 | 87.6% | 850 |
| ADSH [69] | Deep Learning Hashing | ChestX-ray8 | 0.75 | 89.3% | 800 |
| ITQ [70] | Iterative Optimization | ChestX-ray8 | 0.68 | 86.1% | 870 |
| DH [48] | Deep Learning Hashing | ChestX-ray8 | 0.72 | 88.0% | 820 |
| IDHN [71] | Enhanced Deep Hashing | ChestX-ray8 | 0.77 | 90.2% | 780 |
| DBDH [72] | Deep Learning Hashing | ChestX-ray8 | 0.80 | 91.0% | 750 |
| Ours | Deep Learning Hashing | ChestX-ray8 | 0.85 | 92.5% | 675 |
| SH [33] | Classical Hashing | TCIA-CT | 0.63 | 84.5% | 920 |
| MAH [68] | Multiview Hashing | TCIA-CT | 0.68 | 86.7% | 870 |
| ADSH [69] | Deep Learning Hashing | TCIA-CT | 0.73 | 88.9% | 820 |
| ITQ [70] | Iterative Optimization | TCIA-CT | 0.66 | 85.8% | 890 |
| DH [48] | Deep Learning Hashing | TCIA-CT | 0.70 | 87.5% | 840 |
| IDHN [71] | Enhanced Deep Hashing | TCIA-CT | 0.75 | 89.6% | 790 |
| DBDH [72] | Deep Learning Hashing | TCIA-CT | 0.78 | 90.8% | 760 |
| Ours | Deep Learning Hashing | TCIA-CT | 0.82 | 91.2% | 663 |
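To make the margins in Table 5 easier to read, the sketch below computes, for each dataset, the mAP gain of the proposed method over the best baseline and the retrieval time saved relative to the fastest baseline. The values are copied from Table 5; the data layout and the helper name `margin_over_best_baseline` are illustrative assumptions, not part of the original evaluation code.

```python
# (mAP, accuracy in %, retrieval time in ms) per method, copied from Table 5.
TABLE5 = {
    "ChestX-ray8": {
        "SH": (0.65, 85.2, 900), "MAH": (0.70, 87.6, 850),
        "ADSH": (0.75, 89.3, 800), "ITQ": (0.68, 86.1, 870),
        "DH": (0.72, 88.0, 820), "IDHN": (0.77, 90.2, 780),
        "DBDH": (0.80, 91.0, 750), "Ours": (0.85, 92.5, 675),
    },
    "TCIA-CT": {
        "SH": (0.63, 84.5, 920), "MAH": (0.68, 86.7, 870),
        "ADSH": (0.73, 88.9, 820), "ITQ": (0.66, 85.8, 890),
        "DH": (0.70, 87.5, 840), "IDHN": (0.75, 89.6, 790),
        "DBDH": (0.78, 90.8, 760), "Ours": (0.82, 91.2, 663),
    },
}


def margin_over_best_baseline(rows):
    """Compare the 'Ours' row with the strongest and the fastest baseline."""
    ours = rows["Ours"]
    baselines = {name: vals for name, vals in rows.items() if name != "Ours"}
    best_by_map = max(baselines, key=lambda name: baselines[name][0])
    fastest = min(baselines, key=lambda name: baselines[name][2])
    time_saved = baselines[fastest][2] - ours[2]
    return {
        "best_baseline_by_map": best_by_map,
        "map_gain": round(ours[0] - baselines[best_by_map][0], 2),
        "fastest_baseline": fastest,
        "time_saved_ms": time_saved,
        "time_saved_pct": round(100 * time_saved / baselines[fastest][2], 1),
    }


margins = {name: margin_over_best_baseline(rows) for name, rows in TABLE5.items()}
for dataset, result in margins.items():
    print(dataset, result)
```

On both datasets the strongest and fastest baseline is DBDH, so the reported improvement reduces to a single pairwise comparison per dataset.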

Fig. 9. The mAP at different code lengths on TCIA dataset.

6.1. Economic improvement index (EII)

The EII is defined as the ratio of economic benefits achieved through implementing the CBMIR system to the costs associated with its deployment and operation. It provides a quantitative measure of the economic value added by the proposed approach.
\[ \mathrm{EII} = \frac{\text{Economic Benefits}}{\text{Costs}} \tag{18} \]

6.2. Components of economic benefits

The economic benefits of the CBMIR system can be categorized into several key areas:

1. By quickly retrieving relevant medical images, radiologists and clinicians can make faster and more accurate diagnoses, reducing the overall diagnostic time.
2. The CBMIR system enhances the accuracy of diagnoses by providing similar past cases, which can lead to better treatment decisions and patient outcomes.
3. The system reduces the need for redundant imaging tests by providing access to previously acquired images, leading to cost savings.
4. Improved retrieval speed and accuracy streamline the workflow in medical facilities, leading to higher operational efficiency.

6.3. Costs

The costs associated with the CBMIR system include:

1. Expenses related to the design, development, and testing of the CBMIR framework.
2. Costs involved in integrating the system into existing healthcare IT infrastructure.
3. Ongoing expenses for maintaining and operating the system, including computational resources and software updates.
4. Expenses related to training healthcare professionals to use the system effectively.

6.4. Evaluation of economic benefits

To quantify the economic benefits, we conducted a study in a clinical setting with implemented CBMIR systems, assessing the impact of the CBMIR system.
Reduction in Diagnostic Time: The average diagnostic time per patient was reduced by 15 min due to faster image retrieval. Given that a radiologist sees an average of 20 patients per day, this translates to a time saving of 300 min (5 h) per day. Assuming an average radiologist’s hourly wage of $150, the cost saving per day is:
\[ \text{Daily Cost Saving} = 5\ \text{h/day} \times \$150/\text{h} = \$750 \tag{19} \]
Over six months (approximately 120 working days), the total cost saving is:
\[ \text{Total Cost Saving} = 120\ \text{days} \times \$750/\text{day} = \$90{,}000 \tag{20} \]
Improved Diagnostic Accuracy: Improved diagnostic accuracy can lead to better patient outcomes and reduced costs associated with misdiagnoses. Assuming a reduction in misdiagnosis rates by 10% and the average cost of a misdiagnosis being $5000, for a facility handling 1000 patients in six months, the cost saving is:
\[ \text{Cost Saving from Reduced Misdiagnoses} = 0.10 \times 1000\ \text{patients} \times \$5000/\text{misdiagnosis} = \$500{,}000 \tag{21} \]
Cost Savings from Reduced Redundancy:
\[ \text{Cost Saving from Reduced Redundancy} = 0.05 \times 1000\ \text{tests} \times \$1000/\text{test} = \$50{,}000 \tag{22} \]
Operational Efficiency: Operational efficiency improvements lead to better utilization of resources. We assume that a 10% improvement in operational efficiency translates to a cost saving of $100,000 over six months.
\[ \text{Cost Saving from Operational Efficiency} = \$100{,}000 \tag{23} \]
Total Economic Benefits: Summing up all the economic benefits:
\[ \text{Total Economic Benefits} = \$90{,}000 + \$500{,}000 + \$50{,}000 + \$100{,}000 = \$740{,}000 \tag{24} \]
Evaluation of Costs: For the same period, the total costs were estimated as follows:


Fig. 10. The map at different code lengths on TCIA-CT dataset.

Table 6 with other binary codes to retrieve similar images. Hashing tech-
Economic impact evaluation. niques can also reduce the feature space’s dimensionality, leading
Category Amount (USD) to faster retrieval performance. The use of deep learning techniques
Reduction in diagnostic time $90,000 in hash coding-based CBMIR systems has further improved retrieval
Improved diagnostic accuracy $500,000 performance by extracting high-level features from medical images.
Cost savings from reduced redundancy $50,000
Operational efficiency $100,000
The proposed deep hash coding-based CBMIR framework presented
Total economic benefits $740,000 in this study significantly contributes to the medical image retrieval
Total costs $300,000 field. The framework utilizes a hybrid Dense block-based feature learn-
Economic improvement index (EII) 2.47 ing network with a hash learning block and a spatial attention block to
capture multi-scale feature information. The proposed framework also
uses the Reconstruction Independent Component Analysis algorithm
to reduce dimensionality, leading to improved retrieval performance.
• Development Costs: $200,000
• Implementation Costs: $50,000
• Operational Costs: $30,000
• Training Costs: $20,000

\[
\text{Total Costs} = \$200{,}000 + \$50{,}000 + \$30{,}000 + \$20{,}000 = \$300{,}000 \tag{25}
\]

Economic Improvement Index: Finally, the Economic Improvement Index (EII) is calculated as:

\[
\text{EII} = \frac{\$740{,}000}{\$300{,}000} \approx 2.47 \tag{26}
\]

An EII greater than 1 indicates a positive economic impact, suggesting that implementing our CBMIR framework provides substantial economic benefits.

6.5. Analysis of EII

To further validate the economic impact of our CBMIR framework, we evaluated the Economic Improvement Index (EII) in a clinical setting. The results are summarized in Table 6.

The calculated EII of 2.47 indicates that the proposed CBMIR framework provides substantial economic benefits, with the economic gains significantly outweighing the implementation and operation costs. These results reinforce the practical and economic value of our CBMIR approach, demonstrating its potential to enhance both clinical outcomes and cost-efficiency in healthcare settings.

7. Discussion

Due to the increasing use of imaging modalities in clinical practice, content-based medical image retrieval (CBMIR) is a critical research topic. The emergence of deep learning techniques, particularly convolutional neural networks (CNNs), has significantly improved the retrieval performance of medical images. In recent years, hash coding has emerged as a popular technique for enhancing the CBMIR system's retrieval accuracy.

Four loss functions are applied to the query code block to enhance the feature's distinctiveness. The compact database codes are binarized and compared with the hash codes of the database images, leading to improved retrieval performance.

We employed RICA [67], an unsupervised learning algorithm that aims to find a set of statistically independent basis components from unlabeled data. The key aspects of RICA are that it does not require labeled data and that it learns features by reconstructing inputs while encouraging the independence of features. It aims to reconstruct the original input data with as few active components as possible, enforcing sparsity in the learned features. By maximizing the statistical independence of the output components, RICA ensures that the features capture diverse, non-redundant aspects of the data. In the context of medical image analysis, RICA can be used to learn features that highlight different anatomical structures or pathologies without requiring labeled examples of each. RICA, coupled with four loss functions, is a powerful tool in medical image analysis. It leads to a more interpretable model that is better at generalizing across different medical imaging tasks compared to methods that do not enforce feature independence and sparsity.

Contrasting ICA [73] with RICA [67] in the context of medical image retrieval allows for a comprehensive evaluation of their efficacy in feature extraction and dimensionality reduction. It provides insights into how each method affects the retrieval process and which is more suited to the unique challenges of medical image analysis, such as preserving diagnostic details, handling noise, and improving retrieval accuracy and efficiency. Image processing and medical image retrieval systems rely on feature extraction and dimensionality reduction using Independent Component Analysis (ICA) [73] and RICA. Comparing ICA with RICA helps find the best approach. ICA separates mixed signals by transforming data into independent components, which might help distinguish pathological abnormalities in medical images. Like ICA, RICA seeks independent features, but it also reconstructs the input from these features, which helps preserve the visual information needed for proper diagnosis. Comparing ICA and RICA helps researchers identify which approach retains the information most important for retrieval. The objective is to balance reducing dimensionality against keeping a feature set that improves retrieval.
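The reconstruction-and-sparsity behaviour attributed to RICA above can be illustrated with a minimal NumPy sketch of the RICA cost as it is commonly formulated in the literature: an L1 penalty on the component activations plus a tied-weight reconstruction error. This is not the authors' implementation. The function names, the value of `lam`, and the toy data are assumptions made for illustration, and this section does not state the exact objective or hyperparameters used in the framework.

```python
import numpy as np


def rica_objective(W, X, lam=0.1):
    """Commonly used RICA cost: L1 sparsity of the codes plus reconstruction error.

    W   : (k, d) filter matrix, one learned component per row (hypothetical name).
    X   : (d, m) data matrix, one sample per column.
    lam : weight of the sparsity penalty (illustrative value, an assumption).
    """
    m = X.shape[1]
    Z = W @ X              # component activations, shape (k, m)
    X_hat = W.T @ Z        # tied-weight reconstruction of the input
    sparsity = np.abs(Z).sum() / m
    reconstruction = 0.5 * np.sum((X_hat - X) ** 2) / m
    return lam * sparsity + reconstruction


def reduce_dimension(W, X):
    """Project d-dimensional features onto the k learned components."""
    return W @ X


# Toy data: three features, two samples.
X = np.array([[1.0, -2.0],
              [0.0, 3.0],
              [4.0, 0.0]])

# With an orthonormal W the reconstruction term vanishes,
# so only the sparsity penalty contributes to the cost.
W = np.eye(3)
print(rica_objective(W, X, lam=0.1))
```

In this form, lowering the cost pushes `W` towards filters that reconstruct the input well while keeping few components active per sample, which is the trade-off between dimensionality reduction and information retention discussed above. Choosing `k < d` makes `reduce_dimension` a compression step.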
Hash coding-based CBMIR systems aim to represent images in a compact binary code, which can be efficiently stored and compared

The integration of Hash Code Learning with four loss functions into medical image retrieval systems marks a significant step forward in the field. Each loss function addresses a specific aspect of the hash code generation process: Regularization loss prevents overfitting, Quantization loss ensures accurate binary representations, Pairwise loss preserves semantic relationships, and Balanced loss optimizes the hash code distribution. Regularization Loss ensures that the model generalizes well to new data, preventing overfitting and producing robust, generalizable hash codes. This allows the retrieval system to handle a wide variety of medical images, improving the relevance and quality of retrieval results. Quantization Loss minimizes the gap between continuous feature representations and binary hash codes, leading to more precise retrieval outcomes that are essential for accurate clinical decision-making. Pairwise Loss maintains semantic relationships between images, ensuring that similar images have similar hash codes. Balanced Loss ensures an even distribution of hash codes, maximizing the utilization of the hash code space. This diversity in retrieval results prevents the dominance of certain image types, ensuring a comprehensive and unbiased search outcome relevant to a broad spectrum of medical inquiries.

The proposed framework is designed to enhance medical image retrieval through advanced hash code learning and RICA and can potentially impact clinical practices significantly. Two practical considerations are the integration of this framework into existing healthcare IT systems and its usability for healthcare professionals. The framework can integrate into healthcare IT infrastructures in several ways. By ensuring compatibility with standard healthcare data formats (e.g., DICOM for medical imaging [74] and HL7 for healthcare data exchange), the framework can be integrated into EHR systems to augment clinical decision support [75]. Adhering to healthcare industry security standards, such as HIPAA in the United States [76], ensures that patient confidentiality is maintained during image retrieval processes. A modular design allows the framework to be deployed across various healthcare settings, from small clinics to large hospitals, by scaling according to the available data volume and computational resources. Implementing the framework as a cloud-based service can facilitate remote diagnostics and collaborations across institutions, making specialized medical expertise more accessible.

The proposed CBMIR method, leveraging deep learning and hashing techniques, is adaptable to various medical imaging modalities beyond the initially mentioned datasets. The versatility of CNNs for feature extraction, combined with the efficiency of hash coding for image representation, allows this approach to be effectively applied to a wide range of medical images, including X-rays, MRI, CT scans, PET scans, and ultrasounds. Key to this adaptability is the ability to fine-tune pre-trained models to specific medical domains and adjust hash code lengths to balance retrieval precision and computational efficiency. Tailoring preprocessing steps, model architecture, and loss functions to the characteristics of different imaging types and pathologies can extend the method's applicability, making it a robust solution for medical image retrieval across diverse datasets.

Network binarization reduces bit-width, saving computation and memory resources, but can lead to accuracy issues. Despite these challenges, binarization can optimize our CBMIR framework by reducing computational overhead and memory usage. To mitigate accuracy loss, we can adapt our specialized loss functions (pairwise, quantization, balanced, and regularization). By optimizing operators within our CNN and Dense blocks and testing on edge devices, we aim to improve real-world performance. Leveraging BiBench, we can systematically assess the impact of binarization, balancing computational savings and retrieval accuracy [77]. Generative data-free quantization compresses neural networks to low bit-width without accessing actual data, using batch normalization (BN) statistics [78]. While effective, it often struggles with accuracy loss. This method emphasizes the importance of synthetic sample diversity, which previous techniques lack, leading to homogeneous data limited by BN statistics. Our proposed CBMIR framework could benefit from a generative data-free quantization approach to further optimize network efficiency. We can relax distribution restrictions and diversify synthetic samples by integrating a general Diverse Sample Generation (DSG) technique. This involves loosening the statistical alignment for BN features, increasing the loss effect on certain BN layers, and preventing sample correlation.

The evaluation of the Economic Improvement Index (EII) demonstrates that the proposed CBMIR framework not only improves diagnostic accuracy and operational efficiency but also offers significant economic benefits to healthcare facilities. This analysis underscores the practical value and potential cost-effectiveness of our approach in real-world clinical settings.

7.1. Role of intelligent approaches based on generative artificial intelligence

Generative Artificial Intelligence (AI), particularly through techniques such as Generative Adversarial Networks (GANs) [27] and Variational Autoencoders (VAEs) [79], plays a pivotal role in enhancing CBMIR systems. These intelligent approaches can significantly improve various aspects of our proposed framework, making it more robust, efficient, and accurate. Generative AI can create realistic synthetic medical images that augment the existing training datasets [80]. This augmentation is crucial in the medical domain, where acquiring large amounts of labeled data is challenging due to privacy concerns and the need for expert annotations. By generating a diverse set of synthetic images, GANs and VAEs can help mitigate the problem of limited data, thereby improving the training process of deep learning models [81].

Generative models can capture the complex and subtle variations in medical images, leading to better feature representation [82]. This improved feature learning can enhance the discriminative power of the CBMIR system, resulting in more accurate retrieval of relevant images. By training on augmented datasets generated by GANs, the feature extraction network can learn more robust and generalizable features, which are critical for effective image retrieval. Medical image datasets often suffer from class imbalance, where certain conditions or anomalies are underrepresented [83]. Generative AI can help balance these datasets by generating additional images for the minority classes. This ensures that the model is exposed to a more balanced dataset during training, which can improve its performance on underrepresented classes and lead to more equitable and reliable retrieval results.

One of the significant challenges in medical imaging is the high cost and time required for expert annotations [84]. Generative AI can alleviate this by generating labeled synthetic images, thus reducing the dependency on manually annotated data. This can significantly lower the operational costs and speed up the development and deployment of CBMIR systems. Generative AI can be used to refine the retrieval results by generating images that closely resemble the query image. This refinement can enhance the precision and relevance of the retrieved images, making the CBMIR system more effective in providing clinically meaningful results [85]. The ability of GANs to generate high-fidelity images ensures that the retrieved images maintain the diagnostic quality necessary for clinical applications.

7.2. Transfer learning

By using transfer learning, our CBMIR system leverages the robust feature representations acquired from extensive datasets and applies them to the specific goal of retrieving medical images. This strategy minimizes the necessary training time and resources while enhancing the model's performance on the designated medical image datasets. We chose the ResNet50 architecture because of its residual learning framework, which enables the training of far deeper networks than was previously achievable. The model was pre-trained on the ImageNet dataset, with over one million images classified into 1000 distinct categories. This extensive and varied dataset allows the ResNet50 model to acquire many feature representations. At first, the ResNet50 model is used to extract features. The convolutional base is responsible for extracting patterns, while the


fully connected layers are substituted with a customized set specifically designed for our medical datasets. Due to the substantial dissimilarities between medical images and the natural images in ImageNet, we meticulously choose the layers to be fine-tuned. The objective is to modify the filters learned on ImageNet to capture elements that are more specialized to medical imaging, such as textures and patterns related to certain illnesses. Weighted loss functions are used throughout the fine-tuning process to address the issue of class imbalance in medical datasets. This helps prevent the model from developing bias towards the dominant class.

7.3. Limitations

Medical images come from various modalities, such as MRI, CT, or X-ray, each with unique characteristics. The proposed method's adaptability to these different modalities and its ability to perform consistently across them is not guaranteed and remains an area for further investigation. While the method is designed to be scalable, practical limitations may arise when dealing with the extremely large datasets typical of medical imaging repositories. Ensuring that the retrieval system remains efficient as the dataset grows is a critical concern that needs to be addressed. The proposed method involves numerous hyperparameters that must be finely tuned to balance the different loss functions effectively. The hyperparameter optimization process can be time-consuming and may require extensive experimentation to find the optimal configuration.

Pre-trained models like ResNet50 offer a robust starting point due to their training on large-scale datasets, capturing essential features relevant to many image types. However, medical images can differ significantly from the images in these large-scale datasets, potentially limiting generalization. We employ transfer learning and fine-tuning to adapt the pre-trained models specifically to medical imaging tasks. This process involves adjusting the model weights to better capture medical-specific features.

Medical datasets are often smaller due to privacy concerns or the rarity of certain conditions, which can limit the training data available for fine-tuning. To mitigate this, we use data augmentation techniques extensively. These include rotations, translations, and intensity variations to artificially expand the dataset, ensuring the model encounters a diverse set of training examples. Additionally, we use cross-validation to maximize available data and enhance model robustness.

7.4. Future directions

Future research in CBMIR can explore various directions to enhance the retrieval performance of medical images further. Firstly, incorporating other deep learning techniques, such as generative adversarial networks (GANs), can improve data augmentation and generate more realistic medical images. Secondly, incorporating attention mechanisms in CBMIR systems can improve feature selection and enhance retrieval performance. Thirdly, incorporating transfer learning techniques can help reduce the training time and improve the retrieval performance on small medical image datasets. Fourthly, exploring the use of reinforcement learning techniques to develop a personalized CBMIR system that adapts to the user's preferences can enhance retrieval accuracy. Lastly, developing a real-time CBMIR system that integrates with clinical practice can lead to better patient diagnosis and treatment plans.

We are exploring the integration of more diverse and specialized pre-trained models that are initially trained on medical image datasets to enhance generalization further. Collaborating with medical institutions to access more extensive and diverse datasets can help address the data scarcity issue and improve model training and validation.

8. Conclusions

The proposed study introduces a new binary hashing mechanism to enhance content-based medical image retrieval using deep learning. To achieve this, a hybrid Dense block-based feature learning network is used, along with a hash learning block that includes a spatial attention block. The feature extraction is done using ResNet-50 followed by a Dense block, and the spatial attention block extracts features at different scales, enabling the fusion of multi-scale feature information via various regions. To reduce dimensionality, we use the Reconstruction Independent Component Analysis algorithm, and four loss functions are applied to the query code block to enhance the feature's distinctiveness. The hash codes of the database images are compared with the compact database codes obtained by binarization during the retrieval phase, leading to improved retrieval performance. This framework can be adopted in CBMIR systems to retrieve similar medical images effectively. The performance is measured using mAP and top-k image retrieval benchmarks. In future studies, we plan to apply generative adversarial networks for data augmentation and develop a real-time system to demonstrate the effectiveness of our proposed method on various medical imaging databases.

CRediT authorship contribution statement

Lichao Cui: Writing – review & editing, Writing – original draft, Supervision, Software, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Conceptualization. Mingxin Liu: Writing – review & editing, Writing – original draft, Software, Resources, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Chen J, Frey EC, He Y, Segars WP, Li Y, Du Y. Transmorph: Transformer for unsupervised medical image registration. Med Image Anal 2022;82:102615.
[2] Van der Velden BH, Kuijf HJ, Gilhuijs KG, Viergever MA. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med Image Anal 2022;102470.
[3] Agrawal S, Chowdhary A, Agarwala S, Mayya V, Kamath S S. Content-based medical image retrieval system for lung diseases using deep CNNs. Int J Inf Technol 2022;1–9.
[4] Li D, Dai D, Chen J, Xia S, Wang G. Ensemble learning framework for image retrieval via deep hash ranking. Knowl-Based Syst 2023;260:110128.
[5] Öztürk Ş. Class-driven content-based medical image retrieval using hash codes of deep features. Biomed Signal Process Control 2021;68:102601.
[6] Bhakar S, Sinwar D, Pradhan N, Dhaka VS, Cherrez-Ojeda I, Parveen A, Hassan MU. Computational intelligence-based disease severity identification: A review of multidisciplinary domains. Diagnostics 2023;13(7):1212.
[7] Patrício C, Neves JC, Teixeira LF. Explainable deep learning methods in medical image classification: A survey. ACM Comput Surv 2023;56(4):1–41.
[8] Hassan MU, Zhao X, Sarwar R, Aljohani NR, Hameed IA. SODRet: Instance retrieval using salient object detection for self-service shopping. Mach Learn Appl 2024;15:100523.
[9] Tang H, Chen Y, Wang T, Zhou Y, Zhao L, Gao Q, et al. HTC-Net: A hybrid CNN-transformer framework for medical image segmentation. Biomed Signal Process Control 2024;88:105605.
[10] Yaqoob I, Hassan MU, Niu D, Zhao X, Hameed IA, Hassan S-U. A novel person re-identification network to address low-resolution problem in smart city context. ICT Express 2023;9(5):809–14.
[11] Sarwar R, Teh PS, Sabah F, Nawaz R, Hameed IA, Hassan MU, et al. AGI-P: A gender identification framework for authorship analysis using customized fine-tuning of multilingual language model. IEEE Access 2024.
[12] Wang J, Zhu H, Wang S-H, Zhang Y-D. A review of deep learning on medical image analysis. Mob Netw Appl 2021;26:351–80.


[13] Müller P, Kaissis G, Zou C, Rueckert D. Radiological reports improve pre-training for localized imaging tasks on chest X-rays. In: Medical image computing and computer assisted intervention–MICCAI 2022: 25th international conference, Singapore, September 18–22, 2022, proceedings, Part V. Springer; 2022, p. 647–57.
[14] Arabahmadi M, Farahbakhsh R, Rezazadeh J. Deep learning for smart Healthcare—A survey on brain tumor detection from medical imaging. Sensors 2022;22(5):1960.
[15] Shariaty F, Orooji M, Velichko EN, Zavjalov SV. Texture appearance model, a new model-based segmentation paradigm, application on the segmentation of lung nodule in the CT scan of the chest. Comput Biol Med 2022;140:105086.
[16] Hassan MU, Shohag MSA, Niu D, Shaukat K, Zhang M, Zhao W, et al. A framework for the revision of large-scale image retrieval benchmarks. In: Eleventh international conference on digital image processing, vol. 11179. SPIE; 2019, p. 1154–61.
[17] Rasoolijaberi M, Babaei M, Riasatian A, Hemati S, Ashrafi P, Gonzalez R, et al. Multi-magnification image search in digital pathology. IEEE J Biomed Health Inf 2022;26(9):4611–22.
[18] Choe J, Hwang HJ, Seo JB, Lee SM, Yun J, Kim M-J, et al. Content-based image retrieval by using deep learning for interstitial lung disease diagnosis with chest CT. Radiology 2022;302(1):187–97.
[19] Zhang D, Wu X-J, Xu T, Kittler J. Two-stage supervised discrete hashing for cross-modal retrieval. IEEE Trans Syst Man Cybern Syst 2022;52(11):7014–26.
[20] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 2002;24(7):971–87.
[21] Gupta R, Patil H, Mittal A. Robust order-based methods for feature description. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE; 2010, p. 334–41.
[22] Shan C, Gong S, McOwan PW. Facial expression recognition based on local binary patterns: A comprehensive study. Image Vis Comput 2009;27(6):803–16.
[23] Lu J, Liong VE, Zhou J. Deep hashing for scalable image search. IEEE Trans Image Process 2017;26(5):2352–67.
[24] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014, arXiv preprint arXiv:1409.1556.
[25] Liu P, Guo J-M, Wu C-Y, Cai D. Fusion of deep learning and compressed domain features for content-based image retrieval. IEEE Trans Image Process 2017;26(12):5706–17.
[26] Abbas SK, Khan MUG, Zhu J, Sarwar R, Aljohani NR, Hameed IA, et al. Vision based intelligent traffic light management system using faster R-CNN. CAAI Trans Intell Technol 2024.
[27] Hassan MU, Niu D, Zhang M, Zhao X. Asymmetric hashing based on generative adversarial network. Multimedia Tools Appl 2023;82(1):389–405.
[28] Indumathi V, Siva R. An efficient lung disease classification from X-ray images using hybrid mask-RCNN and BiDLSTM. Biomed Signal Process Control 2023;81:104340.
[29] Karthik K, Kamath SS. A deep neural network model for content-based medical image retrieval with multi-view classification. Vis Comput 2021;37(7):1837–50.
[30] Wang X, Lee F, Chen Q. Similarity-preserving hashing based on deep neural networks for large-scale image retrieval. J Vis Commun Image Represent 2019;61:260–71.
[31] Qayyum A, Anwar SM, Awais M, Majid M. Medical image retrieval using deep convolutional neural network. Neurocomputing 2017;266:8–20.
[32] Özbay E, Özbay FA. Interpretable features fusion with precision MRI images deep hashing for brain tumor detection. Comput Methods Programs Biomed 2023;107387.
[33] Weiss Y, Torralba A, Fergus R. Spectral hashing. Adv Neural Inf Process Syst 2008;21.
[34] Dong Q, Wang Z, Gao J, Chen S, Shu Z, Xin S. Laplacian2Mesh: Laplacian-based mesh understanding. 2022, arXiv preprint arXiv:2202.00307.
[35] Jiang M, Zhang S, Li H, Metaxas DN. Computer-aided diagnosis of mammographic masses using scalable image retrieval. IEEE Trans Biomed Eng 2014;62(2):783–92.
[36] Oliva A, Torralba A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int J Comput Vis 2001;42:145–75.
[37] Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004;60:91–110.
[38] Fang J, Xu Y, Zhang X, Hu Y, Liu J. Attention-based saliency hashing for ophthalmic image retrieval. In: 2020 IEEE international conference on bioinformatics and biomedicine. IEEE; 2020, p. 990–5.
[39] Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. Chestx-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, p. 2097–106.
[40] A large-scale CT and PET/CT dataset for lung cancer diagnosis (lung-PET-CT-dx) - the cancer imaging archive (TCIA) public access - cancer imaging archive wiki — wiki.cancerimagingarchive.net. 2023, https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70224216. [Accessed 12 February 2023].
[41] Johnson AE, Pollard TJ, Berkowitz SJ, Greenbaum NR, Lungren MP, Deng C-y, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 2019;6(1):317.
[42] Armato III SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, et al. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 2011;38(2):915–31.
[43] Raginsky M, Lazebnik S. Locality-sensitive binary codes from shift-invariant kernels. Adv Neural Inf Process Syst 2009;22.
[44] Slaney M, Casey M. Locality-sensitive hashing for finding nearest neighbors [lecture notes]. IEEE Signal Process Mag 2008;25(2):128–31.
[45] Liu H, Wang R, Shan S, Chen X. Deep supervised hashing for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, p. 2064–72.
[46] Conjeti S, Katouzian A, Kazi A, Mesbah S, Beymer D, Syeda-Mahmood TF, et al. Metric hashing forests. Med Image Anal 2016;34:13–29.
[47] Xia R, Pan Y, Lai H, Liu C, Yan S. Supervised hashing for image retrieval via image representation learning. In: Proceedings of the AAAI conference on artificial intelligence, vol. 28, no. 1. 2014.
[48] Erin Liong V, Lu J, Wang G, Moulin P, Zhou J. Deep hashing for compact binary codes learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015, p. 2475–83.
[49] Zhu H, Long M, Wang J, Cao Y. Deep hashing network for efficient similarity retrieval. In: Proceedings of the AAAI conference on artificial intelligence, vol. 30, no. 1. 2016.
[50] Zhao F, Huang Y, Wang L, Tan T. Deep semantic ranking based hashing for multi-label image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015, p. 1556–64.
[51] Zhang R, Lin L, Zhang R, Zuo W, Zhang L. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Trans Image Process 2015;24(12):4766–79.
[52] Lai H, Pan Y, Liu Y, Yan S. Simultaneous feature learning and hash coding with deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015, p. 3270–8.
[53] Conjeti S, Roy AG, Katouzian A, Navab N. Hashing with residual networks for image retrieval. In: Medical image computing and computer assisted intervention–MICCAI 2017: 20th international conference. Springer; 2017, p. 541–9.
[54] Li W-J, Wang S, Kang W-C. Feature learning based deep supervised hashing with pairwise labels. 2015, arXiv preprint arXiv:1511.03855.
[55] Zhuang B, Lin G, Shen C, Reid I. Fast training of triplet-based deep binary embedding networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, p. 5955–64.
[56] Wang X, Shi Y, Kitani KM. Deep supervised hashing with triplet labels. In: Computer vision–ACCV 2016: 13th Asian conference on computer vision. Springer; 2017, p. 70–84.
[57] Shabir MA, Hassan MU, Yu X, Li J. Tyre defect detection based on GLCM and gabor filter. In: 2019 22nd international multitopic conference. IEEE; 2019, p. 1–6.
[58] Hassan MU, Alaliyat S, Sarwar R, Nawaz R, Hameed IA. Leveraging deep learning and big data to enhance computing curriculum for industry-relevant skills: A Norwegian case study. Heliyon 2023;9(4).
[59] Sun Y, Cheng C, Zhang Y, Zhang C, Zheng L, Wang Z, et al. Circle loss: A unified perspective of pair similarity optimization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020, p. 6398–407.
[60] Zhu X, Cheng D, Zhang Z, Lin S, Dai J. An empirical study of spatial attention mechanisms in deep networks. In: Proceedings of the IEEE/CVF international conference on computer vision. 2019, p. 6688–97.
[61] Hassan MU, Alaliyat S, Hameed IA. Image generation models from scene graphs and layouts: A comparative analysis. J King Saud Univ-Comput Inf Sci 2023;35(5):101543.
[62] Li L, Xu M, Wang X, Jiang L, Liu H. Attention based glaucoma detection: a large-scale database and cnn model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019, p. 10571–80.
[63] Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, et al. Attention gated networks: Learning to leverage salient regions in medical images. Med Image Anal 2019;53:197–207.
[64] Zhong A, Li X, Wu D, Ren H, Kim K, Kim Y, et al. Deep metric learning-based image retrieval system for chest radiograph and its clinical applications in COVID-19. Med Image Anal 2021;70:101993.
[65] Ahmed A. Implementing relevance feedback for content-based medical image retrieval. IEEE Access 2020;8:79969–76.
[66] Wang X, Du Y, Yang S, Zhang J, Wang M, Zhang J, et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med Image Anal 2023;83:102645.
[67] Zhu Y, Hu X, Zhang Y, Li P. Transfer learning with stacked reconstruction independent component analysis. Knowl-Based Syst 2018;152:100–6.
[68] Liu L, Yu M, Shao L. Multiview alignment hashing for efficient image search. IEEE Trans Image Process 2015;24(3):956–66.
[69] Jiang Q-Y, Li W-J. Asymmetric deep supervised hashing. In: Proceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1. 2018.
[70] Gong Y, Lazebnik S, Gordo A, Perronnin F. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell 2012;35(12):2916–29.


[71] Zhang Z, Zou Q, Lin Y, Chen L, Wang S. Improved deep hashing with soft pairwise similarity for multi-label image retrieval. IEEE Trans Multimed 2019;22(2):540–53.
[72] Zheng X, Zhang Y, Lu X. Deep balanced discrete hashing for image retrieval. Neurocomputing 2020;403:224–36.
[73] Buchholz S, Besserve M, Schölkopf B. Function classes for identifiable nonlinear independent component analysis. Adv Neural Inf Process Syst 2022;35:16946–61.
[74] Tang S-T, Tjia V, Noga T, Febri J, Lien C-Y, Chu W-C, et al. Creating a medical imaging workflow based on FHIR, DICOMweb, and SVG. J Digit Imaging 2023;1–10.
[75] Lyu K, Tian Y, Shang Y, Zhou T, Yang Z, Liu Q, et al. Causal knowledge graph construction and evaluation for clinical decision support of diabetic nephropathy. J Biomed Inform 2023;139:104298.
[76] Joshi S. HIPAA, HIPAA, hooray?: Current challenges and initiatives in health informatics in the United States. Biomed Inform Insights 2008;1:BII–S2007.
[77] Qin H, Zhang M, Ding Y, Li A, Cai Z, Liu Z, et al. Bibench: Benchmarking and analyzing network binarization. In: International conference on machine learning. PMLR; 2023, p. 28351–88.
[78] Qin H, Ding Y, Zhang X, Wang J, Liu X, Lu J. Diverse sample generation: Pushing the limit of generative data-free quantization. IEEE Trans Pattern Anal Mach Intell 2023.
[79] Moon S, Cho S, Kim D. Feature unlearning for pre-trained gans and vaes. In: Proceedings of the AAAI conference on artificial intelligence, vol. 38, no. 19. 2024, p. 21420–8.
[80] Celard P, Iglesias EL, Sorribes-Fdez JM, Romero R, Vieira AS, Borrajo L. A survey on deep learning applied to medical images: from simple artificial neural networks to generative models. Neural Comput Appl 2023;35(3):2291–323.
[81] Paproki A, Salvado O, Fookes C. Synthetic data for deep learning in computer vision & medical imaging: A means to reduce data bias. ACM Comput Surv 2024.
[82] Lang O, Yaya-Stupp D, Traynis I, Cole-Lewis H, Bennett CR, Lyles CR, et al. Using generative AI to investigate medical imagery models and datasets. EBioMedicine 2024;102.
[83] Tajbakhsh N, Jeyaseelan L, Li Q, Chiang JN, Wu Z, Ding X. Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Med Image Anal 2020;63:101693.
[84] Wang S, Li C, Wang R, Liu Z, Wang M, Tan H, et al. Annotation-efficient deep learning for automatic medical image segmentation. Nat Commun 2021;12(1):5915.
[85] Choe J, Choi HY, Lee SM, Oh SY, Hwang HJ, Kim N, et al. Evaluation of retrieval accuracy and visual similarity in content-based image retrieval of chest CT for obstructive lung disease. Sci Rep 2024;14(1):4587.