Deep Ensemble Architectures With Heterogeneous Approach For An Efficient Content-Based Image Retrieval
Corresponding Author:
Manimegalai A.
Department of Computer Science and Engineering, East Point College of Engineering and Technology
Bengaluru, India
Email: [email protected]
1. INTRODUCTION
Every day, a significant volume of images, amounting to terabytes of data, is transmitted and
stored on the internet. The continuity of this process leads to the formation of substantial collections
of images. The task of identifying relevant images from a vast collection poses a significant challenge,
thereby generating prospects for exploring novel opportunities in multimedia research. There exist two
primary approaches for retrieving images based on language and content: text-based image retrieval (TBIR)
and content-based image retrieval (CBIR) [1]. The effectiveness of TBIR relies on the textual information,
known as metadata, that is associated with the image. Textual data can be generated using a variety of
techniques or manually inputted into a system. TBIR encounters two significant challenges within the
domain of image annotation: the manual annotation process necessitates a significant investment of time and
effort, and the interpretation of annotated data may vary among
individuals [2].
The development of CBIR was driven by the need to address the inherent limitations of TBIR.
CBIR is a methodology that utilizes visual information, such as shape, color, and texture, to retrieve and
classify images based on their inherent visual properties [3]. CBIR is a well-established field of study that continues to
be actively researched. The phenomenon can be attributed to several factors, including the substantial
increase in image datasets, the diverse range of usage scenarios, and the numerous applications associated
with CBIR. In contemporary times, a multitude of search engines are employed to facilitate the storage and
retrieval of extensive collections of images from the internet. These collections can reach sizes of terabytes
and are accessed daily. CBIR is a specialized domain that encompasses a range of applications, which can be
classified into three main categories: association search, image search, and category search [4].
The fundamental principle underlying CBIR revolves around the notion of an image's contents,
specifically denoting its distinct characteristics. The process consists of three primary phases: representation,
extraction, and feature selection. The primary objective of a content-based retrieval system is to efficiently
distinguish and segregate the unique visual attributes that serve as defining characteristics for various forms of
media, such as images, videos, and audio files [5]. The procedure of CBIR encompasses various
characteristics, such as type, form, texture, and key point descriptors. The attributes of the picture dataset play
a critical role in determining the feature selection process. Various color models are employed to extract color
attributes. The previously mentioned models offer distinct methodologies for perceiving and representing
colors, each designed to suit particular circumstances and applications. The assessment of texture within an
image is of utmost importance for evaluating its material properties and overall visual representation. The
process involves arranging components in various spatial positions relative to each other [6].
The concept of spatial texture organization refers to the arrangement of texture attributes within an
image. The information provided presents valuable insights regarding various characteristics, including
directionality, smoothness, coarseness, regularity, and uniformity. The utilization of shape attributes offers
benefits in scenarios where objects possess distinct and identifiable structures, such as traffic signs, company
names, and logos. Accurate extraction and representation of shape information play a crucial role in the
successful implementation of CBIR applications. The effective management of images with clearly defined
forms is of utmost importance. In the field of CBIR research, there has been a shift in focus among
researchers towards the integration of multiple low-level features as a means to enhance system performance.
In the domain of CBIR systems, it has been observed that the integration of multiple characteristics has
demonstrated better efficacy compared to the sole reliance on individual features [7].
The identification of identical or similar photographs in response to a specific query image has
become more challenging due to the significant growth in the number of images accessible on the internet.
The utilization of manual feature extraction techniques significantly increases the complexity of this task.
Deep learning algorithms are widely acknowledged as a practical and effective method for addressing this
specific problem. In recent years, there has been an observed shift towards the adoption of learning-based
techniques, specifically deep learning methods, instead of manual feature extraction and representation
methods. These methods facilitate the automated extraction of abstract features from the data [8].
Several design options were presented to effectively accommodate the specific data type being
processed. Convolutional neural networks (CNNs) are frequently employed for image data processing, while
artificial neural networks (ANNs) have demonstrated their efficacy in handling one-dimensional data [9]. The
application of recurrent neural networks (RNNs) [10] offers numerous benefits in the examination of
time-series data. The incorporation of various advanced methodologies has facilitated the development of
deep learning algorithms utilized in image retrieval. The learning paradigms discussed in this context are
specifically related to network-based learning. This approach employs a diverse range of architectures, such
as neural networks, convolutional networks, artificial networks, attention networks, Siamese networks, and
triplet networks. Furthermore, the topic at hand encompasses various learning approaches, including
supervised learning, unsupervised learning, semi-supervised learning, and self-supervised learning [11].
The performance of CBIR systems is undeniably influenced by the quality of images stored in the
database. Performance degradation in CBIR systems can occur as a result of various factors, such as the
presence of noise, low visibility, and insufficient texture within images. Several factors can inhibit the
retrieval of relevant images that correspond to the user's query. The challenges arise from the distortion or
loss of crucial visual data, which obstructs the accurate evaluation and comparison of images using CBIR
techniques [12]. CBIR systems frequently encounter difficulties in achieving precise query-image matching,
leading to suboptimal performance. In addition to assessing the image quality, the storage of CBIR data
involves additional complexities. The implementation of effective techniques for managing image data is
essential to optimize system resources and achieve rapid retrieval times. When dealing with a large number
of images, it is essential to give careful thought to storage architectures, indexing techniques, and retrieval
algorithms. The choice of a data storage strategy has a direct impact on scalability, resource utilization, and
retrieval speed. The effectiveness and precision of image retrieval rely significantly on the preservation,
categorization, and organization of these images. Achieving an optimal balance between retrieval
performance and storage efficiency is crucial for effectively managing various picture sizes, types, and
feature representations [13].
To effectively address these challenges, it is vital to implement a comprehensive approach. The
recommended strategy should integrate advancements in feature extraction, image enhancement, and storage
technologies. At present, there is active research and practical implementation underway to enhance the
performance and reliability of these systems. The main goal of their research is to develop innovative
methodologies for improving image quality, mitigating visibility issues, and optimizing data storage in CBIR
systems [14].
The exponential growth in the volume of images uploaded to the internet daily underscores the
importance of efficient and accurate image retrieval systems. The field of CBIR is particularly crucial in this
context, as it leverages the visual content of images, such as color, shape, and texture, to facilitate the retrieval
process. CBIR's significance is amplified by its ability to directly analyze the visual information, avoiding the
need for manual annotation. This approach is not only more aligned with how humans perceive images but
also crucial for handling the sheer scale of data. With the advent of deep learning techniques, particularly
CNN, CBIR systems have seen substantial improvements in identifying and classifying complex patterns
within images. The motivation for continued research in CBIR is clear: to keep pace with the relentless
growth of image databases and to meet the demands of diverse applications that rely on quick, accurate
image retrieval, be it for digital libraries, medical diagnostics, or multimedia systems. The pursuit of more
refined CBIR systems is not just an academic interest but a necessity for the infrastructure of an increasingly
digital world. The contributions of this work are as follows.
− Advanced asymmetric retrieval model: the HybridEnsembleNet technique implements a
lightweight query structure that optimizes performance under resource constraints, enhancing retrieval
accuracy in CBIR systems.
− Deep learning integration: Utilizes deep neural networks for refined feature extraction, significantly
improving pattern recognition and accuracy in image retrieval.
− Efficient feature embedding strategy: Employs aligned embedding spaces for query and image sets,
streamlining the image comparison process for faster and more precise CBIR results.
The research organization is carried out in this paper in four sections: the first section depicts a brief
overview of CBIR. The second section discusses the related work, and in the third section, the proposed
methodology is designed. In the fourth section, the performance evaluation is carried out where the results
are displayed in graphs and tables.
2. RELATED WORK
The abundance of image formats available on the internet makes it difficult to identify a particular
visual item from a vast database. The retrieval of similar images based on different contents of query images
is a technique that is utilized in various domains. These domains include digitally acquainted libraries, crime
prevention, fingerprint identification, information systems of biodiversity, medicine, and historical place
research. CBIR is a distinct approach to image retrieval that diverges from keyword-based methods by
prioritizing the analysis of visual attributes within images instead of relying solely on predetermined
keywords. CBIR is a technique that leverages visual elements, including color, form, and texture, to address
the challenge of identifying visual entities [15].
Computer vision encompasses a wide range of applications, among them is CBIR. CBIR is a
process that focuses on the retrieval of images from a database that contains a vast number of images. The
objective of this study is to examine the practical application of a two-stage process for the retrieval of
images based on their content. CNNs are utilized for image detection during the initial phase. The CNN
demonstrates the ability to process an image efficiently within a single pass. The system can identify and
classify multiple objects present in the image, where each object is assigned to a specific class. The problem
of detecting multiple classes is resolved by employing a CNN. During the second phase of the process, the
acquisition of relevant images takes place after the execution of object detection. The achievement of
assigning priority to images within the same class is made possible through the utilization of a relevance
ranking system [16].
The cloud classification system categorizes clouds into three distinct levels: high, middle, and low.
The cloud categorization process employs the use of CBIR and k-means clustering algorithms. The
developed approach classifies clouds into three distinct categories: low, medium, and high. The precipitation
amount is significantly influenced by the type of cloud [17].
Deep ensemble architectures with heterogeneous approach for an efficient … (Manimegalai A.)
4846 ISSN: 2252-8938
The effect of high resolutions on search
precision and result organization is not well-established and can exhibit variability. The primary aim of this
study is to examine the influence of picture resolution on both search accuracy and result sorting. It is
strongly recommended to resize images before adding them to the image database, especially if resizing has
any effect on the images.
The process involves the identification of visual characteristics, the correlation of these features
based on their effects, and the assessment of the impact of these factors on retrieval. Low-level visual
features are identified to target specific perceptual components of visual data, in addition to encompassing
high-level characteristics that facilitate image retrieval approaches. The primary goal of this study is to
analyze the various components involved in improving the efficiency of CBIR search results [18].
A novel CBIR model was proposed to efficiently retrieve images by utilizing query pictures. The
proposed model employs an Adadelta-optimized residual network to improve the retrieval process. The
proposed model employs a feature extractor obtained from ResNet 50 to extract a suitable set of features.
Additionally, the Adadelta optimizer is utilized to effectively optimize the hyperparameters of the ResNet-50
model, resulting in enhanced retrieval performance.
The theoretical foundations and practical implementations of a CBIR system demonstrate high
effectiveness. The authors provide an in-depth analysis of the concepts and discuss the real-world
applications of this system. The essential components of the system include its characteristics related to
colors, textures, and forms. The multilayer searching capability is achieved through the implementation of
three subsequent searching processes. The proposed systems (PS) differ significantly from previous methods
as they integrate all features concurrently for the single-level search of a typical CBIR system. The PS utilize
a sequential approach, wherein each feature is evaluated independently. The output of one step is then used
as the input for the next step, following a hierarchical pattern [19].
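This sequential, hierarchical filtering can be illustrated with a toy sketch (not the implementation of [19]; the staged features, Euclidean distances, and cut-off sizes are all assumptions), where each stage prunes the candidate set before passing it to the next:

```python
import numpy as np

def sequential_search(query_feats, db_feats, keep=(100, 20, 5)):
    """Multilayer search: filter the database by one feature at a time
    (e.g. color, then texture, then shape), feeding each stage's survivors
    into the next, instead of fusing all features in a single-level search."""
    candidates = np.arange(len(db_feats[0]))
    for stage, k in enumerate(keep):
        q = query_feats[stage]
        # distances from the query to the surviving candidates, this stage only
        d = np.linalg.norm(db_feats[stage][candidates] - q, axis=1)
        candidates = candidates[np.argsort(d)[:k]]  # keep the k nearest
    return candidates
```

Each stage only ever scores the survivors of the previous stage, which is the hierarchical pattern the paragraph describes.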
CBIR is a technique that distinguishes itself from keyword-based image retrieval by prioritizing the
analysis of visual contents and attributes of images, including color, form, and texture. In contrast to
keyword-based retrieval, which depends on explicit image descriptions, CBIR utilizes visual features to tackle
the mentioned problem. The objective of this paper is to present a novel approach to picture retrieval by
utilizing a hybrid feature combination technique. The technique employs the color histogram method for
extracting color features and subsequently producing the color gradient. The Gabor wavelet method is utilized
to extract both outer and inner edges. By implementing the aforementioned techniques, a feature vector will be
generated as a result. This feature vector can then be utilized to retrieve visually similar images [20].
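A rough sketch of such a hybrid color-plus-Gabor feature vector might look as follows (an illustrative toy version, not the method of [20]; the Gabor kernel is hand-rolled, `hybrid_feature` is a hypothetical helper name, and libraries such as OpenCV provide equivalent kernels):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5):
    """Real part of a Gabor filter at orientation theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lambd)

def hybrid_feature(rgb, bins=8, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Concatenate per-channel color histograms with Gabor energy statistics."""
    hist = np.concatenate([
        np.histogram(rgb[..., c], bins=bins, range=(0, 256), density=True)[0]
        for c in range(3)])
    gray = rgb.mean(axis=2)
    tex = []
    for t in thetas:
        k = gabor_kernel(theta=t)
        # valid-mode 2D correlation with the Gabor kernel
        resp = (sliding_window_view(gray, k.shape) * k).sum(axis=(-2, -1))
        tex += [resp.mean(), resp.std()]
    return np.concatenate([hist, np.array(tex)])
```

The resulting vector (here 3×8 histogram bins plus 2 statistics per orientation) is what would be compared between the query and database images.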
3. PROPOSED METHODOLOGY
Considering the heavy-architecture disadvantage of traditional deep learning, this research work
develops HybridEnsembleNet, which combines an ensemble of deep learning models with
heterogeneous retrieval approaches for higher effectiveness. Moreover, the ensemble
network comprises various deep learning models (deep local and global features [21], [22]).
The retrieval process involves computing the similarities between the two embeddings and comparing them to the centroids to
determine the structural similarities. Finally, this approach aims to enhance the query set by restricting the
level of consistency between the two structural similarities. After training, the embedding spaces of
the query set and the image set exhibit a high degree of alignment due to their shared data points.
Figure 1 shows the proposed architecture.
I^k = ϑ_i(·) ∈ T^f, k = 1, 2, …, P (1)

w_1(I^k) = [I^k_1, …, I^k_{f*}], …, w_O(I^k) = [I^k_{f−f*+1}, …, I^k_f] (2)

Here I^k_l represents the l-th feature dimension of I^k, f* = f/O, and f is a multiple of O. Clustering is
performed on each set [w_l(I^1); w_l(I^2); …; w_l(I^P)] ∈ T^{P×f*}, l = 1, 2, …, O, to
obtain E^l ∈ T^{M×f*}, where M is the number of centroids. The data points in the gallery space are then
defined as the Cartesian product of the O sub-codebooks, so that any centroid vector is formed by integrating O
different sub-centroid vectors. In comparison with direct clustering, this quantization has distinct advantages: it
is easy to generate a large number of data points E while storing only O × M sub-centroids, and during
training a splitting mechanism evaluates the similarity by segments,
instead of directly computing the similarities between feature vectors and the data points, which reduces the
training overhead.

E = E^1 × E^2 × … × E^O ∈ T^{M^O × f} (3)
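The per-sub-space quantization step can be sketched as follows (a minimal NumPy version with a toy Lloyd's k-means; the function name and parameters are illustrative, not the paper's code):

```python
import numpy as np

def product_quantize(gallery, O, M, iters=10, seed=0):
    """Split f-dim gallery vectors into O sub-vectors of length f* = f // O and
    run a small k-means (M centroids) independently in each sub-space.
    Returns the O sub-codebooks E^l, each of shape (M, f*); their Cartesian
    product implicitly defines M**O data points in the gallery space."""
    rng = np.random.default_rng(seed)
    P, f = gallery.shape
    fs = f // O
    codebooks = []
    for l in range(O):
        X = gallery[:, l * fs:(l + 1) * fs]          # l-th segment of all vectors
        C = X[rng.choice(P, M, replace=False)]       # initialize from the data
        for _ in range(iters):                       # Lloyd iterations
            d = ((X[:, None, :] - C[None, :, :])**2).sum(-1)
            assign = d.argmin(1)
            for m in range(M):
                if (assign == m).any():
                    C[m] = X[assign == m].mean(0)
        codebooks.append(C)
    return codebooks
```

Only O × M sub-centroids are stored, yet the induced codebook has M^O entries, which is the storage advantage the text describes.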
The feature vectors are mapped by the feature-space compression function before training, as given in (5). Table 1 shows the data-point
generation.
Here E^k_l represents the k-th centroid vector within the l-th sub-space and u(·, ·) is considered as
the similarity metric, formulated as given in (6). The constraint γ_e is evaluated for the structural
similarities of s and i in the embedding space of the image set. Since the data points are shared between the query
and image sets, their embedding spaces are aligned properly.
μ_i is the temperature value used to control the sharpness of the assignment. The probability distribution corresponding
to the k-th vector of the query feature s is assigned as given in (8). The similarity constraint between the two
probability distributions over the same sub-centroid vectors is denoted as mentioned in (9). This consists of
the cross-entropy of r^i_k and r^s_k and the entropy of r^i_k; the latter is independent of the query feature set
and does not affect the training. The final objective is defined as the sum of the consistency losses
over the O sub-vectors, as mentioned in (10). Table 2 shows the query model training.
r^s_k = [ exp(U^s_{k,1}/μ_s) / Σ_{n=1}^{M} exp(U^s_{k,n}/μ_s), …, exp(U^s_{k,M}/μ_s) / Σ_{n=1}^{M} exp(U^s_{k,n}/μ_s) ] (8)

γ^{KL}_k = KL(r^i_k || r^s_k) = Σ_{n=1}^{M} r^i_{k,n} log(r^i_{k,n} / r^s_{k,n}) (9)

γ_loss = Σ_{k=1}^{O} γ^{KL}_k (10)
The centroids of the quantizing function serve as the data points in the embedded space of the image
set. Quantizing the feature converts the feature regression into an assignment task. When the
temperature is set to μ_i → 0, the probability U^i_k in (7) becomes a one-hot vector with a single 1 at index
l = arg max_l U^i_{k,l}. This is further simplified as given in (11).

γ^{KL}_k = Σ_{n=1}^{M} U^i_{k,n} log(U^i_{k,n} / U^s_{k,n}) = log(1 / U^s_{k,l}) (11)
Optimizing this loss encourages the query feature to regress toward the degenerate data point, which is the quantized
feature i. This prevents the query set from degrading the detailed features of the image set. However,
neglecting the discriminative information conveyed by the associations between the feature vector and data
points leads to poorer performance. To achieve the desired outcome, soft assignments are utilized as the prediction
target, specifically by setting μ_i to a value greater than zero. The overall learning process is summarized in
Table 2.
4. PERFORMANCE EVALUATION
The evaluation of HybridEnsembleNet emphasizes its efficacy in enhancing image retrieval
scenarios through its unique combination of ensemble architecture and heterogeneous modules. The
assessment involves measuring the model's performance against relevant benchmarks, highlighting
improvements in retrieval accuracy and efficiency. Additionally, a thorough analysis of
HybridEnsembleNet's capabilities, such as its ability to capture both local and global features, provides
insights into its effectiveness in addressing the limitations of traditional deep learning architectures for image
retrieval.
4.2. Results
Mean average precision (mAP) is the mean of the average precision (AP) scores for each query. In
scenarios where there are multiple queries or test samples (like different objects to be detected in object
detection or multiple search queries in a retrieval system), AP is calculated for each query separately, and
mAP is the mean of these AP scores. mAP is a highly important metric because it considers both the
precision (how many retrieved items are relevant) and the recall (how many relevant items are retrieved)
across all queries. It gives a single-figure measure of quality across recall levels, making it particularly useful
for evaluating systems where the retrieval of all relevant documents is important.
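As an illustration, AP and mAP over ranked retrieval lists can be computed as below (a minimal sketch with illustrative function names):

```python
def average_precision(relevant, retrieved):
    """AP for one query: `retrieved` is a ranked list of item ids,
    `relevant` is the set of ground-truth relevant ids."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at each relevant hit
    return precision_sum / max(len(relevant), 1)

def mean_average_precision(queries):
    """mAP: mean of AP over (relevant_set, ranked_list) pairs."""
    return sum(average_precision(r, ret) for r, ret in queries) / len(queries)
```

For example, with relevant items {1, 2} and the ranking [1, 3, 2], the precisions at the two hits are 1 and 2/3, so AP = (1 + 2/3)/2 = 5/6.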
The compared methods include deep local and global features (DELG global), DELG global with α-weighted query expansion (DELG
global+αQE), DELG global with geometric verification (DELG global+GV), DELG global with reranking transformer
(DELG global+RRT), and DELG global with graph-convolution-based re-ranking
(DELG global+GCR). Performance scores range from 65.3 for DSM to
88.96 for PS, suggesting a progression in the effectiveness of the techniques used. DELG
global and its extensions show consistent performance in the 70s, indicating a solid baseline. Notably, the
existing system with graph-convolution-based re-ranking (ES+GCR) scores 84.3, an
improvement on ES at 79.3; the increment suggests that GCR provides a substantial gain. The
highest score of 88.96, achieved by PS, stands out, indicating a particularly effective method that significantly
outperforms the others on this benchmark. Figure 2 shows the comparison on ROxford (Medium).
Table 4 displays a set of methodological approaches evaluated on the ROxford (Hard) benchmark.
The performance scores indicate how well each method copes with more challenging conditions. The scores
span from a low of 31.2 for GRAP-CD to a high of 76.98 for PS, suggesting a wide range of effectiveness
among the methods. Notably, traditional methods like DSM and IRT are on the lower end of the performance
spectrum, while TBR scores a relatively high 66.6, indicating its robustness. The DELG global method and
its variations show moderate performance, with scores generally in the 50s and low 60s, but with the GCR
enhancement, it reaches 63.1. ES stands at 62.8, and its enhanced version with GCR significantly
outperforms the basic version at 69.7, demonstrating the value of the GCR enhancement. The PS method
outshines the others with a notable score of 76.98, which could signify a breakthrough or a particularly
advanced approach to the ROxford (Hard) benchmark conditions. Figure 3 shows the comparison on
ROxford (Hard).
Figure 2. Comparison of mAP across methodologies on ROxford (Medium)
Figure 3. Comparison of mAP across methods on ROxford (Hard)
4.2.2. RParis
Table 5 showcases various methods and their corresponding performance scores on the RParis
(Medium) benchmark. The spectrum of scores is broad, with DSM at the lower end at 77.4, reflecting a base
level of effectiveness, and the highest score achieved by PS at 94.67, indicating superior performance.
Middle-tier scores are occupied by methods such as IRT, HOW+ASMK, and GRAP-CD, which fall between
80.1 and 81.6, suggesting moderate effectiveness. Notable high performers include DOLG and DALG, which
are close competitors with scores of 91 and 90, respectively. The DELG global method, along with its
variations, demonstrates strong performance, particularly with the GCR enhancement, which achieves a score
of 89.2. The ES method scores 84.4, but with the addition of GCR, it significantly outperforms its
unenhanced counterpart with a score of 91.9. This data underscores the impact of enhancements like GCR on
method performance and highlights the PS method as potentially embodying a more advanced or efficient
approach. Figure 4 shows the comparison analysis on RParis (Medium).
Table 6 and Figure 5 offer a performance evaluation of various computer vision methods on the
ROxford (Hard) dataset. Scores range significantly, highlighting the varied effectiveness of these methods
under challenging conditions. The lowest score, at 31.2 by GRAP-CD, suggests some methods may struggle
with the dataset's complexity. In contrast, the highest score, at 76.98 by PS, indicates a notably robust
approach. DELG global and its enhancements display a progressive improvement, especially with the GCR
addition, which scores 63.1. Another key observation is the performance jump from ES at 62.8 to ES+GCR
at 69.7, confirming the substantial benefit of the GCR enhancement. The scores of TBR and MDA, at 66.6
and 62.2, respectively, denote methods that are more effective than the baseline but not as high as the leading
scores. Overall, this table reflects the performance of the diverse methods in image retrieval tasks, with
particular enhancements offering significant improvements.
Figure 4. Comparison of mAP across methods on RParis (Medium)
Figure 5. Comparison of mAP across methods on RParis (Hard)
5. CONCLUSION
This research work presents HybridEnsembleNet methodology as a significant leap forward in the
field of CBIR. By seamlessly integrating deep learning with an asymmetric retrieval model,
HybridEnsembleNet addresses the critical challenges of accuracy and computational efficiency in handling
large-scale image datasets. Its innovative approach to feature extraction and embedding not only enhances
the precision of image retrieval but also ensures scalability and speed, crucial for modern digital applications.
The successful implementation of HybridEnsembleNet underscores its potential as a transformative solution
in CBIR, promising to elevate the standards for image search and retrieval technologies. This methodology
not only meets the current demands of diverse CBIR applications but also lays a robust foundation for future
advancements in the field, marking a pivotal moment in the evolution of image retrieval systems. As part of
future work, an important focus will be on further optimizing HybridEnsembleNet to reduce image retrieval
time.
REFERENCES
[1] D. Srivastava, B. Rajitha, and S. Agarwal, “Content-based image retrieval for categorized dataset by aggregating gradient and
texture features,” Neural Computing and Applications, vol. 33, no. 19, pp. 12247–12261, 2021, doi: 10.1007/s00521-020-05614-y.
[2] D. Srivastava, B. Rajitha, S. Agarwal, and S. Singh, “Pattern-based image retrieval using GLCM,” Neural Computing and
Applications, vol. 32, no. 15, pp. 10819–10832, 2018, doi: 10.1007/s00521-018-3611-1.
[3] S. Chauhan, R. Prasad, P. Saurabh, and P. Mewada, “Dominant and LBP-based content image retrieval using combination of
color, shape and texture features,” in Advances in Intelligent Systems and Computing, Springer Singapore, 2018, pp. 235–243,
doi: 10.1007/978-981-10-7871-2_23.
[4] H. Kavitha and M. V Sudhamani, “Content-based image retrieval using edge and gradient orientation features of an object in an
image from database,” Journal of Intelligent Systems, vol. 25, no. 3, pp. 441–454, 2016, doi: 10.1515/jisys-2014-0088.
[5] M. Dey, B. Raman, and M. Verma, “A novel colour- and texture-based image retrieval technique using multi-resolution local
extrema peak valley pattern and RGB colour histogram,” Pattern Analysis and Applications, vol. 19, no. 4, pp. 1159–1179, Nov.
2016, doi: 10.1007/s10044-015-0522-y.
[6] M. Verma and B. Raman, “Local neighborhood difference pattern: A new feature descriptor for natural and texture image
retrieval,” Multimedia Tools and Applications, vol. 77, no. 10, pp. 11843–11866, 2017, doi: 10.1007/s11042-017-4834-3.
[7] M. Marinov, “Comparative analysis on different degrees of JPEG compression used in CBIR systems,” 2020 XI National
Conference with International Participation (ELECTRONICA), Sofia, Bulgaria, 2020, pp. 1-4, doi:
10.1109/ELECTRONICA50406.2020.9305154.
[8] T. L. D. Likhitha, M. Noushika, V. S. Deepika, and V. M. Manikandan, “A detailed review on CBIR and its importance in current
era,” in 2021 International Conference on Data Science and Its Applications (ICoDSA), IEEE, Oct. 2021, pp. 124–128, doi:
10.1109/ICoDSA53588.2021.9617481.
[9] Y. Xu, Q. Lin, J. Huang, and Y. Fang, “An improved ensemble-learning-based CBIR algorithm,” 2020 Cross Strait Radio Science
& Wireless Technology Conference (CSRSWTC), Fuzhou, China, 2020, pp. 1-3, doi: 10.1109/CSRSWTC50769.2020.9372466.
[10] T. Sutojo, P. S. Tirajani, D. R. I. M. Setiadi, C. A. Sari, and E. H. Rachmawanto, “CBIR for classification of cow types using
GLCM and color features extraction,” 2017 2nd International conferences on Information Technology, Information Systems and
Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 2017, pp. 182-187, doi: 10.1109/ICITISEE.2017.8285491.
[11] M. A. Aboali, I. Elmaddah, and H. E. Abdelmunim, “Neural textual features composition for CBIR,” IEEE Access, vol. 11, pp.
28506–28521, 2023, doi: 10.1109/ACCESS.2023.3259737.
[12] G. V. R. M. Kumar and D. Madhavi, “Stacked siamese neural network (SSiNN) on neural codes for content-based image
retrieval,” IEEE Access, vol. 11, pp. 77452–77463, 2023, doi: 10.1109/ACCESS.2023.3298216.
[13] Z. Zhang, W. Lu, X. Feng, J. Cao, and G. Xie, “A discriminative feature learning approach with distinguishable distance metrics
for remote sensing image classification and retrieval,” IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing, vol. 16, pp. 889–901, 2023, doi: 10.1109/jstars.2022.3233032.
[14] J. Pradhan, C. Bhaya, A. K. Pal, and A. Dhuriya, “Content-based image retrieval using DNA transcription and translation,” IEEE
Transactions on NanoBioscience, vol. 22, no. 1, pp. 128–142, 2023, doi: 10.1109/tnb.2022.3169701.
[15] D. Srivastava, S. S. Singh, B. Rajitha, M. Verma, M. Kaur, and H.-N. Lee, “Content-based image retrieval: a survey on local and
global features selection, extraction, representation, and evaluation parameters,” IEEE Access, vol. 11, pp. 95410–95431, 2023,
doi: 10.1109/access.2023.3308911.
[16] G. Sumbul, J. Xiang, and B. Demir, “Towards Simultaneous image compression and indexing for scalable content-based retrieval in
remote sensing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–12, 2022, doi: 10.1109/tgrs.2022.3204914.
[17] Z. Xia, L. Jiang, D. Liu, L. Lu, and B. Jeon, “BOEW: A content-based image retrieval scheme using bag-of-encrypted-words in
cloud computing,” IEEE Transactions on Services Computing, vol. 15, no. 1, pp. 202–214, 2022, doi: 10.1109/tsc.2019.2927215.
[18] J. Madake, R. Agrawal, V. Pawar, and S. Bhatlawande, “A content based image retrieval system for biodiversity system,” 2023 4th
IEEE Global Conference for Advancement in Technology (GCAT), 2023, pp. 1-6, doi: 10.1109/GCAT59970.2023.10353428.
[19] Y. Mahajan, P. Batta, M. Sharma, and D. Saxena, “A model for content-based image retrieval using machine learning,” 2023 1st
International Conference on Circuits, Power and Intelligent Systems (CCPIS), 2023, pp. 1-6, doi:
10.1109/CCPIS59145.2023.10291361.
[20] J. S. S. Kumar and S. M. C. Vigila, “A review on content based image retrieval techniques,” in Proceedings of the International
Conference on Circuit Power and Computing Technologies, ICCPCT 2023, Aug. 2023, pp. 1251–1256, doi:
10.1109/ICCPCT58313.2023.10245360.
[21] B. Cao, A. Araujo, and J. Sim, “Unifying deep local and global features for image search,” in Computer Vision–ECCV 2020: 16th
European Conference, Cham: Springer International Publishing, 2020, pp. 726–743, doi: 10.1007/978-3-030-58565-5_43.
[22] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient CNN architecture design,”
Computer Vision – ECCV 2018, Springer International Publishing, pp. 122–138, 2018, doi: 10.1007/978-3-030-01264-9_8.
[23] F. Radenovic, A. Iscen, G. Tolias, Y. Avrithis, and O. Chum, “Revisiting Oxford and Paris: large-scale image retrieval
benchmarking,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5706–5715, 2018, doi:
10.1109/CVPR.2018.00598.
[24] O. Simeoni, Y. Avrithis, and O. Chum, “Local features and visual words emerge in activations,” 2019 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR). IEEE, pp. 11651–11660, 2019, doi: 10.1109/CVPR.2019.01192.
[25] A. El-Nouby, N. Neverova, I. Laptev, and H. Jégou, “Training vision transformers for image retrieval,” arXiv-Computer
Science, pp. 1–10, 2021, doi: 10.48550/arXiv.2102.05644.
[26] G. Tolias, T. Jenicek, and O. Chum, “Learning and aggregating deep local descriptors for instance-level recognition,” Computer
Vision – ECCV 2020, Springer International Publishing, pp. 460–477, 2020, doi: 10.1007/978-3-030-58452-8_27.
[27] M. Yang et al., “DOLG: single-stage image retrieval with deep orthogonal fusion of local and global features,” 2021 IEEE/CVF
International Conference on Computer Vision (ICCV). IEEE, pp. 11772–11781, 2021, doi: 10.1109/ICCV48922.2021.01156.
[28] Y. Song, R. Zhu, M. Yang, and D. He, “DALG: deep attentive local and global modeling for image retrieval,” arXiv-Computer
Science, pp. 1–14, 2022, doi: 10.48550/arXiv.2207.00287.
[29] X. Zhu, H. Wang, P. Liu, Z. Yang, and J. Qian, “Graph-based reasoning attention pooling with curriculum design for content-
based image retrieval,” Image and Vision Computing, vol. 115, 2021, doi: 10.1016/j.imavis.2021.104289.
[30] H. Wu, M. Wang, W. Zhou, and H. Li, “Learning deep local features with multiple dynamic attentions for large-scale image
retrieval,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11416–11425, 2021, doi:
10.1109/ICCV48922.2021.01122.
[31] H. Wu, M. Wang, W. Zhou, H. Li, and Q. Tian, “Contextual similarity distillation for asymmetric image retrieval,” 2022
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9489–9498, 2022, doi:
10.1109/CVPR52688.2022.00927.
[32] H. Wu, M. Wang, W. Zhou, Y. Hu, and H. Li, “Learning token-based representation for image retrieval,” Proceedings of the
AAAI Conference on Artificial Intelligence, vol. 36, no. 3, pp. 2703–2711, 2022, doi: 10.1609/aaai.v36i3.20173.
[33] F. Radenovic, G. Tolias, and O. Chum, “Fine-tuning CNN image retrieval with no human annotation,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1655–1668, 2019, doi: 10.1109/TPAMI.2018.2846566.
[34] F. Tan, J. Yuan, and V. Ordonez, “Instance-level image retrieval using reranking transformers,” 2021 IEEE/CVF International
Conference on Computer Vision (ICCV). IEEE, pp. 12105–12115, 2021, doi: 10.1109/ICCV48922.2021.01189.
[35] Y. Zhang, Q. Qian, H. Wang, C. Liu, W. Chen, and F. Wang, “Graph convolution based efficient re-ranking for visual retrieval,”
IEEE Transactions on Multimedia, vol. 26, pp. 1089–1101, 2024, doi: 10.1109/TMM.2023.3276167.
BIOGRAPHIES OF AUTHORS
Dr. Josephine Prem Kumar received her B.Tech. degree in Electronics and
Communication Engineering and M.Tech. degree in Computer Science from Regional
Engineering College (now National Institute of Technology), Warangal and Ph.D. in Computer
Science and Engineering from Dr. MGR Educational and Research Institute, Dr. MGR
University, Chennai. After serving ITI Limited, Bangalore for over fifteen years and Infycons
Creative Software Private Ltd., Bangalore for a brief period, she took up the teaching
profession. She has worked as a Professor at MVJ College of Engineering, Bangalore and the East
Point Group of Institutions, and is currently working as a Professor of CSE at Cambridge Institute
of Technology, Bangalore. She has been guiding Ph.D. students under Visvesvaraya
Technological University. She can be contacted at email: [email protected].
Dr. Nanda Ashwin is working as HOD, Department of CSE (IoT & CSBT), East Point
College of Engineering, and has about 25 years of teaching experience. She received her B.E.
degree in Computer Science and Engineering, M.S. in System Software, and M.Tech. degree in
Software Engineering with distinction from VTU, Belagavi. She has published 16 research
papers in refereed international journals and 10 research papers in the proceedings of various
international conferences. She has received several best paper awards for her research papers at
various international conferences. Her areas of research include wireless communication, cloud
computing, big data analytics, and data science. She can be contacted at email:
[email protected].