
Hindawi

Mathematical Problems in Engineering


Volume 2019, Article ID 9658350, 21 pages
https://doi.org/10.1155/2019/9658350

Review Article
Content-Based Image Retrieval and Feature Extraction: A
Comprehensive Review

Afshan Latif,1 Aqsa Rasheed,1 Umer Sajid,1 Jameel Ahmed,2 Nouman Ali,1 Naeem Iqbal Ratyal,3,4 Bushra Zafar,5,6 Saadat Hanif Dar,1 Muhammad Sajid,3 and Tehmina Khalil1

1 Department of Software Engineering, Mirpur University of Science and Technology (MUST), Mirpur-10250 (AJK), Pakistan
2 Department of Electrical Engineering, RIPHAH International University, Islamabad 75300, Pakistan
3 Department of Electrical Engineering, Mirpur University of Science and Technology (MUST), Mirpur-10250 (AJK), Pakistan
4 Department of Computer Systems Engineering, Mirpur University of Science and Technology (MUST), Mirpur-10250 (AJK), Pakistan
5 Department of Computer Science, Government College University, Faisalabad 38000, Pakistan
6 Department of Computer Science, National Textile University, Faisalabad 38000, Pakistan

Correspondence should be addressed to Nouman Ali; [email protected]

Received 10 April 2019; Revised 20 July 2019; Accepted 24 July 2019; Published 26 August 2019

Academic Editor: Marek Lefik

Copyright © 2019 Afshan Latif et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Multimedia content analysis is applied in different real-world computer vision applications, and digital images constitute a major part of multimedia data. In the last few years, the complexity of multimedia content, especially images, has grown exponentially, and on a daily basis, millions of images are uploaded to different archives such as Twitter, Facebook, and Instagram. Searching for a relevant image in an archive is a challenging research problem for the computer vision research community. Most search engines retrieve images on the basis of traditional text-based approaches that rely on captions and metadata. In the last two decades, extensive research has been reported on content-based image retrieval (CBIR), image classification, and analysis. In CBIR and image classification-based models, high-level image visuals are represented in the form of feature vectors that consist of numerical values. The research shows that there is a significant gap between image feature representation and human visual understanding. For this reason, research in this area focuses on reducing the semantic gap between image feature representation and human visual understanding. In this paper, we aim to present a comprehensive review of recent developments in the area of CBIR and image representation. We analyze the main aspects of various image retrieval and image representation models, from low-level feature extraction to recent semantic deep-learning approaches. The important concepts and major research studies based on CBIR and image representation are discussed in detail, and future research directions are outlined to inspire further research in this area.

1. Introduction

Due to recent developments in technology, the use of digital cameras, smartphones, and the Internet has increased. The volume of shared and stored multimedia data is growing, and searching for or retrieving a relevant image from an archive is a challenging research problem [1-3]. The fundamental requirement of any image retrieval model is to search for and rank the images that have a visual semantic relationship with the query given by the user. Most of the search engines on the Internet retrieve images on the basis of text-based approaches that require captions as input [4-6]. The user submits a query by entering text or keywords, which are matched against the keywords stored in the archive. The output is generated on the basis of keyword matching, and this process can retrieve images that are not relevant. The difference between human visual perception and manual labeling/annotation is the main reason for irrelevant output [7-10]. It is nearly impossible to apply manual labeling to

existing large image archives that contain millions of images. The second approach for image retrieval and analysis is to apply an automatic image annotation system that can label an image on the basis of its contents. Approaches based on automatic image annotation depend on how accurately a system can detect color, edges, texture, spatial layout, and shape-related information [11-13]. Significant research is being performed in this area to enhance the performance of automatic image annotation, but differences in visual perception can mislead the retrieval process. Content-based image retrieval (CBIR) is a framework that can overcome the abovementioned problems, as it is based on visual analysis of the contents of the query image. Providing a query image as input is the main requirement of CBIR: the visual contents of the query image are matched with the images placed in the archive, and closeness in visual similarity, in terms of the image feature vectors, provides a basis for finding images with similar contents. In CBIR, low-level visual features (e.g., color, shape, texture, and spatial layout) are computed from the query, and these features are matched to sort the output [1]. According to the literature, Query-By-Image-Content (QBIC) and SIMPLIcity are examples of image retrieval models that are based on the extraction of low-level visual semantics [1]. After the successful implementation of these models, CBIR and feature extraction approaches were applied in various applications such as medical image analysis, remote sensing, crime detection, video analysis, military surveillance, and the textile industry. Figure 1 provides an overview of the basic concepts and mechanisms of image retrieval [14-16].

The basic need of any image retrieval system is to search for and sort similar images from the archive with minimum human interaction with the machine. According to the literature, the selection of visual features for any system depends on the requirements of the end user. Discriminative feature representation is another main requirement of any image retrieval system [17, 18]. To make features more robust and unique in terms of representation, fusion of low-level visual features is applied at a high computational cost in order to obtain more reliable results [19, 20]. However, improper selection of features can decrease the performance of an image retrieval model [12]. The image feature vector can be used as input to machine learning algorithms through training and test models, and this can improve the performance of CBIR [1, 2]. In both cases, a machine learning algorithm can be applied using a training-testing framework, either supervised or unsupervised. Recent trends in image retrieval focus on deep neural networks (DNN), which are able to generate better results at a high computational cost [21-23]. In this paper, we aim to provide a comprehensive overview of the recent research trends that are challenging in the field of CBIR and feature representation. The basic objectives of this research study are as follows: (1) How can the performance of CBIR be enhanced by using low-level visual features? (2) How can the semantic gap between low-level image representation and high-level image semantics be reduced? (3) How important is image spatial layout for image retrieval and representation? (4) How can machine learning-based approaches improve the performance of CBIR? (5) How can learning be enhanced by the use of deep neural networks (DNN)?

In this review, we have conducted a detailed analysis to address the abovementioned objectives. Recent trends are discussed in detail by highlighting the main contributions, and upcoming challenges are discussed while keeping the focus on CBIR and feature extraction. The structure of the paper is as follows: Section 2 covers color features, Section 3 texture features, Section 4 shape features, Section 5 spatial features, Section 6 low-level feature fusion, and Section 7 local features, commonly used datasets for CBIR, and an overview of basic machine learning techniques; Section 8 covers deep-learning-based CBIR, Section 9 feature extraction for face recognition, Section 10 distance measures, and Section 11 performance evaluation criteria for CBIR and feature extraction techniques, while the last section, Section 12, points towards possible future research directions.

2. Color Features

Color is considered one of the most important low-level visual features, as the human eye can differentiate between visuals on the basis of color. Images of real-world objects taken within the range of the human visual spectrum can be distinguished on the basis of differences in color [24-27]. The color feature is stable and is hardly affected by image translation, scale, and rotation [28-31]. Through the use of the dominant color descriptor (DCD) [24], the overall color information of an image can be replaced by a small number of representative colors. DCD is one of the MPEG-7 color descriptors and uses an effective, compact, and intuitive format to describe the representative color distribution and features. Shao et al. [24] presented a novel approach for CBIR that is based on this MPEG-7 descriptor: eight dominant colors are selected from each image, features are matched using the histogram intersection algorithm, and the complexity of the similarity computation is thereby simplified.

According to Duanmu [25], classical techniques retrieve images by using their labels and annotations, which cannot meet the requirements of users; therefore, researchers have focused on another way of retrieving images, namely, retrieval based on image content. The proposed method uses a small image descriptor that adapts to the context of the image through a two-stage clustering technique. The COIL-100 image library is used for the experiments, and the results prove the proposed method to be efficient [25].

Wang et al. [26] proposed a color-based method for retrieving images on the basis of image content, built from the fusion of color and texture features. This provides an effective and flexible approximation of early human visual processing [26]. The fusion of color and texture features offers a robust feature set for color image retrieval. Experimental results reveal that the proposed method retrieves images more accurately than other traditional methods.
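The histogram-style matching used by several of the color approaches above, including the histogram intersection step in [24], can be sketched in Python. This is a minimal illustration, not the authors' implementation; the bin count and the uniform RGB quantization are arbitrary choices made for the example.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize an RGB image (H x W x 3, uint8) into a normalized
    joint color histogram with `bins` levels per channel."""
    quantized = (image.astype(np.uint32) * bins) // 256   # per-channel bin index
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel().astype(np.int64), minlength=bins ** 3)
    hist = hist.astype(np.float64)
    return hist / hist.sum()  # normalize so images of any size are comparable

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return np.minimum(h1, h2).sum()

# Toy usage: two constant-color "images"
red = np.zeros((4, 4, 3), dtype=np.uint8); red[..., 0] = 255
dark_red = np.zeros((4, 4, 3), dtype=np.uint8); dark_red[..., 0] = 140
print(histogram_intersection(color_histogram(red), color_histogram(red)))  # 1.0
```

With only 8 levels per channel the two reds fall into different bins, so their intersection is 0; coarser or perceptual quantization (as in DCD) trades this sensitivity for compactness.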

[Figure 1: Pictorial representation of different concepts of image retrieval [6]. The interface accepts a query by keyword (e.g., "Tiger", "Horse", "forest"), query by image, query by sketch, or query by concept layout.]
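The query-by-image path shown in Figure 1 boils down to extracting a feature vector from the query and ranking archive images by a distance measure. A minimal sketch, with feature extraction abstracted away and toy two-dimensional "feature vectors" standing in for real descriptors (the names and values are illustrative only):

```python
import numpy as np

def retrieve(query_vec, archive, k=5):
    """Rank archive images by Euclidean distance between feature
    vectors and return the ids of the k closest images.

    archive: dict mapping image id -> 1-D feature vector.
    """
    ids = list(archive)
    feats = np.stack([archive[i] for i in ids])
    dists = np.linalg.norm(feats - query_vec, axis=1)  # smaller = more similar
    order = np.argsort(dists)[:k]
    return [ids[i] for i in order]

# Toy archive of 2-D "feature vectors"
archive = {"horse": np.array([1.0, 0.0]),
           "tiger": np.array([0.0, 1.0]),
           "forest": np.array([0.9, 0.1])}
print(retrieve(np.array([1.0, 0.05]), archive, k=2))  # ['horse', 'forest']
```

Real systems differ mainly in what fills the vectors (color, texture, shape, BoVW, or DNN features) and in the distance measure used, as discussed in the later sections.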

However, the feature dimensions are higher than in other approaches and require a high computational cost. A pairwise comparison of both low-level features is used to calculate the similarity measure, which can be a bottleneck [26].

Various research groups have studied the completeness property of invariant descriptors [27]. Zernike and pseudo-Zernike polynomials, which are orthogonal basis moment functions, can represent an image by a set of mutually independent descriptors, and these moment functions possess orthogonality and rotation invariance [27]. Pseudo-Zernike moments (PZMs) have proved to be more robust to image noise than Zernike moments. Zhang et al. [27] presented a new approach to derive a complete set of pseudo-Zernike moment invariants. First, a link is established between the pseudo-Zernike moments of the original image and those of images with the same shape but distinct orientation and scale. A complete set of scale and rotation invariants is obtained from this relationship, and the proposed technique proved to perform better in pattern recognition than other techniques [27].

Guo et al. [28] proposed a new approach for indexing images based on features extracted from error diffusion block truncation coding (EDBTC). To derive the image feature descriptor, the two color quantizers and the bitmap image produced by EDBTC are processed using vector quantization (VQ). For assessing the resemblance between the query image and the images in the database, two features, the Color Histogram Feature (CHF) and the Bit Pattern Histogram Feature (BHF), are introduced. The CHF and BHF are calculated from the VQ-indexed color quantizer and the VQ-indexed bitmap image, respectively. The distances evaluated from the CHF and BHF can be used to assess the likeness between two images. Experimental results show that the proposed scheme performs better than former BTC-based image indexing and other existing image retrieval schemes. EDBTC also has good ability for image compression as well as for indexing images for CBIR [28].

Liu et al. [29] proposed a novel method for region-based image learning that utilizes a decision tree, named DT-ST. Image segmentation and machine learning techniques form the basis of this technique. DT-ST handles the feature discretization problem, which frequently occurs in contemporary decision tree learning algorithms, by constructing semantic templates from low-level features to annotate the regions of an image. It presents a hybrid tree that handles the noise and tree-fragmentation problems well and reduces the chances of misclassification. In semantic-based image retrieval, the user can query an image through both labels and regions of images. Experiments conducted to check the effectiveness of the technique reveal that it provides higher retrieval accuracy than traditional CBIR techniques, and the semantic gap between low- and high-level

features is reduced to a significant level. The proposed technique also performs better than the two established decision tree induction algorithms ID3 and C4.5 in image semantic learning [29]. Islam et al. [30] presented a dominant color-based vector quantization algorithm that can automatically categorize image regions. The new algorithm handles variable-length feature vectors, such as dominant color descriptors, more efficiently than the traditional vector quantization algorithm. The algorithm is accompanied by novel splitting and stopping criteria, through which the number of clusters can be learned and unnecessary overfragmentation of region clusters can be avoided.

Jiexian et al. [31] presented a multiscale distance coherence vector (MDCV) for CBIR. The motivation is that different shapes may have the same descriptor and that the distance coherence vector algorithm may not completely eliminate noise. The proposed technique first uses a Gaussian function to smooth the image contour curve and is invariant to operations such as translation, rotation, and scaling.

2.1. Summary of Color Features. There are various low-level color features. The performance of color moments is not good, as they represent all the regions of the image. Histogram-based color features require a high computational cost, while DCD performs better for region-based image retrieval and is computationally less expensive due to its low dimensionality. A detailed summary of the abovementioned color features [24-31] is presented in Table 1.

3. Texture Features

Papakostas et al. [32] performed experiments on four datasets, namely, COIL, ORL, JAFFE, and TRIESCH I, in order to show the discrimination power of wavelet moments. These datasets are divided into 10, 40, 7, and 10 classes, respectively. For the evaluation of the proposed model (WMs), two different wavelet configurations, WMs-1 and WMs-2, are used, where the former uses the cubic B-spline and the latter the Mexican hat mother wavelet. Keeping only the effective characteristics in the feature selection approach greatly improves the classification capabilities of the wavelet moments. The performance of the proposed model is compared with Zernike, pseudo-Zernike, Fourier-Mellin, and Legendre moments, and with two others, by using 25, 50, 75, and 100 percent of the entire datasets; each moment family behaves differently on each dataset. The classification performance of the moment descriptors shows the better results of the proposed model (wavelet moments and moment invariants).

For the evaluation of their proposed model (MSD) for image retrieval, Liu et al. [33] performed experiments on Corel datasets, as there are no specific datasets for content-based image retrieval (CBIR). Corel-5000 and Corel-10000 are used, with 15000 images in total, and the HSV, RGB, and Lab color spaces are used to evaluate the retrieval performance. On both Corel-5000 and Corel-10000, the average retrieval precision and recall rates of the proposed model using different color quantization levels and texture orientation quantization levels are evaluated; the model performs better in the HSV and Lab color spaces and poorly in the RGB color space. To obtain a good trade-off between storage space, retrieval accuracy, and speed, 72 color quantization levels and 6 orientation quantization levels are used in MSD for image retrieval. For the evaluation of MSD, its average retrieval precision and recall ratios are compared on the Corel datasets with other methods developed for image retrieval, such as Gabor and MTH, and the results show that the proposed model (MSD) outperforms the other models.

In [34], 10,000 color images of natural scenes, such as landscapes, people, and textures, were collected from public resources in order to perform experiments on texture-based image retrieval. Generally, properties such as smoothness, regularity, distribution, and coarseness are considered for retrieval; here, color information is used in addition to these properties. The proposed model (the color co-occurrence matrix) is evaluated through a precision comparison with the gray-level co-occurrence matrix method. The comparison shows that the color co-occurrence matrix performs better than the gray-level co-occurrence matrix because of the additional property (color information).

For CBIR [35], the Corel, COIL, and Caltech-101 datasets, chosen because their images are grouped into semantic concepts and containing 10908 images, 7200 images, and 101 image categories, respectively, are used. The mean precision and recall rates obtained by the proposed method (an embedded neural network with bandlet transform) on the top 20 retrievals are compared with other standard and state-of-the-art retrieval systems. The mean precision and recall rates obtained by the proposed method are 0.820 and 0.164 on the top 20 retrievals. These results show that the research presented in [35] clearly outperformed other models in terms of mean precision and recall rate.

Using a Corel image gallery containing 10900 images for categorical image retrieval, Irtaza and Jaffar [36] conducted experiments to show the effectiveness of their proposed model (an SVM-based architecture; Figure 2 represents an example of binary classification using an SVM). The Corel image gallery is divided into two sets: Corel A, having 1000 images divided into ten categories, and Corel B, which has 9900 images. The mean precision and recall rates obtained by the proposed method on the top 20 retrievals are compared with other standard retrieval systems. Different numbers of returned images are used to show the retrieval capacity of the SVM, and it shows consistent results. Thus, the results and comparison show that the proposed model produces better results and is more consistent in image retrieval.

Fadaei et al. [38] performed experiments on the Brodatz and Vistex datasets for content-based image retrieval, containing 112 grayscale and 54 color images, respectively. The distance between the query image and each dataset image is calculated, the images with minimum distance are retrieved, and then the precision and recall rates are calculated. The results of the proposed models are compared with other prior methods. The retrieval time on Brodatz is longer than on the Vistex database because Brodatz has more images than Vistex; thus, it needs

Table 1: A summary of the performance of color features.

Author | Application | Method | Dataset | Accuracy
Duanmu [25] | Image retrieval | Color moment invariant | COIL-100 | 0.985
Wang et al. [26] | Content-based image retrieval | Integrated color and texture features | Corel | 0.613
Zhang et al. [27] | Object recognition | Complete set of pseudo-Zernike invariants | COIL-100 | -
Guo et al. [28] | Content-based image retrieval | Error diffusion block truncation coding features | Corel | 0.797
Shao et al. [24] | Image retrieval | MPEG-7 dominant color descriptor | Corel | 0.8964
Liu et al. [29] | Region-based image retrieval | High-level semantics using decision tree learning | Corel | 0.768
Islam et al. [30] | Automatic categorization of image regions | Dominant color-based vector quantization | Corel | 0.9767
Jiexian et al. [31] | Content-based image retrieval | Multiscale distance coherence vector algorithm | MPEG-7 image database | 0.97
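Accuracy values such as those reported in Table 1 are commonly precision (and recall) measured over the top-k retrieved images, e.g., the "top 20 retrievals" quoted throughout this review. The standard computation, shown here with hypothetical image ids, is:

```python
def precision_recall_at_k(retrieved, relevant, k):
    """retrieved: ranked list of image ids returned by the system;
    relevant: set of ids that share the query's semantic class."""
    top_k = retrieved[:k]
    hits = sum(1 for img in top_k if img in relevant)
    precision = hits / k              # fraction of returned images that are relevant
    recall = hits / len(relevant)     # fraction of relevant images that were returned
    return precision, recall

# 3 of the top 4 results are relevant; 10 relevant images exist in total
retrieved = ["a", "x", "b", "c"]
relevant = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
print(precision_recall_at_k(retrieved, relevant, k=4))  # (0.75, 0.3)
```

Mean precision/recall is then the average of these values over all queries in the dataset.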

time for more feature matching and processing. The dimension of the feature vector for the proposed model is 3124, which is higher than that of other methods. The proposed model is slower in feature matching and faster in feature extraction, although the dimension of the feature vector is high. The comparisons and results show that the proposed model (LDRP) has better performance and average precision rates, being faster in feature extraction and slower in feature matching.

[Figure 2: Example of SVM-based classification [37]: the support vectors define the boundary separating the -1 and +1 classes.]

3.1. Summary of Texture Features. There are various low-level texture features, and they can be applied in different domains of image retrieval. As they represent a group of pixels, they are semantically more meaningful than color features. The main drawbacks of texture features are their sensitivity to image noise and the fact that their semantic representation also depends on the shapes of the objects in the images. A detailed summary of the abovementioned texture features [32-36, 38, 39] is presented in Table 2.

4. Shape Features

Shape is also considered an important low-level feature, as it is helpful in the identification of real-world shapes and objects. Zhang and Lu [15] presented a comprehensive review of the application of shape features in the domain of image retrieval and image representation. Region-based and contour-based are the main classifications of shape features [14]. Figure 3 presents a basic overview of the classification of shape features. Trademark-based image retrieval [41] is one of the specific domains where shape features are used for image representation.

5. Spatial Features

Image spatial features are mainly concerned with the locations of objects within the 2D image space. The Bag of Visual Words (BoVW) [42] is one of the popular frameworks that ignores the image spatial layout while representing the image as a histogram. Spatial Pyramid Matching (SPM) [43-45] is reported as one of the popular techniques that can capture image spatial attributes but is sensitive to scaling and rotations. Zafar et al. [46] presented a method to encode relative spatial information in the histogram representation of the BoVW model. This is initiated by the calculation of the global geometric relationship between the sets of similar visual words with respect to the center of the image. Five databases are used for assessing the performance of the proposed scheme based on relative spatial information. Ali et al. [47] proposed Hybrid Geometric Spatial Image Representation (HGSIR) using an image classification-based framework. It is based on the combination of different histograms computed over rectangular, triangular, and circular regions of the images. Five datasets are used to assess how well the presented approach performs, and the results show that it performs better than state-of-the-art methods in terms of classification accuracy. In another study, Zafar et al. [48] presented a novel technique for representing images that adds spatial information to the inverted index of the BoVW model. The spatial information is incorporated by computing the global relative spatial orientation of visual words in a rotation-invariant manner. The geometric relationship of similar visual words is calculated by computing an orthogonal vector with respect to every single point in the triplets of similar visual words. The histogram of visual words is

Table 2: A summary of the performance of texture features.

Authors | Datasets | Purpose | Model | Performance/accuracy
Papakostas et al. [32] | COIL, ORL, JAFFE, TRIESCH I | Wavelet moments and their corresponding invariants in machine vision systems | Wavelet moments and moment invariants | Classification performances on 100% of the entire data are 0.3083, 0.2425, 0.1784, and 0.1500, respectively, for the datasets
Wang et al. [34] | Corel-1000 and Corel-10000 | Image retrieval | SED | Similarity between query image and image database is 3.9198, 9.92209, and 8.86239 for dragons, buses, and landscapes; precision is high when the query image has noteworthy regions or texture
Liu et al. [33] | Corel datasets (Corel-5000 and Corel-10000) | Image retrieval | MSD | Average retrieval precision and recall ratios on Corel-5000 and Corel-10000 are 55.92%, 6.71% and 41.44%, 5.48%
Lasmar and Berthoumieu [40] | Vistex, Brodatz, ALOT | Texture image retrieval | GC-MGG and GC-MWbl | Improvement in average retrieval rate on Brodatz (EB2) is 6.86% and 5.23%, respectively, with the Daubechies filter db4 and the dual-tree complex wavelet transform
Fadaei et al. [38] | Brodatz and Vistex | Content-based image retrieval | LDRP | 80.81% and 91.91% are the average precision rates of the first-order LDRP (P = 6, K = 4) for the respective datasets
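The BoVW representation that the spatial methods of Section 5 extend assigns each local descriptor to its nearest visual word and keeps only the counts, discarding spatial layout. A minimal sketch (the toy vocabulary here stands in for one learned by k-means clustering over training descriptors):

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    count occurrences. The spatial layout of the words is discarded,
    which is exactly the limitation the spatial schemes above address.

    descriptors: (n, d) local features; vocabulary: (k, d) cluster centers.
    """
    # Squared Euclidean distance from every descriptor to every word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return hist / hist.sum()  # normalized word-frequency histogram

vocab = np.array([[0.0, 0.0], [1.0, 1.0]])            # k = 2 toy visual words
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]])  # 3 toy local descriptors
print(bovw_histogram(desc, vocab))  # one third of words at center 0, two thirds at center 1
```

PIW- and orientation-based schemes such as [48, 50, 54] keep this histogram but add further histograms over the geometry (angles, distances, orthogonal vectors) of word pairs or triplets.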

computed based on the size of the orthogonal vectors, which provides information about the relative positions of the visual words. For the evaluation of the presented method, four datasets are used. Ali [49] proposed two techniques for representing images. These techniques are based on histograms of triangles that incorporate spatial information into the inverted index of the BoF representation. An image is divided into two or four triangles, which are assessed individually to calculate the histograms of triangles at two levels: level 1 and level 2. Two datasets are used for evaluating the presented technique, and experimental results show that it performs well in retrieving images.

Khan et al. [50] proposed PIW (Pairs of Identical visual Words, the set of all pairs of visual words of the same type) to represent the global spatial distribution through histograms of the orientation of the segments formed by PIW. Khan et al. [50] considered only relationships among similar visual words, so the histograms produced by each word type capture powerful details of intratype visual word relationships. The advantages of this approach over others are as follows: it enables the infusion of global information, is robust to geometric transformations, extracts spatial information efficiently, reduces complexity, and improves the classification rate by adding distinguishing information. Anwar et al. [51] presented a model using symbol recognition (performed using scale-invariant feature transform-based BoVW). To add spatial information to BoVW, circular tilings are used, and the angle histograms of an existing method (proposed by Rahat) are modified to make them rotation invariant, as they were not rotation invariant before. These modified angles are then merged with circular tilings, which increases the classification rate and reduces the computational complexity. Anwar et al. [52] performed experiments on various datasets belonging to different categories (with different backgrounds) to verify the proposed model; to verify the rotation invariance, the authors rotated coin images to an extreme extent. Khan et al. [53] proposed a global and local relative spatial distribution of visual words over an image, named the soft pairwise spatial angle-distance histogram, to include the distance and angle information of visual words. The aim is to provide an efficient representation capable of adding relative spatial information; by performing experiments on classification tasks on the MSRC-2, 15-Scene, Caltech-101, Caltech-256, and Pascal VOC 2007 datasets, the authors concluded that the proposed method performs well and improves the overall performance. In order to acquire rotation invariance efficiently, Ali et al. [54] proposed to represent the global spatial distribution by constructing histograms based on the computation of the orthogonal vector between PIWs. For the evaluation of the presented method, three satellite scene datasets are used.

6. Low-Level Feature Fusion

Ashraf et al. [55] presented a CBIR model that is based on color and the discrete wavelet transform (DWT). For the retrieval of similar images, the low-level features color, texture,

[Figure 3 groups shape-based descriptors into six families: one-dimensional functions for shape representation (complex coordinates, centroid distance function, tangent angle, contour curvature, area function, triangle-area representation, chord length function); polygonal approximation (merging and splitting methods); spatial interrelation features (adaptive grid resolution, bounding box, convex hull, chain code, smooth curve decomposition, representation based on Ali's method, beam angle statistics, shape matrix, shape context, chord distribution, shock graph); moments (boundary moments, region moments); scale-space methods (curvature scale space, intersection point map); and shape transform-domain methods (Fourier descriptors, wavelet transform, angular radial transformation, shape signature harmonic embedding, R-transform, shapelet descriptors).]
Figure 3: An overview of shape-based feature extraction approaches [14, 15].
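Among the shape signatures in Figure 3, the centroid distance function is the simplest to state: each contour point is mapped to its distance from the shape centroid, giving a translation-invariant one-dimensional signature. A minimal sketch in plain Python (the square contour is an invented example, not taken from the cited works):

```python
import math

def centroid_distance(contour):
    """One-dimensional shape signature: distance of each
    contour point from the shape centroid."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    return [math.hypot(x - cx, y - cy) for x, y in contour]

# Toy contour: the four corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
signature = centroid_distance(square)
# Every corner is equidistant from the centroid (0.5, 0.5),
# so the signature is constant and does not change if the
# whole contour is translated.
```

Because the signature depends only on distances to the centroid, translating the contour leaves it unchanged, which is exactly why such signatures are used for shape matching.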

and shape are used. These features play a significant role in the retrieval process. Different types of features and feature extraction techniques are discussed, along with the scenarios in which each feature extraction technique performs well [55]. To prepare the eigenvector information from the image [55], color edge detection and discrete wavelet approaches are used. The RGB and YCbCr color spaces are used to extract the color features. The researchers in [55] transformed RGB images to the YCbCr color space to extract the meaningful information. The YCbCr transformation is selected in this case because of the human visual system's differing sensitivity to color and brightness. In YCbCr, Y represents the luminance while the color is represented by Cb and Cr. The output of YCbCr is dependent on two factors, while in the case of RGB, the output image depends on the intensities of R, G, and B, respectively. The YCbCr color space is also used to solve the color vibration problem. To extract the edge features, the Canny edge detector is used; the detector ensures that the edge features respond well to edges and provide good shape information at any size. In order to retrieve the query image, the color and edge-based features are extracted to compute the feature vector. If there is a small distance between the query image and a repository image, the correlated image from the database is selected as a match for the query image. To reduce the computational steps and enhance the search, the color features are also incorporated with the histogram, and the Haar wavelet transform is applied. Then, for image retrieval, an artificial neural network (ANN) is applied, and its performance is measured against existing CBIR systems. The results show that this method performs better than the others [55].

Ashraf et al. [56] presented a new CBIR technique that uses a combination of color and texture features to extract the local vector which is used as the feature vector. Color moments are used to extract the color feature, and for the texture feature, the discrete wavelet transform and Gabor wavelet methods are used. To enhance the feature vector, the color and edge directivity descriptor is also included in the feature vector. This method is compared with other existing CBIR methods, and good performance is achieved [56] in terms of precision and recall values.

Mistry et al. [57] conducted a study on CBIR by using hybrid features and various distance metrics. In this paper,
the hybrid features combine three different feature descriptors, which consist of spatial features, frequency features, binarized statistical image features (BSIF), and the color and edge directivity descriptor (CEDD). Features are extracted by using BSIF, CEDD, the HSV color histogram, and color moments. Feature extraction using the HSV histogram involves color quantization, color space conversion, and histogram computation. Feature extraction using BSIF includes conversion of the RGB image to grayscale and patch selection from the grayscale image; it also includes subtraction of the mean value from the components. Feature extraction using the CEDD process includes an HSV color two-stage fuzzy linking system. Feature extraction using color moments first converts the RGB image into its components and then finds the mean and standard deviation of each component. The stored features are then compared with the query image feature vector. The minimum distance obtained with the distance classifiers determines the comparison, and the image is then retrieved. Different experiments are performed on this approach, and the results show that it performs significantly better than the existing methods [57].

Ahmed et al. [58] conducted a study on CBIR by using image feature information fusion. In this technique, the extracted spatial color features are fused with extracted shape features, and object recognition takes place. Color and shape together can differentiate objects more accurately, and including spatial color features in the feature vector improves image retrieval. In the proposed method, RGB color is used to extract the color features, while gray-level images are used to extract the object edges and corners for the formation of shape. The detection of corners and edges from the shape creates a more powerful descriptor, and shape detection confirms a better understanding of the object or image. Shape detection on the basis of edge and corner formation, combined with color, produces more accurate results for the retrieval or detection of images. For selecting the high-variance components, dimension reduction is applied to the feature vector. The compact data features are then the input of the Bag of Words (BoW) model for quick indexing or retrieval of images. The results of the experiments performed with this technique show that it outperforms existing CBIR techniques [58].

Liu et al. [59] proposed a method for classifying and searching images by fusing the local binary pattern (LBP) and a color information feature (CIF). For deriving the image descriptor, LBP extracts the textural feature, but LBP does not perform well as a color feature descriptor. Both the color feature and the textural feature are used for the efficient retrieval of color images from a large database. In this proposed method, a new color feature, CIF, is used together with the LBP-based feature for image retrieval as well as for classification; CIF and LBP together represent the color and textural information of an image. Several experiments are performed using a large database, and the results show that this method has good performance for retrieval and classification of images [59].

Zhou et al. [60] conducted a study on collaborative index embedding. This work explores the potential of unifying the indexing of SIFT features and deep convolutional neural network (d-CNN) features for image retrieval. To exploit the shared image-level neighborhood structure and to implicitly integrate the CNN and SIFT features, a collaborative index embedding algorithm is proposed which continuously updates the index files of the CNN and SIFT features. After continuous iterations of index embedding, the CNN-embedded index is used for online queries and shows efficient retrieval accuracy, about 10 percent higher than the original CNN and SIFT indexes. The results of the extensive experiments performed with this method show that it achieves higher retrieval performance [60].

Li et al. [61] studied color texture features based on the Gaussian copula model of Gabor wavelets and proposed an efficient method for the retrieval of images in the color and texture context by using this model. The Gabor filter is a linear filter used for signal analysis; the orientation and frequency representations of the Gabor filter resemble the human visual system, so it is particularly suited to texture image retrieval, while the copula model captures the dependence structure among variables where dependencies exist. Gabor wavelets are used to decompose the color image; after decomposition, three types of dependencies exist in the decomposed subbands of the Gabor wavelet: directional dependence, color dependence, and scale dependence. After the decomposition, the existing dependencies are analyzed and captured by using the Gaussian copula method. Three types of schemes are developed for the Gaussian copula, and accordingly, four Kullback–Leibler divergences (KLD) are introduced for color image retrieval. Several experiments are performed using the ALOT and STex datasets, and the results show that the method performs better than several state-of-the-art retrieval methods [61].

Bu et al. [62] studied CBIR using color and texture features by combining the color and texture features extracted from the image with Multi-Resolution Multi-Directional (MRMD) filters. MRMD filters are simple, can separate low- and high-frequency features independently, and produce efficient multiresolution, multidirectional analyses. The HSV color space is used because its characteristics are very close to the human visual system. Local and global features are extracted from the low- and high-frequency domains in each color space. Several experiments are performed by comparing the precision versus recall of the retrieval and the feature vector dimension. The results show that this method is a significant improvement over existing techniques [62]. A detailed summary of the abovementioned low-level feature fusion approaches for CBIR is presented in Table 3.

Nazir et al. [63] conducted a study on CBIR by fusing color and texture features. Since retrieving an image from a large database is a challenging task, researchers have proposed many techniques to overcome this challenge. Nazir et al. [63] used both color and texture features to retrieve the image. Previous research shows that retrieving images using a single feature does not provide good results, and using multiple features for image retrieval
Table 3: A summary of the performance of fusion feature-based approaches for CBIR.

Author | Dataset | Images/classes | Technique | Application | Precision
Nazir et al. [63] | Corel 1-K | 1000 images divided into 10 classes | HSV color histogram, discrete wavelet transform, and edge histogram descriptor | Content-based image retrieval | 0.735
Ashraf et al. [56] | Corel 1000 | 10 categories, each containing 100 images of different sizes | Content-based image retrieval from multimedia data using multiple features | Content-based image retrieval | 0.875
Mistry et al. [57] | Wang | 1000 images from 10 different classes | Hybrid features and various distance metrics | Content-based image retrieval | 0.875
Ahmed et al. [58] | Corel-1000 | 1000 images split into 10 categories, each consisting of 100 images | Image feature information fusion | Content-based image retrieval | 0.90 for the Africa and building categories
Liu et al. [59] | Brodatz, Vistex | Brodatz: 1856 and 600 texture images; Vistex: 640 and 864 texture images; each class in Brodatz and Vistex consists of 16 similar images | Fusion of color histogram and LBP-based features | Texture-based image retrieval | 0.841 and 0.952
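Several of the fusion approaches in Table 3 begin with a color-space transform; Ashraf et al. [55], for example, move from RGB to YCbCr before extracting color features. A minimal per-pixel sketch using the common ITU-R BT.601 full-range coefficients (the sample pixels are arbitrary, not from the cited experiments):

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (ITU-R BT.601,
    full range): Y carries luminance, Cb/Cr carry chrominance."""
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# A neutral gray pixel carries no chrominance: Cb = Cr = 128.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```

The separation is what the text describes: for a gray pixel all the information ends up in Y, while for a saturated color the Cb/Cr channels move away from their neutral value of 128.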

seems to be a better option. The color feature is extracted using the color histogram, while the texture feature is extracted using the discrete wavelet transform (DWT) and the edge histogram descriptor. In the extraction of color features, the color space of the image describes the color array. The HSV color space is used for the color feature because, as reported, hue and saturation are very close to the human visual system. The DWT is used for texture feature extraction because it is very efficient for nonstationary signals and varies over both the frequency and spatial ranges. Here, the authors applied the "Daubechies db1" wavelet as it gives more efficient results than the others. The edge histogram descriptor (EHD) is used to depict the distribution of local edges in the image; the EHD is used to find the most relevant images from the database, and after some computational steps, the EHD of the image is calculated. Different experiments are used to evaluate this technique; as a result, it performs better than existing CBIR systems [63].

7. Local Feature-Based Approaches

Kang et al. [64] conducted a study on an image similarity assessment technique based on sparse feature representation. Automatically interpreting similar content in different images is the main motivation behind similarity assessment, and the task is treated as an information fidelity problem. For gathering the information available in the reference image and estimating the amount of information that can be collected from the test image, a feature-based approach is proposed [64]. This feature-based approach assesses the similarities between two images. A descriptor dictionary is learned to extract different feature points and the corresponding descriptors from an image in order to understand the information available in the image. Then, sparse representation is used to formulate the image similarity assessment problem. The proposed scheme is applied to three popular applications, namely, image copy-move detection, retrieval, and recognition, which are properly formulated as sparse representation problems. Several public datasets such as Corel-1000, COIL-20, COIL-100, and Caltech-101 are used for simulation and obtaining the desired results [64].

Zhao et al. [65] proposed cooperative sparse representation in two opposite directions for semisupervised image annotation. According to recent research studies [8], sparse representation is effective for many computer vision problems, and its kernel version has powerful classification capability. They focused on applying cooperative SR to semisupervised image annotation, which may increase the number of labeled images for training image classifiers for future use. A set of labeled and unlabeled images is provided, and the usual SR methodology, also known as forward SR, is used to represent each unlabeled image with many other labeled images; after that, the unlabeled image is annotated according to the labeled images' annotations. In the backward SR approach, the annotation process is completed and labels are assigned to the images that are without semantic description. The main focus is on the contribution of backward SR to image annotation. To evaluate the complementary nature of the two SRs in opposite directions, a semisupervised method called cotraining is adopted, which builds a unique learning model for improved image annotation in kernel space. The results of the experiments show that the two SRs are different and independent. Co-KSR achieves better image annotation, with high performance improvement over other state-of-the-art semisupervised classifiers such as TSVM, GFHF, and LGC. Therefore, the proposed Co-KSR method can be an effective method for semisupervised image annotation. Figure 4 represents an overview of automatic image annotation; different high-level semantics are assigned to an image through the image annotation framework.

Thiagarajan et al. [66] conducted a study on supervised local sparse coding of subimage features for image retrieval. After being widely used in image modeling, sparse
representation is now being used in applications of computer vision. The features that differentiate one image from another must be extracted for retrieving and classifying images. To perform supervised local sparse coding of large overlapping regions, a feature extraction approach is proposed which uses multiple global/local features, together with a method for dictionary design and supervised local sparse coding of heterogeneous subimage features. Experimental results show that the proposed features outperform the spatial pyramid features obtained using local descriptors.

[Figure 4 shows three example photographs, each annotated with high-level labels such as "sky, grass, people, buildings", "sky, mountain, grass, horses", and "cloud, water, building".]

Figure 4: Example of image annotation [19].

Hong and Zhu [67] proposed a novel ranking method with query by multiple examples (QBME) for faster image retrieval, based on a novel learning framework. Current QBME approaches use all examples individually and then combine their results, so the computational time increases with each additional query example. First, the semantic correlation of the image data, learned using sparse representation, is explored in the training process. A semantic correlation hypergraph (SCHG) is constructed to model the relationships between images in the dataset, and a prelearned semantic correlation is used after constructing the SCHG to estimate the linking values among images. Second, a multiple probing strategy is proposed to rank the images with multiple query examples. Current QBME methods accept one input example at a time, but in the proposed method, all input examples are processed at the same time. Therefore, the proposed scheme shows effectiveness in terms of speed and retrieval performance.

Wang et al. [68] carried out a study on retrieval-based face annotation by weak label regularized local coordinate coding. Detecting a human face in an image and annotating it automatically according to the image is important to many real-world applications. A framework is provided to address the problems in mining the massive web facial images available online. For a given query image, first, using content-based image retrieval, the top "n" images from web facial image databases are retrieved, and then their labels are used for auto-annotation. This method has two main problems: (1) how to match the query image with the images placed in the archive and (2) how similar labels can be assigned to images that are not correlated with each other. A WLRLCC technique is proposed which exploits the principles of both local coordinate coding and graph-based weak label regularization. To evaluate this proposed study, experiments were conducted on several different web facial image databases, and the results prove this technique to be effective. For further improving the efficiency and scalability, an offline approximation scheme (AWLRLCC) is proposed, which maintains comparable results while taking less time to annotate images.

Srinivas et al. [69] carried out a study on content-based medical image retrieval using dictionary learning. For grouping large medical datasets, a clustering method using dictionary learning is proposed. K-SVD groups similar images into clusters using dictionaries. An orthogonal matching pursuit (OMP) algorithm is used to match a query image with the existing dictionaries in order to identify the dictionary with the sparsest representation. For retrieving the images that are similar to the query image, the images included in the cluster associated with this dictionary are compared using a similarity measure. The best thing about this approach is that it does not require training and works well on different medical databases. An image database named IRMA is used for evaluating the performance of the proposed method. The results demonstrate that the proposed method efficiently retrieves images from medical databases.

Mohamadzadeh and Farsi [70] conducted a study on a content-based image retrieval system via sparse representation. Several multimedia information processing systems and applications require image retrieval, which finds a query image in image datasets and then presents it as required. Studies show that images are retrieved in two ways, i.e., text-based and content-based image retrieval. The purpose of retrieval systems is to retrieve images automatically according to the query, and many researchers focus on the speed and accuracy with which the images are retrieved automatically. The proposed scheme uses sparse representation to retrieve images; the goal is to present a CBIR technique involving the IDWT feature and sparse representation. The color spaces that are considered include HSI and CIE-L∗a∗b∗. The P(0.5), P(1), and ANMRR metrics of the proposed scheme and existing methods have been computed and compared. The datasets used to obtain these metrics are Flower, Corel, ALOI, Vistex, and MPEG-7. The results of the experiments show that the proposed method has higher retrieval accuracy than the other conventional methods with the DALM algorithm for the S plane. The proposed method performs better than the other methods on the five datasets, the size of the feature vector and the storage space are reduced, and image retrieval is improved.

Mainly, two different approaches are used to query for images: one is text-based, and the other is image-based search. Image-based retrieval systems rely on models such as BoVW, and CBIR is one important application of BoVW, with the aim of providing similar images related to the query. Consider an image retrieval system in which a user cannot provide an exemplar image and instead only a sketch, i.e., a raw contour, is available; this is called sketch-based image retrieval (SBIR). SBIR uses the edge or contour image for retrieval, and hence, it is
difficult compared to CBIR. Li et al. [71] proposed a novel sketch-based image retrieval method using product quantization with sparse coding to construct the codebook. In this method, the desired image sketch is drawn, and features are extracted using state-of-the-art local descriptors. Then, by using product quantization and sparse coding, the authors [71] encoded the features into an optimized codebook and encoded the sketch features using the quantization residual to improve the representation ability. Hence, this method can be computed efficiently, and good performance is achieved compared to several popular SBIR methods. A further benefit of product quantization is that it can be implemented quickly.

Image retrieval is a technique to browse, search, and retrieve images from a large database, providing convenience to human lives [72]. Machine learning is effectively increasing the quality of retrieval and is also used efficiently for image annotation, image classification, and image recognition. Many different techniques are used to retrieve images using color and texture features. It is difficult for simple feature extraction techniques to obtain the high-level semantic information of the target; hence, many different models have been proposed which contribute to extracting the semantic information of the target image. Due to advances in machine learning, deep learning has appeared in many fields of modern life, and within deep learning, different techniques have been presented. It should be mentioned that this model is built on the foundation of sparse representation. However, high-quality image retrieval results are obtained from a large number of learning instances, which wastes human resources and also occupies considerable computing resources. To solve this problem, the authors proposed a sparse coding-based few-learning-instances model for image retrieval. This model combines cross-validation sparse coding representation, sparse coding-based instance distance, and an improved KNN model, which reduce the number of learning instances by deleting some useless or even mistaken learning instances and selecting the optimized learning instances while preserving the retrieval accuracy.

According to Duan et al. [73], face recognition has gained high attention in computer vision, and many face recognition methods have been introduced in the last two decades. There are two main procedures in face recognition: the first is to extract discriminative features from the face so that face images of different persons can be separated, and the second, face matching, is to design effective classifiers to recognize different persons. A large number of face recognition methods have been proposed in the last few years, mainly classified into holistic and local feature representations. Generally, local features perform better than holistic features because of their robustness and stability to local changes in the image feature description, although most local feature representations need strong prior knowledge. Because of this property of the contextual information, the authors propose a context-aware local binary feature learning (CA-LBFL) method for face recognition. It takes the context-aware binary code directly from the raw pixels and then compares it with existing models that learn the feature codes individually. The proposed method [73], CA-LBFL, exploits the contextual information of adjacent bits by limiting the number of bitwise changes in each descriptor and obtains more robust local binary features. A detailed summary of the abovementioned local feature-based approaches for CBIR is presented in Table 4. Figures 5–7 show images randomly selected from the benchmarks that are commonly used to evaluate the performance of CBIR, while Figure 8 provides an overview of commonly used machine learning techniques for the CBIR framework and Figure 9 covers the key disciplines of machine-human interaction.

As discussed in Section 5, histogram-based image description extracts local features and then encodes them. This process requires a precomputed codebook, also known as a visual vocabulary. If there are n image datasets, a separate codebook must be computed for every case, and this process has a high computational cost [77]. In the case of a limited number of training samples, the computed codebook can be biased, and it can degrade the performance of the BoVW model. When a codebook precomputed on one dataset is applied to an online/new set of images, its discriminating ability decreases [77]. To overcome this limitation, the authors proposed a novel implicit codebook transfer method for visual representation [77]. The proposed approach differs from previous research as it is based on prelearned codebooks with nonlinear transfer. In this case, the local features are reconstructed on the basis of a nonlinear transformation, and implicit transformation is possible. This approach enables the use of prelearned codebooks for new visual applications through implicit learning. The proposed research is validated on several standard image benchmarks, and the experimental results demonstrate the effectiveness and efficiency of this implicit learning [77].

The authors [78] proposed a novel fine-grained image classification model using a combination of codebook generation with low-rank sparse coding (LRSC). Class-specific and generic codebooks are computed by applying optimization to the accumulative reconstruction error, the sparsity constraints, and the incoherence of the codebooks. The proposed research [78] differs from the baseline BoVW image classification approach, which computes a generic codebook using all images from the training set. The local features that lie within a spatial region are encoded jointly through LRSC; the similarity among local features exploited by the LRSC approach provides more discriminating fine-grained image classification [78].

According to [79], image visual features play a vital role in autonomous image classification. However, in computer vision applications, the appearance of the same view in images of different classes often results in inconsistent visual features, and the construction of an explicit semantic space is an open computer vision research problem. To deal with inconsistent visual features and the construction of an explicit semantic space, the authors proposed a structured weak semantic space for the image classification problem [79]. To handle the limitation of the weak semantic space, an exemplar
Table 4: A summary of the performance of local feature-based approaches for CBIR.

Author | Application | Method | Dataset | Accuracy
Kang et al. [64] | Image similarity assessment | Feature-based sparse representation | COIL-20 | 0.985
Zhao et al. [65] | Semisupervised image annotation | Cooperative sparse representation | ImageCLEF-VCDT | —
Thiagarajan et al. [66] | Image retrieval | Supervised local sparse coding of subimage features | Cambridge image dataset | 0.97
Hong and Zhu [67] | Transductive learning image retrieval | Hypergraph-based multiexample ranking | Yale face dataset | 0.65
Wang et al. [68] | Retrieval-based face annotation | Weak label regularized local coordinate coding | "WDB" and "ADB" databases | —
Srinivas et al. [69] | Content-based medical image retrieval | Dictionary learning | ImageCLEF dataset | 0.5
Mohamadzadeh and Farsi [70] | Content-based image retrieval system | Sparse representation | Flower dataset, Corel dataset | —
Li et al. [71] | Sketch-based image retrieval | SBIR framework based on product quantization (PQ) with sparse coding | Eitz benchmark dataset | 0.98
Duan et al. [73] | Face recognition | Context-aware local binary feature learning | LFW, YTF, FERET | 0.846
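Both the LBP-based fusion of Liu et al. [59] in Table 3 and the learned binary descriptors of Duan et al. [73] in Table 4 build on the basic local binary pattern operator, which thresholds the 8 neighbors of a pixel against the center and packs the results into one byte. A minimal 3 x 3 sketch (the patch values are made up for illustration):

```python
def lbp_code(patch):
    """Basic LBP for a 3x3 patch: threshold the 8 neighbors
    against the center pixel, clockwise from the top-left
    corner, and pack the bits into one byte."""
    center = patch[1][1]
    # Clockwise neighbor order starting at the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in coords:
        code = (code << 1) | (1 if patch[r][c] >= center else 0)
    return code

patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
code = lbp_code(patch)  # bits 00011110 for this patch, i.e. 30
```

An image-level texture descriptor is then simply the histogram of these codes over all pixels, which is the feature that the fusion methods above combine with color information.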

Figure 5: Randomly selected images from Corel-1500 image benchmark [74].

Figure 6: Randomly selected images from some of the classes of Caltech-101 image benchmark [75].

Figure 7: Randomly selected images from 15Scene image benchmark [43].
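Before turning to the machine learning overview in Figure 8, the codebook-based encoding criticized in [77] above can be made concrete: each local descriptor is hard-assigned to its nearest codeword, and the image is represented by the normalized visual-word histogram. A toy sketch (the 2-D descriptors and the three-word codebook are invented, far smaller than a real visual vocabulary):

```python
def quantize(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword
    (squared Euclidean distance) and return a normalized
    visual-word histogram, i.e. a BoVW representation."""
    hist = [0] * len(codebook)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Toy 2-D "descriptors" and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.05, 0.0)]
bovw = quantize(descriptors, codebook)  # [0.5, 0.25, 0.25]
```

The dependence on the codebook is visible here: change the codewords and every histogram changes, which is exactly why reusing a codebook across datasets degrades discrimination.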

[Figure 8 groups the basic machine learning techniques used in CBIR into parametric and nonparametric approaches, including support vector machines, artificial neural networks, decision trees, and Bayesian methods.]

Figure 8: An overview of basic machine learning techniques for CBIR [1, 76].

[Figure 9 contrasts the machine side of visual analysis (graphics and rendering, statistical analysis, data mining, information visualization, data management, compression and filtering) with the human side (human cognition, perception, information design, visual intelligence, decision-making theory), bridged by semantic-based approaches and human-centered computing as "the best of both sides".]

Figure 9: An overview of the key disciplines of machine-human interactions [76].
classifier is trained to discriminate between training images and test images. The structured constraints are considered to construct the weak semantic space, and this is obtained by applying a low-rank constraint on the outputs of the exemplar classifiers together with a sparsity constraint. An alternating optimization technique is applied to learn the exemplar classifiers, and various visual features are combined to obtain efficient learning of the exemplar classifiers [79].

According to [80], object-centric categorization for image classification is more reliable than approaches based on dividing the image into subregions, as SPM does. Finding the location of an object within an image is an open problem for the computer vision research community. According to [80], the performance of an image classification model degrades if the available semantic information within the image is ignored. The authors proposed a novel approach for object categorization through semantically modeling the object and context information (SOC). A prelearned classifier is applied by computing the correlations of each candidate region, and the regions with high confidence scores are grouped as a cluster for object selection. The other areas of the image, in which there is no object, are treated as the background. This approach provides a unique and discriminative feature for object categorization and representation [80].
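The precision values reported throughout this review (e.g., in Tables 3 and 4) are commonly computed over the top-k retrieved images. A minimal evaluation sketch (the ranked list and relevance set are fabricated, not results from any cited method):

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved images that are relevant
    to the query (i.e., share the query's class)."""
    top = ranked_ids[:k]
    return sum(1 for i in top if i in relevant_ids) / k

# Fabricated retrieval result: 10 ranked image ids, where 4 of
# the top 5 belong to the query's class.
ranked = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0]
relevant = {3, 7, 9, 4, 2, 5}
p5 = precision_at_k(ranked, relevant, 5)  # 4/5 = 0.8
```

Averaging this quantity over a set of queries gives the single precision figures quoted in the summary tables; recall is computed analogously, dividing by the total number of relevant images instead of k.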
According to [81], supervised learning is mostly used for the categorization and classification of digital images. Supervised learning depends on labeled datasets, and in some cases, when there are too many images, it is difficult to manage the labeling process. To handle this problem, the authors proposed a novel weak semantic consistency constrained (WSCC) approach for image classification. In this case, the extreme circumstance is obtained by considering each image as one class. Through this approach, the learning of exemplar classifiers is used to predict weak semantic correlations [81]. When no labeled information is available, the images are clustered through the weak semantic correlations, and images within one cluster are assigned the same midlevel class. The partially labeled images are used to constrain the clustering process, and they are assigned to various midlevel classes on the basis of visual semantics. In this way, the newly assigned images are used for classifier learning, and the process is repeated until convergence. Experiments are performed using semisupervised and unsupervised image classification [81].

8. CBIR Research Using Deep-Learning Techniques

Searching for digital images in large storage systems or databases is often required, so content-based image retrieval (CBIR), also known as query-based image retrieval (QBIR), is used for image retrieval. Many approaches have been used to address this problem, such as the scale-invariant feature transform and the vector of locally aggregated descriptors. Motivated by the prominent results and strong performance of deep convolutional neural networks (CNNs), a novel term frequency-inverse document frequency (TF-IDF) scheme that uses the weighted convolutional word frequencies of a CNN as the description vector was proposed for CBIR [82]. For this purpose, the learned filters of the convolutional layers are used as detectors of visual words, where the activation of each filter provides the degree of the visual pattern as the tf part. Three approaches for computing the idf part are then proposed [82]. These approaches concatenate the TF-IDF with CNN analysis of visual content, providing powerful image retrieval techniques with better outcomes. To validate the proposed model, the authors conducted experiments on four image retrieval datasets, and the outcomes of the experiments confirm the validity of the model. Figure 10 represents an example of an image classification-based framework using the DNN framework.

Figure 10: Example of an image classification-based framework using the DNN framework: an input layer L0 (512 × 512), convolutional layers L1 (256 × 256), L2 (128 × 128), L3 (64 × 64), and L4 (32 × 32), followed by fully connected layers F5 and F6 (output).
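To make the tf-idf weighting of convolutional activations concrete, the following sketch treats each convolutional filter as a visual word: the activation mass of a filter in one image plays the role of the term frequency, and filters that fire in many images are down-weighted by an idf term. The pooling choice, the idf variant, and the random toy activation maps are illustrative assumptions, not the exact formulation of [82].

```python
import numpy as np

def tf_vector(feature_maps):
    """Term frequencies: one weight per convolutional filter.

    feature_maps: array of shape (C, H, W), the activations of C filters.
    The activation mass of each filter acts as the 'tf' part.
    """
    tf = feature_maps.reshape(feature_maps.shape[0], -1).sum(axis=1)
    return tf / (tf.sum() + 1e-12)            # relative term frequency

def idf_vector(all_tf, presence_thresh):
    """One simple idf variant: filters that fire in many images are down-weighted.

    all_tf: array of shape (N, C) with the tf vectors of the whole collection.
    A filter 'occurs' in an image if its share of activation exceeds the threshold.
    """
    df = (all_tf > presence_thresh).sum(axis=0)        # document frequency per filter
    return np.log(all_tf.shape[0] / (1.0 + df)) + 1.0  # smoothed idf

def tfidf_descriptor(tf, idf):
    d = tf * idf
    return d / (np.linalg.norm(d) + 1e-12)    # L2 normalize for cosine search

# toy collection: 4 images, 8 filters, 5x5 maps (stand-ins for real CNN activations)
rng = np.random.default_rng(0)
maps = rng.random((4, 8, 5, 5))
all_tf = np.stack([tf_vector(m) for m in maps])
idf = idf_vector(all_tf, presence_thresh=1.0 / maps.shape[1])
descs = np.stack([tfidf_descriptor(t, idf) for t in all_tf])
sims = descs @ descs[0]                       # cosine similarity to query image 0
print(sims.argsort()[::-1])                   # ranking; the query itself comes first
```

Because the descriptors are L2-normalized, ranking by dot product is equivalent to ranking by cosine similarity, which is how such global descriptors are usually compared.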
To handle large-scale data, Shi et al. [83] proposed a hashing algorithm that extracts features from images and learns their binary representations. The authors model the pairwise matrix and an objective function with a deep-learning framework that learns the binary representations of the images. Experiments were conducted on thousands of histopathology images (5356 skeletal muscle and 2176 lung cancer images with four types of diseases) to indicate the trustworthiness of the proposed algorithm. The proposed algorithm achieves 97.94% classification accuracy.

Zhu et al. [84] proposed an unsupervised visual hashing approach known as semantics-assisted visual hashing (SAVH). The system uses two components: offline learning and online learning. In offline learning, the image pixels are first transformed into a mathematical vector representation by extracting the visual and texture features. Then, a text-enhanced visual graph is extracted with the assistance of a topic hypergraph, the semantic information is extracted from the text information, and the hash code of the image is learned in a way that preserves the correlation between the semantics and the images; at the end, the hash function is generated within a linear regression model. These desirable properties match the requirements of real application scenarios of CBIR [84].

In computer vision applications, the use of CNNs has shown remarkable performance, especially in CBIR models. Most CNN models obtain the features from the last layer using a single CNN with an orderless quantization approach, whose drawback is that it limits the utilization of the intermediate convolutional layers for identifying local image patterns. Therefore, a new technique based on a bilinear CNN architecture was introduced [85]. This method uses two parallel CNN models to extract the features without prior knowledge of the semantics of the image content. The features are directly extracted from the activations of the convolutional layers rather than being reduced to a very low-dimensional feature. The experiments on this approach lead to important conclusions. First, the model reduces the image representation to a compact length, as it uses different quantization levels to extract the features, so it remarkably boosts the retrieval performance while reducing the search time and storage cost. Second, the bilinear CRB-CNN is very effective in learning very complex images with different semantics. Ten milliseconds are needed to extract the features from an image and search the database, and a very small disk size is needed to represent and store the image. Finally, end-to-end training is applied without any other metadata, annotations, or tags, which confirms the capability of CRB-CNN to extract the features from only the visual information in the CBIR task. The technique was also applied to large-scale image databases and showed high retrieval performance [85].

For efficient image search, hashing functions have gained considerable attention in CBIR [86]. A hashing function assigns similar binary codes to images with similar content, which maps the high-dimensional visual data into a low-dimensional binary space. This approach basically depends on a CNN. It is assumed that the semantic labels are represented by several latent-layer attributes (binary codes) and that classification also depends on these attributes. Based on this approach, the supervised deep hashing technique (SSDH) constructs a hash function from a latent layer in the deep neural network, and the binary codes are learned from objective functions that account for the classification error and other desirable properties of the binary codes. The main feature of SSDH is that it unifies retrieval and classification in a single model. SSDH is scalable to large-scale search, and with a slight modification of an existing deep classification network, SSDH is simple and easily realizable [86]. A detailed summary of the abovementioned deep-learning-based features for CBIR is presented in Table 5.
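The retrieval step shared by such hashing approaches can be sketched as follows: real-valued latent-layer activations are thresholded into binary codes, and database images are ranked by Hamming distance to the query code. This is a generic illustration of the idea; the thresholding rule and the toy activations are assumptions, not the SSDH training procedure itself.

```python
import numpy as np

def binarize(latent):
    """Turn real-valued latent-layer activations (assumed in [0, 1],
    e.g. sigmoid outputs) into binary hash codes by thresholding at 0.5."""
    return (latent >= 0.5).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database images by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists

# toy latent activations: 5 database images, 48-bit codes
rng = np.random.default_rng(1)
db_codes = binarize(rng.random((5, 48)))
query = db_codes[2].copy()          # pretend image 2 is the query
order, dists = hamming_rank(query, db_codes)
print(order[0], dists[order[0]])    # → 2 0 : the query's own code at distance 0
```

Because the codes are short binary strings, Hamming distances can be computed with bitwise operations, which is what makes this family of methods scalable to large-scale search.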
Effective image analysis and classification of visual information using discriminative information is considered an open research problem [91]. Many research models have been proposed using different approaches, either by combining views through a graph-based approach or by using transfer learning. It is difficult for the existing methods to compute the discriminative information at the image borders and to find a similarity consistency constraint. The authors of [91] proposed a multiview label sharing method (MVLS) for this open research problem and tried to maintain and retain the similarity. For visual classification and representation, optimization over the transformation and classification parameters is combined for transformation matrix learning and classifier training. Results on MVLS with six different views (no intra-view and no inter-view plus no intra-view) and nine views (a combination of intra-view and inter-view) are reported. The experimental results are compared with several state-of-the-art methods and show the effectiveness of the proposed MVLS approach [91].

For the understanding of images and object categorization, methods like CNNs and local features have shown good performance in many application domains. The use of CNN models is still challenging for the precise categorization of objects and in cases with limited training information and labels. To handle the semantic gap, smooth constraints can be used, but the performance of the CNN model degrades due to the smaller size of the training set. The authors of [92] proposed a multiview algorithm with few labels and view consistency (MVFL-VC). Labeled and unlabeled images are used together for image view consistency with multiview information. The discriminative power of the learned parameters is also enhanced by the unlabeled training images. To evaluate the proposed algorithm, experiments were conducted on different datasets. The proposed MVFL-VC algorithm can be used with other image classification and representation techniques. The algorithm is tested on unlabeled and unseen datasets. The results of the experiments and analysis reveal the effectiveness of the proposed method [92].

The extraction of domain space knowledge can be beneficial in reducing the semantic gap [93]. The authors proposed multiview semantics representation (MVSR), which is a semantics representation for visual recognition. The proposed algorithm divides the images on the basis of semantic and visual similarities [93]. Two visual similarities for the training samples provide a stable and homogeneous perception that can handle different partition techniques and different views. The proposed MVSR is more discriminative than other semantics approaches, as the semantic information is computed for future use from each view and from separate collections of images and different views. Different publicly available image benchmarks were used to evaluate this research, and the experimental results show the effectiveness of MVSR. The results demonstrate that MVSR improves classification performance in terms of precision for image sets with more visual variations.

9. Feature Extraction Techniques for Face Recognition

Face recognition is one of the important applications of computer vision; it is used to establish the identity of a person on the basis of facial features and is considered a challenging computer vision problem due to the complex nature of the facial manifold. In the study [94], the authors proposed a pose- and expression-invariant algorithm for 3D face recognition. The pose of the probe face image is corrected by employing an intrinsic coordinate system (ICS)-based approach. For feature extraction, the study employed region-based principal component analysis (PCA). The classification module was implemented using the Mahalanobis Cosine (MahCos) distance metric and a weighted Borda count method through a re-ranking stage. The methodology is validated using two face recognition datasets, GavabDB and FRGC v2.0.

In another 3D face recognition algorithm [95], the authors employed a two-pass face alignment method capable of handling frontal and profile face images using the ICS and a minimum nose-tip-scanner distance-based approach. Face recognition in multiview mode was performed using PCA-based features employing a multistage unified classifier and SVM. The performance of the methodology is corroborated using four image benchmarks: GavabDB, Bosphorus, UMB-DB, and FRGC v2.0.
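A minimal sketch of PCA-based feature matching with the MahCos metric follows, assuming MahCos is computed as cosine similarity in the whitened PCA space; the toy gallery vectors stand in for real 3D face data, and the re-ranking and multistage classification stages of [94, 95] are omitted.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on training faces X (n_samples x n_features); keep k components."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k], S[:k] / np.sqrt(len(X) - 1)  # mean, components, per-axis std

def mahcos(a, b, std):
    """Mahalanobis Cosine: cosine similarity after whitening each PCA axis."""
    aw, bw = a / std, b / std
    return aw @ bw / (np.linalg.norm(aw) * np.linalg.norm(bw) + 1e-12)

# toy gallery of 10 'face' vectors and one probe (stand-ins for real scans)
rng = np.random.default_rng(2)
gallery = rng.random((10, 64))
probe = gallery[4] + 0.01 * rng.standard_normal(64)   # noisy copy of identity 4
mean, comps, std = pca_fit(gallery, k=5)
g_feat = (gallery - mean) @ comps.T
p_feat = (probe - mean) @ comps.T
scores = [mahcos(p_feat, g, std) for g in g_feat]
print(int(np.argmax(scores)))   # index of the best-matching gallery identity
```

Whitening by the per-axis standard deviation is what distinguishes MahCos from plain cosine similarity: low-variance components are not allowed to dominate the match score.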

Table 5: A summary of the performance of deep-learning-based approaches for CBIR.

Authors | Datasets | Purpose | Model | Accuracy
Krizhevsky et al. [87] | ILSVRC-2010 and ILSVRC-2012 | Image classification | CNN | 37.50% top-1 and 17.00% top-5 error rates on ILSVRC-2010 and 15.3% top-5 error rate on ILSVRC-2012
Sun et al. [88] | LFW (Labeled Faces in the Wild) | Face verification | ConvNets DeepID | 97.45% accuracy
Karpathy and Fei-Fei [89] | Flickr8K, Flickr30K, and MSCOCO | Generation of descriptions of image regions | CNN and multimodal RNN | Encouraging results
Li et al. [90] | MIRFlickr and NUS-WIDE | Social image understanding | DCE | CBIR performance of 0.512 on MIRFlickr and 0.632 on NUS-WIDE with k = 1000
Kondylidis et al. [82] | INRIA Holidays, Oxford 5k, Paris 6k, UKBench | Content-based image retrieval | CNN-based tf-idf | Improved results
Shi et al. [83] | 5356 skeletal muscle and 2176 lung cancer images with four types of diseases | Histopathology image classification and retrieval | PDRH algorithm | 97.49% classification accuracy and MAP (97.49% and 97.33%)

In a recently published research [96], the authors introduced a novel approach for the alignment of facial surfaces, which transforms the pose of the acquired face into an aligned frontal view based on the three-dimensional variance of the facial data. The facial features are extracted using kernel Fisher analysis (KFA) in a subject-specific perspective based on iso-depth curves. The classification of the faces is performed using four classification algorithms. The methodology is tested on the GavabDB and FRGC v2.0 3D face databases.

In another recently proposed research [97], the authors proposed a deeply learned pose-invariant image analysis algorithm with applications in 3D face recognition. The face alignment in the proposed methodology was accomplished using a nose-tip heuristic-based pose learning approach followed by a coarse-to-fine alignment algorithm. The feature extraction module is implemented through a deep-learning algorithm using AlexNet. The classification is performed using AlexNet and SVM in separate experiments employing the GavabDB, Bosphorus, UMB-DB, and FRGC v2.0 3D face databases.

In [98], a hybrid model for age-invariant face recognition has been presented. Specifically, face images are represented by generative and discriminative models. Deep networks are then used to extract discriminative features. The deeply learned generative and discriminative matching scores are then fused to obtain the final recognition accuracies. The approach is suitable for recognizing face images across a variety of challenging datasets, such as MORPH and FG-Net.

In [99], demographic traits including age group, gender, and race have been used to enhance the recognition accuracies of face images across challenging aging variations. First, convolutional neural networks are used to extract age-, gender-, and race-specific face features. These features, in conjunction with deeply learned features, are used to recognize and retrieve face images. The experimental results suggest that the recognition and retrieval rates can be enhanced significantly by demographic-assisted face features.

In [100], facial asymmetry-based anthropometric dimensions have been used to estimate the gender and ethnicity of a given face image. A regression model is first used to determine the discriminative dimensions. The gender- and ethnicity-specific dimensions are subsequently applied to train a neural network for the face classification task. The study is significant for analyzing the role of facial asymmetry-based dimensions in estimating the gender and race of a test face image.

Asymmetric face features have been used to grade facial palsy disease in [101]. More specifically, a generative adversarial network (GAN) has been used to estimate the severity of facial palsy disease for a given face image. Deeply learned features from the face image are then used to grade the facial palsy into one of five grades according to benchmark definitions.

A matching-scores space-based face recognition scheme has been presented in [102]. Local, global, and densely sampled asymmetric face features have been used to build a matching-scores space. A probe face image can be recognized based on the matching scores in the proposed space. The study is very significant for analyzing the impact of age on facial asymmetry.

The role of facial asymmetry-based age group estimation in recognizing face images across temporal variations has been studied in [103]. First, the age group of a probe face image is estimated using facial asymmetry. The information learned from the age group estimation is then used to recognize face images across aging variations more effectively.

In [104], data augmentation has been effectively used to recognize face images across makeup variations. The authors used six celebrity-famous makeup styles to augment the face datasets. The augmented datasets are then used to train a deep network. Face recognition experiments show the effectiveness of the proposed approach in recognizing face images across artificial makeup variations on a variety of challenging datasets.

More recently, the impact of asymmetric left and asymmetric right face images on accurate age estimation has been studied in [105]. The study analyzes how the accuracy of age estimation is influenced by the left and right half-face images. The extensive experimental results suggest that asymmetric right face images can be used to estimate the exact age of a probe face image more accurately.

3D face recognition is an active area of research and underpins numerous applications [94–97]. However, it is a challenging problem due to the complex nature of the facial manifold. The existing methods based on holistic, local, and hybrid features show competitive performance but are still short of what is needed [94–97]. Alignment of the facial surfaces is another key step in obtaining state-of-the-art performance. Novel and accurate alignment algorithms may further enhance face recognition accuracies. On the other hand, deep-learning algorithms, successfully employed in various image processing applications, need to be explored to improve 3D face recognition performance.

In the above-presented studies [98–105], handcrafted and deeply learned face features have been introduced for robust face recognition. The experimental results suggest that deeply learned face features can surpass the performance of handcrafted features. The results have been reported on aging datasets such as MORPH, FG-Net, CACD, and FERET. In the future, the presented studies can be extended to analyze the impact of deeply learned densely sampled features on face recognition performance. Moreover, new datasets such as LAP-1 and LAP-2 can also be used for face recognition and age estimation.

10. Distance Measures

Different distance measures are applied to the feature vectors to compute the similarity between the query images and the images placed in the archive. The distance measure is selected according to the structure of the feature vector, and it indicates the similarity. Effective image retrieval is dependent on the type of similarity applied, as it matches the object regions, background, and objects in the image. According to the literature [76], it is a challenging task to find an adequate and robust distance measure. For a detailed summary of the popular distance measures that are commonly used in CBIR, the reader is referred to the article [76]. Figure 11 represents the concept of top-5 to top-25 image retrieval results on the basis of search by query image.
basis of search by query image. each query and is given by
􏽐​ Sq�1 AP(q)
MAP � , (5)
11. Performance Evaluation Criteria S
where S is the number of queries.
There are various performance evaluation criteria for CBIR
and they are handled in a predefined standard. It is im-
portant to mention here that there is no single standard 11.6. Precision-Recall Curve. Rank-based retrieval systems
rule/criterion to evaluate the CBIR performance. There are display appropriate sets of top-k retrieved images. The P and
set of some common measures that are reported in the R values for each set are demonstrated graphically by the
literature. The selection of any measure among the criteria PRcurve. The PRcurve shows the trade-off between P and R
mentioned below depends on the application domain, user under different thresholds.
requirement, and the nature of the algorithm itself. The Many other evaluation measures have also been pro-
following performance evaluation criteria are commonly posed in the literature as averaged normalized modified
used. retrieval rank (ANMRR) [106]. It has been applied for
18 Mathematical Problems in Engineering
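The measures defined in equations (1)–(5) can be computed directly from a ranked result list. The sketch below assumes binary relevance labels and interprets R(k) in equation (4) as a relevance indicator at rank k.

```python
def precision_recall(relevant_retrieved, retrieved_total, relevant_total):
    """Equations (1) and (2): P = tp / N_TR, R = tp / N_RI."""
    return relevant_retrieved / retrieved_total, relevant_retrieved / relevant_total

def f_measure(p, r):
    """Equation (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

def average_precision(ranked_relevance, relevant_total):
    """Equation (4): mean of the precision values at each relevant rank.

    ranked_relevance: list of 0/1 relevance flags for the ranked results of one query.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / relevant_total

def mean_average_precision(per_query_aps):
    """Equation (5): mean of AP over a set of queries."""
    return sum(per_query_aps) / len(per_query_aps)

# top-5 results of one query: relevant at ranks 1, 3, and 4; 4 relevant images exist
ranks = [1, 0, 1, 1, 0]
p, r = precision_recall(sum(ranks), len(ranks), 4)
print(p, r)                                    # 0.6 0.75
print(round(average_precision(ranks, 4), 4))   # (1/1 + 2/3 + 3/4) / 4 = 0.6042
```

Note that AP rewards placing relevant images early in the ranking: swapping the hits at ranks 1 and 2 above would lower the AP even though P and R at the top-5 cutoff stay the same.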

Figure 11: Example of top-5 to top-25 image retrieval results on the basis of search by query image.

12. Conclusion and Future Directions

We have presented a comprehensive literature review of different techniques for CBIR and image representation. The main focus of this study is to present an overview of the different techniques that have been applied in research models over the last 12–15 years. From this review, it is summarized that image feature representation is performed using low-level visual features such as color, texture, spatial layout, and shape. Due to the diversity of image datasets and nonhomogeneous image properties, images cannot be represented using a single feature representation. One of the solutions for increasing the performance of CBIR and image representation is to use low-level features in fusion. The semantic gap can be reduced by using the fusion of different local features, as they represent the image in the form of patches, and performance is enhanced when the fusion of local features is used. The combination of local and global features is also one of the directions for future research in this area. Previous research on CBIR and image representation relied on traditional machine learning approaches, which have shown good results in various domains. The optimization of feature representation in terms of feature dimensions can provide a strong framework for the learning of a classification-based model, and it will not face problems like overfitting. Recent research on CBIR has shifted to the use of deep neural networks, which have shown good results on many datasets and have outperformed handcrafted features, subject to the condition of fine-tuning of the network. Large-scale image datasets and high-performance computational machines are the main requirements for any deep network. It is a difficult and time-consuming task to manage a large-scale image dataset for the supervised training of a deep network. Therefore, the performance evaluation of a deep network on a large-scale unlabeled dataset in unsupervised learning mode is also one of the possible future research directions in this area.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] D. Zhang, M. M. Islam, and G. Lu, “A review on automatic image annotation techniques,” Pattern Recognition, vol. 45, no. 1, pp. 346–362, 2012.
[2] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.
[3] T. Khalil, M. U. Akram, H. Raja, A. Jameel, and I. Basit, “Detection of glaucoma using cup to disc ratio from spectral domain optical coherence tomography images,” IEEE Access, vol. 6, pp. 4560–4576, 2018.
[4] S. Yang, L. Li, S. Wang, W. Zhang, Q. Huang, and Q. Tian, “SkeletonNet: a hybrid network with a skeleton-embedding process for multi-view image representation learning,” IEEE Transactions on Multimedia, vol. 1, no. 1, 2019.
[5] W. Zhao, L. Yan, and Y. Zhang, “Geometric-constrained multi-view image matching method based on semi-global optimization,” Geo-Spatial Information Science, vol. 21, no. 2, pp. 115–126, 2018.
[6] W. Zhou, H. Li, and Q. Tian, “Recent advance in content-based image retrieval: a literature survey,” 2017, https://arxiv.org/abs/1706.06064.
[7] A. Amelio, “A new axiomatic methodology for the image similarity,” Applied Soft Computing, vol. 81, p. 105474, 2019.
[8] C. Celik and H. S. Bilge, “Content based image retrieval with sparse representations and local feature descriptors: a comparative study,” Pattern Recognition, vol. 68, pp. 1–13, 2017.
[9] T. Khalil, M. Usman Akram, S. Khalid, and A. Jameel, “Improved automated detection of glaucoma from fundus image using hybrid structural and textural features,” IET Image Processing, vol. 11, no. 9, pp. 693–700, 2017.
[10] L. Amelio, R. Janković, and A. Amelio, “A new dissimilarity measure for clustering with application to dermoscopic images,” in Proceedings of the 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–8, IEEE, Zakynthos, Greece, July 2018.
[11] S. Susan, P. Agrawal, M. Mittal, and S. Bansal, “New shape descriptor in the context of edge continuity,” CAAI Transactions on Intelligence Technology, vol. 4, no. 2, pp. 101–109, 2019.
[12] L. Piras and G. Giacinto, “Information fusion in content based image retrieval: a comprehensive overview,” Information Fusion, vol. 37, pp. 50–60, 2017.
[13] L. Amelio and A. Amelio, “Classification methods in image analysis with a special focus on medical analytics,” in Machine Learning Paradigms, pp. 31–69, Springer, Basel, Switzerland, 2019.

[14] D. Ping Tian, “A review on image feature extraction and representation techniques,” International Journal of Multimedia and Ubiquitous Engineering, vol. 8, no. 4, pp. 385–396, 2013.
[15] D. Zhang and G. Lu, “Review of shape representation and description techniques,” Pattern Recognition, vol. 37, no. 1, pp. 1–19, 2004.
[16] R. Datta, J. Li, and J. Z. Wang, “Content-based image retrieval: approaches and trends of the new age,” in Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 253–262, ACM, Singapore, November 2005.
[17] Z. Yu and W. Wang, “Learning DALTS for cross-modal retrieval,” CAAI Transactions on Intelligence Technology, vol. 4, no. 1, pp. 9–16, 2019.
[18] N. Ali, D. A. Mazhar, Z. Iqbal, R. Ashraf, J. Ahmed, and F. Zeeshan, “Content-based image retrieval based on late fusion of binary and local descriptors,” International Journal of Computer Science and Information Security (IJCSIS), vol. 14, no. 11, 2016.
[19] N. Ali, Image Retrieval Using Visual Image Features and Automatic Image Annotation, University of Engineering and Technology, Taxila, Pakistan, 2016.
[20] B. Zafar, R. Ashraf, N. Ali et al., “Intelligent image classification-based on spatial weighted histograms of concentric circles,” Computer Science and Information Systems, vol. 15, no. 3, pp. 615–633, 2018.
[21] G. Qi, H. Wang, M. Haner, C. Weng, S. Chen, and Z. Zhu, “Convolutional neural network based detection and judgement of environmental obstacle in vehicle operation,” CAAI Transactions on Intelligence Technology, vol. 4, no. 2, pp. 80–91, 2019.
[22] U. Markowska-Kaczmar and H. Kwaśnicka, “Deep learning–a new era in bridging the semantic gap,” in Bridging the Semantic Gap in Image and Video Analysis, pp. 123–159, Springer, Basel, Switzerland, 2018.
[23] F. Riaz, S. Jabbar, M. Sajid, M. Ahmad, K. Naseer, and N. Ali, “A collision avoidance scheme for autonomous vehicles inspired by human social norms,” Computers & Electrical Engineering, vol. 69, pp. 690–704, 2018.
[24] H. Shao, Y. Wu, W. Cui, and J. Zhang, “Image retrieval based on MPEG-7 dominant color descriptor,” in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS 2008), pp. 753–757, IEEE, Hunan, China, November 2008.
[25] X. Duanmu, “Image retrieval using color moment invariant,” in Proceedings of the 2010 Seventh International Conference on Information Technology: New Generations (ITNG), pp. 200–203, IEEE, Las Vegas, NV, USA, April 2010.
[26] X.-Y. Wang, B.-B. Zhang, and H.-Y. Yang, “Content-based image retrieval by integrating color and texture features,” Multimedia Tools and Applications, vol. 68, no. 3, pp. 545–569, 2014.
[27] H. Zhang, Z. Dong, and H. Shu, “Object recognition by a complete set of pseudo-Zernike moment invariants,” in Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 930–933, IEEE, Dallas, TX, USA, March 2010.
[28] J. M. Guo, H. Prasetyo, and J. H. Chen, “Content-based image retrieval using error diffusion block truncation coding features,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 466–481, 2015.
[29] Y. Liu, D. Zhang, and G. Lu, “Region-based image retrieval with high-level semantics using decision tree learning,” Pattern Recognition, vol. 41, no. 8, pp. 2554–2570, 2008.
[30] M. M. Islam, D. Zhang, and G. Lu, “Automatic categorization of image regions using dominant color based vector quantization,” in Proceedings of the Digital Image Computing: Techniques and Applications, pp. 191–198, IEEE, Canberra, Australia, December 2008.
[31] Z. Jiexian, L. Xiupeng, and F. Yu, “Multiscale distance coherence vector algorithm for content-based image retrieval,” The Scientific World Journal, vol. 2014, Article ID 615973, 13 pages, 2014.
[32] G. Papakostas, D. Koulouriotis, and V. Tourassis, “Feature extraction based on wavelet moments and moment invariants in machine vision systems,” in Human-Centric Machine Vision, InTech, London, UK, 2012.
[33] G.-H. Liu, Z.-Y. Li, L. Zhang, and Y. Xu, “Image retrieval based on micro-structure descriptor,” Pattern Recognition, vol. 44, no. 9, pp. 2123–2133, 2011.
[34] X.-Y. Wang, Z.-F. Chen, and J.-J. Yun, “An effective method for color image retrieval based on texture,” Computer Standards & Interfaces, vol. 34, no. 1, pp. 31–35, 2012.
[35] R. Ashraf, K. Bashir, A. Irtaza, and M. Mahmood, “Content based image retrieval using embedded neural networks with bandletized regions,” Entropy, vol. 17, no. 6, pp. 3552–3580, 2015.
[36] A. Irtaza and M. A. Jaffar, “Categorical image retrieval through genetically optimized support vector machines (GOSVM) and hybrid texture features,” Signal, Image and Video Processing, vol. 9, no. 7, pp. 1503–1519, 2015.
[37] C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 1–27, 2011.
[38] S. Fadaei, R. Amirfattahi, and M. R. Ahmadzadeh, “Local derivative radial patterns: a new texture descriptor for content-based image retrieval,” Signal Processing, vol. 137, pp. 274–286, 2017.
[39] X. Wang and Z. Wang, “A novel method for image retrieval based on structure elements’ descriptor,” Journal of Visual Communication and Image Representation, vol. 24, no. 1, pp. 63–74, 2013.
[40] N.-E. Lasmar and Y. Berthoumieu, “Gaussian copula multivariate modeling for texture image retrieval using wavelet transforms,” IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2246–2261, 2014.
[41] Z. Hong and Q. Jiang, “Hybrid content-based trademark retrieval using region and contour features,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications-Workshops (AINAW 2008), pp. 1163–1168, IEEE, Okinawa, Japan, March 2008.
[42] N. Ali, K. B. Bajwa, R. Sablatnig et al., “A novel image retrieval based on visual words integration of SIFT and SURF,” PLoS One, vol. 11, no. 6, Article ID e0157428, 2016.
[43] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: spatial pyramid matching for recognizing natural scene categories,” in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Volume 2 (CVPR’06), pp. 2169–2178, IEEE, New York, NY, USA, June 2006.
[44] Z. Mehmood, S. M. Anwar, N. Ali, H. A. Habib, and M. Rashid, “A novel image retrieval based on a combination of local and global histograms of visual words,” Mathematical Problems in Engineering, vol. 2016, Article ID 8217250, 12 pages, 2016.
[45] M. Naeem, R. Ashraf, N. Ali, M. Ahmad, and M. A. Habib, “Bottom up approach for better requirements elicitation,” in Proceedings of the International Conference on Future Networks and Distributed Systems, p. 60, ACM, Cambridge, UK, July 2017.
[46] B. Zafar, R. Ashraf, N. Ali et al., “A novel discriminating and relative global spatial image representation with applications in CBIR,” Applied Sciences, vol. 8, no. 11, p. 2242, 2018.
[47] N. Ali, B. Zafar, F. Riaz et al., “A hybrid geometric spatial image representation for scene classification,” PLoS One, vol. 13, no. 9, Article ID e0203339, 2018.
[61] C. Li, Y. Huang, and L. Zhu, “Color texture image retrieval based on Gaussian copula models of Gabor wavelets,” Pattern Recognition, vol. 64, pp. 118–129, 2017.
[62] H. H. Bu, N. Kim, C. J. Moon, and J. H. Kim, “Content-based image retrieval using combined color and texture features extracted by multi-resolution multi-direction filtering,” Journal of Information Processing Systems, vol. 13, no. 3, pp. 464–475, 2017.
[63] A. Nazir, R. Ashraf, T. Hamdani, and N. Ali, “Content based image retrieval system by using HSV color histogram, discrete wavelet transform and edge histogram
[48] B. Zafar, R. Ashraf, N. Ali, M. Ahmed, S. Jabbar, and descriptor,” in Proceedings of the 2018 International
S. A. Chatzichristofis, “Image classification by addition of Conference on Computing, Mathematics and Engineering
spatial information based on histograms of orthogonal Technologies (iCoMET), pp. 1–6, IEEE, Sukkur, Pakistan,
vectors,” PLoS One, vol. 13, no. 6, Article ID e0198175, 2018. March 2018.
[49] N. Ali, K. B. Bajwa, R. Sablatnig, and Z. Mehmood, “Image [64] L.-W. Kang, C.-Y. Hsu, H.-W. Chen, C.-S. Lu, C.-Y. Lin, and
retrieval by addition of spatial information based on his- S.-C. Pei, “Feature-based sparse representation for image
tograms of triangular regions,” Computers & Electrical En- similarity assessment,” IEEE Transactions on Multimedia,
gineering, vol. 54, pp. 539–550, 2016. vol. 13, no. 5, pp. 1019–1030, 2011.
[50] R. Khan, C. Barat, D. Muselet, and C. Ducottet, “Spatial [65] Z.-Q. Zhao, H. Glotin, Z. Xie, J. Gao, and X. Wu, “Co-
orientations of visual word pairs to improve bag-of-visual- operative sparse representation in two opposite directions
words model,” in Proceedings of the British Machine Vision for semi-supervised image annotation,” IEEE Transactions
Conference, pp. 89–91, BMVA Press, Surrey, UK, September on Image Processing, vol. 21, no. 9, pp. 4218–4231, 2012.
2012. [66] J. J. Thiagarajan, K. N. Ramamurthy, P. Sattigeri, and
[51] H. Anwar, S. Zambanini, and M. Kampel, “A rotation-in- A. Spanias, “Supervised local sparse coding of sub-image
variant bag of visual words model for symbols based ancient
features for image retrieval,” in Proceedings of the 2012 19th
coin classification,” in Proceedings of the 2014 IEEE In-
IEEE International Conference on Image Processing (ICIP),
ternational Conference on Image Processing (ICIP),
pp. 3117–3120, IEEE, Melbourne, Australia, September-
pp. 5257–5261, IEEE, Paris, France, October 2014.
October 2012.
[52] H. Anwar, S. Zambanini, and M. Kampel, “Efficient scale-
[67] C. Hong and J. Zhu, “Hypergraph-based multi-example
and rotation-invariant encoding of visual words for image
ranking with sparse representation for transductive learning
classification,” IEEE Signal Processing Letters, vol. 22, no. 10,
image retrieval,” Neurocomputing, vol. 101, pp. 94–103, 2013.
pp. 1762–1765, 2015.
[68] D. Wang, S. C. Hoi, Y. He, J. Zhu, T. Mei, and J. Luo,
[53] R. Khan, C. Barat, D. Muselet, and C. Ducottet, “Spatial
“Retrieval-based face annotation by weak label regularized
histograms of soft pairwise similar patches to improve the
bag-of-visual-words model,” Computer Vision and Image local coordinate coding,” IEEE Transactions on Pattern
Understanding, vol. 132, pp. 102–112, 2015. Analysis and Machine Intelligence, vol. 36, no. 3, pp. 550–563,
[54] N. Ali, B. Zafar, M. K. Iqbal et al., “Modeling global geo- 2014.
metric spatial information for rotation invariant classifica- [69] M. Srinivas, R. R. Naidu, C. S. Sastry, and C. K. Mohan,
tion of satellite images,” PLos One, vol. 14, no. 7, Article ID “Content based medical image retrieval using dictionary
e0219833, 2019. learning,” Neurocomputing, vol. 168, pp. 880–895, 2015.
[55] R. Ashraf, M. Ahmed, S. Jabbar et al., “Content based image [70] S. Mohamadzadeh and H. Farsi, “Content-based image re-
retrieval by using color descriptor and discrete wavelet trieval system via sparse representation,” IET Computer
transform,” Journal of Medical Systems, vol. 42, no. 3, p. 44, Vision, vol. 10, no. 1, pp. 95–102, 2016.
2018. [71] Q. Li, Y. Han, and J. Dang, “Sketch4Image: a novel
[56] R. Ashraf, M. Ahmed, U. Ahmad, M. A. Habib, S. Jabbar, and framework for sketch-based image retrieval based on
K. Naseer, “MDCBIR-MF: multimedia data for content- product quantization with coding residuals,” Multimedia
based image retrieval by using multiple features,” Multi- Tools and Applications, vol. 75, no. 5, pp. 2419–2434, 2016.
media Tools and Applications, pp. 1–27, 2018. [72] H. Wu, R. Bie, J. Guo, X. Meng, and S. Wang, “Sparse coding
[57] Y. Mistry, D. Ingole, and M. Ingole, “Content based image based few learning instances for image retrieval,” Multimedia
retrieval using hybrid features and various distance metric,” Tools and Applications, vol. 78, no. 5, pp. 6033–6047, 2018.
Journal of Electrical Systems and Information Technology, [73] Y. Duan, J. Lu, J. Feng, and J. Zhou, “Context-aware local
vol. 5, no. 3, pp. 878–888, 2017. binary feature learning for face recognition,” IEEE Trans-
[58] K. T. Ahmed, M. A. Iqbal, and A. Iqbal, “Content based actions on Pattern Analysis and Machine Intelligence, vol. 40,
image retrieval using image features information fusion,” no. 5, pp. 1139–1153, 2018.
Information Fusion, vol. 51, pp. 76–99, 2018. [74] J. Li and J. Z. Wang, “Real-time computerized annotation of
[59] P. Liu, J.-M. Guo, K. Chamnongthai, and H. Prasetyo, pictures,” in Proceedings of the 14th ACM International
“Fusion of color histogram and LBP-based features for Conference on Multimedia, pp. 911–920, ACM, Santa Bar-
texture image retrieval and classification,” Information Sci- bara, CA, USA, October 2006.
ences, vol. 390, pp. 95–111, 2017. [75] G. Griffin, A. Holub, and P. Perona, Caltech-256 Object Cat-
[60] W. Zhou, H. Li, J. Sun, and Q. Tian, “Collaborative index egory Dataset, California Institute of Technology, Pasadena,
embedding for image retrieval,” IEEE Transactions on Pat- CA, USA, 2007, https://authors.library.caltech.edu/7694/.
tern Analysis and Machine Intelligence, vol. 40, no. 5, [76] A. Alzu’bi, A. Amira, and N. Ramzan, “Semantic content-
pp. 1154–1166, 2018. based image retrieval: a comprehensive study,” Journal of
Mathematical Problems in Engineering 21

Visual Communication and Image Representation, vol. 32, [93] C. Zhang, J. Cheng, and Q. Tian, “Multiview semantic
pp. 20–54, 2015. representation for visual recognition,” IEEE Transactions on
[77] C. Zhang, J. Cheng, J. Liu, J. Pang, Q. Huang, and Q. Tian, Cybernetics, pp. 1–12, 2018.
“Beyond explicit codebook generation: visual representation [94] N. I. Ratyal, I. A. Taj, U. I. Bajwa, and M. Sajid, “3D face
using implicitly transferred codebooks,” IEEE Transactions recognition based on pose and expression invariant align-
on Image Processing, vol. 24, no. 12, pp. 5777–5788, 2015. ment,” Computers & Electrical Engineering, vol. 46,
[78] C. Zhang, C. Liang, L. Li, J. Liu, Q. Huang, and Q. Tian, pp. 241–255, 2015.
“Fine-grained image classification via low-rank sparse [95] N. Ratyal, I. Taj, U. Bajwa, and M. Sajid, “Pose and expression
coding with general and class-specific codebooks,” IEEE invariant alignment based multi-view 3D face recognition,”
Transactions on Neural Networks and Learning Systems, KSII Transactions on Internet & Information Systems, vol. 12,
vol. 28, no. 7, pp. 1550–1559, 2016. no. 10, 2018.
[79] C. Zhang, J. Cheng, and Q. Tian, “Structured weak semantic [96] N. I. Ratyal, I. A. Taj, M. Sajid, N. Ali, A. Mahmood, and
space construction for visual categorization,” IEEE Trans- S. Razzaq, “Three-dimensional face recognition using vari-
actions on Neural Networks and Learning Systems, vol. 29, ance-based registration and subject-specific descriptors,”
no. 8, pp. 3442–3451, 2017. International Journal of Advanced Robotic Systems, vol. 16,
[80] C. Zhang, J. Cheng, and Q. Tian, “Semantically modeling of no. 3, article 1729881419851716, 2019.
object and context for categorization,” IEEE Transactions on [97] N. Ratyal, I. A. Taj, M. Sajid et al., “Deeply learned pose
invariant image analysis with applications in 3D face rec-
Neural Networks and Learning Systems, vol. 30, no. 4,
ognition,” Mathematical Problems in Engineering, vol. 2019,
pp. 1013–1024, 2018.
Article ID 3547416, 21 pages, 2019.
[81] C. Zhang, J. Cheng, and Q. Tian, “Unsupervised and semi-
[98] M. Sajid and T. Shafique, “Hybrid generative–discriminative
supervised image classification with weak semantic consis-
approach to age-invariant face recognition,” Journal of
tency,” IEEE Transactions on Multimedia, 2019.
Electronic Imaging, vol. 27, no. 2, article 023029, 2018.
[82] N. Kondylidis, M. Tzelepi, and A. Tefas, “Exploiting tf-idf in
[99] M. Sajid, T. Shafique, S. Manzoor et al., “Demographic-
deep convolutional neural networks for content based image assisted age-invariant face recognition and retrieval,” Sym-
retrieval,” Multimedia Tools and Applications, vol. 77, no. 23, metry, vol. 10, no. 5, p. 148, 2018.
pp. 30729–30748, 2018. [100] M. Sajid, T. Shafique, I. Riaz et al., “Facial asymmetry-based
[83] X. Shi, M. Sapkota, F. Xing, F. Liu, L. Cui, and L. Yang, anthropometric differences between gender and ethnicity,”
“Pairwise based deep ranking hashing for histopathology Symmetry, vol. 10, no. 7, p. 232, 2018.
image classification and retrieval,” Pattern Recognition, [101] M. Sajid, T. Shafique, M. Baig, I. Riaz, S. Amin, and
vol. 81, pp. 14–22, 2018. S. Manzoor, “Automatic grading of palsy using asymmetrical
[84] L. Zhu, J. Shen, L. Xie, and Z. Cheng, “Unsupervised visual facial features: a study complemented by new solutions,”
hashing with semantic assistant for content-based image Symmetry, vol. 10, no. 7, p. 242, 2018.
retrieval,” IEEE Transactions on Knowledge and Data En- [102] M. Sajid, I. A. Taj, U. I. Bajwa, and N. I. Ratyal, “The role of
gineering, vol. 29, no. 2, pp. 472–486, 2017. facial asymmetry in recognizing age-separated face images,”
[85] A. Alzu’bi, A. Amira, and N. Ramzan, “Content-based image Computers & Electrical Engineering, vol. 54, pp. 255–270,
retrieval with compact deep convolutional features,” Neu- 2016.
rocomputing, vol. 249, pp. 95–105, 2017. [103] M. Sajid, I. A. Taj, U. I. Bajwa, and N. I. Ratyal, “Facial
[86] H.-F. Yang, K. Lin, and C.-S. Chen, “Supervised learning of asymmetry-based age group estimation: role in recognizing
semantics-preserving hash via deep convolutional neural age-separated face images,” Journal of Forensic Sciences,
networks,” IEEE Transactions on Pattern Analysis and Ma- vol. 63, no. 6, pp. 1727–1749, 2018.
chine Intelligence, vol. 40, no. 2, pp. 437–451, 2018. [104] M. Sajid, N. Ali, S. H. Dar et al., “Data augmentation-assisted
[87] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet makeup-invariant face recognition,” Mathematical Problems
classification with deep convolutional neural networks,” in Engineering, vol. 2018, Article ID 2850632, 10 pages, 2018.
Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2012. [105] M. Sajid, N. Iqbal Ratyal, N. Ali et al., “The impact of
[88] Y. Sun, X. Wang, and X. Tang, “Deep learning face repre- asymmetric left and asymmetric right face images on ac-
sentation from predicting 10,000 classes,” in Proceedings of curate age estimation,” Mathematical Problems in Engi-
the IEEE Conference on Computer Vision and Pattern Rec- neering, vol. 2019, Article ID 8041413, 10 pages, 2019.
ognition, pp. 1891–1898, Columbus, OH, USA, June 2014. [106] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to
[89] A. Karpathy and L. Fei-Fei, “Deep visual-semantic align- MPEG-7: Multimedia Content Description Interface, John
ments for generating image descriptions,” in Proceedings of Wiley & Sons, Hoboken, NJ, USA, 2002.
the IEEE Conference on Computer Vision and Pattern Rec- [107] S. A. Chatzichristofis, C. Iakovidou, Y. S. Boutalis, and
ognition, pp. 3128–3137, Boston, MA, USA, June 2015. E. Angelopoulou, “Mean normalized retrieval order
[90] Z. Li, J. Tang, and T. Mei, “Deep collaborative embedding for (MNRO): a new content-based image retrieval performance
social image understanding,” IEEE Transactions on Pattern measure,” Multimedia Tools and Applications, vol. 70, no. 3,
Analysis and Machine Intelligence, vol. 41, no. 9, pp. 2070– pp. 1767–1798, 2014.
2083, 2018.
[91] C. Zhang, J. Cheng, and Q. Tian, “Multiview label sharing for
visual representations and classifications,” IEEE Transactions
on Multimedia, vol. 20, no. 4, pp. 903–913, 2018.
[92] C. Zhang, J. Cheng, and Q. Tian, “Multiview, few-labeled
object categorization by predicting labels with view con-
sistency,” IEEE Transactions on Cybernetics, vol. 49, no. 11,
pp. 3834–3843, 2019.