Accepted Manuscript: Information Fusion
PII: S1566-2535(17)30789-3
DOI: https://doi.org/10.1016/j.inffus.2018.11.004
Reference: INFFUS 1042
Please cite this article as: Khawaja Tehseen Ahmed , Shahida , Muhammad Amjad Iqbal , Content
Based Image Retrieval using Image Features Information Fusion, Information Fusion (2018), doi:
https://doi.org/10.1016/j.inffus.2018.11.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Highlights
Information fusion of spatial color information with shape and object features
Image retrieval based on primitive and spatial features
Competitive image retrieval against state-of-the-art descriptors and benchmarks
Abstract— Recent image retrieval techniques focus on multiple image features for efficient image retrieval. It has become an inevitable requirement to fetch images from a variety of semantic groups and datasets, and it is vital to retrieve images based on their primitive features of shape, texture, color and spatial information to cater for versatile image datasets. State-of-the-art detectors and descriptors are capable of finding interest points based on their specialty. To encompass the strength of image features for information fusion, this contribution presents a novel technique that fuses spatial color information with shape-extracted features and object recognition. For the RGB channels, L2 spatial color arrangements are applied and features are extracted, which are then fused with intensity-ranged shapes formed by connecting the edges and corners discovered in the grey-level image. Perifoveal receptive field estimation with 128-bit cascade matching and symmetric sampling on the detected interest points discovers the potential information for complex, overlay, foreground and background objects. The process is accomplished firstly by reducing the massive feature vectors and selecting the high-variance coefficients, and secondly by obtaining indexing and retrieval through a Bag-of-Words approach. Extensive experiments are conducted on ten highly recognized image dataset benchmarks specialized for texture, shapes, colors and objects, including ImageNet, Caltech-256, Caltech-101, 102-Flower, Corel-10000, 17-Flower, Corel-1000, COIL, ALOT and FTVL tropical fruits. To check the effectiveness and robustness of the proposed method, it is compared with the state-of-the-art detectors and descriptors SIFT, SURF, HOG, LBP, DoG, MSER and RGBLBP. Encouraging results show that the proposed method has a remarkable performance in most of the image categories of these versatile image datasets and can achieve better precision than the state-of-the-art detectors and descriptors.
Keywords— Content Based Image Retrieval, interest point detection, image descriptor, features fusion, bag of words, spatial color features,
shape features, objects recognition, features selection, information fusion.
I. INTRODUCTION
Image retrieval is the task of searching for similar images of a certain type in image datasets. In recent years, real-life applications of image retrieval have gained great interest in the research community. Content-based image retrieval (CBIR) is a widespread technique gradually applied in retrieval systems [1]. In CBIR, image retrieval is performed using visual characteristics, also known as features, extracted from the database. A CBIR system's retrieval accuracy and efficiency rely greatly on the adopted visual features. Visual features may describe numerous properties of either low-level features, which include shape [2], color [3, 4], spatial relationship and texture [5, 6], or high-level features, also called semantic features. Earlier research in CBIR motivated encoding the spatial arrangement of colors due to the problem of having several diverse images with identical or similar color histograms. This problem is now being reconsidered with the visual dictionary model, which treats documents as bags of words (BoW). This model has numerous important advantages, including compactness and invariance to image or scene transformations, and is one of the most popular feature representations in the CBIR framework.
Shape-based methods are important for representing different types of shapes by employing diverse methods. These methods are grouped into region-based and contour-based techniques. Region-based methods are applied to general shapes by representing interior shape information; they use Zernike moments [2] and geometric moments [7] for region representation. Contour-based methods employ shape signatures [8], autoregressive models [9] and polygonal approximation [10]. These methods exploit boundary information, which may not be available in some cases and is also crucial to human perception. It is important to couple the geometrical shape description with object recognition. Properly described image features are capable of representing the object. In grey-scale images, objects of different semantic groups may be recognized as the same, such as a lemon and a ball. It is therefore necessary to distinguish objects with similar shapes by their primary color; shape and color together can distinguish objects more specifically. Furthermore, adding spatial color information to the feature vectors can increase retrieval accuracy. Shape detection for regions confirms a better understanding of objects. It is also favorable for information retrieval to add the pixel intensity values to the feature vectors: detecting corner and edge information using pixel intensities can create powerful and robust detectors and descriptors. Grey-level changes used to find edges and corners also lead to connecting these interest points to form regions, and these connected regions are a great resource of information for object detection. Hence, feature selection techniques combined with primitive features can obtain more accurate results. Moreover, structural elements [11], optimized local patterns [12], edge binary patterns [13], and their combinations [14] report significant results in content-based image retrieval. Color is an important feature that is distinctly recognized by the human eye. Color features also contain fundamental information for object visualization; therefore, proper color analysis reveals spatial and object information. Shape detection based on edges and corners incorporating color boundaries is a powerful recipe for accurate object detection. Some images contain the same color information at different axes; therefore, the spatial coefficients possess important object attributes. In cases where color and shape mimic a similar object of different categories, as observed in fruit datasets, the texture information distinguishes the objects. Therefore, color in spatial coefficients, shape attributes and texture values, in integration, recognize complex and overlay objects from versatile image groups.
This contribution incorporates spatial color coefficients along with shape feature description and object recognition. The resultant feature vector carries the accumulated features to identify shapes and recognize objects from versatile image categories significantly. The novelty of the proposed method is to extract color features from RGB images and to use the grey-level image for pixel-intensity-based local features. The grey-level image provides a better understanding of the edges and corners that form the shapes, and the robust proposed method is capable of finding objects based on the detected shapes. The perifoveal receptive field estimation method returns the best descriptive signatures for the detected object boundaries. To efficiently process large datasets, dimension reduction is performed on the feature vectors by selecting the high-variance principal components. The compact and robust feature set is then input to a bag-of-words model for quick indexing and retrieval. The proposed method reports significant results against highly recognized benchmarks, existing research methods and state-of-the-art detectors and descriptors.
The remainder of this paper is organized as follows. Section 2 presents related work and Section 3 explains the proposed methodology. The experimental results are briefed in Section 4, and the findings are summarized in Section 5.
II. RELATED WORK
In the current era, CBIR methods employ local features with a Bag-of-Words (BoW) representation. To achieve better accuracy, studies have focused on the primitive features [15]: color, texture, spatial information and shape. A comparative study of feature extraction using shape and texture is proposed by [16]. In this approach, the Gray Level Co-occurrence Matrix (GLCM) and shape-invariant Hu moments are used, and a comparison is performed by combining shape and texture features; precision and recall are used to compute retrieval accuracy. Shape is a visual cue in the field of object recognition, and binary shapes are classified on the basis of detection, feature extraction, vector quantization and classification. A classification framework for invariant features using the BoW model is presented by [17]; experimentation is performed on an animal shapes dataset using a shape classifier. Two methods based on image descriptors are introduced by [18]. Based on SIFT extraction, k-means clustering is performed on the feature matrix, and two types of dimensionality reduction techniques are applied to obtain more precision. Experimental results are computed for Caltech-101 and Li database images. Another work on 3D shape retrieval is introduced by [19]. In this method, an evaluation is performed among 12 and 6 dissimilar 3D shape retrieval methods; the experimentation is accomplished by comparing twenty-six retrieval methods on a common benchmark evaluation. In another technique [20], images are characterized with a color auto-correlogram, Gabor Wavelet and Wavelet Transform. In the first step, features associated with color information are derived from the RGB color space; secondly, features with texture information are extracted with the proposed feature extraction method; in the third step, shape-based image extraction is performed using edge and corner detection. An image retrieval approach is proposed by extracting color-texture features named the Color Directional Local Quinary Pattern [21]. RGB channel-wise directional edge information is extracted for the reference and surrounding pixels, and experimentation is performed on the Corel-5000 and MIT-Color databases. [22] presented a method that extends and simplifies the functionality of the MPEG-7 descriptor by using four image extraction methods together to generate interest points; the experiments are conducted mainly on UCID and UKBench. Object classification is presented by [23] using local image features to generate fuzzy classifiers. A meta-learning approach is used to find the local features, and the method is tested on three classes of the PASCAL VOC (Visual Object Classes) dataset. By combining shape and texture features, [24] proposed a technique that employs an exponent moments descriptor to extract shape features and a localized angular phase histogram for texture features. Shape features are extracted in the RGB color space and texture features are extracted in the hue-saturation-intensity (HSI) color space. [25] focused on the most relevant features by suggesting a feature selection technique to maximize selection with simplification. For this, a 3D color histogram and Gabor filters are used to extract the color and texture features, and a genetic algorithm is applied to reduce the complexity of the feature space. Experiments are conducted on the Corel-1000 dataset, with average precision rates reported for complex background images. Another attempt to retrieve images without feature fusion works in stages [26]. In the first stage, a fixed number of images is retrieved by color features; more relevant images are then filtered by applying texture and shape features. Computation cost is reduced by eliminating the fusion and normalization steps. Experimentation is performed on the Corel-1000 dataset with a 0.767 precision rate, but the method is unable to classify images based on spatial information. An ANN classifier [27] is used with a color co-occurrence matrix (CCM) and the difference between pixels of scan pattern (DBPSP) to extract color and texture features. A feature matching strategy is also presented to compute the similarity value; it reports below-average rates for overlay object images. In another attempt [28], two image channels are used to capture image properties as opposed to multichannel descriptors. For performance improvement a hyperopponent color space is proposed by embedding Sobel filter information; this method is unable to classify foreground and background objects.
The technique presented in this paper introduces a new method that: 1) finds images by their primitive features of color, shape and texture; 2) finds the interest points that potentially represent the objects; 3) finds feature sets to classify images from multiple categories; 4) recognizes complex and cluttered objects for better precision; 5) finds spatial color features to better classify the images; 6) introduces a novel image retrieval method to effectively index and retrieve images; 7) tests the proposed technique on texture, object, color and complex-background image databases to comparatively check precision rates; 8) checks robustness and effectiveness by comparing the proposed method with state-of-the-art detectors and descriptors; 9) presents an efficient image retrieval method with low computation cost; and 10) compares the proposed method with existing research methods to evaluate the performance.
III. METHODOLOGY
A. Keypoints detection and shape formation
The first step is to find the interest points of the query image. The input image is first converted to a gray-level representation (0-255). The grey-level image contains the potential patterns that highlight the keypoint candidates, and these keypoints are better represented by their intensities. Intensity ranging is an interesting phenomenon for threshold estimation that leads to finding the corners and edges. We propose a novel method that detects regions based on their intensities, which are converted to shapes by connecting these keypoints. In the next phase the detected interest points are described by Fast Retina Keypoints (FREAK).
Figure-1: Proposed method showing the step-by-step image retrieval process for an input image using feature information fusion.
For a gray-level image, thresholds are defined over the 0 and 1 ranges such that all values below the threshold are 0 and all values above are 1; these 0s and 1s represent black and white values respectively. Regions of interest are found by connecting the threshold values in a series after sorting the pixels by their intensity. A bin (counting) sort is applied at this step because the gray-level range is small (0-255), which yields an optimal computational complexity of O(n). The pixel values, in ascending or descending order, are joined to create the regions. The area of these regions is computed using a union-find algorithm; the complexity of this implementation is O(n log log n). The area of each component is stored as a function of intensity. Overlapped regions are formed by inserting the pixels of small regions into larger ones. This algorithm mimics the structure of the watershed algorithm but differs in output: in the watershed algorithm the threshold is only used to merge regions and is therefore a minor concern, whereas in our approach the threshold is the major concern.
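As a rough illustration of this step, the following Python sketch (not the authors' implementation; all names are illustrative) labels the 4-connected pixel regions obtained at a single intensity threshold with a union-find structure and returns the area of each component. Sweeping the threshold over the bin-sorted intensity levels would reproduce the nested regions described above.

```python
import numpy as np

def threshold_regions(gray, threshold):
    """Area of every 4-connected region of pixels on the dark side of the threshold."""
    h, w = gray.shape
    parent = np.arange(h * w)              # union-find forest, one node per pixel

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    mask = gray <= threshold
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            idx = y * w + x
            if x + 1 < w and mask[y, x + 1]:    # right neighbour
                union(idx, idx + 1)
            if y + 1 < h and mask[y + 1, x]:    # bottom neighbour
                union(idx, idx + w)

    areas = {}                                  # root pixel index -> component area
    for p in np.flatnonzero(mask.ravel()):
        root = find(int(p))
        areas[root] = areas.get(root, 0) + 1
    return areas
```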
The connected regions can be well defined if the gray-level image is a mapping

$I : D \subset \mathbb{Z}^2 \rightarrow S, \qquad S = \{0, 1, \ldots, 255\},$  (1)

where the order relation on $S$ is reflexive and antisymmetric and the gray values lie in the range 0-255. Being interest-point values, the returned values are discrete. Moreover, in this approach four-neighborhoods are used. The adjacency relation for this can be defined as

$A \subset D \times D$  (2)

where $a, b \in D$ are adjacent ($aAb$) if the condition

$\sum_{i=1}^{2} |a_i - b_i| \le 1$  (3)

is satisfied. For each region $Q \subseteq D$ and each pair $a, b \in Q$ there always exists a sequence

$p_1, p_2, \ldots, p_n \in Q$  (4)

$\sum_{i=1}^{2} \left| (p_x)_i - (p_{x+1})_i \right| \le 1, \qquad x = 1, \ldots, n-1,$  (5)

where for $x = 1$ the value is $a$ and for $x = n$ it is $b$. The outer pixels that are adjacent to, but not part of, the region represent the outer boundary, which can be written in the form

$\partial Q = \{\, a \in D \setminus Q : \exists\, b \in Q,\ aAb \,\}.$  (6)

If, for all $a \in Q$ and $b \in \partial Q$, either $I(a) > I(b)$ or $I(a) < I(b)$ holds, then the values of $I(a)$ and $I(b)$ create the intensity ranges of the regions. The boundaries of these intensity-based regions are the potential interest-point candidates.
Gaussian kernels. Log-polar retinal patterns are observed by changing the size of the Gaussian kernel for better performance, and the receptive fields are overlapped to increase performance. Consider the receptive fields with smoothed intensities $I_i$ (Equation 7), where the overlap between fields has an impact on the response.

Figure-3: Each circle is a receptive field of the smoothed image, using a Gaussian kernel. [29]

Overlapped fields yield redundancy, which also exists in the real receptive fields of the retina. To describe a keypoint, the intensity difference is computed for each pair of receptive fields with their corresponding Gaussian kernels. Precisely, the descriptor $F$ is a string formed by one-bit Difference-of-Gaussian (DoG) [30] comparisons:

$F = \sum_{0 \le a < N} 2^a\, T(P_a)$  (8)

where

$T(P_a) = \begin{cases} 1 & \text{if } I(P_a^{r_1}) - I(P_a^{r_2}) > 0 \\ 0 & \text{otherwise,} \end{cases}$  (9)

and $I(P_a^{r_1})$ is the smoothed intensity of the first receptive field of the pair $P_a$. A combination of receptive fields is capable of producing an enormous number of pairs. However, not all pairs carry useful information, so one approach is to compute their spatial difference; by applying this, highly correlated pairs lose some useful information. Another approach to select appropriate pairs is to create a matrix with $k = 1000$ rows, where $E$ denotes the extracted keypoints and each row represents a keypoint in the retina sampling pattern with 43 receptive fields. The mean of each column is computed to obtain the desired rate of variance and to find the discriminant features; a mean of 0.5 gives the highest variance. In the next step the columns are sorted in ascending order with respect to the value of variance, and columns with low correlation are added to the column with mean value 0.5. In this pair arrangement a coarse-to-fine ordering of Differences of Gaussians exists. To evaluate the location of an object, perifoveal receptive fields are applied, while densely distributed receptive fields in the fovea area are used for the validation process. The FREAK [29] descriptor acts like saccadic searching, in which the observer's eye moves around with discontinuous individual movements. The high-density photoreceptors of the fovea capture high-resolution information and are therefore important players in object recognition and matching. The perifoveal area captures low-frequency, less detailed information; as a result, the location of the objects of interest can be described easily.
The saccadic search adopted by the FREAK descriptor starts by searching with the first 16 bytes of the descriptor. To extract finer-level information, the next bytes are used for analysis; the cascade of matching yields further detail, although most of the information is well represented by the first 16 bytes. To estimate the rotation of a keypoint, local gradients are summed over selected pairs. The pairs are selected with receptive fields that are symmetric about the center, as opposed to a global orientation, as shown in figure-4.
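The coarse-to-fine matching idea can be illustrated with a short, hedged sketch (the function below is hypothetical and not the authors' code): candidate descriptors are first ranked by the Hamming distance of their leading 16 bytes, and only the surviving candidates are compared with the full binary descriptor.

```python
import numpy as np

def cascade_match(query, database, coarse_bytes=16, keep_ratio=0.1):
    """query: (D,) uint8 descriptor; database: (N, D) uint8 descriptors.
    Returns the index of the best match after a two-stage Hamming comparison."""
    # Coarse pass: Hamming distance on the first 16 bytes only.
    coarse = np.unpackbits(database[:, :coarse_bytes] ^ query[:coarse_bytes],
                           axis=1).sum(axis=1)
    survivors = np.argsort(coarse)[: max(1, int(len(database) * keep_ratio))]
    # Fine pass: full-descriptor Hamming distance on the survivors.
    fine = np.unpackbits(database[survivors] ^ query, axis=1).sum(axis=1)
    return int(survivors[np.argmin(fine)])
```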
$O = \frac{1}{N} \sum_{P_o \in H} \left( I(P_o^{r_1}) - I(P_o^{r_2}) \right) \frac{P_o^{r_1} - P_o^{r_2}}{\left\| P_o^{r_1} - P_o^{r_2} \right\|}$  (10)

where $N$ is the number of pairs in $H$ and $P_o^{r_i}$ is the 2D vector of the spatial coordinates of the center of the receptive field.
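As a hedged, minimal sketch of this descriptor stage (assuming opencv-contrib-python is installed and using MSER as a stand-in detector for the intensity-based regions; the file name and parameters are illustrative, not the authors' settings):

```python
import cv2

gray = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical query image

# Keypoint candidates from intensity-stable regions (a stand-in for the
# threshold-connected regions described in Section III-A).
keypoints = cv2.MSER_create().detect(gray)

# Retinal FREAK descriptor: overlapping Gaussian receptive fields with
# coarse-to-fine pair selection and orientation from symmetric pairs.
freak = cv2.xfeatures2d.FREAK_create(orientationNormalized=True,
                                     scaleNormalized=True)
keypoints, descriptors = freak.compute(gray, keypoints)  # one binary row per keypoint
```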
The returned feature vectors contain useful information about the image, but they are large and high-dimensional. To reduce these massive and redundant feature vectors, covariance coefficients are produced using principal component analysis. The patterns in the data are analyzed and compressed to reduce the dimensions without losing information; this is done by applying an orthogonal transformation that converts correlated variables into linearly uncorrelated variables. The number of returned coefficients equals the total number of observations, and these limited principal components represent the patterns in the data.
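A minimal sketch of this reduction step, assuming scikit-learn and treating the stacked descriptors as a real-valued matrix (the function and parameter names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

def select_high_variance_components(descriptors, n_components=64):
    """descriptors: (n_keypoints, n_dims) array. Projects the data onto its
    highest-variance principal components and returns compact coefficients."""
    pca = PCA(n_components=n_components)
    return pca.fit_transform(np.asarray(descriptors, dtype=np.float64))
```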
In this step the spatial constraints of an image are considered for image ranking. For this, estimated transformations are applied to map the feature locations between the query and result images. A color histogram captures only the color distribution and is unable to represent any spatial information; in our case, the spatial correlation of color changes is expressed with distance.
For an image $I$, let $n$ colors be quantized as $c_1, \ldots, c_n$. The set of pixels of a color $c$ can be defined as

$I_c = \{\, p \in I : I(p) = c \,\}.$  (11)

To compute the distance between color pixels $p_1$ and $p_2$, we define

$d(p_1, p_2) = \sqrt{|x_1 - x_2|^2 + |y_1 - y_2|^2}$  (12)

$\gamma_{c_i, c_j}^{(l)}(I) = \Pr\!\left[\, p_2 \in I_{c_j} \mid p_1 \in I_{c_i},\ d(p_1, p_2) = l \,\right]$  (14)

Equation 14 shows the representation of the spatial arrangement of color pixels in the image: it denotes the probability that a pixel of color $c_j$ is at distance $l$ from a given pixel of color $c_i$. The spatial relationship between similar color values is represented as

$\alpha_c^{(l)}(I) = \gamma_{c,c}^{(l)}(I)$  (15)

Equation 15 is derived from Equation 14, where $\alpha_c^{(l)}$ represents the probability of finding a pixel of color $c$ at distance $l$ from another pixel of the same color.
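To make the spatial color step concrete, the following hedged sketch estimates an auto-correlogram-style feature on a color-quantized image. The eight sampled directions and the wrap-around at image borders (from np.roll) are simplifications, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def spatial_color_feature(quantized, distances=(1, 3, 5, 7), n_colors=64):
    """quantized: 2D int array of color indices in [0, n_colors).
    For each color c and distance l, estimate the probability that a pixel
    sampled at distance l from a pixel of color c also has color c."""
    feats = []
    for l in distances:
        matches = np.zeros(n_colors)
        counts = np.zeros(n_colors)
        offsets = [(-l, 0), (l, 0), (0, -l), (0, l),
                   (-l, -l), (-l, l), (l, -l), (l, l)]
        for dy, dx in offsets:
            shifted = np.roll(np.roll(quantized, dy, axis=0), dx, axis=1)
            same = quantized == shifted
            counts += np.bincount(quantized.ravel(), minlength=n_colors)
            matches += np.bincount(quantized[same].ravel(), minlength=n_colors)
        feats.append(matches / np.maximum(counts, 1))
    return np.concatenate(feats)
```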
In the next stage, the Bag-of-Words (BoW) model is adopted for fast image indexing and retrieval. In the BoW model each image is represented by a single linear vector. One strength of the BoW model is that it employs the power of a local feature descriptor such as SIFT (Scale Invariant Feature Transform) [31]. Secondly, single-vector BoW comparison is performed using a dissimilarity score that is easily manipulated. Moreover, the sparse representation of large-dimensional data results in fast indexing and searching. The local feature descriptor SIFT represents patches as numerical vectors, producing a collection of equal-size vectors of dimension 128. These local features are then clustered, and the cluster centers represent the codewords. The occurrence count of each codeword (also called a visual word) is represented as a histogram for each image. Based on these histograms, an inverted image index is generated for efficient image retrieval. Each index entry represents one visual word, and a list of image identities is created to map the terms to images. Depending upon the query image, a set of images with similar visual words is indexed by applying a union of lists. Finally, image ranking is performed by counting the number of visual words shared between the query image and the indexed images; a high number of shared words brings an image's rank to the top. BoW is unable to capture the co-occurrence, location and spatial information of the visual words. Our spatial color extraction method embeds the spatial information into the feature vectors at the time of feature extraction, which results in more relevant retrieved images.
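A minimal sketch of this indexing stage, assuming scikit-learn and real-valued (e.g. PCA-projected) local features; the vocabulary size and names are illustrative, not the authors' settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(stacked_descriptors, n_words=500):
    """Cluster local descriptors from the training images into visual words."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(stacked_descriptors)

def bow_histogram(descriptors, vocabulary):
    """Quantize one image's descriptors and return its visual-word histogram."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def inverted_index(histograms):
    """Map each visual word to the ids of the images that contain it."""
    index = {}
    for image_id, hist in enumerate(histograms):
        for word in np.flatnonzero(hist):
            index.setdefault(int(word), []).append(image_id)
    return index

def rank_by_shared_words(query_hist, index):
    """Rank database images by the number of visual words shared with the query."""
    shared = {}
    for word in np.flatnonzero(query_hist):
        for image_id in index.get(int(word), []):
            shared[image_id] = shared.get(image_id, 0) + 1
    return sorted(shared, key=shared.get, reverse=True)
```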
IV. EXPERIMENTATION
A. Datasets
The accuracy and effectiveness of an image retrieval system is reflected by the selection of suitable image databases. Some datasets are tailored to the project nature and some contributions are domain oriented; consequently, it is difficult to inter-compare results across different databases. The precision is affected by image attributes like shape, object, size, feature overlapping and occlusion [43].
In our experimentation phase, widely used databases are selected depending upon their versatile image categories, generic CBIR usage, object occlusion, spatial color and object information. The experiments are performed on 10 standardized benchmarks: ImageNet [32], Corel-1000 [33], Corel-10,000 [34], 17-Flowers [35], 102-Flowers [36], FTVL [37], COIL-100 [38], Caltech-101 [39], Caltech-256 [40] and ALOT [41].
In the experiments, each image is considered a potential candidate query image. For each query image, the top 20 retrieved images are displayed based on the indexing technique used by the bag-of-words method. The resultant images are considered correctly classified if they belong to their respective class.
Table-1: Image database summary showing the number of classes, the number of images in each class, the total number of images, and the image size for each dataset.

Dataset            | No. of Classes | Images in Each Class | Images (Total) | Image Size
Corel-1000 [33]    | 10             | 100                  | 1,000          | 384 × 256
Corel-10,000 [34]  | 100            | 100                  | 10,000         | 128 × 85
17-Flowers [35]    | 17             | 80                   | 1,360          | Vary
102-Flowers [36]   | 102            | 40-258               | 8,189          | Vary
FTVL [37]          | 15             | Vary                 | 2,612          | 1,024 × 768
ImageNet [32]      | 100,000        | Vary                 | 14,197,122     | Vary
COIL-100 [38]      | 100            | 72                   | 7,200          | 128 × 128
Caltech-101 [39]   | 102            | Vary                 | 9,146          | 300 × 200
Caltech-256 [40]   | 257            | Vary                 | 30,607         | Vary
ALOT [41]          | 250            | 100                  | 25,000         | 384 × 235
1) ImageNet Synset
ImageNet [32] is a large-scale image database used for indexing and retrieving complex, multi-category images. It contains more than 100,000 subsets (called synsets), which are labeled by 80,000+ nouns, and the repository holds an enormous collection of more than 14 million high-resolution images. In our case the experimentation is performed on 15 synsets including aeria, car, cherry-tomato, coffee cup, dish, dust-bag, flag, flower, gas-fixture, golf-ball, Nard-Aeromatic Ointment, Wooly bear-caterpillar, Radio-Telephone, Scooter and Walnut.
Figure-5: (a) Shows one sample image from each category for ImageNet synset. (b) Shows one sample image from each category for
FTVL tropical fruits dataset.
2) FTVL Dataset
The FTVL database [37] contains 2,612 images of fruits and vegetables in 15 categories: Agata Potato, Asterix Potato, Cashew, Diamond Peach, Fuji Apple, Granny Smith Apple, Honeydew Melon, Kiwi, Nectarine, Onion, Orange, Plum, Spanish Pear, Watermelon and Taiti Lime.
3) Caltech-101
Caltech-101 [39] is a widely used benchmark for image classification and object recognition. It has 9,146 images split into 102 semantic groups. For the experimentation, we selected fifteen versatile categories: airplane, things, Bonsai, brain, Face-easy, Buddha, Butterfly, Chandeliers, Tortoise, Ketch, Leopard, motorbikes, Wrist Watch, Ewer and Face.
Figure-6: (a) Shows one sample image from each category for Caltech-101 dataset. (b) Shows one sample image from each category
for 102-flower categories dataset.
These versatile categories contain images with spatial information, object orientation, shape, texture and color information to check the effectiveness of the proposed method. In total, 1,350 images split into 15 categories are used for the experiments.
4) 102-Flower
The 102-Flower categories dataset [36] consists of 8,189 images of flowers commonly found in the United Kingdom. The number of images in each class varies from 40 to 258. This is a versatile image dataset to check the color and shape effectiveness of the proposed method.
5) Corel-1000
The Corel-1000 dataset [33] is a widely accepted benchmark used for image classification and retrieval. The Corel database consists of different image categories, from plain-background images to complex objects. This dataset comprises 1,000 images split into 10 categories: Natives, beach sites, buildings, buses of different colors, dinosaurs on blank backgrounds, elephants, a variety of flowers, horses on multiple backgrounds, mountains that mimic the beach category, and food. Each category consists of 100 images of size 256×384 or 384×256.
Figure-7: (a) Shows one sample image from each category for Corel-1000 dataset. (b) Shows one sample image from each category
for 17-flower categories dataset.
6) 17-Flower
The 17-category flower dataset [35] consists of 80 images per class. The flower images are selected from flowers common in the UK and exhibit pose and light variations; in some classes the flowers resemble each other in color but differ in shape. Figure-7 shows sample images from each class. The seventeen flower types are Daffodils, Snowdrop, Lily Valley, Bluebell, Crocus, Iris, Tigerlily, Tulip, Fritillary, Sunflower, Daisy, Colts' Foot, Dandelion, Cowslip, Buttercup, Windflower and Pansy.
7) Corel-10000
The Corel 10,000 (or 10K) image dataset [34] comprises several image categories and is used to validate the results of content-based image retrieval systems. The Corel-10K database contains one hundred categories, each consisting of 100 images. A variety of semantic groups is available for results evaluation, and the image size is 85×128 or 128×85 in JPEG format. The 15 categories selected for evaluation are Butterfly, Cars, Ketch, Planets, Texture, flags, Shining stars, Hospital, Text, Sunset, Flowers, Food, Human Texture, Trees and Animals.
Figure-8: (a) Shows one sample image from each category for Corel-10000 dataset. (b) Shows one sample image from each category
for Caltech-256 categories dataset.
8) Caltech-256
Caltech-256 [40] is a large-scale dataset containing 30,607 images grouped into 257 categories. Caltech-256 has more complex images than Caltech-101. We selected 15 diverse image categories: Tomato, Billiards, back-pack, Bowling-ball, boxing-glove, teddy-bear, Teapot, bonsai, bulldozer, Airplane, wrist watch, spider, butterfly, cactus and swan. The selected categories contribute texture patterns, shapes, complex objects and cluttered scenes. The image size is not fixed and varies from category to category and within categories.
9) COIL-100
The Columbia Object Image Library (COIL-100) [38] dataset contains 7,200 images categorized into 100 different semantic groups. Each category presents 72 distinct orientations of the same object rotated through 360 degrees, and Figure-9 shows a sample image from each category. Images are captured with a fixed camera against a black background, and the image size is 128×128 for all categories.
Figure-9: (a) Shows one sample image from each category for COIL-100 dataset. (b) Shows one sample image from each category
for ALOT dataset.
10) ALOT
The ALOT dataset [41] is a collection of 25,000 images split into 250 categories, each consisting of 100 images. This dataset is used to test the texture capability of the proposed descriptor. All images have the same size of 384 × 235.
B. Results
1) Input process
In the first step, a dataset is selected for training and testing. All images from small/medium datasets, or specific categories from large datasets, are selected for feature extraction, which is performed as described in the methodology section. A random image is selected as the query image from any category; features are extracted for the query image, and BoW is used to index and search the k-nearest images. The robust and well-designed proposed descriptor is capable of distinguishing the potential interest points.
C. Experimental Results
1) Results of the Corel-1000, FTVL and 17-Flower datasets with existing methods
To check the effectiveness of the proposed method, it is compared with existing methods that have reported remarkable performance and whose comparative work is cited by current researchers. For the Corel-1000 dataset the proposed method is compared with Kundu et al. [42], ElAlami et al. [27], Rao et al. [43], Esmel et al. [25], Poursistani et al. [44], Shriv et al. [26], Dubey et al. [45], Xiao et al. [28] and Pan et al. [46]; these existing research methods are evaluated on the Corel-1000 dataset. [42] applied statistical features computed using Multi-scale Geometric Analysis (MGA) and performs well in the horse, elephant and bus categories, but it is unable to find foreground objects against complex backgrounds. By applying color and texture patterns, [43] retrieves images with better retrieval rates in the dinosaur, flower and bus categories; however, it is unable to report better rates in all categories due to the limited object recognition capability of the method. [44] proposed vector quantization (VQ) techniques and a codebook generated using k-means clustering for image retrieval, which reports better rates for non-cluttered images. [45] reports average results for the beach and mountain categories, which differ in semantic and spatial information. Objects are better recognized by [46], but it reports below-average retrieval rates for the bus and dinosaur categories.
Figure-10: Comparison of the average precision obtained by the proposed method with other standard retrieval systems using the Corel-1000 dataset.
Figure-10 shows the results of the proposed method compared with the existing research methods. The proposed method has improved results compared to those methods, reporting significant performance in many image categories and better average precision in most of them.
The FTVL tropical fruit dataset is also compared against existing methods; here the proposed method is compared with standardized methods that work with color and texture techniques. The results in figure-11 show that the proposed method outperforms them in many image categories, indicating that it works equally well for color and texture images.
Figure-11: Comparison of the average precision obtained by the proposed method with other standard retrieval systems using FTVL
dataset.
Figure-12: Comparison of the average precision obtained by the proposed method with other standard retrieval systems using the 17-Flower categories dataset.
Figure-12 shows the results for the 17-flower categories dataset compared with existing research methods that use color and shape attributes. The results of the proposed method are compared with Gao et al. [47], Yihong et al. [48], Yihong et al. w/B [48], Kai et al. w/B [49], Zhou et al. [50] and Zhou et al. w/B [50]. It is observed that the proposed method shows significant performance in most of the flower categories.
Table-2: Comparison of the average precision obtained by the proposed method and other standard retrieval methods.
Corel-1000 Database – Average Precision Rates

Categories | Proposed Method | ElAlami et al. [27] | Rao et al. [43] | Kundu et al. [42] | Shriv et al. [26] | Dubey et al. [45] | Xiao et al. [28] | Esmel et al. [25] | Pan et al. [46] | Poursistani et al. [44]
Africa     | 0.90 | 0.70 | 0.56 | 0.44 | 0.75 | 0.75 | 0.67 | 0.72 | 0.72 | 0.72
Beach      | 0.60 | 0.56 | 0.53 | 0.32 | 0.58 | 0.55 | 0.60 | 0.59 | 0.65 | 0.65
Building   | 0.90 | 0.57 | 0.61 | 0.52 | 0.62 | 0.67 | 0.56 | 0.58 | 0.78 | 0.78
Bus        | 0.75 | 0.87 | 0.89 | 0.62 | 0.80 | 0.95 | 0.96 | 0.89 | 0.52 | 0.52
Dinosaur   | 1.00 | 0.97 | 0.98 | 0.40 | 1.00 | 0.97 | 0.98 | 0.99 | 0.47 | 0.47
Elephant   | 0.70 | 0.67 | 0.57 | 0.80 | 0.75 | 0.63 | 0.53 | 0.70 | 0.67 | 0.67
Flower     | 0.90 | 0.91 | 0.89 | 0.57 | 0.92 | 0.93 | 0.93 | 0.92 | 0.62 | 0.62
Horse      | 1.00 | 0.83 | 0.78 | 0.75 | 0.90 | 0.89 | 0.82 | 0.85 | 0.75 | 0.75
Mountain   | 0.70 | 0.53 | 0.51 | 0.57 | 0.56 | 0.45 | 0.46 | 0.56 | 0.85 | 0.85
Food       | 0.90 | 0.74 | 0.69 | 0.56 | 0.80 | 0.70 | 0.58 | 0.77 | 0.60 | 0.60
Table 2 shows the average precision of the proposed method in comparison with the standard retrieval systems. The proposed method provides better precision in most of the image categories, with outstanding results in the Africa, building, dinosaur, horse and food categories. The proposed method extracts features based on color and shape, which provides more accurate results. The existing methods report good results in some categories; for example, [21] gives better results for dinosaur, and [23, 24] reported good results for flower classification. The proposed method also shows good precision rates in this category.
Table-3: Comparison of the average precision obtained by the proposed method and other standard retrieval methods [51] for the FTVL Tropical Fruits dataset.
FTVL Tropical Fruit Dataset – Average Precision Rates

Categories     | Proposed Method | CCV+CLBP | CCV+LTP | CDH+SHE+CLBP | GCH+LBP | CDH+SHE | CDH+CLBP
Agata Potato   | 0.97 | 1.00 | 1.00 | 0.99 | 1.00 | 0.96 | 0.93
Asterix Potato | 1.00 | 0.98 | 0.99 | 0.96 | 0.96 | 0.96 | 0.95
Cashew         | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
Diamond Peach  | 0.91 | 1.00 | 1.00 | 0.96 | 1.00 | 0.90 | 0.96
Plum           | 1.00 | –    | –    | –    | –    | –    | –
Spanish Pear   | 0.91 | 0.86 | 0.75 | 0.86 | 0.84 | 0.70 | 0.83
Taiti Lime     | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.96 | 1.00
Watermelon     | 0.94 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.98
Table 3 shows the average precision of the proposed method for the fifteen categories of the FTVL tropical fruits dataset. The results of the proposed method are compared with the CCV+CLBP, CCV+LTP, CDH+SHE+CLBP, GCH+LBP, CDH+SHE and CDH+CLBP results [51]. The proposed method reports remarkably high performance in most of the image categories, with the highest precision rates in asterix potato, fuji apple, kiwi, nectarine and Spanish pear. The proposed method is capable of recognizing objects by color and shape, so it outperforms the other methods in most categories of the tropical fruit dataset. The existing methods also report good results in many categories, but the proposed method reports the highest mean average precision.
Table 4: Average precision computed for proposed method vs. existing retrieval methods for 17-Flower categories dataset.
17-Flower Categories Dataset – Average Precision Rates
Categories  | Proposed Method | Gao et al. [47] | Yihong et al. [48] | Yihong et al. w/B [48] | Kai et al. w/B [49] | Zhou et al. [50] | Zhou et al. w/B [50]
Daffodils   | 0.95 | 0.73 | 0.53 | 0.61 | 0.60 | 0.75 | 0.70
Snowdrop    | 0.70 | 0.75 | 0.65 | 0.82 | 0.90 | 0.95 | 0.96
Lily Valley | 0.70 | 0.75 | 0.68 | 0.83 | 0.80 | 0.81 | 0.88
Bluebell    | 0.89 | 0.46 | 0.58 | 0.63 | 0.64 | 0.58 | 0.78
Crocus      | 0.94 | 0.68 | 0.70 | 0.74 | 0.76 | 0.70 | 0.84
Iris        | 1.00 | 0.90 | 0.18 | 0.33 | 0.26 | 0.35 | 0.32
Tiger lily  | 0.85 | 0.80 | 0.45 | 0.73 | 0.60 | 0.60 | 0.72
Tulip       | 0.91 | 0.40 | 0.51 | 0.69 | 0.72 | 0.78 | 0.82
Fritillary  | 0.90 | 0.73 | 0.63 | 0.79 | 0.79 | 0.85 | 0.78
Sunflower   | 0.95 | 0.88 | 0.58 | 0.61 | 0.80 | 0.75 | 0.80
Daisy       | 0.95 | 0.83 | 0.58 | 0.82 | 0.74 | 0.70 | 0.76
Colts Foot  | 0.92 | 0.68 | 0.50 | 0.68 | 0.41 | 0.45 | 0.42
Dandelion   | 0.99 | 0.80 | 0.38 | 0.58 | 0.56 | 0.60 | 0.64
Cowslip     | 0.89 | 0.50 | 0.70 | 0.70 | 0.80 | 0.86 | 0.88
Buttercup   | 0.92 | 0.71 | 0.43 | 0.18 | 0.63 | 0.65 | 0.78
Windflower  | 0.95 | 0.88 | 0.20 | 0.45 | 0.29 | 0.45 | 0.34
Pansy       | 0.93 | 0.83 | 0.58 | 0.51 | 0.75 | 0.80 | 0.78
Table 4 shows the average precision of the proposed method for all categories of the 17-flower categories dataset. The proposed method is compared with the existing research methods Gao et al. [47], Yihong et al. [48], Yihong et al. w/B [48], Kai et al. w/B [49], Zhou et al. [50] and Zhou et al. w/B [50]. The proposed method shows significant precision rates in most of the image categories.
Figure 13 compares the mean average precision and recall of the proposed method and the existing research methods. Figure 13 (a) shows the mean average precision for the Corel-1000 dataset: the proposed method achieves 0.835 mean average precision, which is 6.8% higher than the existing top-ranked method. Similarly, gains of 0.2% for the tropical fruits dataset and 13.4% for the 17-flower categories dataset are obtained. Figure 13 (b) shows the mean average recall rates for the proposed method in comparison with the existing research methods; the proposed method shows better recall rates than the existing methods and has superiority over all of them on object-based, color and texture datasets.
D. Comparison against state-of-the-art keypoint detectors and descriptors
Feature detectors are employed for image classification and object recognition; detectors find potential keypoints, which may be edges, corners, etc. Feature descriptors are used to describe the interest points returned by the detectors. Well-known keypoint detectors and descriptors are used to compare their results with those of the proposed method. Seven state-of-the-art detectors and descriptors are used for the experimentation: SURF [52], HOG [53], RGBLBP [54], MSER [55], DoG [30], LBP [56] and SIFT [31]. The Scale Invariant Feature Transform (SIFT) [31] was presented at the International Conference on Computer Vision (ICCV) and the Histogram of Oriented Gradients (HOG) [53] was presented at the Conference on Computer Vision and Pattern Recognition (CVPR); these descriptors are used for object detection and image classification. Speeded Up Robust Features (SURF) [52] was presented at the European Conference on Computer Vision (ECCV). The Local Binary Pattern (LBP) [56] was presented at the International Conference on Pattern Recognition (ICPR); it is a simple and powerful descriptor for texture recognition. LBP was originally proposed for grey-level images, while RGBLBP [54] is an extension of LBP that has been presented in different forms. Maximally Stable Extremal Regions (MSER) [55] was presented in the Proceedings of the British Machine Vision Conference; the strength of its feature extraction method makes it suitable for object detection.
Experimentation is performed for the descriptors SURF [52], HOG [53], RGBLBP [54], MSER [55], DoG [30], LBP [56] and SIFT [31], and the results are compared with those of the proposed method. For these descriptors the experimentation is performed on the 10 datasets ImageNet, Corel-1000, Corel-10,000, 17-Flowers, 102-Flowers, FTVL, COIL-100, Caltech-101, Caltech-256 and ALOT. For all the descriptors and the proposed method, the same categories are used for each database. The top 20 image retrievals are evaluated to compute the precision of each descriptor, and these top-20 retrievals are iterated 10 times to obtain the exact accuracy of the method. Thus, the average precision for each category is obtained by averaging ten iterations, each containing the top 20 retrieved images.
1) Average Precision (AP) for the proposed method vs. state-of-the-art detectors and descriptors
Average precision is calculated for the proposed method and the state-of-the-art descriptors SURF [52], HOG [53], RGBLBP [54], MSER [55], DoG [30], LBP [56] and SIFT [31]. Precision is the positive predictive value, i.e. the ratio of relevant images to the total retrieved images:

$P(q) = \frac{|I_A(q)|}{|I_R(q)|}$  (16)

where $P$ is the precision, $I_A(q)$ represents the relevant matched images and $I_R(q)$ represents the total retrieved images.

$AP = \frac{1}{n} \sum_{i=1}^{n} P_i$  (17)

where $AP$ is the average precision, $P_i$ is the precision of each iteration for a category and $n$ is the total number of iterations, which is 10 in our case. This means that the experimentation is performed for 10 different query images for each category; the precision results of these ten iterations are then summed and divided by the number of iterations.
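A small sketch of how these two quantities can be computed for one category (names are illustrative; k = 20 and n = 10 follow the protocol described above):

```python
def precision(retrieved_labels, query_label, k=20):
    """Eq. 16: fraction of the top-k retrieved images that share the query's class."""
    top = retrieved_labels[:k]
    return sum(1 for label in top if label == query_label) / float(len(top))

def average_precision(precisions_per_iteration):
    """Eq. 17: mean precision over the n query iterations of one category."""
    return sum(precisions_per_iteration) / float(len(precisions_per_iteration))
```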
Figure 14 shows the average precision for the ten datasets 17-Flowers, 102-Flowers, FTVL, ImageNet, COIL, Caltech-101, Caltech-256, ALOT, Corel-1000 and Corel-10,000. The results of the proposed method are compared with those of the state-of-the-art detectors and descriptors SURF [52], HOG [53], RGBLBP [54], MSER [55], DoG [30], LBP [56] and SIFT [31]. Code is implemented for all state-of-the-art detectors and descriptors, and the experimentation is performed on the same machine. It is observed from the graph results that the novel proposed method has the highest average precision in most of the categories of all datasets. In figure 14 (a) and (b), the color- and texture-based flower datasets, the average precision of the proposed method is the highest in most categories. In the complex object-based datasets Caltech-101 and Caltech-256, the proposed method shows better average precision in many categories. For the texture dataset ALOT, the proposed method shows equally good results in many image categories, and for the challenging ImageNet synsets it reports good average precision. Corel-1000 and Corel-10000 contain versatile image semantic groups, in which the proposed method shows encouraging results. In the FTVL dataset, asterix potato and onion report low precision compared to other categories due to their resemblance; some asterix potatoes are misclassified as onions and vice versa. In the 102-flowers dataset, the flowers of the second category contain multiple colors with varying shapes and are therefore misclassified with other flower classes; hence, the second flower category reports lower precision than the other categories. The cherry category of the ImageNet synsets shows better precision than other categories because the objects in this category are non-cluttered and have similar object colors; HOG also shows better precision for the same reason. In the 17-flower dataset, the flowers with similar colors report low precision, as observed in snowdrop, iris and lily valley.
Figure-14: Average Precision (AP) shown in graphical form for the (a) 17-Flower categories, (b) 102-Flower categories, (c) FTVL
tropical fruits, (d) ImageNet synsets, (e) COIL, (f) Caltech-101, (g) Caltech-256, (h) ALOT, (i) Corel-1000, and (j) Corel-10,000
dataset.
2) Average Recall (AR) for the proposed method vs. state-of-the-art detectors and descriptors
The recall rate is the sensitivity, i.e. the fraction of relevant retrieved instances over the total relevant instances:

$R(q) = \frac{|I_A(q)|}{|I_T(q)|}$  (18)

$AR = \frac{1}{n} \sum_{i=1}^{n} R_i$  (19)

where $I_A(q)$ represents the relevant retrieved images, $I_T(q)$ the total relevant images, $AR$ is the average recall, $R_i$ is the recall of each iteration for a category and $n$ is the total number of iterations, which is 10 in our case. The experimentation is performed on 10 different query images for each semantic group, and the recall results of these ten iterations are aggregated and divided by ten.
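Analogously, a hedged sketch of the recall computation (class_size is the number of relevant images of the query's class in the database; names are illustrative):

```python
def recall(retrieved_labels, query_label, class_size, k=20):
    """Eq. 18: relevant images in the top-k divided by all relevant images."""
    hits = sum(1 for label in retrieved_labels[:k] if label == query_label)
    return hits / float(class_size)

def average_recall(recalls_per_iteration):
    """Eq. 19: mean recall over the n query iterations of one category."""
    return sum(recalls_per_iteration) / float(len(recalls_per_iteration))
```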
Figure 15 shows the average recall rates for the proposed method in comparison with state-of-the-art methods.
Figure-15: Graphical representation of Average Recall (AR) for the (a) 17-Flower categories, (b) 102-Flower categories, (c) FTVL
tropical fruits, (d) ImageNet synsets.
The proposed method shows remarkable average recall rates in most of the image semantic groups, with significant recall rates in the color- and object-based datasets. The FTVL dataset also reports better average recall rates in many categories. ImageNet shows better recall rates in some categories and average recall rates in the rest. In Caltech-101, improved recall rates are observed in the butterfly, chandeliers, tortoise, ketch and leopard categories. The COIL dataset shows equally good recall rates in most image categories, and the Corel-10000 dataset shows above-average recall rates in most categories, with recall rates that are competitive with the state-of-the-art methods. For the textures in the ALOT dataset, the reported average recall rates are encouraging in most image categories. The proposed method can specifically identify shapes and colors and therefore reports better recall rates for the daffodils, crocus, tulip, daisy and similar categories.
Figure-16: Graphical representation of Average Recall (AR) for the (a) COIL, (b) Caltech-101, (c) Caltech-256, and (d) ALOT datasets.
The same effect can be observed in the agata potato, asterix potato, cashew and related categories of the tropical fruit dataset. The ambiguity in recognizing possibly identical objects also results in high recall rates, as observed in the flower and wooly bear-caterpillar categories of the ImageNet synsets. Objects with clear boundaries produce better recall rates, as reported in the butterfly, chandeliers, ketch and similar image categories. In the ALOT dataset, the red-coal and ice-thick categories show a similar effect.
Figure-17: Graphical representation of Average Recall (AR) for the (a) Corel-1000, and (b) Corel-10,000 datasets.
3) Precision and Recall ratio for the proposed method vs. state-of-the-art detectors and descriptors
The precision and recall graph is a useful representation of the relationship between retrieved and relevant images. The graph plots the average precision against the average recall rates; each point represents the proportion of relevant and irrelevant images in the respective category. Figure 18 shows the precision and recall ratio for the ten datasets. It is observed from the graphs that the precision and recall ratio of the proposed method is encouraging in most of the datasets. In Caltech-256 and Corel-10000 the precision and recall ratio is average due to occluded and overlay objects in different categories. In most of the datasets the proposed method shows a better proportion of relevant and retrieved images.
Figure-18: Precision and Recall (P&R) ratio is shown graphically for the (a) 17-Flower categories, (b) 102-Flower categories, (c)
FTVL tropical fruits, (d) ImageNet synsets, (e) COIL, (f) Caltech-101, (g) Caltech-256, (h) ALOT, (i) Corel-1000, and (j) Corel-
10,000 datasets.
The precision and recall ratio of the proposed method is better in the flower datasets; however, for small colored object recognition the state-of-the-art descriptors HOG and SIFT also perform well due to their specialty in object recognition. Similarly, the COIL dataset has a distinctive curve for the proposed method and the object-based descriptors. Caltech-101 and Caltech-256 report better rates for the proposed method. In the Corel-10k dataset, DoG [30] shows improved rates due to the discriminative power of Gaussian differences for these categories; however, the proposed method also shows significant results for small-size images.
4) Average Retrieval Precision (ARP) for the proposed method vs. state-of-the-art detectors and descriptors
The ARP graphs show the Average Retrieval Precision of the proposed method in comparison with the state-of-the-art descriptors. Average Retrieval Precision is computed by:

$ARP = \frac{1}{n} \sum_{i=1}^{n} AP_i$  (20)

where $AP_i$ is the average precision of the i-th category and $n$ is the number of categories considered.
ARP is computed for all the categories of each dataset. The ARP graph shows the data in an order where each bar represents the correctly retrieved images regardless of the category; the x-axis shows the number of categories against the average precision. As the number of categories increases, the precision gradually decreases, because a large number of categories yields a large denominator (n in equation 20). In the Corel-10000 dataset the DoG descriptor outperforms the others. Figures 19-21 show the ARP results for all ten datasets used in the experimentation. These graphs show that the proposed method has the highest ARP for the 17-Flowers, 102-Flowers, FTVL, COIL, Caltech-101, Caltech-256, ALOT and Corel-1000 datasets, while for the ImageNet synsets the proposed method has average ARP.
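A short sketch of how such a cumulative curve can be produced from per-category scores (the same helper serves ARP with AP values and ARR, Eq. 21, with AR values; names are illustrative):

```python
import numpy as np

def cumulative_retrieval_curve(per_category_scores):
    """Value at position n is the mean of the first n per-category scores,
    i.e. the ARP (Eq. 20) or ARR (Eq. 21) when n categories are considered."""
    scores = np.asarray(per_category_scores, dtype=float)
    return np.cumsum(scores) / np.arange(1, len(scores) + 1)
```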
Figure-19: Graphical representation of Average Retrieval Precision (ARP) is shown for the (a) 17-Flower categories, (b) 102-Flower
categories, (c) FTVL tropical fruits, and (d) ImageNet datasets.
Figure 21 shows the ARP for the Corel-1000 and Corel-10000 datasets. For the Corel-1000 dataset the proposed method reports the highest ARP rates, while for Corel-10000 DoG reports better ARP than the other state-of-the-art detectors and descriptors.
Figure-20: Graphical representation of Average Retrieval Precision (ARP) is shown for the (a) COIL, (b) Caltech-101, (c) Caltech-256, and (d) ALOT datasets.
Figure-21: Graphical representation of Average Retrieval Precision (ARP) is shown for the (a) Corel-1000, and (b) Corel-10,000
datasets.
The proposed method shows an outstanding ARP curve for 8 out of 10 datasets. For the ImageNet and Corel-10,000 datasets, MSER, DoG and SIFT show better average retrieval precision by recognizing the objects in some categories more correctly.
5) Average Retrieval Recall (ARR) for the proposed method vs. state-of-the-art detectors and descriptors
The ARR graphs show the Average Retrieval Recall of the proposed method in comparison with the state-of-the-art descriptors. Average Retrieval Recall (ARR) is computed by the formula:

$ARR = \frac{1}{n} \sum_{i=1}^{n} AR_i$  (21)

where $AR_i$ is the average recall of the i-th category and $n$ is the total number of categories. Average retrieval recall is calculated for all categories of every dataset, and to draw the ARR graph the computed ARR values are sorted in ascending order. Interestingly, the proposed method shows remarkable ARR rates on many datasets. The competitive algorithm shows considerable ARR on color- and object-based datasets and also reports better performance for overlay object classes. Compared with the state-of-the-art descriptors, the proposed method shows promising results on most of the image datasets. Figure-22 shows the Average Retrieval Recall (ARR) for the 17-flower categories and 102-flower categories datasets; in both datasets the proposed method has outstanding ARR in all categories.
Figure-22: Average Retrieval Recall (ARR) ratio is shown graphically for the (a) 17-Flower categories and (b) 102-Flower categories
datasets
Figure-23 shows the ARR for the FTVL tropical fruits and ImageNet datasets. The tropical fruits contain round object shapes with different colors; the proposed method successfully recognizes the differently colored objects and shows the best ARR rates. The ImageNet dataset has challenging image categories, and the proposed method reports better ARR rates for the ImageNet synsets: for a small number of categories the ARR rate is above average, and for a higher number of categories it is average.
Figure-23: Average Retrieval Recall (ARR) ratio is shown graphically for the (a) FTVL tropical fruits and (b) ImageNet datasets.
Graphs in figure 24 show the ARR rates for the COIL and Caltech-101 datasets. In the COIL dataset, for 1-15 image categories, the ARR rate is the lowest among all of the state-of-the-art detectors and descriptors. For the Caltech-101 dataset, the Difference of Gaussians (DoG) [30] descriptor has the lowest ARR rate for 1-5 categories, and for 5-15 categories the lowest ARR rates are observed for Caltech-101.
Figure-24: Average Retrieval Recall (ARR) ratio is shown graphically for the (a) COIL and (b) Caltech-101 datasets.
The ARR rates for the Caltech-256 and ALOT datasets are shown in figure-25. The MSER descriptor and the proposed method have the same ARR for one category, and the proposed method has the lowest ARR rates for 2-15 categories. Similarly, in the ALOT dataset the proposed method outperforms the others over all category aggregates.
Figure 26 shows the ARR rates for the Corel-1k and Corel-10k datasets. For the Corel-1k dataset the proposed method has outstanding ARR rates, and it reports average ARR rates for the Corel-10k dataset.
The average retrieval recall curve is a straight line or a low curve for most of the image categories. In the tropical fruits dataset, for the cumulative 15 categories the ARR rates go high due to some misclassified images of agata potato, onion and orange; similarly, taiti lime and granny smith apple cause possible misclassification and affect the ARR rates. In the 17-flowers categories, fritillary and sunflower are similar in shape and size but differ in color.
Figure-25: Average Retrieval Recall (ARR) ratio is shown graphically for the (a) Caltech-256 and (b) ALOT datasets.
The proposed method successfully classifies and indexes these categories by applying the L2 color coefficients. Daffodil and buttercup flowers mimic each other in color, shape and size, therefore the ARR curve rises upward. In the COIL dataset, the pink cup, white cup and cup categories resemble each other and decrease the ARR rates.
Figure-26: Average Retrieval Recall (ARR) ratio is shown graphically for the (a) Corel-1000 and (b) Corel-10,000 datasets.
6) Retrieved images from all datasets with feature extraction and feature retrieval time
Figure 27 shows the top 20 retrieved images for each dataset. Results are not shown for all the categories of all datasets; one output per dataset is shown to represent the results of one category from each dataset. It is observed from the retrieved images that remarkable precision is achieved in most of the image datasets. The proposed method is capable of finding the most relevant and correct images among complex and overlay object images, as shown in (b), and retrieves the most accurate images based on color values, as shown in (a) and (c) for the 102-flower and 17-flower category datasets. The proposed method is able to categorize images based on texture, so the ALOT dataset shows remarkable results (d). The COIL dataset with rotated objects has outstanding accuracy in all categories (f). Corel-1000 is a versatile benchmark that shows high accuracy in many categories (g), and the proposed method shows results for cars with significant accuracy from the Corel-10000 dataset (h). It is also observed that the proposed method shows considerable accuracy for the challenging ImageNet synset dataset (j).
Figure-27: Top 20 retrieved images shown for all datasets: (a) 102-Flower categories, (b) Caltech-101, (c) 17-Flower categories, (d)
ALOT, (e) Caltech-256, (f) COIL, (g) Corel-1000, (h) Corel-10000, (i) FTVL fruits and (j) ImageNet synset datasets. In each dataset
retrieved images are shown for one category.
Feature extraction and retrieval time is an important factor that directly impacts the number of retrieved images. Table-5 shows the
feature extraction and retrieval time for the top 10, 15, 20, 25, 30, 35 and 40 retrieved images, and it indicates that the proposed method
efficiently performs feature extraction and retrieval for images of different sizes and contents.
Table 5: Accuracy, feature extraction time and total time (retrieval time) for the top 10, 15, 20, 25, 30, 35 and 40 retrieved images in each dataset.

17-Flowers
No. of images   Accuracy (%)   Feature extraction time (sec.)   Total time (sec.)
10              99.5           0.176                            1.51
15              98.2           0.175                            1.53
20              85.9           0.180                            1.59
25              90             0.179                            1.61
30              86.4           0.178                            1.64

FTVL
No. of images   Accuracy (%)   Feature extraction time (sec.)   Total time (sec.)
10              96.3           0.181                            1.82
15              94.5           0.185                            1.96
20              93             0.187                            2.04
25              89.8           0.183                            2.2
30              85.6           0.190                            2.83

Corel-1000
No. of images   Accuracy (%)   Feature extraction time (sec.)   Total time (sec.)
10              87.1           0.140                            1.06
15              85.4           0.145                            1.11
20              83.5           0.142                            1.19
25              82.4           0.148                            1.25
30              79.9           0.147                            1.26

Caltech-256
No. of images   Accuracy (%)   Feature extraction time (sec.)   Total time (sec.)
10              74.22          0.140                            0.9
15              73.5           0.145                            0.99
20              71.3           0.120                            1.13
25              69.85          0.141                            1.19

COIL-100
No. of images   Accuracy (%)   Feature extraction time (sec.)   Total time (sec.)
10              93.6           0.138                            0.574
15              93.1           0.138                            0.66
20              92.3           0.138                            0.76
25              85.4           0.137                            0.77
30              80.7           0.139                            0.82
35              72.4           0.140                            0.83
40              69.7           0.140                            0.9

Corel-10000
No. of images   Accuracy (%)   Feature extraction time (sec.)   Total time (sec.)
10              63.89          0.146                            0.328
15              58.2           0.147                            0.338
20              56.7           0.140                            0.451
25              55.4           0.141                            0.552
30              51.4           0.147                            0.599
35              49.5           0.146                            0.659
40              50.67          0.144                            0.755
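To make the measurements in Table-5 concrete, the following is a minimal sketch of how top-k accuracy and the two timings could be recorded per query. The helper names extract_fused_features and index.search are hypothetical placeholders for illustration only, not the paper's actual implementation.

```python
import time
import numpy as np

def accuracy_at_k(retrieved_labels, query_label, k):
    """Fraction (in percent) of the top-k retrieved images sharing the query's category."""
    top_k = retrieved_labels[:k]
    return 100.0 * np.mean([lbl == query_label for lbl in top_k])

def timed_query(query_image, index, k=20):
    """Measure feature-extraction time and total retrieval time for one query."""
    t0 = time.perf_counter()
    features = extract_fused_features(query_image)  # hypothetical fused-descriptor routine
    t_extract = time.perf_counter() - t0
    retrieved = index.search(features, k)           # hypothetical BoW index lookup
    t_total = time.perf_counter() - t0
    return retrieved, t_extract, t_total
```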
7) Mean Average Precision (mAP) for the proposed method vs. state-of-the-art detectors and descriptors
Mean average precision graphs provide an aggregate result for each method in a single bar. Each bar shows the mean average
precision of the respective descriptor on a dataset, giving an instant view of the descriptor's overall precision on that
dataset. In the 17-Flowers classification, the mean average precision of the proposed method is 0.859, which is 31.8% higher than MSER,
the method with the second highest mAP. Similarly, in the 102-Flower categories classification, the proposed method reports 39.7% better results.
Figure-28: Mean Average Precision (mAP) ratio is shown graphically for the (a) 17-Flower categories, (b) 102-Flower categories, (c)
FTVL tropical fruits, and (d) ImageNet datasets.
For the challenging ImageNet synset, the proposed method has a 0.7% higher mAP than the second highest result, obtained by the SIFT [31]
descriptor. The COIL dataset shows an mAP of 0.923 and Caltech-101 reports 0.657. In Corel-10000, the highest mAP is 0.567, reported
by both the proposed method and the Difference of Gaussians descriptor.
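For reference, the mAP values plotted in Figures 28 and 29 follow the standard definition of average precision per query, averaged over all queries of a dataset. A minimal sketch of that computation is given below; the query and ranking data structures are assumptions for illustration.

```python
import numpy as np

def average_precision(relevant, retrieved):
    """AP for one query: mean of the precision values at the ranks of relevant hits."""
    hits, precisions = 0, []
    for rank, img_id in enumerate(retrieved, start=1):
        if img_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(queries):
    """mAP over a dataset: `queries` is a list of (relevant_set, ranked_list) pairs."""
    return float(np.mean([average_precision(rel, ret) for rel, ret in queries]))
```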
Figure-29: Mean Average Precision (mAP) ratio is shown graphically for the (a) COIL, (b) Caltech-101, (c) Caltech-256, (d) ALOT,
(e) Corel-1000, and (f) Corel-10,000 datasets.
V. CONCLUSION
In this paper we proposed a new approach based on the most salient image features for effective image retrieval. We combined
local image features and spatial information in a BoW architecture and evaluated the results on ten popular image collection databases.
The proposed method is compared with seven state-of-the-art detectors and descriptors for performance evaluation, and the results are
further examined against many other methods from the literature to determine its competency and effectiveness. From the experimental
results it is concluded that the proposed method accurately classifies visual objects from challenging datasets, retrieves images with
high precision from complex categories and distinguishes visually similar objects. The results are computed in terms of average precision,
average recall, average retrieval precision, average retrieval recall, precision and recall, and mean average precision, and are compared
with benchmarks and state-of-the-art detectors and descriptors. It is also deduced that the proposed method outperforms the state-of-the-art
detectors and descriptors on color, texture, cluttered object and complex datasets. A future extension of the introduced approach is to
make the technique scale invariant and noise robust.
References
1. Ahmed, K.T. and M.A. Iqbal, Region and texture based effective image extraction. Cluster Computing, 2017: p. 1-10.
2. Shirazi, S.H., et al., Content-Based Image Retrieval Using Texture Color Shape and Region. International Journal of
features. IEEE Transactions on Circuits and Systems for Video Technology, 2015. 25(3): p. 466-481.
5. Verma, M. and B. Raman, Local neighborhood difference pattern: A new feature descriptor for natural and texture image
retrieval. Multimedia Tools and Applications, 2017: p. 1-24.
6. Dubey, S.R., S.K. Singh, and R.K. Singh. Boosting local binary pattern with bag-of-filters for content based image retrieval.
in Electrical Computer and Electronics (UPCON), 2015 IEEE UP Section Conference on. 2015. IEEE.
7. Ahmed, K.T., A. Irtaza, and M.A. Iqbal, Fusion of local and global features for effective image extraction. Applied
Intelligence, 2017: p. 1-18.
8. Kandasamy, A. and M. Sundaram, Multi-layer multi-level color distribution–User feedback model with wavelet analysis for
14. Murala, S., A.B. Gonde, and R.P. Maheshwari. Color and texture features for image indexing and retrieval. in Advance
Computing Conference, 2009. IACC 2009. IEEE International. 2009. IEEE.
15. Dubey, S.R., S.K. Singh, and R.K. Singh, Rotation and illumination invariant interleaved intensity order-based local
descriptor. IEEE Transactions on Image Processing, 2014. 23(12): p. 5323-5333.
16. Bagri, N. and P.K. Johari, A comparative study on feature extraction using texture and shape for content based image
retrieval. International Journal of Advanced Science and Technology, 2015. 80: p. 41-52.
17. Ramesh, B., C. Xiang, and T.H. Lee, Shape classification using invariant features and contextual information in the bag-of-
words model. Pattern Recognition, 2015. 48(3): p. 894-906.
18. Montazer, G.A. and D. Giveki, Content based image retrieval system using clustered scale invariant feature transforms.
Optik-International Journal for Light and Electron Optics, 2015. 126(18): p. 1695-1699.
19. Li, B., et al., A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries.
Computer Vision and Image Understanding, 2015. 131: p. 1-27.
20. Anandh, A., K. Mala, and S. Suganya. Content based image retrieval system based on semantic information using color,
texture and shape features. in Computing Technologies and Intelligent Data Engineering (ICCTIDE), International
Conference on. 2016. IEEE.
21. Vipparthi, S.K. and S.K. Nagar, Color directional local quinary patterns for content based indexing and retrieval. Human-
Centric Computing and Information Sciences, 2014. 4(1): p. 6.
22. Iakovidou, C., et al., Localizing global descriptors for content-based image retrieval. EURASIP Journal on Advances in
Signal Processing, 2015. 2015(1): p. 80.
23. Korytkowski, M., L. Rutkowski, and R. Scherer, Fast image classification by boosting fuzzy classifiers. Information
Sciences, 2016. 327: p. 175-182.
24. Wang, X.-Y., et al., Image retrieval based on exponent moments descriptor and localized angular phase histogram.
Multimedia Tools and Applications, 2017. 76(6): p. 7633-7659.
25. ElAlami, M.E., A novel image retrieval model based on the most relevant features. Knowledge-Based Systems, 2011. 24(1): p. 23-32.
26. Shrivastava, N. and V. Tyagi, An efficient technique for retrieval of color images in large databases. Computers & Electrical
Engineering, 2015. 46: p. 314-327.
27. ElAlami, M.E., A new matching strategy for content based image retrieval system. Applied Soft Computing, 2014. 14: p. 407-418.
28. Xiao, Y., J. Wu, and J. Yuan, mCENTRIST: A multi-channel feature generation mechanism for scene categorization. IEEE
Transactions on Image Processing, 2014. 23(2): p. 823-836.
29. Alahi, A., R. Ortiz, and P. Vandergheynst. Freak: Fast retina keypoint. in Computer Vision and Pattern Recognition (CVPR),
2012 IEEE Conference on. 2012. IEEE.
30. Bundy, A. and L. Wallen, Difference of Gaussians, in Catalogue of Artificial Intelligence Tools. 1984, Springer. p. 30-30.
31. Lowe, D.G. Object recognition from local scale-invariant features. in Computer Vision, 1999. The Proceedings of the Seventh
IEEE International Conference on. 1999. IEEE.
32. Russakovsky, O., et al., Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 2015.
115(3): p. 211-252.
33. Li, J. and J.Z. Wang, Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Transactions on
pattern analysis and machine intelligence, 2003. 25(9): p. 1075-1088.
34. Wang, J.Z., J. Li, and G. Wiederhold, SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE
Transactions on pattern analysis and machine intelligence, 2001. 23(9): p. 947-963.
35. Nilsback, M.-E. and A. Zisserman. A visual vocabulary for flower classification. in Computer Vision and Pattern
Recognition, 2006 IEEE Computer Society Conference on. 2006. IEEE.
36. Nilsback, M.-E. and A. Zisserman. Automated flower classification over a large number of classes. in Computer Vision,
Graphics & Image Processing, 2008. ICVGIP'08. Sixth Indian Conference on. 2008. IEEE.
37. FTVL Database, accessed on Dec. 2017. [Online]. Available: http://ic.unicamp.br/~rocha/pub/downloads/tropical-fruits-
DB-1024x768.tar.gz.
38. Nayar, S., S. Nene, and H. Murase, Columbia object image library (coil 100). Department of Comp. Science, Columbia
University, Tech. Rep. CUCS-006-96, 1996.
39. Fei-Fei, L., R. Fergus, and P. Perona, Learning generative visual models from few training examples: An incremental
bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007. 106(1): p. 59-70.
40. Griffin, G., A. Holub, and P. Perona, Caltech-256 object category dataset. 2007.
41. Burghouts, G.J. and J.-M. Geusebroek, Material-specific adaptation of color invariant features. Pattern Recognition Letters,
2009. 30(3): p. 306-313.
42. Kundu, M.K., M. Chowdhury, and S.R. Bulò, A graph-based relevance feedback mechanism in content-based image
retrieval. Knowledge-Based Systems, 2015. 73: p. 254-264.
43. Rao, M.B., B.P. Rao, and A. Govardhan, CTDCIRS: content based image retrieval system based on dominant color and
texture features. International Journal of Computer Applications, 2011. 18(6): p. 40-46.
44. Poursistani, P., et al., Image indexing and retrieval in JPEG compressed domain based on vector quantization. Mathematical
and Computer Modelling, 2013. 57(5): p. 1005-1017.
45. Dubey, S.R., S.K. Singh, and R.K. Singh, Multichannel decoded local binary patterns for content-based image retrieval.
IEEE Transactions on Image Processing, 2016. 25(9): p. 4018-4032.
46. Pan, S., et al. Content retrieval algorithm based on improved HOG. in Applied Computing and Information Technology/2nd
International Conference on Computational Science and Intelligence (ACIT-CSI), 2015 3rd International Conference on.
2015. IEEE.
47. Gao, S., I.W.-H. Tsang, and Y. Ma, Learning category-specific dictionary and shared dictionary for fine-grained image
categorization. IEEE Transactions on Image Processing, 2014. 23(2): p. 623-634.
48. Yang, J., et al. Linear spatial pyramid matching using sparse coding for image classification. in Computer Vision and
Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. 2009. IEEE.
49. Wang, J., et al. Locality-constrained linear coding for image classification. in Computer Vision and Pattern Recognition
(CVPR), 2010 IEEE Conference on. 2010. IEEE.
50. Zhou, N. and J. Fan, Jointly learning visually correlated dictionaries for large-scale visual recognition applications. IEEE
Transactions on pattern analysis and machine intelligence, 2014. 36(4): p. 715-730.
51. Dubey, S.R. and A.S. Jalal, Fruit and vegetable recognition by fusing colour and texture features of the image using machine
learning. International Journal of Applied Pattern Recognition, 2015. 2(2): p. 160-181.
52. Bay, H., T. Tuytelaars, and L. Van Gool, Surf: Speeded up robust features. Computer vision–ECCV 2006, 2006: p. 404-417.
53. Dalal, N. and B. Triggs. Histograms of oriented gradients for human detection. in Computer Vision and Pattern Recognition,
2005. CVPR 2005. IEEE Computer Society Conference on. 2005. IEEE.
54. Ojala, T., M. Pietikäinen, and T. Mäenpää. Gray scale and rotation invariant texture classification with local binary patterns.
in European Conference on Computer Vision. 2000. Springer.
55. Matas, J., et al., Robust wide-baseline stereo from maximally stable extremal regions. Image and vision computing, 2004.
22(10): p. 761-767.
56. Ojala, T., M. Pietikainen, and D. Harwood. Performance evaluation of texture measures with classification based on
Kullback discrimination of distributions. in Pattern Recognition, 1994. Vol. 1-Conference A: Computer Vision & Image
Processing., Proceedings of the 12th IAPR International Conference on. 1994. IEEE.