

Pattern Recognition 43 (2010) 2380–2389


Image retrieval based on multi-texton histogram


Guang-Hai Liu a,*, Lei Zhang b, Ying-Kun Hou d, Zuo-Yong Li c, Jing-Yu Yang c

a College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
b Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
c Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, China
d School of Information Science and Technology, Taishan University, Taian 271021, China

* Corresponding author. Tel./fax: +86 25 84315510. E-mail addresses: [email protected] (G.-H. Liu), [email protected] (L. Zhang), [email protected] (J.-Y. Yang).

Article history:
Received 30 March 2009
Received in revised form 16 January 2010
Accepted 11 February 2010

Keywords: Image retrieval; Texton detection; Multi-texton histogram

Abstract: This paper presents a novel image feature representation method, called multi-texton histogram (MTH), for image retrieval. MTH integrates the advantages of co-occurrence matrix and histogram by representing the attribute of the co-occurrence matrix using a histogram. It can be considered as a generalized visual attribute descriptor but without any image segmentation or model training. The proposed MTH method is based on Julesz's textons theory, and it works directly on natural images as a shape descriptor. Meanwhile, it can be used as a color texture descriptor and leads to good performance. The proposed MTH method is extensively tested on the Corel dataset with 15 000 natural images. The results demonstrate that it is much more efficient than representative image feature descriptors, such as the edge orientation auto-correlogram and the texton co-occurrence matrix. It has good discrimination power of color, texture and shape features.

Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2010.02.012

1. Introduction

Image retrieval is an important topic in the fields of pattern recognition and artificial intelligence. Generally speaking, there are three categories of image retrieval methods: text-based, content-based and semantic-based. The text-based approach can be traced back to the 1970s [4]. Since the images need to be manually annotated by text descriptors, it requires much human labour for annotation, and the annotation accuracy is subject to human perception. In the early 1990s, researchers built many content-based image retrieval systems, such as QBIC, MARS, Virage, Photobook, FIDS, WebSeek, Netra, Cortina [5], VisualSEEk [6] and SIMPLIcity [7]. Various low-level visual features can be extracted from the images and stored as image indexes. The query is an image example that is indexed by its features, and the retrieved images are ranked with respect to their similarity to the query image. Since the indexes are directly derived from the image content, this requires no semantic labeling [8]. Considering that humans tend to use high-level features to interpret images and measure their similarity, and that low-level image features (e.g. color, texture, shape) often fail to describe high-level semantic concepts, researchers have proposed methods for image retrieval using machine learning techniques such as SVM [9–13].

Some statistical models have been proposed to exploit the similarities between image regions or patches, which are represented in a uniform vector, such as the Visual Token Catalog [14] and Visual Language Modeling [15]. They map blobs to visual words and apply a language model to the visual words. A visual token catalog is generated to exploit the content similarities between regions in [14], while the Visual Language Modeling in [15] is based on the assumption that there are implicit visual grammars in a meaningful image. Those methods need accurate image segmentation, which is however still an open problem. Limited by the current advances of artificial intelligence and cognitive science, semantic-based image retrieval still has a long way to go for real applications. Comparatively, content-based image retrieval (CBIR) is still attracting much attention from researchers. In general, the research on CBIR techniques mainly focuses on two aspects: part-based object retrieval [16–19] and low-level visual feature-based image retrieval [20–23].

In this paper, we focus on edge-based image representation for image retrieval. In [24,25], Jain et al. introduced the edge direction histogram (EDH) for trademark image retrieval. This method is invariant to image translation, rotation and scaling because it uses the edges only, but it ignores the correlation between neighboring edges. EDH only suits flat trademark images. Gevers et al. [23,35] proposed a new method for image indexing and retrieval by combining color and shape invariant features. This method is robust to partial occlusion, object clutter and change in viewpoint. The MPEG-7 edge histogram descriptor (EHD) can capture the spatial distribution of edges, and it is an efficient texture descriptor for images with heavy textural presence.

It can also work as a shape descriptor as long as the edge field contains the true object boundaries [26]. In [27], Mahmoudi et al. proposed the edge orientation autocorrelogram (EOAC) for shape-based image indexing and retrieval. It can be used for edge-based image indexing and retrieval without segmentation. The EOAC is invariant to translation, scaling, color, illumination, and small viewpoint variations, but it is not appropriate for texture-based image retrieval. Lowe [28] proposed a very effective algorithm in computer vision, called the scale-invariant feature transform (SIFT), to detect and describe local features in images. It has been widely used in object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, etc. Banerjee et al. [31] proposed to use edge-based features for CBIR. The algorithm is computationally attractive as it computes different features with a limited number of selected pixels. The texton co-occurrence matrix (TCM) proposed in [20] can describe the spatial correlation of textons for image retrieval. It has the discrimination power of color, texture and shape features. Kiranyaz et al. [21] proposed a generic shape and texture descriptor over a multi-scale edge field for image retrieval, the so-called 2-D walking ant histogram (2D-WAH). As a shape descriptor, it deals directly with natural images without any segmentation or object extraction preprocessing stage. When tuned as a texture descriptor, it can achieve good retrieval accuracy, especially for directional textures. Luo et al. [38] developed a robust algorithm called the color edge co-occurrence histogram (CECH), which is based on a particular type of spatial-color joint histogram. This algorithm employs perceptual color naming to handle color variation, and pre-screening to limit the search scope (i.e. size and location) of the object.

Natural scenes are usually rich in both color and texture, and a wide range of natural images can be considered as a mosaic of regions with different colors and textures. The human visual system exhibits a remarkable ability to detect subtle differences in textures that are generated from an aggregate of fundamental micro-structures or elements [1,2]. Color and texture have a close relationship via fundamental micro-structures in natural images, and they are considered as the atoms of pre-attentive human visual perception. The term "texton" was conceptually proposed by Julesz [1]. It is a very useful concept in texture analysis and has been utilized to develop efficient models in the context of texture recognition or object recognition [33,34]. However, few works have applied texton models to image retrieval. How to obtain texton features, and how to map low-level texture features to textons, need to be further studied. To this end, in this paper we propose a new descriptor for image retrieval. It can represent the spatial correlation of color and texture orientation without image segmentation or learning processes.

This paper presents a new feature extractor and descriptor, namely the multi-texton histogram (MTH), for image retrieval. MTH can be viewed as an improved version of TCM. It is specially designed for natural image analysis and can achieve higher retrieval precision than EOAC [27] and TCM [20]. It integrates the advantages of co-occurrence matrix and histogram by representing the attribute of the co-occurrence matrix using a histogram, and can represent the spatial correlation of color and texture orientation.

The rest of this paper is organized as follows. In Section 2, the TCM is introduced. The MTH is presented in Section 3. In Section 4, a performance comparison among EOAC, TCM and MTH is carried out on two Corel datasets. Section 5 concludes the paper.

2. The texton co-occurrence matrix (TCM)

Before describing the proposed MTH in detail, let us briefly review the TCM [20] method for image retrieval. TCM can represent the spatial correlation of textons, and it can discriminate color, texture and shape features simultaneously. Let r, g and b be unit vectors along the R, G and B axes in RGB color space. We define the following vectors for a full color image f(x,y) [3,32]:

u = \frac{\partial R}{\partial x} r + \frac{\partial G}{\partial x} g + \frac{\partial B}{\partial x} b    (1)

v = \frac{\partial R}{\partial y} r + \frac{\partial G}{\partial y} g + \frac{\partial B}{\partial y} b    (2)

g_{xx}, g_{yy} and g_{xy} are defined as the dot products of these vectors:

g_{xx} = u^T u = |\partial R/\partial x|^2 + |\partial G/\partial x|^2 + |\partial B/\partial x|^2    (3)

g_{yy} = v^T v = |\partial R/\partial y|^2 + |\partial G/\partial y|^2 + |\partial B/\partial y|^2    (4)

g_{xy} = u^T v = \frac{\partial R}{\partial x}\frac{\partial R}{\partial y} + \frac{\partial G}{\partial x}\frac{\partial G}{\partial y} + \frac{\partial B}{\partial x}\frac{\partial B}{\partial y}    (5)

Let v(x,y) be an arbitrary vector in RGB color space. Using the above notations, it can be seen that the direction of the maximum rate of change of v(x,y) is [3,32]

\theta(x,y) = \frac{1}{2}\tan^{-1}\left(\frac{2g_{xy}}{g_{xx}-g_{yy}}\right)    (6)

The value of the rate of change at (x,y) in the direction of θ(x,y) is given by

G(x,y) = \left\{\frac{1}{2}\left[(g_{xx}+g_{yy}) + (g_{xx}-g_{yy})\cos 2\theta + 2g_{xy}\sin 2\theta\right]\right\}^{1/2}    (7)

Denote by Max(G) and Min(G) the maximum and minimum values of G along the directions given by Eq. (7). The original color image is quantized into 256 colors in RGB color space, denoted by C(x,y). Five special types of texton templates are used to detect the textons, as shown in Fig. 1. The flow chart of texton detection is illustrated in Fig. 2. In an image, we move a 2 × 2 grid from left to right and top to bottom throughout the image to detect textons, with one pixel as the step-length. If the pixel values that fall in the texton template are the same, those pixels form a texton, and their values are kept as the original values; otherwise they are set to zero. Each texton template leads to one texton image (an example of a texton detection result is shown in Fig. 2(a)), and the five texton templates lead to five texton images. We combine them into a final texton image, as shown in Fig. 2(b).

For the texton images detected with Max(G), Min(G) and C(x,y), we use the co-occurrence matrix to extract their features. Denote the values of a texton image as f(P) = w, w ∈ {0, 1, …, 255}, where P = (x,y) is the pixel position. Let P1 = (x1,y1) and P2 = (x2,y2), with f(P1) = w and f(P2) = ŵ. Considering the probability that the two values w and ŵ co-occur at two pixel positions separated by a distance D, we define the cell entry (w, ŵ) of the co-occurrence matrix C_{D,θ} as

C_{D,\theta}(w,\hat{w}) = \Pr\{f(P_1)=w \wedge f(P_2)=\hat{w} \mid |P_1-P_2|=D\}    (8)

The TCM utilizes energy, contrast, entropy and homogeneity to describe image features. For an image, a 12-dimensional vector is obtained as the final feature for retrieval.

Fig. 1. Five special texton types used in TCM.

Fig. 2. The flow chart of texton detection in TCM: (a) an example of texton detection; (b) the five detected texton images and the final texton image.
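For concreteness, the TCM gradient of Eqs. (1)–(7) can be sketched in a few lines of numpy. This sketch was written for this presentation and is not any released TCM code; the use of central differences (np.gradient) for the partial derivatives and the H × W × 3 float image layout are our own assumptions.

import numpy as np

def tcm_color_gradient(img):
    """Color gradient used by TCM, after Eqs. (1)-(7): returns the
    orientation theta(x, y) of maximum rate of change and its
    magnitude G(x, y) for an H x W x 3 float image."""
    dx = np.gradient(img, axis=1)    # per-channel dR/dx, dG/dx, dB/dx
    dy = np.gradient(img, axis=0)    # per-channel dR/dy, dG/dy, dB/dy

    gxx = np.sum(dx * dx, axis=2)    # Eq. (3)
    gyy = np.sum(dy * dy, axis=2)    # Eq. (4)
    gxy = np.sum(dx * dy, axis=2)    # Eq. (5)

    theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)            # Eq. (6)
    rate = 0.5 * ((gxx + gyy) + (gxx - gyy) * np.cos(2 * theta)
                  + 2.0 * gxy * np.sin(2 * theta))
    G = np.sqrt(np.maximum(rate, 0.0))                        # Eq. (7)
    return theta, G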

3. The multi-texton histogram (MTH)

The study of pre-attentive (also called effortless) texture discrimination can serve as a model system with which the roles of local texture detection and global (statistical) computation in visual perception can be distinguished [1]. This can be easily explained by the local orientation differences between the elements that constitute two texture images. It is possible to describe the differences between texture images globally. Even if the first-order statistics of two texture images are identical, their second-order statistics may still differ greatly [1]. The first- and second-order statistics have their own advantages in texture discrimination, so in this paper we propose to combine the first-order statistics and second-order statistics into one entity for texton analysis. We call the proposed technique the multi-texton histogram (MTH), and use it for image retrieval.

Based on the texton theory [1,2], texture can be decomposed into elementary units: the texton classes of colors, of elongated blobs of specific widths, orientations and aspect ratios, and of the terminators of these elongated blobs. In the proposed MTH-based image retrieval scheme, texture orientation needs to be detected for texton analysis. In the following sub-section, we propose a computationally efficient method for texture orientation detection.

3.1. Texture orientation detection

Texture orientation analysis plays an important role in computer vision and pattern recognition. For instance, orientation is used in pre-attentive vision to characterize textons [1–3]. The orientation of texture images has a strong influence on human perception of a texture image. Texture orientation can also be used to estimate the shape of textured images. The orientation map of an image represents the object boundaries and texture structures, and it provides most of the semantic information in the image. In this paper, we propose a computationally efficient algorithm for texture orientation detection.

By applying some gradient operator, such as the Sobel operator, to a gray level image along the horizontal and vertical directions, we obtain two gradient images, denoted by g_x and g_y. A gradient map g(x,y) can then be formed, with the gradient magnitude and orientation defined as |g(x,y)| = \sqrt{g_x^2 + g_y^2} and \theta(x,y) = \arctan(g_y/g_x).

As for full color images, there are red, green and blue channels. If we convert the full color image into a gray image and then detect the gradient magnitude and orientation from the gray image, much chromatic information will be lost. In order to detect the edges caused by chromatic changes, we propose the following method.

In Cartesian space, let a = (x_1, y_1, z_1) and b = (x_2, y_2, z_2). Their dot product is defined as

a \cdot b = x_1 x_2 + y_1 y_2 + z_1 z_2    (9)

so that

\cos(a,b) = \frac{a \cdot b}{|a||b|} = \frac{x_1 x_2 + y_1 y_2 + z_1 z_2}{\sqrt{x_1^2+y_1^2+z_1^2}\,\sqrt{x_2^2+y_2^2+z_2^2}}    (10)

We apply the Sobel operator to each of the red, green and blue channels of a color image f(x,y). The reason we use the Sobel operator is that it is less sensitive to noise than other gradient operators or edge detectors while being very efficient [3]. The gradients along the x and y directions can then be denoted by two vectors a = (R_x, G_x, B_x) and b = (R_y, G_y, B_y), where R_x denotes the gradient of the R channel along the horizontal direction, and so on. Their norms and dot product are defined as

|a| = \sqrt{(R_x)^2 + (G_x)^2 + (B_x)^2}    (11)

|b| = \sqrt{(R_y)^2 + (G_y)^2 + (B_y)^2}    (12)

a \cdot b = R_x R_y + G_x G_y + B_x B_y    (13)

The angle between a and b is then given by

\cos(a,b) = \frac{a \cdot b}{|a| \cdot |b|}    (14)

\theta = \arccos[\cos(a,b)] = \arccos\left(\frac{a \cdot b}{|a| \cdot |b|}\right)    (15)

After the texture orientation θ of each pixel is computed, we quantize it uniformly into 18 orientations with 10° as the step-length.
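To make Eqs. (11)–(15) concrete, the following is a minimal numpy sketch of this orientation detector. It is written for this text rather than taken from the authors' released C implementation; the edge padding, the small 3 × 3 filtering helper and all identifiers are our own assumptions.

import numpy as np

SOBEL_X = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.T

def filter3x3(channel, kernel):
    """3 x 3 filtering with edge padding, so the sketch needs only numpy."""
    H, W = channel.shape
    padded = np.pad(channel, 1, mode='edge')
    out = np.zeros((H, W))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + H, j:j + W]
    return out

def texture_orientation(img, n_bins=18):
    """Eqs. (11)-(15): the angle between the x- and y-gradient vectors
    a = (Rx, Gx, Bx) and b = (Ry, Gy, By) of an H x W x 3 color image,
    uniformly quantized into n_bins orientations (10-degree steps)."""
    gx = np.stack([filter3x3(img[..., c], SOBEL_X) for c in range(3)], axis=-1)
    gy = np.stack([filter3x3(img[..., c], SOBEL_Y) for c in range(3)], axis=-1)
    dot = np.sum(gx * gy, axis=-1)                               # Eq. (13)
    norm = np.linalg.norm(gx, axis=-1) * np.linalg.norm(gy, axis=-1)
    cos_ab = np.clip(dot / np.maximum(norm, 1e-12), -1.0, 1.0)   # Eq. (14)
    theta = np.degrees(np.arccos(cos_ab))                        # Eq. (15), in [0, 180]
    return np.minimum((theta * n_bins / 180.0).astype(int), n_bins - 1)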
3.2. Color quantization in RGB color space

It is well known that color provides powerful information for image retrieval and object recognition, even in the total absence of shape information. The HSV color space can mimic human color perception well, and thus many researchers use it for color quantization. In terms of digital processing, however, the RGB color space is the most commonly used in practice, and it is straightforward. In order to extract color information and simplify manipulation, in this work the RGB color space is used and quantized into 64 colors. In Section 4.4, the experiments demonstrate that the RGB color space is well suited to our framework. Given a color image of size N × N, we uniformly quantize the R, G and B channels into 4 bins each, so that 64 colors are obtained. Denote by C(x,y) the quantized image, where x, y ∈ {0, 1, …, N−1}. Each value of C(x,y) is then a 6-bit binary code, ranging from 0 to 63.
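As a sketch, this 64-color quantization is a few lines of numpy; that the R channel occupies the most significant two bits of the 6-bit code is our own choice of layout, since any fixed packing works.

import numpy as np

def quantize_rgb64(img_uint8):
    """Section 3.2: uniformly quantize R, G and B into 4 bins each,
    producing the color index image C(x, y) with values in 0..63."""
    bins = img_uint8.astype(np.int32) // 64     # 0..255 -> 0..3 per channel
    # Pack the three 2-bit codes into one 6-bit code (R assumed most
    # significant, then G, then B).
    return bins[..., 0] * 16 + bins[..., 1] * 4 + bins[..., 2]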
3.3. Texton detection

The concept of a "texton" was proposed in [1] more than 20 years ago, and it is a very useful tool in texture analysis. In general, textons are defined as a set of blobs or emergent patterns sharing a common property all over the image; however, defining textons remains a challenge. In [2], Julesz presented a more complete version of texton theory, with emphasis on the critical distances (D) between texture elements on which the computation of texton gradients depends. Textures are formed only if the adjacent elements lie within the D-neighborhood. However, this D-neighborhood depends on the element size. If the texture elements are greatly expanded in one orientation, pre-attentive discrimination is somewhat reduced. If the elongated elements are not jittered in orientation, the texton gradients at the texture boundaries increase. Thus, with a small element size, such as 2 × 2, texture discrimination can be increased, because the texton gradients exist only at texture boundaries [2]. In view of this, and for convenience of expression, the 2 × 2 block is used in this paper for texton detection.

The texton templates defined in MTH are different from those in TCM (refer to Fig. 1). In this paper, four special texton types are defined on a 2 × 2 grid, as shown in Fig. 3. Denote the four pixels as V1, V2, V3 and V4. If the two pixels highlighted in gray have the same value, the grid forms a texton. These four texton types are denoted as T1, T2, T3 and T4, respectively.

The working mechanism of texton detection is illustrated in Fig. 4. In the color index image C(x,y), we move the 2 × 2 block from left to right and top to bottom throughout the image to detect textons, with 2 pixels as the step-length. If a texton is detected, the original pixel values in the 2 × 2 grid are kept unchanged; otherwise the grid is set to zero. Finally, we obtain a texton image, denoted by T(x,y). A sketch of this procedure is given after the figure captions below.

The four texton types used in MTH contain richer information than those in TCM, because the co-occurring probability of two same-valued pixels is bigger than that of three or four same-valued pixels in a 2 × 2 grid. As for the texton detection procedure, MTH is also faster than TCM. In the texton detection of TCM, the 2 × 2 grid moves throughout the image with one pixel as the step-length, and the detected textons in a neighborhood may overlap. The final texton image needs to be fused from the overlapped components of textons, and this increases the computational complexity. Therefore, in this paper the step-length is set to two pixels to reduce the computational cost.

Fig. 3. Four texton types defined in MTH: (a) 2 × 2 grid; (b) T1; (c) T2; (d) T3 and (e) T4.

Fig. 4. Illustration of the texton detection process.
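The detection loop itself is short. Since Fig. 3 cannot be reproduced here, which two pixels each template highlights is an assumption in the sketch below (top row, left column, diagonal and anti-diagonal pairs); everything else follows the description above.

import numpy as np

# Pixel pairs tested by the four templates T1..T4 of Fig. 3, as
# (row, col) positions inside the 2 x 2 grid (assumed pairing).
TEXTON_PAIRS = [((0, 0), (0, 1)),   # top row
                ((0, 0), (1, 0)),   # left column
                ((0, 0), (1, 1)),   # main diagonal
                ((0, 1), (1, 0))]   # anti-diagonal

def detect_textons(C):
    """Scan C(x, y) with a 2 x 2 block and a step-length of 2; a grid
    matching any texton template keeps its values, all other pixels
    are set to zero."""
    H, W = C.shape
    T = np.zeros_like(C)
    for y in range(0, H - 1, 2):
        for x in range(0, W - 1, 2):
            grid = C[y:y + 2, x:x + 2]
            if any(grid[p] == grid[q] for p, q in TEXTON_PAIRS):
                T[y:y + 2, x:x + 2] = grid
    return T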
3.4. Features representation

In [16], the angle and radius are quantized using the log-polar quantization scheme of [29,30]. The angle is quantized into 12 bins and the radius into 5 bins. Log-polar quantization has good performance in image retrieval. It can express local information well, but the feature dimension is big and the feature matrix is sparse. The TCM scheme utilizes energy, contrast, entropy and homogeneity to describe image features [20]. However, these metrics cannot fully represent the discrimination power of color, texture and shape features. There is still much room to improve TCM, and the method proposed in this section is such an improved version of TCM.

The co-occurrence matrix characterizes the relationship between the values of neighboring pixels, while histogram-based techniques have high indexing performance and are simple to compute. If we use the co-occurrence matrix to represent image features directly, the dimension will be high and the performance can decrease. If we use a histogram only to represent image features, the spatial information will be lost.


In order to combine the advantages of co-occurrence matrix and histogram, in this paper we propose the MTH descriptor.

The values of a texton image T are denoted as w ∈ {0, 1, …, W−1}. Denote by P1 = (x1, y1) and P2 = (x2, y2) two neighboring pixels, with values T(P1) = w1 and T(P2) = w2. In the texture orientation image θ(x,y), the angles at P1 and P2 are denoted by θ(P1) = v1 and θ(P2) = v2. In the texton image T, two different texture orientations may have the same color, while in the texture orientation image θ(x,y), two different colors may have the same texture orientation. Denote by N{·} the co-occurring number of the two enclosed values. With two neighboring pixels whose distance is D, we define the MTH as follows:

H(T(P_1)) = N\{\theta(P_1)=v_1 \wedge \theta(P_2)=v_2 \mid |P_1-P_2|=D\}, \quad \text{where } \theta(P_1)=\theta(P_2)\ (v_1=v_2)    (16)

H(\theta(P_1)) = N\{T(P_1)=w_1 \wedge T(P_2)=w_2 \mid |P_1-P_2|=D\}, \quad \text{where } T(P_1)=T(P_2)\ (w_1=w_2)    (17)

The proposed algorithm analyzes the spatial correlation between neighboring color and edge orientation based on the four special texton types, then forms the texton co-occurrence matrix and describes its attributes using a histogram. This is why we call it the multi-texton histogram (MTH). Fig. 5 shows two examples of the proposed MTH.

Fig. 5. Two examples of MTH: (a) stained glass; (b) racing car.

H(T(P1)) can represent the spatial correlation between neighboring texture orientations by using color information, leading to a 64-dimensional vector. H(θ(P1)) can represent the spatial correlation between neighboring colors by using the texture orientation information, leading to an 18-dimensional vector. Thus in total MTH uses a 64 + 18 = 82-dimensional vector as the final image feature in image retrieval.
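Read as counting rules, Eqs. (16) and (17) admit a short sketch (again ours, not the released implementation). That pixel pairs are taken only along the horizontal and vertical directions, and that the final vector is normalized, are assumptions not fixed by the text.

import numpy as np

def mth_features(T, O, D=1, n_colors=64, n_orients=18):
    """Eqs. (16)-(17): for pixel pairs at distance D, the 64 color bins
    count pairs whose orientations are equal, and the 18 orientation
    bins count pairs whose texton (color) values are equal. The
    concatenation gives the 82-dimensional MTH vector."""
    h_color, h_orient = np.zeros(n_colors), np.zeros(n_orients)
    H, W = T.shape
    for dy, dx in [(0, D), (D, 0)]:             # right and down neighbors
        t1, t2 = T[:H - dy, :W - dx], T[dy:, dx:]
        o1, o2 = O[:H - dy, :W - dx], O[dy:, dx:]
        np.add.at(h_color, t1[o1 == o2], 1)     # Eq. (16)
        np.add.at(h_orient, o1[t1 == t2], 1)    # Eq. (17)
    feat = np.concatenate([h_color, h_orient])
    return feat / max(feat.sum(), 1.0)          # normalization assumed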
contains 50 categories. There are 5000 images from diverse
contents such as fireworks, bark, microscopic, tile, food texture,
4. Experimental results tree, wave, pills and stained glass. Every category contains 100
images of size 192  128 in JPEG format. The second dataset is
In this section, we demonstrate the performance of Corel 10 000 dataset. It contains 100 categories. There are 10 000
our method using two Corel datasets. The methods used in images from diverse contents such as sunset, beach, flower,
buildings, cars, horses, mountains, fish, food, doors, etc. Every category contains 100 images of size 192 × 128 in JPEG format. The Corel 10 000 dataset contains all the categories of the Corel 5000 dataset.
4.2. Distance metric

For each template image in the dataset, an M-dimensional feature vector T = [T_1, T_2, …, T_M] is extracted and stored in the database. Let Q = [Q_1, Q_2, …, Q_M] be the feature vector of a query image; the distance between them is simply calculated as

D(T,Q) = \sum_{i=1}^{M} \frac{|T_i - Q_i|}{1 + T_i + Q_i}    (18)

The above formula is as simple to calculate as the L1 distance, and it needs no square or square-root operations. It can save much computational cost and is very suitable for large-scale image datasets. Actually, it can be considered a weighted L1 distance with 1/(1 + T_i + Q_i) as the weight. For the proposed MTH, M = 82 for color images. The class label of the template image which yields the smallest distance is assigned to the query image.
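As an illustration, Eq. (18) is essentially one line of numpy; the sketch assumes non-negative histogram entries, so the denominator never vanishes.

import numpy as np

def mth_distance(T, Q):
    """Eq. (18): weighted L1 distance with weight 1 / (1 + T_i + Q_i)."""
    T, Q = np.asarray(T, dtype=float), np.asarray(Q, dtype=float)
    return float(np.sum(np.abs(T - Q) / (1.0 + T + Q)))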
4.3. Performance measure

In order to evaluate the effectiveness of our method, precision and recall curves are adopted, which are the most common measurements used for evaluating image retrieval performance. Precision and recall are defined as follows:

P(N) = I_N / N    (19)

R(N) = I_N / M    (20)

where I_N is the number of images retrieved in the top N positions that are similar to the query image, M is the total number of images in the database similar to the query, and N is the total number of images retrieved. In our image retrieval system, N = 12 and M = 100.
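A minimal evaluation loop over one query, reusing the mth_distance sketch above, could look as follows; the identifiers db_feats and db_labels (the database's MTH vectors and category labels) are hypothetical names of ours.

import numpy as np

def precision_recall(query_feat, query_label, db_feats, db_labels,
                     N=12, M=100):
    """Eqs. (19)-(20) with the paper's settings N = 12 and M = 100."""
    dists = np.array([mth_distance(query_feat, f) for f in db_feats])
    top = np.argsort(dists)[:N]                 # N best-ranked images
    I_N = int(np.sum(np.asarray(db_labels)[top] == query_label))
    return I_N / N, I_N / M                     # P(N), R(N)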
4.4. Retrieval performance

In the experiments, different quantization levels of texture orientation and color are used to test the performance of the proposed MTH in RGB color space. The HSV color space is also used for comparison. Denote by bin(H), bin(S) and bin(V) the number of bins for the H, S and V components. Similar to [26,39], in this paper we let bin(H) ≥ 8, bin(S) ≥ 3 and bin(V) ≥ 3 for HSV color space quantization in the image retrieval experiments; hence the total number of bins is at least 72, and it is gradually increased to 128 bins. Tables 1 and 2 provide the average retrieval precision and recall of MTH in the RGB and HSV color spaces. We can see that, under the same or similar retrieval precision, the performance of MTH in RGB color space is better than that in HSV color space. The precision is about 48–50% in RGB color space when the number of color quantization levels is 64, while the precision is about 47–49% in HSV color space when the number of color quantization levels is 72. In other words, the total number of quantization bins in HSV color space is higher than that in RGB color space, but its image retrieval precision is lower than that of RGB color space. Considering that the color quantization level determines the feature vector dimensionality, we select the RGB color space for color quantization in the proposed MTH scheme. However, it should be stressed that this does not mean that RGB color space will also be better than HSV color space in other image retrieval methods; it only validates that RGB is better fitted for the proposed MTH. Indeed, the HSV color space is widely used in image retrieval and object recognition and achieves good performance [3,26,38,39]. Based on the results in Table 1, and in order to balance the retrieval precision and vector dimensionality, the final numbers of color quantization and texture orientation quantization levels in the proposed MTH are set to 64 and 18, respectively.

To validate the performance of the texture orientation detection method proposed in Section 3.1, we used several typical gradient operators to detect the gradient magnitude and orientation, and list the image retrieval results in Table 3. Note that the proposed method works on the full color image, while the other four operators work on the gray level version of the color images. It can be seen from Table 3 that the proposed orientation detector achieves better results because it exploits the chromatic information that is ignored by the other gradient operators in orientation detection.

Table 1
The average retrieval precision and recall of MTH with different texture orientation quantization and color quantization levels on the Corel-5000 dataset in RGB color space.

Color quantization levels | Precision (%) at orientation levels 6 / 9 / 12 / 18 / 24 / 36 | Recall (%) at orientation levels 6 / 9 / 12 / 18 / 24 / 36
128 | 50.77 / 51.43 / 51.22 / 51.25 / 51.32 / 51.14 | 6.09 / 6.17 / 6.15 / 6.15 / 6.16 / 6.13
64 | 48.82 / 49.43 / 49.85 / 49.98 / 50.08 / 49.52 | 5.86 / 5.93 / 5.98 / 6.00 / 6.01 / 5.94
32 | 45.95 / 46.93 / 47.43 / 47.93 / 48.00 / 47.48 | 5.51 / 5.63 / 5.69 / 5.75 / 5.76 / 5.70
16 | 41.88 / 42.76 / 43.42 / 44.20 / 44.25 / 44.43 | 5.03 / 5.13 / 5.21 / 5.30 / 5.31 / 5.33
Table 2
The average retrieval precision and recall of MTH with different texture orientation quantization and color quantization levels on the Corel-5000 dataset in HSV color space.

Color quantization levels | Precision (%) at orientation levels 6 / 9 / 12 / 18 / 24 / 36 | Recall (%) at orientation levels 6 / 9 / 12 / 18 / 24 / 36
192 | 48.38 / 48.77 / 49.22 / 49.90 / 49.78 / 50.05 | 5.81 / 5.85 / 5.91 / 5.99 / 5.97 / 6.01
128 | 48.13 / 48.62 / 49.07 / 49.85 / 49.83 / 50.38 | 5.78 / 5.83 / 5.89 / 5.98 / 5.98 / 6.05
108 | 47.95 / 48.70 / 49.00 / 49.37 / 49.92 / 49.87 | 5.75 / 5.84 / 5.88 / 5.92 / 5.99 / 5.98
72 | 47.70 / 48.15 / 49.05 / 49.23 / 49.41 / 49.48 | 5.72 / 5.78 / 5.89 / 5.91 / 5.93 / 5.94
We then compare our distance metric with other popular distance or similarity metrics within the proposed MTH method. As can be seen from Table 4, the proposed distance metric obtains much better results than the other distance or similarity metrics, such as histogram intersection. We can also see that the L1 distance and the L2 (Euclidean) distance yield the same result with the proposed MTH method, but the L1 distance is much more computationally efficient, at the price of losing the rotation invariance property [38].

The proposed MTH integrates the merits of co-occurrence matrix and histogram by representing the attribute of the co-occurrence matrix using a histogram. As can be seen from Fig. 5, there are many bins whose frequencies are close to zero; thus, if we apply histogram intersection to MTH, the probability that min(T_i, Q_i) = 0 will be high, and hence false matches may appear. Therefore, histogram intersection is not suitable as a similarity metric for the proposed MTH. The results in Table 4 also validate this. Meanwhile, the proposed distance metric of Section 4.2 is simple to calculate, while it can be considered a weighted L1 distance with 1/(1 + T_i + Q_i) as the weight. Since the same value of |T_i − Q_i| can come from different pairs of T_i and Q_i, the weight reduces such opposing effects.

We vary the distance parameter D = 1, 2, …, 9 in calculating the MTH. The average retrieval precision values are listed in Table 5. The average retrieval precision of MTH ranges from about 49% down to 48% on the Corel-5000 dataset and from about 40% down to 39% on the Corel-10 000 dataset. The best performance of MTH is obtained with D = 1 on both the Corel 5000 and Corel 10 000 datasets. MTH takes into account the spatial correlation between neighboring color and edge orientation by using the four texton types. If we increase the value of the distance parameter, the performance is reduced, because the probability of neighboring pixels having the same gray level in a 2 × 2 grid is higher than in a bigger grid. In other words, the information with D = 1 is richer than with other distance values; thus MTH obtains the best performance when the distance parameter is D = 1.

The average retrieval precision and recall results on the two datasets are listed in Table 6, and the average precision and recall curves are plotted in Fig. 6. It can be seen from Table 6 and Fig. 6 that our method achieves much better results than the EOAC and TCM methods. On the Corel-5000 dataset with D = 1, MTH's precision is 22.62% and 18.75% higher than that of TCM and EOAC, respectively. On the Corel-10 000 dataset, MTH's precision is 20.45% and 17.51% higher than that of TCM and EOAC, respectively.

Figs. 7 and 8 show two retrieval examples on the Corel 5000 and Corel 10 000 datasets. In Fig. 7, the query is a stained glass image, and all the top retrieved images show a good match of texture and color to the query image. In Fig. 8, the query image is a racing car, which has obvious shape features. All the top 12 retrieved images show a good match of shape, and 10 of the returned images belong to the F1 racing car category.

Table 3
The retrieval precision of MTH with different gradient operators for orientation detection.

Dataset | Performance | Proposed / Sobel / Robert / LoG / Prewitt
Corel-5000 | Precision (%) | 49.98 / 49.58 / 48.93 / 48.02 / 49.24
Corel-5000 | Recall (%) | 6.00 / 5.95 / 5.87 / 5.76 / 5.91
Corel-10 000 | Precision (%) | 40.87 / 39.48 / 39.18 / 38.72 / 39.26
Corel-10 000 | Recall (%) | 4.91 / 4.74 / 4.71 / 4.65 / 4.71

Table 4
The average retrieval precision of MTH with different distance or similarity metrics.

Dataset | Performance | Our distance metric / L1 / Euclidean / Histogram intersection
Corel-5000 | Precision (%) | 49.98 / 45.55 / 45.55 / 35.62
Corel-5000 | Recall (%) | 6.00 / 5.47 / 5.47 / 4.27
Corel-10 000 | Precision (%) | 40.87 / 35.29 / 35.29 / 27.37
Corel-10 000 | Recall (%) | 4.91 / 4.23 / 4.23 / 3.28

Table 5
The average retrieval precision of MTH with different distance parameters.

Datasets | Performance | D = 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
Corel-5000 | Precision (%) | 49.98 / 49.37 / 49.10 / 49.30 / 49.22 / 49.08 / 49.07 / 48.63 / 48.47
Corel-5000 | Recall (%) | 6.00 / 5.93 / 5.89 / 5.92 / 5.91 / 5.89 / 5.89 / 5.84 / 5.82
Corel-10 000 | Precision (%) | 40.87 / 40.79 / 40.61 / 40.33 / 40.26 / 40.18 / 40.02 / 39.86 / 39.52
Corel-10 000 | Recall (%) | 4.91 / 4.89 / 4.87 / 4.84 / 4.83 / 4.82 / 4.80 / 4.78 / 4.74

Table 6
The average retrieval precision and recall results on the two Corel datasets.

Datasets | Performance | EOAC / TCM / MTH
Corel-5000 | Precision (%) | 31.23 / 27.36 / 49.98
Corel-5000 | Recall (%) | 3.74 / 3.28 / 6.00
Corel-10 000 | Precision (%) | 23.36 / 20.42 / 40.87
Corel-10 000 | Recall (%) | 2.81 / 2.45 / 4.91

Fig. 6. The precision and recall curves of EOAC, TCM and MTH. (a) Corel-5000 dataset and (b) Corel-10 000 dataset.

Fig. 7. An example of image retrieval by MTH on the Corel 5000 dataset. The query is a stained glass image, and all images are correctly retrieved and ranked within the top
12 images. (The top-left image is the query image, and the similar images include the query image itself).

Fig. 8. An example of image retrieval by MTH on the Corel 10 000 dataset. The query is a racing car image, and all the returned images are correctly retrieved and ranked within the top 12 images, where 10 returned images belong to the F1 racing car category. (The top-left image is the query image, and the similar images include the query image itself).

EOAC is invariant to translation, scaling, illumination and small rotation. It represents edge features based on their orientations and the correlation between neighboring edges. Though EOAC can well represent the shape information of an image, it cannot well represent the color and texture features [27].

In the experiments we see that EOAC achieves good performance only for a few image categories which have obvious shape features and no complex background. EOAC is also appropriate for retrieving images with continuous and clear edges, especially images with straight lines. However, EOAC is not appropriate for retrieving images with texture and unclear edges [27]. In order to be invariant to illumination, EOAC loses some color information, and the spatial correlation of edges and edge orientation can only partially represent image features. EOAC has an advantage in shape feature representation through the spatial correlation of edge orientation, and this advantage is preserved in the proposed method.

TCM describes an image by its gradient information and color information with a 12-dimensional vector, including features of energy, contrast, entropy and homogeneity [20]. However, TCM does not take into account the relationship between gradient and color features, and thus the discrimination power of TCM is not high enough for image retrieval in large-scale image datasets. The features used in TCM belong to the second-order statistics. Based on Julesz's texton theory, second-order statistics alone are not always sufficient to capture the difference between two textures [1,2], so using only those features to describe image content may not always enhance the texture discrimination power. The proposed MTH combines the first-order statistics and second-order statistics into one entity for texton analysis, and thus the texture discrimination power is greatly increased. MTH can represent the spatial correlation of edge orientation and color based on texton analysis, so its performance is better than that of EOAC and TCM.

The experiments were all performed on a dual-core 1.8 GHz Pentium PC with 1024 MB memory running the Windows XP operating system. The image retrieval system was built in Borland Delphi 7. For feature extraction from a natural image of size 192 × 128, the average time consumption of EOAC, TCM and MTH is 887.40, 157.36 and 314.38 ms, respectively. The time used by MTH is mainly spent in the texton analysis stage.

5. Conclusion

We proposed a new method, namely the multi-texton histogram (MTH), to describe image features for image retrieval. MTH can represent both the spatial correlation of texture orientation and texture color based on textons. It integrates co-occurrence matrix and histogram into one descriptor and represents the attribute of co-occurrence matrices using histograms. MTH does not need any image segmentation, learning or training stages, and it is very easy to implement. It is well suited for large-scale image dataset retrieval. MTH can be considered as a generalized visual attribute descriptor. Moreover, when used as a color texture descriptor, it can obtain good performance for natural texture extraction. The dimension of the MTH feature vector is only 82, which is efficient for image retrieval. The experiments were conducted on two Corel datasets in comparison with the edge orientation auto-correlogram (EOAC) method and the texton co-occurrence matrix (TCM) method. The experimental results validated that our method has strong discrimination power of color, texture and shape features, and outperforms EOAC and TCM significantly.

Acknowledgment

This work was supported by the National Natural Science Fund of China (No. 60632050) and the Hong Kong RGC General Research Fund (PolyU 5351/08E). The authors would like to thank the anonymous reviewers for their constructive comments.

References

[1] B. Julesz, Textons, the elements of texture perception and their interactions, Nature 290 (5802) (1981) 91–97.
[2] B. Julesz, Texton gradients: the texton theory revisited, Biological Cybernetics 54 (1986) 245–251.
[3] R.C. Gonzalez, R.E. Woods, Digital Image Processing, third ed., Prentice Hall, 2007.
[4] Y. Liu, D. Zhang, G. Lu, W.-Y. Ma, A survey of content-based image retrieval with high-level semantics, Pattern Recognition 40 (11) (2007) 262–282.
[5] T. Quack, U. Monich, L. Thiele, B.S. Manjunath, Cortina: a system for large-scale, content-based web image retrieval, in: Proceedings of the 12th Annual ACM International Conference on Multimedia, 2004.
[6] J.R. Smith, S.-F. Chang, VisualSEEk: a fully automated content-based image query system, in: ACM Multimedia, Boston, MA, 1996, pp. 87–98.
[7] J.Z. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (9) (2001) 947–963.
[8] F. Monay, D. Gatica-Perez, Modeling semantic aspects for cross-media image indexing, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (10) (2007) 1802–1817.
[9] R. Marée, P. Geurts, L. Wehenkel, Content-based image retrieval by indexing random subwindows with randomized trees, ACCV 4844 (2007) 611–620.
[10] A. Singhal, J. Luo, W. Zhu, Probabilistic spatial context models for scene content understanding, Proceedings of Computer Vision and Pattern Recognition 1 (1) (2003) 1235–1241.
[11] N. Vasconcelos, Image indexing with mixture hierarchies, Proceedings of Computer Vision and Pattern Recognition 1 (1) (2001) 1–10.
[12] X. He, W.-Y. Ma, H.-J. Zhang, Learning an image manifold for retrieval, in: Proceedings of the 12th Annual ACM International Conference on Multimedia, 2004.
[13] S.C.H. Hoi, R. Jin, J. Zhu, M.R. Lyu, Semi-supervised SVM batch mode active learning and its applications to image retrieval, ACM Transactions on Information Systems (TOIS) 27 (3) (2009) 1–29.
[14] R. Zhang, Z. Zhang, Effective image retrieval based on hidden concept discovery in image database, IEEE Transactions on Image Processing 16 (2) (2007) 562–572.
[15] L. Wu, Y. Hu, M. Li, N. Yu, X.-S. Hua, Scale invariant visual language modeling for object categorization, IEEE Transactions on Multimedia 11 (2) (2009) 286–294.
[16] J. Amores, N. Sebe, P. Radeva, Context-based object-class recognition and retrieval by generalized correlograms, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (10) (2007) 1818–1833.
[17] Y. Chi, M.K.H. Leung, Part-based object retrieval in cluttered environment, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (5) (2007) 890–895.
[18] Y. Chi, M.K.H. Leung, ALSBIR: a local-structure-based image retrieval, Pattern Recognition 40 (1) (2007) 244–261.
[19] N. Alajlan, M.S. Kamel, G.H. Freeman, Geometry-based image retrieval in binary image databases, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (6) (2008) 1003–1013.
[20] G.-H. Liu, J.-Y. Yang, Image retrieval based on the texton co-occurrence matrix, Pattern Recognition 41 (12) (2008) 3521–3527.
[21] S. Kiranyaz, M. Ferreira, M. Gabbouj, A generic shape/texture descriptor over multiscale edge field: 2-D walking ant histogram, IEEE Transactions on Image Processing 17 (3) (2008) 377–390.
[22] C.-H. Yao, S.-Y. Chen, Retrieval of translated, rotated and scaled color textures, Pattern Recognition 36 (4) (2003) 913–929.
[23] T. Gevers, A.W.M. Smeulders, PicToSeek: combining color and shape invariant features for image retrieval, IEEE Transactions on Image Processing 9 (1) (2000) 102–119.
[24] A.K. Jain, A. Vailaya, Image retrieval using color and shape, Pattern Recognition 29 (8) (1996) 1233–1244.
[25] A.K. Jain, A. Vailaya, Shape-based retrieval: a case study with trademark image database, Pattern Recognition 31 (9) (1998) 1369–1390.
[26] B.S. Manjunath, J.-R. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Transactions on Circuits and Systems for Video Technology 11 (6) (2001) 703–715.
[27] F. Mahmoudi, J. Shanbehzadeh, et al., Image retrieval based on shape similarity by edge orientation autocorrelogram, Pattern Recognition 36 (8) (2003) 1725–1736.
[28] D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
[29] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (4) (2002) 509–522.
[30] G. Mori, S. Belongie, J. Malik, Efficient shape matching using shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (11) (2005) 1832–1837.
[31] M. Banerjee, M.K. Kundu, Edge based features for content based image retrieval, Pattern Recognition 36 (11) (2003) 2649–2661.
[32] S. Di Zenzo, A note on the gradient of a multi-image, Computer Vision, Graphics, and Image Processing 33 (1) (1986) 116–125.
[33] T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision 43 (1) (2001) 29–44.
[34] J. Winn, A. Criminisi, T. Minka, Object categorization by learned universal visual dictionary, in: Proceedings of the 10th IEEE International Conference on Computer Vision, 2005, pp. 1800–1807.
[35] A. Diplaros, T. Gevers, I. Patras, Combining color and shape information for illumination-viewpoint invariant object recognition, IEEE Transactions on Image Processing 15 (1) (2006) 1–11.
[36] http://www.ux.uis.no/~tranden/brodatz.html
[37] http://www.outex.oulu.fi/index.php?page=image_database
[38] J. Luo, D. Crandall, Color object detection using spatial-color joint probability functions, IEEE Transactions on Image Processing 15 (6) (2006) 1443–1453.
[39] M.J. Swain, D.H. Ballard, Color indexing, International Journal of Computer Vision 7 (1) (1991) 11–32.

About the Author—GUANG-HAI LIU is currently an associate professor with the College of Computer Science and Information Technology, Guangxi Normal University, China. He received the Ph.D. degree from the School of Computer Science and Technology, Nanjing University of Science and Technology (NUST). His current research interests are in the areas of image processing, pattern recognition and artificial intelligence.

About the Author—LEI ZHANG received the B.S. degree in 1995 from Shenyang Institute of Aeronautical Engineering, Shenyang, PR China, and the M.S. and Ph.D. degrees in Electrical Engineering from Northwestern Polytechnical University, Xi'an, PR China, in 1998 and 2001, respectively. From 2001 to 2002, he was a research associate in the Department of Computing, The Hong Kong Polytechnic University. From January 2003 to January 2006 he worked as a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, McMaster University, Canada. Since January 2006, he has been an Assistant Professor in the Department of Computing, The Hong Kong Polytechnic University. His research interests include image and video processing, biometrics, pattern recognition, multi-sensor data fusion and optimal estimation theory.

About the Author—YING-KUN HOU is currently a Ph.D. candidate with the School of Computer Science and Technology, Nanjing University of Science and Technology (NUST). He is also a lecturer with the School of Information Science and Technology, Taishan University. His current research interests include image processing, digital watermarking and pattern recognition.

About the Author—ZUO-YONG LI received the B.S. degree in computer science and technology from Fuzhou University in 2002 and the M.S. degree in computer science and technology from Fuzhou University in 2006. He is now a Ph.D. candidate at Nanjing University of Science and Technology, China. His research interests include image segmentation and pattern recognition.

About the Author—JING-YU YANG received the B.S. degree in Computer Science from Nanjing University of Science and Technology (NUST), China. From 1982 to 1984 he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994 he was a visiting professor at the Department of Computer Science, Missouri University; in 1998, he worked as a visiting professor at Concordia University in Canada. He is currently a professor and chairman of the Department of Computer Science at NUST. He is the author of over 100 scientific papers in computer vision, pattern recognition and artificial intelligence. He has won more than 20 provincial and national awards. His current research interests are in the areas of image processing, robot vision, pattern recognition and artificial intelligence.
