Image Retrieval Using Colour Co-occurrence Histograms
Linjiang Yu and Georgy Gimel’farb
CITR, Department of Computer Science, Tamaki Campus,
The University of Auckland, New Zealand
[email protected], [email protected]
Abstract
Content-based image retrieval (CBIR) has been intensively studied in recent years due to its importance in various database management and computer vision applications. Searching by an image example, that is, retrieving a given image or similar images from a large image collection, is one of the most challenging CBIR problems today. This paper proposes and investigates a new algorithm for a partial solution of this problem. The algorithm uses combined colour-texture features to find out whether an image contains spatially homogeneous colour textured regions similar to a given example (training image). First, quantisation of the HSV colour space focuses only on the colours to be found during the search. Secondly, similarity between characteristic normalised colour co-occurrence histograms (nCCHs) in moving windows over the image and the corresponding training nCCHs is measured to detect the desired regions. Finally, the frequency distributions of the similarity values are compared in order to rank the images in the database by their similarity to the training image. Our experiments show that the proposed algorithm effectively retrieves images containing the desired textures.
Keywords: content-based image retrieval (CBIR), texture, colour co-occurrence histogram (CCH), pairwise
interaction
1 Introduction

Image retrieval has important practical applications in database management and computer vision (see, for instance, the comprehensive surveys in [1, 2, 3]). An effective image retrieval system should integrate both text-based [4, 5] and content-based image retrieval (CBIR) techniques. Limitations of the former are the large amount of human labour needed for manual image annotation and the subjectivity of human perception of rich image contents. The latter techniques try to overcome these limitations by relating image content to particular quantitative image features such as colour, texture, shape of objects, and so on. The retrieval problem is then formulated as searching a large pictorial database for images having features coinciding with, or closely similar to, those of a given example (training image).

Most of the recently developed CBIR techniques are based on colour and texture features. Popular colour features include a pixel-wise colour histogram [6], colour moments [7], and colour set vectors [8, 9]. Experiments in [7] have shown that the moment-based features sometimes perform better than the others.

Texture features describing spatial signal interrelations are also widely used in image retrieval. Most such features measure statistical properties of grey level co-occurrences in particular subsets of pixels, e.g., contrast, inverse difference moment, entropy and several other properties of a grey level co-occurrence matrix in [10, 11], statistics for a group of such matrices reflecting a characteristic structure of pairwise pixel interactions [12], or statistics of coefficients of the wavelet or Gabor image transforms [13, 14]. Today's CBIR usually exploits the first three of the six perceptually meaningful textural features derived from the co-occurrence statistics in [15], namely, image coarseness, contrast, directionality, linelikeness, regularity, and roughness.

Combinations of particular features sometimes result in better performance [16, 17, 18]. Thus one might expect that combined colour-texture features can in principle enhance the current CBIR techniques. This paper proposes and investigates an algorithm that retrieves a given training image and/or similar images from a large image collection by comparing characteristic subsets of normalised colour co-occurrence histograms (nCCHs) collected over small windows of a fixed size around each pixel. The characteristic subsets are selected using a generic Gibbs random field (GGRF) texture model proposed initially in [19, 20] for greyscale spatially homogeneous textures.

2 Retrieval Algorithm: Basic Steps

The proposed algorithm first converts the original RGB colour space into the HSV (Hue, Saturation, Value)
one and quantises the latter in order to reduce the data volume while preserving most of the training colours. Then the GGRF model is used for selecting the pixel neighbourhood which is most characteristic of the training image and for collecting the corresponding training nCCHs both over the entire image (to describe it quantitatively) and in moving windows of a fixed size centred at each pixel. These latter nCCHs are used to select a similarity threshold for detecting pixels that might belong to the desired texture. The threshold relates to the minimum similarity between the local and global training nCCHs. After the candidate texture regions are selected, their similarity to the training texture is measured by forming an empirical distribution of distances between the local (window-based) nCCHs for the image at hand and the global (entire-image-based) training nCCHs and comparing it to the like empirical training distribution.

Let both the training sample and all the images in an image database be already converted into the HSV colour space. Then the basic steps of the algorithm are as follows:

1. Collect the colour set by quantising the training sample.

2. Choose the most characteristic nCCHs for the training sample.

3. Find the empirical distribution of the distances between the training local and global nCCHs.

4. Select the maximum distance as the distance threshold.

5. For each image in the database,

   (a) Select the candidate colour pixels in the image using the above training colour set.

   (b) Find the candidate texture regions by thresholding the distances between the local nCCHs in the moving window around each candidate colour pixel and the global training nCCHs.

   (c) Calculate the distance between the empirical distribution of the nCCH-based distances over the candidate texture region and the like empirical training distribution.

6. Retrieve the database images with the bottom-rank distances being below a certain threshold.
2.1 Colour Space Quantisation

The RGB colour space is the most common format for digital images, while the HSV (Hue, Saturation, Value) model is more attractive in CBIR applications because of its perceptually meaningful independent channels. To match human colour perception, which is more tolerant to saturation and value deviations, the quantised images should preserve more hue levels compared to the other two channels. The quantisation also helps to keep the computing time reasonable.

Let $\mathbf{G} = \{ g_i = (g_{i,h}, g_{i,s}, g_{i,v}) : i = 1, \ldots, M;\ g_i \in \mathbf{Q}_{hsv} \}$, where $\mathbf{Q}_{hsv} = \mathbf{Q}_h \times \mathbf{Q}_s \times \mathbf{Q}_v$, denote a digital colour image in the 3D HSV vector colour space. Here, $g_i$ is the colour vector for the image position $i$, the latter being a shorthand notation for the 2D integer Cartesian coordinates $i = (x, y)$, and $\mathbf{Q}_h$, $\mathbf{Q}_s$, $\mathbf{Q}_v$ are the finite integer sets of the colour component values: $\mathbf{Q}_h = \{0, \ldots, Q_h\}$, $\mathbf{Q}_s = \{0, \ldots, Q_s\}$, $\mathbf{Q}_v = \{0, \ldots, Q_v\}$. Due to the specific bi-conic form of the perceived HSV space, the quantised colours involve $Q_h + 1$ hue levels, $Q_s$ saturation levels, and $Q_v$ value levels plus $Q_v + 1$ pure grey levels, so that the total number of colours after quantisation is $\tau = (Q_h + 1) Q_s Q_v + Q_v + 1$. For example, if $Q_h = 17$, $Q_s = 3$, $Q_v = 3$, then $\tau = 166$.

Let $\mathbf{G}^{\mathrm{tr}} = \{ g^{\mathrm{tr}}_j = (g^{\mathrm{tr}}_{j,h}, g^{\mathrm{tr}}_{j,s}, g^{\mathrm{tr}}_{j,v}) : j = 1, \ldots, M^{\mathrm{tr}};\ g^{\mathrm{tr}}_j \in \mathbf{Q}_{hsv} \}$ denote the training image, and let the colour set $\Phi$ comprise all the colours contained in the training sample $\mathbf{G}^{\mathrm{tr}}$. Then a candidate image $\mathbf{G}' = \{ g'_i : i = 1, \ldots, M;\ g'_i \in \Phi \}$ can be obtained from the image $\mathbf{G}$ to be searched as follows:

$g'_i = g_i$ if $g_i \in \Phi$ or it is sufficiently close to that set;

$g'_i = (0, 0, Q_v)$ otherwise (the white background).

The candidate image $\mathbf{G}'$ that keeps all the colours similar to those in the training sample is used at the next step. In order to reduce computing time, the training sample $\mathbf{G}^{\mathrm{tr}}$ and all the candidate images $\mathbf{G}'$ are represented by index images with respect to the colour set $\Phi$, so that the images themselves need not be changed. Let the index set of $\Phi$ be $\mathbf{Q} = \{0, \ldots, Q\}$. Then the index images of the training sample $\mathbf{G}^{\mathrm{tr}}$ and the candidate image $\mathbf{G}'$ are $\hat{\mathbf{G}}^{\mathrm{tr}} = \{ \hat{g}^{\mathrm{tr}}_j : j = 1, \ldots, M^{\mathrm{tr}};\ \hat{g}^{\mathrm{tr}}_j \in \mathbf{Q} \}$ and $\hat{\mathbf{G}}' = \{ \hat{g}'_i : i = 1, \ldots, M;\ \hat{g}'_i \in \mathbf{Q} \}$, respectively.
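To make the quantisation and the candidate-image construction concrete, the following minimal Python sketch (with NumPy) illustrates the colour count $\tau$, the candidate-pixel rule, and the index images. The function names are ours, and the "sufficiently close" test is assumed here to be a simple Euclidean tolerance in the quantised HSV grid, since the paper does not fix a particular closeness criterion.

```python
import numpy as np

def total_quantised_colours(Qh, Qs, Qv):
    # tau = (Qh + 1) * Qs * Qv + Qv + 1 colours after HSV quantisation
    return (Qh + 1) * Qs * Qv + Qv + 1

assert total_quantised_colours(17, 3, 3) == 166  # the example from the text

def candidate_image(image_hsv, train_hsv, Qv, tol=0.0):
    """Keep pixels whose quantised colour is in (or near) the training
    colour set Phi; paint the rest with the white background (0, 0, Qv).
    Both inputs are integer (H, W, 3) arrays of quantised (h, s, v)."""
    phi = np.unique(train_hsv.reshape(-1, 3), axis=0)   # the colour set Phi
    pix = image_hsv.reshape(-1, 1, 3).astype(float)
    # distance of every pixel to its nearest training colour
    # (an assumed Euclidean closeness criterion)
    d = np.linalg.norm(pix - phi[None, :, :], axis=2).min(axis=1)
    out = image_hsv.reshape(-1, 3).copy()
    out[d > tol] = (0, 0, Qv)                           # white background
    return out.reshape(image_hsv.shape), phi

def index_image(img_hsv, phi):
    """Map each pixel colour to its index in Phi; colours outside Phi
    (the white background) get the sentinel index len(phi)."""
    lookup = {tuple(c): k for k, c in enumerate(phi)}
    flat = img_hsv.reshape(-1, 3)
    return np.array([lookup.get(tuple(p), len(phi)) for p in flat],
                    dtype=int).reshape(img_hsv.shape[:2])
```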
2.2 Pixel-wise Similarity Measure

Spatially homogeneous greyscale image textures can be modelled as samples of a generic Gibbs random field (GGRF) with multiple pairwise pixel interactions [19, 20]. The characteristic geometric structure of the interactions and the Gibbs potentials giving the quantitative interaction strengths for a particular texture are analytically estimated from a training sample of the texture. The estimation yields a characteristic subset of pixel neighbours $\mathbf{A}$ specifying the most "energetic" translation invariant families of interacting pixel pairs, or cliques of the neighbourhood
graph: $C_a = \{ (i, i+a) : i = 1, \ldots, M;\ i+a \in \{1, \ldots, M\} \}$, $a \in \mathbf{A}$. In our case, each such clique family of the GGRF model is extended onto a colour texture by the corresponding CCH acting as a sufficient statistic.

Let $F_a(\mathbf{G}^{\mathrm{tr}}) = [ F_a(q, s \mid \mathbf{G}^{\mathrm{tr}}) : q, s \in \mathbf{Q} ]$ and $F_{a,i}(\mathbf{G}) = [ F_{a,i}(q, s \mid \mathbf{G}) : q, s \in \mathbf{Q} ]$ denote the global nCCH for the clique family $C_a$ over the training sample $\mathbf{G}^{\mathrm{tr}}$ and the like local nCCH over the moving window $W$ around a position $i$ in the image $\mathbf{G}$, respectively [21]. Experiments with different textures show that the symmetric $\chi^2$-distance between these two nCCHs,

$$D_{a,i}\big(F_{a,i}(\mathbf{G}), F_a(\mathbf{G}^{\mathrm{tr}})\big) = \sum_{q, s \in \mathbf{Q}} \frac{\big( F_{a,i}(q, s \mid \mathbf{G}) - F_a(q, s \mid \mathbf{G}^{\mathrm{tr}}) \big)^2}{F_{a,i}(q, s \mid \mathbf{G}) + F_a(q, s \mid \mathbf{G}^{\mathrm{tr}})}, \qquad (1)$$

is much less scattered over the training sample than, for instance, the pixel-wise Gibbs energies or conditional probabilities of signals. Therefore, for the $A$ characteristic families, the pixel-wise similarity measure between $\mathbf{G}$ and $\mathbf{G}^{\mathrm{tr}}$ can be defined as follows:

$$D_i\big(F_i(\mathbf{G}), F(\mathbf{G}^{\mathrm{tr}})\big) = \frac{1}{A} \sum_{a \in \mathbf{A}} D_{a,i}\big(F_{a,i}(\mathbf{G}), F_a(\mathbf{G}^{\mathrm{tr}})\big).$$

In fact, most of the CCHs for the different clique families of a homogeneous texture have very similar patterns. Therefore the similarity measure does not change significantly when the number of characteristic families increases, and in our experiments below only the single most "energetic" family is used to calculate the distances.

The candidate texture regions in the image $\mathbf{G}'$ are detected pixel-by-pixel by thresholding all the distances $\{ D_i(F_i(\mathbf{G}'), F(\mathbf{G}^{\mathrm{tr}})) : i = 1, \ldots, M \}$ with the distance threshold $\xi = \max\{ D_j(F_j(\mathbf{G}^{\mathrm{tr}}), F(\mathbf{G}^{\mathrm{tr}})) : j = 1, \ldots, M^{\mathrm{tr}} \}$. The detected regions are of a similar texture type to the training sample with respect to the pixel-wise CCH-based similarity measure.
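As an illustration, a minimal sketch of the nCCH and the symmetric $\chi^2$-distance of Eq. (1) follows, assuming index images with values in $\mathbf{Q} = \{0, \ldots, Q\}$ and a single clique family given by a non-negative offset $a = (dy, dx)$; the names and the restriction to non-negative offsets are ours, chosen for brevity.

```python
import numpy as np

def ncch(index_img, offset, Q):
    """Normalised CCH for one clique family: the relative frequencies of
    index pairs (q, s) co-occurring at the given offset (dy, dx >= 0).
    Assumes the offset is smaller than the image and the pair set is
    non-empty."""
    dy, dx = offset
    H, W = index_img.shape
    first = index_img[:H - dy, :W - dx]      # pixel i
    second = index_img[dy:, dx:]             # its neighbour i + a
    hist = np.zeros((Q + 1, Q + 1))
    np.add.at(hist, (first.ravel(), second.ravel()), 1.0)
    return hist / hist.sum()

def chi2_sym(F1, F2):
    """Symmetric chi-square distance of Eq. (1); for normalised
    histograms its value lies in [0, 2]."""
    den = F1 + F2
    nz = den > 0                             # skip empty bins
    return float(np.sum((F1[nz] - F2[nz]) ** 2 / den[nz]))
```

The pixel-wise measure $D_i$ is then the average of chi2_sym over the chosen families, with the first histogram collected in the moving window $W$ around pixel $i$ and the second over the whole training sample.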
2.3 Region Similarity Measure

Let $\Omega_{\mathbf{G}'}$ be the candidate texture regions in the image $\mathbf{G}'$, let $\mathcal{D}(\Omega_{\mathbf{G}'}) = \{ D_k(F_k(\mathbf{G}'), F(\mathbf{G}^{\mathrm{tr}})) : k \in \Omega_{\mathbf{G}'} \}$ be the distance distribution over the candidate texture regions, and let $\mathcal{D}(\mathbf{G}^{\mathrm{tr}}) = \{ D_j(F_j(\mathbf{G}^{\mathrm{tr}}), F(\mathbf{G}^{\mathrm{tr}})) : j = 1, \ldots, M^{\mathrm{tr}} \}$ be the distance distribution over the training sample. The distance range in Eq. (1) is from 0 to 2 inclusive, so the relative frequency distributions of $\mathcal{D}(\mathbf{G}^{\mathrm{tr}})$ and $\mathcal{D}(\Omega_{\mathbf{G}'})$ can be estimated by quantising this range with a certain step $\Delta$: $\mathbf{U} = \{ 0, \Delta, 2\Delta, \ldots, U\Delta = 2 \}$. The symmetric $\chi^2$-distance between these two distributions, computed similarly to Eq. (1), gives the region similarity measure $\zeta$ such that $0 \le \zeta \le U$. The desired images to be retrieved have the top-rank $\zeta$-values after the database is sorted in ascending order of the distance, that is, the smaller the distance $\zeta$, the higher the rank of an image and the more likely its regions are similar to the training sample. A threshold $\bar{\zeta}$ on the region similarity measure $\zeta$ should be chosen to reject images with "inappropriate" candidate regions.
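A like sketch of the region similarity $\zeta$, assuming the pixel-wise distances have already been computed. Note that this relative-frequency version of the $\chi^2$-distance is bounded by 2, whereas the text states $0 \le \zeta \le U$, so the authors' exact normalisation of the binned distributions may differ from the one assumed here.

```python
import numpy as np

def region_similarity(region_dists, train_dists, delta=0.0125):
    """Region similarity zeta: symmetric chi-square distance between the
    binned distance distributions over the candidate region and over the
    training sample, with the [0, 2] distance range quantised by delta."""
    bins = np.arange(0.0, 2.0 + delta, delta)     # U = {0, delta, ..., 2}
    p = np.histogram(region_dists, bins=bins)[0].astype(float)
    q = np.histogram(train_dists, bins=bins)[0].astype(float)
    p /= max(p.sum(), 1.0)                        # relative frequencies
    q /= max(q.sum(), 1.0)
    den = p + q
    nz = den > 0
    return float(np.sum((p[nz] - q[nz]) ** 2 / den[nz]))
```

Images are then ranked by ascending $\zeta$, and those with $\zeta$ above the chosen threshold $\bar{\zeta}$ are rejected.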
3 Experimental Results and Conclusions

The experiments below use 167 colour images ($128 \times 128$) from the MIT Media Lab VisTex database. The other parameters of the algorithm are chosen as follows: the moving window is $17 \times 17$; the single ($A = 1$) most energetic clique family per training sample is selected among 1830 possible variants with inter-pixel $x$ and $y$ offsets of up to 30; and the quantising distance step is $\Delta = 0.0125$. The original VisTex texture patches ($64 \times 64$) selected as the training samples are shown in Fig. 1.

Figures 2-9 illustrate results of texture retrieval using the proposed algorithm and the HSV quantisation with $Q_h = 23$, $Q_s = 3$, $Q_v = 3$. Each example demonstrates the original VisTex texture patch selected as the training sample (a) and the four top-rank (in similarity) retrieved images with quantitatively similar texture regions (b)-(e). White regions in the images represent the rejected background whose colours differ much from the training ones. As one may expect, each image has the smallest region distance with respect to its own training patch.

Table 1 lists more results of colour texture retrieval. With the threshold $\bar{\zeta} = 1.75$, almost all the retrieved images are the expected ones, except for the training samples "Fabric.4", "Metal.0", and "Tile.0". The reason is that the retrieval accuracy depends on the number of colours of the training image in the colour set $\Phi$ specified by the HSV quantisation. Generally, texture discrimination becomes reasonably good for the chosen image database if the number of individual colours in the set $\Phi$ is greater than 16. Results of the colour texture retrieval for the training samples "Fabric.4" and "Stone.4" with different variants of the HSV quantisation are shown in Tables 2-3. This suggests that the set $\Phi$ has to be found by dynamically choosing the quantising levels for each training sample in order to obtain precise retrieval. Our experiments show that $16 \le |\Phi| \le 32$ is a suitable range for most of the images in our database. Meanwhile, an inappropriate set $\Phi$ can result in some unexpected retrieved images. For example, for the training sample "Brick.0" shown in Fig. 3, the totally different texture "Misc.2" emerges among the images retrieved with $Q_h = 23$, $Q_s = 3$, $Q_v = 3$ and $|\Phi| = 8$. However, when $|\Phi| = 17$ under the quantisation
with $Q_h = 27$, $Q_s = 5$, $Q_v = 5$, the texture "Misc.2" is excluded from the retrieved subset, and only the two textures "Brick.0" (ζ = 0.049) and "Brick.1" (ζ = 1.29) remain as the retrieved ones. These and our other experiments show that the proposed algorithm effectively retrieves images containing the desired textures.
Figure 1: The original VisTex texture patches (64 × 64) selected as the training samples: Bark.0, Bark.12, Brick.0, Buildings.1, Clouds.0, Fabric.0, Fabric.4, Fabric.8; Fabric.11, Fabric.13, Fabric.15, Fabric.17, Flowers.0, Flowers.4, Food.0, Food.6; Food.10, Grass.1, Leaves.1, Leaves.12, Metal.0, Misc.2, Paintings.1.0, Sand.0; Stone.1, Stone.4, Terrain.0, Tile.0, Tile.7, Water.0, Water.2, WheresWaldo.1, respectively.

Figure 2: (a) Training sample: Bark.12; (b)-(e) the retrieved four top-rank texture regions: Bark.12, ζ = 0.12; Bark.11, ζ = 0.22; Bark.10, ζ = 0.91; Bark.9, ζ = 1.49, respectively.

Figure 3: (a) Training sample: Brick.0; (b)-(e) the retrieved four top-rank texture regions: Brick.0, ζ = 0.054; Misc.2, ζ = 0.35; Brick.1, ζ = 0.44; Misc.3, ζ = 1.99, respectively.

Figure 4: (a) Training sample: Buildings.1; (b)-(e) the retrieved four top-rank texture regions: Buildings.1, ζ = 0.084; Buildings.2, ζ = 0.25; Buildings.4, ζ = 0.84; Buildings.10, ζ = 1.37, respectively.

Figure 5: (a) Training sample: Clouds.0; (b)-(e) the retrieved four top-rank texture regions: Clouds.0, ζ = 1.36; Brick.6, ζ = 1.82; Buildings.6, ζ = 2.73; Tile.7, ζ = 3.98, respectively.

Figure 6: (a) Training sample: Fabric.4; (b)-(e) the retrieved four top-rank texture regions: Fabric.6, ζ = 4.98; Fabric.4, ζ = 6.60; Food.5, ζ = 13.24; Fabric.5, ζ = 26.49, respectively.

Figure 7: (a) Training sample: Paintings.1.0; (b)-(e) the retrieved four top-rank texture regions: Paintings.1.0, ζ = 0.54; Paintings.31.1, ζ = 1.20; Leaves.9, ζ = 1.81; Paintings.1.1, ζ = 2.19, respectively.

Figure 8: (a) Training sample: Tile.0; (b)-(e) the retrieved four top-rank texture regions: Tile.0, ζ = 5.98; Metal.0, ζ = 14.35; Metal.3, ζ = 18.43; Fabric.5, ζ = 18.97, respectively.

Figure 9: (a) Training sample: Tile.7; (b)-(e) the retrieved four top-rank texture regions: Tile.7, ζ = 1.36; Clouds.0, ζ = 13.12; Buildings.6, ζ = 16.14; Brick.6, ζ = 16.40, respectively.
4 Acknowledgements

This work was supported in part by the Royal Society of New Zealand Marsden Fund under Grant 3600771/9143 (UOA 122).
Table 1: Colour texture retrieval from the MIT VisTex database with $Q_h = 23$, $Q_s = 3$, $Q_v = 3$ (for each training patch: the number of colours in $\Phi$ and the three top-rank retrieved textures with their region distances ζ).

Training patch (64×64) | Colours | 1st: ζ, texture      | 2nd: ζ, texture      | 3rd: ζ, texture
Bark.0                 | 14      | 0.40  Bark.0         | 5.59  Bark.2         | 7.72  Bark.1
Bark.12                | 10      | 0.12  Bark.12        | 0.22  Bark.11        | 0.91  Bark.10
Brick.0                | 8       | 0.054 Brick.0        | 0.35  Misc.2         | 0.44  Brick.1
Buildings.1            | 10      | 0.084 Buildings.1    | 0.25  Buildings.2    | 0.84  Buildings.4
Clouds.0               | 4       | 1.36  Clouds.0       | 1.82  Brick.6        | 2.73  Buildings.6
Fabric.0               | 10      | 0.53  Fabric.0       | 7.44  Fabric.1       | -
Fabric.4               | 9       | 4.98  Fabric.6       | 6.60  Fabric.4       | 13.24 Food.5
Fabric.8               | 12      | 0.095 Fabric.8       | 1.46  Fabric.9       | 1.48  Fabric.10
Fabric.11              | 10      | 0.14  Fabric.11      | 0.32  Fabric.12      | 10.98 Tile.9
Fabric.13              | 11      | 0.10  Fabric.13      | 0.21  Fabric.14      | 9.93  Fabric.4
Fabric.15              | 14      | 0.014 Fabric.15      | 0.031 Fabric.16      | 19.16 Paintings.41.1
Fabric.17              | 5       | 0.029 Fabric.17      | 1.17  Clouds.1       | 1.55  Brick.6
Flowers.0              | 25      | 0.012 Flowers.1      | 0.034 Flowers.0      | 5.12  Sand.2
Flowers.4              | 13      | 0.086 Flowers.4      | 0.23  Flowers.5      | 1.18  Flowers.7
Food.0                 | 12      | 0.169 Food.0         | 11.51 Grass.0        | 11.97 Tile.9
Food.6                 | 73      | 0.23  Food.6         | 0.31  Food.7         | 2.07  WheresWaldo.2
Food.10                | 14      | 0.35  Food.10        | 4.83  Food.11        | 19.68 Sand.4
Grass.1                | 13      | 0.57  Grass.1        | 2.13  Grass.2        | 9.65  Brick.7
Leaves.1               | 9       | 1.41  Leaves.1       | 5.88  Leaves.0       | 13.41 Bark.6
Leaves.12              | 13      | 1.47  Leaves.12      | 2.33  Leaves.13      | 13.77 Leaves.11
Metal.0                | 6       | 1.99  Metal.0        | 3.08  Metal.3        | 3.43  Metal.2
Misc.2                 | 6       | 0.023 Misc.2         | 1.21  Misc.3         | 1.49  Stone.5
Paintings.1.0          | 49      | 0.54  Paintings.1.0  | 1.20  Paintings.31.1 | 1.81  Leaves.9
Sand.0                 | 7       | 1.49  Sand.0         | 5.90  Sand.1         | 8.74  Sand.4
Stone.1                | 21      | 1.75  Stone.1        | 16.77 Brick.3        | 21.13 Sand.6
Stone.4                | 3       | 0.44  Water.5        | 0.44  Stone.4        | 0.79  Fabric.7
Terrain.0              | 11      | 0.31  Terrain.0      | 3.33  Terrain.9      | 3.51  Terrain.1
Tile.0                 | 9       | 5.98  Tile.0         | 14.35 Metal.0        | 18.43 Metal.3
Tile.7                 | 5       | 1.36  Tile.7         | 13.12 Clouds.6       | 16.14 Buildings.6
Water.0                | 6       | 0.11  Water.0        | 0.15  Water.7        | 0.19  Water.1
Water.2                | 10      | 0.47  Water.2        | -                    | -
WheresWaldo.1          | 26      | 0.049 WheresWaldo.1  | 5.69  Food.4         | 6.49  Leaves.14

Table 2: Colour texture retrieval in the VisTex database for the training sample "Fabric.4" with different quantisations of the HSV colour space.

HSV quantisation                  | Colours | 1st: ζ, texture | 2nd: ζ, texture | 3rd: ζ, texture
$Q_h = 23$, $Q_s = 3$, $Q_v = 3$  | 9       | 4.98 Fabric.6   | 6.60 Fabric.4   | 13.24 Food.5
$Q_h = 25$, $Q_s = 4$, $Q_v = 4$  | 14      | 1.19 Fabric.4   | 5.82 Fabric.5   | 8.99  Fabric.6
$Q_h = 27$, $Q_s = 5$, $Q_v = 5$  | 18      | 1.03 Fabric.4   | 8.13 Fabric.5   | 18.65 Food.5

Table 3: Colour texture retrieval in the VisTex database for the training sample "Stone.4" with different quantisations of the HSV colour space.

HSV quantisation                  | Colours | 1st: ζ, texture | 2nd: ζ, texture | 3rd: ζ, texture
$Q_h = 23$, $Q_s = 3$, $Q_v = 3$  | 3       | 0.44 Water.5    | 0.44 Stone.4    | 0.79 Fabric.7
$Q_h = 25$, $Q_s = 4$, $Q_v = 4$  | 8       | 0.54 Stone.4    | 0.93 Fabric.18  | 1.51 Brick.3
$Q_h = 27$, $Q_s = 5$, $Q_v = 5$  | 10      | 0.63 Stone.4    | 3.88 Fabric.7   | 4.14 Fabric.18
References

[1] Y. Rui, T. S. Huang, and S. Chang. Image retrieval: current techniques, promising directions and open issues. J. Visual Communication and Image Representation, 10(4):39-62, 1999.

[2] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content based image retrieval at the end of the early years. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.

[3] S. K. Chang and A. Hsu. Image information systems: Where do we go from here? IEEE Trans. Knowledge and Data Engineering, 4(5):431-442, 1992.

[4] N. S. Chang and K. S. Fu. Picture query languages for pictorial database systems. IEEE Computer Magazine, 14(11):23-33, 1981.

[5] S. K. Chang and T. L. Kunii. Pictorial data-base systems. IEEE Computer, 14(11):13-21, 1981.

[6] M. J. Swain and D. H. Ballard. Color indexing. Int. J. Computer Vision, 7(1):11-32, 1991.

[7] M. Stricker and M. Orengo. Similarity of color images. In Proc. SPIE Storage and Retrieval for Image and Video Databases, volume 2420, pages 381-392, 1995.

[8] J. R. Smith and S. Chang. Single color extraction and image query. In Proc. IEEE Int. Conf. Image Processing (ICIP'95), pages 528-531, 1995.

[9] J. R. Smith and S. Chang. Tools and techniques for color image retrieval. In Proc. SPIE Storage and Retrieval for Image and Video Databases, volume 2670, pages 426-437, 1996.

[10] C. C. Gotlieb and H. E. Kreyszig. Texture descriptors based on co-occurrence matrices. Computer Vision, Graphics, and Image Processing, 51(1):70-86, 1990.

[11] R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural features for image classification. IEEE Trans. Systems, Man and Cybernetics, 3(6):610-621, 1973.

[12] G. L. Gimel'farb and A. K. Jain. On retrieving textured images from an image data base. Pattern Recognition, 29(9):1461-1483, 1996.

[13] J. R. Smith and S. Chang. Transform features for texture classification and discrimination in large image databases. In Proc. IEEE Int. Conf. Image Processing (ICIP'94), volume 3, pages 407-411, 1994.

[14] J. R. Smith and S. Chang. Automated binary texture feature sets for image retrieval. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pages 2239-2242, 1996.

[15] H. Tamura, S. Mori, and T. Yamawaki. Texture features corresponding to visual perception. IEEE Trans. Systems, Man and Cybernetics, 8(6):460-473, 1978.

[16] M. H. Gross, R. Koch, L. Lippert, and A. Dreger. Multiscale image texture analysis in wavelet spaces. In Proc. Int. Conf. Image Processing (ICIP'94), volume 3, pages 412-416, 1994.

[17] A. Kundu and J. Chen. Texture classification using QMF bank-based subband decomposition. CVGIP: Graphical Models and Image Processing, 54(5):369-384, 1992.

[18] K. S. Thyagarajan, T. Nguyen, and C. E. Persons. A maximum likelihood approach to texture classification using wavelet transform. In Proc. IEEE Int. Conf. Image Processing (ICIP'94), volume 2, pages 640-644, 1994.

[19] G. L. Gimel'farb. Texture modeling by multiple pairwise pixel interactions. IEEE Trans. Pattern Analysis and Machine Intelligence, 18:1110-1114, 1996.

[20] G. L. Gimel'farb. Image Textures and Gibbs Random Fields. Kluwer Academic Publishers, Dordrecht, 1999.

[21] G. L. Gimel'farb and L. Yu. Separating a texture from an arbitrary background using pairwise grey level cooccurrences. In Proc. 4th Int. Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 306-322, Lisbon, Portugal, July 2003. Springer, Berlin. Lecture Notes in Computer Science 2683.