Image Classification For Content-Based Indexing
Abstract—Grouping images into (semantically) meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Using binary Bayesian classifiers, we attempt to capture high-level concepts from low-level image features under the constraint that the test image does belong to one of the classes. Specifically, we consider the hierarchical classification of vacation images; at the highest level, images are classified as indoor or outdoor; outdoor images are further classified as city or landscape; finally, a subset of landscape images is classified into sunset, forest, and mountain classes. We demonstrate that a small vector quantizer (whose optimal size is selected using a modified MDL criterion) can be used to model the class-conditional densities of the features, required by the Bayesian methodology. The classifiers have been designed and evaluated on a database of 6931 vacation photographs. Our system achieved a classification accuracy of 90.5% for indoor/outdoor, 95.3% for city/landscape, 96.6% for sunset/forest & mountain, and 96% for forest/mountain classification problems. We further develop a learning method to incrementally train the classifiers as additional data become available. We also show preliminary results for feature reduction using clustering techniques. Our goal is to combine multiple two-class classifiers into a single hierarchical classifier.

Index Terms—Bayesian methods, content-based retrieval, digital libraries, image content analysis, minimum description length, semantic indexing, vector quantization.

Manuscript received February 8, 1999; revised August 11, 2000. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Tsuhan Chen.
A. Vailaya is with Agilent Technologies, Palo Alto, CA 94303-0867 USA (e-mail: [email protected]).
M. Figueiredo is with the Instituto de Telecomunicações and Instituto Superior Técnico, 1049-001 Lisboa, Portugal (e-mail: [email protected]).
A. K. Jain is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]).
H.-J. Zhang is with Microsoft Research China, Beijing 100 080, China (e-mail: [email protected]).
I. INTRODUCTION

CONTENT-BASED image retrieval has emerged as an important area in computer vision and multimedia computing. Many organizations have large image and video collections (programs, news segments, games, art) in digital format, available for on-line access. Organizing these libraries into categories and providing effective indexing is imperative for "real-time" browsing and retrieval. With the development of digital photography, more and more people are able to store vacation and personal photographs on their computers. As an example, travel agencies are interested in digital archives of photographs of holiday resorts; a user could query these databases to plan a vacation. However, in order to make these databases more useful, we need to develop schemes for indexing and categorizing the humungous data.

Several content-based image retrieval systems have been recently proposed: QBIC [5], Photobook [26], SWIM [44], Virage [10], Visualseek [36], Netra [17], and MARS [20]. These systems follow the paradigm of representing images using a set of attributes, such as color, texture, shape, and layout, which are archived along with the images. Retrieval is performed by matching the features of a query image with those in the database. Users typically do not think in terms of low-level features, i.e., user queries are typically semantic (e.g., "show me a sunset image") and not low-level (e.g., "show me a predominantly red and orange image"). As a result, most of these image retrieval systems have poor performance for (semantically) specific queries. For example, Fig. 1(b) shows the top-ten retrieved images (based on color histogram features) from a database of 2145 images of city and landscape scenes, for the query in Fig. 1(a). While the query image has a monument, some of the retrieved images have mountain and coast scenes. Recent research in human perception of image content [21], [24], [27], [31] suggests the importance of semantic cues for efficient retrieval. One method to decode human perception is through the use of relevance feedback mechanisms [33]. A second method relies on grouping the images into semantically meaningful classes. Fig. 1(c) shows the top-ten results (again based on color histograms) on a database of 760 city images for the same query; clearly, filtering out landscape images improves the retrieval result.

As shown in Fig. 1(a)–(c), a successful indexing/categorization of images greatly enhances the performance of content-based retrieval systems by filtering out irrelevant classes. This rather difficult problem has not been adequately addressed in current image database systems. The main problem is that only low-level features (as opposed to higher level features such as objects and their inter-relationships) can be reliably extracted from images. For example, color histograms are easily extracted from color images, but the presence of sky, trees, buildings, people, etc., cannot be reliably detected. The main challenge, thereby, lies in grouping images into semantically meaningful categories based on low-level visual features. One attempt to solve this problem is the hierarchical indexing scheme proposed in [45], [46], which performs clustering based on color and texture, using a self-organizing map. This indexing scheme was further applied in [16] to create a texture thesaurus for indexing a database of aerial photographs. However, the success of such clustering-based schemes is often limited, largely due to the low-level feature-based representation of image content. For example, Fig. 2(a)–(d) shows two images and their corresponding edge direction coherence feature vectors (see [42]). Although,
Fig. 1. Color-based retrieval. (a) Query image, (b) top-ten retrieved images
from 2145 city and landscape images, and (c) top-ten retrieved images from 760
city images; filtering out landscape images prior to querying clearly improves
the retrieval results.
codebook size from the training samples. Advantages of the Bayesian approach include:
1) a small number of codebook vectors represents each class, thus greatly reducing the number of comparisons necessary for each classification;
2) it naturally allows for the integration of multiple features through the class-conditional densities;
3) in addition to a classification rule, we have degrees of confidence which may be used to incorporate a reject option into the classifiers.

The paper is organized as follows. Section II briefly mentions psychophysical studies which are the basis of our work in identifying the global scene represented in an image. We also describe our experiments with human subjects to identify conceptual classes in a database of vacation images. After reviewing the Bayesian framework for image classification in Section III, Section IV addresses VQ-based density estimation and the MDL principle for selecting codebook sizes. Section V discusses implementation issues. We report the classification accuracies in Section VI. Sections VII and VIII discuss approaches for incremental learning and automatic feature selection. Finally, Section IX concludes the paper and presents directions for future research.

II. HIGH-LEVEL CLASSES IDENTIFIED BY HUMANS

Psychophysical and psychological studies have shown that scene identification by humans can proceed, in certain cases, without any kind of object identification [1], [2], [34]. Biederman [1], [2] suggested that an arrangement of volumetric primitives (geons), each representing a prominent object in the scene, may allow rapid scene identification independently of local object identification. Schyns and Oliva [34] demonstrated that scenes can be identified from low spatial-frequency images that preserve the spatial relations between large-scale structures in the scene, but which lack the visual detail to identify local objects. These results suggest the possibility of coarse scene identification from global low-level features before the identity of objects is established. Based on these observations, we address the problem of scene identification as the first step toward building semantic indices into image databases.

The first step toward building a classifier is to identify meaningful image categories which can be automatically identified by simple and efficient pattern recognition techniques. For this purpose, we conducted a simple small-scale experiment in which eight human subjects classified 171 vacation images [42]. Our goal was to identify a hierarchy of classes into which the vacation images can be organized. Since these classes match human perception, they allow organizing the database for effective browsing and retrieval.

Our experiments revealed a total of 11 semantic categories: forests and farmlands, mountains, beach scenes, pathways, sunset/sunrise images, long distance city shots, streets/buildings, monuments/towers, shots of Washington, DC, miscellaneous images, and faces. We organized these 11 categories into the hierarchy shown in Fig. 3(a). The first four classes (forests, mountains, beach scenes, and pathways) were grouped into the class natural scenes. Natural scenes and sunset images were further grouped into the landscape class. City shots, monuments, and shots of Washington, DC were grouped into the city class. Finally, the miscellaneous, face, landscape, and city classes were grouped into the top-level class of vacation scenes. We conducted additional experiments to verify that the above hierarchy is reasonable: we used a multidimensional scaling algorithm to generate a three-dimensional (3-D) feature space to embed the 171 images from the dissimilarity matrix used above (generated from the user groupings). We then applied a k-means clustering algorithm to partition the (3-D) data. Our goal was to verify whether the main clusters in this representation space agreed with the hierarchy shown in Fig. 3(a). For k = 2, we obtained two clusters of 62 and 109 images, respectively. The first cluster consisted of predominantly city images, while the second cluster contained landscape images. The following clusters were obtained with k = 4:
1) city scenes (70 images);
2) sunrise/sunset images (21 images);
3) forest and farmland scenes and pathways (49 images);
4) mountain and coast scenes (31 images).
These groupings motivated us to study a hierarchical classification of vacation images.
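The embedding-and-clustering step described above can be prototyped in a few lines. The sketch below is only illustrative: it assumes a precomputed 171 × 171 dissimilarity matrix (ours came from the user groupings) and uses scikit-learn's MDS and KMeans implementations, which are not necessarily the tools used in our experiments.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def embed_and_cluster(D, n_clusters, n_dims=3, seed=0):
    """Embed an (n x n) dissimilarity matrix D into n_dims dimensions with
    metric MDS, then partition the embedded points with k-means."""
    mds = MDS(n_components=n_dims, dissimilarity='precomputed', random_state=seed)
    X = mds.fit_transform(D)                                   # (n, n_dims) embedding
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return X, labels

# Example (hypothetical data): D = np.load('dissimilarity.npy')
# X, labels_k2 = embed_and_cluster(D, n_clusters=2)   # coarse city/landscape split
# X, labels_k4 = embed_and_cluster(D, n_clusters=4)   # the four finer groups
```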
In order to make the problem more tractable, we simplified the classification hierarchy as shown in Fig. 3(b). The solid lines show the classification problems addressed in this paper. This hierarchy is not complete, e.g., a user may be interested in images captured in the evening or images containing faces. However, it is a reasonable approach to simplify the image retrieval problem.

Another limitation of the proposed hierarchy is that the leaf nodes are not mutually exclusive. For example, an image can belong to both the city and sunset categories. One way to address this issue is to develop individual classifiers such as city/non-city or sunset/non-sunset, instead of a hierarchy. However, this would drastically increase the complexity of the classification task (now we will have to identify city scenes from all possible scenes, rather than differentiate between city and landscape scenes).

Most images can be classified as representing indoor or outdoor scenes. Exceptions include close-ups and pictures of a window or door. Outdoor images can be further divided into city or landscape [40], [42]. City scenes can be characterized by the presence of man-made objects and structures such as buildings, cars, and roads. Natural scenes, on the other hand, lack these structures. A subset of landscape images can be further classified into one of the sunset, forest, and mountain classes. Sunset scenes are characterized by saturated colors (red, orange, or yellow), forest scenes have a predominantly green color distribution, and mountain scenes can be characterized by long distance shots of mountains (either snow covered or barren plateaus).

We assume that the input images do belong to one of the classes under consideration. This restriction is imposed because automatically rejecting images that do not belong to any of the classes, based on low-level image features alone, is in itself a very difficult problem (see Fig. 2).
However, for images belonging to the classes of interest, the Bayesian methodology can be used to reject ambiguous images based on the confidence values associated with the images (images that belong to both classes of interest, such as an image of a city scene at sunset). We briefly discuss incorporating the reject option in Section VI-F.
III. BAYESIAN FRAMEWORK

Bayesian methods have been successfully adopted in many image analysis and computer vision problems. However, their use in content-based retrieval from image databases is just being realized [43].

We now review the Bayesian framework for image classification. The set of possible images is partitioned into $C$ classes, $\Omega = \{\omega_1, \ldots, \omega_C\}$; any image belongs to one and only one class. The images from class $\omega_i$ are modeled as samples of a random variable whose class-conditional probability density function is $p(\mathbf{x} \mid \omega_i)$. Each class has an a priori probability $P(\omega_i)$, with $\sum_{i=1}^{C} P(\omega_i) = 1$. A loss function, $L(\omega_i, \omega_j)$, specifies the loss incurred when class $\omega_i$ is chosen and the true class is $\omega_j$. As is common in classification problems, we adopt the "0/1" loss function: $L(\omega_i, \omega_i) = 0$, and $L(\omega_i, \omega_j) = 1$, if $i \neq j$.

In most image classification problems, the decision is based on, say $n$, feature sets, $\mathbf{y}_1, \ldots, \mathbf{y}_n$, rather than directly on the raw pixel values. Of course, each $\mathbf{y}_j$ is a function of the image $\mathbf{x}$. We will then have class-conditional densities for the features, rather than for the raw images. It is often assumed that the feature sets are class-conditionally independent, that is

$p(\mathbf{y}_1, \ldots, \mathbf{y}_n \mid \omega_i) = \prod_{j=1}^{n} p(\mathbf{y}_j \mid \omega_i), \qquad \text{for } i = 1, \ldots, C. \qquad (1)$

The classification problem can be stated as: "given the feature sets $\mathbf{y}_1, \ldots, \mathbf{y}_n$, classify the image into one of the classes in $\Omega$." The decision rule resulting from the "0/1" loss function is the maximum a posteriori (MAP) criterion [4], [29],

$\hat{\omega} = \arg\max_{\omega_i \in \Omega} \; P(\omega_i) \prod_{j=1}^{n} p(\mathbf{y}_j \mid \omega_i). \qquad (2)$

In addition to the MAP classification, we also have a degree of confidence, which is proportional to the a posteriori probability $P(\hat{\omega} \mid \mathbf{y}_1, \ldots, \mathbf{y}_n)$.
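As a concrete illustration of (1) and (2), the following sketch combines per-feature class-conditional densities under the independence assumption and returns the MAP class together with a normalized posterior used as a confidence value. The function name and the way densities are passed in are our own conventions for this sketch, not part of the system described in this paper.

```python
import numpy as np

def map_classify(feature_vectors, class_priors, class_densities):
    """MAP classification under class-conditional feature independence.

    feature_vectors : list of n feature vectors [y_1, ..., y_n] for one image
    class_priors    : dict {class_name: P(class)}
    class_densities : dict {class_name: [p_1, ..., p_n]}, where p_j(y) returns
                      the class-conditional density of the j-th feature.
    Returns the MAP class and its (normalized) posterior probability.
    """
    scores = {}
    for c, prior in class_priors.items():
        log_score = np.log(prior)
        for y, density in zip(feature_vectors, class_densities[c]):
            log_score += np.log(density(y) + 1e-300)   # (1): product of densities
        scores[c] = log_score
    # (2): pick the class maximizing P(class) * prod_j p(y_j | class)
    best = max(scores, key=scores.get)
    posts = np.exp(np.array(list(scores.values())) - max(scores.values()))
    confidence = posts[list(scores.keys()).index(best)] / posts.sum()
    return best, confidence
```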
IV. DENSITY ESTIMATION BY VECTOR QUANTIZATION

The performance of a Bayes classifier depends critically on the ability of the features to discriminate among the various classes. Moreover, since the class-conditional densities have to be estimated from data, the accuracy of these estimates is also critical. Choosing the right set of features is a difficult problem to which we return in Section V-A. In this section, we focus on estimating the class-conditional densities, adopting a vector quantization approach [9].

A. Introduction to Vector Quantization

For compression and communication applications, a vector quantizer (VQ) is described as a combination of an encoder and a decoder [8]. A $d$-dimensional VQ consists of two mappings: an encoder $\gamma$, mapping the input alphabet $\mathcal{A} \subseteq \mathbb{R}^d$ to the channel symbol set $\mathcal{M} = \{1, \ldots, m\}$, and a decoder $\beta$, which maps $\mathcal{M}$ to the output alphabet (or codebook) $\hat{\mathcal{A}} = \{\mathbf{c}_1, \ldots, \mathbf{c}_m\}$. A distortion measure $\rho(\mathbf{y}, \hat{\mathbf{y}})$ specifies the cost associated with quantization, where $\hat{\mathbf{y}} = \beta(\gamma(\mathbf{y}))$ is the quantized version of the input $\mathbf{y}$. An optimal quantizer minimizes the average distortion under a size constraint on $\hat{\mathcal{A}}$ [8]. The generalized Lloyd algorithm (GLA) is an iterative algorithm for obtaining a (locally) optimal VQ. Under a mean square error (MSE) distortion criterion, the GLA is equivalent to the k-means clustering algorithm [11]. Any given input vector $\mathbf{y}$ is quantized into the closest (in the sense of $\rho$) of the $m$ codebook vectors. This defines a partition of the space into the so-called Voronoi cells $V_1, \ldots, V_m$ [8]. A comprehensive study of VQ can be found in [3], [8].

B. Vector Quantization for Density Estimation

Vector quantization provides an efficient tool for density estimation [9]. Consider $N$ training samples from a class $\omega_i$. In order to estimate the class-conditional density of the $j$th feature vector, $p(\mathbf{y}_j \mid \omega_i)$, VQ is used to obtain $m$ (with $m < N$, usually $m \ll N$) codebook vectors, $\mathbf{c}_1, \ldots, \mathbf{c}_m$, from the training data.¹ In the so-called high-resolution approximation (i.e., for small Voronoi cells), this density can be approximated by a piecewise-constant function over each cell $V_l$, with value

$p(\mathbf{y} \mid \omega_i) \simeq \frac{r_l}{v_l}, \qquad \text{for } \mathbf{y} \in V_l \qquad (3)$

where $r_l$ and $v_l$ are the ratio of training samples falling into cell $V_l$ and the volume of cell $V_l$, respectively (see [9]). This approximation fails if the cells are not sufficiently small, for example, when the dimensionality of $\mathbf{y}$ is large. In that case, the class-conditional densities can be approximated using a mixture of Gaussians [9], [43], each centered at a codebook vector. The MSE criterion is the sum of the squared Euclidean distances of each training sample from its closest codebook vector. From a mixture point of view, this is equivalent to assuming covariance matrices of the form $\sigma^2\mathbf{I}$ (where $\mathbf{I}$ is the identity) [43], leading to

$p(\mathbf{y} \mid \omega_i) \simeq \sum_{l=1}^{m} \alpha_l \, \mathcal{N}(\mathbf{y} \mid \mathbf{c}_l, \sigma^2\mathbf{I}) \qquad (4)$

where $\alpha_l = r_l$, for $l = 1, \ldots, m$ (note that $\sum_{l=1}^{m} \alpha_l = 1$). The value of $\sigma^2$ is not estimated by the VQ algorithm, and so we empirically choose it for each feature. Alternatively, we could use the EM algorithm to directly find maximum likelihood (ML) estimates of the mixture parameters, under a diagonal covariance constraint [19]. This choice is computationally demanding, and we have found that the value of $\sigma^2$ is not crucial; it simply affects the number of codebook vectors that influence classification. Unless $\sigma^2$ is exceptionally large, only a few codebook vectors close to the input pattern influence the class-conditional probabilities.

¹Actually, learning vector quantization (LVQ) is used to select the codebook vectors. LVQ does not run the GLA separately for each class; in this algorithm, the codebook vectors are also "pushed away" from incorrectly classified samples (see [14], [29]).
C. Selecting Codebook Size

Selecting $m$ is a key issue in using a VQ, or a mixture, for density representation. We start by noting that the GLA approximately looks for the maximum likelihood (ML) estimates of the parameters of the mixture in (4). In fact, the EM algorithm becomes exactly equivalent to the GLA when the variance goes to zero [29]. We will therefore apply an MDL criterion to select $m$, since MDL allows extending maximum likelihood (ML) estimation to situations where the dimension of the model is unknown [30].

Consider a training set of $N$ independent samples, $\mathcal{Y} = \{\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(N)}\}$, from the class $\omega_i$. These are, of course, samples of one of the features, although here we omit this from the notation to keep it simpler. A direct application of the standard MDL criterion would lead to the following criterion to select $m$ [the size of the mixture in (4)]

$\hat{m} = \arg\min_{m} \left\{ -\log p(\mathcal{Y} \mid \hat{\theta}_{(m)}) + \frac{1}{2} \dim(\theta_{(m)}) \log N \right\}$

where $\hat{\theta}_{(m)}$ is the ML estimate assuming size $m$, and $\dim(\theta_{(m)})$ is the number of real-valued parameters needed to specify an $m$-component mixture (with $\dim(\cdot)$ denoting "dimension of") [30]. Notice that the additional term proportional to $\dim(\theta_{(m)})$ grows with $m$, thus counterbalancing the unbounded increase, with $m$, of the likelihood. The $(1/2)\log N$ penalty paid by each additional real parameter has an asymptotical justification (see [30]). For a mixture, however, it can be argued that each center does not "see" $N$ data points, but only (on average) $N\hat{\alpha}_l$ (for the $l$th center) (see [15] and [6] for details). This leads to the following modified MDL (MMDL) criterion

$\hat{m} = \arg\min_{m} \left\{ -\log p(\mathcal{Y} \mid \hat{\theta}_{(m)}) + \frac{\dim(\theta_{(m)})}{2m} \sum_{l=1}^{m} \log\left(N \hat{\alpha}_l\right) \right\}. \qquad (5)$
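The sketch below shows how such a criterion can be evaluated in practice: for each candidate $m$, fit the mixture of (4), compute the log-likelihood of the training data, add the MMDL penalty, and keep the minimizer. It follows the per-component penalty described above and reuses the spherical-Gaussian mixture of the previous sketch; the parameter count per component and the use of scikit-learn's KMeans are assumptions of this illustration, not the exact procedure used to produce Fig. 4.

```python
import numpy as np
from sklearn.cluster import KMeans

def mmdl_cost(train, m, sigma):
    """Negative log-likelihood of the VQ-based mixture (4) plus the MMDL penalty:
    each of the per-component parameters pays (1/2) log(N * alpha_l) instead of
    the standard (1/2) log N of MDL."""
    N, d = train.shape
    km = KMeans(n_clusters=m, n_init=10, random_state=0).fit(train)
    alpha = np.bincount(km.labels_, minlength=m) / N
    # log-likelihood under the spherical-Gaussian mixture centered at the codebook
    sq = ((train[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
    comp = alpha * np.exp(-0.5 * sq / sigma**2) / ((2 * np.pi * sigma**2) ** (d / 2))
    loglik = np.log(comp.sum(axis=1) + 1e-300).sum()
    params_per_component = d + 1                      # center + mixing weight (assumed)
    penalty = 0.5 * params_per_component * np.sum(np.log(np.maximum(N * alpha, 1.0)))
    return -loglik + penalty

def select_codebook_size(train, candidates, sigma):
    """Return the candidate m minimizing the MMDL cost, with all costs."""
    costs = {m: mmdl_cost(train, m, sigma) for m in candidates}
    return min(costs, key=costs.get), costs
```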
V. IMPLEMENTATION ISSUES

Experiments were conducted on two databases (both independently and combined) of 5081 (indoor/outdoor classification) and 2716 (city/landscape classification and further classification of landscape images) images. The two databases, henceforth referred to as D1 and D2, have 866 images in common, leading to a total of 6931 distinct images, collected from various sources (Corel library, scanned personal photographs, key frames from TV serials, and images downloaded from the Web) and of varying sizes. The color images are stored with 24 bits per pixel in JPEG format. The ground truth for all the images was assigned by a single subject.

A. Image Features

Outdoor images tend to have uniform spatial color distributions; for example, the sky is at the top and is typically blue. Indoor images tend to have more varied color distributions and more uniform lighting (most are close-up shots). Thus, it seems logical that spatial color distribution can discriminate between indoor and outdoor images. On the other hand, shape features may not be useful because objects with similar shapes can be present in both indoor and outdoor scenes. Therefore, we use spatial color information features to represent these qualitative attributes. Specifically, first- and second-order moments in the LUV color space were used as color features (it was pointed out in [7] that moments in this color space yield better results in image retrieval than moments in other color spaces). The image was divided into subblocks and six features (three means and three standard deviations) were extracted from each subblock [37], [41]. As another set of features for indoor/outdoor classification, we extract subblock MSAR texture features as described in [18], [39].

We looked for similar qualitative attributes for city/landscape classification, and for the further classification of landscape images. City images usually have strong vertical and horizontal edges due to the presence of man-made objects. Non-city images tend to have randomly distributed edge directions. The edge direction distribution thus seems a natural feature to discriminate between these two categories [42]. On the other hand, color features would not have sufficient discriminatory power, as man-made objects have arbitrary colors. In the case of further classification of landscape images as sunset, forest, or mountain, global color distributions seem to adequately describe these classes. Sunset pictures typically have saturated colors (mostly yellow and red); mountain images tend to have the sky in the background (typically blue); and forest scenes tend to have more greenish distributions. Based on the above observations, we use edge direction features (histograms and coherence vectors) for city/landscape classification, and color features (histograms, coherence vectors, and spatial moments) in two color spaces for further classification of landscape images [25], [38], [42]. Table I summarizes the qualitative attributes of the various classes and the features used to represent them.
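For concreteness, the sketch below computes simplified versions of the two kinds of global features discussed above: per-block first- and second-order color moments and an edge direction histogram. The grid size, bin count, crude edge detector, and the use of OpenCV are assumptions of this illustration, not the exact parameters or implementation of our system.

```python
import numpy as np
import cv2

def spatial_color_moments(img_bgr, grid=10):
    """Mean and standard deviation of each Luv channel over a grid x grid
    partition of the image (six numbers per block)."""
    luv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2Luv).astype(np.float32)
    h, w, _ = luv.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = luv[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            feats.extend(block.reshape(-1, 3).mean(axis=0))   # first-order moments
            feats.extend(block.reshape(-1, 3).std(axis=0))    # second-order moments
    return np.array(feats)

def edge_direction_histogram(img_bgr, bins=72):
    """Histogram of gradient directions at edge pixels (plus the fraction of
    non-edge pixels); a simplified stand-in for the edge direction features."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    edges = mag > mag.mean() + mag.std()                # crude edge detector
    angles = np.arctan2(gy[edges], gx[edges])
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    hist = hist / max(edges.sum(), 1)
    return np.append(hist, 1.0 - edges.mean())
```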
B. Vector Quantization

We used the LVQ_PAK package [14] for vector quantization. Half of the database was used to train the LVQ for each of the image features. The MMDL criterion (Section IV-C) was used to determine the codebook sizes. For the indoor and outdoor classes, with the spatial color moment features, Fig. 4(a)–(c) plots the MMDL cost function [(5)] versus the codebook size $m$. These plots show the codebook sizes selected by the MMDL criterion for the indoor class, the outdoor class, and the combination of the two classes, respectively. To confirm this choice from a classification point of view, Fig. 5 plots the accuracy of the indoor/outdoor classifier (on an independent test set of size 2540) as a function of the total codebook size. As the codebook size is initially increased, the classifier accuracy improves. However, it soon stabilizes, and further increasing the codebook size beyond 30 does not improve the accuracy. This confirms the choice made by the MMDL criterion.
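Footnote 1 notes that LVQ both attracts and repels codebook vectors depending on whether a training sample is correctly classified. A minimal LVQ1-style pass is sketched below; the learning-rate schedule and initialization are simplified relative to what LVQ_PAK provides, so this is an illustration of the idea rather than a substitute for the package.

```python
import numpy as np

def lvq1_train(codebook, codebook_labels, samples, sample_labels,
               lr=0.05, epochs=20):
    """Plain LVQ1: pull the nearest codebook vector toward a sample of the
    same class, push it away otherwise."""
    cb = codebook.copy()
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)                   # linearly decaying rate
        for x, y in zip(samples, sample_labels):
            j = np.argmin(((cb - x) ** 2).sum(axis=1))       # nearest codevector
            if codebook_labels[j] == y:
                cb[j] += rate * (x - cb[j])                  # attract
            else:
                cb[j] -= rate * (x - cb[j])                  # repel
    return cb
```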
TABLE I
QUALITATIVE ATTRIBUTES OF THE SEVERAL CLASSIFICATION PROBLEMS AND ASSOCIATED LOW-LEVEL FEATURES
TABLE III
ACCURACIES (IN PERCENT) FOR INDOOR/OUTDOOR CLASSIFICATION USING
COLOR MOMENTS; TEST SET 1 AND TEST SET 2 ARE INDEPENDENT TEST SETS
TABLE IV
CLASSIFICATION ACCURACIES (IN PERCENT) FOR CITY/LANDSCAPE CLASSIFICATION; THE FEATURES ARE ABBREVIATED AS FOLLOWS: EDGE DIRECTION
HISTOGRAM (EDH), EDGE DIRECTION COHERENCE VECTOR (EDCV), COLOR HISTOGRAM (CH), AND COLOR COHERENCE VECTOR (CCV)
Fig. 7. Subset of the misclassified (a) city images and (b) landscape images using a combination of edge direction coherence vector and color histogram features. The corresponding confidence values (in percent) associated with the true class are indicated.

were misclassified (a classification accuracy of 96%) when the spatial color moment features were used. Again, the combinations of features did not perform better than the color features, showing that these features are adequate for this problem. Note that the spatial color moment features and the color coherence vector features yield similar accuracies for the classification of landscape images. However, the database of 528 images is too small to identify the best color feature for the classification of landscape images. Using color coherence vector features increases the complexity of the classifiers.
D. Error Propagation in Hierarchical Classification

The goal of hierarchical classification is to break a complex problem into simpler problems. However, since each classifier is not perfect, the errors from a classifier located higher up in the tree are propagated to the lower levels.

The indoor/outdoor image classifier yielded an accuracy of 90.5% on the entire database of 6931 images (658 images were misclassified). Of these, 269 were indoor images, out of which 229 were close-ups of people and pets. Out of the remaining 40 images, three were classified as landscape images and 37 were classified as city images. Fig. 8 shows these three images.
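To make the compounding of errors explicit, consider a back-of-the-envelope calculation using the accuracies reported in the abstract and assuming, purely for illustration, that the errors of the two stages are independent: a landscape image must first be routed correctly by the indoor/outdoor classifier (90.5%) and then labeled correctly by the city/landscape classifier (95.3%), so the probability of it being correctly classified overall is roughly

$P(\text{correct}) \approx 0.905 \times 0.953 \approx 0.86.$

The independence assumption is ours; the calculation simply illustrates why each stage of the hierarchy must be highly accurate for the combined classifier to remain useful.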
E. Feature Saliency

The accuracy of the individual classifiers depends on the underlying low-level representation of the images. For example, the edge direction and coherence vector features yield accuracies of about 60% for the indoor/outdoor problem, yet they yield approximately 95% accuracy for the city/landscape problem. This shows the importance of feature definition and selection. We have empirically determined that:
1) spatial color moment features are better for indoor/outdoor classification;
2) edge direction histograms and coherence vector features have sufficient discrimination power for city/landscape classification;
3) color moments, histograms, and coherence vectors are more suited for the classification of landscape images.
F. Reject Option

Introducing a reject option is useful, yet a difficult problem in image classification. For Bayesian classifiers, the simplest strategy is to reject images whose maximum a posteriori probability is below a threshold. Table VII shows the accuracies for the indoor/outdoor and city/landscape image classifiers with the reject option. The indoor/outdoor classifier used spatial color moment features and was trained on 2541 images from database D1 and tested on the entire set (6931 images). The classification accuracy improved from 90.5% (no rejection) to 92.1% at a 5.4% reject rate. The city/landscape classifier used edge direction coherence vector features; it was trained on 1358 images from database D2 and tested on the complete database (2716 images). The classification accuracy improved from 95.0% (no rejection) to 95.7% at a 2.1% reject rate. There
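The thresholding rule described above is a one-line extension of MAP classification. The sketch below wraps any classifier returning a label and a confidence (for instance, the hypothetical map_classify helper sketched in Section III); the threshold value shown is arbitrary, not the one used to produce Table VII.

```python
def classify_with_reject(classify_fn, *args, threshold=0.9):
    """Wrap a MAP classifier with the reject rule: return None when the
    posterior confidence of the winning class falls below the threshold."""
    label, confidence = classify_fn(*args)
    if confidence < threshold:
        return None, confidence      # ambiguous image: leave it unclassified
    return label, confidence
```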
TABLE V
CLASSIFICATION ACCURACIES (IN PERCENT) FOR SUNSET/FOREST/MOUNTAIN CLASSIFICATION; SPM STANDS FOR “SPATIAL COLOR MOMENTS”
TABLE VI
CLASSIFICATION ACCURACIES (IN PERCENT) FOR FOREST/MOUNTAIN CLASSIFICATION
TABLE VII
CLASSIFIER PERFORMANCE UNDER A REJECT OPTION
above learning paradigm, the new data will unduly influence the current value of the codebook vectors. Learning with this small amount of new data will in fact lead to unlearning of the distribution based on previous samples. Table IX demonstrates the results of training the indoor/outdoor classifier using only the new data. The indoor/outdoor classifier was initially trained on 1418 images and yielded an accuracy of 79.8% on an independent test set of 2540 images. When the classifier is further trained with 350 new images, the performance on the independent test set deteriorates to 63.7%. When the classifier is further trained on an additional 773 samples using the naive approach, the accuracy on the test set slightly recovers to 72.5%. Note that when all the available data were used (1418 + 350 + 773 = 2541 images), the accuracy on the independent test set was 88.2% (Table VIII). These results show that any robust incremental learning scheme must assign an appropriate weight to the already learnt distribution.

TABLE IX
NAIVE APPROACH TO INCREMENTALLY TRAINING A CLASSIFIER. ACCURACIES ARE REPORTED ON AN INDEPENDENT TEST SET OF SIZE 2540
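The excerpt above only shows why naive retraining fails; the weighting idea it points to can be illustrated as follows. Each codebook vector keeps a count of the samples it has already summarized, and new data update it as a weighted running mean, so a small new batch cannot erase the previously learned distribution. This is a generic sketch of sample-count weighting, not the incremental learning scheme evaluated in the paper.

```python
import numpy as np

def incremental_codebook_update(centers, counts, new_data):
    """Update codebook vectors with a new batch, weighting old and new samples.

    centers  : (m, d) current codebook vectors
    counts   : (m,) number of samples each vector has summarized so far
    new_data : (n, d) newly available training samples
    """
    centers = centers.copy()
    counts = counts.astype(float)
    for y in new_data:
        j = np.argmin(((centers - y) ** 2).sum(axis=1))   # nearest codevector
        counts[j] += 1.0
        centers[j] += (y - centers[j]) / counts[j]        # weighted running mean
    return centers, counts
```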
[14] T. Kohonen, J. Kangas, J. Laaksonen, and K. Torkkola, "LVQ PAK: A program package for the correct application of learning vector quantization algorithms," in Proc. Int. Joint Conf. Neural Networks, Baltimore, MD, June 1992, pp. 725–730.
[15] M. Figueiredo, J. Leitão, and A. K. Jain, "On fitting mixture models," in Energy Minimization Methods in Computer Vision and Pattern Recognition, E. Hancock and M. Pelillo, Eds. Berlin, Germany: Springer-Verlag, 1999.
[16] W. Y. Ma and B. S. Manjunath, "Image indexing using a texture dictionary," in Proc. SPIE Conf. Image Storage Archiving Systems, vol. 2606, Philadelphia, PA, Oct. 1995, pp. 288–298.
[17] W. Y. Ma and B. S. Manjunath, "Netra: A toolbox for navigating large image databases," in Proc. IEEE Int. Conf. Image Processing, vol. 1, Santa Barbara, CA, Oct. 1997, pp. 568–571.
[18] J. Mao and A. K. Jain, "Texture classification and segmentation using multiresolution simultaneous autoregressive models," Pattern Recognit., vol. 25, no. 2, pp. 173–188, 1992.
[19] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley, 1997.
[20] S. Mehrotra, Y. Rui, M. Ortega, and T. S. Huang, "Supporting content-based queries over images in MARS," in Proc. IEEE Int. Conf. Multimedia Computing Systems, ON, Canada, June 3–6, 1997, pp. 632–633.
[21] T. P. Minka and R. W. Picard, "Interactive learning using a society of models," Pattern Recognit., vol. 30, no. 4, p. 565, 1997.
[22] T. Mitchell, Machine Learning. New York: McGraw-Hill, 1997.
[23] P. M. Narendra and K. Fukunaga, "A branch and bound algorithm for feature subset selection," IEEE Trans. Comput., vol. 26, pp. 917–922, Sept. 1977.
[24] T. V. Papathomas, T. E. Conway, I. J. Cox, J. Ghosn, M. L. Miller, T. P. Minka, and P. N. Yianilos, "Psychophysical studies of the performance of an image database retrieval system," in Proc. IS&T/SPIE Conf. Human Vision Electronic Imaging III, San Jose, CA, July 1998, pp. 591–602.
[25] G. Pass, R. Zabih, and J. Miller, "Comparing images using color coherence vectors," in Proc. 4th ACM Conf. Multimedia, Boston, MA, Nov. 1996, http://simon.cs.cornell.edu/Info/People/rdz/rdz.html.
[26] A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Content-based manipulation of image databases," Proc. SPIE Storage Retrieval Image Video Databases II, pp. 34–47, Feb. 1994.
[27] R. W. Picard and T. P. Minka, "Vision texture for annotation," Multimedia Syst., vol. 3, pp. 3–14, 1995.
[28] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognit. Lett., vol. 15, pp. 1119–1125, Nov. 1994.
[29] B. Ripley, Pattern Recognition and Neural Networks. Cambridge, U.K.: Cambridge Univ. Press, 1996.
[30] J. Rissanen, Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific, 1989.
[31] B. E. Rogowitz, T. Frese, J. Smith, C. A. Bouman, and E. Kalin, "Perceptual image similarity experiments," in Proc. IS&T/SPIE Conf. Human Vision Electronic Imaging III, San Jose, CA, July 1998, pp. 576–590.
[32] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 23–38, Jan. 1998.
[33] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, "Relevance feedback: A power tool for interactive content-based image retrieval," IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 644–655, Sept. 1998.
[34] P. G. Schyns and A. Oliva, "From blobs to boundary edges: Evidence for time and spatial scale dependent scene recognition," Psychol. Sci., vol. 5, pp. 195–200, 1994.
[35] Y. Singer and M. Warmuth, "A new parameter estimation method for Gaussian mixtures," in Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds. Cambridge, MA: MIT Press, 1999.
[36] J. R. Smith and S. F. Chang, "Visualseek: A fully automated content-based image query system," in Proc. ACM Multimedia, Boston, MA, Nov. 1996, pp. 87–98.
[37] M. Stricker and A. Dimai, "Color indexing with weak spatial constraints," in Proc. SPIE Storage Retrieval Image Video Databases IV, San Jose, CA, Feb. 1996, pp. 29–41.
[38] M. J. Swain and D. H. Ballard, "Color indexing," Int. J. Comput. Vis., vol. 7, no. 1, pp. 11–32, 1991.
[39] M. Szummer and R. W. Picard, "Indoor-outdoor image classification," in IEEE Int. Workshop Content-Based Access Image Video Databases (in conjunction with ICCV'98), Bombay, India, Jan. 1998.
[40] A. Vailaya, M. Figueiredo, A. Jain, and H.-J. Zhang, "A Bayesian framework for semantic classification of outdoor vacation images," in Proc. SPIE Storage Retrieval Image Video Databases VII, vol. 3656, San Jose, CA, Jan. 1999, pp. 415–426.
[41] A. Vailaya, M. Figueiredo, A. Jain, and H.-J. Zhang, "Content-based hierarchical classification of vacation images," in Proc. IEEE Multimedia Systems '99, vol. 1, Florence, Italy, June 7–11, 1999, pp. 518–523.
[42] A. Vailaya, A. K. Jain, and H. J. Zhang, "On image classification: City images vs. landscapes," Pattern Recognit., vol. 31, no. 12, pp. 1921–1936, 1998.
[43] N. Vasconcelos and A. Lippman, "Library-based coding: A representation for efficient video compression and retrieval," in Data Compression Conf. '97, Snowbird, UT, 1997.
[44] H. J. Zhang, C. Y. Low, S. W. Smoliar, and J. H. Wu, "Video parsing retrieval and browsing: An integrated and content-based solution," in Proc. ACM Multimedia '95, San Francisco, CA, Nov. 5–9, 1995, pp. 15–24.
[45] H. J. Zhang and D. Zhong, "A scheme for visual feature based image indexing," in Proc. SPIE Conf. Storage Retrieval Image Video Databases, San Jose, CA, Feb. 1995, pp. 36–46.
[46] D. Zhong, H. J. Zhang, and S.-F. Chang, "Clustering methods for video browsing and annotation," in Proc. SPIE Storage Retrieval Image Video Databases IV, San Jose, CA, Feb. 1996, pp. 239–246.

Aditya Vailaya (A'00) received the B.Tech. degree from the Indian Institute of Technology, Delhi, in 1994 and the M.S. and Ph.D. degrees from Michigan State University, East Lansing, in 1996 and 2000, respectively.
He joined Agilent Laboratories, Palo Alto, CA, in May 2000, where he is currently applying pattern recognition techniques for decision support in bioscience research. His research interests include pattern recognition and classification, machine learning, image and video databases, and image understanding.
Dr. Vailaya received the Best Student Paper Award from the IEEE International Conference on Image Processing in 1999.

Mário A. T. Figueiredo (S'87–M'95) received the E.E., M.S., and Ph.D. degrees in electrical and computer engineering, all from the Higher Institute of Technology [Instituto Superior Técnico (IST)], Technical University of Lisbon, Lisbon, Portugal, in 1985, 1990, and 1994, respectively.
Since 1994, he has been an Assistant Professor with the Department of Electrical and Computer Engineering, IST. He is also a Researcher with the Communication Theory and Pattern Recognition Group, Institute of Telecommunications, Lisbon. In 1998, he held a visiting position with the Department of Computer Science and Engineering, Michigan State University, East Lansing. His scientific interests are in the fields of image analysis, computer vision, statistical pattern recognition, and information theory.
Dr. Figueiredo received the Portuguese IBM Scientific Prize in 1995.