0% found this document useful (0 votes)

68 views25 pages

Relevance Feedback and Learning in Content-Based Image Search

A major bottleneck in content-based image retrieval systems is the gap between low-level image features and high-level semantic contents of images. One solution to this bottleneck is to apply relevance feedback to refine the query or similarity measures in image search process. In this paper, we present a framework of relevance feedback and semantic learning in CBIR.

Uploaded by

mrohaizat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

68 views25 pages

Relevance Feedback and Learning in Content-Based Image Search

Uploaded by

mrohaizat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 25

World Wide Web: Internet and Web Information Systems, 6, 131–155, 2003

 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Relevance Feedback and Learning in Content-Based

Image Search ∗
HONGJIANG ZHANG, ZHENG CHEN, MINGJING LI and ZHONG SU [email protected]
Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, China

Abstract

A major bottleneck in content-based image retrieval (CBIR) systems or search engines is the large gap between
low-level image features used to index images and high-level semantic contents of images. One solution to this
bottleneck is to apply relevance feedback to refine the query or similarity measures in image search process.
In this paper, we first address the key issues involved in relevance feedback of CBIR systems and present a
brief overview of a set of commonly used relevance feedback algorithms. Almost all of the previously proposed
methods fall well into such framework. We present a framework of relevance feedback and semantic learning
in CBIR. In this framework, low-level features and keyword annotations are integrated in image retrieval and in
feedback processes to improve the retrieval performance. We have also extended framework to a content-based
web image search engine in which hosting web pages are used to collect relevant annotations for images and
users’ feedback logs are used to refine annotations. A prototype system has developed to evaluate our proposed
schemes, and our experimental results indicated that our approach outperforms traditional CBIR system and
relevance feedback approaches.

Keywords: image retrieval, relevance feedback, machine learning, web mining

1. Introduction

The popularity of digital images is rapidly increasing due to improving digital imaging
technologies and convenient availability facilitated by the Internet. However, how to find
user-intended images from the Internet is still non-trivial. The main reason is that web
images are usually not well annotated using semantic descriptors. The development his-
tory of image retrieval systems features two stages. The first stage is keyword-based image
retrieval, which is summarized by Chang et al. [2]. Since manual image annotation is
a tedious process, it is practically impossible to annotate all the images on the Internet.
Furthermore, due to the multiplicity of contents in a single image and the subjectivity of
human perception, it is also difficult to make exactly the same annotations to the same
image by different users. These difficulties have limited the applications of the keyword-
based image retrieval technology. Having been actively researched on in the last decade
[6,30], content-based image retrieval (CBIR) attempts to automate the process of indexing
or annotating image in image databases. CBIR approaches work with descriptions based
on inherent properties of images, such as color, texture and shape. However, despite all

∗ This paper is based on the invited keynote that first author gave in VDB2002, Brisbane, Australia, May 2002.
132 ZHANG ET AL.

the research efforts, the retrieval accuracy of today’s CBIR algorithms is still very limited.
In addition to many other difficulties, the bottleneck is the gap between low-level image
features and semantic image contents. This problem stems from the fact that visual sim-
ilarity measures, such as color histograms, in general do not necessarily match semantics
of images human subjectivity. Also, each type of visual feature tends to capture only one
aspect of image property and it is usually hard for a user to specify clearly how different
aspects are combined to form an optimal query. To make the problem even worse, people
often have different semantic interpretations of the same image. Even the same person
may have different perception about the same image at different times. To address this
bottleneck, interactive relevance feedback techniques have been proposed. The key idea
is that we should incorporate human perception subjectivity into the retrieval process and
provide users opportunities to evaluate retrieval results and automatically refine queries on
the basis of those evaluations. In the last few years, this research topic has become the
focus in CBIR research community.
Relevance feedback, originally developed for textual document retrieval [16], is a super-
vised active learning technique used to improve the effectiveness of information systems.
The main idea is to use positive and negative examples from the user to improve system
performance. For a given query, the system first retrieves a list of ranked images according
to a predefined similarity metrics. Then, the user marks the retrieved images as relevant
(positive examples) to the query or not (negative examples). The system will refine the
query based on the feedback and retrieves a new list of images and presents to user. Hence,
the key issue in relevance feedback is how to incorporate positive and negative examples
to refine the query and/or to adjust the similarity measure.
In this paper, we present a content-based image retrieval framework that integrate low-
level and semantic-based image similarities and support automated annotation through
learning from relevance feedback, and the extension of the framework in a web image
search engine. Instead of detailed description of the novel component algorithms, we fo-
cus our description on the key ideas in the framework. Details of the algorithms and the
framework implementation can be found in the reference [4,9,12,23,24]. Also, we want
the paper to serve as a reference on the current state of the art of CBIR relevance feed-
back research, a comprehensive survey is presented in this paper on relevance feedback
algorithms in terms of their natures and limitations.
There are many issues in relevance feedback approaches CBIR, such as learning
schemes, feature selection, index structure and scalability. Instead of giving an exhaus-
tive survey of each published relevance feedback algorithms for CBIR in term of their
advantages and limitations, we focus our discussions with the consideration that relevance
feedback in CBIR is a small sample machine learning problem and extend our description
in detail in respect to learning and searching natures of each algorithm. This is presented
in Section 2.
In Section 3, we present the integrated relevance feedback framework for framework
for CBIR. In this framework, while the user is interacting with the system by providing
feedbacks in a query session, a progressive learning process is activated to propagate the
keyword annotations from the labeled images to un-labeled images as the system refines
the retrieval. The knowledge learned in the relevance feedback sessions are accumulated
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 133

in a semantic network. In addition, a cross-modality query expansion scheme is imple-

mented to improve the retrieval performances significantly either a query is initiated with
a keyword or an example image.
The proposed framework has been further extended to a web image search system, as
presented in Section 4. In this extension, we combine visual features and text descriptors
initially extracted from web pages where images exist, such as image URLs, filenames,
page titles, ALT text, hyperlinks, and surrounding text. Those visual and texture features
build the document space model of images. However, the initial text descriptors are in
general are less accurate than annotating text and there are often mismatch between the
page author’s expression and the user’s understanding and expectation of the annotation.
To overcome these problems, we apply the proposed relevant feedback framework. Fur-
thermore, data mining technology is also applied on the user’s log of feedback to improve
the image retrieval performance in two aspects. Firstly, the original document space model
built from the images and the text content of the web pages can be analyzed to detect and
remove clutter and irrelevant text information. Secondly, the user space model, which is the
keyword vectors used by the users to represent images in the database, can be constructed
from the user’s log data of relevance feedback. The user space model is then combined
with the document space model to eliminate mismatch between the page author’s expres-
sion and the user’s understanding and expectation.

2. Relevance feedback algorithms

In this section, we review a set of relevance feedback approaches used in CBIR. The review
is focused on learning and searching natures of each relevance feedback algorithm as we
consider that relevance feedback in CBIR is a machine learning problem. We begin the
discussion by first providing an overview of classical relevance feedback approaches in
CBIR.

2.1. Classical algorithms

The early relevant feedback schemes for CBIR were mainly adopted from those developed
for classical textual document retrieval. These approaches can be classified into two ap-
proaches: query point movement (query refinement) and reweighting (similarity measure
refinement) [1]. Both of them were developed based on vector space model, the most
popular model used in information retrieval [20].
The query point movement method essentially tries to improve the estimate of the “ideal
query point” by moving it towards positive example points and away from bad example
points in the query space. There are various ways to update the query. The frequently used
technique to iteratively improve this estimation is the Rocchio’s formula given below for
sets of relevant documents DR and non-relevant documents DN given by the user [16]:

1 1
Q = αQ + β Di − γ Di , (1)
NR
NN
i∈DR i∈DN
134 ZHANG ET AL.

where α, β, and γ are suitable constants; NR and NN are the number of documents in
DR and DN , respectively. This technique is also referred as learning query vector. It was

implemented in the MARS system [18] by replacing the document vector with visual fea-
ture vectors. Experiments show that retrieval performance can be improved considerably
by using such relevance feedback approaches.
The basic idea behind the re-weighting method is to enhance the importance of the di-
mensions of a feature that help in retrieving the relevant images and reduce the importance
of those dimensions that hinder this process. This is achieved by updating the weights of
feature vectors in the distance metric. Considered a weighted metric defined as

D= ωj Xj(1) − Xj(2) . (2)
j ∈[N]

When an image of the query result is labeled as a positive example, the feature components
that contribute more similarity to the match is considered more important, while the com-
ponents with less contribution is considered to be less important. Therefore, the weight for
a feature component, ωi , is updated in the following way:

ωi = ωi · 1 + δ − δi , δ = f (Q) − f A+ j ,
(3)

where δ is the mean of δ. On the other hand, if an image is labeled as a negative exam-
ple, the feature components that contribute more to the match should be considered to be
depressed. That is, the weight is updated as:

ωi = ωi · 1 − δ + δi . (4)

This technique is also referred as learning the metric. This approach was implemented
proposed by Huang et al. [7]. The MARS system implemented a slight refinement to the
re-weighting method called the standard deviation method [18].
Instead of updating the individual components of a distance metric, we can also begin
with a set of predefined distance metrics and use relevance feedback to automatically select
the best one in the retrieval process. For instance, in ImageRover system [21], appropriate
Lp Minkowski distance metrics are automatically selected to minimize the mean distance
between the relevant images specified by the user.
Another relevance feedback approach, proposed by Minka and Picard, is to update the
query space by selecting feature models. It is assumed that each feature model has its
own strength in representing a certain aspect of image content, and thus, the best way for
effective content-based retrieval is to utilize “a society of models.” This approach uses a
learning scheme to dynamically determine which feature model or combination of models
is best for subsequent retrieval.
Recently, more computationally robust methods that perform global feature optimization
have been proposed. The MindReader retrieval system designed by Ishikawa et al. [8]
formulates a minimization problem on the parameter estimating process. Unlike traditional
retrieval systems whose distance function can be represented by ellipses aligned with the
coordinate axis, the MindReader system proposed a distance function that is not necessarily
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 135

aligned with the coordinate axis. Therefore, it allows for correlations between attributes in
addition to different weights on each component.
A further improvement over the MindReader approach is given in [17]. In this ap-
proach, optimal query estimation and weighting functions are derived in a unified frame-
work. Based on the minimization of total distances of positive examples from the revised
query, the weighted average and a whitening transform in the feature space were found to
be the optimal solutions. In more detail, assume that a query vector component qi corre-
sponds to the ith feature, an N element vector r = [r1 , . . . , rN ] represents the degree of
relevance for each of the N input training samples, and there is a set of N training vec-
tors xni for each feature i. It is derived that the ideal query vector qi∗ for feature i is the
weighted average of the training samples for feature i given by
R T Xi
qiT∗ = N , (5)
n=1 rn
where Xi is the N × Ki training sample matrix for feature i, obtained by stacking the N
training vectors xni into a matrix. It is interesting to note that the original query vector qi
does not appear in (5). This shows that the ideal query vector with respect to the feedbacks
is not influenced by the initial query.
The optimal weight matrix Wi∗ is given by
1/Ki −1
Wi∗ = det(Ci ) Ci , (6)
where Ci is the weighted covariance matrix of Xi . That is,
N
πn (xnir − qir )(xnis − qis )
Cirs = n=1 N , r, s = 1, . . . , Ki . (7)
n=1 πn
We can see from the above equations that the critical inputs into the system are training
vectors xni and the relevance matrix R. In this algorithm, initially, the user needs to input
these data to the system. Another issue with this algorithm is that negative examples are
not utilized in updating of query and similarity.

2.2. Relevance feedback as a learning process

Relevance feedback can be considered as a leaning problem – a user provides feedback

examples from the retrieval results of a query and system learns from such examples to
refine retrieval results. The original query-movement method represented by the Roc-
chio’s formula and reweighting method [16] is both simple learning methods. According
to Mitchell’s [15] definition, machine learning are concerned with the question of how to
construct computer programs that automatically improve with experience. In this view, any
task that could be improved with respect to certain performance measure based on some
experience can be considered the machine-learning task. In CBIR, relevance feedback is
a task to improve the retrieval performance and the experience here is feedback examples
provided by the users. Hence, classical machine-learning methods, such as decision tree
136 ZHANG ET AL.

learning [13], artificial neural networks [10], Bayesian learning [5,27], and kernel based
learning [26] can be and have been applied to relevance feedbacks in CBIR. However, as
users are usually reluctant to provide a large number of feedback examples, the number of
training samples is very small, typically less than ten in each round of feedback session. On
the contrary, feature dimensions in CBIR systems are usually high. Hence, the crucial is-
sue in performing relevance feedback in CBIR systems is how to learn from small training
samples in a very high dimension feature space. This fact makes many learning methods,
such as decision tree learning and artificial neural networks, not suitable for CBIR.
The key issues in addressing relevance feedback in CBIR as a small sample learning
problem include: How to learn fast from small sets of feedback samples to improve re-
trieval accuracy effectively; How to accumulate knowledge learned from feedback; and
How to integrate low-level visual and high-level semantic features in query. However, most
of the published works have been focused on the first issue. Compared with other learn-
ing methods, Bayesian learning shows its advantages in addressing the first issue above
and almost all aspects of Bayesian learning have been touched in researching for effective
learning algorithms.
Vasconcelos and Lippman [27] treated feature distribution as a Gaussian mixture and
used Bayesian inference for learning during feedback iterations in a query session. Richer
information captured by the mixture model also makes image regional matching possible.
The potential problems of their methods are computing efficiency and complex data model
that leads to too many parameters need to be estimated with very limited samples.
To speed up the learning process so the retrieval result can be converged faster to user’s
satisfaction, active learning methods have been used to actively select samples in order to
achieve the maximal information gain, or the minimized entropy/uncertainty in decision-
making. The approached proposed in [5] used Monte Carlo sampling in search of the
set of sample that will minimize the expected number of future iterations. In estimating
the expected number of future iterations, entropy is used as an estimate of the number of
future iterations under the ambiguity specified by the current probability distribution of the
target image over all test images. Tong and Chang [26] proposed a SVM active learning
algorithm to select the sample to maximally reduce the size of the vector space in which
the class boundary lies. Without knowing a priori the class of a candidate, the best strategy
is to halve the search space each time. They attempted to justify that selecting the points
near the SVM boundary can approximately achieve this goal, and it is more efficient than
other more sophisticated schemes, which require exhaustive trials on all the test items.
Therefore, in their work, the points near the SVM boundary are used to approximate the
most-informative points; and the most-positive images are chosen as the ones farthest from
the boundary on the positive side in the feature space.
Some researchers consider relevance feedback process in CBIR as a pattern recognition
or classification problem. Under such a consideration, the positive and negative exam-
ples provided by user can be treated as training examples and a classifier could be trained.
Then, such classifier can separate all data set into relevant and irrelevant groups. It seemed
that many existing pattern recognition tools could be adopted for this task and many kinds
of classifiers have been experimented, such as linear classifier [29], nearest-neighbor clas-
sifier [28], Bayesian classifier [24], support vector machines (SVM) [26], and so on. In
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 137

this category, the most popular algorithm is represented by [26] where SVM classifier is
trained to divide the positive and negative examples. Then such SVM classifier will clas-
sify all images in database into two groups: relevant and irrelevant groups to a given query.
However, in most cases of CBIR, there is no predefined class structure. From application
point of view, such classification-based methods may improve the retrieval performance in
some constrained contexts; but they will be limited when applied to general purpose image
databases.

2.3. Feature versus semantics in relevance feedback

All the approaches described in above perform relevance feedback at the low-level feature
vector level by basically replacing keywords with features when adopting the vector space
model developed for document retrieval. While these approaches do improve the perfor-
mance of ICBR, there are severe limitations. The inherent problem is that the low-level
features are often not as powerful in representing complete semantic content of images as
keywords in representing text documents. Furthermore, users often pay more attention to
the semantic content (or a certain object/region) of an image than to the background and
other, the feedback images may be similar only partially in semantic content, but may vary
largely in low-level features. Hence, using low-level features alone may not be effective in
representing users’ feedbacks and in describing their intentions.
In addition, there are typically two different modes of user interactions involved in image
retrieval systems. In one case, the user types in a list of keywords representing the semantic
contents of the desired images. In the other case, the user provides a set of examples
images as the input and the retrieval system will retrieve other similar images. In most
image retrieval systems, these two modes of interaction are mutually exclusive. However,
combining these two approaches and allowing them to benefit from each other will yield a
great deal of advantage in terms of both retrieval accuracy and ease of use of the system.
There have been efforts on incorporating semantics in relevance feedback for image
retrieval. The framework proposed in [11] (to be discussed later in more detail in this
section) attempted to embed semantic information into a low-level feature based image re-
trieval process using a correlation matrix. The FourEye system by Minka and Picard [14]
and the PicHunter system by Cox et al. [5], made use of hidden annotation through learn-
ing process. However, they excluded the possibility of benefiting from good annotations,
which may lead to a very slow convergence.
In terms of feature selection, unlike most CBIR systems that use image features such
as color histogram or moments, texture, shape, and structure features, Tieu and Viola [25]
used a boosting technique to learn a classification function in a feature space of more than
45,000 features. The features were demonstrated to be sparse with high kurtosis, and were
argued to be expressive for high-level semantic concepts. Weak 2-class classifiers were
formulated based on Gaussian assumption for both the positive and negative (randomly
chosen) examples along each feature component, independently. The strong classifier is
then a weighted sum of the weak classifiers as in AdaBoost.
138 ZHANG ET AL.

The framework to be discussed in Section 3 integrates both semantics and low-level

features into the relevance feedback process in a new way. Only when the semantic infor-
mation is not available, the method is reduced to one of the previously described low-level
feedback approaches as a special case.

2.4. Relevance feedback with memory

A disadvantage in the classic relevance feedback as well as many “learning” based

approaches discussed above is that the captured knowledge in the relevance feedback
processes in one query session or one learning step is not memorized to continuously im-
prove the retrieval accuracy. That is, even with the same query, a user will have to go
through the same, often tedious, feedback process to obtain the same result, despite the
fact the user has given the same query and feedbacks before. Strictly speaking, there is no
learning or only limited learning in such systems as there is no knowledge accumulation
across different query sessions. To overcome these limitations, another school of ideas is to
using learning approaches to memorize users’ subjectivities in relevance feedback process.
The challenge in this approach is how to memorize knowledge learned and how to handle
the inconsistency of content subjectivities across difference users and/or across different
query sessions of the same user.
The approach proposed in [11] was the first attempt to explicitly memorize learned se-
mantic information to improve CBIR performance. The basic idea of this approach is to
accumulate semantic relevance between image clusters learnt from user’s feedback in cor-
relation network. In other words, a correlation network is used to memorize. Figure 1
illustrates the correlation network. Mathematically, the correlation network is represented
by a correlation matrix, M, defined as below:
 
w11 w12 . . . w1N
 w21 w22 . . . w2N 
 
M= . .. . ..  , (8)
 . . . . . . 
wN1 wN2 . . . wNN
where the weight or coefficient, wij , represents the semantic correlation between images
in cluster i and j .
The system works a follows. First, all images in a database are clustered into N clusters
based on visual feature similarity using, for instance, k-means algorithm. Obviously, the
images in each cluster initially are only similar in term of the selected visual features, like
in a typical CBIR system. Also, initially, all correlation coefficients between each two
clusters are set to zero, meaning only images within the same cluster are correlated and
images across clusters are uncorrelated. That is, the initial matrix is a unit one,
M0 = IN×N . (9)
Then, for a given query, the initial retrieval is based on visual features. Assume that after
a given iteration, n + m images are displayed, and n images are marked relevant and
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 139

Figure 1. Correlation network to memorize semantic correlations between image groups.

m irrelevant. The relevant as well as irrelevant images may or may not be from difference
clusters. This approach memorizes such feedbacks by updating the correlation matrix as
below:

m
n
Mt = Mt −1 + F (q)F (pi )T − F (q)F (ni )T , (10)
i=1 i=1

where q is the feature vector of the query, pi and ni are feature vectors of positive and
negative feedback samples, and F (x) is a transform function used to determine the update
magnitude based on the feedback samples. In this way, the correlation between the clus-
ter where the query original falls in and these the positive samples fall in are increased,
progressively embedding the information on semantic correlations between images. This
correlation is then used in subsequent retrievals in which not only the visual features, but
also the semantic correlations are used in determine the similarity of an image to the query.
Experiments have shown that such a progressive learning approach effectively utilizes the
knowledge learnt from previous queries to reduce the number of iterations to achieve high
retrieval accuracy [11].
140 ZHANG ET AL.

Also, if there are two distinct groups in one initial cluster which semantically dissimilar,
meaning that they are negative examples to each other, a splitting is performed to spit the
initial cluster into two clusters. On the other hand, based on feedbacks, when two clusters
that are close in features space and have high correlation between them according to M, the
two initial clusters could be merged into one. That is, the correlation network dynamically
updates its structure in addition to updating the correlation matrix as learning from user
feedback.

2.5. Log mining in relevance feedback

More recently, people are aware of the fact that the Web is a rich resource of image data and
some of their semantics is usually available on the same web documents. Shen et al. [22]
exploit such reality and use some natural language processing technique to obtain semantic
features from the web text to characterize the web images. Hence, they are able to find
relevant images from the web using text-based queries. In our work of web image search
engine, we also use the web pages as the potential sources of semantics. There are two
kinds of difference for two systems. First difference is in the natural language processing
approach to obtaining semantic features. They use a so-called weighted chain-net, which
is actually a lexical chain, to represent the document space model for images, while our
document space model of all media objects is simply a vector space model, which is an
effective approach and has widely been used in traditional information retrieval. Other
natural language processing methods, such as, proper noun identification, are also used to
extract semantic features. Another difference is that our system exploits relevant feedback
and data mining on the users’ feedback logs to update the document space model. So our
approach outperforms traditional CBIR system and relevance feedback approaches.

3. An integrated relevance feedback framework

As discussed in Section 2, an effective relevance feedback system should provide effective

solutions to learning effectively from small sets of feedback samples, accumulating learned
knowledge and integrating low-level visual and high-level semantic features in query and
feedbacks to achieve high retrieval accuracy.
In addition, there typically are two different modes of user interactions involved in image
retrieval systems. In one case, the user types in a list of keywords representing the semantic
contents of the desired images. In the other case, the user provides a set of examples images
as the input and the retrieval system will try to retrieve other similar images. In most image
retrieval systems, these two modes of interaction are mutually exclusive. We argue that
combining these two approaches and allow them to benefit from each other yields a great
deal of advantage in terms of both retrieval accuracy and ease of use of the system.
To address all of above-mentioned issues, a CBIR framework with integrated relevance
feedback and query expansion was proposed [9,12,23,24]. Figure 2 illustrates the proposed
CBIR framework. It consists of a semantic network which links images to semantic an-
notations in a database, a similarity measure that integrating both semantic features and
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 141

Figure 2. The proposed framework of integrated relevance feedback and query expansion.

Figure 3. Semantic network.

image features, and a machine learning algorithm to iteratively update the semantic net-
work and to improve the system’s performance over time. The system supports both query
by keyword and query by image example through semantic network and low-level fea-
ture indexing. More importantly, the learning process propagates the keyword annotations
from the labeled images to unlabeled ones during the feedback. In this way, more and more
images are implicitly labeled by keywords by the semantic propagation process. This an-
notation propagation process also helps the system in accumulating knowledge learned to
improve performance of future retrieval requests.

3.1. Semantic network

The semantic network is a two-layered structure. The top layer is represented by a set of
keywords having links to the images in the database. It can be considered an extension
of the initial information embedding idea in the system shown in Figure 1. The degree
of relevance of the keywords to the associated images’ semantic content is represented as
the weight on each link, as shown pictorially in Figure 3. This layer is what we need in
142 ZHANG ET AL.

keyword relevance feedback and will be updated during the semantic propagation. Bottom
layer is a keyword thesaurus to construct the connection between different keywords.
The initial weights can be obtained by manual labeling. In our web image search engine,
they are initially extracted from the following sources on the web page that contains the
image based according to some empirical rules.
1. Image filename and URL. We assume that web page authors/editors usually assign
meaningful filenames to images in a web page. Some heuristic rules are used to extract
the keywords from the filenames. First, the filename is segmented into meaningful key-
words based on pre-define dictionary. For example, filename “redflower.jpg” includes
two semantic words: “red” and “flower.” Then, the clutter letters in filenames, such
as digits, hyphens, filename extension, etc., are discarded. We also extract semantic
keywords from the URL of the image files. The URL usually represents the hierar-
chy information of an image on the web page. For instance, “animal” and “bird” are
useful information in the URL http://www.ditto.com/images/animals/
anim_birds.jpg. We apply the similar technology of the filename segmentation to
segment the URL into meaning pieces.
2. ALT (alternate) text. The ALT text in a web page is used for displaying to replace the
associated image in a text-based browser. Hence, it usually represents the semantics
of the image concisely; hence, it is a very relevant feature to represent the semantic
meaning of the images.
3. Surrounding text. In web pages, images are used to enhance the content that the editors
want to present. Hence, some texts in the surrounding areas are semantically relevant
to the content of the image. However, it is difficult to judge which area among all
of the four possible areas (above, below, left, right) is the most relevant to the image.
Therefore, in our prototype, all of the four areas are chosen as the sources of the text
features for the image. This feature will be refined by log mining on the users’ relevant
feedback logs as discussed in Section 4.
4. Page title. The page title is a good candidate of the text feature of images in a web page.
5. Other information. Image hyperlinks, anchor text, etc., are also candidates of text fea-
tures of the images.
The initial value of weight wij associated with each keyword of an image is calculated
by the TF*IDF method [19]. That is, a feature vector is used to represent the all keywords
of an image and the vector is defined as
Dih = TF i · IDFi

N N N
= ti1 · log , . . . , tij · log , . . . , tim · log , (11)
n1 nj nm
where Dih is the feature vector, with each component value corresponding to the initial
weight assigned to the association of a keyword to an image i. tij stands for the frequency
of keyword j appearing in the text description of the image i. nj is the number of images
that are characterized by keyword j . N is the total number of images. Of course, if no
keyword information to the image, the corresponding feature vector is set to null.
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 143

With the semantic network, semantic based relevance feedback can be performed rela-
tively easily compared to its low-level feature counterpart. This is performed by updating
the weights wij associated with each link shown in Figure 3. The weight updating process
is described below.
1. A user submits a query and the system retrieves similar images using cross-modality
query extension, to be explained later in next subsection.
2. System collects the positive and negative feedback examples corresponding to the
query.
3. For each keyword in the input query, check to see if any of them is not in the keyword
database. If so, add them into the database without creating any links.
4. For each positive example, check to see if any query keyword is not linked to it. If so,
create a link with an initial weight from each missing keyword to this image. For all
other keywords that are already linked to this image, increase the weight by a predefined
value or using the method defined by (10) and (11).
5. Similarly, for each negative example, check to see if any query keyword is linked with
it. If so, decrease its weight, until it is zero.
Through this updating process, the keywords that represent the actual semantic content
of each image will receive a larger weight. Also, it can be easily seen that as more queries
are inputted into the system, the system is able to expand its vocabulary. Furthermore,
a semantic propagation method is used to populate keywords to unlabeled image during
user’s feedback iteration, which will be described later in this section.

3.2. Integrated and cross modality query and retrieval

The proposed framework has an integrated relevance feedback scheme in which both low-
level feature based and high-level semantic feedbacks are performed. We define a unified
metric function G to measure the relevance between query Q and any image j within an
image database in terms of both semantic and low-level feature content, where Q includes
the original query and users’ feedback information:
G(j, Q ) = α · simk (j, Qk ) + (1 − α) · simf (j, Qf ), (12)
where α ∈ [0, 1] is the weight of the semantic relevance in the overall similarity measure,
which can be specified by users. The larger α is, the more important the semantic rele-
vance will play in the overall similarity measurement. simf (j, Qf ) and simk (j, Qk ) are the
semantic similarity and low-level feature similarity between image j and revised query Q ,
respectively.
The revised query Q consists of two parts: the feature-based one Qf and the semantic
(keyword)-based one Qk . Qf is defined by (3)–(5) based on feature vectors of feedback im-
ages. With the semantic network, simk (j, Qk ) can be directly computed with the updated
weights.
To further improve the retrieval performance of the proposed framework, a cross-
modality query expansion method is supported. That is, once a query is submitted in
144 ZHANG ET AL.

the form of keywords, the retrieved images based on keyword search are considered as
the positive examples, based on which the query is expanded by features of these images.
This is done by first searching the semantic network, shown in Figure 3, for the keywords.
Then, the visual features of these images that contains these keywords (referred as training
images) are incorporated into the expanded query, qi∗ , defined by (5).
Since the expanded query qi∗ is defined by the feature vectors of all image associated
with query keywords, we need to determine relevance vector R in (5) so that proper rele-
vance factor r is assigned to each image feature vector. Therefore, we introduce a relevance
factor rij of the ith keyword association to the j th image, defined
wij
rij = N , (13)
j
l=1 wlj
which is the relative weighting of matched keyword i over all keyword weights of image j .
This relevance factor is defined in accordance to information retrieval theory. That is, the
importance of a keyword to an image that has links spreading over a large number of
images in the database should be penalized. We use this relevance factor in (5) to compute
the expanded query in feature space.
One can consider this expanded query as a result of relevance feedback, except that
the feedback images are obtained by semantic network search. Using this approach may
seem dangerous at first, since some images may have keyword association which the user
does not intent to search for. However, the goal here is to generate a set of query that is
guaranteed to contain the user’s intended search results. The query can then be narrowed
down by including more feedback images through the relevance feedback cycle.
For query by image example, similar procedure takes effect to extend the retrieval from
feature space to semantic space through the semantic network. In this way, user input
information is utilized as much as possible to improve the retrieval performance.
Using the methods described above, we can perform the combined semantic and feature-
based relevance feedback as follows.
1. Collect the user query and expand the query.
2. Compute the combined similarity according to (12) to retrieval initial set of images
using the expanded query.
3. Collect positive and negative feedbacks from the user.
4. Compute the feature vectors xni and the degree of relevance vector R of the retrieved
images to form the revised feature-based query Qf defined by (5).
5. Update weights in the semantic network. The new weights implicitly defined the revised
keyword-based query Qk by defining simk (j, Qk ) in (12).
6. Using the revised query to compute the ranking score for each image based on (12) and
sort the results.
7. Show new retrieval results and go to step 3.
From this query and combined feedback process, we can see that system learns from the
user’s feedback both semantically and in a feature based manner. In addition, it can be eas-
ily seen that our method degenerates into the method of [18] when no semantic information
is available.
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 145

3.3. Probabilistic keyword propagation scheme

As illustrated in Figure 3, the more images are annotated (correctly), the better the system
retrieval performance will be. However, the reality is human labeling of images is tedious
and expensive, hence not a feasible solution, which was what motivated CBIR research
fifteen years ago. To address this issue, a probabilistic progressive keyword propagation
scheme is proposed in our framework to automatically annotate images in the databases in
the relevance feedback process utilizing based a small percentage of annotated images.
We assume that initially only a few images in a database have been manual labeled with
keywords and the retrieval is performed mainly based on low-level features. As stated be-
fore, the initial keywords annotation can be from web through the crawler when the images
are from the Web, or labeled by humans. While the user is interacting with the system by
providing feedbacks in a query session, a progressive learning process is activated to prop-
agate the keyword annotation from the labeled images to un-labeled images so that more
and more images are implicitly labeled by keywords. In this way, the semantic network is
updated in which the keywords with a majority of user consensus will emerge as the dom-
inant representation of the semantic content of their associated images. As more queries
are inputted into the system, the system is able to expand its vocabulary. Also, through
the propagation process, the keywords that represent the actual semantic content of each
image will receive a large weight.
There are two major issues in keyword propagation: which images and which key-
word(s) should be propagated during a query session. To answer the first question, a
probability model, based on Bayesian learning, is proposed. We assume that, (1) all pos-
itive examples in one retrieval session belong to the same semantic class with common
semantic object(s) or meaning(s); and (2) the features from the same semantic class fol-
lows the Gaussian or Mixture Gaussian distributions. Therefore, all positive examples in a
query session are used to calculate and update the parameters of the corresponding seman-
tic Gaussian classes. Then, the probability of each image in the database belonging to such
semantic class is calculated. The common keywords in positive examples are propagated
to the images with very high probability belonging to this class.
As we can see, the propagation framework uses the same procedure as the feedback algo-
rithm in low-level features [23]. The only difference is that for low-level feature feedbacks,
the calculated probability is used for the ranking of an image in retrieval candidate list,
while here it is used to determine if an image should be in the propagation candidate list.
The propagation candidate set S is obtained as follows:
S = {c1 , . . . , ck }, where p(cj ) > ψ, (14)
where p(cj ) is the probability that image j in the database belonging to such semantic
class and ψ is a constant threshold which can be estimated by the training process. The
weight associates with the propagated keyword i and the image j is wij = p(cj ). More
complex distribution model, for example, Mixture Gaussian, may be used in this propaga-
tion framework. However, because the user’s feedback examples in practice are often very
few, complex models will leads into much more parameter estimation errors as there are
more parameters to be estimated.
146 ZHANG ET AL.

Also, to determine which keyword(s) should be propagated when an image is associated

with multiple keywords, there two approaches: using relevance factor defined by (13), or
using region-based approach [9]. In the former approach, the relevance factor rij can be
directly used to modify the weight with the propagated keyword. Obviously, the lower the
relevance of a keyword to an image is, the less weight it will be assigned to the keyword
in the prorogation, and vice versa. When the region-based approach is used, unlabeled
images to be propagated are firstly segmented into regions. By analyzing the feature distri-
bution of the segmented regions, a probability association between each segmented regions
and annotated keywords is set up for labeled images by region-based relevance feedback
approach. Then, each keyword of labeled image was assigned to one or several regions of
the image with certain probabilities. The detail of the region-based feedback framework is
in [9].

3.4. Experiment results

The image set used in evaluating the proposed framework described in this section is the
Corel Image Gallery of 10,000 images, manually labeled into 79 semantic categories.
200 random selected images compose the test query set. Whether a retrieved image is
correct or incorrect is judged according to the ground truth. Three types of color features
and three types of texture features are used in our system. Feedback process is running as
follows. Given a query from the test set, a different test image of the same category as the
query is used in each round of feedback iteration as the positive example for updating the
Gaussian parameters and revise the query. To incorporate negative feedback, the first two
irrelevant images are assigned as negative examples. The accuracy is defined as
relevant images retrieved in top N returns
Accuracy = . (15)
N
Several experiments have been performed as follows. First, three feature-based feedback
algorithms are compared. They are: a Bayesian feedback scheme by Su et al. in [23,24],
the scheme by [27] and scheme by [17] as defined by (5)–(7). This comparison is done
in the same feature space. Figure 4 shows that the accuracy of Bayesian feedback scheme
(referred as “our feedback approach”) becomes higher than the other two methods after
two feedback iterations. This demonstrates that the incorporated Bayesian estimation with
the Gaussian parameter-updating scheme is able to improve retrieval effectively.
To demonstrate the performance of the semantic propagation, the following experiment
was designed. 200 images in the query set were annotated by their category names. So
only one keyword is associated to one query image and other images in database have no
keyword annotations. During the test, each query image was used twice. The retrieval
performance is shown in Figure 5 with comparison to that with the propagation. It is seen
that for feedback with propagation, the retrieval accuracy is much higher than the original
one without it. This is because, when a system has propagation ability, latter queries can
utilize the accumulated knowledge from previous feedback iterations. In other words,
system has the learning ability and will be smarter with more users’ interactions.
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 147

Figure 4. Retrieval accuracy for top 100 results in original feature space.

Figure 5. Retrieval accuracy for top 100 results performance between feedback without propagation and feed-
back with propagation scheme.

4. Incorporating log mining in web image search engine

The architecture of our proposed web image search engine is shown in Figure 6. In addition
to all components in a CBIR system, the web search engine contains an image crawler and
three other modules, namely, the log miner, the model updater, and the query updater [3,4].
The data organization of the system mainly consists of four parts: the image database that
also contains metadata of images (i.e., low-level and high-level features), the user’s relevant
feedback log database, the document space model, and the user space model.
A typical scenario of the system is as follows. The off-line crawler is first employed at
regular intervals (e.g., once every day at non-peak network traffic hours) to collect potential
web pages containing images and store them into a local database. The feature extractor is
then applied to these pages to extract both the low-level visual features and the high-level
148 ZHANG ET AL.

Figure 6. Architecture of the proposed web image search engine.

semantic features for the images appear in these pages. In our system, the crawler and the
feature extractor actually work simultaneously. An image indexer is applied to the images
and their features to build the document space model, which is the representation of the
images in the database using their features. Once the document space model is available,
the matcher compares the user’s query with the document space model of images to yield
the image retrieval results. Since many irrelevant images may be returned by the retrieval
system, the user feedback interface is also provided for users to specify whether a returned
image is relevant or not to the user’s intents. The image retrieval system can utilize user
feedbacks to gain an understanding as to the relevancy of certain images and update the
query or adjust the matcher to return more accurate retrieval results. The user’s feedback
log data are also stored in the user log database in the system, from which the log miner
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 149

can find and build the user space model through log analysis. The user space model is
then combined with the document space model to update the document space model to
eliminate the mismatch between the page author’s expression and the user’s understanding
and expectation and can further improve the retrieval accuracy.

4.1. Document modeling of images

The document space model in the image search engine combines the low-level visual fea-
tures and high-level semantic features to index the images on the web. The detail process
is described as follows.
To collect images in the web, a crawler (or a spider, which is a program that can automat-
ically analyze the web pages and download the related pages hyper-linked to the analyzed
web pages) is used to collect images from many web sites. First, we re-arrange the semantic
network shown in Figure 3 a concept hierarchy of image categories, such as “animals,” “ar-
chitecture,” “arts,” etc. Then, we select some representative sites to be collected for each
concept category. For instance, http://www.nba.com for sports, http://www.
cnn.com for news, http://www.disney.com for entertainment, etc. For each site
candidate, the crawler collects the images and saves it to a local web page database. We
then use a simple classifier to classify the images into meaningful and junk (e.g., ban-
ners, backgrounds, buttons, icons, etc.) categories based on certain information like color
histograms, image sizes, image file types, etc.
For each image collected, the initial keywords are assigned in the way as described in
Section 3.1. In addition, the low-level features of each image are calculated. The keywords
and low-level features of all collected images form the document space.
In the image search process, the overall similarity is simply the linear combination of the
visual and the textual similarities, as defined in (12). It is not a good idea to set the same
default weight α = 0.5 in (12) to balance the importance of low-level features and high-
level features. However, it is very efficient for us to build up the baseline configuration
of our image retrieval system. The weight is automatically adjusted to a suitable value
by the system through the user’s feedback as to the relevancy of certain returned images.
Moreover, after we collect enough user log information of user feedback, data mining
technology (which will be presented in the next section) can be applied to find out the
importance of low-level feature and high-level feature for different concepts/categories.
For example, we find that for concept “Clinton,” the high-level features are more important
than the low-level features, while for concept “sunshine,” the low-level features are more
useful than the high-level features.

4.2. Log mining and feedback

In order to reduce the ambiguity in the text descriptors extracted from web pages and
the low-level image features, and to improve the search performance, we have proposed
a user space model to supplement the original document space model. This is achieved
by applying a user log analysis process. The user space model is also a vector space
150 ZHANG ET AL.

model. The difference between the user space model and the document space model is that
vectors in the user space model are constructed from the information mined from the user
feedback log data, not from the original information extracted from the web pages. When
a user submits a query, our system will return to the user some images found based on the
original document space model. The user can then use the feedback user interface to tell
the system about the return images as whether relevant or irrelevant to the query based on
his/her subjective judgment. Of course, most users do not have the patience and time to
mark all relevant and irrelevant images in the returned image collection. However, this is
not a very serious problem because even a small set of feedback images can provide very
useful information.
After we get some user’s feedback log data, the user space model can be built from the
user log. Let Q be the set of total queries used until now. Let Tj (j = 1, . . . , NT ) be the
set of all individual words that appear in Q. (Note that a singe query may contain multiple
words.) For a query in Q, Iri is one of the relevant images and Iii is one of the irrelevant
images specified by the user and stored in the user log.
From the user log, we can easily calculate the probabilities listed below:
Nri
P (Iri ) = , (16)
NQ
where Nri is the number of query times that image Iri has been retrieved and marked as
relevant, and NQ is the total number of queries.
Nri (Tj )
P (Iri |Tj ) = , (17)
NQ (Tj )
where Nri (Tj ) is the number of query times that image Iri has been retrieved and marked
as relevant for those queries that contain word Tj , and NQ (Tj ) is the number of queries
that contain Tj .
NQ (Tj )
P (Tj ) = . (18)
NQ
Based on the Bayesian theory, we have
P (Iri |Tj )P (Tj )
P (Tj |Iri ) = . (19)
P (Iri )
In addition, for irrelevant images in the user log, we have
Nii (Tj )
P (Iii |Tj ) = , (20)
NQ (Tj )
where Nii (Tj ) is the number of times that image Iii has been retrieved and marked as
irrelevant for those queries that contain word Tj .
For a given image I , P (Tj |I ) (j = 1, . . . , NT ) calculated using (19) also form a vector
for I . We call this vector the user space model of image I , compared to the document
space model of image I , which is built from the related features extracted from the web
pages.
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 151

If we have a large collection of user log data, it is reasonable to say that the information
in the user space model is more accurate than the information in the original document
space model. However, as we have previously stated, few users like to tag all relevant and
irrelevant images in the retrieval result. Hence, the user feedback log is usually not enough
and this causes the user space model to be not as comprehensive as the original document
space model. Therefore, we cannot replace the document space model with the user space
model completely. We choose to integrate the user space model into the original document
space model to improve the accuracy of the final document space model.
For each image I , vector U is the feature in the user space model, and vector D is the
textual feature in the document space model. We simply use the linear combination method
to integrate these two vectors. We use D new to denote the updated document space model,
which is calculated using

new = ηU + (1 − η)D,

D (21)

where η is used to adjust the weight between the user space model and the document space
model. Actually, η is the confidence of the vector U in the user space model. In our
approach, if the vector in the user space model is accurate and comprehensive enough, we
can assign η with a value very close to 1.0. If the vector in the user space model is not
accurate and not comprehensive enough, the value of η should be relatively small. The
times that an image is marked in the feedback by the user can be used to determine the
value of η for this image. Obviously, if an image is marked in user feedback more times
than another image, the feedback information of this image should be more accurate and
comprehensive than the other image. The confidence of its vector U in the user space
model should thus be higher for this image than that for the other image and we can assign
a bigger η for this image than for the other image.
Since irrelevant images are also recorded in the user feedback log, we can also utilize
this information. For each irrelevant image Iii , we use P (Iii |Tj ) as the confidence that Iii
is irrelevant to query Tj and form a vector I. We denote D final to the text feature vector
of the image in the final document space model and calculate it using (22), similar to the
TF ∗ IDF method:

D new · 1 − I .
final = D (22)

4.3. Experiments

Based on the proposed architecture, a demo system of image search engine, called iFind©,
has been developed in Microsoft Research Asia. The graphic interface is shown in Figure 7.
The search options that iFind supports include:

• Keyword-based search. One can type in one or more keywords, such as girl, in the
textbox and start the retrieval. One will see some images displayed in several pages in
the browse mode.
152 ZHANG ET AL.

Figure 7. iFind user interface.

• Query by example. If the “Similar” hyperlink under an image is selected, the system
will retrieve some similar images that are semantically/visually similar to the example
image.
• Relevance feedback. The system will improve the performance of retrieval after the user
provides some positive and/or negative examples. One is promised to get much better
result after several iterations of feedback.
• Log mining. The retrieval performance of the system will be greatly improved after
off-line log mining process. The user could benefit from other users’ usages.
To illustrate improvement brought by log mining in image search, we show here some
evaluation results based on three system configurations: (1) the baseline system, which
provides only query and retrieval; (2) the feedback system, which can provides user feed-
back as well as the baseline functionality; (3) the full configuration including user log
mining.
In our experiments, we have selected more than 2000 representative image websites. The
intelligent crawler is used to collect the images from these hyperlinks. All related semantic
features, including image filenames, ALT texts, surrounding texts, and page titles, as well
the low-level visual features are also extracted using the feature extractor at the same time.
The images are stored in the database and indexed with their textual and visual features. In
total, we have collected more than 30,000 images from these websites. It is difficult for us
to calculate the recall of the system because it is a tedious job to browse the entire image
database and specify the ground truth manually. Therefore, we only choose 17 queries
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 153

Figure 8. The average precision–recall curve of the system’s retrieval performance for all queries.

to demonstrate the performance of the system. Furthermore, the calculation of recall is

roughly estimated after scanning the top 1000 images returned for each query. The selected
queries are: Clinton, Jordan, car, flower, tree, cat, submarine, mars, spring, galaxy, movie
star, potato, ship, space, tomb raider, female, and mountain. Figure 8 shows the average
precision–recall for all queries.
Although the feedback from a single user is limited in our experiments, multiple users’
feedbacks are accumulated and stored in the user log. The user space model is constructed
from the user log and used to improve the document space model and to improve the
retrieval performance. The system performance of applying log mining is represented in
the dash-dotted line in Figure 8 for the corresponding cases. As we can see from these
figures, the log mining not only improves the precision when the recall is low, but can also
improve the precision when the recall is high. In other words, the overall performance of
the system is improved after log mining.

5. Conclusions

In this paper, we have discussed in detail relevance feedback technologies in content-based

image retrieval systems. The key issues and representative algorithms in relevance feed-
back in CBIR are reviewed. We have presented a framework of integrated relevance feed-
back and semantic learning in content-based retrieval. Our method utilizes both semantic
and low-level feature properties of every feedback image in refine the retrieval, while in
the meantime, learning semantic annotations of each image. While the user is interacting
with the system by providing feedbacks in a query session, a progressive learning process
is activated to propagate the keyword annotation from the labeled images to un-labelled
images so that more and more images are implicitly labeled by keywords at certain prob-
abilities. In this way, more and more images are implicitly labelled by keywords by the
semantic propagation process. Thus, such process will improve the retrieval performance
in future, either query by image examples or by keywords. Furthermore, we extended
154 ZHANG ET AL.

the framework in a web image search engine by incorporating user log mining in refining
search accuracy. This new framework makes the image retrieval system to be superior over
either the classical CBIR or text-based systems.

Publisher’s note

This article is based on the original conference paper published by Kluwer Academic Pub-
lishers in Visual and Multimedia Information Management, edited by Xiaofang Zhou and
Pearl Pu. ISBN: 1-4020-7060-8. © 2002 by International Federation for Information
Processing.

References

[1] C. Buckley and G. Salton, “Optimization of relevance feedback weights,” in Proceedings of SIGIR’95,
1995.
[2] S. K. Chang, C. W. Yan, D. C. Dimitroff, and T. Arndt, “An intelligent image database system,” IEEE
Transactions on Software Engineering 14(5), 1988.
[3] Z. Chen, W. Liu, C. Hu, M. Li, and H. J. Zhang, “iFind: A web image search engine,” in Proceedings of
SIGIR2001, 2001.
[4] Z. Chen, W. Liu, F. Zhang, M. Li, and H. J. Zhang, “Web mining for web image retrieval,” Journal of the
American Society for Information Science and Technology 52(10), August 2001, 831–839.
[5] I. J. Cox, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, “The Bayesian image retrieval system,
PicHunter: Theory, implementation, and psychophysical experiments,” IEEE Transactions on Image
Processing, Special Issue on Digital Libraries, 2000.
[6] M. Flickner, H. Sawhney, W. Niblack et al., “Query by image and video content: The QBIC system,” IEEE
Computer Magazine 28, 1995, 23–32.
[7] J. Huang, S. R. Kumar, and M. Metra, “Combining supervised learning with color correlograms for content-
based image retrieval,” in Proceedings of ACM Multimedia’95, November 1997, pp. 325–334.
[8] Y. Ishikawa, R. Subramanya, and C. Faloutsos, “Mindreader: Query databases through multiple examples,”
in Proceedings of the 24th VLDB Conference, New York, 1998.
[9] F. Jing, M. Li, H. J. Zhang, and B. Zhang, “An effective region-based image retrieval framework,” in
Proceedings of ACM Multimedia 2002, Juan-les-Pins, France, December 1–6, 2002.
[10] J. Laaksonen, M. Koskela, and E. Oja, “PicSOM: Self-organizing maps for content-based image retrieval,”
in Proceedings of International Joint Conference on NN, July 1999.
[11] C. Lee, W. Y. Ma, and H. J. Zhang, “Information embedding based on user’s relevance feedback for image
retrieval,” in Proceedings of SPIE International Conference on Multimedia Storage and Archiving Sys-
tems IV, Boston, 19–22 September 1999.
[12] Y. Lu et al., “A unified framework for semantics and feature based relevance feedback in image retrieval
systems,” in Proceedings of ACM MM2000, 2000.
[13] S. D. MacArthur, C. E. Brodley, and C.-R. Shyu, “Relevance feedback decision trees in content-based image
retrieval,” in IEEE Workshop on Content-Based Access of Image and Video Libraries, 2000, pp. 68–72.
[14] T. Minka and R. Picard, “Interactive learning using a ‘Society of Models’,” Pattern Recognition 30(4), 1997.
[15] T. Mitchell, Machine Learning, McGraw-Hill, 1997.
[16] J. J. Rocchio Jr., “Relevance feedback in information retrieval,” in The SMART Retrieval System: Experi-
ments in Automatic Document Processing, ed. G. Salton, Prentice-Hall, 1971, pp. 313–323.
[17] Y. Rui and T. S. Huang, “A novel relevance feedback technique in image retrieval,” in Proceedings of 7th
ACM Conference on Multimedia, 1999.
[18] Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval with relevance feedback in MARS,”
in Proceedings of IEEE International Conference on Image Processing, 1997.
RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 155

[19] G. Salton, Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.

[20] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
[21] S. Sclaroff, L. Taycher, and M. L. Cascia, “ImageRover: a content-based image browser for the World Wide
Web,” Technical Report 97-005, Boston University CS Dept., 1997.
[22] H. T. Shen, B. C. Ooi, and K. L. Tan, “Giving meanings to WWW images,” in Proceedings of ACM
MM2000, 2000, pp. 39–48.
[23] Z. Su, S. Li, and H. J. Zhang, “Extraction of feature subspaces for content-based retrieval using relevance
feedback,” in ACM Multimedia 2001, Ottawa, Canada, 2001.
[24] Z. Su, H. J. Zhang, and S. Ma, “Relevant feedback using a Bayesian classifier in content-based image
retrieval,” in SPIE Electronic Imaging 2001, San Jose, CA, January 2001.
[25] K. Tieu and P. Viola, “Boosting image retrieval,” in IEEE Conference on Computer Vision and Pattern
Recognition, 2000.
[26] S. Tong and E. Chang, “Support vector machine active leaning for image retrieval,” in ACM Multimedia
2001, Ottawa, Canada, 2001.
[27] N. Vasconcelos and A. Lippman, “Learning from user feedback in image retrieval systems,” in NIPS’99,
Denver, CO, 1999.
[28] P. Wu and B. S. Manjunath, “Adaptive nearest neighbour search for relevance feedback in large image
database,” in ACM Multimedia Conference, Ottawa, Canada, 2001.
[29] Y. Wu, Q. Tian, and T. S. Huang, “Discriminant EM algorithm with application to image retrieval,” in IEEE
CVPR, South Carolina, 2000.
[30] H. J. Zhang and D. Zhong, “A scheme for visual feature based image indexing,” in Proceedings of
IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases III, 1995, pp. 36–46.

Thesis PDF
No ratings yet
Thesis PDF
142 pages
Exercise - Analytical Exposition Text
40% (5)
Exercise - Analytical Exposition Text
3 pages
Mahidhar Project Documentation
No ratings yet
Mahidhar Project Documentation
67 pages
Review Paper
No ratings yet
Review Paper
12 pages
Content-Based Image Retrieval
No ratings yet
Content-Based Image Retrieval
10 pages
Document 1
No ratings yet
Document 1
15 pages
Content-Based Image Retrieval (CBIR) : Match
No ratings yet
Content-Based Image Retrieval (CBIR) : Match
71 pages
PR2011 PDF
No ratings yet
PR2011 PDF
31 pages
CBIR Report
0% (1)
CBIR Report
27 pages
2021 - State of The Art Content Based Image Retrieval Techniques Using Deep Learning A Survey
No ratings yet
2021 - State of The Art Content Based Image Retrieval Techniques Using Deep Learning A Survey
23 pages
Content-Based Image Retrieval Based On ROI Detection and Relevance Feedback
No ratings yet
Content-Based Image Retrieval Based On ROI Detection and Relevance Feedback
31 pages
Content Based Image Retrieval
No ratings yet
Content Based Image Retrieval
18 pages
Semisupervised Biased Maximum Margin Analysis For Interactive Image Retrieval
No ratings yet
Semisupervised Biased Maximum Margin Analysis For Interactive Image Retrieval
15 pages
S.D.M College of Engineering and Technology: Content-Based Image Retrieval
No ratings yet
S.D.M College of Engineering and Technology: Content-Based Image Retrieval
18 pages
Cbir Thesis PDF
100% (3)
Cbir Thesis PDF
6 pages
A Naive Relevance Feedback Model For Content-Basedimageretrievalusingmultiple
No ratings yet
A Naive Relevance Feedback Model For Content-Basedimageretrievalusingmultiple
11 pages
Semi Supervised Biased Maximum Margin Analysis For Interactive Image Retrieval
No ratings yet
Semi Supervised Biased Maximum Margin Analysis For Interactive Image Retrieval
62 pages
Final Year Project Report
No ratings yet
Final Year Project Report
56 pages
Multilabel Neighborhood Propagation For Region-Based Image Retrieval
No ratings yet
Multilabel Neighborhood Propagation For Region-Based Image Retrieval
13 pages
Active Learning Methods For Interactive Image Retrieval
No ratings yet
Active Learning Methods For Interactive Image Retrieval
12 pages
Content Based Image Retrieval System Using K-Means Clustering Technique
No ratings yet
Content Based Image Retrieval System Using K-Means Clustering Technique
9 pages
A Stochastic Approach To Image Retrieval Using Relevance Feedback and Particle Swarm Optimization
No ratings yet
A Stochastic Approach To Image Retrieval Using Relevance Feedback and Particle Swarm Optimization
11 pages
Content Based Image Retrieval in Peer-To-Peer Networks
No ratings yet
Content Based Image Retrieval in Peer-To-Peer Networks
6 pages
Categorical Image Retrieval Through Genetically Optimized Support Vector Machines (GOSVM) and Hybrid Texture Features
No ratings yet
Categorical Image Retrieval Through Genetically Optimized Support Vector Machines (GOSVM) and Hybrid Texture Features
17 pages
Content-Based Image Retrieval Over The Web Using Query by Sketch and Relevance Feedback
No ratings yet
Content-Based Image Retrieval Over The Web Using Query by Sketch and Relevance Feedback
8 pages
A Literature Review of Image Retrieval Based On Semantic Concept
No ratings yet
A Literature Review of Image Retrieval Based On Semantic Concept
8 pages
Content Based Image Retrieval
No ratings yet
Content Based Image Retrieval
5 pages
Unified Relevant Feedback Frame Work
No ratings yet
Unified Relevant Feedback Frame Work
13 pages
Image Quantification Learning Technique Through Content Based Image Retrieval
No ratings yet
Image Quantification Learning Technique Through Content Based Image Retrieval
4 pages
Content-Based Image Retrieval (CBIR) and Annotation System
No ratings yet
Content-Based Image Retrieval (CBIR) and Annotation System
4 pages
Learning From Relevance Feedback Sessions Using A K-Nearest-Neighbor-Based Semantic Repository
No ratings yet
Learning From Relevance Feedback Sessions Using A K-Nearest-Neighbor-Based Semantic Repository
4 pages
Efficient Relevance Feedback For Content-Based Image Retrieval by Mining User Navigation Patterns
No ratings yet
Efficient Relevance Feedback For Content-Based Image Retrieval by Mining User Navigation Patterns
13 pages
Eng Survey S. Subitha
No ratings yet
Eng Survey S. Subitha
12 pages
A Review On Feature Extraction Technique PDF
No ratings yet
A Review On Feature Extraction Technique PDF
7 pages
Ijmer 45060109 PDF
No ratings yet
Ijmer 45060109 PDF
9 pages
Singer Sewing Machine Instruction Manual
100% (1)
Singer Sewing Machine Instruction Manual
47 pages
Sketch4Match - Content-Based Image Retrieval
No ratings yet
Sketch4Match - Content-Based Image Retrieval
4 pages
Navigation Pattern Based Relevance Feedback For Content Based Image Retrieval
No ratings yet
Navigation Pattern Based Relevance Feedback For Content Based Image Retrieval
10 pages
Relevance Feedback in Content Based Image Retrieval: A Review
No ratings yet
Relevance Feedback in Content Based Image Retrieval: A Review
7 pages
Design of An Effective Method For Image Retrieval
No ratings yet
Design of An Effective Method For Image Retrieval
6 pages
Sketch4Match - Content-Based Image Retrieval System Using Sketches
No ratings yet
Sketch4Match - Content-Based Image Retrieval System Using Sketches
17 pages
Visual Information Retrieval For Content Based Relevance Feedback: A Review
No ratings yet
Visual Information Retrieval For Content Based Relevance Feedback: A Review
6 pages
Web Image Reranking Project Report
100% (1)
Web Image Reranking Project Report
28 pages
Query Specific Semantic Signature For Improved Web Image Re-Ranking
No ratings yet
Query Specific Semantic Signature For Improved Web Image Re-Ranking
3 pages
An Effective Relevance Feedback Scheme For Content-Based Image
No ratings yet
An Effective Relevance Feedback Scheme For Content-Based Image
6 pages
Abstract
No ratings yet
Abstract
2 pages
An Efficient Perceptual of Content Based Image Retrieval System Using SVM and Evolutionary Algorithms
No ratings yet
An Efficient Perceptual of Content Based Image Retrieval System Using SVM and Evolutionary Algorithms
7 pages
Content-Based Image Retrieval
No ratings yet
Content-Based Image Retrieval
5 pages
A User-Oriented Image Retrieval System Based On Interactive Genetic Algorithm
No ratings yet
A User-Oriented Image Retrieval System Based On Interactive Genetic Algorithm
8 pages
A User-Oriented Image Retrieval System Based On Interactive Genetic Algorithm
No ratings yet
A User-Oriented Image Retrieval System Based On Interactive Genetic Algorithm
8 pages
Content-Based Image Retrieval: October 2017
No ratings yet
Content-Based Image Retrieval: October 2017
4 pages
Content-Based Image Retrieval Using Feature Extraction and K-Means Clustering
No ratings yet
Content-Based Image Retrieval Using Feature Extraction and K-Means Clustering
12 pages
3 Pagesynopsis 2
No ratings yet
3 Pagesynopsis 2
3 pages
Tutorial Letter 102 - Portfolio Exam Information
No ratings yet
Tutorial Letter 102 - Portfolio Exam Information
10 pages
An Efficient Image Retrieval in DSR: IPASJ International Journal of Information Technology (IIJIT)
No ratings yet
An Efficient Image Retrieval in DSR: IPASJ International Journal of Information Technology (IIJIT)
7 pages
Chapter-1: 1.1 Tapered Steel Members
No ratings yet
Chapter-1: 1.1 Tapered Steel Members
11 pages
Elektor Electronics USA 1991 03
No ratings yet
Elektor Electronics USA 1991 03
72 pages
Efficient Image Retrieval Using Indexing Technique: Mr.T.Saravanan, S.Dhivya, C.Selvi
No ratings yet
Efficient Image Retrieval Using Indexing Technique: Mr.T.Saravanan, S.Dhivya, C.Selvi
5 pages
Content Based Image Retrieval Methods Using Self Supporting Retrieval Map Algorithm
No ratings yet
Content Based Image Retrieval Methods Using Self Supporting Retrieval Map Algorithm
7 pages
Intelligent Water Drop Algorithm Based Relevant Image Fetching Using Histogram and Annotation Features
No ratings yet
Intelligent Water Drop Algorithm Based Relevant Image Fetching Using Histogram and Annotation Features
7 pages
Content Based Image Retrieval System Using Sketches and Colored Images With Clustering
No ratings yet
Content Based Image Retrieval System Using Sketches and Colored Images With Clustering
4 pages
Content-Based Image Retrieval System Using Sketches
No ratings yet
Content-Based Image Retrieval System Using Sketches
4 pages
INFO1113 Assignment 2023 S2
No ratings yet
INFO1113 Assignment 2023 S2
11 pages
Nobel Prize - Story by Vikas Taya
No ratings yet
Nobel Prize - Story by Vikas Taya
1 page
Building Pharaoh's Ships Cedar, Incense and Sailing The Great Green
100% (1)
Building Pharaoh's Ships Cedar, Incense and Sailing The Great Green
16 pages
The Social Work Student's Research Handbook - 2nd Edition Instant DOCX Download
100% (15)
The Social Work Student's Research Handbook - 2nd Edition Instant DOCX Download
16 pages
The Origin of The Junk and Sampan
No ratings yet
The Origin of The Junk and Sampan
10 pages
Ngo The Bach-Vietnamese Ceramics in Asian Maritime Trade Between 14th and 17th Centuries
No ratings yet
Ngo The Bach-Vietnamese Ceramics in Asian Maritime Trade Between 14th and 17th Centuries
13 pages
Cashless Economy
No ratings yet
Cashless Economy
9 pages
The Signification of Naga in Thai Architectural and Sculptural Ornaments
No ratings yet
The Signification of Naga in Thai Architectural and Sculptural Ornaments
19 pages
Facilities Management Conference Indonesia
No ratings yet
Facilities Management Conference Indonesia
6 pages
CANON Color ImageRUNNER C2880, C2880i, C3380, C3380i Parts List
100% (1)
CANON Color ImageRUNNER C2880, C2880i, C3380, C3380i Parts List
150 pages
Embr 1 PDF
No ratings yet
Embr 1 PDF
32 pages
ChuteDesignFormulas Paper43
No ratings yet
ChuteDesignFormulas Paper43
11 pages
Chiller York San Lorenzo Ycal0024
No ratings yet
Chiller York San Lorenzo Ycal0024
112 pages
A. J. PARKER-The Philosophy of Shipbuilding-Conceptual Approaches To The Study of Wooden Ships
No ratings yet
A. J. PARKER-The Philosophy of Shipbuilding-Conceptual Approaches To The Study of Wooden Ships
2 pages
Frieze Groups PDF
No ratings yet
Frieze Groups PDF
6 pages
Yellowstripe Scad
No ratings yet
Yellowstripe Scad
7 pages
My Homework For You
100% (1)
My Homework For You
4 pages
Mẫu Câu Writing Task 2 Hay
No ratings yet
Mẫu Câu Writing Task 2 Hay
15 pages
PP PPT Myp5
No ratings yet
PP PPT Myp5
14 pages
BC672 772RB-2 6pg
No ratings yet
BC672 772RB-2 6pg
6 pages
P1 - M24TZ2 - ChatGPT - Marking Notes
No ratings yet
P1 - M24TZ2 - ChatGPT - Marking Notes
2 pages
Austronesian Shipping in The Indian Ocean: From Outrigger Boats To Trading Ships
No ratings yet
Austronesian Shipping in The Indian Ocean: From Outrigger Boats To Trading Ships
26 pages
Adam J.Young-Root of Contemporary Maritime Piracy in SA PDF
No ratings yet
Adam J.Young-Root of Contemporary Maritime Piracy in SA PDF
177 pages
DLP Cot2
No ratings yet
DLP Cot2
3 pages
Composition of Malay Woodcarving
No ratings yet
Composition of Malay Woodcarving
18 pages
Combo
No ratings yet
Combo
11 pages
Neurophysiological Effects of Spinal Manipulation: Joel G. Pickar, DC, PHD
No ratings yet
Neurophysiological Effects of Spinal Manipulation: Joel G. Pickar, DC, PHD
15 pages
Trading Ships of The South China Sea - Shipbuilding
No ratings yet
Trading Ships of The South China Sea - Shipbuilding
29 pages
10024947D00 - Turbine Control Board Requirements Specification, PB 540
No ratings yet
10024947D00 - Turbine Control Board Requirements Specification, PB 540
8 pages
Pci Assigment
No ratings yet
Pci Assigment
7 pages
Nakamura 1991 Jpn. J. Appl. Phys. 30 L1998
No ratings yet
Nakamura 1991 Jpn. J. Appl. Phys. 30 L1998
5 pages
List of Authorised Recyclers 09 07 2024 (2) - 3
No ratings yet
List of Authorised Recyclers 09 07 2024 (2) - 3
1 page
Azizi Bahauddin-Sustainable Design of The Suckling Elephant House of Malaysia
No ratings yet
Azizi Bahauddin-Sustainable Design of The Suckling Elephant House of Malaysia
11 pages
Azizi Bahauddin-Sustainable Design of The Suckling Elephant House of Malaysia
No ratings yet
Azizi Bahauddin-Sustainable Design of The Suckling Elephant House of Malaysia
11 pages
Balloon Manual
No ratings yet
Balloon Manual
7 pages
The Pulau Brani Jong
No ratings yet
The Pulau Brani Jong
7 pages
Marker Enzymes
No ratings yet
Marker Enzymes
4 pages
1c - Business Letter Rules
No ratings yet
1c - Business Letter Rules
1 page
Uponor Technical Information Smatrix
No ratings yet
Uponor Technical Information Smatrix
1 page
Content Based Image Retrieval: Unlocking Visual Databases
From Everand
Content Based Image Retrieval: Unlocking Visual Databases
Fouad Sabry
No ratings yet
Automatic Image Annotation: Enhancing Visual Understanding through Automated Tagging
From Everand
Automatic Image Annotation: Enhancing Visual Understanding through Automated Tagging
Fouad Sabry
No ratings yet
Automatic Image Annotation: Fundamentals and Applications
From Everand
Automatic Image Annotation: Fundamentals and Applications
Fouad Sabry
No ratings yet

Relevance Feedback and Learning in Content-Based Image Search

Uploaded by

Relevance Feedback and Learning in Content-Based Image Search

Uploaded by

World Wide Web: Internet and Web Information Systems, 6, 131–155, 2003

 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Relevance Feedback and Learning in Content-Based

Keywords: image retrieval, relevance feedback, machine learning, web mining

in a semantic network. In addition, a cross-modality query expansion scheme is imple-

2. Relevance feedback algorithms

2.1. Classical algorithms

2.2. Relevance feedback as a learning process

Relevance feedback can be considered as a leaning problem – a user provides feedback

2.3. Feature versus semantics in relevance feedback

The framework to be discussed in Section 3 integrates both semantics and low-level

2.4. Relevance feedback with memory

A disadvantage in the classic relevance feedback as well as many “learning” based

Figure 1. Correlation network to memorize semantic correlations between image groups.

2.5. Log mining in relevance feedback

3. An integrated relevance feedback framework

As discussed in Section 2, an effective relevance feedback system should provide effective

Figure 3. Semantic network.

3.1. Semantic network

3.2. Integrated and cross modality query and retrieval

3.3. Probabilistic keyword propagation scheme

Also, to determine which keyword(s) should be propagated when an image is associated

3.4. Experiment results

4. Incorporating log mining in web image search engine

Figure 6. Architecture of the proposed web image search engine.

4.1. Document modeling of images

4.2. Log mining and feedback

 new = ηU + (1 − η)D,

Figure 7. iFind user interface.

to demonstrate the performance of the system. Furthermore, the calculation of recall is

In this paper, we have discussed in detail relevance feedback technologies in content-based

[19] G. Salton, Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.

You might also like

new = ηU + (1 − η)D,