
A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval*

Min Zhang
State Key Lab of Intelligent Tech. & Sys., Dept. of Computer Science, Tsinghua University, Beijing, 100084, China
86-10-6279-2595
[email protected]

Xingyao Ye
School of Software, Tsinghua University, Beijing, 100084, China
86-10-5153-1413
[email protected]

ABSTRACT
Opinion retrieval is a task of growing interest in social life and academic research: it aims to find documents that are both relevant to a user's query and opinionated. One of the key issues is how to combine a document's opinion score (the ranking score reflecting to what extent it is subjective or objective) with its topic relevance score. Current solutions to document ranking in opinion retrieval are generally ad-hoc linear combinations, which lack theoretical foundation and careful analysis. In this paper, we focus on lexicon-based opinion retrieval. A novel generation model that unifies topic relevance and opinion generation by a quadratic combination is proposed. With this model, the relevance-based ranking serves as the weighting factor of the lexicon-based sentiment ranking function, which is essentially different from the popular heuristic linear combination approaches. The effect of different sentiment dictionaries is also discussed. Experimental results on TREC blog datasets show the significant effectiveness of the proposed unified model: improvements of 28.1% and 40.3% have been obtained in terms of MAP and p@10 respectively. The conclusions are not limited to the blog environment. Besides the unified generation model, another contribution is that our work demonstrates that in the opinion retrieval task, a Bayesian approach to combining multiple ranking functions is superior to a linear combination; it is also applicable to other result re-ranking applications in similar scenarios.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval Models

General Terms: Algorithms, Experimentation, Theory

Keywords
Generation model, topic relevance, sentiment analysis, opinion retrieval, opinion generation model

1. INTRODUCTION
In recent years, there has been growing interest in finding out people's opinions from web data. In many cases, obtaining subjective attitudes towards some object, person or event is a stronger request than getting encyclopedia-like descriptions. General opinion retrieval is an important issue in practical activities such as product surveys, political opinion polls, advertisement analysis, etc. Some researchers have observed this underrepresented information need and made attempts towards efficient detection, extraction and summarization of opinions from web data [7, 8, 15]. However, much of that work focused on presenting a comprehensive and detailed analysis of the sentiments expressed in the text, without studying how well each source document meets the need of the user. In addition, this branch of work seeks solutions for a specific data domain, such as product/movie review websites [7, 15] and weblogs [8], and therefore relies on many field-dependent features, such as the different aspects of a product, which are not present in other types of text data.

The rising prospects of research and implementation on opinion search have been opened up by the explosive amount of user-centric data that has recently become available. People have been writing about their lives and thoughts more freely than ever on personal blogs, virtual communities and special-interest forums. Driven by this trend and its intriguing research value, TREC started a special track on blog data in 2006, with a main task of retrieving personal opinions towards various topics, and it was the track with the most participants in 2007.

But how to combine the opinion score (the ranking score reflecting to what extent a document is subjective or objective) with the relevance score remains a key research problem. In previous work, there are many examples where existing methods of document opinion ranking provide no improvement over mere topic-relevance ranking [12]. Things got better in 2007, but there is still an interesting observation that the topic-relevance result outperforms most opinion-based approaches [26]. Ad-hoc solutions have been adopted to combine relevance ranking with the opinion detection result, causing performance to suffer from a lack of adequate theoretical support.

In this paper, we focus on the problem of searching opinions over general topics, with the aim of presenting a ranked list of documents containing personal opinions towards the given query. We start from general statistics-based information retrieval, following the idea of treating the relevance estimation problem as query generation and document generation. Then, considering the opinion retrieval setting, we introduce the new constraint of sentiment expression into the model.

* Supported by the Chinese National Key Foundation Research & Development Plan (2004CB318108), Natural Science Foundation (60621062, 60503064, 60736044) and the National 863 High Technology Project (2006AA01Z141).

SIGIR'08, July 20-24, 2008, Singapore.
Copyright 2008 ACM 978-1-60558-164-4/08/07 ...$5.00.
With probabilistic derivation, we come to a novel generation model that unifies the topic relevance model and the opinion generation model by a quadratic combination. It is essentially different from the linear interpolation between a document's relevance score and its opinion score that is popularly used in such tasks. With the proposed model, the relevance-based ranking criterion serves as the weighting factor for the lexicon-based sentiment ranking function. Experimental results show the significant effectiveness of the proposed unified model. This is reasonable, since the relevance score is a reliable indicator of whether the opinions, if any, expressed in the document are indeed towards the wanted object. This notion is a novel characteristic of our model, because in previous work the opinion score has always been calculated independently of the topic relevance degree. Furthermore, this process can be viewed as result re-ranking. Our work demonstrates that in IR and sentiment analysis, a Bayesian approach to combining multiple ranking functions is superior to a linear combination; it is also applicable to other result re-ranking applications in similar scenarios. This opinionated document ranking problem is of fundamental benefit to all opinion-related research issues, in that it can provide high-quality results for further feature extraction and user behavior learning.

Although the experiments in this paper are conducted on the TREC (Text REtrieval Conference) Blog 06 and Blog 07 data sets, no characteristic of blog data has been used, such as blog-specific feature extraction, blog spam filtering, or processing of blog feeds and comments. In addition, the lexicons used in this work are all domain-independent. Hence the conclusions are not limited to the blog environment, and the proposed approach is applicable to opinion retrieval tasks on different kinds of resources.

The rest of the paper is organized as follows. We first review previous work in section 2. In section 3, we present our generation model for opinion retrieval, which unifies the topic relevance model and sentiment-based opinion generation; details of estimating the model parameters are also discussed in that section. After introducing the experimental settings in section 4, we test our generation model with comparative experiments in section 5, together with some further discussion. Finally, we summarize the paper and suggest avenues for future work in section 6.

2. RELATED WORK
There has long been interest in either the topics discussed or the opinions expressed in web documents. A popular approach to opinion identification is text classification [7, 15, 22]. Typically, a sentence classifier is learned from available opinionated and neutral web pages, using language features such as local phrases [15] and domain-specific adjective-noun patterns [7]. In order to calculate an opinion score, the classification result is then combined with the topic relevance score using a binary operator [12].

Another line of research on opinionated documents comes from natural language processing and deals with pure text, without constraints on the source of the opinionated data. This work in general treats opinion detection as a text classification problem and uses linguistic features to determine the presence and polarity of opinions [13, 17, 22]. Nevertheless, these approaches either neglect the problem of retrieving valuable documents [13, 17], or adopt an intuitive ranking solution that is somewhat detached from their opinion detection [22].

Hurst and Nigam's work [4] is the first in which topicality and polarity are fused together to form the notion of opinion retrieval, i.e. finding opinions about a given topic. In that work, however, the emphasis is on how to judge the presence of such opinions, and no ranking strategy is put forward. The first opinion ranking formula was introduced by Eguchi and Lavrenko [2] as the cross entropy of topics and sentiments under a generation model. The instantiation of this formula, however, did not perform well in the subsequent TREC opinion retrieval experiments; no encouraging result has been obtained.

Opinion search systems that perform well empirically generally adopt a two-stage approach [12]. Topic-relevance search is carried out first using a relevance ranking method (e.g. TF*IDF ranking or language modeling); then heuristic opinion detection is used to re-rank the documents. One major method of identifying opinionated content is matching the documents against a sentiment word dictionary and calculating term frequency [6, 10, 11, 19]. The matching process is often performed multiple times, with different dictionaries and different matching restrictions. Dictionaries are constructed according to existing lexical categories [6, 10, 19] or the word distribution over the dataset [10, 11, 19]. Matching constraints often concern the distance between topic terms and opinion terms, which can be thought of as a sliding window: some require the two types of words to be in the same sentence [10], while others set the maximum number of words allowed between them [19]. After the opinion score is calculated, an effective ranking formula is needed to combine the multiple sources of information. Most existing approaches use a linear combination of the relevance score and the opinion score [6, 10, 19]. A typical example is shown below:

\alpha \cdot Score_{rel} + \beta \cdot Score_{opn}   (1)

where \alpha and \beta are combination parameters, which are often tuned by hand or learned to optimize a target metric such as binary preference [10]. Other alternatives include demoting the ranking of neutral documents [11].

Domain-specific information has also been studied. Mishne [23, 24] proposed three simple heuristics that improved opinion retrieval performance by using blog-specific properties. Other works make use of field-dependent features such as different aspects of a product or movie [7, 15], which are not present in other types of text data. The TREC blog track is also an important research and experimental platform for opinion retrieval. Its major goal is to explore information seeking behavior in the blogosphere, with an emphasis on spam detection, blog structure analysis, etc. Hence submitted work often goes to great lengths to exploit the non-textual nature of a blog post [10, 12]. This approach makes strong assumptions about the problem domain and is difficult to generalize.
3. GENERATION MODEL FOR OPINION RETRIEVAL
3.1 A New Generation Model
The opinion retrieval task aims to find the documents that contain relevant opinions according to a user's query. In existing probabilistic IR models, relevance is modeled with a binary random variable to estimate "What is the probability that this document is relevant to this query?". There are two different ways to factor the relevance probability, i.e. query generation and document generation [5].

In order to rank documents by their relevance, the posterior probability p(d|q) is generally estimated, which captures how well the document d "fits" the particular query q. According to the Bayes formula,

p(d \mid q) \propto p(q \mid d)\, p(d)   (2)

where p(d) is the prior probability that a document d is relevant to any query, and p(q|d) denotes the probability of query q being "generated" by d. When assuming a uniform document prior, the ranking function reduces to the likelihood of generating the expected query terms from the document.

However, when explicitly searching for opinions, the user's information need is restricted to an opinionated subset of the relevant documents. This subset is characterized by sentiment expressions s towards topic q. Thus the ranking estimation for opinion retrieval changes to p(d|q,s).

In this paper, for simplicity, when we discuss lexicon-based sentiment analysis, the latent variable s is assumed to be a pre-constructed bag-of-words sentiment thesaurus, and all sentiment words s_i are uniformly distributed. Then the probability that document d contains relevant opinions towards query q is given by

p(d \mid q, s) = \sum_i p(d \mid q, s_i)\, p(s_i \mid s)
             = \frac{1}{|S|} \sum_i p(d \mid q, s_i)
             \propto \frac{1}{|S|} \sum_i p(q, s_i \mid d)\, p(d)
             = \frac{1}{|S|} \sum_i p(s_i \mid d, q)\, p(q \mid d)\, p(d)   (3)

where |S| is the number of words in the sentiment thesaurus s.

Referring back to Equation 2, it is easy to see that Eq. 3 is composed of two factors: the last part, p(q|d)p(d), gives the estimation of topic relevance, while the remaining part estimates how probably a document d generates a sentiment word s_i given query q. Equation 3 is therefore rewritten as:

p(d \mid q, s) = I_{op}(d, q, s)\, I_{rel}(d, q), \text{ where}
I_{op}(d, q, s) \equiv \frac{1}{|S|} \sum_i p(s_i \mid d, q), \quad I_{rel}(d, q) \equiv p(q \mid d)\, p(d)   (4)

This is the generation model for opinion retrieval. In this model, I_rel(d,q) is the document generation probability that estimates topic relevance, and I_op(d,q,s) is the opinion generation probability for sentiment analysis.

Essentially, the model presents a quadratic relationship between document sentiment and topic relevance, which is naturally induced from the opinion generation process and proves more effective in our experiments than the popular linear interpolation used in previous work, e.g.

p(d \mid q, s) \overset{rank}{=} (1 - \lambda)\, p(s \mid d, q) + \lambda\, p(q \mid d)\, p(d)   (5)

where \lambda is the linear combination weight.

This result is reasonable, since the relevance score is a reliable indicator of whether the opinions, if any, expressed in the document are indeed towards the wanted object. This notion is a novel characteristic of our framework, in that previous work calculated p(d|q,s) independently of the topic relevance degree.

In the following two sections, we discuss the two sub-models of the generation-based opinion retrieval model respectively.
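Before turning to the sub-models, a minimal Python sketch may help contrast the unified quadratic combination of Equation 4 with the linear interpolation of Equations 1 and 5. The score variables are hypothetical placeholders for precomputed sub-model scores, not the authors' implementation.

```python
# A minimal sketch contrasting the two combination schemes; `rel_score`
# stands in for I_rel(d, q) and `op_score` for I_op(d, q, s).

def linear_rank(rel_score: float, op_score: float, lam: float = 0.5) -> float:
    """Heuristic linear interpolation, as in Equations 1 and 5."""
    return (1 - lam) * op_score + lam * rel_score

def unified_rank(rel_score: float, op_score: float) -> float:
    """Quadratic (product) combination of Equation 4:
    topic relevance weights the opinion score."""
    return op_score * rel_score

# A highly opinionated but topically irrelevant document still receives a
# score under the linear scheme, but is suppressed by the unified model.
print(linear_rank(rel_score=0.0, op_score=0.9))   # 0.45
print(unified_rank(rel_score=0.0, op_score=0.9))  # 0.0
```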
3.2 Topic Relevance Ranking
In the topic relevance model, I_rel(d,q) is based on the notion of document generation. A classic probabilistic model, the Binary Independence Retrieval (BIR) model [5], is one of the most famous in this branch. The heuristic ranking function BM25 and its variants have been successfully applied in many IR experiments, including TREC (Text REtrieval Conference) evaluations.

Hence in this paper we adopt this BIR-based document generation model, by which the topic relevance score Score_Irel(d,q), given by the ranking function presented in [25], is:

Score_{I_{rel}}(d, q) = \sum_{w \in q \cap d} \ln\frac{N - df(w) + 0.5}{df(w) + 0.5} \times \frac{(k_1 + 1)\, c(w, d)}{k_1\left((1-b) + b\,\frac{|d|}{avdl}\right) + c(w, d)} \times \frac{(k_3 + 1)\, c(w, q)}{k_3 + c(w, q)}   (6)

where c(w,d) is the count of word w in the document d, c(w,q) is the count of word w in the query q, N is the total number of documents in the collection, df(w) is the number of documents that contain word w, |d| is the length of document d, avdl is the average document length, and k_1 (from 1.0 to 2.0), b (usually 0.75) and k_3 (from 0 to 1000) are constants.
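As an illustration, Equation 6 transcribes directly into code. The following sketch assumes pre-tokenized input; the parameter defaults are example values within the ranges quoted above, not the paper's tuned settings.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, N, df, avdl, k1=1.2, b=0.75, k3=8.0):
    """Topic relevance score Score_Irel(d, q) of Equation 6.

    query_terms, doc_terms: lists of (stemmed) tokens
    N:    total number of documents in the collection
    df:   dict mapping a term to its document frequency
    avdl: average document length in the collection
    """
    c_d, c_q = Counter(doc_terms), Counter(query_terms)
    dl = len(doc_terms)
    score = 0.0
    for w in set(query_terms) & set(doc_terms):
        idf = math.log((N - df[w] + 0.5) / (df[w] + 0.5))
        tf_d = ((k1 + 1) * c_d[w]) / (k1 * ((1 - b) + b * dl / avdl) + c_d[w])
        tf_q = ((k3 + 1) * c_q[w]) / (k3 + c_q[w])
        score += idf * tf_d * tf_q
    return score

# Toy usage with illustrative collection statistics:
print(bm25_score(["oprah", "show"], ["oprah", "show", "review", "oprah"],
                 N=1000, df={"oprah": 10, "show": 50}, avdl=4.0))
```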
3.3 Opinion Generation Model Parameter Estimation
The opinion generation model I_op(d,q,s) focuses on the problem of how probably a document d generates a sentiment expression s, given query q. This model is on the branch of query generation, in which the language modeling approach has been shown to be quite effective in information retrieval in recent years.

The sentiment expression s is a latent variable in our framework: it is not input as part of the query, but is expected to appear in the search results. In this work, we assume s to be a bag-of-words sentiment thesaurus in which the sentiment words s_i are uniformly distributed. Hence

I_{op}(d, q, s) \equiv \frac{1}{|S|} \sum_i p(s_i \mid d, q) \propto \sum_i p(s_i \mid d, q)   (7)

Different from the query generation-based language model in IR, where the number of query terms |q| is usually small (less than 100, and in most cases 1 or 2), in our opinion generation model the number of sentiment words |S| is large (generally several thousand), and the sparseness problem is prominent. Hence smoothing plays an important role in parameter estimation for the proposed model:

p(s_i \mid d, q) = \begin{cases} p_{seen}(s_i \mid d, q) = p_S(s_i \mid d, q) & \text{if } s_i \text{ is seen in } d \\ p_{unseen}(s_i \mid d, q) = \alpha_d\, p(s_i \mid C, q) & \text{otherwise} \end{cases}   (8)

where p_S(s_i|d,q) is the smoothed probability of a word s_i seen in document d given query q, \alpha_d is a coefficient controlling the probability mass assigned to unseen words, and p(s_i|C,q) is the collection language model given query q.

This unigram model can be estimated using any existing method. As illustrated in Zhai and Lafferty's study [20], Jelinek-Mercer smoothing is much more effective than the other methods studied there when the "queries" are long and verbose. In the proposed opinion generation model, the "queries" are sentiment words; therefore, under this similar scenario, we use the maximum likelihood estimation (MLE) smoothed by the Jelinek-Mercer method:

p_S(s_i \mid d, q) = (1 - \lambda)\, p_{ml}(s_i \mid d, q) + \lambda\, p(s_i \mid C, q), \quad \alpha_d = \lambda

where \lambda is the smoothing parameter and p_ml(s_i|d,q) is the maximum likelihood estimate of p(s_i|d,q). Applying this smoothing to Equations 7 and 8, we get:

\sum_i p(s_i \mid d, q) = \sum_{s_i \in d} p(s_i \mid d, q) + \sum_{s_i \notin d} p(s_i \mid d, q)
= \sum_{s_i \in d} p_S(s_i \mid d, q) + \sum_{s_i \notin d} \alpha_d\, p(s_i \mid C, q)
= \sum_{s_i \in d} \left[(1-\lambda)\, p_{ml}(s_i \mid d, q) + \lambda\, p(s_i \mid C, q)\right] + \sum_{s_i \notin d} \lambda\, p(s_i \mid C, q)
= \sum_{s_i \in d} (1-\lambda)\, p_{ml}(s_i \mid d, q) + \lambda \sum_i p(s_i \mid C, q)
= \sum_{s_i \in d} (1-\lambda)\, p_{ml}(s_i \mid d, q) + \lambda   (9)

We use the co-occurrence of a sentiment word s_i and a query word q inside document d within a window W as the ranking measure of p_ml(s_i|d,q). Hence the sentiment score of a document d given by the opinion generation model is:

Score_{I_{op}}(d, q, s) = \sum_{s_i \in d} (1-\lambda)\, \frac{co(s_i, q \mid W)}{c(q, d) \cdot |W|} + \lambda   (10)

where co(s_i,q|W) is the frequency with which sentiment word s_i co-occurs with query q within window W, and c(q,d) is the query term frequency in the document.
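The following sketch illustrates Equation 10 under the stated assumptions (bag-of-words thesaurus, positional co-occurrence window). The windowing details here are one plausible reading of the definition, not the authors' exact procedure.

```python
def opinion_score(doc_terms, query_terms, lexicon, lam=0.6, window=None):
    """Sentiment score Score_Iop(d, q, s) of Equation 10 -- a sketch.

    For every sentiment word s_i in the document, co(s_i, q | W) counts the
    query-term occurrences within `window` positions of s_i (None = whole
    document); the sum is normalised by c(q, d) * |W| and smoothed by lam.
    """
    q_set = set(query_terms)
    q_positions = [i for i, t in enumerate(doc_terms) if t in q_set]
    c_qd = len(q_positions)           # c(q, d): query term frequency in d
    if c_qd == 0:
        return lam                    # only the smoothing mass remains
    W = window if window is not None else len(doc_terms)
    ml_mass = 0.0
    for i, t in enumerate(doc_terms):
        if t in lexicon:              # t plays the role of a seen s_i
            co = sum(1 for p in q_positions if abs(p - i) <= W)
            ml_mass += (1 - lam) * co / (c_qd * W)
    return ml_mass + lam
```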


3.4 Ranking Function of the Generation Model for Opinion Retrieval
Combining the topic relevance score (Equation 6) and the opinion generation score (Equation 10), we get the overall ranking function for the unified generation model:

p(d \mid q, s) \overset{rank}{=} Score_{I_{op}}(d, q, s) \times Score_{I_{rel}}(d, q)
= \left( \sum_{s_i \in d} (1-\lambda)\, \frac{co(s_i, q \mid W)}{c(q, d) \cdot |W|} + \lambda \right) \times Score_{I_{rel}}(d, q)
\overset{rank}{=} \begin{cases} \left(1 + \lambda'\, TF_{CO}(s, q, W)\right) \times Score_{I_{rel}}(d, q) & \text{if } \lambda \neq 0 \\ Score_{I_{rel}}(d, q) & \text{if } \lambda = 0 \end{cases}   (11)

where \lambda' = \frac{1-\lambda}{\lambda} and TF_{CO}(s, q, W) = \sum_{s_i \in d} \frac{co(s_i, q \mid W)}{c(q, d) \cdot |W|}.

Notice that this ranking function is not the precise quantitative estimation of p(d|q,s), because the proportion factor 1/|S| of the opinion generation score is ignored. But this factor has no effect on document ranking, and hence the approximation is order-preserving.

In this ranking function, we directly use the co-occurrence frequency as the factor to estimate the generation probability p_ml(s_i|d,q). But as mentioned in section 3.3, the number of query terms is generally small (such as 1 or 2), while the size of the sentiment thesaurus is large (several thousand or even tens of thousands of words). In order to reduce the impact of this imbalance, logarithm normalization is applied to the opinion ranking, and the ranking function becomes:

p(d \mid q, s) \overset{rank}{=} \begin{cases} \left[1 + \lambda' \log(TF_{CO}(s, q, W) + 1)\right] \times Score_{I_{rel}}(d, q) & \text{if } \lambda \neq 0 \\ Score_{I_{rel}}(d, q) & \text{if } \lambda = 0 \end{cases}   (12)

where \lambda' and TF_{CO}(s, q, W) are defined as in Equation 11. The experimental analysis of this logarithmic variant is presented in section 5.3, which shows the effectiveness of the normalization.
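Read together, Equations 11 and 12 reduce to a small amount of code. The sketch below assumes the BM25 score and the TF_CO value have been computed as above.

```python
import math

def unified_rank(rel_score, tf_co, lam=0.6, log_norm=True):
    """Overall ranking function of Equations 11 and 12 (a sketch).

    rel_score: BM25 topic relevance score Score_Irel(d, q)
    tf_co:     TF_CO(s, q, W), the summed normalised co-occurrence frequency
    lam:       Jelinek-Mercer smoothing parameter; following the piecewise
               definitions above, lam = 0 falls back to relevance-only ranking
    log_norm:  True applies the logarithm normalization of Equation 12
    """
    if lam == 0:
        return rel_score
    lam_prime = (1 - lam) / lam
    opinion = math.log(tf_co + 1) if log_norm else tf_co
    return (1 + lam_prime * opinion) * rel_score
```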
4. EXPERIMENTAL SETUP
4.1 Data set
We test our opinion retrieval model on the TREC Blog06 and Blog07 corpus [12, 26], which is the most authoritative opinion retrieval dataset available to date.

The corpus was collected from 100,649 blogs over a period of two and a half months. We focus on retrieving permalinks from this dataset, since human evaluation results are only available for these documents. There are 50 topics (Topics 851~900) from the TREC 2006 blog opinion retrieval task, and 50 topics (Topics 901~950) from TREC blog 2007. Query terms are extracted from the title field using Porter stemming and standard stop word removal.

Generally, the blog 06 queries are used for the parameter comparison study, including the selection of the sentiment thesaurus, the window size, and the effectiveness of different models. The blog 07 queries are used as the test set, where all parameters are those tuned on the blog 06 data and no modification is made.

4.2 Evaluation
To make the experiments applicable to real-world applications and comparable to TREC evaluations, only short queries are used. The evaluation metrics are general IR measures, i.e. mean average precision (MAP), R-Precision (R-prec), and precision at the top 10 results (p@10); a short sketch of these standard metrics is given after the list below. In total, three approaches are comparatively studied in our experiments:

(1) General linear combination (shown as Linear Comb.):

p(d \mid q, s) \overset{rank}{=} (1-\lambda)\, Score_{I_{op}}(d, q, s) + \lambda\, Score_{I_{rel}}(d, q)

where Score_Iop(d,q,s) and Score_Irel(d,q) are computed in the same way as in Equation 11.

(2) Our proposed generation model with Jelinek-Mercer smoothing (shown as Generation Model); see Equation 11.

(3) Our proposed generation model with Jelinek-Mercer smoothing and logarithm normalization (shown as Generation, log); see Equation 12.
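For completeness, the standard per-topic definitions of the metrics named above can be sketched as follows (MAP is the mean of average precision over all topics):

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic; MAP is its mean over all topics."""
    hits, prec_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            prec_sum += hits / rank
    return prec_sum / len(relevant_ids) if relevant_ids else 0.0

def precision_at_k(ranked_ids, relevant_ids, k=10):
    """p@10 when k = 10."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def r_precision(ranked_ids, relevant_ids):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant_ids)
    return precision_at_k(ranked_ids, relevant_ids, k=r) if r else 0.0
```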
4.3 Selection of Sentiment Lexicon
For lexicon-based opinion detection methods, the selection of the opinion thesaurus plays an important role. There are several public online dictionaries from the area of linguistics, such as WordNet [18] and General Inquirer [14]. We follow the general approach [6] of selecting a small seed list of sentiment words from WordNet and then incrementally enlarging the list with synonyms and antonyms (a sketch of this expansion procedure is given after Table 1).

Another option is to rely on a self-constructed dictionary. Wilson et al. [17] manually selected 8,821 words as their sentiment lexicon, and it has been used in several other works. Esuli and Sebastiani [3] scored each word in WordNet regarding its positive, negative and neutral indications to obtain the SentiWordNet lexicon. Words whose positive or negative score is above a threshold in SentiWordNet have been used by some participants of the TREC opinion retrieval task.

Furthermore, we seek help from other languages. HowNet [1] is a knowledge database of the Chinese language, and some of the words in the dictionary are marked as positive or negative. We use the English translations of those sentiment words provided by HowNet.

For comparison, sentiment words from HowNet, WordNet, General Inquirer and SentiWordNet are used as lexicons respectively. Table 1 shows the details of the lists.

Table 1. Sentiment thesauruses used in our experiments

  #  Thesaurus Name    Size   Description
  1  HowNet            4621   English translations of positive/negative Chinese words
  2  WordNet           7426   Selected words from WordNet
  3  Intersection      1413   Words appearing in both 1 and 2
  4  Union             10634  Words appearing in either 1 or 2
  5  General Inquirer  3642   Words in the positive and negative categories
  6  SentiWordNet      3133   Words with a positive or negative score above 0.6
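The seed-expansion step described at the start of this subsection can be sketched with NLTK's WordNet interface. This is an assumed toolchain used for illustration only, not the one used to build Table 1.

```python
# Requires: pip install nltk; then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def expand_lexicon(seeds, rounds=2):
    """Grow a seed sentiment word list with WordNet synonyms and antonyms."""
    lexicon = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        new_words = set()
        for word in frontier:
            for synset in wn.synsets(word):
                for lemma in synset.lemmas():
                    new_words.add(lemma.name().lower())
                    for ant in lemma.antonyms():
                        new_words.add(ant.name().lower())
        frontier = new_words - lexicon   # expand only genuinely new words
        lexicon |= new_words
    return lexicon

# Example with a few seed adjectives:
print(len(expand_lexicon({"good", "bad", "excellent", "terrible"})))
```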
5. EXPERIMENTAL RESULTS AND DISCUSSION
5.1 Effectiveness of Sentiment Lexicons
The retrieval performance under the different sentiment thesauruses is presented in Figure 1. The cross-language HowNet dictionary performs better than all other candidates and is quite insensitive to the smoothing parameter. SentiWordNet and the Intersection thesaurus perform next best, and close to each other. General Inquirer does not perform well and has the worst result.

[Figure 1. MAP-λ curves with different thesauruses (Blog 06).]

There might be two reasons for the better performance of the HowNet words compared with the WordNet words. First, the list generated from WordNet may lack diversity, since the words come from a limited set of initial seeds and only synonyms and antonyms are taken into consideration. Second, the English translations of the Chinese sentiment words are annotated by non-native speakers; hence most of them are common and popular terms, which are widely used in the Web environment.

Since the performance of SentiWordNet and HowNet shows no big difference when λ is higher, and SentiWordNet is openly available on the Internet, we choose SentiWordNet as the sentiment thesaurus in the following experiments, to make the experiments easier for other researchers to repeat.

5.2 Selection of Window Size
It is intuitive that opinion modifiers are less likely to be related to an object far away from them in the text than to one close by. Thus, during the opinion term matching process, a proximity window is often used to restrict the valid distance between sentiment words and topic words. However, no one is sure how close the two types of words should be to each other, and this threshold is often set by hand with various justifications. In previous work, window sizes that represent the length of direct modification (e.g. 3) [11], a sentence (e.g. 10~20) [10, 22], a paragraph (e.g. 30~50) [11], or the whole document [6] have been used.

We test the retrieval performance under these settings respectively, to illustrate how this factor influences the opinion retrieval ability of our model. The result is given in Figure 2.

[Figure 2. MAP vs. window size with different λ (Blog 06).]

It is clear that the larger the window, the better the performance, and this tendency is invariant to different levels of smoothing. The result is reasonable: the distance between a query term and a sentiment word is generally used to capture the relevance of the opinion to the topic, which has already been taken into consideration in the unified model through the quadratic combination with topic relevance. Moreover, in Web documents the opinion words are not always located near the topic words. Therefore, we set the full document as the default window size in the following experiments.

5.3 Opinion Retrieval Model Comparison
Three opinion ranking formulas are tested in our experiment. Their performance is compared in Figure 3.

[Figure 3. MAP-λ curves for different opinion ranking formulas.]

We can see that the generation model is more effective than the linear combination, especially when mild smoothing is performed. As the value of λ goes up, desired documents with only a few opinion terms are deprived of the discriminative ability contained in their opinion expressions, as this part of the probability is discounted to the whole document collection. The Generation, log model overcomes this problem and gives the best retrieval performance under all values of λ. This demonstrates the usefulness of our log-normalization approach in the setting of opinion search. In addition, all three ranking schemes perform equivalently to or better than the best run at TREC 2006, owing to the careful selection of the sentiment thesaurus and window size discussed above.

To further demonstrate the effectiveness of our opinion retrieval model, a comparison of opinion MAP with previous work is given in Table 2. The performance improvement after opinion re-ranking is shown as precision-recall curves in Figure 4.

[Figure 4. Precision-recall curves before and after opinion re-ranking of the top 1000 relevant documents.]

Table 2. Comparison of opinion retrieval performance

  Data Set  Method                       MAP     R-Prec  P@10
  Blog 06   Best run at blog 06          0.2052  0.2881  0.468
            Best title-run at blog 06    0.1885  0.2771  0.512
            Our Relevance Baseline       0.1758  0.2619  0.350
            Our Unified Model            0.2257  0.3038  0.507
  Blog 07   Most improvement at blog 07  15.9%   8.6%    21.6%
            Our Relevance Baseline       0.2632  0.3249  0.432
            Our Unified Model *          0.3371  0.3896  0.606
            improvement                  28.1%   19.9%   40.3%

  *: on Blog 07 data, using the same parameters as on Blog 06 data: λ=0.6, window = full document, thesaurus: SentiWordNet. All our approaches use title-only runs.

In Figure 5, the per-topic gains in opinion MAP and p@10 on the blog 07 data set are visualized. Note again that no characteristic of blog data has been used in this work, such as blog-specific feature extraction, blog spam filtering, or processing of blog feeds and comments. In terms of MAP, 16 of the 50 topics receive an improvement of more than 50%, while only 5 topics result in minor performance loss. The topics that benefit the most from opinion re-ranking, such as topic 912 (+144%) and topic 928 (+135%), are those for which only a few documents with relevant opinions are retrieved, and those documents are ranked low in the first stage. Only 4 topics' performance decreases slightly (by less than 40%). In terms of p@10, the results are even more significant: three topics gain more than 200%, such as topic 946 (+900%), and only 6 topics drop slightly in performance.

Table 3 gives detailed descriptions of two topics from blog06 and blog07. We can see that our re-ranking procedure successfully re-scores almost all of the target documents into the top 100 results. This shows that our formula is highly accurate in discriminating a few subjective texts from a large number of factual descriptions.

[Figure 5. Per-topic analysis: performance improvement over 50 topics after re-ranking on Blog 07 data. (a) MAP improvement, (b) p@10 improvement. In (b), the three topics whose improvement is much higher than the figure's upper bound are annotated individually.]

Table 3. Details of the best re-ranked topic examples

  TREC 06 - Topic 895   Title: Oprah   Description: Find opinions about Oprah Winfrey's TV show.

                     MAP     Prec@10  Prec@30  Prec@100  Prec@1000
  Before re-ranking  0.0687  0.2000   0.0333   0.1200    0.0640
  After re-ranking   0.2721  0.8000   0.5000   0.3400    0.0640

  TREC 07 - Topic 946   Title: tivo   Description: Find opinions about TiVo brand digital video recorders.

                     MAP     Prec@10  Prec@30  Prec@100  Prec@1000
  Before re-ranking  0.2779  0.1000   0.3333   0.3900    0.2650
  After re-ranking   0.4991  1.0000   0.9667   0.8300    0.2650

6. CONCLUSION AND FUTURE WORK
In this work we deal with the problem of opinion search over general topics. In contrast to previous approaches that view fact retrieval and opinion detection as two distinct parts to be linearly combined, we propose a formal probabilistic generation model that unifies the topic relevance score and the opinion score. A couple of opinion re-ranking formulas are derived using the language modeling approach with smoothing, together with a logarithm normalization scheme. Furthermore, the effectiveness of different sentiment lexicons and of varying distances between sentiment words and query terms is compared and discussed empirically; the experiments show that bigger windows are better than smaller ones. According to the experiments, the proposed model yields much better results on the TREC Blog06 and Blog07 datasets.

The novelty of our work lies in a probabilistic generation model for opinion retrieval, which is general in motivation and flexible in practice. This work derives a unified model from the quadratic relation between opinion analysis and topic relevance, which is essentially different from the general linear combination. Furthermore, in this work we make no assumption about the nature of blog-structured text; therefore the approach is expected to generalize to all kinds of resources for the opinion retrieval task.

Future directions in opinion retrieval may go beyond mere document re-ranking. An opinion-oriented index, as well as deeper analysis of the structural information of opinion resources such as blogs and forums, could be helpful in understanding the nature of opinion-expressing behavior on the web. Another interesting direction is to automatically construct a collection-based sentiment lexicon, which has been a hot research topic [26], and to introduce such a lexicon into our generation model.

7. REFERENCES
[1] Dong, Z. HowNet. http://www.HowNet.org
[2] Eguchi, K. and Lavrenko, V. Sentiment Retrieval using Generative Models. In Proceedings of Empirical Methods on Natural Language Processing (EMNLP) 2006, 345-354.
[3] Esuli, A. and Sebastiani, F. Determining the semantic orientation of terms through gloss classification. In Proceedings of CIKM 2005, 617-624.
[4] Hurst, M. and Nigam, K. Retrieving Topical Sentiments from Online Document Collections. Document Recognition and Retrieval XI, 27-34, 2004.
[5] Lafferty, J. and Zhai, C. Probabilistic relevance models based on document and query generation. Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, Vol. 13, 2003.
[6] Liao, X., Cao, D., Tan, S., Liu, Y., Ding, G., and Cheng, X. Combining Language Model with Sentiment Analysis for Opinion Retrieval of Blog-Post. Online Proceedings of the Text Retrieval Conference (TREC) 2006. http://trec.nist.gov/
[7] Liu, B., Hu, M., and Cheng, J. Opinion observer: analyzing and comparing opinions on the Web. WWW 2005, 342-351.
[8] Mei, Q., Ling, X., Wondra, M., Su, H., and Zhai, C. Topic sentiment mixture: modeling facets and opinions in weblogs. WWW 2007, 171-180.
[9] Metzler, D., Strohman, T., Turtle, H., and Croft, W.B. Indri at TREC 2004: Terabyte Track. Online Proceedings of the 2004 Text REtrieval Conference (TREC 2004), 2004.
[10] Mishne, G. Multiple Ranking Strategies for Opinion Retrieval in Blogs. Online Proceedings of TREC, 2006.
[11] Oard, D., Elsayed, T., Wang, J., and Wu, Y. TREC-2006 at Maryland: Blog, Enterprise, Legal and QA Tracks. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[12] Ounis, I., de Rijke, M., Macdonald, C., Mishne, G., and Soboroff, I. Overview of the TREC 2006 Blog Track. In Proceedings of TREC 2006, 15-27. http://trec.nist.gov/
[13] Pang, B., et al. Thumbs up? Sentiment Classification Using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2002, 79-86.
[14] Stone, P., Dunphy, D., Smith, M., and Ogilvie, D. The General Inquirer: A Computer Approach to Content Analysis. MIT Press, Cambridge, 1966.
[15] Tong, R. An Operational System for Detecting and Tracking Opinions in on-line discussion. SIGIR Workshop on Operational Text Classification, 2001, 1-6.
[16] Turtle, H. and Croft, W.B. Evaluation of an Inference Network-Based Retrieval Model. ACM Transactions on Information Systems, 9(3), 187-222, 1991.
[17] Wilson, T., Wiebe, J., and Hoffmann, P. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. In Proceedings of HLT/EMNLP 2005, 347-354.
[18] WordNet. http://wordnet.princeton.edu/
[19] Yang, K., Yu, N., Valerio, A., and Zhang, H. WIDIT in TREC-2006 Blog track. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[20] Zhai, C. and Lafferty, J. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems (ACM TOIS), Vol. 22, No. 2, 179-214, 2004.
[21] Zhai, C. A Brief Review of Information Retrieval Models. Technical report, Dept. of Computer Science, UIUC, 2007.
[22] Zhang, W. and Yu, C. UIC at TREC 2006 Blog Track. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[23] Mishne, G. and Glance, N. Leave a Reply: An analysis of Weblog Comments. In WWE 2006 (WWW 2006 Workshop on the Weblogging Ecosystem), 2006.
[24] Mishne, G. Using blog properties to improve retrieval. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM) 2007.
[25] Singhal, A. Modern information retrieval: A brief overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 24(4), 35-43, 2001.
[26] Macdonald, C. and Ounis, I. Overview of the TREC-2007 Blog Track. Online Proceedings of the 16th Text Retrieval Conference (TREC 2007). http://trec.nist.gov/pubs/trec16/t16_proceedings.html
