A Generation Model To Unify Topic Relevance and Lexicon-Based Sentiment For Opinion Retrieval
we come to a novel generation model that unifies the topic-relevance model and the opinion generation model by a quadratic combination. It is essentially different from the linear interpolation between the document's relevance score and its opinion score, which is popularly used in such tasks. With the proposed model, the relevance-based ranking criterion now serves as the weighting factor for the lexicon-based sentiment ranking function. Experimental results show the significant effectiveness of the proposed unified model. This is reasonable, since the relevance score is a reliable indicator of whether the opinions, if any, expressed in the document are indeed directed towards the intended object. This notion is a novel characteristic of our model, because in previous work the opinion score is always calculated independently of the topic-relevance degree. Furthermore, this process can be viewed as result re-ranking. Our work demonstrates that in IR and sentiment analysis, a Bayesian approach to combining multiple ranking functions is superior to a linear combination, and it is also applicable to other result re-ranking applications in similar scenarios. This opinionate document ranking problem is of fundamental benefit to all opinion-related research issues, in that it can provide high-quality results for further feature extraction and user behavior learning.

Although the experiments in this paper are conducted on the TREC (Text REtrieval Conference) Blog06 and Blog07 data sets, no characteristic of blog data has been used, such as feature extraction, blog spam filtering, or processing of blog feeds and comments. In addition, the lexicons used in this work are all domain-independent ones. Hence the conclusion is not limited to the blog environment, and the proposed approach is applicable to opinion retrieval tasks on all kinds of resources.

The rest of the paper is organized as follows. We first review previous work in section 2. In section 3, we present our generation model for opinion retrieval, which unifies the topic relevance model and sentiment-based opinion generation. Details of estimating the model parameters are also discussed in that section. After introducing the experiment settings in section 4, we test our generation model with comparative experiments in section 5, together with some further discussions. Finally, we summarize the paper and suggest avenues for future work in section 6.

2. RELATED WORK
There has long been interest in either the topics discussed or the opinions expressed in web documents. A popular approach to opinion identification is text classification [7, 15, 22]. Typically, a sentence classifier is learned from available opinionate and neutral web pages using language features such as local phrases [15] and domain-specific adjective-noun patterns [7]. In order to calculate an opinion score, the classification result is then combined with the topic-relevance score using a binary operator [12].

Another line of research on opinionate documents comes from natural language processing and deals with pure text, without constraints on the source of the opinionate data. This work generally treats opinion detection as a text classification problem and uses linguistic features to determine the presence and the polarity of opinions [13, 17, 22]. Nevertheless, these approaches either neglect the problem of retrieving valuable documents [13, 17], or adopt an intuitive ranking solution that is detached from their opinion detection [22].

Hurst and Nigam [4] were the first to fuse topicality and polarity together to form the notion of opinion retrieval, i.e. finding opinions about a given topic. In that work, however, the emphasis is on how to judge the presence of such opinions, and no ranking strategy is put forward. The first opinion ranking formula was introduced by Eguchi and Lavrenko [2] as the cross entropy of topics and sentiments under a generation model. The instantiation of this formula, however, did not perform well in the subsequent TREC opinion retrieval experiments, and no encouraging result was obtained.

Opinion search systems that perform well empirically generally adopt a two-stage approach [12]. Topic-relevance search is carried out first using relevance ranking (e.g. TF*IDF ranking or language modeling); heuristic opinion detection is then used to re-rank the documents. One major method to identify opinionate content is to match the documents against a sentiment word dictionary and calculate term frequency [6, 10, 11, 19]. The matching process is often performed multiple times with different dictionaries and different matching restrictions. Dictionaries are constructed according to existing lexical categories [6, 10, 19] or the word distribution over the dataset [10, 11, 19]. Matching constraints often concern the distance between topic terms and opinion terms, which can be thought of as a sliding window: some require the two types of words to be in the same sentence [10], while others set a maximum number of words allowed between them [19]. After the opinion score is calculated, an effective ranking formula is needed to combine the multiple sources of information. Most existing approaches use a linear combination of the relevance score and the opinion score [6, 10, 19]. A typical example is shown below:

α × Score_rel + β × Score_opn    (1)

where α and β are combination parameters, which are often tuned by hand or learned to optimize a target metric such as binary preference [10]. Other alternatives include demoting the ranking of neutral documents [11].

Domain-specific information has also been studied. Mishne [23, 24] proposed three simple heuristics that improved opinion retrieval performance by using blog-specific properties. Other works make use of field-dependent features, such as different aspects of a product or movie [7, 15], which are not present in other types of text data. The TREC Blog track is also an important research and experimental platform for opinion retrieval. Its major goal is to explore information seeking behavior in the blogosphere, with an emphasis on spam detection, blog structure analysis, etc. Hence submitted work often goes to great lengths to exploit the non-textual nature of a blog post [10, 12]. This approach makes strong assumptions on the problem domain and is difficult to generalize.

3. GENERATION MODEL FOR OPINION RETRIEVAL
3.1 A New Generation Model
The opinion retrieval task aims to find the documents that contain relevant opinions according to a user's query. In existing probabilistic IR models, relevance is modeled with a binary random variable to estimate "What is the probability that this document is relevant to this query?". There are two different ways to factor the relevance probability, i.e. query generation and document generation [5].

In order to rank documents by their relevance, the posterior probability p(d|q) is generally estimated, which captures how well the document d "fits" the particular query q. According to Bayes' formula,

p(d|q) ∝ p(q|d) p(d)    (2)

where p(d) is the prior probability that a document d is relevant to any query, and p(q|d) denotes the probability of query q being "generated" by d. When assuming a uniform document prior, the ranking function reduces to the likelihood of generating the expected query terms from the document.

However, when explicitly searching for opinions, the user's information need is restricted to only an opinionate subset of the relevant documents. This subset is characterized by sentiment expressions s towards topic q. Thus the ranking estimation for opinion retrieval changes to p(d|q,s).

In this paper, for simplicity, when we discuss lexicon-based sentiment analysis, the latent variable s is assumed to be a pre-constructed bag-of-words sentiment thesaurus, and all sentiment words s_i are uniformly distributed. Then the prior probability that the document d contains opinions relevant to query q is given by

p(d|q,s) = ∑_i p(d|q,s_i) p(s_i|s)

3.2 Topic Relevance Ranking
In the topic relevance model, I_rel(d,q) is based on the notion of document generation. A classic probabilistic model, the Binary Independence Retrieval (BIR) model [5], is one of the most famous in this branch. The heuristic ranking function BM25 and its variants have been successfully applied in many IR experiments, including the TREC (Text REtrieval Conference) evaluations.

Hence in this paper we adopt this BIR-based document generation model, by which the topic relevance score Score_Irel(d,q) given by the ranking function presented in [25] can be written as:

Score_Irel(d,q) = ∑_{w ∈ q∩d} ( ln[(N − df(w) + 0.5) / (df(w) + 0.5)] × [(k1 + 1)·c(w,d)] / [k1·((1 − b) + b·|d|/avdl) + c(w,d)] × [(k3 + 1)·c(w,q)] / [k3 + c(w,q)] )    (6)

where c(w,d) is the count of word w in the document d, c(w,q) is the count of word w in the query q, N is the total number of documents in the collection, df(w) is the number of documents containing w, avdl is the average document length, and k1, k3 and b are tuning parameters.
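To make the ranking function concrete, the following Python sketch computes a BM25 score of the form of Equation (6) for a toy collection. The parameter defaults (k1 = 1.2, b = 0.75, k3 = 7) are common BM25 settings chosen for illustration; the paper's tuned values are not given here, and the toy documents are invented.

```python
import math
from collections import Counter

def bm25_score(query, doc, df, N, avdl, k1=1.2, b=0.75, k3=7.0):
    """Equation (6): an IDF factor times a document-TF factor times a
    query-TF factor, summed over the terms shared by query and document."""
    c_d = Counter(doc)    # c(w, d): term counts in the document
    c_q = Counter(query)  # c(w, q): term counts in the query
    dl = len(doc)         # |d|: document length
    score = 0.0
    for w in set(query) & set(doc):
        idf = math.log((N - df[w] + 0.5) / (df[w] + 0.5))
        doc_tf = (k1 + 1) * c_d[w] / (k1 * ((1 - b) + b * dl / avdl) + c_d[w])
        qry_tf = (k3 + 1) * c_q[w] / (k3 + c_q[w])
        score += idf * doc_tf * qry_tf
    return score

# Toy four-document collection; df(w) is the number of documents containing w.
docs = [["apple", "iphone", "review", "battery"],
        ["banana", "pie", "recipe"],
        ["android", "phone"],
        ["weather", "report"]]
df = Counter(w for d in docs for w in set(d))
avdl = sum(len(d) for d in docs) / len(docs)
scores = [bm25_score(["iphone", "review"], d, df, len(docs), avdl) for d in docs]
print(scores.index(max(scores)))  # → 0: only the first document matches the query
```

Note that on very small collections the log-IDF term can go negative for frequent words; the toy collection is chosen so that the query terms are rare.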
"queries" are long and more verbose. In this proposed opinion generation model, the "queries" are sentiment words. Therefore, under this similar scenario, we use the maximum likelihood estimation, smoothed by the Jelinek-Mercer method. According to Jelinek-Mercer smoothing,

p_s(s_i|d,q) = (1 − λ) p_ml(s_i|d,q) + λ p(s_i|C,q),    with α_d = λ

where λ is the smoothing parameter, and p_ml(s_i|d,q) is the maximum likelihood estimation of p(s_i|d,q). Applying this smoothing to Equation 7 and Equation 8, we get the estimation:

∑_i p(s_i|d,q)
  = ∑_{s_i∈d} p(s_i|d,q) + ∑_{s_i∉d} p(s_i|d,q)
  = ∑_{s_i∈d} p_s(s_i|d,q) + ∑_{s_i∉d} α_d p(s_i|C,q)
  = ∑_{s_i∈d} [(1 − λ) p_ml(s_i|d,q) + λ p(s_i|C,q)] + ∑_{s_i∉d} λ p(s_i|C,q)
  = ∑_{s_i∈d} (1 − λ) p_ml(s_i|d,q) + λ ∑_i p(s_i|C,q)
  = ∑_{s_i∈d} (1 − λ) p_ml(s_i|d,q) + λ    (9)

We use the co-occurrence of sentiment word s and query word q inside document d within a window W as the ranking measure of p_ml(s_i|d,q). Hence the sentiment score of a document d given by the opinion generation model is:

Score_Iop(d,q,s) = ∑_{s_i∈d} (1 − λ) · co(s_i,q|W) / (c(q,d)·|W|) + λ    (10)

where co(s_i,q|W) is the frequency of sentiment word s_i co-occurring with query q within window W, and c(q,d) is the query term frequency in the document.

p(d|q,s) =rank= { [1 + λ′ log(TF_CO(s,q,W) + 1)] × Score_Irel(d,q)   if λ ≠ 0
               { Score_Irel(d,q)                                     if λ = 0    (12)

where λ′ = (1 − λ)/λ and TF_CO(s,q,W) = ∑_{s_i∈d} co(s_i,q|W) / (c(q,d)·|W|). The experimental analysis of this logarithm relationship is presented in section 5.3, which shows the effectiveness of this normalization.

4. EXPERIMENTAL SETUP
4.1 Data set
We test our opinion retrieval model on the TREC Blog06 and Blog07 corpora [12, 26], the most authoritative opinion retrieval datasets available to date.

The corpus was collected from 100,649 blogs during a period of two and a half months. We focus on retrieving permalinks from this dataset, since human evaluation results are only available for these documents. There are 50 topics (Topic 851~900) from the TREC 2006 blog opinion retrieval task, and 50 topics (Topic 901~950) from TREC blog 2007. Query terms are extracted from the title field using Porter stemming and standard stop-word removal.

Generally, queries from Blog06 are used for the parameter comparison study, including the selection of the sentiment thesaurus, the window size, and the effectiveness of different models. Queries from Blog07 are used as the testing set, where all the parameters have been tuned on the Blog06 data and no modification is made.
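As a sketch of how the smoothed opinion score of Equation (10) and the log-normalized combination of Equation (12) fit together, the following Python fragment scores a tokenized document. The toy document, the two-word lexicon, and λ = 0.5 are illustrative assumptions, not settings from the experiments.

```python
import math

def cooccurrence(sent_word, query_terms, doc, window):
    """co(s_i, q | W): occurrences of sentiment word s_i within
    `window` tokens of any query-term occurrence in the document."""
    q_pos = [i for i, t in enumerate(doc) if t in query_terms]
    return sum(1 for i, t in enumerate(doc)
               if t == sent_word and any(abs(i - j) <= window for j in q_pos))

def opinion_score(doc, query_terms, lexicon, window, lam):
    """Equation (10): Jelinek-Mercer-smoothed opinion score of a document."""
    c_q = sum(doc.count(t) for t in query_terms)  # c(q, d)
    if c_q == 0:
        return lam
    ml = sum(cooccurrence(s, query_terms, doc, window) for s in lexicon)
    return (1 - lam) * ml / (c_q * window) + lam

def unified_rank(rel_score, doc, query_terms, lexicon, window, lam):
    """Equation (12): the relevance score weighted by the log-normalized
    sentiment factor (the quadratic combination); λ = 0 falls back to
    pure topic relevance."""
    c_q = sum(doc.count(t) for t in query_terms)
    if lam == 0 or c_q == 0:
        return rel_score
    tf_co = sum(cooccurrence(s, query_terms, doc, window)
                for s in lexicon) / (c_q * window)
    lam_prime = (1 - lam) / lam
    return (1 + lam_prime * math.log(tf_co + 1)) * rel_score

doc = "the camera is great but the battery life is terrible".split()
lexicon = {"great", "terrible"}
print(round(opinion_score(doc, {"camera"}, lexicon, 3, 0.5), 3))  # → 0.667
print(unified_rank(1.0, doc, {"camera"}, lexicon, 3, 0.5) > 1.0)  # → True
```

With window 3, only "great" co-occurs with "camera", so the opinionate factor boosts the relevance score; "terrible" is too far from the query term to count.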
scored each word in WordNet regarding its positive, negative and neutral indications to obtain a SentiWordNet lexicon. Words with a positive or negative score above a threshold in SentiWordNet are used by some participants of the TREC opinion retrieval task.

Furthermore, we seek help from other languages. HowNet [1] is a knowledge database of the Chinese language, and some of the words in the dictionary are marked as positive or negative. We use the English translations of those sentiment words provided by HowNet.

For comparison, sentiment words from HowNet, WordNet, General Inquirer and SentiWordNet are used as lexicons respectively. Table 1 shows detailed information on these word lists.
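The threshold-based selection of lexicon entries can be sketched as follows. The word scores and the 0.6 cutoff are invented for illustration and do not reflect actual SentiWordNet values, which are assigned per synset rather than per word.

```python
# Hypothetical (word -> (positivity, negativity)) scores for illustration only.
word_scores = {
    "excellent": (0.85, 0.00),
    "awful":     (0.00, 0.90),
    "table":     (0.05, 0.05),
    "slow":      (0.10, 0.65),
}

def build_lexicon(scores, threshold=0.6):
    """Keep words whose positive or negative score clears the threshold,
    in the spirit of how some TREC participants filtered SentiWordNet."""
    return {w for w, (pos, neg) in scores.items() if max(pos, neg) >= threshold}

print(sorted(build_lexicon(word_scores)))  # → ['awful', 'excellent', 'slow']
```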
It is clear that the larger the window is, the better the performance, and this tendency is invariant across different levels of smoothing. The result is reasonable: the distance between a query term and a sentiment word is generally used to indicate the opinion's relevance to the topic, and this has already been taken into consideration in the unified model through the quadratic combination with topic relevance. Moreover, in Web documents the opinion words may not always be located near the topic words. Therefore, we set the full document as the default window size in the following experiments.
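The window-size effect can be illustrated with a small counting sketch: widening the window can only admit more sentiment-word matches, up to treating the whole document as one window. The example document and lexicon below are invented for illustration.

```python
def co_count(doc, query_terms, lexicon, window):
    """Count sentiment-word occurrences within `window` tokens of a query term."""
    q_pos = [i for i, t in enumerate(doc) if t in query_terms]
    return sum(1 for i, t in enumerate(doc)
               if t in lexicon and any(abs(i - j) <= window for j in q_pos))

doc = ("the camera looks great although the manual is terrible "
       "and shipping was slow but support was excellent").split()
lexicon = {"great", "terrible", "slow", "excellent"}
counts = [co_count(doc, {"camera"}, lexicon, w) for w in (2, 8, len(doc))]
print(counts)  # → [1, 2, 4]: a wider window never loses matches
```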
Figure 5. Per-topic analysis: performance improvement over 50 topics after re-ranking on Blog07 data.
(a) MAP improvement, (b) p@10 improvement.
(In (b), the three topics whose improvement is far above the figure's upper bound are annotated individually.)
6. CONCLUSION AND FUTURE WORK
In this work we deal with the problem of opinion search for general topics. Contrary to previous approaches, which view fact retrieval and opinion detection as two distinct parts to be linearly combined, we propose a formal probabilistic generation model to unify the topic relevance score and the opinion score. A couple of opinion re-ranking formulas are derived using the language modeling approach with smoothing, together with a logarithm normalization paradigm. Furthermore, the effectiveness of different sentiment lexicons and of varying distances between sentiment words and query terms is compared and discussed empirically. The experiments show that bigger windows perform better than smaller ones, and that the proposed model yields much better results on the TREC Blog06 and Blog07 datasets.

The novelty of our work lies in a probabilistic generation model for opinion retrieval, which is general in motivation and flexible in practice. This work derives a unified model from the quadratic relation between opinion analysis and topic relevance, which is essentially different from the usual linear combination. Furthermore, in this work we make no assumption about the nature of blog-structured text; therefore, the approach is expected to generalize to all kinds of resources for the opinion retrieval task.

Future directions in opinion retrieval may go beyond mere document re-ranking. An opinion-oriented index, as well as deeper analysis of the structural information of opinion resources such as blogs and forums, could be helpful in understanding the nature of opinion-expressing behavior on the web. Another interesting topic is to automatically construct a collection-based sentiment lexicon, which has been a hot research topic [26], and to introduce this lexicon into our generation model.

7. REFERENCES
[1] Dong, Z. HowNet. http://www.HowNet.org
[2] Eguchi, K. and Lavrenko, V. Sentiment Retrieval using Generative Models. In Proceedings of EMNLP 2006, 345-354.
[3] Esuli, A. and Sebastiani, F. Determining the semantic orientation of terms through gloss classification. In Proceedings of CIKM 2005, 617-624.
[4] Hurst, M. and Nigam, K. Retrieving Topical Sentiments from Online Document Collections. Document Recognition and Retrieval XI, 27-34, 2004.
[5] Lafferty, J. and Zhai, C. Probabilistic relevance models based on document and query generation. Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, Vol. 13, 2003.
[6] Liao, X., Cao, D., Tan, S., Liu, Y., Ding, G., and Cheng, X. Combining Language Model with Sentiment Analysis for Opinion Retrieval of Blog-Post. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[7] Liu, B., Hu, M., and Cheng, J. Opinion observer: analyzing and comparing opinions on the Web. In Proceedings of WWW 2005, 342-351.
[8] Mei, Q., Ling, X., Wondra, M., Su, H., and Zhai, C. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of WWW 2007, 171-180.
[9] Metzler, D., Strohman, T., Turtle, H., and Croft, W.B. Indri at TREC 2004: Terabyte Track. Online Proceedings of TREC, 2004.
[10] Mishne, G. Multiple Ranking Strategies for Opinion Retrieval in Blogs. Online Proceedings of TREC, 2006.
[11] Oard, D., Elsayed, T., Wang, J., and Wu, Y. TREC-2006 at Maryland: Blog, Enterprise, Legal and QA Tracks. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[12] Ounis, I., de Rijke, M., Macdonald, C., Mishne, G., and Soboroff, I. Overview of the TREC 2006 Blog Track. In Proceedings of TREC 2006, 15-27. http://trec.nist.gov/
[13] Pang, B., et al. Thumbs up? Sentiment Classification Using Machine Learning Techniques. In Proceedings of EMNLP 2002, 79-86.
[14] Stone, P., Dunphy, D., Smith, M., and Ogilvie, D. The General Inquirer: A Computer Approach to Content Analysis. MIT Press, Cambridge, 1966.
[15] Tong, R. An Operational System for Detecting and Tracking Opinions in On-line Discussion. In SIGIR Workshop on Operational Text Classification, 2001, 1-6.
[16] Turtle, H. and Croft, W.B. Evaluation of an Inference Network-Based Retrieval Model. ACM Transactions on Information Systems, 9(3), 187-222, 1991.
[17] Wilson, T., Wiebe, J., and Hoffmann, P. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. In Proceedings of HLT/EMNLP 2005, 347-354.
[18] WordNet. http://wordnet.princeton.edu/
[19] Yang, K., Yu, N., Valerio, A., and Zhang, H. WIDIT in TREC-2006 Blog track. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[20] Zhai, C. and Lafferty, J. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, 22(2), 179-214, 2004.
[21] Zhai, C. A Brief Review of Information Retrieval Models. Technical report, Dept. of Computer Science, UIUC, 2007.
[22] Zhang, W. and Yu, C. UIC at TREC 2006 Blog Track. Online Proceedings of TREC, 2006. http://trec.nist.gov/
[23] Mishne, G. and Glance, N. Leave a Reply: An Analysis of Weblog Comments. In WWE 2006 (WWW 2006 Workshop on the Weblogging Ecosystem), 2006.
[24] Mishne, G. Using blog properties to improve retrieval. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM) 2007.
[25] Singhal, A. Modern information retrieval: A brief overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 24(4), 35-43, 2001.
[26] Macdonald, C. and Ounis, I. Overview of the TREC-2007 Blog Track. Online Proceedings of the 16th Text Retrieval Conference (TREC 2007). http://trec.nist.gov/pubs/trec16/t16_proceedings.html