
Extracting and Ranking Product Features in Opinion Documents

Lei Zhang
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan Street
Chicago, IL 60607
[email protected]

Bing Liu
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan Street
Chicago, IL 60607
[email protected]

Suk Hwan Lim
Hewlett-Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
[email protected]

Eamonn O'Brien-Strain
Hewlett-Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
[email protected]
Abstract

An important task of opinion mining is to extract people's opinions on features of an entity. For example, the sentence "I love the GPS function of Motorola Droid" expresses a positive opinion on the "GPS function" of the Motorola phone. "GPS function" is the feature. This paper focuses on mining features. Double propagation is a state-of-the-art technique for solving the problem. It works well for medium-size corpora. However, for large and small corpora, it can result in low precision and low recall. To deal with these two problems, two improvements based on part-whole and "no" patterns are introduced to increase the recall. Then feature ranking is applied to the extracted feature candidates to improve the precision of the top-ranked candidates. We rank feature candidates by feature importance, which is determined by two factors: feature relevance and feature frequency. The problem is formulated as a bipartite graph and the well-known web page ranking algorithm HITS is used to find important features and rank them high. Experiments on diverse real-life datasets show promising results.

1 Introduction

In recent years, opinion mining or sentiment analysis (Liu, 2010; Pang and Lee, 2008) has been an active research area in NLP. One task is to extract people's opinions expressed on features of entities (Hu and Liu, 2004). For example, the sentence "The picture of this camera is amazing" expresses a positive opinion on the picture of the camera. "picture" is the feature. How to extract features from a corpus is an important problem. There are several studies on feature extraction (e.g., Hu and Liu, 2004; Popescu and Etzioni, 2005; Kobayashi et al., 2007; Scaffidi et al., 2007; Stoyanov and Cardie, 2008; Wong et al., 2008; Qiu et al., 2009). However, this problem is far from being solved.
Double Propagation (Qiu et al., 2009) is a state-of-the-art unsupervised technique for solving the problem. It mainly extracts noun features, and works well for medium-size corpora. But for large corpora, this method can introduce a great deal of noise (low precision), and for small corpora, it can miss important features. To deal with these two problems, we propose a new feature mining method, which enhances that in (Qiu et al., 2009). Firstly, two improvements based on part-whole patterns and "no" patterns are introduced to increase recall. Part-whole or meronymy is an important semantic relation in NLP, which indicates that one or more objects are parts of another object. For example, the phrase "the engine of the car" contains the part-whole relation that "engine" is part of "car". This relation is very useful for feature extraction, because if we know that one object is part of a product class, this object should be a feature. The "no" pattern is another extraction pattern. Its basic form is the word "no" followed by a noun/noun phrase, for instance, "no noise". People often express their short comments or opinions on features using this pattern. Both types of patterns can help find features missed by double propagation.

As for the low precision problem, we present a feature ranking approach to tackle it. We rank feature candidates based on their importance, which consists of two factors: feature relevance and feature frequency. The basic idea of feature importance ranking is that if a feature candidate is correct and frequently mentioned in a corpus, it should be ranked high; otherwise it should be ranked low in the final result. Feature frequency is the occurrence frequency of a feature in a corpus, which is easy to obtain. However, assessing feature relevance is challenging. We model the problem as a bipartite graph and use the well-known web page ranking algorithm HITS (Kleinberg, 1999) to find important features and rank them high. Our experimental results show superior performance. In practical applications, we believe that ranking is also important for feature mining, because it can help users efficiently discover the important features among the hundreds of fine-grained candidate features extracted.

2 Related work

Hu and Liu (2004) proposed a technique based on association rule mining to extract product features. The main idea is that people often use the same words when they comment on the same product features. Frequent itemsets of nouns in reviews are therefore likely to be product features, while infrequent ones are less likely to be product features. This work also introduced the idea of using opinion words to find additional (often infrequent) features.

Popescu and Etzioni (2005) investigated the same problem. Their algorithm requires that the product class is known. The algorithm determines whether a noun/noun phrase is a feature by computing the pointwise mutual information (PMI) score between the phrase and class-specific discriminators, e.g., "of xx", "xx has", "xx comes with", etc., where xx is a product class. This work first used part-whole patterns for feature mining, but it finds part-whole based features by searching the Web. Querying the Web is time consuming. In our method, we use predefined part-whole relation patterns to extract features in a domain corpus. These patterns are domain-independent and fairly accurate.
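To make the PMI idea in (Popescu and Etzioni, 2005) concrete, the following sketch scores a candidate noun against one class-specific discriminator using raw substring counts from a local corpus. It is only an illustration of the scoring idea: the original work obtained its counts from Web search hit counts, only the "candidate + discriminator" phrase shape is handled here, and the function and variable names are hypothetical.

    import math

    def pmi_with_discriminator(candidate, discriminator, corpus_text):
        # PMI estimated from raw substring counts in a local corpus; the
        # original method used Web hit counts instead. Handles only the
        # "<candidate> <discriminator>" shape, e.g. "lens" + "of the camera".
        n = max(len(corpus_text.split()), 1)
        c_cand = corpus_text.count(candidate)
        c_disc = corpus_text.count(discriminator)
        c_joint = corpus_text.count(candidate + " " + discriminator)
        if not (c_cand and c_disc and c_joint):
            return float("-inf")
        return math.log((c_joint / n) / ((c_cand / n) * (c_disc / n)))

    # e.g., pmi_with_discriminator("lens", "of the camera", reviews_text)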
Following the initial work in (Hu and Liu, 2004), several researchers have further explored the idea of using opinion words in product feature mining. A dependency based method was proposed in (Zhuang et al., 2006) for a movie review analysis application. Qiu et al. (2009) proposed a double propagation method, which exploits certain syntactic relations between opinion words and features, and propagates through both opinion words and features iteratively. The extraction rules are designed based on different relations between opinion words and features, and among opinion words and features themselves. Dependency grammar was adopted to describe these relations. In (Wang and Wang, 2008), another bootstrapping method was proposed. In (Kobayashi et al., 2007), a pattern mining method was used. The patterns are relations between feature and opinion pairs (which they call aspect-evaluation pairs). The patterns are mined from a large corpus using pattern mining, and statistics from the corpus are used to determine the confidence scores of the extraction.

In general information extraction, there are two approaches: rule-based and statistical. Early extraction systems are mainly based on rules (e.g., Riloff, 1993). In statistical methods, the most popular models are Hidden Markov Models (HMM) (Rabiner, 1989), Maximum Entropy Models (ME) (Chieu and Ng, 2002) and Conditional Random Fields (CRF) (Lafferty et al., 2001). CRF has been shown to be the most effective method. It was used in (Stoyanov and Cardie, 2008). However, a limitation of CRF is that it only captures local patterns rather than long range patterns. It has been shown in (Qiu et al., 2009) that many feature and opinion word pairs have long range dependencies. Experimental results in (Qiu et al., 2009) indicate that CRF does not perform well.

Other related works on feature extraction mainly use topic modeling to capture topics in reviews (Mei et al., 2007). In (Su et al., 2008), the authors also proposed a clustering based method with mutual reinforcement to identify features. However, topic modeling or clustering is only able to find general or rough features, and has difficulty finding the fine-grained or precise features that are the target of information extraction.

3 The Proposed Method

As discussed in the introduction section, our proposed method deals with the problems of double propagation. So let us first give a short explanation of why double propagation can cause problems in large or small corpora.

Double propagation assumes that features are nouns/noun phrases and opinion words are adjectives. It has been shown that opinion words are usually associated with features in some ways. Thus, opinion words can be recognized through identified features, and features can be identified through known opinion words. The extracted opinion words and features are utilized to identify new opinion words and new features, which are used again to extract more opinion words and features. This propagation or bootstrapping process ends when no more opinion words or features can be found. The biggest advantage of the method is that it requires no additional resources except an initial seed opinion lexicon, which is readily available (Wilson et al., 2005; Ding et al., 2008). Thus it is domain independent and unsupervised, avoiding the laborious and time-consuming work of labeling data for supervised learning methods. It works well for medium-size corpora. But for large corpora, this method may extract many nouns/noun phrases which are not features, and the precision of the method thus drops. The reason is that during propagation, adjectives which are not opinionated will be extracted as opinion words, e.g., "entire" and "current". These adjectives are not opinion words, but they can modify many kinds of nouns/noun phrases, thus leading to the extraction of wrong features. Iteratively, more and more noise may be introduced during the process. The other problem is that for certain domains, some important features do not have opinion words modifying them. For example, in reviews of mattresses, a reviewer may say "There is a valley on my mattress", which implies a negative opinion because "valley" is undesirable for a mattress. Obviously, "valley" is a feature, but the word "valley" may not be described by any opinion adjective, especially in a small corpus. Double propagation is not applicable in this situation.

To deal with these problems, we propose a novel method to mine features, which consists of two steps: feature extraction and feature ranking. For feature extraction, we still adopt the double propagation idea to populate feature candidates. But two improvements based on part-whole relation patterns and a "no" pattern are made to find features which double propagation cannot find. They can solve part of the recall problem. For feature ranking, we rank feature candidates by feature importance.

A part-whole pattern indicates that one object is part of another object. For the previous example "There is a valley on my mattress", we can find that it contains a part-whole relation between "valley" and "mattress". "valley" belongs to "mattress", which is indicated by the preposition "on". Note that "valley" is not actually a part of the mattress, but an effect on the mattress. It is called a pseudo part-whole relation. For simplicity, we will not distinguish it from an actual part-whole relation, because for our feature mining task the difference matters little. In this case, "noun1 on noun2" is a good indicative pattern which implies that noun1 is part of noun2. So if we know "mattress" is a class concept, we can infer that "valley" is a feature for "mattress". There are many phrase and sentence patterns representing this type of semantic relation, which were studied in (Girju et al., 2006). Besides part-whole patterns, the "no" pattern is another important and specific feature indicator in opinion documents. We introduce these patterns in detail in Sections 3.2 and 3.3.

Now let us deal with the first problem: noise. With opinion words, part-whole patterns and the "no" pattern, we have three feature indicators at hand, but all of them are ambiguous, which means that they are not hard rules. We will inevitably extract wrong features (also called noise) by using them. Pruning noise from the feature candidates is a hard task. Instead, we propose a new angle for solving this problem: feature ranking.
The basic idea is that we rank the extracted feature candidates by feature importance. If a feature candidate is correct and important, it should be ranked high; if it is an unimportant feature or noise, it should be ranked low in the final result. Ranking is also very useful in practice. In a large corpus, we may extract hundreds of fine-grained features, but the user often only cares about the important ones, which should be ranked high. We identified two major factors affecting feature importance: one is feature relevance and the other is feature frequency.

Feature relevance: it describes how likely a feature candidate is to be a correct feature. We find that there are three strong clues indicating feature relevance in a corpus. The first clue is that a correct feature is often modified by multiple opinion words (adjectives or adverbs). For example, in the mattress domain, "delivery" is modified by "quick", "cumbersome" and "timely". This shows that reviewers put emphasis on the word "delivery". Thus we can infer that "delivery" is a possible feature. The second clue is that a feature could be extracted by multiple part-whole patterns. For example, in the car domain, if we find the two phrases "the engine of the car" and "the car has a big engine", we can infer that "engine" is a feature for car, because both phrases contain part-whole relations indicating that "engine" is a part of "car". The third clue is the combination of opinion word modification, part-whole pattern extraction and "no" pattern extraction. That is, if a feature candidate is not only modified by opinion words but also extracted by part-whole or "no" patterns, we can infer that it is a feature with high confidence. For example, the sentence "there is a bad hole in the mattress" strongly indicates that "hole" is a feature for a mattress, because it is modified by the opinion word "bad" and also appears in a part-whole pattern. What is more, we find that there is a mutual reinforcement relation between opinion words, part-whole and "no" patterns, and features. If an adjective modifies many correct features, it is very likely to be a good opinion word. Similarly, if a feature candidate can be extracted by many opinion words, part-whole patterns, or the "no" pattern, it is also highly likely to be a correct feature. This indicates that the Web page ranking algorithm HITS is applicable.

Feature frequency: This is another important factor affecting feature ranking. Feature frequency has been considered in (Hu and Liu, 2004; Blair-Goldensohn et al., 2008). We consider a feature f1 to be more important than a feature f2 if f1 appears more frequently than f2 in opinion documents. In practice, it is desirable to rank frequent features higher than infrequent features. The reason is that missing a frequently mentioned feature in opinion mining is bad, but missing a rare feature is not a big issue.

Combining the above factors, we propose a new feature mining method. Experiments show good results on diverse real-life datasets.

3.1 Double Propagation

As we described above, double propagation is based on the observation that there are natural relations between opinion words and features, due to the fact that opinion words are often used to modify features. Furthermore, it is observed that opinion words and features themselves also have relations in opinionated expressions (Qiu et al., 2009). These relations can be identified via a dependency parser (Lin, 1998) based on the dependency grammar. The identification of the relations is the key to feature extraction.

Dependency grammar: It describes the dependency relations between words in a sentence. After a sentence is parsed by a dependency parser, its words are linked to each other by certain relations. For the sentence "The camera has a good lens", "good" is the opinion word and "lens" is the feature of the camera. After parsing, we can find that "good" depends on "lens" with the relation mod. Here mod means that "good" is the adjunct modifier of "lens". In some cases, an opinion word and a feature are not directly dependent, but they directly depend on the same word. For example, from the sentence "The lens is nice", we can find that the feature "lens" and the opinion word "nice" both depend on the verb "is", with the relations s and pred respectively. Here s means that "lens" is the surface subject of "is", while pred means that "nice" is the predicate of the "is" clause.

(Qiu et al., 2009) defines two categories of dependency relations to summarize all types of dependency relations between two words, which are illustrated in Figure 1. Arrows are used to represent dependencies.

Direct relations: One word depends on the other word directly, or they both depend on a third word directly, as shown in (a) and (b) of Figure 1. In (a), B depends on A directly, and in (b) they both directly depend on D.

Indirect relations: One word depends on the other word through other words, or they both depend on a third word indirectly. For example, in (c) of Figure 1, B depends on A through D; in (d) of Figure 1, A depends on D through I1 while B depends on D through I2. In some complicated situations, there can be more than one I1 or I2.

[Figure 1 omitted: schematic dependency graphs (a)-(d) between words A, B, D, I1 and I2]
Fig. 1 Different relations between A and B

Parsing indirect relations is error prone for Web corpora. Thus we only use direct relations to extract opinion words and feature candidates in our application. For the detailed extraction rules, please refer to the paper (Qiu et al., 2009).
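For illustration only, the propagation idea above can be sketched as follows. This is not the authors' implementation (which relies on MINIPAR and the full rule set of Qiu et al., 2009); it is a simplified approximation in which spaCy's amod dependency stands in for the mod relation and an nsubj/acomp pair through a shared verb stands in for the s/pred case. The function and rule names are illustrative assumptions.

    # Minimal sketch of double propagation using spaCy dependency labels.
    # Only direct relations are used, mirroring the choice stated above.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def double_propagation(sentences, seed_opinion_words):
        opinion_words = set(seed_opinion_words)   # e.g., {"good", "nice", "bad"}
        features = set()
        docs = [nlp(s) for s in sentences]
        changed = True
        while changed:                            # stop when nothing new is found
            changed = False
            for doc in docs:
                for tok in doc:
                    # Rule 1: adjective directly modifies a noun (amod ~ mod).
                    if tok.dep_ == "amod" and tok.head.pos_ == "NOUN":
                        if tok.lemma_ in opinion_words and tok.head.lemma_ not in features:
                            features.add(tok.head.lemma_)          # new feature
                            changed = True
                        if tok.head.lemma_ in features and tok.pos_ == "ADJ" \
                                and tok.lemma_ not in opinion_words:
                            opinion_words.add(tok.lemma_)          # new opinion word
                            changed = True
                    # Rule 2: subject noun and adjectival complement share a verb,
                    # as in "The lens is nice" (nsubj + acomp under the same head).
                    if tok.dep_ == "acomp" and tok.lemma_ in opinion_words:
                        for sib in tok.head.children:
                            if sib.dep_ == "nsubj" and sib.pos_ == "NOUN" \
                                    and sib.lemma_ not in features:
                                features.add(sib.lemma_)
                                changed = True
        return features, opinion_words

A call such as double_propagation(review_sentences, {"good", "bad", "great"}) would return the grown feature and opinion word sets; the seed lexicon and sentence list here are assumed inputs.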
3.2 Part-whole relation

As we discussed above, a part-whole relation is a good indicator for features if the class concept word (the "whole" part) is known. For example, the compound nominal "car hood" contains a part-whole relation. If we know that "car" is the class concept word, then we can infer that "hood" is a feature for car. Part-whole patterns occur frequently in text and are expressed by a variety of lexico-syntactic structures (Girju et al., 2006; Popescu and Etzioni, 2005). There are two types of lexico-syntactic structures conveying part-whole relations: unambiguous structures and ambiguous structures. An unambiguous structure clearly indicates a part-whole relation, for example, the sentences "the camera consists of lens, body and power cord" and "the bed was made of wood". In these cases, the detection of the patterns leads to the discovery of real part-whole relations, and we can easily find features of the camera and the bed. Unfortunately, this kind of pattern is not very frequent in a corpus. However, there are many ambiguous expressions that are explicit but convey part-whole relations only in some contexts. For example, for the two phrases "valley on the mattress" and "toy on the mattress", "valley" is a part of "mattress" whereas "toy" is not. Our idea is to use both the unambiguous and the ambiguous patterns. Although ambiguous patterns may bring some noise, we can rank the resulting candidates low in the ranking procedure. The following two kinds of patterns are what we have utilized for feature extraction.

3.2.1 Phrase pattern

In this case, the part-whole relation exists in a phrase.

NP + Prep + CP: The noun/noun phrase (NP) contains the part word and the class concept phrase (CP) contains the whole word. They are connected by a preposition (Prep). For example, "battery of the camera" is an instance of this pattern, where the NP (battery) is the part noun and the CP (camera) is the whole noun. For our application, we only use three specific prepositions: "of", "in" and "on".

CP + with + NP: CP is the class concept word, and NP is the noun/noun phrase. They are connected by the word "with". Here the NP is likely to be a feature. For example, in the phrase "mattress with a cover", "cover" is a feature for mattress.

NP CP or CP NP: The noun phrase (NP) and the class phrase (CP) form a compound. For example, "mattress pad". Here "pad" is a feature of "mattress".

3.2.2 Sentence pattern

In these patterns, the part-whole relation is indicated in a sentence. The patterns contain specific verbs. The part word and the whole word can be found inside noun phrases or prepositional phrases which contain specific prepositions. We utilize the following pattern in our application.

"CP Verb NP": CP is the class concept phrase that contains the whole word, NP is the noun phrase that contains the part word, and the verb is restricted and specific. For example, from the sentence "the car has a fluid leak", we can infer that "fluid leak" is a feature for "car", which is a class concept. In sentence patterns, verbs play an important role. We use the following verbs to indicate part-of relations in a sentence, i.e., "have", "include", "contain", "consist", "comprise" and so on (Girju et al., 2006).

It is worth mentioning that in order to use part-whole relations, the class concept word for a corpus is needed. It is fairly easy to find, because in our experiments the noun with the most frequent occurrences in a corpus is always the class concept word.
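A minimal sketch of these patterns is given below, assuming the class concept word (CP) is already known. For readability it matches single words next to the patterns with regular expressions and filters a few function words; the actual system works on POS-tagged sentences and matches full noun phrases. Function names and the stop list are illustrative, not from the paper.

    import re

    PART_VERBS = r"(?:has|have|includes?|contains?|consists? of|comprises?)"
    STOP = {"the", "a", "an", "my", "this", "that", "no", "is", "was",
            "has", "have", "with", "and", "or"}

    def part_whole_candidates(sentence, cp):
        """Return candidate feature words extracted from one sentence."""
        s = sentence.lower()
        candidates = set()
        # NP + Prep + CP, with Prep restricted to "of", "in" and "on".
        candidates.update(re.findall(rf"(\w+) (?:of|in|on) (?:the |a |my )?{cp}\b", s))
        # CP + with + NP.
        candidates.update(re.findall(rf"{cp} with (?:the |a |my )?(\w+)", s))
        # Compound: NP CP or CP NP (e.g., "mattress pad").
        candidates.update(re.findall(rf"{cp} (\w+)\b", s))
        candidates.update(re.findall(rf"(\w+) {cp}\b", s))
        # Sentence pattern: CP Verb NP, with a restricted verb list.
        candidates.update(re.findall(rf"{cp} {PART_VERBS} (?:a |an |the )?(\w+)", s))
        return {c for c in candidates if c not in STOP}

    # e.g., part_whole_candidates("There is a valley on my mattress.", "mattress")
    # -> {"valley"}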
3.3 "no" Pattern

Besides opinion words and part-whole relations, the "no" pattern is also an important pattern indicating features in a corpus. Here "no" represents the word "no". The basic form of the pattern is the word "no" followed by a noun/noun phrase. This simple pattern is actually very useful for feature extraction. It is a specific pattern for product reviews and forum posts. People often express their comments or opinions on features with this short pattern. For example, in the mattress domain, people often say "no noise" and "no indentation". Here "noise" and "indentation" are both features of the mattress. We discover that this pattern is frequently used in corpora and is a very good indicator for features, with fairly high precision. But we have to take care of some fixed "no" expressions, like "no problem" and "no offense". In these cases, "problem" and "offense" should not be regarded as features. We have a manually compiled list of such words.
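A sketch of this extractor follows; the tiny stop list stands in for the manually compiled list mentioned above, and the extra entries in it are illustrative guesses rather than the paper's list.

    import re

    NO_STOP_LIST = {"problem", "offense", "doubt", "way"}   # illustrative stand-in

    def no_pattern_candidates(sentence):
        # The word "no" followed by a noun/noun phrase; only the next single
        # word is captured here, whereas the real system matches the
        # POS-tagged noun phrase.
        hits = re.findall(r"\bno (\w+)", sentence.lower())
        return {h for h in hits if h not in NO_STOP_LIST}

    # e.g., no_pattern_candidates("No noise and no indentation so far.")
    # -> {"noise", "indentation"}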
3.4 Bipartite Graph and HITS Algorithm

Hyperlink-induced topic search (HITS) is a link analysis algorithm that rates Web pages. As discussed in the introduction section, we can apply the HITS algorithm to compute feature relevance for ranking.

Before illustrating how HITS can be applied to our scenario, let us first give a brief introduction to HITS. Given a broad search query q, HITS sends the query to a search engine system and collects the k (k = 200 in the original paper) highest ranked pages, which are assumed to be highly relevant to the search query. This set is called the root set R. R is then grown by including any page pointed to by a page in R, which forms the base set S. HITS then works on the pages in S. It assigns every page in S an authority score and a hub score. Let the number of pages to be studied be n. We use G = (V, E) to denote the (directed) link graph of S, where V is the set of pages (or nodes) and E is the set of directed edges (or links). We use L to denote the adjacency matrix of the graph:

    L_{ij} = 1 if (i, j) \in E, and L_{ij} = 0 otherwise    (1)

Let the authority score of page i be A(i), and the hub score of page i be H(i). The mutually reinforcing relationship of the two scores is represented as follows:

    A(i) = \sum_{(j,i) \in E} H(j)    (2)

    H(i) = \sum_{(i,j) \in E} A(j)    (3)

We can write them in matrix form. We use A to denote the column vector of all the authority scores, A = (A(1), A(2), ..., A(n))^T, and H to denote the column vector of all the hub scores, H = (H(1), H(2), ..., H(n))^T. Then

    A = L^T H    (4)

    H = L A    (5)

To solve the problem, the widely used method is power iteration, which starts with some random values for the vectors, e.g., A_0 = H_0 = (1, 1, ..., 1), and then computes iteratively until the algorithm converges.

From the formulas, we can see that the authority score estimates the importance of the content of a page, and the hub score estimates the value of its links to other pages. An authority score is computed as the sum of the scaled hub scores of the pages that point to that page, and a hub score is the sum of the scaled authority scores of the pages it points to. The key idea of HITS is that a good hub points to many good authorities, and a good authority is pointed to by many good hubs. Thus, authorities and hubs have a mutual reinforcement relationship.
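As a concrete illustration of Equations (1)-(5), the following sketch runs power iteration on an adjacency matrix with numpy. The normalization step and convergence tolerance are common choices, not values taken from the paper.

    import numpy as np

    def hits(L, max_iter=100, tol=1e-8):
        # Power iteration for A = L^T H (Eq. 4) and H = L A (Eq. 5),
        # starting from all-ones vectors and normalizing each round.
        n = L.shape[0]
        a = np.ones(n)          # authority scores A
        h = np.ones(n)          # hub scores H
        for _ in range(max_iter):
            a_new = L.T @ h                     # Eq. (4)
            h_new = L @ a_new                   # Eq. (5)
            a_new /= np.linalg.norm(a_new) or 1.0
            h_new /= np.linalg.norm(h_new) or 1.0
            if np.allclose(a_new, a, atol=tol) and np.allclose(h_new, h, atol=tol):
                a, h = a_new, h_new
                break
            a, h = a_new, h_new
        return a, h

    # Example: L[i, j] = 1 if page i links to page j (Eq. 1).
    L = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 0, 0]], dtype=float)
    authority, hub = hits(L)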
For our scenario, we have three strong clues for features in a corpus: opinion words, part-whole patterns, and the "no" pattern. Although these three clues are not hard rules, there exist mutual reinforcement relations between them. If an adjective modifies many features, it is highly likely to be a good opinion word. If a feature candidate is modified by many opinion words, it is likely to be a genuine feature. The same goes for part-whole patterns, the "no" pattern, or the combination of these three clues. This kind of mutual reinforcement relation can be naturally modeled in the HITS framework.

Applying the HITS algorithm: Based on the key idea of the HITS algorithm and the feature indicators, we can apply HITS to obtain the feature relevance ranking. Features act as authorities and feature indicators act as hubs. Different from the general HITS setting, features only have authority scores and feature indicators only have hub scores in our case. They form a directed bipartite graph, which is illustrated in Figure 2. We can run the HITS algorithm on this bipartite graph. The basic idea is that if a feature candidate has a high authority score, it must be a highly relevant feature, and if a feature indicator has a high hub score, it must be a good feature indicator.

[Figure 2 omitted: bipartite graph with feature indicators on one side linked to features on the other]
Fig. 2 Relations between feature indicators and features

3.5 Feature Ranking

Although the HITS algorithm can rank features by feature relevance, the final ranking is not determined by relevance alone. As we discussed before, feature frequency is another important factor affecting the final ranking. It is highly desirable to rank correct and frequent features at the top, because they are more important than the infrequent ones in opinion mining (or even other applications). With this in mind, we put everything together to present the final algorithm that we use. It has two steps:

Step 1: Compute the feature score using HITS without considering frequency. Initially, we use the three feature indicators to populate feature candidates, which form a directed bipartite graph. Each feature candidate acts as an authority node in the graph; each feature indicator acts as a hub node. For a node s in the graph, we let H(s) be its hub score and A(s) be its authority score. Then we initialize H(s) and A(s) to 1 for all nodes in the graph. We update the scores of H and A until they converge using power iteration. Finally, we normalize A and take the authority score as the relevance score S(f) of each candidate feature f.

Step 2: The final score function considering the feature frequency is given in Equation (6):

    Score(f) = S(f) \times \log(freq(f))    (6)

where freq(f) is the frequency count of feature f, and S(f) is the authority score of the candidate feature f. The idea is to push the frequent candidate features up by multiplying by the log of frequency. The log is taken in order to reduce the effect of large frequency counts.
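A minimal sketch of this two-step ranking is given below. It assumes the feature indicators and their extractions have already been collected: "extractions" maps each indicator (an opinion word, a part-whole pattern, or the "no" pattern) to the set of candidates it extracted, and "freq" gives each candidate's corpus frequency. The names, the number of iterations, and the normalization are illustrative choices, not the paper's code.

    import math

    def rank_features(extractions, freq, iters=50):
        features = {f for cands in extractions.values() for f in cands}
        hub = {ind: 1.0 for ind in extractions}      # indicators: hub scores only
        auth = {f: 1.0 for f in features}            # candidates: authority scores only
        for _ in range(iters):
            # Step 1: HITS-style updates on the bipartite graph.
            auth = {f: sum(hub[i] for i, cands in extractions.items() if f in cands)
                    for f in features}
            hub = {i: sum(auth[f] for f in cands)
                   for i, cands in extractions.items()}
            norm_a = math.sqrt(sum(v * v for v in auth.values())) or 1.0
            norm_h = math.sqrt(sum(v * v for v in hub.values())) or 1.0
            auth = {f: v / norm_a for f, v in auth.items()}
            hub = {i: v / norm_h for i, v in hub.items()}
        # Step 2: Equation (6), final score = relevance * log(frequency).
        results = [(auth[f] * math.log(max(freq.get(f, 1), 1)), f) for f in features]
        return sorted(results, reverse=True)

    # Example with made-up extractions:
    # rank_features({"good": {"lens", "battery"}, "no-pattern": {"noise"}},
    #               {"lens": 30, "battery": 12, "noise": 5})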
4 Experiments

This section evaluates the proposed method. We first describe the data sets and the evaluation metrics, and then present the experimental results. We also compare our method with the double propagation method given in (Qiu et al., 2009).

4.1 Data Sets

We used four diverse data sets to evaluate our techniques. They were obtained from a commercial company that provides opinion mining services. Table 1 shows the domains (based on their names) and the number of sentences in each data set ("Sent." means sentences). The data in "Cars" and "Mattress" are product reviews extracted from some online review sites. "Phone" and "LCD" are forum discussion posts extracted from some online forum sites. We split each review/post into sentences, and the sentences are POS-tagged using Brill's tagger (Brill, 1995). The tagged sentences are the input to our system.

    Data Sets    Cars    Mattress    Phone    LCD
    # of Sent.   2223    13233       15168    1783
    Table 1. Experimental data sets

4.2 Evaluation Metrics

Besides precision and recall, we adopt the precision@N metric for experimental evaluation (Liu, 2006). It gives the percentage of correct features among the top N feature candidates in a ranked list. We compare our method's results with those of double propagation, which ranks the extracted candidates only by occurrence frequency.
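For clarity, precision@N as defined above can be computed with a small helper like the following; it is a generic illustration, not the paper's evaluation code.

    def precision_at_n(ranked_candidates, gold_features, n):
        # Fraction of the top N ranked candidates that are correct features.
        top = ranked_candidates[:n]
        if not top:
            return 0.0
        return sum(1 for c in top if c in gold_features) / len(top)

    # e.g., precision_at_n(["lens", "battery", "thing"], {"lens", "battery"}, 3)
    # -> 0.666...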
4.3 Experimental Results

We first compare our results with double propagation on recall and precision for different corpus sizes. The results are presented in Tables 2, 3, and 4 for the four data sets. They show the precision and recall for 1000, 2000, and 3000 sentences from these data sets. We did not try more sentences because manually checking the recall and precision becomes prohibitive. Note that there are fewer than 3000 sentences in the "Cars" and "LCD" data sets; thus, the columns for "Cars" and "LCD" are empty in Table 4. In the tables, "DP" represents the double propagation method, "Ours" represents our proposed method, "Pr" represents precision, and "Re" represents recall.

            Cars          Mattress      Phone         LCD
            Pr    Re      Pr    Re      Pr    Re      Pr    Re
    DP      0.79  0.55    0.79  0.54    0.69  0.23    0.68  0.43
    Ours    0.78  0.56    0.77  0.64    0.68  0.44    0.66  0.55
    Table 2. Results of 1000 sentences

            Cars          Mattress      Phone         LCD
            Pr    Re      Pr    Re      Pr    Re      Pr    Re
    DP      0.70  0.65    0.70  0.58    0.67  0.42    0.64  0.52
    Ours    0.66  0.69    0.70  0.66    0.70  0.50    0.62  0.56
    Table 3. Results of 2000 sentences

            Mattress      Phone
            Pr    Re      Pr    Re
    DP      0.65  0.59    0.64  0.48
    Ours    0.66  0.67    0.62  0.51
    Table 4. Results of 3000 sentences

From the tables, we can see that for corpora in all domains, our method outperforms double propagation on recall with only a small loss in precision. On the "Phone" and "Mattress" data sets, the precisions are even better. We also find that as the data size increases, the recall gap between the two methods gradually becomes smaller and the precisions of both methods drop. However, in this case, feature ranking plays an important role in discovering important features.

The ranking comparison between the two methods is shown in Tables 5, 6, and 7, which give the precision of the top 50, 100 and 200 results respectively. Note that the experiments reported in these tables were run on the whole data sets. There were no results for the "LCD" data beyond the top 200, as only a limited number of features are discussed in the data; so the column for "LCD" in Table 7 is empty. We rank the extracted feature candidates based on frequency for the double propagation method (DP). Using occurrence frequency is the natural way to rank features: the more frequently a feature occurs in a corpus, the more important it is. However, frequency-based ranking assumes that the extracted candidates are correct features. The tables show that our proposed method (Ours) outperforms double propagation considerably. The reason is that some highly frequent feature candidates extracted by double propagation are not correct features. Our method considers feature relevance as an important factor, so it produces much better rankings.

            Cars    Mattress    Phone    LCD
    DP      0.84    0.81        0.64     0.68
    Ours    0.94    0.90        0.76     0.76
    Table 5. Precision at top 50

            Cars    Mattress    Phone    LCD
    DP      0.82    0.80        0.65     0.68
    Ours    0.88    0.85        0.75     0.73
    Table 6. Precision at top 100

            Cars    Mattress    Phone
    DP      0.75    0.71        0.70
    Ours    0.80    0.79        0.76
    Table 7. Precision at top 200

5 Conclusion

This paper proposed a new method to deal with the problems of the state-of-the-art double propagation method for feature extraction. It first uses part-whole and "no" patterns to increase recall. It then ranks the extracted feature candidates by feature importance, which is determined by two factors: feature relevance and feature frequency. The Web page ranking algorithm HITS was applied to compute feature relevance. Experimental results using diverse real-life datasets show promising results. In our future work, apart from improving the current methods, we also plan to study the problem of extracting features that are verbs or verb phrases.

Acknowledgement

This work was funded by a HP Labs Innovation Research Program Award (CW165044).
References

Blair-Goldensohn, Sasha, Kerry Hannan, Ryan McDonald, Tyler Neylon, George A. Reis and Jeff Reyna. 2008. Building a Sentiment Summarizer for Local Service Reviews. In Proceedings of the NLPIX Workshop at WWW 2008.

Brill, Eric. 1995. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, 1995.

Chieu, Hai Leong and Hwee Tou Ng. 2002. Named Entity Recognition: A Maximum Entropy Approach Using Global Information. In Proceedings of the 6th Workshop on Very Large Corpora, 2002.

Ding, Xiaowen, Bing Liu and Philip S. Yu. 2008. A Holistic Lexicon-Based Approach to Opinion Mining. In Proceedings of WSDM 2008.

Girju, Roxana, Adriana Badulescu and Dan Moldovan. 2006. Automatic Discovery of Part-Whole Relations. Computational Linguistics, 32(1):83-135, 2006.

Hu, Minqing and Bing Liu. 2004. Mining and Summarizing Customer Reviews. In Proceedings of KDD 2004.

Kleinberg, Jon. 1999. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46(5):604-632, 1999.

Kobayashi, Nozomi, Kentaro Inui and Yuji Matsumoto. 2007. Extracting Aspect-Evaluation and Aspect-of Relations in Opinion Mining. In Proceedings of EMNLP 2007.

Lafferty, John, Andrew McCallum and Fernando Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML 2001.

Lin, Dekang. 1998. Dependency-Based Evaluation of MINIPAR. In Proceedings of the Workshop on the Evaluation of Parsing Systems at ICLRE 1998.

Liu, Bing. 2006. Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, 2006.

Liu, Bing. 2010. Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, second edition, 2010.

Mei, Qiaozhu, Ling Xu, Matthew Wondra, Hang Su and ChengXiang Zhai. 2007. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. In Proceedings of WWW 2007, pages 171-180.

Pang, Bo and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, pp. 1-135, 2008.

Pantel, Patrick, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu and Vishnu Vyas. 2009. Web-Scale Distributional Similarity and Entity Set Expansion. In Proceedings of EMNLP 2009.

Popescu, Ana-Maria and Oren Etzioni. 2005. Extracting Product Features and Opinions from Reviews. In Proceedings of EMNLP 2005.

Qiu, Guang, Bing Liu, Jiajun Bu and Chun Chen. 2009. Expanding Domain Sentiment Lexicon through Double Propagation. In Proceedings of IJCAI 2009.

Rabiner, Lawrence. 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 1989.

Riloff, Ellen. 1993. Automatically Constructing a Dictionary for Information Extraction Tasks. In Proceedings of AAAI 1993.

Scaffidi, Christopher, Kevin Bierhoff, Eric Chang, Mikhael Felker, Herman Ng and Chun Jin. 2007. Red Opal: Product-Feature Scoring from Reviews. In Proceedings of EC 2007.

Stoyanov, Veselin and Claire Cardie. 2008. Topic Identification for Fine-Grained Opinion Analysis. In Proceedings of COLING 2008.

Su, Qi, Xinying Xu, Honglei Guo, Zhili Guo, Xian Wu, Xiaoxun Zhang, Bin Swen and Zhong Su. 2008. Hidden Sentiment Association in Chinese Web Opinion Mining. In Proceedings of WWW 2008.

Wang, Bo and Houfeng Wang. 2008. Bootstrapping Both Product Features and Opinion Words from Chinese Customer Reviews with Cross-Inducing. In Proceedings of IJCNLP 2008.

Wilson, Theresa, Janyce Wiebe and Paul Hoffmann. 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. In Proceedings of HLT/EMNLP 2005.

Wong, Tak-Lam, Wai Lam and Tik-Sun Wong. 2008. An Unsupervised Framework for Extracting and Normalizing Product Attributes from Multiple Web Sites. In Proceedings of SIGIR 2008.

Zhuang, Li, Feng Jing and Xiao-yan Zhu. 2006. Movie Review Mining and Summarization. In Proceedings of CIKM 2006.
