be discussed in a subsequent paper, as it is quite involved.

A question that one may ask is "why not ask the merchant or the manufacturer of the product to provide a list of features?" This is a possible approach. However, it has a number of problems: (1) It is hard for a merchant to provide the features because he/she may sell a large number of products. (2) The words used by merchants or the manufacturer may not be the same as those used by common users of the product, although they may refer to the same features. This causes problems in identifying what the customers are interested in. Furthermore, customers may comment on the lack of certain features of the product. (3) Customers may comment on some features that the manufacturer has never thought about, i.e., unexpected features. (4) The manufacturer may not want users of its product to know certain weak features.

This paper proposes a number of techniques based on data mining and natural language processing methods to mine opinion/product features. Our experimental results show that these techniques are highly effective.

Related Work

Our work is mainly related to two areas of research: text summarization and terminology identification.

The majority of text summarization techniques fall into two categories: template instantiation and text extraction. Work in the former framework includes (DeJong 1982), (Tait 1983), and (Radev and McKeown 1998). These systems focus on the identification and extraction of certain core entities and facts in a document, which are packaged into a template. This framework requires background analysis to instantiate a template to a suitable level of detail; it is thus not domain independent (Sparck-Jones 1993a, 1993b). Our technique does not fill in any template and is domain independent.

The text extraction framework (Paice 1990; Kupiec, Pedersen, and Chen 1995; Hovy and Lin 1997) identifies some representative sentences to summarize the document. Over the years, many sophisticated techniques have been developed, e.g., strong notions of topicality (Hovy and Lin 1997), lexical chains (Barzilay and Elhadad 1997), and discourse structures (Marcu 1997). Our work is different, as we do not extract the most representative sentences, but only identify and extract the specific product features and the opinions related to them.

Kan and McKeown (1999) propose a hybrid approach that merges template instantiation with sentence extraction. (Boguraev and Kennedy 1997) also reports a technique that finds a few very prominent expressions, objects, or events in a document and uses them to help summarize it. Again, our work is different, as we need to find all product features in a set of customer reviews, regardless of whether they are prominent or not.

Most existing work on text summarization focuses on a single document. Some researchers have also studied summarization of multiple documents covering similar information. Their main purpose is to summarize the similarities and differences in the information contents of these documents (Mani and Bloedorn 1997). Clearly, our work is related but different.

In terminology identification, there are basically two techniques for discovering terms in corpora: symbolic approaches that rely on a syntactic description of terms, namely noun phrases, and statistical approaches that exploit the fact that the words composing a term tend to be found close to each other and to reoccur (Jacquemin and Bourigault 2001; Justeson and Katz 1995; Daille 1996; Church and Hanks 1990). However, using noun phrases tends to produce too many non-terms, while using reoccurring phrases misses many low-frequency terms, terms with variations, and terms with only one word. Our association-mining-based technique does not have these problems, and we can also find infrequent features by exploiting the fact that we are only interested in features on which the users have expressed opinions.

Our feature-based opinion summarization system is also related to (Dave, Lawrence and Pennock 2003), in which a semantic classifier of product review sentences is built using a training corpus. However, their system does not mine product features. In addition, our work does not need a training corpus to build a summary.

The Proposed Techniques

Figure 1 gives an architectural overview of our opinion summarization system. The system performs the summarization in two main steps: feature extraction and opinion orientation identification. The inputs to the system are a product name and an entry page for all the reviews of the product. The output is a summary of the reviews like the one shown in the introduction section.

Given the inputs, the system first downloads (or crawls) all the reviews and puts them in the review database. The feature extraction function, which is the focus of this paper, first extracts "hot" features on which many people have expressed opinions in their reviews, and then finds the infrequent ones. The opinion orientation identification function takes the generated features and summarizes the opinions on each feature into two categories: positive and negative. In Figure 1, POS tagging is part-of-speech tagging (Manning and Schütze 1999) from natural language processing. Below, we discuss each of the functions in feature extraction in turn. We will not discuss the final step, opinion orientation identification, in detail, as it is not the focus of this paper and it is complex and involved (it will be described in a subsequent paper). Nevertheless, a brief introduction to the step is given in a later section.

[Figure 1: The opinion summarization system. Flow: Crawl reviews → Review database → POS tagging → Frequent feature generation → Feature pruning → Feature set → Opinion word extraction → Opinion words → Infrequent feature identification (these steps form Feature extraction); the features and opinion words then feed Opinion orientation identification, which produces the Summary.]
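To make the flow in Figure 1 easier to follow, the skeleton below restates it as code. This is only an illustrative sketch: every name is our own placeholder for a stage described in the rest of this section, not the authors' implementation, and several of the stubs are fleshed out where the corresponding step is discussed.

    # Illustrative skeleton of the Figure 1 pipeline; all names are
    # hypothetical placeholders, not the authors' code.
    def crawl_reviews(entry_page): ...              # download all reviews
    def pos_tag(reviews): ...                       # POS tagging (NLProcessor)
    def mine_frequent_features(sentences): ...      # association rule mining
    def prune_compactness(features, sentences): ...
    def prune_p_support(features, sentences): ...
    def extract_opinion_words(features, sentences): ...
    def find_infrequent_features(words, sentences): ...
    def identify_orientation(features, words, sentences): ...

    def summarize(product_name, entry_page):
        sentences = pos_tag(crawl_reviews(entry_page))
        frequent = mine_frequent_features(sentences)
        frequent = prune_compactness(frequent, sentences)
        frequent = prune_p_support(frequent, sentences)
        opinion_words = extract_opinion_words(frequent, sentences)
        features = frequent + find_infrequent_features(opinion_words, sentences)
        return identify_orientation(features, opinion_words, sentences)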
Part-of-Speech Tagging (POS)

Before discussing the application of part-of-speech tagging from natural language processing, we first give some example sentences from reviews to describe the kinds of opinions that we will handle.
Our system aims to find what people like and dislike about a given product, so finding the product features that people talk about is an important step. However, due to the difficulty of natural language understanding, some types of sentences are hard to deal with. Let us look at some easy and hard sentences from the reviews of a digital camera:

"The pictures are very clear."

"Overall a fantastic very compact camera."

In the first sentence, the user is satisfied with the picture quality of the camera; picture is the feature that the user talks about. Similarly, the second sentence shows that camera is the feature on which the user expresses his/her opinion. While the features of these two sentences are explicitly mentioned, some features are implicit and hard to find. For example,

"While light, it will not easily fit in pockets."

This customer is talking about the size of the camera, but the word "size" is not explicitly mentioned in the sentence. To find such implicit features, semantic understanding is needed, which requires more sophisticated techniques. However, implicit features occur much less frequently than explicit ones. Thus, in this paper, we focus on finding features that appear explicitly as nouns or noun phrases in the reviews. To identify nouns/noun phrases from the reviews, we use part-of-speech tagging.

In this work, we use the NLProcessor linguistic parser (NLProcessor 2000), which parses each sentence, yields the part-of-speech tag of each word (whether the word is a noun, verb, adjective, etc.), and identifies simple noun and verb groups (syntactic chunking). The following shows a sentence with the POS tags:

<S> <NG> <W C='PRP' L='SS' T='w' S='Y'> I </W> </NG> <VG> <W C='VBP'> am </W> <W C='RB'> absolutely </W> </VG> <W C='IN'> in </W> <NG> <W C='NN'> awe </W> </NG> <W C='IN'> of </W> <NG> <W C='DT'> this </W> <W C='NN'> camera </W> </NG> <W C='.'> . </W> </S>

The NLProcessor system generates XML output. For instance, <W C='NN'> indicates a noun and <NG> indicates a noun group/noun phrase. Each sentence is saved in the review database along with the POS tag information of each word in the sentence.

A transaction file is then created for the generation of frequent features in the next step. In this file, each line contains the words from one sentence, including only the pre-processed nouns/noun phrases of the sentence. The reason is that other components of a sentence are unlikely to be product features. Here, pre-processing includes the deletion of stopwords, stemming, and fuzzy matching. Fuzzy matching (Jokinen and Ukkonen 1991) is used to deal with word variants or misspellings. For example, "autofocus" and "auto-focus" actually refer to the same feature, so all occurrences of "autofocus" are replaced with "auto-focus".
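The paper does not detail the fuzzy matching procedure beyond the citation. As a rough illustration only, the sketch below groups close word variants using Python's standard difflib; the function and its similarity cutoff are our assumptions, not the authors' method.

    import difflib

    def canonicalize(words, cutoff=0.8):
        """Map word variants/misspellings to one canonical form, so that
        e.g. 'autofocus' and 'auto-focus' count as the same feature."""
        canon, mapping = [], {}
        for w in words:
            key = w.replace("-", "")                 # ignore hyphenation
            hit = difflib.get_close_matches(key, canon, n=1, cutoff=cutoff)
            if hit:
                mapping[w] = hit[0]                  # reuse an earlier form
            else:
                canon.append(key)
                mapping[w] = key
        return mapping

    print(canonicalize(["auto-focus", "autofocus", "battery"]))
    # {'auto-focus': 'autofocus', 'autofocus': 'autofocus', 'battery': 'battery'}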
Frequent Features Generation

This step finds the features that people are most interested in. To do this, we use association rule mining (Agrawal and Srikant 1994) to find all frequent itemsets. In our context, an itemset is a set of words or a phrase that occurs together.

Association rule mining is stated as follows. Let I = {i1, ..., in} be a set of items, and D be a set of transactions (the dataset). Each transaction consists of a subset of items in I. An association rule is an implication of the form X→Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The rule X→Y holds in D with confidence c if c% of the transactions in D that support X also support Y. The rule has support s in D if s% of the transactions in D contain X ∪ Y. The problem of mining association rules is to generate all association rules in D that have support and confidence greater than a user-specified minimum support and minimum confidence.

Mining frequently occurring phrases: Each piece of information extracted above is stored in a dataset called a transaction set/file. We then run the association rule miner CBA (Liu, Hsu and Ma 1998), which is based on the Apriori algorithm in (Agrawal and Srikant 1994), to find all frequent itemsets in the transaction set. Each resulting frequent itemset is a possible feature. In our work, we define an itemset as frequent if it appears in more than 1% (the minimum support) of the review sentences.

The Apriori algorithm works in two steps. In the first step, it finds all frequent itemsets from a set of transactions that satisfy a user-specified minimum support. In the second step, it generates rules from the discovered frequent itemsets. For our task, we only need the first step, i.e., finding frequent itemsets, which are candidate features. In addition, we only need to find frequent itemsets with three words or fewer in this work, as we believe that a product feature contains no more than three words (this restriction can be easily relaxed). The generated frequent itemsets, which are also called candidate frequent features in this paper, are stored in the feature set for further processing.
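The paper runs the CBA miner for this step. As a self-contained stand-in, the following sketch performs a plain level-wise (Apriori-style) search for word sets of up to three words that occur in at least 1% of the sentence transactions. Representing each transaction as a set of pre-processed nouns is an assumption made for illustration, and the full Apriori subset-pruning test is omitted for brevity; the join step still only combines already-frequent smaller sets, which is the core Apriori idea.

    from collections import Counter

    def frequent_itemsets(transactions, min_support=0.01, max_size=3):
        """Level-wise search for frequent word sets (Apriori, step 1).
        transactions: one set of pre-processed nouns per sentence.
        Returns {frozenset of words: support count}."""
        min_count = max(1, int(min_support * len(transactions)))
        counts = Counter(w for t in transactions for w in t)
        current = {frozenset([w]) for w, c in counts.items() if c >= min_count}
        result = {s: counts[next(iter(s))] for s in current}
        k = 2
        while current and k <= max_size:
            # join frequent (k-1)-sets to form k-word candidates
            candidates = {a | b for a in current for b in current
                          if len(a | b) == k}
            counts = Counter()
            for t in transactions:
                counts.update(c for c in candidates if c <= t)
            current = {c for c in candidates if counts[c] >= min_count}
            result.update({c: counts[c] for c in current})
            k += 1
        return result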
Feature Pruning

Not all frequent features generated by association mining are useful or genuine features; there are also some uninteresting and redundant ones. Feature pruning aims to remove these incorrect features. We present two types of pruning below.

Compactness pruning: This method checks features that contain at least two words, which we call feature phrases, and removes those that are likely to be meaningless. In association mining, the algorithm does not consider the position of an item (or word) in a transaction (or a sentence). However, in a natural language sentence, words that appear together and in a specific order are more likely to be meaningful phrases. Therefore, some of the frequent feature phrases generated by association mining may not be genuine features. The idea of compactness pruning is to prune those candidate features whose words do not appear together. We use the distances among the words in a candidate feature phrase (itemset) to do the pruning.

Definition 1: compact phrase

• Let f be a frequent feature phrase containing n words. Assume that a sentence s contains f and that the sequence of the words of f as they appear in s is w1, w2, ..., wn. If the word distance in s between any two adjacent words (wi and wi+1) in this sequence is no greater than 3, then we say f is compact in s.

• If f occurs in m sentences in the review database and is compact in at least 2 of the m sentences, then we call f a compact feature phrase.

For example, suppose we have the frequent feature phrase "digital camera" and three sentences from the review database that contain the phrase:

"I had searched for a digital camera for 3 months."

"This is the best digital camera on the market"

"The camera does not have a digital zoom"

The phrase digital camera is compact in the first two sentences but not in the last one. It is nevertheless a compact feature phrase, as it appears compactly two times. For a feature phrase and a sentence that contains the phrase, we look at the position information of every word of the phrase and check whether the phrase is compact in the sentence. If we cannot find two compact sentences in the review database, we prune the feature phrase.
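A minimal sketch of the compactness test, assuming each sentence is a list of words and taking the difference of word positions as the word distance; for simplicity it uses only the first occurrence of each phrase word, so sentences with repeated words are handled crudely.

    def is_compact(phrase, sentence, max_gap=3):
        """Definition 1: adjacent phrase words at most max_gap apart in s."""
        if not all(w in sentence for w in phrase):
            return False                      # sentence does not contain f
        pos = sorted(sentence.index(w) for w in phrase)
        return all(b - a <= max_gap for a, b in zip(pos, pos[1:]))

    def compactness_prune(feature_phrases, sentences, min_compact=2):
        """Keep a phrase only if it is compact in >= min_compact sentences."""
        return [p for p in feature_phrases
                if sum(is_compact(p, s) for s in sentences) >= min_compact]

    s1 = "i had searched for a digital camera for 3 months".split()
    s2 = "the camera does not have a digital zoom".split()
    print(is_compact(["digital", "camera"], s1))   # True: adjacent words
    print(is_compact(["digital", "camera"], s2))   # False: gap of 5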
Redundancy pruning: In this step, we focus on removing redundant features that contain single words. To describe redundant features, we have the following definition:

Definition 2: p-support (pure support)

The p-support of a feature ftr is the number of sentences in which ftr appears as a noun or noun phrase and which contain no feature phrase that is a superset of ftr.

p-support is different from the general support in association mining. For example, suppose we have the feature manual with a support of 10 sentences, and that it is a subset of the feature phrases manual mode and manual setting in the review database. Suppose further that the supports of the two feature phrases are 4 and 3 respectively, that the two phrases do not appear together in any sentence, and that all the features appear as nouns/noun phrases. Then the p-support of manual is 3. Recall that we require a feature to appear as a noun or noun phrase, as we do not want to find adjectives or adverbs as features.

We use a minimum p-support to prune redundant features. If a feature has a p-support lower than the minimum p-support (in our system, we set it to 3) and the feature is a subset of another feature phrase (which suggests that the feature alone may not be interesting), it is pruned. For instance, life by itself is not a useful feature, while battery life is a meaningful feature phrase. In the previous example, manual, which has a p-support of 3, is not pruned. This is reasonable considering that manual has two senses: a noun meaning "references" and an adjective meaning "of or relating to hands". Thus all three features, manual, manual mode, and manual setting, could be interesting.
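The sketch below implements Definition 2 and the pruning rule, treating features as word sets and each sentence as the set of nouns/noun phrases it contains; approximating phrase containment by the subset test is an assumption made for brevity. On the manual example above, with supports of 10, 4, and 3 and no co-occurrence, manual keeps a p-support of 10 - 4 - 3 = 3 and survives the threshold.

    def p_support(ftr, features, sentences):
        """Sentences containing ftr but no candidate feature that is a
        strict superset of ftr (Definition 2)."""
        supersets = [g for g in features if ftr < g]
        return sum(1 for s in sentences
                   if ftr <= s and not any(g <= s for g in supersets))

    def redundancy_prune(features, sentences, min_p_support=3):
        """Drop a feature only if it is a subset of another feature
        phrase AND its p-support falls below the minimum."""
        return [f for f in features
                if not (any(f < g for g in features)
                        and p_support(f, features, sentences) < min_p_support)]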
Opinion Words Extraction

Opinion words are words that people use to express a positive or negative opinion. Observing that people often express their opinions of a product feature using opinion words located around the feature in the sentence, we can extract opinion words from the review database using all the remaining frequent features (after pruning). For instance, let us look at the following two sentences:

"The strap is horrible and gets in the way of parts of the camera you need access to."

"After nearly 800 pictures I have found that this camera takes incredible pictures."

In the first sentence, the feature strap is near the opinion word horrible; in the second, the feature picture is close to the opinion word incredible. Following this observation, we can extract opinion words in the following way:

• For each sentence in the review database, if it contains any frequent feature, extract the nearby adjective. If such an adjective is found, it is considered an opinion word. A nearby adjective refers to the adjacent adjective that modifies the noun/noun phrase that is a frequent feature.

As shown in the previous example, horrible is the adjective that modifies strap, and incredible is the adjective that modifies picture. We use stemming and fuzzy matching to take care of word variants and misspellings. In this way, we build up an opinion word list, which is used below.

Infrequent Feature Identification

Frequent features are the "hot" features that people are most interested in for a given product. However, there are some features that only a small number of people talk about, and these features can also be interesting to some potential customers. The question is how to extract these infrequent features. Consider the following sentences:

"Red eye is very easy to correct."

"The camera comes with an excellent easy to install software"

"The pictures are absolutely amazing"

"The software that comes with it is amazing"

Sentences 1 and 2 share the same opinion word easy yet describe different features: sentence 1 is about red eye, while sentence 2 is about the software. Assume that software is a frequent feature in our digital camera review database; red eye is infrequent but also interesting. Similarly, amazing appears in both sentences 3 and 4, but sentence 3 is about picture while sentence 4 is about the software. From these examples, we see that people use the same adjectives to describe different subjects. Therefore, we can use the opinion words to look for features that cannot be found in the frequent feature generation step.

In the opinion word extraction step, we use frequent features to find the adjacent opinion words that modify the features. In this step, we use the known opinion words to find the nearby features that the opinion words modify. In both steps, we utilize the observation that opinions tend to appear closely together with features. We extract infrequent features using the following procedure:

• For each sentence in the review database, if it contains no frequent feature but one or more opinion words, find the nearest noun/noun phrase to the opinion word. The noun/noun phrase is then stored in the feature set as an infrequent feature.

We use the nearest noun/noun phrase as the noun/noun phrase that the opinion word modifies because that is what happens most of the time. Finding the exact noun/noun phrase that an opinion word modifies requires natural language understanding, which is difficult with only POS tags, so we use this simple heuristic method to find the nearest noun/noun phrase instead. It works quite well.
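The two extraction steps mirror each other, so the following sketch implements both over POS-tagged sentences, represented as lists of (word, tag) pairs. Treating features and opinion targets as single words, and "nearby" as "closest by position", are our simplifying assumptions; the paper's notion of the adjacent modifying adjective is stricter.

    def nearest_with_tag(sent, i, prefix):
        """Word nearest to position i whose POS tag starts with prefix."""
        best, best_d = None, None
        for j, (w, t) in enumerate(sent):
            if j != i and t.startswith(prefix):
                if best is None or abs(j - i) < best_d:
                    best, best_d = w, abs(j - i)
        return best

    def opinion_words(tagged_sentences, frequent_features):
        """Adjectives ('JJ') found near frequent features."""
        found = set()
        for sent in tagged_sentences:
            for i, (w, _) in enumerate(sent):
                if w in frequent_features:
                    adj = nearest_with_tag(sent, i, "JJ")
                    if adj:
                        found.add(adj)
        return found

    def infrequent_features(tagged_sentences, frequent_features, opinions):
        """Nearest noun ('NN') to an opinion word, in sentences that
        contain no frequent feature."""
        found = set()
        for sent in tagged_sentences:
            if any(w in frequent_features for w, _ in sent):
                continue
            for i, (w, _) in enumerate(sent):
                if w in opinions:
                    noun = nearest_with_tag(sent, i, "NN")
                    if noun:
                        found.add(noun)
        return found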
A problem with infrequent feature identification using opinion words is that it can also find nouns/noun phrases that are irrelevant to the given product. The reason is that people can use common adjectives to describe many subjects, including both the interesting features that we want and irrelevant subjects. Consider the following:

"The salesman was easy going and let me try all the models on display."

salesman is not a relevant feature of a product, but it will be found as an infrequent feature because of the nearby opinion word easy. This, however, is not a serious problem, since the number of infrequent features is small compared with the number of frequent features: they account for around 15-20% of the total number of features in our experimental results. Infrequent features are generated for completeness. Moreover, frequent features are more important than infrequent ones, and since we rank features according to their p-supports, the wrong infrequent features will be ranked very low and thus will not affect most users.

Opinion Sentence Orientation Identification: After opinion features have been identified, we determine the semantic orientation (i.e., positive or negative) of each opinion sentence. This consists of two steps: (1) for each opinion word in the opinion word list, we identify its semantic orientation using a bootstrapping technique and WordNet (Miller et al. 1990), and (2) we then decide the opinion orientation of each sentence based on the dominant orientation of the opinion words in the sentence. The details are presented in a subsequent paper.
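The orientation step is only summarized here (the details are in the authors' follow-up paper), but the second half, the dominant-orientation decision, can be sketched directly. The seed lexicon below is a made-up example; in the paper, word orientations are grown from seed adjectives by bootstrapping over WordNet.

    def sentence_orientation(words, orientation):
        """Dominant orientation of the opinion words in a sentence;
        orientation maps word -> +1 (positive) / -1 (negative)."""
        score = sum(orientation.get(w, 0) for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "undecided"   # tie or no known opinion words

    lexicon = {"amazing": 1, "incredible": 1, "horrible": -1}  # toy seeds
    print(sentence_orientation("the strap is horrible".split(), lexicon))
    # -> negative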
Experiments

We have conducted experiments on the customer reviews of five electronics products: two digital cameras, one DVD player, one MP3 player, and one cellular phone. The two websites from which we collected the reviews are Amazon.com and C|net.com. Products on these sites have a large number of reviews. Each review includes a text review and a title. Additional information is available but not used in this project, including date, time, author name and location (for Amazon reviews), and ratings.

Table 1: Recall and precision at each step of the system
Product name     No. of manual  Frequent features      Compactness        P-support          Infrequent feature
                 features       (association mining)   pruning            pruning            identification
                                Recall   Precision     Recall  Precision  Recall  Precision  Recall  Precision
Digital camera1  79             0.671    0.552         0.658   0.634      0.658   0.825      0.822   0.747
Digital camera2  96             0.594    0.594         0.594   0.679      0.594   0.781      0.792   0.710
Cellular phone   67             0.731    0.563         0.716   0.676      0.716   0.828      0.761   0.718
Mp3 player       57             0.652    0.573         0.652   0.683      0.652   0.754      0.818   0.692
DVD player       49             0.754    0.531         0.754   0.634      0.754   0.765      0.797   0.743
Average          69             0.68     0.56          0.67    0.66       0.67    0.79       0.80    0.72
For each product, we first crawled and downloaded the first 100 reviews. These review documents were then cleaned to remove HTML tags. After that, NLProcessor was used to generate part-of-speech tags, and our system was then applied to perform feature extraction.

To evaluate the discovered features, a human tagger manually read all the reviews and produced a manual feature list for each product. The features are mostly explicit in opinion sentences, e.g., pictures in "the pictures are absolutely amazing". Implicit features, such as size in "it fits in a pocket nicely", are also easy for the human tagger to identify. The column "No. of manual features" in Table 1 shows the number of manual features for each product.

Table 1 gives all the precision and recall results. We evaluated the results at each step of our algorithm. In the table, column 1 lists each product. Columns 3 and 4 give the recall and precision of frequent feature generation for each product, which uses association mining. The results indicate that the frequent features contain many errors; using this step alone gives poor results, i.e., low precision. Columns 5 and 6 show the corresponding results after compactness pruning is performed. We can see that the precision is improved significantly by this pruning, while the recall stays steady. Columns 7 and 8 give the results after pruning using p-support: there is another dramatic improvement in precision, and the recall level shows almost no change. The results in Columns 4-8 demonstrate clearly the effectiveness of these two pruning techniques. Columns 9 and 10 give the results after infrequent feature identification is done. The recall is improved dramatically, while the precision drops a few percent on average. However, this is not a major problem, because the infrequent features are ranked rather low and thus will not affect most users.

In summary, with an average recall of 80% and an average precision of 72%, we believe that our techniques are quite promising and can be used in practical settings.
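For completeness, the Table 1 scores can be computed from an extracted feature list and the human tagger's list as follows; exact string matching is our assumption here, as the paper does not state its matching criterion.

    def precision_recall(extracted, manual):
        """Precision/recall of extracted features against the manual list."""
        extracted, manual = set(extracted), set(manual)
        tp = len(extracted & manual)                 # correctly found features
        precision = tp / len(extracted) if extracted else 0.0
        recall = tp / len(manual) if manual else 0.0
        return precision, recall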
Conclusion

In this paper, we proposed a number of techniques for mining opinion features from product reviews based on data mining and natural language processing methods. The objective is to produce a feature-based summary of a large number of customer reviews of a product sold online. We believe that this problem will become increasingly important as more people buy and express their opinions on the Web. Our experimental results indicate that the proposed techniques are effective in performing their tasks. In our future work, we plan to further improve these techniques. We also plan to group features according to the strength of the opinions that have been expressed on them, e.g., to determine which features customers strongly like and dislike. This will further improve the feature extraction and the subsequent summarization.

Acknowledgements. This work is supported by the National Science Foundation under grant IIS-0307239.

References

Agrawal, R., and Srikant, R. 1994. Fast Algorithms for Mining Association Rules. VLDB-1994.

Barzilay, R., and Elhadad, M. 1997. Using Lexical Chains for Text Summarization. ACL Workshop on Intelligent, Scalable Text Summarization.

Boguraev, B., and Kennedy, C. 1997. Salience-based Content Characterization of Text Documents. ACL Workshop on Intelligent, Scalable Text Summarization.

Church, K., and Hanks, P. 1990. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics 16(1):22-29.

Daille, B. 1996. Study and Implementation of Combined Techniques for Automatic Extraction of Terminology. In The Balancing Act: Combining Symbolic and Statistical Approaches to Language Processing. MIT Press, Cambridge.

Dave, K., Lawrence, S., and Pennock, D. 2003. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. WWW-2003.

DeJong, G. 1982. An Overview of the FRUMP System. In Strategies for Natural Language Parsing, 149-176.

Hovy, E., and Lin, C.-Y. 1997. Automated Text Summarization in SUMMARIST. ACL Workshop on Intelligent, Scalable Text Summarization.

Jacquemin, C., and Bourigault, D. 2001. Term Extraction and Automatic Indexing. In R. Mitkov, ed., Handbook of Computational Linguistics. Oxford University Press.

Jokinen, P., and Ukkonen, E. 1991. Two Algorithms for Approximate String Matching in Static Texts. In A. Tarlecki, ed., Mathematical Foundations of Computer Science.

Justeson, J. S., and Katz, S. M. 1995. Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text. Natural Language Engineering 1(1):9-27.

Kan, M.-Y., and McKeown, K. 1999. Information Extraction and Summarization: Domain Independence through Focus Types. Columbia University Technical Report CUCS-030-99.

Kupiec, J., Pedersen, J., and Chen, F. 1995. A Trainable Document Summarizer. SIGIR-1995.

Liu, B., Hsu, W., and Ma, Y. 1998. Integrating Classification and Association Rule Mining. KDD-1998.

Mani, I., and Bloedorn, E. 1997. Multi-document Summarization by Graph Search and Matching. AAAI-1997.

Manning, C., and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

Marcu, D. 1997. From Discourse Structures to Text Summaries. ACL Workshop on Intelligent, Scalable Text Summarization.

Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. 1990. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography 3(4):235-312.

NLProcessor – Text Analysis Toolkit. 2000. http://www.infogistics.com/textanalysis.html

Paice, C. D. 1990. Constructing Literature Abstracts by Computer: Techniques and Prospects. Information Processing and Management 26:171-186.

Radev, D., and McKeown, K. 1998. Generating Natural Language Summaries from Multiple On-line Sources. Computational Linguistics 24(3):469-500.

Sparck-Jones, K. 1993a. Discourse Modeling for Automatic Text Summarizing. Technical Report 290, University of Cambridge.

Sparck-Jones, K. 1993b. What Might Be in a Summary? Information Retrieval 93:9-26.

Tait, J. 1983. Automatic Summarizing of English Texts. Ph.D. dissertation, University of Cambridge.