be discussed in a subsequent paper, as it is quite involved.

A question that one may ask is "why not ask the merchant or the manufacturer of the product to provide a list of features?" This is a possible approach. However, it has a number of problems: (1) It is hard for a merchant to provide the features because he/she may sell a large number of products. (2) The words used by merchants or the manufacturer may not be the same as those used by common users of the product, although they may refer to the same features. This causes problems in identifying what the customers are interested in. Furthermore, customers may comment on the lack of certain features of the product. (3) Customers may comment on some features that the manufacturer has never thought about, i.e., unexpected features. (4) The manufacturer may not want users of its product to know certain weak features.

This paper proposes a number of techniques based on data mining and natural language processing methods to mine opinion/product features. Our experimental results show that these techniques are highly effective.

Related Work

Our work is mainly related to two areas of research: text summarization and terminology identification.

The majority of text summarization techniques fall into two categories: template instantiation and text extraction. Work in the former framework includes (DeJong 1982), (Tait 1983), and (Radev and McKeown 1998). These systems focus on the identification and extraction of certain core entities and facts in a document, which are packaged into a template. This framework requires background analysis to instantiate a template to a suitable level of detail; it is thus not domain independent (Sparck-Jones 1993a, 1993b). Our technique does not fill in any template and is domain independent.

The text extraction framework (Paice 1990; Kupiec, Pedersen, and Chen 1995; Hovy and Lin 1997) identifies some representative sentences to summarize the document. Over the years, many sophisticated techniques have been developed, e.g., strong notions of topicality (Hovy and Lin 1997), lexical chains (Barzilay and Elhadad 1997), and discourse structures (Marcu 1997). Our work is different, as we do not extract the most representative sentences, but only identify and extract the specific product features and the opinions related to them.

Kan and McKeown (1999) propose a hybrid approach that merges template instantiation with sentence extraction. (Boguraev and Kennedy 1997) also reports a technique that finds a few very prominent expressions, objects, or events in a document and uses them to help summarize it. Again, our work is different, as we need to find all product features in a set of customer reviews, regardless of whether they are prominent or not.

Most existing work on text summarization focuses on a single document. Some researchers have also studied summarization of multiple documents covering similar information. Their main purpose is to summarize the similarities and differences in the information contents of these documents (Mani and Bloedorn 1997). Clearly, our work is related but different.

In terminology identification, there are basically two techniques for discovering terms in corpora: symbolic approaches that rely on a syntactic description of terms, namely noun phrases, and statistical approaches that exploit the fact that the words composing a term tend to be found close to each other and to reoccur (Jacquemin and Bourigault 2001; Justeson and Katz 1995; Daille 1996; Church and Hanks 1990). However, using noun phrases tends to produce too many non-terms, while using reoccurring phrases misses many low-frequency terms, terms with variations, and terms with only one word. Our association-mining-based technique does not have these problems, and we can also find infrequent features by exploiting the fact that we are only interested in features on which the users have expressed opinions.

Our feature-based opinion summarization system is also related to (Dave, Lawrence and Pennock 2003), in which a semantic classifier of product review sentences is built using a training corpus. However, their system does not mine product features. In addition, our work does not need a training corpus to build a summary.

The Proposed Techniques

Figure 1 gives an architectural overview of our opinion summarization system. The system performs the summarization in two main steps: feature extraction and opinion orientation identification. The inputs to the system are a product name and an entry page for all the reviews of the product. The output is a summary of the reviews like the one shown in the introduction section.

Given the inputs, the system first downloads (or crawls) all the reviews and puts them in the review database. The feature extraction function, which is the focus of this paper, first extracts "hot" features on which many people have expressed opinions in their reviews, and then finds the infrequent ones. The opinion orientation identification function takes the generated features and summarizes the opinions on each feature into two categories: positive and negative. In Figure 1, POS tagging is part-of-speech tagging (Manning and Schütze 1999) from natural language processing. Below, we discuss each of the functions in feature extraction in turn. We will not discuss the final step, opinion orientation identification, in detail, as it is not the focus of this paper and it is complex and involved (it will be described in a subsequent paper). Nevertheless, a brief introduction to the step is given in a later section.

[Figure 1: The opinion summarization system. Flow: Crawl reviews → Review database → POS tagging → Frequent feature generation → Feature pruning → Feature set → Opinion word extraction → Opinion words → Infrequent feature identification (these steps form Feature extraction); the features and opinion words then feed Opinion orientation identification, which produces the Summary.]
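To make the flow in Figure 1 easier to follow, the skeleton below restates it as code. This is only an illustrative sketch: every name is our own placeholder for a stage described in the rest of this section, not the authors' implementation, and several of the stubs are fleshed out where the corresponding step is discussed.

    # Illustrative skeleton of the Figure 1 pipeline; all names are
    # hypothetical placeholders, not the authors' code.
    def crawl_reviews(entry_page): ...              # download all reviews
    def pos_tag(reviews): ...                       # POS tagging (NLProcessor)
    def mine_frequent_features(sentences): ...      # association rule mining
    def prune_compactness(features, sentences): ...
    def prune_p_support(features, sentences): ...
    def extract_opinion_words(features, sentences): ...
    def find_infrequent_features(words, sentences): ...
    def identify_orientation(features, words, sentences): ...

    def summarize(product_name, entry_page):
        sentences = pos_tag(crawl_reviews(entry_page))
        frequent = mine_frequent_features(sentences)
        frequent = prune_compactness(frequent, sentences)
        frequent = prune_p_support(frequent, sentences)
        opinion_words = extract_opinion_words(frequent, sentences)
        features = frequent + find_infrequent_features(opinion_words, sentences)
        return identify_orientation(features, opinion_words, sentences)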
Part-of-Speech Tagging (POS)

Before discussing the application of part-of-speech tagging from natural language processing, we first give some example sentences from reviews to describe the kinds of opinions that we will handle.
Our system aims to find what people like and dislike about a given product, so finding the product features that people talk about is an important step. However, due to the difficulty of natural language understanding, some types of sentences are hard to deal with. Let us look at some easy and hard sentences from the reviews of a digital camera:

"The pictures are very clear."

"Overall a fantastic very compact camera."

In the first sentence, the user is satisfied with the picture quality of the camera; picture is the feature that the user talks about. Similarly, the second sentence shows that camera is the feature on which the user expresses his/her opinion. While the features of these two sentences are explicitly mentioned, some features are implicit and hard to find. For example,

"While light, it will not easily fit in pockets."

This customer is talking about the size of the camera, but the word "size" is not explicitly mentioned in the sentence. To find such implicit features, semantic understanding is needed, which requires more sophisticated techniques. However, implicit features occur much less frequently than explicit ones. Thus, in this paper, we focus on finding features that appear explicitly as nouns or noun phrases in the reviews. To identify nouns/noun phrases from the reviews, we use part-of-speech tagging.

In this work, we use the NLProcessor linguistic parser (NLProcessor 2000), which parses each sentence, yields the part-of-speech tag of each word (whether the word is a noun, verb, adjective, etc.), and identifies simple noun and verb groups (syntactic chunking). The following shows a sentence with the POS tags:

<S> <NG> <W C='PRP' L='SS' T='w' S='Y'> I </W> </NG> <VG> <W C='VBP'> am </W> <W C='RB'> absolutely </W> </VG> <W C='IN'> in </W> <NG> <W C='NN'> awe </W> </NG> <W C='IN'> of </W> <NG> <W C='DT'> this </W> <W C='NN'> camera </W> </NG> <W C='.'> . </W> </S>

The NLProcessor system generates XML output. For instance, <W C='NN'> indicates a noun and <NG> indicates a noun group/noun phrase. Each sentence is saved in the review database along with the POS tag information of each word in the sentence.

A transaction file is then created for the generation of frequent features in the next step. In this file, each line contains the words from one sentence, including only the pre-processed nouns/noun phrases of the sentence. The reason is that other components of a sentence are unlikely to be product features. Here, pre-processing includes the deletion of stopwords, stemming, and fuzzy matching. Fuzzy matching (Jokinen and Ukkonen 1991) is used to deal with word variants or misspellings. For example, "autofocus" and "auto-focus" actually refer to the same feature, so all occurrences of "autofocus" are replaced with "auto-focus".
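The paper does not detail the fuzzy matching procedure beyond the citation. As a rough illustration only, the sketch below groups close word variants using Python's standard difflib; the function and its similarity cutoff are our assumptions, not the authors' method.

    import difflib

    def canonicalize(words, cutoff=0.8):
        """Map word variants/misspellings to one canonical form, so that
        e.g. 'autofocus' and 'auto-focus' count as the same feature."""
        canon, mapping = [], {}
        for w in words:
            key = w.replace("-", "")                 # ignore hyphenation
            hit = difflib.get_close_matches(key, canon, n=1, cutoff=cutoff)
            if hit:
                mapping[w] = hit[0]                  # reuse an earlier form
            else:
                canon.append(key)
                mapping[w] = key
        return mapping

    print(canonicalize(["auto-focus", "autofocus", "battery"]))
    # {'auto-focus': 'autofocus', 'autofocus': 'autofocus', 'battery': 'battery'}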
Frequent Features Generation

This step finds the features that people are most interested in. To do this, we use association rule mining (Agrawal and Srikant 1994) to find all frequent itemsets. In our context, an itemset is a set of words or a phrase that occurs together.

Association rule mining is stated as follows. Let I = {i1, ..., in} be a set of items, and D be a set of transactions (the dataset). Each transaction consists of a subset of items in I. An association rule is an implication of the form X→Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The rule X→Y holds in D with confidence c if c% of the transactions in D that support X also support Y. The rule has support s in D if s% of the transactions in D contain X ∪ Y. The problem of mining association rules is to generate all association rules in D that have support and confidence greater than a user-specified minimum support and minimum confidence.

Mining frequently occurring phrases: Each piece of information extracted above is stored in a dataset called a transaction set/file. We then run the association rule miner CBA (Liu, Hsu and Ma 1998), which is based on the Apriori algorithm in (Agrawal and Srikant 1994), to find all frequent itemsets in the transaction set. Each resulting frequent itemset is a possible feature. In our work, we define an itemset as frequent if it appears in more than 1% (the minimum support) of the review sentences.

The Apriori algorithm works in two steps. In the first step, it finds all frequent itemsets from a set of transactions that satisfy a user-specified minimum support. In the second step, it generates rules from the discovered frequent itemsets. For our task, we only need the first step, i.e., finding frequent itemsets, which are candidate features. In addition, we only need to find frequent itemsets with three words or fewer in this work, as we believe that a product feature contains no more than three words (this restriction can be easily relaxed). The generated frequent itemsets, which are also called candidate frequent features in this paper, are stored in the feature set for further processing.
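The paper runs the CBA miner for this step. As a self-contained stand-in, the following sketch performs a plain level-wise (Apriori-style) search for word sets of up to three words that occur in at least 1% of the sentence transactions. Representing each transaction as a set of pre-processed nouns is an assumption made for illustration, and the full Apriori subset-pruning test is omitted for brevity; the join step still only combines already-frequent smaller sets, which is the core Apriori idea.

    from collections import Counter

    def frequent_itemsets(transactions, min_support=0.01, max_size=3):
        """Level-wise search for frequent word sets (Apriori, step 1).
        transactions: one set of pre-processed nouns per sentence.
        Returns {frozenset of words: support count}."""
        min_count = max(1, int(min_support * len(transactions)))
        counts = Counter(w for t in transactions for w in t)
        current = {frozenset([w]) for w, c in counts.items() if c >= min_count}
        result = {s: counts[next(iter(s))] for s in current}
        k = 2
        while current and k <= max_size:
            # join frequent (k-1)-sets to form k-word candidates
            candidates = {a | b for a in current for b in current
                          if len(a | b) == k}
            counts = Counter()
            for t in transactions:
                counts.update(c for c in candidates if c <= t)
            current = {c for c in candidates if counts[c] >= min_count}
            result.update({c: counts[c] for c in current})
            k += 1
        return result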
Feature Pruning

Not all frequent features generated by association mining are useful or genuine features; there are also some uninteresting and redundant ones. Feature pruning aims to remove these incorrect features. We present two types of pruning below.

Compactness pruning: This method checks features that contain at least two words, which we call feature phrases, and removes those that are likely to be meaningless. In association mining, the algorithm does not consider the position of an item (or word) in a transaction (or a sentence). However, in a natural language sentence, words that appear together and in a specific order are more likely to be meaningful phrases. Therefore, some of the frequent feature phrases generated by association mining may not be genuine features. The idea of compactness pruning is to prune those candidate features whose words do not appear together. We use the distances among the words in a candidate feature phrase (itemset) to do the pruning.

Definition 1: compact phrase

• Let f be a frequent feature phrase containing n words. Assume that a sentence s contains f and that the sequence of the words of f as they appear in s is w1, w2, ..., wn. If the word distance in s between any two adjacent words (wi and wi+1) in this sequence is no greater than 3, then we say f is compact in s.

• If f occurs in m sentences in the review database and is compact in at least 2 of the m sentences, then we call f a compact feature phrase.

For example, suppose we have the frequent feature phrase "digital camera" and three sentences from the review database that contain the phrase:

"I had searched for a digital camera for 3 months."

"This is the best digital camera on the market"

"The camera does not have a digital zoom"

The phrase digital camera is compact in the first two sentences but not in the last one. It is nevertheless a compact feature phrase, as it appears compactly two times. For a feature phrase and a sentence that contains the phrase, we look at the position information of every word of the phrase and check whether the phrase is compact in the sentence. If we cannot find two compact sentences in the review database, we prune the feature phrase.
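A minimal sketch of the compactness test, assuming each sentence is a list of words and taking the difference of word positions as the word distance; for simplicity it uses only the first occurrence of each phrase word, so sentences with repeated words are handled crudely.

    def is_compact(phrase, sentence, max_gap=3):
        """Definition 1: adjacent phrase words at most max_gap apart in s."""
        if not all(w in sentence for w in phrase):
            return False                      # sentence does not contain f
        pos = sorted(sentence.index(w) for w in phrase)
        return all(b - a <= max_gap for a, b in zip(pos, pos[1:]))

    def compactness_prune(feature_phrases, sentences, min_compact=2):
        """Keep a phrase only if it is compact in >= min_compact sentences."""
        return [p for p in feature_phrases
                if sum(is_compact(p, s) for s in sentences) >= min_compact]

    s1 = "i had searched for a digital camera for 3 months".split()
    s2 = "the camera does not have a digital zoom".split()
    print(is_compact(["digital", "camera"], s1))   # True: adjacent words
    print(is_compact(["digital", "camera"], s2))   # False: gap of 5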
Redundancy pruning: In this step, we focus on removing redundant features that contain single words. To describe redundant features, we have the following definition:

Definition 2: p-support (pure support)

The p-support of a feature ftr is the number of sentences in which ftr appears as a noun or noun phrase and which contain no feature phrase that is a superset of ftr.

p-support is different from the general support in association mining. For example, suppose we have the feature manual with a support of 10 sentences, and that it is a subset of the feature phrases manual mode and manual setting in the review database. Suppose further that the supports of the two feature phrases are 4 and 3 respectively, that the two phrases do not appear together in any sentence, and that all the features appear as nouns/noun phrases. Then the p-support of manual is 3. Recall that we require a feature to appear as a noun or noun phrase, as we do not want to find adjectives or adverbs as features.

We use a minimum p-support to prune redundant features. If a feature has a p-support lower than the minimum p-support (in our system, we set it to 3) and the feature is a subset of another feature phrase (which suggests that the feature alone may not be interesting), it is pruned. For instance, life by itself is not a useful feature, while battery life is a meaningful feature phrase. In the previous example, manual, which has a p-support of 3, is not pruned. This is reasonable considering that manual has two senses: a noun meaning "references" and an adjective meaning "of or relating to hands". Thus all three features, manual, manual mode, and manual setting, could be interesting.
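The sketch below implements Definition 2 and the pruning rule, treating features as word sets and each sentence as the set of nouns/noun phrases it contains; approximating phrase containment by the subset test is an assumption made for brevity. On the manual example above, with supports of 10, 4, and 3 and no co-occurrence, manual keeps a p-support of 10 - 4 - 3 = 3 and survives the threshold.

    def p_support(ftr, features, sentences):
        """Sentences containing ftr but no candidate feature that is a
        strict superset of ftr (Definition 2)."""
        supersets = [g for g in features if ftr < g]
        return sum(1 for s in sentences
                   if ftr <= s and not any(g <= s for g in supersets))

    def redundancy_prune(features, sentences, min_p_support=3):
        """Drop a feature only if it is a subset of another feature
        phrase AND its p-support falls below the minimum."""
        return [f for f in features
                if not (any(f < g for g in features)
                        and p_support(f, features, sentences) < min_p_support)]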
Opinion Words Extraction

Opinion words are words that people use to express a positive or negative opinion. Observing that people often express their opinions of a product feature using opinion words located around the feature in the sentence, we can extract opinion words from the review database using all the remaining frequent features (after pruning). For instance, let us look at the following two sentences:

"The strap is horrible and gets in the way of parts of the camera you need access to."

"After nearly 800 pictures I have found that this camera takes incredible pictures."

In the first sentence, the feature strap is near the opinion word horrible; in the second, the feature picture is close to the opinion word incredible. Following this observation, we can extract opinion words in the following way:

• For each sentence in the review database, if it contains any frequent feature, extract the nearby adjective. If such an adjective is found, it is considered an opinion word. A nearby adjective refers to the adjacent adjective that modifies the noun/noun phrase that is a frequent feature.

As shown in the previous example, horrible is the adjective that modifies strap, and incredible is the adjective that modifies picture. We use stemming and fuzzy matching to take care of word variants and misspellings. In this way, we build up an opinion word list, which is used below.

Infrequent Feature Identification

Frequent features are the "hot" features that people are most interested in for a given product. However, there are some features that only a small number of people talk about, and these features can also be interesting to some potential customers. The question is how to extract these infrequent features. Consider the following sentences:

"Red eye is very easy to correct."

"The camera comes with an excellent easy to install software"

"The pictures are absolutely amazing"

"The software that comes with it is amazing"

Sentences 1 and 2 share the same opinion word easy yet describe different features: sentence 1 is about red eye, while sentence 2 is about the software. Assume that software is a frequent feature in our digital camera review database; red eye is infrequent but also interesting. Similarly, amazing appears in both sentences 3 and 4, but sentence 3 is about picture while sentence 4 is about the software. From these examples, we see that people use the same adjectives to describe different subjects. Therefore, we can use the opinion words to look for features that cannot be found in the frequent feature generation step.

In the opinion word extraction step, we use frequent features to find the adjacent opinion words that modify the features. In this step, we use the known opinion words to find the nearby features that the opinion words modify. In both steps, we utilize the observation that opinions tend to appear closely together with features. We extract infrequent features using the following procedure:

• For each sentence in the review database, if it contains no frequent feature but one or more opinion words, find the nearest noun/noun phrase to the opinion word. The noun/noun phrase is then stored in the feature set as an infrequent feature.

We use the nearest noun/noun phrase as the noun/noun phrase that the opinion word modifies because that is what happens most of the time. Finding the exact noun/noun phrase that an opinion word modifies requires natural language understanding, which is difficult with only POS tags, so we use this simple heuristic method to find the nearest noun/noun phrase instead. It works quite well.
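The two extraction steps mirror each other, so the following sketch implements both over POS-tagged sentences, represented as lists of (word, tag) pairs. Treating features and opinion targets as single words, and "nearby" as "closest by position", are our simplifying assumptions; the paper's notion of the adjacent modifying adjective is stricter.

    def nearest_with_tag(sent, i, prefix):
        """Word nearest to position i whose POS tag starts with prefix."""
        best, best_d = None, None
        for j, (w, t) in enumerate(sent):
            if j != i and t.startswith(prefix):
                if best is None or abs(j - i) < best_d:
                    best, best_d = w, abs(j - i)
        return best

    def opinion_words(tagged_sentences, frequent_features):
        """Adjectives ('JJ') found near frequent features."""
        found = set()
        for sent in tagged_sentences:
            for i, (w, _) in enumerate(sent):
                if w in frequent_features:
                    adj = nearest_with_tag(sent, i, "JJ")
                    if adj:
                        found.add(adj)
        return found

    def infrequent_features(tagged_sentences, frequent_features, opinions):
        """Nearest noun ('NN') to an opinion word, in sentences that
        contain no frequent feature."""
        found = set()
        for sent in tagged_sentences:
            if any(w in frequent_features for w, _ in sent):
                continue
            for i, (w, _) in enumerate(sent):
                if w in opinions:
                    noun = nearest_with_tag(sent, i, "NN")
                    if noun:
                        found.add(noun)
        return found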
A problem with infrequent feature identification using opinion words is that it can also find nouns/noun phrases that are irrelevant to the given product. The reason is that people can use common adjectives to describe many subjects, including both the interesting features that we want and irrelevant subjects. Consider the following:

"The salesman was easy going and let me try all the models on display."

salesman is not a relevant feature of a product, but it will be found as an infrequent feature because of the nearby opinion word easy. This, however, is not a serious problem, since the number of infrequent features is small compared with the number of frequent features: they account for around 15-20% of the total number of features in our experimental results. Infrequent features are generated for completeness. Moreover, frequent features are more important than infrequent ones, and since we rank features according to their p-supports, the wrong infrequent features will be ranked very low and thus will not affect most users.

Opinion Sentence Orientation Identification: After opinion features have been identified, we determine the semantic orientation (i.e., positive or negative) of each opinion sentence. This consists of two steps: (1) for each opinion word in the opinion word list, we identify its semantic orientation using a bootstrapping technique and WordNet (Miller et al. 1990), and (2) we then decide the opinion orientation of each sentence based on the dominant orientation of the opinion words in the sentence. The details are presented in a subsequent paper.
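The orientation step is only summarized here (the details are in the authors' follow-up paper), but the second half, the dominant-orientation decision, can be sketched directly. The seed lexicon below is a made-up example; in the paper, word orientations are grown from seed adjectives by bootstrapping over WordNet.

    def sentence_orientation(words, orientation):
        """Dominant orientation of the opinion words in a sentence;
        orientation maps word -> +1 (positive) / -1 (negative)."""
        score = sum(orientation.get(w, 0) for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "undecided"   # tie or no known opinion words

    lexicon = {"amazing": 1, "incredible": 1, "horrible": -1}  # toy seeds
    print(sentence_orientation("the strap is horrible".split(), lexicon))
    # -> negative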
Experiments

We have conducted experiments on the customer reviews of five electronics products: two digital cameras, one DVD player, one MP3 player, and one cellular phone. The two websites from which we collected the reviews are Amazon.com and C|net.com. Products on these sites have a large number of reviews. Each review includes a text review and a title. Additional information is available but not used in this project, including date, time, author name and location (for Amazon reviews), and ratings.

Table 1: Recall and precision at each step of the system
Product name     No. of manual  Frequent features      Compactness        P-support          Infrequent feature
                 features       (association mining)   pruning            pruning            identification
                                Recall   Precision     Recall  Precision  Recall  Precision  Recall  Precision
Digital camera1  79             0.671    0.552         0.658   0.634      0.658   0.825      0.822   0.747
Digital camera2  96             0.594    0.594         0.594   0.679      0.594   0.781      0.792   0.710
Cellular phone   67             0.731    0.563         0.716   0.676      0.716   0.828      0.761   0.718
Mp3 player       57             0.652    0.573         0.652   0.683      0.652   0.754      0.818   0.692
DVD player       49             0.754    0.531         0.754   0.634      0.754   0.765      0.797   0.743
Average          69             0.68     0.56          0.67    0.66       0.67    0.79       0.80    0.72
For each product, we first crawled and downloaded the first 100 reviews. These review documents were then cleaned to remove HTML tags. After that, NLProcessor was used to generate part-of-speech tags, and our system was then applied to perform feature extraction.

To evaluate the discovered features, a human tagger manually read all the reviews and produced a manual feature list for each product. The features are mostly explicit in opinion sentences, e.g., pictures in "the pictures are absolutely amazing". Implicit features, such as size in "it fits in a pocket nicely", are also easy for the human tagger to identify. The column "No. of manual features" in Table 1 shows the number of manual features for each product.

Table 1 gives all the precision and recall results. We evaluated the results at each step of our algorithm. In the table, column 1 lists each product. Columns 3 and 4 give the recall and precision of frequent feature generation for each product, which uses association mining. The results indicate that the frequent features contain many errors; using this step alone gives poor results, i.e., low precision. Columns 5 and 6 show the corresponding results after compactness pruning is performed. We can see that the precision is improved significantly by this pruning, while the recall stays steady. Columns 7 and 8 give the results after pruning using p-support: there is another dramatic improvement in precision, and the recall level shows almost no change. The results in Columns 4-8 demonstrate clearly the effectiveness of these two pruning techniques. Columns 9 and 10 give the results after infrequent feature identification is done. The recall is improved dramatically, while the precision drops a few percent on average. However, this is not a major problem, because the infrequent features are ranked rather low and thus will not affect most users.

In summary, with an average recall of 80% and an average precision of 72%, we believe that our techniques are quite promising and can be used in practical settings.
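For completeness, the Table 1 scores can be computed from an extracted feature list and the human tagger's list as follows; exact string matching is our assumption here, as the paper does not state its matching criterion.

    def precision_recall(extracted, manual):
        """Precision/recall of extracted features against the manual list."""
        extracted, manual = set(extracted), set(manual)
        tp = len(extracted & manual)                 # correctly found features
        precision = tp / len(extracted) if extracted else 0.0
        recall = tp / len(manual) if manual else 0.0
        return precision, recall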
Conclusion

In this paper, we proposed a number of techniques for mining opinion features from product reviews based on data mining and natural language processing methods. The objective is to produce a feature-based summary of a large number of customer reviews of a product sold online. We believe that this problem will become increasingly important as more people buy and express their opinions on the Web. Our experimental results indicate that the proposed techniques are effective in performing their tasks. In our future work, we plan to further improve these techniques. We also plan to group features according to the strength of the opinions that have been expressed on them, e.g., to determine which features customers strongly like and dislike. This will further improve the feature extraction and the subsequent summarization.

Acknowledgements. This work is supported by the National Science Foundation under grant IIS-0307239.

References

Agrawal, R., and Srikant, R. 1994. Fast Algorithms for Mining Association Rules. VLDB-1994.

Barzilay, R., and Elhadad, M. 1997. Using Lexical Chains for Text Summarization. ACL Workshop on Intelligent, Scalable Text Summarization.

Boguraev, B., and Kennedy, C. 1997. Salience-based Content Characterization of Text Documents. ACL Workshop on Intelligent, Scalable Text Summarization.

Church, K., and Hanks, P. 1990. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics 16(1):22-29.

Daille, B. 1996. Study and Implementation of Combined Techniques for Automatic Extraction of Terminology. In The Balancing Act: Combining Symbolic and Statistical Approaches to Language Processing. MIT Press, Cambridge.

Dave, K., Lawrence, S., and Pennock, D. 2003. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. WWW-2003.

DeJong, G. 1982. An Overview of the FRUMP System. In Strategies for Natural Language Parsing, 149-176.

Hovy, E., and Lin, C.-Y. 1997. Automated Text Summarization in SUMMARIST. ACL Workshop on Intelligent, Scalable Text Summarization.

Jacquemin, C., and Bourigault, D. 2001. Term Extraction and Automatic Indexing. In R. Mitkov, ed., Handbook of Computational Linguistics. Oxford University Press.

Jokinen, P., and Ukkonen, E. 1991. Two Algorithms for Approximate String Matching in Static Texts. In A. Tarlecki, ed., Mathematical Foundations of Computer Science.

Justeson, J. S., and Katz, S. M. 1995. Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text. Natural Language Engineering 1(1):9-27.

Kan, M.-Y., and McKeown, K. 1999. Information Extraction and Summarization: Domain Independence through Focus Types. Columbia University Technical Report CUCS-030-99.

Kupiec, J., Pedersen, J., and Chen, F. 1995. A Trainable Document Summarizer. SIGIR-1995.

Liu, B., Hsu, W., and Ma, Y. 1998. Integrating Classification and Association Rule Mining. KDD-1998.

Mani, I., and Bloedorn, E. 1997. Multi-document Summarization by Graph Search and Matching. AAAI-1997.

Manning, C., and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

Marcu, D. 1997. From Discourse Structures to Text Summaries. ACL Workshop on Intelligent, Scalable Text Summarization.

Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. 1990. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography 3(4):235-312.

NLProcessor – Text Analysis Toolkit. 2000. http://www.infogistics.com/textanalysis.html

Paice, C. D. 1990. Constructing Literature Abstracts by Computer: Techniques and Prospects. Information Processing and Management 26:171-186.

Radev, D., and McKeown, K. 1998. Generating Natural Language Summaries from Multiple On-line Sources. Computational Linguistics 24(3):469-500.

Sparck-Jones, K. 1993a. Discourse Modeling for Automatic Text Summarizing. Technical Report 290, University of Cambridge.

Sparck-Jones, K. 1993b. What Might Be in a Summary? Information Retrieval 93:9-26.

Tait, J. 1983. Automatic Summarizing of English Texts. Ph.D. dissertation, University of Cambridge.