Sentiment Analysis

Abstract—Sentiment analysis is a new area of research in data mining that concerns the detection of opinions and/or sentiments in texts. This work focuses on the application and comparison of three classification techniques over a text corpus composed of reviews of commercial products, in order to detect opinions about them. The chosen domain is "perfumes", and the user opinions composing the corpus are written in Italian. The proposed approach is completely data-driven: a Term Frequency / Inverse Document Frequency (TFIDF) term selection procedure has been applied in order to make computation more efficient, to improve the classification results and to manage some issues related to the specific classification procedures adopted.

Keywords: Sentiment Classification, Naive Bayes classifier, Class Association Rules, Random Indexing, TF-IDF

I. INTRODUCTION

Sentiment analysis is a sub-discipline of natural language processing that focuses on determining the polarity of a given text through the analysis of the words in the text, their disposition, and their presence/absence in relation to the presence/absence of other words. In recent years, interest in this area has been increasing because of the explosion in popularity of social networks and review sites, which are incomparable sources of opinions about society, economy, commerce and politics, but also moods. Thanks to automated methods and techniques studied by researchers, the large amount of opinions available on the net has become an object of analysis for the extraction of the relevant orientations of people about specific topics, so that the retrieved information is useful in determining what people like or dislike.

Unlike the generic classification of texts, the object of analysis is an opinion, which can be defined as a tuple of values {O, F, S, U, T}: the object of the opinion O and its feature F, the sentiment expressed in the opinion S, the user U who expressed it and the time T when the opinion was expressed [9].
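As an illustration only, such a tuple can be represented as a simple record; the field names below are ours, since the paper defines only the five components:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Opinion:
    """The {O, F, S, U, T} tuple of [9]."""
    obj: str         # O: the object the opinion is about
    feature: str     # F: the feature of the object being evaluated
    sentiment: str   # S: the sentiment expressed in the opinion
    user: str        # U: the user who expressed the opinion
    time: datetime   # T: when the opinion was expressed

# e.g. Opinion("perfume X", "scent", "positive", "user42", datetime(2013, 5, 1))
```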
Major problems in determining the polarity of a text originate from the nature of human language: a word may change its polarity if it is near a negative word, logical connectives may refer to some words in the same sentence or to different sentences, a phrase may have some positive terms used in a negative context (and vice versa, for example in ironical sentences), and so on.

Among the different sub-fields of sentiment analysis we can mention lexicon generation, sentiment classification, feature-based sentiment classification and opinion summarization. Lexicon generation is based on an analysis at the word level, which leads to the construction of subjectivity or sentiment lexicons that can be built manually, semi-manually or automatically. Sentiment classification aims at automatically detecting the polarity of a text (a word, a sentence or an entire document). Feature-based sentiment classification regards the attribution of sentiments to the values of the features of the products the opinion refers to. Opinion summarization is the extraction and aggregation of the sentiments of the whole opinion, given its features, into a meaningful summary.

In this scenario, many methods have been developed for sentiment classification. Two main approaches can be distinguished: methods based on machine learning and methods based on semantic orientation [10]. From these, other methods have been proposed which take advantage of both. Among the different applications we recall the classification of consumer feedback and ratings of products and services.

In this article we show a data-driven approach that retrieves comments about products of a specific domain, extracting them from websites focused on customers' opinions about and comparisons of products. Texts are properly processed in order to extract significant terms, which are given as input to three classifiers. We explore and compare the performance of each one of them.

The remainder of the paper is organized as follows: Section 2 contains an overview of the state of the art in sentiment analysis and sentiment classification; Section 3 describes the proposed method and its application to the three classification methods involved in the comparison (Naive Bayes classifier, Class Association Rules and Random Indexing); in Section 4, details about the dataset, the evaluation criteria and the experimental results are given; in Section 5 we discuss our experimental results and make some considerations about the adopted method.

II. RELATED WORKS

Sentiment classification methods are generally divided into two great branches: machine learning (supervised approach) and semantic orientation [10].
If C is the set of all the classes c, we consider TF(t, c) as the frequency of term t in class c, and we refer to IDF(t) as the percentage of documents in class c in which term t appears. The modified definition of TFIDF for the case of classes is as follows:

TF(t, c) = |occurrences of t in c| / |terms in c|

IDF(t) = log( |C| / Σ_{c_i ∈ C} ( |documents in c_i in which t appears| / |documents in c_i| ) )

TFIDF(t, c) = TF(t, c) · IDF(t)
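A minimal sketch of this class-level TFIDF computation, assuming tokenized documents grouped by class (function and variable names are ours):

```python
import math
from collections import Counter

def class_tfidf(docs_by_class):
    """docs_by_class: class label -> list of tokenized documents.
    Returns a dict mapping (term, class) -> TFIDF value."""
    # TF(t, c): frequency of t over all term occurrences of class c
    tf = {}
    for c, docs in docs_by_class.items():
        counts = Counter(t for d in docs for t in d)
        total = sum(counts.values())
        for t, n in counts.items():
            tf[(t, c)] = n / total
    # IDF(t): log of |C| over the summed per-class document frequencies of t
    idf = {}
    for t in {term for (term, _) in tf}:
        df_sum = sum(sum(1 for d in docs if t in d) / len(docs)
                     for docs in docs_by_class.values())
        idf[t] = math.log(len(docs_by_class) / df_sum)
    return {(t, c): v * idf[t] for (t, c), v in tf.items()}
```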
This way, it is possible to perform a selection of the most relevant terms, where for our purposes the relevance of a term is strictly connected with its semantic polarity. To highlight the polarity of a certain term, a weighted sum of the TFIDF values of that term for every class has been calculated: a negative weight has been attributed to the TFIDF values of classes 1 and 2, and a positive weight to the TFIDF values of classes 3, 4 and 5.

Furthermore, if the same term is also present in negated form, preceded by the term "non" (in Italian, but it may also be "not" for the English case, or some other negation in some other language), the two term versions (negated and not negated) were given equal and opposite weights: after polarity assignment finished, for each pair of term versions present in the dataset, the term with the smaller absolute value received the value of its counterpart, but with the opposite sign, in order to respect the polarity inversion. Finally, terms with negative orientation had negative polarity values and, similarly, terms with positive orientation had positive polarity values.

Furthermore, an experimentally determined threshold was used to consider only terms that are strongly negative or strongly positive. After term selection, every document in the training set was filtered: documents belonging to negative classes keep only terms with negative values, and documents belonging to positive classes keep only terms with positive values. The choice of not considering negative terms in positive documents (and vice versa) is due to the fact that a negative opinion may well contain positive terms, but rarely, or outside the polarity context of the opinion.
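The weighting, negation-inversion and thresholding steps described above might be sketched as follows; the ±1 class weights and the "non_" prefix marking negated terms are illustrative assumptions, not details fixed by the paper:

```python
def term_polarities(tfidf, class_weights=None):
    """Weighted sum of per-class TFIDF values: negative weights for
    classes 1-2, positive weights for classes 3-5."""
    class_weights = class_weights or {1: -1, 2: -1, 3: 1, 4: 1, 5: 1}
    polarity = {}
    for (t, c), v in tfidf.items():
        polarity[t] = polarity.get(t, 0.0) + class_weights[c] * v
    return polarity

def invert_negated_pairs(polarity, neg_prefix="non_"):
    """Give each (term, negated term) pair equal and opposite values:
    the one with the smaller absolute value takes the opposite of its
    counterpart, respecting the polarity inversion."""
    for t in list(polarity):
        neg = neg_prefix + t
        if neg in polarity:
            if abs(polarity[t]) < abs(polarity[neg]):
                polarity[t] = -polarity[neg]
            else:
                polarity[neg] = -polarity[t]
    return polarity

def select_strong_terms(polarity, threshold):
    """Keep only strongly negative or strongly positive terms."""
    return {t: v for t, v in polarity.items() if abs(v) >= threshold}
```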
Figure 1 shows the chain of phases for obtaining the reduced vocabulary and for filtering the documents of the training set.

Fig. 1. Generation of reduced vocabulary through TFIDF and filtering of documents
B. Sentiment Corpus Classification: Training phase

In this work we have analyzed and compared three different algorithms on the reviews dataset: Naive Bayes classifier (with both the Multinomial and the Bernoulli model), Class Association Rules and Random Indexing with k-Nearest Neighbors.

1) Naive Bayes Classifier: The aim is to determine the class c_i from the probability that the document d belongs to that class, p(c_i | d), given by:
p(c_i | f_1, f_2, ..., f_m) = (1/A) · p(c_i) · ∏_{k=1}^{m} p(f_k | c_i)    (1)

where A is a normalization factor and p(f_k | c_i) is the probability that the features belong jointly to the class c_i:

p(f_k | c_i) = N(F_k = f_k ∧ C = c_i) / N(C = c_i)    (2)

In our case the features are the terms belonging to the document, therefore we used models for discrete features: the Multivariate Bernoulli Distribution and the Multinomial Distribution models. In the first case, given {t_1, t_2, ..., t_m} the set of terms in the document, the probability p(t_k | c_i) is given by:

p(t_k | c_i) = |{d | d ∈ c_i ∧ t_k ∈ d}| / |{d | d ∈ c_i}|    (3)

while in the second case it is given by:

p(t_k | c_i) = |occurrences of t_k in c_i| / |total words in c_i|    (4)
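A sketch of the two term-probability estimates, here with the add-one (Laplacian) correction mentioned in Section IV; the exact smoothing variant is not specified in the paper, so it is an assumption:

```python
def bernoulli_p(term, class_docs):
    """Eq. (3): fraction of the documents of the class containing the
    term, with add-one smoothing (class_docs: list of token lists)."""
    containing = sum(1 for d in class_docs if term in d)
    return (containing + 1) / (len(class_docs) + 2)

def multinomial_p(term, class_docs, vocab_size):
    """Eq. (4): relative frequency of the term among all word
    occurrences of the class, with add-one smoothing."""
    occurrences = sum(d.count(term) for d in class_docs)
    total_words = sum(len(d) for d in class_docs)
    return (occurrences + 1) / (total_words + vocab_size)
```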
2) Class Association Rules: In this case the review classification is treated as the recognition of those reviews that satisfy a set of rules defined for each class. Let T be the set of transactions, composed of the different reviews, where each transaction is labelled with a class c_i; let I be the item set and Y the set of classes. A class association rule (CAR) is defined as an implication of the form X → y, with X ⊆ I and y ∈ Y.

An algorithm extracts the rules satisfying a minimum support and a minimum confidence, where the support sup(X → y) and the confidence conf(X → y) are respectively defined as:

sup(X → y) = Pr(X ∪ y) = (transactions of y containing X) / (total number of transactions)

conf(X → y) = Pr(y | X) = (transactions of y containing X) / (number of transactions containing X)

The algorithm used for this purpose is "Apriori" [14]. Let a ruleitem be a set of items (condset) associated with a class y, written (condset, y): given a minsup and a minconf, at each step the algorithm generates the ruleitems satisfying the minsup, which will be used in the next step. At the end, the algorithm returns the ruleitems satisfying both the minsup and the minconf.

The space of all possible rules that can be generated is exponential (O(2^m), with m the number of items in the dataset): the use of a high minimum support and a high minimum confidence allows the computation to concentrate on a reduced number of rules that have a certain validity. However, since rules with support lower than the minimum user-specified support are removed from the computation, the consequence is the so-called rare items problem: the removal of those rules with low support, which may include rare terms that are potentially characteristic of a certain class. In fact, because they appear in some classes more than in others, these terms are not very frequent in the whole dataset, but very frequent in a specific class. On this basis, we preferred to use the Multi Support Apriori (MS-Apriori) algorithm, in order to also find those rules involving rare terms and to free the user from choosing a specific minimum support, which may negatively influence the results of rule generation.
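The two measures can be computed directly from the labelled transactions, as in the sketch below (MS-Apriori itself, with its per-item minimum supports, is considerably more involved):

```python
def support(X, y, transactions):
    """sup(X -> y): transactions labelled y containing X, over all
    transactions. Each transaction is a (frozenset_of_terms, label) pair."""
    hits = sum(1 for items, label in transactions
               if label == y and X <= items)
    return hits / len(transactions)

def confidence(X, y, transactions):
    """conf(X -> y): among transactions containing X, the fraction
    labelled y."""
    containing = [label for items, label in transactions if X <= items]
    if not containing:
        return 0.0
    return sum(1 for label in containing if label == y) / len(containing)

# Toy usage with two transactions:
ts = [(frozenset({"sgradevole", "brutto"}), "negative"),
      (frozenset({"brutto"}), "positive")]
print(support(frozenset({"brutto"}), "negative", ts))     # 0.5
print(confidence(frozenset({"brutto"}), "negative", ts))  # 0.5
```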
3) Random Indexing: In this case documents are represented as vectors, analyzing term co-occurrences in specific contexts (e.g. each document). According to [13], it consists of two phases. In the first phase, each context is assigned a random, high-dimensional, sparse index vector consisting of a few randomly distributed +1s and -1s, while the other elements are set to 0.

In the second phase, context vectors are computed for each word: each time a word is present in a certain context, the index vector of that context is added to a vector which represents the context vector of the word.

At the end, a co-occurrence matrix M_ws is obtained, where the rows are the generated context vectors.

Among the advantages of Random Indexing we recall its independence from specific domains, the fact that it is an incremental method, and its reduced computational and memory requirements with respect to other vector space models (e.g. those required by Latent Semantic Analysis).
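A compact sketch of the two phases, using documents as contexts; the dimensionality and the number of non-zero entries are illustrative choices, not values from the paper:

```python
import numpy as np

def index_vector(dim, nonzero, rng):
    """Phase 1: sparse random index vector with a few +1/-1 entries."""
    v = np.zeros(dim)
    positions = rng.choice(dim, size=nonzero, replace=False)
    v[positions] = rng.choice([1, -1], size=nonzero)
    return v

def context_vectors(docs, dim=1800, nonzero=8, seed=0):
    """Phase 2: for each word, accumulate the index vectors of the
    contexts (here: documents) in which it occurs. The resulting dict
    holds the rows of the co-occurrence matrix M_ws."""
    rng = np.random.default_rng(seed)
    ctx = {}
    for doc in docs:                      # one index vector per context
        iv = index_vector(dim, nonzero, rng)
        for word in doc:                  # every occurrence adds it
            ctx.setdefault(word, np.zeros(dim))
            ctx[word] += iv
    return ctx
```

Documents can then be mapped to vectors as well (for instance by summing the context vectors of their terms) and classified with k-Nearest Neighbors as in Section IV; the paper does not detail the document representation, so the summing strategy is our assumption.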
IV. EXPERIMENTAL RESULTS

The dataset has been obtained by means of an ad-hoc crawler, extracting pairs composed of "textual descriptions of opinions", expressed in Italian, and their associated "scores", ranging from 1 to 5, about perfumery products from the review sites "www.dooyoo.it" and "www.ciao.it". Training and test sets were chosen as follows:
• training set with 500 documents (100 per class);
• test set with 50 documents (10 per class).
The analysis of performance was conducted on the dataset considering two different levels of granularity: we have considered a fine-grained version of the dataset, which considers the reviews according to five classes corresponding to the five possible scores, and a coarse-grained version of the dataset, where reviews with a score ranging from 1 to 2 are grouped into a macro class A, and reviews with a score ranging from 3 to 5 are grouped into a macro class B. In the first case we assume that the classes have the following interpretations: 1 = "very negative", 2 = "negative", 3 = "just positive", 4 = "positive", 5 = "very positive"; in the second case we assume more generically that class A = "negative" and class B = "positive". The difference in granularity is only at the training level: during the test phase we consider only the macro classification. We assume that if two users use the same words to express an opinion on a product, it is realistic to suppose that they both express an opinion about the product in the same positive or negative manner; on the other hand, score attribution gives only a coarse indication of the opinion expressed in the text.

Let us focus on an example: if two users use in their opinions the terms "brutto", "non piace", "non consiglio", "sgradevole" (in English, "ugly", "not like", "not recommend", "unpleasant"), they both have a negative judgement of that product; on the other hand, it is possible that they attribute scores that are both negative but different in value, because they may choose to vote 1 or 2 according to their negative judgement.

In our opinion, during the evaluation phase positive documents with different scores should be considered generally positive, independently of the score (3, 4 or 5). The same assumption is made for negative scores.

We have tested the algorithms considering the reduced vocabularies obtained by experimentally setting different thresholds for the TF-IDF based filtering procedure, and also considering the entire vocabulary of terms (except for Class Association
Rules: it was impracticable to complete the training phase with unfiltered documents, because the number of rules generated with MS-Apriori was excessively high). In particular we have obtained:
• without threshold, 7556 terms;
• with threshold 0.15, 584 terms;
• with threshold 0.2, 313 terms;
• with threshold 0.25, 179 terms;
• with threshold 0.3, 106 terms;
• with threshold 0.35, 77 terms.
With the Naive Bayes classifier, the analysis was conducted with the Multinomial and Bernoulli models and with the help of the Laplacian correction, to account for test-document terms that are present in some classes and not in others; with Random Indexing, results with 3, 5, 7, 9, 11 and 15 nearest neighbours are reported.

Tables I-IV show a comparison of the classification results obtained with both unfiltered and filtered documents; for filtered documents, the reported results correspond to the threshold giving the best results. For a complete view of performance, the accuracy values for all adopted thresholds are reported in Tables V-VII.
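For reference, the three reported measures could be computed as in this sketch, where the positive macro class corresponds to scores 3-5 (class B):

```python
def evaluate(predictions, truths, positive="B"):
    """Accuracy, true positive rate and true negative rate over the
    macro (positive/negative) classification; assumes both macro
    classes occur in the test set."""
    tp = sum(1 for p, t in zip(predictions, truths) if p == t == positive)
    tn = sum(1 for p, t in zip(predictions, truths) if p == t != positive)
    pos = sum(1 for t in truths if t == positive)
    neg = len(truths) - pos
    return (tp + tn) / len(truths), tp / pos, tn / neg
```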
TABLE I
RESULTS WITH RANDOM INDEXING + K-NEAREST NEIGHBORS

                              2 classes                   5 classes
                        no thresh.  thresh. 0.3    no thresh.  thresh. 0.3
Accuracy        3-NN    0.62        0.86           0.62        0.86
                5-NN    0.64        0.86           0.68        0.86
                7-NN    0.62        0.86           0.62        0.86
                9-NN    0.66        0.86           0.58        0.86
                11-NN   0.64        0.86           0.56        0.86
                15-NN   0.58        0.86           0.56        0.86
True pos. rate  3-NN    0.8         0.93333        0.6         0.93333
                5-NN    0.93333     0.93333        0.73333     0.93333
                7-NN    0.9         0.93333        0.66667     0.93333
                9-NN    0.96667     0.93333        0.7         0.93333
                11-NN   0.96667     0.93333        0.7         0.93333
                15-NN   0.96667     0.93333        0.76667     0.93333
True neg. rate  3-NN    0.35        0.75           0.65        0.75
                5-NN    0.2         0.75           0.6         0.75
                7-NN    0.2         0.75           0.55        0.75
                9-NN    0.2         0.75           0.4         0.75
                11-NN   0.15        0.75           0.35        0.75
                15-NN   0.0         0.75           0.25        0.75

TABLE II
RESULTS WITH MULTINOMIAL NAIVE BAYES CLASSIFIER

                              2 classes                   5 classes
                        no thresh.  thresh. 0.25   no thresh.  thresh. 0.3
Accuracy                0.78        0.84           0.76        0.86
True positive rate      0.866667    0.933333       0.866667    0.933333
True negative rate      0.65        0.7            0.6         0.75

TABLE III
RESULTS WITH BERNOULLI NAIVE BAYES CLASSIFIER

                              2 classes                   5 classes
                        no thresh.  thresh. 0.3    no thresh.  thresh. 0.3
Accuracy                0.6         0.84           0.64        0.86
True positive rate      1.0         0.9            0.9666667   0.9333333
True negative rate      0.0         0.75           0.15        0.75

TABLE IV
RESULTS WITH CLASS ASSOCIATION RULES

                        2 classes   5 classes
Accuracy                0.84        0.86
True positive rate      0.833333    0.9
True negative rate      0.85        0.8

TABLE V
ACCURACY RESULTS WITH RANDOM INDEXING + K-NEAREST NEIGHBORS

k-NN    Threshold   2 classes   5 classes
3-NN    None        0.62        0.62
        0.15        0.68        0.68
        0.2         0.82        0.82
        0.25        0.74        0.74
        0.3         0.86        0.86
        0.35        0.76        0.76
5-NN    None        0.64        0.68
        0.15        0.68        0.68
        0.2         0.8         0.82
        0.25        0.74        0.74
        0.3         0.86        0.86
        0.35        0.76        0.74
7-NN    None        0.62        0.62
        0.15        0.68        0.68
        0.2         0.82        0.82
        0.25        0.74        0.74
        0.3         0.86        0.86
        0.35        0.78        0.82
9-NN    None        0.66        0.58
        0.15        0.66        0.68
        0.2         0.8         0.82
        0.25        0.74        0.74
        0.3         0.86        0.86
        0.35        0.78        0.78
11-NN   None        0.64        0.56
        0.15        0.66        0.68
        0.2         0.8         0.8
        0.25        0.74        0.74
        0.3         0.86        0.86
        0.35        0.76        0.78
15-NN   None        0.58        0.56
        0.15        0.66        0.68
        0.2         0.8         0.8
        0.25        0.74        0.74
        0.3         0.86        0.86
        0.35        0.76        0.76

TABLE VI
ACCURACY RESULTS WITH NAIVE BAYES CLASSIFIERS

Threshold   MNB 2 cl.   MNB 5 cl.   BNB 2 cl.   BNB 5 cl.
None        0.78        0.76        0.6         0.64
0.1         0.78        0.76        0.74        0.72
0.15        0.72        0.74        0.72        0.72
0.2         0.76        0.82        0.82        0.8
0.25        0.84        0.8         0.8         0.82
0.3         0.82        0.86        0.84        0.86
0.35        0.76        0.78        0.74        0.76

TABLE VII
ACCURACY RESULTS WITH CLASS ASSOCIATION RULES

Threshold   2 classes   5 classes
0.15        0.74        0.74
0.2         0.82        0.82
0.25        0.82        0.78
0.3         0.84        0.86
0.35        0.76        0.82

V. CONCLUSIONS

In this work we have compared the results obtained by applying different classification algorithms to a sentiment corpus composed of perfumery product reviews. The dataset has been labelled according to the votes given by the customers, and a selection of the most meaningful terms has been
performed by defining a TF-IDF-based weighting procedure. The obtained results show that the proposed weighting procedure has improved the performance of all the analysed classification methodologies. Moreover, the comparison shows that the Naive Bayes classifiers and Random Indexing work better on the classification of positive documents, while Class Association Rules works better on the classification of negative documents. In addition, slightly better results are obtained from the evaluation with five classes: the reason for this behaviour is probably that, with five classes, the grain of the training is finer than in the case of two classes, and this augmented precision permits a more accurate classification.

The proposed filtering approach is well suited to contexts in which a vocabulary of terms is not available, but it is possible to exploit a large amount of voted reviews, where the user gives an indication of the polarity of his comment. The proposed procedure, which is moreover independent of the language used, allows us to estimate word polarities by analysing the polarities of the reviews. Moreover, the proposed approach reduces the computational costs of the algorithms.

VI. ACKNOWLEDGEMENTS

This work has been partially supported by the PON01 01687 - SINTESYS (Security and INTElligence SYSstem) Research Project.

REFERENCES

[1] Songbo Tan, Jin Zhang, An empirical study of sentiment analysis for Chinese documents, Expert Systems with Applications 34 (2008), 2622-2629.
[2] Rudy Prabowo, Mike Thelwall, Sentiment analysis: A combined approach, Journal of Informetrics 3 (2009), 143-157.
[3] Weitong Huang, Yu Zhao, Shiqiang Yang, Yuchang Lu, Analysis of the user behavior and opinion classification based on the BSS, Applied Mathematics and Computation 205 (2008), 668-676.
[4] Chin-Sheng Yang, Hsiao-Ping Shih, A rule-based approach for effective sentiment analysis, PACIS 2012 Proceedings, Paper 181 (2012). http://aisel.aisnet.org/pacis2012/181
[5] Zhu Jian, Xu Chen, Wang Han-shi, Sentiment classification using the theory of ANNs, The Journal of China Universities of Posts and Telecommunications 17 (Suppl.) (July 2010), 58-62.
[6] P. Chaovalit, Lina Zhou, Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches, Proceedings of the 38th Hawaii International Conference on System Sciences (2005).
[7] DongMei Zhang, Shengen Li, Cuiling Zhu, Xiaofei Niu, Ling Song, A comparison study of multi-class sentiment classification for Chinese reviews, Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on, vol. 5, pp. 2433-2436, 10-12 Aug. 2010. doi: 10.1109/FSKD.2010.5569300
[8] Liu Bing, Sentiment Analysis and Subjectivity, in Handbook of Natural Language Processing, 2nd ed., chapter 28, eds. N. Indurkhya and F. J. Damerau (2010).
[9] G. Vinodhini, RM. Chandrasekaran, Sentiment analysis and Opinion Mining: A survey, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, issue 6 (2012).
[10] Kushal Dave, Steve Lawrence, David M. Pennock, Mining the peanut gallery: opinion extraction and semantic classification of product reviews, in Proceedings of the 12th International Conference on World Wide Web (WWW '03), ACM, New York, NY, USA, 519-528 (2003). doi: 10.1145/775152.775226
[11] Peter D. Turney, Michael L. Littman, Measuring Praise and Criticism: Inference of Semantic Orientation from Association, ACM Transactions on Information Systems, vol. 21, pp. 315-346 (2003).
[12] Peter D. Turney, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, Association for Computational Linguistics 40th Anniversary Meeting, New Brunswick, N.J. (2002).
[13] P. Kanerva, Sparse Distributed Memory, The MIT Press (1988).
[14] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, VLDB-94 (1994).
[15] Qiang Ye, Wen Shi, Yi-Jun Li, Sentiment Classification for Movie Reviews in Chinese by Improved Semantic Oriented Approach, in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), vol. 3, pp. 53b, 04-07 Jan. 2006. doi: 10.1109/HICSS.2006.432