
4 - Sentiment Analysis - Plain

Sentiment analysis aims to computationally study people's opinions, attitudes, and emotions toward entities, issues, and topics. It involves classifying the sentiment expressed in text at the document, sentence, and entity/aspect level. Automated sentiment analysis systems are needed to overcome biases and limitations in human analysis of large amounts of subjective text from sources like reviews and social media.

Uploaded by

Prakhar Gupta
Copyright © All Rights Reserved

Sentiment Analysis and Opinion Mining
Sentiment analysis or opinion mining is the computational
study of people's opinions, appraisals, attitudes, and
emotions toward entities, individuals, issues, events, topics
and their attributes.

Introduction
The task is technically challenging and practically useful.
• Businesses want to find public or consumer opinions about their products and services.
• Potential customers want to know the opinions of existing users before they use a service or purchase a product.

With user-generated content on social media (e.g., reviews, forum discussions, blogs and social networks) on the Web, individuals and organizations are increasingly using public opinions for their decision making.
Need for Automated Sentiment Analysis
Finding and monitoring opinion sites on the Web and distilling the information in them remains a formidable task because of the proliferation of diverse sites.

Each site typically contains a huge volume of opinionated text that is not always easily deciphered in long forum postings and blogs.

The average human reader will have difficulty identifying relevant sites and accurately summarizing the information and opinions contained in them.

Human analysis of text information is subject to considerable biases, e.g., people often pay greater attention to opinions consistent with their own preferences. People also have difficulty producing consistent results when the amount of information to be processed is large.

Automated opinion mining and summarization systems are therefore needed, as subjective biases and mental limitations can be overcome with an objective sentiment analysis system.
Levels of Analysis
• Sentiment analysis is carried out at three levels:
• Document level: The task is to classify whether a whole opinion document expresses a positive or negative sentiment.
  • Given a product review, the system determines whether the review expresses an overall positive or negative opinion about the product. This task is commonly known as document-level sentiment classification.
  • This level of analysis assumes that each document expresses opinions on a single entity (e.g., a single product).
  • It is not applicable to documents which evaluate or compare multiple entities.
Sentence level: The analysis goes to the sentences and determines whether each sentence expresses a positive, negative, or neutral opinion.

Neutral usually means no opinion.

This level distinguishes sentences that express factual information (objective sentences) from those that express subjective views and opinions (subjective sentences).

Subjectivity is not equivalent to sentiment, as many objective sentences can imply opinions, e.g., “We bought the car last month and the windshield wiper has fallen off.”

Analysis can also be done at the clause level, but even the clause level is not always enough, e.g., “Apple is doing very well in this lousy economy.”
Entity or Aspect Level: Document-level and sentence-level analyses do not discover what exactly people liked and did not like (feature level).

Instead of looking at language constructs (documents, paragraphs, sentences, clauses or phrases), aspect-level analysis directly looks at the opinion itself.

The idea is that an opinion consists of a sentiment (positive or negative) and a target.

An opinion without its target being identified is of limited use.

Realizing the importance of opinion targets also helps us understand the sentiment analysis problem better.

Example: “Although the service is not that great, I still love this restaurant.”
• The sentence has a positive tone, but we cannot say it is entirely positive.
• In fact, the sentence is positive about the restaurant (emphasized), but negative about its service (not emphasized).
Sentiment Lexicon
The most important indicators of sentiments are sentiment words, or opinion words. These are words that are commonly used to express positive or negative sentiments.

For example, good, wonderful, and amazing are positive sentiment words, and bad, poor, and terrible are negative sentiment words.

There are also phrases and idioms, e.g., cost someone an arm and a leg.

Sentiment words and phrases are instrumental to sentiment analysis.

A list of such words and phrases is called a sentiment lexicon (or opinion lexicon).

Researchers have designed numerous algorithms to compile such lexicons.
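A minimal sketch of how a sentiment lexicon is used for scoring, assuming a tiny hand-made lexicon (the word lists and the function name `lexicon_score` are illustrative, not from the slides): each positive word counts +1, each negative word counts −1, and the idiom is matched as a multi-word entry before single words.

```python
import re

# Toy lexicon (hypothetical): single words plus one negative idiom.
POSITIVE = {"good", "wonderful", "amazing"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIVE_PHRASES = {"an arm and a leg"}

def lexicon_score(text: str) -> int:
    """Return (#positive words) - (#negative words) - (#negative phrases)."""
    lowered = text.lower()
    score = 0
    # Match idioms first, then blank them out so their words are not counted again.
    for phrase in NEGATIVE_PHRASES:
        score -= lowered.count(phrase)
        lowered = lowered.replace(phrase, " ")
    for token in re.findall(r"[a-z']+", lowered):
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
    return score

print(lexicon_score("Good camera, wonderful lens."))                     # 2
print(lexicon_score("The screen is amazing but the battery is poor."))  # 0
print(lexicon_score("It cost me an arm and a leg."))                     # -1
```

Such plain counting ignores the issues discussed next (domain dependence, sarcasm, questions and conditionals), which is why real systems add rules or learning on top of the lexicon.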


Issues
• A positive or negative sentiment word may have opposite orientations in different application domains.
  • For example, “suck” usually indicates negative sentiment, e.g., “This battery sucks,” but it can also imply positive sentiment, e.g., “This vacuum cleaner really sucks (dirt).”
• Sarcastic sentences with or without sentiment words are hard to deal with, e.g., “What a great car! It stopped working in two days.”
  • Sarcasm is not very common in consumer reviews about products and services, but it is very common elsewhere, e.g., in political discussions.
A sentence containing sentiment words may not express any sentiment.

This is common in question (interrogative) sentences and conditional sentences:
• “Can you tell me which Sony camera is good?”
• “If I can find a good camera in the shop, I will buy it.”

These contain the sentiment word “good”, but do not express a positive or negative opinion on any specific camera.

However, not all conditional or interrogative sentences express no sentiment, e.g., “Does anyone know how to repair this terrible printer?” expresses a negative opinion about the printer.
Many sentences without sentiment words can also imply opinions.

Many of these are objective sentences that are used to express some factual information.
• “This washer uses a lot of water” implies a negative sentiment about the washer.
• “After sleeping on the mattress for two days, a valley has formed in the middle” expresses a negative opinion about the mattress.

These sentences are objective because they state facts, and they contain no sentiment words.
Entity: Definition
An entity e is a product, service, person, event, organization, or topic.

It is associated with a pair e : (T, W), where T is a hierarchy of components (or parts), sub-components, and so on, and W is a set of attributes of e.

Each component or sub-component also has its own set of attributes.

Example: Samsung Galaxy is an entity. It has a set of components, e.g., battery and screen, and a set of attributes, e.g., voice quality, size, and weight. The battery has its own set of attributes, e.g., battery life and battery size.
Entity and Attributes
An entity is represented as a tree or hierarchy. The root of the tree is the name of the entity.

Each non-root node is a component or sub-component of the entity.

Each link is a part-of relation.

Each node is associated with a set of attributes.

An opinion can be expressed on any node and on any attribute of the node.

Components and attributes together are called “aspects”.
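The tree model of an entity can be sketched as a small recursive data structure. The class `Node` and its `aspects` method are hypothetical names chosen for illustration; the Samsung Galaxy components and attributes are the ones from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in the entity hierarchy: the root is the entity, other nodes are parts."""
    name: str
    attributes: list[str] = field(default_factory=list)
    components: list["Node"] = field(default_factory=list)  # part-of links to children

    def aspects(self) -> list[str]:
        """Collect the attributes of this node plus every component below it."""
        result = list(self.attributes)
        for part in self.components:
            result.append(part.name)       # a component is an aspect
            result.extend(part.aspects())  # so are its attributes and sub-components
        return result

battery = Node("battery", attributes=["battery life", "battery size"])
screen = Node("screen")
galaxy = Node("Samsung Galaxy",
              attributes=["voice quality", "size", "weight"],
              components=[battery, screen])

print(galaxy.aspects())
# ['voice quality', 'size', 'weight', 'battery', 'battery life', 'battery size', 'screen']
```

Flattening the tree this way mirrors the slide's simplification: components and attributes are both treated as aspects that an opinion can target.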
Opinion
An opinion is a positive or negative sentiment, attitude, emotion or appraisal about an entity or an aspect of the entity from an opinion holder.

Positive, negative and neutral are called opinion orientations (also called sentiment orientations, semantic orientations, or polarities).
Opinion Mining
Objective: Given a collection of opinion documents D, discover all opinion quintuples in D.

1. Extract all entity expressions in D, and group synonymous entity expressions into entity clusters. Each entity expression cluster indicates a unique entity $e_i$.
2. Extract all aspect expressions of the entities, and group aspect expressions into clusters. Each aspect expression cluster of entity $e_i$ indicates a unique aspect $a_{ij}$.
3. Extract opinion holder and time information from the text or unstructured data.
4. Determine whether each opinion on an aspect is positive, negative or neutral.
5. Produce all opinion quintuples expressed in D based on the results of the above tasks.
Example of Extraction
bigXyz on Nov-4-2010: (1) I bought a Motorola phone and my girlfriend bought a Nokia phone yesterday. (2) We called each other when we got home. (3) The voice of my Moto phone was unclear, but the camera was good. (4) My girlfriend was quite happy with her phone, and its sound quality. (5) I want a phone with good voice quality. (6) So I probably will not keep it.

Quintuples:
• (Motorola, voice quality, negative, bigXyz, Nov-4-2010)
• (Motorola, camera, positive, bigXyz, Nov-4-2010)
• (Nokia, GENERAL, positive, bigXyz's girlfriend, Nov-4-2010)
• (Nokia, voice quality, positive, bigXyz's girlfriend, Nov-4-2010)
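The quintuple maps naturally onto a record type. A minimal sketch, using a hypothetical `Opinion` named tuple to hold the four quintuples above; once opinions are structured this way, summaries become simple queries.

```python
from collections import Counter
from typing import NamedTuple

class Opinion(NamedTuple):
    entity: str
    aspect: str        # "GENERAL" when the entity as a whole is evaluated
    orientation: str   # "positive", "negative" or "neutral"
    holder: str
    time: str

quintuples = [
    Opinion("Motorola", "voice quality", "negative", "bigXyz", "Nov-4-2010"),
    Opinion("Motorola", "camera", "positive", "bigXyz", "Nov-4-2010"),
    Opinion("Nokia", "GENERAL", "positive", "bigXyz's girlfriend", "Nov-4-2010"),
    Opinion("Nokia", "voice quality", "positive", "bigXyz's girlfriend", "Nov-4-2010"),
]

# Example query: count orientations per entity.
summary = Counter((q.entity, q.orientation) for q in quintuples)
print(summary[("Motorola", "negative")], summary[("Nokia", "positive")])  # 1 2
```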
Two More Definitions
• An objective sentence (sentences 1 and 2) presents some factual information about the world, while a subjective sentence expresses some personal feelings, views or beliefs.
• Subjective expressions come in many forms, e.g., opinions, allegations, desires, beliefs, suspicions, and speculations.
• A subjective sentence may not contain an opinion (sentence 5).
• Not every objective sentence is free of opinion. “The earphone broke in two days” is an objective sentence, but it implies a negative sentiment.
Emotions
• Emotions are our subjective feelings and thoughts.
• According to one widely used classification, there are 6 primary emotions, i.e., love, joy, surprise, anger, sadness, and fear, which can be sub-divided into many secondary and tertiary emotions. Each emotion can also have different intensities.
• The concepts of emotions and opinions are not equivalent.
  • Many opinion sentences express no emotion (e.g., “the voice of this phone is clear”); these are called rational evaluation sentences.
  • Many emotion sentences give no opinion (e.g., “I am so surprised to see you”).
Document-level sentiment classification
• Sentiment classification assumes that the opinion document d (e.g., a product review) expresses opinions on a single entity e and that the opinions are from a single opinion holder h.
• This assumption holds for customer reviews of products and services because each such review usually focuses on a single product and is written by a single reviewer.
Classification based on Supervised Learning
Three classes: positive, negative and neutral.

Since each review already has a reviewer-assigned rating (e.g., 1-5 stars), training and testing data are readily available.
• A review with 4 or 5 stars is a positive review, a review with 1 or 2 stars is a negative review, and a review with 3 stars is a neutral review.
• Naïve Bayesian classification and support vector machines (SVM) have been used.
• It was shown that using unigrams (a bag of individual words) as features in classification performed well with either naïve Bayesian or SVM classifiers.
• Note: the last two points refer to the shared PDF.
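A from-scratch sketch of this supervised setup: star ratings are turned into labels with the 4-5 / 3 / 1-2 rule, and a multinomial naïve Bayes classifier with add-one smoothing is trained on unigram counts. The four toy reviews and all function names are invented for illustration; a real system would train on many labeled reviews, usually with a library implementation.

```python
import math
import re
from collections import Counter, defaultdict

def stars_to_label(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment class."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"

def unigrams(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def train_nb(docs):
    """docs: list of (text, label). Returns priors, per-class word counts, vocabulary, #docs."""
    priors = Counter(label for _, label in docs)
    counts = defaultdict(Counter)
    for text, label in docs:
        counts[label].update(unigrams(text))
    vocab = {w for c in counts.values() for w in c}
    return priors, counts, vocab, len(docs)

def predict_nb(model, text: str) -> str:
    priors, counts, vocab, n_docs = model
    best, best_lp = None, -math.inf
    for label in priors:
        total = sum(counts[label].values())
        lp = math.log(priors[label] / n_docs)           # log prior
        for w in unigrams(text):
            if w in vocab:                              # unseen words are ignored
                # add-one (Laplace) smoothed log likelihood
                lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

reviews = [("great phone, love the camera", 5), ("excellent battery and great screen", 4),
           ("terrible battery, awful support", 1), ("poor screen and terrible sound", 2)]
model = train_nb([(text, stars_to_label(s)) for text, s in reviews])
print(predict_nb(model, "great camera"))     # positive
print(predict_nb(model, "terrible screen"))  # negative
```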
Feature Set for Classification
Terms and their frequency: individual words or word n-grams and their frequency counts.
• Word positions may also be important.
• The TF-IDF weighting scheme can also be applied.

Opinion words and phrases: used to express positive or negative sentiments.
• beautiful, wonderful, good, and amazing are positive opinion words, and bad, poor, and terrible are negative opinion words.
• Many opinion words are adjectives and adverbs, but nouns (rubbish, junk, and crap) and verbs (hate and like) can also indicate opinions.
• There are also opinion phrases and idioms, e.g., cost someone an arm and a leg. Opinion words and phrases are instrumental to sentiment analysis.
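A from-scratch sketch of one common TF-IDF variant: tf is the raw count of a term in a document and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants (smoothed idf, sublinear tf) exist; the choice here is an assumption, and the toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each tokenized document by tf * ln(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["great", "camera", "great", "lens"],
        ["poor", "camera"],
        ["great", "battery"]]
w = tf_idf(docs)
print(round(w[0]["great"], 3))  # 2 * ln(3/2) = 0.811
print(round(w[0]["lens"], 3))   # 1 * ln(3/1) = 1.099
```

Terms that occur in every document get weight ln(1) = 0, which is the intended effect: words that do not discriminate between documents carry no weight.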
Part of speech: adjectives are important indicators of opinions and are treated as special features.

Negations: negation words are important because their appearances often change the opinion orientation.
• “I don't like this camera” is negative.
• Negation words must be handled with care because not all occurrences of such words mean negation.
• “not” in “not only … but also” does not change the orientation direction.

Syntactic dependency: word dependency-based features generated from parsing or dependency trees.
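A minimal sketch of negation handling at the feature level. The scheme is an assumption (the slides do not specify one): tokens after a negation word get a `NOT_` prefix until the next punctuation mark, except when the negation is part of “not only … but also”. The negation list is a small illustrative subset.

```python
import re

NEGATIONS = {"not", "no", "never", "don't", "doesn't", "isn't", "cannot"}
PUNCTUATION = {".", ",", "!", "?", ";"}

def mark_negation(text: str) -> list[str]:
    """Prefix tokens in a negation's scope with NOT_; punctuation ends the scope."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    out, negating = [], False
    for i, tok in enumerate(tokens):
        if tok in PUNCTUATION:
            negating = False
            out.append(tok)
        elif tok in NEGATIONS:
            # "not only ... but also" does not flip the orientation.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            negating = not (tok == "not" and nxt == "only")
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out

print(mark_negation("I don't like this camera."))
# ['i', "don't", 'NOT_like', 'NOT_this', 'NOT_camera', '.']
print(mark_negation("It is not only cheap but also good."))
# ['it', 'is', 'not', 'only', 'cheap', 'but', 'also', 'good', '.']
```

With this encoding, `NOT_like` becomes a separate unigram feature from `like`, so a classifier can learn that it signals the opposite orientation.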
Classification – Unsupervised Learning
Three-Step Process

Step 1:
• Phrases containing adjectives or adverbs are extracted, as adjectives and adverbs are good indicators of opinions.
• Context is important: an “unpredictable” braking distance of a car (negative) vs. an “unpredictable” ending of a mystery movie (positive).
• The algorithm extracts two consecutive words, where one member of the pair is an adjective or adverb, and the other is a context word.
Step 2: Estimate the semantic orientation of the extracted phrases using the pointwise mutual information (PMI) measure:

$$\mathrm{PMI}(t_1, t_2) = \log_2 \frac{P(t_1 \wedge t_2)}{P(t_1)\,P(t_2)}$$

• PMI is a measure of the degree of statistical dependence between $t_1$ and $t_2$: the ratio compares the probability that the two terms co-occur with the probability of their co-occurrence if they were statistically independent, and the log of this ratio is the amount of information that we acquire about the presence of one of the words when we observe the other.
• The semantic/opinion orientation (SO) of a phrase is computed based on its association with the positive reference word “excellent” and its association with the negative reference word “poor”:

$$\mathrm{SO}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{excellent}) - \mathrm{PMI}(\mathit{phrase}, \text{poor})$$

• The probabilities are calculated by issuing queries to a search engine and collecting the number of hits. By searching for the two terms together and separately, we can estimate the probabilities.
Step 3: The algorithm computes the average SO of all phrases in a review, and classifies the review as recommended if the average SO is positive, and as not recommended otherwise.
• Final classification accuracies on reviews from various domains range from 84% for automobile reviews to 66% for movie reviews.
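The three steps can be sketched as follows. Subtracting the two PMI values makes $P(\mathit{phrase})$ and the total number of documents cancel, so SO becomes a single log ratio of hit counts. Assumptions: the hit counts are invented numbers standing in for search-engine queries, the POS tags are written by hand, step 1 keeps only one pattern (adjective followed by a noun), and a small constant `eps` is added to co-occurrence counts to avoid division by zero.

```python
import math

def extract_phrases(tagged):
    """Step 1 (simplified): adjective + noun bigrams from a POS-tagged review."""
    return [f"{w1} {w2}" for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
            if t1 == "JJ" and t2 in ("NN", "NNS")]

# Hypothetical hit counts standing in for search-engine queries.
HITS = {"excellent": 1_000_000, "poor": 800_000}
NEAR_HITS = {("beautiful pictures", "excellent"): 900, ("beautiful pictures", "poor"): 100,
             ("unclear voice", "excellent"): 20, ("unclear voice", "poor"): 300}

def semantic_orientation(phrase: str, eps: float = 0.01) -> float:
    """Step 2: SO = PMI(phrase, excellent) - PMI(phrase, poor) as one log2 ratio."""
    near_exc = NEAR_HITS.get((phrase, "excellent"), 0) + eps
    near_poor = NEAR_HITS.get((phrase, "poor"), 0) + eps
    return math.log2((near_exc * HITS["poor"]) / (near_poor * HITS["excellent"]))

def classify(tagged) -> str:
    """Step 3: average SO over the review's phrases."""
    phrases = extract_phrases(tagged)
    if not phrases:
        return "unknown"
    avg = sum(semantic_orientation(p) for p in phrases) / len(phrases)
    return "recommended" if avg > 0 else "not recommended"

review = [("beautiful", "JJ"), ("pictures", "NNS"), ("but", "CC"),
          ("unclear", "JJ"), ("voice", "NN")]
print(extract_phrases(review))  # ['beautiful pictures', 'unclear voice']
print(classify(review))         # not recommended
```

Here SO("beautiful pictures") is about +2.85 and SO("unclear voice") about −4.23, so the average is negative and the review is classified as not recommended.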

Advantage of document-level sentiment classification: it provides a prevailing opinion on an entity, topic or event.

Shortcomings:
• It does not give details on what people liked and/or disliked.
• It is not easily applicable to non-reviews, e.g., forum and blog postings, because many such postings evaluate multiple entities and compare them.
Sentence-level Sentiment Classification
Document-level sentiment classification techniques can also be applied to individual sentences.

Subjectivity classification: the task of classifying a sentence as subjective or objective.

The resulting subjective sentences are then classified as expressing positive or negative opinions.

For a sentence s:
1. Subjectivity classification: Determine whether s is a subjective sentence or an objective sentence.
2. Sentence-level sentiment classification: If s is subjective, determine whether it expresses a positive, negative or neutral opinion.
Assumption
The sentence expresses a single opinion from a single opinion holder.
• This assumption is only appropriate for simple sentences with a single opinion, e.g., “The picture quality of this camera is amazing.”
• In compound and complex sentences, a single sentence may express more than one opinion, e.g., “The picture quality of this camera is amazing and so is the battery life, but the viewfinder is too small for such a great camera.”
Opinion Lexicon Expansion
Opinion words: also known as opinion-bearing words or sentiment words.
• Positive opinion words are used to express some desired states, e.g., beautiful, wonderful, good, and amazing.
• Negative opinion words are used to express some undesired states, e.g., bad, poor, and terrible.

There are also opinion phrases and idioms, e.g., “cost someone an arm and a leg”.

Collectively, they are called the opinion lexicon, which is used for opinion mining.

Three approaches: manual, dictionary-based, and corpus-based.
• The manual approach is time-consuming and not usually used alone; it is combined with automated approaches as a final check, because automated methods make mistakes.
Dictionary-based Approach
Bootstrapping using a small set of seed opinion words and an online dictionary, e.g., WordNet.

The strategy is to first collect a small set of opinion words manually with known orientations, and then to grow this set by searching for their synonyms and antonyms.

The newly found words are added to the seed list and the next iteration starts. The iterative process stops when no more new words are found.

After the process completes, manual inspection can be carried out to remove and/or correct errors.
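A minimal sketch of the bootstrapping loop. The `SYNONYMS` and `ANTONYMS` tables are invented stand-ins for dictionary lookups (a real system would query WordNet); synonyms inherit a word's orientation, antonyms get the opposite one, and the loop stops when an iteration finds no new words.

```python
# Hypothetical stand-ins for dictionary (e.g., WordNet) synonym/antonym lookups.
SYNONYMS = {"good": ["fine", "great"], "great": ["excellent"],
            "bad": ["poor"], "poor": ["inferior"]}
ANTONYMS = {"good": ["bad"], "excellent": ["terrible"]}

def expand_lexicon(seeds: dict[str, int]) -> dict[str, int]:
    """Grow a seed lexicon (word -> +1/-1) until an iteration finds no new words."""
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:
        new_words = []
        for word in frontier:
            polarity = lexicon[word]
            # Synonyms keep the orientation; antonyms get the opposite one.
            candidates = ([(s, polarity) for s in SYNONYMS.get(word, [])]
                          + [(a, -polarity) for a in ANTONYMS.get(word, [])])
            for cand, pol in candidates:
                if cand not in lexicon:   # the first orientation found is kept
                    lexicon[cand] = pol
                    new_words.append(cand)
        frontier = new_words              # the next iteration starts from the new words
    return lexicon

print(expand_lexicon({"good": 1}))
# {'good': 1, 'fine': 1, 'great': 1, 'bad': -1, 'excellent': 1, 'poor': -1, 'terrible': -1, 'inferior': -1}
```

Keeping the first orientation found is a simplification; words reached with conflicting orientations are exactly the kind of error the manual inspection step is meant to catch.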
Shortcoming: The approach is unable to find opinion words with domain- and context-specific orientations, which are quite common.
• For example, for a speakerphone, if it is quiet, it is usually negative. However, for a car, if it is quiet, it is positive.

The corpus-based approach can help deal with this problem.
Corpus-based Approach
The methods rely on syntactic or co-occurrence patterns and also a seed list of opinion words to find other opinion words in a large corpus.

The technique starts with a list of seed opinion adjectives, and uses them and a set of linguistic constraints or conventions on connectives to identify additional adjective opinion words and their orientations.

Conjunction “AND”: conjoined adjectives usually have the same orientation.
• “This car is beautiful and spacious.”
• “This car is beautiful and difficult to drive” sounds unnatural, because AND is not usually used to join adjectives of opposite orientation.

Rules or constraints are also designed for other connectives: OR, BUT, EITHER-OR, and NEITHER-NOR.

This idea is called sentiment consistency.
Learning is applied to a large corpus to determine whether two conjoined adjectives are of the same or different orientations.

Same- and different-orientation links between adjectives are formed.

Clustering is performed on these links to produce two sets of words: positive and negative.

Inter-sentential consistency extends this idea to neighboring sentences: the same opinion orientation (positive or negative) is usually expressed in a few consecutive sentences.

Opinion changes are indicated by adversative expressions such as “but” and “however”.
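The sentiment-consistency idea can be sketched as propagation over conjunction links: an AND link means “same orientation” and a BUT link means “opposite orientation”. The conjoined pairs below are invented, and a simple breadth-first propagation from one seed adjective stands in for the learning and clustering step described above, which this sketch does not reproduce.

```python
from collections import deque

# (adjective1, connective, adjective2) triples as they might be mined from a corpus.
PAIRS = [("beautiful", "and", "spacious"), ("spacious", "and", "comfortable"),
         ("comfortable", "but", "noisy"), ("noisy", "and", "cramped")]

def propagate(seeds: dict[str, int]) -> dict[str, int]:
    """Assign +1/-1 to adjectives reachable from the seeds via AND/BUT links."""
    graph: dict[str, list[tuple[str, int]]] = {}
    for a, conj, b in PAIRS:
        sign = 1 if conj == "and" else -1   # AND keeps orientation, BUT flips it
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in labels:
                labels[neighbour] = labels[word] * sign
                queue.append(neighbour)
    return labels

print(propagate({"beautiful": 1}))
# {'beautiful': 1, 'spacious': 1, 'comfortable': 1, 'noisy': -1, 'cramped': -1}
```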
Orientations can differ in different contexts, even in the same domain. Digital camera:
• “The battery life is long” (+)
• “The time taken to auto-focus is long” (−)

Consider both possible opinion words and aspects together, and use the pair (aspect, opinion word) as the opinion context, e.g., (“battery life”, “long”).

This determines opinion words and their orientations together with the aspects that they modify.

This approach can also be used to analyze comparative sentences.

Many contexts can be more complex, consuming a large amount of resources.
Aspect-Based Sentiment Analysis

• In a typical opinionated document, the author writes about both positive and negative aspects of the entity, although the general sentiment on the entity may be positive or negative. Document-level and sentence-level sentiment classification do not provide such information.
• Aspect-based sentiment analysis is needed for this.
• At the aspect level, the mining objective is to discover every quintuple $(e_i, a_{ij}, oo_{ijkl}, h_k, t_l)$ in a given document d.
• To achieve the objective, five tasks need to be performed.
Aspect Extraction
Extract aspects that have been evaluated.
• In “The picture quality of this camera is amazing,” the aspect is “picture quality” of the entity represented by “this camera”. The evaluation is not about the camera as a whole, but about its picture quality.
• The sentence “I love this camera” evaluates the camera as a whole, i.e., the GENERAL aspect of the entity represented by “this camera”.

Whenever we talk about an aspect, we must know which entity it belongs to.
It is a two-step process.
1. Find frequent nouns and noun phrases.
• Nouns and noun phrases (or groups) are identified by a POS tagger; their frequencies are counted; and only the frequent ones are kept.
• The frequency threshold can be decided case by case.
• When people comment on different aspects of a product, the vocabulary that they use usually converges. The nouns that are frequently talked about are usually genuine and important aspects.
• Irrelevant contents in reviews are often diverse, i.e., they are quite different in different reviews. These are infrequent nouns.
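A minimal sketch of step 1. Assumptions: the reviews are already POS-tagged (tags are written by hand here), only single nouns are considered (no noun phrases), frequency is measured as the number of reviews mentioning a noun, and the threshold `min_reviews=2` is a hypothetical choice.

```python
from collections import Counter

reviews = [
    [("the", "DT"), ("battery", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("poor", "JJ"), ("battery", "NN"), ("but", "CC"), ("nice", "JJ"), ("screen", "NN")],
    [("screen", "NN"), ("is", "VBZ"), ("sharp", "JJ"), ("my", "PRP$"), ("cat", "NN")],
]

def frequent_aspects(tagged_reviews, min_reviews: int = 2) -> list[str]:
    """Keep nouns that occur in at least `min_reviews` different reviews."""
    doc_freq = Counter()
    for review in tagged_reviews:
        nouns = {w for w, tag in review if tag.startswith("NN")}
        doc_freq.update(nouns)  # each noun is counted once per review
    return sorted(n for n, c in doc_freq.items() if c >= min_reviews)

print(frequent_aspects(reviews))  # ['battery', 'screen']
```

The irrelevant noun “cat” appears in only one review and is dropped, which is the convergence argument above in miniature.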
2. Find infrequent aspects by exploiting the relationships between aspects and opinion words.
• The previous step can miss many genuine aspect expressions which are infrequent. This step tries to find some of them.
• The same opinion word can be used to describe or modify different aspects. Opinion words that modify frequent aspects can also modify infrequent aspects, and thus can be used to extract infrequent aspects.
• For example, “picture” has been found to be a frequent aspect, and we have the sentence “The pictures are absolutely amazing.” If we know that “amazing” is an opinion word, then “software” can also be extracted as an aspect from the sentence “The software is amazing.”
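Step 2 can be sketched in the same style. Assumptions (simplifications, not the original heuristic): adjectives in sentences that mention a frequent aspect are taken as opinion words, and in the remaining sentences the noun nearest to such an opinion word is taken as the aspect it modifies.

```python
def nearest_noun(tagged, idx):
    """Return the noun closest to position idx (ties go to the earlier word), or None."""
    nouns = [(abs(i - idx), i, w) for i, (w, t) in enumerate(tagged) if t.startswith("NN")]
    return min(nouns)[2] if nouns else None

def infrequent_aspects(sentences, frequent):
    # (a) Adjectives in sentences mentioning a frequent aspect become opinion words.
    opinion_words = {w for s in sentences if any(x in frequent for x, _ in s)
                     for w, t in s if t == "JJ"}
    # (b) In the other sentences, the noun nearest to such an opinion word is a new aspect.
    found = set()
    for s in sentences:
        if any(x in frequent for x, _ in s):
            continue
        for i, (w, _) in enumerate(s):
            if w in opinion_words:
                noun = nearest_noun(s, i)
                if noun is not None and noun not in frequent:
                    found.add(noun)
    return found

sentences = [
    [("the", "DT"), ("pictures", "NNS"), ("are", "VBP"), ("absolutely", "RB"), ("amazing", "JJ")],
    [("the", "DT"), ("software", "NN"), ("is", "VBZ"), ("amazing", "JJ")],
]
print(infrequent_aspects(sentences, frequent={"pictures"}))  # {'software'}
```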
• The pointwise mutual information (PMI) score between a candidate aspect phrase a and some meronymy* discriminators d associated with the product class can be used:

$$\mathrm{PMI}(a, d) = \frac{\mathrm{hits}(a \wedge d)}{\mathrm{hits}(a)\,\mathrm{hits}(d)}$$

• The meronymy discriminators for the “scanner” class are, e.g., “of scanner”, “scanner has”, “scanner comes with”, etc., which are used to find components or parts of scanners by searching the Web.
• If the PMI value of a candidate aspect is too low, it may not be a component of the product, because a and d do not co-occur frequently.

*Meronym: a term which denotes part of something but which is used to refer to the whole of it, e.g., faces when used to mean people in “I see several familiar faces present.”
*Meronymy: the relation in which a meronym denotes a constituent part of, or a member of, something.
Aspect Sentiment Classification
Determine whether the opinions on different aspects are positive, negative or neutral.
• In “The touch screen of this phone is absolutely amazing,” the aspect is “touch screen” of the entity represented by “this phone”. It does not indicate the GENERAL aspect, because the evaluation is not about the phone as a whole, but about its touch screen.
• “I am amazed by this phone” evaluates the phone as a whole, i.e., the GENERAL aspect of the entity represented by “this phone”.

In the first example, the opinion on the “touch screen” aspect is positive, and in the second example, the opinion on the GENERAL aspect is also positive.
Lexicon-based Approach
• Uses an opinion lexicon (a list of opinion words and phrases) and a set of rules to determine the orientations of opinions in a sentence.
• It also considers opinion shifters and “but”-clauses.

It involves 4 steps:
1. Mark opinion words and phrases: Given a sentence that contains one or more aspects, this step marks all opinion words and phrases in the sentence.
• Each positive word is assigned the opinion score of +1, and each negative word is assigned the opinion score of −1.
2. Handle opinion shifters: Opinion shifters are words and phrases that can shift or change opinion orientations.
• Negation words like not, never, none, nobody, nowhere, neither and cannot are the most common type.
• Sarcasm also changes orientation, e.g., “What a great car, it failed to start the very first day.”
• Spotting sarcasm and handling it correctly in actual sentences is not easy for an automated system.
• Not every appearance of an opinion shifter changes the opinion orientation, e.g., “not only … but also”.
3. Handle but-clauses:
• In English, “but” signals contrast.
• A sentence containing “but” is handled by applying the following rule: the opinion orientations before “but” and after “but” are opposite to each other if the opinion on one side cannot be determined.
• “not only … but also” needs to be handled separately.
• There are contrary words and phrases that do not always indicate an opinion change, e.g., “Audi is great, but Mercedes is better.” Such cases need to be identified and dealt with separately.
4. Aggregating opinions: This step applies an opinion aggregation function to the resulting opinion scores to determine the final orientation of the opinion on each aspect in the sentence.
• Consider a sentence s, which contains a set of aspects $\{a_1, \ldots, a_m\}$ and a set of opinion words or phrases $\{ow_1, \ldots, ow_n\}$ with their opinion scores. The opinion orientation for each aspect $a_i$ in s is

$$\mathrm{score}(a_i, s) = \sum_{ow_j \in s} \frac{ow_j.oo}{\mathrm{dist}(ow_j, a_i)}$$

• where $ow_j$ is an opinion word/phrase in s, $\mathrm{dist}(ow_j, a_i)$ is the distance between aspect $a_i$ and opinion word $ow_j$ in s, and $ow_j.oo$ is the opinion score of $ow_j$. Dividing by the distance gives lower weights to opinion words that are far away from aspect $a_i$.
• If the final score is positive, the opinion on aspect $a_i$ in s is positive; if it is negative, the opinion is negative; otherwise it is neutral.
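A minimal sketch of steps 1, 2 and 4 for a single tokenized sentence. Assumptions: a toy lexicon, distance measured as the gap in word positions, and a simplified shifter rule in which a negation word directly before an opinion word flips its score. The but-clause rule (step 3) is not implemented; in the example, distance weighting alone separates the two aspects.

```python
# Toy lexicon and negation list (illustrative assumptions).
LEXICON = {"amazing": 1, "great": 1, "good": 1, "poor": -1, "unclear": -1}
NEGATIONS = {"not", "never", "cannot"}

def aspect_scores(tokens: list[str], aspects: list[str]) -> dict[str, float]:
    """Score each aspect as sum_j ow_j.oo / dist(ow_j, a_i) over the sentence."""
    # Step 1: mark opinion words with +1 / -1.
    # Step 2 (simplified): a negation right before an opinion word flips its score.
    opinions = []
    for j, tok in enumerate(tokens):
        if tok in LEXICON:
            oo = LEXICON[tok]
            if j > 0 and tokens[j - 1] in NEGATIONS:
                oo = -oo
            opinions.append((j, oo))
    # Step 4: distance-weighted aggregation.
    scores = {}
    for aspect in aspects:
        i = tokens.index(aspect)
        scores[aspect] = sum(oo / abs(j - i) for j, oo in opinions if j != i)
    return scores

def orientation(score: float) -> str:
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

s = "the voice is unclear but the camera is good".split()
scores = aspect_scores(s, ["voice", "camera"])
print({a: orientation(v) for a, v in scores.items()})
# {'voice': 'negative', 'camera': 'positive'}
```

For “voice” the score is −1/2 + 1/7 ≈ −0.36, and for “camera” it is −1/3 + 1/2 ≈ +0.17.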
Simultaneous Opinion Lexicon Expansion and Aspect Extraction
• Needs an initial set of opinion word seeds as the input (no seed aspects).
• Opinions almost always have targets, and there are natural relations connecting opinion words and targets in a sentence.
• Opinion words have relations among themselves, and so do targets among themselves.
• The opinion targets are aspects. Opinion words can be recognized by identified aspects, and aspects can be identified by known opinion words.
• The extracted opinion words and aspects are utilized to identify new opinion words and new aspects, which are used again to extract more opinion words and aspects.
• Propagation stops when no more opinion words or aspects can be found.
Dependency Grammar
Dependency grammar was adopted to describe the relations. The algorithm uses only direct dependencies to model the relations.
• A direct dependency indicates that one word depends on the other word without any additional words in their dependency path, or that they both depend on a third word directly.

Some constraints are also imposed: opinion words are considered to be adjectives, and aspects to be nouns or noun phrases.
• In “Canon G3 produces great pictures”, the adjective “great” is parsed as directly depending on the noun “pictures”. Since “great” is an opinion word, given the rule ‘a noun on which an opinion word directly depends is taken as an aspect’, we can extract “pictures” as an aspect. Similarly, if “pictures” is known to be an aspect, “great” can be extracted as an opinion word using a similar rule.
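A toy sketch of the propagation loop, assuming the dependency parse is already available as (adjective, noun) pairs in which the adjective directly modifies the noun. Only the two rules illustrated above are implemented (opinion word to aspect, and aspect to opinion word); the full method uses more rules and relation types. The pair list is invented.

```python
# Direct "adjective modifies noun" dependencies, e.g. great -> pictures (invented data).
MOD_PAIRS = [("great", "pictures"), ("great", "lens"), ("sharp", "lens"),
             ("sharp", "screen"), ("red", "case")]

def double_propagation(seed_opinions: set[str]):
    """Alternately grow opinion words and aspects until nothing new is found."""
    opinions, aspects = set(seed_opinions), set()
    changed = True
    while changed:
        changed = False
        for adj, noun in MOD_PAIRS:
            if adj in opinions and noun not in aspects:
                aspects.add(noun)       # known opinion word -> new aspect
                changed = True
            if noun in aspects and adj not in opinions:
                opinions.add(adj)       # known aspect -> new opinion word
                changed = True
    return opinions, aspects

ops, asps = double_propagation({"great"})
print(sorted(ops), sorted(asps))
# ['great', 'sharp'] ['lens', 'pictures', 'screen']
```

Starting from the single seed “great”, the loop finds “lens”, which in turn yields the opinion word “sharp”, which yields the aspect “screen”; “red” and “case” are never reached.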
Mining Comparative Opinions
A comparative sentence expresses a relation based on similarities or differences of more than one entity. The comparison is usually conveyed using the comparative or superlative form of an adjective or adverb.
• A comparative sentence typically states that one entity has more or less of a certain attribute than another entity.
• A superlative sentence states that one entity has the most or least of a certain attribute among a set of similar entities.

A comparison can be between two or more entities, groups of entities, and one entity and the rest of the entities. It can also be between versions.
Types of Comparatives and Superlatives
• Comparatives are usually formed by adding the suffix “-er”, and superlatives by adding the suffix “-est”, to their base adjectives and adverbs.
  • e.g., “longer” in “The battery life of Camera-x is longer than that of Camera-y”, and “longest” in “The battery life of this camera is the longest”.
  • These comparatives and superlatives are called Type 1.
• Some adjectives and adverbs form comparatives or superlatives by placing words like more, most, less and least before them (e.g., more beautiful).
  • These are Type 2. Types 1 and 2 are called regular comparatives and superlatives.
• Irregular comparatives and superlatives, i.e., more, less, least, better, and best, are grouped under Type 1 based on their behavior.
• Words like “superior” and “preferred” are also grouped under Type 1.
Types of Comparative Relations
Four types:
1. Non-equal gradable comparisons: relations of the type “greater or less than” that express an ordering of some entities with regard to some of their shared aspects.
• “The Intel chip is faster than that of AMD”, “I prefer Intel to AMD”.
2. Equative comparisons: relations of the type “equal to” that state that two or more entities are equal with regard to some of their shared aspects.
• “The performance of Samsung is about the same as that of LG.”
3. Superlative comparisons: relations of the type “greater or less than all others” that rank one entity over all others.
• “The Intel chip is the fastest.”
• Comparative words used in non-equal gradable comparisons are categorized into two groups according to whether they express increased or decreased quantities, which is useful in opinion analysis.
  • Increasing comparatives: such a comparative expresses an increased quantity, e.g., more and longer.
  • Decreasing comparatives: such a comparative expresses a decreased quantity, e.g., less and fewer.
4. Non-gradable comparisons: relations that compare aspects of two or more entities, but do not grade them. There are three main sub-types:
• Entity A is similar to or different from entity B with regard to some of their shared aspects, e.g., “Coke tastes differently from Pepsi.”
• Entity A has aspect a1, and entity B has aspect a2 (they are usually substitutable), e.g., “Desktop PCs use external speakers but laptops use internal speakers.”
• Entity A has aspect a, but entity B does not, e.g., “Phone-x has an earphone, but Phone-y does not have.”
Objective of mining comparative opinions

• Given a collection of opinionated documents D, discover in D all comparative opinion sextuples of the form $(E_1, E_2, A, PE, h, t)$, where:
  • $E_1$ and $E_2$ are the entity sets being compared based on their shared aspects A (entities in $E_1$ appear before entities in $E_2$ in the sentence),
  • $PE \in \{E_1, E_2\}$ is the preferred entity set of the opinion holder h,
  • t is the time when the comparative opinion is expressed.
Example
• These sextuples can be mined from text, e.g., “Ipad's display is better than those of Galaxy and Surface,” written by Vish in Feb 2016.
• The extracted comparative opinion is: ({Ipad}, {Galaxy, Surface}, {display}, preferred: {Ipad}, Vish, Feb 2016)
  • The entity set $E_1$ is {Ipad}, and the entity set $E_2$ is {Galaxy, Surface}.
  • Their shared aspect set A being compared is {display}.
  • The preferred entity set is {Ipad}.
  • The opinion holder h is Vish.
  • The time t when this comparative opinion was written is Feb 2016.
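The sextuple can be stored the same way as an opinion quintuple. A minimal sketch with a hypothetical `ComparativeOpinion` record holding the example above; frozensets let the entity sets be compared regardless of order, and the check enforces that PE is one of the two compared sets.

```python
from typing import NamedTuple

class ComparativeOpinion(NamedTuple):
    e1: frozenset[str]          # entities mentioned first
    e2: frozenset[str]          # entities mentioned second
    aspects: frozenset[str]     # shared aspects A being compared
    preferred: frozenset[str]   # PE, either e1 or e2
    holder: str
    time: str

op = ComparativeOpinion(frozenset({"Ipad"}), frozenset({"Galaxy", "Surface"}),
                        frozenset({"display"}), frozenset({"Ipad"}), "Vish", "Feb 2016")
assert op.preferred in (op.e1, op.e2)   # PE must be one of the compared sets
less_preferred = op.e2 if op.preferred == op.e1 else op.e1
print(sorted(less_preferred))  # ['Galaxy', 'Surface']
```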
Case: Sentiment Analysis – Hybrid Approach
Rule-based classification, supervised learning and machine learning were combined to form a hybrid method.

It was tested on movie reviews, product reviews and MySpace comments.

Hybrid classification can improve the classification effectiveness in terms of micro- and macro-averaged F1.

F1 is a measure that takes both the precision and the recall of a classifier into account.
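Evaluation Metrics

                  Machine says yes   Machine says no
Human says yes    tp                 fn
Human says no     fp                 tn

$$P = \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn}, \qquad A = \frac{tp + tn}{tp + tn + fp + fn}, \qquad F_1 = \frac{2PR}{P + R}$$
where P is precision, R is recall and A is accuracy.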
1. Micro averaging. Given a set of confusion tables, a new two-by-two contingency table is generated. Each cell in the new table represents the sum of the number of documents in the corresponding cell across the set of tables. Given the new table, the average performance of an automatic classifier, in terms of its precision and recall, is measured (precision and recall are computed once from the pooled counts).
2. Macro averaging. Given a set of confusion tables, a set of values is generated. Each value represents the precision or recall of an automatic classifier computed from one table. Given these values, the average performance of an automatic classifier, in terms of its precision and recall, is measured (macro-averaged precision and recall are the means of the per-table values).
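The difference between the two averaging schemes is easiest to see side by side. In this minimal Python sketch each confusion table is a hypothetical `(tp, fp, fn, tn)` tuple; the zero-denominator convention is assumed, and macro F1 is taken here as the harmonic mean of macro precision and macro recall, which is one of two conventions in common use (the other averages per-table F1 values).

```python
from typing import Iterable, Tuple

# Each confusion table is (tp, fp, fn, tn) for one category.
Table = Tuple[int, int, int, int]


def _pr(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    # Assumption: a zero denominator yields 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def _f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_average(tables: Iterable[Table]) -> Tuple[float, float, float]:
    # Pool the tables cell by cell into one new contingency table.
    tp = fp = fn = 0
    for t_tp, t_fp, t_fn, _ in tables:
        tp += t_tp
        fp += t_fp
        fn += t_fn
    p, r = _pr(tp, fp, fn)
    return p, r, _f1(p, r)


def macro_average(tables: Iterable[Table]) -> Tuple[float, float, float]:
    # Compute precision and recall per table, then take their means.
    pairs = [_pr(tp, fp, fn) for tp, fp, fn, _ in tables]
    p = sum(pr[0] for pr in pairs) / len(pairs)
    r = sum(pr[1] for pr in pairs) / len(pairs)
    return p, r, _f1(p, r)


# Hypothetical tables: one frequent category and one rare category.
tables = [(8, 2, 2, 88), (1, 1, 3, 95)]
print(micro_average(tables))
print(macro_average(tables))
```

Micro averaging is dominated by the frequent category (precision 0.75), whereas macro averaging gives the rare, poorly classified category equal weight (precision 0.65).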
Rule Based Classification
• A rule consists of an antecedent and its associated consequent that are in an ‘if-then’ relation: antecedent → consequent.
• An antecedent is a condition: one or more tokens concatenated by the ^ operator.
• A token can be a word, ‘?’ representing a proper noun, or ‘#’ representing a target term.
• A target term is a term that represents the context in which a set of documents occurs, such as the name of a person, a policy recommendation, a company name, a brand of a product or a movie title.
• A consequent represents a sentiment that is either positive or negative, and is the result of meeting the condition defined by the antecedent.
• {token_1 ^ token_2 ^ ... ^ token_n} → {+|−}, where + is positive sentiment and − is negative sentiment.
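A rule of this shape can be represented as a token tuple plus a consequent. The Python sketch below is illustrative only: the `Rule` class, the `matches` and `classify` helpers, and the rule {# ^ not ^ worth} → {−} are hypothetical, and the matching semantics (antecedent tokens must appear in order, possibly with other words between them) are an assumption, since the slides do not say how ^ is evaluated.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    antecedent: tuple  # tokens joined by ^, e.g. ("#", "not", "worth")
    consequent: str    # "+" or "-"


def matches(rule: Rule, tokens: Sequence[str]) -> bool:
    # Assumption: the antecedent is met when its tokens occur in the
    # sentence in the same order, possibly with other words in between.
    remaining = iter(tokens)
    return all(tok in remaining for tok in rule.antecedent)


def classify(rules: Sequence[Rule], tokens: Sequence[str]) -> Optional[str]:
    # Return the consequent of the first rule whose antecedent is met.
    for rule in rules:
        if matches(rule, tokens):
            return rule.consequent
    return None


# Hypothetical rule {# ^ not ^ worth} -> {-}; '#' stands for the target term.
rules = [Rule(("#", "not", "worth"), "-")]
print(classify(rules, ["#", "is", "not", "worth", "the", "money"]))  # -
```

The `tok in remaining` idiom consumes the iterator as it searches, which is what makes the check order-sensitive.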
Comparative Statements
1. Laptop-A is more expensive than Laptop-B.
2. Laptop-A is more expensive than Laptop-C.
• When the target word of these sentences is Laptop-A, the rule derived is:
  {# ^ more ^ expensive ^ than ^ ?} → {−}
• The target, Laptop-A, is less favorable than the other two laptops because of its price. The focus is on the price attribute of Laptop-A.
• When the target words are Laptop-B and Laptop-C, the rule derived is:
  {? ^ more ^ expensive ^ than ^ #} → {+}
• The two targets, Laptop-B and Laptop-C, are more favorable than Laptop-A because of their price. The focus is on the price attribute of both Laptop-B and Laptop-C.
• The target word is therefore a crucial factor in determining the sentiment of an antecedent: the same words yield opposite sentiments depending on where # is placed.
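The effect of the target word can be reproduced by abstracting a sentence differently for each target and then applying the two rules from the slide. In this Python sketch, recognising proper nouns through a supplied entity list is an assumption standing in for real proper-noun detection, and the helper names are hypothetical; in-order matching of antecedent tokens is likewise assumed.

```python
import re
from typing import List, Optional, Set

# The two rules from the slide: (antecedent tokens, consequent).
RULES = [
    (("#", "more", "expensive", "than", "?"), "-"),
    (("?", "more", "expensive", "than", "#"), "+"),
]


def abstract_tokens(sentence: str, target: str, entities: Set[str]) -> List[str]:
    # Replace the target with '#' and other known entity names with '?'.
    # Assumption: proper nouns are recognised via the given entity list.
    tokens = []
    for tok in re.findall(r"[\w-]+", sentence):
        if tok == target:
            tokens.append("#")
        elif tok in entities:
            tokens.append("?")
        else:
            tokens.append(tok.lower())
    return tokens


def is_subsequence(pattern, tokens) -> bool:
    # True when the pattern tokens occur in order within tokens.
    remaining = iter(tokens)
    return all(p in remaining for p in pattern)


def sentiment_towards(sentence: str, target: str, entities: Set[str]) -> Optional[str]:
    tokens = abstract_tokens(sentence, target, entities)
    for antecedent, consequent in RULES:
        if is_subsequence(antecedent, tokens):
            return consequent
    return None


ENTITIES = {"Laptop-A", "Laptop-B", "Laptop-C"}
sentence = "Laptop-A is more expensive than Laptop-B."
print(sentiment_towards(sentence, "Laptop-A", ENTITIES))  # -
print(sentiment_towards(sentence, "Laptop-B", ENTITIES))  # +
```

The sentence is identical in both calls; only the position of # changes, and with it the rule that fires.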
General Inquirer Based Classifier (GIBC)

• The first, simplest rule set was based on 3672 pre-classified words found in the General Inquirer Lexicon (Stone et al. 1966), 1598 of which were pre-classified as positive and 2074 as negative.
• Here, each rule depends solely on one sentiment-bearing word representing an antecedent.
• A General Inquirer Based Classifier (GIBC) was implemented, which applied the rule set to classify document collections.
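Because every GIBC rule has a single-word antecedent, the rule set reduces to a word-to-consequent lookup. The Python sketch below uses only the six example words named on these slides rather than the full 3672-word lexicon, and the document-level decision (majority of fired rules, abstaining on a tie) is an assumption; the slides do not state how GIBC combines rules within a document.

```python
import re
from collections import Counter
from typing import Dict, Optional

# A handful of words from the examples on these slides; the real rule set has 3672 entries.
GI_RULES: Dict[str, str] = {
    "amazing": "+", "awesome": "+", "beautiful": "+",
    "absurd": "-", "angry": "-", "anguish": "-",
}


def gibc_classify(document: str, rules: Dict[str, str] = GI_RULES) -> Optional[str]:
    # Each rule is one sentiment-bearing word mapped to its consequent.
    tokens = re.findall(r"[a-z]+", document.lower())
    votes = Counter(rules[t] for t in tokens if t in rules)
    # Assumption: the document takes the majority consequent; ties abstain.
    if votes["+"] > votes["-"]:
        return "+"
    if votes["-"] > votes["+"]:
        return "-"
    return None


print(gibc_classify("An amazing, beautiful phone."))  # +
```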
Calculation of “Closeness”
1. Select 120 positive words, such as amazing, awesome, beautiful, and 120 negative words, such as absurd, angry, anguish, from the General Inquirer Lexicon.
2. Compose 240 search engine queries per antecedent; each query combines an antecedent and a sentiment-bearing word.
3. Collect the hit counts of all queries by using the Google and Yahoo search engines. Two search engines were used to determine whether the hit counts were influenced by the coverage and accuracy level of a single search engine. For each query, the search engines return the hit count, i.e., the number of Web pages that contain both the antecedent and a sentiment-bearing word. The proximity of the antecedent and the word is therefore measured at the page level.
• A better level of precision may be obtained if the proximity checking can be carried out at the sentence level.
• This would lead to an ethical issue, however, because each page has to be downloaded and stored locally for further analysis.
4. Collect the hit counts of each sentiment-bearing word and each antecedent.
5. Use four closeness measures to measure the closeness between each antecedent and the 120 positive words (S+) and between each antecedent and the 120 negative words (S−), based on all the hit counts collected.
• If the antecedent co-occurs more frequently with the 120 positive words (S+ > S−), then the antecedent has a positive consequent, and vice versa.
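Step 5 can be sketched with the simplest measure, Document Frequency, taking pair hit counts as already collected. In this Python sketch the hit counts and the antecedent "cheap" are hypothetical, and summing per-word DF values into S+ and S− is an assumption; the slides do not say how the 120 per-word closeness values are aggregated.

```python
from typing import Dict, Iterable, Optional, Tuple

# (antecedent, sentiment-bearing word) -> number of pages containing both.
PairHits = Dict[Tuple[str, str], int]


def df_closeness(antecedent: str, words: Iterable[str], pair_hits: PairHits) -> int:
    # DF closeness to a word list, aggregated here by summing (an assumption).
    return sum(pair_hits.get((antecedent, w), 0) for w in words)


def derive_consequent(antecedent: str, positive_words, negative_words,
                      pair_hits: PairHits) -> Optional[str]:
    s_pos = df_closeness(antecedent, positive_words, pair_hits)  # S+
    s_neg = df_closeness(antecedent, negative_words, pair_hits)  # S-
    if s_pos > s_neg:
        return "+"
    if s_neg > s_pos:
        return "-"
    return None  # equal closeness: no rule is derived


# Hypothetical hit counts for the antecedent "cheap".
hits = {
    ("cheap", "amazing"): 120, ("cheap", "awesome"): 80,
    ("cheap", "absurd"): 40, ("cheap", "angry"): 30,
}
print(derive_consequent("cheap", ["amazing", "awesome"], ["absurd", "angry"], hits))  # +
```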
Measures of Closeness
• Document Frequency (DF) counts the number of Web pages containing a pair of an antecedent and a sentiment-bearing word, i.e., the hit count returned by a search engine. The larger the DF value, the greater the association strength between the antecedent and the word.
• The other measures of closeness are:
  • Mutual Information (MI)
  • Chi-Square (χ²)
  • Log Likelihood Ratio
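The three remaining measures can all be computed from a two-by-two table of page counts built from the hit counts in steps 3 and 4. The Python sketch below uses the standard textbook forms (pointwise mutual information, Pearson's χ², and Dunning's G² log-likelihood ratio); the exact formulas used in the original study are not shown on this slide. It also assumes the total number of indexed pages `n` is known, which search engines do not report, so in practice `n` would have to be estimated.

```python
import math
from typing import Tuple


def contingency(hits_a: int, hits_w: int, hits_aw: int, n: int) -> Tuple[int, int, int, int]:
    # Page counts: (a & w, a & not w, not a & w, neither).
    return hits_aw, hits_a - hits_aw, hits_w - hits_aw, n - hits_a - hits_w + hits_aw


def pmi(hits_a: int, hits_w: int, hits_aw: int, n: int) -> float:
    # log2 of P(a, w) / (P(a) P(w)); assumes hits_aw > 0.
    return math.log2(n * hits_aw / (hits_a * hits_w))


def chi_square(hits_a: int, hits_w: int, hits_aw: int, n: int) -> float:
    o11, o12, o21, o22 = contingency(hits_a, hits_w, hits_aw, n)
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return num / den


def log_likelihood_ratio(hits_a: int, hits_w: int, hits_aw: int, n: int) -> float:
    o11, o12, o21, o22 = contingency(hits_a, hits_w, hits_aw, n)
    observed = ((o11, o12), (o21, o22))
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # 0 * log 0 is taken as 0
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2 * g2


# Hypothetical counts: antecedent on 100 pages, word on 200, both on 40, of 1000 pages.
print(pmi(100, 200, 40, 1000))  # 1.0
```

All three scores are zero when the antecedent and the word occur independently; higher values indicate a stronger association.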
Classifiers Used
• General Inquirer Based Classifier (GIBC)
• Rule-Based Classifier (RBC)
• Statistics Based Classifier (SBC)
• Mutual Information (MI)
• Chi-Square (χ²)
• Induction Rule Based Classifier (IRBC)
• Support Vector Machines
Multi-stage Hybrid Models
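One way to chain several classifiers into a multi-stage hybrid is a cascade: each stage either returns a label or abstains, and abstained texts pass to the next stage. This Python sketch shows only that control flow; the two stage functions are hypothetical stand-ins rather than the classifiers listed above, and a trained machine-learning model such as an SVM could occupy the final slot.

```python
from typing import Callable, Optional, Sequence

# A stage returns "+" or "-", or None to abstain and pass the text on.
Stage = Callable[[str], Optional[str]]


def cascade(stages: Sequence[Stage], text: str, default: Optional[str] = None) -> Optional[str]:
    # Earlier stages take priority; the first non-None label wins.
    for stage in stages:
        label = stage(text)
        if label is not None:
            return label
    return default


# Hypothetical stand-ins for a rule-based stage and a lexicon stage.
def rule_stage(text: str) -> Optional[str]:
    return "-" if "not worth" in text.lower() else None


def lexicon_stage(text: str) -> Optional[str]:
    return "+" if "great" in text.lower() else None


stages = [rule_stage, lexicon_stage]
print(cascade(stages, "Great screen, but not worth the price"))  # -
```

Placing high-precision stages (such as hand-written rules) first lets them decide the cases they cover, while later, broader stages handle the rest.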
Steps for Implementation

1. Goal Setting
   • Determine the sentiment analysis goal
   • Determine the scope for text content
2. Text Preprocessing
   • Determine the data source (Web / microblogging site / etc.)
   • Load the text into the processing system
   • Delete unwanted or meaningless words
   • Convert emotional symbols (emoticons) into words
3. Parsing the content
   • Word segmentation
   • Parts-of-speech tagging
   • Term identification
4. Text Refinement
   • Stop words
   • Synonyms
5. Analysis & Scoring
   • Determine the sentiment-bearing phrases
   • Score them
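Steps 2 to 5 can be strung together into a toy pipeline. In this Python sketch the emoticon map, stop-word list, synonym table and scoring lexicon are all hypothetical placeholders, scoring is done per word rather than per phrase, and part-of-speech tagging and term identification are omitted for brevity.

```python
import re
from typing import List

# Hypothetical resources; a real system would use curated lists.
EMOTICONS = {":)": "happy", ":(": "sad"}
STOP_WORDS = {"the", "a", "an", "is", "this", "and"}
SYNONYMS = {"superb": "great", "awful": "bad"}
LEXICON = {"great": 1, "happy": 1, "bad": -1, "sad": -1}


def preprocess(text: str) -> str:
    # Step 2: convert emoticons into words, then normalise case.
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    return text.lower()


def segment(text: str) -> List[str]:
    # Step 3: word segmentation (POS tagging omitted in this sketch).
    return re.findall(r"[a-z']+", text)


def refine(tokens: List[str]) -> List[str]:
    # Step 4: drop stop words and map synonyms to a canonical form.
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def score(tokens: List[str]) -> int:
    # Step 5: sum lexicon scores of sentiment-bearing words.
    return sum(LEXICON.get(t, 0) for t in tokens)


def analyse(text: str) -> int:
    return score(refine(segment(preprocess(text))))


print(analyse("This phone is superb :)"))  # 2
print(analyse("Awful battery :("))  # -2
```

A positive total suggests positive sentiment and a negative total negative sentiment; mapping synonyms before scoring lets a small lexicon cover more vocabulary.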