4 - Sentiment Analysis - Plain
Opinion Mining
Sentiment analysis or opinion mining is the computational
study of people's opinions, appraisals, attitudes, and
emotions toward entities, individuals, issues, events, topics
and their attributes.
Introduction
Sentiment analysis is both challenging and practically useful.
• Potential customers want to know the opinions of existing users before they use a service or purchase a product.
Each site typically contains a huge volume of opinionated text that is not
always easily deciphered in long forum postings and blogs.
The average human reader will have difficulty identifying relevant sites and
accurately summarizing the information and opinions contained in them.
Need for Automated Sentiment Analysis
Human analysis of text information is subject to considerable biases, e.g., people often pay greater attention to opinions consistent with their own preferences. People also have difficulty producing consistent results when the amount of information to be processed is large.
Automated opinion mining and summarization systems are needed, as
subjective biases and mental limitations can be overcome with an objective
sentiment analysis system.
Levels of Analysis
Sentiment analysis is carried out at three levels:
Document level: The task is to classify whether a whole opinion document expresses a positive or negative sentiment. Given a product review, the system determines whether the review expresses an overall positive or negative opinion about the product. This task is commonly known as document-level sentiment classification.
Sentence level: Analysis is done at the sentence or clause level, but even the clause level is still not enough, e.g., "Apple is doing very well in this lousy economy."
Entity or Aspect level: Document-level and sentence-level analyses do not discover what exactly people liked and did not like (feature level).
Sentiment Lexicon
For example, good, wonderful, and amazing are positive sentiment words, and bad, poor, and terrible are negative sentiment words. There are also phrases and idioms, e.g., cost someone an arm and a leg.
Sentiment words and phrases are instrumental to sentiment analysis. A list of such words and phrases is called a sentiment lexicon (or opinion lexicon).
Issues
Some sentences contain the sentiment word "good" but do not express a positive or negative opinion on any specific camera. Many of these are objective sentences that are used to express factual information.
Conversely, an objective sentence can imply an opinion: "After sleeping on the mattress for two days, a valley has formed in the middle" expresses a negative opinion about the mattress.
Entity and Attributes
Entity: an entity e is represented as a pair e: (T, W), where T is a hierarchy of components (or parts) and sub-components, and W is a set of attributes of e. Each component or sub-component also has its own set of attributes.
Each node is associated with a set of attributes. An opinion can be expressed on any node and on any attribute of the node.
Opinion
Positive, negative and neutral are called opinion
orientations (also called sentiment orientations,
semantic orientations, or polarities).
Opinion Mining Tasks
Objective: Given a collection of opinion documents D, discover all opinion quintuples in D.
1. Extract all entity expressions in D, and group synonymous entity expressions into entity clusters. Each entity expression cluster indicates a unique entity ei.
2. Extract all aspect expressions of the entities, and group aspect expressions into clusters. Each aspect expression cluster of entity ei indicates a unique aspect aij.
3. Extract opinion holder and time information from the text or unstructured data.
4. Determine whether each opinion on an aspect is positive, negative or neutral.
5. Produce all opinion quintuples (entity, aspect, orientation, holder, time) expressed in D based on the results of the above tasks.
bigXyz on Nov-4-2010:(1) I bought a Motorola phone and
my girlfriend bought a Nokia phone yesterday. (2) We called
each other when we got home. (3) The voice of my Moto
phone was unclear, but the camera was good. (4) My
girlfriend was quite happy with her phone, and its sound
quality. (5) I want a phone with good voice quality. (6) So I
probably will not keep it.
Example of Quintuple Extraction
(Motorola, voice quality, negative, bigXyz, Nov-4-2010)
(Motorola, camera, positive, bigXyz, Nov-4-2010)
(Nokia, GENERAL, positive, bigXyz's girlfriend, Nov-4-2010)
(Nokia, voice quality, positive, bigXyz's girlfriend, Nov-4-2010)
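As a small illustration of the quintuple structure as data (the field names and the namedtuple representation are assumptions for illustration; the slide only lists the tuples themselves):

```python
from collections import Counter, namedtuple

# Opinion quintuple: (entity, aspect, orientation, holder, time)
Quintuple = namedtuple("Quintuple", ["entity", "aspect", "orientation", "holder", "time"])

quintuples = [
    Quintuple("Motorola", "voice quality", "negative", "bigXyz", "Nov-4-2010"),
    Quintuple("Motorola", "camera", "positive", "bigXyz", "Nov-4-2010"),
    Quintuple("Nokia", "GENERAL", "positive", "bigXyz's girlfriend", "Nov-4-2010"),
    Quintuple("Nokia", "voice quality", "positive", "bigXyz's girlfriend", "Nov-4-2010"),
]

# Example use: summarize orientations per entity.
by_entity = {}
for q in quintuples:
    by_entity.setdefault(q.entity, Counter())[q.orientation] += 1
print(by_entity)
# e.g. {'Motorola': Counter({'negative': 1, 'positive': 1}), 'Nokia': Counter({'positive': 2})}
```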
Two More Definitions
An objective sentence (e.g., sentences 1 and 2 above) presents some factual information about the world, while a subjective sentence expresses some personal feelings, views or beliefs.
Subjective expressions come in many forms, e.g., opinions, allegations, desires, beliefs, suspicions, and speculations.
A subjective sentence may not contain an opinion (e.g., sentence 5).
Conversely, an objective sentence can still imply an opinion: "the earphone broke in two days" is an objective sentence, but it implies a negative sentiment.
Emotions
Emotions are our subjective feelings and thoughts. There are six primary emotions, i.e., love, joy, surprise, anger, sadness, and fear, which can be sub-divided into many secondary and tertiary emotions. Each emotion can also have different intensities.
The concepts of emotions and opinions are not equivalent.
Many opinion sentences express no emotion (e.g., "the voice of this phone is clear"); these are called rational evaluation sentences.
Many emotion sentences give no opinion (e.g., "I am so surprised to see you").
Document-level Sentiment Classification
Sentiment classification assumes that the opinion document d (e.g., a product review) expresses opinions on a single entity e and that the opinions are from a single opinion holder h.
This assumption holds for customer reviews of products and services because each such review usually focuses on a single product and is written by a single reviewer.
Three classes are used: positive, negative and neutral.
Sentiment Classification – Unsupervised Learning
Step 1:
• Phrases containing adjectives or adverbs are extracted, as adjectives and adverbs are good indicators of opinions.
• Context is important: the "unpredictable" braking distance of a car is negative, whereas the "unpredictable" ending of a mystery movie is positive.
• The algorithm extracts two consecutive words, where one member of the pair is an adjective or adverb and the other is a context word.
Step 2: Estimate the semantic orientation of the extracted phrases using the pointwise mutual information (PMI) measure.
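A minimal sketch of this estimate in code, following the usual hit-count formulation of the PMI-based semantic orientation (the reference words "excellent" and "poor" and the NEAR-query counts are assumptions for illustration; the slide itself does not give the formula):

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor):
    """PMI-style semantic orientation from search hit counts:
    SO(phrase) = log2( hits(phrase NEAR excellent) * hits(poor)
                       / (hits(phrase NEAR poor) * hits(excellent)) ).
    A small constant avoids division by zero for rare phrases."""
    eps = 0.01
    return math.log2(((hits_near_excellent + eps) * (hits_poor + eps)) /
                     ((hits_near_poor + eps) * (hits_excellent + eps)))

# A review is then classified as positive if the average SO of its
# extracted phrases is positive, and negative otherwise.
print(semantic_orientation(hits_near_excellent=2500, hits_near_poor=300,
                           hits_excellent=1_000_000, hits_poor=800_000))  # > 0 -> positive phrase
```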
This approach has limitations:
• It does not give details on what people liked and/or disliked.
• It is not easily applicable to non-reviews, e.g., forum and blog postings, because many such postings evaluate multiple entities and compare them.
Sentence-level Sentiment Classification
Document-level sentiment classification techniques can also be applied to individual sentences.
1. Subjectivity classification: Determine whether a sentence s is a subjective sentence or an objective sentence.
2. Sentence-level sentiment classification: If s is subjective, determine whether it expresses a positive or negative opinion.
Dictionary-based Approach
A small seed set of opinion words is collected manually and then grown by searching a dictionary (such as WordNet) for their synonyms and antonyms. The newly found words are added to the seed list and the next iteration starts. The iterative process stops when no more new words are found.
After the process completes, manual inspection can be carried out to remove and/or correct errors.
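A minimal sketch of this bootstrapping loop, assuming NLTK's WordNet interface is used as the dictionary (the seed words and iteration cap are illustrative, not part of the slide):

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def expand_lexicon(pos_seeds, neg_seeds, max_iters=10):
    """Grow positive/negative lexicons via WordNet synonyms and antonyms."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(max_iters):
        new_pos, new_neg = set(), set()
        for word in pos:
            for syn in wn.synsets(word):
                for lemma in syn.lemmas():
                    new_pos.add(lemma.name())        # synonyms keep the orientation
                    for ant in lemma.antonyms():
                        new_neg.add(ant.name())      # antonyms flip the orientation
        for word in neg:
            for syn in wn.synsets(word):
                for lemma in syn.lemmas():
                    new_neg.add(lemma.name())
                    for ant in lemma.antonyms():
                        new_pos.add(ant.name())
        if new_pos <= pos and new_neg <= neg:        # no new words found: stop
            break
        pos |= new_pos
        neg |= new_neg
    return pos, neg

# pos, neg = expand_lexicon({"good", "excellent"}, {"bad", "poor"})
```

Words that end up in both lists are exactly the cases the manual-inspection step is meant to resolve.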
Shortcoming: the approach is unable to find opinion words with domain- and context-specific orientations, which are quite common. For example, for a speakerphone, if it is quiet, that is usually negative; however, for a car, if it is quiet, that is positive.
The corpus-based approach can help deal with this problem.
Corpus-based Approach
The methods rely on syntactic or co-occurrence patterns, together with a seed list of opinion words, to find other opinion words in a large corpus.
The technique starts with a list of seed opinion adjectives, and uses them and a set of linguistic constraints or conventions on connectives to identify additional adjective opinion words and their orientations. Rules or constraints are also designed for other connectives: OR, BUT, EITHER-OR, and NEITHER-NOR.
Inter-sentential consistency extends this idea to neighboring sentences: the same opinion orientation (positive or negative) is usually expressed in a few consecutive sentences.
The approach can also consider possible opinion words and aspects together, using the pair (aspect, opinion word) as the opinion context, e.g., ("battery life", "long").
It can also be used to analyze comparative sentences.
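A minimal sketch of the connective convention on a pre-tokenized toy corpus (in practice propagation is restricted to adjectives via POS tags, and more connectives and constraints are used; the rules below are a simplification):

```python
def propagate_orientations(sentences, seed):
    """Assign orientations to new words using AND / BUT conventions.

    sentences: list of token lists, e.g. [["beautiful", "and", "spacious"]]
    seed:      dict mapping known opinion words to +1 / -1
    """
    lexicon = dict(seed)
    changed = True
    while changed:
        changed = False
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in ("and", "but") and 0 < i < len(tokens) - 1:
                    left, right = tokens[i - 1], tokens[i + 1]
                    flip = -1 if tok == "but" else 1   # AND keeps orientation, BUT flips it
                    if left in lexicon and right not in lexicon:
                        lexicon[right] = lexicon[left] * flip
                        changed = True
                    elif right in lexicon and left not in lexicon:
                        lexicon[left] = lexicon[right] * flip
                        changed = True
    return lexicon

print(propagate_orientations([["beautiful", "and", "spacious"],
                              ["spacious", "but", "noisy"]],
                             {"beautiful": +1}))
# -> {'beautiful': 1, 'spacious': 1, 'noisy': -1}
```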
Aspect Extraction
Aspect extraction is a two-step process:
1. Find frequent nouns and noun phrases. Nouns and noun phrases (or groups) are identified by a POS tagger; their frequencies are counted; and only the frequent ones are kept. Irrelevant content in reviews is often diverse, i.e., quite different across reviews, and therefore shows up as infrequent nouns.
2. Find infrequent aspects by exploiting the relationships between aspects and opinion words. The previous step can miss many genuine aspect expressions that are infrequent; this step tries to find some of them.
The same opinion word can be used to describe or modify different aspects. Opinion words that modify frequent aspects can also modify infrequent aspects, and thus can be used to extract infrequent aspects (see the sketch below).
• For example, "picture" has been found to be a frequent aspect, and we have the sentence, "The pictures are absolutely amazing."
• "software" can then also be extracted as an aspect from the sentence, "The software is amazing."
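A minimal sketch of this two-way use of opinion words, operating on hand-tagged toy sentences (Penn Treebank-style tags are assumed so the example stays self-contained; a real system would run a POS tagger first):

```python
def find_infrequent_aspects(tagged_sentences, frequent_aspects):
    """Extract infrequent aspects using opinion words learned from frequent aspects.

    tagged_sentences: list of sentences, each a list of (word, pos_tag) pairs
                      (NN* = noun, JJ* = adjective).
    frequent_aspects: set of aspect words found by the frequency-based step.
    """
    # Step A: adjectives that co-occur with a frequent aspect become opinion words.
    opinion_words = set()
    for sent in tagged_sentences:
        words = [w.lower() for w, _ in sent]
        if any(w in frequent_aspects for w in words):
            opinion_words.update(w.lower() for w, t in sent if t.startswith("JJ"))

    # Step B: in sentences with an opinion word but no frequent aspect,
    # treat the nouns as candidate infrequent aspects.
    infrequent = set()
    for sent in tagged_sentences:
        words = [w.lower() for w, _ in sent]
        if any(w in opinion_words for w in words) and not any(w in frequent_aspects for w in words):
            infrequent.update(w.lower() for w, t in sent if t.startswith("NN"))
    return opinion_words, infrequent

# The slide's example, pre-tagged by hand for illustration:
sents = [
    [("The", "DT"), ("pictures", "NNS"), ("are", "VBP"), ("absolutely", "RB"), ("amazing", "JJ")],
    [("The", "DT"), ("software", "NN"), ("is", "VBZ"), ("amazing", "JJ")],
]
print(find_infrequent_aspects(sents, {"picture", "pictures"}))
# -> ({'amazing'}, {'software'})
```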
The pointwise mutual information (PMI) score between the phrase and some meronymy discriminators (meronymy: the part-whole relation) associated with the product class can also be used.
The meronymy discriminators for the "scanner" class are "of scanner", "scanner has", "scanner comes with", etc., which are used to find components or parts of scanners by searching the Web.
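A small sketch of this check, using web hit counts as co-occurrence estimates (the discriminator list, the normalization, and the acceptance threshold are illustrative assumptions, not the published system's exact settings):

```python
def meronymy_pmi(hits_both, hits_phrase, hits_discriminator):
    """PMI-style association between a candidate aspect phrase and one discriminator,
    estimated from hit counts: hits(phrase AND discriminator) / (hits(phrase) * hits(discriminator))."""
    return hits_both / (hits_phrase * hits_discriminator)

def looks_like_aspect(pmi_scores, threshold=1e-9):
    """Keep a candidate if it is sufficiently associated with at least one discriminator
    (e.g. "of scanner", "scanner has", "scanner comes with")."""
    return any(score >= threshold for score in pmi_scores)

# Example with made-up counts for the candidate "size" against two scanner discriminators:
scores = [
    meronymy_pmi(hits_both=5400, hits_phrase=2_000_000, hits_discriminator=90_000),
    meronymy_pmi(hits_both=1200, hits_phrase=2_000_000, hits_discriminator=40_000),
]
print(looks_like_aspect(scores))  # True -> plausibly a genuine part/attribute of scanners
```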
Aspect Sentiment Classification
In the first example, the opinion on the "touch screen" aspect is positive, and in the second example, the opinion on the GENERAL aspect is also positive.
Lexicon-based Approach
Uses an opinion lexicon - a list of opinion words and phrases - and a set of rules to determine the orientations of opinions in a sentence. It also considers opinion shifters and "but-clauses".
It involves four steps:
1. Mark opinion words and phrases: Given a sentence that contains one or more aspects, this step marks all opinion words and phrases in the sentence. Each positive word is assigned the opinion score of +1 and each negative word the opinion score of -1.
2. Handle opinion shifters: Opinion shifters are words and phrases that can shift or change opinion orientations. Negation words like not, never, none, nobody, nowhere, neither and cannot are the most common type.
Sarcasm also changes orientation: "What a great car, it failed to start the very first day." Spotting such cases and handling them correctly in actual sentences is not easy for an automated system.
Not every appearance of an opinion shifter changes the opinion orientation, e.g., "not only ... but also".
3. Handle but-clauses: In English, but signals a contrast. A sentence containing but is handled by applying the following rule: the opinion orientations before but and after but are opposite to each other if the opinion on one side cannot be determined. The pattern "not only ... but also" needs to be handled separately.
There are also contrary words and phrases that do not always indicate an opinion change, e.g., "Audi is great, but Mercedes is better". Such cases need to be identified and dealt with separately.
4. Aggregate opinions: This step applies an opinion aggregation function to the resulting opinion scores to determine the final orientation of the opinion on each aspect in the sentence.
Consider a sentence s, which contains a set of aspects {a1, ..., am} and a set of opinion words or phrases {ow1, ..., own} with their opinion scores. The opinion orientation for each aspect ai in s is

score(ai, s) = Σ_{owj in s} owj.oo / dist(owj, ai)

where owj is an opinion word or phrase in s, dist(owj, ai) is the distance between aspect ai and opinion word owj in s, and owj.oo is the opinion score of owj. The sum gives lower weights to opinion words that are far away from aspect ai.
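A minimal sketch of steps 1, 2 and 4 on a toy sentence (the but-clause rule of step 3 is omitted, and the one-token negation window is a simplification of real opinion-shifter handling):

```python
def aspect_orientation(tokens, aspects, lexicon,
                       negations=frozenset({"not", "never", "no", "cannot"})):
    """Distance-weighted opinion aggregation for each aspect in one sentence.

    tokens:  the sentence as a list of lower-cased words
    aspects: aspect words appearing in the sentence
    lexicon: dict mapping opinion words to +1 / -1
    A negation word directly before an opinion word flips its score.
    """
    # Steps 1-2: mark opinion words and apply a crude negation-shifter rule.
    scored = []   # list of (position, score)
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            score = lexicon[tok]
            if i > 0 and tokens[i - 1] in negations:
                score = -score
            scored.append((i, score))

    # Step 4: aggregate with the distance-weighted sum  score(ai, s) = sum_j owj.oo / dist(owj, ai).
    result = {}
    for a in aspects:
        pos_a = tokens.index(a)
        total = sum(s / abs(i - pos_a) for i, s in scored if i != pos_a)
        result[a] = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return result

sent = "the voice of my moto phone was unclear but the camera was good".split()
print(aspect_orientation(sent, ["voice", "camera"], {"unclear": -1, "good": +1}))
# -> {'voice': 'negative', 'camera': 'positive'}
```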
Simultaneous Extraction of Opinion Words and Aspects
Needs an initial set of opinion word seeds as the input (no seed aspects). Opinions almost always have targets, and there are natural relations connecting opinion words and targets in a sentence.
Sentiment Analysis – Hybrid Approach
Case: Hybrid classification can improve classification effectiveness in terms of micro- and macro-averaged F1. F1 is a measure that takes both the precision and recall of a classifier into account.
Evaluation Metrics
Precision (P) = TP / (TP + FP); Recall (R) = TP / (TP + FN);
Accuracy (A) = (TP + TN) / (TP + TN + FP + FN); F1 = 2PR / (P + R),
where TP, FP, TN and FN are the true positive, false positive, true negative and false negative counts.
1. Micro averaging: Given a set of confusion tables, a new two-by-two contingency table is generated. Each cell in the new table is the sum of the corresponding cells across the set of tables. From the new table, the average performance of the automatic classifier, in terms of its precision and recall, is measured.
2. Macro averaging: Given a set of confusion tables, a set of values is generated; each value represents the precision or recall of the automatic classifier on one table. From these values, the average performance of the classifier, in terms of its precision and recall, is measured.
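A minimal sketch of the two averaging schemes (representing each confusion table by its TP/FP/FN counts is an assumption made for illustration):

```python
def micro_macro(tables):
    """Compute micro- and macro-averaged (precision, recall, F1).

    tables: list of per-class confusion counts, each a dict with keys "tp", "fp", "fn".
    """
    def prf(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Micro: sum the confusion tables first, then compute P/R/F1 once.
    tp = sum(t["tp"] for t in tables)
    fp = sum(t["fp"] for t in tables)
    fn = sum(t["fn"] for t in tables)
    micro = prf(tp, fp, fn)

    # Macro: compute P/R/F1 per table, then average the values.
    per_table = [prf(t["tp"], t["fp"], t["fn"]) for t in tables]
    macro = tuple(sum(vals) / len(per_table) for vals in zip(*per_table))
    return {"micro": micro, "macro": macro}

# Example with two confusion tables (e.g., positive and negative classes):
print(micro_macro([{"tp": 50, "fp": 10, "fn": 5}, {"tp": 5, "fp": 5, "fn": 20}]))
```

Micro averaging weights every document equally, so frequent classes dominate; macro averaging weights every class equally, so performance on rare classes matters more.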
Rule-based Classification
A rule consists of an antecedent and its associated consequent that have an 'if-then' relation: antecedent → consequent.
An antecedent is a condition: one or more tokens concatenated by the ^ operator. A token can be a word, '?' representing a proper noun, or '#' representing a target term.
A target term is a term that represents the context in which a set of documents occurs, such as the name of a person, a policy recommendation, a company name, a brand of a product or a movie title.
A consequent represents a sentiment that is either positive or negative, and is the result of meeting the condition defined by the antecedent:
{token1 ^ token2 ^ ... ^ tokenn} → {+|−}
where + denotes positive sentiment and − denotes negative sentiment.
Comparative Statements
1. Laptop-A is more expensive than Laptop-B.
2. Laptop-A is more expensive than Laptop-C.
If the target word of these sentences is Laptop-A, the rule derived is:
{# ^ more ^ expensive ^ than ^ ?} → {−}
The target word, Laptop-A, is less favorable than the other two laptops due to its price. The focus is on the price attribute of Laptop-A.
If the target words are Laptop-B and Laptop-C, the rule derived is:
{? ^ more ^ expensive ^ than ^ #} → {+}
The two target words, Laptop-B and Laptop-C, are more favorable than Laptop-A due to its price. The focus is on the price attribute of both Laptop-B and Laptop-C.
The target word is a crucial factor in determining the sentiment of an antecedent; a sketch of this rule matching follows below.
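A minimal sketch of how such rules could be matched (the in-order, gap-tolerant matching and the capitalization test for '?' are simplifying assumptions, not the published implementation):

```python
def matches(antecedent, tokens, target):
    """True if the antecedent tokens occur in order in the sentence.

    '#' matches the target term, '?' matches any other capitalized (proper-noun-like)
    token, and ordinary tokens must match literally.
    """
    pos = 0
    for a in antecedent:
        while pos < len(tokens):
            w = tokens[pos]
            pos += 1
            if (a == "#" and w == target) or \
               (a == "?" and w[0].isupper() and w != target) or \
               a == w.lower():
                break
        else:
            return None is not None  # ran out of tokens: no match (i.e., False)
    return True

def classify_with_rules(sentence, target, rules):
    tokens = sentence.rstrip(".").split()
    for antecedent, sentiment in rules:
        if matches(antecedent, tokens, target):
            return sentiment
    return None

rules = [
    (["#", "more", "expensive", "than", "?"], "-"),   # target is the more expensive item
    (["?", "more", "expensive", "than", "#"], "+"),   # target is the cheaper item
]

print(classify_with_rules("Laptop-A is more expensive than Laptop-B.", "Laptop-A", rules))  # '-'
print(classify_with_rules("Laptop-A is more expensive than Laptop-B.", "Laptop-B", rules))  # '+'
```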
General Inquirer Based Classifier (GIBC)
The first, simplest rule set was based on 3672 pre-classified words
found in the General Inquirer Lexicon (Stone et al. 1966),
1598 of which were pre-classified as positive and 2074 of which
were pre-classified as negative.
Here, each rule depends solely on one sentiment bearing word
representing an antecedent.
A General Inquirer Based Classifier (GIBC) was implemented which
applied the rule set to classify document collections.
Calculation of "Closeness"
1. Select 120 positive words, such as amazing, awesome, beautiful, and 120 negative words, such as absurd, angry, anguish, from the General Inquirer Lexicon.
2. Compose 240 search engine queries per antecedent; each query combines an antecedent and a sentiment-bearing word.
3. Collect the hit counts of all queries by using the Google and Yahoo search engines. Two search engines were used to determine whether the hit counts were influenced by the coverage and accuracy level of a single search engine. For each query, the search engines return the hit count, i.e., the number of Web pages that contain both the antecedent and a sentiment-bearing word. The proximity of the antecedent and the word is therefore at the page level.
A better level of precision might be obtained if the proximity checking were carried out at the sentence level. This would lead to an ethical issue, however, because each page would have to be downloaded and stored locally for further analysis.
4. Collect the hit counts of each sentiment-bearing word and each antecedent individually.
5. Use four closeness measures to measure the closeness between each antecedent and the 120 positive words (S+), and between each antecedent and the 120 negative words (S−), based on all the hit counts collected.
If the antecedent co-occurs more frequently with the 120 positive words (S+ > S−), then the antecedent has a positive consequent, and vice versa.
Document Frequency (DF) counts the number of Web pages containing a pair of an antecedent and a sentiment-bearing word, i.e., the hit count returned by a search engine. The larger the DF value, the greater the association strength between the antecedent and the word.
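A minimal sketch of the S+/S− comparison using the DF closeness measure (the hit counts below are made up for illustration; real values would come from the 240 queries per antecedent):

```python
def antecedent_consequent(hit_counts, pos_words, neg_words):
    """Decide an antecedent's consequent from search-engine hit counts (DF closeness).

    hit_counts: dict mapping sentiment word -> hit count of the query that pairs
                the antecedent with that word (pages containing both terms).
    Returns '+' if the antecedent is closer to the positive words, '-' otherwise.
    """
    s_pos = sum(hit_counts.get(w, 0) for w in pos_words)   # S+
    s_neg = sum(hit_counts.get(w, 0) for w in neg_words)   # S-
    return "+" if s_pos > s_neg else "-"

pos_words = ["amazing", "awesome", "beautiful"]
neg_words = ["absurd", "angry", "anguish"]
counts_for_antecedent = {"amazing": 1200, "awesome": 900, "absurd": 150, "angry": 300}
print(antecedent_consequent(counts_for_antecedent, pos_words, neg_words))  # '+'
```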
Measures and Classifiers Used
The other measures of closeness include the chi-square (χ2) statistic.
The other classifiers used include the Induction Rule Based Classifier (IRBC) and Support Vector Machines (SVM).
Multi-stage Hybrid Models
Steps for Implementation