
Unit III: Sentiment Analysis

This document provides an overview of sentiment analysis. It discusses how sentiment analysis seeks to determine people's feelings about topics by analyzing opinions expressed online through sources like social media, reviews, and blogs. It explains that sentiment analysis is useful for understanding customer, market, employee, and brand sentiment. Applications mentioned include improving customer experience, competitive intelligence, predicting stock market movements, and analyzing political discourse.

Sentiment Analysis:

Unit III
Syllabus
SENTIMENT ANALYSIS OVERVIEW
 We humans are social beings.
 We are adept at using a variety of means to communicate. We often
consult financial discussion forums before making an investment decision;
 ask our friends for their opinions on a newly opened restaurant or a newly
released movie;
 and conduct Internet searches and read consumer reviews and expert
reports before making a big purchase like a house, a car, or an appliance.
 We rely on others' opinions to make better decisions, especially in areas
where we lack knowledge.
 Thanks to the growing availability and popularity of opinion-rich Internet
resources such as social media outlets (e.g., Twitter, Facebook),
online review sites, and personal blogs, it is now easier than ever to find
the opinions of others (thousands of them, as a matter of fact) on everything
from the latest gadgets to political and public figures.
 Even though not everybody expresses opinions over the Internet, the number
of people who do is increasing exponentially, due mostly to the fast-growing
number and capabilities of social communication channels.
Sentiment
 Sentiment is a difficult word to define.
 It is often linked to or confused with other terms like belief, view, opinion,
and conviction.
 Sentiment suggests a settled opinion reflective of one's feelings (Mejova,
2009).
 Sentiment has some unique properties that set it apart from other concepts
that we may want to identify in text.
 Often we want to categorize text by topic, which may involve dealing with
whole taxonomies of topics.
 Sentiment classification, on the other hand, usually deals with two classes
(positive versus negative), a range of polarity (e.g., star ratings for movies),
or even a range in strength of opinion (Pang and Lee, 2008).
 These classes span many topics, users, and documents.
 Sentiment analysis is closely related to computational linguistics, natural
language processing, and text mining.
 Sentiment analysis has many names. It is often referred to as opinion
mining, subjectivity analysis, and appraisal extraction, with some
connections to affective computing (computer recognition and expression of
emotion).
 Sentiment analysis tries to answer the question "What do people feel
about a certain topic?" by digging into the opinions of many people using a
variety of automated tools.
 Bringing together researchers and practitioners in business, computer
science, computational linguistics, data mining, text mining, psychology,
and even sociology, sentiment analysis aims to expand traditional fact-based
text analysis to new frontiers, to realize opinion-oriented information
systems.
 In a business setting, especially in marketing and customer relationship
management, sentiment analysis seeks to detect favorable and unfavorable
opinions toward specific products and/or services using large numbers of
textual data sources (customer feedback in the form of Web postings, tweets,
blogs, etc.).
 Sentiment that appears in text comes in two flavors: explicit, where the
subjective sentence directly expresses an opinion ("It's a wonderful day"),
and implicit, where the text implies an opinion ("The handle breaks too
easily"). Most of the earlier work done in sentiment analysis focused on the
first kind of sentiment, since it was easier to analyze.
 Current trends are to implement analytical methods that consider both implicit
and explicit sentiments.
 Timely collection and analysis of textual data, which may come from a
variety of sources ranging from customer call center transcripts to social
media postings, is nowadays a crucial capability of proactive,
customer-focused companies.
 These real-time analyses of textual data are often visualized in
easy-to-understand dashboards.
 Attensity is one of the companies that provide such end-to-end
solutions for companies' text analytics needs (Figure 7.9 shows an
example social media analytics dashboard created by Attensity).
SENTIMENT ANALYSIS APPLICATIONS
 Sentiment analysis is perhaps the most popular application of
text analytics, tapping into data sources like tweets, Facebook
posts, online communities, discussion boards, Web logs,
product reviews, call center logs and recordings, product
rating sites, chat rooms, price comparison portals, search
engine logs, and newsgroups. The following applications of
sentiment analysis are meant to illustrate the power and the
widespread coverage of this technology.
VOICE OF THE CUSTOMER (VOC)
 Voice of the customer (VOC) is an integral part of analytic CRM and
customer experience management systems. As the enabler of VOC, sentiment
analysis can access a company's product and service reviews (either
continuously or periodically) to better understand and better manage
customer complaints and praise.
 For instance, a motion picture advertising/marketing company may detect
negative sentiment toward a movie that is about to open in theaters
(based on its trailers), and quickly change the composition of trailers and
advertising strategy (on all media outlets) to mitigate the negative impact.
 Similarly, a software company may detect negative buzz about the
bugs found in its newly released product early enough to release patches
and quick fixes to alleviate the situation.
 Often, the focus of VOC is individual customers and their service-
and support-related needs, wants, and issues. VOC draws data
from the full set of customer touch points, including e-mails,
surveys, call center notes/recordings, and social media
postings, and matches customer voices to transactions (inquiries,
purchases, returns) and individual customer profiles captured in
enterprise operational systems. VOC, mostly driven by
sentiment analysis, is a key element of customer experience
management initiatives, where the goal is to create an intimate
relationship with the customer.
VOICE OF THE MARKET (VOM)
 Voice of the market is about understanding aggregate opinions
and trends. It is about knowing what stakeholders (customers,
potential customers, influencers, whoever) are saying about
your (and your competitors') products and services.
 A well-done VOM analysis helps companies with competitive
intelligence and product development and positioning.
VOICE OF THE EMPLOYEE (VOE)
 Traditionally, VOE has been limited to employee satisfaction
surveys.
 Text analytics in general (and sentiment analysis in particular)
is a huge enabler of assessing the VOE.
 Using rich, opinionated textual data is an effective and efficient
way to listen to what employees are saying.
 As we all know, happy employees empower customer
experience efforts and improve customer satisfaction.
BRAND MANAGEMENT
 Brand management focuses on listening to social media where anyone
(past/current/prospective customers, industry experts, other authorities)
can post opinions that can damage or boost your reputation.
 There are a number of relatively newly launched start-up companies
that offer analytics-driven brand management services for others.
 Brand management is product and company (rather than customer)
focused. It attempts to shape perceptions rather than to manage
experiences using sentiment analysis techniques.
FINANCIAL MARKETS
 Predicting the future values of individual (or a group of) stocks has been
an interesting and seemingly unsolvable problem. What makes a stock
(or a group of stocks) move up or down is anything but an exact science.
 Many believe that the stock market is mostly sentiment driven, making it
anything but rational (especially for short-term stock movements).
 Therefore, use of sentiment analysis in financial markets has gained
significant popularity.
 Automated analysis of market sentiment using social media, news, blogs,
and discussion groups seems to be a proper way to compute market
movements.
 If done correctly, sentiment analysis can identify short-term stock
movements based on the buzz in the market, potentially impacting
liquidity and trading.
POLITICS
 As we all know, opinions matter a great deal in politics. Because political
discussions are dominated by quotes, sarcasm, and complex references to
persons, organizations, and ideas, politics is one of the most difficult, and
potentially most fruitful, areas for sentiment analysis.
 By analyzing the sentiment on election forums, one may predict who is
more likely to win or lose.
 Sentiment analysis can help understand what voters are thinking and can
clarify a candidate's position on issues.
 Sentiment analysis can help political organizations, campaigns, and news
analysts to better understand which issues and positions matter the most to
voters.
 The technology was successfully applied by both parties in the 2008 and
2012 American presidential election campaigns.
GOVERNMENT INTELLIGENCE
 Government intelligence is another application area; sentiment analysis
has been used by intelligence agencies.
 For example, it has been suggested that one could monitor
sources for increases in hostile or negative communications.
 Sentiment analysis can allow the automatic analysis of the
opinions that people submit about pending policy or
government-regulation proposals.
 Furthermore, monitoring communications for spikes in
negative sentiment may be of use to agencies like Homeland
Security.
OTHER INTERESTING AREAS
 Sentiments of customers can be used to better design e-commerce
sites (product suggestions, upsell/cross-sell
advertising), better place advertisements (e.g., placing dynamic
advertisements of products and services that consider the
sentiment on the page the user is browsing), and manage
opinion- or review-oriented search engines (i.e., an
opinion-aggregation Web site, an alternative to sites like Epinions,
summarizing user reviews).
 Sentiment analysis can help with e-mail filtration by
categorizing and prioritizing incoming e-mails (e.g., it can
detect strongly negative or flaming e-mails and forward them to
the proper folder), as well as with citation analysis, where it can
determine whether an author is citing a piece of work as
supporting evidence or as research that he or she dismisses.
SENTIMENT ANALYSIS PROCESS
 Because of the complexity of the problem (underlying concepts,
expressions in text, context in which the text is expressed, etc.), there
is no readily available standardized process to conduct sentiment
analysis.
 However, based on the published work in the field of sentiment
analysis so far (both on research methods and range of applications), a
simple, multi-step logical process, as given in the figure below, seems to
be an appropriate methodology for sentiment analysis.
 These logical steps are iterative (i.e., feedback, corrections, and
iterations are part of the discovery process) and experimental in
nature, and once completed and combined, are capable of producing the
desired insight about the opinions in the text collection.
Diagram
STEP 1: SENTIMENT DETECTION
 After the retrieval and preparation of the text documents, the first main task
in sentiment analysis is the detection of objectivity. Here the goal is to
differentiate between a fact and an opinion, which may be viewed as
classification of text as objective or subjective.
 This may also be characterized as calculation of O-S polarity
(Objectivity-Subjectivity polarity, which may be represented with a
numerical value ranging from 0 to 1).
 If the objectivity value is close to 1, then there is no opinion to mine (i.e., it
is a fact); therefore, the process goes back and grabs the next text data to
analyze.
 Usually opinion detection is based on the examination of adjectives in text.
For example, the polarity of "what a wonderful work" can be determined
relatively easily by looking at the adjective.
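The detection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the adjective list, the scoring rule (share of tokens that are not opinion adjectives), and the 0.9 threshold are hypothetical choices, not part of the process described in the slides.

```python
import re

# Hypothetical mini-lexicon of opinion-bearing adjectives (illustration only).
OPINION_ADJECTIVES = {"wonderful", "terrible", "great", "awful", "good", "bad"}


def objectivity_score(text: str) -> float:
    """Return a value in [0, 1]; 1.0 means no opinion adjective was found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 1.0
    hits = sum(1 for token in tokens if token in OPINION_ADJECTIVES)
    return 1.0 - hits / len(tokens)


def is_opinionated(text: str, threshold: float = 0.9) -> bool:
    """Treat text as a fact (and skip it) when objectivity reaches the threshold."""
    return objectivity_score(text) < threshold


print(objectivity_score("what a wonderful work"))   # one adjective in four tokens
print(is_opinionated("The meeting starts at noon"))
```

A real system would use a part-of-speech tagger and a much larger lexicon; the point here is only the control flow: score each text, then skip the ones that look like facts.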
STEP 2: N-P POLARITY CLASSIFICATION
 The second main task is that of polarity classification. Given an
opinionated piece of text, the goal is to classify the opinion as falling
under one of two opposing sentiment polarities, or locate its position on
the continuum between these two polarities (Pang and Lee, 2008). When
viewed as a binary feature, polarity classification is the binary
classification task of labeling an opinionated document as expressing
either an overall positive or an overall negative opinion (e.g., thumbs up
or thumbs down).
 In addition to the identification of N-P polarity, one should also be
interested in identifying the strength of the sentiment (as opposed to just
positive, it may be expressed as mildly, moderately, strongly, or very
strongly positive).
 Most of this research was done on product or movie reviews where the
definitions of "positive" and "negative" are quite clear. Other tasks, such
as classifying news as "good" or "bad," present some difficulty. For
instance, an article may contain negative news without explicitly using any
subjective words or terms.
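As a rough illustration of combining direction and strength, the sketch below maps a numeric N-P score to one of the strength labels named above. The [-1, 1] score range and the cut-off values are assumptions made for this example; a real project would tune them on labeled data.

```python
def polarity_label(score: float) -> str:
    """Map an N-P polarity score in [-1, 1] to a strength-plus-direction label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score == 0:
        return "neutral"
    direction = "positive" if score > 0 else "negative"
    magnitude = abs(score)
    # Hypothetical cut-offs, chosen only to split the range into four bands.
    if magnitude < 0.25:
        strength = "mildly"
    elif magnitude < 0.5:
        strength = "moderately"
    elif magnitude < 0.75:
        strength = "strongly"
    else:
        strength = "very strongly"
    return f"{strength} {direction}"


print(polarity_label(0.8))
print(polarity_label(-0.3))
```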
STEP 3: TARGET IDENTIFICATION
 The goal of this step is to accurately identify the target of the expressed
sentiment (e.g., a person, a product, an event, etc.).
 The difficulty of this task depends largely on the domain of the analysis.
Even though it is usually easy to accurately identify the target for product
or movie reviews, because the review is directly connected to the target, it
may be quite challenging in other domains. For instance, lengthy,
general-purpose texts such as Web pages, news articles, and blogs do not always
have a predefined topic that they are assigned to, and often mention many
objects, any of which may be deduced as the target.
 Sometimes there is more than one target in a sentiment sentence, which is
the case in comparative texts. A subjective comparative sentence ranks
objects in order of preference; for example, "This laptop computer is better
than my desktop PC." These sentences can be identified using comparative
adjectives and adverbs (more, less, better, longer), superlative adjectives
(most, least, best), and other words (such as same, differ, win, prefer, etc.).
Once the sentences have been retrieved, the objects can be put in an order
that is most representative of their merits, as described in text.
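The keyword-based retrieval of comparative sentences can be sketched as follows. The cue words are the ones listed above; the naive regular-expression sentence splitter and the exact-match rule are simplifying assumptions, and ordering the objects by merit is left out.

```python
import re

# Cue words taken from the list above; merging them into one set is an assumption.
COMPARATIVE_CUES = {
    "more", "less", "better", "longer",   # comparative adjectives and adverbs
    "most", "least", "best",              # superlative adjectives
    "same", "differ", "win", "prefer",    # other comparison words
}


def is_comparative(sentence: str) -> bool:
    """True when the sentence contains at least one comparison cue word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in COMPARATIVE_CUES for token in tokens)


def comparative_sentences(text: str) -> list[str]:
    """Split text on sentence-ending punctuation and keep comparative sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s and is_comparative(s)]


sample = "This laptop computer is better than my desktop PC. I bought it in May."
print(comparative_sentences(sample))
```

Exact matching misses inflected forms such as "prefers" or "differs"; stemming or lemmatization would be the usual next step.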
STEP 4: COLLECTION AND AGGREGATION
 Once the sentiments of all text data points in the document are
identified and calculated, in this step they are aggregated and
converted to a single sentiment measure for the whole
document.
 This aggregation may be as simple as summing up the
polarities and strengths of all texts, or as complex as using
semantic aggregation techniques from natural language
processing to come up with the ultimate sentiment.
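The simple end of that spectrum, summing polarities and strengths, might look like the sketch below. The `ScoredText` record and its field ranges are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass


@dataclass
class ScoredText:
    """One text data point with its polarity and strength (hypothetical record)."""
    polarity: int    # +1 for positive, -1 for negative
    strength: float  # 0.0 (weak) to 1.0 (strong)


def document_sentiment(items: list[ScoredText]) -> float:
    """Sum the signed strengths into a single document-level measure."""
    return sum(item.polarity * item.strength for item in items)


points = [ScoredText(1, 0.5), ScoredText(-1, 0.25), ScoredText(1, 0.25)]
print(document_sentiment(points))
```

A plain sum grows with document length, so dividing by the number of data points is a common variation when documents of different sizes are compared.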
Methods for Polarity Identification
 As mentioned in the previous section, polarity identification (identifying the
polarity of a text) can be made at the word, term, sentence, or document
level. The most granular level for polarity identification is the word level.
 Once the polarity identification is made at the word level, it can be
aggregated to the next higher level, and then the next, until the level of
aggregation desired from the sentiment analysis is reached.
 There seem to be two dominant techniques used for identification of
polarity at the word/term level, each having its advantages and
disadvantages:
 1. Using a lexicon as a reference library (either developed manually or
automatically, by an individual for a specific task or developed by an
institution for general use)
 2. Using a collection of training documents as the source of knowledge
about the polarity of terms within a specific domain (i.e., inducing
predictive models from opinionated textual documents)
Using a Lexicon
 A lexicon is essentially the catalog of words, their synonyms, and their meanings for a given
language. In addition to lexicons for many other languages, there are several general-purpose lexicons
created for English.
 Often general-purpose lexicons are used to create a variety of special-purpose lexicons for use in
sentiment analysis projects.
 Perhaps the most popular general-purpose lexicon is WordNet, created at Princeton University, which
has been extended and used by many researchers and practitioners for sentiment analysis purposes.
 As described on the WordNet Web site (wordnet.princeton.edu), it is a large lexical database of
English, including nouns, verbs, adjectives, and adverbs grouped into sets of cognitive synonyms (i.e.,
synsets), each expressing a distinct concept.
 Synsets are interlinked by means of conceptual-semantic and lexical relations. An interesting
extension of WordNet was created by Esuli and Sebastiani (2006), who added polarity
(Positive-Negative) and objectivity (Subjective-Objective) labels for each term in the lexicon.
 To label each term, they classify the synset (a group of synonyms) to which this term belongs using a
set of ternary classifiers (a measure that attaches to each object exactly one out of three labels), each
of them capable of deciding whether a synset is Positive, Negative, or Objective.
 The resulting scores range from 0.0 to 1.0, giving a graded evaluation of opinion-related properties of
the terms.
 These can be summed up visually as in Figure 7.11. The edges of the triangle each represent one of the
three classifications (positive, negative, and objective).
 A term can be located in this space as a point, representing the extent to which it belongs to each of
the classifications. A similar extension methodology is used to create SentiWordNet, a publicly
available lexicon specifically developed for opinion mining (sentiment analysis) purposes.
Figure
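The three-way scoring described above can be illustrated with a small data structure. The sketch assumes, as the triangle picture suggests, that the positive, negative, and objective scores of a synset sum to 1.0, so the objective score is derived from the other two. The class name and the example values are hypothetical and are not taken from any published lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Positive and negative scores of one synset; objectivity is the remainder."""
    positive: float
    negative: float

    def __post_init__(self):
        if self.positive < 0 or self.negative < 0:
            raise ValueError("scores must be non-negative")
        if self.positive + self.negative > 1:
            raise ValueError("positive + negative must not exceed 1.0")

    @property
    def objective(self) -> float:
        # Assumption: the three scores sum to 1.0.
        return 1.0 - self.positive - self.negative

    def dominant_label(self) -> str:
        """Return the classification with the highest score."""
        scores = {
            "positive": self.positive,
            "negative": self.negative,
            "objective": self.objective,
        }
        return max(scores, key=scores.get)


entry = SynsetScores(positive=0.75, negative=0.0)  # made-up values
print(entry.objective, entry.dominant_label())
```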
SentiWordNet
 SentiWordNet assigns to each synset of WordNet three sentiment scores:
positivity, negativity, objectivity. More about SentiWordNet can be found
at sentiwordnet.isti.cnr.it. Another extension to WordNet is
WordNet-Affect, developed by Strapparava and Valitutti (2004).
 They label WordNet synsets using affective labels representing different
affective categories like emotion, cognitive state, attitude, feeling, and so
on.
 WordNet has also been directly used in sentiment analysis. For example,
Kim and Hovy (2004) and Hu and Liu (2005)
generate lexicons of positive and negative terms by starting with a small
list of "seed" terms of known polarities (e.g., love, like, nice) and
then using the antonymy and synonymy properties of terms to group them
into either of the polarity categories.
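The seed-expansion idea can be sketched as a breadth-first walk in which synonyms inherit a word's polarity and antonyms receive the opposite one. The two dictionaries below are toy stand-ins for WordNet lookups, and the "first label wins" rule is an assumption of this sketch, not a description of the cited authors' procedures.

```python
from collections import deque

# Toy relations standing in for WordNet lookups (hypothetical data).
SYNONYMS = {"good": ["nice", "fine"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "nice": ["nasty"]}


def expand_seeds(seeds: dict[str, int]) -> dict[str, int]:
    """Propagate polarity (+1 or -1): synonyms keep it, antonyms flip it."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        neighbours = [(w, polarity[word]) for w in SYNONYMS.get(word, [])]
        neighbours += [(w, -polarity[word]) for w in ANTONYMS.get(word, [])]
        for other, sign in neighbours:
            if other not in polarity:  # first label wins; conflicts are ignored
                polarity[other] = sign
                queue.append(other)
    return polarity


print(expand_seeds({"good": 1}))
```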
Using a Collection of Training Documents
 It is possible to perform sentiment classification using statistical analysis and
machine-learning tools that take advantage of the vast resources of labeled
(manually by annotators or using a star/point system) documents available.
 Product review Web sites like Amazon, CNET, eBay, Rotten Tomatoes, and
the Internet Movie Database (IMDb) have all been extensively used as
sources of annotated data.
 The star (or tomato, as it were) system provides an explicit label of the
overall polarity of the review, and it is often taken as a gold standard in
algorithm evaluation.
 A variety of manually labeled textual data is available through evaluation
efforts such as the Text REtrieval Conference (TREC), NII Test Collection
for IR Systems (NTCIR), and Cross Language Evaluation Forum (CLEF).
 The data sets these efforts produce often serve as a standard in the text
mining community, including for sentiment analysis researchers. Individual
researchers and research groups have also produced many interesting data
sets.
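To make the training-document approach concrete, the sketch below turns star ratings into labels and fits a very small Naive Bayes classifier with add-one smoothing. Naive Bayes is one common choice and is used here only as an example; the slides do not name a specific algorithm. The star cut-offs, the whitespace tokenizer, and the five toy reviews are all assumptions.

```python
import math
from collections import Counter
from typing import Optional


def star_to_label(stars: int) -> Optional[str]:
    """Hypothetical rule: 4-5 stars is positive, 1-2 is negative, 3 is skipped."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None


def train(reviews):
    """Count words and documents per class from (text, stars) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, stars in reviews:
        label = star_to_label(stars)
        if label is None:
            continue
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts) -> str:
    """Pick the class with the higher smoothed log-probability."""
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    best_label, best_score = "pos", -math.inf
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        # Add-one smoothing on the class prior and on every word likelihood.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for word in text.lower().split():
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab) + 1)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


toy_reviews = [
    ("great movie loved it", 5),
    ("wonderful acting great plot", 4),
    ("terrible plot boring movie", 1),
    ("awful boring waste", 2),
    ("it was okay", 3),
]
counts, docs = train(toy_reviews)
print(classify("great acting", counts, docs))
print(classify("boring waste", counts, docs))
```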
Identifying Semantic Orientation of Sentences and Phrases
 Once the semantic orientation of individual words has been
determined, it is often desirable to extend this to the phrase or
sentence the word appears in.
 The simplest way to accomplish such aggregation is to use
some type of averaging for the polarities of words in the
phrases or sentences.
 Though rarely applied, such aggregation can be as complex as
using one or more machine-learning techniques to create a
predictive relationship between the words (and their polarity
values) and phrases or sentences.
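The averaging approach can be sketched as below. The word-polarity table is a made-up example, and ignoring words that are missing from the table (rather than counting them as zero) is a design assumption of this sketch.

```python
import re

# Hypothetical word-level polarities in [-1, 1].
WORD_POLARITY = {"wonderful": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}


def sentence_polarity(sentence: str) -> float:
    """Average the polarity of the words that appear in the lexicon."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    scores = [WORD_POLARITY[t] for t in tokens if t in WORD_POLARITY]
    return sum(scores) / len(scores) if scores else 0.0


print(sentence_polarity("A wonderful plot but bad acting"))
```

Plain averaging ignores negation ("not good") and word order, which is one reason learned models are sometimes used for this step instead.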
Identifying Semantic Orientation of Document
 Even though the vast majority of the work in this area is done
in determining semantic orientation of words and
phrases/sentences, some tasks like summarization and
information retrieval may require semantic labeling of the
whole document (REF).
 Similar to the case of aggregating sentiment polarity from
word level to phrase or sentence level, aggregation to document
level is also accomplished by some type of averaging.
 Sentiment orientation of the document may not make sense for
very large documents; therefore, it is often used on small to
medium-sized documents posted on the Internet.
