7 - Text Analytics Text Mining and Sentiment Analysis
Chapter 7:
Text Analytics, Text Mining,
and Sentiment Analysis
Learning Objectives
Describe text mining and understand the need
for text mining
Differentiate between text mining, Web mining,
and data mining
Understand the different application areas for
text mining
Know the process of carrying out a text mining
project
Understand the different methods to introduce
structure to text-based data
-2 © Pearson Education Limited 2014
Learning Objectives
Describe sentiment analysis
Develop familiarity with popular applications of
sentiment analysis
Learn the common methods for sentiment
analysis
Become familiar with speech analytics as it
relates to sentiment analysis
Situation
Problem
Solution
Results
Watch it on YouTube! https://www.youtube.com/watch?v=YLR1byL0U8M
Answer & discuss the case questions...
Questions for
the Opening Vignette
1. What is Watson? What is special about it?
2. What technologies were used in building
Watson (both hardware and software)?
3. What are the innovative characteristics of
DeepQA architecture that made Watson
superior?
4. Why did IBM spend all that time and money
to build Watson? Where is the ROI?
[Figure: diagram with labels Answer sources, Evidence sources, and Trained models, and numbered callouts 1-5]
Text Mining
Information Retrieval
Web Mining
Information Extraction
Data Mining
Dream of AI community
to have algorithms that are capable of automatically
reading and obtaining knowledge from text
Natural Language Processing
(NLP)
WordNet
A laboriously hand-coded database of English words,
their definitions, sets of synonyms, and various
semantic relations between synonym sets.
A major resource for NLP.
Automation is needed to complete it.
Sentiment Analysis
A technique used to detect favorable and
unfavorable opinions toward specific products and
services
SentiWordNet
Application Case 7.2
Text Mining Improves Hong Kong
Government’s Ability to Anticipate and
Address Public Complaints
[Figure: process diagram. Box labels: Statements Labeled as Truthful or Deceptive By Law Enforcement; Cues Extracted & Selected; Text Processing Software Generated Quantified Cues]
[Figure: multilevel analysis of one sentence]
Ontology: D007962, D016923, D001773
Sentence: ...expression of Bcl-2 is correlated with insufficient white blood cell death and activation of p53.
Word: 185 8 51112 9 23017 27 5874 2791 8952 1623 5632 17 8252 8 2523
POS: NN IN NN IN VBZ IN JJ JJ NN NN NN CC NN IN NN
Shallow Parse: NP PP NP NP PP NP NP PP NP
[Figure: text mining process. Labels: Domain expertise; Tools and techniques; Feedback]
The inputs to the process include a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
The output of Task 1 is a collection of documents in some digitized format for computer processing.
The output of Task 2 is a flat file called a term-document matrix, where the cells are populated with the term frequencies.
The output of Task 3 is a number of problem-specific classification, association, and clustering models and visualizations.
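[Table: term-document matrix excerpt; term column headers and cell alignment were lost in extraction. Surviving nonzero values per row:]
Document 2: 1
Document 3: 3, 1
Document 4: 1
Document 5: 2, 1
Document 6: 1, 1
...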
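A term-document matrix can be sketched in a few lines of Python. This is only an illustration: the three documents, the stop-word list, and the function names below are made up for the example and do not come from the case material.

```python
from collections import Counter
import re

# Hypothetical stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (terms, matrix); matrix[i][j] is the frequency of terms[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

# Hypothetical documents, one string per document.
docs = [
    "Project risk and investment risk",
    "Software development project",
    "Investment in software",
]
terms, matrix = term_document_matrix(docs)
print(terms)
for i, row in enumerate(matrix, start=1):
    print(f"Document {i}", row)
```

Each row is a document and each column a term, so most cells are zero. Real projects usually add stemming and weighting (for example TF-IDF) before modeling.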
[Figure: distribution of articles by YEAR (1994-2005), one panel per cluster]
[Figure: No of Articles (0-100) by JOURNAL (ISR, JMIS, MISQ), one panel per cluster, Clusters 1-9]
[Figure: Sentiment Analysis Process]
Input: A statement
Step 1: Calculate the O-S polarity (using a lexicon); output: O-S polarity measure
Decision: Is there a sentiment? (No / Yes)
Step 2: Calculate the N-P polarity of the sentiment (using a lexicon); output: N-P polarity
Step 3: Identify the target of the sentiment, then record the polarity, strength, and target of the sentiment
Step 4: Tabulate & aggregate the sentiment analysis results
Sentiment Analysis Process
Step 1 – Sentiment Detection
Comes right after the retrieval and
preparation of the text documents
It is also called detection of objectivity
Fact [= objectivity] versus Opinion [= subjectivity]
Step 2 – N-P Polarity Classification
Given an opinionated piece of text, the goal is
to classify the opinion as falling under one of
two opposing sentiment polarities
N [= negative] versus P [= positive]
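Steps 1 and 2 can be illustrated with a minimal lexicon-based sketch. The toy lexicon, its scores, and the function names are assumptions made for this example. A real lexicon such as SentiWordNet is far larger and structured differently, and this sketch ignores negation and context.

```python
import re

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.9, "good": 0.5,
           "bad": -0.6, "terrible": -0.9, "slow": -0.4}

def sentiment_words(statement):
    """Return the lexicon scores of the opinion words found in the statement."""
    tokens = re.findall(r"[a-z']+", statement.lower())
    return [LEXICON[t] for t in tokens if t in LEXICON]

def detect_sentiment(statement):
    """Step 1: treat a statement as subjective if it contains any opinion word."""
    return len(sentiment_words(statement)) > 0

def classify_polarity(statement):
    """Step 2: label an opinionated statement 'P' or 'N' by its summed score."""
    total = sum(sentiment_words(statement))
    return "P" if total >= 0 else "N"

def analyze(statement):
    """Return 'O' for objective statements, otherwise 'P' or 'N'."""
    if not detect_sentiment(statement):
        return "O"  # no sentiment detected, so no polarity is assigned
    return classify_polarity(statement)

for s in ["The phone has a 5-inch screen.",
          "I love the great camera.",
          "The battery is terrible and slow."]:
    print(analyze(s), "-", s)
```

Running detection first means polarity is computed only for statements that carry an opinion, which mirrors the Fact versus Opinion split in Step 1.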
Sentiment Analysis Process
Step 3 – Target Identification
The goal of this step is to accurately identify
the target of the expressed sentiment (e.g., a
person, a product, an event, etc.)
Level of difficulty depends on the application domain
Step 4 – Collection and Aggregation
Once the sentiments of all text data points in
the document are identified and calculated,
they are to be aggregated
Word → Statement → Paragraph → Document
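The roll-up from words to statements, paragraphs, and the document can be sketched as repeated averaging. Averaging is only one possible aggregation rule, chosen here for simplicity. The scores and the nested-list layout are hypothetical.

```python
from statistics import mean

def aggregate(scores):
    """Average the child scores; return None when no child carries sentiment."""
    scored = [s for s in scores if s is not None]
    return mean(scored) if scored else None

# Hypothetical document: paragraphs -> statements -> word-level polarity scores.
document = [
    [[0.8, 0.6], [-0.4]],   # paragraph 1: two opinionated statements
    [[], [-0.9, -0.5]],     # paragraph 2: the first statement is objective
]

# Word -> Statement
statement_scores = [[aggregate(words) for words in para] for para in document]
# Statement -> Paragraph
paragraph_scores = [aggregate(stmts) for stmts in statement_scores]
# Paragraph -> Document
document_score = aggregate(paragraph_scores)

print(statement_scores)
print(paragraph_scores)
print(document_score)
```

Objective statements are skipped rather than counted as zero, so they do not dilute the score of the paragraph that contains them.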
Sentiment Analysis
Methods for Polarity Identification
Polarity Identification – P vs. N
Can be made at the level of word, term,
sentence, paragraph, document
Two competing methods
1. Using a lexicon
WordNet [wordnet.princeton.edu]
SentiWordNet [sentiwordnet.isti.cnr.it]
2. Using pre-classified training documents
Data mining / machine learning
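The second method, learning from pre-classified training documents, can be sketched with a small classifier. The slides name only data mining / machine learning in general. The choice of a naive Bayes classifier with add-one smoothing, the four training documents, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-classified training documents (P = positive, N = negative).
training = [
    ("great product, works well", "P"),
    ("love it, excellent quality", "P"),
    ("terrible product, broke fast", "N"),
    ("poor quality, very disappointed", "N"),
]
model = train(training)
print(classify("excellent product, love the quality", model))
print(classify("disappointed, it broke", model))
```

Unlike the lexicon method, this approach needs no hand-built word list, but its quality depends on how many labeled documents are available and how well they match the target domain.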
[Figure: S-O polarity; surviving label: Objective (O)]
Sentiment Analysis and
Speech Analytics
Speech analytics – analysis of voice
Content versus other Voice Features
Two Approaches
The Acoustic Approach
Intensity, Pitch, Jitter, Shimmer, etc.
The Linguistic Approach
Lexical: words, phrases, etc.
Disfluencies: filled pauses, hesitation, restarts, etc.
Higher semantics: taxonomy/ontology, pragmatics
Many uses and use cases exist
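Two of the acoustic features named above, jitter and shimmer, can be sketched under one common "local" definition: the mean absolute difference between consecutive cycles, divided by the mean value. The cycle measurements below are hypothetical, and real speech analytics tools extract them from the audio signal first.

```python
from statistics import mean

def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)

# Hypothetical measurements for five consecutive voice cycles.
periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]    # cycle durations in milliseconds
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]   # peak amplitude of each cycle

jitter = local_perturbation(periods_ms)    # cycle-to-cycle variation in period
shimmer = local_perturbation(amplitudes)   # cycle-to-cycle variation in amplitude
print(f"jitter: {jitter:.2%}, shimmer: {shimmer:.2%}")
```

Features like these describe how something is said rather than what is said, which is why the acoustic approach complements the linguistic one.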
Application Case 7.8
Cutting Through the Confusion: Blue Cross
Blue Shield of North Carolina Uses Nexidia’s
Speech Analytics to Ease Member Experience
in Healthcare
Questions, comments