Text Classification and Naive Bayes
Is this spam?
Who wrote which Federalist papers?
1787-8: anonymous essays (by Jay, Madison, and Hamilton) try to convince New York to ratify the U.S. Constitution.
Authorship of 12 of the letters in dispute
1963: solved by Mosteller and Wallace using
Bayesian methods
Positive or negative movie review?
Why sentiment analysis?
Scherer Typology of Affective States
Emotion: brief organically synchronized … evaluation of a major event
◦ angry, sad, joyful, fearful, ashamed, proud, elated
Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
◦ cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: affective stance toward another person in a specific interaction
◦ friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
◦ liking, loving, hating, valuing, desiring
Personality traits: stable personality dispositions and typical behavior tendencies
◦ nervous, anxious, reckless, morose, hostile, jealous
Basic Sentiment Classification
Sentiment analysis
Spam detection
Authorship identification
Language identification
Assigning subject categories, topics, or genres
…
Text Classification: definition
Input:
◦ a document d
◦ a fixed set of classes C = {c1, c2, …, cJ}
Output:
◦ a predicted class c ∈ C
Classification Methods:
Supervised Machine Learning
Any kind of classifier
◦ Naïve Bayes
◦ Logistic regression
◦ Neural networks
◦ k-Nearest Neighbors
◦ …
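All of these classifiers share the same supervised interface: take hand-labeled (document, class) pairs and return a function γ that maps a new document to a class. A minimal sketch of that interface (the names and the trivial majority-class learner here are illustrative, not from the slides):

```python
from collections import Counter
from typing import Callable, List, Tuple

Document = str
Label = str
Classifier = Callable[[Document], Label]   # gamma: d -> c

def train(labeled_docs: List[Tuple[Document, Label]]) -> Classifier:
    """Learn a classifier from hand-labeled (document, class) pairs.

    Placeholder learner: always predict the most frequent training class.
    Naive Bayes, logistic regression, etc. plug into this same interface.
    """
    majority = Counter(c for _, c in labeled_docs).most_common(1)[0][0]
    return lambda d: majority
```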
Text Classification and Naive Bayes: The Task of Text Classification
Text Classification and Naive Bayes: The Naive Bayes Classifier
Naive Bayes Intuition
[Figure: a movie review is reduced to a bag of words with counts (sweet 1, whimsical 1, recommend 1, happy 1, …), and the classifier γ(·) maps this bag of words to a class c.]
Bayes’ Rule Applied to Documents and Classes
P(c | d) = P(d | c) P(c) / P(d)
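As a toy illustration (numbers invented for this example): suppose P(spam) = 0.2, P(d | spam) = 0.001, P(not-spam) = 0.8, and P(d | not-spam) = 0.0001. Then P(d | spam)P(spam) = 0.0002 while P(d | not-spam)P(not-spam) = 0.00008, so d is classified as spam even though the prior favors not-spam; the shared denominator P(d) never affects the comparison.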
Naive Bayes Classifier (I)
cMAP = argmax_{c∈C} P(c | d)              MAP is “maximum a posteriori”: the most likely class
     = argmax_{c∈C} P(d | c) P(c) / P(d)  Bayes’ rule
     = argmax_{c∈C} P(d | c) P(c)         Dropping the denominator P(d), which is the same for every class
Naive Bayes Classifier (II)
cMAP = argmax_{c∈C} P(d | c) P(c)
where P(d | c) is the “likelihood” and P(c) is the “prior”, with the document d represented as a set of features x1, x2, …, xn.
Multinomial Naive Bayes Independence Assumptions
P(x1, x2, …, xn | c)
◦ Bag-of-words assumption: word position doesn’t matter.
◦ Conditional independence: assume the feature probabilities P(xi | c) are independent given the class c, so
P(x1, x2, …, xn | c) = P(x1 | c) · P(x2 | c) · … · P(xn | c)
In practice we sum logs of probabilities instead of multiplying probabilities (avoiding floating-point underflow), giving:
cNB = argmax_{cj∈C} [ log P(cj) + Σ_{i∈positions} log P(xi | cj) ]
Notes:
1) Taking the log doesn’t change the ranking of classes:
the class with the highest probability also has the highest log probability.
2) It’s a linear model:
just a max over a sum of weights, i.e. a linear function of the inputs,
so Naive Bayes is a linear classifier.
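A minimal sketch of this log-space decision rule, assuming the parameters have already been estimated (the function name and dictionary layout here are my own, not from the slides):

```python
import math

def predict(doc_tokens, classes, vocab, log_prior, log_likelihood):
    """Pick argmax_c [ log P(c) + sum over token positions of log P(x_i | c) ].

    log_prior[c]         : log P(c)
    log_likelihood[c][w] : log P(w | c), defined (smoothed) for every w in vocab
    Test words outside the training vocabulary are simply skipped.
    """
    best_class, best_score = None, -math.inf
    for c in classes:
        score = log_prior[c]
        for w in doc_tokens:
            if w in vocab:                      # ignore out-of-vocabulary tokens
                score += log_likelihood[c][w]
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```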
Text Classification and Naive Bayes: Naive Bayes Learning
Sec. 13.3
Maximum likelihood estimates: simply use the frequencies in the training data.
P̂(cj) = N_cj / N_total
(N_cj = number of training documents of class cj, N_total = total number of training documents)
P̂(wi | cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)
Parameter estimation
P̂(wi | cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)
= the fraction of times word wi appears among all word tokens in documents of topic cj
Problem: what if no training document of class positive contains the word “fantastic”?
P̂(“fantastic” | positive) = count(“fantastic”, positive) / Σ_{w∈V} count(w, positive) = 0
A single zero estimate wipes out the whole product, no matter what the other evidence says, so we smooth.
Laplace (add-1) smoothing:
P̂(wi | c) = (count(wi, c) + 1) / Σ_{w∈V} (count(w, c) + 1)
          = (count(wi, c) + 1) / ( (Σ_{w∈V} count(w, c)) + |V| )
Multinomial Naïve Bayes: Learning
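The learning step combines the estimates above: class priors from document frequencies and add-1 smoothed word likelihoods. A sketch, assuming documents are already tokenized (the function name and data layout are illustrative):

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs, classes):
    """Estimate add-1 smoothed multinomial Naive Bayes parameters.

    labeled_docs: list of (tokens, class) pairs.
    Returns (vocab, log_prior, log_likelihood) in the layout expected by the
    decision-rule sketch shown earlier.
    """
    vocab = {w for tokens, _ in labeled_docs for w in tokens}
    n_docs = len(labeled_docs)
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, label in labeled_docs if label == c]
        log_prior[c] = math.log(len(class_docs) / n_docs)       # N_c / N_total
        counts = Counter(w for tokens in class_docs for w in tokens)
        denom = sum(counts.values()) + len(vocab)                # Σ_w count(w,c) + |V|
        log_likelihood[c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return vocab, log_prior, log_likelihood
```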
Binary multinomial naive Bayes: an example

Four original documents:
− it was pathetic the worst part was the boxing scenes
− no plot twists or great scenes
+ and satire and great plot twists
+ great scenes great film

After per-document binarization:
− it was pathetic the worst part boxing scenes
− no plot twists or great scenes
+ and satire great plot twists
+ great scenes film

Counts, shown without add-1 smoothing to make the differences clearer:

word        NB Counts (+ −)   Binary Counts (+ −)
and             2 0               1 0
boxing          0 1               0 1
film            1 0               1 0
great           3 1               2 1
it              0 1               0 1
no              0 1               0 1
or              0 1               0 1
part            0 1               0 1
pathetic        0 1               0 1
plot            1 1               1 1
satire          1 0               1 0
scenes          1 2               1 2
the             0 2               0 1
twists          1 1               1 1
was             0 2               0 1
worst           0 1               0 1

Note that the resulting counts need not be 1: the word great still has a count of 2 even for Binary NB because it appears in multiple documents. Counts can still be 2 because binarization is within-document (see the sketch below).
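The only change from standard multinomial NB counting is that each word's count is clipped to 1 within a document before summing over the documents of each class. A sketch (same data layout as the earlier training sketch):

```python
from collections import Counter

def binary_nb_counts(labeled_docs):
    """Per-document binarization: clip each word's count to 1 within a document,
    then sum over the documents of each class. A word can therefore still reach
    a class count > 1 if it appears in several documents of that class."""
    class_counts = {}
    for tokens, c in labeled_docs:
        class_counts.setdefault(c, Counter()).update(set(tokens))  # set() clips to 1
    return class_counts
```

Running this on the four documents above reproduces the Binary Counts columns, e.g. a count of 2 for great in the positive class.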
Text Classification and Naive Bayes: Sentiment and Binary Naive Bayes
Text Classification and Naive Bayes: More on Sentiment Classification
Sentiment Classification: Dealing with Negation
I really like this movie → positive
I really don't like this movie → negative
Negation can flip the polarity contributed by a word like “like”; a common simple baseline is sketched below.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
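One standard simple baseline for negation (not spelled out on the slide, so treat this as an illustrative sketch rather than the lecture's method): prepend NOT_ to every token between a negation word and the next punctuation mark, so negated words become separate features for the classifier.

```python
import re

NEGATIONS = {"not", "no", "never", "n't", "don't", "didn't", "doesn't", "isn't", "won't"}

def mark_negation(tokens):
    """Prepend NOT_ to every token after a negation word, up to the next
    punctuation mark, e.g. ["don't", "like", "this", "movie", "."] becomes
    ["don't", "NOT_like", "NOT_this", "NOT_movie", "."]."""
    out, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,!?;:]", tok):
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        elif tok.lower() in NEGATIONS:
            negating = True
            out.append(tok)
        else:
            out.append(tok)
    return out
```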
The General Inquirer
Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.