Naive Bayes and Sentiment Classification
CS6431 Natural Language Processing, Spring 2023
CLASSIFICATION
Credits
B1: Speech and Language Processing (Third Edition draft, Jan 2022), Daniel Jurafsky and James H. Martin
Assignment
Read: B1, Chapter 4
Text classification tasks:
Spam detection
Authorship attribution
Subject category assignment
Supervised Learning Approach
Input: a training set of N labelled documents (d1, c1), (d2, c2), …, (dN, cN), and an unknown document d
Output: the predicted class label for d
Naive Bayes Classifiers
Goal: choose the most probable class, ĉ = argmax_c P(c|d)
By Bayes' rule: ĉ = argmax_c P(d|c) P(c) / P(d)
Dropping the denominator P(d), which is the same for every class: ĉ = argmax_c P(d|c) P(c)
Let document d be represented as a set of features f1, f2, …, fn, so P(d|c) = P(f1, f2, …, fn|c)
Two simplifying assumptions:
Bag-of-words: the position of a word in the document is not considered (does not matter)
Naive Bayes assumption: the features are conditionally independent given the class, i.e., P(f1, …, fn|c) = P(f1|c) · P(f2|c) · … · P(fn|c)
With word features, the classifier becomes: c_NB = argmax_c P(c) ∏i P(wi|c)
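A minimal Python sketch (not from B1) of this decision rule, computed in log space to avoid underflow; the names predict, log_prior, and log_likelihood are illustrative, and the two dictionaries are assumed to come from training (sketched in the next section):

import math

def predict(doc_words, classes, vocab, log_prior, log_likelihood):
    # Return argmax_c [ log P(c) + sum_i log P(w_i|c) ]
    best_class, best_score = None, -math.inf
    for c in classes:
        score = log_prior[c]
        for w in doc_words:
            if w in vocab:  # words outside the vocabulary are ignored (discussed later)
                score += log_likelihood[(w, c)]
        if score > best_score:
            best_class, best_score = c, score
    return best_class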
Training the Naive Bayes Classifier
How do we compute P(c) and P(wi|c)? Use frequencies in the training data:
P(c) = Nc / Ndoc
Nc: number of documents labelled with c
Ndoc: total number of documents
P(wi|c) = count(wi, c) / Σ_{w∈V} count(w, c)
count(wi, c): number of occurrences of wi in documents of class c
c: topic/class label
V: vocabulary of the dataset
A problem
Consider the problem of movie reviews
Imagine no positive review in the training set contains "fantastic", but the test set does
Then P("fantastic"|positive) = 0, and because naive Bayes multiplies all the likelihoods together, the whole class probability becomes zero regardless of the other evidence
Fix: add-one (Laplace) smoothing: P(wi|c) = (count(wi, c) + 1) / (Σ_{w∈V} count(w, c) + |V|)
Note: vocabulary V consists of the union of all the word types in all classes, not just the words in one class c (why?)
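A hedged Python sketch of this training procedure with add-one smoothing, assuming the training data is a list of (word_list, label) pairs; all names are illustrative:

import math
from collections import Counter, defaultdict

def train(labeled_docs):
    # labeled_docs: list of (list_of_words, class_label) pairs
    vocab = {w for words, _ in labeled_docs for w in words}
    n_doc = len(labeled_docs)
    n_c = Counter(label for _, label in labeled_docs)       # N_c per class
    word_counts = defaultdict(Counter)                      # word_counts[c][w] = count(w, c)
    for words, c in labeled_docs:
        word_counts[c].update(words)
    log_prior = {c: math.log(n_c[c] / n_doc) for c in n_c}  # log P(c) = log(N_c / N_doc)
    log_likelihood = {}
    for c in n_c:
        total = sum(word_counts[c].values())
        for w in vocab:
            # add-one (Laplace) smoothing over the full vocabulary V
            log_likelihood[(w, c)] = math.log((word_counts[c][w] + 1) / (total + len(vocab)))
    return set(n_c), vocab, log_prior, log_likelihood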
More things to remove
Unknown words: words that occur in the test data but not in the training data
Ignore them, i.e., remove them from the test document/sentence
Stop words removal
Very frequent words like 'the' and 'a'
One simple approach: sort the training vocabulary by frequency and take the top 10-100 entries as stop words
F-measure: the harmonic mean of precision P and recall R
F_{β=1} or F1 = 2PR / (P + R)
The harmonic mean is more conservative than the arithmetic mean: it is closer to the smaller of the two numbers
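For example, with P = 0.9 and R = 0.1 the arithmetic mean is 0.5, but F1 = (2 × 0.9 × 0.1) / (0.9 + 0.1) = 0.18, much closer to the smaller value.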
Evaluating more than two classes: compute per-class precision and recall from the confusion matrix, and combine them via macro-averaging (average the per-class scores) or micro-averaging (pool the counts first)
k-fold Cross Validation: split the data into k folds; train on k − 1 folds, test on the held-out fold, and average the k results
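A minimal sketch of k-fold cross-validation, reusing the train() and predict() sketches above; accuracy() here is a hypothetical scorer that counts correct predictions:

def accuracy(model, test_docs):
    classes, vocab, log_prior, log_likelihood = model
    correct = sum(predict(words, classes, vocab, log_prior, log_likelihood) == c
                  for words, c in test_docs)
    return correct / len(test_docs)

def cross_validate(labeled_docs, k=10):
    fold_size = len(labeled_docs) // k
    scores = []
    for i in range(k):
        test = labeled_docs[i * fold_size:(i + 1) * fold_size]   # held-out fold
        rest = labeled_docs[:i * fold_size] + labeled_docs[(i + 1) * fold_size:]
        scores.append(accuracy(train(rest), test))
    return sum(scores) / k   # average performance over the k folds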
Statistical Significance Testing
How do we decide if model/classifier A is better than B?
M(A, x): performance of model/classifier A on test set x
M(B, x): performance of model/classifier B on test set x
δ(x) (the effect size) = M(A, x) − M(B, x)
Consider δ(x) = 0.04
We want to check whether A's superiority over B is likely to hold again if we evaluated on another test set x′
We define two hypotheses:
H0: δ(x) ≤ 0 (null hypothesis: A is not better than B)
H1: δ(x) > 0 (A is better than B)
We want to test whether we can confidently rule out the null hypothesis and instead support H1, i.e., that A is better
Let X be a random variable ranging over all possible test sets
𝟙(x): 1 if x is true, and 0 otherwise
Distribution of δ values: create b bootstrapped test sets x(1), …, x(b) by sampling documents from x with replacement, and compute δ(x(i)) for each
Goal: assume H0 and estimate how accidental/surprising the observed δ(x) is
Since the above distribution is biased towards the observed δ(x) (= .2 in B1's worked bootstrap example) rather than the 0 that H0 assumes, to capture how surprising δ(x) is we compute:
p-value(x) = (1/b) Σ_{i=1}^{b} 𝟙(δ(x(i)) − δ(x) ≥ δ(x)) = (1/b) Σ_{i=1}^{b} 𝟙(δ(x(i)) ≥ 2δ(x))
Suppose b = 10,000 bootstrapped test sets x(i) are created and the significance threshold is .01
If the resulting p-value falls below the threshold, we reject H0 and conclude that A is better than B
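A hedged Python sketch of this paired bootstrap test, assuming lists of per-document scores (1 = correct, 0 = wrong) for A and B on the same test set; names are illustrative:

import random

def paired_bootstrap_pvalue(scores_a, scores_b, b=10_000):
    n = len(scores_a)
    delta_x = (sum(scores_a) - sum(scores_b)) / n          # observed effect size δ(x)
    exceed = 0
    for _ in range(b):
        idx = [random.randrange(n) for _ in range(n)]      # resample n documents with replacement
        d_i = sum(scores_a[j] - scores_b[j] for j in idx) / n
        if d_i >= 2 * delta_x:                             # bias-corrected test: δ(x(i)) ≥ 2δ(x)
            exceed += 1
    return exceed / b   # p-value; reject H0 if it is below the .01 threshold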