
Text Classification

Slides adapted from Lyle Ungar and Dan Jurafsky


Example: Positive or negative movie review?
Example: What is the subject of this article?
Text Classification
Text Classification: Definition
Supervised learning: classification methods

Any kind of classifier can be used, for example (a minimal sketch follows the list):

• Naïve Bayes
• Logistic Regression
• Support-vector machines
• K-Nearest Neighbors
• Neural Networks
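
All of these drop into the same supervised pipeline: featurize the text, fit on labeled examples, predict on new ones. A minimal sketch, assuming scikit-learn; the toy corpus and labels are hypothetical, not from the slides:

# Hypothetical toy corpus; any classifier can be swapped into the same pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

docs = ["fun couple love love", "fast furious shoot",
        "couple fly fast fun fun", "furious shoot shoot fun"]
labels = ["comedy", "action", "comedy", "action"]

X = CountVectorizer().fit_transform(docs)             # bag-of-words counts
for clf in (MultinomialNB(), LogisticRegression()):   # swap in any classifier
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X[:1]))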
Text Classification: Naïve Bayes

Naïve Bayes Intuition


Text: Bag of words representation
Text: Bag of words using a subset of words
Text: Bag of words representation (vectors)
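
A minimal sketch of the representation in plain Python (the example sentence is hypothetical): word order is discarded and only counts remain, which is exactly what the vector form records.

from collections import Counter

doc = "great movie great acting but a slow plot"
bow = Counter(doc.split())        # order is discarded; only counts remain
print(bow["great"])               # -> 2

vocab = sorted(set(doc.split()))            # fixed vocabulary
vector = [bow[w] for w in vocab]            # count vector over the vocabulary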
Bayes’ Rule for document and classes
Bayes’ Rule and MAP (I)
Bayes’ Rule and MAP (II)
Bayes’ Rule and MAP (III)
Naïve Bayes Independence Assumptions
Naïve Bayes Classifier
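
The derivation behind these headings, in standard form. Bayes’ rule for a document d and a class c:

P(c|d) = P(d|c) P(c) / P(d)

The MAP (maximum a posteriori) class; P(d) is the same for every class, so it drops out:

c_MAP = argmax_c P(c|d) = argmax_c P(d|c) P(c)

With the naïve independence assumption over word positions x1, ..., xn:

c_NB = argmax_c P(c) ∏i P(xi|c)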
Learning Naïve Bayes Model: Prior

First attempt: maximum likelihood estimation of the parameters; simply use the frequencies in the data.

The prior is the fraction of documents belonging to topic j:

P(cj) = (number of documents labeled cj) / (total number of documents)


Learning Naïve Bayes Model: Conditional Probabilities

The likelihood of word wi given class cj (with xi = wi for a word position) is the fraction of times wi appears among all words in documents of topic cj:

P(wi|cj) = count(wi, cj) / Σw count(w, cj)

In practice: create a mega-document for topic j by concatenating all documents in the topic, then use its word frequencies.


Zero probability problems

If a word never occurs with a class in the training data, its maximum likelihood estimate is 0, and a single zero wipes out the entire product.

Laplace (add-1) smoothing for Naïve Bayes

P(wi|cj) = (count(wi, cj) + 1) / (Σw count(w, cj) + |V|)

where |V| is the vocabulary size.

Algorithm with smoothing parameter
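
A minimal sketch of the full algorithm in plain Python with add-1 smoothing; the helper names train_nb and classify are my own, not from the slides.

import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """docs: list of token lists; labels: parallel list of class names."""
    n = len(docs)
    priors, counts, totals = {}, defaultdict(Counter), Counter()
    vocab = set()
    for tokens, c in zip(docs, labels):
        counts[c].update(tokens)          # "mega-document" word counts per class
        totals[c] += len(tokens)
        vocab.update(tokens)
    for c in set(labels):
        priors[c] = labels.count(c) / n   # fraction of documents with class c
    return priors, counts, totals, vocab

def classify(tokens, priors, counts, totals, vocab):
    def score(c):
        s = math.log(priors[c])
        for w in tokens:
            if w in vocab:                # words unseen in training are skipped
                s += math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
        return s
    return max(priors, key=score)         # argmax over classes, in log space

Scores are summed in log space to avoid floating-point underflow, and out-of-vocabulary words are simply skipped; both are common choices rather than the only possible ones.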
Example

Training documents and a test document d5 (word columns as in the classic Jurafsky example this deck adapts; they match the counts below):

Doc  Words                                Class
d1   Chinese Beijing Chinese              c0
d2   Chinese Chinese Shanghai             c0
d3   Chinese Macao                        c0
d4   Tokyo Japan Chinese                  c1
d5   Chinese Chinese Chinese Tokyo Japan  ?

Priors:

P(c0) = 3/4
P(c1) = 1/4

Conditional probabilities (add-1 smoothing; c0 has 8 words, c1 has 3 words, |V| = 6):

P(Chinese|c0) = (5+1)/(8+6) = 6/14 = 3/7    P(Chinese|c1) = (1+1)/(3+6) = 2/9
P(Tokyo|c0)   = (0+1)/(8+6) = 1/14          P(Tokyo|c1)   = (1+1)/(3+6) = 2/9
P(Japan|c0)   = (0+1)/(8+6) = 1/14          P(Japan|c1)   = (1+1)/(3+6) = 2/9

Choosing a class for d5:

P(c0|d5) ∝ P(c0) · P(Chinese|c0)^3 · P(Tokyo|c0) · P(Japan|c0)
         = 3/4 · (3/7)^3 · 1/14 · 1/14 ≈ 0.0003

P(c1|d5) ∝ P(c1) · P(Chinese|c1)^3 · P(Tokyo|c1) · P(Japan|c1)
         = 1/4 · (2/9)^3 · 2/9 · 2/9 ≈ 0.0001

So the classifier chooses c0.
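
Plugging the example into the hypothetical train_nb/classify sketch from earlier reproduces this choice:

docs = [["Chinese", "Beijing", "Chinese"],
        ["Chinese", "Chinese", "Shanghai"],
        ["Chinese", "Macao"],
        ["Tokyo", "Japan", "Chinese"]]
labels = ["c0", "c0", "c0", "c1"]
priors, counts, totals, vocab = train_nb(docs, labels)

d5 = ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]
print(classify(d5, priors, counts, totals, vocab))   # -> "c0"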
Summary
