Lecture 10 - Naive Bayes Classifier
Machine Learning
Classification
§ Input: an email
§ Output: spam/ham
§ Setup:
  § Get a large collection of example emails, each labeled "spam" or "ham"
  § Note: someone has to hand label all this data!
  § Want to learn to predict labels of new, future emails
[Slide shows example emails: "Dear Sir. ... nature as being utterly confidencial and top secret. ..." and "TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99"]
Digit Recognition
§ Setup:
  § Get a large collection of example images, each labeled with a digit
  § Note: someone has to hand label all this data!
  § Want to learn to predict labels of new, future digit images
§ Examples:
  § Spam detection (input: document, classes: spam / ham)
  § OCR (input: images, classes: characters)
  § Medical diagnosis (input: symptoms, classes: diseases)
  § Automatic essay grading (input: document, classes: grades)
  § Fraud detection (input: account activity, classes: fraud / no fraud)
  § Customer service email routing
  § … many more
Model-Based Classification
§ Model-based approach
  § Build a model (e.g. a Bayes' net) where both the label and features are random variables
  § Instantiate any observed features
  § Query for the distribution of the label conditioned on the features (a sketch follows below)
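A minimal sketch of this recipe for a naive Bayes model; the priors and conditional tables below are made-up illustration values, not from the lecture:

```python
# Model-based classification: instantiate observed features, query the label.
def posterior(priors, conditionals, observed):
    """P(label | observed features), by scoring each label and normalizing."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            score *= conditionals[label][feature][value]
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

# Illustrative parameters (invented for this sketch).
priors = {"spam": 0.3, "ham": 0.7}
conditionals = {
    "spam": {"has_dollar": {True: 0.6, False: 0.4}},
    "ham":  {"has_dollar": {True: 0.1, False: 0.9}},
}
print(posterior(priors, conditionals, {"has_dollar": True}))
# {'spam': 0.72, 'ham': 0.28}
```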
A Spam Filter
§ Naïve Bayes spam filter
§ Data: a collection of emails labeled spam or ham (example on slide: "Dear Sir. ... top secret. …")
§ Generative model: one feature for every word in the dictionary!
Spam Example
P(spam | w) = 98.9
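The slide's per-word table did not survive extraction, but the computation it illustrates can be sketched with hypothetical log-likelihoods; working in log space with a log-sum-exp normalization avoids numeric underflow when many word probabilities are multiplied:

```python
import math

# Hypothetical per-word log-likelihoods under each class (illustrative only).
log_like = {
    "spam": {"Dear": -1.0, "Sir": -2.5, "$99": -3.0},
    "ham":  {"Dear": -1.2, "Sir": -3.0, "$99": -8.0},
}
log_prior = {"spam": math.log(0.5), "ham": math.log(0.5)}

words = ["Dear", "Sir", "$99"]
log_post = {c: log_prior[c] + sum(log_like[c][w] for w in words)
            for c in log_like}

# Normalize with the log-sum-exp trick.
m = max(log_post.values())
z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
post = {c: math.exp(v - z) for c, v in log_post.items()}
print(post)  # spam dominates: "$99" is far likelier under spam than ham
```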
Important Concepts
§ Data: labeled instances, e.g. emails marked spam/ham
  § Training set
  § Held-out set
  § Test set
§ Features: attribute-value pairs which characterize each x
§ Experimentation cycle (a sketch follows below)
  § Learn parameters (e.g. model probabilities) on the training set
  § (Tune hyperparameters on the held-out set)
  § Compute accuracy on the test set
  § Very important: never “peek” at the test set!
§ Evaluation
  § Accuracy: fraction of instances predicted correctly
§ Overfitting and generalization
  § Want a classifier which does well on test data
  § Overfitting: fitting the training data very closely, but not generalizing well
  § We’ll investigate overfitting and generalization formally in a few lectures
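A hedged sketch of the split-and-evaluate cycle above; the split fractions and helper names are illustrative assumptions, not from the lecture:

```python
import random

def split(data, train_frac=0.8, held_out_frac=0.1, seed=0):
    """Shuffle labeled data and cut it into training / held-out / test sets."""
    data = data[:]
    random.Random(seed).shuffle(data)
    n_train = int(train_frac * len(data))
    n_held = int(held_out_frac * len(data))
    return data[:n_train], data[n_train:n_train + n_held], data[n_train + n_held:]

def accuracy(classifier, dataset):
    """Fraction of (x, y) instances predicted correctly."""
    return sum(classifier(x) == y for x, y in dataset) / len(dataset)
```

The test split is touched exactly once, at the very end; all tuning happens on the held-out set.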
Overfitting
[Figure: a degree-15 polynomial fit to training data on x from 0 to 20; the curve swings between roughly -15 and 30]
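The figure's behavior is easy to reproduce; a sketch with synthetic data, since the slide's actual points are not recoverable:

```python
import numpy as np

# A degree-15 polynomial fit to 20 noisy points (synthetic data).
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 20)
y = 10 * np.sin(x / 3) + rng.normal(scale=2.0, size=x.size)

coeffs = np.polyfit(x, y, deg=15)       # very low error on the training x's...
print(np.abs(np.polyval(coeffs, x) - y).max())
print(np.polyval(coeffs, 21.0))         # ...but wild predictions off the grid
```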
Example: Overfitting
[Slide compares posteriors for a digit image under each class; “2 wins!!”]
Example: Overfitting
south-west : inf     screens : inf
nation : inf         minute : inf
morally : inf        guaranteed : inf
nicely : inf         $205.00 : inf
extent : inf         delivery : inf
What went wrong here?
Parameter Estimation
§ Estimating the distribution of a random variable
§ Elicitation: ask a human (why is this hard?)
§ Empirically: use training data (learning!), e.g. draws of red (r) and blue (b) balls (sketched below)
  § E.g.: for each outcome x, look at the empirical rate of that value:
    P_ML(x) = count(x) / total samples
  § From the draws r, r, b: P_ML(r) = 2/3, P_ML(b) = 1/3
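A minimal sketch of the empirical-rate formula, using the slide's r, r, b draw:

```python
from collections import Counter

def empirical_distribution(samples):
    """Relative-frequency estimate: P(x) = count(x) / total samples."""
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}

print(empirical_distribution(["r", "r", "b"]))
# {'r': 0.666..., 'b': 0.333...}
```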
Smoothing
Maximum Likelihood?
§ Relative frequencies are the maximum likelihood estimates
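A standard one-step justification of this claim (not reproduced from the slides): maximize the log-likelihood subject to normalization.

```latex
% For N i.i.d. draws, the log-likelihood of parameters \theta is
\ell(\theta) = \log \prod_{i=1}^{N} \theta_{x_i}
             = \sum_{x} \operatorname{count}(x) \log \theta_x ,
\qquad \text{subject to } \sum_{x} \theta_x = 1 .
% Stationarity of the Lagrangian \ell - \lambda \bigl( \sum_x \theta_x - 1 \bigr)
% gives \operatorname{count}(x) / \theta_x = \lambda, and summing over x yields
% \lambda = N, so
\theta_x^{\text{ML}} = \frac{\operatorname{count}(x)}{N} ,
% i.e. exactly the relative frequency.
```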
Unseen Events
Laplace Smoothing
§ Laplace’s estimate:
  § Pretend you saw every outcome once more than you actually did
    P_LAP(x) = (count(x) + 1) / (N + |X|)
  § E.g. from the draws r, r, b: P_LAP(r) = 3/5, P_LAP(b) = 2/5
Laplace Smoothing
§ Laplace’s estimate (extended):
  § Pretend you saw every outcome k extra times (a sketch follows below)
    P_LAP,k(x) = (count(x) + k) / (N + k·|X|)
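A direct transcription of the formula above into code; `domain` is an assumed argument naming the outcome space:

```python
from collections import Counter

def laplace_estimate(samples, domain, k=1):
    """Laplace-smoothed estimate: P(x) = (count(x) + k) / (N + k * |domain|)."""
    counts = Counter(samples)
    n = len(samples)
    return {x: (counts[x] + k) / (n + k * len(domain)) for x in domain}

# The slide's r, r, b example with k = 1:
print(laplace_estimate(["r", "r", "b"], domain=["r", "b"], k=1))
# {'r': 0.6, 'b': 0.4}
```

Unseen outcomes now get small but nonzero probability, which is exactly what the "inf" table above was missing.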
helvetica : 11.4     verdana : 28.8
seems : 10.8         Credit : 28.4
group : 10.2         ORDER : 27.2
ago : 8.4            <FONT> : 26.9
areas : …            money : …
Do these make more sense?
Tuning
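The held-out set drives this step; a hedged sketch of tuning the smoothing strength k, where `train_fn` and `accuracy_fn` are hypothetical helpers passed in as callables, not part of the lecture:

```python
def tune_k(train_fn, accuracy_fn, train, held_out,
           ks=(0.001, 0.01, 0.1, 1, 10)):
    """Grid-search the Laplace strength k on the held-out set.
    train_fn(train, k) -> classifier; accuracy_fn(classifier, data) -> float."""
    return max(ks, key=lambda k: accuracy_fn(train_fn(train, k), held_out))
```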
Features
§ Examples of errors
Dear GlobalSCAPE Customer,
GlobalSCAPE has partnered with ScanSoft to
offer you the latest version of OmniPage Pro,
for just $99.99* - the regular list price is
$499! The most common question we've received
about this offer is - Is this genuine? We
would like to assure you that this offer is
authorized by ScanSoft, is genuine and valid.
You can get the . . .
Baselines
§ First step: get a baseline (e.g. the sketch below)
§ Baselines are very simple “straw man” procedures
§ Help determine how hard the task is
§ Help know what a “good” accuracy is
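A common straw-man baseline, as an illustrative sketch (not the lecture's own code): always predict the most frequent training label.

```python
from collections import Counter

def majority_baseline(training_labels):
    """Return a classifier that always predicts the most common training label."""
    most_common = Counter(training_labels).most_common(1)[0][0]
    return lambda x: most_common

baseline = majority_baseline(["ham", "ham", "spam"])
print(baseline("any email"))  # ham
```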
Confidence and Calibration
§ The confidence of a probabilistic classifier: the posterior over the top label
§ Calibration (probed empirically in the sketch below)
  § Weak calibration: higher confidences mean higher accuracy
  § Strong calibration: confidence predicts accuracy
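One way to probe calibration empirically, as an illustrative sketch: bucket predictions by confidence and compare each bucket's mean confidence with its accuracy. Roughly equal values per bucket indicate a well-calibrated classifier.

```python
def calibration_table(predictions, n_buckets=10):
    """predictions: iterable of (confidence in [0, 1], correct: bool) pairs."""
    buckets = [[] for _ in range(n_buckets)]
    for confidence, correct in predictions:
        idx = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[idx].append((confidence, correct))
    for idx, bucket in enumerate(buckets):
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            acc = sum(ok for _, ok in bucket) / len(bucket)
            print(f"bucket {idx}: mean confidence {mean_conf:.2f}, accuracy {acc:.2f}")
```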
Summary
§ Bayes rule lets us do diagnostic queries with causal probabilities
Introduction
§ You are working on a classification problem and have generated your set of hypotheses, created features, and discussed the importance of variables.
§ Within an hour, stakeholders want to see the first cut of the model.
§ What will you do?
§ You have hundreds of thousands of data points and quite a few variables in your training data set.
§ In such a situation, I would use Naive Bayes, which can be extremely fast relative to other classification algorithms.
Topics covered
§ What is the Naive Bayes algorithm?
§ How does the Naive Bayes algorithm work?
§ What are the pros and cons of using Naive Bayes?
§ Four applications of the Naive Bayes algorithm
§ Steps to build a basic Naive Bayes model in Python (a sketch follows below)
§ Tips to improve the power of a Naive Bayes model
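As a preview of the Python step, a minimal sketch using scikit-learn's GaussianNB (assuming scikit-learn is installed; the toy arrays are invented for illustration, not the article's data):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two-feature toy data with two classes.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

model = GaussianNB()   # assumes Gaussian likelihoods for numeric features
model.fit(X, y)
print(model.predict([[-0.8, -1]]))  # -> [1]
```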
Bayes theorem
§ Bayes’ theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c).
§ Look at the equation below:
    P(c|x) = P(x|c) · P(c) / P(x)
  where P(c|x) is the posterior probability of class c given predictor x, P(c) is the class prior, P(x|c) is the likelihood, and P(x) is the prior probability of the predictor.
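Plugging illustrative numbers into the equation (values invented for the example, not from the lecture):

```python
# P(c|x) = P(x|c) * P(c) / P(x)
p_c = 0.3          # class prior P(c)
p_x_given_c = 0.8  # likelihood P(x|c)
p_x = 0.5          # evidence P(x)

p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)  # 0.48
```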
§ Pros:
  § It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
  § When the assumption of independence holds, a Naive Bayes classifier performs better than other models like logistic regression, and you need less training data.
  § It performs well with categorical input variables compared to numerical variable(s). For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
§ Contents from George F. Luger, AI: Structures and Strategies for Complex Problem Solving, 6th Ed.