Naive Bayes

The decision surface of a Naïve Bayes classifier with boolean features (or Gaussian features whose variances do not depend on the class) is a set of linear decision boundaries. Specifically: for a two-class problem with two such features the boundary between the classes is a straight line; with more than two features it is a hyperplane; and with more than two classes the decision surface is piecewise linear, with one hyperplane separating each pair of classes. This is because, under the conditional independence assumption of Naïve Bayes, the log posterior log P(Y|X) decomposes into the log prior log P(Y) plus a sum of per-feature terms log P(X1|Y), log P(X2|Y), etc. Each term is linear in its feature, so the boundary between any two classes remains linear no matter how many features there are.
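A minimal sketch of the argument for boolean features (the symbols θi1, θi0, wi and b below are introduced here only for illustration; they are not from the original text): write the log odds of the two classes and observe that every feature contributes a term linear in xi.

\log\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)}
  = \log\frac{P(Y=1)}{P(Y=0)} + \sum_{i=1}^{n} \log\frac{P(x_i \mid Y=1)}{P(x_i \mid Y=0)}
  = b + \sum_{i=1}^{n} w_i x_i ,

where, writing \theta_{i1} = P(X_i = 1 \mid Y=1) and \theta_{i0} = P(X_i = 1 \mid Y=0),

w_i = \log\frac{\theta_{i1}(1-\theta_{i0})}{\theta_{i0}(1-\theta_{i1})} ,
\qquad
b = \log\frac{P(Y=1)}{P(Y=0)} + \sum_{i=1}^{n} \log\frac{1-\theta_{i1}}{1-\theta_{i0}} .

Setting the log odds to zero gives the decision boundary b + Σi wi xi = 0, a hyperplane.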


Bayesian Classifiers,

Conditional Independence
and Naïve Bayes
Required reading:
•  Mitchell draft chapter, sections 1 and 2.
(available on class website)

Machine Learning 10-601

Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University

January 28 and February 2, 2009

Let’s learn classifiers by learning P(Y|X)
Suppose Y=wealth, X=<gender, hours_worked>
How many parameters must we estimate?
Suppose X =<X1,… Xn>
where Xi and Y are boolean RV’s

To estimate P(Y| X) = P(Y| X1, X2, … Xn) directly, we need one parameter P(Y=1 | x) for each of the 2^n possible joint values x of <X1, … Xn>, i.e. 2^n parameters when the Xi are boolean.

Can we reduce params by using Bayes Rule?
Suppose X =<X1,… Xn>
where Xi and Y are boolean RV’s
Bayes Rule

P(Y | X) = P(X | Y) P(Y) / P(X)

Which is shorthand for:

for all values yi of Y and xj of X:
P(Y = yi | X = xj) = P(X = xj | Y = yi) P(Y = yi) / P(X = xj)

Equivalently:

P(Y = yi | X = xj) = P(X = xj | Y = yi) P(Y = yi) / Σk P(X = xj | Y = yk) P(Y = yk)
Naïve Bayes
Naïve Bayes assumes

P(X1, …, Xn | Y) = Πi P(Xi | Y)

i.e., that Xi and Xj are conditionally independent given Y, for all i ≠ j
Conditional Independence
Definition: X is conditionally independent of Y given Z, if
the probability distribution governing X is independent
of the value of Y, given the value of Z

Which we often write

P(X | Y, Z) = P(X | Z)

E.g., P(Thunder | Rain, Lightning) = P(Thunder | Lightning)
Naïve Bayes uses assumption that the Xi are conditionally
independent, given Y

Given this assumption, then:

P(X1, X2 | Y) = P(X1 | X2, Y) P(X2 | Y) = P(X1 | Y) P(X2 | Y)

in general:

P(X1, …, Xn | Y) = Πi P(Xi | Y)

How many parameters needed to describe P(X|Y)? P(Y)?


•  Without conditional indep assumption?
•  With conditional indep assumption?
How many parameters to estimate?
P(X1, ... Xn | Y), all variables boolean
Without conditional independence assumption: 2(2^n − 1) parameters for P(X1, … Xn | Y), plus 1 for P(Y)

With conditional independence assumption: 2n parameters for P(X1, … Xn | Y), plus 1 for P(Y)

E.g., with n = 30 boolean features this is the difference between roughly 2 billion parameters and just 60.


Naïve Bayes in a Nutshell
Bayes rule:

P(Y = yk | X1, …, Xn) = P(Y = yk) P(X1, …, Xn | Y = yk) / Σj P(Y = yj) P(X1, …, Xn | Y = yj)

Assuming conditional independence among the Xi's:

P(Y = yk | X1, …, Xn) = P(Y = yk) Πi P(Xi | Y = yk) / Σj P(Y = yj) Πi P(Xi | Y = yj)

So, the classification rule for Xnew = < X1, …, Xn > is:

Ynew ← argmax over yk of P(Y = yk) Πi P(Xi_new | Y = yk)

(the denominator is the same for every yk, so it can be dropped when taking the argmax)

Naïve Bayes Algorithm – discrete Xi

•  Train Naïve Bayes (examples)
   for each* value yk
      estimate πk ≡ P(Y = yk)
   for each* value xij of each attribute Xi
      estimate θijk ≡ P(Xi = xij | Y = yk)

•  Classify (Xnew)
   Ynew ← argmax over yk of πk Πi P(Xi_new | Y = yk), i.e. argmax over yk of πk Πi θijk,
   where j picks out the value of Xi observed in Xnew (a code sketch follows below)

* probabilities must sum to 1, so for a variable with K possible values only K − 1 parameters need be estimated
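The following minimal Python sketch makes the train/classify loop above concrete for discrete features. It is an illustration only: the function names, the dictionary layout prior[y] and likelihood[i][y][v], and the toy data are assumptions of this sketch, not part of the slides. Training uses the MLE relative frequencies described on the next slide; classification applies the argmax rule in log space to avoid underflow.

import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (x, y) pairs, x a tuple of discrete feature values.
    Returns prior[y] = P(Y=y) and likelihood[i][y][v] = P(Xi=v | Y=y),
    both estimated as MLE relative frequencies (no smoothing)."""
    n_features = len(examples[0][0])
    y_counts = Counter(y for _, y in examples)
    prior = {y: c / len(examples) for y, c in y_counts.items()}
    likelihood = {i: {y: defaultdict(float) for y in y_counts} for i in range(n_features)}
    for x, y in examples:
        for i, v in enumerate(x):
            likelihood[i][y][v] += 1.0
    for i in range(n_features):
        for y in y_counts:
            for v in likelihood[i][y]:
                likelihood[i][y][v] /= y_counts[y]
    return prior, likelihood

def classify(x_new, prior, likelihood):
    """Return argmax_y P(Y=y) * prod_i P(Xi = x_new[i] | Y=y), computed in log space."""
    best_y, best_score = None, float("-inf")
    for y, p_y in prior.items():
        score = math.log(p_y)
        for i, v in enumerate(x_new):
            p = likelihood[i][y][v]
            if p == 0.0:                 # an unseen value zeroes out the class (see Subtlety #1)
                score = float("-inf")
                break
            score += math.log(p)
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# Hypothetical toy data for the Squirrel Hill example: x = (G, D, M), y = S
data = [((1, 0, 1), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((0, 0, 1), 0)]
prior, likelihood = train(data)
print(classify((1, 0, 1), prior, likelihood))   # -> 1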


Estimating Parameters: Y, Xi discrete-valued

Maximum likelihood estimates (MLEs):

πk ≡ P(Y = yk):   πk = #D{Y = yk} / |D|

θijk ≡ P(Xi = xij | Y = yk):   θijk = #D{Xi = xij AND Y = yk} / #D{Y = yk}

where #D{condition} is the number of items in the training data D for which the condition holds (e.g., #D{Y = yk} is the number of items in D for which Y = yk)
Example: Live in Sq Hill? P(S|G,D,M)
•  S=1 iff live in Squirrel Hill
•  G=1 iff shop at SH Giant Eagle
•  D=1 iff Drive to CMU
•  M=1 iff Rachel Maddow fan
Naïve Bayes: Subtlety #1
If unlucky, our MLE estimate for P(Xi | Y) might be
zero. (e.g., X373= Birthday_Is_January_30_1990)

•  Why worry about just one parameter out of many? (A single zero estimate drives the whole product Πi P(Xi | Y = yk) to zero, erasing the evidence from every other feature for that class.)

•  What can be done to avoid this?


Estimating Parameters: Y, Xi discrete-valued

Maximum likelihood estimates:

πk = #D{Y = yk} / |D|
θijk = #D{Xi = xij AND Y = yk} / #D{Y = yk}

MAP estimates (Dirichlet priors), with l ≥ 0 the prior strength, J the number of values Xi can take, and K the number of classes:

πk = (#D{Y = yk} + l) / (|D| + lK)
θijk = (#D{Xi = xij AND Y = yk} + l) / (#D{Y = yk} + lJ)

Only difference:
“imaginary” examples: l hallucinated observations of each value in each count (l = 1 gives Laplace smoothing)
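A minimal sketch of the smoothed estimate in code, reusing the illustrative conventions of the train() sketch above; the default prior strength l = 1 (Laplace smoothing) is an assumption of this sketch, not a value fixed by the slides.

def smoothed_likelihood(count_xi_and_y, count_y, n_values, l=1.0):
    """MAP / Dirichlet-smoothed estimate of P(Xi = xij | Y = yk):
    add l imaginary examples of each of the n_values settings of Xi."""
    return (count_xi_and_y + l) / (count_y + l * n_values)

# e.g. a value never seen with class yk still gets nonzero probability:
print(smoothed_likelihood(0, 50, n_values=2))   # 1/52 instead of 0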
Naïve Bayes: Subtlety #2
Often the Xi are not really conditionally independent

•  We use Naïve Bayes in many cases anyway, and it often works pretty well
–  often the right classification, even when not the right
probability (see [Domingos&Pazzani, 1996])

•  What is effect on estimated P(Y|X)?


–  Special case: what if we add two copies: Xi = Xk (illustrated below)
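A small illustration of that special case, with made-up numbers: duplicating a feature counts its evidence twice, so the estimated posterior becomes overconfident even when the predicted class does not change.

def posterior(prior1, likes1, likes0):
    """P(Y=1 | x) for two-class Naive Bayes, given per-feature likelihoods for each class."""
    p1, p0 = prior1, 1 - prior1
    for l1, l0 in zip(likes1, likes0):
        p1 *= l1
        p0 *= l0
    return p1 / (p1 + p0)

# One informative feature observed, with P(x|Y=1)=0.8 and P(x|Y=0)=0.3 ...
print(posterior(0.5, [0.8], [0.3]))            # ~0.73
# ... versus the same feature accidentally included twice (Xi = Xk):
print(posterior(0.5, [0.8, 0.8], [0.3, 0.3]))  # ~0.88, overconfident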
Learning to classify text documents
•  Classify which emails are spam?
•  Classify which emails promise an attachment?
•  Classify which web pages are student home
pages?

How shall we represent text documents for Naïve Bayes?
Baseline: Bag of Words Approach
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1

Zaire 0
For code and data, see
www.cs.cmu.edu/~tom/mlbook.html
click on “Software and Data”
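A minimal sketch of the bag-of-words representation, assuming a fixed vocabulary list; the vocabulary and the document below are illustrative, not taken from the slide.

from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocabulary = ["aardvark", "about", "all", "africa", "apple", "gas", "oil", "zaire"]
doc = "All about the oil and gas dispute: all of Africa is watching"
print(bag_of_words(doc, vocabulary))   # [0, 1, 2, 1, 0, 1, 1, 0]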
What if we have continuous Xi ?
E.g., image classification: Xi is the ith pixel

Gaussian Naïve Bayes (GNB): assume

P(Xi = x | Y = yk) = N(x; μik, σik), i.e. a Gaussian with mean μik and standard deviation σik for each feature i and class k

Sometimes we additionally assume the variance
•  is independent of Y (i.e., σi),
•  or independent of Xi (i.e., σk),
•  or both (i.e., σ)
Gaussian (aka Normal) Distribution

p(x) = (1 / (σ √(2π))) exp( −(x − μ)² / (2σ²) ), with mean μ and variance σ²

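For reference, a direct transcription of this density into Python (a small helper used only for illustration):

import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(gaussian_pdf(0.0, 0.0, 1.0))   # ~0.3989, the peak of the standard normal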
Gaussian Naïve Bayes Algorithm – continuous Xi
(but still discrete Y)

•  Train Naïve Bayes (examples)
   for each value yk
      estimate* πk ≡ P(Y = yk)
   for each attribute Xi
      estimate the class conditional mean μik and variance σik²

•  Classify (Xnew)
   Ynew ← argmax over yk of πk Πi N(Xi_new; μik, σik)
   (a code sketch follows below)

* probabilities must sum to 1, so only (number of classes − 1) of the πk need be estimated
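A minimal Gaussian Naïve Bayes sketch in NumPy under the assumptions above (a separate mean and variance per feature and class); the function and variable names are illustrative, not from the slides.

import numpy as np

def train_gnb(X, y):
    """X: (m, n) array of continuous features, y: (m,) array of class labels.
    Returns class priors, per-class feature means, and per-class feature variances."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    variances = {c: X[y == c].var(axis=0) for c in classes}
    return priors, means, variances

def classify_gnb(x_new, priors, means, variances):
    """argmax_yk of log P(Y=yk) + sum_i log N(x_i; mu_ik, sigma_ik^2)."""
    def log_score(c):
        var = variances[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x_new - means[c]) ** 2 / var)
        return np.log(priors[c]) + log_lik
    return max(priors, key=log_score)

# Tiny synthetic example with two classes and two features:
X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 4.1], [2.9, 3.9]])
y = np.array([0, 0, 1, 1])
priors, means, variances = train_gnb(X, y)
print(classify_gnb(np.array([1.1, 2.1]), priors, means, variances))   # expect 0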


Estimating Parameters: Y discrete, Xi continuous

Maximum likelihood estimates (with j indexing training examples, i features, and k classes):

μik = ( Σj Xij δ(Yj = yk) ) / ( Σj δ(Yj = yk) )

σik² = ( Σj (Xij − μik)² δ(Yj = yk) ) / ( Σj δ(Yj = yk) )

where Xij is the ith feature of the jth training example, and δ(z) = 1 if z is true, else 0
GNB Example: Classify a person’s
cognitive activity, based on brain image

•  are they reading a sentence or viewing a picture?

•  reading the word “Hammer” or “Apartment”?

•  viewing a vertical or horizontal line?

•  answering the question, or getting confused?


Stimuli for our study: [image of an example stimulus, “ant”]; 60 distinct exemplars, presented 6 times each


[Figure: three brain maps with an fMRI activation color scale from below average to high.
 Panel 1: fMRI voxel means for “bottle”: the means defining P(Xi | Y = “bottle”).
 Panel 2: mean fMRI activation over all stimuli.
 Panel 3: “bottle” minus mean activation.]
Scaling up: 60 exemplars

Categories            Exemplars
BODY PARTS            leg arm eye foot hand
FURNITURE             chair table bed desk dresser
VEHICLES              car airplane train truck bicycle
ANIMALS               horse dog bear cow cat
KITCHEN UTENSILS      glass knife bottle cup spoon
TOOLS                 chisel hammer screwdriver pliers saw
BUILDINGS             apartment barn house church igloo
PART OF A BUILDING    window door chimney closet arch
CLOTHING              coat dress shirt skirt pants
INSECTS               fly ant bee butterfly beetle
VEGETABLES            lettuce tomato carrot corn celery
MAN MADE OBJECTS      refrigerator key telephone watch bell
Rank Accuracy Distinguishing among 60 words
Where in the brain is activity that distinguishes tools vs. buildings?

[Figure: accuracy of a searchlight classifier, trained on the cluster of voxels within radius 1 of each voxel, shown at every voxel. Accuracies of these cubical 27-voxel classifiers centered at each significant voxel fall in the 0.7-0.8 range.]
What you should know:
•  Training and using classifiers based on Bayes rule

•  Conditional independence
–  What it is
–  Why it’s important

•  Naïve Bayes
–  What it is
–  Why we use it so much
–  Training using MLE, MAP estimates
–  Discrete variables (Bernoulli) and continuous (Gaussian)
Questions:
•  Can you use Naïve Bayes for a combination of
discrete and real-valued Xi?

•  How can we easily model just 2 of n attributes as dependent?

•  What does the decision surface of a Naïve Bayes classifier look like?
What is the form of the decision surface for a Naïve Bayes classifier?
