Bayesian Classification Examples

- Naive Bayesian classifiers are based on Bayes' theorem and assume attribute independence.
- They calculate the probability of a new data point belonging to each class based on the probabilities of the attributes given each class.
- While making a strong independence assumption, naive Bayes classifiers are fast, simple, and often highly accurate in practice.


Bayes Classifiers

We are about to see some of the mathematical formalisms and examples, but keep in mind the basic idea:

Find out the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.
Bayes Classifiers

• Bayesian classifiers use Bayes' theorem, which says

  p(cj | d) = p(d | cj) p(cj) / p(d)

• p(cj | d) = probability of instance d being in class cj.
  This is what we are trying to compute.
• p(d | cj) = probability of generating instance d given class cj.
  We can imagine that being in class cj causes you to have feature d with some probability.
• p(cj) = probability of occurrence of class cj.
  This is just how frequent the class cj is in our database.
• p(d) = probability of instance d occurring.
  This can actually be ignored, since it is the same for all classes.
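Concretely, since p(d) is the same for every class, we can compare the unnormalized products p(d | cj) p(cj) and pick the largest. A minimal Python sketch, using made-up priors and likelihoods (these numbers are purely illustrative):

```python
# Bayes-rule classification with made-up numbers (illustrative only).
priors = {"male": 0.5, "female": 0.5}        # p(cj)
likelihood = {"male": 0.10, "female": 0.25}  # p(d | cj) for one observed instance d

# p(d) is the same for every class, so comparing p(d | cj) * p(cj) is enough.
scores = {c: likelihood[c] * priors[c] for c in priors}
prediction = max(scores, key=scores.get)

print(scores)      # unnormalized posteriors
print(prediction)  # the most probable class: "female" here
```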
Assume that we have two classes: c1 = male and c2 = female. (Note: "Drew" can be a male or a female name, e.g. Drew Barrymore, Drew Carey.)

We have a person whose gender we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking which is more probable: p(male | drew) or p(female | drew)?

  p(male | drew) = p(drew | male) p(male) / p(drew)

where
• p(drew | male) is the probability of being called "drew" given that you are a male,
• p(male) is the probability of being a male,
• p(drew) is the probability of being named "drew" (actually irrelevant, since it is the same for all classes).

This is Officer Drew. Is Officer Drew a male or a female?
Luckily, we have a small database with names and gender. We can use it to apply Bayes' rule:

  p(cj | d) = p(d | cj) p(cj) / p(d)

Name     gender
Drew     Male
Claudia  Female
Drew     Female
Drew     Female
Alberto  Male
Karin    Female
Nina     Female
Sergio   Male

For Officer Drew:

  p(male | drew)   = (1/3 * 3/8) / (3/8)
  p(female | drew) = (2/5 * 5/8) / (3/8)

The denominator p(drew) = 3/8 is the same for both classes, so we only compare the numerators: 1/3 * 3/8 = 0.125 for male versus 2/5 * 5/8 = 0.250 for female. Officer Drew is more likely to be a female.

Officer Drew IS a female!
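A short Python sketch that reproduces these numbers by counting rows of the table above:

```python
from collections import Counter

# The small name/gender database from the table above.
data = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

name = "Drew"
class_counts = Counter(gender for _, gender in data)   # 3 Male, 5 Female

for c in class_counts:
    prior = class_counts[c] / len(data)                             # p(cj)
    likelihood = (sum(1 for n, g in data if n == name and g == c)
                  / class_counts[c])                                # p(drew | cj)
    print(c, likelihood * prior)   # numerator of Bayes' rule
# Male   0.125
# Female 0.25
```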

So far we have only considered Bayes classification when we have one attribute (the "name"). But we may have many features. How do we use all the features?

  p(cj | d) = p(d | cj) p(cj) / p(d)

Name     Over 170cm  Eye    Hair length  gender
Drew     No          Blue   Short        Male
Claudia  Yes         Brown  Long         Female
Drew     No          Blue   Long         Female
Drew     No          Blue   Long         Female
Alberto  Yes         Brown  Short        Male
Karin    No          Blue   Long         Female
Nina     Yes         Brown  Short        Female
Sergio   Yes         Blue   Long         Male
• To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate

  p(d | cj) = p(d1 | cj) * p(d2 | cj) * … * p(dn | cj)

That is, the probability of class cj generating instance d equals the probability of class cj generating the observed value for feature 1, multiplied by the probability of class cj generating the observed value for feature 2, multiplied by…, and so on for each feature.
Applying this to Officer Drew, who is blue-eyed, over 170cm tall, and has long hair:

  p(officer drew | cj) = p(over_170cm = yes | cj) * p(eye = blue | cj) * …

  p(officer drew | Female) = 2/5 * 3/5 * …
  p(officer drew | Male)   = 2/3 * 2/3 * …
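A short Python sketch of the same product, restricted to the two features worked out above (counts come from the earlier table); multiplying by the class prior p(cj) would complete Bayes' rule:

```python
# Naive Bayes likelihoods with two categorical features (from the table above).
data = [
    {"over_170cm": "No",  "eye": "Blue",  "gender": "Male"},
    {"over_170cm": "Yes", "eye": "Brown", "gender": "Female"},
    {"over_170cm": "No",  "eye": "Blue",  "gender": "Female"},
    {"over_170cm": "No",  "eye": "Blue",  "gender": "Female"},
    {"over_170cm": "Yes", "eye": "Brown", "gender": "Male"},
    {"over_170cm": "No",  "eye": "Blue",  "gender": "Female"},
    {"over_170cm": "Yes", "eye": "Brown", "gender": "Female"},
    {"over_170cm": "Yes", "eye": "Blue",  "gender": "Male"},
]

officer_drew = {"over_170cm": "Yes", "eye": "Blue"}

for c in ("Male", "Female"):
    rows = [r for r in data if r["gender"] == c]
    likelihood = 1.0
    for feature, value in officer_drew.items():
        # p(feature = value | class), estimated by counting
        likelihood *= sum(1 for r in rows if r[feature] == value) / len(rows)
    print(c, likelihood)   # Male: 2/3 * 2/3 ≈ 0.444, Female: 2/5 * 3/5 = 0.24
```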
The naive Bayes classifier is often represented as a graph with the class node cj at the top and an arrow pointing down to each feature node p(d1|cj), p(d2|cj), …, p(dn|cj). Note the direction of the arrows, which state that each class causes certain features, with a certain probability.

Naïve Bayes is fast and space efficient: we can look up all the probabilities with a single scan of the database and store them in a (small) table…

gender  Over 190cm  p
Male    Yes         0.15
Male    No          0.85
Female  Yes         0.01
Female  No          0.99

gender  Long Hair  p
Male    Yes        0.05
Male    No         0.95
Female  Yes        0.70
Female  No         0.30

…and similarly one small table for each remaining feature.
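A minimal sketch of building such lookup tables in a single pass over the database (the two example records here are illustrative placeholders, not the slide data):

```python
from collections import defaultdict

# Illustrative records; in practice this would be the full database.
records = [
    {"gender": "Male",   "over_190cm": "No", "long_hair": "No"},
    {"gender": "Female", "over_190cm": "No", "long_hair": "Yes"},
]

counts = defaultdict(int)        # (class, feature, value) -> count
class_totals = defaultdict(int)  # class -> count

for r in records:                # the single scan
    c = r["gender"]
    class_totals[c] += 1
    for feature, value in r.items():
        if feature != "gender":
            counts[(c, feature, value)] += 1

def cond_prob(c, feature, value):
    # p(feature = value | class), a cheap table lookup after the scan
    return counts[(c, feature, value)] / class_totals[c]

print(cond_prob("Female", "long_hair", "Yes"))   # 1.0 for these toy records
```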
Naïve Bayes is NOT sensitive to irrelevant features...

Suppose we are trying to classify a person's gender based on several features, including eye color. (Of course, eye color is completely irrelevant to a person's gender.)

  p(Jessica | cj) = p(eye = brown | cj) * p(wears_dress = yes | cj) * …

  p(Jessica | Female) = 9,000/10,000 * 9,975/10,000 * …
  p(Jessica | Male)   = 9,001/10,000 * 2/10,000 * …

The eye-color factors are almost the same for both classes, so they barely affect the final comparison.

However, this assumes that we have good enough estimates of the probabilities, so the more data the better.
An obvious point: I have used a simple two-class problem, and two possible values for each feature, in my previous examples. However, we can have an arbitrary number of classes or feature values, with the same graph structure (the class node cj pointing to the feature nodes p(d1|cj), p(d2|cj), …, p(dn|cj)). For example:

Animal  Mass > 10kg  p
Cat     Yes          0.15
Cat     No           0.85
Dog     Yes          0.91
Dog     No           0.09
Pig     Yes          0.99
Pig     No           0.01

Animal  Color  p
Cat     Black  0.33
Cat     White  0.23
Cat     Brown  0.44
Dog     Black  0.97
Dog     White  0.03
Pig     Black  0.04
Pig     White  0.01
Pig     Brown  0.90
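With more than two classes the rule is unchanged: compute the product for each class and pick the largest. A Python sketch using values from the tables above (the equal class priors are an assumption, since the slides do not give them):

```python
# Three-class naive Bayes for an animal that is over 10kg and black.
priors = {"Cat": 1/3, "Dog": 1/3, "Pig": 1/3}    # assumed equal priors

p_mass_yes    = {"Cat": 0.15, "Dog": 0.91, "Pig": 0.99}   # p(mass > 10kg = Yes | class)
p_color_black = {"Cat": 0.33, "Dog": 0.97, "Pig": 0.04}   # p(color = Black | class)

scores = {c: priors[c] * p_mass_yes[c] * p_color_black[c] for c in priors}
print(scores)
print(max(scores, key=scores.get))   # "Dog" has the largest score here
```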
Problem! Naïve Bayesian Classifier

Naïve Bayes assumes independence of features… but some attributes are clearly related. For example, "over 6 foot" and "over 200 pounds" are not independent: tall people tend to be heavier. Yet naive Bayes models them with separate tables:

gender  Over 6 foot  p
Male    Yes          0.15
Male    No           0.85
Female  Yes          0.01
Female  No           0.99

gender  Over 200 pounds  p
Male    Yes              0.11
Male    No               0.80
Female  Yes              0.05
Female  No               0.95
Solution: Naïve Bayesian Classifier

Consider the relationships between attributes… Instead of treating "over 200 pounds" as independent of height, we can model it jointly with "over 6 foot":

gender  Over 6 foot  p
Male    Yes          0.15
Male    No           0.85
Female  Yes          0.01
Female  No           0.99

gender  Over 200 pounds          p
Male    Yes and Over 6 foot      0.11
Male    No and Over 6 foot       0.59
Male    Yes and NOT Over 6 foot  0.05
Male    No and NOT Over 6 foot   0.35
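A sketch of how the lookup changes under this approach: the weight factor is read from the joint table for the observed height, rather than from a table of its own. (The dictionary structure and function below are illustrative, and only the Male entries from the table above are filled in.)

```python
# p(over_200_pounds = w AND over_6_foot = h | gender), Male entries from the table above.
p_weight_and_height = {
    ("Male", "Yes", "Yes"): 0.11,  # over 200 pounds, over 6 foot
    ("Male", "No",  "Yes"): 0.59,  # not over 200 pounds, over 6 foot
    ("Male", "Yes", "No"):  0.05,  # over 200 pounds, NOT over 6 foot
    ("Male", "No",  "No"):  0.35,  # not over 200 pounds, NOT over 6 foot
}

def height_weight_factor(gender, over_6_foot, over_200_pounds):
    # Replaces the naive product p(over_6_foot | gender) * p(over_200_pounds | gender)
    # with a single joint lookup that captures the dependence between the two features.
    return p_weight_and_height[(gender, over_200_pounds, over_6_foot)]

print(height_weight_factor("Male", over_6_foot="Yes", over_200_pounds="Yes"))   # 0.11
```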
But how do we find the set of connecting arcs??


Advantages/Disadvantages of Naïve Bayes
• Advantages:
– Fast to train (single scan). Fast to classify
– Not sensitive to irrelevant features
– Handles real and discrete data
– Handles streaming data well
• Disadvantages:
– Assumes independence of features
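For reference, a minimal sketch of running a naive Bayes classifier in practice with scikit-learn's GaussianNB (the library choice and toy data are assumptions; the slides themselves do not use any particular library):

```python
# Gaussian naive Bayes on numeric features with scikit-learn.
from sklearn.naive_bayes import GaussianNB

# toy data: [height_cm, weight_kg] with a gender label
X = [[180, 85], [165, 55], [175, 80], [160, 50]]
y = ["Male", "Female", "Male", "Female"]

clf = GaussianNB()   # models each feature as an independent Gaussian per class
clf.fit(X, y)        # training is essentially a single pass over the data
print(clf.predict([[170, 70]]))
```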
