ML-09-naive-bayes-classifier
Machine Learning
● Bayes theorem:

    P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}
● Given:
  – A doctor knows that meningitis causes a stiff neck 50% of the time
  – Prior probability of any patient having meningitis is 1/50,000
  – Prior probability of any patient having a stiff neck is 1/20
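With S = stiff neck and M = meningitis, plugging these numbers into Bayes' theorem answers the implied question (how likely is meningitis given a stiff neck?):

    P(M \mid S) = \frac{P(S \mid M)\, P(M)}{P(S)} = \frac{0.5 \times 1/50{,}000}{1/20} = 0.0002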
Bayesian Classifiers
● Given a record with attributes (A_1, A_2, …, A_n)
  – Goal is to predict class C
  – Specifically, we want to find the value of C that maximizes P(C \mid A_1, A_2, …, A_n)
● Approach:
  – Compute the posterior probability P(C \mid A_1, A_2, …, A_n) for all values of C using Bayes' theorem:

        P(C \mid A_1 A_2 \cdots A_n) = \frac{P(A_1 A_2 \cdots A_n \mid C)\, P(C)}{P(A_1 A_2 \cdots A_n)}

    where P(A_1 A_2 \cdots A_n \mid C) is the class-conditional probability and P(C) is the prior probability
  – Since the denominator is the same for all values of C, this is equivalent to choosing the value of C that maximizes P(A_1 A_2 \cdots A_n \mid C)\, P(C)
Conditional independence: basics
● An example:
  – People's level of reading skill tends to increase with the length of their arm
  – Explanation: both increase with a person's age
  – If age is given, arm length and reading skill are (conditionally) independent
● Derivation:

    P(X, Y \mid Z) = P(X, Y, Z) / P(Z)
                   = [P(X, Y, Z) / P(Y, Z)] \cdot [P(Y, Z) / P(Z)]
                   = P(X \mid Y, Z) \cdot P(Y \mid Z)
                   = P(X \mid Z) \cdot P(Y \mid Z)

  – the last step holds when X and Y are conditionally independent given Z, i.e., P(X \mid Y, Z) = P(X \mid Z)
● Hence, conditional independence means: P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)
● NB assumption:

    P(A_1, A_2, …, A_n \mid C) = P(A_1 \mid C)\, P(A_2 \mid C) \cdots P(A_n \mid C)
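Putting the assumption to work: a minimal sketch of the resulting decision rule, choosing the class that maximizes P(C) · ∏ P(A_i | C). All probability values in it are made-up placeholders, not estimates from any dataset:

    # Naive Bayes decision rule under the independence assumption:
    # predict argmax_C  P(C) * prod_i P(A_i | C).
    # All probability values below are illustrative placeholders.

    priors = {"yes": 0.3, "no": 0.7}          # P(C)
    conditionals = {                           # P(A_i = value | C)
        ("A1", "x"): {"yes": 0.6, "no": 0.2},
        ("A2", "y"): {"yes": 0.1, "no": 0.5},
    }

    def predict(record):
        """Return the class with the largest prior-times-likelihood product."""
        scores = {}
        for c, prior in priors.items():
            score = prior
            for attr_value in record.items():  # (attribute, value) pairs
                score *= conditionals[attr_value][c]
            scores[c] = score
        return max(scores, key=scores.get)

    print(predict({"A1": "x", "A2": "y"}))  # -> 'no', since 0.7*0.2*0.5 > 0.3*0.6*0.1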
How to Estimate Probabilities from Data?

    Tid | Refund | Marital Status | Taxable Income | Evade
    ----+--------+----------------+----------------+------
     1  | Yes    | Single         | 125K           | No
     2  | No     | Married        | 100K           | No
     3  | No     | Single         | 70K            | No
     4  | Yes    | Married        | 120K           | No
     5  | No     | Divorced       | 95K            | Yes
     6  | No     | Married        | 60K            | No
     7  | Yes    | Divorced       | 220K           | No
     8  | No     | Single         | 85K            | Yes
     9  | No     | Married        | 75K            | No
    10  | No     | Single         | 90K            | Yes

(Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class.)

● Class: P(C) = N_c / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10
● For discrete attributes:

    P(A_i \mid C_k) = |A_{ik}| / N_{c_k}

  – where |A_{ik}| is the number of instances having attribute value A_i and belonging to class C_k
  – Examples:
    P(Status=Married \mid No) = 4/7
    P(Refund=Yes \mid Yes) = 0
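As a concrete sketch, these counting estimates can be reproduced in a few lines of Python; the records list transcribes the table above (Taxable Income omitted, since only the discrete attributes are used here):

    from collections import Counter, defaultdict

    # (Refund, Marital Status, Evade) for the ten training records above.
    records = [
        ("Yes", "Single",   "No"),  ("No", "Married", "No"),
        ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
        ("No",  "Divorced", "Yes"), ("No", "Married", "No"),
        ("Yes", "Divorced", "No"),  ("No", "Single",  "Yes"),
        ("No",  "Married",  "No"),  ("No", "Single",  "Yes"),
    ]

    # Class priors: P(C) = N_c / N
    class_counts = Counter(r[-1] for r in records)
    N = len(records)
    priors = {c: n / N for c, n in class_counts.items()}
    print(priors)  # {'No': 0.7, 'Yes': 0.3}

    # Discrete conditionals: P(A_i = v | C_k) = |A_ik| / N_ck
    attr_names = ["Refund", "Status"]
    cond = defaultdict(float)
    for *attrs, c in records:
        for name, value in zip(attr_names, attrs):
            cond[(name, value, c)] += 1 / class_counts[c]

    print(cond[("Status", "Married", "No")])  # 4/7 ≈ 0.571
    print(cond[("Refund", "Yes", "Yes")])     # 0.0 (no matching instances)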
How to Estimate Probabilities from Data?
● For continuous attributes, two options:
  – Discretize the range into bins
    · one ordinal attribute per bin
  – Probability density estimation:
    · Assume the attribute follows a Gaussian / normal distribution
    · Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    · Once the probability distribution is known, it can be used to estimate the conditional probability P(A_i \mid c)
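A quick sketch of the first option, discretizing a continuous attribute into ordinal bins; the bin edges and labels here are arbitrary choices for illustration:

    # Discretize Taxable Income into bins; each bin then behaves like one
    # value of a discrete attribute. Bin edges below are illustrative only.
    incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]  # in K, from the table

    bin_edges = [0, 80, 100, 150, float("inf")]  # arbitrary example edges
    labels = ["low", "medium", "high", "very_high"]

    def to_bin(x):
        """Map a continuous value to the label of the bin it falls in."""
        for edge, label in zip(bin_edges[1:], labels):
            if x < edge:
                return label
        return labels[-1]

    print([to_bin(x) for x in incomes])
    # ['high', 'high', 'low', 'high', 'medium', 'low', 'very_high', 'medium', 'low', 'medium']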
How to Estimate Probabilities from Data?

● Normal distribution (using the same training data as above):

    P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}}\, e^{-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}}

  – One for each (A_i, c_j) pair
● For (Income, Class=No):
  – If Class=No:
    · sample mean = 110
    · sample variance = 2975

    P(Income = 120 \mid No) = \frac{1}{\sqrt{2\pi \cdot 2975}}\, e^{-\frac{(120 - 110)^2}{2 \cdot 2975}} = 0.0072
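A sketch reproducing the numbers above for (Income, Class=No) in plain Python, with no particular library assumed:

    import math

    # Taxable Income values (in K) for the Class=No records in the table.
    incomes_no = [125, 100, 70, 120, 60, 220, 75]

    # Estimate the Gaussian parameters from the data.
    n = len(incomes_no)
    mean = sum(incomes_no) / n                                # 110.0
    var = sum((x - mean) ** 2 for x in incomes_no) / (n - 1)  # 2975.0 (sample variance)

    def gaussian_density(x, mu, sigma2):
        """Normal density, used as the class-conditional P(A_i | c_j)."""
        return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

    print(gaussian_density(120, mean, var))  # ≈ 0.0072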
● If one of the conditional probabilities is zero (e.g., P(Refund=Yes \mid Yes) = 0 above), the entire product becomes zero, so smoothed estimates are used:

    Original:    P(A_i \mid C) = \frac{N_{ic}}{N_c}

    Laplace:     P(A_i \mid C) = \frac{N_{ic} + 1}{N_c + c}

    m-estimate:  P(A_i \mid C) = \frac{N_{ic} + m\,p}{N_c + m}

  – c: number of classes
  – p: prior probability
  – m: parameter
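The three estimators side by side, as a small sketch; N_ic, N_c, c, p, and m follow the definitions above, and the particular m and p values in the demo are chosen purely for illustration:

    def original_estimate(n_ic, n_c):
        """P(A_i | C) = N_ic / N_c; can be exactly zero."""
        return n_ic / n_c

    def laplace_estimate(n_ic, n_c, c):
        """P(A_i | C) = (N_ic + 1) / (N_c + c); never zero."""
        return (n_ic + 1) / (n_c + c)

    def m_estimate(n_ic, n_c, m, p):
        """P(A_i | C) = (N_ic + m*p) / (N_c + m); shrinks toward prior p."""
        return (n_ic + m * p) / (n_c + m)

    # P(Refund=Yes | Yes) from the table: N_ic = 0, N_c = 3, with c = 2 classes.
    print(original_estimate(0, 3))        # 0.0 -- wipes out the whole product
    print(laplace_estimate(0, 3, 2))      # 0.2
    print(m_estimate(0, 3, m=3, p=0.3))   # 0.15 (m and p chosen for illustration)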
Naïve Bayes: Pros and Cons