Lec4 - Probability Theory and Naive Bayes Classifier
Try It...

Joint distribution:

             alarm   ¬alarm
burglary      .09     .01
¬burglary     .10     .80

• P(alarm | burglary) = ??
• P(burglary | alarm) = ??
• P(burglary ∧ alarm) = ??
• P(alarm) = ??

• Computing conditional probability:
  – P(a | b) = P(a ∧ b) / P(b)
  – P(b): normalizing constant
• Product rule:
  – P(a ∧ b) = P(a | b) P(b)
• Marginalizing:
  – P(B) = Σa P(B, a)
  – P(B) = Σa P(B | a) P(a)   (conditioning)
Probability Theory (cont.)

• Conditional probability: the probability of an effect given its causes
  – P(a | b) = P(a ∧ b) / P(b)
  – P(b): normalizing constant
• Product rule:
  – P(a ∧ b) = P(a | b) P(b)
• Marginalizing:
  – P(B) = Σa P(B, a)
  – P(B) = Σa P(B | a) P(a)   (conditioning)

Worked answers:
• P(alarm | burglary) = P(alarm ∧ burglary) / P(burglary) = .09 / .10 = .9
• P(alarm) = P(alarm ∧ burglary) + P(alarm ∧ ¬burglary) = .09 + .10 = .19
• P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm) = .09 / .19 = .47
• P(burglary ∧ alarm) = P(burglary | alarm) P(alarm) = .47 × .19 ≈ .09
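These manipulations are mechanical enough to script. Below is a minimal Python sketch (not from the slides) that encodes the 2×2 joint table above and recovers each of the quantities just computed:

```python
# Joint distribution over (Burglary, Alarm) from the 2x2 table above.
joint = {
    (True,  True):  0.09,  # burglary,  alarm
    (True,  False): 0.01,  # burglary,  ¬alarm
    (False, True):  0.10,  # ¬burglary, alarm
    (False, False): 0.80,  # ¬burglary, ¬alarm
}

def p_alarm(a):
    """Marginalize burglary out: P(Alarm=a) = Σ_b P(b, a)."""
    return sum(joint[(b, a)] for b in (True, False))

def p_burglary(b):
    """Marginalize alarm out: P(Burglary=b) = Σ_a P(b, a)."""
    return sum(joint[(b, a)] for a in (True, False))

# Conditional probability: P(a | b) = P(a ∧ b) / P(b).
p_alarm_given_burglary = joint[(True, True)] / p_burglary(True)  # 0.9
p_burglary_given_alarm = joint[(True, True)] / p_alarm(True)     # ≈ 0.47

print(p_alarm(True))             # 0.19
print(p_alarm_given_burglary)    # 0.9
print(p_burglary_given_alarm)    # 0.4736...
```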
Conditional Independence

• Absolute independence:
  – A and B are independent if P(A ∧ B) = P(A) P(B); equivalently,
    P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
  – P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
  – P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still
  useful in decomposing the full joint probability distribution
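A quick way to see what the decomposition buys us: storing only P(A | C), P(B | C), and P(C) is enough to rebuild the full joint. The Python sketch below uses made-up numbers (loosely in the spirit of the Moon-Phase / Burglary / Light-Level example; none of the values are from the slides):

```python
import itertools

# Assumed (hypothetical) tables; C could be read as Light-Level,
# A as Burglary, B as Full-Moon. All values are illustrative only.
p_c = {True: 0.3, False: 0.7}           # P(C)
p_a_given_c = {True: 0.2, False: 0.05}  # P(A=True | C)
p_b_given_c = {True: 0.6, False: 0.4}   # P(B=True | C)

def bern(p, x):
    """P(X=x) for a Bernoulli variable with P(X=True) = p."""
    return p if x else 1.0 - p

# Rebuild the joint from the three small tables:
# P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
joint = {
    (a, b, c): bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b) * p_c[c]
    for a, b, c in itertools.product((True, False), repeat=3)
}

assert abs(sum(joint.values()) - 1.0) < 1e-12  # a valid distribution
```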
Events: A and M each record whether someone was late to class on a given day
(L = late, O = on time)

Dataset 1:              Dataset 2:
        A   M                   A   M
day1    L   L           day1    L   L
day2    O   L           day2    L   L
day3    O   O           day3    L   O
day4    O   O           day4    L   O
day5    L   L           day5    L   L
day6    O   L           day6    L   L
day7    O   L           day7    L   L
day8    O   O           day8    L   O
day9    O   L           day9    L   L
day10   L   O           day10   L   O

Event: S = Strike

        A   M   S               A   M   S
day1    L   L   Y       day1    O   L   Y
day2    O   L   N       day2    O   L   N
day3    O   O   N       day3    O   O   N
day4    O   O   N       day4    O   O   N
day5    L   L   N       day5    L   L   N
day6    O   L   N       day6    O   L   N
day7    O   L   N       day7    O   L   N
day8    O   O   N       day8    O   O   N
day9    O   L   N       day9    O   L   N
day10   L   O   Y       day10   L   O   Y
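Reading probabilities off such tables is just counting. A small Python sketch over Dataset 1 above checks the independence equation P(A=L ∧ M=L) = P(A=L) P(M=L) empirically (treating A and M as late/on-time indicators is an assumption about this example):

```python
# Dataset 1 from the table above: one (A, M) pair per day.
days = [('L','L'), ('O','L'), ('O','O'), ('O','O'), ('L','L'),
        ('O','L'), ('O','L'), ('O','O'), ('O','L'), ('L','O')]

n = len(days)
p_a  = sum(1 for a, m in days if a == 'L') / n              # P(A=L)  = 0.3
p_m  = sum(1 for a, m in days if m == 'L') / n              # P(M=L)  = 0.6
p_am = sum(1 for a, m in days if (a, m) == ('L','L')) / n   # P(A=L ∧ M=L) = 0.2

# Independence would require P(A=L ∧ M=L) == P(A=L) * P(M=L).
print(p_am, p_a * p_m)   # 0.2 vs 0.18 -> not exactly independent
```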
Naïve Bayes

• Bayes classification:
    P(C | X) ∝ P(X | C) P(C) = P(X1, …, Xn | C) P(C)
  – Difficulty: learning the joint probability P(X1, …, Xn | C)
  – P(X) is not considered: it is the same for every class, so it cannot
    change which class maximizes the posterior
  – Naïve Bayes assumption: the features are conditionally independent
    given the class, so P(X1, …, Xn | C) = P(X1 | C) ⋯ P(Xn | C)
Example

• Learning Phase (over the 14-example Play Tennis training set):
    P(Play=Yes) = 9/14    P(Play=No) = 5/14

    Outlook    Play=Yes  Play=No     Temperature  Play=Yes  Play=No
    Sunny        2/9       3/5       Hot            2/9       2/5
    Overcast     4/9       0/5       Mild           4/9       2/5
    Rain         3/9       2/5       Cool           3/9       1/5

    Humidity   Play=Yes  Play=No     Wind         Play=Yes  Play=No
    High         3/9       4/5       Strong         3/9       3/5
    Normal       6/9       1/5       Weak           6/9       2/5
Example

• Test Phase
  – Given a new instance, predict its label:
      x′ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
  – Look up the tables obtained in the learning phase:
      P(Outlook=Sunny | Yes) = 2/9      P(Outlook=Sunny | No) = 3/5
      P(Temperature=Cool | Yes) = 3/9   P(Temperature=Cool | No) = 1/5
      P(Humidity=High | Yes) = 3/9      P(Humidity=High | No) = 4/5
      P(Wind=Strong | Yes) = 3/9        P(Wind=Strong | No) = 3/5
      P(Play=Yes) = 9/14                P(Play=No) = 5/14
  – Decision making:
      P(Yes | x′) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Yes) ≈ 0.0053
      P(No | x′) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(No) ≈ 0.0206
      Since P(Yes | x′) < P(No | x′), we label x′ "No".
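The decision step is just a product of looked-up estimates and the class prior. A minimal Python sketch (using exact fractions) that reproduces the two scores above:

```python
from fractions import Fraction as F

# Class priors and conditional probability tables from the learning phase.
p_yes, p_no = F(9, 14), F(5, 14)
likelihood = {  # feature value -> (P(value | Yes), P(value | No))
    'Outlook=Sunny':    (F(2, 9), F(3, 5)),
    'Temperature=Cool': (F(3, 9), F(1, 5)),
    'Humidity=High':    (F(3, 9), F(4, 5)),
    'Wind=Strong':      (F(3, 9), F(3, 5)),
}

x = ['Outlook=Sunny', 'Temperature=Cool', 'Humidity=High', 'Wind=Strong']

score_yes, score_no = p_yes, p_no
for v in x:
    score_yes *= likelihood[v][0]
    score_no  *= likelihood[v][1]

print(float(score_yes))  # ≈ 0.0053
print(float(score_no))   # ≈ 0.0206
print('No' if score_no > score_yes else 'Yes')  # -> 'No'
```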
Example: a diagnostic test T for HIV with
    P(T | HIV) = 95%    (sensitivity)
    P(¬T | ¬HIV) = 95%  (specificity)
By Bayes' rule, P(HIV | T) also depends on the prior P(HIV); when the
disease is rare, P(HIV | T) can be far below 95%.
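To make the base-rate effect concrete, here is a sketch with an assumed prior P(HIV) = 1% (the prior is not given above; it is a placeholder):

```python
def posterior(prior, sensitivity, specificity):
    """P(HIV | T) by Bayes' rule, for a positive test result T."""
    # P(T) by marginalizing over HIV / ¬HIV.
    p_t = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_t

# Sensitivity and specificity from the example; the 1% prior is an assumption.
print(posterior(prior=0.01, sensitivity=0.95, specificity=0.95))  # ≈ 0.161
```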
Naïve Bayes

• Algorithm: Continuous-valued Features
  – A continuous feature takes uncountably many values, so per-value lookup
    tables no longer work
  – The conditional probability is often modeled with the normal distribution:
      P̂(Xj | C = ci) = (1 / (√(2π) σji)) · exp( −(Xj − μji)² / (2 σji²) )
      μji: mean (average) of the feature values Xj of the examples for which C = ci
      σji: standard deviation of the feature values Xj of the examples for which C = ci
  – Learning Phase: for X = (X1, …, Xn) and C = c1, …, cL,
    output n × L normal distributions and the priors P(C = ci), i = 1, …, L
  – Test Phase: given an unknown instance X′ = (a′1, …, a′n),
    • instead of looking up tables, calculate the conditional probabilities
      from the normal distributions obtained in the learning phase
    • then make the decision as in the discrete case
Naïve Bayes

• Example: Continuous-valued Features
  – Temperature is naturally of continuous value.
      Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
      No: 27.3, 30.1, 17.4, 29.5, 15.1
  – Estimate the mean and standard deviation for each class:
      μ̂ = (1/N) Σn xn,   σ̂² = (1/(N−1)) Σn (xn − μ̂)²
      Yes: μ̂ = 21.64, σ̂ = 2.35
      No:  μ̂ = 23.88, σ̂ = 7.09
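A short sketch that reproduces these estimates and evaluates the class-conditional densities for a new reading (the test value 22.0 is an arbitrary illustration, not from the slides):

```python
import math

def mean_std(xs):
    """Sample mean and (N-1)-normalized standard deviation."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)
    return mu, math.sqrt(var)

def gaussian(x, mu, sigma):
    """Normal density N(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

yes = [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8]
no  = [27.3, 30.1, 17.4, 29.5, 15.1]

mu_y, sd_y = mean_std(yes)  # ≈ 21.64, 2.35
mu_n, sd_n = mean_std(no)   # ≈ 23.88, 7.09

# Class-conditional densities for an arbitrary new temperature:
t = 22.0
print(gaussian(t, mu_y, sd_y))  # P̂(Temperature=22.0 | Yes)
print(gaussian(t, mu_n, sd_n))  # P̂(Temperature=22.0 | No)
```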
Relevant Issues

• Violation of the Independence Assumption
  – For many real-world tasks, P(X1, …, Xn | C) ≠ P(X1 | C) ⋯ P(Xn | C)
  – Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero Conditional Probability Problem
  – If no training example of class ci contains the feature value Xj = ajk,
    then P̂(Xj = ajk | C = ci) = 0
  – In this circumstance, the whole product
    P̂(x1 | ci) ⋯ P̂(ajk | ci) ⋯ P̂(xn | ci) = 0 during test
  – For a remedy, conditional probabilities are re-estimated with the
    m-estimate:
      P̂(Xj = ajk | C = ci) = (nc + m·p) / (n + m)
      n:  number of training examples for which C = ci
      nc: number of those examples for which Xj = ajk
      p:  prior estimate of the probability (e.g., p = 1/t for t values of Xj)
      m:  weight given to the prior (number of "virtual" examples, m ≥ 1)
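A sketch of the re-estimation, using Laplace (add-one) smoothing as the special case m = t, p = 1/t:

```python
def m_estimate(n_c, n, p, m):
    """Smoothed estimate: P̂(Xj = ajk | C = ci) = (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# In the Play Tennis tables, Outlook=Overcast never occurs with Play=No (0/5),
# which would zero out any product it appears in. With Laplace smoothing
# (m = t, p = 1/t, for the t = 3 Outlook values):
t = 3
print(m_estimate(n_c=0, n=5, p=1/t, m=t))  # (0 + 1) / (5 + 3) = 0.125, not 0
```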