
CSE710 Advanced Artificial Intelligence

Lecture 6

Probability Theory and Naive Bayes Classifier

Dr. Md. Golam Rabiul Alam


BRAC University
Probability Theory: Definitions

• Computing conditional probability:
  – P(a | b) = P(a ∧ b) / P(b)
  – P(b): normalizing constant
• Product rule:
  – P(a ∧ b) = P(a | b) P(b)
• Marginalizing:
  – P(B) = Σ_a P(B, a)
  – P(B) = Σ_a P(B | a) P(a)   (conditioning)

Try It...

Joint distribution:

              alarm    ¬alarm
  burglary     .09      .01
  ¬burglary    .10      .80

• P(alarm | burglary) = ??
• P(burglary | alarm) = ??
• P(burglary ∧ alarm) = ??
• P(alarm) = ??

Reminders:
• Computing conditional probability: P(a | b) = P(a ∧ b) / P(b), where P(b) is a normalizing constant
• Product rule: P(a ∧ b) = P(a | b) P(b)
• Marginalizing: P(B) = Σ_a P(B, a) = Σ_a P(B | a) P(a)   (conditioning)

Probability Theory (cont.)

• Conditional probability relates causes and effects in both directions:
  the probability of the effect given the cause, P(alarm | burglary) = .9,
  and of the cause given the observed effect, P(burglary | alarm) = .47
• P(alarm) = P(alarm ∧ burglary) + P(alarm ∧ ¬burglary)
           = .09 + .10 = .19   (marginalizing)
• P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm)
                      = .09 / .19 = .47   (computing conditional probability,
                        with P(alarm) as the normalizing constant)
• P(burglary ∧ alarm) = P(burglary | alarm) P(alarm)
                      = .47 × .19 = .09   (product rule)
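The same numbers can be reproduced mechanically from the joint table. The following Python sketch (not part of the original slides) computes the marginal, the conditional probability, and the product rule for the burglary/alarm example:

```python
# Full joint distribution over (Burglary, Alarm), taken from the table above.
joint = {
    ("burglary", "alarm"): 0.09, ("burglary", "~alarm"): 0.01,
    ("~burglary", "alarm"): 0.10, ("~burglary", "~alarm"): 0.80,
}

# Marginalizing: P(alarm) = sum over burglary values b of P(b, alarm)
p_alarm = sum(p for (b, a), p in joint.items() if a == "alarm")

# Conditional probability: P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm)
p_burglary_given_alarm = joint[("burglary", "alarm")] / p_alarm

# Product rule: P(burglary ∧ alarm) = P(burglary | alarm) P(alarm)
p_joint = p_burglary_given_alarm * p_alarm

print(p_alarm)                 # 0.19
print(p_burglary_given_alarm)  # 0.4736... ≈ .47
print(p_joint)                 # 0.09
```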

Conditional Independence
• Absolute independence:
  – A and B are independent if P(A ∧ B) = P(A) P(B); equivalently,
    P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
  – P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
  – P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still
  useful in decomposing the full joint probability distribution

Events (L = late, O = on time):
  A = A is late to class
  M = M is late to class

Case 1: A and M are dependent.

         A   M
  day1   L   L
  day2   O   L
  day3   O   O
  day4   O   O
  day5   L   L
  day6   O   L
  day7   O   L
  day8   O   O
  day9   O   L
  day10  L   O

  P(A) = 0.3, P(M) = 0.6
  P(M, A) = P(M | A) P(A) = (2/3) × 0.3 = 0.2 ≠ P(M) P(A) = 0.18

Case 2: A and M are independent.

         A   M
  day1   L   L
  day2   L   L
  day3   L   O
  day4   L   O
  day5   L   L
  day6   L   L
  day7   L   L
  day8   L   O
  day9   L   L
  day10  L   O

  P(A) = 1, P(M) = 0.6
  P(M, A) = P(M | A) P(A) = 0.6 × 1 = 0.6 = P(M) P(A)

Now add the event S = strike (Y/N):

Case 3: A and M are conditionally independent given S.

         A   M   S
  day1   L   L   Y
  day2   O   L   N
  day3   O   O   N
  day4   O   O   N
  day5   L   L   N
  day6   O   L   N
  day7   O   L   N
  day8   O   O   N
  day9   O   L   N
  day10  L   O   Y

  P(A) = 0.3, P(M) = 0.6, P(S) = 0.2
  P(A | S) = 1.0, P(M | S) = 0.5
  P(M, A | S) = P(M | A, S) P(A | S) = (1/2) × 1 = 0.5 = P(M | S) P(A | S)

Case 4: A and M are not conditionally independent given S.

         A   M   S
  day1   O   L   Y
  day2   O   L   N
  day3   O   O   N
  day4   O   O   N
  day5   L   L   N
  day6   O   L   N
  day7   O   L   N
  day8   O   O   N
  day9   O   L   N
  day10  L   O   Y

  P(A) = 0.2, P(M) = 0.6, P(S) = 0.2
  P(A | S) = 0.5, P(M | S) = 0.5
  P(M, A | S) = P(M | A, S) P(A | S) = 0 × 0.5 = 0 ≠ P(M | S) P(A | S)

A sanity check for Case 3 appears in the sketch below.
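As a check, the following Python sketch (not from the slides) recomputes the Case 3 quantities directly from the table and confirms the conditional-independence identity:

```python
# Case 3 table: (A, M, S) per day, with "L" = late, "O" = on time, S = strike.
days = [
    ("L", "L", "Y"), ("O", "L", "N"), ("O", "O", "N"), ("O", "O", "N"),
    ("L", "L", "N"), ("O", "L", "N"), ("O", "L", "N"), ("O", "O", "N"),
    ("O", "L", "N"), ("L", "O", "Y"),
]

strike_days = [d for d in days if d[2] == "Y"]
p_a_given_s = sum(d[0] == "L" for d in strike_days) / len(strike_days)  # 1.0
p_m_given_s = sum(d[1] == "L" for d in strike_days) / len(strike_days)  # 0.5
p_am_given_s = sum(d[0] == "L" and d[1] == "L"
                   for d in strike_days) / len(strike_days)             # 0.5

# Conditional independence given S: P(M, A | S) = P(M | S) P(A | S)
print(p_am_given_s == p_m_given_s * p_a_given_s)  # True
```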

The Naïve Bayes Classifier

• The Naïve Bayes method assumes that the probability P(B, C, D | A), which is
  difficult to compute, can be substituted by a "naïve" approximation that
  assumes the values of the attributes B, C and D to be independent given the
  class A.

• This means that P(B, C, D | A) is replaced by

  P(B | A) × P(C | A) × P(D | A)

• which is easy to compute, since each of these factors can be easily
  estimated from the table of instances.


Naïve Bayes
• Bayes classification:
  P(C | X) ∝ P(X | C) P(C) = P(X₁, …, Xₙ | C) P(C)
  Difficulty: learning the joint probability P(X₁, …, Xₙ | C).
  P(X) is not considered, since it is the same normalizing constant for
  every class.

• Naïve Bayes classification:
  – Assumption: all input features are conditionally independent given the class!
    P(X₁, X₂, …, Xₙ | C) = P(X₁ | C) P(X₂ | C) ⋯ P(Xₙ | C)


Example

(The slide shows the training data: the standard PlayTennis dataset of 14
examples with attributes Outlook, Temperature, Humidity and Wind, and the
class label Play, from which the tables below are estimated.)

Example
• Learning Phase

  P(Play=Yes) = 9/14        P(Play=No) = 5/14

  Outlook      P(·|Yes)  P(·|No)
  Sunny          2/9       3/5
  Overcast       4/9       0/5
  Rain           3/9       2/5

  Temperature  P(·|Yes)  P(·|No)
  Hot            2/9       2/5
  Mild           4/9       2/5
  Cool           3/9       1/5

  Humidity     P(·|Yes)  P(·|No)
  High           3/9       4/5
  Normal         6/9       1/5

  Wind         P(·|Yes)  P(·|No)
  Strong         3/9       3/5
  Weak           6/9       2/5

Example
• Test Phase
  – Given a new instance, predict its label:
    x′ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
  – Look up the tables obtained in the learning phase:
    P(Outlook=Sunny | Yes) = 2/9        P(Outlook=Sunny | No) = 3/5
    P(Temperature=Cool | Yes) = 3/9     P(Temperature=Cool | No) = 1/5
    P(Humidity=High | Yes) = 3/9        P(Humidity=High | No) = 4/5
    P(Wind=Strong | Yes) = 3/9          P(Wind=Strong | No) = 3/5
    P(Play=Yes) = 9/14                  P(Play=No) = 5/14
  – Decision making:
    P(Yes | x′) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes)
                = (2/9)(3/9)(3/9)(3/9)(9/14) ≈ 0.0053
    P(No | x′)  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No)
                = (3/5)(1/5)(4/5)(3/5)(5/14) ≈ 0.0206
    Since P(Yes | x′) < P(No | x′), we label x′ as No.
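The decision can be reproduced with a few lines of Python. This sketch (not in the slides) multiplies the learned look-up-table entries for the test instance and picks the class with the larger score:

```python
# Class priors and conditional probabilities from the learning-phase tables.
prior = {"Yes": 9 / 14, "No": 5 / 14}
cond = {
    "Yes": {"Sunny": 2 / 9, "Cool": 3 / 9, "High": 3 / 9, "Strong": 3 / 9},
    "No":  {"Sunny": 3 / 5, "Cool": 1 / 5, "High": 4 / 5, "Strong": 3 / 5},
}

x = ["Sunny", "Cool", "High", "Strong"]  # the test instance x'
score = {}
for c in ("Yes", "No"):
    s = prior[c]
    for value in x:
        s *= cond[c][value]  # naive conditional-independence assumption
    score[c] = s

print(score)                      # {'Yes': 0.0052..., 'No': 0.0205...}
print(max(score, key=score.get))  # 'No'
```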



Naïve Bayes – HIV example

HIV global prevalence = 0.008
Test with 95% specificity and sensitivity:

P(T | HIV) = 95%         (sensitivity)
P(¬T | ¬HIV) = 95%       (specificity)

Perform a first test; the result is positive.

Perform a second, different and independent test with the same sensitivity
and specificity. The result is also positive.

What is the probability of having HIV?


Naïve Bayes – HIV example

P(HIV | T1, T2) ∝ P(T1 | HIV) × P(T2 | HIV) × P(HIV)
               = 0.95 × 0.95 × 0.008 = 0.00722

P(¬HIV | T1, T2) ∝ P(T1 | ¬HIV) × P(T2 | ¬HIV) × P(¬HIV)
                = 0.05 × 0.05 × 0.992 = 0.00248

Having HIV is thus about 2.9× more likely than not having it; normalizing,
P(HIV | T1, T2) = 0.00722 / (0.00722 + 0.00248) ≈ 0.74.
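A quick way to see the effect of the normalization step is to run it; the sketch below (not part of the slides) computes the posterior from the same numbers:

```python
# Prior, sensitivity and specificity from the example above.
p_hiv, p_not_hiv = 0.008, 0.992
sensitivity = 0.95   # P(T | HIV)
specificity = 0.95   # P(~T | ~HIV), so P(T | ~HIV) = 1 - specificity = 0.05

# Unnormalized scores for two independent positive tests.
score_hiv = sensitivity ** 2 * p_hiv              # 0.00722
score_not = (1 - specificity) ** 2 * p_not_hiv    # 0.00248

# Normalize to obtain the posterior probability of having HIV.
posterior = score_hiv / (score_hiv + score_not)
print(posterior)  # ≈ 0.744
```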


Naïve Bayes
• Algorithm: Continuous-valued Features
  – A continuous feature takes uncountably many values, so its conditional
    probabilities cannot be tabulated
  – The conditional probability is instead often modeled with the normal
    (Gaussian) distribution:

    P̂(X_j | C = c_i) = (1 / (√(2π) σ_ji)) exp( −(X_j − μ_ji)² / (2 σ_ji²) )

    μ_ji: mean (average) of the feature values X_j of examples for which C = c_i
    σ_ji: standard deviation of the feature values X_j of examples for which C = c_i

  – Learning Phase: for X = (X₁, …, Xₙ) and classes C = c₁, …, c_L,
    output n × L normal distributions and the priors P(C = c_i), i = 1, …, L
  – Test Phase: given an unknown instance X′ = (a′₁, …, a′ₙ), instead of
    looking up tables, calculate the conditional probabilities with the normal
    distributions obtained in the learning phase


Naïve Bayes
• Example: Continuous-valued Features
  – Temperature is naturally of continuous value.
    Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
    No: 27.3, 30.1, 17.4, 29.5, 15.1
  – Estimate the mean and standard deviation for each class:
    μ_Yes = 21.64, σ_Yes = 2.35
    μ_No  = 23.88, σ_No  = 7.09
  – Learning Phase: output two Gaussian models for P(temp | C):

    P̂(x | Yes) = (1 / (2.35 √(2π))) exp( −(x − 21.64)² / (2 × 2.35²) )
               = (1 / (2.35 √(2π))) exp( −(x − 21.64)² / 11.09 )

    P̂(x | No)  = (1 / (7.09 √(2π))) exp( −(x − 23.88)² / (2 × 7.09²) )
               = (1 / (7.09 √(2π))) exp( −(x − 23.88)² / 100.5 )
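These two densities can be evaluated directly for any test temperature. A minimal sketch (not in the slides; the test value 22.0 is an assumed example):

```python
import math

def gaussian(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 22.0
print(gaussian(x, 21.64, 2.35))  # P̂(x | Yes) ≈ 0.168
print(gaussian(x, 23.88, 7.09))  # P̂(x | No)  ≈ 0.054
```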


Relevant Issues
• Violation of the Independence Assumption
  – For many real-world tasks, P(X₁, …, Xₙ | C) ≠ P(X₁ | C) ⋯ P(Xₙ | C)
  – Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero Conditional Probability Problem
  – If no training example of class c_i contains the feature value X_j = a_jk,
    then P̂(X_j = a_jk | C = c_i) = 0
  – In this circumstance the whole product
    P̂(x₁ | c_i) ⋯ P̂(a_jk | c_i) ⋯ P̂(xₙ | c_i) = 0 during test, no matter how
    likely the other feature values are
  – As a remedy, conditional probabilities are re-estimated with the
    m-estimate (Laplace smoothing):

    P̂(X_j = a_jk | C = c_i) = (n_c + m p) / (n + m)

    n:   number of training examples for which C = c_i
    n_c: number of examples for which C = c_i and X_j = a_jk
    p:   prior estimate (usually p = 1/t for t possible values of X_j)
    m:   weight given to the prior (number of "virtual" examples, m ≥ 1)
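As an illustration, the sketch below (not from the slides) applies the m-estimate to the zero entry from the PlayTennis tables, where Outlook=Overcast never occurs with Play=No:

```python
def smoothed(n_c, n, m=1, p=None, t=3):
    """m-estimate: P̂(Xj = ajk | C = ci) = (n_c + m*p) / (n + m).

    n   -- number of training examples with C = ci
    n_c -- number of those that also have Xj = ajk
    p   -- prior estimate, by default 1/t for t possible feature values
    m   -- weight given to the prior ("virtual" examples)
    """
    if p is None:
        p = 1 / t
    return (n_c + m * p) / (n + m)

# P(Outlook=Overcast | No): the raw estimate is 0/5 = 0; smoothed it becomes
print(smoothed(n_c=0, n=5, t=3))  # (0 + 1/3) / 6 = 1/18 ≈ 0.056
```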

