Statistical Inference INF312 - Is - Lecture 03 - Part 3
Prediction Based on Bayes’ Theorem
◼ Given training data X, the posterior probability of a hypothesis H,
P(H|X), follows Bayes’ theorem:
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
Classification Is to Derive the Maximum A Posteriori
◼ Let D be a training set of tuples and their associated class
labels, and each tuple is represented by an n-D attribute vector
X = (x1, x2, …, xn)
◼ Suppose there are m classes C1, C2, …, Cm.
◼ Classification is to derive the maximum a posteriori (MAP)
hypothesis, i.e., the class Ci with maximal P(Ci|X)
◼ This can be derived from Bayes’ theorem:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
◼ Since P(X) is constant for all classes, only
$$P(X \mid C_i)\,P(C_i)$$
needs to be maximized
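As an illustration, here is a minimal sketch of this decision rule in Python; the `prior` and `likelihood` arguments are hypothetical placeholders supplied by the caller, not part of the lecture.

```python
# Minimal sketch of the MAP decision rule: pick the class Ci that
# maximizes P(X|Ci) * P(Ci). `prior` maps class -> P(Ci);
# `likelihood(x, c)` returns P(X|Ci) (both are assumed inputs).
def map_classify(x, classes, prior, likelihood):
    return max(classes, key=lambda c: likelihood(x, c) * prior[c])
```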
Naïve Bayes Classifier
◼ A simplified assumption: attributes are conditionally
independent (i.e., no dependence relation between
attributes):
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$
◼ This greatly reduces the computation cost: only the class
distribution needs to be counted
◼ If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk
for Ak divided by |Ci, D| (# of tuples of Ci in D)
◼ If Ak is continuous-valued, P(xk|Ci) is usually computed based on
a Gaussian distribution with mean μ and standard deviation σ:
$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
and P(xk|Ci) is
$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$
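A sketch of both likelihood estimates in Python follows; the function names and the tuple layout are illustrative assumptions, not from the lecture.

```python
import math

def categorical_likelihood(xk, tuples_in_ci, attr_index):
    """P(xk|Ci): fraction of tuples in class Ci whose attribute Ak equals xk."""
    matches = sum(1 for t in tuples_in_ci if t[attr_index] == xk)
    return matches / len(tuples_in_ci)

def gaussian_likelihood(xk, mu, sigma):
    """P(xk|Ci) approximated by g(xk, mu_Ci, sigma_Ci) for a continuous Ak."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((xk - mu) ** 2) / (2.0 * sigma ** 2))
```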
Naïve Bayes Classifier: Training Dataset
Example:
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
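For concreteness, the same training set can be written as Python tuples (a sketch; the field order simply follows the table columns):

```python
# Each tuple: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]

# Class priors: 9 of the 14 tuples buy a computer, 5 do not.
n_yes = sum(1 for t in data if t[4] == "yes")              # 9
print(n_yes / len(data), (len(data) - n_yes) / len(data))  # 0.643, 0.357
```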
Naïve Bayes Classifier: An Example
Class:
C1: buys_computer = ‘yes’   C2: buys_computer = ‘no’
◼ P(C1) = P(buys_computer = “yes”) = 9/14 = 0.643
◼ P(C2) = P(buys_computer = “no”) = 5/14 = 0.357
Naïve Bayes Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
Data to be classified:
X = (age <=30, Income = medium, Student = yes, Credit_rating = Fair)
Age Buys Computer Count Total Conditional Probability (fraction) Conditional Probability (decimal)
<= 30 Yes 2 9 (2/9) 0.222222222
<= 30 No 3 5 (3/5) 0.6
31-40 Yes 4 9 (4/9) 0.444444444
31-40 No 0 5 (0/5) 0
> 40 Yes 3 9 (3/9) 0.333333333
> 40 No 2 5 (2/5) 0.4
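The table above can be reproduced from the `data` list sketched earlier; the helper name `cond_prob` is ours, not the lecture’s.

```python
def cond_prob(attr_index, value, cls):
    """P(Ak = value | buys_computer = cls), counted from `data`."""
    in_class = [t for t in data if t[4] == cls]
    return sum(1 for t in in_class if t[attr_index] == value) / len(in_class)

print(cond_prob(0, "<=30", "yes"))  # 2/9 = 0.222...
print(cond_prob(0, "<=30", "no"))   # 3/5 = 0.6
```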
Naïve Bayes Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
Data to be classified:
X = (age <=30, Income = medium, Student = yes, Credit_rating = Fair)
Income Buys Computer Count Total Conditional Probability (fraction) Conditional Probability (decimal)
High Yes 2 9 (2/9) 0.222222222
High No 2 5 (2/5) 0.4
Medium Yes 4 9 (4/9) 0.444444444
Medium No 2 5 (2/5) 0.4
Low Yes 3 9 (3/9) 0.333333333
Low No 1 5 (1/5) 0.2
Naïve Bayes Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
Data to be classified:
X = (age <=30, Income = medium, Student = yes, Credit_rating = Fair)
Student Buys Computer Count Total Conditional Probability (fraction) Conditional Probability (decimal)
Yes Yes 6 9 (6/9) 0.666666667
Yes No 1 5 (1/5) 0.2
No Yes 3 9 (3/9) 0.333333333
No No 4 5 (4/5) 0.8
Naïve Bayes Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
Data to be classified:
X = (age <=30, Income = medium, Student = yes, Credit_rating = Fair)
Credit Rating Buys Computer Count Total Conditional Probability (fraction) Conditional Probability (decimal)
Fair Yes 6 9 (6/9) 0.666666667
Fair No 2 5 (2/5) 0.4
Excellent Yes 3 9 (3/9) 0.333333333
Excellent No 3 5 (3/5) 0.6
Naïve Bayes Classifier: An Example
Class:
C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
Data to be classified:
X = (age <=30, Income = medium, Student = yes, Credit_rating = Fair)
◼ P(X|C1) = P(X|buys_computer = “yes”) = 2/9 × 4/9 × 6/9 × 6/9 = 0.044
◼ P(X|C2) = P(X|buys_computer = “no”) = 3/5 × 2/5 × 1/5 × 2/5 = 0.019
◼ P(X|C1)P(C1) = 0.044 × 9/14 = 0.028
◼ P(X|C2)P(C2) = 0.019 × 5/14 = 0.007
Naïve Bayes Classifier: An Example
Class:
C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
◼ Decision
Since P(X|C1)P(C1) = 0.028 > P(X|C2)P(C2) = 0.007, X belongs to C1
Therefore, X belongs to class (“buys_computer = yes”)
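Putting the pieces together with the `data` list and `cond_prob` helper sketched earlier, the whole classification can be checked in a few lines:

```python
# Score each class by P(X|Ci) * P(Ci) for X = (<=30, medium, yes, fair).
x = ("<=30", "medium", "yes", "fair")
scores = {}
for cls in ("yes", "no"):
    prior = sum(1 for t in data if t[4] == cls) / len(data)
    likelihood = 1.0
    for i, value in enumerate(x):
        likelihood *= cond_prob(i, value, cls)
    scores[cls] = likelihood * prior
print(scores)  # {'yes': ~0.028, 'no': ~0.007} -> predict 'yes'
```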
Solved Example on Bayes Theorem
◼ Researchers investigated the effectiveness of using the
Hologic Sahara Sonometer, a portable device that
measures bone mineral density (BMD) in the ankle, in
predicting a fracture. They used a Hologic-estimated
bone mineral density value of 0.57 as a cutoff. The
results of the investigation yielded the following data:
Solved Example on Bayes Theorem
a) Calculate the sensitivity of using a BMD value of 0.57
as a cutoff value for predicting fracture.
b) Calculate the specificity of using a BMD value of 0.57
as a cutoff value for predicting fracture.
c) If it is estimated that 10 percent of the U.S.
population have a confirmed bone fracture, what is
the predictive value positive of using a BMD value of
0.57 as a cutoff value for predicting fracture? That is,
we wish to estimate the probability that a subject
who tests positive at the 0.57 BMD cutoff has a
confirmed bone fracture.
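As a sketch of part (c): the predictive value positive follows from Bayes’ theorem,
$$PV^+ = P(D \mid T^+) = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$
The sensitivity and specificity values in the code below are hypothetical placeholders; substitute the answers from parts (a) and (b).

```python
def predictive_value_positive(sensitivity, specificity, prevalence):
    """P(fracture | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Placeholder inputs: replace 0.9 and 0.8 with the sensitivity and
# specificity from parts (a) and (b); prevalence is the stated 10 percent.
print(predictive_value_positive(0.9, 0.8, 0.10))
```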