Module05 - Bayesian Reasoning
• We deal with patterns $x = (x_1 \; x_2 \; \ldots \; x_n)^T$.
• We consider $y$ to be a random variable that must be described probabilistically:
$y : (y_1, y_2, \ldots, y_q, \ldots, y_M)$
where $y_q$, $q = 1, \ldots, M$, corresponds to class $q \in \{1, \ldots, M\}$.
• The distribution of all possible values of the discrete random variable $y$ is expressed as a probability distribution,
$P(y) = \big(P(y_1), \ldots, P(y_M)\big)$, with
$P(y_1) + \cdots + P(y_M) = 1$
$P(y_k \mid x) = \frac{P(y_k)\, P(x \mid y_k)}{P(x)}, \qquad P(x) = \sum_{q=1}^{M} P(x \mid y_q)\, P(y_q)$
• $P(x)$ expresses the variability of the observed data, independent of the class.
• $P(x \mid y_k)$ is called the class likelihood and is the conditional probability that a pattern belonging to class $y_k$ has the observed value $x$.
$\text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}}$
• The posterior can be calculated as
$P(y_k \mid x) = \frac{P(y_k)\, P(x \mid y_k)}{\sum_{q=1}^{M} P(x \mid y_q)\, P(y_q)}$
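As a quick illustration of this formula, here is a minimal Python sketch; the function name and the example likelihood values are illustrative, not from the source (the priors reuse the toy values computed later in this module).

def posterior(priors, likelihoods):
    """Return P(y_k | x) for all k, given P(y_k) and P(x | y_k)."""
    # Evidence: P(x) = sum_q P(x | y_q) * P(y_q)
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Example with made-up likelihoods for three classes:
print(posterior([0.267, 0.533, 0.2], [0.1, 0.3, 0.5]))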
For a categorical feature $x_j$, the class likelihood of value $v_l^{x_j}$ is estimated from counts:
$P(x_j = v_l^{x_j} \mid y_q) = \frac{N_q(v_l^{x_j})}{N_q}$
where $N_q(v_l^{x_j})$ is the number of class-$q$ training samples with $x_j = v_l^{x_j}$, and $N_q$ is the total number of class-$q$ samples.
$M = 3$, $N = 15$
$P(y_1) = \frac{N_1}{N} = \frac{4}{15} = 0.267$
$P(y_2) = \frac{N_2}{N} = \frac{8}{15} = 0.533$
$P(y_3) = \frac{N_3}{N} = \frac{3}{15} = 0.2$
$V_{x_1} : \{M, F\} = \{v_1^{x_1}, v_2^{x_1}\}$; $d_1 = 2$
$V_{x_2} = \{v_1^{x_2}, v_2^{x_2}, v_3^{x_2}, v_4^{x_2}, v_5^{x_2}, v_6^{x_2}\}$; $d_2 = 6$
= bins $\{(0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0], (2.0, \infty)\}$
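As an aside, a minimal Python sketch of this binning, assuming NumPy; the variable names and sample heights are mine, only the bin edges come from the slide.

import numpy as np

edges = [1.6, 1.7, 1.8, 1.9, 2.0]         # interior edges of (0,1.6], ..., (2.0, inf)
heights = np.array([1.55, 1.75, 1.95, 2.10])
# With right=True, a value x gets the index l such that edges[l-1] < x <= edges[l],
# i.e. the 0-based index of its bin v_{l+1}^{x2}.
bin_index = np.digitize(heights, edges, right=True)
print(bin_index)   # -> [0 2 4 5]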
The count table generated from the data is given below.
Count $N_q(v_l^{x_j})$:

Value $v_l^{x_j}$              Cricket (q=1)   Tennis (q=2)   Football (q=3)
$v_1^{x_1}$: M                       1               2               3
$v_2^{x_1}$: F                       3               6               0
...
$v_5^{x_2}$: (1.9, 2.0] bin          0               1               1
$v_6^{x_2}$: (2.0, ∞) bin            0               0               2
We consider an instance from the given dataset with $x_1 = M$ (value $v_1^{x_1}$) and $x_2 \in (1.9, 2.0]$ (value $v_5^{x_2}$); the same procedure applies to a data tuple not in the given dataset (an unseen instance):

$P(x_1 \mid y_1) = \frac{N_1(v_1^{x_1})}{N_1} = \frac{1}{4}$
$P(x_1 \mid y_2) = \frac{N_2(v_1^{x_1})}{N_2} = \frac{2}{8}$
$P(x_1 \mid y_3) = \frac{N_3(v_1^{x_1})}{N_3} = \frac{3}{3}$
$P(x_2 \mid y_1) = \frac{N_1(v_5^{x_2})}{N_1} = \frac{0}{4}$
$P(x_2 \mid y_2) = \frac{N_2(v_5^{x_2})}{N_2} = \frac{1}{8}$
$P(x_2 \mid y_3) = \frac{N_3(v_5^{x_2})}{N_3} = \frac{1}{3}$
$P(x \mid y_1) = P(x_1 \mid y_1) \cdot P(x_2 \mid y_1) = \frac{1}{4} \cdot 0 = 0$
$P(x \mid y_2) = P(x_1 \mid y_2) \cdot P(x_2 \mid y_2) = \frac{2}{8} \cdot \frac{1}{8} = \frac{1}{32}$
$P(x \mid y_3) = P(x_1 \mid y_3) \cdot P(x_2 \mid y_3) = \frac{3}{3} \cdot \frac{1}{3} = \frac{1}{3}$
$P(x \mid y_1)\, P(y_1) = 0 \times 0.267 = 0$
$P(x \mid y_2)\, P(y_2) = \frac{1}{32} \times 0.533 = 0.0166$
$P(x \mid y_3)\, P(y_3) = \frac{1}{3} \times 0.2 = 0.066$
$y_{NB} = \arg\max_{q}\; P(x \mid y_q)\, P(y_q)$
This gives $q = 3$.
The true class in the data table is ‘Tennis’, so the classifier misclassifies this instance. Note that we are working with an artificial toy dataset; using the naive Bayes algorithm on real-life datasets, where $N$ is large, brings out the power of the naive Bayes classifier.
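A minimal Python sketch reproducing the toy computation above (class indices 1 = Cricket, 2 = Tennis, 3 = Football; the counts come from the count table, all names are illustrative):

from fractions import Fraction as F

N = 15
N_q = {1: 4, 2: 8, 3: 3}
# Counts N_q(v) for the two observed values: x1 = M, x2 in (1.9, 2.0]
counts_x1_M = {1: 1, 2: 2, 3: 3}
counts_x2_bin5 = {1: 0, 2: 1, 3: 1}

scores = {}
for q in (1, 2, 3):
    prior = F(N_q[q], N)
    likelihood = F(counts_x1_M[q], N_q[q]) * F(counts_x2_bin5[q], N_q[q])
    scores[q] = prior * likelihood

for q, s in scores.items():
    print(q, float(s))                         # 0.0, 0.0166..., 0.0666...
print("y_NB =", max(scores, key=scores.get))   # -> 3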
Naïve Bayes
Original: $P(x_j \mid y_q) = \frac{N_{qj}}{N_q}$
Laplace: $P(x_j \mid y_q) = \frac{N_{qj} + 1}{N_q + d_j}$
m-estimate: $P(x_j \mid y_q) = \frac{N_{qj} + m \cdot P(y_q)}{N_q + m}$
where $N_{qj}$ is the count of class-$q$ samples having the observed value of $x_j$, $d_j$ is the number of distinct values of feature $x_j$, and $m$ is the equivalent sample size (a smoothing parameter).
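A small sketch comparing the three estimates on one cell of the toy example, where the raw count is zero; the value of $m$ is an arbitrary choice for illustration.

from fractions import Fraction as F

# N_qj = N_1(v_5^{x2}) = 0, N_q = N_1 = 4, d_j = 6 bins, prior P(y_1) = 4/15.
N_qj, N_q, d_j, prior, m = 0, 4, 6, F(4, 15), 3

original = F(N_qj, N_q)                    # 0 -> wipes out the whole product
laplace  = F(N_qj + 1, N_q + d_j)          # 1/10
m_est    = (N_qj + m * prior) / (N_q + m)  # (0 + 3*4/15) / 7 = 4/35
print(original, laplace, m_est)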
Gaussian Naive Bayes
Training data sample $s^{(1)}$: $x_1 = 6$, $x_2 = 180$, $x_3 = 12$, class $y_1$.
$\mu_{qj} = \frac{1}{N_q} \sum_{i} x_j^{(i)}$ gives the class-conditional means, and
$\sigma^2_{qj} = \frac{1}{N_q} \sum_{i} \big(x_j^{(i)} - \mu_{qj}\big)^2$ gives the class-conditional variances (the sums run over the samples $i$ of class $q$):
$\sigma^2_{11} = 0.0262,\; \sigma^2_{12} = 92.1875,\; \sigma^2_{13} = 0.6875$
$\sigma^2_{21} = 0.0729,\; \sigma^2_{22} = 418.75,\; \sigma^2_{23} = 0.5$
Testing sample: $x_1 = 6$, $x_2 = 130$, $x_3 = 8$
$y_{NB} = \arg\max_{q}\; p(x_1 \mid y_q)\, p(x_2 \mid y_q)\, p(x_3 \mid y_q)\, p(y_q)$
where each $p(x_j \mid y_q)$ is the Gaussian density $\frac{1}{\sqrt{2\pi\sigma^2_{qj}}} \exp\!\Big(-\frac{(x_j - \mu_{qj})^2}{2\sigma^2_{qj}}\Big)$.
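A minimal Python sketch of this decision rule. The variances are the ones computed above; the class means and priors are PLACEHOLDERS, since the mean values from the slide are not reproduced here.

import math

def gauss(x, mu, var):
    # Gaussian density N(x; mu, var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

sigma2 = {1: (0.0262, 92.1875, 0.6875), 2: (0.0729, 418.75, 0.5)}
mu     = {1: (5.9, 176.0, 11.0), 2: (6.2, 150.0, 9.0)}   # placeholder means
prior  = {1: 0.5, 2: 0.5}                                # placeholder priors
x = (6, 130, 8)                                          # testing sample

scores = {q: prior[q] * math.prod(gauss(xj, m, v)
                                  for xj, m, v in zip(x, mu[q], sigma2[q]))
          for q in (1, 2)}
print("y_NB =", max(scores, key=scores.get))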
                Predicted +ve    Predicted –ve
Actual +ve           TP               FN
Actual –ve           FP               TN

(Rows: actual class, i.e. the observation; columns: predicted class.)
• Specificity = True Negative Rate = $\frac{TN}{FP + TN}$
• $1 - \text{Specificity} = \frac{FP}{FP + TN}$ (the false positive rate)
$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
where $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$.
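A short Python sketch computing these metrics from confusion-matrix counts; the counts themselves are made up for illustration.

TP, FN, FP, TN = 40, 10, 5, 45

specificity = TN / (FP + TN)   # true negative rate
fp_rate     = FP / (FP + TN)   # = 1 - specificity
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)   # sensitivity / true positive rate
f1 = 2 * precision * recall / (precision + recall)
print(specificity, fp_rate, f1)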