0% found this document useful (0 votes)

26 views46 pages

04 Bayes Classification Rule

- The document discusses the Bayes classification rule for pattern classification. - The Bayes classification rule computes the posterior probability P(Ci|X) using Bayes' theorem, and classifies X to the class with the highest posterior probability. - The overall classification accuracy is calculated as the weighted sum of the posterior probabilities, where the weights are the class prior probabilities. - For a 1D two-class problem with Gaussian class-conditional densities, the decision boundary that separates the classes can be found by equating the posterior probabilities. The error rate is then computed from the misclassified regions.

Uploaded by

Mostafa Mohamed

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views46 pages

04 Bayes Classification Rule

Uploaded by

Mostafa Mohamed

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 46

Pattern Classification

04. Bayes Classification Rule

AbdElMoniem Bayoumi, PhD

Spring 2023
Acknowledgment
• These slides have been created relying on
lecture notes of Prof. Dr. Amir Atiya
Recap: Minimum Distance Classifier

• Compute the distance from 𝑋 to each

center 𝑉(𝑘):
𝑵
𝟐 𝟐
𝒅 𝒌 = )[𝑽𝒊 𝒌 − 𝑿𝒊 ] ≡ 𝑽 𝒌 −𝑿
𝒊"𝟏
𝑽 𝟑
𝑿𝟐

𝑽 𝟐 C1
C2
𝑽 𝟏 C3

𝑿𝟏
Recap: Nearest Neighbor Classifier

• The class of the nearest pattern to 𝑋

determines its classification

𝑿𝟐

C1
𝑿 C2

𝑿𝟏
Recap: K-Nearest Neighbor Classifier
• Take k = 5
𝑿𝟐

C1
𝑿 C2

𝑿𝟏
• One can see that C2 is the majority à classify 𝑋 as C2

• The KNN rule is less dependent on strange patterns

compared to the nearest neighbor classification rule
Recap: Bayes Classification Rule

• To compute 𝑃 𝐶& 𝑋 , we use Bayes rule:

𝑃(𝐶& , 𝑋)
𝑃 𝐶& 𝑋 =
𝑃(𝑋)
' (|*! '(*! )
=
'(()

Bayes Rule:
P(A,B) = P(A|B)P(B) = P(B|A)P(A)

6
Recap: Bayes Classification Rule
• To compute 𝑃 𝐶! 𝑋 , we use Bayes rule:

𝑃 𝑋|𝐶! 𝑃(𝐶! )
𝑃 𝐶! 𝑋 =
𝑃(𝑋)

• 𝑃 𝑋|𝐶! ≡ Class-conditional density (defined before)

• 𝑃 𝐶! ≡ Probability of class Ci before or without observing

the features 𝑋
≡ a priori probability of class Ci

7
Recap: Bayes Classification Rule
• The a priori probabilities represent the
frequencies of the classes irrespective of the
observed features

• For example in OCR, the a priori probabilities

are taken as the frequency or fraction of
occurrence of the different letters in a typical
text

– For the letters E & A à P(Ci) will be higher

– For letters Q & X à P(Ci) will be low because they

are infrequent

8
Bayes Classification Rule
• Find 𝐶- giving max 𝑃 𝐶- 𝑋

𝑃 𝑋|𝐶- 𝑃(𝐶- )
𝑃 𝐶- 𝑋 =
𝑃(𝑋)

–𝑃 𝐶! 𝑋 ≡ posterior prob.
–𝑃 𝐶! ≡ a priori prob.
–𝑃 𝑋|𝐶! ≡ class-conditional densities

• 𝑃 𝑋 = ∑/
&". 𝑃(𝑋 , 𝐶& ) = ∑ /
&". 𝑃 𝑋 𝐶& 𝑃(𝐶& )

9
Recap: Marginalization
• Discrete case:
$

𝑃 𝐴 = $ 𝑃(𝐴, 𝐵 = 𝐵! )
!"#
• Continuous case:
&

𝑃 𝑥 = * 𝑃 𝑥, 𝑦 𝑑𝑦
%&
• So:
' '

𝑃 𝑋 = $ 𝑃(𝑋 , 𝐶! ) = $ 𝑃 𝑋 𝐶! 𝑃(𝐶! )
!"# !"#

Marginalization Bayes rule

10
Bayes Classification Rule
𝑃 𝑋|𝐶- 𝑃(𝐶- )
𝑃 𝐶- 𝑋 =
∑/
&". 𝑃 𝑋 𝐶& 𝑃(𝐶& )

• In reality, we do not need to compute 𝑃 𝑋

because it is a common factor for all the
terms in the expression for 𝑃 𝐶- 𝑋

• Hence, it will not affect which terms will

end up being maximum

11
Bayes Classification Rule
• Classify 𝑋 to the class corresponding to
max 𝑃 𝑋|𝐶- 𝑃(𝐶- )
P(x|C1)P(C1) P(x|C2) P(C2)

8 11
1 2 3 4 5 6 7 9 10
x
1-D example

12
Bayes Classification Rule
• Classify 𝑋 to the class corresponding to max 𝑃 𝑋|𝐶! 𝑃(𝐶! )

P(x|C1)P(C1) P(x|C2) P(C2)

8 11
1 2 3 4 5 6 7
1-D example
9 10
x
• For x=5, P(x|C1)P(C1) has a higher value compared to P(x|C2)P(C2)
à classify as C1

13
Classification Accuracy
𝑷 𝒄𝒐𝒓𝒓𝒆𝒄𝒕 𝒄𝒍𝒂𝒔𝒔𝒊𝒇𝒊𝒄𝒂𝒕𝒊𝒐𝒏 𝑿 = 𝐦𝐚𝐱 𝑷(𝑪𝒊 |𝑿)
𝟏0𝒊0𝑲

• Example: 3-class case:

–𝑃 𝐶( 𝑋 = 0.6, 𝑃 𝐶) 𝑋 = 0.3, 𝑃 𝐶* 𝑋 = 0.1

– You classified 𝑋 as 𝐶( à it has highest 𝑃(𝐶+ |𝑋)

– The probability that your classifier is correct

equals to the probability that 𝑋 belongs to the
same class of the classification (which is 0.6)

14
Classification Accuracy
• Overall P(correct) is:
𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡 = * 𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡, 𝑋 𝑑𝑋 Marginal prob.

= * 𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡|𝑋 𝑃 𝑋 𝑑𝑋 Bayes rule

𝑃 𝑋|𝐶( 𝑃(𝐶( )
= * max 𝑃 𝑋 𝑑𝑋
( 𝑃(𝑋)

= * max 𝑃 𝑋|𝐶( 𝑃(𝐶( ) 𝑑𝑋

(

15
Classification Accuracy
• Overall P(correct) is:
𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡 = J max 𝑃 𝑋|𝐶- 𝑃(𝐶- ) 𝑑𝑋
-

P(x|C1)P(C1) P(x|C2) P(C2) P(x|C3)P(C3)

8 11
1 2 3 4 5 6
1-D example
7 9 10
x
16
Classification Accuracy
• Overall P(correct) is:
𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡 = J max 𝑃 𝑋|𝐶- 𝑃(𝐶- ) 𝑑𝑋
-

𝐦𝐚𝐱 P(x|𝑪𝒊 )P(𝑪𝒊 )

𝒊

8 11
1 2 3 4 5 6
1-D example
7 9 10
x
17
Classification Accuracy
• Overall P(correct) is:
𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡 = J max 𝑃 𝑋|𝐶- 𝑃(𝐶- ) 𝑑𝑋
-

P(correct) = areas[ + + ]

𝐦𝐚𝐱 P(x|𝑪𝒊 )P(𝑪𝒊 )

𝒊

8 11
1 2 3 4 5 6
1-D example
7 9 10
x
18
Classification Accuracy

𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡 = ) max 𝑃 𝑋|𝐶0 𝑃(𝐶0 ) 𝑑𝑋

𝑃 𝑒𝑟𝑟𝑜𝑟 = 1 − 𝑃(𝑐𝑜𝑟𝑟𝑒𝑐𝑡)
We can compute P(error) directly only for 2-class case!

area = P(error) 19
1-D Example
• Assume Gaussian class-conditional
densities:
("# $% )'
( .
–𝑃 𝑋 𝐶+ = ),-
𝑒 '('

#("#$)'
/
– 𝜇+ = 𝐸 𝑋 = ∫ ),-
𝑒 '(' 𝑑𝑥
– Var𝑖𝑎𝑛𝑐𝑒 = 𝐸 (𝑋 − 𝜇)) = 𝜎)
– 𝜎 = 𝑣𝑎𝑟
σ
std. dev

µ
mean 20
1-D Example
• To get decision boundary:

P(X|𝑪𝟏 )P(𝑪𝟏 ) P(X|𝑪𝟐 )P(𝑪𝟐 )

𝑃 𝑋|𝐶. 𝑃 𝐶. = 𝑃 𝑋|𝐶2 𝑃 𝐶2

21
1-D Example
• To get decision boundary:

𝑃 𝑋|𝐶" 𝑃 𝐶" = 𝑃 𝑋|𝐶# 𝑃 𝐶#

$(&$ '" )# $(&$ '# )#

1 #) # 1 #) #
𝑃 𝐶" 𝑒 " = 𝑃 𝐶# 𝑒 #
2𝜋𝜎" 2𝜋𝜎#

• Take log of both sides (𝑙𝑜𝑔* ≡ 𝑙𝑛), assuming 𝜎" = 𝜎# = 𝜎:

"#$! " ("#$" )"

log 𝑃 𝐶! − log 2𝜋𝜎 − = 𝑙𝑜𝑔 𝑃 𝐶% − 𝑙𝑜𝑔 2𝜋𝜎 −
%&" %&"

Can get 𝑋+ = 𝑓𝑛(𝜇" , 𝜇# , 𝜎 # , 𝑃 𝐶" , 𝑃 𝐶# ) à quadratic equation

Exercise!
22
1-D Example
• Compute P(error)
P(X|𝑪𝟏 )P(𝑪𝟏 ) P(X|𝑪𝟐 )P(𝑪𝟐 )

(/. 0- )- (/. 02 )-
. .
7, 8(:- )< -1- & 8(:2 )< -1-
• 𝑃 𝑒𝑟𝑟𝑜𝑟 = ∫%& =>?
𝑑𝑋 + ∫7
, =>?
𝑑𝑋

I1 I2
23
1-D Example
• Compute P(error): I1 I2
"! ("# $")" ) ("# $#)"
# #
𝑃(𝐶% )𝑒 %& " 𝑃(𝐶! )𝑒 %& "
𝑃 𝑒𝑟𝑟𝑜𝑟 = & 𝑑𝑋 + & 𝑑𝑋
2𝜋𝜎 2𝜋𝜎
#) "!

• Substitute in integral I1:

"# $" Limits:
= 𝑋0
& X = −∞ à 𝑋) = −∞
%$ &'#
𝑑𝑋 )
X = 𝑋$ à 𝑋 = (
= 𝑑𝑋0
𝜎
)! *+"
* "
#"
,
𝑒 % Normal integral CDF
𝐼! = 𝑃(𝐶% ) & 𝑑𝑋0 (numerically computed)
2𝜋
#)

Do the same for I2 24

Bayes Classification Rule
• It is an optimal classification rule

• This means no other classification rule is

better (on average)

• The reason is that it chooses the most

likely class (highest 𝑃 𝐶& 𝑋 ), so nothing
could be better

25
Bayes Classification Rule
• However, Bayes classifier assumes that the
probability densities are known, which is
not usually the case

• Typically, only a training set is available and

from the training set we can estimate the
densities

• The density estimates have some

“estimation error” that is higher if the
training set size is smaller

26
Bayes Classification Rule
• Density estimation:

real

estimate

27
Gaussian Densities
• Assume a multi dimensional Gaussian
density for each 𝑃(𝑋|𝐶& )

• Features may be independent (or

conditionally independent), i.e.,
independent Gaussians

• Features may be dependent in other cases

28
Gaussian Densities (independent case)

• 𝑃 𝑋 𝐶& = 𝑃 𝑋. 𝐶& 𝑃 𝑋2 𝐶& ⋯ 𝑃(𝑋3 |𝐶& )

)(+, ) -, )/
/0/,
5
where 𝑃 𝑋4 𝐶& =
267,

/
1 4 (+, ) -, )
)/ ∑,31
0/
,
5
• Get 𝑃 𝑋 𝐶& = 4
(26) / 71 7/ ⋯74

29
Gaussian Densities (dependent case)

-
,.(0,1)3 4,- (0,1)
"
• 𝑃 𝑋 𝐶! = 5 -
#$ . %"& . (()

where:
Σ ≡ Covariance matrix (NxN)
det ≡ determinant
𝜇(
𝜇)
𝜇 ≡mean vector = ⋮ 𝜇1 = 𝐸(𝑋1 )
𝜇0

30
Gaussian Densities (dependent case)

• Let 𝑍 = 𝑋 − 𝜇

• 𝑍 9 𝐴 𝑍 = 𝑍.2 𝐴.. + 𝑍. 𝑍2 𝐴.2 + 𝑍. 𝑍2 𝐴2. + 𝑍22 𝐴22 +

𝑍. 𝑍: 𝐴:. + ⋯
Quadratic form

• A ≡ matrix
Exercise for yourself!

31
Covariance
• Covariance between two variables is a
measure of how they change together

• Positive covariance if they tend to change in

the same direction, e.g., weight & height

• Negative covariance if they tend to change in

opposite directions, e.g., grade & exam
difficulty

• Zero covariance if they are not related and do

not influence each other’s value

32
Covariance
grades grades

IQ exam diff.
positive covariance negative covariance

zero covariance 33
Covariance Matrix

• Σ is a symmetric matrix, i.e., Σ(4,-) = Σ(-,4)

• Σ(&,&) = 𝐸 𝑋& − 𝜇& 2

= 𝜎&2 à variance of ith item

• Σ(4,-) = 𝐸 𝑋4 − 𝜇4 𝑋- − 𝜇- ≡ covariance(𝑋5 , 𝑋6 )

𝐸 𝑋7 − 𝜇7 8 𝐸 𝑋7 − 𝜇7 𝑋8 − 𝜇8 ⋯
• Σ = 𝐸 𝑋7 − 𝜇7 𝑋8 − 𝜇8 𝐸 𝑋8 − 𝜇8 8 ⋮
⋮ ⋮ ⋱

34
Some Properties

• If 𝑋 is Gaussian then its linear

transformation is also Gaussian

• For the independent case:

𝐸 𝑋& − 𝜇& (𝑋4 −𝜇4 ) = 0 𝑖𝑓 𝑖 ≠ 𝑗

𝜎.2 ⋯ 0
Σ= ⋮ ⋱ ⋮ diagonal matrix
0 ⋯ 𝜎32

35
Some Properties
• For the independent case:

1
⋯ 0
𝜎"#
Σ $" = ⋮ ⋱ ⋮
1
0 ⋯
𝜎3#

det Σ = 𝜎"# 𝜎## ⋯ 𝜎3# 𝑋−𝜇

$
Σ %& 𝑋 − 𝜇
#
" ' (&% $ '% ) 𝑋$ − 𝜇$
$# ∑%&"
)# 𝜎$%
𝑒 %
𝑋% − 𝜇%
𝑃 𝑋 𝐶! = 3 Σ #$ 𝑋−𝜇 = 𝜎%%
Dependent case: (2𝜋) # 𝜎" 𝜎# ⋯ 𝜎3 ⋮
𝟏 𝑋& − 𝜇&
% (𝑿%𝝁)𝑻 𝜮"𝟏 (𝑿%𝝁)
𝒆 𝟐
𝑷 𝑿 𝑪𝒊 = 𝜎&%
𝑵 𝟏 &
𝟐𝝅 𝟐 𝒅𝒆𝒕𝟐 (𝜮) 𝑑𝑒𝑡 ' (Σ) 36
Decision boundaries
• Decision boundary in case of multi-
dimensional Gaussian class-conditional
densities:

𝑃 𝐶& 𝑃 𝑋 𝐶& = 𝑃 𝐶4 𝑃 𝑋 𝐶4

# 6 # 6
% 7 % @5 A.2 (7%@5 ) % 7 % @7 A.2 (7%@7 )
𝑒 = 5 𝑒 = 7
𝑃 𝐶! $ # = 𝑃 𝐶B $ #
2𝜋 = 𝑑𝑒𝑡 = (Σ! ) 2𝜋 = 𝑑𝑒𝑡 = (ΣB )

Solve to get equation of boundary condition!

37
Decision boundaries
• For simplicity assume Σ= = ΣB = Σ

# 6 # 6
% 7 % @5 A.2 (7%@5 ) % 7 % @7 A.2 (7%@7 )
𝑒 = 5 𝑒 = 7
𝑃 𝐶! $ # = 𝑃 𝐶B $ #
2𝜋 = 𝑑𝑒𝑡 = (Σ! ) 2𝜋 = 𝑑𝑒𝑡 = (ΣB )

39
Decision boundaries
• For simplicity assume Σ+ = Σ1 = Σ
& % & %
%' 3 % 4$ 5"&
$ (3%4$ ) %' 3 % 4' 5"&
' (3%4' )
𝑒 𝑒
𝑃 𝐶2 6 & = 𝑃 𝐶1 6 &
2𝜋 ' 𝑑𝑒𝑡 ' (Σ2 ) 2𝜋 ' 𝑑𝑒𝑡 ' (Σ1 )

1 ) 1 )
log 𝑃 𝐶+ − 𝑋 − 𝜇( Σ *+ 𝑋 − 𝜇( = log 𝑃 𝐶1 − 𝑋 − 𝜇, Σ *+ 𝑋 − 𝜇,
2 2

2 log 𝑃 𝐶' − log 𝑃 𝐶(

= − 𝑋 ) Σ #$𝑋 − 𝑋 ) Σ #$𝜇( − 𝜇( ) Σ #$𝑋 + 𝜇( ) Σ #$𝜇( + 𝑋 ) Σ #$𝑋 − 𝑋 ) Σ #$𝜇' − 𝜇' ) Σ #$𝑋 + 𝜇' ) Σ #$𝜇'

−2𝜇1 $ Σ %& 𝑋 −2𝜇2 $ Σ %& 𝑋

𝑿𝑻𝜮#𝟏 𝝁𝒋 = (𝑿𝑻𝜮#𝟏 𝝁𝒋 )𝑻= 𝝁𝒋 𝑻𝜮#𝟏 𝑿 (𝑨𝑩𝑪)𝑻= 𝑪𝑻𝑩𝑻𝑨𝑻

Scalar = Scalar T (𝑿𝑻𝜮#𝟏 𝝁𝒋 )𝑻= 𝝁𝒋 𝑻(𝜮#𝟏 )𝑻𝑿

(𝜮#𝟏 )𝑻= 𝜮#𝟏 symmetric matrix

40
Decision boundaries
• For simplicity assume Σ+ = Σ1 = Σ
& % & %
%' 3 % 4$ 5"&
$ (3%4$ ) %' 3 % 4' 5"&
' (3%4' )
𝑒 𝑒
𝑃 𝐶2 6 & = 𝑃 𝐶1 6 &
2𝜋 ' 𝑑𝑒𝑡 ' (Σ2 ) 2𝜋 ' 𝑑𝑒𝑡 ' (Σ1 )

1 ) 1 )
log 𝑃 𝐶+ − 𝑋 − 𝜇( Σ *+ 𝑋 − 𝜇( = log 𝑃 𝐶1 − 𝑋 − 𝜇, Σ *+ 𝑋 − 𝜇,
2 2

2 log 𝑃 𝐶' − log 𝑃 𝐶(

= − 𝑋 ) Σ #$𝑋 − 𝑋 ) Σ #$𝜇( − 𝜇( ) Σ #$𝑋 + 𝜇( ) Σ #$𝜇( + 𝑋 ) Σ #$𝑋 − 𝑋 ) Σ #$𝜇' − 𝜇' ) Σ #$𝑋 + 𝜇' ) Σ #$𝜇'

−2𝜇1 $ Σ %& 𝑋 −2𝜇2 $ Σ %& 𝑋

𝑃 𝐶(
2𝑙𝑜𝑔 = 2𝜇, ) Σ*+ 𝑋 − 𝜇, ) Σ*+ 𝜇, − 2𝜇( ) Σ*+ 𝑋 + 𝜇( ) Σ*+ 𝜇(
𝑃 𝐶,
𝑃 𝐶(
2𝑙𝑜𝑔 = 2 𝜇, ) Σ*+ − 𝜇( ) Σ*+ 𝑋 − 𝜇, ) Σ*+ 𝜇, − 𝜇( ) Σ*+ 𝜇(
𝑃 𝐶,
Linear classifier! 41
Decision boundaries
M N-
• 2𝑙𝑜𝑔 M N%
= 2 𝜇5 O Σ &7 − 𝜇P O Σ &7 𝑋 − 𝜇5 O Σ &7𝜇5 − 𝜇P O Σ &7𝜇P

• Let:
<
< *= < *=
𝑊 = 2 𝜇; Σ = 2 Σ*= (𝜇; − 𝜇> )
− 𝜇> Σ
< *= < *=
𝑃 𝐶>
𝑊+ = − 𝜇; Σ 𝜇; − 𝜇> Σ 𝜇> − 2𝑙𝑜𝑔
𝑃 𝐶;

• Then decision boundary:

𝑊 8 𝑋 + 𝑊+ = 0 à Linear classifier!

42
Decision boundaries
• Decision boundary:
𝑊 9 𝑋 + 𝑊E = 0 à Linear classifier!

X2 decision boundary

𝝁𝒊

𝝁𝒋
data variation

X1
Scatter diagram
43
Applying Bayes Rule
• One way on how to apply Bayes rule in practical
situations:

– Obtain the training set 𝑋 1 , 𝑋 2 ⋯ 𝑋(𝑀)

– Assume a multi-dimensional Gaussian density for each

class, i.e., 𝑃(𝑋|𝐶> )

– To obtain the form of each density we need 𝜇> and Σ> for
each class 𝑖 à estimate from training set

– Estimate the a priori probabilities 𝑃(𝐶> ) from the training

set, i.e., according to the frequencies of each class

– Using the obtained estimates, plug in Bayes rule to

obtain the classification rule

44
Estimate 𝝁 and Σ
• Estimate 𝝁 and Σ for a particular class:
𝐸(𝑋7)
𝐸(𝑋8)
– We know that: 𝜇 = 𝐸 𝑋 =
⋮
𝐸(𝑋Q )
– An estimate of 𝜇P ≡ 𝐸(𝑋P ) is:
T
1
𝜇̂ P = D 𝑋P (𝑚) à the average
𝑀
RS7

𝜇̂ 7 T
𝜇̂ 8 1
𝜇̂ = = D 𝑋(𝑚)
⋮ 𝑀
RS7
𝜇̂ Q M is # of training patterns
belonging to the considered class

45
Estimate 𝝁 and Σ
• Estimate 𝝁 and Σ for a particular class:

– We know that:
=
estimate of variance = ∑?
@A= 𝑋> 𝑚 − 𝜇
M> B
?

= <
– Est. of Σ ∶ ΣO =
?
∑?
@A= 𝑋 𝑚 − 𝜇
M 𝑋 𝑚 − 𝜇M

– For terms on the diagonal, the previous formula reduces

= ? B
O
to: Σ(;,;) = ∑ 𝑋 𝑚 − 𝜇M; which is equivalent to the
? @A= ;
estimate of variance formula

– Confirming our earlier assertion that the terms on the

diagonal of Σ give the variances of the components of 𝑋

46
Acknowledgment
• These slides have been created relying on
lecture notes of Prof. Dr. Amir Atiya

Mean Median Mode MCQ
100% (8)
Mean Median Mode MCQ
4 pages
Notes and Solutions For: Pattern Recognition by Sergios Theodoridis and Konstantinos Koutroumbas.
100% (1)
Notes and Solutions For: Pattern Recognition by Sergios Theodoridis and Konstantinos Koutroumbas.
209 pages
Basic Principles of Experimental Designs
100% (4)
Basic Principles of Experimental Designs
2 pages
Sampling Distributions, Estimation and Hypothesis Testing - Multiple Choice Questions
100% (1)
Sampling Distributions, Estimation and Hypothesis Testing - Multiple Choice Questions
6 pages
Bayesian Decision Theory: CS479/679 Pattern Recognition Dr. George Bebis
No ratings yet
Bayesian Decision Theory: CS479/679 Pattern Recognition Dr. George Bebis
64 pages
Ad3411 Data Science and Analytics Laboratory
100% (7)
Ad3411 Data Science and Analytics Laboratory
24 pages
Bishop Solutions PDF
No ratings yet
Bishop Solutions PDF
87 pages
Sergios Theodoridis Konstantinos Koutroumbas
No ratings yet
Sergios Theodoridis Konstantinos Koutroumbas
76 pages
Pattern Classification
No ratings yet
Pattern Classification
141 pages
EDAN96 2024 Last Lecture-1
No ratings yet
EDAN96 2024 Last Lecture-1
78 pages
Statistical Perspective
No ratings yet
Statistical Perspective
85 pages
Lecture 03 Bayes Classifier With Prob Concepts
No ratings yet
Lecture 03 Bayes Classifier With Prob Concepts
70 pages
Bayesian Classifier Implementation Using MATLAB
No ratings yet
Bayesian Classifier Implementation Using MATLAB
21 pages
ML 1
No ratings yet
ML 1
64 pages
Slide07 Bayes
No ratings yet
Slide07 Bayes
51 pages
Week 6 v1.61 (Hidden) - Revision, CW1, and Probabilistic Graphical Models
No ratings yet
Week 6 v1.61 (Hidden) - Revision, CW1, and Probabilistic Graphical Models
65 pages
Chapter 07
No ratings yet
Chapter 07
68 pages
Pattern Revision
No ratings yet
Pattern Revision
63 pages
4.2 Bayes Decision Theory
No ratings yet
4.2 Bayes Decision Theory
49 pages
Machine Learning 04 - Bayes
No ratings yet
Machine Learning 04 - Bayes
35 pages
Lecture 4
No ratings yet
Lecture 4
51 pages
PBM Notes
No ratings yet
PBM Notes
130 pages
Week 4 - Classification Alternative Techniques
No ratings yet
Week 4 - Classification Alternative Techniques
87 pages
AA1 Tema4
No ratings yet
AA1 Tema4
37 pages
Weatherwax Theodoridis Solutions
No ratings yet
Weatherwax Theodoridis Solutions
212 pages
Bayesian Decision Theory
No ratings yet
Bayesian Decision Theory
65 pages
Linearclassification
No ratings yet
Linearclassification
31 pages
03 Classification Methods
No ratings yet
03 Classification Methods
37 pages
Lecture 6 - Generative Models
No ratings yet
Lecture 6 - Generative Models
33 pages
06 Lectureslides LinearClassification Fixed
No ratings yet
06 Lectureslides LinearClassification Fixed
52 pages
Naive Bayes Classifier and Other Topics
No ratings yet
Naive Bayes Classifier and Other Topics
52 pages
Chap1 Bishop
No ratings yet
Chap1 Bishop
35 pages
Bayesian Decision Theory and Learning: Jayanta Mukhopadhyay Dept. of Computer Science and Engg
No ratings yet
Bayesian Decision Theory and Learning: Jayanta Mukhopadhyay Dept. of Computer Science and Engg
56 pages
Classification: Probabilistic Generative Model
No ratings yet
Classification: Probabilistic Generative Model
34 pages
Bayesian
No ratings yet
Bayesian
23 pages
Bayesian
No ratings yet
Bayesian
21 pages
T06 - Bayes Classifiers
No ratings yet
T06 - Bayes Classifiers
22 pages
Sergios Theodoridis Konstantinos Koutroumbas
No ratings yet
Sergios Theodoridis Konstantinos Koutroumbas
80 pages
MLSlides2 Selected Shared
No ratings yet
MLSlides2 Selected Shared
29 pages
Unit 3-Generative Models
No ratings yet
Unit 3-Generative Models
23 pages
Bayesian Decision Theory
No ratings yet
Bayesian Decision Theory
63 pages
Linear Classification: 1 1 N N I D I
No ratings yet
Linear Classification: 1 1 N N I D I
33 pages
Note 4 Nov 2023
No ratings yet
Note 4 Nov 2023
18 pages
Bayesian and MLE
No ratings yet
Bayesian and MLE
30 pages
Bayesian Learning: Berrin Yanikoglu
No ratings yet
Bayesian Learning: Berrin Yanikoglu
64 pages
Cheat Sheet
No ratings yet
Cheat Sheet
4 pages
Bayesian Decision Theory: Prof. Richard Zanibbi
No ratings yet
Bayesian Decision Theory: Prof. Richard Zanibbi
47 pages
AE - Tema 5 - Two-Class Fisher Discriminant Analysis
No ratings yet
AE - Tema 5 - Two-Class Fisher Discriminant Analysis
6 pages
Bayesian Classifiers: Lectured by Ha Hoang Kha, Ph.D. Ho Chi Minh City University of Technology
No ratings yet
Bayesian Classifiers: Lectured by Ha Hoang Kha, Ph.D. Ho Chi Minh City University of Technology
31 pages
Assignment 10 Solution
No ratings yet
Assignment 10 Solution
8 pages
Midterm - EE511 - Part B: K K K K
No ratings yet
Midterm - EE511 - Part B: K K K K
8 pages
Weather Wax Hastie Solutions Manual
No ratings yet
Weather Wax Hastie Solutions Manual
18 pages
PR January20 04 PDF
No ratings yet
PR January20 04 PDF
40 pages
PRML RefSheet
No ratings yet
PRML RefSheet
6 pages
Introduction To Pattern Recognition
No ratings yet
Introduction To Pattern Recognition
12 pages
6.867 Section 3: Classification: 1 Intro 2 2 Representation 2 3 Probabilistic Models 2
No ratings yet
6.867 Section 3: Classification: 1 Intro 2 2 Representation 2 3 Probabilistic Models 2
10 pages
Bayes
No ratings yet
Bayes
10 pages
Unit 2 Control Charts For Variables: Structure
No ratings yet
Unit 2 Control Charts For Variables: Structure
41 pages
Parametric Vs Non Paramteric Tests
No ratings yet
Parametric Vs Non Paramteric Tests
5 pages
CSCE 970 Lecture 2: Bayesian-Based Classifiers: Most Probable
No ratings yet
CSCE 970 Lecture 2: Bayesian-Based Classifiers: Most Probable
5 pages
n9 PDF
No ratings yet
n9 PDF
6 pages
Linear Regression For Machine Learning
100% (1)
Linear Regression For Machine Learning
2 pages
Weibull-Analysis-In-Excel Standard IEC 61649
No ratings yet
Weibull-Analysis-In-Excel Standard IEC 61649
113 pages
Statistics Problems: Measures of Central Tendency
No ratings yet
Statistics Problems: Measures of Central Tendency
13 pages
Statistics Fundamentals Succinctly
No ratings yet
Statistics Fundamentals Succinctly
104 pages
Mathematics
No ratings yet
Mathematics
20 pages
2nd Sem Final Exam in Statistics
No ratings yet
2nd Sem Final Exam in Statistics
12 pages
Statistical Significance Versus Clinical Relevance
No ratings yet
Statistical Significance Versus Clinical Relevance
38 pages
Machine Learning Lab Assignment Using Weka Name:: Submitted To
No ratings yet
Machine Learning Lab Assignment Using Weka Name:: Submitted To
15 pages
DOANE - STAT - Chap 017
No ratings yet
DOANE - STAT - Chap 017
87 pages
ch03 Forecasting
No ratings yet
ch03 Forecasting
43 pages
CMP3010L03 Pipelining
No ratings yet
CMP3010L03 Pipelining
42 pages
Basic Statistics (5) Discrete Probability Distributions
No ratings yet
Basic Statistics (5) Discrete Probability Distributions
38 pages
Case Solution On Agricomp
No ratings yet
Case Solution On Agricomp
10 pages
Unit 4
No ratings yet
Unit 4
19 pages
Discretionary Accruals Earnings Management or Not
No ratings yet
Discretionary Accruals Earnings Management or Not
18 pages
Pattern Classification 08. Gaussian Mixture Model: Abdelmoniem Bayoumi, PHD
No ratings yet
Pattern Classification 08. Gaussian Mixture Model: Abdelmoniem Bayoumi, PHD
12 pages
E6 MSA-GRR Blank Updated Proposal
No ratings yet
E6 MSA-GRR Blank Updated Proposal
3 pages
588-Article Text-2174-1-10-20201217 - Anna Maria Vasile
No ratings yet
588-Article Text-2174-1-10-20201217 - Anna Maria Vasile
15 pages
Do UN Interventions Cause Peace Using Matching To
No ratings yet
Do UN Interventions Cause Peace Using Matching To
43 pages
05 Density Estimation
No ratings yet
05 Density Estimation
29 pages
Annotated
No ratings yet
Annotated
7 pages
02 Training Patterns
No ratings yet
02 Training Patterns
18 pages
Module 2 Quiz - Correct
No ratings yet
Module 2 Quiz - Correct
4 pages
Jurnal Asro
No ratings yet
Jurnal Asro
29 pages
Assg 2
No ratings yet
Assg 2
10 pages
Kahan Et Al 2015 - Risk of Selection Bias in Randomised Trials
No ratings yet
Kahan Et Al 2015 - Risk of Selection Bias in Randomised Trials
8 pages
02 Multicollinearity
100% (1)
02 Multicollinearity
8 pages
I3 TD5
No ratings yet
I3 TD5
3 pages

04 Bayes Classification Rule

Uploaded by

04 Bayes Classification Rule

Uploaded by

Pattern Classification

04. Bayes Classification Rule

AbdElMoniem Bayoumi, PhD

• Compute the distance from 𝑋 to each

• The class of the nearest pattern to 𝑋

• The KNN rule is less dependent on strange patterns

• To compute 𝑃 𝐶& 𝑋 , we use Bayes rule:

• 𝑃 𝑋|𝐶! ≡ Class-conditional density (defined before)

• 𝑃 𝐶! ≡ Probability of class Ci before or without observing

• For example in OCR, the a priori probabilities

– For the letters E & A à P(Ci) will be higher

– For letters Q & X à P(Ci) will be low because they

Marginalization Bayes rule

• In reality, we do not need to compute 𝑃 𝑋

• Hence, it will not affect which terms will

P(x|C1)P(C1) P(x|C2) P(C2)

• Example: 3-class case:

– You classified 𝑋 as 𝐶( à it has highest 𝑃(𝐶+ |𝑋)

– The probability that your classifier is correct

= * 𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡|𝑋 𝑃 𝑋 𝑑𝑋 Bayes rule

= * max 𝑃 𝑋|𝐶( 𝑃(𝐶( ) 𝑑𝑋

P(x|C1)P(C1) P(x|C2) P(C2) P(x|C3)P(C3)

𝐦𝐚𝐱 P(x|𝑪𝒊 )P(𝑪𝒊 )

𝐦𝐚𝐱 P(x|𝑪𝒊 )P(𝑪𝒊 )

𝑃 𝑐𝑜𝑟𝑟𝑒𝑐𝑡 = ) max 𝑃 𝑋|𝐶0 𝑃(𝐶0 ) 𝑑𝑋

P(X|𝑪𝟏 )P(𝑪𝟏 ) P(X|𝑪𝟐 )P(𝑪𝟐 )

𝑃 𝑋|𝐶" 𝑃 𝐶" = 𝑃 𝑋|𝐶# 𝑃 𝐶#

$(&$ '" )# $(&$ '# )#

• Take log of both sides (𝑙𝑜𝑔* ≡ 𝑙𝑛), assuming 𝜎" = 𝜎# = 𝜎:

"#$! " ("#$" )"

Can get 𝑋+ = 𝑓𝑛(𝜇" , 𝜇# , 𝜎 # , 𝑃 𝐶" , 𝑃 𝐶# ) à quadratic equation

• Substitute in integral I1:

Do the same for I2 24

• This means no other classification rule is

• The reason is that it chooses the most

• Typically, only a training set is available and

• The density estimates have some

• Features may be independent (or

• Features may be dependent in other cases

• 𝑃 𝑋 𝐶& = 𝑃 𝑋. 𝐶& 𝑃 𝑋2 𝐶& ⋯ 𝑃(𝑋3 |𝐶& )

• 𝑍 9 𝐴 𝑍 = 𝑍.2 𝐴.. + 𝑍. 𝑍2 𝐴.2 + 𝑍. 𝑍2 𝐴2. + 𝑍22 𝐴22 +

• Positive covariance if they tend to change in

• Negative covariance if they tend to change in

• Zero covariance if they are not related and do

• Σ is a symmetric matrix, i.e., Σ(4,-) = Σ(-,4)

• Σ(&,&) = 𝐸 𝑋& − 𝜇& 2

• If 𝑋 is Gaussian then its linear

• For the independent case:

𝐸 𝑋& − 𝜇& (𝑋4 −𝜇4 ) = 0 𝑖𝑓 𝑖 ≠ 𝑗

det Σ = 𝜎"# 𝜎## ⋯ 𝜎3# 𝑋−𝜇

Solve to get equation of boundary condition!

2 log 𝑃 𝐶' − log 𝑃 𝐶(

−2𝜇1 $ Σ %& 𝑋 −2𝜇2 $ Σ %& 𝑋

𝑿𝑻𝜮#𝟏 𝝁𝒋 = (𝑿𝑻𝜮#𝟏 𝝁𝒋 )𝑻= 𝝁𝒋 𝑻𝜮#𝟏 𝑿 (𝑨𝑩𝑪)𝑻= 𝑪𝑻𝑩𝑻𝑨𝑻

Scalar = Scalar T (𝑿𝑻𝜮#𝟏 𝝁𝒋 )𝑻= 𝝁𝒋 𝑻(𝜮#𝟏 )𝑻𝑿

(𝜮#𝟏 )𝑻= 𝜮#𝟏 symmetric matrix

2 log 𝑃 𝐶' − log 𝑃 𝐶(

−2𝜇1 $ Σ %& 𝑋 −2𝜇2 $ Σ %& 𝑋

• Then decision boundary:

– Obtain the training set 𝑋 1 , 𝑋 2 ⋯ 𝑋(𝑀)

– Assume a multi-dimensional Gaussian density for each

– Estimate the a priori probabilities 𝑃(𝐶> ) from the training

– Using the obtained estimates, plug in Bayes rule to

– For terms on the diagonal, the previous formula reduces

– Confirming our earlier assertion that the terms on the

You might also like