03 Classification Methods
Spring 2023
Acknowledgment
• These slides have been created relying on
lecture notes of Prof. Dr. Amir Atiya
Minimum Distance Classifier
• Choose a center or a representative pattern from each class → $V(k)$, where $k$ is the class index

[Figure: centers V(1), V(2), V(3) for classes C1, C2, C3 in the (X1, X2) feature plane]
Minimum Distance Classifier
• Given a pattern $X$ that we would like to classify

[Figure: centers V(1), V(2), V(3) and classes C1, C2, C3 in the (X1, X2) plane]
Minimum Distance Classifier
• Compute the distance from $X$ to each center $V(k)$:

$$d(k) = \sum_{i=1}^{N} \left[ V_i(k) - X_i \right]^2 \equiv \| V(k) - X \|^2$$

[Figure: centers V(1), V(2), V(3) and classes C1, C2, C3 in the (X1, X2) plane]
Recap: Euclidean Distance
• 2D: for points $(x_1, x_2)$ and $(y_1, y_2)$:

$$d^2 = (y_2 - x_2)^2 + (y_1 - x_1)^2$$

• N dimensions:

$$d^2(X, Y) = \sum_{i=1}^{N} (Y_i - X_i)^2$$
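As a quick check of the formula above, here is a minimal sketch (the function name `squared_distance` is illustrative, not from the slides):

```python
import numpy as np

# Squared Euclidean distance, d^2(X, Y) = sum_i (Y_i - X_i)^2,
# in any number of dimensions.
def squared_distance(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((y - x) ** 2))

# 2-D check against the slide formula: (0,0) to (3,4) gives 3^2 + 4^2 = 25
print(squared_distance([0, 0], [3, 4]))  # 25.0
```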
Minimum Distance Classifier
• Find $\hat{k}$ corresponding to the minimum distance:

$$\hat{k} = \operatorname*{argmin}_{1 \le k \le K} d(k)$$
Class Center Estimation
• Let $X(m) \in C_1$, for $m = 1, \ldots, M_1$; the center $V(1)$ is the mean of the class-1 training patterns:

$$V(1) = \frac{1}{M_1} \sum_{m=1}^{M_1} X(m)$$
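A minimal sketch of the minimum-distance classifier (function names are illustrative, not from the slides): each center $V(k)$ is the mean of the training patterns of class $k$, and a new pattern is assigned to the class with the nearest center.

```python
import numpy as np

def fit_centers(X_train, y_train):
    # V(k) = mean of the training patterns of class k
    classes = np.unique(y_train)
    centers = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centers

def min_distance_classify(x, classes, centers):
    d = np.sum((centers - x) ** 2, axis=1)  # d(k) = ||V(k) - x||^2
    return classes[int(np.argmin(d))]       # k_hat = argmin_k d(k)

# Toy 2-class, 2-feature training set
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
classes, centers = fit_centers(X_train, y_train)
print(min_distance_classify(np.array([0.3, 0.1]), classes, centers))  # 0
```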
Minimum Distance Classifier
• Too simple to solve difficult problems
[Figure: centers V(1), V(2) with a linear decision boundary separating the regions f(X) > 0 (class C1) and f(X) < 0 (class C2) in the (X1, X2) plane]
Minimum Distance Classifier
• Too simple to solve difficult problems

[Figure: the same boundary, with a pattern X falling in the region f(X) < 0]

• $X$ will be classified as C2
Nearest Neighbor Classifier
• The class of the nearest pattern to $X$ determines its classification

[Figure: pattern X among training patterns from C1 and C2 in the (X1, X2) plane]
Nearest Neighbor Classifier
• Compute the distance between pattern $X$ and each pattern $X(m)$ in the training set:

$$d(m) = \| X - X(m) \|^2$$
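A minimal nearest-neighbor sketch under the same distance $d(m)$ (the function name is illustrative):

```python
import numpy as np

def nn_classify(x, X_train, y_train):
    # d(m) = ||x - X(m)||^2 for every training pattern X(m);
    # the label of the closest one is returned.
    d = np.sum((X_train - x) ** 2, axis=1)
    return y_train[int(np.argmin(d))]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array(["C1", "C1", "C2"])
print(nn_classify(np.array([4.2, 4.8]), X_train, y_train))  # C2
```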
Nearest Neighbor Classifier
• The advantage of the nearest neighbor
classifier is its simplicity
[Figure: pattern X among training patterns from C1 and C2 in the (X1, X2) plane]
Nearest Neighbor Classifier
• Also, when the classes overlap heavily, patterns falling in the overlap region can negatively affect performance

[Figure: heavily overlapping C1 and C2 patterns in the (X1, X2) plane]
K-Nearest Neighbor Classifier
• To alleviate the problems of the NN classifier, we can use the k-nearest neighbor (k-NN) classifier
K-Nearest Neighbor Classifier
• Take k = 5

[Figure: pattern X and its 5 nearest training patterns from C1 and C2 in the (X1, X2) plane]

• One can see that C2 is the majority → classify $X$ as C2
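A minimal k-NN sketch (illustrative names, with k = 5 as on the slide): take the k closest training patterns and return the majority class among them.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=5):
    d = np.sum((X_train - x) ** 2, axis=1)    # squared distances d(m)
    nearest = np.argsort(d)[:k]               # indices of the k closest patterns
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]         # majority class wins

X_train = np.array([[0, 0], [0, 1], [1, 0],
                    [3, 3], [3, 4], [4, 3], [4, 4]], dtype=float)
y_train = np.array(["C1", "C1", "C1", "C2", "C2", "C2", "C2"])
print(knn_classify(np.array([3.5, 3.5]), X_train, y_train, k=5))  # C2
```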
K-Nearest Neighbor Classifier
• The k nearest neighbors could be a bit far away from $X$

[Figure: pattern X in a sparse region of the (X1, X2) plane, with its k = 10 neighbors spread far from X]

• This leads to using information that might not be relevant to the considered point $X$
Bayes Classification Rule
• Recall: histogram for feature x from class C1 (e.g., letter 'A')

[Figure: histograms over x, e.g. the number of training patterns of letter 'A' having x = 3 and of letter 'I' having x = 10]
Bayes Classification Rule
• P(x|Ci) ≡ class-conditional probability function ≡ probability density of feature x, given that x comes from class Ci

[Figure: the two densities P(x|C1) and P(x|C2) plotted over x]
Bayes Classification Rule
• If $X = [X_1, X_2, \ldots, X_N]^T$ is a feature vector, then:

$$P(X \mid C_i) = P(X_1, X_2, \ldots, X_N \mid C_i)$$

[Figure: a 2-feature example in the (X1, X2) plane]
Bayes Classification Rule
• Given a pattern $X$ (with unknown class) that we wish to classify, compute the posterior probability $P(C_i \mid X)$ for each class
Bayes Classification Rule
• To compute $P(C_i \mid X)$, we use Bayes rule:

$$P(C_i \mid X) = \frac{P(C_i, X)}{P(X)} = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

Bayes rule: $P(A, B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)$
Bayes Classification Rule
• To compute $P(C_i \mid X)$, we use Bayes rule:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$
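A minimal numerical sketch of Bayes rule for a single observed feature value (the likelihood and prior numbers are made up for illustration):

```python
# P(Ci|x) = P(x|Ci) P(Ci) / P(x), with P(x) obtained by marginalization.
likelihoods = {"C1": 0.30, "C2": 0.05}   # P(x|Ci) at the observed x (toy values)
priors = {"C1": 0.5, "C2": 0.5}          # a priori probabilities P(Ci)

p_x = sum(likelihoods[c] * priors[c] for c in priors)          # P(x)
posteriors = {c: likelihoods[c] * priors[c] / p_x for c in priors}
print(posteriors)  # the posteriors sum to 1; C1 dominates here
```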
Bayes Classification Rule
• The a priori probabilities represent the
frequencies of the classes irrespective of the
observed features
Bayes Classification Rule
• Find the class $C_i$ giving the maximum $P(C_i \mid X)$:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

– $P(C_i \mid X)$ ≡ posterior probability
– $P(C_i)$ ≡ a priori probability
– $P(X \mid C_i)$ ≡ class-conditional density

• $P(X) = \sum_{i=1}^{K} P(X, C_i) = \sum_{i=1}^{K} P(X \mid C_i)\, P(C_i)$
Recap: Marginalization
• Discrete case:

$$P(A) = \sum_{j=1}^{K} P(A, B = B_j)$$

• Continuous case (law of total probability):

$$P(x) = \int_{-\infty}^{\infty} P(x, y)\, dy$$

• So:

$$P(X) = \sum_{i=1}^{K} P(X, C_i) = \sum_{i=1}^{K} P(X \mid C_i)\, P(C_i)$$
Bayes Classification Rule
• Classify $X$ to the class corresponding to $\max_i P(X \mid C_i)\, P(C_i)$

[Figure: 1-D example with the two curves P(x|C1)P(C1) and P(x|C2)P(C2) plotted over x]
Bayes Classification Rule
• Classify $X$ to the class corresponding to $\max_i P(X \mid C_i)\, P(C_i)$

[Figure: the same 1-D example]

• For x = 5, P(x|C1)P(C1) has a higher value than P(x|C2)P(C2) → classify as C1
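Since $P(X)$ is common to all classes, the decision reduces to comparing $P(X \mid C_i)\, P(C_i)$. A one-line sketch (toy numbers, chosen for illustration only):

```python
# Pick the class with the largest likelihood-times-prior product.
def bayes_decide(likelihoods, priors):
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

# At an observed x, suppose P(x|C1) = 0.20 and P(x|C2) = 0.02 with equal priors:
print(bayes_decide({"C1": 0.20, "C2": 0.02}, {"C1": 0.5, "C2": 0.5}))  # C1
```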
Classification Accuracy
$$P(\text{correct classification} \mid X) = \max_{1 \le i \le K} P(C_i \mid X)$$
Classification Accuracy
• Overall P(correct) is (a marginal probability over $X$):

$$P(\text{correct}) = \int P(\text{correct}, X)\, dX = \int \max_i \frac{P(X \mid C_i)\, P(C_i)}{P(X)}\, P(X)\, dX$$
Classification Accuracy
• Overall P(correct) is:

$$P(\text{correct}) = \int \max_i P(X \mid C_i)\, P(C_i)\, dX$$

[Figure: 1-D example with the two weighted densities P(x|Ci)P(Ci) plotted over x]
Classification Accuracy
• Overall P(correct) is:

$$P(\text{correct}) = \int \max_i P(X \mid C_i)\, P(C_i)\, dX$$

• P(correct) equals the sum of the marked areas under the upper envelope of the weighted densities

[Figure: 1-D example with the regions under max_i P(x|Ci)P(Ci) shaded]
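The integral above can be approximated numerically. A sketch for two made-up Gaussian class-conditional densities with equal priors (all parameters are illustrative):

```python
import numpy as np

# Gaussian density N(mu, sigma^2) evaluated at x
def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 20.0, 30001)
p1 = gauss(x, 3.0, 1.5) * 0.5   # P(x|C1) P(C1)
p2 = gauss(x, 8.0, 1.5) * 0.5   # P(x|C2) P(C2)

# P(correct) = area under the upper envelope max(p1, p2), via a Riemann sum
p_correct = float(np.sum(np.maximum(p1, p2)) * (x[1] - x[0]))
print(round(p_correct, 3))  # about 0.952 for these well-separated classes
```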
Classification Accuracy
$$P(\text{error}) = 1 - P(\text{correct})$$
Classification Accuracy
$$P(\text{error}) = 1 - P(\text{correct})$$

• We can compute P(error) directly, as the area of the overlap region, only in the 2-class case!

[Figure: the overlap area between the two weighted densities equals P(error)]