03 Classification Methods

This document discusses various pattern classification methods. It begins by introducing the minimum distance classifier, which classifies patterns based on their distance to class centers. It then discusses nearest neighbor classification and k-nearest neighbor classification. Finally, it introduces the Bayes classification rule, which computes the posterior probabilities of class membership for a pattern based on class-conditional probabilities and prior probabilities.

Pattern Classification

03. Pattern Classification Methods

AbdElMoniem Bayoumi, PhD

Spring 2023
Acknowledgment
• These slides were created based on the lecture notes of Prof. Dr. Amir Atiya
Minimum Distance Classifier
• Choose a center or a representative pattern from each class → $V(k)$, where $k$ is the class index
[Figure: patterns from classes C1, C2, C3 in the (X1, X2) feature plane, with class centers V(1), V(2), V(3)]
Minimum Distance Classifier
• Given a pattern 𝑋 that we would like to
classify

[Figure: the same class scatter, now with a query pattern X to be classified]
Minimum Distance Classifier
• Compute the distance from 𝑋 to each
center 𝑉(𝑘):
$d^2(k) = \sum_{i=1}^{N} [V_i(k) - X_i]^2 \equiv \lVert V(k) - X \rVert^2$

[Figure: the distances from X to each class center V(k)]
Recap: Euclidean Distance
• 2D: for points (x1, x2) and (y1, y2):

  $d^2 = (y_2 - x_2)^2 + (y_1 - x_1)^2$

• N dimensions:

  $d^2(X, Y) = \sum_{i=1}^{N} (Y_i - X_i)^2$
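For example (values chosen purely for illustration), for X = (1, 2) and Y = (4, 6):

$d^2(X, Y) = (4 - 1)^2 + (6 - 2)^2 = 9 + 16 = 25$, so $d(X, Y) = 5$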
Minimum Distance Classifier
• Find $\hat{k}$ corresponding to the minimum distance:

  $\hat{k} = \operatorname{argmin}_{1 \le k \le K} d(k)$

• Then our classification of $X$ is class $C_{\hat{k}}$

• $X$ is classified as belonging to the class corresponding to the nearest class center
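A minimal NumPy sketch of this rule (names and values hypothetical), assuming the K class centers are stacked row-wise in an array V of shape (K, N) and classes are indexed from 0:

import numpy as np

def min_distance_classify(X, V):
    """Return the index k of the class center V[k] nearest to pattern X."""
    d = np.sum((V - X) ** 2, axis=1)   # d(k) = ||V(k) - X||^2 for every center
    return int(np.argmin(d))           # k-hat = argmin_k d(k)

# Hypothetical example: three class centers in a 2-D feature space
V = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.array([4.2, 4.7])
print(min_distance_classify(X, V))     # -> 1, i.e., X is closest to the second center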
Class Center Estimation
• Let $X(m) \in C_1$; estimate the center $V(1)$ as:

  $V(1) = \frac{1}{M_1} \sum_{m=1}^{M_1} X(m)$

  where $M_1$ is the number of training patterns from class $C_1$

• This corresponds to component-wise averaging:

  $V_i(1) = \frac{1}{M_1} \sum_{m=1}^{M_1} X_i(m)$
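A short sketch of this estimation step, assuming hypothetical NumPy arrays X_train (shape (M, N)) and integer labels y_train in {0, ..., K-1}:

import numpy as np

def estimate_centers(X_train, y_train, K):
    """Estimate each center V(k) as the component-wise mean of class k's patterns."""
    return np.stack([X_train[y_train == k].mean(axis=0) for k in range(K)])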
Minimum Distance Classifier
• Too simple to solve difficult problems

[Figure: two classes C1, C2 with centers V(1), V(2) and the induced boundary; f(X) > 0 on the C1 side and f(X) < 0 on the C2 side]
Minimum Distance Classifier
• Too simple to solve difficult problems

[Figure: the same setup with a pattern X on the C2 side of the boundary; X will be classified as C2]
Nearest Neighbor Classifier
• The class of the nearest pattern to 𝑋
determines its classification

[Figure: training patterns from C1 and C2 with a query pattern X; the nearest pattern to X determines its class]
Nearest Neighbor Classifier
• Compute the distance between pattern 𝑋
and each pattern 𝑋(𝑚) in the training set

$d(m) = \lVert X - X(m) \rVert^2$

• The class of the pattern $m$ that corresponds to the minimum distance is chosen as the classification of $X$
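A minimal sketch of this rule (hypothetical names; X_train holds the training patterns row-wise as a NumPy array and y_train their class labels):

import numpy as np

def nn_classify(X, X_train, y_train):
    """Assign X the label of its nearest training pattern."""
    d = np.sum((X_train - X) ** 2, axis=1)   # d(m) = ||X - X(m)||^2 for every pattern
    return y_train[np.argmin(d)]             # class of the closest training pattern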
Nearest Neighbor Classifier
• The advantage of the nearest neighbor classifier is its simplicity

• However, a rogue pattern can affect the classification negatively

[Figure: a rogue outlier pattern from one class lying near the query X causes X to be misclassified]
Nearest Neighbor Classifier
• Also, for patterns with large overlaps
between the classes, the overlapping
patterns can negatively affect performance

[Figure: heavily overlapping C1 and C2 regions; nearest neighbor decisions in the overlap are unreliable]
K-Nearest Neighbor Classifier
• To alleviate the problems of the NN classifier, there is the k-nearest neighbor (KNN) classifier

• Take the k nearest points to point $X$

• Choose the classification of $X$ as the class most often represented in these k points
K-Nearest Neighbor Classifier
• Take k = 5
[Figure: query pattern X with its k = 5 nearest neighbors among C1 and C2 patterns]

• One can see that C2 is the majority → classify $X$ as C2

• The KNN rule is less dependent on strange patterns compared to the nearest neighbor classification rule
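A sketch of the KNN rule under the same hypothetical setup (NumPy arrays X_train and y_train), using a majority vote over the k nearest patterns:

import numpy as np
from collections import Counter

def knn_classify(X, X_train, y_train, k=5):
    """Classify X by majority vote among its k nearest training patterns."""
    d = np.sum((X_train - X) ** 2, axis=1)   # squared distances d(m)
    nearest = np.argsort(d)[:k]              # indices of the k closest patterns
    return Counter(y_train[nearest]).most_common(1)[0][0]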
K-Nearest Neighbor Classifier
• The k-nearest neighbors could be a bit far
away from 𝑋
[Figure: query X in a sparse region; with k = 10, some of the 10 nearest neighbors lie far from X]

• Leading to using information that might not be relevant to the considered point $X$
Bayes Classification Rule
• Recall: histogram for feature x from class
C1 (e.g., letter ‘A’)
[Figure: histogram over feature x, annotated with the number of training patterns of letter ‘A’ having x = 3 and the number of training patterns of letter ‘I’ having x = 10]
Bayes Classification Rule
• $P(x|C_i)$ ≡ class-conditional probability function ≡ probability density of feature $x$, given that $x$ comes from class $C_i$

[Figure: the class-conditional densities $P(x|C_1)$ and $P(x|C_2)$ plotted over feature $x$]
Bayes Classification Rule
• If $X = (X_1, X_2, \cdots, X_N)^T$ is a feature vector, then:

  $P(X|C_i) = P(X_1, X_2, \cdots, X_N \mid C_i)$

[Figure: a two-dimensional scatter over features $X_1$ and $X_2$ (2 features)]
Bayes Classification Rule
• Given a pattern 𝑋 (with unknown class) that
we wish to classify:

  – Compute $P(C_1|X), P(C_2|X), \ldots, P(C_K|X)$

  – Find the $k$ giving maximum $P(C_k|X)$

• This is our classification according to the Bayes classification rule

• We classify the data point (pattern) as belonging to the most likely class
Bayes Classification Rule
• To compute $P(C_k|X)$, we use Bayes rule:

  $P(C_k|X) = \frac{P(C_k, X)}{P(X)} = \frac{P(X|C_k)\, P(C_k)}{P(X)}$

Bayes rule:
$P(A, B) = P(A|B)\, P(B) = P(B|A)\, P(A)$
Bayes Classification Rule
• To compute $P(C_i|X)$, we use Bayes rule:

  $P(C_i|X) = \frac{P(X|C_i)\, P(C_i)}{P(X)}$

• $P(X|C_i)$ ≡ class-conditional density (defined before)

• $P(C_i)$ ≡ probability of class $C_i$ before or without observing the features $X$ ≡ a priori probability of class $C_i$
Bayes Classification Rule
• The a priori probabilities represent the
frequencies of the classes irrespective of the
observed features

• For example in OCR, the a priori probabilities


are taken as the frequency or fraction of
occurrence of the different letters in a typical
text

  – For the letters E & A → $P(C_i)$ will be higher

  – For the letters Q & X → $P(C_i)$ will be low because they are infrequent
Bayes Classification Rule
• Find $C_k$ giving max $P(C_k|X)$:

  $P(C_k|X) = \frac{P(X|C_k)\, P(C_k)}{P(X)}$

  – $P(C_k|X)$ ≡ posterior probability
  – $P(C_k)$ ≡ a priori probability
  – $P(X|C_k)$ ≡ class-conditional density

• $P(X) = \sum_{i=1}^{K} P(X, C_i) = \sum_{i=1}^{K} P(X|C_i)\, P(C_i)$
Recap: Marginalization
• Discrete case:

  $P(A) = \sum_{j=1}^{J} P(A, B = B_j)$

• Continuous case (law of total probability):

  $P(x) = \int_{-\infty}^{\infty} P(x, y) \, dy$

• So:

  $P(X) = \sum_{i=1}^{K} P(X, C_i) = \sum_{i=1}^{K} P(X|C_i)\, P(C_i)$

  (the first equality is marginalization, the second uses Bayes rule)
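A quick numeric check of this decomposition, with hypothetical priors and class-conditional values at one particular x (K = 3):

# Hypothetical values: priors P(C_i) and densities P(x|C_i) evaluated at one x
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.25]

# P(x) = sum_i P(x|C_i) P(C_i)
p_x = sum(l * p for l, p in zip(likelihoods, priors))
print(p_x)   # 0.05 + 0.12 + 0.05 = 0.22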
Bayes Classification Rule
$P(C_k|X) = \frac{P(X|C_k)\, P(C_k)}{\sum_{i=1}^{K} P(X|C_i)\, P(C_i)}$

• In reality, we do not need to compute $P(X)$ because it is a common factor for all the terms in the expression for $P(C_k|X)$

• Hence, it will not affect which terms will end up being maximum
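A minimal sketch of the resulting decision rule. The class-conditional densities are taken as 1-D Gaussians purely for illustration (the slides do not fix a density model), with hypothetical means, standard deviations, and priors:

import math

# Hypothetical 1-D model: Gaussian class-conditional densities P(x|C_k) and priors P(C_k)
means, stds, priors = [2.0, 7.0], [1.0, 1.5], [0.6, 0.4]

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_classify(x):
    """Pick the class maximizing P(x|C_k) P(C_k); P(x) is a common factor and is skipped."""
    scores = [gauss_pdf(x, m, s) * p for m, s, p in zip(means, stds, priors)]
    return scores.index(max(scores))

print(bayes_classify(3.0))   # -> 0, i.e., class C1 wins at x = 3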
Bayes Classification Rule
• Classify $X$ to the class corresponding to max $P(X|C_k)\, P(C_k)$

[Figure (1-D example): the curves $P(x|C_1)P(C_1)$ and $P(x|C_2)P(C_2)$ plotted over feature $x$]
Bayes Classification Rule
• Classify $X$ to the class corresponding to max $P(X|C_k)\, P(C_k)$

[Figure (1-D example): the same curves, evaluated at x = 5]

• For x = 5, $P(x|C_1)P(C_1)$ has a higher value compared to $P(x|C_2)P(C_2)$ → classify as C1
Classification Accuracy
$P(\mathrm{correct\ classification} \mid X) = \max_{1 \le i \le K} P(C_i|X)$

• Example: 3-class case:

  – $P(C_1|X) = 0.6$, $P(C_2|X) = 0.3$, $P(C_3|X) = 0.1$

  – You classify $X$ as $C_1$ → it has the highest $P(C_i|X)$

  – The probability that your classifier is correct equals the probability that $X$ belongs to the same class as the classification (which is 0.6)
Classification Accuracy
• Overall P(correct) is:
$P(\mathrm{correct}) = \int P(\mathrm{correct}, X) \, dX$   (marginal probability)

$\quad = \int P(\mathrm{correct} \mid X) \, P(X) \, dX$   (Bayes rule)

$\quad = \int \max_k \frac{P(X|C_k)\, P(C_k)}{P(X)} \, P(X) \, dX$

$\quad = \int \max_k P(X|C_k)\, P(C_k) \, dX$
Classification Accuracy
• Overall P(correct) is:
$P(\mathrm{correct}) = \int \max_k P(X|C_k)\, P(C_k) \, dX$

[Figure (1-D example): the three curves $P(x|C_1)P(C_1)$, $P(x|C_2)P(C_2)$, $P(x|C_3)P(C_3)$ over feature $x$]
Classification Accuracy
• Overall P(correct) is:
$P(\mathrm{correct}) = \int \max_k P(X|C_k)\, P(C_k) \, dX$

[Figure (1-D example): the upper envelope $\max_i P(x|C_i)P(C_i)$ highlighted over the three curves]
Classification Accuracy
• Overall P(correct) is:
$P(\mathrm{correct}) = \int \max_k P(X|C_k)\, P(C_k) \, dX$

• P(correct) = the sum of the areas under the envelope $\max_i P(x|C_i)P(C_i)$

[Figure (1-D example): the areas under the upper envelope shaded]
Classification Accuracy

$P(\mathrm{correct}) = \int \max_k P(X|C_k)\, P(C_k) \, dX$

$P(\mathrm{error}) = 1 - P(\mathrm{correct})$
Classification Accuracy

$P(\mathrm{correct}) = \int \max_k P(X|C_k)\, P(C_k) \, dX$

$P(\mathrm{error}) = 1 - P(\mathrm{correct})$

We can compute P(error) directly only for the 2-class case!

[Figure (1-D, 2-class example): the shaded overlap area = P(error)]
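A hedged numerical sketch of these integrals for a hypothetical 1-D, 2-class problem, again assuming Gaussian class-conditional densities purely for illustration:

import numpy as np

means, stds, priors = [2.0, 7.0], [1.0, 1.5], [0.6, 0.4]   # hypothetical model

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-10.0, 20.0, 100_001)          # fine grid for numerical integration
curves = np.array([gauss_pdf(x, m, s) * p      # P(x|C_k) P(C_k) for each class
                   for m, s, p in zip(means, stds, priors)])

p_correct = np.trapz(curves.max(axis=0), x)    # area under the upper envelope
print(p_correct, 1.0 - p_correct)              # P(correct) and P(error)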
Acknowledgment
• These slides were created based on the lecture notes of Prof. Dr. Amir Atiya
