
Lecture 12: Classification

g Discriminant functions
g The optimal Bayes classifier
g Quadratic classifiers
g Euclidean and Mahalanobis metrics
g K Nearest Neighbor Classifiers

Intelligent Sensor Systems
Ricardo Gutierrez-Osuna
Wright State University
Discriminant functions
g A convenient way to represent a pattern classifier is in terms of
a family of discriminant functions gi(x) with a simple MAX gate
as the classification rule
[Figure: block diagram of a pattern classifier. The features x1, x2, x3, …, xd feed the discriminant functions g1(x), g2(x), …, gC(x); a "select max" stage (optionally incorporating costs) produces the class assignment.]

Assign x to class ωi if gi(x) > gj(x) ∀ j ≠ i

g How do we choose the discriminant functions gi(x)?
n Depends on the objective function to minimize
g Probability of error
g Bayes Risk
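A minimal sketch of this discriminant-function/MAX-gate view (added here, in Python with NumPy; the three affine discriminants are purely hypothetical):

import numpy as np

def classify(x, discriminants):
    """MAX gate: assign x to the class whose discriminant g_i(x) is largest."""
    return int(np.argmax([g(x) for g in discriminants]))

# Hypothetical example with three affine discriminants g_i(x) = w_i . x + b_i
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, -0.5, 1.0])
discriminants = [lambda x, w=w, bi=bi: float(w @ x + bi) for w, bi in zip(W, b)]

print(classify(np.array([0.2, 0.9]), discriminants))  # -> 1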
Minimizing probability of error
g Probability of error P[error|x] is “the probability of assigning x
to the wrong class”
n For a two-class problem, P[error|x] is simply

P(error \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\ P(\omega_2 \mid x) & \text{if we decide } \omega_1 \end{cases}

g It makes sense to design the classification rule so that it minimizes the average probability of error P(error) across all possible values of x

P(error) = \int_{-\infty}^{+\infty} P(error, x)\,dx = \int_{-\infty}^{+\infty} P(error \mid x)\,P(x)\,dx

g To ensure P(error) is minimum, we minimize P(error|x) by choosing the class with the maximum posterior P(ωi|x) at each x
n This is called the MAXIMUM A POSTERIORI (MAP) RULE
g And the associated discriminant functions become

g_i^{MAP}(x) = P(\omega_i \mid x)
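As a small numeric illustration (added here, not on the original slide; the likelihood and prior values are made up), the MAP rule amounts to multiplying likelihoods by priors and taking the argmax:

import numpy as np

def map_decision(likelihoods, priors):
    """MAP rule: choose the class with the largest posterior P(w_i | x).
    likelihoods[i] = P(x | w_i) at the observed x, priors[i] = P(w_i).
    The evidence P(x) is class-independent, so it cancels in the argmax."""
    unnormalized = np.asarray(likelihoods) * np.asarray(priors)
    posteriors = unnormalized / unnormalized.sum()
    return int(np.argmax(posteriors)), posteriors

cls, post = map_decision(likelihoods=[0.10, 0.30], priors=[0.7, 0.3])
print(cls, post)  # -> 1, posteriors ≈ [0.44, 0.56]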

Minimizing probability of error
g We “prove” the optimality of the MAP rule graphically
n The right plot shows the posterior P(ωi|x) for each of the two classes
n The bottom plots show the P(error) incurred by the MAP rule and by another rule
n Which one has the lower P(error) (color-filled area)?

[Figure: posteriors P(ω1|x) and P(ω2|x) versus x; below, the decision regions for the MAP rule and for the “other” rule (choose RED / choose BLUE / choose RED), with the resulting P(error) shown as a filled area.]

Quadratic classifiers
g Let us assume that the likelihood densities are Gaussian
P(x \mid \omega_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\!\left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right]

g Using Bayes rule, the MAP discriminant functions become


g_i(x) = P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)} = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\!\left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right] \frac{P(\omega_i)}{P(x)}

n Eliminating constant terms


 1 
exp − (x − i )T ∑ i−1(x − i )P(
-1/2
gi (x) = ∑ i i )
 2 

n We take natural logs (the logarithm is monotonically increasing)

g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \log|\Sigma_i| + \log P(\omega_i)
g This is known as a Quadratic Discriminant Function
g The quadratic term is known as the Mahalanobis distance
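A compact code sketch of this quadratic discriminant (added here; it assumes the class means, covariances, and priors have already been estimated, e.g. from sample statistics):

import numpy as np

def quadratic_discriminant(x, mean, cov, prior):
    """g_i(x) = -1/2 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - 1/2 log|Sigma_i| + log P(w_i)."""
    diff = x - mean
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return -0.5 * mahalanobis_sq - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior)

def qda_classify(x, means, covs, priors):
    """Assign x to the class with the largest quadratic discriminant."""
    scores = [quadratic_discriminant(x, m, S, p) for m, S, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

Using np.linalg.solve rather than an explicit matrix inverse is a standard numerical choice; for ill-conditioned covariances, np.linalg.slogdet would be preferable to np.linalg.det.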

Mahalanobis distance
g The Mahalanobis distance can be thought of as a vector distance that uses a Σi⁻¹ norm
\|x - y\|^2_{\Sigma_i^{-1}} = (x - y)^T \Sigma_i^{-1} (x - y)

[Figure: in the (x1, x2) plane, the locus of points at constant Mahalanobis distance from the mean μ, (x − μ)^T Σi⁻¹ (x − μ) = K, is an ellipse centered at μ.]

n Σ⁻¹ can be thought of as a stretching factor on the space
n Note that for an identity covariance matrix (Σi = I), the Mahalanobis distance becomes the familiar Euclidean distance
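A small numeric sketch of the Mahalanobis distance and its Euclidean special case (added here; the covariance values are made up):

import numpy as np

def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance (x - y)^T Sigma^{-1} (x - y)."""
    diff = np.asarray(x) - np.asarray(y)
    return float(diff @ np.linalg.solve(cov, diff))

x, mu = np.array([2.0, 0.0]), np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0], [0.0, 1.0]])   # large variance along the first axis
print(mahalanobis_sq(x, mu, cov))          # 1.0: the "stretched" axis shrinks the distance
print(mahalanobis_sq(x, mu, np.eye(2)))    # 4.0: identity covariance gives squared Euclidean distance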
g In the following slides we look at special cases of the Quadratic
classifier
n For convenience we will assume equiprobable priors so we can drop the term
log(P(ωi))

Special case 1: Σi = σ²I
g In this case, the discriminant
becomes
g_i(x) = -(x - \mu_i)^T (x - \mu_i)

n This is known as a MINIMUM DISTANCE CLASSIFIER
n Notice the linear decision boundaries
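A minimal nearest-mean sketch of this special case (added here; it assumes the class means are already known):

import numpy as np

def minimum_distance_classify(x, class_means):
    """Sigma_i = sigma^2 I: assign x to the class with the nearest mean."""
    dists = [float(np.sum((x - mu) ** 2)) for mu in class_means]
    return int(np.argmin(dists))

print(minimum_distance_classify(np.array([1.0, 2.0]),
                                [np.array([0.0, 0.0]), np.array([1.0, 3.0])]))  # -> 1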

Special case 2: Σi= Σ (Σ diagonal)
g In this case, the discriminant
becomes
g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i)
n This is known as a MAHALANOBIS
DISTANCE CLASSIFIER
n Still linear decision boundaries

Special case 3: Σi=Σ (Σ non-diagonal)
g In this case, the discriminant
becomes
g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i)
n This is also known as a
MAHALANOBIS DISTANCE
CLASSIFIER
n Still linear decision boundaries
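A short check, added here, of why a covariance matrix shared by all classes produces linear boundaries: the term x^T Σ⁻¹ x is common to every discriminant and cancels when two classes are compared,

g_i(x) - g_j(x) = (\mu_i - \mu_j)^T \Sigma^{-1} x - \tfrac{1}{2}\left( \mu_i^T \Sigma^{-1} \mu_i - \mu_j^T \Sigma^{-1} \mu_j \right)

which is affine in x, so the decision boundary g_i(x) = g_j(x) is a hyperplane.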

Case 4: Σi = σi²I, example
g In this case the quadratic
expression cannot be simplified
any further
g Notice that the decision
boundaries are no longer linear
but quadratic


Case 5: Σi≠Σj general case, example
g In this case there are no
constraints so the quadratic
expression cannot be
simplified any further
g Notice that the decision
boundaries are also quadratic


Limitations of quadratic classifiers
g The fundamental limitation is the unimodal Gaussian
assumption
n For non-Gaussian or multimodal
Gaussian, the results may be
significantly sub-optimal

g A practical limitation is associated with the minimum required size for the dataset
n If the number of examples per class is less than the number of
dimensions, the covariance matrix becomes singular and,
therefore, its inverse cannot be computed
g In this case it is common to assume the same covariance structure
for all classes and compute the covariance matrix using all the
examples, regardless of class
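A sketch (added here, not part of the original slides) of the shared-covariance fallback just described, where a single covariance matrix is estimated from all examples regardless of class:

import numpy as np

def fit_shared_covariance(X, y):
    """Estimate per-class means but one covariance matrix from all examples."""
    means = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    shared_cov = np.cov(X, rowvar=False)   # estimated over the whole dataset, regardless of class
    return means, shared_cov

With a single covariance matrix the quadratic terms cancel between classes and the resulting classifier is linear, as in special cases 2 and 3.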

Conclusions
g We can extract the following conclusions
n The Bayes classifier for normally distributed classes is quadratic
n The Bayes classifier for normally distributed classes with equal
covariance matrices is a linear classifier
n The minimum Mahalanobis distance classifier is optimum for
g normally distributed classes and equal covariance matrices and equal priors
n The minimum Euclidean distance classifier is optimum for
g normally distributed classes and equal covariance matrices proportional to
the identity matrix and equal priors
n Both Euclidean and Mahalanobis distance classifiers are linear
g The goal of this discussion was to show that some of the most
popular classifiers can be derived from decision-theoretic
principles and some simplifying assumptions
n It is important to realize that using a specific (Euclidean or Mahalanobis)
minimum distance classifier implicitly corresponds to certain statistical
assumptions
n The question of whether these assumptions hold can rarely be answered in practice; in most cases we are limited to posing and answering the question “does this classifier solve our problem or not?”
K Nearest Neighbor classifier
g The kNN classifier is based on non-parametric density
estimation techniques
n Let us assume we seek to estimate the density function P(x) from a
dataset of examples
n P(x) can be approximated by the expression

P(x) \cong \frac{k}{NV}

where V is the volume surrounding x, N is the total number of examples, and k is the number of examples inside V

n The volume V is determined by the distance R_k(x) between x and its k-th nearest neighbor (for example, V = \pi R_k^2(x) in two dimensions), so the estimate becomes

P(x) \cong \frac{k}{NV} = \frac{k}{N \cdot c_D \cdot R_k^D(x)}

g Where c_D is the volume of the unit sphere in D dimensions

[Figure: a circle of radius R_k centered at x, enclosing its k nearest neighbors.]
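A sketch of this kNN density estimate (added here; it assumes Euclidean distance and that x is not itself part of the dataset):

import numpy as np
from math import gamma, pi

def knn_density(x, data, k):
    """kNN density estimate P(x) ~ k / (N * c_D * R_k(x)^D)."""
    N, D = data.shape
    R_k = np.sort(np.linalg.norm(data - x, axis=1))[k - 1]  # distance to the k-th nearest neighbor
    c_D = pi ** (D / 2) / gamma(D / 2 + 1)                   # volume of the unit sphere in D dimensions
    return k / (N * c_D * R_k ** D)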

K Nearest Neighbor classifier
g We use the previous result to estimate the posterior probability
n The class-conditional densities are estimated with

P(x \mid \omega_i) \cong \frac{k_i}{N_i V}

n And the priors can be estimated by

P(\omega_i) \cong \frac{N_i}{N}

n With the unconditional density again estimated as P(x) \cong \frac{k}{NV}, the posterior probability becomes

P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)} = \frac{\frac{k_i}{N_i V} \cdot \frac{N_i}{N}}{\frac{k}{NV}} = \frac{k_i}{k}

n Yielding the discriminant functions

g_i(x) = \frac{k_i}{k}
g This is known as the k Nearest Neighbor classifier
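A minimal sketch (added here) of the resulting rule gi(x) = ki/k: find the k nearest labeled examples and take a majority vote.

import numpy as np

def knn_classify(x, X_train, y_train, k=5):
    """Assign x to the class with the largest k_i among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean measure of "closeness"
    neighbor_labels = y_train[np.argsort(dists)[:k]]
    classes, counts = np.unique(neighbor_labels, return_counts=True)
    return classes[np.argmax(counts)], dict(zip(classes, counts / k))  # class, P(w_i|x) = k_i/k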

K Nearest Neighbor classifier
g The kNN classifier is a very intuitive method
n Examples are classified based on their similarity with training data
g For a given unlabeled example xu∈ℜD, find the k “closest” labeled examples in the
training data set and assign xu to the class that appears most frequently within the k-
subset
g The kNN only requires
n An integer k
n A set of labeled examples
n A measure of “closeness”

[Figure: 2-D scatter plot (axis 1 vs. axis 2) of labeled training examples from multiple classes, with an unlabeled query point “?” that is assigned to the class appearing most frequently among its k nearest neighbors.]
kNN in action: example 1
g We generate data for a 2-dimensional, 3-class problem where the class-conditional densities are multimodal and the classes are non-linearly separable
g We used kNN with
n k = five
n Metric = Euclidean distance

kNN in action: example 2
g We generate data for a 2-dim 3-class
problem, where the likelihoods are
unimodal, and are distributed in rings
around a common mean
n These classes are also non-linearly separable
g We used kNN with
n k = five
n Metric = Euclidean distance

kNN versus 1NN
[Figure: side-by-side comparison of the decision regions produced by the 1-NN, 5-NN, and 20-NN classifiers.]

Characteristics of the kNN classifier
g Advantages
n Analytically tractable, simple implementation
n Nearly optimal in the large sample limit (N→∞)
g P[error]Bayes ≤ P[error]1-NN ≤ 2·P[error]Bayes
n Uses local information, which can yield highly adaptive behavior
n Lends itself very easily to parallel implementations
g Disadvantages
n Large storage requirements
n Computationally intensive recall
n Highly susceptible to the curse of dimensionality
g 1NN versus kNN
n The use of large values of k has two main advantages
g Yields smoother decision regions
g Provides probabilistic information: The ratio of examples for each class
gives information about the ambiguity of the decision
n However, too large a value of k is detrimental
g It destroys the locality of the estimation
g In addition, it increases the computational burden