Discriminant Functions
Sargur N. Srihari
University at Buffalo, State University of New York
USA
Topics
• Linear Discriminant Functions
– Definition (2-class), Geometry
– Generalization to K > 2 classes
• Methods to learn parameters
1. Least Squares Classification
2. Fisher’s Linear Discriminant
3. Perceptrons
Discriminant Functions
• A discriminant function assigns input vector x to
one of K classes denoted by Ck
• We restrict attention to linear discriminants
– i.e., the decision surfaces are hyperplanes
• First consider K = 2, and then extend to K > 2
Geometry of Linear Discriminant Functions
• Two-class linear discriminant function:
  y(x) = wT x + w0
  where w is the weight vector and w0 is the bias (its negative is sometimes called a threshold)
• Assign x to C1 if y(x) ≥ 0, else to C2
  – This defines the decision boundary y(x) = 0
  – It corresponds to a (D−1)-dimensional hyperplane in a D-dimensional input space
[Figure: the surface y(x) = 0 separates region R1 (where y > 0) from region R2 (where y < 0); w is normal to the surface]
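As a concrete illustration of this decision rule, a minimal Python sketch (assuming NumPy; the weight vector w and bias w0 below are made-up numbers, not values from the slides):

```python
import numpy as np

# Illustrative parameters (invented for this example): a 2-D weight vector and bias.
w = np.array([2.0, -1.0])   # w determines the orientation of the decision boundary
w0 = 0.5                    # bias term

def discriminant(x):
    """Two-class linear discriminant y(x) = wT x + w0."""
    return w @ x + w0

def classify(x):
    """Assign x to C1 if y(x) >= 0, otherwise to C2."""
    return "C1" if discriminant(x) >= 0 else "C2"

x = np.array([1.0, 3.0])
print(discriminant(x), classify(x))   # y(x) = 2*1 - 1*3 + 0.5 = -0.5, so x goes to C2
```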
Distance from the Origin to the Decision Surface
• Let xA and xB be points on the surface y(x) = wT x + w0 = 0
• Because y(xA) = y(xB) = 0, we have wT(xA − xB) = 0
• Thus w is orthogonal to every vector lying in the decision surface
  – So w determines the orientation of the decision surface
• If x is a point on the surface, then y(x) = 0, i.e. wT x = −w0
• The normalized distance from the origin to the surface is therefore
  wT x / ||w|| = −w0 / ||w||
  where ||w|| is the norm defined by ||w||^2 = wT w = w1^2 + … + wD^2
[Figure: the decision surface y = 0 separates regions R1 (y > 0) and R2 (y < 0); w is normal to the surface, y(x)/||w|| is the signed distance of a point x from the surface, and x⊥ is the orthogonal projection of x onto the surface]
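To make the distance formula concrete, a short sketch (reusing the made-up w and w0 from the previous snippet) that computes the signed distance of the origin, and of an arbitrary point, from the decision surface:

```python
import numpy as np

w = np.array([2.0, -1.0])   # same illustrative weight vector and bias as above
w0 = 0.5

norm_w = np.linalg.norm(w)        # ||w|| = sqrt(w1^2 + ... + wD^2)
print(-w0 / norm_w)               # signed distance of the origin from the surface: -0.5/sqrt(5)

# More generally, y(x)/||w|| is the signed perpendicular distance of any point x.
x = np.array([2.0, 1.0])
print((w @ x + w0) / norm_w)      # 3.5/sqrt(5): x lies on the positive (C1) side
```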
Augmented vector
• Introduce a dummy input x0 = 1, and define x~ = (x0, x) and w~ = (w0, w)
• Then y(x) = w~T x~
• The decision surface then passes through the origin of the augmented (D+1)-dimensional space
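A brief sketch of the augmented form (same made-up numbers as above), checking that w~T x~ reproduces wT x + w0:

```python
import numpy as np

w = np.array([2.0, -1.0])
w0 = 0.5
x = np.array([2.0, 1.0])

w_tilde = np.concatenate(([w0], w))    # w~ = (w0, w)
x_tilde = np.concatenate(([1.0], x))   # x~ = (x0 = 1, x)

print(w_tilde @ x_tilde)               # 3.5
print(w @ x + w0)                      # 3.5, identical by construction
```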
One-versus-the-rest and One-versus-one
• One way to build a K-class discriminant is one-versus-the-rest:
  use K − 1 binary discriminant classifiers, each solving the two-class problem of
  separating points in class Ck from points not in Ck
• An alternative is one-versus-one:
  use K(K − 1)/2 binary discriminant functions, one for every possible pair of classes
• Both approaches result in ambiguous regions of the input space, as the sketch below illustrates
[Figure: decision regions R1, R2, R3 for three classes C1, C2, C3, with the ambiguous region marked "?" for each approach]
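A toy sketch of why the ambiguity arises in the one-versus-the-rest case (the two binary classifiers below are invented for illustration): a point can be claimed by more than one "class k vs. not class k" classifier, or by none.

```python
import numpy as np

# Two hypothetical one-versus-the-rest classifiers: each answers "in Ck" (>= 0) vs "not in Ck".
classifiers = {
    "C1": (np.array([1.0, 0.0]), 0.0),   # (weight vector, bias) for C1 vs not-C1
    "C2": (np.array([0.0, 1.0]), 0.0),   # (weight vector, bias) for C2 vs not-C2
}

def claims(x):
    """Return every class whose binary classifier claims the point x."""
    return [k for k, (w, b) in classifiers.items() if w @ x + b >= 0]

print(claims(np.array([1.0, 1.0])))     # ['C1', 'C2'] -> ambiguous: both classifiers claim x
print(claims(np.array([-1.0, -1.0])))   # []           -> ambiguous: no classifier claims x
```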
Least Squares for Classification
• Analogous to regression: a simple closed-form solution exists for the parameters
• Each class Ck, k = 1,...,K, is described by its own linear model
  yk(x) = wkT x + wk0
  (note: x and wk each have D dimensions)
• Define matrices:
  – T: the nth row is the target vector tnT; this is an N × K matrix
  – X~: the nth row is x~nT; this is the N × (D+1) design matrix
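A minimal sketch of the closed-form solution referred to above, using the standard pseudo-inverse expression W~ = (X~T X~)^-1 X~T T with 1-of-K target rows in T; the data are invented for illustration, and prediction uses the usual rule of assigning x to the class with the largest yk(x):

```python
import numpy as np

# Toy data: N points in D = 2 dimensions, K = 3 classes (labels 0, 1, 2), invented for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, size=(20, 2)) for m in ([0, 0], [3, 0], [0, 3])])
labels = np.repeat([0, 1, 2], 20)
N, D, K = X.shape[0], X.shape[1], 3

T = np.eye(K)[labels]                        # N x K target matrix: nth row is tnT (1-of-K coding)
X_tilde = np.hstack([np.ones((N, 1)), X])    # N x (D+1) design matrix: nth row is x~nT

# Closed-form least-squares solution via the pseudo-inverse.
W_tilde = np.linalg.pinv(X_tilde) @ T        # (D+1) x K matrix of parameters (wk0, wk)

# Evaluate yk(x) = wkT x + wk0 for all k and assign each point to the largest output.
Y = X_tilde @ W_tilde
print((Y.argmax(axis=1) == labels).mean())   # training accuracy on this easy toy set
```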
[Figure: three classes in a two-dimensional input space]

Fisher's Linear Discriminant
[Figure: two-class samples projected onto the line joining the class means (the means are well-separated but the classes overlap), compared with the projection based on Fisher's criterion, which shows greatly improved class separation]
• Maximizing the separation of the class means alone is insufficient for classes with
  non-diagonal covariance
• Fisher's formulation:
  1. Maximize a function that separates the projected class means
  2. While also giving small variance within each class, thereby minimizing the class overlap
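The derivation slides are not reproduced here, but the standard two-class Fisher solution that results from this formulation is w ∝ S_W^-1 (m2 − m1), where m1 and m2 are the class means and S_W is the within-class scatter matrix. A small sketch on invented data:

```python
import numpy as np

# Invented two-class data with non-diagonal covariance (correlated features).
rng = np.random.default_rng(1)
shear = np.array([[1.0, 0.8], [0.0, 1.0]])           # introduces correlation between the features
X1 = rng.normal([0, 0], [1.0, 0.3], size=(50, 2)) @ shear
X2 = rng.normal([2, 2], [1.0, 0.3], size=(50, 2)) @ shear

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)            # class means

# Within-class scatter: S_W = sum over classes of sum_n (xn - mk)(xn - mk)T
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Standard Fisher solution: w proportional to S_W^-1 (m2 - m1).
w = np.linalg.solve(S_W, m2 - m1)
w /= np.linalg.norm(w)

# Separation of the projected means versus the spread within each projected class.
p1, p2 = X1 @ w, X2 @ w
print(abs(p2.mean() - p1.mean()), p1.std(), p2.std())
```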
Perceptron Algorithm
• Two-class model
  – Input vector x is transformed by a fixed nonlinear transformation to give a feature vector ϕ(x):
    y(x) = f(wT ϕ(x))
    where the nonlinear activation f(·) is the step function
    f(a) = +1 if a ≥ 0, −1 if a < 0
• Use a target coding scheme
  – t = +1 for class C1 and t = −1 for C2, matching the activation function
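A tiny sketch of this model, with the step activation and the ±1 target coding; the feature map ϕ here is simply a bias feature followed by the raw inputs, an illustrative choice rather than anything prescribed by the slides:

```python
import numpy as np

def phi(x):
    """Illustrative fixed feature map: a constant bias feature followed by the raw inputs."""
    return np.concatenate(([1.0], x))

def step(a):
    """Step activation: +1 if a >= 0, -1 otherwise."""
    return 1 if a >= 0 else -1

def perceptron_predict(w, x):
    """y(x) = f(wT phi(x)); +1 means class C1, -1 means class C2."""
    return step(w @ phi(x))

w = np.array([0.5, 2.0, -1.0])                        # made-up weights
print(perceptron_predict(w, np.array([1.0, 3.0])))    # wT phi(x) = -0.5 < 0, so output -1 (C2)
```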
Perceptron Criterion
• Seek w such that patterns xn ∈ C1 have wT ϕ(xn) ≥ 0,
  whereas patterns xn ∈ C2 have wT ϕ(xn) < 0
• Using the target coding t ∈ {+1, −1}, all patterns should satisfy
  wT ϕ(xn) tn > 0
• For each misclassified sample, the perceptron criterion tries to minimize −wT ϕ(xn) tn, i.e.
  EP(w) = −Σn∈M wT ϕn tn
  where M denotes the set of all misclassified patterns and ϕn = ϕ(xn)
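For concreteness, a short sketch (invented data, ϕ as in the previous snippet) that evaluates EP(w) by summing −wT ϕn tn over the misclassified patterns only:

```python
import numpy as np

def phi(x):
    return np.concatenate(([1.0], x))        # illustrative feature map, as in the previous snippet

def perceptron_criterion(w, X, t):
    """E_P(w) = - sum over misclassified n of wT phi_n t_n."""
    E = 0.0
    for x_n, t_n in zip(X, t):
        a = (w @ phi(x_n)) * t_n
        if a <= 0:                            # pattern n violates wT phi_n t_n > 0
            E -= a
    return E

w = np.array([0.0, 1.0, 1.0])                             # made-up weight vector
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -2.0]])     # made-up patterns
t = np.array([1, -1, 1])                                  # target coding: +1 for C1, -1 for C2
print(perceptron_criterion(w, X, t))                      # 1.0: only the third pattern is misclassified
```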
Perceptron Algorithm
• Error function: EP(w) = −Σn∈M wT ϕn tn
• Stochastic gradient descent
  – The change in the weight vector is
    w(τ+1) = w(τ) − η∇EP(w) = w(τ) + η ϕn tn
    where η is the learning rate and τ indexes the steps
• The algorithm: cycle through the training patterns in turn; whenever a pattern is misclassified,
  add ϕn to the weight vector if it belongs to class C1, and subtract ϕn if it belongs to class C2
  (see the sketch below)
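Putting the criterion and the update rule together, a minimal sketch of the stochastic-gradient perceptron on an invented, linearly separable toy set (ϕ is again the illustrative bias-plus-inputs feature map):

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy linearly separable data: C1 (t = +1) clustered around (2, 2), C2 (t = -1) around (-2, -2).
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)), rng.normal(-2.0, 0.5, size=(20, 2))])
t = np.array([1] * 20 + [-1] * 20)

def phi(x):
    return np.concatenate(([1.0], x))       # illustrative feature map: bias plus raw inputs

w = np.zeros(3)
eta = 1.0                                   # learning rate
for _ in range(100):                        # cycle through the training patterns in turn
    mistakes = 0
    for x_n, t_n in zip(X, t):
        if (w @ phi(x_n)) * t_n <= 0:       # misclassified: wT phi_n t_n is not positive
            w = w + eta * phi(x_n) * t_n    # add phi_n for C1 (t = +1), subtract it for C2 (t = -1)
            mistakes += 1
    if mistakes == 0:                       # every pattern now satisfies wT phi_n t_n > 0
        break

print(w, mistakes)
```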
[Figure: two-class data points in a two-dimensional feature space, illustrating the perceptron algorithm]
History of Perceptrons
• The perceptron was invented at Calspan, Buffalo, NY
• Rosenblatt, Frank, "The Perceptron--a perceiving and recognizing automaton," Report 85-460-1, Cornell Aeronautical Laboratory, 1957
The Mark 1 Perceptron (now in the Smithsonian)
• Learned to discriminate the shapes of characters
• Input: a 20×20-cell image of a character
• A patch-board allowed different configurations of the input features ϕ
• Racks of adaptive weights were implemented as potentiometers
Disadvantages of Perceptrons
• Does not converge if the classes are not linearly separable
• Does not provide probabilistic output
• Not readily generalized to K > 2 classes
Summary
• Linear discriminant functions have simple geometry
• Extensible to multiple classes
• Parameters can be learned using:
  – Least squares
    • Not robust to outliers; the model tries to keep outputs close to the target values
  – Fisher's linear discriminant
    • The two-class case is a special case of least squares
    • Not easily generalized to more classes
  – Perceptrons
    • Does not converge if the classes are not linearly separable
    • Does not provide probabilistic output
    • Not readily generalized to K > 2 classes