Linear Classifier: Linear Discriminant Function
Compiled by Lakshmi Manasa, CED16I033
Discriminant Function
We assume that the proper form of the discriminant function is known, and we use the training samples to estimate the values of its parameters.
Although the parameters of the discriminant function are estimated from samples, the approach is said to be non-parametric, as it does not require knowledge of the underlying probability distributions.
Finding a linear discriminant function will be formulated as the problem of minimizing a criterion function.
Criterion function: the obvious criterion function for classification is the sample risk, i.e., the training error.
Linear Discriminant Function
A linear discriminant function is a linear combination of the components of X:
g(X) = W^t X + w_0,
where W is the weight vector and w_0 is the bias (threshold).
Decision criteria
For the two-category case: decide ω1 if g(X) > 0 and decide ω2 if g(X) < 0. The equation g(X) = 0 defines the decision surface separating the two classes.
1. Nature of weight vector W
Consider any two points X_1 and X_2 lying on the decision surface:
g(X_1) = g(X_2)
W^t X_1 + w_0 = W^t X_2 + w_0
W^t (X_1 − X_2) = 0
We know that A · B = |A| |B| cos Θ; if A · B = 0, then A is perpendicular to B.
Likewise, W^t (X_1 − X_2) is the inner product of the weight vector W with (X_1 − X_2).
As it is zero, the vector W is orthogonal to any vector lying on the decision surface.
In d-dimensional space, this surface is called a hyperplane H.
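As a quick illustration (a minimal sketch with made-up numbers), the snippet below picks two points on the hyperplane g(X) = W^t X + w_0 = 0 and verifies that W is orthogonal to their difference:

```python
import numpy as np

# A hypothetical weight vector and bias defining the hyperplane H: g(X) = 0
W = np.array([3.0, 4.0])
w0 = -5.0

def g(X):
    """Linear discriminant function g(X) = W^t X + w0."""
    return W @ X + w0

# Two points chosen to lie on H (they satisfy g(X) = 0)
X1 = np.array([3.0, -1.0])   # 3*3 + 4*(-1) - 5 = 0
X2 = np.array([-1.0, 2.0])   # 3*(-1) + 4*2 - 5 = 0
assert abs(g(X1)) < 1e-9 and abs(g(X2)) < 1e-9

# W^t (X1 - X2) = 0  =>  W is orthogonal to any vector lying on H
print(W @ (X1 - X2))   # 0.0
```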
2. What does g(X) represent?
The unit vector along W is W/||W||, where ||W|| = sqrt(Σ_{i=1}^{d} w_i^2).
Write any point X in terms of its projection X_p onto the hyperplane H and its signed distance r from H:
X = X_p + r · W/||W||
g(X) = W^t X + w_0
g(X) = W^t [X_p + r · W/||W||] + w_0
g(X) = W^t X_p + w_0 + r · (W^t W)/||W||
Since X_p lies on H, W^t X_p + w_0 = 0, and W^t W = ||W||^2, so
g(X) = r ||W||, i.e., r = g(X)/||W||.
Hence g(X) is proportional to the signed distance of X from the hyperplane H.
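A small numerical sketch (same hypothetical W and w_0 as above) computing the signed distance r = g(X)/||W|| for an arbitrary point:

```python
import numpy as np

W = np.array([3.0, 4.0])   # hypothetical weight vector, ||W|| = 5
w0 = -5.0

def g(X):
    return W @ X + w0

X = np.array([4.0, 3.0])
r = g(X) / np.linalg.norm(W)    # signed distance of X from H
print(r)                        # (12 + 12 - 5) / 5 = 3.8

# Special case X = origin: g(0) = w0, so the origin lies at
# signed distance w0 / ||W|| from the hyperplane
print(w0 / np.linalg.norm(W))   # -1.0 (origin on the negative side)
```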
3. Distance of origin from the hyperplane H
The signed distance of the origin from the hyperplane H is w_0/||W||, where w_0 is the bias/threshold.
If w_0 is positive, the origin lies on the positive side of the hyperplane H.
If w_0 is negative, the origin lies on the negative side of the hyperplane H.
If w_0 is zero, the hyperplane passes through the origin.
Design of weight vector W
Converting to Homogeneous form
g(X) = W^t X + w_0
Define the augmented feature vector y = [x_1, x_2, ..., x_d, 1]^t and the augmented weight vector a = [w_1, w_2, ..., w_d, w_0]^t. Then
g(X) = a^t y = [w_1 w_2 ... w_d w_0] · [x_1, x_2, ..., x_d, 1]^t
     = Σ_{i=1}^{d} w_i x_i + w_0
     = W^t X + w_0
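A minimal sketch of the augmentation (hypothetical values), showing that a^t y reproduces W^t X + w_0:

```python
import numpy as np

W = np.array([3.0, 4.0])
w0 = -5.0
X = np.array([4.0, 3.0])

# Homogeneous (augmented) form
y = np.append(X, 1.0)    # y = [x1, ..., xd, 1]^t
a = np.append(W, w0)     # a = [w1, ..., wd, w0]^t

print(W @ X + w0)        # 19.0
print(a @ y)             # 19.0 -- identical by construction
```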
Decision rule in Homogeneous form
In the homogeneous form, the decision rule becomes: decide ω1 if a^t y > 0 and decide ω2 if a^t y < 0.
How to design the weight vector W and w_0 using the samples?
Two Criterion Decision rule in Homogeneous form
Given a weight vector a, take all the samples labelled ω1.
If a^t y_i > 0 for each of these samples, then the weight vector a correctly classifies all the samples labelled ω1.
For the same weight vector a, now take all the samples belonging to class ω2.
If a^t y_i < 0 for each of them, then the weight vector a also correctly classifies all the samples belonging to class ω2.
That particular weight vector a is a correct weight vector, because it correctly classifies all the samples labelled ω1 as well as all the samples labelled ω2.
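A sketch of this two-condition check over augmented samples (toy data, hypothetical names):

```python
import numpy as np

def correctly_classifies(a, Y1, Y2):
    """Two-criterion check: a^t y > 0 for every omega_1 sample
    and a^t y < 0 for every omega_2 sample."""
    return np.all(Y1 @ a > 0) and np.all(Y2 @ a < 0)

# Toy augmented samples (last component is the appended 1)
Y1 = np.array([[2.0, 2.0, 1.0],    # class omega_1
               [3.0, 1.0, 1.0]])
Y2 = np.array([[-1.0, -2.0, 1.0],  # class omega_2
               [-2.0, -1.0, 1.0]])

a = np.array([1.0, 1.0, 0.0])      # a candidate weight vector
print(correctly_classifies(a, Y1, Y2))   # True for this toy data
```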
Single Criterion: How can we do that?
The two conditions can be reduced to a single one by normalization: replace every sample y_i of class ω2 by its negation −y_i.
After this normalization, a weight vector a classifies all the training samples correctly if and only if a^t y_i > 0 for every sample y_i.
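A minimal sketch of the normalization step, continuing the toy data above:

```python
import numpy as np

Y1 = np.array([[2.0, 2.0, 1.0], [3.0, 1.0, 1.0]])
Y2 = np.array([[-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])

# Normalization: negate the omega_2 samples and stack everything
Y = np.vstack([Y1, -Y2])

a = np.array([1.0, 1.0, 0.0])
# Single criterion: a is correct iff a^t y > 0 for ALL normalized samples
print(np.all(Y @ a > 0))   # True
```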
Gradient Descent Procedure
Initialize the weight vector a with some random values and try to reduce the value of the criterion function J(a) at every iteration.
At the k-th iteration we know the value of a(k), and we update the weight vector to obtain a(k+1):
a(k + 1) = a(k) − η(k) ∇J(a(k)),
where η(k) is the learning rate (step size).
This is called the Gradient Descent procedure, or Steepest Descent procedure.
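To make the update rule concrete, here is a minimal gradient-descent sketch on a simple quadratic criterion (the function and step size are illustrative, not from the slides):

```python
import numpy as np

def J(a):
    """Illustrative convex criterion: J(a) = ||a - a*||^2 for a* = (1, 2)."""
    return np.sum((a - np.array([1.0, 2.0])) ** 2)

def grad_J(a):
    return 2.0 * (a - np.array([1.0, 2.0]))

a = np.random.randn(2)        # random initialization
eta = 0.1                     # fixed learning rate eta(k)
for k in range(100):
    a = a - eta * grad_J(a)   # a(k+1) = a(k) - eta(k) * grad J(a(k))

print(a)   # converges toward the minimizer (1, 2)
```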
Algorithm: Gradient Descent
Our aim is to find a weight vector a that classifies all the training samples correctly.
So we can design a criterion function that makes use of the samples that are not correctly classified.
If some samples are not correctly classified by a(k), then update the weight vector to a(k+1).
Accordingly, we can define the criterion function as
J_p(a) = Σ_{∀y misclassified} (−a^t y),
where the subscript p refers to the perceptron criterion.
Perceptron Criterion Function
J_p(a) = Σ_{∀y misclassified} (−a^t y)
For a misclassified (normalized) sample y, a^t y < 0, so each term (−a^t y) is positive.
As a result, J_p(a) can never take a negative value; it is always non-negative.
Its minimum value is 0, attained when no sample is misclassified.
So J_p(a) has a global minimum of 0, and it can be found by the Gradient Descent procedure.
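Putting the pieces together, here is a minimal sketch of minimizing J_p(a) by gradient descent (the batch perceptron rule): since ∇J_p(a) = Σ_{y misclassified} (−y), the update is a(k+1) = a(k) + η(k) Σ_{y misclassified} y. The data and learning rate below are illustrative:

```python
import numpy as np

# Normalized augmented samples (omega_2 samples already negated),
# so a correct 'a' satisfies a^t y > 0 for every row.
Y = np.array([[2.0, 2.0, 1.0],
              [3.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [2.0, 1.0, -1.0]])

a = np.zeros(Y.shape[1])
eta = 1.0
for k in range(100):
    mis = Y[Y @ a <= 0]            # currently misclassified samples
    if len(mis) == 0:              # J_p(a) = 0: global minimum reached
        break
    # grad J_p(a) = -sum of misclassified y, so descend by adding them
    a = a + eta * mis.sum(axis=0)

print(a, np.all(Y @ a > 0))        # a correct weight vector, True
```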
THANK YOU