
Linear Classifier: Linear Discriminant Function

Compiled by Lakshmi Manasa, CED16I033


Guided by
Dr Umarani Jayaraman

Department of Computer Science and Engineering


Indian Institute of Information Technology Design and Manufacturing
Kancheepuram

April 18, 2022

1 / 31
Discriminant Function

We assume the proper form of the discriminant function is known and use
the training samples to estimate the values of its parameters.
Although the parameters of the discriminant function are estimated, this
is said to be a non-parametric approach, as it does not require any
knowledge of the probability distributions.
Finding the linear discriminant function will be formulated as a problem
of minimizing a criterion function.
Criterion function: the obvious criterion function for classification
purposes is the sample risk, or training error.

2 / 31
Discriminant function

Training error: the average loss incurred in classifying the set of
training samples.
No probability form is assumed: if the parametric form of the
class-conditional density functions is not known, then we have to design
the decision boundary using the samples which are available to us.
Here, we don't assume any parametric form of any probability
distribution function.
But what we do know is that the classes are linearly separable.

3 / 31
Linear Discriminant Function

Non-parametric form
Supervised learning
Classes are linearly separable
Classes: ω1 and ω2
Using this information, as the classes are linearly separable, we can
formulate the linear discriminant function as
g(x) = W^t X + w0
where
X - d-dimensional feature vector
W - d-dimensional weight vector
W^t X - inner product of the two vectors
w0 - bias/threshold weight

4 / 31
Decision criteria

g(x) > 0 ⇒ x ∈ ω1
g(x) < 0 ⇒ x ∈ ω2
g(x) = 0 ⇒ x lies on the decision boundary
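A minimal sketch of this decision rule in Python (the weight vector W, bias w0, and test points below are hypothetical values, purely for illustration):

```python
import numpy as np

# Hypothetical parameters of a learned linear discriminant (illustration only)
W = np.array([2.0, -1.0])   # d-dimensional weight vector
w0 = 0.5                    # bias / threshold weight

def g(x):
    """Linear discriminant g(x) = W^t x + w0."""
    return W @ x + w0

def classify(x):
    value = g(x)
    if value > 0:
        return "omega_1"
    elif value < 0:
        return "omega_2"
    else:
        return "on the decision boundary"

print(classify(np.array([1.0, 0.0])))   # g = 2.5  -> omega_1
print(classify(np.array([0.0, 2.0])))   # g = -1.5 -> omega_2
```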
Now let us analyze the significance of each attribute in the equation
g(x) = W^t X + w0:
Nature of weight vector W
What does g(x) represent?

5 / 31
1. Nature of weight vector w

For any two points X1 and X2 on the decision surface,
g(X1) = g(X2)
W^t X1 + w0 = W^t X2 + w0
W^t (X1 − X2) = 0
We know that A·B = |A||B| cos θ; if A·B = 0, then A is perpendicular to B.
Likewise, W^t (X1 − X2) is the inner product of the weight vector W with
(X1 − X2).
As it is zero, the vector W is orthogonal to any vector lying on the
decision surface.
In d-dimensional space, this surface is called a hyperplane H.

6 / 31
2. What does g(x) represent?

Draw a perpendicular from a point X to the hyperplane H; let its foot be X_p.
Let the distance between X and X_p be r. Then
X = X_p + r · W/||W||
7 / 31
2. What does g(x) represent?

As seen earlier, W is orthogonal to the hyperplane H.
So the direction of W is the same as the direction from X_p to X.
Hence, both the vector from X_p to X and W are orthogonal to the hyperplane H.
8 / 31
2. What does g(x) represent?

The unit vector along W is W/||W||, where ||W|| = √( Σ_{i=1}^d w_i² ).

X = X_p + r · W/||W||

g(X) = W^t X + w0
g(X) = W^t [ X_p + r · W/||W|| ] + w0
g(X) = W^t X_p + w0 + r · (W^t W)/||W||

The point X_p lies on the decision surface, so W^t X_p + w0 is zero.

g(X) = 0 + r · (W^t W)/||W||
g(X) = 0 + r · ||W||²/||W||
g(X) = r · ||W||
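A quick numeric check of this result, assuming hypothetical values of W, w0, and a point X: the signed distance r = g(X)/||W|| should match the Euclidean distance from X to its projection X_p onto the hyperplane.

```python
import numpy as np

# Hypothetical hyperplane parameters and test point (illustration only)
W = np.array([3.0, 4.0])
w0 = -5.0
X = np.array([2.0, 1.0])

g = W @ X + w0                    # g(X) = W^t X + w0
r = g / np.linalg.norm(W)         # signed distance of X from the hyperplane

# From X = X_p + r * W/||W||, the foot of the perpendicular is X_p = X - r * W/||W||
X_p = X - r * W / np.linalg.norm(W)

print(g, np.linalg.norm(W) * r)          # g(X) equals r * ||W||
print(abs(r), np.linalg.norm(X - X_p))   # |r| equals the Euclidean distance ||X - X_p||
print(W @ X_p + w0)                      # X_p lies on the hyperplane: g(X_p) = 0
```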
9 / 31
2. Why is g(x) an algebraic measure?

If ax + by + c = 0 is the equation of a straight line and (x1, y1) is a
point, then the distance of (x1, y1) from the line is
d = (a·x1 + b·y1 + c) / √(a² + b²)
in 2 dimensions.
In d dimensions, the corresponding signed distance of X from the hyperplane is
r = g(X)/||W||
10 / 31
3. Distance of origin from the hyperplane H

The distance of the origin from the hyperplane H is w0/||W||; w0 is the
bias/threshold.
If w0 is +ve, then the origin lies on the positive side of the hyperplane H.
If w0 is -ve, then the origin lies on the negative side of the hyperplane H.
If w0 is zero, then the hyperplane passes through the origin.
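For example (hypothetical numbers): with W = (3, 4) and w0 = 10, ||W|| = 5, so the origin lies at distance w0/||W|| = 10/5 = 2 on the positive side of H; with w0 = −10 it would lie at distance 2 on the negative side.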

11 / 31
3. Distance of origin from the hyperplane H

[Slides 12-14: figures illustrating the hyperplane H, the weight vector W, and the distance of the origin from H]

12-14 / 31
3. Distance of origin from the hyperplane H

If w0 is zero, the discriminant function g(x) takes the particular form
g(x) = W^t X; in this case there is no bias because w0 = 0.
g(x) = W^t X is said to be in homogeneous form.
In mathematics, it is convenient to represent the equation in
homogeneous form.
So, in order to design a linear classifier we should estimate two
parameters: the weight vector W and the bias w0.
Since this is supervised learning, W and w0 are to be estimated from the
samples that are available.

15 / 31
Design of weight vector W

Assumption: two classes, linearly separable case

We have two classes and they are linearly separable.
We need a discriminant function which separates these two classes.
It is of the form g(X) = W^t X + w0.
This expression is not in homogeneous form.
Hence, converting it to homogeneous form makes the analysis easier.

16 / 31
Converting to Homogeneous form

g(X) = W^t X + w0
g(X) = a^t y
where the augmented weight vector and augmented feature vector are
a = [w1, w2, ..., wd, w0]^t and y = [x1, x2, ..., xd, 1]^t
so that
g(X) = a^t y = Σ_{i=1}^d w_i x_i + w0 = W^t X + w0
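A small sketch of this conversion, with hypothetical numbers, showing that the augmented form a^t y gives the same value as W^t X + w0:

```python
import numpy as np

# Hypothetical original parameters and sample (illustration only)
W = np.array([2.0, -1.0, 0.5])    # d-dimensional weight vector
w0 = 0.3                          # bias
X = np.array([1.0, 2.0, -1.0])    # d-dimensional feature vector

# Augmented vectors: a = [w1, ..., wd, w0]^t, y = [x1, ..., xd, 1]^t
a = np.append(W, w0)
y = np.append(X, 1.0)

print(W @ X + w0)   # g(X) in the original form
print(a @ y)        # g(X) = a^t y in homogeneous form (same value)
```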

17 / 31
Decision rule in Homogeneous form

The decision rule remains the same for a^t y:

If a^t y > 0 then decide y ∈ ω1
If a^t y < 0 then decide y ∈ ω2
If a^t y = 0 then no decision can be taken.

18 / 31
How to design weight vector ’W ’ and w 0 using the
samples?

We have n training samples y1, y2, ..., yn, each in augmented form:

y_i = [x_i1, x_i2, ..., x_id, 1]^t,   for i = 1, 2, ..., n

These are the samples which are used to train the classifier.
Some of the samples are labelled ω1 and some are labelled ω2.
Let us consider the i-th sample as y_i.

19 / 31
Two Criterion Decision rule in Homogeneous form

The decision rule remains the same for a^t y_i:

If a^t y_i > 0 then decide y_i ∈ ω1
If a^t y_i < 0 then decide y_i ∈ ω2
If a^t y_i = 0 then no decision can be taken.

20 / 31
Two Criterion Decision rule in Homogeneous form

Given a weight vector a, take all the samples which are labelled ω1.
If a^t y_i > 0 for each of these samples, then the weight vector a
correctly classifies all the samples labelled ω1.
For the same weight vector a, take all the samples belonging to class ω2.
If a^t y_i < 0 for each of these, then the weight vector a also correctly
classifies all the samples of class ω2.
That particular weight vector a is a correct (solution) weight vector,
because it correctly classifies all the samples labelled ω1 as well as
all the samples labelled ω2.

21 / 31
Single Criterion

Instead of the two conditions a^t y_i > 0 and a^t y_i < 0, can't we have a
single criterion for correct classification?
a^t y_i > 0 holds true, irrespective of the class label.
We can then say that y_i is correctly classified if a^t y_i > 0;
otherwise (a^t y_i < 0 or a^t y_i = 0), y_i is misclassified.

22 / 31
Single Criterion: How can we do that?

Samples belonging to class ω1 are taken as they are.
For the samples belonging to class ω2, after augmenting them by
appending 1, we take the negative of the whole vector.
That is, take all the samples which are labelled ω2 and negate them:
instead of considering y_i, consider −y_i.
If we take the negative, then a^t y_i, which is supposed to be < 0, now
becomes > 0.
So we get a single (uniform) decision criterion, a^t y_i > 0, for both
classes, as sketched below.
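A minimal sketch of this normalization step on a hypothetical toy data set (the sample values are made up for illustration):

```python
import numpy as np

# Hypothetical 2-D training samples and labels (illustration only)
X = np.array([[2.0, 1.0],
              [1.5, 2.0],     # class omega_1
              [-1.0, -0.5],
              [-2.0, 1.0]])   # class omega_2
labels = np.array([1, 1, 2, 2])

# Augment every sample with a trailing 1: y = [x1, ..., xd, 1]^t
Y = np.hstack([X, np.ones((X.shape[0], 1))])

# Negate the augmented samples of class omega_2
Y[labels == 2] *= -1

# Now a weight vector a is correct iff a^t y_i > 0 for every row y_i
print(Y)
```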

23 / 31
Single Criterion: How can we do that?

If a^t y_i > 0 for all samples, then all samples are correctly classified,
irrespective of class label.
Now, what should the weight vector a be?
We define a criterion function J(a).
J(a) is minimized when a is a solution (correct weight) vector.
J(a) attains its minimum if the obtained weight vector a classifies all
the training samples correctly.
For the minimization of J(a), we can make use of the gradient descent
procedure.

24 / 31
Gradient Descent Procedure

[Slide 25: figure illustrating the gradient descent procedure]

25 / 31
Gradient Descent Procedure

Initialize the weight vector a(k) with some random values and try to
minimize the training error at every iteration.
At the k-th iteration, we know the value of a(k).
We then update the weight vector to obtain a(k+1):
a(k+1) = a(k) − η(k) ∇J(a(k))
This is called the gradient descent procedure, or steepest descent
procedure.

26 / 31
Algorithm: Gradient Descent

Initialize a, threshold θ, learning rate η(·), k ← 0

do
    k ← k + 1
    a ← a − η(k) ∇J(a)
until ||η(k) ∇J(a)|| < θ
return a
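A generic sketch of this loop in Python; the criterion gradient here is a placeholder (a simple quadratic, chosen only for illustration), while a concrete choice, the perceptron criterion, follows on the next slides:

```python
import numpy as np

def gradient_descent(grad_J, a_init, eta=lambda k: 0.1, theta=1e-6, max_iter=1000):
    """Generic gradient (steepest) descent: a(k+1) = a(k) - eta(k) * grad_J(a(k))."""
    a = a_init.astype(float)
    for k in range(1, max_iter + 1):
        step = eta(k) * grad_J(a)
        a = a - step
        if np.linalg.norm(step) < theta:   # stop when the update is small enough
            break
    return a

# Example with a simple quadratic criterion J(a) = ||a||^2, whose gradient is 2a
a_star = gradient_descent(grad_J=lambda a: 2 * a, a_init=np.array([3.0, -2.0]))
print(a_star)   # converges towards the minimizer [0, 0]
```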
27 / 31
Perceptron Criterion Function

Our aim is to find a weight vector a which classifies all the training
samples correctly.
So, we can design a criterion function which makes use of the samples
that are not correctly classified.
If some samples are not correctly classified by a(k), then the weight
vector is updated to a(k+1).
Accordingly, the criterion function can be defined as
J_p(a) = Σ_{y misclassified} (−a^t y)
Here the subscript p refers to the perceptron criterion.

28 / 31
Perceptron Criterion Function

J_p(a) = Σ_{y misclassified} (−a^t y)
For a misclassified sample y, a^t y ≤ 0, so (−a^t y) is non-negative.
As a result, the criterion J_p(a) never takes a negative value.
Its minimum value is 0, attained when no sample is misclassified.
So J_p(a) has a global minimum, and it can be found by the gradient
descent procedure.

29 / 31
Perceptron Criterion Function

According to the gradient descent procedure, take the gradient of J_p(a)
with respect to the weight vector a:

J_p(a) = Σ_{y misclassified} (−a^t y)
∇J_p(a) = Σ_{y misclassified} (−y)

The update rule is
a(0) ⇒ initial weight vector, arbitrary
a(k+1) = a(k) + η(k) Σ_{y misclassified} y

This is the algorithm to design the weight vector a when the samples are
linearly separable; a runnable sketch follows below.
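A minimal batch-perceptron sketch combining the earlier normalization step with this update rule; the toy data, learning rate η, and iteration limit are hypothetical choices:

```python
import numpy as np

def batch_perceptron(X, labels, eta=1.0, max_iter=1000):
    """Batch perceptron: a(k+1) = a(k) + eta * sum of misclassified (normalized) samples."""
    # Augment with a trailing 1 and negate class omega_2 samples
    Y = np.hstack([X, np.ones((X.shape[0], 1))])
    Y[labels == 2] *= -1

    a = np.zeros(Y.shape[1])            # arbitrary initial weight vector a(0)
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]   # samples with a^t y <= 0
        if len(misclassified) == 0:     # all samples correctly classified: J_p(a) = 0
            break
        a = a + eta * misclassified.sum(axis=0)
    return a

# Hypothetical linearly separable toy data (illustration only)
X = np.array([[2.0, 1.0], [1.5, 2.0],      # omega_1
              [-1.0, -0.5], [-2.0, 1.0]])  # omega_2
labels = np.array([1, 1, 2, 2])

a = batch_perceptron(X, labels)
print(a)                                     # learned augmented weight vector [w1, w2, w0]
print(np.hstack([X, np.ones((4, 1))]) @ a)   # positive for omega_1, negative for omega_2
```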

30 / 31
THANK YOU

31 / 31
