
Deep Learning

Prof. Prabir Kumar Biswas


Department Of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur

Lecture –10
Linear Classifier – II

Hello, welcome to the NPTEL online certification course on Deep Learning.

(Refer Slide Time: 00:33)

In our previous class, we recapitulated the discriminant function and the decision boundary. We talked about the nearest neighbour and the k-NN or k nearest neighbour classifier, and we started our discussion on the linear classifier. So, today let us continue our discussion on the linear classifier and then we will move on to the support vector machine.
(Refer Slide Time: 01:03)

So, you see here that the linear classifier we are discussing here is for a 2 class problem; that is, we have samples belonging to 2 different classes, say ω1 and ω2. And I assume that the samples are linearly separable, in which case, given these training samples from classes ω1 and ω2, I can separate the samples using a linear boundary. So, the linear boundary in a two dimensional case will be a straight line having an equation of the form

aX1 + bX2 + c = 0

Whereas, in a multi dimensional case, this linear boundary, which will be nothing but a hyperplane, will be of the form

a^t Y = 0

Here a is a (d + 1) dimensional weight vector and Y is also a (d + 1) dimensional vector. As we have said in our previous lecture, a includes the vector which is perpendicular to the plane separating the feature vectors belonging to the 2 classes as well as the bias term, and Y is obtained by appending 1 to a feature vector of dimension d. That is how we get the (d + 1) dimensional vector a and the (d + 1) dimensional vector Y representing the feature vectors.

And we have also assumed that for all the feature vectors taken from class ω1, Y remains as it is; whereas, for all the feature vectors taken from class ω2, Y is negated. This is for design purposes. Why have we done it? Because while designing the separator, I then have a uniform decision rule: I should always have a^t Y > 0 if the training vector Y is correctly classified.

Our original classification rule was: if Y is taken from class ω1, then a^t Y > 0; if Y is taken from class ω2, then a^t Y < 0. So, what we have done is simply negate Y for all the samples taken from class ω2. After negation, whether the samples are taken from class ω1 or ω2, I should always have a^t Y > 0. This is simply for the convenience of designing the decision boundary.
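To make this bookkeeping concrete, here is a minimal sketch in Python/NumPy, with made-up 2-D sample points: it appends 1 to every feature vector and negates the augmented vectors of class ω2, so that a correct solution vector a satisfies a^t Y > 0 for every training sample.

```python
import numpy as np

# Hypothetical 2-D training samples for the two classes (values made up).
X1 = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0]])   # class omega_1
X2 = np.array([[6.0, 1.0], [7.0, 1.5], [6.5, 0.5]])   # class omega_2

def augment(X):
    """Append 1 to every d-dimensional sample to get a (d+1)-dimensional Y."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

Y1 = augment(X1)     # samples from omega_1 are kept as they are
Y2 = -augment(X2)    # samples from omega_2 are negated (design-time trick)

# All training vectors in one array; a correct solution vector a must now
# satisfy a^t Y > 0 for every row.
Y = np.vstack([Y1, Y2])

a = np.array([-1.0, 0.5, 4.0])   # an example weight vector (a1, a2, bias)
print(Y @ a > 0)                 # True wherever a classifies correctly
```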

(Refer Slide Time: 04:18)

So, given this, we have seen that our decision boundary is given by a^t Y = 0; that means, the vector a is orthogonal to the plane separating the two sets of feature vectors. So, can we try to find out what the limits on a should be?
(Refer Slide Time: 04:46)

In order to see that, let us take a very small, simple example, something like this. Here I have samples taken from class ω1; I have taken a small sample size for clarity of the picture. So, these are the feature vectors taken from class ω1 and these are the feature vectors taken from class ω2. What I want is a plane which separates these two sets of feature vectors, one from class ω1 and one from class ω2.

(Refer Slide Time: 05:22)


So, how do we do this? I am trying to obtain something like this. You find that this plane clearly separates the feature vectors from class ω1 and the feature vectors from class ω2. Now you find that there is a limit to the orientation of this particular plane. If I rotate this plane in, say, the anti clockwise direction, this is my new position of the plane; this is the limit to which I can rotate, because if I rotate it further, then this is the point which is going to be misclassified.

Similarly, coming to the other side, if I take this plane, this is the limit on the other side, because if I rotate it further in the clockwise direction, then this is a vector which is going to be misclassified. So, the range in which I can have the separating plane between these two classes is given by this. Given the separating plane, what happens to our solution vector a? Because this solution vector a, as we have seen, is orthogonal to the plane

a^t Y = 0

(Refer Slide Time: 06:53)

So, if I look at that, you find that over here what I have done is: in the earlier case, these were the feature vectors corresponding to class ω2 on this side. And as we said, for design purposes, we want to negate these feature vectors, so that I will always have a^t Y > 0 while designing. This is my condition for correct classification, and I can always do it because these feature vectors are the training vectors; for every vector I know its class membership.

But remember, once your classifier is designed, that means once your separating plane is designed, the classification rule will always be that if a^t Y > 0, then Y should belong to class ω1. This is what we will do during classification or during testing, because in that case I do not know from which class the feature vector Y that we want to classify has been taken; that is what I have to decide.

But during design, with the training vectors, I know from which class each training vector has come. So, I can negate Y for all the Y taken from class ω2, and that is what has been done over here.

(Refer Slide Time: 08:28)

So, given this, you will find that when I have this limiting position of the separating plane, the vector a is this one, which is orthogonal to this particular plane. Similarly, at the other limiting position of the separating plane, my solution vector a is this one. And you find that if the solution vector a is rotated towards the left from this limiting position, then this is a vector which is going to be misclassified.

Similarly, if the solution vector a is rotated in the clockwise direction from this position, then this is a feature vector belonging to class ω2 which is going to be misclassified. So, that clearly tells me that I have a solution region, and my solution vector a, the vector which is perpendicular or orthogonal to the separating plane, must lie within this solution region. If it is outside on this side, then vectors belonging to class ω1 will be misclassified; if it is outside on the other side, then vectors belonging to class ω2 are going to be misclassified.

So, my solution vector must lie within this conical region. How can I obtain such a solution vector for this problem?

(Refer Slide Time: 10:11)

So, for doing this, I start my algorithm like this. Initially, at the 0th instant, I assume that my solution vector a(0) is chosen arbitrarily, and I have also said that at any kth step of my iteration, I have a solution vector, say a(k). It is possible that this solution vector a(k) at the kth step will correctly classify some of the samples and misclassify some other samples.

So, for all the samples which are correctly classified, I will always have a(k)^t Y > 0, and for all the samples which are incorrectly classified, I will have a(k)^t Y < 0, irrespective of whether the vector is taken from class ω1 or ω2, because all the training vectors taken from class ω2 have been negated.

So, during design or during training, whenever I come across the situation that a^t Y < 0, I can immediately conclude that the current a(k) has misclassified the vector Y.

So, whenever a^t Y < 0, I generate an error term

− a^t Y

which will always be positive; that means, if any sample is misclassified by a(k), I will have a positive error, and I do this for all the samples which are misclassified by a(k). So, you take the summation of this over all Y which are misclassified, and I call it the error J as a function of a, because all the Ys are fixed (those are my training samples), but what I can vary is a:

J(a) = − ∑ a^t Y
misclassified examples

So, I represent this as an error J(.) which is a function of a, and this error is normally called the perceptron criterion.

(Refer Slide Time: 13:07)
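A minimal sketch of this perceptron criterion, assuming a training matrix Y that has already been augmented and negated as above (the numbers are made up), might look like this:

```python
import numpy as np

def perceptron_criterion(a, Y):
    """J(a) = sum of (-a^t y) over the misclassified training vectors y.

    Y holds one augmented training vector per row, with the class-omega_2
    rows already negated, so a row y counts as misclassified when a^t y <= 0
    (samples exactly on the boundary are treated as misclassified here).
    """
    margins = Y @ a
    misclassified = Y[margins <= 0]
    return -(misclassified @ a).sum()   # each term is >= 0, so J(a) >= 0

# Hypothetical data: the second row is misclassified by this a.
Y = np.array([[2.0, 3.0, 1.0],
              [-6.0, -1.0, -1.0]])
a = np.array([1.0, 1.0, 0.0])
print(perceptron_criterion(a, Y))       # only the second row contributes
```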
So, what is my solution approach? If you look at the figure over here, you find that this was my solution region. If at any instant of time I have a solution vector a over here, my approach should be to push this a towards this side so that it moves inside the solution region; or, if at any instant of time my solution vector a is on this side, I should push it in this direction, towards the solution region.

So, eventually my vector will land in the solution region and I get the proper separating boundary. What is the approach that I should take for this?

(Refer Slide Time: 14:00)

The approach can be this: as I said, my error is

J(a) = − ∑ a^t Y
misclassified examples

So, I can take an approach to reduce or minimize this error by following an algorithm known as the gradient descent algorithm. What is the gradient descent algorithm? Let us take a very simple case, something like this. Say I have a variable X and a function f(X), and the function f(X) is something like this; iteratively I want to get a value of X for which f(X) is minimum, somewhere over here. In our case, I want to get a value of a which will minimize this error J(a).

So, what I can do is: if I start somewhere over here, this is my X(0), and I will take the gradient of f(X) with respect to X. So, what I will compute is

∂f(X)/∂X

What I do is move X in the negative direction of ∂f(X)/∂X. So, I move in this direction; my X at the next step will be the previous X minus ∂f(X)/∂X, possibly scaled by a step size. If I change X in the negative direction of the gradient, I am moving towards the minimum of f(X). This is what is known as the gradient descent algorithm, and we will talk about it in more detail later. But for the time being, let me say that I will use this gradient descent algorithm to minimize my error term J.
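As a bare-bones illustration of the idea, here is gradient descent on a toy one-variable function f(X) = (X − 3)^2 of my own choosing, with an assumed step size; it simply keeps moving against the gradient:

```python
def grad_f(x):
    # Gradient of the toy function f(x) = (x - 3)**2, i.e. 2 * (x - 3)
    return 2.0 * (x - 3.0)

x = 0.0        # arbitrary starting point X(0)
step = 0.1     # assumed step size
for _ in range(50):
    x = x - step * grad_f(x)   # move against the gradient
print(x)       # approaches 3.0, the minimiser of f
```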

(Refer Slide Time: 16:10)

So, what I have is the error

J(a) = − ∑ a^t Y
misclassified examples

I want to take the gradient of J with respect to a, which, from here, is nothing but minus the summation of Y over all Y which are misclassified. So, my procedure can be, as we said before: initially, at the 0th instant, a(0) is chosen arbitrarily or at random.

(Refer Slide Time: 17:13)

To put it better, I choose a(0) at random, and then, if I have a(k) at the kth iteration, from there I want to find a(k + 1) which should reduce my error J(a).

Now we have:

∇J(a) = − ∑ Y
misclassified examples

So, the update rule becomes:

a(k + 1) = a(k) − η ∇J(a) = a(k) + η ∑ Y
misclassified examples

Given this, now let us see how this update rule actually tries to refine a(k + 1), or the weight vector, so that eventually the weight vector falls in the solution region.

(Refer Slide Time: 19:24)

So, this is what we were talking about. This is my solution region; the solution region is within this. Now I take an initial separating plane, something like this. I am assuming that this is my initial separating plane, and the weight vector a, or a(0), is this one, which is orthogonal to the separating plane. Here you clearly find that the initial solution vector, which has been chosen at random because the separating plane has been chosen at random, falls outside the solution region, because the solution region is this.

So, naturally I have to update or modify this solution vector a(0) so that it is moved towards the solution region over here.
(Refer Slide Time: 20:33)

And for that, the update rule we have used was:

a(k + 1) = a(k) + η ∑ Y
misclassified examples

So, what are the misclassified samples in this case? I have these two samples which are misclassified by this separating plane. If you take the sum of these two misclassified feature vectors, I will get a vector, the sum of the misclassified Y, which is in this direction, and this sum is scaled by some η, which is known as the rate of convergence. I will come to the importance of the rate of convergence later.

So, this is a(k), or a(0); you add this sum of Y, this particular vector scaled by η, to this vector to get your a(1). So, a(1) will be moved in this direction, because the sum of the misclassified Y is in this direction. η times this sum is added to the initial vector, which was chosen at random, and the vector moves towards this direction.

Now, what is the importance of the rate of convergence η? If the value of η is very high, then it is possible that I will jump over the solution region and my modified vector will be somewhere over here; you find that you have crossed the solution region for a larger value of η. If the value of η is very small, maybe I will be landing somewhere over here, which is again not in the solution region. So, if the value of η is very small, then your rate of convergence, the rate at which you are moving towards the solution, is small. If the value of η is very high, then the risk is that you may overshoot the solution region and go to the other side of it. So, your number of iterations to reach the solution will be larger.

However, if the value of η is appropriate, then it is possible that the next modified solution vector that you obtain will be in your solution region, which is the solution that you are looking for. Whatever it is, once I have these misclassified samples, I use them to refine the weight vector a.

(Refer Slide Time: 23:13)

And by this refinement, it is possible that the next weight vector, the next separating plane that you achieve, is this one. And you find that for this separating plane, the weight vector is this, which again shows that, as we said, if the value of η is very large, you can overshoot the solution region, and that is exactly what has happened over here.

So, this is my weight vector, which falls on the other side of the solution region, and the sample which is misclassified by this separating plane is this one. So, the next weight vector that I should get, which is a(2), is:

a(2) = a(1) + η ∑ Y
misclassified examples

So, it is possible that the next solution that I obtain will be within this solution region, somewhere over here. This clearly says that by taking the gradient descent approach to reduce the error, I will eventually get a separating plane which separates the samples belonging to the 2 classes ω1 and ω2.

(Refer Slide Time: 24:55)

And that is possible when the samples are really linearly separable. If the samples are not linearly separable, then it is not possible to obtain a separating boundary like this. The approach that we have to take in such cases is what is known as the minimum squared error criterion, so that the separating plane you obtain does not totally remove the error, but tries to minimize the squared error of classification.
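The lecture only names this alternative here; as a rough sketch of one common form of the minimum squared error approach (not necessarily the exact formulation used later in the course), one can solve a^t Y = b for some chosen positive margins b in the least squares sense, which never requires perfect separation:

```python
import numpy as np

# Hypothetical augmented-and-negated training matrix (one vector per row);
# the samples need not be linearly separable.
Y = np.array([[2.0, 3.0, 1.0], [3.0, 3.5, 1.0],
              [-6.0, -1.0, -1.0], [-2.5, -3.2, -1.0]])
b = np.ones(Y.shape[0])   # target margins; all-ones is a common default choice

# Least squares solution minimises || Y a - b ||^2 instead of demanding
# a^t y > 0 for every sample.
a, *_ = np.linalg.lstsq(Y, b, rcond=None)
print(Y @ a)              # margins: close to b, but not all need be positive
```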

So, in today’s lecture, what we have done is try to find a boundary between the samples taken from the 2 different classes, the samples which are provided for designing the separating plane, that is, the training samples. For designing this separating plane, we have appended 1 to all the training vectors, and the vectors taken from class ω2, which actually fall on the negative half of the boundary, we have negated for design purposes only.

But keep in mind that once your separating plane is properly designed, once I get the equation a^t Y = 0 of this separating plane by using the training vectors, it is used for classification of an unknown sample. Now my classification rule for an unknown sample, say Y_unknown, will be: if a^t Y_unknown > 0, then Y_unknown is to be classified to class ω1; if a^t Y_unknown < 0, then Y_unknown has to be classified to class ω2.
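A small sketch of this test-time rule, assuming a trained (d + 1) dimensional vector a and a raw, un-negated d-dimensional test sample (all values here are made up):

```python
import numpy as np

def classify(a, x):
    """Classify a raw d-dimensional sample x using a trained (d+1)-vector a.

    No negation here: negating class omega_2 was a training-time convenience only.
    """
    y = np.append(x, 1.0)                      # augment the unknown sample with 1
    return "omega_1" if a @ y > 0 else "omega_2"

# Hypothetical trained weight vector and test points.
a = np.array([-1.0, 1.0, 1.0])
print(classify(a, np.array([2.0, 3.0])))       # -> omega_1
print(classify(a, np.array([6.0, 1.0])))       # -> omega_2
```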

So, with this, we will stop today’s lecture. Next class, we will talk about the support vector
machine.

Thank you.
