Lecture - 12
Support Vector Machine - II
Hello, welcome to the NPTEL Online Certification course on Deep Learning. You remember
in the previous class we started our discussion on the Support Vector Machine. So, in today’s
lecture, we will continue with the same discussion.
So, in the previous class we gave a brief introduction to what the Support Vector Machine is, and today we are going to talk about the design approach of a Support Vector Machine.
(Refer Slide Time: 00:59)
So, we have seen that in our case we will again assume a two-class problem.
So, we have the feature vectors given from two classes, $\omega_1$ and $\omega_2$, and we assume that all the training vectors are given as labeled pairs, in the sense that the i-th training vector $X_i$ will be given as a pair $(X_i, y_i)$, where $y_i$ indicates the label. So, if the training vector $X_i$ is taken from class $\omega_1$, then we will set the label $y_i$ to be +1, else -1.
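As a minimal sketch of this setup in Python (the feature values here are made-up assumptions, used only to fix the notation for later snippets):

```python
import numpy as np

# Hypothetical labeled training set: each row of X is a training vector
# X_i, and y_i = +1 if X_i comes from class omega_1, -1 if from omega_2.
X = np.array([[2.0, 3.0],   # class omega_1
              [3.0, 3.0],   # class omega_1
              [0.0, 1.0],   # class omega_2
              [1.0, 0.0]])  # class omega_2
y = np.array([+1, +1, -1, -1])
```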
If this is the separating plane between the feature vectors belonging to the two classes, then for every training vector $X$ belonging to class $\omega_1$ this condition must be satisfied: $a^T X + b > 0$.
In the same manner, if I take a feature vector $X$ from class $\omega_2$, where this feature vector $X$ falls on the negative side of the linear boundary, the condition $a^T X + b < 0$ must be satisfied.
(Refer Slide Time: 03:19)
$$y_i(a^T X_i + b) > 0$$
if $X_i$ is correctly classified by the separating plane $a^T X + b = 0$, and this will be less than 0 if $X_i$ is misclassified by the separating plane.
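A quick check of this sign condition in code, continuing the snippet above (the plane parameters a and b below are assumptions chosen only for illustration):

```python
# Candidate separating plane a^T X + b = 0 (illustrative values).
a = np.array([1.0, 1.0])
b = -3.0

# y_i * (a^T X_i + b) is positive exactly when X_i is correctly classified.
margins = y * (X @ a + b)
print(margins > 0)   # [ True  True  True  True ] for this toy data
```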
So, for different values of a and b, I can obtain different separating planes, and maybe many of those separating planes will satisfy the same condition, that is,
$$y_i(a^T X_i + b) > 0$$
(Refer Slide Time: 04:55)
Now, for different values of the vector a and for different values of the bias b, I get different such planes, but each such plane will have a different margin, or a different confidence level of classification. So, what is that?
Now, given this, you find that if I take this particular separating plane, it gives me a margin which is given by this; the distance between these two planes gives me the margin, or the confidence level given by this particular classifier. Similarly, if I take another separating plane, say this one, here again you find that the margin is given by this much. So obviously, the margin given in this option is less than the margin given in the previous option.
To continue further, if I take this separating plane, then again the margin is given by this. So, out of so many options, which one should be preferred? That is exactly what the Support Vector Machine does. The Support Vector Machine tries to get a separating plane which maximizes the margin, and for that, the separating plane should be at a maximal distance from the vectors belonging to both the classes. That means it should try to maximize the distance of the separating plane from the vectors belonging to class $\omega_1$, and it should also try to maximize the distance from the vectors belonging to class $\omega_2$, right.
So, I should try to obtain that particular separating plane which maximizes this margin, and for classification my rule is:
$$y_i(a^T X_i + b) > 0$$
This is for classification, but as I am talking about the margin, I want that, for correct and reliable classification, the distance of every $X_i$ from the separating plane must be more than a certain threshold. As we said earlier, a measure of that distance is given by $a^T X_i + b$.
So, if $a^T X_i + b = 0$, that means $X_i$ falls on the separating plane, in which case the distance of $X_i$ from the separating plane is 0. For any non-zero value, if $X_i$ is taken from class 1, then I must have $a^T X_i + b$ greater than a certain threshold, say d, and if $X_i$ is taken from class 2, then $a^T X_i + b$ should be less than $-d$, and this should be true for all the training samples, whether they are taken from class 1 or from class 2.
So, if $X_i$ is taken from class 1, then $a^T X_i + b > d$ should be satisfied, and if the training sample $X_i$ is taken from class 2, then $a^T X_i + b < -d$ should be satisfied. By combining the two, I have a uniform criterion, that is, $y_i(a^T X_i + b) > d$, irrespective of whichever class this training sample $X_i$ has been obtained from. What I can do is, I can always normalize this expression by d.
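Written out, this normalization is just a division by the threshold d, after which the rescaled a and b absorb the factor:
$$y_i\,(a^T X_i + b) \geq d \;\Longrightarrow\; y_i\!\left(\frac{a^T}{d} X_i + \frac{b}{d}\right) \geq 1,$$
so, renaming $a/d$ as $a$ and $b/d$ as $b$, the design condition becomes $y_i(a^T X_i + b) \geq 1$.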
So, while designing, I can have the condition that $y_i(a^T X_i + b)$ should be greater than or equal to 1, and I will use this approach while designing the classifier, or while choosing the separating plane. But for classification, once I fix what a and b should be after designing the separating plane using the training vectors, then for any unknown X my classification rule will be: $a^T X + b > 0$ indicates that X belongs to class 1, else to class 2.
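As a small sketch of this decision rule, reusing the illustrative a and b from the earlier snippet:

```python
def classify(x_new, a, b):
    """Assign class 1 if a^T x + b > 0, else class 2."""
    return 1 if float(a @ x_new + b) > 0 else 2

print(classify(np.array([2.5, 2.5]), a, b))  # -> 1 (positive side)
print(classify(np.array([0.5, 0.5]), a, b))  # -> 2 (negative side)
```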
(Refer Slide Time: 11:24)
So, right now our aim is that I should choose the separating plane $a^T X + b = 0$ which satisfies the condition $y_i(a^T X_i + b) \geq 1$; that is after normalization. So, how can I do that?
So, what I am saying is that I should take that particular separating plane which maximizes this margin. So, how can I obtain this margin and how can I maximize it? For that, let us take one vector on this margin plane, say $X^+$, and I will take another vector on the opposite margin plane, say $X^-$. So, $X^+$ is taken within the class 1 region and $X^-$ is taken within the class 2 region.
So, $(X^+ - X^-)$ is a vector drawn from $X^-$ to $X^+$, and once I have this vector, then from here you find that I can obtain the margin as the dot product of the vector $(X^+ - X^-)$ with the unit vector in the direction of a, right.
(Refer Slide Time: 13:24)
The dot product of the vector $(X^+ - X^-)$ with the unit vector in the direction of a, which is nothing but orthogonal to the separating plane, gives the margin, and the unit vector in this direction is given by $a/\|a\|$. Since $X^+$ and $X^-$ lie on the two margin planes, $a^T X^+ + b = 1$ and $a^T X^- + b = -1$, so $a^T(X^+ - X^-) = 2$. So, the margin that you get is:
$$\frac{a^T}{\|a\|}(X^+ - X^-) = \frac{2}{\|a\|}$$
So, as we said earlier, I aim to choose that particular separating plane which maximizes the margin, and the margin comes out to be $2/\|a\|$.
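A numeric check of this $2/\|a\|$ formula on the toy data, again with the illustrative a and b assumed earlier (a real SVM would produce its own):

```python
# Rescale (a, b) so that the closest training points satisfy
# y_i (a^T X_i + b) = 1; the width of the margin is then 2 / ||a||.
scale = np.min(y * (X @ a + b))      # smallest value of y_i (a^T X_i + b)
a_n, b_n = a / scale, b / scale
print(2.0 / np.linalg.norm(a_n))     # margin width, here 2*sqrt(2) ≈ 2.83
```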
So, I should choose that particular a which maximizes this, and here you find that, obviously, as $\|a\|$ comes in the denominator, I could make this term indefinitely large by making $\|a\|$ smaller and smaller; but that is not the solution, because the a and b that I choose must also satisfy the requirement that $y_i(a^T X_i + b) \geq 1$. So, I have to minimize $\|a\|$ subject to the constraint that $y_i(a^T X_i + b) \geq 1$. So, it becomes a constrained optimization problem and, as you know, to solve a constrained optimization problem we have to make use of a Lagrangian.
So, here what I have to do is, I have to form a Lagrangian using this particular constraint.
(Refer Slide Time: 17:44)
The Lagrangian collects one such constraint term, with a multiplier $\alpha_i \geq 0$, for each of the training vectors which are given for designing the Support Vector Machine, in the same manner:
$$L(a, b, \alpha) = \frac{1}{2}\|a\|^2 - \sum_i \alpha_i \left[\, y_i (a^T X_i + b) - 1 \,\right]$$
(Refer Slide Time: 22:03)
So, now let us see what Lagrangian we had. Setting the derivatives of $L(a, b, \alpha)$ with respect to a and b to zero gives $a = \sum_i \alpha_i y_i X_i$ and $\sum_i \alpha_i y_i = 0$, and substituting back, we had the Lagrangian in its dual form:
$$L(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j X_i^T X_j$$
So, now you can make use of any optimization tool to optimize $L(\alpha)$ with respect to the $\alpha$'s, and the set of such $\alpha$'s which maximizes this $L(\alpha)$ can give you my solution vector a.
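As a sketch of this optimization step in Python, continuing the earlier snippets (SciPy is assumed to be available; a dedicated QP solver would normally be preferred for larger problems):

```python
from scipy.optimize import minimize

# G_ij = y_i y_j X_i . X_j, the Gram matrix of the dual objective.
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negated dual objective, since scipy minimizes rather than maximizes.
    return 0.5 * alpha @ G @ alpha - alpha.sum()

constraint = {"type": "eq", "fun": lambda alpha: alpha @ y}  # sum_i alpha_i y_i = 0
res = minimize(neg_dual, np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y), constraints=constraint)
alpha = res.x

# Recover the solution vector a and the bias b from the multipliers.
a_star = (alpha * y) @ X                   # a = sum_i alpha_i y_i X_i
sv = alpha > 1e-6                          # indices of the support vectors
b_star = np.mean(y[sv] - X[sv] @ a_star)   # from y_i (a^T X_i + b) = 1
```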
And once you have the solution vector a, you get your separating plane, and this is the separating plane which maximizes the margin; in other words, this separating plane will give you a robust linear classifier.
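In practice, a library implementation is typically used instead of hand-rolling the optimization; for instance, with scikit-learn (assuming it is installed), a linear SVM on the same toy data would be:

```python
from sklearn.svm import SVC

clf = SVC(kernel="linear", C=1e6)   # a large C approximates the hard margin
clf.fit(X, y)
print(clf.coef_, clf.intercept_)    # the learned a and b
```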
So, today what we have done is, we have tried to find a linear boundary between the feature vectors taken from two different classes, 1 and 2, and using the Support Vector Machine we have tried to find one such linear separating plane, lying between the two margin planes, in such a manner that this separator maximizes the margin between the vectors belonging to class 1 and the vectors belonging to class 2.
So far, whatever we have discussed, whether it is a linear discriminator or a Support Vector Machine, we have considered only a two-class problem. So, next we will generalize this and try to find out how we can extend similar concepts to multi-class problems. With this, I stop here today.
Thank you.