SVM 3
Lecture 29
Suppose I have some data which is not linearly separable. That is the problem we have seen with perceptrons: what happens if the data is not linearly separable? Perceptrons do not converge. So can we tweak the objective function that we have here to make sure that we can handle data that is not linearly separable? Is it okay to say "non linearly separable data"? That was my question. It should be "linearly inseparable data", right? So you have to be careful where you put the negation.
So what do we do in this case? Somebody had a suggestion. There are many choices you could make, but there is one particular choice which seems to yield a very nice optimization formulation. What is that choice? I am going to say that I would really like to maximize the margin, and I would like to get as many data points correct as possible.
(Refer Slide Time: 01:42)
So if you think about it, there are a couple of things. This is the margin that I want, so what are the problems here? Well, these data points are within the margin; I have some data points that are within the margin, and I would like to minimize such cases. There are some data points that are within the margin and erroneous, and I would like to minimize such cases as well. What if I had tried to get this point correct? There is a gap here between the points; if I try to get this correct and move my classification surface below, then the margin would have been reduced even further. So it is okay to get this one wrong. But then what about this case: is it within the margin or outside the margin? Within. The margin for that class is defined on the other side, so anything to this side means x is within the margin. Look at yi times f(x) here: it will actually be negative. We want yi f(x) to be greater than one, yi f(x) ≥ 1, and here it is going to be negative.
So obviously this is within the margin. Essentially what I want to do is minimize these distances. You can see the distances that I marked here: this is a certain small distance inside the margin, this is a large distance inside the margin, this is a very large distance inside the margin, and likewise. So I can mark each one of these, and I want to minimize them. Let us denote them, say, ξ1 to ξ5; I want to minimize those, essentially.
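The slack distances just described can be computed directly. Here is a minimal numeric sketch; the separator w, b and the three points are my own illustrative values, not the figure on the board:

```python
import numpy as np

# Hypothetical linear classifier f(x) = w.x + b (illustrative values)
w = np.array([1.0, 1.0])
b = -1.0

# Three points, all labelled +1
X = np.array([[2.0, 2.0],   # comfortably outside the margin
              [0.6, 0.6],   # correct side, but inside the margin
              [0.2, 0.2]])  # on the wrong side of the separator
y = np.array([1.0, 1.0, 1.0])

# y_i * f(x_i) >= 1 means the point respects the margin; anything below 1
# incurs a slack, which is exactly the deviation we want to minimize in total
scores = y * (X @ w + b)                 # approximately [3.0, 0.2, -0.6]
slack = np.maximum(0.0, 1.0 - scores)    # approximately [0.0, 0.8, 1.6]
print(slack.sum())                       # total deviation, roughly 2.4
```

Only the second and third points contribute slack; the first is outside the margin and costs nothing.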
So I minimize the sum of these deviations along with my original objective function. Why not minimize the maximum here? Because that would essentially mean I will try to get as many things correct as possible; in this case I do not mind getting something wrong as long as the overall deviation does not exceed a certain limit. The difference between minimizing the maximum and minimizing the sum is that with the sum I might as well give up all of the budget to a single data point, something that is very hard to classify.
And I might have one single outlier somewhere here. Let us draw it: this data might be perfectly separable except that I have one outlier. Now if I say minimize the sum of the slacks, it is fine. But if I say minimize the max, then it is going to actually give me a hyperplane somewhere there. Like I said, many different formulations are possible; this one actually yields a very nice computation, and that is one of the reasons people use it.
(Refer Slide Time: 07:19)
So what I am going to do is write it here. I am going to say that this has to be what we had already found out, and I am going to introduce a slack variable so that it does not have to be greater than M; it can be some fraction less as well. M is what I would really like, but I allow it some slack. Ideally I would want most of these ξi's to be 0; if I force all ξi's to be zero I am back here, but I would really like some leeway.

So I am allowing myself that leeway by introducing ξi here. This is a very standard technique for relaxing constraints in optimization; that is one of the reasons people adopt it. Another constraint I could have chosen, which in fact is a little more common, is M − ξi, but it turns out that in this particular case if I choose M − ξi instead of M(1 − ξi) I end up getting a non-convex optimization problem.
So we do not want that so we end up doing this. So I drew this figure first because I wanted to
get an idea of what these slack variables actually mean. So the slack variables essentially tell you
by what fraction right you are violating the margin. So is ξ1 is essentially what fraction of
distance you are coming in here from the margin ξ2 is what fraction of the distance you are
coming in from the margin.
So the margin is M so I have moved some fraction of the distance inside. So they essentially that
is what the ξ tells me. So what are the constraints we have? So the first constraint I have is okay
all ξi have to be ≥ 0. I do not care about points going to that side of the margin. So all ξi ≥ 0 and
the second thing is whatever we have been talking about. I do not want the ξi is to be very large
taken in total so I want to upper bound them by a constant.
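Written out, the constraints being described take this form; a sketch in the lecture's notation, with f(xi) = β0 + xiᵀβ as the classifier and M as the margin:

```latex
y_i\,(\beta_0 + x_i^{T}\beta) \;\ge\; M(1 - \xi_i), \qquad
\xi_i \ge 0, \qquad
\sum_{i=1}^{n} \xi_i \le \text{const}.
```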
Now, about going to the other side of the margin: if ξi were negative, I would essentially be imposing a tighter constraint than what I was looking for, since the requirement would become larger than M. Remember this is a relative distance, so the constraint is essentially M − Mξi; the original requirement was M, and the point is now allowed to be Mξi inside of that. So ξi is essentially a relative distance, and if I make ξi negative, this minus becomes a plus, which would mean that not only do I want the data points to be at least M away, I am actually asking them to be further away than M. It just imposes a tighter constraint, and I do not want that to happen. And here we are essentially giving it a budget: we do not want the total to exceed the budget. We saw such a constraint earlier; we had a budget and we did not want things to be greater than the budget.
(Refer Slide Time: 10:43)
So yes, ridge regression and LASSO and other things. Wherever we were looking at regularized regression, we had this "greater than or less than a constant", and what did we do in those cases? We pushed it into the objective function and added a multiplier there, and then there is a relationship between this constant and the multiplier that we put in the objective function. Likewise, we will do the same thing here, and I will do all the other transformations that we need in order to normalize β and so on.

So essentially I will end up with the same objective function I had there, and you want ≥ 1 because we have gotten rid of the M. How do we get rid of M? Because M is 1/||β||, so we got rid of it that way. Anything else we need here? Now that we have this objective function, what should the value of C be if I want to solve the linearly separable problem, or equivalently if I want to ensure that all ξi are 0? This is a simple question: C should be infinity.
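After the normalization, the optimization problem being arrived at is the standard soft-margin primal; a sketch in the lecture's notation, with C as the penalty on the slacks:

```latex
\min_{\beta_0,\,\beta,\,\xi}\;\; \frac{1}{2}\,\lVert\beta\rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_i
\qquad \text{subject to} \qquad
y_i\,(\beta_0 + x_i^{T}\beta) \;\ge\; 1 - \xi_i, \quad \xi_i \ge 0.
```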
(Refer Slide Time: 13:53)
So the larger the value of C, the more you are penalizing the violations, and so the smaller the ξi will be. The larger the value of C, the smaller the ξi should be, so there is a trade-off: the larger you make C, the smaller the margin will be, but we will be getting more of the training data correct. For small values of C you are allowing a little bit more leeway: if C is very small then you are allowing a lot more errors to happen; if C is very large then you are forcing the classifier to classify as much of the training data correctly as possible.
If the data is truly linearly separable and you make C very large, what will happen? You will find the correct linear separator. But if the data is truly linearly separable and you keep C small, what might happen? You might trade off errors on the training data for a larger margin, even though the data is linearly separable. Is that a desirable thing? When exactly? If the data is noisy, such that there are some data points, maybe only one or two, that are closer to the margin, then if you are trying to find the perfect linear separation you will pay attention to them as well, and therefore you will end up with a small margin. But if you are willing to ignore a few noisy data points, then even if the training data looks perfectly separable, you might end up making a few errors on it but you will get a more robust classifier.

Can people visualize such a situation? I am going to try and do something here; let us see if it works. This looks perfectly separable. I add noise: is it still separable? There you go, it is still separable, and if you try to solve it as a perfectly separable problem, that is the separator you are going to get. But if you allow errors, then that will probably be the separating hyperplane you get, and that is probably a more appropriate hyperplane.
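The effect of C can be demonstrated with a standard SVM solver; this sketch uses scikit-learn's `SVC` (one common package, not the tool used in the lecture), on illustrative data with one noisy point planted near the boundary:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated clusters plus one noisy point near the boundary,
# labelled with the left class (illustrative data, not the lecture's figure)
X = np.vstack([rng.normal([-2.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([+2.0, 0.0], 0.3, size=(20, 2)),
               [[0.3, 0.0]]])
y = np.array([-1] * 20 + [1] * 20 + [-1])

# Large C: approximately the hard-margin solution, pays attention to the outlier
hard = SVC(kernel="linear", C=1e6).fit(X, y)
# Small C: tolerates slack on the outlier, keeps a wider margin
soft = SVC(kernel="linear", C=0.01).fit(X, y)

# Margin width is 2/||w||; small C trades a training error for a wider margin
margin_hard = 2 / np.linalg.norm(hard.coef_)
margin_soft = 2 / np.linalg.norm(soft.coef_)
print(margin_hard < margin_soft)  # True
```

The small-C classifier gives up on the noisy point and recovers roughly the separator the clean clusters alone would suggest.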
(Refer Slide Time: 16:11)
Apart from being robust, it is also correct in an expected sense. We will move on to the primal. I just wanted to leave this on the board until I wrote this note, so that you can compare.
(Refer Slide Time: 16:46)
Yeah, I do not have to do this. It is not a single condition; there is one for each i. Do we need the constraint ∑i ξi ≤ const explicitly? No, right, and that is why we constructed it this way: we put it into the optimization objective function itself, so by minimizing this we are ensuring that ∑i ξi will be less than some limit. And like I was mentioning in the ridge regression discussion, you can find a relationship between this constant and this C.
It is also a function of the range of the objective function, but you can always find one. So basically they are equivalent ways of writing the optimization problem, except that the constant and the C will not be the same; they will be different values. So this constraint is gone, it is no longer present here; it went into the objective function.
So putting all of this back in and doing some algebra, you might be surprised at the outcome. Has anyone already solved it? It looks familiar, right? It is essentially the same dual you will get, but your constraints are different. This one was already there; it is just added for completeness' sake. What is important here is that earlier I had only a non-negativity constraint on α, whereas now I have an upper bound on the value of α. Why is that? Because α is C − μ, and since μ ≥ 0, there has to be an upper bound of C on α. Good. So what about the other KKT conditions?
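For reference, the dual being described, with the box constraint that comes from α = C − μ and μ ≥ 0, is (a sketch in standard notation):

```latex
\max_{\alpha}\;\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
\alpha_i\,\alpha_j\, y_i\, y_j\, x_i^{T}x_j
\qquad \text{subject to} \qquad
0 \le \alpha_i \le C, \quad \sum_{i=1}^{n}\alpha_i\, y_i = 0.
```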
(Refer Slide Time: 22:28)
So here I will make ξi large enough that the term in the square bracket goes to 0, because yi f(xi) will be less than 1; for that case my αi will be C, and I do not want to penalize this case further. In this case ξi will be 0, and in this case also ξi will be 0, because what I really need is my condition yi f(xi) ≥ 1 − ξi; if yi f(xi) is already ≥ 1, I can set ξi to 0. In both these cases, ξi = 0.
So what are all the support vectors? Everything on the margin, and everything on the wrong side of the margin as well. Everything for which α is nonzero now becomes a support vector. At the end of the day, you are just going to use a package to solve all of these things, but that is like saying: anyway you are going to use Microsoft Windows, or Mac OS X, or something, so why learn operating systems?

You need to know what the internals are. It is not just using the tools that matters; if it were, we could do a tool course: how to use the tools, how to start up libsvm. It is not trivial. Many people I know actually run experiments with SVMs by just using the default parameter settings that the package gives. The thing is, you need to understand what it is that you are tuning. So now I have told you about the C parameter, and you have some idea of what a large C means versus what a small C means, instead of blindly tuning C from some number to some other number. Having an appreciation of what these things are doing actually helps you use the tools better. That is the whole idea behind doing all of this; it is not that I am going to expect you to come and derive a large-margin classifier tomorrow.
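The point about not relying on default settings can be made concrete. A minimal sketch using scikit-learn (one common package, which wraps libsvm; the dataset and the grid of C values here are my own illustration): rather than accepting the default C, search over a range and cross-validate, then inspect the support vectors, which are exactly the points with nonzero α.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_blobs

# Toy two-class data (illustrative; any dataset would do)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)

# Instead of the default C, search over a range and cross-validate
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0, 100.0]},
                    cv=5).fit(X, y)
best = grid.best_estimator_

# Support vectors: the points on the margin plus the margin violators,
# i.e. exactly those with nonzero dual variable alpha
print(grid.best_params_)
print(best.support_vectors_.shape[0], "support vectors out of", len(X))
```

Knowing that large C shrinks the margin and small C tolerates slack tells you which end of this grid to extend when the cross-validation picks a boundary value.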
IIT Madras Production
Funded by
Department of Higher Education
Ministry of Human Resource Development
Government of India
www.nptel.ac.in
Copyrights Reserved