Lecture 14

Support Vector Machines

(Informal: Version 0)
Introduction



Linear Classifier
• Classifier: if f(x1, x2) < 0, assign Class 1; if f(x1, x2) > 0, assign Class 2.
• The decision boundary is the line f(x1, x2) = w1x1 + w2x2 + b = 0.

[Figure: Class 1 and Class 2 points in the (x1, x2) plane, separated by the line f(x1, x2) = 0.]
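A minimal sketch of this decision rule in Python (the weights and test points are illustrative, not from the slide; the slide leaves f = 0 unassigned, here it falls to Class 2):

import numpy as np

def linear_classify(x, w, b):
    # f(x) = w . x + b; the sign of f decides the class
    f = np.dot(w, x) + b
    return "Class 1" if f < 0 else "Class 2"

# illustrative boundary: x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0
print(linear_classify(np.array([0.2, 0.3]), w, b))  # f = -0.5 -> Class 1
print(linear_classify(np.array([1.0, 1.0]), w, b))  # f = +1.0 -> Class 2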



Perceptron
• Perceptron is the name given to this linear classifier.
• If there exists a Perceptron that correctly classifies all training examples, then we say that the training set is linearly separable.
• Different Perceptron learning techniques are available; a minimal one is sketched below.
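One classical technique (the slide names none; this is the standard perceptron update rule, with labels assumed to be in {-1, +1}):

import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    # X: (n, d) array of training points; y: labels in {-1, +1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # xi is misclassified
                w += lr * yi * xi               # nudge the boundary toward xi's side
                b += lr * yi
                mistakes += 1
        if mistakes == 0:                       # no errors left: stop
            break
    return w, b

# illustrative linearly separable toy set
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])
print(train_perceptron(X, y))

For linearly separable data this procedure is guaranteed to terminate (the perceptron convergence theorem).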



Perceptron – Let us begin with linearly separable data
• For linearly separable data, many Perceptrons are possible that correctly classify the training set.
• All of them do equally well on the training set; which one is good on the unseen test set?

[Figure: several different lines, each of which separates Class 1 from Class 2.]



Hard Linear SVM
• The best perceptron for linearly separable data is called the “hard linear SVM”.
• For each linear function we can define its margin.
• The linear function which has the maximum margin is the best one.
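As a rough illustration (scikit-learn and the toy data are assumptions here, not from the slides), a hard-margin linear SVM is commonly approximated by a soft-margin solver with a very large penalty C:

import numpy as np
from sklearn.svm import SVC

# tiny linearly separable toy set
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([0, 0, 1, 1])

# a very large C leaves essentially no room for margin violations,
# approximating the hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)   # the learned w and b
print(clf.support_vectors_)       # the points that determine the margin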



[Figure: two different separating lines for the same Class 1 / Class 2 data, each shown with its margin.]
Maximizing the Margin
• IDEA: select the separating hyperplane that maximizes the margin!

[Figure: the maximum-margin hyperplane in the (Var1, Var2) plane, with the margin width marked on both sides.]
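For reference (this is the standard result, not derived on the slide): if the hyperplane w · x + b = 0 is scaled so that w · x + b = ±1 on the closest points of the two classes, then

% distance between the planes w·x + b = +1 and w·x + b = -1
\text{margin width} = \frac{2}{\lVert w \rVert}

so maximizing the margin is equivalent to minimizing ‖w‖ (equivalently, ½‖w‖²).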
Support Vectors

[Figure: the maximum-margin hyperplane in the (Var1, Var2) plane; the highlighted points lying on the margin boundaries are the support vectors.]

• The support vectors are the training points closest to the hyperplane; they alone determine the maximum-margin solution.
What if the data is not linearly separable?
• But solving a non-linear problem is mathematically more difficult.

[Figure: data in the (Var1, Var2) plane that no straight line can separate.]
Kernel Mapping
• Idea: map the inputs into a (usually higher-dimensional) feature space in which the data becomes linearly separable; the next slide shows an example.



An example

[Figure: points labelled y = −1 and y = +1 that are not linearly separable in the Input Space become linearly separable in the Feature Space.]


The Trick !!
• There is no need to do this mapping explicitly.
• For some mappings, the dot product in the feature space can be expressed as a function in the input space:
• 𝜑(𝑋1) ∙ 𝜑(𝑋2) = 𝑘(𝑋1, 𝑋2)
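A concrete check of this identity (the degree-2 polynomial kernel is my choice of example; the slide names no particular mapping). Here 𝜑 maps 2-D inputs to 3-D, and k computes the same dot product without ever leaving the input space:

import numpy as np

def phi(x):
    # explicit feature map: phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x1, x2):
    # the matching kernel, computed directly in the input space
    return np.dot(x1, x2) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
print(np.dot(phi(a), phi(b)))   # 121.0 — dot product in the feature space
print(k(a, b))                  # 121.0 — same value via the kernel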



• So, if the solution is going to involve only dot products, then it can be solved using the kernel trick (of course, an appropriate kernel function has to be chosen).

• The problem is, with powerful kernels like the “Gaussian kernel” it is possible to learn a non-linear classifier which does extremely well on the training set.
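For reference, the Gaussian (RBF) kernel is k(X1, X2) = exp(−γ‖X1 − X2‖²); a minimal sketch (γ and the points are illustrative):

import numpy as np

def gaussian_kernel(x1, x2, gamma=1.0):
    # k(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

a = np.array([1.0, 2.0])
b = np.array([1.5, 1.0])
print(gaussian_kernel(a, a))   # identical points -> exactly 1.0
print(gaussian_kernel(a, b))   # similarity decays with distance (~0.287 here)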



Discriminant functions: non-linear

[Figure: a highly non-linear decision boundary that separates every training point correctly.]
This makes zero mistakes on the training set.



Other important issues …
• This classifier is doing very well as far as the training data is concerned.
• But this does not guarantee that the classifier works well with a data element which is not in the training set (that is, with unseen data).
• This is overfitting the classifier to the training data.
• Maybe we are fitting the noise as well (there might be mistakes made while taking the measurements).
• The ability “to perform well with unseen test patterns too” is called the generalization ability of the classifier.



Generalization ability
• This is discussed very widely.
• It is argued that the simpler solution will have better generalization ability (e.g., Occam’s razor: between two solutions, if everything else is the same, choose the simpler one).
• How to quantify this?
• (Training error + a measure of complexity) should be taken into account while designing the classifier; see the criterion sketched below.
• Support vector machines are proven to have better generalization ability.
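Written out (λ and Ω(f) are my notation for the trade-off weight and the complexity measure; the slide gives no symbols), the design criterion is:

\min_f \;\; \underbrace{\hat{R}_{\mathrm{train}}(f)}_{\text{training error}} \;+\; \lambda\,\underbrace{\Omega(f)}_{\text{measure of complexity}}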



Discriminant functions …

[Figure: a simpler decision boundary for the same data, which misclassifies a few training points.]
This has some training error, but it is a relatively simple one.



Overfitting and underfitting

[Figure: three fits of the same data, labelled underfitting, good fit, and overfitting.]



Soft SVM
• Allow for some mistakes on the training set!
• But this is to achieve a better (wider) margin; a sketch of the trade-off follows.
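A minimal sketch of this trade-off (scikit-learn, the data, and the C values are assumptions, not from the slides): a small penalty C tolerates more training mistakes in exchange for a wider margin, while a large C does the opposite:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two overlapping blobs: not perfectly separable
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # small C: wider margin, more violations; large C: the reverse
    print(C, clf.score(X, y), len(clf.support_vectors_))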



