Module 6 - SVM
• Learning a classifier means searching for the decision boundary that optimizes our objective function
A Simple Classifier: Minimum Distance Classifier
• Training
• Separate training vectors by class
• Compute the mean for each class: $m_k$, $k = 1, \ldots, m$
• Prediction
• Compute the closest mean to a test vector x’ (using Euclidean distance)
• Predict the corresponding class
• In the 2-class case, the decision boundary is the hyperplane that lies halfway between the two means and is orthogonal to the line connecting them
• This is a very simple-minded classifier; it is easy to think of cases where it will not work very well
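As a minimal sketch of the training and prediction steps above (the function names and toy arrays are my own, not from the slides):

```python
import numpy as np

def fit_means(X, y):
    """Training: separate vectors by class and compute each class mean."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, means

def predict(X_test, classes, means):
    """Prediction: assign each test vector to the class of the closest mean."""
    # Euclidean distance from every test vector to every class mean
    dists = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Tiny usage example
X = np.array([[1.0, 1.0], [1.5, 2.0], [6.0, 6.0], [7.0, 5.5]])
y = np.array([0, 0, 1, 1])
classes, means = fit_means(X, y)
print(predict(np.array([[2.0, 2.0], [6.5, 6.0]]), classes, means))  # [0 1]
```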
A Simple Classifier: Minimum Distance Classifier
[Figure: minimum distance classifier example in a two-feature space; axes Feature 1 and Feature 2]
Linear Classifiers
• A linear classifier uses a single linear decision boundary (for the 2-class case)
• We can always represent a linear decision boundary by a linear equation:
$$w_1 x_1 + w_2 x_2 + \cdots + w_d x_d = \sum_{j=1}^{d} w_j x_j = \mathbf{w}^\top \mathbf{x} = 0$$
• In d dimensions, this defines a (d-1)-dimensional hyperplane
• For d = 3 we get a plane; for d = 2 we get a line
• Note that a minimum distance classifier is a special (restricted) case of a linear classifier
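To see why, compare squared Euclidean distances to the two class means; the quadratic term $\lVert x \rVert^2$ cancels:

$$\lVert x - m_1 \rVert^2 < \lVert x - m_2 \rVert^2 \;\Longleftrightarrow\; 2(m_1 - m_2)^\top x + \lVert m_2 \rVert^2 - \lVert m_1 \rVert^2 > 0$$

This is a linear inequality in $x$, i.e., a hyperplane decision boundary with weight vector $w = 2(m_1 - m_2)$, so the minimum distance classifier is indeed a restricted linear classifier.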
Linear Classifiers
[Figure: a linear decision boundary separating two classes, with a second panel labeled "Another Possible Decision Boundary"; axes Feature 1 and Feature 2]
Classifier Principles
[Figure: minimum error decision boundary vs. minimum distance classifier boundary on the same data; axes Feature 1 and Feature 2]
INTRODUCTION
Think of a support vector machine as a "road machine" that separates the cars, buildings, and pedestrians on the left and right sides of the road and makes the lane as wide as possible. The cars and buildings that are really close to the street are the support vectors.
What is a Support Vector Machine (Classifier)?
1. Support Vector Machine (the “road machine”) is responsible for finding the decision boundary to separate
different classes and maximize the margin.
2. Margins are the (perpendicular) distances between the line and those dots closest to the line.
Support Vector Machine (SVM)
• A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and as such, that is what we will focus on here.
Support Vectors
• Support vectors are the data points nearest to the hyperplane, the
points of a data set that, if removed, would alter the position of the
dividing hyperplane. Because of this, they can be considered the
critical elements of a data set.
SVM in linearly separable cases
Obviously, infinitely many lines exist to separate the red and green dots in the example above. SVM needs to find the optimal line with the constraint of correctly classifying either class:
1. Follow the constraint: only look at separating hyperplanes (e.g., separating lines), i.e., hyperplanes that classify the classes correctly
Combined Rule
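The slide does not reproduce the rule itself; in the standard formulation, assuming labels $y_i \in \{-1, +1\}$, the two per-class constraints $\mathbf{w}^\top \mathbf{x}_i + b \ge 1$ (for $y_i = +1$) and $\mathbf{w}^\top \mathbf{x}_i + b \le -1$ (for $y_i = -1$) combine into a single rule:

$$y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \quad \text{for all } i$$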
What is margin?
• Let's say we have a hyperplane, line X
• Calculate the perpendicular distance from all 40 dots to line X; this gives 40 different distances
• Out of the 40, the smallest distance is our margin!
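A small NumPy sketch of that computation, assuming line X has the form $\mathbf{w}^\top \mathbf{x} + b = 0$ (the coefficients and the 40 random dots are made up for illustration):

```python
import numpy as np

# Hypothetical line X: w . x + b = 0
w = np.array([2.0, -1.0])
b = 0.5

# Stand-ins for the 40 dots
rng = np.random.default_rng(0)
points = rng.uniform(0, 8, size=(40, 2))

# Perpendicular distance from each dot to the line: |w . x + b| / ||w||
distances = np.abs(points @ w + b) / np.linalg.norm(w)

# The smallest of the 40 distances is the margin
print(distances.min())
```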
What is margin?
• The distance from either dashed line to the solid line is the margin. We can think of this optimal line as the mid-line of the widest strip we can possibly fit between the red and green dots.
To sum up, SVM in the linearly separable case:
1. Constrain/ensure that each observation is on the correct side of the hyperplane
2. Pick the optimal line so that the distance from the closest dots to the hyperplane, the so-called margin, is maximized
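Written as the standard hard-margin optimization problem (with labels $y_i \in \{-1, +1\}$), those two steps become:

$$\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \ \text{for all } i$$

The constraints enforce step 1 (every observation on the correct side), and minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin $2 / \lVert \mathbf{w} \rVert$, which is step 2.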
SVM in linearly non-separable cases
Example
Solution
• Soft Margin: try to find a line to separate the classes, but tolerate one or a few misclassified dots (e.g., the dots circled in red)
• Kernel Trick: try to find a non-linear decision boundary
Soft Margin
• Two types of misclassification are tolerated by SVM under the soft margin:
1. A dot is on the correct side of the decision boundary but inside the margin (it violates the margin)
2. A dot is on the wrong side of the decision boundary altogether
Applying the soft margin, SVM tolerates a few misclassified dots and tries to balance the trade-off between maximizing the margin and minimizing the misclassification.
How much tolerance (softness) we want to allow when finding the decision boundary is an important hyperparameter for the SVM (for both linear and nonlinear solutions). In Sklearn, it is represented as the penalty term 'C'.
The bigger C is, the more penalty the SVM gets when it makes a misclassification. Therefore, the narrower the margin is, and the fewer support vectors the decision boundary will depend on.
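A minimal scikit-learn sketch of the effect of C (the toy data and the two C values are my own choices, not from the slides):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class data
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Big C: heavy penalty for misclassification -> narrower margin,
# typically fewer support vectors
hard = SVC(kernel="linear", C=100.0).fit(X, y)

# Small C: more tolerance -> wider margin, typically more support vectors
soft = SVC(kernel="linear", C=0.01).fit(X, y)

print(len(hard.support_), len(soft.support_))
```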
Kernel Trick
A support vector machine with a polynomial kernel can generate a non-linear decision boundary using polynomial combinations of the original features.
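For instance, a sketch with scikit-learn's polynomial kernel on data that no straight line can separate (the dataset and degree are arbitrary choices):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

# A degree-3 polynomial kernel produces a non-linear decision boundary
clf = SVC(kernel="poly", degree=3, C=1.0).fit(X, y)
print(clf.score(X, y))
```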
Radial Basis Function (RBF) kernel
• The Radial Basis Function kernel acts as a transformer/processor that generates new features by measuring the distance from all other dots to specific dots (the centers). The most popular/basic RBF kernel is the Gaussian Radial Basis Function:
$$K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2\right)$$
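As a quick check of the formula, one can compute the Gaussian RBF by hand and compare it with scikit-learn's rbf_kernel (the two points and gamma are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 1.0]])
x2 = np.array([[2.0, 3.0]])
gamma = 0.5

# K(x, x') = exp(-gamma * ||x - x'||^2)
manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))
library = rbf_kernel(x1, x2, gamma=gamma)[0, 0]
print(manual, library)  # both ~ exp(-4) ≈ 0.0183
```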