Support Vector Machine Final
From the figure above it's very clear that there are multiple lines (our
hyperplane here is a line because we are considering only two input
features, x1 and x2) that separate our data points, i.e. classify them
into red and blue circles. So how do we choose the best line or, in
general, the best hyperplane that separates our data points?
Selecting the best hyperplane:
One reasonable choice for the best hyperplane is the one that
represents the largest separation, or margin, between the two classes.
So we choose the hyperplane whose distance to the nearest data point
on each side is maximized. If such a hyperplane exists, it is known as
the maximum-margin hyperplane, and its margin is called a hard margin.
So from the above figure, we choose L2.
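The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four data points and the three candidate lines (`line_a`, `line_b`, `line_c`) are made up for the example and are not the lines from the figure. Each line is written as w.x + b = 0, and the candidate with the largest distance to its nearest point is picked.

```python
import math

def signed_score(w, b, point):
    """Value of w.x + b; its sign tells which side of the line the point is on."""
    return w[0] * point[0] + w[1] * point[1] + b

def separates(w, b, points, labels):
    """True if every point lies on the side of the line matching its label."""
    return all(y * signed_score(w, b, p) > 0 for p, y in zip(points, labels))

def margin(w, b, points):
    """Distance from the line to the nearest data point."""
    nearest = min(abs(signed_score(w, b, p)) for p in points)
    return nearest / math.hypot(w[0], w[1])

# Hypothetical toy data: -1 stands for red, +1 for blue.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidate lines, each given as (w, b) for w.x + b = 0.
candidates = {
    "line_a": ((1.0, 0.0), -2.5),  # vertical line x1 = 2.5
    "line_b": ((1.0, 1.0), -5.5),  # diagonal line x1 + x2 = 5.5
    "line_c": ((0.0, 1.0), -2.5),  # horizontal line x2 = 2.5
}

# Keep only the lines that separate the classes, then take the widest margin.
margins = {
    name: margin(w, b, points)
    for name, (w, b) in candidates.items()
    if separates(w, b, points, labels)
}
best = max(margins, key=margins.get)
print(best, round(margins[best], 3))
```

Note that this only compares a fixed set of candidates. An actual SVM searches over all possible hyperplanes to find the one with the largest margin.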
Let’s consider a scenario like the one shown below.
Here we have one blue ball inside the region of the red balls. So how
does SVM classify the data? The blue ball among the red ones is treated
as an outlier of the blue class. The SVM algorithm can tolerate such an
outlier and still find the hyperplane that maximizes the margin for the
remaining points. In this sense, SVM is fairly robust to outliers.
For this kind of data, SVM finds the maximum margin as it did for the
previous data sets, but it also adds a penalty each time a point crosses
the margin. The margin in such cases is called a soft margin. When the
data set calls for a soft margin, the SVM tries to minimize
(1/margin + λ(∑penalty)), where λ controls how heavily violations are
penalized. Hinge loss is a commonly used penalty: if there is no
violation, the hinge loss is zero; if there is a violation, the hinge
loss is proportional to the distance of the violation.
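To make the soft-margin objective concrete, here is a minimal sketch that evaluates it for one fixed line. It follows the simplified form used in the text, 1/margin plus a weighted sum of hinge losses. The data points, the line (w, b), and the weight `lam` are assumptions chosen for illustration. The line is assumed to be scaled so that the margin boundaries are w.x + b = +1 and w.x + b = -1, which makes the margin equal to 1/||w||.

```python
import math

def hinge_loss(w, b, point, label):
    """Zero when the point is on the correct side of its margin boundary,
    otherwise grows linearly with how far the point violates it."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return max(0.0, 1.0 - label * score)

def soft_margin_objective(w, b, points, labels, lam):
    """1/margin + lam * (sum of hinge losses), as in the text above."""
    margin = 1.0 / math.hypot(w[0], w[1])
    penalty = sum(hinge_loss(w, b, p, y) for p, y in zip(points, labels))
    return 1.0 / margin + lam * penalty

# Hypothetical line and data: -1 stands for red, +1 for blue.
w, b = (0.5, 0.5), -2.5
points = [(1.0, 1.0), (4.0, 4.0), (3.0, 3.0), (2.5, 2.0)]
labels = [-1, 1, 1, 1]

losses = [hinge_loss(w, b, p, y) for p, y in zip(points, labels)]
objective = soft_margin_objective(w, b, points, labels, lam=1.0)
print(losses, objective)
```

The first two points lie outside the margin and contribute no loss. The third sits inside the margin, and the fourth is on the wrong side of the line, so it receives the largest penalty. Many textbooks write the first term as ||w||²/2 rather than 1/margin; both shrink as the margin grows.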
Until now, we were talking about linearly separable data (the groups of
blue balls and red balls can be separated by a straight line).
What do we do if the data are not linearly separable?
Say our data is as shown in the figure above. SVM solves this by
creating a new variable using a kernel. We call a point on the line x_i
and create a new variable y_i as a function of its distance from the
origin o. If we plot this, we get something like what is shown below.
In this case, the new variable y is created as a function of distance
from the origin. A non-linear function that creates a new variable is
referred to as a kernel.
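The idea of adding a variable based on distance from the origin can be tried on a tiny one-dimensional example. In this sketch the points, their colours, and the choice y = x² (the squared distance from the origin) are all assumptions made for illustration. The helper `separable_by_threshold` is a hypothetical name. It checks whether a single cut-off value can split the two colours.

```python
def separable_by_threshold(values, labels):
    """True if one cut-off on `values` puts each class entirely on one side.
    After sorting by value, the label may change at most once."""
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

# Hypothetical 1-D data: red points near the origin, blue points on both sides.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
labels = ["blue", "blue", "red", "red", "blue", "blue"]

# New variable: squared distance from the origin.
ys = [x * x for x in xs]

print(separable_by_threshold(xs, labels))  # original variable x
print(separable_by_threshold(ys, labels))  # new variable y = x^2
```

On the original x axis no single cut-off works, because blue points sit on both sides of the red ones. On the new y axis all red points have small values and all blue points have large values, so a single threshold separates them.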
SVM Kernel:
The SVM kernel is a function that takes a low-dimensional input space and
transforms it into a higher-dimensional space, i.e. it converts a
non-separable problem into a separable problem. It is mostly useful in
non-linear separation problems. Strictly speaking, the kernel computes how
similar two points are in the higher-dimensional space without building that
space explicitly. Simply put, the kernel performs some fairly complex data
transformations and then finds a way to separate the data based on the
labels or outputs defined.
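A small sketch can show what a kernel computes. It uses the degree-2 polynomial kernel K(x, z) = (x.z)² as an assumed example (the text does not name a specific kernel) and checks that it gives the same number as first mapping both points into a three-dimensional space and then taking an ordinary dot product there. The function names are hypothetical.

```python
import math

def dot(a, b):
    """Ordinary dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))

def feature_map(x):
    """Explicit map from 2-D input to the 3-D space of degree-2 terms."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly in the 2-D input space."""
    return dot(x, z) ** 2

x = (1.0, 2.0)
z = (3.0, 4.0)

via_kernel = poly_kernel(x, z)                   # stays in 2-D
via_mapping = dot(feature_map(x), feature_map(z))  # goes through 3-D
print(via_kernel, via_mapping)
```

Both routes agree up to floating-point rounding, which is why the SVM can use the kernel value and never needs to construct the higher-dimensional points themselves.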
Advantages of SVM:
Effective in high-dimensional cases.
It is memory efficient, as it uses only a subset of the training points,
called support vectors, in the decision function.
Different kernel functions can be specified for the decision function,
and it is possible to specify custom kernels.
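The points about support vectors and swappable kernels can be illustrated with a hand-written decision function. This is a sketch under assumptions, not a trained model from any library. It uses a one-dimensional toy problem in which the closest points of the two classes are x = -1 (class -1) and x = +1 (class +1). For that case the hard-margin solution has weight 0.5 on each support vector and zero bias. The kernel is passed in as an ordinary function, so a custom one could be substituted.

```python
def linear_kernel(a, b):
    """Plain dot product; any function of two points could be used instead."""
    return sum(p * q for p, q in zip(a, b))

def decision_function(x, support_vectors, coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias.
    Each coef_i is the support vector's weight times its label (+1 or -1).
    Only the support vectors appear here, not the full training set."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias

def predict(x, support_vectors, coefs, bias, kernel):
    """Class label from the sign of the decision function."""
    score = decision_function(x, support_vectors, coefs, bias, kernel)
    return 1 if score >= 0 else -1

# Toy model (assumed values, see the paragraph above).
support_vectors = [(-1.0,), (1.0,)]
coefs = [-0.5, 0.5]  # weight 0.5 times labels -1 and +1
bias = 0.0

score = decision_function((2.0,), support_vectors, coefs, bias, linear_kernel)
label = predict((-3.0,), support_vectors, coefs, bias, linear_kernel)
print(score, label)
```

However many training points lie farther from the boundary, only the two support vectors need to be stored to make predictions.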
In linear SVM, the two classes are linearly separable, i.e. a single straight
line is able to separate them. But imagine data in which no straight line
can separate the classes, for example when the points of one class surround
those of the other. Non-linear SVMs come in handy for this kind of data.
(Having three classes does not by itself make the data non-linearly
separable; multi-class problems are usually handled by combining several
two-class SVMs.)