Support Vector Machine (SVM) & Kernel Functions [1]
Reference [1], Chapter 7, section 7.3
Reference [3], Chapter 6, page 292
[1]: Flach, P. (2015). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press.
[3]: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer-Verlag.
Support Vector Machine (SVM)
The Support Vector Machine (SVM) is a supervised learning algorithm mostly used
for classification, but it can also be used for regression.
Usually, a learning algorithm tries to learn the most common characteristics of a class
(what differentiates one class from another), and classification is based on those learnt
representative characteristics (so classification is based on differences between
classes). SVM works the other way around:
it finds the most similar examples between the classes. Those become the support vectors.
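To make this concrete, here is a minimal sketch (assuming scikit-learn is available; the toy data is invented purely for illustration) that fits a linear SVM and lists which training points end up as support vectors:

```python
# Minimal sketch: fit a linear SVM on invented toy data and inspect
# which training points become the support vectors.
from sklearn import svm

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]]  # two small clusters
y = [0, 0, 0, 1, 1, 1]                                 # their class labels

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# The decision boundary is defined only by the borderline examples
print(clf.support_vectors_)  # coordinates of the support vectors
print(clf.support_)          # indices of those points in X
```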
Example
As an example, let’s consider two classes, apples and lemons.
Other algorithms will learn the most evident, most representative characteristics of apples
and lemons, for example that apples are green and rounded while lemons are yellow and
elliptical.
In contrast, SVM will search for apples that are very similar to lemons, for example apples
that are yellow and elliptical. This will be one support vector. The other support
vector will be a lemon that is similar to an apple (green and rounded).
Based on these support vectors, the algorithm tries to find the best hyperplane that
separates the classes.
Here (comparing candidate hyperplanes A, B and C), the margin for hyperplane C is
larger than for both A and B. Hence, we choose C as the right hyperplane.
The SVM algorithm has the property of ignoring outliers and finding the
hyperplane with the maximum margin. Hence, we can say that
SVM classification is robust to outliers.
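One way to see this robustness is a soft-margin sketch (assuming scikit-learn's SVC; the clusters and the outlier are invented): the regularization parameter C controls how much influence a single point, including an outlier, has on the hyperplane.

```python
# Sketch: with a soft margin, the parameter C controls how much influence
# an outlier has on the hyperplane. Small C -> wide margin that ignores
# the outlier; large C -> narrower margin pulled toward it.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),    # class 0 cluster
               rng.normal(5.0, 1.0, size=(20, 2))])   # class 1 cluster
y = np.array([0] * 20 + [1] * 20)

# One class-0 outlier placed deep inside the class-1 region
X = np.vstack([X, [[5.0, 5.0]]])
y = np.append(y, 0)

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin width = {margin:.3f}, "
          f"support vectors = {clf.n_support_.sum()}")
```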
Identifying the Right Hyperplane (Scenario V)
In the scenario below, we can't draw a linear hyperplane between
the two classes, so how does SVM classify them?
So far, we have only looked at linear hyperplanes.
The SVM kernel is a function that takes a low-dimensional input space and transforms it into a
higher-dimensional space, i.e. it converts a non-separable problem into a separable one. It is mostly
useful in non-linear separation problems.
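As a rough sketch of this idea (assuming scikit-learn; make_circles simply generates a toy dataset that no straight line can separate), an RBF kernel handles the non-linear case:

```python
# Sketch: two concentric circles cannot be separated by a line in 2D,
# but an RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf").fit(X, y)

print("linear kernel training accuracy:", linear_clf.score(X, y))  # ~0.5
print("rbf kernel training accuracy:   ", rbf_clf.score(X, y))     # ~1.0
```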
Terminologies Used in SVM
The points closest to the hyperplane are called the support vector points, and the
distance of these vectors from the hyperplane is called the margin.
The basic intuition to develop here is that the farther the support vector points lie from the
hyperplane, the higher the probability of correctly classifying points in their respective
regions or classes.
Support vector points are critical in determining the hyperplane, because if their position
changes, the hyperplane's position is altered. Technically, this hyperplane can also be
called the margin-maximizing hyperplane.
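The sketch below (again assuming scikit-learn, with invented, linearly separable toy data) checks this terminology numerically: the support vectors are exactly the training points closest to the hyperplane, sitting on the margin at distance 1/||w||.

```python
# Sketch: for separable data with a (near) hard margin, the support
# vectors are the points with the smallest distance to the hyperplane.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 0.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 4.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
w_norm = np.linalg.norm(clf.coef_[0])

# Signed distance of every training point from the hyperplane
distances = clf.decision_function(X) / w_norm
print(np.round(distances, 3))

# The support vectors have the smallest |distance|, equal to 1 / ||w||
print("support vector indices:", clf.support_)
print("margin (distance of SVs):", round(1.0 / w_norm, 3))
```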
Hyperplane (Decision Surface)
The hyperplane is the decision surface used to separate the classes.
In 2D, this separating function is a line; in 3D, it is a plane; similarly, the function that
separates points in higher dimensions is called a hyperplane.
Now that you know about the hyperplane, let's move back to SVM.
Let's say there are "m" dimensions; then the equation of the hyperplane in m-dimensional space can be given as
w_0 + w_1 x_1 + w_2 x_2 + ... + w_m x_m = 0
where the w_i are the components of the weight vector w, the x_i are the input variables, and w_0 is the bias term.
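As a tiny illustration (the weight values below are chosen by hand, not learned), classifying a point just means checking the sign of this hyperplane function:

```python
# Illustration: evaluate the hyperplane function w0 + w1*x1 + ... + wm*xm
# at a point and classify it by the sign of the result.
import numpy as np

w = np.array([1.0, -2.0])   # w1, w2 (hand-picked, purely illustrative)
w0 = 0.5                    # bias term
x = np.array([3.0, 1.0])    # a point to classify

score = w0 + np.dot(w, x)   # value of the hyperplane function at x
label = 1 if score >= 0 else -1
print(score, label)          # points on opposite sides get opposite signs
```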
To find the optimal separating hyperplane, the idea is to:
1. select two hyperplanes (in 2D) that separate the data with no points between them
(red lines)
2. maximize their distance (the margin)
3. the average line (here, the line halfway between the two red lines) will be the decision
boundary
This is nice and easy, but finding the best margin is a non-trivial optimization problem (it is
easy in 2D, when we have only two attributes, but what if we have N dimensions, with N a very big
number?).
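For reference, this margin maximization can be written in its standard hard-margin form (this formulation is not spelled out in the text above, but it is the usual way the problem is posed): maximizing the margin 2 / ||w|| is equivalent to
minimize (1/2) ||w||^2
subject to y_i (w_0 + w_1 x_{i1} + ... + w_m x_{im}) >= 1 for every training example i, with labels y_i in {-1, +1}.
This is a quadratic programming problem, which is why practical SVM solvers work with its dual form (where the kernel trick also enters) rather than searching for the two red lines directly.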