Support Vector Machine - Theory
SVMs are useful for analyzing complex data that can't be separated
by a simple straight line. Called nonlinear SVMs, they do this by
using a mathematical trick that transforms the data into a higher-
dimensional space, where it is easier to find a boundary.
Some of the most popular kernel functions for SVMs are the
following:
Linear kernel. This is the simplest kernel function. It computes the
ordinary inner product of two data points in the original space, so it
is suitable when the data is already linearly separable.
RBF kernel. This is the most popular kernel function for SVMs, and
it is effective for a wide range of classification problems.
Sigmoid kernel. This kernel function is based on the hyperbolic
tangent and behaves like the activation function of a neural network,
which can be useful for some classification problems.
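The three kernels above can be sketched as plain functions. This is a minimal sketch; the gamma and r values are assumed hyperparameter choices for illustration, not defaults from any particular library:

```python
import math

def linear_kernel(x, z):
    # Inner product in the original input space.
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2); gamma is an assumed hyperparameter.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=0.1, r=0.0):
    # tanh(gamma * <x, z> + r); gamma and r are assumed hyperparameters.
    return math.tanh(gamma * linear_kernel(x, z) + r)

x, z = (1.0, 2.0), (2.0, 0.0)
print(linear_kernel(x, z))   # 2.0
print(rbf_kernel(x, z))      # between 0 and 1; equals 1 only when x == z
print(sigmoid_kernel(x, z))
```

Note that the RBF kernel always returns a value in (0, 1], reaching 1 exactly when the two points coincide.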
Advantages of SVMs:
SVMs are powerful machine learning algorithms that have the
following advantages:
Effective in cases of limited data. SVMs can work well even when
the training data set is small. The use of support vectors ensures
that only a subset of data points influences the decision boundary,
which can be beneficial when data is limited.
Disadvantages of SVMs:
While support vector machines are popular for the reasons listed
above, they also come with some limitations and potential issues:
Classification
Classification is about sorting things into different groups or
categories based on their characteristics, akin to putting things into
labeled boxes. Sorting emails into spam or nonspam categories is
an example.
Decision boundary
A decision boundary is an imaginary line or surface that separates
different groups or categories in a data set, placing data points into
different regions. For instance, an email decision boundary might
classify an email with more than 10 exclamation marks as "spam" and
an email with 10 or fewer as "not spam."
Grid search
A grid search is a technique used to find the optimal values of
hyperparameters in SVMs. It involves systematically searching
through a predefined set of hyperparameters and evaluating the
performance of the model.
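The idea can be sketched as an exhaustive loop over a small grid of C and gamma values. The validation_score function here is a hypothetical stand-in; in practice it would cross-validate an SVM for each hyperparameter pair:

```python
from itertools import product

def validation_score(C, gamma):
    # Hypothetical toy score surface with its peak at C=1.0, gamma=0.1;
    # a real grid search would train and cross-validate an SVM here.
    return -(C - 1.0) ** 2 - (gamma - 0.1) ** 2

C_grid = [0.1, 1.0, 10.0]
gamma_grid = [0.01, 0.1, 1.0]

# Evaluate every combination and keep the best-scoring one.
best_C, best_gamma = max(product(C_grid, gamma_grid),
                         key=lambda cg: validation_score(*cg))
print(best_C, best_gamma)  # 1.0 0.1
```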
Hyperplane
In n-dimensional space -- that is, a space with many dimensions -- a
hyperplane is defined as an (n-1)-dimensional subspace, a flat
surface that has one less dimension than the space itself. In a two-
dimensional space, the hyperplane is therefore one-dimensional: a
line.
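In two dimensions this can be made concrete: the set of points x satisfying w . x + b = 0 is a line, and the sign of w . x + b tells which side of it a point falls on. The w and b below are assumed values, not the output of any training run:

```python
w = (2.0, -1.0)  # assumed normal vector of the hyperplane
b = 1.0          # assumed offset

def side(point):
    # Sign of w . x + b: 0 means the point lies on the hyperplane
    # (a line in 2-D); +1 and -1 are the two half-spaces.
    value = w[0] * point[0] + w[1] * point[1] + b
    return (value > 0) - (value < 0)

print(side((0.0, 1.0)))  # 0: the point is on the line
print(side((1.0, 0.0)))  # 1: one side
print(side((-2.0, 0.0))) # -1: the other side
```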
Kernel function
A kernel function is a mathematical function used in the kernel trick
to compute the inner product between two data points in the
transformed feature space. Common kernel functions include linear,
polynomial, Gaussian (RBF) and sigmoid.
Kernel trick
The kernel trick is a technique for finding a linear decision boundary
in a higher-dimensional space without explicitly transforming the
data. A kernel function computes the required inner products directly
in the original space, avoiding the computational complexity of an
explicit mapping to the higher dimension.
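A small worked example makes the trick concrete: the degree-2 polynomial kernel (x . z)^2 on 2-D inputs gives exactly the same number as mapping both points into 3-D with the explicit feature map (x1^2, sqrt(2)*x1*x2, x2^2) and taking the inner product there:

```python
import math

def poly2_kernel(x, z):
    # Degree-2 polynomial kernel, computed in the original 2-D space.
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    # The explicit 3-D feature map that this kernel implicitly uses.
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly2_kernel(x, z)                            # (3 + 8)^2 = 121
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))  # 9 + 48 + 64 = 121
print(kernel_value, explicit_value)
```

The kernel reaches the 3-D answer with one multiplication and a square, never materializing the 3-D vectors; that saving is the whole point of the trick.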
Margin
The margin is the distance between the decision boundary and the
support vectors. An SVM aims to maximize this margin to improve
generalization and reduce overfitting.
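For a linear SVM in canonical form, the width of this margin works out to 2 / ||w||, so maximizing the margin is the same as minimizing the norm of the weight vector. The w below is a hypothetical trained weight vector chosen for easy arithmetic:

```python
import math

w = (3.0, 4.0)  # hypothetical weight vector of a trained linear SVM

# Distance between the two margin boundaries: 2 / ||w||.
norm_w = math.sqrt(sum(c * c for c in w))
margin = 2.0 / norm_w
print(margin)  # 2 / 5 = 0.4
```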
One-vs-All
One-vs-All, or OvA, is a technique for multiclass classification using
SVMs. It trains a binary SVM classifier for each class, treating it as
the positive class and all other classes as the negative class.
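At prediction time, OvA picks the class whose binary classifier is most confident. The sketch below uses hypothetical per-class decision functions in place of real trained SVMs:

```python
# Hypothetical decision functions, one per class; in a real OvA setup
# each would be a binary SVM scoring "this class vs. all the rest."
scorers = {
    "cat":  lambda x: x[0] - x[1],
    "dog":  lambda x: x[1] - x[0],
    "bird": lambda x: -abs(x[0]) - abs(x[1]),
}

def predict(x):
    # The class with the highest binary-classifier score wins.
    return max(scorers, key=lambda label: scorers[label](x))

print(predict((3.0, 1.0)))  # cat
print(predict((1.0, 3.0)))  # dog
```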
One-vs-One
One-vs-One, or OvO, is a technique for multiclass classification
using SVMs. It trains a binary SVM classifier for each pair of classes
and combines predictions to determine the final class.
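OvO trains K*(K-1)/2 classifiers for K classes and typically combines them by majority vote. The pairwise winners below are hypothetical outputs for a single input, standing in for real pairwise SVMs:

```python
from itertools import combinations
from collections import Counter

classes = ["cat", "dog", "bird", "fish"]
pairs = list(combinations(classes, 2))
print(len(pairs))  # 4 classes -> 4*3/2 = 6 pairwise classifiers

# Hypothetical winner of each pairwise classifier for one input.
pairwise_winner = {
    ("cat", "dog"): "cat", ("cat", "bird"): "cat", ("cat", "fish"): "cat",
    ("dog", "bird"): "dog", ("dog", "fish"): "fish", ("bird", "fish"): "bird",
}

# Majority vote across all pairwise decisions picks the final class.
votes = Counter(pairwise_winner[p] for p in pairs)
print(votes.most_common(1)[0][0])  # cat, with 3 of the 6 votes
```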
Regression
Regression is predicting or estimating a numerical value based on
other known information. It's similar to making an educated guess
based on given patterns or trends. Predicting the price of a house
based on its size, location and other features is an example.
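Support vector regression adapts this idea with an epsilon-insensitive loss: prediction errors smaller than a tolerance epsilon cost nothing, and only larger deviations are penalized. A minimal sketch, with epsilon as an assumed hyperparameter:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    # Errors inside the epsilon "tube" around the fit are ignored;
    # only the excess beyond epsilon is penalized.
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(10.0, 10.3))  # inside the tube -> 0.0
print(epsilon_insensitive_loss(10.0, 11.2))  # 1.2 error, 0.7 beyond the tube
```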
Regularization
Regularization is a technique used to prevent overfitting in SVMs.
Regularization introduces a penalty term in the objective function,
encouraging the algorithm to find a simpler decision boundary rather
than fitting the training data perfectly.
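For a soft-margin linear SVM, this objective can be written as 0.5 * ||w||^2 + C * (sum of hinge losses), where C controls the trade-off. The weights and data below are assumed values chosen so the arithmetic is easy to follow:

```python
def hinge(y, score):
    # Hinge loss: zero once a point is correctly classified with margin >= 1.
    return max(0.0, 1.0 - y * score)

def objective(w, b, data, C=1.0):
    # Soft-margin SVM objective: 0.5 * ||w||^2 + C * sum of hinge losses.
    # Larger C penalizes training errors more (less regularization).
    penalty = 0.5 * sum(c * c for c in w)
    score = lambda x: sum(wc * xc for wc, xc in zip(w, x)) + b
    loss = sum(hinge(y, score(x)) for x, y in data)
    return penalty + C * loss

# Two well-separated points and one inside the margin (hinge loss 0.5).
data = [((2.0, 0.0), 1), ((-2.0, 0.0), -1), ((0.5, 0.0), 1)]
print(objective((1.0, 0.0), 0.0, data, C=1.0))  # 0.5 + 1.0 * 0.5 = 1.0
```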
Support vector
A support vector is a data point or node lying closest to the decision
boundary or hyperplane. These points play a vital role in defining the
decision boundary and the margin of separation.
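For a linear SVM in canonical form, support vectors are exactly the points on or inside the margin, i.e., those with y * (w . x + b) <= 1. The w, b, and points below are assumed values for illustration:

```python
w, b = (1.0, 0.0), 0.0  # hypothetical trained linear SVM

points = [((2.0, 0.0), 1), ((1.0, 0.5), 1),
          ((-1.0, 0.0), -1), ((-3.0, 1.0), -1)]

# Support vectors: points with functional margin y * (w . x + b) <= 1.
support_vectors = [
    x for x, y in points
    if y * (sum(wc * xc for wc, xc in zip(w, x)) + b) <= 1.0 + 1e-9
]
print(support_vectors)  # [(1.0, 0.5), (-1.0, 0.0)]
```

The two points far from the boundary drop out; only the nearby pair shapes the decision boundary, which is why SVMs can do well with limited data.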