2.1 SVM

Support Vector Machines (SVM) is a supervised machine learning algorithm capable of performing classification, regression and outlier detection. SVM finds a hyperplane that separates classes with the maximum margin. The margin is the largest region separating the classes with no points inside, bounded by lines parallel to the decision boundary; the data points lying closest to these boundaries are called support vectors. SVM solves a constrained optimization problem to find the hyperplane that generalizes well to unseen data.


Support Vector Machines

 Support Vector Machine (SVM) is a supervised machine learning algorithm capable of performing classification, regression and even outlier detection. The linear SVM classifier works by drawing a straight line between two classes. It is effective in high-dimensional spaces.
 SVM takes care of outliers better than KNN. If the amount of training data is much larger than the number of features (m >> n), KNN is better than SVM; SVM outperforms KNN when there are many features and little training data.
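As a concrete starting point, here is a minimal linear SVM classifier in scikit-learn (the synthetic dataset and parameter values are illustrative assumptions, not taken from the slides):

# Minimal linear SVM sketch (illustrative; dataset and values are assumptions).
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two well-separated clusters stand in for a linearly separable dataset.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel draws a straight line (hyperplane) between the two classes.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))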
The Margin and Support Vectors
• The margin is the largest region we can put between the classes without there being any points inside, where the region is bounded by two lines that are parallel to the decision boundary.
• This classifier has the imaginative name of the maximum margin (linear) classifier. The data points in each class that lie closest to the classification line have a name as well: they are called support vectors.

• Using the argument that the best classifier is the one that
goes through the middle of no-man’s land, we can now
make two arguments: first that the margin should be as
large as possible, and second that the support vectors are
the most useful data points because they are the ones that
we might get wrong. This leads to an interesting feature
of these algorithms:

• After training we can throw away all of the data except for the support vectors, and use them for classification, which is a useful saving in data storage.
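In scikit-learn the fitted support vectors are exposed directly, so this saving is easy to see; a brief sketch continuing the illustrative classifier above:

# After fitting, prediction depends only on the support vectors (plus the
# learned coefficients), so the remaining training data can be discarded.
print("training points:", len(X_train))
print("support vectors:", len(clf.support_vectors_))
print(clf.support_vectors_[:3])  # the stored points themselves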
A Constrained Optimization Problem
• This technique is used to decide whether a classifier is good or bad. We write down constraints that any good classifier should satisfy, and then look for the best classifier that satisfies them; this is known as a constrained optimization problem. The constraints just need to check each data point for the classification condition.
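For a linear classifier with weights w and bias b, the condition each data point must satisfy is usually written as follows (a standard textbook formulation, added here since the slide does not spell it out):

minimize (1/2)‖w‖² over w, b
subject to y_i (wᵀx_i + b) ≥ 1 for all i.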
• A convex region is one where, if we take any two points in the region and join them with a straight line, every point on that line also lies inside the region.
• A function is convex if every straight line that links two points on the curve does not intersect the curve anywhere else. In the accompanying figure, the function on the left is convex, but the one on the right is not, as the dashed line shows.
Slack Variables for Non-Linearly Separable Problems

• What we have done so far has assumed that the dataset is linearly separable. We know that this is not always the case, and if we have a non-linearly separable dataset then we cannot satisfy the constraints for all of the data points.
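The standard remedy (again a textbook formulation rather than the slide's own wording) is to add a slack variable ξ_i ≥ 0 for each point, penalized by a parameter C:

minimize (1/2)‖w‖² + C Σ_i ξ_i over w, b, ξ
subject to y_i (wᵀx_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i.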
• These slack variables tell us that, when comparing classifiers, we should consider the case where one classifier makes a mistake by putting a point just on the wrong side of the line, while another puts the same point a long way onto the wrong side of the line.
• The first classifier is better than the second, because its mistake is not as serious, so we should include this information in our minimization criterion (the sketch below shows this trade-off in code).
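In scikit-learn the slack penalty is the parameter C; a small sketch, reusing the data from the earlier example (the C values are chosen purely for illustration):

# Large C punishes slack heavily (narrow margin, few mistakes tolerated);
# small C accepts more slack in exchange for a wider margin.
from sklearn.svm import SVC

for C in (0.01, 1.0, 100.0):
    soft = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C}: support vectors={len(soft.support_vectors_)}, "
          f"accuracy={soft.score(X_test, y_test):.2f}")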
By modifying the features we hope to find spaces where
the data are linearly separable.
KERNELS
 A kernel is one of a set of mathematical functions used in a Support Vector Machine to provide a window through which to manipulate the data. A kernel function transforms the training data so that a non-linear decision surface becomes a linear equation in a higher-dimensional space.

 The choice of the kernel and the kernel/regularisation parameters can be automated by optimizing cross-validation-based model selection (or by using the radius-margin or span bounds).

 A Mercer kernel, or positive definite kernel, is symmetric by definition (i.e., K = Kᵀ). Many kernel methods do not require us to compute φ(x) explicitly; instead we compute the n × n Gram matrix using the kernel function κ(·, ·).
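A brief sketch of building such a Gram matrix with scikit-learn's RBF kernel helper and verifying the Mercer properties (the random data and gamma value are illustrative assumptions):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.RandomState(0).randn(5, 3)  # 5 points in 3 dimensions
K = rbf_kernel(X, X, gamma=0.5)           # n x n Gram matrix: K[i, j] = kappa(x_i, x_j)

print(K.shape)                                   # (5, 5)
print(np.allclose(K, K.T))                       # True: symmetric, K = K^T
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))   # True: positive semi-definite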
Kernel Functions
The different kinds of kernel functions are visualized in the
graphic below.
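The graphic itself does not survive here; for reference, the kernels most commonly compared in such plots are the following standard definitions (not taken from the slide):

Linear: κ(x, x′) = xᵀx′
Polynomial (degree d): κ(x, x′) = (xᵀx′ + c)ᵈ
RBF (Gaussian): κ(x, x′) = exp(−γ‖x − x′‖²)
Sigmoid: κ(x, x′) = tanh(γ xᵀx′ + c)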
Mercer Kernel
The Support Vector Machine Algorithm
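The algorithm slide is image-only in this copy; the standard dual-form training procedure (a textbook outline, offered as an assumption about what the slide showed) is:

1. Compute the Gram matrix K with entries K_ij = κ(x_i, x_j) over the training set.
2. Solve the quadratic program: maximize Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j K_ij subject to 0 ≤ λ_i ≤ C and Σ_i λ_i y_i = 0.
3. Recover the bias b from any support vector, i.e. any point with 0 < λ_i < C.
4. Classify a new point x as sign(Σ_i λ_i y_i κ(x_i, x) + b); only the support vectors (points with λ_i > 0) contribute to this sum.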
Advantages and Disadvantages of SVM
Advantages:
• SVM works relatively well when there is a clear margin of separation between
classes.
• SVM is more effective in high dimensional spaces.
• SVM is effective in cases where the number of dimensions is greater than the
number of samples.
• SVM is relatively memory efficient.

Disadvantages:
• SVM algorithm is not suitable for large data sets.
• SVM does not perform very well when the data set is noisy, i.e. when the target classes overlap.
• In cases where the number of features for each data point exceeds the number of training samples, the SVM will underperform.
• As the support vector classifier works by placing data points above and below the classifying hyperplane, there is no direct probabilistic explanation for the classification.
Multiclass Classification using Support Vector Machine
 SVMs are inherently binary classifiers, dividing data points into class 1 or class 0. For multiclass classification, the same principle is reused.
 The multiclass problem is broken down into multiple binary classification cases, an approach also called one-vs-one. In scikit-learn, one-vs-one is not the default and needs to be selected explicitly (as can be seen in the code below).
 One-vs-rest is set as the default. It divides the data points into class x and the rest, so that each class in turn is distinguished from all of the other classes.
The number of classifiers necessary for one-vs-one multiclass classification can be computed with the following formula (with n being the number of classes): n(n − 1)/2.
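The code the slides refer to is not reproduced; a minimal sketch of both strategies in scikit-learn (the iris dataset is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes

# One-vs-one: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
# One-vs-rest: one binary classifier per class against all the others.
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print("one-vs-one classifiers:", len(ovo.estimators_))   # n(n - 1)/2 = 3
print("one-vs-rest classifiers:", len(ovr.estimators_))  # n = 3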
