
Support Vector Machines

Support vector machines (SVMs) are a popular machine learning algorithm that constructs hyperplanes in a multidimensional space to classify data points. SVMs find the optimal hyperplane that maximally separates the data by class while minimizing errors. This optimal hyperplane is determined by the support vectors, the data points closest to the hyperplane. SVMs can use kernels to handle non-linear decision boundaries by projecting the data into higher-dimensional spaces where the classes separate more easily.


Support Vector Machines
INTRODUCTION

• One of the most popular and talked-about machine learning algorithms
• A high-performing algorithm that requires little tuning
• A statistical algorithm
Maximal-Margin Classifier
• A hypothetical classifier that best explains how SVM works
• The numeric input variables (x) in your data (the columns) form an n-dimensional space
• In SVM, a hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1
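• In two dimensions the hyperplane is simply a line, which in its usual form can be written as:
  B0 + (B1 × X1) + (B2 × X2) = 0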

• Where the coefficients (B1 and B2) that determine the slope of the line and the intercept (B0) are found by the learning algorithm, and X1 and X2 are the two input variables
• By plugging input values into the line equation, we can calculate whether a new point is above or below the line
The condition for a new point is:
• Above the line, the equation returns a value greater than 0 and the point belongs to the first class (class 0)
• Below the line, the equation returns a value less than 0 and the point belongs to the second class (class 1)
• A point close to the line returns a value close to zero and may be difficult to classify
• If the magnitude of the value is large, the model may have more confidence in the prediction
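A minimal Python sketch of this decision rule, using made-up (not learned) coefficients just to illustrate the sign test:

```python
# Hypothetical coefficients for illustration only; in practice they are learned from data.
B0, B1, B2 = -0.5, 1.0, 1.0

def classify(x1, x2):
    value = B0 + (B1 * x1) + (B2 * x2)
    # Per the convention above: > 0 -> class 0 (above the line), < 0 -> class 1 (below the line).
    return 0 if value > 0 else 1

print(classify(2.0, 1.0))   # value = 2.5, far from the line -> confident class 0
print(classify(0.3, 0.1))   # value = -0.1, close to the line -> low-confidence class 1
```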
• The distance between the line and the closest data points is referred to as the margin
• The best or optimal line that can separate the two classes is the line that has the largest margin (the Maximal-Margin hyperplane)
• The margin is calculated as the perpendicular distance from the line to only the closest points
• Only these points are relevant in defining the line and in the construction of the classifier; they are called the support vectors
• The hyperplane is learned from training data using an optimization procedure that maximizes the margin
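A minimal sketch of learning a maximal-margin classifier, assuming scikit-learn is available; a linear SVC with a large penalty behaves like a (nearly) hard-margin classifier on separable data:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs, so a maximal-margin line exists.
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

model = SVC(kernel="linear", C=1000)  # large penalty ~ (nearly) hard margin
model.fit(X, y)

print(model.coef_, model.intercept_)  # the learned B1, B2 and B0 of the separating line
print(model.support_vectors_)         # only these closest points define the hyperplane
```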
Soft Margin Classifier
• Real data is messy and is difficult to separate perfectly with a hyperplane
• The constraint of maximizing the margin of the line that separates the classes must be relaxed (the soft margin classifier)
• This change allows some points in the training data to violate the separating line
• An additional set of coefficients is introduced that gives the margin wiggle room in each dimension
• These coefficients are called slack variables
• This increases the complexity of the model, as there are more parameters for the model to fit to the data
• A tuning parameter, called simply C, is introduced that defines the magnitude of the wiggle allowed across all dimensions
• The C parameter defines the amount of violation of the margin allowed
• The larger the value of C, the more violations of the hyperplane are permitted

• During the learning of the hyperplane from data, all training instances that lie within the distance of the margin affect the placement of the hyperplane and are referred to as support vectors
• Because C affects the number of instances that are allowed to fall within the margin, C influences the number of support vectors used by the model
• The smaller the value of C, the more sensitive the algorithm is to the training data (higher variance and lower bias)
• The larger the value of C, the less sensitive the algorithm is to the training data (lower variance and higher bias)
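A minimal sketch of how margin softness changes the number of support vectors, assuming scikit-learn is available. Note that scikit-learn's SVC uses C as a penalty on margin violations, so a smaller value there permits more violations, which is the inverse of the convention used in these slides:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes, so a perfect separation is impossible.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=1)

for penalty in (0.01, 1.0, 100.0):
    model = SVC(kernel="linear", C=penalty).fit(X, y)
    # A smaller penalty -> softer margin -> more margin violations -> more support vectors.
    print(penalty, len(model.support_vectors_))
```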
Support Vector Machines (Kernels)

• The SVM algorithm is implemented in practice using a kernel
• The learning of the hyperplane in linear SVM is done by transforming the problem using some linear algebra
• Linear SVM can be rephrased using the inner product of any two given observations, rather than the observations themselves
• For example, the inner product of the vectors [2, 3] and [5, 6] is (2 × 5) + (3 × 6) = 28
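A quick check of this inner product with NumPy (assumed available):

```python
import numpy as np

print(np.dot([2, 3], [5, 6]))  # (2 * 5) + (3 * 6) = 28
```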
• The equation for making a prediction for a new input, using the dot product between the input (x) and each support vector (xi), is calculated as follows:
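  In its standard form, this prediction equation is:
  f(x) = B0 + sum( ai × (x · xi) ), summed over all support vectors xi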

• This is an equation that involves calculating the inner products of a new input vector (x) with all support vectors in the training data
• The coefficients B0 and ai (for each input) must be estimated from the training data by the learning algorithm
Kernels
Linear Kernel SVM
• The dot product is called the kernel and can be re-written as:
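  K(x, xi) = sum( x × xi )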

• The kernel defines the similarity or distance measure between new data and the support vectors
• The dot product is the similarity measure used for linear SVM (a linear kernel), because the distance is a linear combination of the inputs
Polynomial Kernel SVM
• Instead of the dot product, we can use a polynomial kernel, for example:
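  A common form of the polynomial kernel is:
  K(x, xi) = (1 + sum( x × xi ))^d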

• Where the degree of the polynomial must be specified manually to the learning algorithm
• When d = 1 this is the same as the linear kernel
• The polynomial kernel allows for curved lines in the input space
Radial Kernel SVM
• We can also use a more complex radial kernel, for example:
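  A common form of the radial (RBF) kernel is:
  K(x, xi) = exp( -gamma × sum( (x - xi)^2 ) )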

• Where gamma is a parameter that must be specified to the learning algorithm
• A good default value for gamma is 0.1; gamma often lies in the range 0 < gamma < 1
• The radial kernel is very local and can create complex regions within the feature space, like closed polygons in a two-dimensional space
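A minimal sketch of switching between these kernels with scikit-learn (assumed available); the curved kernels can separate data that a straight line cannot:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for kernel, params in [("linear", {}), ("poly", {"degree": 3}), ("rbf", {"gamma": 0.1})]:
    model = SVC(kernel=kernel, **params).fit(X, y)
    print(kernel, round(model.score(X, y), 3))  # training accuracy for each kernel
```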
How to Learn an SVM Model
• The SVM model needs to be solved using an optimization procedure
• A numerical optimization procedure can be used to search for the coefficients of the hyperplane
• This is inefficient and is not the approach used in widely used SVM implementations such as LIBSVM
• A variation of gradient descent called sub-gradient descent can be used
• There are specialized optimization procedures that re-formulate the optimization problem as a Quadratic Programming problem
• The most popular method for fitting SVM is the Sequential Minimal Optimization (SMO) method, which is very efficient
• It breaks the problem down into sub-problems that can be solved analytically (by calculating) rather than numerically (by searching or optimizing)
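As a rough illustration of the sub-gradient descent idea mentioned above (a sketch, not how LIBSVM or SMO actually work), the following fits a linear SVM by minimizing the regularized hinge loss with NumPy:

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style sub-gradient descent on the regularized hinge loss.
    Labels y must be coded as -1 / +1. Illustrative only, not a production solver."""
    n_samples, n_features = X.shape
    w, b, t = np.zeros(n_features), 0.0, 0
    for _ in range(epochs):
        for i in np.random.permutation(n_samples):
            t += 1
            lr = 1.0 / (lam * t)              # decaying step size
            if y[i] * (X[i] @ w + b) < 1:     # point violates the margin
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:
                w = (1 - lr * lam) * w
    return w, b

# Toy usage: two Gaussian blobs labelled -1 / +1.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])
w, b = fit_linear_svm(X, y)
print(np.mean(np.sign(X @ w + b) == y))       # training accuracy of the sketch
```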
Preparing Data For SVM

How to best prepare your training data when learning an SVM model:
• Numerical Inputs:
• SVM assumes that inputs are numeric
• If we have categorical inputs, we may need to convert them to binary dummy variables (one variable for each category), as in the sketch below
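A minimal sketch of converting a categorical column to binary dummy variables, assuming pandas is available (scikit-learn's OneHotEncoder achieves the same thing):

```python
import pandas as pd

data = pd.DataFrame({"height": [1.7, 1.6, 1.8],
                     "colour": ["red", "green", "red"]})

# One binary dummy variable per category; the numeric column is left untouched.
encoded = pd.get_dummies(data, columns=["colour"])
print(encoded.columns.tolist())  # ['height', 'colour_green', 'colour_red']
```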

• Binary Classification:
• Basic SVM is intended for binary (two-class) classification problems
