Support Vector Machines

SVM is a supervised machine learning algorithm that can be used for both classification and regression tasks. It finds a hyperplane in a multidimensional space that distinctly separates the data points of different classes, maximizing the margin between the decision boundary and the closest data points of each class. The data points that define this margin are called support vectors. For data that are not linearly separable, SVM uses nonlinear kernel functions to map the data into a higher-dimensional space where a separating hyperplane can be found. New data points are classified according to which side of the hyperplane they fall on.


Jimsy Johnson

Junior Research Fellow


IIITM-K, Trivandrum
Introduction

 SVM is a supervised learning algorithm that can be employed for both classification and regression purposes.
 SVM is a discriminative classifier formally defined by a separating hyperplane.
 It is a classification method for both linear and nonlinear data.
 It uses a nonlinear mapping to transform the original training data into a higher dimension.
 In the new dimension, it searches for the linear optimal separating hyperplane (i.e., the “decision boundary”).
Introduction (Contd.)
 With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane.
 SVM finds this hyperplane using support vectors (“essential” training tuples) and margins (defined by the support vectors).
 Features: training can be slow, but accuracy is high owing to the ability to model complex nonlinear decision boundaries (margin maximization).
 It can be used for both classification and prediction.
 Applications:
 handwritten digit recognition, object recognition, speaker identification, benchmarking time-series prediction tests (a minimal usage sketch follows below)
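The slides list Python among possible tools later on; as a rough illustration of the first bullet above (SVM for both classification and regression), here is a minimal sketch using scikit-learn's SVC and SVR on made-up toy data. The library choice and all numbers are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: SVM for classification (SVC) and regression (SVR)
# using scikit-learn; the toy data below is made up for illustration.
import numpy as np
from sklearn.svm import SVC, SVR

# Toy 2-D classification data: two small, well-separated clusters
X_cls = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                  [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_cls = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="rbf")                       # nonlinear kernel
clf.fit(X_cls, y_cls)
print(clf.predict([[2.0, 2.0], [6.5, 6.5]]))  # expected: [0 1]

# Toy 1-D regression data: a noisy straight line
X_reg = np.linspace(0, 10, 20).reshape(-1, 1)
y_reg = 2.0 * X_reg.ravel() + np.random.normal(0, 0.5, 20)

reg = SVR(kernel="rbf")
reg.fit(X_reg, y_reg)
print(reg.predict([[5.0]]))                   # regression estimate at x = 5
```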
How do we find the right hyperplane?

 The distance between the hyperplane and the nearest data point from either set is known as the margin.
 The goal is to choose the hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
What happens when there is no clear hyperplane?
 A dataset will often look more like a set of jumbled balls, which represents a linearly non-separable dataset.
 We move away from a 2-D view of the data to a 3-D view.
 Imagine that our two sets of colored balls are sitting on a sheet and this sheet is lifted suddenly, launching the balls into the air.
 While the balls are up in the air, we use the sheet to separate them.
 This ‘lifting’ of the balls represents the mapping of data into a higher dimension. This is known as kernelling.
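As a hedged sketch of the “kernelling” idea described above, the snippet below (scikit-learn assumed, with a synthetic concentric-circles dataset chosen only for illustration) shows a linear kernel failing on data that no straight line can separate, while an RBF kernel, which implicitly maps the data to a higher dimension, separates it.

```python
# Sketch of the kernel trick: concentric circles cannot be separated by any
# straight line in 2-D, but an RBF kernel (implicit higher-dimensional
# mapping) separates them easily.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # around 0.5 (chance)
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))     # close to 1.0
```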
 This line separates the two classes.
 Any point to the left of the line falls into the black circle class, and any point to the right falls into the blue square class.
 Separation of classes is what SVM does: it finds a line/hyperplane (in multidimensional space) that separates out the classes.
SVM—General Philosophy

[Figure: two candidate separating hyperplanes, one with a small margin and one with a large margin, with the support vectors marked]


SVM—Linearly Separable
 A separating hyperplane can be written as
W ● X + b = 0
where W = {w1, w2, …, wn} is a weight vector and b a scalar (bias).
 Treating the bias b as an additional weight w0, for 2-D data it can be written as
w0 + w1 x1 + w2 x2 = 0
 The hyperplanes defining the sides of the margin are:
 H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
 H2: w0 + w1 x1 + w2 x2 ≤ –1 for yi = –1
 Combining the two inequalities, every training tuple must satisfy yi (w0 + w1 x1 + w2 x2) ≥ 1.
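A small worked check of the combined constraint yi (w0 + w1 x1 + w2 x2) ≥ 1: the weight vector, bias, and points below are hand-picked illustrative values, not taken from the slides.

```python
# Worked example: verify that a candidate hyperplane w0 + w1*x1 + w2*x2 = 0
# satisfies yi * (w0 + w1*x1 + w2*x2) >= 1 for every training tuple.
import numpy as np

w0, w = -3.0, np.array([1.0, 1.0])         # hand-picked hyperplane x1 + x2 - 3 = 0
X = np.array([[0.5, 0.5], [1.0, 0.5],      # class -1 points
              [2.5, 2.5], [3.0, 2.0]])     # class +1 points
y = np.array([-1, -1, +1, +1])

functional_margin = y * (X @ w + w0)
print(functional_margin)                   # every value should be >= 1
print(np.all(functional_margin >= 1))      # True for this choice of w0, w
```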



 Any training tuples that fall on hyperplanes H1 or H2 are called support vectors.
 They are equally close to the separating MMH (maximum marginal hyperplane).
 Support vectors are the most difficult tuples to classify and give the most information regarding classification.
 To find the size of the maximal margin: the distance from the separating hyperplane to any point on H1 is 1/||W||, where ||W|| is the Euclidean norm of W.
 This is equal to the distance from any point on H2 to the separating hyperplane.
 Therefore the maximal margin is 2/||W||.
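The quantities above can be read off a fitted linear SVM. The sketch below (scikit-learn assumed; the toy data and the large-C hard-margin approximation are illustrative choices, not from the slides) extracts W and computes the margin 2/||W||.

```python
# Sketch: compute the margin 2/||W|| from a linear SVM fitted on toy data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # large C approximates the hard-margin SVM
clf.fit(X, y)

w = clf.coef_[0]                    # weight vector W
margin = 2.0 / np.linalg.norm(w)    # maximal margin = 2 / ||W||
print("W =", w, " b =", clf.intercept_[0], " margin =", margin)
```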
Finding MMH and Support Vectors
 Using constrained quadratic optimization, we can find the maximum marginal hyperplane (MMH).
 Once the support vectors and the MMH are found, we have a trained SVM.
 The MMH is a linear class boundary, so the trained SVM can be used to classify linearly separable data.
SVM—When Data Is Linearly Separable

Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple with associated class label yi.
There are an infinite number of lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data).
SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH).



How can we classify new tuples?
 Based on the Lagrangian formulation, the MMH can be written as the decision boundary

d(XT) = Σ (i = 1 to l) yi ai (Xi · XT) + b0

• where yi is the class label of support vector Xi; XT is a test tuple; ai and b0 are numeric parameters that were determined automatically by the optimization; and l is the number of support vectors.
 Given a test tuple XT, plug it into the equation above; the sign of the result tells us on which side of the hyperplane the test tuple falls.
 If the sign is positive, then XT falls on or above the MMH, and the SVM predicts that XT belongs to class +1.
 If the sign is negative, then XT falls on or below the MMH, and the class prediction is –1.
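As a hedged illustration of the decision formula: in scikit-learn (an assumed implementation, not referenced by the slides), dual_coef_ holds the products yi·ai, support_vectors_ holds the Xi, and intercept_ holds b0, so the sum can be reproduced by hand and compared with the library's own decision_function.

```python
# Sketch: reproduce d(XT) = sum_i yi*ai*(Xi . XT) + b0 from a fitted linear SVC.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

X_test = np.array([3.0, 3.5])                  # a test tuple XT
# dual_coef_[0] holds yi*ai for each support vector in support_vectors_
d = clf.dual_coef_[0] @ (clf.support_vectors_ @ X_test) + clf.intercept_[0]

print("manual d(XT):", d)
print("library decision_function:", clf.decision_function([X_test])[0])
print("predicted class:", 1 if d > 0 else -1)  # the sign decides the class
```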
Why Is SVM Effective on High Dimensional Data?

 The complexity of the trained classifier is characterized by the number of support vectors rather than by the dimensionality of the data.
 The support vectors are the essential or critical training examples; they lie closest to the decision boundary (MMH).
 If all other training examples were removed and the training repeated, the same separating hyperplane would be found.
 The number of support vectors found can be used to compute an (upper) bound on the expected error rate of the SVM classifier, which is independent of the data dimensionality.
 Thus, an SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high.
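One classical form of the bound mentioned above is the leave-one-out argument, expected error rate ≤ (number of support vectors) / (number of training tuples). The snippet below is only a rough sketch of that ratio on made-up high-dimensional data (scikit-learn and the dataset parameters are assumptions for illustration).

```python
# Sketch: the ratio (#support vectors / #training examples) as a rough
# indicator of the leave-one-out error bound mentioned above.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# High-dimensional toy data: 500 features, only 100 samples
X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

n_sv = clf.support_vectors_.shape[0]
print("support vectors:", n_sv, "of", X.shape[0], "training examples")
print("rough error bound ~", n_sv / X.shape[0])  # does not involve the 500 dimensions
```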



Applications
 Spam Detection
 Credit Card Fraud Detection
 Digit Recognition
 Speech Understanding
 Face Detection
 Product Recommendation
 Customer Segmentation
Implementation
 SVM can be implemented using Weka, R, Python, etc. (a Python sketch follows below).
 In Weka, an SVM classifier is available for performing classification.
 In R, SVM classification can be performed with the help of some packages.
 The packages used are:
 e1071
 ROCR
 gplots
 gtools
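Since Python is listed as one option above, here is a hedged end-to-end sketch with scikit-learn; the Iris dataset, the RBF kernel, and the parameter values are illustrative assumptions rather than recommendations from the slides.

```python
# End-to-end sketch: train/test split, SVM classification, and accuracy
# on the classic Iris dataset using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scaling features first usually helps SVM; the RBF kernel handles nonlinearity.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
```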
Disadvantages
 The biggest limitation of the support vector approach lies in the choice of the kernel.
 A second limitation is speed and size, both in training and testing.
 The optimal design of multiclass SVM classifiers remains an open issue.
 The quadratic programming required for training has high algorithmic complexity and extensive memory requirements in large-scale tasks.
THANK YOU…..
