
Support Vector Machines

By
Dr. Trilok Nath Pandey
SCOPE, VIT, Chennai
Support Vector Machines
• The support vector machine (SVM) is an approach for classification that was developed
in the computer science community in the 1990s and has grown in popularity since then.

• The support vector machine is a generalization of a simple and intuitive classifier
called the maximal margin classifier.

• This classifier unfortunately cannot be applied to most data sets, since it requires
that the classes be separable by a linear boundary.

• The support vector classifier is an extension of the maximal margin classifier that
can be applied in a broader range of cases.
Maximal Margin Classifier
• In this section, we define a hyperplane and introduce the concept of an
optimal separating hyperplane.
• What Is a Hyperplane?
• In a p-dimensional space, a hyperplane is a flat affine subspace of
dimension p - 1.
• For instance, in two dimensions, a hyperplane is a flat one-dimensional
subspace—in other words, a line. In three dimensions, a hyperplane is a
flat two-dimensional subspace—that is, a plane.
• The mathematical definition of a hyperplane is quite simple. In two
dimensions, a hyperplane is defined by the equation

β0 + β1X1 + β2X2 = 0        (1)
Maximal Margin Classifier
• for parameters β0, β1, and β2. When we say that (1) “defines” the
hyperplane, we mean that any X = (X1, X2)T for which (1) holds is a point on
the hyperplane. Note that (1) is simply the equation of a line, since indeed
in two dimensions a hyperplane is a line.
• Equation (1) can be easily extended to the p-dimensional setting:

β0 + β1X1 + β2X2 + . . . + βpXp = 0        (2)

• Equation (2) defines a p-dimensional hyperplane, again in the sense that if a point X =
(X1, X2, . . . , Xp)T in p-dimensional space (i.e. a vector of length p) satisfies
(2), then X lies on the hyperplane.
Maximal Margin Classifier
• Now, suppose that X does not satisfy (2); rather,

β0 + β1X1 + β2X2 + . . . + βpXp > 0.

• Then this tells us that X lies to one side of the hyperplane. On the other hand, if

β0 + β1X1 + β2X2 + . . . + βpXp < 0,

• then X lies on the other side of the hyperplane. So we can think of the
hyperplane as dividing p-dimensional space into two halves.
Maximal Margin Classifier
The hyperplane 1 + 2X1 + 3X2 = 0 is shown. The blue region is the set of points for which 1 + 2X1 + 3X2 > 0,
and the purple region is the set of points for which 1 + 2X1 + 3X2 < 0.
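As a concrete check, the following minimal sketch (Python with NumPy is an assumption; the slides do not specify a language) evaluates f(X) = 1 + 2X1 + 3X2 at a few points and reports which region of the figure each point falls in.

```python
import numpy as np

# Hyperplane from the figure: f(X) = 1 + 2*X1 + 3*X2 = 0.
beta = np.array([1.0, 2.0, 3.0])   # (beta0, beta1, beta2)

points = np.array([
    [1.0, 1.0],     # f = 1 + 2 + 3 = 6  -> blue region
    [0.0, 0.0],     # f = 1              -> blue region
    [-1.0, -1.0],   # f = 1 - 2 - 3 = -4 -> purple region
])

f = beta[0] + points @ beta[1:]
for x, v in zip(points, f):
    region = "blue (f > 0)" if v > 0 else "purple (f < 0)"
    print(f"X = {x}: f(X) = {v:+.0f} -> {region}")
```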
Maximal Margin Classifier
Now suppose that we have an n×p data matrix X that consists of n training observations in p-dimensional space,

x1 = (x11, . . . , x1p)T, . . . , xn = (xn1, . . . , xnp)T,

and that these observations fall into two classes; that is, y1, . . . , yn ∈ {-1, 1}.
The Maximal Margin Classifier
• In general, if our data can be perfectly separated using a hyperplane, then there will in fact exist an
infinite number of such hyperplanes.
• In order to construct a classifier based upon a separating hyperplane, we must have a reasonable
way to decide which of the infinite possible separating hyperplanes to use.
• A natural choice is the maximal margin hyperplane (also known as the optimal separating
hyperplane), which is the separating hyperplane that is farthest from the training observations.
• That is, we can compute the (perpendicular) distance from each training observation to a given
separating hyperplane; the smallest such distance is the minimal distance from the observations to
the hyperplane, and is known as the margin.
• The maximal margin hyperplane is the separating hyperplane for which the margin is largest; that is,
it is the hyperplane that has the farthest minimum distance to the training observations.
• We can then classify a test observation based on which side of the maximal margin hyperplane it
lies.
• This is known as the maximal margin classifier.
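A maximal margin classifier can be approximated in practice by fitting a linear SVM with a very large violation penalty, so that essentially no violations are tolerated. The sketch below uses scikit-learn on toy separable data (the library and the data are illustrative assumptions, not part of the slides) and reports the resulting margin width.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in two dimensions (toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large penalty approximates the hard-margin (maximal margin)
# classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
# In scikit-learn's scaling the margin extends 1/||w|| on each side of
# the hyperplane, so the total margin width is 2/||w||.
print("margin width:", 2.0 / np.linalg.norm(w))
print("number of support vectors:", clf.n_support_.sum())
```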
Maximal Margin Classifier
• We can label the observations from the blue class as yi = 1 and
those from the purple class as yi = -1. Then a separating
hyperplane has the property that

β0 + β1xi1 + β2xi2 + . . . + βpxip > 0 if yi = 1, and
β0 + β1xi1 + β2xi2 + . . . + βpxip < 0 if yi = -1.

• Equivalently, a separating hyperplane has the property that
yi(β0 + β1xi1 + . . . + βpxip) > 0 for all i = 1, . . . , n.
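This property is easy to verify numerically. Here is a minimal sketch with hand-picked toy points (the data and the candidate hyperplane are illustrative assumptions):

```python
import numpy as np

# Toy labelled points and a candidate hyperplane f(x) = beta0 + x . beta.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])       # blue = +1, purple = -1
beta0, beta = 0.0, np.array([1.0, 1.0])

f = beta0 + X @ beta
# The hyperplane separates the classes iff y_i * f(x_i) > 0 for every i.
print(y * f)                        # [4. 4. 3. 4] -- all positive
print("separating hyperplane:", bool(np.all(y * f > 0)))
```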
The Non-separable Case
• In many cases no separating hyperplane exists, and so the maximal margin
classifier cannot be used. However, we can extend the concept of a separating
hyperplane in order to develop a hyperplane that almost separates the classes,
using a so-called soft margin.

• The generalization of the maximal margin classifier to the non-separable
case is known as the support vector classifier.
Support Vector Classifiers
• We might be willing to consider a classifier based on a hyperplane that does
not perfectly separate the two classes, in the interest of:
• Greater robustness to individual observations, and
• Better classification of most of the training observations.
• That is, it could be worthwhile to misclassify a few training observations
in order to do a better job in classifying the remaining observations.
• The support vector classifier, sometimes called a soft margin classifier,
does exactly this.
• Rather than seeking the largest possible margin so that every observation
is not only on the correct side of the hyperplane but also on the correct
side of the margin, we instead allow some observations to be on the
incorrect side of the margin, or even the incorrect side of the hyperplane.
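The sketch below illustrates this on overlapping toy data: a soft-margin linear SVM (scikit-learn and the generated data are assumptions) still fits a hyperplane even though no perfect separator exists, with a few training observations ending up misclassified.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes: no separating hyperplane exists, but a soft-margin
# classifier still finds a hyperplane that gets most observations right.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
pred = clf.predict(X)
print("misclassified training observations:", int(np.sum(pred != y)))
```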
Details of the Support Vector Classifier
• The support vector classifier classifies a test observation
depending on which side of a hyperplane it lies.
• The hyperplane is chosen to correctly separate most of the
training observations into the two classes, but may misclassify a
few observations.
• It is the solution to the optimization problem

maximize M over β0, β1, . . . , βp, ε1, . . . , εn, and M
subject to:
Σj βj² = 1,
yi(β0 + β1xi1 + . . . + βpxip) ≥ M(1 - εi) for all i = 1, . . . , n,
εi ≥ 0, and Σi εi ≤ C.
Details of the Support Vector Classifier

• where C is a nonnegative tuning parameter.
• M is the width of the margin; we seek to make this quantity as large
as possible.
• ε1, . . . , εn are slack variables that allow individual observations to be
on the wrong side of the margin, or even of the hyperplane.
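The slack variables can be recovered from a fitted model. The sketch below assumes scikit-learn, whose formulation is equivalent to the one above but differently scaled: the margin is fixed at |f(x)| = 1 and violations are penalized rather than budgeted, so the slack of observation i is εi = max(0, 1 - yi f(xi)).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Slack eps_i = max(0, 1 - y_i * f(x_i)) in scikit-learn's scaling:
#   eps_i = 0      -> correct side of the margin
#   0 < eps_i <= 1 -> inside the margin, correct side of the hyperplane
#   eps_i > 1      -> wrong side of the hyperplane (misclassified)
f = clf.decision_function(X)
eps = np.maximum(0.0, 1.0 - y * f)
print("margin violations (eps > 0):", int(np.sum(eps > 0)))
print("misclassified (eps > 1):   ", int(np.sum(eps > 1)))
```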
Details of the Support Vector Classifier

• Observations that lie directly on the margin, or on the wrong side of the margin for
their class, are known as support vectors.
• These observations do affect the support vector classifier.
• When the tuning parameter C is large, the margin is wide, many observations
violate the margin, and so there are many support vectors.
• In this case, many observations are involved in determining the hyperplane.
• In contrast, if C is small, then there will be fewer support vectors and hence the resulting
classifier will have low bias but high variance.
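The effect of the tuning parameter on the number of support vectors can be seen directly. One caution for the sketch below: scikit-learn's C parameter is a penalty on violations, roughly the inverse of the budget C used in these slides, so a small penalty plays the role of a large budget (wide margin, many support vectors).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Small penalty ~ large budget: many margin violations, many support
# vectors; large penalty ~ small budget: few support vectors.
for penalty in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=penalty).fit(X, y)
    print(f"penalty C = {penalty:>6}: {clf.n_support_.sum()} support vectors")
```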
The Support Vector Machine
• The support vector machine (SVM) is an extension of the support vector classifier that results
from enlarging the feature space in a specific way using kernels.
• In the support vector classifier, we replace every instance of the inner
product of two observations with a generalization of the form

K(xi, xi'),

• where K is some function that we will refer to as a kernel. A kernel is a
function that quantifies the similarity of two observations.
• For example,

K(xi, xi') = (1 + Σj xij xi'j)^d

• is known as a polynomial kernel of degree d, where d is a positive integer.
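The polynomial kernel is simple to compute directly, and scikit-learn's "poly" kernel reproduces it when gamma = 1 and coef0 = 1 (the library choice and the toy data below are assumptions, not part of the slides).

```python
import numpy as np
from sklearn.svm import SVC

def poly_kernel(x, z, d=3):
    """Polynomial kernel of degree d: K(x, z) = (1 + <x, z>) ** d."""
    return (1.0 + np.dot(x, z)) ** d

# The kernel as a similarity measure between two observations.
x1, x2 = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print("K(x1, x2) =", poly_kernel(x1, x2))

# The same kernel inside an SVM: with gamma=1 and coef0=1, scikit-learn's
# "poly" kernel computes (1 + <x, z>) ** degree, matching the formula above.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)  # nonlinear boundary
clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```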