ML Chapter 5 Part 2

Machine Learning

Samatrix Consulting Pvt Ltd


Support Vector Machines
Support Vector Machines
• Now, we shall discuss the support vector machine (SVM).
• SVMs perform well in a variety of settings and they are considered
one of the best “out of the box” classifiers.
• The support vector machine is a generalization of a simple classifier called the maximal margin classifier.
• We cannot apply maximal margin classifier to most data sets because
it requires the classes to be separated by a linear boundary.
• We shall also discuss the support vector classifier, an extension of the maximal margin classifier that can be applied to a broader range of cases.
Support Vector Machines
• Then we shall discuss the support vector machine, which is an extension of the support vector classifier.
• The support vector machine can accommodate non-linear class boundaries.
• We generally use the support vector machine in the binary classification setting.
• In this chapter, we shall discuss extensions of support vector
machines to the case of more than two classes.
Maximal Margin Classifier
• To start with, let's define a hyperplane and learn the concept of an optimal separating hyperplane.
Hyperplane
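The slide body is not shown here, so this is a minimal sketch in standard textbook notation (an assumption about the intended content, not necessarily the original slide's formulation). In p-dimensional space, a hyperplane is a flat affine subspace of dimension p - 1 defined by

\[
\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p = 0 .
\]

For p = 2 this is simply a straight line. A point X = (X_1, ..., X_p)^T for which the left-hand side is positive lies on one side of the hyperplane, and a point for which it is negative lies on the other side.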

Hyperplane
• In this case, the two-dimensional data is clearly linearly
separable.
• In fact, we can draw an infinite number of straight lines
to separate the blue balls from the red balls.
• Hence, we need to identify which one of the infinitely many straight lines is optimal.
• In other words, which line will have the minimum classification error on new observations.
• We draw the straight line based on the training sample, but the line is also expected to classify new test samples correctly.
Hyperplane
• In the figure – 1, which of the three lines, black, red, and
green is better than the other two?
• Intuitively, if a line passes very close to some of the points, then a small change in those points would require the line to be adjusted, or else those points may be misclassified.
• In this case, the green line is close to a red ball and the
red line is close to a blue ball.
• A slight change in the positions of these two balls may
result in misclassification.
• The black line, on the other hand, is less sensitive to such changes and less susceptible to model variance.
Classification Using Hyperplane
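The slide bodies here are not shown, so here is a hedged sketch of how a separating hyperplane is used for classification, in the usual notation (an assumption about the intended content). We code the two classes as y_i in {-1, +1} and compute, for a test observation x*,

\[
f(x^*) = \beta_0 + \beta_1 x_1^* + \beta_2 x_2^* + \dots + \beta_p x_p^* .
\]

We assign x* to class +1 if f(x*) > 0 and to class -1 if f(x*) < 0. The magnitude of f(x*) indicates how far the point lies from the hyperplane, and hence how confident we can be in the assignment. A separating hyperplane has the property that y_i (beta_0 + beta_1 x_{i1} + ... + beta_p x_{ip}) > 0 for every training observation i.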

Maximal Margin Hyperplane
• The basic idea of support vector machines is to find an optimal hyperplane for linearly separable patterns.
• We can use the maximal margin hyperplane (also known as the optimal separating hyperplane).
• This is a hyperplane that is farthest from the training observations.
• In this case, the perpendicular distance between each training
observation and a given separating hyperplane is computed.
• The smallest of these distances is the measure of the closeness of the hyperplane to the observations.
Maximal Margin Hyperplane
• We call this smallest distance the margin.
• The objective is to find the hyperplane that has the maximum margin between the hyperplane and the training observations.
• This is known as the maximal margin
classifier.
• In the figure – 3, we have shown the
maximal margin hyperplane.
Maximal Margin Hyperplane
• The maximal margin hyperplane is the hyperplane with the largest margin, that is, the maximal minimal distance from the hyperplane to the observations.
• We can also say that the maximal margin hyperplane represents the mid-line of the widest "slab" that can be inserted between the two classes.
Maximal Margin Hyperplane
• From figure – 3, we can notice that three training observations are equidistant from the maximal margin hyperplane.
• They lie along the dashed lines that indicate the width of the margin. These three observations are known as support vectors.
Maximal Margin Hyperplane
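The slide body is not shown, so here is a standard way to write the maximal margin problem (usual textbook notation, offered as a sketch rather than the original slide's formulation):

\[
\max_{\beta_0, \beta_1, \dots, \beta_p,\, M} \; M
\quad \text{subject to} \quad
\sum_{j=1}^{p} \beta_j^2 = 1, \qquad
y_i\left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\right) \ge M \;\; \text{for } i = 1, \dots, n .
\]

The constraints guarantee that every observation is on the correct side of the hyperplane and at least a distance M from it; M is the margin that we maximize.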

Non-separable Cases
• If a separating hyperplane exists, we can naturally use the maximal
margin classifier to perform classification.
• In several instances, however, no separating hyperplane exists.
• Hence no maximal margin classifier exists, and we cannot separate the two classes exactly.
• However, by using a soft margin, we can develop a hyperplane that almost separates the classes.
• We can generalize the maximal margin classifier to the non-separable case.
• This generalization is known as the support vector classifier.
Non-separable Cases
• Figure – 4 illustrates two classes of observations.
• In this case, we cannot separate
the two classes by a hyperplane
and hence cannot use the
maximal margin classifier.
Support Vector Classifier - Overview
• We have seen that it is not necessary for the observations from two classes to be
separable by a hyperplane.
• In several instances, even if a separating hyperplane exists, a classifier that is
based on a separating hyperplane is not desirable.
• In figure – 5, we have illustrated a case where a single additional observation in the right-hand panel has resulted in a big change in the maximal margin hyperplane.
Support Vector Classifier - Overview

• Hence the resulting maximal margin hyperplane is not satisfactory in such cases.


• Since the maximal margin hyperplane is very sensitive to a change in
a single observation, it may have overfit the training data.
• Instead, we may be willing to consider a classifier based on a hyperplane that does not perfectly separate the two classes.
• We may opt for a method that misclassifies a few training observations so that it may classify the remaining observations more effectively.
Support Vector Classifier - Overview
• One such method is the support vector classifier (also known as the soft margin classifier).
• This classifier does not insist on the largest possible margin in which every observation is not only on the correct side of the hyperplane but also on the correct side of the margin.
• Instead, it allows some observations to be on the incorrect side of the margin, or even on the incorrect side of the hyperplane.
• We call the margin "soft" because it can be violated by some of the training observations.
Support Vector Classifier - Explanation

• Figure – 6 illustrates one such example, where a support vector classifier was used to fit a small data set. We have used a solid line to show the hyperplane and dashed lines to show the margins.
• For the purple observations, observations 3, 4, 5, and 6 are on the correct side of the margin, observation 2 is on the margin, and observation 1 is on the wrong side of the margin.
• For the blue observations, observations 7 and 10 are on the correct side of the margin, observation 9 is on the margin, and observation 8 is on the wrong side of the margin.
• The right-hand panel is largely the same as the left-hand panel except for two additional observations, 11 and 12. These two points are on the wrong side of both the hyperplane and the margin.
Slack Variable and Tuning Parameter
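The slide bodies are not shown, so here is a standard formulation of the support vector classifier with slack variables and a tuning parameter (usual textbook notation, offered as a hedged sketch of the intended content):

\[
\max_{\beta_0, \dots, \beta_p,\, \epsilon_1, \dots, \epsilon_n,\, M} \; M
\quad \text{subject to} \quad
\sum_{j=1}^{p} \beta_j^2 = 1, \qquad
y_i\left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\right) \ge M(1 - \epsilon_i),
\]
\[
\epsilon_i \ge 0, \qquad \sum_{i=1}^{n} \epsilon_i \le C .
\]

The slack variable epsilon_i tells us where the i-th observation lies: epsilon_i = 0 means it is on the correct side of the margin, 0 < epsilon_i <= 1 means it violates the margin, and epsilon_i > 1 means it is on the wrong side of the hyperplane. The tuning parameter C bounds the total amount of margin violation and therefore controls the bias-variance trade-off; it is typically chosen by cross-validation.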

Support Vector Classifier
Support Vector Machines
• In this section, we shall learn about a mechanism for converting a linear classifier into one that produces non-linear decision boundaries.
• The support vector machine does this in an automatic way.
Non-Linear Decision Boundary
• The support vector classifier works well in a two-class setting when the decision boundary between the two classes is linear.
• In practice, however, we often face non-linear class boundaries. In such cases, an SVC or any other linear classifier will perform poorly.
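As a hedged illustration of this point (not part of the original slides; the data set and parameter values are illustrative assumptions), the following scikit-learn sketch compares a linear support vector classifier with a radial-kernel SVM on toy data whose true class boundary is non-linear:

# Illustrative sketch: linear SVC vs. RBF-kernel SVM on non-linearly separable data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes well.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
rbf_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("Linear SVC accuracy:", linear_clf.score(X_test, y_test))  # near chance level
print("RBF SVM accuracy:", rbf_clf.score(X_test, y_test))        # close to 1.0

The linear classifier cannot do much better than guessing on this data, while the radial-kernel SVM recovers the circular boundary.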
Kernel Function
• It is not easy to handle a nonlinear transformation of the input data into a higher-dimensional space.
• Many options are available, but they may be computationally heavy.
• The kernel functions were introduced to avoid some of these problems.
• The support vector machine (SVM) is an extension of the support vector classifier.
• It results from transforming the input data in a specific way using kernels.
• The main idea behind the support vector machine is to enlarge the input feature space so that we can accommodate a non-linear boundary between the classes.
• This can be achieved efficiently using a kernel approach.
Example – Kernel Function
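The original example is not shown, so here is a brief sketch in standard notation (an assumption about the intended content). The support vector classifier can be written so that the training observations enter only through inner products of pairs of observations; the kernel trick replaces each inner product with a kernel function K(x_i, x_{i'}). Two common choices are

\[
K(x_i, x_{i'}) = \left(1 + \sum_{j=1}^{p} x_{ij}\, x_{i'j}\right)^{d}
\quad \text{(polynomial kernel of degree } d\text{)},
\]
\[
K(x_i, x_{i'}) = \exp\left(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\right)
\quad \text{(radial basis function kernel)} .
\]

Using a kernel is equivalent to fitting the classifier in an enlarged (possibly infinite-dimensional) feature space, without ever computing the coordinates in that space explicitly.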

Kernel Function
• Some of the common kernels are the polynomial kernel, the sigmoid kernel, and the Gaussian radial basis function (RBF) kernel.
• Each one of these will result in a different nonlinear classifier in the
original input space.
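A hedged scikit-learn sketch (not from the original slides; the data set and parameter values are illustrative assumptions) showing how these kernels can be selected when fitting an SVM:

# Illustrative sketch: fitting SVMs with different kernels on the same data.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ["poly", "sigmoid", "rbf"]:
    clf = SVC(kernel=kernel, degree=3, gamma="scale", coef0=1.0, C=1.0)
    clf.fit(X, y)
    # Training accuracy differs because each kernel induces a different
    # non-linear decision boundary in the original input space.
    print(kernel, clf.score(X, y))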
SVM Two or More Classes
• The concept of separating hyperplanes does not extend naturally to more than two classes.
• However, several proposals have been made to extend SVMs to the K-class case.
• The two most popular are the one-versus-one and one-versus-all approaches, sketched briefly below.
One-Versus-One Classification

One-Versus-All Classification
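Since the slide bodies are not shown, here is a hedged sketch of both approaches (not from the original slides; the data set is an illustrative assumption) using scikit-learn's explicit one-vs-one and one-vs-rest wrappers around a support vector classifier:

# Illustrative sketch: extending a binary SVM to K > 2 classes.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes

# One-versus-one: fits K(K-1)/2 binary SVMs, one per pair of classes;
# a new point is assigned to the class that wins the most pairwise votes.
ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

# One-versus-all (one-versus-rest): fits K binary SVMs, each separating
# one class from the remaining K-1; the class with the largest
# decision-function value is chosen.
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

print("OvO accuracy:", ovo.score(X, y))
print("OvR accuracy:", ovr.score(X, y))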

Thanks
Samatrix Consulting Pvt Ltd
