
SUPPORT VECTOR MACHINE

(SVM)
One of the most prevalent and exciting supervised learning models, with associated learning algorithms that analyse data and recognise patterns, is the Support Vector Machine (SVM).

It is used for solving both regression and classification problems; however, it is mostly used for classification.

SVMs were first introduced by B.E. Boser et al. in 1992 and became popular after their success in handwritten digit recognition in 1994.

Before the emergence of boosting algorithms such as XGBoost and AdaBoost, SVMs were commonly used.
PROBLEM WITH LOGISTIC REGRESSION

There is an infinite number of possible decision boundaries, and Logistic Regression simply picks an arbitrary one.

For point C, since it is far away from the decision boundary, we are quite certain in classifying it as 1.

For point A, even though we classify it as 1 for now, it is pretty close to the decision boundary; if the boundary moved a little to the right, we would mark point A as "0" instead.

Hence, we are much more confident about our prediction at C than at A.

PROBLEM WITH LOGISTIC REGRESSION
Logistic Regression doesn’t care whether the
instances are close to the decision boundary.
Therefore, the decision boundary it picks may
not be optimal.
If a point is far from the decision boundary, we
may be more confident in our predictions.
Therefore, the optimal decision boundary should maximize the distance between the decision boundary and all instances, i.e., maximize the margin.
That is why the SVM algorithm is important!
WHAT IS SUPPORT VECTOR MACHINE (SVM)

Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

The objective of applying SVMs is to find the best line in two dimensions, or the best hyperplane in more than two dimensions, that separates our space into classes.

The hyperplane (line) is found through the maximum margin, i.e. the maximum distance between data points of both classes.

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane.

In other words, given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.

In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate.

Then, we perform classification by finding the hyperplane that best separates the two classes.
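A minimal sketch of exactly this idea (assuming scikit-learn is installed; the data points are hypothetical): each data item is a point in n-dimensional space, and a fitted linear SVM assigns new examples to one class or the other.

import numpy as np
from sklearn.svm import SVC

# Six data items plotted as points in n-dimensional space (here n = 2 features).
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])      # two class labels

clf = SVC(kernel="linear")            # find the separating hyper-plane (a line in 2-D)
clf.fit(X, y)

print(clf.predict([[4, 4], [7, 7]]))  # assign new examples to one class or the other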
SUPPORT VECTOR MACHINES

Imagine the labelled training set consists of two classes of data points (two dimensions).

To separate the two classes, there are many possible hyperplanes that separate them correctly.

We can achieve exactly the same result using different hyperplanes (L1, L2, L3).

However, if we add new data points, the consequence of using different hyperplanes will be very different in terms of classifying a new data point into the right class.
IDENTIFY THE RIGHT HYPER-PLANE
(SCENARIO-1):
Here, we have three hyper-planes (A, B and C).
Now, identify the right hyper-plane to classify
star and circle.

You need to remember a rule of thumb to identify the right hyper-plane: "Select the hyper-plane which segregates the two classes better". In this scenario, hyper-plane "B" performs this job excellently.
IDENTIFY THE RIGHT HYPER-PLANE
(SCENARIO-2):
Here, we have three hyper-planes (A, B and C)
and all are segregating the classes well. Now,
How can we identify the right hyper-plane?

Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the margin. Let's look at the next figure:
Above, you can see that the margin for hyper-plane C is high compared to both A and B. Hence, we name C the right hyper-plane. Another important reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane with a low margin, there is a high chance of misclassification.
IDENTIFY THE RIGHT HYPER-PLANE
(SCENARIO-3):

Hyper-plane B seems a good choice here as it has a higher margin than A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately before maximizing the margin.

Here, hyper-plane B has a classification error while A classifies everything correctly. Therefore, the right hyper-plane is A.
CAN WE CLASSIFY TWO
CLASSES (SCENARIO-4)?:

Below, we are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

As already mentioned, the star at the other end is an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say SVM classification is robust to outliers.
SUPPORT VECTOR POINTS

The vector points closest to the hyperplane are known as the support vector points, because only these points contribute to the result of the algorithm; the other points do not.

If a data point is not a support vector, removing it has no effect on the model.

On the other hand, deleting the support vectors will change the position of the hyperplane.
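A small sketch (hypothetical data, assuming scikit-learn) that inspects which training points become support vectors and checks that removing a non-support vector leaves the hyperplane essentially unchanged:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)
print("support vectors:\n", clf.support_vectors_)   # the points closest to the hyperplane
print("w =", clf.coef_, "b =", clf.intercept_)

# Drop one point that is NOT a support vector (this toy data has some) and refit:
# the learned w and b stay essentially the same.
non_sv = np.setdiff1d(np.arange(len(X)), clf.support_)[0]
keep = np.arange(len(X)) != non_sv
clf2 = SVC(kernel="linear", C=1e3).fit(X[keep], y[keep])
print("w after removal =", clf2.coef_, "b =", clf2.intercept_)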
HYPERPLANE

The dimension of the hyperplane depends upon the number of features.

If the number of input features is 2, then the hyperplane is just a line.

If the number of input features is 3, then the hyperplane becomes a two-dimensional plane.

It becomes difficult to imagine when the number of features exceeds 3.
MARGIN

The distance of the vectors from the hyperplane is called the margin: the separation between the line and the closest points of each class.

We would like to choose a hyperplane that maximises the margin between the classes.

The graph on the next slide shows what a good margin and a bad margin look like.
Hard Margin

⚫ If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data so that the distance between them is as large as possible.

Soft Margin

⚫ As most real-world data are not fully linearly separable, we allow some margin violation to occur, which is called soft margin classification. It is better to have a large margin, even though some constraints are violated. Margin violation means choosing a hyperplane that allows some data points to stay on the incorrect side of the hyperplane, or between the margin and the correct side of the hyperplane.
MAXIMUM MARGIN HYPERPLANE
LINEAR ALGEBRA REVISITED
MAXIMISING THE MARGIN
For a Support Vector Classifier (SVC), we use wᵀx + b, where w is the weight vector and b is the bias.
The variables w and x in the hyperplane equation are vectors!

A vector has magnitude (size) and direction, which works perfectly well in 3 or more dimensions.

Therefore, vectors are used throughout the SVM algorithm.
The equation for calculating the margin: the separating hyperplane is wᵀx + b = 0, and every training point must satisfy yᵢ(wᵀxᵢ + b) ≥ 1. Note that yᵢ is either +1 or -1. Therefore, the margin has width 2/‖w‖, and maximising the margin is equivalent to minimising ‖w‖.
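A minimal sketch (assuming scikit-learn, with hypothetical linearly separable data) that fits a hard-margin-style linear SVM and recovers the margin width 2/‖w‖ from the learned weight vector:

import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data (not from the slides).
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]

margin = 2.0 / np.linalg.norm(w)              # distance between the two margin hyperplanes
print("w =", w, "b =", b, "margin width =", margin)

# Every training point satisfies y_i * (w . x_i + b) >= 1 (up to numerical tolerance).
print(y * (X @ w + b))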
CLASSIFYING NON-LINEAR DATA

WHAT ABOUT DATA POINTS THAT ARE NOT LINEARLY SEPARABLE?

NON-LINEAR SEPARATION

SVM has a technique called the kernel trick.

Kernels are functions which take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable problem.

This is mostly useful in non-linear separation problems, as in the sketch below.
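A minimal sketch with hypothetical 1-D data: the classes are not separable on the original line, but become separable after the explicit map x → (x, x²), which is the transformation the kernel trick performs implicitly.

import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])          # class 1 on the outside, class 0 in the middle

X_lifted = np.column_stack([x, x ** 2])      # explicit map into a higher-dimensional space
clf = SVC(kernel="linear").fit(X_lifted, y)  # now a straight line separates the two classes

new_x = np.array([-2.8, 0.2])
print(clf.predict(np.column_stack([new_x, new_x ** 2])))   # expected: [1 0]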
SVM KERNELS

In practice, the SVM algorithm is implemented with a kernel that transforms the input data space into the required form.

SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.

In simple words, the kernel converts non-separable problems into separable problems by adding more dimensions.

This makes SVM more powerful, flexible and accurate.
SOME FREQUENTLY USED KERNELS
LINEAR KERNEL

The linear kernel is simply the dot product between any two observations. The formula of the linear kernel is:

K(x, xᵢ) = x · xᵢ = Σⱼ xⱼ xᵢⱼ

From this formula, we can see that the product between two vectors x and xᵢ is the sum of the multiplication of each pair of input values.
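A short sketch (illustrative values only) showing that the linear kernel is just this sum of element-wise products:

import numpy as np

x  = np.array([1.0, 2.0, 3.0])
xi = np.array([4.0, 0.5, 2.0])

k_manual = np.sum(x * xi)      # multiply each pair of input values and sum
k_dot    = np.dot(x, xi)       # the same value via the dot product
print(k_manual, k_dot)         # both print 11.0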
POLYNOMIAL KERNEL

It is a more generalized form of the linear kernel and can distinguish curved or non-linear input spaces. The formula for the polynomial kernel is:

K(x, xᵢ) = (x · xᵢ + 1)ᵈ

Here d is the degree of the polynomial, which we need to specify manually in the learning algorithm.
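A short sketch (illustrative values; the degree d is the manually specified parameter) of the polynomial kernel, with a roughly corresponding scikit-learn configuration:

import numpy as np
from sklearn.svm import SVC

def polynomial_kernel(x, xi, d):
    # K(x, xi) = (x . xi + 1)^d
    return (np.dot(x, xi) + 1) ** d

print(polynomial_kernel(np.array([1.0, 2.0]), np.array([3.0, 1.0]), d=2))  # (5 + 1)^2 = 36

# In scikit-learn the degree is passed explicitly; coef0=1 plays the role of the "+1"
# (note that scikit-learn also scales the dot product by its gamma parameter).
clf = SVC(kernel="poly", degree=2, coef0=1)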
RADIAL BASIS FUNCTION (RBF) KERNEL

The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space. The following formula expresses it mathematically:

K(x, xᵢ) = exp(−γ ‖x − xᵢ‖²)

Here, gamma ranges from 0 to 1 and must be specified manually in the learning algorithm. A good default value of gamma is 0.1.

Just as we implemented SVM for linearly separable data, we can implement it in Python for data that is not linearly separable. This is done by using kernels.
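A short sketch (illustrative values) of the RBF kernel K(x, xᵢ) = exp(−γ‖x − xᵢ‖²), with gamma specified manually as described above:

import numpy as np
from sklearn.svm import SVC

def rbf_kernel(x, xi, gamma):
    return np.exp(-gamma * np.sum((x - xi) ** 2))

print(rbf_kernel(np.array([1.0, 2.0]), np.array([2.0, 0.0]), gamma=0.1))   # exp(-0.5) ~ 0.61

# Using the same kernel inside scikit-learn's SVM:
clf = SVC(kernel="rbf", gamma=0.1)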
TUNING PARAMETERS: KERNEL,
REGULARIZATION, GAMMA AND MARGIN.

These are the tuning parameters of the SVM classifier.

By varying them we can achieve a considerably better non-linear classification boundary, with more accuracy, in a reasonable amount of time.
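A minimal sketch (assuming scikit-learn; the grid values are illustrative, not recommendations) of tuning the kernel, C and gamma with a cross-validated grid search:

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # a non-linear toy problem

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}
search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation over the grid
search.fit(X, y)
print(search.best_params_, search.best_score_)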
REGULARIZATION

The regularization parameter (often termed the C parameter) tells the SVM optimization how much you want to avoid misclassifying each training example.

For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly.

Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
The images below are examples of two different regularization parameter values.

The left one has some misclassification due to a lower regularization value; a higher value leads to results like the one on the right.
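A small sketch (hypothetical data with one overlapping point) contrasting a small C, which tolerates some misclassification in exchange for a wider margin, with a large C, which narrows the margin to classify every training point correctly:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [1, 2], [5, 5], [6, 5], [2.5, 2.5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])    # the last class-1 point sits near the class-0 cluster

for C in (0.01, 1000):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])    # margin width
    errors = int(np.sum(clf.predict(X) != y))     # training misclassifications
    print(f"C={C}: margin width={width:.2f}, training errors={errors}")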


GAMMA
The gamma parameter defines how far the influence of a
single training example reaches, with low values meaning
‘far’ and high values meaning ‘close’.

In other words, with low gamma, points far away from the plausible separation line are considered in the calculation of the separation line, whereas with high gamma only the points close to the plausible line are considered.
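A small sketch (assuming scikit-learn) of gamma's effect on the RBF kernel: with higher gamma each example's influence is more local, so the boundary bends more and the number of support vectors typically grows:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

for gamma in (0.1, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy={clf.score(X, y):.2f}")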
WHY SVMS

Can solve problems where the data points are not linearly separable.

Effective in high-dimensional spaces.

Suitable for small data sets: effective even when the number of features is greater than the number of training examples.

Overfitting problem:
⚫ The hyperplane is affected only by the support vectors, thus SVMs are not robust to outliers.
LINEAR SOLVED EXAMPLE

The boundary will be perpendicular to the y axis.
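The worked figures from these slides are not reproduced here. As a stand-in, a minimal sketch with hypothetical points whose classes differ only in the second coordinate, so the learned boundary comes out perpendicular to the y axis:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [3, 1], [5, 1], [1, 4], [3, 4], [5, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)                     # w is (approximately) (0, w2): a horizontal boundary
print("boundary at y =", -b / w[1])           # expected midway between y=1 and y=4, i.e. y = 2.5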
NON-LINEAR SOLVED EXAMPLE
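The non-linear worked figures are likewise not reproduced. As a stand-in, a short sketch of an XOR-style dataset that no straight line can separate, handled with the RBF kernel (data and parameter values are illustrative only):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)   # XOR layout
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", gamma=2.0, C=10).fit(X, y)
print(clf.predict(X))   # the kernel lifts the data so all four points are classified correctly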
