Introduction To Support Vector Machines
SUPPORT VECTOR MACHINES
SVMs: A New Generation of Learning Algorithms
• Pre-1980:
– Almost all learning methods learned linear decision surfaces.
– Linear learning methods have nice theoretical properties.
• 1980s:
– Decision trees and neural networks allowed efficient learning of non-linear decision surfaces.
– They had little theoretical basis, and all suffer from local minima.
• 1990s:
– Efficient learning algorithms for non-linear functions, based on computational learning theory, were developed.
– They have nice theoretical properties.
Key Ideas
• Two independent developments within the last decade:
– Computational learning theory
– New, efficient ways of learning separating surfaces for non-linear functions using “kernel functions”
• The resulting learning algorithm is an optimization algorithm rather than a greedy search.
Statistical Learning Theory
• A learning system can be described mathematically as a system that
– receives data (observations) as input, and
– outputs a function that can be used to predict some features of future data.
• Statistical learning theory models this as a function estimation problem.
• Generalization performance (accuracy in labeling test data) is what is measured.
Motivation for Support Vector Machines
• The problem to be solved is one of supervised binary classification. That is, we wish to categorize new, unseen objects into two separate groups based on their properties and a set of known examples that are already categorized.
• A good example of such a system is classifying a set of new documents
into positive or negative sentiment groups, based on other documents
which have already been classified as positive or negative.
• Similarly, we could classify new emails into spam or non-spam, based on
a large corpus of documents that have already been marked as spam or
non-spam by humans. SVMs are highly applicable to such situations.
Motivation for Support Vector Machines
• A Support Vector Machine models the situation by creating a feature space, a finite-dimensional vector space in which each dimension represents a "feature" of a particular object. In the context of spam or document classification, each "feature" is the prevalence or importance of a particular word (a small sketch of such a feature space follows this list).
• The goal of the SVM is to train a model that assigns new unseen objects into
a particular category.
• It achieves this by creating a linear partition of the feature space into two
categories.
• Based on the features of a new, unseen object (e.g. a document or email), it places the object "above" or "below" the separating plane, leading to a categorization (e.g. spam or non-spam). This makes it an example of a non-probabilistic linear classifier: it is non-probabilistic because the features of a new object fully determine its location in feature space, and there is no stochastic element involved.
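As a minimal sketch of how such a feature space might be built (the documents and vocabulary below are invented purely for illustration), each distinct word becomes one dimension and each document becomes a vector of word counts:

```python
# Minimal sketch: turning documents into word-count feature vectors.
# The documents and vocabulary are invented for illustration only.
docs = ["cheap offer win money now", "meeting agenda for project review"]

# One dimension of the feature space per distinct word in the corpus.
vocab = sorted({word for doc in docs for word in doc.split()})

# Each document becomes a vector of word counts over that vocabulary.
vectors = [[doc.split().count(word) for word in vocab] for doc in docs]

for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```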
OBJECTIVES
• Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
• SVMs are a machine learning approach: they analyze large amounts of data to identify patterns.
• SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes.
Support Vectors
• Support vectors are simply the coordinates of individual observations; the Support Vector Machine is the frontier (hyperplane/line) that best segregates the two classes.
• Support vectors are the data points that lie closest to the decision surface (or hyperplane).
• They are the data points most difficult to classify.
• They have a direct bearing on the optimum location of the decision surface.
• We can show that the optimal hyperplane stems from the function class with the lowest "capacity" (VC dimension).
• Support vectors are the data points nearest to the hyperplane: the points of a data set that, if removed, would alter the position of the dividing hyperplane. Because of this, they can be considered the critical elements of a data set.
What is a hyperplane?
• As a simple example, for a classification task with only two features,
you can think of a hyperplane as a line that linearly separates and
classifies a set of data.
• Intuitively, the further from the hyperplane our data points lie, the
more confident we are that they have been correctly classified. We
therefore want our data points to be as far away from the hyperplane
as possible, while still being on the correct side of it.
• So when a new test point is added, whichever side of the hyperplane it lands on decides the class that we assign to it.
How do we find the right hyperplane?
• How do we best segregate the two classes within the data?
• The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly. In the separable (hard-margin) case, no data point ever lies inside the margin.
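Writing the hyperplane as w · x + b = 0 (notation assumed here for illustration), the distance of a training point from the hyperplane and the margin can be stated compactly as:

```latex
\[
\operatorname{dist}(x_i) \;=\; \frac{\lvert w \cdot x_i + b \rvert}{\lVert w \rVert},
\qquad
\text{margin} \;=\; \min_{i}\, \operatorname{dist}(x_i).
\]
```

The maximum-margin hyperplane is then the choice of (w, b) that maximizes this minimum distance while keeping every training point on its correct side.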
But what happens when there is no clear hyperplane?
• Data are rarely as clean as our simple example above. A dataset will often look more like a jumble of mixed points, representing a linearly non-separable dataset.
• In order to classify a dataset like this, it is necessary to move from a 2D view of the data to a 3D view. Explaining this is easiest with another simplified example. Imagine that our two sets of colored balls are sitting on a sheet and this sheet is lifted suddenly, launching the balls into the air. While the balls are up in the air, you use the sheet to separate them. This "lifting" of the balls represents the mapping of the data into a higher dimension, and is known as kernelling.
• Because we are now in three dimensions, our hyperplane can no longer be a line. It must now be a plane. The idea is that the data will continue to be mapped into higher and higher dimensions until a hyperplane can be formed to segregate the classes.
How does it work? How can we identify the right hyperplane?
Identify the right hyperplane (Scenario-1):
• Here we have three hyperplanes (A, B and C). Now, identify the right hyperplane to classify the stars and circles.
Scenario-2
• Here all three hyperplanes segregate the classes well, so we choose the one that maximizes the distance between the hyperplane and the nearest data point of either class. This distance is called the margin.
• The margin for hyperplane C is higher than for both A and B, so we name C as the right hyperplane. Another compelling reason for selecting the hyperplane with the higher margin is robustness: if we select a hyperplane with a low margin, there is a high chance of misclassification.
Identify the right hyperplane (Scenario-3)
• We are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
• The star at the other end is like an outlier for the star class. SVM has a feature to ignore outliers and find the hyperplane that has the maximum margin. Hence, we can say that SVM is robust to outliers.
Find the hyperplane to segregate two classes (Scenario-5)
• In this scenario, we cannot have a linear hyperplane between the two classes, so how does an SVM classify them? So far we have only looked at linear hyperplanes.
Scenario-5
• SVM solves this problem by introducing an additional feature z = x² + y². Now let's plot the data points on the x and z axes: in this view the two classes become linearly separable.
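A minimal sketch of this idea, assuming the classic toy picture in which one class sits inside a ring formed by the other; the data set here is generated synthetically, and adding the feature z = x² + y² makes the classes separable by a straight line:

```python
# Minimal sketch: adding the feature z = x^2 + y^2 turns a circular
# decision boundary into a straight line in the lifted space.
# A synthetic "circles" data set stands in for the stars/circles example.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit feature mapping: (x, y) -> (x, y, z) with z = x^2 + y^2.
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X_lifted = np.hstack([X, z])

# A *linear* SVM separates the lifted data easily, even though no straight
# line separates the original 2-D points.
clf = SVC(kernel="linear").fit(X_lifted, y)
print("training accuracy in the lifted space:", clf.score(X_lifted, y))
```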
Linear Separating Hyperplanes
• The linear separating hyperplane is the key geometric entity that is at
the heart of the SVM. Informally, if we have a high-dimensional
feature space, then the linear hyperplane is an object one dimension
lower than this space that divides the feature space into two regions.
• This linear separating plane need not pass through the origin of our
feature space, i.e. it does not need to include the zero vector as an
entity within the plane. Such hyperplanes are known as affine.
• If we consider a real-valued p-dimensional feature space, known mathematically as ℝ^p, then our linear separating hyperplane is an affine (p−1)-dimensional space embedded within it.
• For the case of p=2 this hyperplane is simply a one-dimensional
straight line, which lives in the larger two-dimensional plane, whereas
for p=3 the hyperplane is a two-dimensional plane that lives in the
larger three-dimensional feature space.
Classification
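A standard way to set the problem up (notation assumed in what follows): given training observations x_i in ℝ^p with class labels y_i in {−1, +1}, a separating hyperplane and the resulting classifier can be written as

```latex
\[
b + w \cdot x \;=\; b + w_1 x^{(1)} + \cdots + w_p x^{(p)} \;=\; 0,
\qquad
f(x) \;=\; \operatorname{sign}\!\left(b + w \cdot x\right),
\]
```

so that points with b + w · x > 0 are assigned to class +1 and points with b + w · x < 0 to class −1.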
Deriving the Classifier
• Separating hyperplanes are not unique, since it is possible to slightly
translate or rotate such a plane without touching any training
observations.
• So, not only do we need to know how to construct such a plane, but we also need to determine the optimal one. This motivates the concept of the maximal margin hyperplane (MMH), which is the separating hyperplane that is farthest from any training observation and is thus "optimal".
• One of the key features of the maximal margin classifier (MMC), and subsequently of the SVC and SVM, is that the location of the MMH depends only on the support vectors, which are the training observations that lie directly on the margin boundary, but not on the hyperplane itself (see points A, B and C in the figure). This means that the location of the MMH is NOT dependent upon any other training observations.
Constructing the Maximal Margin Classifier
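The constraints referred to below can be sketched, in the standard maximal margin formulation and with the notation used above, as:

```latex
\[
\max_{w,\, b,\, M}\; M
\quad \text{subject to} \quad
\lVert w \rVert = 1,
\qquad
y_i \left( b + w \cdot x_i \right) \,\ge\, M,
\quad i = 1, \ldots, n.
\]
```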
• Despite the complex looking constraints, they actually state that each
observation must be on the correct side of the hyperplane and at least a
distance M from it. Since the goal of the procedure is to maximize M,
this is precisely the condition we need to create the MMC.
• Clearly, the case of perfect separability is an ideal one. Most "real world" datasets will not have such perfect separability via a linear hyperplane. However, if there is no separability then we are unable to construct an MMC by the optimization procedure above. So how do we create a form of separating hyperplane?
Support Vector Classifiers
• Essentially we have to relax the requirement that a separating
hyperplane will perfectly separate every training observation on the
correct side of the line (i.e. guarantee that it is associated with its true
class label), using what is called a soft margin. This motivates the
concept of a support vector classifier (SVC).
• MMCs can be extremely sensitive to the addition of new training
observations.
If we add one point to the +1 class of a dataset whose MMH perfectly separates the two classes, we see that the location of the MMH changes substantially. Hence, in this situation, the MMH has clearly been over-fit.
• We could consider a classifier based on a separating hyperplane that doesn't
perfectly separate the two classes, but does have a greater robustness to the
addition of new individual observations and has a better classification on most of
the training observations. This comes at the expense of some misclassification of
a few training observations.
• This is how a support vector classifier, or soft margin classifier, works. An SVC allows some observations to be on the incorrect side of the margin, or even on the incorrect side of the hyperplane itself, hence it provides a "soft" separation.
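The optimization problem being described is, in its standard soft-margin form (a sketch using the same notation as above, with slack variables ϵ_i):

```latex
\[
\max_{w,\, b,\, \epsilon_1, \ldots, \epsilon_n,\, M}\; M
\quad \text{subject to} \quad
\lVert w \rVert = 1,
\qquad
y_i \left( b + w \cdot x_i \right) \,\ge\, M (1 - \epsilon_i),
\qquad
\epsilon_i \ge 0,
\qquad
\sum_{i=1}^{n} \epsilon_i \le C,
\]
```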
where C, the budget, is a non-negative "tuning" parameter, M still represents the margin, and the slack variables ϵ_i allow individual observations to be on the wrong side of the margin or hyperplane.
• In essence, the ϵ_i tell us where the i-th observation is located relative to the margin and hyperplane. For ϵ_i = 0, the training observation x_i is on the correct side of the margin. For ϵ_i > 0, x_i is on the wrong side of the margin, while for ϵ_i > 1, x_i is on the wrong side of the hyperplane.
• C collectively controls how much the individual ϵ_i are allowed to violate the margin. C = 0 implies that ϵ_i = 0 for all i, and thus no violation of the margin is possible; in that case (for separable classes) we have the MMC situation.
• For C > 0, it means that no more than C observations can be on the wrong side of the hyperplane. As C increases, more violations are tolerated and the margin widens; as C decreases, the margin narrows.
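A minimal sketch of how margin softness is controlled in practice, assuming scikit-learn. Note that scikit-learn's C parameter is a penalty on margin violations, so it behaves inversely to the budget C described above: a small scikit-learn C tolerates many violations (a wide, soft margin), while a large one tolerates few.

```python
# Minimal sketch: the effect of the (penalty-style) C parameter in
# scikit-learn on a synthetic, partially overlapping data set.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for penalty in (0.01, 100.0):
    clf = SVC(kernel="linear", C=penalty).fit(X, y)
    # A softer margin typically leaves more points on or inside the margin,
    # i.e. more support vectors.
    print(f"C={penalty}: {len(clf.support_vectors_)} support vectors")
```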
• One way to handle non-separable data is to enlarge the feature space with non-linear transformations of the original features, such as quadratic polynomial terms. This is clearly not restricted to quadratic polynomials: higher-degree polynomials, interaction terms and other functional forms could all be considered. The drawback is that this dramatically increases the dimension of the feature space, to the point that some algorithms can become intractable.
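As a rough illustration of that blow-up, including every polynomial feature of degree at most d built from p inputs (constant term included) gives

```latex
\[
\binom{p + d}{d}
\]
```

features; for example, p = 100 and d = 3 already yields C(103, 3) = 176,851 features.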
SVM Kernel Functions
• SVM algorithms use a set of mathematical functions defined as kernels. The function of a kernel is to take data as input and transform it into the required form. Different SVM algorithms use different types of kernel functions: for example linear, non-linear, polynomial, radial basis function (RBF), and sigmoid.
• Kernel functions can be introduced for sequence data, graphs, text and images, as well as for vectors. The most widely used kernel is the RBF, because it has a localized and finite response across the entire range of the input.
• Kernel functions return the inner product between two points in a suitable feature space. They thus define a notion of similarity at little computational cost, even in very high-dimensional feature spaces.
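A minimal sketch of these kernels in use, assuming scikit-learn; the data set and parameter values are illustrative only:

```python
# Minimal sketch: the same SVM classifier fitted with different kernels.
# scikit-learn is assumed; data set and parameters are illustrative only.
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # The kernel computes inner products in an implicit feature space,
    # so the mapping itself is never constructed explicitly.
    clf = SVC(kernel=kernel, gamma="scale")
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:>7}: mean cross-validated accuracy = {score:.2f}")
```

On this ring-shaped data, the non-linear kernels (particularly the RBF) would be expected to do markedly better than the linear one.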