Support Vector Machines
Support Vector Machines (SVMs for short) are supervised machine learning algorithms used for
classification, regression, and outlier detection. An SVM classifier builds a model that assigns new
data points to one of the given categories, so it can be viewed as a non-probabilistic binary linear
classifier.
The original SVM algorithm was developed by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in
1963. At that time the algorithm was in its early stages: the only possibility was to draw separating
hyperplanes for a linear classifier. In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vladimir N.
Vapnik suggested a way to create non-linear classifiers by applying the kernel trick to maximum-margin
hyperplanes. The current standard (soft margin) formulation was proposed by Corinna Cortes and
Vapnik in 1993 and published in 1995.
SVMs can be used for linear classification. In addition, they can efficiently perform non-linear
classification using the kernel trick, which implicitly maps the inputs into high-dimensional
feature spaces.
Example
Let us consider two tags, yellow and blue, and suppose our data has two features, x and y. Given a
pair of (x, y) coordinates, we want a classifier that outputs either yellow or blue. We plot the
labeled training data on a plane:
An SVM takes these data points and outputs the hyperplane (which in two dimensions is simply a
line) that best separates the tags. This line is the decision boundary: anything falling on one side
of it is classified as yellow, and anything on the other side as blue.
For an SVM, the best hyperplane is the one that maximizes the margin to both tags, i.e. the
hyperplane whose distance to the nearest element of each tag is largest.
The above was easy since the data was linearly separable—a straight line can be drawn to separate
yellow and blue. However, in real scenarios, cases are usually not this simple. Consider the
following case:
There is no linear decision boundary. The vectors are, however, very clearly segregated, and it
seems as if it should be easy to separate them.
In this case, we add a third dimension. Up to now we have worked with two dimensions, x and y.
A new z dimension is introduced, defined in a way that is convenient for us: z = x² + y² (the
equation of a circle). Taking a slice of this three-dimensional space looks like this:
Note that since we are now in three dimensions, the hyperplane is a plane parallel to the x-y plane
at a particular value of z, say z = 1. Mapping this back to two dimensions:
There we go! The decision boundary is a circle of radius 1 that separates the two tags.
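The idea of lifting the data into a third dimension can be sketched in a few lines of Python. The dataset, library calls, and parameter values below are illustrative assumptions (scikit-learn's make_circles stands in for the two rings of yellow and blue points), not something specified in the text:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in (x, y) separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Lift to three dimensions with the convenient extra feature z = x^2 + y^2.
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X_3d = np.hstack([X, z])

# In the lifted space a plane (a linear SVM) separates the two rings,
# and that plane maps back to a circle in the original (x, y) plane.
clf = SVC(kernel="linear").fit(X_3d, y)
print("training accuracy:", clf.score(X_3d, y))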
How SVM works
In SVMs, our main objective is to select a hyperplane with the maximum possible margin between
the support vectors in the given dataset. The SVM searches for this maximum margin hyperplane in
the following two-step process (a code sketch follows the list):
1. Generate hyperplanes that segregate the classes in the best possible way. There are many
hyperplanes that might classify the data; we look for the one that represents the largest
separation, or margin, between the two classes.
2. Choose the hyperplane so that the distance from it to the support vectors on each side is
maximized. If such a hyperplane exists, it is known as the maximum margin hyperplane, and the
linear classifier it defines is known as a maximum margin classifier.
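As a concrete sketch of this two-step idea, the snippet below fits a hard-margin linear SVM with scikit-learn and reads off the hyperplane, the support vectors, and the resulting margin; the synthetic dataset and parameter values are assumptions chosen for illustration, not taken from the text:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters standing in for the two classes.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A very large C keeps the margin hard, so the fitted hyperplane is the
# maximum margin hyperplane for this separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]        # hyperplane: w . x + b = 0
print("w =", w, ", b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))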
Step-by-step discussion
Hyperplane
A hyperplane is a decision boundary that separates a given set of data points having different
class labels. The SVM classifier separates data points using the hyperplane with the maximum
margin. This hyperplane is known as the maximum margin hyperplane, and the linear classifier it
defines is known as the maximum margin classifier.
The decision surface separating the classes is a hyperplane of the form:
w^T x + b = 0
where
– w is the weight vector
– x is the input vector
– b is the bias
This allows us to write the decision rule as:
w^T x + b ≥ 0 for d_i = +1
w^T x + b < 0 for d_i = –1
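A tiny sketch of this decision rule, with arbitrary made-up values for w, b, and the input points:

import numpy as np

w = np.array([2.0, -1.0])    # weight vector (assumed values)
b = -0.5                     # bias (assumed value)

points = np.array([[1.0, 0.5],
                   [-1.0, 2.0]])

# d_i = +1 when w^T x + b >= 0, and d_i = -1 otherwise.
scores = points @ w + b
labels = np.where(scores >= 0, +1, -1)
print(scores)   # [ 1.  -4.5]
print(labels)   # [ 1 -1]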
Computing the margin
Here H0 denotes the decision boundary w•x + b = 0, and H1 and H2 denote the parallel hyperplanes
w•x + b = +1 and w•x + b = –1 that pass through the support vectors of each class.
Recall that the distance from a point (x0, y0) to a line Ax + By + c = 0 is
|Ax0 + By0 + c| / sqrt(A^2 + B^2). Therefore:
The distance between H0 and H1 is |w•x + b| / ||w|| = 1 / ||w||, so
the total distance between H1 and H2 is 2 / ||w||.
To maximize the margin, we therefore need to minimize ||w||, subject to the condition that there
are no data points between H1 and H2:
x_i•w + b ≥ +1 when y_i = +1
x_i•w + b ≤ –1 when y_i = –1
These two conditions can be combined into: y_i(x_i•w + b) ≥ 1
Maximum margin hyperplane
The problem is: minimize ||w|| subject to the discrimination constraint being obeyed, i.e.
min f(x) s.t. g(x) = 0, which we can rewrite as:
min f: ½ ||w||^2 (note that this is a quadratic function)
s.t. g: y_i(w•x_i + b) = 1, or [y_i(w•x_i + b)] – 1 = 0
This is a constrained optimization problem, and it can be solved by the Lagrange multiplier
method. Because the objective is quadratic, the surface is a paraboloid with a single global minimum.
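The sketch below checks the two facts used in this derivation on a fitted hard-margin linear SVM (the synthetic dataset is an assumption used only for illustration): the support vectors satisfy y_i(w•x_i + b) = 1, and the margin width equals 2/||w||:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
y_signed = np.where(y == 1, 1, -1)          # relabel the classes as +1 / -1

clf = SVC(kernel="linear", C=1e6).fit(X, y_signed)
w, b = clf.coef_[0], clf.intercept_[0]

sv = clf.support_vectors_                   # the points lying on H1 and H2
sv_labels = y_signed[clf.support_]

# On H1 and H2 the constraint is active: y_i (w . x_i + b) = 1.
print(sv_labels * (sv @ w + b))             # all entries approximately 1.0

# The total distance between H1 and H2 is 2 / ||w||.
print("margin width:", 2 / np.linalg.norm(w))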
Kernel functions
In practice, the SVM algorithm is implemented using a kernel, via a technique called the kernel
trick. In simple words, a kernel is just a function that maps the data to a higher dimension in
which it becomes separable. A kernel transforms a low-dimensional input space into a
higher-dimensional space, and so converts non-linearly separable problems into linearly separable
ones by adding more dimensions. The kernel trick thus helps us build a more accurate classifier
and is useful for non-linear separation problems.
The most widely used kernels in SVMs are the linear kernel, the polynomial kernel, and the
Gaussian (radial basis function) kernel. The choice of kernel depends on the nature of the data
and the task at hand. The linear kernel is used when the data is roughly linearly separable, the
polynomial kernel when the data has a complicated curved boundary, and the Gaussian kernel when
the data has no clear boundaries and contains complicated areas of overlap.
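A quick way to see this in practice is to cross-validate the same SVM with different kernels; the dataset and settings below are assumptions chosen for illustration (a two-ring dataset with a curved boundary, where the linear kernel should do poorly):

from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(kernel, "mean accuracy:", round(scores.mean(), 3))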
In each case, applying the kernel trick transforms the data into a higher-dimensional feature
space in which it becomes linearly separable. The individual kernels are described below.
Linear
The linear kernel works really well when there are a lot of features, and text classification
problems have a lot of features. Linear kernel functions are faster than most of the others and you
have fewer parameters to optimize.
f(X) = w^T * X + b
In this equation, w is the weight vector that you want to minimize, X is the data that you're
trying to classify, and b is the bias (intercept) estimated from the training data. This equation
defines the decision boundary that the SVM returns.
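For a linear-kernel SVM fitted with scikit-learn, w and b correspond to the coef_ and intercept_ attributes. The short check below, on an assumed synthetic dataset, confirms that the decision values really are w^T X + b:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=120, centers=2, random_state=7)
clf = SVC(kernel="linear").fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]

# f(X) = w^T X + b reproduces the classifier's decision function.
manual = X @ w + b
print(np.allclose(manual, clf.decision_function(X)))   # True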
Polynomial
The polynomial kernel isn't used in practice very often because it isn't as computationally efficient
as other kernels and its predictions aren't as accurate.
One of the simpler polynomial kernel equations you can use is:
f(X1, X2) = (X1 · X2 + 1)^d
Here f(X1, X2) is the kernel value used to build the polynomial decision boundary that will
separate your data, X1 and X2 are data points, and d is the degree of the polynomial.
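As a sanity check, this kernel value can be computed by hand and compared with scikit-learn's pairwise helper; the data points, degree, gamma, and coef0 values are arbitrary assumptions for the example:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X1 = np.array([[1.0, 2.0]])
X2 = np.array([[0.5, -1.0]])
degree, gamma, coef0 = 2, 1.0, 1.0    # gamma=1, coef0=1 gives (X1 . X2 + 1)^d

# General polynomial kernel: (gamma * X1 . X2 + coef0) ** degree
manual = (gamma * X1 @ X2.T + coef0) ** degree
print(manual)
print(polynomial_kernel(X1, X2, degree=degree, gamma=gamma, coef0=coef0))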
Gaussian radial basis function (RBF)
One of the most powerful and commonly used kernels in SVMs, and usually the choice for non-linear
data. Its equation is:
f(X1, X2) = exp(-gamma * ||X1 - X2||^2)
In this equation, gamma specifies how much influence a single training point has on the data
points around it, and ||X1 - X2|| is the Euclidean distance between your feature vectors.
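The same kind of hand check works for the Gaussian kernel; the points and the gamma value are assumptions for illustration:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X1 = np.array([[1.0, 2.0]])
X2 = np.array([[0.5, -1.0]])
gamma = 0.5

# Gaussian (RBF) kernel: exp(-gamma * ||X1 - X2||^2)
manual = np.exp(-gamma * np.sum((X1 - X2) ** 2))
print(manual)
print(rbf_kernel(X1, X2, gamma=gamma))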
Sigmoid
More useful in neural networks than in support vector machines, but there are occasional specific
use cases. A common form of its equation is:
f(X1, X2) = tanh(alpha * X1 · X2 + C)
In this function, alpha is a slope (scaling) parameter and C is an offset applied inside the tanh.
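A hand check of the sigmoid kernel against scikit-learn's pairwise helper (the points, alpha, and C are assumed example values; the helper calls these parameters gamma and coef0):

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X1 = np.array([[1.0, 2.0]])
X2 = np.array([[0.5, -1.0]])
alpha, C = 0.5, 1.0

# Sigmoid kernel: tanh(alpha * X1 . X2 + C)
manual = np.tanh(alpha * X1 @ X2.T + C)
print(manual)
print(sigmoid_kernel(X1, X2, gamma=alpha, coef0=C))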
Assumptions
Support Vector Machines (SVMs) have certain assumptions and properties that are important to
understand when using them:
Linear Separability: The primary assumption of SVM is that the data is or can be transformed into
a linearly separable space. In other words, there exists a hyperplane that can distinctly separate
the classes.
Margin Maximization: SVM aims to find the hyperplane that maximizes the margin between
classes. This assumes that a larger margin contributes to better generalization and improved
performance.
Noisy Data Handling: SVMs are sensitive to noisy data and outliers, as these may influence the
position and orientation of the decision boundary. Outliers can have a significant impact on the
resulting hyperplane.
Kernel Function Choice: The choice of the kernel function (linear, polynomial, radial basis
function) and its parameters can affect the performance of SVM. The appropriate kernel and
parameters depend on the characteristics of the data.
Memory Efficiency: SVMs are memory-efficient due to the use of a subset of training points
(support vectors) in decision-making. This can be an advantage in terms of memory usage, but it
also assumes that these support vectors are representative of the entire dataset.
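This support vector property is easy to observe on a fitted model; the synthetic dataset below is an assumption used only to show that far fewer points than the full training set are retained:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=500, centers=2, cluster_std=1.5, random_state=1)
clf = SVC(kernel="rbf").fit(X, y)

# Only the support vectors are needed at prediction time.
print("training points:", len(X))
print("support vectors per class:", clf.n_support_)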
Strengths:
Robust to Overfitting: SVMs are less prone to overfitting, especially in high-dimensional spaces,
because the margin maximization objective penalizes only points that fall inside or on the wrong
side of the margin.
Effective in Cases with Clear Margin of Separation: SVMs work well when there is a clear margin
of separation between classes, making them suitable for tasks with distinct and well-separated
classes.
Kernel Trick for Non-Linear Data: The kernel trick allows SVMs to handle non-linear decision
boundaries by implicitly mapping data into higher-dimensional spaces.
Versatile Kernels: SVMs support different kernel functions, providing flexibility in capturing
different types of relationships in the data.
Memory Efficiency: SVMs use a subset of training points (support vectors) in decision-making,
making them memory-efficient, especially when dealing with large datasets.
Weaknesses:
Sensitivity to Noise and Outliers: SVMs can be sensitive to noise and outliers, as they may
influence the position and orientation of the decision boundary.
Difficulty in Handling Large Datasets: SVMs can become computationally expensive and memory-
intensive, particularly with large datasets.
Choice of Kernel and Parameters: The choice of the appropriate kernel and tuning of
hyperparameters can be challenging, and the performance may be sensitive to these choices.
Limited Interpretability: The decision function of SVMs is not easily interpretable, making it
challenging to understand the contribution of each feature to the final decision.
Not Suitable for Imbalanced Datasets: SVMs may not perform well on highly imbalanced datasets
where one class significantly outnumbers the other.
Binary Classification: SVMs are inherently binary classifiers, and extensions to handle multiclass
problems may require strategies like one-vs-one or one-vs-all.
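As a sketch of these strategies (the iris dataset is used here purely as an assumed three-class example), scikit-learn's SVC applies a one-vs-one scheme internally, and a one-vs-rest wrapper is available when that scheme is preferred:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# SVC handles the 3-class problem with one-vs-one classifiers under the hood.
ovo = SVC(kernel="rbf").fit(X, y)

# An explicit one-vs-rest wrapper trains one binary SVM per class instead.
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

print("one-vs-one accuracy:", ovo.score(X, y))
print("one-vs-rest accuracy:", ovr.score(X, y))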