
Support Vector Machines

Support Vector Machines (SVM)


■ SVMs were introduced by Vladimir Vapnik (Vapnik, 1995). The technique is based on statistical learning theory.

■ The main objective in SVM is to find the hyperplane which separates the d-dimensional data points perfectly into two classes.

■ However, since example data is often not linearly separable, SVMs introduce the notion of a "kernel-induced feature space", which casts the data points (input space) into a higher-dimensional feature space where the data is separable.

■ SVM works well with high-dimensional data and thus avoids the dimensionality problem.

■ The higher-dimensional space does not need to be dealt with directly, which helps control overfitting.
Support Vector Machines (SVM)

We discuss
 A classification technique for training data that are linearly separable, known as a Linear SVM.
 A classification technique for training data that are linearly non-separable, known as a Non-linear SVM.
 The maximum margin of the hyperplane, which is also a key point.
Support Vector Machines - Linear classifier

■ Classification techniques that are based on drawing separating lines to distinguish between objects of different class labels are known as hyperplane classifiers.

■ A decision plane is one that separates a set of objects having different class labels.

■ Any new object falling to the right of the separating line is labeled, i.e., classified, as GREEN (or classified as RED should it fall to the left).

■ The objects closest to the hyperplane are called support vectors.


Linear separability

Figure: two 2-D data sets, one linearly separable and one not linearly separable.
Input Space to Feature Space

• The original objects are transformed using a set of mathematical functions, known as kernels.
• Instead of constructing a complex curve, we find an optimal line that can separate the objects.
Linear Classifiers

We are given l training examples {xi, yi}, i = 1, …, l, where each example has d inputs (xi ∈ Rd) and a class label with one of two values (yi ∈ {−1, +1}).

• All hyperplanes in Rd are parameterized by a vector w and a constant b, expressed using the equation w · x + b = 0, where w is the vector orthogonal to the hyperplane.
• Given such a hyperplane (w, b) that separates the data, we classify using the function f(x) = sign(w · x + b).
Linear Classifiers

f(x, w, b) = sign(w · x + b)
• points with w · x + b > 0 are classified as +1
• points with w · x + b < 0 are classified as −1

How would you classify this data?
Finding a hyperplane

In matrix form, a hyperplane thus can be represented as

W · X + b = 0    (4)

where W = [w1, w2, …, wm], X = [x1, x2, …, xm], and b is a real constant.

Here, W and b are parameters of the classifier model to be evaluated given a training set D.
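To make the classifier concrete, here is a minimal NumPy sketch of evaluating sign(W · X + b); the weight and bias values are made up purely for illustration and are not part of the original notes.

```python
import numpy as np

def predict(X, w, b):
    """Classify the rows of X with the hyperplane w.x + b = 0: returns +1 or -1."""
    return np.sign(X @ w + b)

# Hypothetical parameters of a 2-D hyperplane (illustration only).
w = np.array([1.0, -2.0])
b = 0.5

X = np.array([[3.0, 1.0],    # w.x + b = 3 - 2 + 0.5 = +1.5 -> +1
              [0.0, 2.0]])   # w.x + b = 0 - 4 + 0.5 = -3.5 -> -1
print(predict(X, w, b))      # [ 1. -1.]
```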
Hyperplane and Classification

W · X + b = 0; the equation can be explained as follows.

W represents the orientation and b is the intercept of the hyperplane from the origin.

If both W and b are scaled (up or down) by a non-zero constant, we get the same hyperplane.

This means there can be an infinite number of solutions using various scaling factors, all of them geometrically representing the same hyperplane.



Max Margin of Linear Classifiers
Two problems may arise:
 Are all hyperplanes equivalent as far as the classification of the data is concerned?
 If not, which hyperplane is the best?

 Regarding classification error on the training data, all of them have zero error.
 But there is no guarantee that all hyperplanes perform equally well on unseen (i.e., test) data.

 For a good classifier, choose the one of the infinitely many hyperplanes that performs well not only on the training data but also on the test data.
Hyperplanes with decision boundaries and their margins.
Maximum Margin Hyperplane

Two hyperplanes H1 and H2 have decision boundaries (denoted as b11 and b12 for H1, and b21 and b22 for H2).

A decision boundary is a boundary which is parallel to the hyperplane and touches the closest class on one side of the hyperplane.

The distance between the two decision boundaries of a hyperplane is called the margin. If data are classified using hyperplane H1, the margin is larger than with hyperplane H2.

The larger the margin, the lower the classification error.


The Perceptron Classifier
Given linearly separable data xi labelled into two categories yi ∈ {−1, +1}, find a weight vector w such that the discriminant function

f(xi) = wT xi + b

separates the categories for i = 1, …, N.
• How can we find this separating hyperplane?

The Perceptron Algorithm

Write the classifier as f(xi) = w̃T x̃i + w0 = wT xi, where w = (w̃, w0) and xi = (x̃i, 1).

• Initialize w = 0
• Cycle through the data points {xi, yi}
  • if xi is misclassified then w ← w + α yi xi
• Until all the data is correctly classified
For example, in 2D, the same update is applied whenever a point xi is misclassified.

Figure: the weight vector w before and after an update on a misclassified point xi (axes x1 and x2).

Perceptron example

Figure: the separating line found by the perceptron on a 2-D data set.

• If the data is linearly separable, then the algorithm will converge.
• Convergence can be slow …
• The separating line ends up close to the training data.
• We would prefer a larger margin for generalization.
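The loop above is easy to turn into code. Below is a minimal NumPy sketch (not the lecture's own code) using the standard update w ← w + α yi xi for a misclassified point, with each xi augmented by a constant 1 so that the bias w0 is learned as part of w.

```python
import numpy as np

def perceptron(X, y, alpha=1.0, max_epochs=100):
    """Train a perceptron on linearly separable data.
    X: (n, d) inputs, y: (n,) labels in {-1, +1}. Returns augmented weights (w~, w0)."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append 1 so the bias is part of w
    w = np.zeros(Xa.shape[1])                      # initialize w = 0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xa, y):                  # cycle through the data points
            if yi * np.dot(w, xi) <= 0:            # xi is misclassified (or on the boundary)
                w = w + alpha * yi * xi            # update: w <- w + alpha * yi * xi
                errors += 1
        if errors == 0:                            # stop once everything is classified correctly
            break
    return w

# Tiny separable toy set (made up for illustration).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(np.sign(X @ w[:2] + w[2]))  # reproduces y on this separable data
```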
Hyperplane Classifier

■ To avoid such ambiguity:

■ A given hyperplane represented by (w, b) is equally expressed by all pairs {λw, λb} for λ ∈ R+.

■ We therefore fix the scaling of the hyperplane so that the closest example(s) on each side lie at a "distance" of exactly 1.

■ That is, we consider those that satisfy:
  ■ w · xi + b ≥ +1 when yi = +1
  ■ w · xi + b ≤ −1 when yi = −1
i.e., yi (w · xi + b) ≥ 1 ∀i

• To obtain the geometric distance from the hyperplane to a data point, we normalize by the magnitude of w:
  d((w, b), xi) = yi (w · xi + b) / ||w|| ≥ 1 / ||w||
• We want the hyperplane that maximizes the geometric distance to the closest data points.

Choosing the hyperplane that maximizes the margin


Maximum Margin

f(x, w, b) = sign(w · x + b)

1. Maximizing the margin
2. Support vectors are important

The maximum margin linear classifier, defined by its support vectors, is the simplest kind of SVM (called an LSVM).

Linear SVM
Linear SVM for Linearly Not Separable Data

A linear SVM can be refitted to learn a hyperplane that is tolerant of a small number of non-separable training points.

The approach of refitting is called the soft margin approach (hence the SVM is called a Soft Margin SVM); it introduces slack variables for the non-separable cases.

More specifically, the soft margin SVM keeps a linear hyperplane (i.e., linear decision boundaries) even in situations where the classes are not linearly separable.
Linear SVM Mathematically
The margin width is M. The two margin hyperplanes (through x+ and x−) are parallel (they have the same normal w) and no training points fall between them.

What we know:
■ w · x+ + b = +1
■ w · x− + b = −1
■ w · (x+ − x−) = 2, and hence the margin width M = 2 / ||w||
Linear SVM Mathematically
■ Goal: 1) Correctly classify all training data:
w · xi + b ≥ +1 if yi = +1
w · xi + b ≤ −1 if yi = −1
i.e., yi (w · xi + b) ≥ 1 for all i
2) Maximize the margin M = 2 / ||w||, which is the same as minimizing ½ wTw.

■ We can formulate a constrained optimization problem and solve for w and b:

■ Minimize Φ(w) = ½ wTw
subject to yi (w · xi + b) ≥ 1 for all i
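This quadratic program can be handed to an off-the-shelf solver. As a sketch (an added illustration, not the notes' own code), scikit-learn's SVC with a linear kernel and a very large C approximates the hard-margin formulation above; the toy data set is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.5, 2.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C penalizes any slack, approximating the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width M = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```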
Searching for MMH

This constrained optimization problem is popularly known as a convex optimization problem, where the objective function is quadratic and the constraints are linear in the parameters W and b.

A well-known technique to solve a convex optimization problem is the standard Lagrange multiplier method.

First, we shall review the Lagrange multiplier method, and then come back to solving our own SVM problem.



Lagrange Multipliers
• Consider a problem: minx f(x) subject to h(x) ≥ 0

• We define the Lagrangian L(x, α) = f(x) − α h(x)

• α is called a "Lagrange multiplier"

• Solve: minx maxα L(x, α) subject to α ≥ 0

Original Problem:
Find w and b such that
Φ(w) = ½ wTw is minimized,
and for all i, {(xi, yi)}: yi (wTxi + b) ≥ 1

Construct the Lagrangian function for the optimization:

L(w, b, α) = ½ wTw − Σi αi [ yi (wTxi + b) − 1 ],   s.t. αi ≥ 0 ∀i

Our goal is to minimize L over (w, b) and maximize it over α (or, equivalently, solve the dual).

Setting the derivative with respect to w to zero gives w = Σi αi yi xi; the derivative with respect to b gives Σi αi yi = 0.

Substituting, we get the dual problem:

maxα: Σi αi − ½ Σi Σj αi αj yi yj xiTxj

subject to Σi αi yi = 0 and αi ≥ 0 (or 0 ≤ αi ≤ C in the soft-margin case)

α is the vector of m non-negative Lagrange multipliers to be determined, and C is a constant.
Optimal hyperplane:

• The vector w is just a linear combination of the training examples.

• If we have found the αi's, then in order to make a prediction we only have to calculate a quantity that depends on the inner product between x and the points in the training set.
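To make the "only inner products are needed" point concrete, here is a small NumPy sketch of the decision function f(x) = Σi αi yi (xi · x) + b computed from the support vectors alone; the α values, support vectors and b below are hypothetical, chosen only to show the computation.

```python
import numpy as np

def dual_decision(x, X_sv, y_sv, alpha_sv, b):
    """f(x) = sum_i alpha_i * y_i * <x_i, x> + b, using only the support vectors
    (the training points with non-zero alpha)."""
    return np.sum(alpha_sv * y_sv * (X_sv @ x)) + b

# Hypothetical support vectors, labels and multipliers (illustration only).
X_sv = np.array([[1.0, 2.0], [2.0, 1.0]])
y_sv = np.array([+1, -1])
alpha_sv = np.array([0.5, 0.5])
b = 0.1

x_new = np.array([0.0, 3.0])
print(np.sign(dual_decision(x_new, X_sv, y_sv, alpha_sv, b)))  # +1 for this example
```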


Illustration : Linear SVM
Consider a binary classification problem with a training set of 8 tuples, as shown in Table 1.
Using quadratic programming, we can solve the KKT constraints to obtain the Lagrange multipliers λi for each training tuple, which are shown in Table 1.
Note that only the first two tuples are support vectors in this case.
Let W = (w1, w2) and b denote the parameters to be determined. We can solve for w1 and w2 as follows:

w1 = Σi λi · yi · xi1 = 65.52 × 1 × 0.38 + 65.52 × (−1) × 0.49 = −6.64

w2 = Σi λi · yi · xi2 = 65.52 × 1 × 0.47 + 65.52 × (−1) × 0.61 = −9.32
Illustration : Linear SVM

Table 1: Training Data

x1 x2 y λ
0.38 0.47 + 65.52
0.49 0.61 - 65.52
0.92 0.41 - 0
0.74 0.89 - 0
0.18 0.58 + 0
0.41 0.35 + 0
0.93 0.81 - 0
0.21 0.10 + 0



Illustration : Linear SVM

Linear SVM example.



Illustration : Linear SVM
The parameter b can be calculated for each support vector as follows:
b1 = 1 − W · x1                              // for support vector x1 (y1 = +1)
   = 1 − (−6.64) × 0.38 − (−9.32) × 0.47     // using the dot product
   = 7.93
b2 = −1 − W · x2                             // for support vector x2 (y2 = −1)
   = −1 − (−6.64) × 0.49 − (−9.32) × 0.61    // using the dot product
   = 7.93

Averaging these values of b1 and b2, we get b = 7.93.



Illustration : Linear SVM

Thus, the MMH is −6.64·x1 − 9.32·x2 + 7.93 = 0 (also see Fig. 6).
Suppose the test data is X = (0.5, 0.5). Then
δ(X) = W · X + b
     = −6.64 × 0.5 − 9.32 × 0.5 + 7.93
     = −0.05,
which is negative.

This implies that the test data falls on or below the MMH, and the SVM classifies X as belonging to class label −.
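The arithmetic of this illustration is easy to check with a few lines of NumPy. Since the table values are rounded, W and b are taken as reported above rather than recomputed from scratch.

```python
import numpy as np

W = np.array([-6.64, -9.32])   # weights from the worked example
b = 7.93                       # bias from the worked example

X_test = np.array([0.5, 0.5])
delta = W @ X_test + b
print(round(float(delta), 2))               # -0.05
print("class:", "+" if delta > 0 else "-")  # "-", as in the text
```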
Dataset with noise

■ Hard Margin: so far we have required that all data points be classified correctly (no training error).
■ What if the training set is noisy?
- Solution 1: use very powerful kernels, but this leads to OVERFITTING!
Soft Margin Classification
Slack variables ξi can be added to allow misclassification of difficult or noisy examples.

What should our quadratic optimization criterion be?
Minimize ½ wTw + λ Σi ξi
Hard Margin vs. Soft Margin
■ The old formulation:
Find w and b such that
Φ(w) = ½ wTw is minimized, and for all {(xi, yi)}:
yi (wTxi + b) ≥ 1

■ The new formulation incorporating slack variables:
Find w and b such that
Φ(w) = ½ wTw + λ Σi ξi is minimized, and for all {(xi, yi)}:
yi (wTxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i

■ The parameter λ can be viewed as a way to control overfitting.
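To connect the slack variables with something computable: at the optimum each ξi equals the hinge loss max(0, 1 − yi(wTxi + b)), so the soft-margin objective can be evaluated directly for any candidate (w, b). A small NumPy sketch follows; the weights and data are made up for illustration.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, lam=1.0):
    """0.5*||w||^2 + lam * sum_i xi_i, where xi_i = max(0, 1 - y_i*(w.x_i + b))."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))   # slack (hinge loss) per example
    return 0.5 * np.dot(w, w) + lam * np.sum(slack)

# Hypothetical weights and a slightly noisy toy set (illustration only).
w, b = np.array([1.0, 1.0]), -1.0
X = np.array([[2.0, 1.0], [0.5, 0.2], [-1.0, -1.0]])
y = np.array([1, 1, -1])

print(soft_margin_objective(w, b, X, y, lam=1.0))  # 1.0 + 1.3 = 2.3
```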
Linear SVMs: Overview
■ The classifier is a separating hyperplane.
■ Most “important” training points are support vectors; they
define the hyperplane.
■ Quadratic optimization algorithms can identify which training
points xi are support vectors with non-zero Lagrangian
multipliers αi.
■ Both in the dual formulation of the problem and in the solution
training points appear only inside dot products:

Find α1…αN such that
Q(α) = Σ αi − ½ ΣΣ αi αj yi yj xiTxj is maximized and
(1) Σ αi yi = 0
(2) 0 ≤ αi ≤ C for all αi

f(x) = Σ αi yi xiTx + b
Learned model

f(x) = wTx + b

(Slide from Deva Ramanan)
Non-linear SVMs
■ Datasets that are linearly separable with some noise work out great.

■ But what are we going to do if the dataset is just too hard?

■ How about… mapping the data to a higher-dimensional space (e.g., from x to (x, x²))?
Non-Linear SVM
To understand this, note that a linear hyperplane is expressed as a linear equation in terms of the n-dimensional components, whereas a non-linear hypersurface is a non-linear expression.

Figure 13: 2D view of a few class separabilities.

Non-Linear SVM
A hyperplane is expressed as
Linear: w1x1 + w2x2 + w3x3 + c = 0    (30)

whereas a non-linear hypersurface is expressed as
Nonlinear: w1x1² + w2x2² + w3x1x2 + w4x3² + w5x1x3 + c = 0    (31)

The task therefore becomes finding a non-linear decision boundary, that is, a non-linear hypersurface in the input space containing the linearly non-separable data.
This task is in fact neither hard nor complex, and fortunately it can be accomplished by extending the formulation of the linear SVM we have already learned.

Non-linear SVMs: Feature spaces
■ General idea: the original input space (with non-linearly separable data) can always be mapped to some higher-dimensional feature space where the training set is linearly separable:

Φ: x → φ(x)
Concept of Non-Linear Mapping
Example: Non-linear mapping to linear SVM
The figure below shows an example of a 2-D data set consisting of class label +1 (shown as +) and class label −1 (shown as −).

Figure: Non-linear mapping to Linear SVM.


Concept of Non-Linear Mapping

Example: Non-linear mapping to linear SVM


We see that all instances of class −1 can be separated from instances of class +1 by a circle. The following equation of the decision boundary can be used:

X(x1, x2) = +1 if √((x1 − 0.5)² + (x2 − 0.5)²) > 0.2

X(x1, x2) = −1 otherwise


Concept of Non-Linear Mapping
Example: Non-linear mapping to linear SVM
With the mapping z1 = x1² − x1 and z2 = x2² − x2, the data plotted in the Z space are separable by the linear boundary

Z: z1 + z2 = −0.46

(since (x1 − 0.5)² + (x2 − 0.5)² = 0.04 is equivalent to x1² − x1 + x2² − x2 = −0.46).

Figure: Non-linear mapping to Linear SVM.
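A short NumPy sketch of this mapping (an illustration added here, not the original figure's code): applying z1 = x1² − x1 and z2 = x2² − x2 to points labelled by the circular rule makes the linear boundary z1 + z2 = −0.46 reproduce the labels exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))                  # points in the unit square
r = np.sqrt((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2)  # distance from (0.5, 0.5)
y = np.where(r > 0.2, 1, -1)                              # labels from the circular rule

# Non-linear mapping to the Z space: z1 = x1^2 - x1, z2 = x2^2 - x2.
Z = np.column_stack([X[:, 0] ** 2 - X[:, 0], X[:, 1] ** 2 - X[:, 1]])

# In the Z space the decision boundary is the line z1 + z2 = -0.46.
pred = np.where(Z[:, 0] + Z[:, 1] > -0.46, 1, -1)
print("agreement with the circular rule:", np.mean(pred == y))  # 1.0
```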



Non-Linear to Linear Transformation: Issues

The non-linear mapping, and hence the idea of a linear decision boundary in the new space, looks simple, but there are several potential problems in doing so.

1 Mapping: how do we choose the non-linear mapping to a higher-dimensional space?
In fact, the φ-transformation works fine for small examples, but it becomes impractical for realistically sized problems.

2 Cost of mapping: for N-dimensional input instances there exist

NH = (N + d − 1)! / (d! (N − 1)!)

different monomials comprising a feature space of dimensionality NH. Here, d is the maximum degree of a monomial.



Non-Linear to Linear Transformation: Issues

3 Dimensionality problem: it may suffer from the curse of dimensionality often associated with high-dimensional data.
More specifically, in the calculation of W·X or Xi·X (in δ(X), see Eqn. 18), we need n multiplications and n additions (in their dot products) for each of the n-dimensional input instances and support vectors, respectively.
As the numbers of input instances and support vectors are enormously large, this is computationally expensive.

4 Computational cost: solving the quadratic constrained optimization problem in the high-dimensional feature space is also a computationally expensive task.



Non-Linear to Linear Transformation: Issues

Fortunately, mathematicians have cleverly proposed an elegant solution to the above problems.

Their solution consists of the following:

1 Dual formulation of the optimization problem
2 Kernel trick



Mapping the Inputs to other dimensions - the
use of Kernels
• Finding the optimal curve to fit the data is difficult.
• To “pre-process" the data in such a way that the problem is
transformed into one of finding a simple hyperplane.
•We define a mapping z = φ(x) that transforms the d-dimensional
input vector x into a (usually higher) d*-dimensional vector z.
• We choose φ() so that the new training data {φ(xi), yi} is separable by a hyperplane.
• How do we go about choosing φ()?
Kernel Trick

Example: SVM classifier for non-linear data


Suppose there is a set of data in R² (i.e., in 2-D space), and φ is the mapping from X ∈ R² to Z(z1, z2, z3) ∈ R³, the 3-D space.

In R²: X(x1, x2)

In R³: Z(z1, z2, z3)



Kernel Trick

Example: SVM classifier for non-linear data

φ(X) ⇒ Z with

z1 = x1², z2 = √2·x1x2, z3 = x2²

The hyperplane in R² is of the form

w1x1² + w2√2·x1x2 + w3x2² = 0,

which is the equation of an ellipse in 2D.



Kernel Trick
Example: SVM classifier for non-linear data
After the transformation, φ as mentioned above, we have the
decision boundary of the form

w1z1 + w2z2 + w3z3 = 0

This is clearly a linear form in 3-D space. In other words, W·x + b = 0 in R² has a mapped equivalent W′·z + b′ = 0 in R³.

This means that data which are not linearly separable in 2-D become separable in 3-D; that is, non-linear data can be classified by a linear SVM classifier.



Kernel Trick
The above can be generalized as follows.

Classifier:

δ(x) = Σi=1..n λi yi (xi · x) + b

δ(z) = Σi=1..n λi yi (φ(xi) · φ(x)) + b

Learning:

Maximize Σi λi − ½ Σi,j λi λj yi yj (xi · xj)

Maximize Σi λi − ½ Σi,j λi λj yi yj (φ(xi) · φ(xj))

Subject to: λi ≥ 0, Σi λi yi = 0



Kernel Trick
Now the question is how to choose φ, the mapping function X ⇒ Z, so that a linear SVM can be directly applied.

A breakthrough solution to this problem comes in the form of a method known as the kernel trick.

We discuss the kernel trick in the following.

We know that the dot product (·) is often regarded as a measure of similarity between two input vectors.
For example, if X and Y are two vectors, then
X · Y = |X||Y| cos θ

Here, the similarity between X and Y is measured as cosine similarity.

If θ = 0 (i.e., cos θ = 1), then they are most similar; if they are orthogonal (cos θ = 0), they are dissimilar.
Kernel Trick
Analogously, if Xi and Xj are two tuples, then Xi · Xj is regarded as a measure of similarity between Xi and Xj.

Again, φ(Xi) and φ(Xj) are the transformed features of Xi and Xj, respectively, in the transformed space; thus, φ(Xi) · φ(Xj) should also be regarded as a similarity measure between φ(Xi) and φ(Xj) in the transformed space.

This is the basic idea behind the kernel trick.

Now the question naturally arises: if both measure similarity, what is the relationship between them (i.e., between Xi · Xj and φ(Xi) · φ(Xj))?

Let us try to find the answer to this question through an example.



Kernel Trick
Example: Correlation between Xi .Xj and φ(Xi ).φ(Xj )
Without any loss of generality, let us consider the situation stated below:

φ: R² ⇒ R³ with z1 = x1², z2 = √2·x1x2, z3 = x2²    (40)

Suppose Xi = [xi1, xi2] and Xj = [xj1, xj2] are any two vectors in R².

Similarly, φ(Xi) = [xi1², √2·xi1xi2, xi2²] and φ(Xj) = [xj1², √2·xj1xj2, xj2²] are the transformed versions of Xi and Xj, but in R³.



Kernel Trick
Example: Correlation between Xi .Xj and φ(Xi ).φ(Xj )
Now,

φ(Xi) · φ(Xj) = [xi1², √2·xi1xi2, xi2²] · [xj1², √2·xj1xj2, xj2²]
= xi1²·xj1² + 2·xi1xi2·xj1xj2 + xi2²·xj2²
= (xi1·xj1 + xi2·xj2)²
= (Xi · Xj)²
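A quick numerical check of this identity for two arbitrary vectors (a sketch added for illustration, not part of the original notes):

```python
import numpy as np

def phi(x):
    """Explicit mapping R^2 -> R^3: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

Xi = np.array([1.0, 2.0])
Xj = np.array([3.0, -1.0])

lhs = phi(Xi) @ phi(Xj)   # dot product in the transformed space
rhs = (Xi @ Xj) ** 2      # squared dot product in the original space
print(lhs, rhs)           # both equal 1.0
```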



Kernel Trick : Correlation between Xi.Xj and φ(Xi ).φ(Xj )

With reference to the above example, we can conclude that φ(Xi) · φ(Xj) is directly related to Xi · Xj.
In fact, the same can be proved in general, for any feature vectors and their transformed feature vectors.
More specifically, there is a correlation between dot products of the original data and of the transformed data.
Based on the above discussion, we can write the following implication:

Xi · Xj ⇒ φ(Xi) · φ(Xj) ⇒ K(Xi, Xj)    (41)

Here, K(Xi, Xj) denotes a function more popularly called a kernel function.



Kernel Trick : Significance

Computational efficiency:
Another important significance is easy and efficient computability.

We know that the discussed SVM classifier needs several repeated rounds of dot-product computation, both in the learning phase and in the classification phase.

On the other hand, using the kernel trick, we can do it once and with fewer dot products.



The “Kernel Trick”
■ The linear classifier relies on a dot product between vectors: K(xi, xj) = xiTxj
■ If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xi, xj) = φ(xi)Tφ(xj)
■ A kernel function is some function that corresponds to an inner product in some expanded feature space.
■ Example:
  2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiTxj)².
  Need to show that K(xi, xj) = φ(xi)Tφ(xj):
  K(xi, xj) = (1 + xiTxj)²
            = 1 + xi1²xj1² + 2·xi1xj1·xi2xj2 + xi2²xj2² + 2·xi1xj1 + 2·xi2xj2
            = [1  xi1²  √2·xi1xi2  xi2²  √2·xi1  √2·xi2]T [1  xj1²  √2·xj1xj2  xj2²  √2·xj1  √2·xj2]
            = φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2·x1x2  x2²  √2·x1  √2·x2]
Non-linear SVMs Mathematically
■ Dual problem formulation:
Find α1…αN such that
Q(α) = Σ αi − ½ ΣΣ αi αj yi yj K(xi, xj) is maximized and
(1) Σ αi yi = 0
(2) αi ≥ 0 for all αi

■ The solution is:

f(x) = Σ αi yi K(xi, x) + b

■ Optimization techniques for finding the αi's remain the same!
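As a sketch of the complete non-linear pipeline (using scikit-learn as a convenience; this is an added illustration, not the notes' own code), an RBF-kernel SVM solves exactly this dual on data that no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The dual problem is solved with K(xi, xj) = exp(-gamma * ||xi - xj||^2).
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```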


Examples of Kernel Functions
■ Linear: K(xi, xj) = xiTxj

■ Polynomial of power p: K(xi, xj) = (1 + xiTxj)^p
  Produces large dot products. The power p is specified a priori by the user.

■ Gaussian (radial-basis function network): K(xi, xj) = exp(−||xi − xj||² / (2σ²))
Examples of Kernel Functions
■ Laplacian: K(X, Y) = exp(−λ ||X − Y||)

■ Mahalanobis: K(X, Y) = exp(−(X − Y)T A (X − Y))
  Followed when statistical test data is known.

■ Sigmoid: K(X, Y) = tanh(β0 XTY + β1)
  Followed when statistical test data is known.
Nonlinear SVM - Overview
■ SVM locates a separating hyperplane in the feature space and classifies points in that space.
■ It does not need to represent the feature space explicitly; it simply defines a kernel function.
■ The kernel function plays the role of the dot product in the feature space.
Properties of SVM
■ Flexibility in choosing a similarity function.

■ Sparseness of the solution when dealing with large data sets:
  only support vectors are used to specify the separating hyperplane.

■ Ability to handle large feature spaces:
  complexity does not depend on the dimensionality of the feature space.

■ Overfitting can be controlled by the soft margin approach.

■ Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution.

■ Feature selection.
Weakness of SVM
■ It is sensitive to noise
-A relatively small number of mislabeled examples can
dramatically decrease the performance

■ It only considers two classes


- how to do multi-class classification with SVM?
- Answer:
1) with output arity m, learn m SVMs
❑ SVM 1 learns “Output==1” vs “Output != 1”
❑ SVM 2 learns “Output==2” vs “Output != 2”
❑ :
❑ SVM m learns “Output==m” vs “Output != m”
2)To predict the output for a new input, just predict with each
SVM and find out which one puts the prediction the furthest
into the positive region.
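A minimal sketch of scheme 1) using scikit-learn (the iris data here is just a stand-in multi-class problem, not part of the original slides): OneVsRestClassifier trains one "class k vs. rest" SVM per class and predicts the class whose decision value is furthest into the positive region.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes, so 3 binary SVMs are trained

ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)
print(len(ovr.estimators_))         # 3: one "class k vs. rest" SVM per class
print(ovr.predict(X[:5]))           # class with the largest decision value wins
```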
Some Issues
■ Choice of kernel
- Gaussian or polynomial kernel is default
- if ineffective, more elaborate kernels are needed
-domain experts can give assistance in formulating appropriate
similarity measures

■ Choice of kernel parameters


- e.g. σ in Gaussian kernel
- σ is the distance between closest points with different
classifications
- In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters (see the grid-search sketch after this list).

■ Optimization criterion – Hard margin vs. Soft margin


- a lengthy series of experiments in which various parameters
are tested
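As referenced in the kernel-parameter bullet above, such parameters are in practice usually tuned by a cross-validated grid search over C and the kernel parameter. A scikit-learn sketch on synthetic data (an added illustration; gamma plays the role of 1/(2σ²) in the Gaussian kernel):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100],          # soft-margin trade-off
              "gamma": [0.001, 0.01, 0.1, 1]}  # RBF kernel width (gamma = 1/(2*sigma^2))
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print(search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```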
