Support Vector Machine

Uploaded by AVINASH SAH

Support Vector Machines

 Are explicitly based on a theoretical model of learning
 Come with theoretical guarantees about their performance
 Are not affected by local minima
 Do not suffer from the curse of dimensionality

 Support vectors are the data points that lie closest to the decision surface
 They are the most difficult points to classify
 They have a direct bearing on the optimum location of the decision surface
Which Hyperplane?

 In general, there are many possible solutions
 A Support Vector Machine finds an optimal solution
SUPPORT VECTOR MACHINE

 Also known as the maximum-margin classifier
 Support vectors are the elements of the training set that would change the position of the dividing hyperplane if removed
 Support vectors are the critical elements of the training set
 The problem of finding the optimal hyperplane is an optimization problem and can be solved by optimization techniques (using Lagrange multipliers to put it into a form that can be solved analytically)
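The idea that only a few critical training points determine the hyperplane can be seen directly in code. A minimal sketch, assuming scikit-learn is available (the dataset and parameters below are illustrative, not from the slides):

```python
# Fit a linear SVM on a tiny 2-D dataset and inspect which training
# points become support vectors (scikit-learn exposes them directly).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],   # class -1
              [3.0, 0.5], [4.0, 1.0], [3.5, -0.5]]) # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)  # the points lying closest to the boundary
print(clf.n_support_)        # number of support vectors per class
```

Removing any point that is not listed in `support_vectors_` and refitting would leave the hyperplane unchanged.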
Maximizing the margin

 We want a classifier with as big a margin as possible
 In order to maximize the margin, we need to minimize ||w||, with the condition that there are no data points between H1 and H2:
 xi·w + b ≥ +1 when yi = +1
 xi·w + b ≤ −1 when yi = −1
 These can be combined into yi(xi·w + b) ≥ 1
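The combined constraint is easy to check numerically. A small sketch, where the hyperplane parameters w and b are hand-picked for this toy data rather than fitted:

```python
# Check the combined margin constraint y_i (x_i . w + b) >= 1 for a
# hand-chosen separating hyperplane, and compute the margin width 2/||w||.
import numpy as np

X = np.array([[1.0, 1.0], [0.5, 2.0],    # class -1
              [3.0, 0.5], [4.0, 1.0]])   # class +1
y = np.array([-1, -1, 1, 1])

w = np.array([1.0, -1.0])  # assumed hyperplane parameters, not fitted
b = -1.25

margins = y * (X @ w + b)     # y_i (x_i . w + b) for each point
print(margins)                # all >= 1: no points between H1 and H2
print(2 / np.linalg.norm(w))  # margin width 2/||w||
```

Minimizing ||w|| subject to all margins staying ≥ 1 is exactly what makes the margin 2/||w|| as wide as possible.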
Constrained optimization problem

 Our goal is to develop a computationally efficient procedure that uses the training sample T = {(xi, di)}, i = 1...N, to find the optimal hyperplane subject to the constraint
di(wᵀxi + b) ≥ 1 for i = 1...N

 Or we can state it as follows (this is our primal problem):
Given the training sample T = {(xi, di)}, i = 1...N, find the optimum values of the weight vector w and bias b such that they satisfy the constraint
di(wᵀxi + b) ≥ 1
and the weight vector w minimizes the cost function
f(w) = (1/2) wᵀw
 We can solve this constrained optimization problem by the method of Lagrange multipliers
What are Lagrange multipliers?

When you want to maximize (or minimize) a multivariable function f(x, y, …) subject to the constraint that another multivariable function equals a constant, g(x, y, …) = c, follow these steps:
 Step 1: Introduce a new variable λ and define a new function
L(x, y, …, λ) = f(x, y, …) − λ(g(x, y, …) − c)
The function L is known as the Lagrange function and λ is known as the Lagrange multiplier.
 Step 2: Set ∇L(x, y, …, λ) = 0
 Step 3: Consider each solution, which will look something like (x0, y0, …, λ0). Plug each one into the function f. Whichever one gives the greatest (or smallest) value is the maximum (or minimum) point you are seeking.
Example:

 Problem: Suppose you are running a factory, producing some sort of widget that requires steel as a raw material. Your costs are predominantly human labor, which is $20 per hour for your workers, and the steel itself, which runs $170 per ton. Suppose your revenue R is loosely modeled by the following equation:
R(h, s) = 200 · h^(2/3) · s^(1/3)
where h = hours of labor and s = tons of steel.
 If your budget is $20,000, what is the maximum possible revenue?
 This budgetary constraint can be modeled as:
20h + 170s = 20000
or g(h, s) = 20h + 170s − 20000 = 0
 We begin by writing the Lagrange function for this setup:
Step 1: L(h, s, λ) = 200 · h^(2/3) · s^(1/3) − λ(20h + 170s − 20000)
Step 2: Set the gradient ∇L = 0, i.e. set each partial derivative of L to zero:
∂L/∂h = 200 · (2/3) · h^(−1/3) · s^(1/3) − 20λ = 0 ...........(1)
∂L/∂s = 200 · (1/3) · h^(2/3) · s^(−1/3) − 170λ = 0 ...........(2)
∂L/∂λ = −(20h + 170s − 20000) = 0 ...........(3)

 Now, solving the three equations, we find
h = 666.67
s = 39.22
λ = 2.59

 This means you should employ about 667 hours of labor and purchase about 39 tons of steel, which will give a maximum revenue of about $51,855 while also satisfying your budgetary constraint.

 The interpretation of λ is: for every additional $1 of budget, you gain about $2.59 of revenue.
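The solution can be verified with a few lines of arithmetic. Dividing equation (1) by equation (2) gives h = 17s, which combined with the budget constraint (3) pins down both variables; this is a sketch of that closed-form solution, not a general solver:

```python
# Verify the factory example: eq (1) / eq (2) gives h = 17*s, and the
# budget constraint 20h + 170s = 20000 then yields 510*s = 20000.
s = 20000 / 510               # tons of steel, ~39.22
h = 17 * s                    # hours of labor, ~666.67
revenue = 200 * h ** (2 / 3) * s ** (1 / 3)
lam = revenue / 20000         # marginal revenue per extra budget dollar

print(round(h, 2), round(s, 2))  # labor hours and tons of steel
print(round(revenue, 2))         # maximum revenue, ~51855
print(round(lam, 2))             # Lagrange multiplier, ~2.59
```

Because R is homogeneous of degree 1, λ = R/20000 at the optimum, which is why the multiplier equals revenue per budget dollar here.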

 Solving the Lagrangian function with respect to condition 1 (∂L/∂w = 0) and condition 2 (∂L/∂b = 0), we get, respectively:
w = Σᵢ αi di xi (i = 1...N)
Σᵢ αi di = 0 (i = 1...N)

 The solution vector w is defined in terms of an expansion that involves the N training examples.

 The primal problem deals with a convex cost function and linear constraints.

 Given such a constrained optimization problem, we can construct another problem, called the dual problem.

 As per the duality theorem:
 If the primal problem has an optimal solution, then the dual problem also has an optimal solution, and the corresponding optimal values are equal.

 To postulate our dual problem, let us expand our primal problem as follows:
L(w, b, α) = (1/2) wᵀw − Σᵢ αi [di(wᵀxi + b) − 1]

 We can reformulate the previous equation as follows (this is our dual problem):
Maximize Q(α) = Σᵢ αi − (1/2) Σᵢ Σⱼ αi αj di dj (xi)ᵀ(xj)
subject to Σᵢ αi di = 0 and αi ≥ 0 for i = 1...N

 Note that this optimization depends on the samples xi only through the dot products (xi)ᵀ(xj).

 If we lift xi to a high dimension using φ(x), we need to compute the high-dimensional product φ(xi)ᵀφ(xj).

 Alternatively, we can use a function that directly computes the value of the dot product in the high-dimensional space; such a function is known as a KERNEL FUNCTION.

 Kernel functions do not need to perform operations in the high-dimensional space explicitly.
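This equivalence can be demonstrated concretely. A sketch using the degree-2 polynomial kernel K(x, y) = (x·y)², whose explicit feature map for 2-D inputs is φ(x) = (x1², x2², √2·x1·x2) (the data points are illustrative):

```python
# The kernel trick: K(x, y) = (x . y)**2 computed in the original 2-D
# space equals phi(x) . phi(y) in the lifted 3-D feature space, so the
# explicit lift phi is never needed.
import math

def phi(x):
    # explicit lift of a 2-D point to the 3-D feature space
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

def kernel(x, y):
    # the same dot product, computed directly in the original space
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 0.5)
lifted = sum(a * b for a, b in zip(phi(x), phi(y)))
print(lifted, kernel(x, y))  # both equal (x . y)**2 = 16.0
```

For higher-degree kernels or the RBF kernel, the explicit feature space is huge or infinite-dimensional, which is exactly why computing the kernel directly is the practical route.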
THANK YOU
