Lecture 9 - SVM
Lagrange Multipliers - example
• To maximize f(x, y) subject to g(x, y) = k, find:
  • the largest value of c such that the level curve f(x, y) = c intersects g(x, y) = k.
  • This happens when the two level curves are tangent, i.e. their gradients are parallel: ∇f = λ∇g.
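A small worked example of this tangency condition (not from the slides): maximize f(x, y) = xy subject to g(x, y) = x + y = k.
  ∇f = λ∇g gives (y, x) = λ(1, 1), so x = y = λ.
  The constraint x + y = k then gives x = y = k/2, with maximum value f = k²/4.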
Lagrange Multiplier – the idea
1. Start with the primal
2. Formulate L
3. Find g(λ) = min_x L (solve dL/dx = 0)
4. Find max g(λ, ν) s.t. λi ≥ 0, νi ≥ 0
5. See if the constraints are binding
6. Find x*
Lagrange Multiplier Steps
1. Start with the primal
2. Formulate L
Formulate and solve
Findvaluesof aset of kprobabilities (p1, p2, …pk)
that maximizetheirentropy
minimize
f(p) = ?? 1. Formulate L
2. Find minx (L) = g(l)
subject to 3. Find max g(l)
4. See if constraints are binding
?? 5. Find x*
Answer: pi =??
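One standard way to fill in these blanks (not shown on the slide):
  minimize f(p) = Σi pi log pi   (i.e. maximize the entropy −Σi pi log pi)
  subject to Σi pi = 1
  L(p, λ) = Σi pi log pi + λ(Σi pi − 1)
  ∂L/∂pi = log pi + 1 + λ = 0  ⇒  pi = e^(−1−λ), the same value for every i.
  The constraint Σi pi = 1 then gives pi = 1/k (the uniform distribution).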
Formulate
Find values of a vector of p non-negative weights w that minimize Σi (yi − xi·w)².

minimize f(w) = ??
subject to ??
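One way to fill in this formulation (not shown on the slide) is to treat non-negativity as p inequality constraints:
  minimize f(w) = Σi (yi − xi·w)²
  subject to −wj ≤ 0 (i.e. wj ≥ 0) for j = 1, …, p
  Lagrangian: L(w, λ) = Σi (yi − xi·w)² − Σj λj wj, with λj ≥ 0.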
Primal and Dual Formulation
L(x, λ) = f(x) + Σi λi gi(x)
Motivation:
Say there is a linear decision boundary which can perfectly separate the training data.
Which linear separator will the Perceptron algorithm return?

Linear classifier: h(x) = sign(w·x + b)
Decision boundary: w·x + b = 0
• Then: the distance of a point xi from the boundary is
  d = |w·xi + b| / ||w||
  (why? the signed distance of xi along the hyperplane's unit normal w/||w|| is (w·xi + b)/||w||)
Training data is correctly classified if:
  w·xi + b > 0 if yi = +1
  w·xi + b < 0 if yi = -1
Together: yi(w·xi + b) > 0 for all i
SVM Formulation (contd. 2)
Distance between the hyperplanes w·x + b = +1 and w·x + b = -1:  2 / ||w||
Therefore, want to make this margin as large as possible:
  Maximize: 2 / ||w||
  Such that: yi(w·xi + b) ≥ 1 (for all i)
Equivalently:
  Minimize: ||w|| / 2
  Such that: yi(w·xi + b) ≥ 1 (for all i)
Equivalently:
  Minimize: (1/2) ||w||²
  Such that: yi(w·xi + b) ≥ 1 (for all i)
SVM: Question
SVM standard (primal) form (with slack):
  Minimize: (1/2) ||w||² + C Σi ξi
  Such that: yi(w·xi + b) ≥ 1 − ξi and ξi ≥ 0 (for all i)
Questions:
1. How do we find the optimal w, b and ξ?
2. Why is it called a “Support Vector Machine”? (see the sketch below)
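A minimal sketch in Python (not from the slides; it assumes scikit-learn and made-up toy data) showing that the fitted boundary is determined by only a few training points, the support vectors, which is where the name comes from:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates the hard-margin SVM
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # the learned w and b
print(clf.support_)                 # indices of the support vectors
print(clf.support_vectors_)         # only these points determine the boundary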
How to Find the Solution?
SVM standard (primal) form:
  Minimize: (1/2) ||w||² + C Σi ξi
  Such that: yi(w·xi + b) ≥ 1 − ξi and ξi ≥ 0 (for all i)
Cannot simply take the derivative (w.r.t. w, b and ξ) and examine the stationary points… Why?
(figure: one-dimensional example in which the constrained minimum lies on the boundary, at x = 5)
General constrained optimization problem:
  minimize f(x) (objective)
  subject to: gi(x) ≤ 0 for 1 ≤ i ≤ n (constraints)
What to do? (We'll assume that the problem is feasible.)
• Projection methods (see the sketch after this list):
  start with a feasible solution x0,
  find x1 that has a slightly lower objective value,
  if x1 violates the constraints, project back onto the constraint set,
  iterate.
• Penalty methods:
  use a penalty function to incorporate the constraints into the objective.
• …
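A minimal sketch of a projection method in Python (not from the slides; the problem, step size, and iteration count are made up for illustration): projected gradient descent for minimizing ||x − c||² subject to x ≥ 0, where projecting onto the feasible set is just clipping at zero.

import numpy as np

def project(x):
    # Projection onto the feasible set {x : x >= 0}
    return np.maximum(x, 0.0)

def projected_gradient_descent(c, steps=100, lr=0.1):
    x = project(np.zeros_like(c))        # start from a feasible point
    for _ in range(steps):
        grad = 2.0 * (x - c)             # gradient of ||x - c||^2
        x = project(x - lr * grad)       # gradient step, then project back
    return x

c = np.array([1.0, -2.0, 0.5])
print(projected_gradient_descent(c))     # approx. [1.0, 0.0, 0.5]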
The Lagrange (Penalty) Method
Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Consider the augmented function:
  L(x, λ) = f(x) + Σi λi gi(x)  (Lagrange function; the λi are Lagrange variables, or dual variables)
Observation:
For any feasible x and all λi ≥ 0, we have
  x feasible ⇒ gi(x) ≤ 0
  λ ≥ 0 ⇒ f(x) ≥ f(x) + Σi λi gi(x) = L(x, λ)
Hence:
  g(λ) = min_x L(x, λ) ≤ min over feasible x of f(x)  (g(λ) is also called the dual)
(Weak) Duality Theorem
Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Lagrange function: L(x, λ) = f(x) + Σi λi gi(x)
Theorem (weak Lagrangian duality):
  max over λ ≥ 0 of min_x L(x, λ)  ≤  min_x of max over λ ≥ 0 of L(x, λ)
  (also called the minimax inequality)
Primal: min_x max over λ ≥ 0 of L(x, λ)
Dual: max over λ ≥ 0 of min_x L(x, λ)
Under what conditions can we achieve equality?
Convexity
A function f: Rd → R is called convex iff for any two points x, x' and any β ∈ [0, 1]:
  f(βx + (1 − β)x') ≤ βf(x) + (1 − β)f(x')
Convexity
A set S ⊂ Rd is called convex iff for any two points x, x' ∈ S and any β ∈ [0, 1]:
  βx + (1 − β)x' ∈ S
Examples:
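For instance, intervals in R, half-spaces {x : a·x ≤ c}, Euclidean balls, and intersections of convex sets are all convex; a crescent-shaped set is not.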
Convex Optimization
A constrained optimization problem
  minimize f(x) (objective)
  subject to: gi(x) ≤ 0 for 1 ≤ i ≤ n (constraints)
is convex when the objective f and the constraint functions gi are convex.
Gradient-style methods apply:
  Initialize x0
  for t = 1, 2, … do
    (step in the gradient direction)
Lagrange function: L(x, λ) = f(x) + Σi λi gi(x)
Theorem (strong Lagrangian duality):
For a convex optimization problem, if there exists a feasible point x s.t.
  gi(x) < 0 (for all i), or gi(x) ≤ 0 whenever gi is affine
(aka Slater's condition; sufficient for strong duality), then
  Primal: min_x max over λ ≥ 0 of L(x, λ)  =  Dual: max over λ ≥ 0 of min_x L(x, λ)
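A small worked example (not from the slides) where Slater's condition holds and the primal and dual values coincide:
  Minimize f(x) = x² subject to 1 − x ≤ 0 (i.e. x ≥ 1); this is convex and x = 2 is strictly feasible.
  L(x, λ) = x² + λ(1 − x); minimizing over x gives x = λ/2, so g(λ) = λ − λ²/4.
  Maximizing over λ ≥ 0 gives λ = 2 and dual value g(2) = 1.
  The primal optimum is at x = 1 with f(1) = 1, so the two values are equal (no duality gap).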
Ok, Back to SVMs
SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  Such that: yi(w·xi + b) ≥ 1 (for all i)
Observations:
• the objective function is convex
• the constraints are affine, inducing a polytope constraint set
• So, SVM is a convex optimization problem (in fact a quadratic program)
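A minimal sketch (not from the slides; it assumes the cvxpy library and made-up arrays X, y of shapes (n, d) and (n,)) handing this quadratic program directly to a generic convex solver:

import numpy as np
import cvxpy as cp

# Toy linearly separable data
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# Hard-margin SVM primal: minimize (1/2)||w||^2 s.t. yi(w·xi + b) >= 1
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
problem.solve()

print(w.value, b.value)   # the separating hyperplane found by the QP solver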
Primal:
  Minimize (over w, b): (1/2) ||w||²
  Such that: yi(w·xi + b) ≥ 1 (for all i)
Lagrange function: L(w, b, λ) = (1/2) ||w||² + Σi λi (1 − yi(w·xi + b))
Dual: g(λ) = min over (w, b) of L(w, b, λ); this inner problem is unconstrained, let's calculate (see the derivation below).
So:
  maximize g(λ) = Σi λi − (1/2) Σi Σj λi λj yi yj (xi·xj)
  subject to λi ≥ 0 (for all i) and Σi λi yi = 0
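A sketch of the standard calculation (not spelled out in the extracted slides):
  ∂L/∂w = w − Σi λi yi xi = 0  ⇒  w = Σi λi yi xi
  ∂L/∂b = −Σi λi yi = 0        ⇒  Σi λi yi = 0
Substituting w back into L eliminates w and b and leaves
  g(λ) = Σi λi − (1/2) Σi Σj λi λj yi yj (xi·xj),
which is exactly the dual objective above; it is then maximized subject to λi ≥ 0 and Σi λi yi = 0.
At the optimum, only the points with λi > 0 (those on the margin) contribute to w; these are the support vectors.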
SVM Optimization Interpretation
  Minimize: (1/2) ||w||²
  Such that: yi(w·xi + b) ≥ 1 (for all i)
• Constrained Optimization
• Convex Optimization
Regression