Lecture 9 - SVM

The document discusses support vector machines (SVMs) and constrained optimization using Lagrange multipliers. It begins by introducing SVMs as an extension of the perceptron algorithm that finds a maximum margin linear separator for classification. It then formulates the SVM optimization problem to maximize the margin between two parallel hyperplanes that separate the data while allowing some misclassifications. The document explains how to solve the constrained optimization problem using Lagrange multipliers, which results in the dual problem. It notes that solving the dual problem provides a way to find the optimal solution while avoiding issues with non-differentiability in the primal problem.

Uploaded by

Husein Yusuf

Lecture 9

Support Vector Machines


Lagrange Multipliers
Constrained optimization

Slides adapted from Lyle Ungar, University of Pennsylvania


Constrained optimization

• What constraints might we want for ML?
  - Probabilities sum to 1
  - Regression weights non-negative
  - Regression weights less than a constant
• More generally
  - Fixed amount of money or time or energy available
Lagrange Multipliers - example

• To maximize f(x, y) subject to g(x, y) = k, find:
  - The largest value of c such that the level curve f(x, y) = c intersects g(x, y) = k.
  - This happens when the lines are parallel.
Lagrange Multiplier – the idea

Set ∇f(x, y) = λ ∇g(x, y).

This makes the curves parallel, as on the last slide.
Lagrange Multiplier – generalization

Find
  minx f(x)
s.t.
  ci(x) ≤ 0, i = 1…m

Set
  L(x, λ) = f(x) + λT c(x)

At the minimum of L(x, λ):
  dL/dx = df/dx + λT dc/dx = 0
  λi ci(x) = 0, i = 1…m
  λi ≥ 0, i = 1…m

For each λi, either
  λi = 0 (the constraint is not active), or
  λi > 0 (the constraint is active), and thus ci(x) = 0

KKT = Karush Kuhn Tucker conditions


Lagrange Multiplier Steps

1. Start with the primal
2. Formulate L
3. Find g(λ) = minx (L): solve dL/dx = 0
4. Find max g(λ, ν) s.t. λi ≥ 0, νi ≥ 0
5. See if the constraints are binding
6. Find x*
Lagrange Multiplier Steps

1. Start with the primal
2. Formulate L
3. Find g(λ) = minx (L): solve dL/dx = 0, plug back into L
4. Find max g(λ, ν) s.t. λi ≥ 0: try maximizing without constraints
5. See if the constraints are binding: it depends on the sign of −bc
6. Find x*: plug λ* into the relation x* = −b/a
Lagrange Multipliers Visually

min (1/2) x²
s.t. 2x − 5 ≥ 0

(a = 2, b = −5, c = 1; figure: the feasible region x ≥ 5/2 and the infeasible region)


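Reading the constraint as ax + b ≥ 0 with the slide's values a = 2, b = −5, c = 1 (so f(x) = (1/2)cx² and the constraint is 2x − 5 ≥ 0), the KKT steps can be carried out by hand; a minimal sketch:

```python
# minimize (1/2) c x^2   s.t.   a x + b >= 0,  with a = 2, b = -5, c = 1
# Standard form: c1(x) = -(a*x + b) <= 0.
# L(x, lam) = (1/2) c x^2 + lam * (-(a*x + b))
# Stationarity: dL/dx = c*x - lam*a = 0   =>   x = lam * a / c
# Dual: g(lam) = -lam^2 a^2 / (2c) - lam*b, maximized at lam* = -b*c/a^2.
a, b, c = 2.0, -5.0, 1.0
lam_star = -b * c / a**2            # 1.25 > 0, so the constraint is binding
x_star = lam_star * a / c           # x* = -b/a = 2.5, on the constraint boundary

assert lam_star > 0                 # active constraint
assert abs(a * x_star + b) < 1e-12  # x* sits exactly on a*x + b = 0
```

Since λ* > 0, the constraint is binding, exactly as in steps 5 and 6 above.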
Solve

maximize
  f(x, y) = x + y
subject to
  x² + y² − 1 = 0

1. Formulate L = f0(x) + λ f1(x)
2. Find minx (L) = g(λ)
3. Find max g(λ)
4. See if constraints are binding
5. Find x*

Note that we formulate the problem in terms of minimization!

Answer: x* = (x, y) = ??
(the answer is shown graphically; figure from Wikipedia)
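For this example, the stationarity conditions ∇f = λ∇g read 1 = 2λx and 1 = 2λy, so x = y = 1/(2λ); substituting into the constraint pins down λ. A small numeric check of that hand derivation:

```python
import math

# maximize f(x, y) = x + y   subject to   x^2 + y^2 - 1 = 0
# Stationarity gives x = y = 1/(2*lam); the constraint 2*(1/(2*lam))**2 = 1
# then gives lam = 1/sqrt(2) (taking the positive root for the maximum).
lam = 1 / math.sqrt(2)
x = y = 1 / (2 * lam)        # x* = y* = 1/sqrt(2) ≈ 0.7071
f_star = x + y               # maximum value: sqrt(2)

assert abs(x**2 + y**2 - 1) < 1e-12   # the candidate point is feasible
```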
Formulate and solve

Find values of a set of k probabilities (p1, p2, … pk) that maximize their entropy.

minimize
  f(p) = ??
subject to
  ??

1. Formulate L
2. Find minx (L) = g(λ)
3. Find max g(λ)
4. See if constraints are binding
5. Find x*

Answer: pi = ??
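For the entropy problem (assuming natural-log entropy H(p) = −Σi pi log pi), the Lagrange condition dL/dpi = −log pi − 1 + λ = 0 is the same for every i, so all pi are equal: pi = 1/k. A quick numeric sanity check:

```python
import math

# Uniform probabilities maximize entropy subject to sum(p) = 1.
k = 4
p_star = [1 / k] * k

def entropy(p):
    # H(p) = -sum_i p_i log p_i (natural log)
    return -sum(pi * math.log(pi) for pi in p)

H_star = entropy(p_star)                    # = log(k)
H_other = entropy([0.5, 0.25, 0.15, 0.1])   # any non-uniform p scores lower

assert H_other < H_star
```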
Formulate

Find values of a vector of p non-negative weights w that minimize Σi (yi − xi w)².

minimize
  f(w) = ??
subject to
  ??
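One way to respect the non-negativity constraints (anticipating the projected gradient method later in these slides) is to alternate plain gradient steps on Σi (yi − xi w)² with a projection onto w ≥ 0. A toy sketch; the data, step size, and iteration count are illustrative choices:

```python
# Non-negative least squares:  minimize ||y - X w||^2  s.t.  w >= 0,
# solved by projected gradient descent (pure Python, toy scale).
def nnls_pgd(X, y, lr=0.01, iters=5000):
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        # residuals r = X w - y; gradient of ||y - X w||^2 is 2 X^T r
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        g = [2 * sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # gradient step, then projection onto the feasible set w >= 0
        w = [max(0.0, w[j] - lr * g[j]) for j in range(p)]
    return w

# Unconstrained solution of this toy problem is (1, -1); the projection
# clips the negative weight, and the constrained optimum becomes (0.5, 0).
w = nnls_pgd([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, -1.0, 0.0])
```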
Primal and Dual Formulation

• Primal Problem: The original optimization problem that we aim to solve, typically maximizing or minimizing a function subject to constraints.
• Dual Problem: Derived from the primal problem using the Lagrangian, aiming to provide a lower bound for the primal problem.
Primal and Dual Formulation

For a primal problem with objective function f(x) and constraints gi(x):

  L(x, λ) = f(x) + Σi λi gi(x)

• The solution of the dual problem provides a lower bound on the optimal value of the primal problem.
• If the primal problem is a maximization, the dual is a minimization, and vice versa.
Support Vector Machines
Perceptron and Linear Separability

Say there is a linear decision boundary which can perfectly separate the training data. Which linear separator will the Perceptron algorithm return?

The separator with a large margin γ is better for generalization.

How can we incorporate the margin in finding the linear boundary?


Solution: Support Vector Machines (SVMs)

Motivation:

• It returns a stable linear classifier by giving a maximum-margin solution.
• A slight modification to the problem provides a way to deal with non-separable cases.
• It is kernelizable, so it gives an implicit way of yielding non-linear classification.
SVM Formulation

• Say the training data S is linearly separable by some margin (but the linear separator does not necessarily pass through the origin).

• Then:
  decision boundary: w · x + b = 0
  linear classifier: h(x) = sign(w · x + b)

• Idea: we can try finding two parallel hyperplanes that correctly classify all the points, and maximize the distance between them!
SVM Formulation (contd. 1)

Decision boundaries for the two hyperplanes: w · x + b = +1 and w · x + b = −1

Distance between the two hyperplanes: 2/||w||
(why? the distance from a point xi to the hyperplane w · x + b = 0 is d = |w · xi + b| / ||w||)

Training data is correctly classified if:
  w · xi + b ≥ +1 if yi = +1
  w · xi + b ≤ −1 if yi = −1
Together: yi (w · xi + b) ≥ 1 for all i
SVM Formulation (contd. 2)

Distance between the hyperplanes: 2/||w||

Training data is correctly classified if:
  yi (w · xi + b) ≥ 1 (for all i)

Therefore, want:

Maximize the distance: 2/||w||
Such that: yi (w · xi + b) ≥ 1 (for all i)

Let's put it in the standard form…


SVM Formulation (finally!)

Maximize: 2/||w||
Such that: yi (w · xi + b) ≥ 1 (for all i)

SVM standard (primal) form:

Minimize: (1/2) ||w||²
Such that: yi (w · xi + b) ≥ 1 (for all i)

What can we do if the problem is not linearly separable?

SVM Formulation (non-separable case)

Idea: introduce a slack for the misclassified points, and minimize the slack!

SVM standard (primal) form (with slack):

Minimize: (1/2) ||w||² + C Σi ξi
Such that: yi (w · xi + b) ≥ 1 − ξi, ξi ≥ 0 (for all i)
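At the optimum the slack takes the value ξi = max(0, 1 − yi(w · xi + b)), so the slack primal is equivalent to an unconstrained hinge-loss objective, which a simple subgradient method can minimize. A minimal sketch on made-up 1-D data (C, learning rate, and iteration count are illustrative choices, not part of the slides):

```python
# Soft-margin SVM primal, rewritten without the constraints:
#   min_{w,b}  (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))
def svm_subgradient(X, y, C=1.0, lr=0.01, iters=2000):
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = list(w), 0.0              # gradient of (1/2)||w||^2 is w
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                 # hinge active: subgradient -C y_i x_i
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy separable data on the line; the learned classifier separates it.
X, y = [[2.0], [3.0], [-2.0], [-3.0]], [1, 1, -1, -1]
w, b = svm_subgradient(X, y)
```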
SVM: Question

SVM standard (primal) form (with slack):

Minimize: (1/2) ||w||² + C Σi ξi
Such that: yi (w · xi + b) ≥ 1 − ξi, ξi ≥ 0 (for all i)

Questions:
1. How do we find the optimal w, b and ξ?
2. Why is it called “Support Vector Machine”?
How to Find the Solution?

SVM standard (primal) form (with slack):
  Minimize (over w, b, ξ): (1/2) ||w||² + C Σi ξi
  Such that: yi (w · xi + b) ≥ 1 − ξi, ξi ≥ 0 (for all i)

Cannot simply take the derivative (wrt w, b and ξ) and examine the stationary points… Why?

Example: minimize x² such that x ≥ 5.
The constrained minimum is at x = 5 (the boundary of the feasible region), where the gradient is not zero; the unconstrained stationary point x = 0 is infeasible.

Need a way to do optimization with constraints.


Detour: Constrained Optimization

Constrained optimization (standard form):

  minimize f(x) (objective)
  subject to: gi(x) ≤ 0 for 1 ≤ i ≤ n (constraints)

What to do? (We'll assume that the problem is feasible.)

• Projection methods:
  start with a feasible solution x0;
  find x1 that has a slightly lower objective value;
  if x1 violates the constraints, project back onto the constraints;
  iterate.
• Penalty methods:
  use a penalty function to incorporate the constraints into the objective.
• …
The Lagrange (Penalty) Method

Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)

Consider the augmented function (the Lagrange function):
  L(x, λ) = f(x) + Σi λi gi(x)
(the λi are Lagrange variables, or dual variables)

Observation: for any feasible x and all λi ≥ 0, we have
  x feasible ⟹ g(x) ≤ 0
  λ ≥ 0 ⟹ f(x) ≥ f(x) + λ g(x) = L(x, λ)

• if x is infeasible, then g(x) > 0, so maxλ≥0 λ g(x) = ∞
• if x is feasible, then g(x) ≤ 0, so maxλ≥0 λ g(x) = 0
  (b/c either g(x) = 0, or by picking λ = 0, λ g(x) = 0)

So, the optimal value/solution to the original constrained optimization is
  p* = minx maxλ≥0 L(x, λ)
The problem becomes unconstrained in x!
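The two cases above can be demonstrated numerically; here the sup over λ ≥ 0 is approximated on a small grid of λ values, purely for illustration:

```python
# max over lam >= 0 of lam * g(x): 0 when x is feasible (g(x) <= 0),
# and unbounded when x is infeasible (g(x) > 0).
def approx_sup_penalty(g_x, lams=(0.0, 1.0, 10.0, 1e6)):
    # crude stand-in for the supremum over all lam >= 0
    return max(lam * g_x for lam in lams)

feasible = approx_sup_penalty(-2.0)     # best choice is lam = 0: penalty 0
boundary = approx_sup_penalty(0.0)      # g(x) = 0: penalty 0 for every lam
infeasible = approx_sup_penalty(0.5)    # grows without bound as lam grows
```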
The Dual Problem

Optimization problem (also called the primal):
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Optimal value: p* = minx maxλ≥0 L(x, λ)

Lagrange function:
  L(x, λ) = f(x) + Σi λi gi(x)

For any fixed x, L(x, λ) is affine in λ; define the dual function
  g(λ) = minx L(x, λ)

Since, for any feasible x and all λi ≥ 0, L(x, λ) ≤ f(x):
let x* be the feasible minimizer (over f); then, for all λi ≥ 0,
  g(λ) ≤ L(x*, λ) ≤ f(x*) = p*

Hence:
  d* = maxλ≥0 g(λ) ≤ p* (d* is also called the dual)
(Weak) Duality Theorem

Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Lagrange function:
  L(x, λ) = f(x) + Σi λi gi(x)

Theorem (weak Lagrangian duality):
  d* = maxλ≥0 minx L(x, λ) ≤ minx maxλ≥0 L(x, λ) = p*
(also called the minimax inequality; p* − d* is called the duality gap)

Primal: p* = minx maxλ≥0 L(x, λ)
Dual: d* = maxλ≥0 minx L(x, λ)

Under what conditions can we achieve equality?
Convexity

A function f: Rd → R is called convex iff for any two points x, x′ and β ∈ [0, 1]:

  f(βx + (1 − β)x′) ≤ β f(x) + (1 − β) f(x′)
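The inequality can be spot-checked numerically on a grid of sample points and mixing weights β; a small sketch:

```python
# Check f(b*x + (1-b)*x') <= b*f(x) + (1-b)*f(x') on a grid of samples.
def convex_on_samples(f, xs, betas, tol=1e-12):
    return all(
        f(b * x + (1 - b) * xp) <= b * f(x) + (1 - b) * f(xp) + tol
        for x in xs for xp in xs for b in betas
    )

betas = [i / 10 for i in range(11)]
xs = [-3.0, -1.0, 0.0, 2.0, 5.0]
quadratic_ok = convex_on_samples(lambda x: x * x, xs, betas)    # x^2 is convex
concave_ok = convex_on_samples(lambda x: -x * x, xs, betas)     # -x^2 is not
```

Passing such a sampled check does not prove convexity, but a failure disproves it.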
Convexity

A set S ⊂ Rd is called convex iff for any two points x, x′ ∈ S and any β ∈ [0, 1], the point βx + (1 − β)x′ is also in S.

Examples:
Convex Optimization

A constrained optimization

  minimize f(x) (objective)
  subject to: gi(x) ≤ 0 for 1 ≤ i ≤ n (constraints)

is called a convex optimization problem if:

• the objective function f is a convex function, and
• the feasible set induced by the constraints gi is a convex set
(if f and all the gi are convex, then the constrained problem is a convex optimization)

Why do we care?
We can find the optimal solution for convex problems efficiently!
Convex Optimization: Niceties

• Every local optimum is a global optimum in a convex optimization problem.

Example convex problems: linear programs, quadratic programs, conic programs, semi-definite programs.

Several solvers exist to find the optima: CVX, SeDuMi, C-SALSA, …

• We can use a simple ‘descent-type’ algorithm for finding the minima!
Gradient Descent (for finding local minima)

Theorem (Gradient Descent):
Given a smooth function f, for any x and sufficiently small step size η > 0, we have:
  f(x − η ∇f(x)) ≤ f(x)

Can derive a simple algorithm (the projected Gradient Descent):

  Initialize x0
  for t = 1, 2, … do
    xt ← xt−1 − η ∇f(xt−1) (step in the gradient direction)
    xt ← Π(xt) (project back onto the constraints)
  terminate when no progress can be made, i.e., f(xt−1) − f(xt) ≈ 0
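The loop above can be sketched directly; applying it to the earlier example (minimize x² subject to x ≥ 5, where the projection is just clipping):

```python
# Projected gradient descent: gradient step, then project onto the constraints.
def projected_gd(x0, grad, project, eta=0.1, iters=200):
    x = x0
    for _ in range(iters):
        x = project(x - eta * grad(x))   # step, then project back to feasibility
    return x

# minimize f(x) = x^2  s.t.  x >= 5; projection clips to the feasible half-line.
x_star = projected_gd(x0=10.0, grad=lambda x: 2 * x, project=lambda x: max(x, 5.0))
# The minimum lies on the boundary x = 5, where the gradient is nonzero.
```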
Back to Constrained Opt.: Duality Theorems

Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Lagrange function:
  L(x, λ) = f(x) + Σi λi gi(x)

Theorem (weak Lagrangian duality): d* ≤ p*

Theorem (strong Lagrangian duality):
For a convex optimization problem, if there exists a feasible point x s.t.
  gi(x) < 0 (for all i), or gi(x) ≤ 0 whenever gi is affine,
then d* = p*.
(aka Slater’s condition; sufficient for strong duality)

Primal: p* = minx maxλ≥0 L(x, λ)
Dual: d* = maxλ≥0 minx L(x, λ)
Ok, Back to SVMs

SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  Such that: yi (w · xi + b) ≥ 1 (for all i)

Observations:
• the objective function is convex
• the constraints are affine, inducing a polytope constraint set
• so, SVM is a convex optimization problem (in fact a quadratic program)
• moreover, strong duality holds
• let’s examine the dual… the Lagrangian is:
  L(w, b, α) = (1/2) ||w||² + Σi αi (1 − yi (w · xi + b))
SVM Dual

SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  Such that: yi (w · xi + b) ≥ 1 (for all i)

Lagrangian:
  L(w, b, α) = (1/2) ||w||² + Σi αi (1 − yi (w · xi + b))

Primal: p* = minw,b maxα≥0 L(w, b, α)
Dual: d* = maxα≥0 minw,b L(w, b, α)

The inner minimization is unconstrained; let’s calculate it:
  dL/dw = 0 ⟹ w = Σi αi yi xi
  dL/db = 0 ⟹ Σi αi yi = 0

• when αi > 0, the corresponding xi is a support vector
• w is only a function of the support vectors!
SVM Dual (contd.)

Lagrangian:
  L(w, b, α) = (1/2) ||w||² + Σi αi (1 − yi (w · xi + b))

Primal: p* = minw,b maxα≥0 L(w, b, α)
Dual: d* = maxα≥0 minw,b L(w, b, α)

The inner minimization is unconstrained: w = Σi αi yi xi and Σi αi yi = 0.

So, plugging these back into L:
  g(α) = Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj)
subject to
  αi ≥ 0 and Σi αi yi = 0
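On a tiny 1-D data set the dual can be maximized by hand, showing how the αi yield w, b, and the support vectors (the constraint Σi αi yi = 0 does the work; the data set is chosen purely for illustration):

```python
# Two points: x1 = +1 (y1 = +1) and x2 = -1 (y2 = -1).
# The constraint sum_i alpha_i y_i = 0 forces alpha1 = alpha2 = a, and the
# dual objective  sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j x_i x_j
# becomes 2a - 2a^2, maximized at a = 1/2.
a1 = a2 = 0.5
w = a1 * (+1) * (+1) + a2 * (-1) * (-1)   # w = sum_i alpha_i y_i x_i = 1.0
b = 1 - w * 1.0                           # from y1 (w x1 + b) = 1  =>  b = 0.0

# Both alphas are positive, so both points are support vectors;
# the margin is 2 / |w| = 2.
margin = 2 / abs(w)
```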
SVM Optimization Interpretation

SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  (equivalently: maximize the margin γ = 2/||w||)
  Such that: yi (w · xi + b) ≥ 1 (for all i)

SVM standard (dual) form:
  Maximize (over αi): Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj)
  (only a function of the dot products xi · xj, which gives the kernelized version, and of the “support vectors” with αi > 0)
  Such that: αi ≥ 0 and Σi αi yi = 0 (for all i)
What We Learned…

• Support Vector Machines

• Maximum Margin formulation

• Constrained Optimization

• Lagrange Duality Theory

• Convex Optimization

• SVM dual and Interpretation

• How to get the optimal solution


Questions?
Next time…

Regression
