Lecture 9 - SVM

The document discusses support vector machines (SVMs) and constrained optimization using Lagrange multipliers. It begins by introducing SVMs as an extension of the perceptron algorithm that finds a maximum margin linear separator for classification. It then formulates the SVM optimization problem to maximize the margin between two parallel hyperplanes that separate the data while allowing some misclassifications. The document explains how to solve the constrained optimization problem using Lagrange multipliers, which results in the dual problem. It notes that solving the dual problem provides a way to find the optimal solution while avoiding issues with non-differentiability in the primal problem.

Uploaded by

Husein Yusuf

Lecture 9

Support Vector Machines


Lagrange Multipliers
Constrained optimization

Slides adapted from Lyle Ungar, University of Pennsylvania


Constrained optimization

• What constraints might we want for ML?
  - Probabilities sum to 1
  - Regression weights non-negative
  - Regression weights less than a constant
• More generally
  - Fixed amount of money or time or energy available
Lagrange Multipliers - example

• To maximize f(x, y) subject to g(x, y) = k, find:
  - The largest value of c such that the level curve f(x, y) = c intersects g(x, y) = k.
  - This happens when the lines are parallel.
Lagrange Multiplier – the idea

Set ∇f(x, y) = λ ∇g(x, y).

This makes the curves parallel, as on the last slide.
Lagrange Multiplier – generalization

Find
  minx f(x)
s.t.
  ci(x) ≤ 0, i = 1…m

Set
  L(x, λ) = f(x) + λT c(x)

At the minimum of L(x, λ):
  dL/dx = df/dx + λT dc/dx = 0
  λi ci(x) = 0, i = 1…m
  λi ≥ 0, i = 1…m

For each λi, either
  λi = 0 (the constraint is not active), or
  λi > 0 (the constraint is active), and thus ci(x) = 0

KKT = Karush Kuhn Tucker conditions


Lagrange Multiplier Steps

1. Start with the primal
2. Formulate L
3. Find g(λ) = minx (L): solve dL/dx = 0
4. Find max g(λ, ν) s.t. λi ≥ 0, νi ≥ 0
5. See if the constraints are binding
6. Find x*
Lagrange Multiplier Steps

1. Start with the primal
2. Formulate L
3. Find g(λ) = minx (L): solve dL/dx = 0, plug back into L
4. Find max g(λ, ν) s.t. λi ≥ 0: try maximizing without constraints
5. See if the constraints are binding: it depends on the sign of −bc
6. Find x*: plug λ* into the relation x* = −b/a
Lagrange Multipliers Visually

min (1/2) x²
s.t. 2x − 5 ≥ 0

(a = 2, b = −5, c = 1; figure: the feasible region x ≥ 5/2 and the infeasible region)


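Reading the constraint as ax + b ≥ 0 with the slide's values a = 2, b = −5, c = 1 (so f(x) = (1/2)cx² and the constraint is 2x − 5 ≥ 0), the KKT steps can be carried out by hand; a minimal sketch:

```python
# minimize (1/2) c x^2   s.t.   a x + b >= 0,  with a = 2, b = -5, c = 1
# Standard form: c1(x) = -(a*x + b) <= 0.
# L(x, lam) = (1/2) c x^2 + lam * (-(a*x + b))
# Stationarity: dL/dx = c*x - lam*a = 0   =>   x = lam * a / c
# Dual: g(lam) = -lam^2 a^2 / (2c) - lam*b, maximized at lam* = -b*c/a^2.
a, b, c = 2.0, -5.0, 1.0
lam_star = -b * c / a**2            # 1.25 > 0, so the constraint is binding
x_star = lam_star * a / c           # x* = -b/a = 2.5, on the constraint boundary

assert lam_star > 0                 # active constraint
assert abs(a * x_star + b) < 1e-12  # x* sits exactly on a*x + b = 0
```

Since λ* > 0, the constraint is binding, exactly as in steps 5 and 6 above.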
Solve

maximize
  f(x, y) = x + y
subject to
  x² + y² − 1 = 0

1. Formulate L = f0(x) + λ f1(x)
2. Find minx (L) = g(λ)
3. Find max g(λ)
4. See if constraints are binding
5. Find x*

Note that we formulate the problem in terms of minimization!

Answer: x* = (x, y) = ??
(the answer is shown graphically; figure from Wikipedia)
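For this example, the stationarity conditions ∇f = λ∇g read 1 = 2λx and 1 = 2λy, so x = y = 1/(2λ); substituting into the constraint pins down λ. A small numeric check of that hand derivation:

```python
import math

# maximize f(x, y) = x + y   subject to   x^2 + y^2 - 1 = 0
# Stationarity gives x = y = 1/(2*lam); the constraint 2*(1/(2*lam))**2 = 1
# then gives lam = 1/sqrt(2) (taking the positive root for the maximum).
lam = 1 / math.sqrt(2)
x = y = 1 / (2 * lam)        # x* = y* = 1/sqrt(2) ≈ 0.7071
f_star = x + y               # maximum value: sqrt(2)

assert abs(x**2 + y**2 - 1) < 1e-12   # the candidate point is feasible
```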
Formulate and solve

Find values of a set of k probabilities (p1, p2, … pk) that maximize their entropy.

minimize
  f(p) = ??
subject to
  ??

1. Formulate L
2. Find minx (L) = g(λ)
3. Find max g(λ)
4. See if constraints are binding
5. Find x*

Answer: pi = ??
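For the entropy problem (assuming natural-log entropy H(p) = −Σi pi log pi), the Lagrange condition dL/dpi = −log pi − 1 + λ = 0 is the same for every i, so all pi are equal: pi = 1/k. A quick numeric sanity check:

```python
import math

# Uniform probabilities maximize entropy subject to sum(p) = 1.
k = 4
p_star = [1 / k] * k

def entropy(p):
    # H(p) = -sum_i p_i log p_i (natural log)
    return -sum(pi * math.log(pi) for pi in p)

H_star = entropy(p_star)                    # = log(k)
H_other = entropy([0.5, 0.25, 0.15, 0.1])   # any non-uniform p scores lower

assert H_other < H_star
```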
Formulate

Find values of a vector of p non-negative weights w that minimize Σi (yi − xi w)².

minimize
  f(w) = ??
subject to
  ??
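One way to respect the non-negativity constraints (anticipating the projected gradient method later in these slides) is to alternate plain gradient steps on Σi (yi − xi w)² with a projection onto w ≥ 0. A toy sketch; the data, step size, and iteration count are illustrative choices:

```python
# Non-negative least squares:  minimize ||y - X w||^2  s.t.  w >= 0,
# solved by projected gradient descent (pure Python, toy scale).
def nnls_pgd(X, y, lr=0.01, iters=5000):
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        # residuals r = X w - y; gradient of ||y - X w||^2 is 2 X^T r
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        g = [2 * sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # gradient step, then projection onto the feasible set w >= 0
        w = [max(0.0, w[j] - lr * g[j]) for j in range(p)]
    return w

# Unconstrained solution of this toy problem is (1, -1); the projection
# clips the negative weight, and the constrained optimum becomes (0.5, 0).
w = nnls_pgd([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, -1.0, 0.0])
```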
Primal and Dual Formulation

• Primal Problem: The original optimization problem that we aim to solve, typically maximizing or minimizing a function subject to constraints.
• Dual Problem: Derived from the primal problem using the Lagrangian, aiming to provide a lower bound for the primal problem.
Primal and Dual Formulation

For a primal problem with objective function f(x) and constraints gi(x):

  L(x, λ) = f(x) + Σi λi gi(x)

• The solution of the dual problem provides a lower bound on the optimal value of the primal problem.
• If the primal problem is a maximization, the dual is a minimization, and vice versa.
Support Vector Machines
Perceptron and Linear Separability

Say there is a linear decision boundary which can perfectly separate the training data. Which linear separator will the Perceptron algorithm return?

The separator with a large margin γ is better for generalization.

How can we incorporate the margin in finding the linear boundary?


Solution: Support Vector Machines (SVMs)

Motivation:

• It returns a stable linear classifier by giving a maximum-margin solution.
• A slight modification to the problem provides a way to deal with non-separable cases.
• It is kernelizable, so it gives an implicit way of yielding non-linear classification.
SVM Formulation

• Say the training data S is linearly separable by some margin (but the linear separator does not necessarily pass through the origin).

• Then:
  decision boundary: w · x + b = 0
  linear classifier: h(x) = sign(w · x + b)

• Idea: we can try finding two parallel hyperplanes that correctly classify all the points, and maximize the distance between them!
SVM Formulation (contd. 1)

Decision boundaries for the two hyperplanes: w · x + b = +1 and w · x + b = −1

Distance between the two hyperplanes: 2/||w||
(why? the distance from a point xi to the hyperplane w · x + b = 0 is d = |w · xi + b| / ||w||)

Training data is correctly classified if:
  w · xi + b ≥ +1 if yi = +1
  w · xi + b ≤ −1 if yi = −1
Together: yi (w · xi + b) ≥ 1 for all i
SVM Formulation (contd. 2)

Distance between the hyperplanes: 2/||w||

Training data is correctly classified if:
  yi (w · xi + b) ≥ 1 (for all i)

Therefore, want:

Maximize the distance: 2/||w||
Such that: yi (w · xi + b) ≥ 1 (for all i)

Let's put it in the standard form…


SVM Formulation (finally!)

Maximize: 2/||w||
Such that: yi (w · xi + b) ≥ 1 (for all i)

SVM standard (primal) form:

Minimize: (1/2) ||w||²
Such that: yi (w · xi + b) ≥ 1 (for all i)

What can we do if the problem is not linearly separable?

SVM Formulation (non-separable case)

Idea: introduce a slack for the misclassified points, and minimize the slack!

SVM standard (primal) form (with slack):

Minimize: (1/2) ||w||² + C Σi ξi
Such that: yi (w · xi + b) ≥ 1 − ξi, ξi ≥ 0 (for all i)
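At the optimum the slack takes the value ξi = max(0, 1 − yi(w · xi + b)), so the slack primal is equivalent to an unconstrained hinge-loss objective, which a simple subgradient method can minimize. A minimal sketch on made-up 1-D data (C, learning rate, and iteration count are illustrative choices, not part of the slides):

```python
# Soft-margin SVM primal, rewritten without the constraints:
#   min_{w,b}  (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))
def svm_subgradient(X, y, C=1.0, lr=0.01, iters=2000):
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = list(w), 0.0              # gradient of (1/2)||w||^2 is w
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                 # hinge active: subgradient -C y_i x_i
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy separable data on the line; the learned classifier separates it.
X, y = [[2.0], [3.0], [-2.0], [-3.0]], [1, 1, -1, -1]
w, b = svm_subgradient(X, y)
```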
SVM: Question

SVM standard (primal) form (with slack):

Minimize: (1/2) ||w||² + C Σi ξi
Such that: yi (w · xi + b) ≥ 1 − ξi, ξi ≥ 0 (for all i)

Questions:
1. How do we find the optimal w, b and ξ?
2. Why is it called “Support Vector Machine”?
How to Find the Solution?

SVM standard (primal) form (with slack):
  Minimize (over w, b, ξ): (1/2) ||w||² + C Σi ξi
  Such that: yi (w · xi + b) ≥ 1 − ξi, ξi ≥ 0 (for all i)

Cannot simply take the derivative (wrt w, b and ξ) and examine the stationary points… Why?

Example: minimize x² such that x ≥ 5.
The constrained minimum is at x = 5 (the boundary of the feasible region), where the gradient is not zero; the unconstrained stationary point x = 0 is infeasible.

Need a way to do optimization with constraints.


Detour: Constrained Optimization

Constrained optimization (standard form):

  minimize f(x) (objective)
  subject to: gi(x) ≤ 0 for 1 ≤ i ≤ n (constraints)

What to do? (We'll assume that the problem is feasible.)

• Projection methods:
  start with a feasible solution x0;
  find x1 that has a slightly lower objective value;
  if x1 violates the constraints, project back onto the constraints;
  iterate.
• Penalty methods:
  use a penalty function to incorporate the constraints into the objective.
• …
The Lagrange (Penalty) Method

Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)

Consider the augmented function (the Lagrange function):
  L(x, λ) = f(x) + Σi λi gi(x)
(the λi are Lagrange variables, or dual variables)

Observation: for any feasible x and all λi ≥ 0, we have
  x feasible ⟹ g(x) ≤ 0
  λ ≥ 0 ⟹ f(x) ≥ f(x) + λ g(x) = L(x, λ)

• if x is infeasible, then g(x) > 0, so maxλ≥0 λ g(x) = ∞
• if x is feasible, then g(x) ≤ 0, so maxλ≥0 λ g(x) = 0
  (b/c either g(x) = 0, or by picking λ = 0, λ g(x) = 0)

So, the optimal value/solution to the original constrained optimization is
  p* = minx maxλ≥0 L(x, λ)
The problem becomes unconstrained in x!
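The two cases above can be demonstrated numerically; here the sup over λ ≥ 0 is approximated on a small grid of λ values, purely for illustration:

```python
# max over lam >= 0 of lam * g(x): 0 when x is feasible (g(x) <= 0),
# and unbounded when x is infeasible (g(x) > 0).
def approx_sup_penalty(g_x, lams=(0.0, 1.0, 10.0, 1e6)):
    # crude stand-in for the supremum over all lam >= 0
    return max(lam * g_x for lam in lams)

feasible = approx_sup_penalty(-2.0)     # best choice is lam = 0: penalty 0
boundary = approx_sup_penalty(0.0)      # g(x) = 0: penalty 0 for every lam
infeasible = approx_sup_penalty(0.5)    # grows without bound as lam grows
```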
The Dual Problem

Optimization problem (also called the primal):
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Optimal value: p* = minx maxλ≥0 L(x, λ)

Lagrange function:
  L(x, λ) = f(x) + Σi λi gi(x)

For any fixed x, L(x, λ) is affine in λ; define the dual function
  g(λ) = minx L(x, λ)

Since, for any feasible x and all λi ≥ 0, L(x, λ) ≤ f(x):
let x* be the feasible minimizer (over f); then, for all λi ≥ 0,
  g(λ) ≤ L(x*, λ) ≤ f(x*) = p*

Hence:
  d* = maxλ≥0 g(λ) ≤ p* (d* is also called the dual)
(Weak) Duality Theorem

Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Lagrange function:
  L(x, λ) = f(x) + Σi λi gi(x)

Theorem (weak Lagrangian duality):
  d* = maxλ≥0 minx L(x, λ) ≤ minx maxλ≥0 L(x, λ) = p*
(also called the minimax inequality; p* − d* is called the duality gap)

Primal: p* = minx maxλ≥0 L(x, λ)
Dual: d* = maxλ≥0 minx L(x, λ)

Under what conditions can we achieve equality?
Convexity

A function f: Rd → R is called convex iff for any two points x, x′ and β ∈ [0, 1]:

  f(βx + (1 − β)x′) ≤ β f(x) + (1 − β) f(x′)
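The inequality can be spot-checked numerically on a grid of sample points and mixing weights β; a small sketch:

```python
# Check f(b*x + (1-b)*x') <= b*f(x) + (1-b)*f(x') on a grid of samples.
def convex_on_samples(f, xs, betas, tol=1e-12):
    return all(
        f(b * x + (1 - b) * xp) <= b * f(x) + (1 - b) * f(xp) + tol
        for x in xs for xp in xs for b in betas
    )

betas = [i / 10 for i in range(11)]
xs = [-3.0, -1.0, 0.0, 2.0, 5.0]
quadratic_ok = convex_on_samples(lambda x: x * x, xs, betas)    # x^2 is convex
concave_ok = convex_on_samples(lambda x: -x * x, xs, betas)     # -x^2 is not
```

Passing such a sampled check does not prove convexity, but a failure disproves it.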
Convexity

A set S ⊂ Rd is called convex iff for any two points x, x′ ∈ S and any β ∈ [0, 1], the point βx + (1 − β)x′ is also in S.

Examples:
Convex Optimization

A constrained optimization

  minimize f(x) (objective)
  subject to: gi(x) ≤ 0 for 1 ≤ i ≤ n (constraints)

is called a convex optimization problem if:

• the objective function f is a convex function, and
• the feasible set induced by the constraints gi is a convex set
(if f and all the gi are convex, then the constrained problem is a convex optimization)

Why do we care?
We can find the optimal solution for convex problems efficiently!
Convex Optimization: Niceties

• Every local optimum is a global optimum in a convex optimization problem.

Example convex problems: linear programs, quadratic programs, conic programs, semi-definite programs.

Several solvers exist to find the optima: CVX, SeDuMi, C-SALSA, …

• We can use a simple ‘descent-type’ algorithm for finding the minima!
Gradient Descent (for finding local minima)

Theorem (Gradient Descent):
Given a smooth function f, for any x and sufficiently small step size η > 0, we have:
  f(x − η ∇f(x)) ≤ f(x)

Can derive a simple algorithm (the projected Gradient Descent):

  Initialize x0
  for t = 1, 2, … do
    xt ← xt−1 − η ∇f(xt−1) (step in the gradient direction)
    xt ← Π(xt) (project back onto the constraints)
  terminate when no progress can be made, i.e., f(xt−1) − f(xt) ≈ 0
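The loop above can be sketched directly; applying it to the earlier example (minimize x² subject to x ≥ 5, where the projection is just clipping):

```python
# Projected gradient descent: gradient step, then project onto the constraints.
def projected_gd(x0, grad, project, eta=0.1, iters=200):
    x = x0
    for _ in range(iters):
        x = project(x - eta * grad(x))   # step, then project back to feasibility
    return x

# minimize f(x) = x^2  s.t.  x >= 5; projection clips to the feasible half-line.
x_star = projected_gd(x0=10.0, grad=lambda x: 2 * x, project=lambda x: max(x, 5.0))
# The minimum lies on the boundary x = 5, where the gradient is nonzero.
```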
Back to Constrained Opt.: Duality Theorems

Optimization problem:
  Minimize: f(x)
  Such that: gi(x) ≤ 0 (for all i)
Lagrange function:
  L(x, λ) = f(x) + Σi λi gi(x)

Theorem (weak Lagrangian duality): d* ≤ p*

Theorem (strong Lagrangian duality):
For a convex optimization problem, if there exists a feasible point x s.t.
  gi(x) < 0 (for all i), or gi(x) ≤ 0 whenever gi is affine,
then d* = p*.
(aka Slater’s condition; sufficient for strong duality)

Primal: p* = minx maxλ≥0 L(x, λ)
Dual: d* = maxλ≥0 minx L(x, λ)
Ok, Back to SVMs

SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  Such that: yi (w · xi + b) ≥ 1 (for all i)

Observations:
• the objective function is convex
• the constraints are affine, inducing a polytope constraint set
• so, SVM is a convex optimization problem (in fact a quadratic program)
• moreover, strong duality holds
• let’s examine the dual… the Lagrangian is:
  L(w, b, α) = (1/2) ||w||² + Σi αi (1 − yi (w · xi + b))
SVM Dual

SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  Such that: yi (w · xi + b) ≥ 1 (for all i)

Lagrangian:
  L(w, b, α) = (1/2) ||w||² + Σi αi (1 − yi (w · xi + b))

Primal: p* = minw,b maxα≥0 L(w, b, α)
Dual: d* = maxα≥0 minw,b L(w, b, α)

The inner minimization is unconstrained; let’s calculate it:
  dL/dw = 0 ⟹ w = Σi αi yi xi
  dL/db = 0 ⟹ Σi αi yi = 0

• when αi > 0, the corresponding xi is a support vector
• w is only a function of the support vectors!
SVM Dual (contd.)

Lagrangian:
  L(w, b, α) = (1/2) ||w||² + Σi αi (1 − yi (w · xi + b))

Primal: p* = minw,b maxα≥0 L(w, b, α)
Dual: d* = maxα≥0 minw,b L(w, b, α)

The inner minimization is unconstrained: w = Σi αi yi xi and Σi αi yi = 0.

So, plugging these back into L:
  g(α) = Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj)
subject to
  αi ≥ 0 and Σi αi yi = 0
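On a tiny 1-D data set the dual can be maximized by hand, showing how the αi yield w, b, and the support vectors (the constraint Σi αi yi = 0 does the work; the data set is chosen purely for illustration):

```python
# Two points: x1 = +1 (y1 = +1) and x2 = -1 (y2 = -1).
# The constraint sum_i alpha_i y_i = 0 forces alpha1 = alpha2 = a, and the
# dual objective  sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j x_i x_j
# becomes 2a - 2a^2, maximized at a = 1/2.
a1 = a2 = 0.5
w = a1 * (+1) * (+1) + a2 * (-1) * (-1)   # w = sum_i alpha_i y_i x_i = 1.0
b = 1 - w * 1.0                           # from y1 (w x1 + b) = 1  =>  b = 0.0

# Both alphas are positive, so both points are support vectors;
# the margin is 2 / |w| = 2.
margin = 2 / abs(w)
```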
SVM Optimization Interpretation

SVM standard (primal) form:
  Minimize (over w, b): (1/2) ||w||²
  (equivalently: maximize the margin γ = 2/||w||)
  Such that: yi (w · xi + b) ≥ 1 (for all i)

SVM standard (dual) form:
  Maximize (over αi): Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj)
  (only a function of the dot products xi · xj, which gives the kernelized version, and of the “support vectors” with αi > 0)
  Such that: αi ≥ 0 and Σi αi yi = 0 (for all i)
What We Learned…

• Support Vector Machines

• Maximum Margin formulation

• Constrained Optimization

• Lagrange Duality Theory

• Convex Optimization

• SVM dual and Interpretation

• How to get the optimal solution


Questions?
Next time…

Regression
