Constrained Optimization
Gilles Gasso
1 Introduction
2 Formulation
3 Concept of Lagrangian and duality, condition of optimality
4 QP Problem
Output to be predicted: y ∈ R
Input variables: x ∈ R^d
Linear model: f(x) = x^⊤θ
θ ∈ R^d: parameters of the model
Determination of a sparse θ
Minimization of the squared error
Only a few parameters are non-zero
min_{θ∈R^d} ½ Σ_{i=1}^N (yi − xi^⊤θ)²
s.t. ‖θ‖_p^p ≤ k
with ‖θ‖_p^p = Σ_{j=1}^d |θj|^p
http://www.ds100.org/sp17/assets/notebooks/linear_regression/Regularization.html
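A minimal cvxpy sketch of this constrained least-squares problem for p = 1 (the data X, y and the bound k = 4 are illustrative, not from the slides):

import numpy as np
import cvxpy as cp

# Illustrative data: N samples, d features, a sparse ground-truth theta
rng = np.random.default_rng(0)
N, d = 50, 10
theta_true = np.zeros(d); theta_true[:2] = [3.0, -2.0]
X = rng.standard_normal((N, d))
y = X @ theta_true + 0.1 * rng.standard_normal(N)

theta = cp.Variable(d)
objective = cp.Minimize(0.5 * cp.sum_squares(X @ theta - y))
constraints = [cp.norm(theta, 1) <= 4.0]   # p = 1 ball promotes sparsity
cp.Problem(objective, constraints).solve()
print(np.round(theta.value, 3))            # only a few coordinates are non-zero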
House Mi: defined by its coordinates zi = [xi yi]^⊤
Let θ be the coordinates of the firehouse
Minimize the distance from the firehouse to the farthest house
[Figure: houses (Maison) and the firehouse (Caserne) in the plane]
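Written out (a sketch of the formulation the slide implies):
min_θ max_{i=1,…,N} ‖θ − zi‖
or equivalently, introducing an auxiliary radius r,
min_{θ,r} r s.t. ‖θ − zi‖ ≤ r ∀i = 1,…,N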
[Figure: two linearly separable classes (class 1, class 2), the separating hyperplane f(x) = 0, the margin hyperplanes f(x) = −1 and f(x) = 1, and the margin 2/‖w‖]
Constraints
How many constraints? N constraints: yi(θ^⊤xi + b) ≥ 1, i = 1,…,N
Type of constraints? Inequality constraints
Goal
Find the minimum θ∗ of J(θ) such that every constraint is satisfied
Formulation
Constrained optimization
Elements of the problem
θ ∈ R^d: vector of unknown real parameters
J : R^d → R: the function to be minimized on its domain dom J
fi and gj are differentiable functions from R^d to R
Feasibility
Let p∗ = min_θ {J(θ) such that fi(θ) = 0 ∀i and gj(θ) ≤ 0 ∀j}
If p∗ = +∞ then the problem does not admit a feasible solution
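For instance (an illustrative toy case, not from the slides): min_θ θ s.t. θ + 1 ≤ 0 and 1 − θ ≤ 0 has no feasible θ, so p∗ = +∞ (the minimum over an empty set is +∞ by convention).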
Formulation
Feasibility domain
The feasible domain is defined by the set of constraints
Ω = {θ ∈ R^d ; fi(θ) = 0 ∀i and gj(θ) ≤ 0 ∀j}
Feasible points
θ0 is feasible if θ0 ∈ dom J and θ0 ∈ Ω, i.e. θ0 fulfills all the constraints and J(θ0) has a finite value
θ∗ is a global solution of the problem if θ∗ is feasible and J(θ∗) ≤ J(θ) for every feasible θ
θ̂ is a locally optimal solution if θ̂ is feasible and J(θ̂) ≤ J(θ) for every feasible θ with ‖θ − θ̂‖ ≤ ε, for some ε > 0
Example 1

min_{θ∈R²} 0.9θ1² + 0.75θ2² − 0.74θ1θ2 − 5.4θ1 − 1.2θ2
s.t. −4 ≤ θ1 ≤ −1
     −3 ≤ θ2 ≤ 4

Parameters: θ = [θ1 θ2]^⊤
Feasible domain: Ω = {θ ∈ R²; −4 ≤ θ1 ≤ −1 and −3 ≤ θ2 ≤ 4}
[Figure: level sets J(θ) = c of the objective and the rectangular domain Ω]
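As a quick numerical check (a sketch; the solver call is ours and the objective is as reconstructed above):

import numpy as np
from scipy.optimize import minimize

# Objective of Example 1 (coefficients as reconstructed above)
def J(theta):
    t1, t2 = theta
    return 0.9*t1**2 + 0.75*t2**2 - 0.74*t1*t2 - 5.4*t1 - 1.2*t2

# Box constraints: -4 <= theta1 <= -1, -3 <= theta2 <= 4
res = minimize(J, x0=np.array([-2.0, 0.0]), bounds=[(-4, -1), (-3, 4)])
print(res.x, res.fun)  # constrained minimizer and optimal value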
Example 2

min_{θ∈R²} θ1 + θ2
s.t. θ1² + θ2² − 2 = 0

[Figure: level lines of J(θ) = θ1 + θ2 and the circular feasible set; at θ∗ = (−1, −1)^⊤ the gradients ∇J(θ∗) = (1, 1)^⊤ and ∇h(θ∗) are collinear, and likewise at (1, 1)^⊤]
An equality constraint
Domain of feasibility: the circle centered at 0 with radius √2
The optimal solution is θ∗ = [−1 −1]^⊤ and we have J(θ∗) = −2
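A numerical check (a sketch; the solver choice is ours, not the slide's — SLSQP handles equality constraints):

import numpy as np
from scipy.optimize import minimize

# min theta1 + theta2  s.t.  theta1^2 + theta2^2 - 2 = 0
res = minimize(
    fun=lambda t: t[0] + t[1],
    x0=np.array([0.5, -1.0]),          # any non-degenerate starting point
    method="SLSQP",
    constraints=[{"type": "eq", "fun": lambda t: t[0]**2 + t[1]**2 - 2}],
)
print(res.x)  # approx [-1, -1]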
Optimality
Notion of Lagrangian
Primal problem P
min_{θ∈R^d} J(θ)
s.t. fi(θ) = 0 ∀i = 1,…,n (n equality constraints)
     gj(θ) ≤ 0 ∀j = 1,…,m (m inequality constraints)
θ is called the primal variable
Principle of the Lagrangian
Each constraint is associated with a scalar parameter called a Lagrange multiplier:
Equality constraint fi(θ) = 0: we associate λi ∈ R
Inequality constraint gj(θ) ≤ 0: we associate µj ≥ 0 (beware of the direction of the inequality gj(θ) ≤ 0)
The Lagrangian transforms the constrained problem into an unconstrained one, at the price of the additional variables λi and µj.
Lagrangian formulation
The Lagrangian is defined by:
L(θ, λ, µ) = J(θ) + Σ_{i=1}^n λi fi(θ) + Σ_{j=1}^m µj gj(θ) with µj ≥ 0, ∀j = 1,…,m
Examples
Example 1
min_{θ∈R²} θ1 + θ2
s.t. θ1² + 2θ2² − 2 ≤ 0 (inequality constraint)
     θ2 ≥ 0 (inequality constraint; mind the type of inequality)
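The slide leaves this Lagrangian implicit; as a sketch, rewriting θ2 ≥ 0 in standard form −θ2 ≤ 0 gives
L(θ, µ) = θ1 + θ2 + µ1(θ1² + 2θ2² − 2) − µ2 θ2, with µ1 ≥ 0, µ2 ≥ 0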
Example 2
min_{θ∈R³} ½(θ1² + θ2² + θ3²)
s.t. θ1 + θ2 + 2θ3 = 1 (equality constraint)
     θ1 + 4θ2 + 2θ3 = 3 (equality constraint)
Lagrangian:
L(θ, λ) = ½(θ1² + θ2² + θ3²) + λ1(θ1 + θ2 + 2θ3 − 1) + λ2(θ1 + 4θ2 + 2θ3 − 3), with λ1, λ2 of any sign
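Completing the computation (this step is not on the slide): ∇θL = 0 gives θ1 = −λ1 − λ2, θ2 = −λ1 − 4λ2, θ3 = −2λ1 − 2λ2; substituting into the two constraints yields λ1 = 2/15 and λ2 = −1/5, hence θ∗ = [1/15 2/3 2/15]^⊤.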
Dual feasibility: µj ≥ 0 ∀j = 1,…,m
Example
min_{θ∈R²} ½(θ1² + θ2²)
s.t. θ1 − 2θ2 + 2 ≤ 0
Duality
Dual function
Let L(θ, λ, µ) be the Lagrangian of the primal problem P, with µj ≥ 0. The corresponding dual function is D(λ, µ) = min_θ L(θ, λ, µ)
Weak duality: D(λ, µ) ≤ p∗ for any λ and any µ ≥ 0
Dual problem
max_{λ,µ} D(λ, µ)
s.t. µj ≥ 0 ∀j = 1,…,m
http://www.onmyphd.com/?p=duality.theory
Remarks
Transform the primal problem into an equivalent dual problem, possibly much simpler to solve
Solving the dual problem can lead to the solution of the primal problem
Solving the dual problem gives the optimal values of the Lagrange multipliers
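Back to the example min_{θ∈R²} ½(θ1² + θ2²) s.t. θ1 − 2θ2 + 2 ≤ 0 above (a sketch of the step referenced as (1) below): the Lagrangian is L(θ, µ) = ½(θ1² + θ2²) + µ(θ1 − 2θ2 + 2); ∇θL = 0 gives θ1 = −µ, θ2 = 2µ (1), hence the dual function D(µ) = −(5/2)µ² + 2µ.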
Dual solution
∇D(µ) = 0 ⇒ µ = 2/5 (which satisfies µ ≥ 0) (2)
Primal solution: (2) and (1) lead to θ∗ = [−2/5 4/5]^⊤
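The same example can be checked numerically; a sketch with cvxpy (our choice, not the slides'), where each constraint exposes its optimal multiplier via dual_value:

import cvxpy as cp

# Primal: min 1/2(theta1^2 + theta2^2)  s.t.  theta1 - 2*theta2 + 2 <= 0
theta = cp.Variable(2)
con = [theta[0] - 2*theta[1] + 2 <= 0]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)), con)
prob.solve()
print(theta.value)          # approx [-0.4, 0.8] = (-2/5, 4/5)
print(con[0].dual_value)    # approx 0.4 = 2/5, the optimal multiplier mu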
Convexity condition
The problem
min_{θ∈R^d} J(θ) s.t. fi(θ) = 0 ∀i = 1,…,n and gj(θ) ≤ 0 ∀j = 1,…,m
is convex when J is a convex function, the fi are affine ∀i = 1,…,n, and the gj are convex functions ∀j = 1,…,m
Problems of interest
Linear Programming (LP)
Quadratic Programming (QP)
Off-the-shelf toolboxes exist for these problems (Gurobi, Mosek, CVX, …)
QP convex problem
Standard form
min_{θ∈R^d} ½ θ^⊤Gθ + q^⊤θ + r (convex when G is positive semidefinite)
s.t. ai^⊤θ = bi ∀i = 1,…,n (affine equality constraints)
     cj^⊤θ ≥ dj ∀j = 1,…,m (linear inequality constraints)
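A minimal cvxpy sketch of this standard form (the data G, q, A, b, C, d below are made up for the demo):

import numpy as np
import cvxpy as cp

# Illustrative data (not from the slides); G must be positive semidefinite
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])   # a_i^T theta = b_i
C = np.eye(2);              d = np.zeros(2)       # c_j^T theta >= d_j

theta = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(theta, G) + q @ theta)
prob = cp.Problem(objective, [A @ theta == b, C @ theta >= d])
prob.solve()
print(theta.value)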
Examples
min_{θ∈R²} ½(θ1² + θ2²)
s.t. θ1 − 2θ2 + 2 ≤ 0

SVM problem
min_{θ,b} ½‖θ‖²
s.t. yi(θ^⊤xi + b) ≥ 1 ∀i = 1,…,N
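As a sketch of how the SVM fits the standard QP form (this mapping is not spelled out on the slide): take the variable [θ; b] ∈ R^{d+1}, G = diag(I_d, 0), q = 0, r = 0, no equality constraints, and one linear inequality per sample with ci = yi [xi; 1] and di = 1, since yi(θ^⊤xi + b) = ci^⊤[θ; b].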
Summary
A reference book