Optimization Methods (MFE) : Elena Perazzi


Optimization methods (MFE)

Lecture 03

Elena Perazzi

EPFL

Fall 2019

Elena Perazzi (EPFL) Optimization methods (MFE) Lecture 03 Fall 2019 1 / 28


Today’s topics

Constrained optimization with inequality and equality constraints.

Numerical methods: penalty function method (applies to equality and
inequality constraints), barrier method (applies to inequality
constraints only).

Kuhn-Tucker theory.



Penalty function methods

Numerical method to find the max/min subject to equality or
inequality constraints.
Approximate a constrained optimization problem with an
unconstrained one, then apply standard techniques (Newton, search
methods, etc.) to find the solution.
Main idea: add a term to the objective function that prescribes a high
cost for violating the constraint.
Consider the problem

$$\min f(x_1, x_2, \dots, x_n) \quad \text{s.t.} \quad (x_1, \dots, x_n) \in S \tag{1}$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is a continuous function and $S$ is a set in $\mathbb{R}^n$.



Penalty function methods

The idea of the penalty function method is to replace (1) by

$$\min \theta(\vec{x}, c) \equiv f(x_1, x_2, \dots, x_n) + c\,P(x_1, x_2, \dots, x_n) \tag{2}$$

where $c$ is a positive constant (the penalty parameter) and
$P: \mathbb{R}^n \to \mathbb{R}$ is such that
- $P(x_1, x_2, \dots, x_n)$ is continuous
- $P(x_1, x_2, \dots, x_n) \ge 0$ for every $(x_1, x_2, \dots, x_n) \in \mathbb{R}^n$
- $P(x_1, x_2, \dots, x_n) = 0 \Leftrightarrow x \in S$
As $c$ grows, the minimizer of $\theta$ is pushed toward $S$: any limit point
of the minimizers as $c \to \infty$ is a solution of (1).



Penalty function methods

In the case of inequality and equality constraints,

$$S = \{(x_1, \dots, x_n) \ \text{s.t.}\ g_i(x_1, \dots, x_n) \le 0,\ i = 1, \dots, k,\quad h_j(x_1, \dots, x_n) = 0,\ j = 1, \dots, m\}$$

a useful penalty function is

$$P(x_1, x_2, \dots, x_n) = \sum_{i=1}^{k} \max\{0, g_i(x_1, x_2, \dots, x_n)\}^2 + \sum_{j=1}^{m} h_j(x_1, x_2, \dots, x_n)^2$$



Penalty function methods

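[Figure: the penalty term $cP(x)$ plotted against $x$ for $c = 1$, $c = 10$ and $c = 100$, with the points $a$ and $b$ marked on the x-axis.]

$$g_1(x) = x - b, \qquad g_2(x) = a - x$$

$$P(x) = \max\{0, a - x\}^2 + \max\{0, x - b\}^2$$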
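The interval penalty above can be evaluated directly. The following is a minimal sketch, assuming the illustrative interval a = 0, b = 1 (these values and the function name `quadratic_penalty` are not from the slides); it shows that the term vanishes inside [a, b] and scales with c outside.

```python
def quadratic_penalty(x, a, b):
    """P(x) = max{0, a - x}^2 + max{0, x - b}^2 for the constraint a <= x <= b."""
    return max(0.0, a - x) ** 2 + max(0.0, x - b) ** 2


a, b = 0.0, 1.0                      # illustrative interval (an assumption)
for c in (1, 10, 100):
    # Evaluate c * P(x) left of, inside, and right of the interval.
    values = [c * quadratic_penalty(x, a, b) for x in (-0.5, 0.5, 1.5)]
    print(c, values)
```

Only the points outside the interval contribute, and their cost grows linearly in c.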



Example

Consider the problem:

$$\min f(x, y) = (x - 2)^2 + (y - 3)^2 \quad \text{s.t.} \quad x + y = 1 \tag{3}$$

We build

$$\theta(x, y, c) = (x - 2)^2 + (y - 3)^2 + c\,(1 - x - y)^2 \tag{4}$$

The minimum is found by imposing

$$\vec{\nabla}\theta = \begin{pmatrix} 2x - 4 + 2cx + 2cy - 2c \\ 2y - 6 + 2cx + 2cy - 2c \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \tag{5}$$

For $c = 0$ the min is $x = 2$, $y = 3$ as expected. For $c \to \infty$ the min
is $x = 0$, $y = 1$, which satisfies the constraint.
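For finite c the first-order conditions (5) can be solved in closed form. The short derivation below is a worked check rather than part of the original slides; it shows that the constraint violation shrinks like 1/c but is never exactly zero for finite c.

```latex
\begin{aligned}
\text{(5)} \;\Rightarrow\; x - 2 &= y - 3 = c\,(1 - x - y) \;\Rightarrow\; y = x + 1, \\
x(c) &= \frac{2}{1 + 2c}, \qquad y(c) = \frac{3 + 2c}{1 + 2c}, \\
1 - x(c) - y(c) &= -\frac{4}{1 + 2c}.
\end{aligned}
```

Setting c = 0 gives (2, 3) and letting c grow without bound gives (0, 1), in agreement with the two limits quoted above.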



Selecting the penalty parameter

When solving the problem numerically, should we set $c$ to a very large
number, to be sure that the min of $\theta(\vec{x}, c)$ belongs to the feasible
region? No! Why not?

Large values of $c$ result in badly conditioned functions with very steep
gradients close to the constraint boundaries. This causes severe
convergence difficulties for all standard min-search methods unless
the algorithm starts at a point extremely close to the minimum
being sought.



Penalty function methods

Algorithm:
1. Start with a relatively small value of the penalty parameter $c$ at an
   infeasible point not too close to the constraint boundary. This
   ensures that no steep gradients are present in the initial optimization
   of $\theta(\vec{x}, c)$. The min of $\theta(\vec{x}, c)$ will probably not be a
   feasible point for the original problem.
2. Gradually increase $c$ (e.g. $c_{k+1} = \eta c_k$ with $\eta > 1$), each time
   starting the optimization from the solution of the problem with the
   previous value of $c$. If $c$ increases gradually, the solution of the new
   problem stays close to the solution of the previous one, which makes it
   easier to find the min of $\theta(\vec{x}, c)$ from one iteration to the next.
3. Stop when you find a solution that is in, or close enough to, the
   feasible region.
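The loop described above can be sketched on the earlier example, min (x − 2)² + (y − 3)² s.t. x + y = 1. This is a minimal illustration and not the lecture's own code: the starting point (3, 3), the factor η = 10, the tolerances and all function names are assumptions, and Newton's method is used for the inner minimization simply because θ is quadratic here.

```python
def grad_theta(x, y, c):
    """Gradient of theta(x, y, c) = (x-2)^2 + (y-3)^2 + c*(1-x-y)^2."""
    r = 1.0 - x - y                      # constraint residual
    return 2.0 * (x - 2.0) - 2.0 * c * r, 2.0 * (y - 3.0) - 2.0 * c * r


def minimize_theta(x, y, c, tol=1e-6, max_iter=50):
    """Newton's method for fixed c, started from (x, y)."""
    a, b = 2.0 + 2.0 * c, 2.0 * c        # Hessian of theta is [[a, b], [b, a]]
    det = a * a - b * b
    for _ in range(max_iter):
        gx, gy = grad_theta(x, y, c)
        if max(abs(gx), abs(gy)) < tol:
            break
        # Newton step d = -H^{-1} g, with H^{-1} = [[a, -b], [-b, a]] / det
        x += (-a * gx + b * gy) / det
        y += (b * gx - a * gy) / det
    return x, y


def penalty_method(x, y, c=1.0, eta=10.0, feas_tol=1e-6, max_outer=20):
    """Increase c geometrically, warm-starting each solve from the last one."""
    for _ in range(max_outer):
        x, y = minimize_theta(x, y, c)
        if abs(1.0 - x - y) < feas_tol:  # close enough to the feasible set
            break
        c *= eta
    return x, y, c


x_pen, y_pen, c_pen = penalty_method(3.0, 3.0)   # (3, 3) violates x + y = 1
print(x_pen, y_pen, c_pen)
```

The iterates approach (0, 1) from outside the feasible set, and the final c is only as large as the feasibility tolerance requires.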



Barrier function methods
Only for problems with inequality constraints.
The penalty function method is an exterior method: start from a
point outside the feasible region, stop when we find a minimum of
$\theta(\vec{x}, c)$ inside or close to the feasible region.
The barrier function method is an interior method: start from a point
inside the feasible region, set a barrier at the border of the feasible
region to prevent the solution from becoming infeasible.
With inequality constraints $g_i \le 0$, $i = 1, \dots, k$, define

$$B(\vec{x}, r) = f(\vec{x}) + r \sum_{i=1}^{k} \left(-\frac{1}{g_i(\vec{x})}\right) \tag{6}$$

In the interior of the feasible region, the extra term on the RHS is
positive (for $r > 0$), and it becomes infinite at the borders of the
region, where at least one constraint is satisfied with equality.



Barrier function methods

Algorithm:
1. Start with a relatively high value of the barrier parameter $r$ at a point
   inside the feasible region, not too close to the constraint boundary.
   The solution of this problem will stay inside the feasible region, and
   will not approach the region where the barrier term is high.
2. Gradually decrease $r$, each time starting the optimization from the
   solution of the problem with the previous value of $r$.
3. Stop when $x_{k+1}$ is close enough to $x_k$.
The solution converges to a (local) constrained min of the original
problem from the inside of the feasible region, in a similar way as the
solution of the penalty function problem converged from the
outside.
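The barrier loop can be sketched on a one-dimensional toy problem, min (x − 2)² s.t. g(x) = x − 1 ≤ 0, whose constrained minimum is x = 1. The problem, the shrink factor 0.1, the tolerances and the names are all assumptions made for illustration; the inner solver is a Newton iteration whose step is capped so that the iterate never reaches the barrier.

```python
def barrier_method(x, r=1.0, shrink=0.1, tol=1e-8, max_outer=30):
    """Barrier method for min (x-2)^2 s.t. g(x) = x - 1 <= 0, from a feasible x."""

    def d1(x, r):   # dB/dx for B(x, r) = (x-2)^2 - r / (x - 1)
        return 2.0 * (x - 2.0) + r / (1.0 - x) ** 2

    def d2(x, r):   # second derivative of B, positive for x < 1
        return 2.0 + 2.0 * r / (1.0 - x) ** 3

    for _ in range(max_outer):
        x_prev = x
        for _ in range(100):                      # Newton's method for fixed r
            step = -d1(x, r) / d2(x, r)
            step = min(step, 0.5 * (1.0 - x))     # never cross the barrier at x = 1
            x += step
            if abs(step) < 1e-12:
                break
        if abs(x - x_prev) < tol:                 # x_{k+1} close enough to x_k
            break
        r *= shrink                               # lower the barrier and repeat
    return x, r


x_bar, r_bar = barrier_method(0.0)                # x = 0 is strictly feasible
print(x_bar, r_bar)
```

Every iterate stays strictly below 1, so the sequence approaches the constrained minimum from inside the feasible region.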



Kuhn-Tucker theory



Optimization with inequality constraints: intuition
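[Figure: three concave parabolas plotted for $0 \le x \le 2$, each marked with its constrained maximizer $x^*$:
  $5 - x^2$, with $x^* = 0$, $f'(x^*) = 0$;
  $5 - (x - 1)^2$, with $x^* = 1$, $f'(x^*) = 0$;
  $5 - (x + 1)^2$, with $x^* = 0$, $f'(x^*) < 0$.]

Consider the problem: $\max f(x)$ s.t. $x \ge 0$

In the three cases above $f'(x^*)\,x^* = 0$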



Optimization with inequality constraints: intuition

Consider the problem: $\min f(\vec{x})$ s.t. $g(\vec{x}) \le 0$

Left case: $\vec{\nabla} f(x^*) = 0$, $g(x^*) < 0$
Right case: $\vec{\nabla} f(x^*) \ne 0$, $g(x^*) = 0$
In both cases $g(x^*)\,\vec{\nabla} f(x^*) = 0$



Remember from previous lectures...

The gradient of a function points in the direction of fastest increase
of the function.
The gradient of a function at any point is perpendicular to the level
curve passing through that point.
At a constrained optimum, the tangent of the level curve passing
through that point coincides with the tangent of the constraint.
(Remember for example the consumer demand problem!)
It follows from the last point that at a constrained optimum, the
gradient of the constraint must be parallel to the gradient of the
function!



Optimization with equality constraints: reminder
Remember the case of optimization with only equality constraints,
e.g. min f (x) s.t. h(x) = 0.
For a minimization or maximization problem: at the optimum

$$\vec{\nabla} L = \vec{\nabla} f(x^*) + \lambda \vec{\nabla} h(x^*) = 0 \tag{7}$$

In other words, at the optimum the gradient of the function must be
parallel to that of the constraint, but the sign does not matter. (Sign
of $\lambda$ not important).
This implies that the directional derivative along the constraint is
zero. This is the important point.
To find a better point, we would like to move in the direction of the
gradient (for a maximization) or in the opposite direction (for a
minimization), but we can’t because this would imply violating the
constraint.
Optimization with inequality constraints: intuition

Consider the problem: $\min f(\vec{x})$ s.t. $g(\vec{x}) \le 0$

If $x^*$ is on the boundary of the feasible region (the constraint is
“tight”), $\vec{\nabla} f(x^*)$ must point in the opposite direction to $\vec{\nabla} g(x^*)$.



Optimization with inequality constraints: intuition
Drawing pictures similar to the one in the last slide, it is easy to convince
oneself that at an optimum $x^*$ where the constraint binds:
- For a maximization problem where the constraint is of the form
  $g(x) \le 0$, we must have $\vec{\nabla} f(x^*)$ pointing in the same direction as
  $\vec{\nabla} g(x^*)$.
- For a minimization problem where the constraint is of the form
  $g(x) \le 0$, we must have $\vec{\nabla} f(x^*)$ pointing in the opposite direction
  to $\vec{\nabla} g(x^*)$.
- For a minimization problem where the constraint is of the form
  $g(x) \ge 0$, we must have $\vec{\nabla} f(x^*)$ pointing in the same direction as
  $\vec{\nabla} g(x^*)$.
- For a maximization problem where the constraint is of the form
  $g(x) \ge 0$, we must have $\vec{\nabla} f(x^*)$ pointing in the opposite direction
  to $\vec{\nabla} g(x^*)$.
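The four cases can be written compactly with a single non-negative multiplier. The table below is a restatement under the assumption that μ ≥ 0 denotes the proportionality factor between the two gradients at a binding optimum (the symbol μ is the one the slides use for inequality multipliers later on).

```latex
\begin{array}{lll}
\max,\ g(x) \le 0: & \vec{\nabla} f(x^*) = \mu\,\vec{\nabla} g(x^*), & \mu \ge 0 \\
\min,\ g(x) \le 0: & \vec{\nabla} f(x^*) = -\mu\,\vec{\nabla} g(x^*), & \mu \ge 0 \\
\min,\ g(x) \ge 0: & \vec{\nabla} f(x^*) = \mu\,\vec{\nabla} g(x^*), & \mu \ge 0 \\
\max,\ g(x) \ge 0: & \vec{\nabla} f(x^*) = -\mu\,\vec{\nabla} g(x^*), & \mu \ge 0
\end{array}
```

Unlike the equality-constrained case, the sign of the multiplier now carries information.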



Optimization with inequality constraints

Again the general form of the problem is

$$\begin{aligned}
&\min/\max\ f(x_1, x_2, \dots, x_n) \\
&\text{s.t.}\quad h_i(x_1, x_2, \dots, x_n) = 0, \quad i = 1, \dots, k \\
&\phantom{\text{s.t.}\quad} g_j(x_1, x_2, \dots, x_n) \le 0, \quad j = 1, \dots, m \\
&\phantom{\text{s.t.}\quad} \text{or}\ g_j(x_1, x_2, \dots, x_n) \ge 0, \quad j = 1, \dots, m
\end{aligned}$$

Now we require that $f$, $h_i$ ($i = 1, \dots, k$) and $g_j$ ($j = 1, \dots, m$) are $C^1$.

As in the case of optimization with only equality constraints, we
demand $k < n$, but there is no requirement on $m$!



[Figure: region of the $(x_1, x_2)$ plane defined by the three inequality constraints $x_2 \le 1 - x_1$, $x_2 \ge x_1$, $x_2 \ge 0.5$.]
Kuhn-Tucker theory – tools

Main tool: the Lagrangean $L$!

With all constraints in the form $g_j \le 0$

$$L = f(\vec{x}) + \sum_{i=1}^{k} \lambda_i h_i(\vec{x}) + \sum_{j=1}^{m} \mu_j g_j(\vec{x}) \tag{8}$$

for a minimization problem; and

$$L = f(\vec{x}) + \sum_{i=1}^{k} \lambda_i h_i(\vec{x}) - \sum_{j=1}^{m} \mu_j g_j(\vec{x}) \tag{9}$$

for a maximization problem.

If some of the constraints are in the form $g_j(x) \ge 0$, define
$\tilde{g}_j(x) \equiv -g_j(x)$.
For each point $x$ define $J(x)$ as the Jacobian of the constraints
satisfied with equality at that point: all the $h_i$ and a subset of the $g_j$.



Kuhn-Tucker theorem
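If $x^*$ is a local minimum subject to $g_j(x) \le 0$, $j = 1, \dots, m$ and
$h_i(x) = 0$, $i = 1, \dots, k$, and $J(x^*)$ has maximal rank (i.e. rank = number of
active constraints), then there exists a vector of Lagrange multipliers
$(\lambda_1^*, \lambda_2^*, \dots, \lambda_k^*, \mu_1^*, \dots, \mu_m^*)$ such that

$$\begin{aligned}
\vec{\nabla}_x L(x^*, \lambda^*, \mu^*) &= \vec{\nabla} f(x^*) + \sum_{i=1}^{k} \lambda_i^* \vec{\nabla} h_i(x^*) + \sum_{j=1}^{m} \mu_j^* \vec{\nabla} g_j(x^*) = 0 \\
\frac{\partial L(x^*, \lambda^*, \mu^*)}{\partial \lambda_i} &= h_i(x^*) = 0 \\
\frac{\partial L(x^*, \lambda^*, \mu^*)}{\partial \mu_j} &= g_j(x^*) \le 0 \\
\mu_j^* &\ge 0 \\
\sum_{j=1}^{m} \mu_j^* \frac{\partial L(x^*, \lambda^*, \mu^*)}{\partial \mu_j} &= \sum_{j=1}^{m} g_j(x^*)\,\mu_j^* = 0
\end{aligned}$$

These are the necessary conditions for a min, valid if the constraint
qualification holds.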
Kuhn-Tucker theorem
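If $x^*$ is a local maximum subject to $g_j(x) \le 0$, $j = 1, \dots, m$ and
$h_i(x) = 0$, $i = 1, \dots, k$, and $J(x^*)$ has maximal rank (i.e. rank = number of
active constraints), then there exists a vector of Lagrange multipliers
$(\lambda_1^*, \lambda_2^*, \dots, \lambda_k^*, \mu_1^*, \dots, \mu_m^*)$ such that

$$\begin{aligned}
\vec{\nabla}_x L(x^*, \lambda^*, \mu^*) &= \vec{\nabla} f(x^*) + \sum_{i=1}^{k} \lambda_i^* \vec{\nabla} h_i(x^*) - \sum_{j=1}^{m} \mu_j^* \vec{\nabla} g_j(x^*) = 0 \\
\frac{\partial L(x^*, \lambda^*, \mu^*)}{\partial \lambda_i} &= h_i(x^*) = 0 \\
-\frac{\partial L(x^*, \lambda^*, \mu^*)}{\partial \mu_j} &= g_j(x^*) \le 0 \\
\mu_j^* &\ge 0 \\
-\sum_{j=1}^{m} \mu_j^* \frac{\partial L(x^*, \lambda^*, \mu^*)}{\partial \mu_j} &= \sum_{j=1}^{m} g_j(x^*)\,\mu_j^* = 0
\end{aligned}$$

(The minus signs in front of $\partial L / \partial \mu_j$ follow from the maximization form (9) of the Lagrangean.)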



Complementary slackness conditions

The conditions

$$g_j(x^*) \le 0, \qquad \mu_j^* \ge 0, \qquad \sum_{j=1}^{m} g_j(x^*)\,\mu_j^* = 0$$

are called the complementary slackness conditions.

Every term $g_j(x^*)\,\mu_j^*$ is non-positive, so the sum can vanish only if
each term does. If the $j$-th constraint is slack, i.e. $g_j(x^*) < 0$, then the
corresponding multiplier must be zero, $\mu_j^* = 0$.
Meaning of the multiplier: gain in $f$ if we relax the constraint. If the
constraint is slack → gain is 0. If the constraint is tight → gain is
non-negative (typically positive).



Example

$$\min\ 3x + 4y \quad \text{s.t.} \quad x^2 + y^2 \ge 4, \quad x \ge 1$$
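The slide states this example without a worked solution, so the following is only one possible way to explore it, not the lecture's own answer. The sketch rewrites both constraints as g_j ≤ 0, uses the minimization Lagrangean L = f + μ1 g1 + μ2 g2, and checks the Kuhn-Tucker conditions at three hand-derived candidate points; the function names and the candidates are assumptions introduced here.

```python
import math


def f(x, y):
    return 3.0 * x + 4.0 * y


def g(x, y):
    """Both constraints rewritten as g_j <= 0."""
    return 4.0 - x * x - y * y, 1.0 - x


def satisfies_kt(x, y, mu1, mu2, tol=1e-9):
    """Check the Kuhn-Tucker conditions for a minimum at (x, y)."""
    g1, g2 = g(x, y)
    # Stationarity of L = f + mu1*g1 + mu2*g2 with respect to x and y.
    stationary = (abs(3.0 - 2.0 * mu1 * x - mu2) < tol
                  and abs(4.0 - 2.0 * mu1 * y) < tol)
    feasible = g1 <= tol and g2 <= tol
    nonnegative = mu1 >= 0.0 and mu2 >= 0.0
    slack = abs(mu1 * g1) < tol and abs(mu2 * g2) < tol
    return stationary and feasible and nonnegative and slack


s3 = math.sqrt(3.0)
on_circle = satisfies_kt(1.2, 1.6, 1.25, 0.0)              # only the circle binds
corner = satisfies_kt(1.0, s3, 2.0 / s3, 3.0 - 4.0 / s3)   # both constraints bind
lower = satisfies_kt(1.0, -s3, -2.0 / s3, 3.0 + 4.0 / s3)  # would need mu1 < 0

# A feasible point further along the circle has a lower objective than
# (1.2, 1.6), so that Kuhn-Tucker point is not a local minimum.
angle = math.atan2(1.6, 1.2) + 0.1
nearby = (2.0 * math.cos(angle), 2.0 * math.sin(angle))
print(on_circle, corner, lower, f(*nearby), f(1.2, 1.6))
```

Under these assumptions, (1.2, 1.6) and (1, √3) both pass the necessary conditions, while (1, −√3) fails the sign condition on μ1. Only (1, √3) is a local minimum, which illustrates that the conditions are necessary but not sufficient. The objective is also unbounded below on the feasible set (take x = 1 and let y decrease), so no global minimum exists.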



Kuhn-Tucker sufficient conditions

Consider the problem $\max f(x)$ s.t. $g_j(x) \le 0$, $j = 1, \dots, m$.

If
- at point $x^*$ the Kuhn-Tucker necessary conditions are satisfied,
- at this point the constraint qualifications are satisfied,
- the function $f$ is concave and each of the constraints $g_j$ is convex,
then $x^*$ is a local maximum (and, by concavity, a global maximum over
the feasible region).



Kuhn-Tucker sufficient conditions
Consider the problem $\max f(x)$ s.t. $g_j(x) \le 0$, $j = 1, \dots, m$. Let's write
down the Lagrangean as a function of $x$, given the value $\mu = \mu^* \ge 0$
that satisfies the Kuhn-Tucker equations:

$$L(x \mid \mu = \mu^*) = f(x) - \sum_{j=1}^{m} \mu_j^* g_j(x) \tag{10}$$

$f$ concave, $g_j$ convex and $\mu_j^* \ge 0$ ⇒ $L(x \mid \mu = \mu^*)$ concave in $x$.
Hence, if $x^*$ satisfies the Kuhn-Tucker first-order conditions, it is a
stationary point of a concave function and therefore maximizes it:

$$L(x^* \mid \mu = \mu^*) \ge L(x \mid \mu = \mu^*) \tag{11}$$

so

$$f(x^*) - \sum_{j=1}^{m} \mu_j^* g_j(x^*) \ge f(x) - \sum_{j=1}^{m} \mu_j^* g_j(x) \tag{12}$$

But since the term $-\sum_{j=1}^{m} \mu_j^* g_j(x^*)$ on the LHS is zero (complementary
slackness conditions), and $-\sum_{j=1}^{m} \mu_j^* g_j(x) \ge 0$ for any feasible point,
we can conclude that in the feasible region

$$f(x^*) \ge f(x) \tag{13}$$
Kuhn-Tucker sufficient conditions and saddle points
It turns out that $(x^*, \mu^*)$ satisfying the sufficient conditions is a saddle
point of the Lagrangean: in a constrained maximization problem
$(x^*, \mu^*)$ is a max with respect to $x$ and a min with respect to $\mu$:

$$Z(x, \mu^*) \le Z(x^*(\mu^*), \mu^*) \le Z(x^*(\mu), \mu) \tag{14}$$

(here $Z$ is the Lagrangean, and $x^*(\mu)$ is read as its maximizer over $x$ for a given $\mu$).
The Kuhn-Tucker algorithm can be interpreted as a maximization of
the Lagrangean with respect to $x$ and a minimization with respect to
$\mu$ (subject to $\mu \ge 0$).
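The saddle-point property can be checked numerically on a toy problem. The sketch below assumes the problem max −(x − 2)² s.t. g(x) = x − 1 ≤ 0, which is not from the slides; its Kuhn-Tucker point is x* = 1, μ* = 2. The function names and the grids are likewise illustrative choices.

```python
def lagrangean(x, mu):
    """L(x, mu) = f(x) - mu * g(x) with f(x) = -(x-2)^2 and g(x) = x - 1 <= 0."""
    return -(x - 2.0) ** 2 - mu * (x - 1.0)


def x_best(mu):
    """Maximizer of L(., mu): solves -2(x - 2) - mu = 0."""
    return 2.0 - 0.5 * mu


x_star, mu_star = 1.0, 2.0                # Kuhn-Tucker point of the toy problem
xs = [i / 10.0 for i in range(-20, 41)]   # grid of x values
mus = [i / 10.0 for i in range(0, 61)]    # grid of mu >= 0

middle = lagrangean(x_best(mu_star), mu_star)
# Left inequality: x* maximizes L(., mu*).
left_ok = all(lagrangean(x, mu_star) <= middle + 1e-12 for x in xs)
# Right inequality: mu* minimizes the maximized Lagrangean over mu >= 0.
right_ok = all(middle <= lagrangean(x_best(mu), mu) + 1e-12 for mu in mus)
print(middle, left_ok, right_ok)
```

Both inequalities hold on the grids, and the middle value equals f(x*) because the constraint term vanishes at the Kuhn-Tucker point.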
