Opt Lecture2 2019
Numerical methods
Andrew Lesniewski
Baruch College
New York
Fall 2019
Outline
Numerical methods
Here, E and I are disjoint subsets of the set of indices 1, . . . , m, such that
E ∪ I = {1, . . . , m}.
A point x ∈ Rn is called feasible if it satisfies all the constraints. We can thus
characterize the subset Ω as the set of all feasible points of the problem:

Ω = {x ∈ Rn : ci(x) = 0 for i ∈ E, and ci(x) ≤ 0 for i ∈ I}.
Examples
Solving constrained optimization problems is challenging and, before developing
a general methodology, we discuss a few examples.
Example 1. Consider the problem:

min x1 + x2, subject to x1 x2 = a^2 (where a > 0) and x1, x2 ≥ 0.  (3)

In this case, c1(x) = x1 x2 − a^2, E = {1}, c2(x) = −x1, c3(x) = −x2, I = {2, 3},
and the feasible set is the branch of the hyperbola x1 x2 = a^2 lying in the first quadrant.
The special feature of this problem is that the constraints can be solved. Namely,
x2 = a2 /x1 , which reduces the problem to minimizing a function of one variable:
g(x1) = x1 + a^2/x1.
Setting the derivative of g(x1 ) to zero we find that x1∗ = a (the solution −a is
rejected because it is not in the feasible set). That means that x2∗ = a, and
inspection shows that x ∗ = (a, a) is, in fact, a global minimum.
Situations in which the constraints can be solved are rare. In general, solving the
constraints is either impossible or it leads to cumbersome calculations.
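The substitution solution of Example 1 is easy to check numerically. A minimal sketch (the golden-section helper and the bracket are illustrative choices, not part of the lecture):

```python
# Verify Example 1 by eliminating the constraint: x2 = a^2 / x1 reduces
# the problem to minimizing g(x1) = x1 + a^2 / x1 over x1 > 0.

def g(x1, a):
    """Reduced one-variable objective after solving the constraint."""
    return x1 + a**2 / x1

def minimize_reduced(a, lo=1e-6, hi=1e6, iters=200):
    """Golden-section search on [lo, hi]; g is unimodal for x1 > 0."""
    phi = (5 ** 0.5 - 1) / 2
    c, d = hi - phi * (hi - lo), lo + phi * (hi - lo)
    for _ in range(iters):
        if g(c, a) < g(d, a):
            hi, d = d, c
            c = hi - phi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + phi * (hi - lo)
    return 0.5 * (lo + hi)

a = 3.0
x1_star = minimize_reduced(a)
# Theory predicts x1* = a (and hence x2* = a^2 / x1* = a as well).
```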
Examples
Consider next the problem:

min x1 + x2, subject to x1^2 + x2^2 = 2.  (4)

In this case, c1(x) = x1^2 + x2^2 − 2, E = {1}, I = ∅, and the feasible set is
the circle of radius √2.
Inspection shows that the solution to this problem is x* = (−1, −1)^T.
We note that

∇f(x*) = −(1/2) ∇c1(x*),  (5)
i.e., at the solution, the gradient of the objective function is proportional to the
gradient of the constraint function.
This is not a coincidence! We will see that this is a general fact: at a local
minimizer, the gradient of the objective function is a linear combination of the
gradients of the constraints.
We rewrite (5) in the form:

∇f(x*) + (1/2) ∇c1(x*) = 0.  (6)
The proportionality coefficient λ∗ = 1/2 (in this example) is called the Lagrange
multiplier.
We can interpret this observation geometrically as follows.
If x is a feasible point that is not a solution to (4), then there is a small vector
h = εd, ε > 0, such that x + h is feasible and f(x + h) < f(x), i.e.

0 = c1(x + h) ≈ c1(x) + ∇c1(x)^T h = ∇c1(x)^T h,
0 > f(x + h) − f(x) ≈ ∇f(x)^T h.

Dividing by ε, this means that there is a direction d with

∇c1(x)^T d = 0,
∇f(x)^T d < 0.  (7)
Thus x can be a solution only if no such d exists, which is the case exactly when
∇f(x) and ∇c1(x) are parallel. In other words, there has to exist a scalar λ such
that ∇f(x) = −λ∇c1(x).
We shall call a feasible point x regular, if the vectors ∇c1 (x), . . . , ∇cm (x) are
linearly independent.
If x is regular, the m vectors ∇ci (x) ∈ Rn span an m-dimensional subspace
W (x) ⊂ Rn .
The first condition in (7) defines the subspace of first order feasible variations:

V(x) = {d ∈ Rn : ∇ci(x)^T d = 0, for all i}.

Notice that V(x) is simply the orthogonal complement of W(x): V(x) = W(x)⊥.
The (impossibility of the) second condition in (7) implies that the gradient ∇f (x)
of the objective function has to be perpendicular to V (x).
Indeed, if the inner product of ∇f(x) with a nonzero d ∈ V(x) were nonzero then,
by replacing d with −d if necessary, it could be made negative, which is assumed
impossible. Therefore, ∇f(x) ∈ V(x)⊥.
But V (x)⊥ = W (x)⊥⊥ = W (x).
Consequently, ∇f (x) must be a linear combination of the constraint gradients
∇ci (x), which span W (x), i.e.
∇f(x) + Σ_{i=1}^{m} λi ∇ci(x) = 0,

for some scalars λ1, . . . , λm.
(ii) If, in addition, f (x) and ci (x) are twice continuously differentiable, then
d^T ( ∇²f(x*) + Σ_{i=1}^{m} λi* ∇²ci(x*) ) d ≥ 0,  (10)
for all d ∈ V (x ∗ ).
Lagrangian function
The necessary conditions above can be written compactly in terms of the
Lagrangian function

L(x, λ) = f(x) + Σ_{i=1}^{m} λi ci(x).  (11)

Namely, at a local minimizer x* with Lagrange multiplier λ*:

∇x L(x*, λ*) = 0,
∇λ L(x*, λ*) = 0,  (12)

d^T ∇²xx L(x*, λ*) d ≥ 0, for all d ∈ V(x*).
We emphasize that these conditions are necessary but not sufficient: a solution
to the system above may not represent a local minimum.
Example
Consider the problem:

min (1/2)(x1^2 + x2^2 + x3^2), subject to x1 + x2 + x3 = 3.

The first order conditions (12) read:

xi + λ = 0, for i = 1, 2, 3,
x1 + x2 + x3 = 3,

which give xi = −λ, and hence λ* = −1 and x* = (1, 1, 1).
Note also that the second order (positive definiteness) condition holds, as
∇2xx L(x ∗ , λ∗ ) = I3 (the 3 × 3 identity matrix). In fact, x ∗ is a global minimum.
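Since the first order conditions here are linear in (x, λ), they can be solved as a single linear system; a minimal numerical check (variable names are illustrative):

```python
import numpy as np

# Solve the first order (Lagrange) system for
#   min (1/2)(x1^2 + x2^2 + x3^2)  s.t.  x1 + x2 + x3 = 3.
# Stationarity: x_i + lambda = 0 (i = 1, 2, 3); feasibility: x1 + x2 + x3 = 3.
# Unknowns: (x1, x2, x3, lambda).
A = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
])
b = np.array([0.0, 0.0, 0.0, 3.0])
sol = np.linalg.solve(A, b)
x_star, lam_star = sol[:3], sol[3]
# Expect x* = (1, 1, 1) and lambda* = -1.
```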
Sufficient conditions
The theorem below gives a sufficient condition for the existence of a local
minimizer.
Second Order Sufficient Conditions. Let f (x) and ci (x) be twice continuously
differentiable and let x ∗ ∈ Ω ⊂ Rn , λ∗ ∈ Rm , be such that
∇x L(x ∗ , λ∗ ) = 0,
(13)
∇λ L(x ∗ , λ∗ ) = 0,
and
d^T ∇²xx L(x*, λ*) d > 0, for all nonzero d ∈ V(x*).  (14)
Then x* is a strict local minimizer of f(x) subject to the equality constraints
ci(x) = 0, i = 1, . . . , m.
Example
Consider the problem:

min f(x) = (1/2) x^T A x + x^T b, subject to x^T c = 1,  (15)

where A is a symmetric positive definite n × n matrix, and b, c ∈ Rn.
The first order conditions yield:

λ* = − (b^T A^{-1} c + 1) / (c^T A^{-1} c),
x* = −A^{-1}(b + λ* c),
∇²xx L(x*, λ*) = A.
Note that, since A is positive definite, condition (14) holds for all d (and in
particular, for d ∈ V (x ∗ ) = {d : c T d = 0}).
Consequently, x ∗ is a strict (global) minimizer.
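The closed-form expressions for λ* and x* are easy to verify numerically; a minimal sketch (the random data A, b, c are illustrative choices):

```python
import numpy as np

# Closed-form solution of  min (1/2) x^T A x + x^T b  s.t.  c^T x = 1,
# with A symmetric positive definite (the data below are illustrative).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)           # symmetric positive definite
b = rng.standard_normal(4)
c = rng.standard_normal(4)

Ainv_b = np.linalg.solve(A, b)
Ainv_c = np.linalg.solve(A, c)
lam = -(b @ Ainv_c + 1.0) / (c @ Ainv_c)
x = -(Ainv_b + lam * Ainv_c)          # x* = -A^{-1}(b + lambda* c)

# The constraint c^T x = 1 must hold (up to round-off),
# and so must stationarity: A x + b + lambda c = 0.
```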
Sensitivity analysis
Lagrange multipliers often have an intuitive interpretation, depending on the
specific problem at hand.
In general, they can be interpreted as the rates of change of the objective
function as the constraint functions are varied.
Let x ∗ and λ∗ be a local minimizer and the corresponding Lagrange multiplier,
respectively, of a constrained optimization problem for f (x).
Consider now the following family of constrained optimization problems,
parameterized by a vector u = (u1, . . . , um) ∈ Rm:

min f(x), subject to ci(x) = ui, for i = 1, . . . , m.

We assume that its solution x(u) and the corresponding Lagrange multiplier λ(u)
depend smoothly on u, with

x(0) = x*,
λ(0) = λ*.
The first order conditions for the perturbed problem read:

∇x f(x(u)) + Σ_{i=1}^{m} λi(u) ∇x ci(x(u)) = 0.  (17)

Let p(u) = f(x(u)) denote the optimal value (primal function) of the perturbed
problem. Then ∇uj p(u) = −λj(u). In words: the Lagrange multipliers are the
(negative of the) rates of change of the primal function as the parameters are varied.
The proof is a straightforward calculation. From the chain rule and (17), we get
for each j = 1, . . . , m,
Σ_{k=1}^{n} ∇xk ci(x(u)) ∇uj xk(u) = ∇uj ci(x(u))
= ∇uj ui
= δji,

since ci(x(u)) = ui.
As a consequence,

∇uj p(u) = Σ_{k=1}^{n} ∇xk f(x(u)) ∇uj xk(u)
= −Σ_{i=1}^{m} λi(u) Σ_{k=1}^{n} ∇xk ci(x(u)) ∇uj xk(u)
= −Σ_{i=1}^{m} λi(u) δji
= −λj(u),

as claimed.
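The sensitivity relation ∇uj p(u) = −λj(u) can be checked by finite differences; a sketch on a toy perturbed problem (the problem and the step size h are illustrative choices):

```python
# Finite-difference check of the sensitivity relation  dp/du = -lambda(u)
# on a toy perturbed problem (an illustrative choice):
#   min (1/2)(x1^2 + x2^2 + x3^2)  s.t.  x1 + x2 + x3 = 3 + u.
# Its solution is x(u) = ((3+u)/3) * (1, 1, 1) with lambda(u) = -(3+u)/3.

def p(u):
    """Primal (optimal value) function p(u) = f(x(u))."""
    t = (3.0 + u) / 3.0
    return 0.5 * 3.0 * t**2

def lam(u):
    """Lagrange multiplier of the perturbed constraint."""
    return -(3.0 + u) / 3.0

h = 1e-6
dp_du = (p(h) - p(-h)) / (2.0 * h)   # central difference at u = 0
# The sensitivity result predicts dp/du = -lambda(0) = 1.
```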
Inequality constraints
Consider now the problem of minimizing x1 + x2 subject to the inequality
constraint x1^2 + x2^2 ≤ 2. In this case, c1(x) = x1^2 + x2^2 − 2, E = ∅, I = {1},
and the feasible set consists of the circle of radius √2 together with its interior.
Inspection shows that the solution to this problem continues to be x* = (−1, −1)^T.
Notice that condition (5) continues to hold. We will argue that, in the case of an
inequality constraint, the sign of the Lagrange multiplier is not a coincidence.
Case 1. x lies in the interior of the circle, i.e. c1(x) < 0 (the constraint is
inactive). In this case, every sufficiently small step h keeps x + h feasible, so x
can be a solution only if ∇f(x) = 0.
Case 2. x lies on the boundary of the circle, i.e. c1 (x) = 0 (the constraint is
active).
In this case, the conditions read
∇c1(x)^T d ≤ 0,
∇f(x)^T d < 0.  (22)
The first of these conditions defines a closed half-plane, while the second one
defines an open half-plane.
The intersection of these two half-planes should be empty!
A reflection shows that this is possible only if there is a positive constant λ such
that
∇f (x) = −λ∇c1 (x).
We can formulate the results of the analysis of Cases 1 and 2 in the following
elegant way using the Lagrange function.
If x* is a local minimizer of f(x) (no feasible descent direction d is possible),
then there exists λ* ≥ 0 such that

∇x L(x*, λ*) = 0,
λ* c1(x*) = 0.
Consider now the general problem (1), in which both equality and inequality
constraints are present.
For any feasible point x ∈ Ω, we define the set of active inequality constraints by

A(x) = {i ∈ I : ci(x) = 0}.
For convenience, we also introduce the Lagrange multiplier λ∗i = 0 for each
inactive constraint i. We can thus write the above condition compactly as
∇f(x*) + Σ_{i=1}^{m} λi* ∇ci(x*) = 0,
λi* ci(x*) = 0.  (25)
λi* = − lim_{ui ↓ 0} ∂p(u)/∂ui ≥ 0.
The arguments above are not exactly a proof, but are convincing enough to
understand the following necessary condition.
∇x L(x ∗ , λ∗ ) = 0,
λ∗i ci (x ∗ ) = 0, (27)
λ∗i ≥ 0, for i ∈ I.
If, additionally, f(x) and ci(x) are twice continuously differentiable, then

d^T ∇²xx L(x*, λ*) d ≥ 0, for all d ∈ V(x*).
The complementarity condition

λi* ci(x*) = 0  (29)

is just a compact way of stating that λi* = 0 if the constraint ci is inactive,
while λi* can be nonzero (and, in fact, nonnegative) if the constraint is active.
Examples
Consider the problem:

min (1/2)(x1^2 + x2^2 + x3^2), subject to x1 + x2 + x3 ≤ −3.

The unconstrained minimizer x = 0 is infeasible, so the constraint must be active
at the solution. The first order conditions then read:

xi + λ = 0, for i = 1, 2, 3,

or

xi = −λ, for i = 1, 2, 3.

Substituting into x1 + x2 + x3 = −3 gives λ* = 1 > 0 and x* = (−1, −1, −1).
Note also that the second order (positive definiteness) condition holds, as
∇²xx L(x*, λ*) = I3 (the 3 × 3 identity matrix).
A reflection shows that x ∗ is, in fact, a global minimizer.
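A minimal numerical KKT check for this example (plain Python; the variable names are illustrative):

```python
# KKT check for  min (1/2)(x1^2 + x2^2 + x3^2)  s.t.  x1 + x2 + x3 <= -3,
# i.e. c(x) = x1 + x2 + x3 + 3 <= 0.  Candidate: x* = (-1,-1,-1), lambda* = 1.
x = [-1.0, -1.0, -1.0]
lam = 1.0

grad_f = x                               # gradient of (1/2)||x||^2 is x
grad_c = [1.0, 1.0, 1.0]                 # gradient of x1 + x2 + x3 + 3
stationarity = [gf + lam * gc for gf, gc in zip(grad_f, grad_c)]
c_val = sum(x) + 3.0                     # constraint value (0: active)
# KKT: stationarity = 0, c_val <= 0, lam >= 0, lam * c_val = 0.
```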
Examples
Example 7. Consider the problem:

min 2(x1 − 1)^2 + (x2 − 2)^2, subject to x1 + 2x2 ≤ 1 and x1 ≥ x2.

The first order (KKT) conditions read:
4x1 − 4 + λ1 − λ2 = 0,
2x2 − 4 + 2λ1 + λ2 = 0,
λ1 (x1 + 2x2 − 1) = 0,
λ2 (−x1 + x2 ) = 0,
λ1 ≥ 0,
λ2 ≥ 0.
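One way to solve such a small KKT system is to enumerate the possible active sets, solve the resulting linear system for each, and discard candidates that violate feasibility or the sign conditions; a sketch (the matrix encoding is an illustrative choice):

```python
import itertools
import numpy as np

# Active-set enumeration for Example 7:
#   min 2(x1-1)^2 + (x2-2)^2  s.t.  x1 + 2x2 <= 1,  -x1 + x2 <= 0.
# For each guess of active constraints, solve the linear KKT system
# and keep candidates that are feasible with nonnegative multipliers.
G = np.diag([4.0, 2.0])                   # Hessian of the objective
g0 = np.array([-4.0, -4.0])               # gradient of the objective at x = 0
C = np.array([[1.0, 2.0], [-1.0, 1.0]])   # constraint normals (rows)
d = np.array([1.0, 0.0])                  # c_i(x) = C[i] @ x - d[i] <= 0

candidates = []
for active in itertools.chain.from_iterable(
        itertools.combinations([0, 1], r) for r in range(3)):
    k = len(active)
    if k:
        A_act = C[list(active)]
        # KKT system: G x + g0 + A_act^T lam = 0,  A_act x = d_act.
        K = np.block([[G, A_act.T], [A_act, np.zeros((k, k))]])
        rhs = np.concatenate([-g0, d[list(active)]])
    else:
        K, rhs = G, -g0                   # unconstrained stationary point
    sol = np.linalg.solve(K, rhs)
    x, lam = sol[:2], sol[2:]
    if np.all(C @ x - d <= 1e-9) and np.all(lam >= -1e-9):
        candidates.append((x, active, lam))

# Only the case {constraint 1 active} survives:
# x* = (5/9, 2/9) with lambda1* = 16/9, lambda2* = 0.
```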
Sufficient conditions
Assume now that the functions f (x) and ci (x), i = 1, . . . , m, are twice
continuously differentiable, and let x ∗ ∈ Rn , λ∗ ∈ Rm be such that
∇x L(x ∗ , λ∗ ) = 0,
ci (x ∗ ) = 0, if i ∈ E,
ci (x ∗ ) ≤ 0, if i ∈ I, (30)
λi* ci(x*) = 0,
λ∗i ≥ 0, for i ∈ I,
and
d^T ∇²xx L(x*, λ*) d > 0,  (31)

for all d ≠ 0 such that ∇ci(x*)^T d = 0, i ∈ E ∪ A(x*).
Assume also that λ∗i > 0 for all active inequality constraints (strict
complementary slackness condition).
Then x ∗ is a strict local minimizer of f (x) subject to the constraints.
Penalty methods
The quadratic penalty function for an equality constrained problem is defined by

Q(x, µ) = f(x) + (µ/2) Σ_{i=1}^{m} ci(x)^2,  (32)

where µ > 0 is the penalty parameter.
Example
Consider again the problem of minimizing x1 + x2 subject to x1^2 + x2^2 = 2.
The quadratic penalty function is

Q(x, µ) = x1 + x2 + (µ/2)(x1^2 + x2^2 − 2)^2.

Its gradient and Hessian are

∇Q(x, µ) = ( 1 + 2µx1(x1^2 + x2^2 − 2), 1 + 2µx2(x1^2 + x2^2 − 2) )^T,

and

∇²Q(x, µ) = 2µ ( 3x1^2 + x2^2 − 2    2x1x2
                 2x1x2               x1^2 + 3x2^2 − 2 ),

respectively.
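A minimal sketch of the penalty iteration for this example: by symmetry, the minimizer of Q(·, µ) lies on the line x1 = x2 = t, so each inner minimization reduces to a scalar root-finding problem (the bisection bracket and the µ schedule are illustrative choices):

```python
# Quadratic penalty sketch for  min x1 + x2  s.t.  x1^2 + x2^2 = 2.
# On the symmetric line x1 = x2 = t, the gradient condition for Q reduces
# to  1 + 4*mu*t*(t^2 - 1) = 0,  with the relevant root in (-2, -1).

def penalized_minimizer(mu, iters=100):
    """Solve 1 + 4*mu*t*(t^2 - 1) = 0 on (-2, -1) by bisection (mu >= 1)."""
    h = lambda t: 1.0 + 4.0 * mu * t * (t**2 - 1.0)
    lo, hi = -2.0, -1.0                  # h(lo) < 0 < h(hi) for mu >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for mu in [1.0, 10.0, 100.0, 1000.0]:
    t = penalized_minimizer(mu)
    # t approaches the constrained solution value -1 as mu grows;
    # the infeasibility c(x) = 2*t^2 - 2 shrinks like O(1/mu).
```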
Penalty methods
The quadratic penalty function can be defined for a problem with both equality
and inequality constraints as follows:
Q(x, µ) = f(x) + (µ/2) Σ_{i∈E} ci(x)^2 + (µ/2) Σ_{i∈I} (ci(x)+)^2,  (34)

where ci(x)+ = max(ci(x), 0).
Consider also the augmented Lagrangian function:

Lµ(x, λ) = f(x) + Σ_{i=1}^{m} λi ci(x) + (µ/2) Σ_{i=1}^{m} ci(x)^2.  (35)
It differs from the standard Lagrange function (11) by the presence of the
quadratic penalty term.
Note that the λi ’s are not, strictly speaking, the Lagrange multipliers, as the
critical points of Lµ (x, λ) are not feasible points for the original problem.
At an (approximate) minimizer x of Lµ(·, λ),

0 ≈ ∇x Lµ(x, λ)
= ∇f(x) + Σ_{i=1}^{m} (λi + µ ci(x)) ∇ci(x).

Comparing with the optimality condition ∇f(x*) + Σ_{i=1}^{m} λi* ∇ci(x*) = 0,
we expect λi + µ ci(x) ≈ λi*, or

ci(x) ≈ (λi* − λi)/µ.
In other words, if λi is close to λ∗i , the infeasibility of x will be much smaller than
1/µ.
This suggests that the value of λi at each iteration step k (denoted by λi,k)
should be updated according to the following rule:

λi,k+1 = λi,k + µk ci(xk*),

for all i = 1, . . . , m.
Example
The augmented Lagrangian function for the problem of minimizing x1 + x2 subject
to x1^2 + x2^2 = 2 is

Lµ(x, λ) = x1 + x2 + λ(x1^2 + x2^2 − 2) + (µ/2)(x1^2 + x2^2 − 2)^2.

Its gradient and Hessian are

∇Lµ(x, λ) = ( 1 + 2λx1 + 2µx1(x1^2 + x2^2 − 2), 1 + 2λx2 + 2µx2(x1^2 + x2^2 − 2) )^T,

and

∇²Lµ(x, λ) = ( 2λ + 2µ(3x1^2 + x2^2 − 2)    4µx1x2
               4µx1x2                       2λ + 2µ(x1^2 + 3x2^2 − 2) ),

respectively.
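The multiplier iteration can be sketched for this example in the same reduced, symmetric form (the fixed µ, the bisection bracket, and the iteration counts are illustrative choices):

```python
# Augmented Lagrangian sketch for  min x1 + x2  s.t.  x1^2 + x2^2 = 2.
# By symmetry, the minimizer of L_mu(., lam) lies on the line x1 = x2 = t,
# where stationarity reduces to  1 + 2*lam*t + 4*mu*t*(t^2 - 1) = 0.
# Multiplier update:  lam <- lam + mu * c(x),  with  c(x) = 2*t^2 - 2.

def inner_minimize(lam, mu, iters=200):
    """Bisection for the stationary point t; h(lo) < 0 < h(hi) on the bracket."""
    h = lambda t: 1.0 + 2.0 * lam * t + 4.0 * mu * t * (t**2 - 1.0)
    lo, hi = -2.0, -0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

mu, lam = 10.0, 0.0
for _ in range(10):                      # outer (multiplier) iterations
    t = inner_minimize(lam, mu)
    lam = lam + mu * (2.0 * t**2 - 2.0)  # lam_{k+1} = lam_k + mu * c(x_k)
# lam converges to the true multiplier lambda* = 1/2 and t to -1,
# with mu held fixed (no need to send mu to infinity).
```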
Barrier methods
Exterior penalty methods allow for generating infeasible points during the search.
Therefore they are not suitable when feasibility has to be strictly enforced.
This could be the case if the objective function is undefined outside of the
feasible set.
Barrier methods are similar in spirit to the exterior penalty method.
They generate a sequence of unconstrained modified differentiable objective
functions whose unconstrained minimizers are expected to converge to the
solution of the constrained problem in the limit.
Barrier methods belong to the category of interior penalty function methods and
apply to inequality constrained optimization problems, i.e. E = ∅ in (1).
The essence of the method is to add a penalty term B(x) to the objective
function, which has the following properties:
(i) It is defined and continuous whenever ci (x) < 0, for all i = 1, . . . , m.
(ii) It goes to +∞ whenever ci(x) ↑ 0, for any i.
Commonly used barrier functions are:
B(x) = − Σ_{i=1}^{m} log(−ci(x))  (logarithmic),  (38)

and

B(x) = − Σ_{i=1}^{m} 1/ci(x)  (inverse).  (39)
At iteration k, we minimize the modified objective f(x) + µk B(x), where µk ↓ 0
is a decreasing sequence of barrier parameters. We find a local minimizer xk*,
and use it as the initial guess for the next iteration.
A general convergence theorem guarantees that any limit point of this sequence
is a global minimizer of the original constrained problem.
Since the barrier function is defined only in the interior of the feasible set, any
successive iteration must also be an interior point.
The barrier term µk B(x) goes to zero for interior points as µk → 0. The barrier
term becomes thus increasingly irrelevant for interior points, while allowing xk∗ to
move closer to the boundary of Ω.
This behavior is expected if the solution to the original constrained problem lies
on the boundary.
Example
Consider the problem:

min (1/2)(x1^2 + x2^2), subject to x1 ≥ 2.

With the logarithmic barrier, the modified objective is
(1/2)(x1^2 + x2^2) − µk log(x1 − 2), whose minimizer is

xk* = (1 + √(1 + µk), 0).
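A quick check of the barrier iterates for this example (the µ schedule is an illustrative choice):

```python
import math

# Log-barrier sketch for  min (1/2)(x1^2 + x2^2)  s.t.  x1 >= 2.
# With c(x) = 2 - x1 <= 0, the barrier objective is
#   (1/2)(x1^2 + x2^2) - mu * log(x1 - 2),
# which separates: x2 = 0, and x1 solves  x1 - mu/(x1 - 2) = 0,
# i.e.  x1^2 - 2*x1 - mu = 0,  giving  x1 = 1 + sqrt(1 + mu).

def barrier_minimizer(mu):
    """Closed-form minimizer of the barrier objective for this problem."""
    return (1.0 + math.sqrt(1.0 + mu), 0.0)

for mu in [1.0, 0.1, 0.01, 0.001]:
    x1, x2 = barrier_minimizer(mu)
    resid = x1 - mu / (x1 - 2.0)   # stationarity residual, should vanish
# As mu -> 0, x1 decreases toward the boundary value x1 = 2.
```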