Lecture 18
Optimality Conditions
Junfeng WU
Recap: Nonlinear Optimization
Some terminology:
▶ Global vs local optimizer (minimizer)
▶ Gradient, Hessian, Taylor expansions
Optimality Conditions for Unconstrained Problems (Continued)
Second-Order Necessary Condition
Consider the Taylor expansion again, now to second order (assuming f is twice continuously differentiable):

f(x + td) = f(x) + t ∇f(x)⊤d + (t²/2) d⊤∇²f(x)d + o(t²).
When the first-order necessary condition holds, we have:
f(x + td) = f(x) + (t²/2) d⊤∇²f(x)d + o(t²).
For x to be a local minimizer, we thus also need d⊤∇²f(x)d to be nonnegative for every d ∈ ℝⁿ.
Second-Order Necessary Condition (SONC)
Definition: Semidefiniteness
We call a (symmetric) matrix A positive (negative) semidefinite (PSD/NSD)
if and only if for all x we have x⊤ Ax ≥ 0 (≤ 0).
Remark:
▶ Therefore, the second-order necessary condition requires the Hessian matrix at x* to be PSD. In the one-dimensional case, this is equivalent to f″(x*) ≥ 0.
Positive Semidefinite Matrices
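A symmetric matrix is PSD exactly when all of its eigenvalues are nonnegative, which gives a simple numerical test. A minimal Python/NumPy sketch (the function name and example matrix are illustrative, not from the lecture):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Test positive semidefiniteness of a symmetric matrix via its eigenvalues."""
    A = np.asarray(A, dtype=float)
    assert np.allclose(A, A.T), "the definition applies to symmetric matrices"
    # A is PSD iff its smallest eigenvalue is nonnegative (up to a tolerance).
    return np.linalg.eigvalsh(A).min() >= -tol

# Illustrative example: the Hessian of f(x, y) = x^2 + y^4 at the origin.
H = np.array([[2.0, 0.0],
              [0.0, 0.0]])
print(is_psd(H))   # True: PSD but not positive definite
print(is_psd(-H))  # False
```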
Example Continued
f″(x) = 12x² − 18 ≥ 0

Only x₁ = −1 − √6/2 and x₃ = 2 satisfy the condition. But for the point x₂ = −1 + √6/2, we obtain f″(x₂) = 12(1 − √6) < 0 (thus, x₂ is not a local minimizer).
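These values can be verified numerically from f″ alone; a small sketch (Python/NumPy assumed):

```python
import numpy as np

f2 = lambda x: 12 * x**2 - 18  # f''(x) from the slide

candidates = {
    "x1 = -1 - sqrt(6)/2": -1 - np.sqrt(6) / 2,
    "x2 = -1 + sqrt(6)/2": -1 + np.sqrt(6) / 2,
    "x3 = 2": 2.0,
}
for name, x in candidates.items():
    # SONC requires f''(x) >= 0 at a local minimizer.
    print(f"{name}: f''(x) = {f2(x):+.4f} -> SONC {'holds' if f2(x) >= 0 else 'fails'}")
```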
SONC is Not Sufficient
For example, f(x) = x³ satisfies f′(0) = 0 and f″(0) = 0 ≥ 0, so FONC and SONC both hold at x = 0, yet x = 0 is not a local minimizer.
Second-Order Sufficient Condition (SOSC)
If ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local minimizer of f.
Proof
We need the following lemma.
Lemma: Bounds and Eigenvalues
Let A ∈ ℝⁿˣⁿ be a symmetric matrix. Then

λmin(A)‖x‖² ≤ x⊤Ax ≤ λmax(A)‖x‖²  for all x ∈ ℝⁿ,

where λmin(A) and λmax(A) are the smallest and largest eigenvalues of A.
The proof is by another variant of the Taylor expansion, i.e.,

f(x* + d) = f(x*) + (1/2) d⊤∇²f(x*)d + o(‖d‖²)

as d tends to 0 (the first-order term vanishes because ∇f(x*) = 0).
When ∇²f(x*) is positive definite, the lemma gives d⊤∇²f(x*)d ≥ µ‖d‖², where µ > 0 is the smallest eigenvalue of ∇²f(x*).
Thus, we have

f(x* + d) ≥ f(x*) + (µ/2)‖d‖² + o(‖d‖²) = f(x*) + ‖d‖² ( µ/2 + o(‖d‖²)/‖d‖² ).

Since o(‖d‖²)/‖d‖² → 0 as ‖d‖ → 0, for all sufficiently small d we have o(‖d‖²)/‖d‖² ≥ −µ/4, which shows

f(x* + d) ≥ f(x*) + (µ/4)‖d‖² > f(x*).
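Combining FONC, SONC, and SOSC, a stationary point can be classified from the extreme eigenvalues of its Hessian, exactly as in the lemma. A sketch (the helper name, tolerance, and example are illustrative, not from the lecture):

```python
import numpy as np

def classify_stationary_point(H, tol=1e-8):
    """Classify a stationary point from its (symmetric) Hessian H."""
    lam = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    if lam[0] > tol:
        return "strict local minimizer (SOSC: Hessian positive definite)"
    if lam[0] < -tol:
        return "not a local minimizer (SONC fails: Hessian not PSD)"
    return "inconclusive (SONC holds, SOSC does not)"

# Example: f(x, y) = x^2 - y^2 has a stationary point at the origin (a saddle).
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
```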
For Maximization Problems
The conditions mirror those for minimization with reversed signs: at a local maximizer, ∇f(x*) = 0 and ∇²f(x*) must be negative semidefinite; ∇f(x*) = 0 together with ∇²f(x*) negative definite is sufficient for a strict local maximizer.
Optimality Conditions
In many cases, we can utilize these conditions to identify local and global
optimal solutions.
General Strategy:
▶ Use FONC and SONC to identify all possible candidates. Then, use
the sufficient conditions to verify.
▶ If a problem has only one stationary point and one can argue that the problem must attain a finite optimal solution, then this point must be the (global) optimum.
Examples I
For the least-squares problem minimize_β ‖Xβ − y‖², the FONC ∇f(β) = 2X⊤(Xβ − y) = 0 gives the normal equations:

X⊤Xβ = X⊤y.
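Solving the normal equations and cross-checking against a library least-squares solver, on synthetic data (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = rng.standard_normal(50)

# FONC for minimize ||X beta - y||^2: the normal equations X^T X beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_normal, beta_lstsq))  # True
```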
Optimality Conditions for Constrained Problems
Constrained Problems
We have derived necessary and sufficient conditions for local minima of unconstrained problems.
▶ What is the difference between constrained and unconstrained
problems?
Consider the example f(x) = 100x²(1 − x)² − x with the constraint −0.2 ≤ x ≤ 0.8.
In addition to the original local minimizer (x₁ = 0.013), there is one more local minimizer on the boundary (x = 0.8).
Constrained Problems
At the right endpoint, f′(0.8) < 0: f would keep decreasing to the right of 0.8, but that direction leaves the feasible set, so x = 0.8 is a local minimizer of the constrained problem.
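A one-line numerical check of the boundary condition, using the f from the previous slide:

```python
# f(x) = 100 x^2 (1 - x)^2 - x on [-0.2, 0.8]
fprime = lambda x: 200 * x * (1 - x) * (1 - 2 * x) - 1  # derivative of f

# f'(0.8) < 0: f still decreases to the right of 0.8, but that direction
# leaves the feasible set, so x = 0.8 is a boundary local minimizer.
print(fprime(0.8))  # -20.2
```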
Feasible Directions
A direction d is feasible at x ∈ F if x + td ∈ F for all sufficiently small t > 0. For example,
▶ If F = {x | Ax = b}, then the set of feasible directions at x is {d | Ad = 0}.
▶ If F = {x | Ax ≥ b}, then the set of feasible directions at x is {d | a_i⊤d ≥ 0 for every i with a_i⊤x = b_i}.
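For the polyhedral case F = {x | Ax ≥ b}, feasibility of a direction can be tested on the active constraints; a sketch (the function name and example are illustrative):

```python
import numpy as np

def is_feasible_direction(A, b, x, d, tol=1e-9):
    """d is feasible at x iff a_i^T d >= 0 for every active constraint a_i^T x = b_i."""
    active = np.abs(A @ x - b) <= tol
    return bool(np.all(A[active] @ d >= -tol))

# Example: F = {x : x1 >= 0, x2 >= 0}; at x = (0, 1), constraint 1 is active.
A = np.eye(2); b = np.zeros(2)
print(is_feasible_direction(A, b, np.array([0.0, 1.0]), np.array([1.0, -1.0])))  # True
print(is_feasible_direction(A, b, np.array([0.0, 1.0]), np.array([-1.0, 0.0]))) # False
```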
FONC for Constrained Problems
If x* is a local minimizer of f over F, then ∇f(x*)⊤d ≥ 0 for every feasible direction d at x*; equivalently, there is no feasible descent direction at x*.
An Alternative View

S_F(x*) ∩ S_D(x*) = ∅,

where S_F(x*) denotes the set of feasible directions at x* and S_D(x*) = {d | ∇f(x*)⊤d < 0} the set of descent directions.
Nonlinear Optimization with Equality Constraints
Consider

minimize_x f(x)
s.t. Ax = b        (1)

▶ The feasible direction set is {d | Ad = 0}.
▶ The descent direction set is {d | ∇f(x)⊤d < 0}.
The FONC says that at a local minimum there cannot be a d solving both systems (a feasible descent direction). This turns out to be equivalent to the existence of y such that

A⊤y = ∇f(x).
Nonlinear Optimization with Equality Constraints
Theorem
If x* is a local minimum of (1), then there must exist y such that

A⊤y = ∇f(x*).
Proof
First, it is easy to see that if there exists y such that A⊤y = ∇f(x), then there is no d with Ad = 0 and ∇f(x)⊤d < 0: multiplying both sides of the equation by d⊤ gives ∇f(x)⊤d = y⊤Ad = 0, a contradiction. For the other direction, consider the problem

minimize_d ∇f(x)⊤d
s.t. Ad = 0.
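The idea behind this subproblem can be made concrete by projecting −∇f(x) onto null(A): a nonzero projection is a feasible descent direction, while a zero projection means ∇f(x) ∈ range(A⊤), i.e., ∇f(x) = A⊤y for some y. A sketch assuming A has full row rank:

```python
import numpy as np

def feasible_descent_direction(A, grad):
    """Project -grad onto null(A); returns None when no feasible descent direction exists."""
    # P = I - A^T (A A^T)^{-1} A projects onto {d : A d = 0} (A full row rank).
    P = np.eye(A.shape[1]) - A.T @ np.linalg.solve(A @ A.T, A)
    d = -P @ grad
    return None if np.allclose(d, 0) else d  # None  <=>  grad = A^T y for some y

A = np.array([[1.0, 1.0]])
print(feasible_descent_direction(A, np.array([1.0, -1.0])))  # [-1.  1.]: A d = 0, descent
```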
Example
Consider the problem:
minimize (x₁ − 1)² + (x₂ − 1)²
s.t. x₁ + x₂ = 1
Example Continued
The condition A⊤y = ∇f(x) reads

2x₁ − 2 = y,  2x₂ − 2 = y.

Together with the constraint x₁ + x₂ = 1, this gives x₁ = x₂ = 1/2 (with y = −1), which is the only candidate for a local minimum. And it is indeed a local minimizer (also a global minimizer).
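Stationarity and feasibility can also be solved jointly as one linear system; a numerical check of this example:

```python
import numpy as np

# Unknowns (x1, x2, y):  2 x1 - y = 2,  2 x2 - y = 2,  x1 + x2 = 1
K = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([2.0, 2.0, 1.0])
print(np.linalg.solve(K, rhs))  # [ 0.5  0.5 -1. ]
```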
Another Example
Consider the equality-constrained least-squares problem minimize_β ‖Xβ − y‖² s.t. Wβ = ξ. The condition A⊤y = ∇f then reads

W⊤z = 2(X⊤Xβ − X⊤y).
Another Example Continued
Wβ = ξ,   X⊤Xβ = (1/2) W⊤z + X⊤y.

We can write this as:

[  W       0      ] [ β ]   [  ξ  ]
[ X⊤X  −(1/2)W⊤ ] [ z ] = [ X⊤y ]
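Assembling and solving the block system on synthetic data (a sketch; shapes assumed: X is n×p, W is m×p, with the system nonsingular):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 30, 4, 2
X = rng.standard_normal((n, p)); y = rng.standard_normal(n)
W = rng.standard_normal((m, p)); xi = rng.standard_normal(m)

# Block system:  [ W        0     ] [beta]   [ xi   ]
#                [ X^T X  -W^T/2  ] [ z  ] = [ X^T y]
K = np.block([[W,        np.zeros((m, m))],
              [X.T @ X, -0.5 * W.T       ]])
rhs = np.concatenate([xi, X.T @ y])
sol = np.linalg.solve(K, rhs)
beta, z = sol[:p], sol[p:]
print(np.allclose(W @ beta, xi))  # constraint satisfied: True
```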
Inequality Constraints
minimize_x f(x)
s.t. Ax ≥ b        (2)

Theorem
If x* is a local minimum of (2), then there exists some y ≥ 0 satisfying

∇f(x*) = A⊤y,
y_i · (a_i⊤x* − b_i) = 0, ∀i.
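Given a candidate x*, the theorem can be checked by solving for the multipliers on the active constraints and verifying nonnegativity; complementary slackness fixes y_i = 0 on inactive constraints. A sketch (the helper name and example are illustrative):

```python
import numpy as np

def check_kkt(grad, A, b, x, tol=1e-8):
    """Check grad f(x*) = A^T y with y >= 0 and y_i (a_i^T x* - b_i) = 0."""
    active = np.abs(A @ x - b) <= tol          # y_i = 0 off the active set
    if not active.any():
        return np.allclose(grad, 0)            # reduces to the unconstrained FONC
    y_act, *_ = np.linalg.lstsq(A[active].T, grad, rcond=None)
    return (np.allclose(A[active].T @ y_act, grad, atol=1e-6)
            and bool(np.all(y_act >= -tol)))

# Example: minimize (x-1)^2 s.t. x >= 2; candidate x* = 2 has grad = 2(x-1) = 2.
print(check_kkt(np.array([2.0]), np.array([[1.0]]), np.array([2.0]), np.array([2.0])))  # True
```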
Proof
Proof Continued
∇f(x) = A⊤y
y_i · (a_i⊤x − b_i) = 0, ∀i
More General Cases — KKT Conditions
Find KKT Conditions
minimize_x f(x)
s.t. g_i(x) ≥ 0,  i = 1, ..., m
     h_i(x) = 0,  i = 1, ..., p
     ℓ_i(x) ≤ 0,  i = 1, ..., r
     x_i ≥ 0,  i ∈ M
     x_i ≤ 0,  i ∈ N
     x_i free,  i ∉ M ∪ N
One can use the feasible/descent-direction arguments to derive the KKT conditions, but it is not very convenient.
▶ In the next lecture, we present a direct approach.