
MAT3007 Optimization

Optimality Conditions

Junfeng WU

School of Data Science


The Chinese University of Hong Kong, Shenzhen

Recap: Nonlinear Optimization

Some terminologies:
▶ Global vs local optimizer (minimizer)
▶ Gradient, Hessian, Taylor expansions

Then we studied the optimality conditions for unconstrained problems.

Theorem (First-Order Necessary Condition)


If x∗ is a local minimizer of f (·) for an unconstrained problem, then we
must have ∇f (x∗ ) = 0.

▶ The FONC can be used to find candidates for local minimizers


▶ However, FONC is not sufficient

Optimality Conditions for Unconstrained Problems-Continued

Second-Order Necessary Conditions

Second-Order Necessary Condition

Consider the Taylor expansion again but to the 2nd order (assuming f is
twice continuously differentiable):
f(x + td) = f(x) + t∇f(x)⊤ d + (1/2)t² d⊤∇²f(x) d + o(t²).

When the first-order necessary condition holds, we have:

f(x + td) = f(x) + (1/2)t² d⊤∇²f(x) d + o(t²).

In order for x to be a local minimizer, we also need d⊤∇²f(x) d to be
nonnegative for every d ∈ Rⁿ.

Second-Order Necessary Condition (SONC)

Theorem: Second-Order Necessary Conditions


If x∗ is a local minimizer of f , then it holds that:
1. ∇f (x∗ ) = 0;
2. For all d ∈ Rn : d⊤ ∇2 f (x∗ )d ≥ 0.

Definition: Semidefiniteness
We call a (symmetric) matrix A positive (negative) semidefinite (PSD/NSD)
if and only if for all x we have x⊤ Ax ≥ 0 (≤ 0).

Remark:
▶ Therefore, the second-order necessary condition requires the Hessian
matrix at x∗ to be PSD. In the one-dimensional case, this is
equivalent to f ′′ (x∗ ) ≥ 0.

Positive Semidefinite Matrices

Here are some useful facts about PSD matrices:


▶ We usually only talk about PSD properties for symmetric matrices.
▶ If a matrix A is not symmetric, we use (1/2)(A + A⊤) to define the PSD
properties (because x⊤Ax = (1/2)x⊤(A + A⊤)x).
▶ A symmetric matrix is PSD if and only if all the eigenvalues are
nonnegative.
▶ For any matrix A, A⊤ A is a (symmetric) PSD matrix.
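
These facts can be checked numerically. Here is a minimal sketch in NumPy (the matrix A below is a made-up example, not from the lecture):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check PSD-ness via the eigenvalues of the symmetric part (1/2)(A + A^T)."""
    S = 0.5 * (A + A.T)                  # symmetrize a possibly non-symmetric A
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])               # arbitrary non-symmetric example
print(is_psd(A))        # True here: the symmetric part has eigenvalues 2 ± sqrt(2)
print(is_psd(A.T @ A))  # always True: A^T A is PSD for any A
```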

Example Continued

For f(x) := x⁴ − 9x² + 4x − 1, the second-order condition is:

f''(x) = 12x² − 18 ≥ 0

Only x1 = −1 − √6/2 ≈ −2.22 and x3 = 2 satisfy the condition. But for the point
x2 = −1 + √6/2 ≈ 0.22, we obtain f''(x2) = 12(1 − √6) < 0 (thus, x2 is not a
local minimizer).

In the example of the least squares problem, we use the following fact:

▶ If f(x) = x⊤Mx (M is symmetric), then ∇²f(x) = 2M.

Therefore, the Hessian matrix in that problem is 2X⊤X, which is always a PSD matrix. Hence, the SONC always holds!
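
The critical points in the quartic example above can also be checked numerically; here is a small sketch (the roots of f' are found with numpy.roots, which is not part of the slides):

```python
import numpy as np

# f(x) = x^4 - 9x^2 + 4x - 1, so f'(x) = 4x^3 - 18x + 4 and f''(x) = 12x^2 - 18
fpp = lambda x: 12 * x**2 - 18

roots = np.roots([4, 0, -18, 4])         # roots of f'(x) = 0
for x in np.sort(roots[np.isreal(roots)].real):
    verdict = "satisfies SONC" if fpp(x) >= 0 else "violates SONC, not a local minimizer"
    print(f"x = {x:+.4f}, f''(x) = {fpp(x):+.2f}: {verdict}")
```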

SONC is Not Sufficient

However, even if both the first- and second-order necessary conditions hold, we still cannot guarantee that the candidate is a local minimum!

Example: Consider f(x) = x³ at x = 0.


▶ f ′ (0) = f ′′ (0) = 0, thus FONC and SONC hold.
▶ But 0 is not a local minimum

▶ A point x satisfying ∇f(x) = 0 is called a critical point or stationary point.
▶ The SONC can be used to verify that a stationary point is not a local
minimizer.
⇝ By modifying the SONC, we can get a sufficient condition.


Second-Order Sufficient Conditions

Second-Order Sufficient Condition (SOSC)

Theorem: Second-Order Sufficient Conditions


Let f be twice continuously differentiable. If x∗ satisfies:
1. ∇f (x∗ ) = 0;
2. For all d ∈ Rn \{0}: d⊤ ∇2 f (x∗ )d > 0;
then x∗ is a strict local minimum/minimizer of f .

Definition: Definite Matrices


We call a (symmetric) matrix A positive (negative) definite (PD/ND) if and
only if for all x ̸= 0: x⊤ Ax > 0 (< 0).

▶ A PD matrix must be PSD (thus PD is a stronger notion).


▶ A symmetric matrix is PD ⇐⇒ all its eigenvalues are positive.
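
In several variables, the SOSC check amounts to testing definiteness of the Hessian. A minimal sketch (the quadratic f(x, y) = x² + 3y² − 2xy is an illustrative choice, not from the slides):

```python
import numpy as np

# f(x, y) = x^2 + 3y^2 - 2xy has gradient (2x - 2y, -2x + 6y),
# which vanishes only at (0, 0); its Hessian is constant:
H = np.array([[ 2.0, -2.0],
              [-2.0,  6.0]])

eigs = np.linalg.eigvalsh(H)
if np.all(eigs > 0):
    print("Hessian is PD -> (0, 0) is a strict local minimizer (SOSC)")
elif np.all(eigs >= 0):
    print("Hessian is PSD -> SONC holds but SOSC is inconclusive")
else:
    print("Hessian is indefinite -> (0, 0) is not a local minimizer")
```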

Proof
We need the following lemma:

Lemma: Bounds and Eigenvalues
Let A be a symmetric n × n matrix. Then

λmin(A)∥x∥² ≤ x⊤Ax ≤ λmax(A)∥x∥²  for all x ∈ Rⁿ,

where λmin(A) and λmax(A) are the smallest and largest eigenvalues of A.

The proof is by another variant of the Taylor expansion (using ∇f(x∗) = 0 from condition 1):

f(x∗ + d) = f(x∗) + (1/2) d⊤∇²f(x∗) d + o(∥d∥²),

as d tends to 0.
When ∇²f(x∗) is positive definite, the lemma gives d⊤∇²f(x∗) d ≥ µ∥d∥², where
µ > 0 is the smallest eigenvalue of ∇²f(x∗).

Thus, we have

f(x∗ + d) ≥ f(x∗) + (µ/2)∥d∥² + o(∥d∥²) = f(x∗) + ∥d∥² (µ/2 + o(∥d∥²)/∥d∥²).

Since o(∥d∥²)/∥d∥² → 0 as ∥d∥ → 0, for all sufficiently small d we have
o(∥d∥²)/∥d∥² ≥ −µ/4, which shows

f(x∗ + d) > f(x∗).
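
A quick numerical sanity check of the lemma (a sketch; the symmetric matrix and the test vector are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = 0.5 * (B + B.T)                      # a random symmetric matrix

lams = np.linalg.eigvalsh(A)             # eigenvalues in ascending order
x = rng.standard_normal(4)

quad, norm2 = x @ A @ x, x @ x
print(lams[0] * norm2 <= quad <= lams[-1] * norm2)   # True
```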
For Maximization Problems

Our conditions are derived for minimization problems. For maximization problems, we simply reverse the inequalities. Let f ∈ C² (twice continuously differentiable).
Theorem: FONC for Maximization
If x∗ is a local (unconstrained) maximizer of f , then we must have ∇f (x∗ ) =
0.

Theorem: SONC for Maximization


If x∗ is a local maximizer of f , then we must have 1.) ∇f (x∗ ) = 0; 2.)
∇2 f (x∗ ) is negative semidefinite.

Theorem: SOSC for Maximization


If x∗ satisfies 1.) ∇f (x∗ ) = 0; 2.) ∇2 f (x∗ ) is negative definite, then x∗ is
a strict local maximizer.

Optimality Conditions

Optimality Conditions for Unconstrained Problems:


▶ First-order necessary condition.
▶ Second-order necessary condition.
▶ Second-order sufficient condition.

In many cases, we can utilize these conditions to identify local and global
optimal solutions.

General Strategy:
▶ Use FONC and SONC to identify all possible candidates. Then, use
the sufficient conditions to verify.
▶ If a problem only has one stationary point and one can reason that
the problem must have a finite optimal solution, then this point must
be the (global) optimum.

Examples–I

In the example f(x) = x⁴ − 9x² + 4x − 1, the points x1 and x3 satisfy the second-order sufficient condition (f''(x) > 0) and are therefore local minimizers.

In the least squares problem, if X⊤X is positive definite (equivalently, if it is invertible), then the solution β of the FONC

X⊤Xβ = X⊤y

is unique and satisfies the second-order sufficient condition.


⇝ It must be the unique global minimizer of the problem.
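
A minimal NumPy sketch of this reasoning (X and y are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))         # 50 observations, 3 features
y = rng.standard_normal(50)

# FONC (the normal equations): X^T X beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# SOSC: the Hessian 2 X^T X is positive definite whenever X^T X is invertible
print(np.linalg.eigvalsh(2 * X.T @ X).min() > 0)   # True for this data
print(np.allclose(X.T @ X @ beta, X.T @ y))        # FONC holds at beta
```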

Optimality Conditions for Constrained Problems
Constrained Problems
We have derived necessary and sufficient conditions for local minima of unconstrained problems.
▶ What is the difference between constrained and unconstrained
problems?
Consider the example f(x) = 100x²(1 − x)² − x with the constraint
−0.2 ≤ x ≤ 0.8.

In addition to the original local minimizer (x1 = 0.013), there is one more
local minimizer on the boundary (x = 0.8).
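
A rough numerical illustration of this example (a sketch; the grid resolution is an arbitrary choice):

```python
import numpy as np

f = lambda x: 100 * x**2 * (1 - x)**2 - x
xs = np.linspace(-0.2, 0.8, 10001)       # the feasible interval
ys = f(xs)

# A grid point is a (discrete) local minimizer if no neighbor is lower;
# the endpoints only need to beat their single interior neighbor.
for i in range(len(xs)):
    left_ok = (i == 0) or (ys[i] <= ys[i - 1])
    right_ok = (i == len(xs) - 1) or (ys[i] <= ys[i + 1])
    if left_ok and right_ok:
        print(f"local minimizer near x = {xs[i]:.3f}")
```

This prints one interior local minimizer (near x = 0) and one at the boundary x = 0.8.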

Constrained Problems

At the boundary point x∗ = 0.8, the (unconstrained) FONC is not satisfied:

f'(0.8) < 0

However, at this point, in order to stay feasible, we can only move leftward. That is, in the Taylor expansion

f(x∗ + d) = f(x∗) + d f'(x∗) + o(d)

we can only take d to be negative (otherwise x∗ + d is not feasible). Since f'(x∗) < 0 and d < 0, the first-order term d f'(x∗) is positive.

Thus f(x∗ + d) > f(x∗) in a small neighborhood of x∗ within the feasible region, so x∗ is a local minimizer.

Feasible Directions

Now we formalize the above arguments.

Definition (Feasible Direction)


Given x ∈ F, we call d a feasible direction at x if there exists ᾱ > 0
such that x + αd ∈ F for all 0 ≤ α ≤ ᾱ.

For example,
▶ If F = {x | Ax = b}, then the set of feasible directions at x is {d | Ad = 0}.
▶ If F = {x | Ax ≥ b}, then the set of feasible directions at x is
{d | ai⊤ d ≥ 0 for all i with ai⊤ x = bi} (see the sketch below).
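
A small sketch of the second case (the polyhedron and test directions below are made-up examples):

```python
import numpy as np

def is_feasible_direction(A, b, x, d, tol=1e-9):
    """For F = {x | Ax >= b}: d is feasible at x iff a_i^T d >= 0
    for every constraint i that is active at x (a_i^T x = b_i)."""
    active = np.isclose(A @ x, b, atol=tol)
    return bool(np.all(A[active] @ d >= -tol))

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])       # F = {x : x1 >= 0, x2 >= 0}
b = np.zeros(2)
x = np.array([0.0, 1.0])         # only the constraint x1 >= 0 is active here

print(is_feasible_direction(A, b, x, np.array([ 1.0, -1.0])))  # True
print(is_feasible_direction(A, b, x, np.array([-1.0,  0.0])))  # False
```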

FONC for Constrained Problems

Theorem (FONC for Constrained Problems)


If x∗ is a local minimum, then for any feasible direction d at x∗, we must
have ∇f(x∗)⊤ d ≥ 0.

In unconstrained problems, all directions are feasible, and thus we must have ∇f(x∗) = 0.

An Alternative View

Definition (Descent Direction)


Let f be continuously differentiable. Then d is called a descent direction
at x if and only if ∇f(x)⊤ d < 0.

⇝ If d is a descent direction at x, then there exists γ̄ > 0 such that
f(x + γd) < f(x) for all 0 < γ ≤ γ̄.

If we denote the set of feasible directions at x by SF(x) and the set of descent directions at x by SD(x), then the first-order necessary condition can be written as:

SF(x∗) ∩ SD(x∗) = ∅

In other words, there cannot be any feasible descent direction at a local minimizer.

Nonlinear Optimization with Equality Constraints

Consider
minimize_x f(x)
s.t. Ax = b

▶ The feasible direction set is {d | Ad = 0}.
▶ The descent direction set is {d | ∇f(x)⊤ d < 0}.

The FONC says that, at a local minimum, there cannot be a solution to both systems (i.e., no feasible descent direction).

Theorem (Alternative System)

The system Ad = 0, ∇f(x)⊤ d < 0 has no solution d if and only if there exists y such that

A⊤ y = ∇f(x)

Nonlinear Optimization with Equality Constraints

Therefore, the first-order necessary condition for

minimize_x f(x)                                    (1)
s.t. Ax = b

is that there exists y such that

A⊤ y = ∇f(x)

Theorem
If x∗ is a local minimum for (1), then there must exist y such that

A⊤ y = ∇f(x∗)

Proof

First, it is easy to see that if there exists y such that A⊤ y = ∇f(x), then we cannot have a d with Ad = 0 and ∇f(x)⊤ d < 0 (multiplying both sides of the equation by d⊤ leads to a contradiction).

To prove the reverse, consider the LP:

minimize_d ∇f(x)⊤ d
s.t. Ad = 0

If there does not exist d satisfying Ad = 0 and ∇f(x)⊤ d < 0, then the optimal value of this LP must be 0.
Therefore, by the strong duality theorem, its dual problem must also be feasible (with optimal value 0). But the dual constraint is exactly A⊤ y = ∇f(x). Thus the theorem is proved. □

Example
Consider the problem:
minimize (x1 − 1)² + (x2 − 1)²
s.t. x1 + x2 = 1

▶ This problem finds the nearest point on the line x1 + x2 = 1 to the point (1, 1).

Figure: Finding the nearest point on the line to (1,1)

Example Continued

By the FONC, x = (x1, x2) can be a local minimizer only if there exists y such that

A⊤ y = ∇f(x)

Here A = (1, 1) and ∇f(x) = (2x1 − 2; 2x2 − 2).

Thus there must exist y such that

2x1 − 2 = y,    2x2 − 2 = y

Combined with the constraint x1 + x2 = 1, we find that

x1 = x2 = 1/2

is the only candidate for a local minimum. It is indeed a local minimizer (and also a global minimizer).
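
A minimal numerical sketch of this example, solving the FONC together with the constraint as one linear system:

```python
import numpy as np

# FONC: 2*x1 - 2 = y and 2*x2 - 2 = y, together with x1 + x2 = 1.
# Unknowns ordered as (x1, x2, y):
M = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([2.0, 2.0, 1.0])

x1, x2, y = np.linalg.solve(M, rhs)
print(x1, x2)   # 0.5 0.5, the nearest point on the line to (1, 1)
```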

Another Example

Consider a constrained version of the least squares problem:

minimize_β ∥Xβ − y∥₂²
s.t. Wβ = ξ

The gradient is 2(X⊤Xβ − X⊤y).

Therefore, the FONC is that there exists z such that

W⊤ z = 2(X⊤Xβ − X⊤y)

Therefore, an optimal β must satisfy:

Wβ = ξ,    X⊤Xβ = (1/2)W⊤ z + X⊤y

Another Example Continued

Wβ = ξ,    X⊤Xβ = (1/2)W⊤ z + X⊤y

We can write this as one block linear system in (β, z):

[ W        0        ] [ β ]   [ ξ    ]
[ X⊤X   −(1/2)W⊤    ] [ z ] = [ X⊤y  ]

Here, let the size of X be m × n and the size of W be d × n. Then this is a system of n + d linear equations in the n + d unknowns (β, z). Solving this system yields the unique candidate for a local minimizer (provided the matrix on the left-hand side has full rank).
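
A sketch of assembling and solving this block system in NumPy (shapes and data are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d = 20, 5, 2
X  = rng.standard_normal((m, n))
y  = rng.standard_normal(m)
W  = rng.standard_normal((d, n))
xi = rng.standard_normal(d)

# Block system in the unknowns (beta, z):
#   [ W        0       ] [beta]   [  xi   ]
#   [ X^T X  -(1/2)W^T ] [  z ] = [ X^T y ]
K = np.block([[W,        np.zeros((d, d))],
              [X.T @ X, -0.5 * W.T       ]])
rhs = np.concatenate([xi, X.T @ y])

sol = np.linalg.solve(K, rhs)
beta, z = sol[:n], sol[n:]
print(np.allclose(W @ beta, xi))   # the constraint W beta = xi holds
```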

Inequality Constraints

Now we consider an inequality constrained problem:

minimize_x f(x)
s.t. Ax ≥ b                                    (2)

What should be the necessary optimality conditions?

Theorem
If x∗ is a local minimum of (2), then there exists some y ≥ 0 satisfying

∇f(x∗) = A⊤ y
yi · (ai⊤ x∗ − bi) = 0,  ∀i

where ai⊤ is the ith row of A.

Proof

We consider the descent directions and the feasible directions at x∗.

First, it is easy to see that the set of descent directions is:

SD(x∗) = {d : ∇f(x∗)⊤ d < 0}

For the feasible directions, it is

SF(x∗) = {d : ai⊤ d ≥ 0, if ai⊤ x∗ = bi}

Local optimality requires that SD(x∗) ∩ SF(x∗) = ∅. If we define A(x) = {i : ai⊤ x = bi} to be the set of active constraints at x, then the necessary condition becomes:

There does not exist d such that

1. ∇f(x∗)⊤ d < 0
2. ai⊤ d ≥ 0 for all i ∈ A(x∗)

Proof Continued

The nonexistence of a d such that

1. ∇f(x)⊤ d < 0
2. ai⊤ d ≥ 0 for all i ∈ A(x)

is equivalent (by the same LP duality argument as before) to the existence of y ≥ 0 such that

∇f(x) = Σ_{i ∈ A(x)} yi ai

This can be further written as the following conditions:

▶ There exists y ≥ 0 such that

∇f(x) = A⊤ y
yi · (ai⊤ x − bi) = 0,  ∀i
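
These conditions can be verified numerically for a given candidate point. A minimal sketch (the problem data, candidate x∗, and multiplier y below are made up for illustration):

```python
import numpy as np

# Example: minimize (x1-1)^2 + (x2-1)^2  subject to  x1 + x2 >= 3
A = np.array([[1.0, 1.0]])
b = np.array([3.0])
grad_f = lambda x: 2 * (x - 1)

x_star = np.array([1.5, 1.5])    # candidate: the constraint is active here
y = np.array([1.0])              # candidate multiplier

stationarity    = np.allclose(grad_f(x_star), A.T @ y)
complementarity = np.allclose(y * (A @ x_star - b), 0.0)
print(stationarity and complementarity and bool(np.all(y >= 0)))   # True
```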

More General Cases — KKT Conditions

We have discussed cases with linear equality constraints or linear inequality constraints and derived the (necessary) optimality conditions.
▶ We want to extend them to more general cases — KKT conditions
▶ We call the first-order necessary conditions for a general optimization
problem the KKT conditions
▶ Solutions that satisfy the KKT conditions are called KKT points.
▶ KKT points are candidate points for local optimal solutions.
▶ The KKT conditions were originally named after H. Kuhn and A.
Tucker, who first published the conditions in 1951. Later scholars
discovered that the conditions had been stated by W. Karush in his
master’s thesis in 1939.

Find KKT Conditions

We consider the general nonlinear optimization problem:

minimize_x f(x)
s.t. gi(x) ≥ 0,   i = 1, ..., m
     hi(x) = 0,   i = 1, ..., p
     ℓi(x) ≤ 0,   i = 1, ..., r
     xi ≥ 0,   i ∈ M
     xi ≤ 0,   i ∈ N
     xi free,   i ∉ M ∪ N

One can use the feasible/descent direction arguments to find the KKT conditions, but this is not very convenient.
▶ In the next lecture, we present a direct approach.

