MAE Optimization Lecture 3 Handout
Unconstrained Optimization
E. Flayac
Numerical Optimization
Introduction to Optimization | Optimality conditions for unconstrained optimization | March 11th | Slide 3/43
Unconstrained Optimization Problem
Definitions
▶ We say that x∗ ∈ Rn is a global solution (global minimum) of (Punc) if ∀x ∈ Rn, f(x∗) ≤ f(x).
▶ We say that x∗ ∈ Rn is a local solution (local minimum) of (Punc) if ∃r > 0, ∀x ∈ B(x∗, r), f(x∗) ≤ f(x).
▶ We say that x∗ ∈ Rn is a strict local solution (strict local minimum) of (Punc) if there exists r > 0 such that f(x∗) < f(x), ∀x ∈ B(x∗, r)\{x∗}.
Example
[Figure: 1-D graph of f(x) showing a local minimum and the global minimum.]
Necessary Optimality Conditions
Remarks
▶ An element x∗ ∈ Rn satisfying ∇f (x∗ ) = 0 is called a stationary
point or a critical point.
▶ Condition (1) is necessary but not sufficient, e.g. n = 1, f(x) = −x², f′(x) = −2x.
In this case, x∗ = 0 is a stationary point (f′(0) = 0) but a global maximum.
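This remark is easy to check numerically. A minimal sketch (our own Python, not part of the handout): a finite-difference derivative confirms that x∗ = 0 is stationary for f(x) = −x², while nearby values are strictly smaller, so the point is a maximum:

```python
def f(x):
    return -x**2

def fprime(x, h=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# x* = 0 is stationary: the derivative vanishes there...
assert abs(fprime(0.0)) < 1e-8
# ...but nearby values are strictly smaller, so 0 is a maximum, not a minimum.
assert f(0.1) < f(0.0) and f(-0.1) < f(0.0)
```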
Necessary Optimality Conditions: proof
First-Order Necessary Optimality Condition: Proof
▶ Taylor expansion at x∗ for h ∈ Rn:
f(x∗ + h) = f(x∗) + ∇f(x∗)T h + ||h||ϵ(h)
For d ∈ Rn, t > 0 and h = td:
f(x∗ + td) = f(x∗) + t∇f(x∗)T d + t||d||ϵ(td)
▶ For t small, x∗ + td is close to x∗, thus f(x∗) ≤ f(x∗ + td) as x∗ is a local minimum:
f(x∗) ≤ f(x∗) + t∇f(x∗)T d + t||d||ϵ(td)
Dividing by t > 0 gives:
0 ≤ ∇f(x∗)T d + ||d||ϵ(td), where ||d||ϵ(td) → 0 as t → 0.
▶ Letting t → 0 yields ∇f(x∗)T d ≥ 0 for every d ∈ Rn. Applying this to both d and −d gives ∇f(x∗)T d = 0 for all d, hence ∇f(x∗) = 0.
Remark
Taylor expansion at x∗ for h ∈ Rn
f (x∗ + h) = f (x∗ ) + ∇f (x∗ )T h + ||h||ϵ(h)
If ∇f(x∗) = 0 (stationary point), then
f (x∗ + h) = f (x∗ ) + ||h||ϵ(h)
f (x∗ + h) ≈ f (x∗ )
Therefore:
▶ f is approximately constant around x∗
▶ The graph of f is ”flat” around x∗
▶ Question : do we have
▶ f (x∗ + h) > f (x∗ ) for any h ?
▶ f (x∗ + h) < f (x∗ ) for any h ?
▶ or something else ?
Example of stationary point in 2D: Minimum
[Figure: surface plot] Graph of f1(x, y) = 3x² + 2.5y² with curves y = 0 (blue) and x = 0 (red)
Example of stationary point in 2D: Maximum
[Figure: surface plot] Graph of f2(x, y) = −3x² − 2.5y² with curves y = 0 (blue) and x = 0 (red)
Example of stationary point in 2D: Saddle point
[Figure: surface plot] Graph of f3(x, y) = 3x² − 2.5y² with curves y = 0 (blue) and x = 0 (red)
Example of stationary point in 2D: other
[Figure: surface plot] Graph of f4(x, y) = 3x² − 2.5y³ with curves y = 0 (blue) and x = 0 (red)
Examples of stationary points: minimum and maximum
Set x∗ = (x∗, y∗) = (0, 0).
Stationary point and minimum
f1(x, y) = 3x² + 2.5y²,  ∇f1(x, y) = (6x, 5y)T
∇f1(x∗, y∗) = (0, 0)T ⇒ x∗ is a stationary point and a (global) minimum
Stationary point and maximum
f2(x, y) = −3x² − 2.5y²,  ∇f2(x, y) = (−6x, −5y)T
∇f2(x∗, y∗) = (0, 0)T ⇒ x∗ is a stationary point and a (global) maximum
Examples of stationary points: saddle point and other
Set x∗ = (x∗, y∗) = (0, 0).
Stationary point and a saddle point
f3(x, y) = 3x² − 2.5y²,  ∇f3(x, y) = (6x, −5y)T
∇f3(x∗, y∗) = (0, 0)T ⇒ x∗ is a stationary point and a saddle point
Other stationary point
f4(x, y) = 3x² − 2.5y³,  ∇f4(x, y) = (6x, −7.5y²)T
∇f4(x∗, y∗) = (0, 0)T ⇒ x∗ is another type of stationary point
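The four gradients above can be verified numerically. A sketch with NumPy (the helper names fs and grad are ours, not from the handout): central finite differences confirm that (0, 0) is stationary for all four examples:

```python
import numpy as np

# The four example functions from the slides
fs = {
    "f1": lambda x, y: 3*x**2 + 2.5*y**2,
    "f2": lambda x, y: -3*x**2 - 2.5*y**2,
    "f3": lambda x, y: 3*x**2 - 2.5*y**2,
    "f4": lambda x, y: 3*x**2 - 2.5*y**3,
}

def grad(f, x, y, h=1e-6):
    """Central finite-difference gradient of f at (x, y)."""
    return np.array([
        (f(x + h, y) - f(x - h, y)) / (2 * h),
        (f(x, y + h) - f(x, y - h)) / (2 * h),
    ])

for name, f in fs.items():
    # Each gradient vanishes at the origin: (0, 0) is stationary for all four.
    assert np.allclose(grad(f, 0.0, 0.0), 0.0, atol=1e-8)
```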
Necessary optimality conditions
Remarks
Condition (2) is necessary but still not sufficient, e.g.:
▶ n = 1, f(x) = −x⁴, f′(x) = −4x³, f″(x) = −12x²
▶ For x∗ = 0 one gets f′(x∗) = f′(0) = 0 and f″(x∗) = f″(0) = 0
▶ In this case, x∗ = 0 satisfies (2) but it is a global maximum.
Sufficient optimality conditions
Second-order sufficient optimality condition
Let x∗ ∈ Rn . If f ∈ C 2 (Rn , R) and x∗ satisfies:
∇f (x∗ ) = 0 and ∇2 f (x∗ ) ≻ 0,
then x∗ is a strict local solution of (Punc ).
Sketch of proof
▶ Taylor expansion (of order 2) at x∗ for h ∈ Rn:
f(x∗ + h) = f(x∗) + ∇f(x∗)T h + (1/2) hT ∇²f(x∗) h + ||h||² ϵ(h)
▶ We assumed ∇f(x∗) = 0, so we get:
f(x∗ + h) = f(x∗) + (1/2) hT ∇²f(x∗) h + ||h||² ϵ(h),
where (1/2) hT ∇²f(x∗) h > 0 for h ≠ 0 (as ∇²f(x∗) ≻ 0) and ||h||² ϵ(h) is negligible for h ≈ 0. Hence f(x∗ + h) > f(x∗) for all sufficiently small h ≠ 0.
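A common numerical way to check the condition ∇²f(x∗) ≻ 0 is via the eigenvalues of the Hessian, or via a Cholesky factorization, which succeeds exactly for positive definite matrices. A NumPy sketch using f1(x, y) = 3x² + 2.5y² from the earlier slides:

```python
import numpy as np

# Hessian of f1(x, y) = 3x^2 + 2.5y^2 (constant, since f1 is quadratic)
H = np.array([[6.0, 0.0],
              [0.0, 5.0]])

# A symmetric matrix is positive definite iff all its eigenvalues are > 0.
eigvals = np.linalg.eigvalsh(H)
assert np.all(eigvals > 0)

# Equivalent test: Cholesky succeeds only for positive definite matrices.
np.linalg.cholesky(H)  # raises LinAlgError if H is not positive definite
```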
Examples of stationary points: minimum and maximum
Set x∗ = (x∗, y∗) = (0, 0).
Stationary point and minimum: f1(x, y) = 3x² + 2.5y²
∇f1(x∗, y∗) = (0, 0)T and ∇²f1(x∗, y∗) = [6 0; 0 5] ≻ 0
[Figure: surface plot of f1]
Stationary point and maximum: f2(x, y) = −3x² − 2.5y²
∇f2(x∗, y∗) = (0, 0)T and ∇²f2(x∗, y∗) = [−6 0; 0 −5] ≺ 0
[Figure: surface plot of f2]
Examples of stationary points: saddle point and other
Set x∗ = (x∗, y∗) = (0, 0).
Stationary point and a saddle point: f3(x, y) = 3x² − 2.5y²
∇f3(x∗, y∗) = (0, 0)T and ∇²f3(x∗, y∗) = [6 0; 0 −5], which is neither ⪰ 0 nor ⪯ 0 (indefinite)
[Figure: surface plot of f3]
Other stationary point: f4(x, y) = 3x² − 2.5y³
∇f4(x∗, y∗) = (0, 0)T and ∇²f4(x, y) = [6 0; 0 −15y], so ∇²f4(x∗, y∗) = [6 0; 0 0] ⪰ 0
[Figure: surface plot of f4]
Classification of stationary points using the Hessian: ∇2 f (x∗ )
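The classification rule can be sketched numerically (NumPy; the helper name classify is ours): with H = ∇²f(x∗), all eigenvalues positive gives a strict local minimum, all negative a strict local maximum, mixed signs a saddle point, and a zero eigenvalue leaves the second-order test inconclusive.

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a stationary point from its Hessian H = Hess f(x*)."""
    lam = np.linalg.eigvalsh(H)          # eigenvalues of the symmetric Hessian
    if np.all(lam > tol):
        return "strict local minimum"    # H positive definite
    if np.all(lam < -tol):
        return "strict local maximum"    # H negative definite
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point"            # H indefinite
    return "inconclusive"                # some eigenvalue is (near) zero

print(classify(np.diag([6.0, 5.0])))     # Hessian of f1 at the origin
print(classify(np.diag([-6.0, -5.0])))   # Hessian of f2 at the origin
print(classify(np.diag([6.0, -5.0])))    # Hessian of f3 at the origin
print(classify(np.diag([6.0, 0.0])))     # Hessian of f4 at the origin
```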
What happens if λi = 0 for some 1 ≤ i ≤ n?
[Figure: surface plots of f5, f6 and f7]
f5(x, y) = 3x⁴ + 2.5y⁴,  f6(x, y) = −3x⁴ − 2.5y⁴,  f7(x, y) = 3x⁴ − 2.5y⁴
∇f5(0, 0) = ∇f6(0, 0) = ∇f7(0, 0) = (0, 0)T
∇²f5(0, 0) = ∇²f6(0, 0) = ∇²f7(0, 0) = [0 0; 0 0] ⪰ 0
Yet (0, 0) is a global minimum of f5, a global maximum of f6 and a saddle point of f7: when some eigenvalue of the Hessian is zero, second-order information alone cannot classify the stationary point.
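A quick numerical illustration (our own sketch): all three functions have the same zero Hessian at the origin, yet sampling values nearby reveals three different behaviours.

```python
# f5, f6, f7 all have gradient (0, 0) and a zero Hessian at the origin...
f5 = lambda x, y: 3*x**4 + 2.5*y**4
f6 = lambda x, y: -3*x**4 - 2.5*y**4
f7 = lambda x, y: 3*x**4 - 2.5*y**4

# ...yet values near (0, 0) show three different behaviours:
t = 0.1
assert f5(t, t) > f5(0, 0) and f5(-t, -t) > f5(0, 0)   # minimum
assert f6(t, t) < f6(0, 0) and f6(-t, -t) < f6(0, 0)   # maximum
assert f7(t, 0) > f7(0, 0) and f7(0, t) < f7(0, 0)     # saddle point
```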
How to compute and classify stationary points analytically?
Sufficient optimality conditions in the convex case
Unconstrained convex quadratic optimization
min over x ∈ Rn of f(x) = (1/2) xT Sx − cT x,  (Pquad)
where S ∈ Rn×n is symmetric and c ∈ Rn.
Properties
▶ ∀x ∈ Rn , ∇f (x) = Sx − c and ∇2 f (x) = S.
▶ f is convex ⇐⇒ S ⪰ 0.
▶ Let x∗ ∈ Rn . If S ⪰ 0 then:
x∗ ∈ Rn is a global solution of (Pquad ) ⇐⇒ Sx∗ = c.
▶ If S ≻ 0 then (Pquad ) has a unique global solution x∗ = S −1 c.
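For S ≻ 0 the unique global solution satisfies the linear system Sx∗ = c, so in practice one calls a linear solver rather than forming S⁻¹. A NumPy sketch (S and c are made-up example data):

```python
import numpy as np

# Made-up example data with S symmetric positive definite
S = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([1.0, 2.0])

# Unique global minimizer: solve S x* = c (preferred over inverting S)
x_star = np.linalg.solve(S, c)

# x* is stationary: the gradient Sx - c vanishes there
assert np.allclose(S @ x_star - c, 0.0)

# f(x*) is no larger than f at random nearby points
f = lambda x: 0.5 * x @ S @ x - c @ x
rng = np.random.default_rng(0)
for _ in range(5):
    assert f(x_star) <= f(x_star + rng.normal(size=2))
```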
Linear least squares
Let R ∈ Mn×p(R) and y ∈ Rn. The linear least squares problem is defined as:
min over x ∈ Rp of f(x) = (1/2) ||Rx − y||².  (Pls)
Properties
▶ (Pls) is a special case of (Pquad) with S = RT R and c = RT y.
▶ ∀x ∈ Rp, ∇f(x) = RT Rx − RT y and ∇²f(x) = RT R ⪰ 0.
▶ x∗ ∈ Rp is a global solution of (Pls) ⇐⇒ RT Rx∗ = RT y (the normal equations).
▶ If R has full column rank, then (Pls) has a unique global solution x∗ = (RT R)−1 RT y.
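A NumPy sketch (R and y are made-up data): np.linalg.lstsq solves the least-squares problem with an SVD-based method, which is numerically safer than explicitly forming RT R, and its answer matches the normal-equations solution here:

```python
import numpy as np

# Made-up overdetermined example: n = 5 observations, p = 2 unknowns
rng = np.random.default_rng(0)
R = rng.normal(size=(5, 2))   # full column rank with probability 1
y = rng.normal(size=5)

# Solution via the normal equations R^T R x = R^T y
x_normal = np.linalg.solve(R.T @ R, R.T @ y)

# Solution via a dedicated least-squares solver (numerically preferable)
x_lstsq, *_ = np.linalg.lstsq(R, y, rcond=None)

assert np.allclose(x_normal, x_lstsq)
```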
Numerical Optimization
Convergence of Iterates
xk → x∗ as k → +∞, where x∗ ∈ argmin over x ∈ Rn of f(x)
Convergence of Costs of Iterates to the Optimal Value
f(xk) → f∗ as k → +∞, where f∗ = min over x ∈ Rn of f(x)
Convergence to a Stationary Point
∇f(xk) → 0 as k → +∞, if f is differentiable
In practice:
▶ We do not know the optimal solution(s) x∗;
▶ We do not know the optimal value f∗ = f(x∗).
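All three convergence notions can be observed on a small gradient-descent run (our own sketch, not part of the handout; the quadratic and the fixed step size are made-up choices):

```python
import numpy as np

# Minimize f(x) = 0.5 x^T S x - c^T x with plain gradient descent
S = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])
x_star = np.linalg.solve(S, c)       # known solution, for comparison only

f = lambda x: 0.5 * x @ S @ x - c @ x
grad = lambda x: S @ x - c

x = np.zeros(2)
alpha = 0.1                          # fixed step size (must be < 2 / lambda_max(S))
for k in range(200):
    x = x - alpha * grad(x)

# All three convergence notions hold on this example:
assert np.allclose(x, x_star, atol=1e-6)     # iterates -> x*
assert abs(f(x) - f(x_star)) < 1e-10         # costs -> f*
assert np.linalg.norm(grad(x)) < 1e-6        # gradients -> 0
```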
Property
If d ∈ Rn \{0} is a descent direction of f at x, then there exists ᾱ > 0
such that ∀α ∈ (0, ᾱ]:
f (x + αd) < f (x). (3)
Remark
From the Taylor expansion we have
f(x + αd) = f(x) + α(∇f(x)T d + ϵ(αd)),
where f : Rn → R, d ∈ Rn, α > 0, and ϵ(αd) → 0 as α → 0. Since ∇f(x)T d < 0 for a descent direction d, the term in parentheses is negative for α small enough, which yields f(x + αd) < f(x).
Descent Directions
Examples:
▶ d = −∇f (x) (steepest descent)
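For instance (a NumPy sketch on a made-up smooth function, not from the handout), d = −∇f(x) satisfies ∇f(x)T d < 0 and the defining inequality f(x + αd) < f(x) for small α > 0:

```python
import numpy as np

f = lambda x: 0.5 * x @ x + np.sin(x[0])            # made-up smooth test function
grad = lambda x: x + np.array([np.cos(x[0]), 0.0])  # its gradient

x = np.array([1.0, 2.0])
d = -grad(x)                  # steepest-descent direction at x

# d is a descent direction: directional derivative is negative...
assert grad(x) @ d < 0
# ...so small steps along d strictly decrease f
for alpha in [1e-1, 1e-2, 1e-3]:
    assert f(x + alpha * d) < f(x)
```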