
L. Vandenberghe ECE236B (Winter 2022)

10. Unconstrained minimization

• terminology and assumptions


• gradient descent method
• steepest descent method
• Newton’s method
• self-concordant functions
• implementation

Unconstrained minimization 10.1
Unconstrained minimization

minimize 𝑓 (𝑥)

• 𝑓 convex, twice continuously differentiable (hence dom 𝑓 open)


• we assume optimal value 𝑝★ = inf_𝑥 𝑓 (𝑥) is attained (and finite)

Unconstrained minimization methods

• produce sequence of points 𝑥 (𝑘) ∈ dom 𝑓 , 𝑘 = 0, 1, . . . , with

𝑓 (𝑥 (𝑘) ) → 𝑝★

• can be interpreted as iterative methods for solving optimality condition

∇ 𝑓 (𝑥★) = 0

Unconstrained minimization 10.2


Initial point and sublevel set

algorithms in this chapter require a starting point 𝑥 (0) such that

• 𝑥 (0) ∈ dom 𝑓
• sublevel set 𝑆 = {𝑥 | 𝑓 (𝑥) ≤ 𝑓 (𝑥 (0) )} is closed

2nd condition is hard to verify, except when all sublevel sets are closed:

• equivalent to condition that epi 𝑓 is closed


• true if dom 𝑓 = R^𝑛
• true if 𝑓 (𝑥) → ∞ as 𝑥 → bd dom 𝑓

examples of differentiable functions with closed sublevel sets:

𝑓 (𝑥) = log( ∑_{𝑖=1}^{𝑚} exp(𝑎_𝑖^𝑇 𝑥 + 𝑏_𝑖 ) ),    𝑓 (𝑥) = − ∑_{𝑖=1}^{𝑚} log(𝑏_𝑖 − 𝑎_𝑖^𝑇 𝑥)

Unconstrained minimization 10.3


Strong convexity and implications

𝑓 is strongly convex on 𝑆 if there exists an 𝑚 > 0 such that

∇² 𝑓 (𝑥) ⪰ 𝑚𝐼 for all 𝑥 ∈ 𝑆

Implications

• for 𝑥, 𝑦 ∈ 𝑆 ,

𝑓 (𝑦) ≥ 𝑓 (𝑥) + ∇ 𝑓 (𝑥)^𝑇 (𝑦 − 𝑥) + (𝑚/2) ‖𝑦 − 𝑥‖₂²

• 𝑆 is bounded
• 𝑝★ > −∞ and for 𝑥 ∈ 𝑆 ,

𝑓 (𝑥) − 𝑝★ ≤ (1/(2𝑚)) ‖∇ 𝑓 (𝑥)‖₂²

useful as stopping criterion (if you know 𝑚 )

Unconstrained minimization 10.4


Descent methods

𝑥 (𝑘+1) = 𝑥 (𝑘) + 𝑡 (𝑘) Δ𝑥 (𝑘) with 𝑓 (𝑥 (𝑘+1) ) < 𝑓 (𝑥 (𝑘) )

• other notations:
𝑥⁺ = 𝑥 + 𝑡Δ𝑥, 𝑥 := 𝑥 + 𝑡Δ𝑥

• Δ𝑥 is the step, or search direction; 𝑡 is the step size, or step length


• for convex 𝑓 : if 𝑓 (𝑥⁺) < 𝑓 (𝑥) then Δ𝑥 must be a descent direction:

∇ 𝑓 (𝑥)^𝑇 Δ𝑥 < 0

General descent method


given: a starting point 𝑥 ∈ dom 𝑓
repeat
1. determine a descent direction Δ𝑥
2. line search: choose a step size 𝑡 > 0
3. update: 𝑥 := 𝑥 + 𝑡Δ𝑥
until stopping criterion is satisfied

Unconstrained minimization 10.5


Line search types

Exact line search: 𝑡 = argmin_{𝑡>0} 𝑓 (𝑥 + 𝑡Δ𝑥)

Backtracking line search (with parameters 𝛼 ∈ (0, 1/2) , 𝛽 ∈ (0, 1) )


• starting at 𝑡 = 1, repeat 𝑡 := 𝛽𝑡 until

𝑓 (𝑥 + 𝑡Δ𝑥) < 𝑓 (𝑥) + 𝛼𝑡 ∇ 𝑓 (𝑥)^𝑇 Δ𝑥

• graphical interpretation: backtrack until 𝑡 ≤ 𝑡0

[figure: 𝑓 (𝑥 + 𝑡Δ𝑥) as a function of 𝑡, with the lines 𝑓 (𝑥) + 𝑡 ∇ 𝑓 (𝑥)^𝑇 Δ𝑥 and 𝑓 (𝑥) + 𝛼𝑡 ∇ 𝑓 (𝑥)^𝑇 Δ𝑥; the accepted step sizes are 0 ≤ 𝑡 ≤ 𝑡₀]
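a minimal NumPy sketch of this rule (not from the original slides; f and grad_f are assumed callables, and the default 𝛼, 𝛽 are illustrative values within the allowed ranges):

```python
import numpy as np

def backtracking(f, grad_f, x, dx, alpha=0.25, beta=0.5):
    """Shrink t until f(x + t*dx) < f(x) + alpha*t*grad_f(x)^T dx.
    Points outside dom f should make f return +inf, so they are rejected."""
    t = 1.0
    fx, slope = f(x), grad_f(x) @ dx   # slope < 0 for a descent direction
    while not (f(x + t * dx) < fx + alpha * t * slope):
        t *= beta
    return t
```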
Unconstrained minimization 10.6
Gradient descent method

Gradient descent: general descent method with Δ𝑥 = −∇ 𝑓 (𝑥)


given: a starting point 𝑥 ∈ dom 𝑓
repeat
1. Δ𝑥 := −∇ 𝑓 (𝑥)
2. line search: choose step size 𝑡 via exact or backtracking line search
3. update: 𝑥 := 𝑥 + 𝑡Δ𝑥
until stopping criterion is satisfied

• stopping criterion usually of the form ‖∇ 𝑓 (𝑥)‖₂ ≤ 𝜖


• convergence result: for strongly convex 𝑓 ,

𝑓 (𝑥 (𝑘) ) − 𝑝★ ≤ 𝑐^𝑘 ( 𝑓 (𝑥 (0) ) − 𝑝★)

𝑐 ∈ (0, 1) depends on 𝑚 , 𝑥 (0) , line search type


• very simple, but often very slow
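a sketch of the full loop, reusing the backtracking helper from page 10.6 (the tolerance eps and the iteration cap are illustrative choices, not from the slides):

```python
import numpy as np

def gradient_descent(f, grad_f, x0, eps=1e-6, max_iter=1000):
    """General descent method with dx = -grad f(x) and backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:   # stopping criterion ||grad f(x)||_2 <= eps
            break
        dx = -g                        # search direction: negative gradient
        t = backtracking(f, grad_f, x, dx)
        x = x + t * dx
    return x
```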

Unconstrained minimization 10.7


Quadratic problem in R²

𝑓 (𝑥) = (1/2)(𝑥₁² + 𝛾𝑥₂²)    (𝛾 > 0)

with exact line search, starting at 𝑥 (0) = (𝛾, 1) :

𝑥₁^(𝑘) = 𝛾 ( (𝛾 − 1)/(𝛾 + 1) )^𝑘 ,    𝑥₂^(𝑘) = ( −(𝛾 − 1)/(𝛾 + 1) )^𝑘

• very slow if 𝛾 ≫ 1 or 𝛾 ≪ 1
• example for 𝛾 = 10:

[figure: gradient descent iterates 𝑥 (0) , 𝑥 (1) , . . . zig-zagging across the contour lines of 𝑓 for 𝛾 = 10]
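the closed form can be checked numerically (a sketch, not from the slides; for this quadratic the exact step along 𝑑 = −∇ 𝑓 (𝑥) is 𝑡 = ‖∇ 𝑓 (𝑥)‖₂² / (𝑑^𝑇 ∇² 𝑓 (𝑥) 𝑑)):

```python
import numpy as np

gamma = 10.0
x = np.array([gamma, 1.0])                     # x(0) = (gamma, 1)
r = (gamma - 1) / (gamma + 1)
for k in range(1, 6):
    g = np.array([x[0], gamma * x[1]])         # grad f(x) = (x1, gamma*x2)
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)  # exact line search step
    x = x - t * g
    assert np.allclose(x, [gamma * r**k, (-r)**k])   # matches the closed form
```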
Unconstrained minimization 10.8
Nonquadratic example

𝑓 (𝑥₁, 𝑥₂) = 𝑒^{𝑥₁+3𝑥₂−0.1} + 𝑒^{𝑥₁−3𝑥₂−0.1} + 𝑒^{−𝑥₁−0.1}

[figure: iterates with backtracking line search (left) and with exact line search (right)]

Unconstrained minimization 10.9


Example in R¹⁰⁰

𝑓 (𝑥) = 𝑐^𝑇 𝑥 − ∑_{𝑖=1}^{500} log(𝑏_𝑖 − 𝑎_𝑖^𝑇 𝑥)

[figure: 𝑓 (𝑥 (𝑘) ) − 𝑝★ versus 𝑘 on a semilog scale, from 10⁴ down to 10⁻⁴, for exact and backtracking line searches]

‘linear’ convergence, i.e., a straight line on a semilog plot
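a sketch of this objective, usable with the gradient_descent sketch from page 10.7 (the problem data here are random placeholders, not the data used in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 500
A = rng.standard_normal((m, n))          # rows are a_i^T
c = rng.standard_normal(n)
b = rng.uniform(0.1, 1.0, m)             # x0 = 0 is strictly feasible

def f(x):
    s = b - A @ x
    return c @ x - np.log(s).sum() if np.all(s > 0) else np.inf

def grad_f(x):
    return c + A.T @ (1.0 / (b - A @ x))  # grad = c + sum_i a_i/(b_i - a_i^T x)

x = gradient_descent(f, grad_f, np.zeros(n))
```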

Unconstrained minimization 10.10


Steepest descent method

Normalized steepest descent direction (at 𝑥 , for norm ‖ · ‖ ):

Δ𝑥_nsd = argmin{∇ 𝑓 (𝑥)^𝑇 𝑣 | ‖𝑣‖ = 1}

interpretation: for small 𝑣,

𝑓 (𝑥 + 𝑣) ≈ 𝑓 (𝑥) + ∇ 𝑓 (𝑥)^𝑇 𝑣

direction Δ𝑥_nsd is the unit-norm step with the most negative directional derivative

(Unnormalized) steepest descent direction

Δ𝑥_sd = ‖∇ 𝑓 (𝑥)‖∗ Δ𝑥_nsd

satisfies ∇ 𝑓 (𝑥)^𝑇 Δ𝑥_sd = −‖∇ 𝑓 (𝑥)‖∗²

Steepest descent method

• general descent method with Δ𝑥 = Δ𝑥_sd
• convergence properties similar to gradient descent

Unconstrained minimization 10.11


Examples

• Euclidean norm: Δ𝑥_sd = −∇ 𝑓 (𝑥)

• quadratic norm ‖𝑥‖_𝑃 = (𝑥^𝑇 𝑃𝑥)^{1/2} (𝑃 ∈ S^𝑛_{++}):

Δ𝑥_sd = −𝑃^{−1} ∇ 𝑓 (𝑥)

• ℓ₁-norm: Δ𝑥_sd = −(𝜕 𝑓 (𝑥)/𝜕𝑥_𝑖 ) 𝑒_𝑖 , where |𝜕 𝑓 (𝑥)/𝜕𝑥_𝑖 | = ‖∇ 𝑓 (𝑥)‖∞

unit balls, steepest descent directions for a quadratic norm and the ℓ₁-norm:

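the two non-Euclidean cases above translate directly into code (a sketch; g stands for the gradient ∇ 𝑓 (𝑥) and P for a positive definite matrix):

```python
import numpy as np

def sd_quadratic_norm(g, P):
    """Unnormalized steepest descent step for || . ||_P: dx = -P^{-1} g."""
    return -np.linalg.solve(P, g)

def sd_l1_norm(g):
    """Unnormalized steepest descent step for the l1 norm: move along the
    coordinate with largest |df/dx_i| = ||g||_inf."""
    i = np.argmax(np.abs(g))
    dx = np.zeros_like(g)
    dx[i] = -g[i]
    return dx
```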

Unconstrained minimization 10.12


Choice of norm for steepest descent

[figure: steepest descent iterates 𝑥 (0) , 𝑥 (1) , 𝑥 (2) for two different quadratic norms]

• steepest descent with backtracking line search for two quadratic norms
• ellipses show {𝑥 | ‖𝑥 − 𝑥 (𝑘) ‖_𝑃 = 1}
• equivalent interpretation of steepest descent with quadratic norm ‖ · ‖_𝑃 :
  gradient descent after change of variables 𝑥̄ = 𝑃^{1/2} 𝑥

shows choice of 𝑃 has strong effect on speed of convergence

Unconstrained minimization 10.13


Newton step

Δ𝑥_nt = −∇² 𝑓 (𝑥)^{−1} ∇ 𝑓 (𝑥)

Interpretations

• 𝑥 + Δ𝑥_nt minimizes the second-order approximation

𝑓̂ (𝑥 + 𝑣) = 𝑓 (𝑥) + ∇ 𝑓 (𝑥)^𝑇 𝑣 + (1/2) 𝑣^𝑇 ∇² 𝑓 (𝑥) 𝑣

• 𝑥 + Δ𝑥_nt solves the linearized optimality condition

∇ 𝑓 (𝑥 + 𝑣) ≈ ∇ 𝑓̂ (𝑥 + 𝑣) = ∇ 𝑓 (𝑥) + ∇² 𝑓 (𝑥) 𝑣 = 0

[figure: left, 𝑓 and its quadratic model 𝑓̂ , tangent at (𝑥, 𝑓 (𝑥)) and minimized at (𝑥 + Δ𝑥_nt, 𝑓̂ (𝑥 + Δ𝑥_nt)); right, 𝑓 ′ and its linearization, which crosses zero at 𝑥 + Δ𝑥_nt]

Unconstrained minimization 10.14


• Δ𝑥_nt is the steepest descent direction at 𝑥 in the local Hessian norm

‖𝑢‖_{∇² 𝑓 (𝑥)} = (𝑢^𝑇 ∇² 𝑓 (𝑥) 𝑢)^{1/2}

[figure: contour lines of 𝑓 (dashed), the ellipse {𝑥 + 𝑣 | 𝑣^𝑇 ∇² 𝑓 (𝑥)𝑣 = 1}, the points 𝑥 + Δ𝑥_nsd and 𝑥 + Δ𝑥_nt, and an arrow showing −∇ 𝑓 (𝑥)]

Unconstrained minimization 10.15


Newton decrement

𝜆(𝑥) = (∇ 𝑓 (𝑥)^𝑇 ∇² 𝑓 (𝑥)^{−1} ∇ 𝑓 (𝑥))^{1/2}

a measure of the proximity of 𝑥 to 𝑥★

Properties

• gives an estimate of 𝑓 (𝑥) − 𝑝★, using the quadratic approximation 𝑓̂ :

𝑓 (𝑥) − inf_𝑦 𝑓̂ (𝑦) = (1/2) 𝜆(𝑥)²

• equal to the norm of the Newton step in the quadratic Hessian norm:

𝜆(𝑥) = (Δ𝑥_nt^𝑇 ∇² 𝑓 (𝑥) Δ𝑥_nt)^{1/2}

• directional derivative in the Newton direction: ∇ 𝑓 (𝑥)^𝑇 Δ𝑥_nt = −𝜆(𝑥)²

• affine invariant (unlike ‖∇ 𝑓 (𝑥)‖₂)

Unconstrained minimization 10.16


Newton’s method

given: a starting point 𝑥 ∈ dom 𝑓 , tolerance 𝜖 > 0


repeat
1. compute the Newton step and decrement:

   Δ𝑥_nt := −∇² 𝑓 (𝑥)^{−1} ∇ 𝑓 (𝑥) ;    𝜆² := ∇ 𝑓 (𝑥)^𝑇 ∇² 𝑓 (𝑥)^{−1} ∇ 𝑓 (𝑥)

2. stopping criterion: quit if 𝜆²/2 ≤ 𝜖
3. line search: choose step size 𝑡 by backtracking line search
4. update: 𝑥 := 𝑥 + 𝑡Δ𝑥_nt
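a sketch of the algorithm, reusing the backtracking helper from page 10.6 (hess_f is an assumed callable returning ∇² 𝑓 (𝑥); the dense solve is replaced by a Cholesky factorization on page 10.29):

```python
import numpy as np

def newton(f, grad_f, hess_f, x0, eps=1e-8, max_iter=50):
    """Damped Newton method: stop when lambda(x)^2 / 2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        dx = -np.linalg.solve(hess_f(x), g)   # Newton step
        lam2 = -g @ dx                        # lambda^2 = g^T H^{-1} g
        if lam2 / 2 <= eps:
            break
        t = backtracking(f, grad_f, x, dx)
        x = x + t * dx
    return x
```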

Affine invariance

• Newton iterates for 𝑓̃ (𝑦) = 𝑓 (𝑇 𝑦) with starting point 𝑦 (0) = 𝑇^{−1} 𝑥 (0) are

𝑦 (𝑘) = 𝑇^{−1} 𝑥 (𝑘)

• independent of linear changes of coordinates

Unconstrained minimization 10.17


Classical convergence analysis

Assumptions

• 𝑓 strongly convex on 𝑆 with constant 𝑚

• ∇² 𝑓 is Lipschitz continuous on 𝑆 , with constant 𝐿 > 0:

‖∇² 𝑓 (𝑥) − ∇² 𝑓 (𝑦)‖₂ ≤ 𝐿 ‖𝑥 − 𝑦‖₂

( 𝐿 measures how well 𝑓 can be approximated by a quadratic function)

Outline: there exist constants 𝜂 ∈ (0, 𝑚²/𝐿) , 𝛾 > 0 such that

• if ‖∇ 𝑓 (𝑥 (𝑘) )‖₂ ≥ 𝜂, then 𝑓 (𝑥 (𝑘+1) ) − 𝑓 (𝑥 (𝑘) ) ≤ −𝛾

• if ‖∇ 𝑓 (𝑥 (𝑘) )‖₂ < 𝜂, then

(𝐿/(2𝑚²)) ‖∇ 𝑓 (𝑥 (𝑘+1) )‖₂ ≤ ( (𝐿/(2𝑚²)) ‖∇ 𝑓 (𝑥 (𝑘) )‖₂ )²

Unconstrained minimization 10.18


Classical convergence analysis

Damped Newton phase ( ‖∇ 𝑓 (𝑥)‖₂ ≥ 𝜂 )

• most iterations require backtracking steps


• function value decreases by at least 𝛾
• if 𝑝★ > −∞, this phase ends after at most ( 𝑓 (𝑥 (0) ) − 𝑝★)/𝛾 iterations

Quadratically convergent phase ( ‖∇ 𝑓 (𝑥)‖₂ < 𝜂 )

• all iterations use step size 𝑡 = 1

• ‖∇ 𝑓 (𝑥)‖₂ converges to zero quadratically: if ‖∇ 𝑓 (𝑥 (𝑘) )‖₂ < 𝜂, then

(𝐿/(2𝑚²)) ‖∇ 𝑓 (𝑥 (𝑙) )‖₂ ≤ ( (𝐿/(2𝑚²)) ‖∇ 𝑓 (𝑥 (𝑘) )‖₂ )^{2^{𝑙−𝑘}} ≤ (1/2)^{2^{𝑙−𝑘}} ,    𝑙 ≥ 𝑘

Unconstrained minimization 10.19


Classical convergence analysis

Conclusion: number of iterations until 𝑓 (𝑥) − 𝑝★ ≤ 𝜖 is bounded above by

( 𝑓 (𝑥 (0) ) − 𝑝★)/𝛾 + log₂ log₂ (𝜖₀/𝜖)

• 𝛾 , 𝜖₀ are constants that depend on 𝑚 , 𝐿 , 𝑥 (0)

• second term is small (of the order of 6) and almost constant for practical purposes

• in practice, constants 𝑚 , 𝐿 (hence 𝛾 , 𝜖₀) are usually unknown

• provides qualitative insight into convergence properties (i.e., explains the two algorithm phases)

Unconstrained minimization 10.20


Examples

Example in R² (page 10.9)

[figure: iterates 𝑥 (0) , 𝑥 (1) on the contour plot (left) and 𝑓 (𝑥 (𝑘) ) − 𝑝★ versus 𝑘 on a semilog scale, dropping below 10⁻¹⁵ at 𝑘 = 5 (right)]

• backtracking parameters 𝛼 = 0.1, 𝛽 = 0.7


• converges in only 5 steps
• quadratic local convergence

Unconstrained minimization 10.21


Examples

Example in R¹⁰⁰ (page 10.10)

[figure: left, 𝑓 (𝑥 (𝑘) ) − 𝑝★ versus 𝑘 for exact line search and backtracking; right, step size 𝑡 (𝑘) versus 𝑘 for both line searches]

• backtracking parameters 𝛼 = 0.01, 𝛽 = 0.5


• backtracking line search almost as fast as exact l.s. (and much simpler)
• clearly shows two phases in algorithm

Unconstrained minimization 10.22


Examples

Example in R¹⁰⁰⁰⁰ (with sparse 𝑎_𝑖 )

𝑓 (𝑥) = − ∑_{𝑖=1}^{10000} log(1 − 𝑥_𝑖²) − ∑_{𝑖=1}^{100000} log(𝑏_𝑖 − 𝑎_𝑖^𝑇 𝑥)

[figure: 𝑓 (𝑥 (𝑘) ) − 𝑝★ versus 𝑘 on a semilog scale, converging in about 20 iterations]

• backtracking parameters 𝛼 = 0.01, 𝛽 = 0.5


• performance similar to that for the small examples

Unconstrained minimization 10.23


Self-concordance

Shortcomings of classical convergence analysis

• depends on unknown constants (𝑚 , 𝐿 , . . . )


• bound is not affinely invariant, although Newton’s method is

Convergence analysis via self-concordance (Nesterov and Nemirovski)

• does not depend on any unknown constants


• gives affine-invariant bound
• applies to special class of convex functions (‘self-concordant’ functions)
• developed to analyze polynomial-time interior-point methods for convex
optimization

Unconstrained minimization 10.24


Self-concordant functions

Definition

• convex 𝑓 : R → R is self-concordant if

| 𝑓 ′′′ (𝑥)| ≤ 2 𝑓 ′′ (𝑥)^{3/2} for all 𝑥 ∈ dom 𝑓

• 𝑓 : R^𝑛 → R is self-concordant if 𝑔(𝑡) = 𝑓 (𝑥 + 𝑡𝑣) is s.c. for all 𝑥 ∈ dom 𝑓 and all 𝑣

Examples on R

• linear and quadratic functions


• negative logarithm 𝑓 (𝑥) = − log 𝑥
• negative entropy plus negative logarithm: 𝑓 (𝑥) = 𝑥 log 𝑥 − log 𝑥
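
as a quick check (a worked step, not on the original slide), the negative logarithm satisfies the definition with equality: for 𝑓 (𝑥) = − log 𝑥 on 𝑥 > 0,

𝑓 ′′ (𝑥) = 1/𝑥² ,    𝑓 ′′′ (𝑥) = −2/𝑥³ ,    so | 𝑓 ′′′ (𝑥)| = 2/𝑥³ = 2 ( 𝑓 ′′ (𝑥))^{3/2}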

Affine invariance: if 𝑓 : R → R is s.c., then 𝑓̃ (𝑦) = 𝑓 (𝑎𝑦 + 𝑏) is s.c.:

𝑓̃ ′′′ (𝑦) = 𝑎³ 𝑓 ′′′ (𝑎𝑦 + 𝑏),    𝑓̃ ′′ (𝑦) = 𝑎² 𝑓 ′′ (𝑎𝑦 + 𝑏)

Unconstrained minimization 10.25


Self-concordant calculus

Properties

• preserved under sums, and under positive scaling by a factor ≥ 1


• preserved under composition with affine function
• if 𝑔 is convex with dom 𝑔 = R++ and |𝑔′′′ (𝑥)| ≤ 3𝑔′′ (𝑥)/𝑥 then

𝑓 (𝑥) = log(−𝑔(𝑥)) − log 𝑥

is self-concordant

Examples: properties can be used to show that the following are s.c.
• 𝑓 (𝑥) = − ∑_{𝑖=1}^{𝑚} log(𝑏_𝑖 − 𝑎_𝑖^𝑇 𝑥) on {𝑥 | 𝑎_𝑖^𝑇 𝑥 < 𝑏_𝑖 , 𝑖 = 1, . . . , 𝑚}
• 𝑓 (𝑋) = − log det 𝑋 on S^𝑛_{++}
• 𝑓 (𝑥, 𝑦) = − log(𝑦² − 𝑥^𝑇 𝑥) on {(𝑥, 𝑦) | ‖𝑥‖₂ < 𝑦}

Unconstrained minimization 10.26


Convergence analysis for self-concordant functions

Summary: there exist constants 𝜂 ∈ (0, 1/4] , 𝛾 > 0 such that

• if 𝜆(𝑥 (𝑘) ) > 𝜂, then

𝑓 (𝑥 (𝑘+1) ) − 𝑓 (𝑥 (𝑘) ) ≤ −𝛾

• if 𝜆(𝑥 (𝑘) ) ≤ 𝜂, then

2𝜆(𝑥 (𝑘+1) ) ≤ ( 2𝜆(𝑥 (𝑘) ) )²

(𝜂 and 𝛾 only depend on backtracking parameters 𝛼, 𝛽)

Complexity bound: number of Newton iterations bounded by

( 𝑓 (𝑥 (0) ) − 𝑝★)/𝛾 + log₂ log₂ (1/𝜖)

for 𝛼 = 0.1, 𝛽 = 0.8, 𝜖 = 10⁻¹⁰, bound evaluates to 375 ( 𝑓 (𝑥 (0) ) − 𝑝★) + 6

Unconstrained minimization 10.27


Numerical example

150 randomly generated instances of

minimize 𝑓 (𝑥) = − ∑_{𝑖=1}^{𝑚} log(𝑏_𝑖 − 𝑎_𝑖^𝑇 𝑥)

[figure: number of iterations versus 𝑓 (𝑥 (0) ) − 𝑝★ , ranging from about 5 to 25; markers ◦: 𝑚 = 100, 𝑛 = 50; □: 𝑚 = 1000, 𝑛 = 500; ♦: 𝑚 = 1000, 𝑛 = 50]
• number of iterations much smaller than 375( 𝑓 (𝑥 (0) ) − 𝑝★) + 6
• bound of the form 𝑐( 𝑓 (𝑥 (0) ) − 𝑝★) + 6 with smaller 𝑐 (empirically) valid
Unconstrained minimization 10.28
Implementation

main effort in each iteration: evaluate derivatives and solve Newton system

𝐻Δ𝑥 = −𝑔

where 𝐻 = ∇² 𝑓 (𝑥) , 𝑔 = ∇ 𝑓 (𝑥)

Via Cholesky factorization

𝐻 = 𝐿𝐿^𝑇 ,    Δ𝑥_nt = −𝐿^{−𝑇} 𝐿^{−1} 𝑔,    𝜆(𝑥) = ‖𝐿^{−1} 𝑔‖₂

• cost (1/3)𝑛³ flops for unstructured system

• cost ≪ (1/3)𝑛³ if 𝐻 sparse, banded
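
a sketch of this computation (assumes SciPy's solve_triangular; for sparse 𝐻 one would use a sparse Cholesky factorization instead):

```python
import numpy as np
from scipy.linalg import solve_triangular

def newton_step_cholesky(H, g):
    """Return the Newton step and decrement from H = L L^T."""
    L = np.linalg.cholesky(H)                    # lower-triangular factor
    w = solve_triangular(L, g, lower=True)       # w = L^{-1} g
    dx = -solve_triangular(L.T, w, lower=False)  # dx = -L^{-T} L^{-1} g
    lam = np.linalg.norm(w)                      # lambda(x) = ||L^{-1} g||_2
    return dx, lam
```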

Unconstrained minimization 10.29


Example of dense Newton system with structure

𝑓 (𝑥) = ∑_{𝑖=1}^{𝑛} 𝜓_𝑖 (𝑥_𝑖 ) + 𝜓₀ (𝐴𝑥 + 𝑏),    𝐻 = 𝐷 + 𝐴^𝑇 𝐻₀ 𝐴

• assume 𝐴 ∈ R^{𝑝×𝑛} , dense, with 𝑝 ≪ 𝑛

• 𝐷 diagonal with diagonal elements 𝜓_𝑖′′ (𝑥_𝑖 ) ; 𝐻₀ = ∇² 𝜓₀ (𝐴𝑥 + 𝑏)

Method 1: form 𝐻 , solve via dense Cholesky factorization (cost (1/3)𝑛³)

Method 2 (page 9.15): factor 𝐻₀ = 𝐿₀ 𝐿₀^𝑇 ; write the Newton system as

𝐷Δ𝑥 + 𝐴^𝑇 𝐿₀ 𝑤 = −𝑔,    𝐿₀^𝑇 𝐴Δ𝑥 − 𝑤 = 0

eliminate Δ𝑥 from the first equation; compute 𝑤 and Δ𝑥 from

(𝐼 + 𝐿₀^𝑇 𝐴𝐷^{−1} 𝐴^𝑇 𝐿₀) 𝑤 = −𝐿₀^𝑇 𝐴𝐷^{−1} 𝑔,    𝐷Δ𝑥 = −𝑔 − 𝐴^𝑇 𝐿₀ 𝑤

cost: 2𝑝²𝑛 (dominated by computation of 𝐿₀^𝑇 𝐴𝐷^{−1} 𝐴^𝑇 𝐿₀)
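
a sketch of Method 2 (the diagonal of 𝐷 is stored as a vector d; the 𝑝 × 𝑝 system is small, so the dense solve there is cheap):

```python
import numpy as np

def newton_step_structured(d, A, H0, g):
    """Newton step for H = diag(d) + A^T H0 A, with A of size p-by-n, p << n."""
    L0 = np.linalg.cholesky(H0)                        # H0 = L0 L0^T
    B = A.T @ L0                                       # n-by-p matrix A^T L0
    S = np.eye(B.shape[1]) + B.T @ (B / d[:, None])    # I + L0^T A D^{-1} A^T L0
    w = np.linalg.solve(S, -B.T @ (g / d))             # small p-by-p solve
    dx = -(g + B @ w) / d                              # D dx = -g - A^T L0 w
    return dx
```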


Unconstrained minimization 10.30
