Hauser Lecture 2
Unconstrained Optimisation
• The algorithms we will construct have the common feature that, starting
from an initial educated guess x0 ∈ Rn for a solution of (UCM), a sequence
of iterates (xk)k∈N ⊂ Rn is produced with
xk → x∗ ∈ Rn,
where x∗ satisfies the first and second order necessary optimality conditions
g(x∗) = 0,
H(x∗) ⪰ 0 (positive semidefiniteness).
• We usually wish to make progress towards solving (UCM) in every iteration, that is, we will construct xk+1 so that
f(xk+1) < f(xk)
(descent methods).
Actual methods differ from one another in how the two steps of each iteration, computing a search direction pk and computing a step length αk (steps i) and ii)), are carried out.
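The following Python sketch illustrates the generic scheme (it is not the lecture's own pseudocode; the names generic_line_search, direction_fn, step_fn and the tolerances are placeholders): the loop repeats the update xk+1 = xk + αk pk until the first-order condition g(xk) ≈ 0 holds.

```python
import numpy as np

def generic_line_search(f, grad, x0, direction_fn, step_fn,
                        tol=1e-6, max_iter=1000):
    """Generic descent loop: x_{k+1} = x_k + alpha_k * p_k.

    direction_fn(x, g) returns a descent direction p_k, and
    step_fn(f, grad, x, p) returns a step length alpha_k > 0.
    The loop stops once the first-order condition g(x_k) ~ 0 holds.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # first-order optimality reached
            break
        p = direction_fn(x, g)            # step i): search direction p_k
        alpha = step_fn(f, grad, x, p)    # step ii): step length alpha_k
        x = x + alpha * p                 # descent update
    return x
```

With direction_fn returning −g and step_fn a backtracking line search (both discussed below), this becomes the steepest-descent method illustrated later.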
Computing a Step Length αk
(The objective function f(x) = x² and the iterates xk+1 = xk + αk pk generated by the descent directions pk = −1 and steps αk = 1/2^(k+1) from x0 = 2.)
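In this example the iterates are xk = 1 + 1/2^k, so f decreases at every step and yet xk → 1, which is not the minimiser x∗ = 0: the steps are too short. A few lines of Python (illustrative only) confirm this:

```python
# iterates x_{k+1} = x_k + alpha_k * p_k with p_k = -1, alpha_k = 1/2^(k+1)
x = 2.0
for k in range(25):
    x = x + (1 / 2 ** (k + 1)) * (-1.0)
print(x)   # ~1.00000003: converges to 1, not to the minimiser x* = 0 of x^2
```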
Exact Line Search: αk is chosen to minimise f(xk + α pk) exactly over α > 0. In practice this is usually too expensive, and inexact line searches based on two principles are used instead:
• Formulate a criterion that assures that steps are neither too long nor too short.
• Construct a sequence of updates that satisfies the above criterion after very few steps.
Backtracking Line Search:
1. Given αinit > 0 (e.g., αinit = 1), let α(0) = αinit and l = 0.
2. Until f(xk + α(l) pk) < f(xk),
i) set α(l+1) = τ α(l), where τ ∈ (0, 1) is fixed (e.g., τ = 1/2),
ii) increment l by 1.
3. Set αk = α(l).
This method prevents the step from getting too small, but it does not prevent
steps that are too long relative to the decrease in f.
Backtracking-Armijo Line Search:
1. Given αinit > 0 (e.g., αinit = 1), let α(0) = αinit and l = 0.
2. Until f(xk + α(l) pk) ≤ f(xk) + α(l) β [gk]^T pk (the Armijo condition, with β ∈ (0, 1) fixed),
i) set α(l+1) = τ α(l), where τ ∈ (0, 1) is fixed (e.g., τ = 1/2),
ii) increment l by 1.
3. Set αk = α(l).
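A minimal Python sketch of this procedure (the name backtracking_armijo and the default parameter values are illustrative, not part of the lecture); replacing the acceptance test by the simple decrease f(xk + α pk) < f(xk) gives the plain backtracking method above.

```python
import numpy as np

def backtracking_armijo(f, g_k, x_k, p_k, alpha_init=1.0,
                        beta=0.1, tau=0.5, max_backtracks=50):
    """Backtracking-Armijo line search.

    Shrinks alpha by the factor tau until the Armijo condition
        f(x_k + alpha p_k) <= f(x_k) + alpha * beta * g_k^T p_k
    holds, where p_k is a descent direction (g_k^T p_k < 0).
    """
    f_k = f(x_k)
    slope = beta * np.dot(g_k, p_k)     # negative for a descent direction
    alpha = alpha_init
    for _ in range(max_backtracks):
        if f(x_k + alpha * p_k) <= f_k + alpha * slope:
            break                       # sufficient decrease achieved
        alpha *= tau                    # step too long: backtrack
    return alpha
```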
Theorem 1 (Termination of Backtracking-Armijo). Let f ∈ C¹ with gradient
g(x) that is Lipschitz continuous with constant γk at xk, and let pk be a
descent direction at xk. Then, for fixed β ∈ (0, 1),
i) the Armijo condition is satisfied for all step lengths
α ≤ 2(β − 1)[gk]^T pk / (γk ‖pk‖₂²),
ii) and furthermore, for fixed τ ∈ (0, 1) the stepsize generated by the backtracking-Armijo line search terminates with
αk ≥ min( αinit, 2τ(β − 1)[gk]^T pk / (γk ‖pk‖₂²) ).
iii) lim_{k→∞} min( |[gk]^T pk|, |[gk]^T pk| / ‖pk‖₂ ) = 0.
Computing a Search Direction pk
• pk is a descent direction.
• pk is cheap to compute.
iii) lim_{k→∞} gk = 0.
Advantages and disadvantages of steepest descent:
Contours for the objective function f(x, y) = 10(y − x²)² + (x − 1)² (Rosenbrock function),
and the iterates generated by the generic line search steepest-descent method.
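A self-contained sketch of this experiment (the starting point, the Armijo parameters and all helper names are illustrative choices, not taken from the lecture): steepest descent uses pk = −gk together with a backtracking-Armijo step, and makes only slow progress towards the minimiser (1, 1).

```python
import numpy as np

def rosenbrock(z):
    x, y = z
    return 10 * (y - x**2)**2 + (x - 1)**2

def rosenbrock_grad(z):
    x, y = z
    return np.array([-40 * x * (y - x**2) + 2 * (x - 1),
                     20 * (y - x**2)])

def armijo_step(f, g, x, p, alpha=1.0, beta=0.1, tau=0.5):
    # backtracking-Armijo line search (see the sketch above)
    for _ in range(60):
        if f(x + alpha * p) <= f(x) + alpha * beta * np.dot(g, p):
            break
        alpha *= tau
    return alpha

x = np.array([-1.2, 1.0])               # illustrative starting point
for k in range(20000):
    g = rosenbrock_grad(x)
    if np.linalg.norm(g) < 1e-6:
        break
    p = -g                              # steepest-descent direction
    x = x + armijo_step(rosenbrock, g, x, p) * p
print(k, x)                             # many iterations, x close to (1, 1)
```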
More General Descent Methods:
Compute the search direction pk by solving
Bk pk = −gk,
where Bk is a symmetric positive definite matrix (Bk = I gives steepest descent; Bk = H(xk), when positive definite, gives Newton's method).
iii) lim_{k→∞} gk = 0.
ii) lim_{k→∞} xk = x∗,
iii) the sequence converges Q-quadratically, that is, there exists κ > 0 such
that
lim_{k→∞} ‖xk+1 − x∗‖ / ‖xk − x∗‖² ≤ κ.
The mechanism that makes Theorem 5 work is that once the sequence
(xk)k∈N enters a certain domain of attraction of x∗, it cannot escape again
and quadratic convergence to x∗ commences.
Note that this is only a local convergence result, that is, Newton’s method is
not guaranteed to converge to a local minimiser from all starting points.
The fast convergence of Newton’s method becomes apparent
when we apply it to the Rosenbrock function:
Contours for the objective function f(x, y) = 10(y − x²)² + (x − 1)², and the iterates generated
by the Generic Linesearch Newton method.
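A sketch of the corresponding Newton experiment (again with an illustrative starting point and parameters; the steepest-descent fallback used when the Newton direction fails to be a descent direction is a safeguard added here, not part of the lecture's method):

```python
import numpy as np

def rosenbrock(z):
    x, y = z
    return 10 * (y - x**2)**2 + (x - 1)**2

def grad(z):
    x, y = z
    return np.array([-40 * x * (y - x**2) + 2 * (x - 1),
                     20 * (y - x**2)])

def hess(z):
    x, y = z
    return np.array([[-40 * (y - x**2) + 80 * x**2 + 2, -40 * x],
                     [-40 * x,                           20.0]])

x = np.array([-0.75, 1.0])                  # illustrative starting point
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    p = np.linalg.solve(hess(x), -g)        # Newton direction: H p = -g
    if np.dot(g, p) >= 0:                   # safeguard: not a descent direction,
        p = -g                              # fall back to steepest descent
    alpha = 1.0                             # try the full Newton step first
    for _ in range(60):                     # backtracking-Armijo
        if rosenbrock(x + alpha * p) <= rosenbrock(x) + 0.1 * alpha * np.dot(g, p):
            break
        alpha *= 0.5
    x = x + alpha * p
print(k, x)                                 # reaches (1, 1) in far fewer iterations
```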
Modified Newton Methods:
• Mk = max(0, −λmin(Hk)) I.
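A sketch of how such a modification can produce a descent direction (the combination Bk = Hk + Mk solved as Bk pk = −gk is assumed here from the context above, and the modest extra shift that keeps Bk invertible is an added safeguard, not part of the formula):

```python
import numpy as np

def modified_newton_direction(H, g, margin=0.1):
    """Solve (H + M) p = -g with M = max(0, -lambda_min(H)) * I.
    When H has a negative eigenvalue, a modest extra 'margin' is added
    so the shifted matrix is safely positive definite (an assumption
    of this sketch)."""
    lam_min = np.linalg.eigvalsh(H).min()    # smallest eigenvalue of H
    shift = max(0.0, -lam_min)               # M = shift * I per the formula
    if shift > 0:
        shift += margin                      # extra shift so B is nonsingular
    B = H + shift * np.eye(H.shape[0])       # B = H + M
    return np.linalg.solve(B, -g)

# Example with an indefinite Hessian, where the unmodified Newton
# system could yield an ascent direction:
H = np.array([[1.0, 0.0],
              [0.0, -2.0]])
g = np.array([1.0, 1.0])
p = modified_newton_direction(H, g)
print(p, g @ p)                              # g^T p < 0: a descent direction
```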
3. Output pk ≈ p(i) .
Important features of the conjugate gradient method:
• [gk]^T p(i) < 0 for all i, that is, the algorithm always stops with
a descent direction as an approximation to pk.
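A minimal sketch of such a conjugate gradient inner iteration for the Newton system Hk p = −gk (the function name, tolerances, and the rule of exiting on non-positive curvature and falling back to −gk are illustrative choices for this sketch):

```python
import numpy as np

def cg_direction(H, g, tol=1e-8, max_iter=None):
    """Approximately solve H p = -g by conjugate gradients, started at p = 0.
    Exits early if non-positive curvature is met, returning the last iterate
    (or -g on the very first step), so the output is a descent direction."""
    n = g.size
    max_iter = n if max_iter is None else max_iter
    p = np.zeros(n)
    r = -g.copy()                      # residual of H p = -g at p = 0
    d = r.copy()                       # first CG direction
    for i in range(max_iter):
        Hd = H @ d
        curvature = d @ Hd
        if curvature <= 0:             # non-positive curvature detected
            return p if i > 0 else -g
        alpha = (r @ r) / curvature
        p = p + alpha * d
        r_new = r - alpha * Hd
        if np.linalg.norm(r_new) < tol:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p

# Example: for a positive definite H this reproduces the Newton direction.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
g = np.array([1.0, -1.0])
print(cg_direction(H, g), np.linalg.solve(H, -g))
```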