Trust Region Methods
Authored by:
Benabida Sif Eddine
May 2024
Contents
List of Figures
1 Introduction
4 Global convergence
4.1 Sufficient reduction
4.1.1 The Cauchy point
4.2 Convergence to stationary points
4.2.1 General case
4.2.2 Algorithm based on Newton's method
5 Numerical example
6 Conclusions
References
List of Figures
1 Trust region step
2 ∥p(λ)∥ as a function of λ
3 The dogleg method
1 Introduction
Trust region methods, sometimes called restricted step methods, are methods used to solve unconstrained minimization problems. For trust region methods applied to constrained optimization, I refer to [2] and [1]. The motivation for the creation of these methods is to deal with the case of a non-positive definite Hessian matrix in Newton's method; I refer to Fletcher's book [3] to read about Newton's method.
\[
\min_{x \in \mathbb{R}^n} f(x)
\]
The strategy of trust region methods is to construct a model mk that behaves similarly to the objective function f over some domain near the current iterate xk, using the information gathered about the objective function. This model is obtained by truncating the Taylor series for f(xk + p), which is:
\[
f(x_k + p) = f(x_k) + \nabla f(x_k)^T p + \frac{1}{2}\, p^T \nabla^2 f(x_k + tp)\, p
\]
for p ∈ Rn and some t ∈ (0, 1).
In other words, trust region methods define a region around the current iterate within which they trust the model to be an adequate representation of the objective function, and then choose the step to be the approximate minimizer of the model in this region. If the reduction in f predicted by the model is not acceptable according to a criterion that we will discuss later, they reduce the size of this region and find a new minimizer.
On the other hand, if the minimization of the model inside the trust region is producing good steps and predicting the behavior of the objective function well, we can increase the size of the trust region to allow longer, more productive steps to be taken.
Briefly, if the region is too small, it might unnecessarily limit the algorithm's ability to take long, productive steps toward the solution; if it is too large, the model may be a poor approximation of f over the region, and the minimizer of the model may take the algorithm far from the minimizer of the objective function.
2 The basic trust region algorithm
Before getting into the outline of the main algorithm, let's talk about unconstrained optimization algorithms in general.
Algorithms for this type of problem require the user to supply a starting point, denoted x0, and starting from it the algorithm generates a sequence of iterates x1, x2, x3, .... The algorithm stops when there is no more progress to be made (i.e. f(xk) ≤ f(x) for all x), or when we obtain an accurate enough approximation to the solution of the problem.
The purpose of each iteration is to find a new iterate xk+1 that has a lower function value than xk. There are non-monotone algorithms that do not insist on this, but even then there will always be a decrease in f after some number of iterations m, that is: f(xk+m) ≤ f(xk).
Now let's define the model mk. As we said earlier, it is based on the Taylor series for f(xk + p); by replacing the Hessian of f with an approximation Hk, we get the model definition:
\[
m_k(p) = f(x_k) + \nabla f(x_k)^T p + \frac{1}{2}\, p^T H_k p
\]
where Hk is a symmetric matrix.
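As a small side illustration (not from the original text), the model is easy to write down in code; here f_xk, g and H stand for f(xk), ∇f(xk) and Hk:

```python
import numpy as np

def quadratic_model(f_xk, g, H):
    """Return the quadratic model m_k as a callable:
    m_k(p) = f(x_k) + grad f(x_k)^T p + 0.5 * p^T H_k p."""
    def m(p):
        p = np.asarray(p, dtype=float)
        return f_xk + g @ p + 0.5 * p @ H @ p
    return m
```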
To find the step that should be taken from the iterate xk, we solve the following subproblem:
\[
\min_{p \in \mathbb{R}^n} m_k(p) \quad \text{subject to} \quad \|p\| \le \Delta_k
\]
where ∆k is the trust region radius and ∥·∥ is the Euclidean norm. So p∗ is the minimizer of mk in the ball of radius ∆k (see figure 1).
Figure 1: Trust region step
To decide whether the step pk and the radius ∆k are adequate, we compute the ratio
\[
\sigma_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.
\]
In σk, note that f(xk) − f(xk + pk) is the actual reduction in f and mk(0) − mk(pk) is the predicted reduction, so σk is simply the ratio between the actual reduction in f and the reduction predicted by the model function. This is the criterion mentioned earlier for measuring the accuracy with which the reduction in mk predicts the reduction in f, in the sense that the closer σk is to unity, the better the agreement.
Notice that here we are talking about the size of the trust region, not about whether the step taken is acceptable. Actually the step is acceptable whenever f(xk) > f(xk + pk) (i.e. whenever σk > 0).
So let's translate what we've just said into the general algorithm of the trust region method (how the algorithm works at iteration k):
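As a rough sketch (my own rendering, not the original algorithm box), one common way to organize this iteration in Python is shown below; the shrink/expand thresholds 1/4 and 3/4 and the factors 0.25 and 2 are illustrative values, and solve_subproblem stands for any routine that approximately minimizes the model inside the ball of radius delta:

```python
import numpy as np

def trust_region(f, grad, hess, solve_subproblem, x0,
                 delta0=1.0, delta_max=10.0, eta=0.0, tol=1e-6, max_iter=200):
    """Basic trust region loop (sketch).  solve_subproblem(g, H, delta) must return
    an approximate minimizer of the quadratic model m_k inside the ball of radius delta."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:                    # stationarity test
            break
        p = solve_subproblem(g, H, delta)
        predicted = -(g @ p + 0.5 * p @ H @ p)         # m_k(0) - m_k(p_k)
        actual = f(x) - f(x + p)                       # f(x_k) - f(x_k + p_k)
        sigma = actual / predicted                     # agreement ratio
        if sigma < 0.25:                               # poor agreement: shrink the region
            delta *= 0.25
        elif sigma > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)        # very good agreement on the boundary: expand
        if sigma > eta:                                # accept whenever sigma > 0 (eta = 0 here)
            x = x + p
    return x
```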
We allow ourselves to drop the subscript k from the subproblem and rewrite it as follows:
\[
\min_{p \in \mathbb{R}^n} m(p) \quad \text{subject to} \quad \|p\| \le \Delta \qquad (3)
\]
There are different approaches for solving it, but we are going to talk about two of them: the trust region Newton method and the dogleg method (for other approaches I refer to [6], p. 33-52). But first, let's state the theorem (due to Moré and Sorensen) that gives the necessary and sufficient optimality conditions for the subproblem (3).
Theorem 1. The vector p∗ is a global solution of the subproblem (3) if and only if ∥p∗∥ ≤ ∆ and there is a scalar λ ≥ 0 such that:
1. (H + λI)p∗ = −∇f
2. λ(∆ − ∥p∗ ∥) = 0
3. H + λI is positive semidefinite.
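A standard reading of these conditions, worth recording explicitly, is that λ acts as a switch between the interior and boundary cases:
\[
\|p^*\| < \Delta \;\overset{(2)}{\Longrightarrow}\; \lambda = 0 \;\overset{(1)}{\Longrightarrow}\; H p^* = -\nabla f \ \text{with } H \text{ positive semidefinite by (3)},
\]
so the solution is the plain (quasi-)Newton step whenever that step lies strictly inside the trust region, and λ > 0 only when the constraint ∥p∗∥ = ∆ is active.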
To study how the step depends on λ, define p(λ) = −(H + λI)^{-1}∇f (from condition 1) and let
\[
H = Q \Lambda Q^T
\]
be the eigendecomposition of H, where Q = [q1, ..., qn] is orthogonal and Λ = diag(λ1, ..., λn) with λ1 ≤ λ2 ≤ ... ≤ λn. Then:
\[
p(\lambda) = -Q (\Lambda + \lambda I)^{-1} Q^T \nabla f = -\sum_{i=1}^{n} \frac{q_i^T \nabla f}{\lambda_i + \lambda}\, q_i
\]
and we find:
\[
\|p(\lambda)\|^2 = \sum_{i=1}^{n} \frac{(q_i^T \nabla f)^2}{(\lambda_i + \lambda)^2}
\]
Figure 2: ∥p(λ)∥ as a function of λ
In this case we can solve the problem (4) using Newton's root-finding method (I refer to [5], p. 633, to read more about this method), and the algorithm goes as follows:
(ii) Factor H + λk I = LT L;
Note that the factorization used in this algorithm is the Cholesky factorization (I refer to [5], p. 608, to read about it). Some safeguards must be added to the algorithm to make it practical, for example enforcing λk > −λ1, because when λk ≤ −λ1 the matrix H + λk I is not positive definite and the Cholesky factorization does not exist.
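To make step (ii) concrete, here is a sketch of the whole iteration in Python, following the standard presentation in [5] (Newton's method applied to 1/∥p(λ)∥ = 1/∆, which behaves better than ∥p(λ)∥ = ∆); the starting value, iteration count and the crude safeguard are illustrative choices:

```python
import numpy as np

def solve_lambda(H, g, delta, lam0=0.0, iters=20):
    """Newton iteration for the scalar lambda in (H + lambda I) p = -g with ||p|| = delta.
    Sketch of the scheme in Nocedal & Wright [5]; the safeguards here are minimal."""
    n = len(g)
    lam = lam0
    for _ in range(iters):
        # step (ii): Cholesky factorization of H + lambda I (must be positive definite)
        L = np.linalg.cholesky(H + lam * np.eye(n))
        # solve (H + lambda I) p = -g using the factorization
        p = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        # q solves L q = p; it gives the derivative of ||p(lambda)|| cheaply
        q = np.linalg.solve(L, p)
        norm_p = np.linalg.norm(p)
        # Newton update for the root of 1/||p(lambda)|| - 1/delta
        lam += (norm_p / np.linalg.norm(q)) ** 2 * (norm_p - delta) / delta
        lam = max(lam, 0.0)  # crude safeguard; practical codes also keep lam > -lambda_1
    return lam
```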
3.1.2 The hard case
When q1^T ∇f = 0, the situation becomes a little more complicated because there may not be a value λ∗ ∈ (−λ1, +∞) such that ∥p(λ∗)∥ = ∆. However, from Theorem 1 we can show that λ∗ ∈ [−λ1, +∞), so only one possibility remains, namely λ∗ = −λ1. In this case we set:
\[
p(\lambda) = -\sum_{i=2}^{n} \frac{q_i^T \nabla f}{\lambda_i + \lambda}\, q_i + \alpha z
\]
where z is a unit eigenvector of H corresponding to the eigenvalue λ1, α is a scalar, and:
\[
\|p(\lambda)\|^2 = \sum_{i=2}^{n} \frac{(q_i^T \nabla f)^2}{(\lambda_i + \lambda)^2} + \alpha^2
\]
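A natural (and standard) way to fix α, written here under the notation above with λ∗ = −λ1, is to place the step exactly on the trust region boundary:
\[
\|p(-\lambda_1)\|^2 = \sum_{i=2}^{n} \frac{(q_i^T \nabla f)^2}{(\lambda_i - \lambda_1)^2} + \alpha^2 = \Delta^2
\quad \Longrightarrow \quad
\alpha = \sqrt{\;\Delta^2 - \sum_{i=2}^{n} \frac{(q_i^T \nabla f)^2}{(\lambda_i - \lambda_1)^2}}\,.
\]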
We now turn to the second approach, the dogleg method, which applies when H is positive definite. Writing pF = −H^{-1}∇f for the full step, two situations can occur:
1. ∆ ≥ ∥pF∥ (i.e. the full step lies inside the trust region). In this case we simply take p = pF.
2. ∆ < ∥pF∥ (i.e. the full step is not allowed by the trust region). In this case we have to use the dogleg path that we are going to describe now.
We start by creating a path consisting of two line segments. The first line segment runs from the origin (the current iterate) to the minimizer of m along the steepest descent direction −∇f, called the Cauchy point, which is:
\[
p^{u} = -\frac{\nabla f^T \nabla f}{\nabla f^T H \nabla f}\, \nabla f
\]
and the second line goes from pu to pF (see figure 3).
We formally denote this trajectory by pd(µ), where:
\[
p_d(\mu) =
\begin{cases}
\mu\, p^{u} & 0 \le \mu < 1 \\
p^{u} + (\mu - 1)(p^{F} - p^{u}) & 1 \le \mu \le 2
\end{cases}
\]
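The coefficient in pu is simply the exact minimizer of the model along the steepest descent direction; a one-line check (assuming ∇f^T H ∇f > 0, which holds since H is positive definite here):
\[
\frac{d}{dt}\, m(-t\,\nabla f) = \frac{d}{dt}\!\left[ f - t\,\|\nabla f\|^2 + \tfrac{1}{2}\, t^2\, \nabla f^T H \nabla f \right] = 0
\;\Longrightarrow\;
t^{*} = \frac{\nabla f^T \nabla f}{\nabla f^T H \nabla f},
\qquad p^{u} = -\,t^{*}\, \nabla f.
\]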
There is a lemma which ensures that, when the full step is not allowed by the trust region, the trust region boundary ∥p∥ = ∆ intersects this path at exactly one point, and nowhere else. The chosen value of p is that point of intersection.
Figure 3: The dogleg method
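Here is a compact Python sketch of this step computation (my own illustration; it can be passed as the solve_subproblem argument of the earlier loop and assumes H is positive definite, as the dogleg method requires):

```python
import numpy as np

def dogleg_step(g, H, delta):
    """Dogleg approximation to the trust region step (sketch).
    g: gradient at x_k, H: positive definite Hessian approximation, delta: radius."""
    p_full = -np.linalg.solve(H, g)                # full step p^F
    if np.linalg.norm(p_full) <= delta:            # case 1: full step inside the region
        return p_full
    p_u = -(g @ g) / (g @ H @ g) * g               # minimizer of m along -g (the first leg)
    if np.linalg.norm(p_u) >= delta:               # the first leg already crosses the boundary
        return delta * p_u / np.linalg.norm(p_u)
    # case 2: find tau in (0, 1] with ||p_u + tau (p_full - p_u)|| = delta
    d = p_full - p_u
    a, b, c = d @ d, 2.0 * (p_u @ d), p_u @ p_u - delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # positive root of the quadratic
    return p_u + tau * d
```

Together with the trust_region sketch given earlier, this is enough to reproduce the dogleg strategy described in this section.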
4 Global convergence
To achieve global convergence, the method must drive the iterates towards a stationary point, and at each iteration pk must give a sufficient reduction in the model.
The step pk is said to give a sufficient reduction in the model if:
\[
m_k(0) - m_k(p_k) \ge c_1\, \|\nabla f(x_k)\|\, \min\!\left(\Delta_k,\ \frac{\|\nabla f(x_k)\|}{\|H_k\|}\right) \qquad (5)
\]
for some c1 ∈ (0, 1].
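For reference, the standard result behind this condition (Lemma 4.3 in [5]) is that the Cauchy point, i.e. the minimizer of mk along −∇f(xk) restricted to the trust region, already satisfies (5) with c1 = 1/2:
\[
m_k(0) - m_k(p_k^{C}) \ge \frac{1}{2}\, \|\nabla f(x_k)\|\, \min\!\left(\Delta_k,\ \frac{\|\nabla f(x_k)\|}{\|H_k\|}\right),
\]
so any step that reduces the model at least as much as the Cauchy point (the dogleg step does) gives a sufficient reduction.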
We denote the level set by:
\[
S = \{\, x \mid f(x) \le f(x_0) \,\}
\]
and an open neighborhood of this set by:
\[
S(r) = \{\, x \mid \|x - y\| < r \ \text{for some } y \in S \,\},
\]
where r is a positive constant.
Theorem 2
3. pk satisfies (5).
Theorem 3
2. Hk = ∇2 f (xk )
Then:
I omit the proof, which can be found in Moré and Sorensen [4] section 4.
5 Numerical example
In this section, we apply the trust region method with the dogleg strategy to find the minimum of the Rosenbrock function, defined by:
\[
f(x_1, x_2) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2,
\]
whose global minimizer is x∗ = (1, 1)^T with f(x∗) = 0.
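As an illustration of how such an experiment can be run, the short script below uses SciPy's built-in dogleg trust region solver (scipy.optimize.minimize with method='dogleg', together with the rosen helpers from the same module); the starting point x0 is an arbitrary choice, not taken from the text above:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])  # classical Rosenbrock starting point (arbitrary choice here)

res = minimize(rosen, x0, method='dogleg', jac=rosen_der, hess=rosen_hess,
               options={'gtol': 1e-8})

print(res.x)    # expected to be close to the minimizer (1, 1)
print(res.nit)  # number of trust region iterations taken
```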
6 Conclusions
The main purpose of this work has been to discuss the main trust region algorithm used to solve unconstrained minimization problems, with an aim towards understanding the best way to implement it in order to achieve global convergence. We introduced theoretical aspects of the method, gave the necessary and sufficient optimality conditions and the conditions required for global convergence, and finally applied the algorithm to the Rosenbrock function.
References
[1] Richard H Byrd, Robert B Schnabel, and Gerald A Shultz. A trust region
algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical
Analysis, 24(5):1152–1170, 1987.
[2] Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. Trust region
methods. SIAM, 2000.
[3] Roger Fletcher. Practical methods of optimization. John Wiley & Sons, 2000.
[4] Jorge J Moré and Danny C Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983.
[5] Jorge Nocedal and Stephen J Wright. Numerical optimization. Springer, 1999.