
University Mohamed El-Bachir El-Ibrahimi of Bordj Bou Arreridj

Faculty of Mathematics and Computer Science
Department of Operations Research

Nonlinear optimization mini-project

Trust region methods for unconstrained minimization problems

Authored by:
Benabida Sif Eddine

May 2024
Contents

List of Figures
1 Introduction
2 The basic trust region algorithm
3 Solutions of the subproblem
  3.1 Trust region Newton's method
    3.1.1 The easy case
    3.1.2 The hard case
  3.2 The dogleg method
4 Global convergence
  4.1 Sufficient reduction
    4.1.1 The Cauchy point
  4.2 Convergence to stationary points
    4.2.1 General case
    4.2.2 Algorithm based on Newton's method
5 Numerical example
6 Conclusions
References

List of Figures
1 Trust region step
2 ∥p(λ)∥ as a function of λ
3 The dogleg method
1 Introduction
Trust region methods, sometimes called restricted step methods, are methods used to solve unconstrained minimization problems. To read about trust region methods for constrained optimization, I refer to [2] and [1]. The motivation for these methods is to handle the case where the Hessian matrix is not positive definite in Newton's method. I refer to Fletcher's book [3] for background on Newton's method.

The mathematical formulation of the unconstrained minimization problem is:

$$\min_{x \in \mathbb{R}^n} f(x)$$

where f : Rⁿ → R is a smooth function (a function that has continuous derivatives up to some desired order over some domain).

The strategy of trust region methods is to construct a model mk that behaves similarly to the objective function f over some domain near the current iterate xk, using the information gathered about this objective function. This model is obtained by truncating the Taylor series for f(xk + p), which is:

$$f(x_k + p) = f(x_k) + \nabla f(x_k)^T p + \frac{1}{2}\, p^T \nabla^2 f(x_k + tp)\, p$$

for p ∈ Rⁿ and some t ∈ (0, 1).

In other words, trust region methods define a region around the current iterate within which they trust the model to be an adequate representation of the objective function, and then choose the step to be the approximate minimizer of the model in this region. If the reduction in f predicted by the model is not acceptable according to a criterion that we discuss later, they reduce the size of this region and find a new minimizer.

On the other hand, if minimizing the model inside the trust region produces good steps and predicts the behavior of the objective function well, we can increase the size of the trust region so that longer, more efficient steps can be taken.

Briefly, if the region is too small, it might unnecessarily limit the algorithm's ability to explore and find the optimal solution efficiently; if it is too large, the algorithm might take steps that lead to invalid or undesirable regions.

2 The basic trust region algorithm
Before getting into the outline of the main algorithm, let's talk about unconstrained optimization algorithms in general.

Algorithms for this type of problem require the user to supply a starting point, denoted x0. Starting from it, the algorithm generates a sequence of feasible solutions x1, x2, x3, ... called iterates. The algorithm stops when no more progress can be made (i.e. f(xk) ≤ f(x) for all x) or when we get an accurate enough approximation to the solution of the problem.

The algorithm's purpose is to find a new iterate xk+1 that has a lower function value than xk. There are non-monotone algorithms that do not insist on this, but even then there will always be a decrease in f after some number of iterations m, that is: f(xk+m) ≤ f(xk).

Now let's define the model mk. As we said earlier, it is based on the Taylor series for f(xk + p); by replacing the Hessian of the function f by an approximation Hk to it, we get the model definition:

$$m_k(p) = f(x_k) + \nabla f(x_k)^T p + \frac{1}{2}\, p^T H_k p$$

where Hk is a symmetric matrix.

When ∇²f(xk) is available, we can take Hk to be the exact Hessian of the objective function; in that case we call our approach a trust region Newton method.

To find the step that should be taken from the iterate xk, we solve the subproblem:

$$\min_{p_k \in \mathbb{R}^n} m_k(p_k) \quad \text{s.t.} \quad \|p_k\| \le \Delta_k \qquad (1)$$

where ∆k is the trust region radius and ∥·∥ is the Euclidean norm, so that the solution p* is the minimizer of mk in the ball of radius ∆k (see figure 1).

Figure 1: Trust region step

To measure how well the model predicts the reduction in f at iteration k, we define the ratio σk by:

$$\sigma_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)} \qquad (2)$$

where pk is the step taken at iteration k with ∥pk∥ ≤ ∆k, and ∆k is the radius of the trust region at this iteration.

Note that in σk, f(xk) − f(xk + pk) is the actual reduction in f and mk(0) − mk(pk) is the predicted reduction, so σk is the ratio of the actual reduction to the reduction predicted by the model. This is the criterion mentioned earlier for measuring how accurately the reduction in mk predicts the reduction in f: the closer σk is to unity, the better the agreement.

We distinguish three cases:

1. When σk is close to 1, there is good agreement between f and the model function over this step, so we can expand the size of the trust region in the next iteration.

2. When σk is positive but significantly smaller than 1, we do not change the size of the trust region.

3. When σk is close to zero or negative, there is no agreement between the objective function and the model over this step, so we shrink the size of the trust region in the next iteration.

Notice that we are talking here about the size of the trust region, not about whether the step taken is acceptable or not. The step is in fact acceptable whenever f(xk) > f(xk + pk), i.e. whenever σk > 0.

Let's now translate what we have just said into the general algorithm of the trust region method (how the algorithm works at iteration k):

Algorithm 1 (Trust region)

(i) Given xk, ∆∞ and ∆k ∈ (0, ∆∞);

(ii) Solve (1) to obtain pk;

(iii) Evaluate σk from (2);

(iv) If σk < 1/4 then ∆k+1 = (1/4)∆k;

(v) Else if σk > 3/4 and ∥pk∥ = ∆k then ∆k+1 = min(2∆k, ∆∞);

(vi) Else ∆k+1 = ∆k;

(vii) If σk > 0 then xk+1 = xk + pk;

(viii) Else xk+1 = xk;

where ∆∞ is the overall bound on ∥pk∥.
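
To make the loop concrete, here is a minimal Python/NumPy sketch of Algorithm 1. The function name trust_region, the default constants and the stopping test on ∥∇f(xk)∥ are assumptions made for this illustration; solve_subproblem stands for any of the subproblem solvers discussed in Section 3.

```python
import numpy as np

def trust_region(f, grad, hess, x0, solve_subproblem,
                 delta0=1.0, delta_max=10.0, tol=1e-8, max_iter=200):
    """Sketch of Algorithm 1.  `solve_subproblem(g, H, delta)` must return
    an approximate minimizer of the model inside the ball of radius delta."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:                   # approximately stationary
            break
        p = solve_subproblem(g, H, delta)
        predicted = -(g @ p + 0.5 * p @ H @ p)        # m_k(0) - m_k(p_k)
        actual = f(x) - f(x + p)                      # actual reduction in f
        sigma = actual / predicted if predicted > 0 else -1.0   # ratio (2)
        if sigma < 0.25:                              # poor agreement: shrink the region
            delta = 0.25 * delta
        elif sigma > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)       # good boundary step: expand
        if sigma > 0:                                 # accept the step
            x = x + p
    return x
```

When Hk stays positive definite, the dogleg_step sketch given at the end of Section 3.2 can be passed as solve_subproblem.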

3 Solutions of the subproblem


As you can see in Algorithm 1, finding an approximate solution of the subproblem is critical in order to find a step that reduces the model value inside the trust region.

We allow ourselves to drop the subscript k from the subproblem and rewrite it as follows:

$$\min_{p \in \mathbb{R}^n} m(p) \quad \text{s.t.} \quad \|p\| \le \Delta \qquad (3)$$

There are different approaches to solving it, but we are going to talk about two of them: the trust region Newton method and the dogleg method (for other approaches I refer to [6], pp. 33-52). First, let's state the theorem (due to Moré and Sorensen) that gives the necessary and sufficient optimality conditions for the subproblem (3):

Theorem 1 (Moré and Sorensen)

p* is a global solution of the subproblem (3) if and only if p* is feasible and there exists a scalar λ ≥ 0 such that:

1. (H + λI)p* = −∇f

2. λ(∆ − ∥p*∥) = 0

3. H + λI is positive semidefinite.
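
Theorem 1 is easy to check numerically for a candidate pair (p, λ). The sketch below (the function name check_more_sorensen and the tolerance are assumptions) verifies feasibility and the three conditions up to a tolerance:

```python
import numpy as np

def check_more_sorensen(H, g, p, lam, delta, tol=1e-8):
    """Return True if the pair (p, lam) satisfies the conditions of Theorem 1
    for the subproblem min m(p) s.t. ||p|| <= delta, where g is the gradient."""
    A = H + lam * np.eye(len(g))
    feasible = np.linalg.norm(p) <= delta + tol
    cond1 = np.allclose(A @ p, -g, atol=tol)               # (H + lam I) p = -grad f
    cond2 = abs(lam * (delta - np.linalg.norm(p))) <= tol   # complementarity
    cond3 = np.all(np.linalg.eigvalsh(A) >= -tol)           # H + lam I positive semidefinite
    return feasible and cond1 and cond2 and cond3
```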

3.1 Trust region Newton's method

To find the solution p* of the subproblem (3), we base our approach on the theorem above. We can see that:

• Either λ = 0, in which case H is positive semidefinite, p* = −H⁻¹∇f and ∥p*∥ ≤ ∆; we then call p* the full step.

• Or else the solution lies on the boundary, ∥p*∥ = ∆, and we define p(λ) = −(H + λI)⁻¹∇f, where H + λI is positive semidefinite; we then search for a scalar λ ≥ 0 with the properties just described.

To do that, we solve the one-dimensional root-finding problem in the variable λ:

$$\|p(\lambda)\| = \Delta \qquad (4)$$
Since H is symmetric, we can apply a spectral decomposition to it and write:

$$H = Q \Lambda Q^T$$

where Λ = diag(λ1, λ2, ..., λn) with λ1 ≤ λ2 ≤ ... ≤ λn the eigenvalues of H, and Q = [q1 | q2 | ... | qn] where qi is the i-th eigenvector of H. We now consider two cases based on the value of q1ᵀ∇f:

3.1.1 The easy case

When q1ᵀ∇f ≠ 0, it can be shown that there is a unique λ* ∈ (−λ1, +∞) that satisfies ∥p(λ*)∥ = ∆ and the conditions of Theorem 1 (see figure 2).

Figure 2: ∥p(λ)∥ as a function of λ

For λ ≠ −λi we have:

$$p(\lambda) = -Q(\Lambda + \lambda I)^{-1} Q^T \nabla f = -\sum_{i=1}^{n} \frac{q_i^T \nabla f}{\lambda_i + \lambda}\, q_i$$

and we find:

$$\|p(\lambda)\|^2 = \sum_{i=1}^{n} \frac{(q_i^T \nabla f)^2}{(\lambda_i + \lambda)^2}$$
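
Both expressions can be evaluated directly from the spectral decomposition, which is convenient for experimenting with the root-finding problem (4). A small NumPy sketch (the helper names are assumptions):

```python
import numpy as np

def p_of_lambda(H, g, lam):
    """Evaluate p(lambda) = -(H + lambda I)^{-1} g through H = Q Lambda Q^T
    (valid whenever lambda is not the negative of an eigenvalue of H)."""
    eigvals, Q = np.linalg.eigh(H)          # eigenvalues in ascending order
    coeffs = (Q.T @ g) / (eigvals + lam)    # q_i^T g / (lambda_i + lambda)
    return -Q @ coeffs

def norm_p_of_lambda(H, g, lam):
    """||p(lambda)||, the left-hand side of the root-finding problem (4)."""
    eigvals, Q = np.linalg.eigh(H)
    return np.sqrt(np.sum((Q.T @ g) ** 2 / (eigvals + lam) ** 2))
```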

In this case we can solve problem (4) using Newton's root-finding method (I refer to [5], p. 633, to read more about this method), and the algorithm goes as follows:

Algorithm 2 (Trust region subproblem)

(i) Given λk and ∆ > 0;

(ii) Factor H + λkI = LᵀL;

(iii) Solve LᵀLpk = −∇f and Lᵀqk = pk;

(iv) Set

$$\lambda_{k+1} = \lambda_k + \left(\frac{\|p_k\|}{\|q_k\|}\right)^2 \frac{\|p_k\| - \Delta}{\Delta}$$

Note that the factorization used in this algorithm is the Cholesky factorization (I refer to [5], p. 608, to read about it). Some safeguards must be added to make the algorithm practical, for example keeping λk > −λ1, because when λk ≤ −λ1 the Cholesky factorization does not exist.
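
A minimal NumPy sketch of this iteration is given below. NumPy's Cholesky routine returns a lower triangular L with H + λI = LLᵀ, which plays the same role as the LᵀL factor in Algorithm 2; the starting value lam0 and the fixed iteration count are assumptions of this sketch.

```python
import numpy as np

def solve_subproblem_newton(H, g, delta, lam0, n_iter=10):
    """Sketch of Algorithm 2: Newton's root-finding iteration on
    ||p(lambda)|| = Delta.  Assumes the easy case, with lam0 safeguarded
    so that H + lam*I stays positive definite throughout."""
    lam = lam0
    for _ in range(n_iter):
        L = np.linalg.cholesky(H + lam * np.eye(len(g)))   # H + lam I = L L^T
        y = np.linalg.solve(L, -g)
        p = np.linalg.solve(L.T, y)                        # (H + lam I) p = -grad f
        q = np.linalg.solve(L, p)                          # ||q||^2 = p^T (H + lam I)^{-1} p
        norm_p = np.linalg.norm(p)
        lam += (norm_p / np.linalg.norm(q)) ** 2 * (norm_p - delta) / delta
    return lam, p
```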

3.1.2 The hard case
When q1ᵀ∇f = 0, the situation becomes more complicated because there may not be a value λ* ∈ (−λ1, +∞) such that ∥p(λ)∥ = ∆. From Theorem 1 we can show that λ* ∈ [−λ1, +∞), so only one possibility remains: λ* = −λ1. In this case we set:

$$p(\lambda) = -\sum_{i=2}^{n} \frac{q_i^T \nabla f}{\lambda_i + \lambda}\, q_i + \alpha z$$

where z is an eigenvector of H corresponding to the eigenvalue λ1 and α is a scalar, so that:

$$\|p(\lambda)\|^2 = \sum_{i=2}^{n} \frac{(q_i^T \nabla f)^2}{(\lambda_i + \lambda)^2} + \alpha^2$$

We then choose the value of α to ensure that ∥p(λ)∥ = ∆.
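
A sketch of the hard-case computation, assuming NumPy and a simple tolerance for deciding which eigenvalues coincide with λ1 (both the tolerance and the function name are assumptions):

```python
import numpy as np

def hard_case_step(H, g, delta, tol=1e-12):
    """Set lambda = -lambda_1, sum over the eigenvectors whose eigenvalue
    differs from lambda_1, then move along an eigenvector z of lambda_1
    until the trust region boundary is reached."""
    eigvals, Q = np.linalg.eigh(H)
    lam = -eigvals[0]
    keep = np.abs(eigvals + lam) > tol              # eigenvalues different from lambda_1
    coeffs = (Q[:, keep].T @ g) / (eigvals[keep] + lam)
    p = -Q[:, keep] @ coeffs
    z = Q[:, 0]                                     # eigenvector of lambda_1
    alpha = np.sqrt(max(delta**2 - p @ p, 0.0))     # chosen so that ||p + alpha*z|| = delta
    return p + alpha * z
```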

3.2 The dogleg method


The second method that we are going to discuss can be used when H is positive definite. In that case we distinguish two cases from Theorem 1:

1. ∥p^F∥ = ∥−H⁻¹∇f∥ ≤ ∆, in which case we obviously take p* = p^F, where p^F is the full step.

2. ∆ < ∥p^F∥, i.e. the full step is not allowed by the trust region. In this case we use the dogleg method described below.

We start by creating a path consisting of two line segments. The first segment runs from the origin (the current iterate) to the minimizer of m along the steepest descent direction −∇f, called the Cauchy point, which is:

$$p^u = -\frac{\nabla f^T \nabla f}{\nabla f^T H \nabla f}\, \nabla f$$

The second segment runs from p^u to p^F (see figure 3). We formally denote this trajectory by p^d(µ), where:

$$p^d(\mu) = \begin{cases} \mu\, p^u & 0 \le \mu < 1 \\ p^u + (\mu - 1)(p^F - p^u) & 1 \le \mu \le 2 \end{cases}$$

Figure 3: The dogleg method

There is a lemma ensuring that, when the full step is not allowed by the trust region, the trust region boundary ∥p∥ = ∆ intersects this path at exactly one point and nowhere else. The chosen value of p is the point of intersection of the dogleg and the trust region boundary.


To find this point, we solve the following scalar quadratic equation in µ:

$$\|p^u + (\mu - 1)(p^F - p^u)\|^2 = \Delta^2$$

When ∥p^u∥ ≥ ∆ (i.e. the Cauchy step already reaches the boundary of the trust region), the solution is instead given by:

$$p^* = \Delta\, \frac{p^u}{\|p^u\|}$$

Note that when H is an indefinite matrix, we cannot use the dogleg strategy, because p^F is not the unconstrained minimizer of m in this case.
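
Putting the pieces together, a minimal sketch of the dogleg step (the name dogleg_step and the use of NumPy are assumptions) is given below; it can be plugged into the trust_region sketch of Section 2 as solve_subproblem.

```python
import numpy as np

def dogleg_step(g, H, delta):
    """Dogleg step of Section 3.2; assumes H is positive definite."""
    p_full = -np.linalg.solve(H, g)                  # full step p^F
    if np.linalg.norm(p_full) <= delta:
        return p_full                                # case 1: the full step is inside the region
    p_u = -(g @ g) / (g @ H @ g) * g                 # Cauchy point p^u
    if np.linalg.norm(p_u) >= delta:
        return delta * p_u / np.linalg.norm(p_u)     # Cauchy step already reaches the boundary
    # Otherwise, intersect the second segment p^u + t*(p^F - p^u), with t = mu - 1,
    # with the boundary ||p|| = delta: a scalar quadratic equation in t.
    d = p_full - p_u
    a, b, c = d @ d, 2.0 * (p_u @ d), p_u @ p_u - delta**2
    t = (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)
    return p_u + t * d
```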

4 Global convergence
To yield global convergence, the method must iterate towards a critical point, and at each iteration pk must give a sufficient reduction in the model.

4.1 Sufficient reduction


The reduction obtained by pk in the model mk is called the predicted reduction, that is: mk(0) − mk(pk). We say that the predicted reduction is sufficient if:

$$m_k(0) - m_k(p_k) \ge c_1 \|\nabla f(x_k)\| \min\left(\Delta_k,\ \frac{\|\nabla f(x_k)\|}{\|H_k\|}\right) \qquad (5)$$

for some c1 ∈ (0, 1].
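
For a trial step, this condition is straightforward to test numerically; a small sketch (the function name is an assumption, and ∥Hk∥ is taken to be the spectral norm):

```python
import numpy as np

def reduction_is_sufficient(g, H, p, delta, c1=0.5):
    """Check the sufficient-reduction condition (5) for a trial step p."""
    predicted = -(g @ p + 0.5 * p @ H @ p)           # m_k(0) - m_k(p_k)
    g_norm = np.linalg.norm(g)
    H_norm = np.linalg.norm(H, 2)                    # spectral norm of H_k
    return predicted >= c1 * g_norm * min(delta, g_norm / H_norm)
```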

4.1.1 The Cauchy point


The sufficient reduction can be quantified in terms of the Cauchy point, defined by:

$$p_k^u = -\tau_k\, \frac{\Delta_k}{\|\nabla f(x_k)\|}\, \nabla f(x_k)$$

where:

$$\tau_k = \begin{cases} 1 & \text{if } \nabla f(x_k)^T H_k \nabla f(x_k) \le 0 \\[1ex] \min\left(1,\ \dfrac{\|\nabla f(x_k)\|^3}{\Delta_k\, \nabla f(x_k)^T H_k \nabla f(x_k)}\right) & \text{otherwise} \end{cases}$$

It can be shown that the Cauchy point satisfies (5) with c1 = 1/2 (you can check the proof of this in [5]). We conclude from this property that if:

$$m_k(0) - m_k(p_k) \ge c_2\, [\, m_k(0) - m_k(p_k^u)\,]$$

then pk satisfies (5) with c1 = c2/2. In other words, the reduction achieved by our approximate solution pk is sufficient if it is at least some fixed fraction c2 of the reduction achieved by the Cauchy point.
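
A direct transcription of the Cauchy point (the function name is an assumption); by the property above, any step that achieves at least a fixed fraction of this reduction is sufficient:

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Cauchy point of Section 4.1.1; it satisfies condition (5) with c1 = 1/2."""
    g_norm = np.linalg.norm(g)
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0 else min(1.0, g_norm**3 / (delta * gHg))
    return -tau * (delta / g_norm) * g
```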

4.2 Convergence to stationary points


In this part we consider the convergence of the general algorithm, and then the case where the subproblem is solved by the trust region Newton method.

4.2.1 General case

To ensure that Algorithm 1 converges to a stationary point, some conditions must be satisfied. We define the level set S by:

$$S = \{x \mid f(x) \le f(x_0)\}$$

and an open neighborhood of this set by:

$$S(r) = \{x \mid \|x - y\| < r \text{ for some } y \in S\}$$

where r is a positive constant.

Theorem 2

Algorithm 1 converges to stationary points, in the sense that:

$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$$

provided that the following conditions hold:

1. ∥Hk∥ ≤ β for some constant β;

2. f is bounded below on the level set S and Lipschitz continuously differentiable in the neighborhood S(r);

3. pk satisfies (5);

4. ∥pk∥ ≤ γ∆k for some constant γ ≥ 1.

For the proof of this theorem I refer to [5], pp. 80-82.

4.2.2 Algorithm based on Newton's method

When we use Newton's method to solve the subproblem, some additional conditions must be satisfied in order to achieve convergence to critical points. The following theorem describes how this can be achieved:

Theorem 3

Suppose that the assumptions of Theorem 2 are satisfied and, in addition:

1. f is twice continuously differentiable in the level set S (i.e. f ∈ C²(S));

2. Hk = ∇²f(xk).

Then:

$$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$$

I omit the proof, which can be found in Moré and Sorensen [4], Section 4.

5 Numerical example
In this section we apply the trust region method with the dogleg strategy to find the minimum of the Rosenbrock function, defined by:

$$f(x, y) = (a - x)^2 + b(y - x^2)^2$$

where a and b are constants. For a = 1 and b = 100 we have:

$$f(x, y) = (1 - x)^2 + 100(y - x^2)^2$$

so that:

$$\nabla f(x, y) = \begin{pmatrix} -2(1 - x) - 400x(y - x^2) \\ 200(y - x^2) \end{pmatrix}$$

and:

$$\nabla^2 f(x, y) = \begin{pmatrix} 2 - 400(y - 3x^2) & -400x \\ -400x & 200 \end{pmatrix}$$
By implementing the algorithm from x0 = (2, 2) we find the optimal solution x* = (1, 1) with f* = 0. In finance, functions of this kind could appear in portfolio optimization, where the goal is to find the allocation of assets that minimizes risk while maximizing the return on investment: the objective function captures the trade-off between risk and return, and optimization techniques such as trust region methods can be employed to find the optimal portfolio allocation.
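
As a sketch of how this example can be reproduced, the Rosenbrock function, its gradient and its Hessian can be coded directly from the formulas above and handed to an off-the-shelf trust region solver. Here SciPy's 'trust-exact' method (which solves the subproblem nearly exactly, in the spirit of Section 3.1) is used as a stand-in for the implementation described in this report.

```python
import numpy as np
from scipy.optimize import minimize

def f(v):
    x, y = v
    return (1 - x)**2 + 100 * (y - x**2)**2

def grad(v):
    x, y = v
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2),
                     200 * (y - x**2)])

def hess(v):
    x, y = v
    return np.array([[2 - 400 * (y - 3 * x**2), -400 * x],
                     [-400 * x, 200.0]])

x0 = np.array([2.0, 2.0])
res = minimize(f, x0, method='trust-exact', jac=grad, hess=hess)
print(res.x, res.fun)   # the iterates should approach x* = (1, 1) with f* = 0
```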

6 Conclusions
The main purpose of this work has been to discuss the main trust region algorithm used to solve unconstrained minimization problems, with an aim towards understanding how best to implement it in order to achieve global convergence. We introduced the theoretical aspects of the method, gave the necessary and sufficient conditions under which global convergence is achieved, and finally applied the algorithm to the Rosenbrock function.

References
[1] Richard H Byrd, Robert B Schnabel, and Gerald A Shultz. A trust region
algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical
Analysis, 24(5):1152–1170, 1987.

[2] Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. Trust region
methods. SIAM, 2000.

[3] Roger Fletcher. Practical methods of optimization. John Wiley & Sons, 2000.

[4] Jorge J Moré and Danny C Sorensen. Computing a trust region step. SIAM
Journal on scientific and statistical computing, 4(3):553–572, 1983.

[5] Jorge Nocedal and Stephen J Wright. Numerical optimization. Springer, 1999.

[6] Mostafa Rezapour. Trust-Region Methods for Unconstrained Optimization Problems. Washington State University, 2020.
