
Optimization, Autumn 2023

Prof. Dr. Lin Himmelmann

Exercise 9
Task 1
Consider the function f : ℝ² → ℝ, defined by f(x, y) = (x² − 2xy + x)². We would like
to determine a point at which the function f takes on its minimum value. Starting from x₀ =
(x₀, y₀) = (2, 2), calculate the next iteration point x₁ = (x₁, y₁) according to each of the
following methods:

a) Gradient method with successive halving of the step size;

b) Gradient method with successive halving and parabola fitting;

c) Newton’s method;

d) Broyden’s method. Compute also the second iteration point x2 = (x2 , y2 ).

Task 2
Calculate the first two terms we get by applying Aitken's acceleration method to the zero-convergent sequence starting with 100, 10, 2, 1/2, ...

Task 3
The function f : ℝ² → ℝ defined by

    f(x, y) = x⁴ + y⁴

clearly attains its minimum at (0, 0). In this task we investigate the behaviour of Newton's method when determining this minimum, starting from an arbitrary point (x₀, y₀) ≠ (0, 0).

a) Determine the first iteration point (x1 , y1 ) as a function of the starting point (x0 , y0 ).

b) Give a general formula for the n-th iteration point (xn , yn ).

c) What is the convergence speed in this example? Are you surprised?

d) Apply Aitken's acceleration method to the sequence found in b). (Use the starting point (x₀, y₀) = (1, 1) for simplicity.) What do you find? Can you explain your result?


Task 4
For f : ℝ² → ℝ defined by

    f(x, y) = x⁴ + y⁴

as in Task 3, perform two steps of Broyden's method starting from (x₀, y₀) = (1, 1). Determine the matrix (A₁)⁻¹ used as an approximation for the inverse of the Hessian matrix in the second iteration step, and compare it to the exact inverse of the Hessian matrix that would be applicable.

Task 5∗
Study the implementation variants of Newton's and Broyden's methods available from the homepage (exact derivatives and approximate derivatives), and try them out on the Bazarra-Shetty function and other examples.

Task 6∗
Study the implementation of Aitken’s acceleration method available from the homepage, and
use it to improve the convergence of the programs from Task 5 and from Task 4 of Exercise 8.


Solutions to Exercise 9
Solution to Task 1
As a preliminary, we calculate the partial derivatives of the function f:

    f(x, y)   = (x² − 2xy + x)²

    fx(x, y)  = 2(x² − 2xy + x)(2x − 2y + 1)
    fy(x, y)  = 2(x² − 2xy + x)(−2x) = −4x(x² − 2xy + x)
    fxx(x, y) = 2(2x − 2y + 1)² + 4(x² − 2xy + x)
    fxy(x, y) = −12x² − 8x + 16xy
    fyx(x, y) = −12x² − 8x + 16xy
    fyy(x, y) = 8x²

Thus the gradient at the point (x, y) is

    ∇f(x, y) = ( fx(x, y), fy(x, y) )ᵀ = ( 2(x² − 2xy + x)(2x − 2y + 1), −4x(x² − 2xy + x) )ᵀ,

and the Hessian matrix at the point (x, y) is

    Hf(x, y) = [ fxx(x, y)   fxy(x, y) ]   [ 2(2x − 2y + 1)² + 4(x² − 2xy + x)   −12x² − 8x + 16xy ]
               [ fyx(x, y)   fyy(x, y) ] = [ −12x² − 8x + 16xy                    8x²              ].
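These hand-computed derivatives can be cross-checked symbolically. A minimal Python sketch (assuming the sympy package is available; all names are ours) that prints 0 for each difference:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = (x**2 - 2*x*y + x)**2
    print(sp.simplify(sp.diff(f, x) - 2*(x**2 - 2*x*y + x)*(2*x - 2*y + 1)))             # 0
    print(sp.simplify(sp.diff(f, y) + 4*x*(x**2 - 2*x*y + x)))                           # 0
    print(sp.simplify(sp.diff(f, x, x) - (2*(2*x - 2*y + 1)**2 + 4*(x**2 - 2*x*y + x)))) # 0
    print(sp.simplify(sp.diff(f, x, y) - (-12*x**2 - 8*x + 16*x*y)))                     # 0
    print(sp.simplify(sp.diff(f, y, y) - 8*x**2))                                        # 0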

a) At the point (x₀, y₀) = (2, 2), we have

    f(2, 2) = 4,    ∇f(2, 2) = (−4, 16)ᵀ.

While doing the successive halving, we encounter the following values:

    β       (x₁, y₁) = (x₀, y₀) − β·∇f(x₀, y₀)           f(x₁, y₁)
    1       (2, 2) − 1·(−4, 16)     = (6, −14)           44100
    1/2     (2, 2) − 1/2·(−4, 16)   = (4, −6)            4624
    1/4     (2, 2) − 1/4·(−4, 16)   = (3, −2)            576
    1/8     (2, 2) − 1/8·(−4, 16)   = (2.5, 0)           76.5625
    1/16    (2, 2) − 1/16·(−4, 16)  = (2.25, 1)          7.910156
    1/32    (2, 2) − 1/32·(−4, 16)  = (2.125, 1.5)       0.07056 < 4   done!


The next iteration point therefore is x₁ = (x₁, y₁) = (2.125, 1.5) (when only using successive halving).
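A minimal Python sketch of this successive-halving step (plain floats, all names are ours):

    def f(x, y):
        return (x**2 - 2*x*y + x)**2

    def grad(x, y):
        inner = x**2 - 2*x*y + x
        return (2*inner*(2*x - 2*y + 1), -4*x*inner)

    x0, y0 = 2.0, 2.0
    gx, gy = grad(x0, y0)                      # (-4.0, 16.0)
    beta = 1.0
    while f(x0 - beta*gx, y0 - beta*gy) >= f(x0, y0):
        beta /= 2                              # successive halving of the step size
    print(beta, x0 - beta*gx, y0 - beta*gy)    # 0.03125 2.125 1.5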
b) Using the results from a), we first fit a parabola P(t) = at² + bt + c through the following three sample points. (Recall from a) that after the successive halving phase we have β = 1/32.)

    t = 0:     P(0)    = f( (2, 2) − 0·(−4, 16) )    = f(2, 2)        = 4
    t = 1/32:  P(1/32) = f( (2, 2) − 1/32·(−4, 16) ) = f(2.125, 1.5)  = 0.07056
    t = 1/16:  P(1/16) = f( (2, 2) − 1/16·(−4, 16) ) = f(2.25, 1)     = 7.910156

The vertex of the parabola is therefore at β* = −b/(2a) = 0.02602457. Therefore, we consider

    (x₁, y₁) = (2, 2) − 0.02602457·(−4, 16) = (2.104098, 1.583607)

as our next iteration point. But before actually choosing it, we need to check the resulting function value: for β = 1/32 = 0.03125 we got the function value f(2.125, 1.5) = 0.07056 (see a)), for β* = 0.02602457 we get the function value f(2.104098, 1.583607) = 0.017636, which is better. The next iteration point therefore is x₁ = (x₁, y₁) = (2.104098, 1.583607).
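The vertex β* can be obtained from the generic three-point interpolation formula. A small Python sketch (names are ours); the last digits depend on how precisely the sampled function values are carried, so tiny deviations from the value above are expected:

    def parabola_vertex(t0, p0, t1, p1, t2, p2):
        # vertex -b/(2a) of the parabola a*t**2 + b*t + c through the three points
        num = (t1**2 - t2**2)*p0 + (t2**2 - t0**2)*p1 + (t0**2 - t1**2)*p2
        den = 2*((t1 - t2)*p0 + (t2 - t0)*p1 + (t0 - t1)*p2)
        return num / den

    beta_star = parabola_vertex(0.0, 4.0, 1/32, 0.070557, 1/16, 7.910156)
    print(beta_star)                           # roughly 0.026
    print(2 + 4*beta_star, 2 - 16*beta_star)   # roughly (2.104, 1.583)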

c) We calculate the gradient and the Hessian matrix at the iteration point (x₀, y₀) = (2, 2):

    ∇f(x₀, y₀) = (−4, 16)ᵀ,

    Hf(x₀, y₀) = [ 2(4 − 4 + 1)² + 4(4 − 8 + 2)   −48 − 16 + 64 ]   [ −6    0 ]
                 [ −48 − 16 + 64                   32            ] = [  0   32 ].

We obtain

    (x₁, y₁) = (x₀, y₀) − (Hf(x₀, y₀))⁻¹ ∇f(x₀, y₀)
             = (2, 2) − [ −1/6    0   ] (−4, 16)ᵀ
                        [   0    1/32 ]
             = (2, 2) − (2/3, 1/2)
             = (4/3, 3/2).


Therefore the iteration point x1 in Newton’s method is given by x1 = (x1 , y1 ) = (1.3̄, 1.5).
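A quick numerical cross-check of this Newton step (assuming numpy is available; names are ours):

    import numpy as np

    def grad(x, y):
        inner = x**2 - 2*x*y + x
        return np.array([2*inner*(2*x - 2*y + 1), -4*x*inner])

    def hess(x, y):
        return np.array([[2*(2*x - 2*y + 1)**2 + 4*(x**2 - 2*x*y + x), -12*x**2 - 8*x + 16*x*y],
                         [-12*x**2 - 8*x + 16*x*y,                      8*x**2]])

    x0 = np.array([2.0, 2.0])
    x1 = x0 - np.linalg.solve(hess(*x0), grad(*x0))   # Newton step
    print(x1)                                         # [1.3333... 1.5]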

d) The first iteration step (and hence also the iteration point x₁) in Broyden's method is the same as in Newton's method. We therefore use the results from c):

    (x₀, y₀) = (2, 2),
    ∇f(x₀, y₀) = (−4, 16)ᵀ,
    (A₀)⁻¹ = (Hf(x₀, y₀))⁻¹ = [ −1/6    0   ]
                              [   0    1/32 ],
    (x₁, y₁) = (1.3̄, 1.5).

We now calculate the gradient and an approximation (A₁)⁻¹ for the inverse of the Hessian matrix at the iteration point x₁ = (x₁, y₁) = (1.3̄, 1.5):

    ∇f(x₁, y₁) = (−1.185185, 4.740741)ᵀ,
    d¹ = x₁ − x₀ = (−2/3, −1/2)ᵀ,
    g¹ = ∇f(x₁, y₁) − ∇f(x₀, y₀) = (2.814815, −11.25926)ᵀ,

    (A₁)⁻¹ = (A₀)⁻¹ − ( (A₀)⁻¹ g¹ − d¹ ) (d¹)ᵀ (A₀)⁻¹ / ( (d¹)ᵀ (A₀)⁻¹ g¹ )
           = [ −0.211579   0.006316 ]
             [ −0.033684   0.035987 ].

We obtain

    (x₂, y₂) = (x₁, y₁) − (A₁)⁻¹ ∇f(x₁, y₁)
             = (4/3, 3/2) − [ −0.211579   0.006316 ] (−1.185185, 4.740741)ᵀ
                            [ −0.033684   0.035987 ]
             = (1.053, 1.289),

i.e. the next iteration point in Broyden's method is x₂ = (x₂, y₂) = (1.053, 1.289).

For comparison: the exact inverse Hessian at (x₁, y₁) = (1.3̄, 1.5) is

    (Hf(x₁, y₁))⁻¹ = [ −0.375    0         ]
                     [   0       0.0703125 ],

so the approximation by (A₁)⁻¹ is not very good here.
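The whole update can be verified numerically. A short Python sketch (assuming numpy; names are ours) that reproduces (A₁)⁻¹ and x₂:

    import numpy as np

    def grad(x, y):
        inner = x**2 - 2*x*y + x
        return np.array([2*inner*(2*x - 2*y + 1), -4*x*inner])

    x0 = np.array([2.0, 2.0])
    A0_inv = np.array([[-1/6, 0.0], [0.0, 1/32]])   # exact inverse Hessian at x0
    x1 = x0 - A0_inv @ grad(*x0)                    # first step = Newton step
    d1 = x1 - x0
    g1 = grad(*x1) - grad(*x0)
    A1_inv = A0_inv - np.outer(A0_inv @ g1 - d1, d1 @ A0_inv) / (d1 @ A0_inv @ g1)
    x2 = x1 - A1_inv @ grad(*x1)
    print(A1_inv)   # approx. [[-0.2116  0.0063] [-0.0337  0.0360]]
    print(x2)       # approx. [1.0526  1.2895]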

Solution to Task 2
Applying Aitken's ∆² formula to the triples (100, 10, 2) and (10, 2, 1/2) yields a₂ = 1.2195122 and a₃ = 0.1538462.
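For reference, a minimal Python sketch of the ∆² formula used here (our own indexing):

    def aitken(a, i):
        # one Aitken Delta^2 step from the three terms a[i], a[i+1], a[i+2]
        d1 = a[i+1] - a[i]
        d2 = a[i+2] - 2*a[i+1] + a[i]
        return a[i] - d1**2 / d2

    a = [100, 10, 2, 1/2]
    print(aitken(a, 0))   # 1.2195121...
    print(aitken(a, 1))   # 0.1538461...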


Solution to Task 3

a) The gradient and the Hessian matrix of f are

    ∇f(x, y) = ( 4x³, 4y³ )ᵀ,    Hf(x, y) = [ 12x²    0   ]
                                            [  0    12y² ],

so

    (x₁, y₁) = (x₀, y₀) − Hf(x₀, y₀)⁻¹ ∇f(x₀, y₀)
             = (x₀, y₀) − ( 4(x₀)³/(12(x₀)²), 4(y₀)³/(12(y₀)²) )
             = (x₀, y₀) − (x₀/3, y₀/3) = (2/3)·(x₀, y₀).

b) (xₙ, yₙ) = (2/3)^n · (x₀, y₀).
c) The convergence speed is only linear, despite the fact that Newton's method "usually" converges quadratically. The general statements about convergence speeds discussed in the lecture do not apply in this example because at (0, 0) not only the gradient (= first derivative), but also the Hessian matrix (= second derivative) vanishes. Note that this happens in particular when trying to find a minimum that is a multiple root of a (univariate) polynomial with Newton's method, like e.g. the unique minimum at x = 2 in f₁(x) = (x − 2)² or in f₂(x) = (x − 2)⁵(x² + x + 1).

d) We only consider the first component xᵢ; the second component yᵢ behaves identically. We have

    ∆xᵢ = (2/3)^i − (2/3)^(i−1) = −(1/2)·(2/3)^i

and therefore

    ∆²xᵢ = −(1/2)·(2/3)^i − ( −(1/2)·(2/3)^(i−1) ) = (1/4)·(2/3)^i.

The terms of the Aitken sequence therefore evaluate to

    (2/3)^i − ( −(1/2)·(2/3)^i )² / ( (1/4)·(2/3)^i ) = (2/3)^i − (2/3)^i = 0

for any i. As the sequence xᵢ converges exactly "linearly" towards 0, the Aitken convergence improvement works perfectly here and correctly "predicts" 0 as the goal of the convergence already after the first step.
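A small Python sketch (names are ours) illustrating parts b) to d): the Newton iterates shrink by the factor 2/3 per step, and Aitken extrapolation of any three consecutive iterates returns 0 up to rounding:

    x = 1.0                                   # first component, x0 = 1
    seq = [x]
    for _ in range(5):
        x -= (4*x**3) / (12*x**2)             # Newton step: x - f'(x)/f''(x) = (2/3)*x
        seq.append(x)
    print(seq)                                # 1, 2/3, 4/9, 8/27, ...  (linear convergence)
    for i in range(len(seq) - 2):
        d1 = seq[i+1] - seq[i]
        d2 = seq[i+2] - 2*seq[i+1] + seq[i]
        print(seq[i] - d1**2 / d2)            # 0.0 (up to rounding) for every i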

Solution to Task 4
According to Task 9.3 we have (x₁, y₁) = (2/3, 2/3), and the Hessian matrix at this point is

    Hf(2/3, 2/3) = [ 12·(2/3)²      0      ]   [ 16/3    0   ]
                   [   0        12·(2/3)²  ] = [   0   16/3 ].

The matrix (A₁)⁻¹ we need to compute should therefore be understood as an approximation for

    ( Hf(2/3, 2/3) )⁻¹ = [ 3/16    0   ]
                         [   0   3/16 ].


Further we have

    (A₀)⁻¹ = (Hf(x₀, y₀))⁻¹ = [ 1/12    0   ]
                              [   0   1/12 ],

    d¹ = (x₁, y₁)ᵀ − (x₀, y₀)ᵀ = (2/3, 2/3)ᵀ − (1, 1)ᵀ = (−1/3, −1/3)ᵀ,

and with ∇f(x₁, y₁) = ∇f(2/3, 2/3) = ( 4·(2/3)³, 4·(2/3)³ )ᵀ = ( 32/27, 32/27 )ᵀ we get

    g¹ = ∇f(x₁, y₁) − ∇f(x₀, y₀) = ( 32/27 − 4, 32/27 − 4 )ᵀ = ( −76/27, −76/27 )ᵀ.

Our approximation for the inverted Hessian matrix therefore is

    (A₁)⁻¹ = (A₀)⁻¹ − ( (A₀)⁻¹ g¹ − d¹ ) (d¹)ᵀ (A₀)⁻¹ / ( (d¹)ᵀ (A₀)⁻¹ g¹ ).

Here

    (A₀)⁻¹ g¹ − d¹  = ( −76/(27·12) + 1/3, −76/(27·12) + 1/3 )ᵀ = ( 8/81, 8/81 )ᵀ,
    (d¹)ᵀ (A₀)⁻¹    = ( −1/36, −1/36 ),
    (d¹)ᵀ (A₀)⁻¹ g¹ = 2 · (1/3) · (1/12) · (76/27) = 38/243,

so

    (A₁)⁻¹ = [ 1/12    0   ]  +  ( (8/81)·(1/36) / (38/243) ) · [ 1   1 ]
             [   0   1/12 ]                                     [ 1   1 ]

           = [ 1/12    0   ]  +  (1/57) · [ 1   1 ]
             [   0   1/12 ]               [ 1   1 ]

           = [ 23/228    1/57  ]  ≈  [ 0.1009   0.0175 ]
             [  1/57   23/228  ]     [ 0.0175   0.1009 ].
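A numerical cross-check of this computation (assuming numpy; names are ours):

    import numpy as np

    def grad(x, y):
        return np.array([4*x**3, 4*y**3])

    x0 = np.array([1.0, 1.0])
    A0_inv = np.diag([1/12, 1/12])            # exact inverse Hessian at (1, 1)
    x1 = x0 - A0_inv @ grad(*x0)              # (2/3, 2/3)
    d1 = x1 - x0
    g1 = grad(*x1) - grad(*x0)
    A1_inv = A0_inv - np.outer(A0_inv @ g1 - d1, d1 @ A0_inv) / (d1 @ A0_inv @ g1)
    print(A1_inv)                             # approx. [[0.1009  0.0175] [0.0175  0.1009]]
    print(np.diag([3/16, 3/16]))              # exact inverse Hessian at (2/3, 2/3), for comparison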

Solution to Task 5 and Task 6


See R and/or Matlab files on Moodle.
