
Steepest Descent Method for Unconstrained Optimization Problems

A project report
submitted by

Akshay Kumar Ranwa


(IVR No: 201700027139)

under the supervision of


Dr.
Contents

1 Introduction
2 General Idea
  2.1 Method of Steepest Descent
    2.1.1 Step 1
    2.1.2 Step 2
  2.2 Examples
3 Convergence Theory
  3.1 Quadratic Case
    3.1.1 Convergence rate for Y_j
    3.1.2 Relative decrease in F
    3.1.3 Kantorovich inequality
4 Scaling
5 Extensions
6 Applications
  6.1 Application 1
  6.2 Application 2

Chapter 1

Introduction

Analytic methods may not always work, because of the complexity of the problem or because the problem is not convex; to solve such problems we have numerical techniques. An optimization problem is the task of selecting the best answer from a set of alternatives. Optimization problems divide basically into two types: constrained optimization problems and unconstrained optimization problems. The only difference between them is that in both types we minimize or maximize a function, but in constrained optimization the minimization or maximization is subject to constraints, i.e. restrictions on the variables.
There are various unconstrained methods for solving minimization or maximization problems. They divide basically into two classes: direct search methods and descent (gradient) methods. Direct search methods use no derivatives, so they are also called zeroth-order methods.
For nonlinear optimization there are many gradient methods, and the steepest descent method is the simplest of them. The steepest descent method is not as old as Newton's method: Cauchy (1789–1857) developed it in the nineteenth century (1847), about two centuries after Newton's method. It is much less complicated than Newton's method: the steepest descent algorithm uses only the first derivative of the function, it does not necessitate the computation of second derivatives, no system of linear equations has to be solved to find the search direction, and no matrix storage is needed. As a result, it is cheaper than Newton's method in terms of the cost of every iteration.

The steepest descent method has a slower rate of convergence than Newton's method; that is the negative side of this method. The steepest descent method converges linearly to its minimum. The steepest descent method is also used in solving a system of nonlinear equations of the form

g(y_1, y_2, ..., y_n) = 0,

where g is a real-valued differentiable function with continuous first partial derivatives; the method follows the negative of the gradient.
Descent property: for a function g, g(x_{k+1}) < g(x_k) for all k, i.e. as we proceed, the value of the objective function decreases.

Chapter 2

General Idea

2.1 Method of Steepest Descent

Suppose that we want to determine the minimum of a function g(y), y ∈ R^n. Choose an initial point Y_1 and proceed as follows.

2.1.1 Step 1

Calculate the best search direction s_j. For steepest descent the search direction is given by

s_j = -∇g_j.

The function value decreases at the quickest rate if we walk along the negative gradient path from any point in n-dimensional space.

2.1.2 Step 2

Select a step length in the search direction that reduces g(y): select the ideal step length λ_j in the direction s_j and set

y_{j+1} = y_j + λ_j s_j.

Begin from the point Y_1 and work your way down the steepest downhill directions until you reach the optimal point.

How to pick λ: the task of λ is to minimize the function along the search direction. It is chosen either analytically or numerically.
Analytic method: in this method s is fixed and we have to pick λ. Write

δy_j = λ_j s_j,

g(y_{j+1}) = g(y_j + λ_j s_j).

By Taylor's expansion,

g(y_j + λ_j s_j) = g(y_j) + ∇g(y_j)^T (δy_j) + (1/2) (δy_j)^T H(y_j) (δy_j).

Thus, setting

d g(y_j + λ_j s_j) / dλ_j = 0

gives

λ_j = -(∇g_j^T s_j) / (s_j^T H_j s_j),

and since s_j = -∇g_j,

λ_j = (s_j^T s_j) / (s_j^T H_j s_j).

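The two steps above translate directly into code. Below is a minimal Python sketch of the method for a quadratic objective, using the exact step length λ_j = (s_j^T s_j)/(s_j^T H s_j) derived above; the function and variable names are illustrative, not part of the report.

```python
import numpy as np

def steepest_descent(grad, H, y0, tol=1e-8, max_iter=100):
    """Steepest descent with the exact (quadratic) step length
    lambda_j = (s^T s) / (s^T H s), where s = -grad(y)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        s = -grad(y)                    # steepest descent direction
        if np.linalg.norm(s) < tol:     # gradient ~ 0  =>  optimal point
            break
        lam = (s @ s) / (s @ H @ s)     # exact step length for a quadratic g
        y = y + lam * s
    return y
```

For a general (non-quadratic) g, the same loop applies with H replaced by the Hessian at y_j, or with λ_j found by a one-dimensional numerical line search.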
2.2 Examples

Example 1: Find the minimum of

g(y) = (y_1 - 7)^2 + (y_2 - 2)^2.

Choose the initial point Y_1 = (5.5, 3)^T.


Iteration 1:
Step 1: Search for the best direction s_j. For the steepest descent method s_j = -∇g_j:

s_j = -[∂g/∂y_1; ∂g/∂y_2] = -[2(y_1 - 7); 2(y_2 - 2)],

s_1 = -[-3; 2] = [3; -2].

Step 2: Step length in the direction of the search direction:

λ_j = (s_j^T s_j) / (s_j^T H s_j),

where H is the Hessian matrix (a positive definite matrix):

H = [∂²g/∂y_1², ∂²g/(∂y_1 ∂y_2); ∂²g/(∂y_2 ∂y_1), ∂²g/∂y_2²] = [2, 0; 0, 2],

λ_1 = ([3, -2] [3; -2]) / ([3, -2] [2, 0; 0, 2] [3; -2]) = 13/26 = 1/2.

Now,

Y_{j+1} = Y_j + λ_j s_j:

Y_2 = Y_1 + λ_1 s_1 = [7; 2].

Check whether the point Y_2 is optimal or not:

(∇g)_{Y_2} = 0,

so the optimal point is Y_2.

Example 2: Minimize

g(y_1, y_2) = y_1 - y_2 + 2(y_1)^2 + 2 y_1 y_2 + (y_2)^2.

Choose the initial point Y_1 = (0, 0)^T.


Iteration 1: At Y_1.
Step 1: Search for the best direction s_j. For the steepest descent method,

s_j = -∇g_j = -[∂g/∂y_1; ∂g/∂y_2] = -[1 + 4y_1 + 2y_2; -1 + 2y_1 + 2y_2],

s_1 = [-1; 1].

Step 2: Step length in the direction of s_1:

λ_j = (s_j^T s_j) / (s_j^T H s_j),

where H is the Hessian matrix (a positive definite matrix):

H = [∂²g/∂y_1², ∂²g/(∂y_1 ∂y_2); ∂²g/(∂y_2 ∂y_1), ∂²g/∂y_2²] = [4, 2; 2, 2],

λ_1 = 1.

Now,

Y_{j+1} = Y_j + λ_j s_j:

Y_2 = Y_1 + λ_1 s_1 = [-1; 1].

Check whether the point Y_2 is optimal or not:

(∇g)_{Y_2} = [-1; -1] ≠ [0; 0].

Y_2 is not an optimal point, so move on to the next iteration.

Iteration 2: At Y_2.
Step 1: Search for the best direction s_2:

s_2 = -∇g_2 = [1; 1].

Step 2: Step length in the direction of s_2:

λ_2 = (s_2^T s_2) / (s_2^T H s_2),   where H = [4, 2; 2, 2],

λ_2 = 1/5.

Now,

Y_3 = Y_2 + λ_2 s_2 = [-1; 1] + (1/5)[1; 1] = [-0.8; 1.2].

Check whether the point Y_3 is optimal or not:

(∇g)_{Y_3} = [0.2; -0.2] ≠ [0; 0].

Y_3 is not an optimal point, so move on to the next iteration.

Iteration 3: At Y_3.
Step 1: Search for the best direction s_3. For the steepest descent method,

s_3 = -∇g_3 = [-0.2; 0.2].

Step 2: Step length in the direction of s_3:

λ_3 = (s_3^T s_3) / (s_3^T H s_3),   where H = [4, 2; 2, 2],

λ_3 = 1.

Now,

Y_4 = Y_3 + λ_3 s_3 = [-0.8; 1.2] + [-0.2; 0.2] = [-1.0; 1.4].

Check whether the point Y_4 is optimal or not:

(∇g)_{Y_4} = [-0.2; -0.2] ≠ [0; 0].

Y_4 is not an optimal point, so move on to the next iteration.

Iteration 4: At Y_4.
Step 1: Best direction s_4:

s_4 = -∇g_4 = [0.2; 0.2].

Step 2: Step length in the direction of s_4, with H = [4, 2; 2, 2]:

λ_4 = (s_4^T s_4) / (s_4^T H s_4) = 1/5.

Now,

Y_5 = Y_4 + λ_4 s_4 = [-1.0; 1.4] + (1/5)[0.2; 0.2] = [-0.96; 1.44].

Check whether the point Y_5 is optimal or not:

(∇g)_{Y_5} = [0.04; -0.04] ≈ [0; 0].

Y_5 is (approximately) the optimum; the exact minimizer, from ∇g = 0, is Y* = (-1, 1.5)^T.

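For a quick numerical check of these iterations, the sketch from Section 2.1 can be applied to this g (illustrative code, not part of the report):

```python
import numpy as np

# Example 2: g(y) = y1 - y2 + 2*y1**2 + 2*y1*y2 + y2**2
H = np.array([[4.0, 2.0], [2.0, 2.0]])             # Hessian of g
grad = lambda y: np.array([1 + 4*y[0] + 2*y[1],
                           -1 + 2*y[0] + 2*y[1]])

y = np.zeros(2)                                    # Y_1 = (0, 0)
for j in range(5):
    s = -grad(y)
    lam = (s @ s) / (s @ H @ s)
    y = y + lam * s
    print(j + 1, y)   # (-1, 1), (-0.8, 1.2), (-1, 1.4), (-0.96, 1.44), ...
```

The iterates zig-zag toward the true minimizer (-1, 1.5), matching the hand computation above.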
Chapter 3

Convergence Theory

The SDM has a nice convergence theory, which is one of its key advantages. It is not difficult to demonstrate that the rate of convergence of the SDM is linear, which is unsurprising given the method's simplicity; regrettably, even for modestly nonlinear problems, the convergence is then too slow for many practical applications. The theory of the SD approach's convergence is nonetheless important for understanding the behaviour of the method.
Let us see how the SDM converges to its minimum in the quadratic case. This particular case is critical because even a function that is not quadratic behaves quadratically around the optimal point; hence it is essential to investigate the behaviour of the method on quadratic functions.

3.1 Quadratic Case

To start, consider the problem of minimizing the quadratic function

g(Y) = (1/2) Y^T H Y - d^T Y,

where d ∈ R^n and H is an n × n symmetric positive definite matrix. All of the eigenvalues of H are real and positive because H is symmetric and positive definite. Let the eigenvalues of H be e_1, e_2, e_3, ..., e_n, where e_1 is the smallest eigenvalue and e_n is the largest eigenvalue of H. We know that the gradient of the given quadratic function g is

s(Y) = HY - d,

and setting the gradient to zero gives the optimal point Y*:

Y* = H^{-1} d.

Since all the eigenvalues of H are positive and real, the determinant of H is nonzero, so H^{-1} exists. The method of steepest descent is then represented as

Y_{j+1} = Y_j - λ_j s_j,

where s_j = HY_j - d and λ_j is the step length in the direction of s_j such that λ_j minimizes g(Y_j - λ s_j). We can determine the value of λ_j:

g(Y_j - λ s_j) = (1/2)(Y_j - λ s_j)^T H (Y_j - λ s_j) - (Y_j - λ s_j)^T d,

which is minimized at λ_j; differentiating with respect to λ gives

λ_j = (s_j^T s_j) / (s_j^T H s_j).

Hence the method of steepest descent reduces to the explicit form

Y_{j+1} = Y_j - ((s_j^T s_j) / (s_j^T H s_j)) s_j,

where s_j = HY_j - d.

3.1.1 Convergence rate for Y_j

It is easy to examine convergence by considering the quantity g(Y_j) - g(Y*), where Y* is the point where the function takes its global minimum value. We start by introducing the new function

F(Y) = (1/2) (Y - Y*)^T H (Y - Y*).

The only difference between F(Y) and g(Y) is the constant term (1/2)(Y*)^T H (Y*):

F(Y) = g(Y) + (1/2) (Y*)^T H (Y*).

Further we work with F(Y), because minimizing g(Y) is the same as minimizing F(Y). The unique minimizer Y* is given by the linear system

HY* = d.

3.1.2 Relative decrease in F

Define X_j = Y_j - Y*; then s_j = HY_j - d = HX_j. Using the iteration

Y_{j+1} = Y_j - ((s_j^T s_j) / (s_j^T H s_j)) s_j,

we have

(F(Y_j) - F(Y_{j+1})) / F(Y_j)
  = [(Y_j - Y*)^T H (Y_j - Y*) - (Y_{j+1} - Y*)^T H (Y_{j+1} - Y*)] / (X_j^T H X_j)
  = (2 λ_j s_j^T s_j - λ_j² s_j^T H s_j) / (X_j^T H X_j).

Substituting λ_j = (s_j^T s_j) / (s_j^T H s_j), we get

(F(Y_j) - F(Y_{j+1})) / F(Y_j) = (s_j^T s_j)² / [(s_j^T H s_j)(s_j^T H^{-1} s_j)],

using X_j^T H X_j = s_j^T H^{-1} s_j. We require a bound on the right-hand side of this equation to obtain a bound on the rate of convergence. The Kantorovich inequality, described below, gives the required bound and is a valuable general tool in convergence analysis.

3.1.3 Kantorovich inequality

Let H be an n × n symmetric positive definite matrix and let e_1 and e_n be the smallest and largest eigenvalues of H respectively. Then for any X ≠ 0,

(X^T X)² / [(X^T H X)(X^T H^{-1} X)] ≥ 4 e_1 e_n / (e_1 + e_n)².

Using this inequality,

(F(Y_j) - F(Y_{j+1})) / F(Y_j) ≥ 4 e_1 e_n / (e_1 + e_n)².

Therefore,

F(Y_{j+1}) ≤ [(e_n - e_1) / (e_n + e_1)]² F(Y_j).

• The SDM converges linearly, and the worst-case convergence rate is [(e_n - e_1)/(e_n + e_1)]².

• The rate of convergence of the SDM also depends on y*.

• The condition number of H is given by the formula r = e_n / e_1.

• Clearly the rate of convergence of the SDM depends on the condition number of the Hessian.

• If r = 1, the contours are circular and we obtain the optimum in a single iteration.

• If r > 1, the contours are elliptical, and the larger r is, the slower the convergence.

Example: In Example 1 the SDM with exact line search reached the optimum in a single iteration from any initial point; there the Hessian is 2I, so r = 1 and the contours are circular. In Example 2 the SDM with exact line search takes many iterations to converge; there the Hessian [4, 2; 2, 2] has eigenvalues 3 ± √5, so r ≈ 6.85.
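As an illustrative check, the Kantorovich bound can be computed for the Hessian of Example 2 and read as a guaranteed per-step shrink factor for F (code and names are ours, not from the report):

```python
import numpy as np

H = np.array([[4.0, 2.0], [2.0, 2.0]])   # Hessian from Example 2
e = np.linalg.eigvalsh(H)                # eigenvalues 3 -/+ sqrt(5), ascending
r = e[-1] / e[0]                         # condition number, ~6.854
bound = ((e[-1] - e[0]) / (e[-1] + e[0]))**2
print(r, bound)                          # bound ~0.556: F(Y_{j+1}) <= 0.556 F(Y_j)
```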

Chapter 4

Scaling

Even for a quadratic function, the SD approach's rate of convergence is at best linear. The SD method's rate of convergence can be enhanced by scaling the design variables. Scaling may reduce the condition number of the Hessian of the function. For a quadratic function it is possible to scale the design variables so that the Hessian matrix's condition number with respect to the new design variables is unity. A matrix's condition number is defined as the ratio of the matrix's largest to smallest eigenvalues.
An example will be used to highlight the benefits of scaling the design variables. If g = (1/2) Y^T [B] Y denotes the quadratic case, consider a transformation of the form

Y = [S]Z,   i.e.   [y_1; y_2] = [s_11, s_12; s_21, s_22] [z_1; z_2].

Using this transformation the new quadratic term is

(1/2) Z^T [B̃] Z = (1/2) Z^T [S]^T [B] [S] Z.

The matrix [S] can be selected to make [B̃] = [S]^T [B] [S] diagonal (i.e., the mixed quadratic terms are eliminated); for this, the eigenvectors of the matrix [B] are taken as the columns of [S]. After that, the diagonal elements of [B̃] can be reduced to one (so that the condition number of the resulting matrix is 1) using the transformation

Z = [C]D,   i.e.   [z_1; z_2] = [c_11, 0; 0, c_22] [d_1; d_2],

where

[C] = [1/√b̃_11, 0; 0, 1/√b̃_22].

As a consequence, the complete transformation that converts g's Hessian matrix into an identity matrix is

Y = [S][C]D ≡ [T]D,

so that the quadratic term (1/2) Y^T [B] Y reduces to (1/2) D^T [I] D.


More generally, if

g(Y) = c + A^T Y + (1/2) Y^T [B] Y,

where c = g(Y_i), A is the gradient of g at Y_i, and [B] is the Hessian matrix

[B] = [∂²g/∂y_1², ..., ∂²g/(∂y_1 ∂y_n); ...; ∂²g/(∂y_n ∂y_1), ..., ∂²g/∂y_n²]

evaluated at Y_i, the same construction applies.

EXAMPLE 4.1:

g(y_1, y_2) = 6y_1² - 6y_1 y_2 + 2y_2² - y_1 - 2y_2.

SOLUTION: The quadratic function is written as

g(Y) = A^T Y + (1/2) Y^T [B] Y,

where

Y = [y_1; y_2],   A = [-1; -2],   and   [B] = [12, -6; -6, 4].

As previously stated, the necessary variable scaling may be performed in two phases.

Stage 1: Reduce [B] to a diagonal form. The eigenvectors of the matrix [B] are calculated by solving the eigenvalue problem

[[B] - β_i [I]] w_i = 0,

where β_i is the i-th eigenvalue and w_i is the corresponding eigenvector. In the present case the eigenvalues β_i are given by

det[12 - β_i, -6; -6, 4 - β_i] = β_i² - 16β_i + 12 = 0,

which yields β_1 = 8 + √52 = 15.2111 and β_2 = 8 - √52 = 0.7889. The eigenvector w_i corresponding to β_i can be found by solving this equation:

[12 - β_1, -6; -6, 4 - β_1][w_11; w_21] = [0; 0],   or   (12 - β_1) w_11 - 6 w_21 = 0,

i.e. w_21 = -0.5352 w_11, that is,

w_1 = [w_11; w_21] = [1.0; -0.5352],

and

[12 - β_2, -6; -6, 4 - β_2][w_12; w_22] = [0; 0],   or   (12 - β_2) w_12 - 6 w_22 = 0,

i.e. w_22 = 1.8685 w_12, that is,

w_2 = [w_12; w_22] = [1.0; 1.8685].

As a result, the transformation that converts [B] to a diagonal form is

Y = [S]Z = [w_1, w_2] Z = [1, 1; -0.5352, 1.8685] [z_1; z_2],

that is,

y_1 = z_1 + z_2,
y_2 = -0.5352 z_1 + 1.8685 z_2.

As a result, the new quadratic term is (1/2) Z^T [B̃] Z, where

[B̃] = [S]^T [B] [S] = [19.5682, 0.0; 0.0, 3.5432],

and the quadratic function becomes

g(z_1, z_2) = A^T [S]Z + (1/2) Z^T [B̃] Z
            = 0.0704 z_1 - 4.7370 z_2 + (1/2)(19.5682) z_1² + (1/2)(3.5432) z_2².

Stage 2: Reduce [B̃] to a unit matrix. The transformation is Z = [C]D, where

[C] = [1/√19.5682, 0; 0, 1/√3.5432] = [0.2262, 0.0; 0.0, 0.5313].

Stage 3: The total transformation is given by

Y = [S]Z = [S][C]D = [T]D,

where

[T] = [S][C] = [1, 1; -0.5352, 1.8685] [0.2262, 0; 0, 0.5313] = [0.2262, 0.5313; -0.1211, 0.9927],

or

y_1 = 0.2262 d_1 + 0.5313 d_2,
y_2 = -0.1211 d_1 + 0.9927 d_2.

Figure 4.1: Contours of the original function.

Figure 4.2: Contours of the transformed function.

After this change the quadratic function becomes

g(d_1, d_2) = A^T [T]D + (1/2) D^T [T]^T [B] [T] D
            = 0.0160 d_1 - 2.5167 d_2 + (1/2) d_1² + (1/2) d_2².

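The two stages can be carried out numerically with an eigendecomposition. The sketch below (numpy assumed; names are illustrative) builds a transformation [T] for the [B] of Example 4.1 and verifies that [T]^T [B] [T] is the identity; since numpy normalizes eigenvectors, this [T] differs from the one above by column scaling, but it achieves the same unit Hessian:

```python
import numpy as np

B = np.array([[12.0, -6.0], [-6.0, 4.0]])

# Stage 1: columns of S (eigenvectors of B) diagonalize B
beta, S = np.linalg.eigh(B)                   # beta ~ [0.7889, 15.2111]
Btilde = S.T @ B @ S                          # diagonal up to rounding

# Stage 2: rescale so the diagonal entries become one
C = np.diag(1.0 / np.sqrt(np.diag(Btilde)))
T = S @ C                                     # total transformation Y = T D

print(np.round(T.T @ B @ T, 10))              # identity => condition number 1
```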
Chapter 5

Extensions

The SDM has been modified in a number of ways. Barzilai and Borwein presented two new step sizes for the negative gradient direction in 1988. Despite the fact that their method does not ensure descent in the objective function values, the numerical results showed that it was a significant improvement over the traditional SDM. The goal of their strategy was to hasten the convergence of the SDM. The Barzilai-Borwein approach requires only a small number of storage locations and low-cost computations.
Although the Newton method and quasi-Newton methods are useful for addressing unconstrained minimization problems, they cannot be used directly to solve large-scale unconstrained minimization problems. As a result, numerical approaches based on the SD direction are favoured, since they do not require the storage of matrices.
For unconstrained minimization problems, the SDM is the simplest gradient method. The exact step size

Y_{j+1} = Y_j + λ_j s_j,   where   λ_j = (s_j^T s_j) / (s_j^T H_j s_j),

is used in the SDM, which can be traced back to Cauchy (1847). Unfortunately, it is well known that the method converges slowly in the majority of circumstances. This poor performance is due to the optimal choice of step size rather than to the steepest descent direction itself.

As a result, multiple authors experimented with different step sizes in order to address this flaw. Barzilai and Borwein take the new iterate as

y_{k+1} = y_k - (1/β_k) g_k.

Instead of performing a line search or employing the quadratic-case formula, the step length β_k is calculated as

β_k = (s_{k-1}^T z_{k-1}) / (s_{k-1}^T s_{k-1}),

where s_{k-1} = y_k - y_{k-1} and z_{k-1} = g_k - g_{k-1}. The Barzilai and Borwein approach requires just O(n) floating-point operations and one gradient evaluation per iteration. During the process there are no matrices to compute and no line searches to perform. For the two-dimensional quadratic case, Barzilai and Borwein presented a convergence analysis; they established R-superlinear convergence for that particular scenario.
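A minimal sketch of the Barzilai-Borwein iteration under these formulas (illustrative names; the first step uses a small fixed step length, since β_k needs two iterates):

```python
import numpy as np

def barzilai_borwein(grad, y0, tol=1e-8, max_iter=500, lam0=1e-3):
    """BB gradient method: y_{k+1} = y_k - (1/beta_k) g_k with
    beta_k = (s^T z)/(s^T s), s = y_k - y_{k-1}, z = g_k - g_{k-1}."""
    y_old = np.asarray(y0, dtype=float)
    g_old = grad(y_old)
    y = y_old - lam0 * g_old              # bootstrap: beta needs two iterates
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) < tol:
            break
        s, z = y - y_old, g - g_old
        beta = (s @ z) / (s @ s)          # no line search, no matrix storage
        y_old, g_old = y, g
        y = y - g / beta
    return y
```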

Chapter 6

Applications

The basic convergence theory, as expressed by the rate-of-convergence formula, has been devised and shown to truly describe SDM behaviour; it remains to demonstrate how the theory may be used. We do not recommend computing the numerical value of the formula, since it involves eigenvalues, or eigenvalue ratios, that are difficult to calculate. Nevertheless, the formula itself is quite useful in practice, since it allows one to compare various situations hypothetically. Without a theory like this, the only option would be to rely solely on experimental comparisons.

6.1 Application 1

Penalty methods: Penalty techniques are methods for approximating constrained optimization problems by unconstrained problems. In penalty approaches, the approximation is achieved by adding to the objective function a term that assigns a high cost to violating the constraints. There are two major issues with the method. The first is how closely the unconstrained problem approximates the constrained one. The second, which is most relevant from a practical standpoint, is how to solve an unconstrained problem whose objective function contains a penalty term.
Consider the problem

minimize g(y)

subject to Y ∈ T

where T is a constraint set in E^n and g is a continuous function on E^n. In most instances T is defined implicitly by a set of functional constraints, although the more general description can be addressed in this section. A penalty function technique replaces the constrained problem by an unconstrained problem of the type

minimize g(y) + cR(y),

where R is a function on E^n and c is a positive constant, and R fulfils the following conditions:

(i) R is continuous,
(ii) R(y) ⩾ 0 for all y ∈ E^n, and
(iii) R(y) = 0 if and only if y ∈ T.

Example 1: Exterior penalty approach: start outside the feasible region and converge slowly to a minimum from the outside. With equality constraints,

minimize g(y)
s.t. h_i(y) = 0, i ∈ E,

the problem can be approximated by the unconstrained problem

minimize F(y, c) = g(y) + c Σ_{i∈E} h_i²(y),

which penalizes both positive and negative values of h_i(y).

Example 2: Assume that T is defined by a set of inequalities:

T = {y : h_j(y) ⩽ 0, j = 1, 2, . . . , r}.

Here we penalize positive values of h_j; negative values have no effect. In this scenario a highly useful penalty function is

R(y) = (1/2) Σ_{j=1}^{r} (max[0, h_j(y)])².

Figure 6.1: Plot of cR(y).

In the one-dimensional case, the function cR(y) is shown in Figure 6.1 with

h_1(y) = y - b,   h_2(y) = a - y.

It is obvious that for large c the minimum point of the penalized problem will be in a region where R is small. As a result, it is expected that as c rises the corresponding solution points will approach the feasible region T and, if near enough, will minimize g. Ideally, as c → ∞, the solution point of the penalized problem should converge to the solution point of the constrained problem.
The method: The procedure for solving the problem by the penalty function method is this. Let {c_n}, n = 1, 2, . . ., be a sequence tending to infinity such that for each n, c_n ⩾ 0 and c_{n+1} > c_n. Define the function

q(c, y) = g(y) + cR(y).

For each n, solve the problem

minimize q(c_n, y),

obtaining a solution point y_n. We assume that for each n the problem has a solution; this will be true, for example, if q(c, y) grows unboundedly as |y| → ∞.

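A hypothetical sketch of this procedure for inequality constraints, using the penalty R(y) = (1/2) Σ max(0, h_j(y))² from Example 2 and an off-the-shelf unconstrained minimizer in place of the SDM (scipy assumed; all names illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def penalty_method(g, hs, y0, c0=1.0, growth=10.0, n_stages=6):
    """Exterior penalty: minimize q(c, y) = g(y) + c R(y) for increasing c."""
    R = lambda y: 0.5 * sum(max(0.0, h(y))**2 for h in hs)
    y, c = np.asarray(y0, dtype=float), c0
    for _ in range(n_stages):
        y = minimize(lambda y, c=c: g(y) + c * R(y), y).x   # unconstrained subproblem
        c *= growth                                         # tighten the penalty
    return y

# Illustrative use: minimize (y1-2)^2 + (y2-2)^2 subject to y1 + y2 <= 1
g = lambda y: (y[0] - 2)**2 + (y[1] - 2)**2
hs = [lambda y: y[0] + y[1] - 1]
print(penalty_method(g, hs, [0.0, 0.0]))   # approaches (0.5, 0.5) as c grows
```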
Let us look once more at a problem with a single constraint:

minimize g(y)
subject to h(y) = 0.

One way to address this problem is to transform it into the unconstrained problem

minimize g(y) + (1/2) µ h(y)²,

where µ is a (large) penalty coefficient. Because of the penalty, the solution will tend to have a small h(y). The method of SD may be used to solve this unconstrained problem. What will be the outcome? Consider, for the sake of simplicity, the scenario where g is quadratic and h is linear. We focus on the problem

minimize (1/2) y^T Q y - d^T y
subject to c^T y = 0.

The objective of the associated penalty problem is

(1/2) y^T Q y + (µ/2) y^T c c^T y - d^T y = (1/2) y^T (Q + µ c c^T) y - d^T y.




The matrix Q + µcc^T defines the quadratic form associated with this objective, and accordingly the condition number of this matrix determines the convergence rate of the SDM. The original matrix Q has been augmented by a large rank-one matrix. This addition inevitably results in one of the matrix's eigenvalues being large (on the order of µ). As a result, the condition number is roughly proportional to µ, and the rate of convergence becomes exceedingly poor as µ is increased to obtain an accurate solution of the original constrained problem.
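A quick illustrative check of this effect (matrix values are ours, not from the report):

```python
import numpy as np

Q = np.array([[2.0, 0.0], [0.0, 1.0]])
c = np.array([[1.0], [1.0]])

for mu in [1.0, 10.0, 100.0, 1000.0]:
    P = Q + mu * (c @ c.T)            # penalty Hessian  Q + mu c c^T
    print(mu, np.linalg.cond(P))      # condition number grows roughly like mu
```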

6.2 Application 2

Solution of the gradient equation: One technique for minimizing a function g is to solve the equations ∇g(y) = 0 that express the necessary conditions. These equations might be solved by applying the SDM to the function k(y) = |∇g(y)|². A benefit of this strategy is that the minimum value (zero) is known. We want to know whether this strategy is likely to be quicker or slower than applying the SDM to the original function g. For simplicity we address only the case where g is quadratic. Thus let

g(y) = (1/2) y^T Q y - d^T y.

Then the gradient of g is

∇g(y) = Qy - d,

and

k(y) = |∇g(y)|² = y^T Q² y - 2 y^T Q d + d^T d.
As a result, k(y) is also a quadratic function. The eigenvalues of the matrix Q² determine the rate of convergence of the SDM applied to k. The rate will be

[(c̄ - 1) / (c̄ + 1)]²,

where c̄ is the condition number of the matrix Q². But the eigenvalues of Q² are the squares of those of Q itself, so c̄ = c², where c is the condition number of Q. Thus it is obvious that the suggested method's convergence rate is lower than that of the SDM applied to the original function. We may even go a step further and predict how much slower the recommended approach will be. For large c, the SDM rate is

[(c - 1) / (c + 1)]² ≃ (1 - 1/c)⁴,

while the proposed method's rate is

[(c² - 1) / (c² + 1)]² ≃ (1 - 1/c²)⁴.

Since (1 - 1/c²)^c ≃ 1 - 1/c, around c steps of the new method are required to match one step of the standard SDM.
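The squaring of the condition number is easy to observe numerically (an illustrative snippet):

```python
import numpy as np

Q = np.diag([1.0, 2.0, 10.0])    # condition number c = 10
print(np.linalg.cond(Q))         # 10.0
print(np.linalg.cond(Q @ Q))     # 100.0 = c**2: SD on |grad g|^2 is far slower
```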

