BSC Part 3
A project report
submitted by
Contents

1 Introduction
2 General Idea
 2.1 Method of Steepest Descent
  2.1.1 Step 1
  2.1.2 Step 2
 2.2 Examples
3 Convergence Theory
 3.1 Quadratic Case
  3.1.1 Convergence rate for Yj
  3.1.2 Relative decrease in F
  3.1.3 Kantorovich inequality
4 Scaling
5 Extensions
6 Applications
 6.1 Application 1
 6.2 Application 2
Chapter 1
Introduction
Analytic methods may not always work, because of the complexity of the problem or because the problem is not convex, so we use numerical techniques to solve such problems. An optimization problem is the task of selecting the best answer from a set of alternatives. Optimization problems divide broadly into two types: constrained and unconstrained. In both types we minimize or maximize a function; the only difference is that in a constrained optimization problem the minimization or maximization is subject to constraints, or restrictions, on the variables.
There are various unconstrained methods for solving minimization or maximization problems. They divide broadly into two families: direct search methods and descent methods (gradient methods). Direct search methods use no derivatives, so they are also called zeroth-order methods.
For nonlinear optimization there are many gradient methods, and the steepest descent method is the simplest of them. The steepest descent method is not as old as Newton's method: Cauchy (1789–1857) developed it in the nineteenth century (1847), about two centuries after Newton's method. It is much less complicated than Newton's method. The steepest descent algorithm uses only the first derivative of the function; it does not necessitate the computation of second derivatives, there is no system of linear equations to solve to find the search direction, and there is no need for matrix storage. As a result, it lowers the costs of Newton's method in every respect in terms of iteration cost.
The negative side of the steepest descent method is that it has a slower rate of convergence than Newton's method: it converges only linearly to the minimum. The steepest descent method is used in solving a system of nonlinear equations of the form

$$g(y_1, y_2, \dots, y_n) = 0,$$

where $g$ is a real-valued differentiable function with continuous first partial derivatives; the method moves along the negative of the gradient.

Descent property: for a function $g$, $g(y_{k+1}) < g(y_k)$ for all $k$; that is, as we proceed, the value of the objective function should decrease.
Chapter 2
General Idea
2.1 Method of Steepest Descent

2.1.1 Step 1
Choose the search direction as the negative of the gradient:

$$s_j = -\nabla g_j.$$

Starting from any point in $n$-dimensional space, the function value decreases at the quickest rate if we move along this negative gradient direction.
2.1.2 Step 2
Select a step length in the search direction so as to reduce $g(y)$: choose the ideal step length $\lambda_j$ in the direction $s_j$ and set

$$y_{j+1} = y_j + \lambda_j s_j.$$
Begin from a point $Y$ and repeatedly move along the steepest downhill direction until you reach the optimal point. Writing

$$\delta y_j = \lambda_j s_j,$$

we have

$$g(y_{j+1}) = g(y_j + \lambda_j s_j),$$

and by Taylor's expansion,

$$g(y_j + \lambda_j s_j) = g(y_j) + \nabla^T g(y_j)\,\delta y_j + \frac{1}{2}\,\delta y_j^T H(y_j)\,\delta y_j,$$

where $H$ denotes the Hessian of $g$. Thus, setting

$$\frac{d\,g(y_j + \lambda_j s_j)}{d\lambda_j} = 0$$

gives

$$\lambda_j = -\frac{\nabla g_j^T s_j}{s_j^T H_j s_j},$$

and since $s_j = -\nabla g_j$,

$$\lambda_j = \frac{s_j^T s_j}{s_j^T H_j s_j}.$$
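These two steps can be collected into a short routine. Below is a minimal Python sketch for the quadratic case, where the Hessian $H$ is constant and the exact step length derived above applies; the function name, tolerance, and iteration cap are our own choices, not part of the report.

```python
import numpy as np

def steepest_descent_quadratic(H, grad, y0, tol=1e-8, max_iter=10000):
    """Steepest descent with the exact step length
    lambda_j = (s_j^T s_j) / (s_j^T H s_j) derived above.

    H    : constant Hessian of g, shape (n, n)
    grad : callable returning the gradient of g at a point
    y0   : starting point, shape (n,)
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        s = -grad(y)                    # Step 1: s_j = -grad g_j
        if np.linalg.norm(s) < tol:     # gradient ~ 0 => optimal point
            break
        lam = (s @ s) / (s @ H @ s)     # Step 2: exact line search
        y = y + lam * s                 # y_{j+1} = y_j + lambda_j s_j
    return y
```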
2.2 Examples
Example 1: Find the minimum of

$$g(y) = (y_1 - 7)^2 + (y_2 - 2)^2.$$

The search direction is the negative of the gradient,

$$s_j = -\begin{pmatrix} 2(y_1 - 7) \\ 2(y_2 - 2) \end{pmatrix}.$$

Starting from the point $Y_1 = \begin{pmatrix} 8.5 \\ 1 \end{pmatrix}$,

$$s_1 = -\begin{pmatrix} 3 \\ -2 \end{pmatrix}.$$

With

$$\lambda_j = \frac{s_j^T s_j}{s_j^T H_j s_j},$$

we get

$$\lambda_1 = \frac{\begin{pmatrix} 3 & -2 \end{pmatrix}\begin{pmatrix} 3 \\ -2 \end{pmatrix}}{\begin{pmatrix} 3 & -2 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ -2 \end{pmatrix}} = \frac{13}{26} = \frac{1}{2}.$$

Now,

$$Y_{j+1} = Y_j + \lambda_j s_j, \qquad Y_2 = Y_1 + \lambda_1 s_1 = \begin{pmatrix} 7 \\ 2 \end{pmatrix}.$$

Since $(\nabla g)_{Y_2} = 0$, the optimal point is $Y_2$.
Example 2: Minimize

$$g(y_1, y_2) = y_1 - y_2 + 2y_1^2 + 2y_1 y_2 + y_2^2.$$

Starting from $Y_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, the first search direction is

$$s_1 = -\nabla g_1 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

With

$$\lambda_j = \frac{s_j^T s_j}{s_j^T H s_j},$$

we get $\lambda_1 = 1$. Now

$$Y_{j+1} = Y_j + \lambda_j s_j, \qquad Y_2 = Y_1 + \lambda_1 s_1 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
Iteration 2: At $Y_2$.

Step 1: Search for the best direction $s_2$:

$$s_2 = -\nabla g_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Step 2:

$$\lambda_2 = \frac{s_2^T s_2}{s_2^T H s_2}, \qquad \text{where } H = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix},$$

which gives $\lambda_2 = \frac{1}{5}$. Now

$$Y_3 = Y_2 + \lambda_2 s_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix} + \frac{1}{5}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -0.8 \\ 1.2 \end{pmatrix}.$$
Check whether the point $Y_3$ is optimal:

$$(\nabla g)_{Y_3} = \begin{pmatrix} 0.2 \\ -0.2 \end{pmatrix} \neq \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

so it is not.

Iteration 3: At $Y_3$.

Step 1: Search for the best direction $s_3$. For the steepest descent method,

$$s_3 = -\nabla g_3 = \begin{pmatrix} -0.2 \\ 0.2 \end{pmatrix}.$$

Step 2:

$$\lambda_3 = \frac{s_3^T s_3}{s_3^T H s_3} = 1.$$

Now

$$Y_4 = Y_3 + \lambda_3 s_3 = \begin{pmatrix} -0.8 \\ 1.2 \end{pmatrix} + \begin{pmatrix} -0.2 \\ 0.2 \end{pmatrix} = \begin{pmatrix} -1.0 \\ 1.4 \end{pmatrix}.$$
Check whether the point $Y_4$ is optimal:

$$(\nabla g)_{Y_4} = \begin{pmatrix} -0.2 \\ -0.2 \end{pmatrix} \neq \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

so it is not.

Iteration 4: At $Y_4$.

Step 1: The best direction is

$$s_4 = -\nabla g_4 = \begin{pmatrix} 0.2 \\ 0.2 \end{pmatrix}.$$

Step 2:

$$\lambda_4 = \frac{s_4^T s_4}{s_4^T H s_4} = \frac{1}{5}.$$

Now,

$$Y_5 = Y_4 + \lambda_4 s_4 = \begin{pmatrix} -1.0 \\ 1.4 \end{pmatrix} + \frac{1}{5}\begin{pmatrix} 0.2 \\ 0.2 \end{pmatrix} = \begin{pmatrix} -0.96 \\ 1.44 \end{pmatrix}.$$

$Y_5$ is close to the optimum; continuing in this manner, the iterates converge slowly to the minimizer $Y^* = (-1,\ 1.5)^T$.
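As a check on these iterates, the routine sketched in Section 2.1 can be run on Example 2 (the starting point and printout are our own choices):

```python
import numpy as np

H = np.array([[4.0, 2.0], [2.0, 2.0]])               # Hessian of g
grad = lambda y: np.array([1 + 4*y[0] + 2*y[1],      # dg/dy1
                           -1 + 2*y[0] + 2*y[1]])    # dg/dy2

print(steepest_descent_quadratic(H, grad, [0.0, 0.0]))
# -> approximately [-1.0, 1.5], the minimizer approached above
```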
Chapter 3
Convergence Theory
One of the key advantages of the SDM is that it has a nice convergence theory. It is not difficult to demonstrate that the rate of convergence of the SDM is linear, which is unsurprising given the method's simplicity; regrettably, even for modestly nonlinear problems, this convergence can be too slow for practical use. The theory of the SD approach's convergence is nevertheless important for comprehending the method's behaviour.
3.1 Quadratic Case

Let us see how the SDM converges to its minimum in the quadratic situation. This particular situation is critical because even if a function is not quadratic, it behaves quadratically around the optimal point; hence it is important to investigate the behaviour of quadratic functions. Consider

$$g(Y) = \frac{1}{2} Y^T H Y - d^T Y,$$

where $d \in \mathbb{R}^n$ and $H$ is an $n \times n$ symmetric positive definite matrix. All of the eigenvalues of $H$ are real and positive because $H$ is a symmetric positive definite matrix. Let the eigenvalues of the matrix $H$ be $e_1, e_2, e_3, \dots, e_n$, where $e_1$ is the smallest eigenvalue and $e_n$ is the largest eigenvalue of $H$. We know that the gradient of the given quadratic function $g$ is

$$s(Y) = HY - d,$$

and if we set the gradient to zero, it gives the optimal point $Y^*$:

$$Y^* = H^{-1} d.$$

Since all the eigenvalues of $H$ are positive and real, the determinant of $H$ is nonzero, so $H^{-1}$ exists.

3.1.1 Convergence rate for $Y_j$

The method of steepest descent is thus represented as
$$Y_{j+1} = Y_j - \lambda_j s_j,$$

where $s_j = HY_j - d$ and $\lambda_j$ is the step length in the direction of $s_j$ such that $\lambda_j$ minimizes $g(Y_j - \lambda s_j)$. We can determine the value of $\lambda_j$ from

$$g(Y_j - \lambda s_j) = \frac{1}{2}(Y_j - \lambda s_j)^T H (Y_j - \lambda s_j) - (Y_j - \lambda s_j)^T d,$$

which is minimized at $\lambda_j$; differentiating with respect to $\lambda$ and setting the derivative to zero gives

$$\lambda_j = \frac{s_j^T s_j}{s_j^T H s_j},$$

so that

$$Y_{j+1} = Y_j - \left(\frac{s_j^T s_j}{s_j^T H s_j}\right) s_j.$$
3.1.2 Relative decrease in $F$

To analyse convergence, introduce the function

$$F(Y) = \frac{1}{2}(Y - Y^*)^T H (Y - Y^*).$$

The only difference between $F(Y)$ and $g(Y)$ is a constant term $\frac{1}{2}(Y^*)^T H Y^*$:

$$F(Y) = g(Y) + \frac{1}{2}(Y^*)^T H Y^*,$$

using $HY^* = d$; hence minimizing $F$ is equivalent to minimizing $g$. Applying the steepest descent update

$$Y_{j+1} = Y_j - \left(\frac{s_j^T s_j}{s_j^T H s_j}\right) s_j$$

with $s_j = HY_j - d$, a direct computation shows the relative decrease per step:

$$F(Y_{j+1}) = \left[1 - \frac{(s_j^T s_j)^2}{(s_j^T H s_j)(s_j^T H^{-1} s_j)}\right] F(Y_j).$$

Now, we require a bound on the right-hand side of this equation to get a bound on the rate of convergence. Kantorovich's lemma, described below, gives the best such bound and is a valuable generic tool in convergence analysis.
3.1.3 Kantorovich inequality

Let $H$ be an $n \times n$ symmetric positive definite matrix, and let $e_1$ and $e_n$ be the smallest and largest eigenvalues of $H$, respectively. Then for any $X \neq 0$,

$$\frac{(X^T X)^2}{(X^T H X)(X^T H^{-1} X)} \geq \frac{4\, e_1 e_n}{(e_1 + e_n)^2}.$$

Applying this bound to the relative-decrease formula of Section 3.1.2 gives

$$F(Y_{j+1}) \leq \left[\frac{e_n - e_1}{e_n + e_1}\right]^2 F(Y_j),$$

where the ratio

$$r = \frac{e_n}{e_1}$$

is the condition number of $H$.

• Clearly, the rate of convergence of the SDM depends on the condition number of the Hessian.

• If $r = 1$, the contours are circular and we obtain the optimum, i.e. convergence, in one iteration.

Example: As we saw in Example 1, the SDM applied to $g(y)$ with exact line search reached the optimum in a single iteration from any initial point (there the Hessian is $2I$, so $r = 1$). In Example 2, we applied the SDM with exact line search to $g(y)$ and it took many iterations to converge.
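The bound is easy to evaluate for the Hessians of Examples 1 and 2; the helper below (our own naming) computes the worst-case per-step decrease factor $[(e_n - e_1)/(e_n + e_1)]^2$:

```python
import numpy as np

def sd_rate_bound(H):
    """Worst-case decrease factor ((e_n - e_1)/(e_n + e_1))^2 for SDM."""
    e = np.linalg.eigvalsh(H)       # eigenvalues of symmetric H, ascending
    return ((e[-1] - e[0]) / (e[-1] + e[0])) ** 2

print(sd_rate_bound(np.array([[2.0, 0.0], [0.0, 2.0]])))  # Example 1: 0.0
print(sd_rate_bound(np.array([[4.0, 2.0], [2.0, 2.0]])))  # Example 2: ~0.56
```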
Chapter 4
Scaling
Even for a quadratic function, the SD approach's rate of convergence is at best linear. The SD method's rate of convergence can be improved by scaling the design variables. Scaling may reduce the condition number of the Hessian of the function. For a quadratic function, it is possible to scale the design variables so that the Hessian matrix's condition number is unity with respect to the new design variables. A matrix's condition number is defined as the ratio of the matrix's largest to smallest eigenvalues.
An example will be used to highlight the benefits of scaling the design variables. If $g = \frac{1}{2} Y^T [B] Y$ denotes the quadratic case, consider a transformation of the form

$$Y = [S]Z \quad \text{or} \quad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},$$

so that

$$g = \frac{1}{2} Z^T [\tilde{B}] Z = \frac{1}{2} Z^T [S]^T [B] [S] Z.$$

The matrix $[S]$ can be selected to make $[\tilde{B}] = [S]^T [B] [S]$ diagonal (i.e., the mixed quadratic terms will be eliminated). For this, the eigenvectors of the matrix $[B]$ are taken as the columns of the matrix $[S]$. After that, the diagonal elements of the matrix $[\tilde{B}]$ can be reduced to one (so that the condition number of the resulting matrix is 1) using the transformation

$$Z = [C]D \quad \text{or} \quad \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} c_{11} & 0 \\ 0 & c_{22} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},$$

where the matrix $[C]$ is

$$[C] = \begin{pmatrix} 1/\sqrt{\tilde{b}_{11}} & 0 \\ 0 & 1/\sqrt{\tilde{b}_{22}} \end{pmatrix}.$$
EXAMPLE 4.1: Consider

$$g(Y) = A^T Y + \frac{1}{2} Y^T [B] Y,$$

where

$$Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad A = \begin{pmatrix} -1 \\ -2 \end{pmatrix}, \quad \text{and} \quad [B] = \begin{pmatrix} 12 & -6 \\ -6 & 4 \end{pmatrix}.$$

As previously stated, the necessary variable scaling may be performed in two stages.
Stage 1 (Reducing $[B]$ to a diagonal form): The eigenvectors of the matrix $[B]$ are calculated by solving the eigenvalue problem

$$\left[[B] - \beta_i [I]\right] w_i = 0,$$

which yields $\beta_1 = 8 + \sqrt{52} = 15.2111$ and $\beta_2 = 8 - \sqrt{52} = 0.7889$. The eigenvector $w_i$ corresponding to $\beta_i$ can be found by solving

$$\begin{pmatrix} 12 - \beta_1 & -6 \\ -6 & 4 - \beta_1 \end{pmatrix} \begin{pmatrix} w_{11} \\ w_{21} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{or} \quad (12 - \beta_1)\, w_{11} - 6 w_{21} = 0,$$

and

$$\begin{pmatrix} 12 - \beta_2 & -6 \\ -6 & 4 - \beta_2 \end{pmatrix} \begin{pmatrix} w_{12} \\ w_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{or} \quad (12 - \beta_2)\, w_{12} - 6 w_{22} = 0.$$

Taking $w_{11} = w_{12} = 1$ gives $w_{21} = -0.5352$ and $w_{22} = 1.8685$, so

$$[S] = \begin{pmatrix} 1 & 1 \\ -0.5352 & 1.8685 \end{pmatrix};$$

that is,

$$y_1 = z_1 + z_2, \qquad y_2 = -0.5352\, z_1 + 1.8685\, z_2,$$

and the objective becomes

$$f(z_1, z_2) = A^T [S] Z + \frac{1}{2} Z^T [\tilde{B}] Z = 0.0704\, z_1 - 4.7370\, z_2 + \frac{1}{2}(19.5682)\, z_1^2 + \frac{1}{2}(3.5432)\, z_2^2.$$
Stage 2 (Reducing the diagonal elements to unity): Set

$$Y = [S]Z = [S][C]D = [T]D,$$

where

$$[T] = [S][C] = \begin{pmatrix} 1 & 1 \\ -0.5352 & 1.8685 \end{pmatrix} \begin{pmatrix} 0.2262 & 0 \\ 0 & 0.5313 \end{pmatrix} = \begin{pmatrix} 0.2262 & 0.5313 \\ -0.1211 & 0.9927 \end{pmatrix},$$

or

$$y_1 = 0.2262\, d_1 + 0.5313\, d_2, \qquad y_2 = -0.1211\, d_1 + 0.9927\, d_2.$$
Figure 4.1: Contours of the original function.
In terms of the new design variables,

$$f(d_1, d_2) = A^T [T] D + \frac{1}{2} D^T [T]^T [B] [T] D = 0.0160\, d_1 - 2.5167\, d_2 + \frac{1}{2} d_1^2 + \frac{1}{2} d_2^2,$$

whose Hessian is the identity matrix, so the condition number with respect to the new variables is unity.
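The two stages can be reproduced numerically. The sketch below uses numpy's eigendecomposition, whose eigenvectors are normalized to unit length (a different scaling of $[S]$ than above, but with the same effect on the condition number):

```python
import numpy as np

B = np.array([[12.0, -6.0], [-6.0, 4.0]])

# Stage 1: columns of S are eigenvectors of B, so S^T B S is diagonal.
beta, S = np.linalg.eigh(B)          # beta ~ [0.7889, 15.2111]

# Stage 2: rescale so the diagonal entries of the new Hessian become one.
C = np.diag(1.0 / np.sqrt(beta))
T = S @ C

print(T.T @ B @ T)                   # ~ identity: condition number is unity
```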
Chapter 5
Extensions
The SDM has been modified in a number of ways. Barzilai and Borwein presented
two new step sizes for the negative gradient direction in 1988. Despite the fact that
their method did not ensure descent in the objective function values, the numerical
results showed that it was a significant improvement over the traditional SDM. The
goal of their strategy was to hasten the convergence of the SDM. The Barzilai–Borwein approach requires only a small number of storage locations and inexpensive computations.
Although the Newton method and quasi-Newton methods are useful for addressing
unconstrained minimization problems, they cannot be used to solve large-scale un-
constrained minimization problems directly. As a result, numerical approaches based
on the SD direction are favoured since they do not require the storing of matrices.
For unconstrained minimization problems, the SDM is the simplest gradient method. It uses the exact step size in the iteration

$$Y_{j+1} = Y_j + \lambda_j s_j, \quad \text{where} \quad \lambda_j = \frac{s_j^T s_j}{s_j^T H_j s_j},$$

and, as shown in Chapter 3, this can converge very slowly on ill-conditioned problems.
As a result, several authors experimented with different step sizes in order to address this flaw. Barzilai and Borwein take the new iterate as

$$y_{k+1} = y_k - \frac{1}{\beta_k} g_k.$$

Instead of performing a line search or employing the quadratic-case formula, the step length $\beta_k$ is calculated as

$$\beta_k = \frac{s_{k-1}^T z_{k-1}}{s_{k-1}^T s_{k-1}},$$

where $s_{k-1} = y_k - y_{k-1}$ and $z_{k-1} = g_k - g_{k-1}$. The Barzilai and Borwein approach requires just $O(n)$ floating-point operations and one gradient evaluation per iteration.
During the process, there are no matrices to compute and no line searches to perform. Barzilai and Borwein presented a convergence analysis for the two-dimensional quadratic case, and they established R-superlinear convergence for that particular scenario.
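A minimal Python sketch of the Barzilai–Borwein iteration described above (the initial step length and stopping rule are our own choices):

```python
import numpy as np

def barzilai_borwein(grad, y0, beta0=1.0, tol=1e-8, max_iter=10000):
    """BB method: y_{k+1} = y_k - (1/beta_k) g_k, with
    beta_k = (s^T z)/(s^T s), s = y_k - y_{k-1}, z = g_k - g_{k-1}."""
    y = np.asarray(y0, dtype=float)
    g = grad(y)
    beta = beta0                       # first step has no history to use
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        y_new = y - g / beta
        g_new = grad(y_new)
        s = y_new - y                  # iterate difference
        z = g_new - g                  # gradient difference
        beta = (s @ z) / (s @ s)       # BB step length
        y, g = y_new, g_new
    return y
```

On the quadratic of Example 2, for instance, `barzilai_borwein(grad, [0.0, 0.0])` with the gradient defined earlier typically reaches the minimizer in far fewer iterations than exact-line-search steepest descent.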
Chapter 6
Applications
Now that the basic convergence theory, as expressed by the rate-of-convergence formula, has been devised and shown to truly describe SDM behaviour, it is necessary to demonstrate how the theory may be used. We do not recommend computing the numerical value of the formula, since it involves eigenvalues, or eigenvalue ratios, that are difficult to calculate. Nevertheless, the formula itself is quite useful in practice, since it allows one to compare various circumstances theoretically. Without a theory like this, the only option would be to depend solely on experimental comparisons.
6.1 Application 1
Penalty Methods: Penalty techniques are methods for approximating constrained optimization problems by unconstrained problems. In penalty approaches, the approximation is achieved by adding to the objective function a term that assigns a high cost to violating the constraints. There are two major issues with the method. The first is how closely the unconstrained problem approximates the constrained one. The second, most relevant from a practical standpoint, is how to solve an unconstrained problem whose objective function contains a penalty term.

Consider the problem

$$\text{minimize } g(y) \quad \text{subject to } Y \in T,$$
where $T$ is a constraint set in $E^n$ and $g$ is a continuous function on $E^n$. In most instances $T$ is implicitly defined by a set of functional constraints, although the more general description can be addressed in this section. A penalty function technique replaces the constrained problem by an unconstrained problem of the form

$$\text{minimize } q(c, y) = g(y) + c\,P(y),$$

where $c$ is a positive constant and $P$ is a penalty function on $E^n$.
Example 1 (Exterior penalty approach): Start outside the feasible region and slowly converge on a minimum from the outside. For equality constraints, the problem

$$\text{minimize } g(y) \quad \text{s.t. } h_i(y) = 0,\ i \in E,$$

is replaced by

$$\text{minimize } F(y, \sigma) = g(y) + \sigma \sum_{i \in E} h_i^2(y).$$

For a constraint set of the form

$$T = \{y : h_j(y) \leqslant 0,\ j = 1, 2, \dots, r\},$$

a very useful penalty function is

$$R(y) = \frac{1}{2} \sum_{j=1}^{r} \left(\max\,[0, h_j(y)]\right)^2.$$
Figure 6.1: Plot of $cR(y)$.
In the one-dimensional case, the function $cR(y)$ is shown in Figure 6.1 for

$$h_1(y) = y - b, \qquad h_2(y) = a - y.$$

It is obvious that for large $c$, the minimum point of the penalized problem will lie in a region where $R$ is small. As a result, it is expected that as $c$ rises, the corresponding solution points will approach the feasible region $T$ and, if close enough, will minimize $g$. Ideally, the penalized problem's solution point should converge to the constrained problem's solution point as $c \to \infty$.
The Method: The procedure for solving the problem by the penalty function method is this. Let $\{c_n\}$, $n = 1, 2, \dots$, be a sequence tending to infinity such that for each $n$, $c_n \geqslant 0$ and $c_{n+1} > c_n$. Define the function $q(c, y) = g(y) + c\,R(y)$ and, for each $n$, solve the problem

$$\text{minimize } q(c_n, y),$$

obtaining a solution point $y_n$. We assume that each problem $n$ has a solution; this will be true, for example, if $q(c, y)$ grows unboundedly as $|y| \to \infty$.
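A sketch of this sequential procedure, using scipy's general-purpose minimizer as a stand-in for the unconstrained solver and a made-up objective and constraint:

```python
import numpy as np
from scipy.optimize import minimize

g = lambda y: (y[0] - 2)**2 + (y[1] - 1)**2      # illustrative objective
h = lambda y: y[0] + y[1] - 1                    # constraint h(y) <= 0
R = lambda y: 0.5 * max(0.0, h(y))**2            # penalty R(y)

y = np.zeros(2)
for c in [1.0, 10.0, 100.0, 1000.0]:             # increasing sequence c_n
    q = lambda y, c=c: g(y) + c * R(y)           # q(c_n, y) = g + c R
    y = minimize(q, y).x                         # warm-start from y_{n-1}
print(y)  # approaches the constrained minimizer as c_n grows
```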
Let us look at a problem with a single constraint once more:

$$\text{minimize } g(y) \quad \text{subject to } h(y) = 0,$$

which is replaced by the unconstrained problem

$$\text{minimize } g(y) + \frac{1}{2}\mu\, h(y)^2,$$
where $\mu$ is a (large) penalty coefficient. Because of the penalty, the solution will tend to have a small $h(y)$. The method of SD may be used to solve the given problem as an unconstrained problem. What will be the outcome? Consider, for the sake of simplicity, the scenario where $g$ is quadratic and $h$ is linear. In particular, we focus on the problem

$$\text{minimize } \frac{1}{2} y^T Q y - d^T y \quad \text{subject to } c^T y = 0.$$
The matrix $Q + \mu c c^T$ defines the quadratic form associated with the penalized objective, and accordingly, the condition number of this matrix will determine the SDM convergence rate. The initial matrix $Q$ has been augmented by a large rank-one matrix. This addition will inevitably result in one of the matrix's eigenvalues being large (on the order of $\mu$). As a result, the condition number is proportional to $\mu$, and the rate of convergence becomes exceedingly poor as $\mu$ is increased in order to obtain an accurate solution of the original constrained problem.
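This growth of the condition number is easy to observe numerically; the matrices below are illustrative data only:

```python
import numpy as np

Q = np.array([[2.0, 0.5], [0.5, 1.0]])    # illustrative positive definite Q
c = np.array([[1.0], [1.0]])              # illustrative constraint normal

for mu in [1.0, 1e2, 1e4]:
    M = Q + mu * (c @ c.T)                # Hessian of penalized objective
    print(mu, np.linalg.cond(M))          # condition number grows like mu
```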
6.2 Application 2
Solution of Gradient Equations: One technique for minimizing a function $g$ is to solve the equations $\nabla g(y) = 0$ that express the necessary conditions. These equations might be solved by applying the SDM to the function $k(y) = |\nabla g(y)|^2$. A benefit of this strategy is that the minimum value of $k$ is known (it is zero). We want to know whether this strategy is likely to be quicker or slower than applying the SDM to the original function $g$. For simplicity we address only the scenario where $g$ is quadratic. Thus let
$$g(y) = \frac{1}{2} y^T Q y - d^T y$$

and

$$k(y) = |\nabla g(y)|^2 = y^T Q^2 y - 2 y^T Q d + d^T d.$$

Let $\bar{c}$ denote the condition number of the matrix $Q^2$. The eigenvalues of $Q^2$ are the squares of those of $Q$, so $\bar{c} = c^2$, where $c$ is the condition number of $Q$.
Thus it is obvious that the suggested method's convergence rate would be lower than that of the SDM applied to the original function. We may even go a step further and predict how much slower the suggested approach would be. If $c$ is big, the SDM rate on $g$ is

$$\left(\frac{c - 1}{c + 1}\right)^2 \simeq (1 - 1/c)^4,$$

while on $k$ the corresponding rate is $\left(\frac{c^2 - 1}{c^2 + 1}\right)^2 \simeq (1 - 1/c^2)^4$; roughly speaking, about $c$ times as many iterations are needed to achieve the same accuracy.
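The predicted slowdown can be checked by counting SDM iterations on $g$ and on $k$ for a small quadratic; the matrix below is illustrative only:

```python
import numpy as np

def sd_iters(H, b, y0, tol=1e-6, max_iter=10**6):
    """Iterations of exact-step SDM on f(y) = 1/2 y^T H y - b^T y."""
    y = np.asarray(y0, dtype=float)
    for j in range(max_iter):
        s = b - H @ y                        # negative gradient of f
        if np.linalg.norm(s) < tol:
            return j
        y = y + ((s @ s) / (s @ H @ s)) * s
    return max_iter

Q = np.array([[10.0, 0.0], [0.0, 1.0]])      # condition number c = 10
d = np.array([1.0, 1.0])

# g has Hessian Q; k(y) = |Qy - d|^2 is the quadratic with
# Hessian 2 Q^2 and linear term 2 Q d.
print(sd_iters(Q, d, np.zeros(2)))                  # few iterations
print(sd_iters(2 * Q @ Q, 2 * Q @ d, np.zeros(2)))  # roughly c times more
```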