Lec3 Gradient Based Method Part I

This document discusses optimization methods for problems with more than one design variable. It begins by introducing Rosenbrock's banana function as a test function for optimization algorithms. It then provides an overview of deterministic optimization methods, including gradient-based and non-gradient-based approaches. The rest of the document focuses on gradient-based methods, outlining the general procedure, which involves identifying a search direction and performing a line search to minimize the objective function. It also discusses concepts like gradients, Hessians, convexity, and the use of gradients and Hessians in optimization methods.

Deterministic Unconstrained Optimisation – Part I

Rosenbrock's banana function

$J = f(x_1, x_2) = (a - x_1)^2 + b(x_2 - x_1^2)^2$

The global minimum is at $x_1 = a$ and $x_2 = a^2$, where the minimum value is f = 0.

(Usually a and b are set to 1 and 100 respectively.) A minimal Python sketch of the function follows.
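The sketch below assumes a = 1 and b = 100; the function and variable names are illustrative, not taken from the lecture.

```python
# Rosenbrock's banana function, J = (a - x1)^2 + b*(x2 - x1^2)^2
import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    x1, x2 = x
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2

# The global minimum J = 0 is at (a, a^2) = (1, 1) when a = 1.
print(rosenbrock(np.array([1.0, 1.0])))  # 0.0
```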


Contents

• Comments on characteristics of real-life problems
• Classification of optimization problems
• Deterministic optimization methods
• General procedure
• Gradient & Hessian
• General line search methods
• Steepest descent method
• Conjugate gradient method
• Some other methods
Unconstrained minimization

• Characteristics of real-life problems:
  o Design variables are invariably more than one
  o The objective function may be non-linear
  o The objective function may be non-deterministic (not an issue for the time being)
  o Evaluation of the objective function may be expensive
  o The gradient or Hessian of the objective function may not be available
• We discuss various deterministic methods of optimization when the number of design variables is more than one
• We also assume that the design variables have only side constraints (unconstrained optimisation)
Brute force method

[Figure: line search along the unit vectors e1 and e2]

• Choose ei (unit vectors) as the set of search directions
• Minimize J by searching along the unit vectors one after the other till the function is minimum
• The method fails if J has a narrow valley at an angle to the unit vectors
• Note that a better set of directions than the ei's should be possible. Such directions should permit large step sizes along narrow valleys and be "non-interfering" directions
Powell’s method
 Powell’s method is an extension of brute force line
search method which uses basis vectors as the
search directions.
 Powell’s method starts with initial guess P0 and uses
each of the basis vector direction, one after the
other, to minimise the function in n steps to locate
Pn. This step is identical to brute force line search
method.
 It then locates the optimal point by line search
method using the vector given by (Pn − P0)
 The method is iterative and the each iteration
requires (n+1) line searches
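As a usage sketch, SciPy's built-in implementation of Powell's method can be applied to the Rosenbrock function defined earlier; SciPy is assumed to be available and the starting point is illustrative.

```python
# Powell's method via SciPy, applied to the Rosenbrock function.
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x, a=1.0, b=100.0):
    x1, x2 = x
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2

P0 = np.array([-1.2, 1.0])                  # initial guess P0 (illustrative)
res = minimize(rosenbrock, P0, method="Powell")
print(res.x)                                # expected to approach [1, 1]
```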
Gradient based multidimensional unconstrained minimization

Optimization methods in n dimensions:

• Gradient based methods
  o Methods that do not require the Hessian
  o Methods that require the Hessian
• Non gradient based methods
  o Deterministic: Nelder–Mead Simplex, Divided Rectangles Method
  o Non-deterministic: Genetic Algorithms, Simulated Annealing, Particle Swarm Optimization
Gradient based multidimensional unconstrained minimization

General procedure

• Assume that the mathematical statement of the problem is ready, involving
  o the objective function
  o the design variables (they must be independent) and
  o other parameters
• Iteratively search for the optima, involving
  o identifying the search direction along which the optima lies
  o searching in that direction for locating the position of the optima by using a line search method
• Most procedures require the objective function and its gradient G
• Some procedures also require the Hessian H
Convex design space

• Most optimisation algorithms assume a convex design space
• A real-valued function defined on an n-dimensional interval is convex if the line segment between any two points on the graph of the function is above or on the graph in a Euclidean space
• In reality the design space can be non-convex
• It is essential to find out if the design space is convex before attempting optimisation
Convex/concave Design Space in 2D

[Figure: a convex 2D domain and a concave 2D domain, with points (x1¹, x2¹) and (x1², x2²) marked in each]

Convex Sets

$a, b \in S \;\Rightarrow\; \lambda a + (1 - \lambda) b \in S, \quad \forall\, \lambda \in [0, 1]$
Convex vs non-convex function

Condition for convexity:

$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2), \quad 0 \le \lambda \le 1$

[Figure: a convex function, whose local optimum is also the global optimum, versus a non-convex function with a local optimum and a separate global optimum]
Gradient and Hessian

• The gradient of a function is

$\nabla J = G(J) = \begin{bmatrix} \partial J / \partial X_1 \\ \partial J / \partial X_2 \\ \vdots \\ \partial J / \partial X_n \end{bmatrix}$

• The gradient vector is perpendicular to the hyperplane tangent to the contour surfaces of constant J
• For n = 1 and 2, the hyperplanes are points and contour lines respectively; a finite-difference sketch of the gradient follows
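Where an analytic gradient is unavailable, a central-difference approximation can be used. The sketch below (the step size h is an illustrative choice) checks it against the analytic gradient of the Rosenbrock function.

```python
# Central-difference approximation of the gradient G(J).
import numpy as np

def grad_fd(J, X, h=1e-6):
    G = np.zeros_like(X, dtype=float)
    for i in range(X.size):
        e = np.zeros_like(X, dtype=float)
        e[i] = h
        G[i] = (J(X + e) - J(X - e)) / (2.0 * h)   # dJ/dX_i
    return G

# Check against the analytic gradient of the Rosenbrock function (a=1, b=100):
def rosen(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosen_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

X = np.array([0.5, -0.3])
print(grad_fd(rosen, X), rosen_grad(X))    # the two should agree closely
```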
Hessian and its use

• The second derivative of the objective function produces n(n + 1)/2 distinct partial derivatives:

$\frac{\partial^2 J}{\partial x_i \partial x_j} \ \text{ if } i \ne j \qquad \text{and} \qquad \frac{\partial^2 J}{\partial x_i^2} \ \text{ if } i = j$

• The second-order partial derivatives represent the Hessian matrix

$H = \begin{bmatrix} \dfrac{\partial^2 J}{\partial x_1^2} & \dfrac{\partial^2 J}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 J}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 J}{\partial x_2 \partial x_1} & \dfrac{\partial^2 J}{\partial x_2^2} & \cdots & \dfrac{\partial^2 J}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 J}{\partial x_n \partial x_1} & \dfrac{\partial^2 J}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 J}{\partial x_n^2} \end{bmatrix}$

• The Hessian matrix is a real square (n × n) symmetric matrix. We note that any real square symmetric matrix
  o has only real eigenvalues
  o has real, distinct, orthogonal eigenvectors if the eigenvalues are distinct
Hessian and its use

• Near the minimum, J can be approximated to be a quadratic in X and can be expressed in terms of the gradient G and Hessian matrix H:

$J(X) = \tfrac{1}{2} X^T H X + G^T X + C$

• It can be seen that the condition for a minimum is

$\nabla J(X) = \tfrac{1}{2}(H^T X + H X) + G = 0$

  If H is symmetric then $H X + G = 0$

• Thus minimisation of J is identical to the solution of the linear algebraic equations usually written as AX = b (with H = A and b = −G)
Hessian and its use

• Expanding J(x) about the stationary point x* in a direction p and noting that G(x*) = 0, at the stationary point the behaviour of the function is determined by H:

$J(x^* + \varepsilon p) = J(x^*) + G(x^*)^T \varepsilon p + \tfrac{1}{2} \varepsilon^2 p^T H p = J(x^*) + \tfrac{1}{2} \varepsilon^2 p^T H p$

• H is a symmetric matrix, and therefore it has real orthogonal eigenvectors, i.e.

$H u_i = \lambda_i u_i, \quad |u_i| = 1$

$J(x^* + \varepsilon u_i) = J(x^*) + \tfrac{1}{2} \varepsilon^2 u_i^T H u_i = J(x^*) + \tfrac{1}{2} \varepsilon^2 \lambda_i$
Gradient and Hessian

• Thus J(x* + εu_i) increases or decreases over and above J(x*) depending on whether λ_i is positive, negative or zero
• For J to be a minimum, H must be positive definite, i.e. all its eigenvalues must be positive; a numerical check is sketched below
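A minimal numerical sketch of this check using NumPy's symmetric eigenvalue routine; the example Hessian is illustrative.

```python
# Check positive definiteness of a (symmetric) Hessian via its eigenvalues.
import numpy as np

H = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])          # an illustrative symmetric Hessian
eigvals = np.linalg.eigvalsh(H)       # eigvalsh: for real symmetric matrices
print(eigvals)                        # [1. 3.]
print(np.all(eigvals > 0))            # True -> H is positive definite
```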
Gradient based methods

• Assume that J is quadratic and G and H are constants:

$J(x) = a + G^T x + \tfrac{1}{2} x^T H x \quad \text{and} \quad \nabla J = G + H x$

• Therefore the unique minimum of J will be given by

$\nabla J = G + H x^* = 0 \quad \text{or} \quad x^* = -H^{-1} G$

• If n is very large, the method is not feasible as it requires the inverse of the (n × n) H matrix
• Realistic methods minimize the n-dimensional function through several 1D line-minimizations; a small linear-solve sketch follows this list
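A minimal sketch of the observation above: rather than forming H⁻¹ explicitly, solve H x* = −G directly. The H and G values below are illustrative.

```python
# Minimiser of a quadratic J(x) = a + G^T x + 1/2 x^T H x via a linear solve.
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite Hessian
G = np.array([-1.0, 4.0])             # constant gradient term

x_star = np.linalg.solve(H, -G)       # equivalent to x* = -H^{-1} G
print(x_star)
print(H @ x_star + G)                 # gradient at x*: approximately [0, 0]
```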
Line search methods

• Start with X0 and a direction (a vector S0 in n dimensions)
• Use a 1D minimization method and minimize J(α) = J(X0 + αS0), where S0 (or p0) is the initial search direction and α is the step size
• Sk is the search direction for major iteration k; αk is the step length from the line search
• The important distinguishing feature of a gradient-based algorithm is its search direction
• Any line search that satisfies sufficient decrease can be used, but one that satisfies the Strong Wolfe conditions (on step size) is recommended; a backtracking sketch follows this list
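For illustration, a sketch of a backtracking line search that enforces the sufficient-decrease (Armijo) condition; the constants rho and c1 are typical textbook choices, not values given in the lecture.

```python
# Backtracking line search enforcing sufficient decrease (Armijo condition).
import numpy as np

def backtracking(J, G, X, S, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until J(X + alpha*S) <= J(X) + c1*alpha*G(X)^T S."""
    alpha = alpha0
    J0, slope = J(X), G(X) @ S        # slope must be negative (descent direction)
    while J(X + alpha * S) > J0 + c1 * alpha * slope:
        alpha *= rho
    return alpha

# Example with the Rosenbrock function and the steepest-descent direction:
def rosen(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosen_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

X = np.array([-1.2, 1.0])
S = -rosen_grad(X)
print(backtracking(rosen, rosen_grad, X, S))
```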
A general gradient based method

[Flowchart: start → choose search direction → line search → update X → is J a minimum? → stop]

Input: Initial guess, X0
Output: Optimum, X*
k ← 0
while not converged do
    Compute a search direction Sk
    Find a step length αk such that J(Xk + αk Sk) < J(Xk)
    (the curvature condition may also be included)
    Update the design variables: Xk+1 ← Xk + αk Sk
    k ← k + 1
end while

A Python sketch of this loop follows.
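This is a hedged sketch of the loop above with the search-direction rule and line search passed in as functions; the convergence test on |G| and all constants are illustrative choices, not prescribed by the lecture.

```python
# Generic gradient-based minimisation loop with pluggable direction and line search.
import numpy as np

def gradient_based_minimize(J, G, X0, direction, line_search,
                            tol=1e-6, max_iter=500):
    X = np.asarray(X0, dtype=float)
    for k in range(max_iter):
        g = G(X)
        if np.linalg.norm(g) < tol:          # "is J a minimum?" check via |G|
            break
        S = direction(X, g)                  # compute search direction S_k
        alpha = line_search(J, G, X, S)      # find a step length alpha_k
        X = X + alpha * S                    # update the design variables
    return X

# Usage sketch on a simple quadratic with the steepest-descent direction and a
# fixed step length (purely illustrative):
A = np.array([[1.0, 0.0], [0.0, 10.0]])
J = lambda x: 0.5 * x @ A @ x
G = lambda x: A @ x
steepest = lambda x, g: -g
fixed_step = lambda J, G, x, s: 0.05
print(gradient_based_minimize(J, G, np.array([10.0, 1.0]), steepest, fixed_step))
```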
Standard procedure (flow chart)

[Flowchart: Input X0 → Analysis: calculate J(X), G(X), H(X) (some methods do not need H(X)) → Sensitivity analysis: calculate the search direction S^q → Perform 1D search: X^q = X^(q-1) + α S^q, q = q + 1 → if not converged, return to the analysis step; otherwise stop]
The search direction

• There are many algorithms:
  o Random search
  o Powell's method
  o Steepest descent
  o Fletcher–Reeves (FR) method
  o Davidon–Fletcher–Powell (DFP) method
  o Broyden–Fletcher–Goldfarb–Shanno (BFGS) method
  o Newton's method
• Some of the above are explained in what follows
Newton's method – the simplest variant

• If J is twice differentiable, J can be expressed using Taylor's series in terms of G and H:

$G(X^{k+1}) \approx G(X^k) + H(X^k)(X^{k+1} - X^k)$

  but $G(X^{k+1}) = 0$ (condition for optimality), so

$\Delta X = X^{k+1} - X^k = -H^{-1} G(X^k) \quad \text{or} \quad X^{k+1} = X^k - H^{-1} G(X^k) = X^k - H^{-1} \nabla J(X^k)$

• The above expression is similar to Newton's method in 1D:

$y'(x^{k+1}) \approx y'(x^k) + y''(x^k)(x^{k+1} - x^k) = 0 \quad \text{or} \quad x^{k+1} = x^k - y'(x^k)/y''(x^k)$

A sketch of the iteration on the Rosenbrock function follows.
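As an illustration, here is a sketch of the Newton iteration applied to the Rosenbrock function (a = 1, b = 100) with its analytic gradient and Hessian; the starting point (-1.2, 1) is a common test choice, not taken from the lecture.

```python
# Newton iteration X_{k+1} = X_k - H^{-1} G(X_k) on the Rosenbrock function.
import numpy as np

def rosen_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

def rosen_hess(x):
    return np.array([[2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
                     [-400*x[0],                   200.0   ]])

X = np.array([-1.2, 1.0])
for k in range(20):
    G = rosen_grad(X)
    if np.linalg.norm(G) < 1e-8:
        break
    X = X - np.linalg.solve(rosen_hess(X), G)   # Newton step, no explicit inverse
print(X)                                        # approaches [1, 1]
```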
A variant of Newton's method – Method of Steepest descent

• In the quasi-Newton method, the Hessian matrix is approximated to be the identity matrix:

$X^{k+1} = X^k - \alpha I \, \nabla J(X^k)$

• This is the method of steepest descent. It uses the negative of the gradient of the objective function (the steepest direction) as the search direction
• Choose 0 < α < 1 for stability (as is usually done)
• We may assume that the change in the magnitude of X is the same as the one obtained in the previous iteration. Note that $p^k = G^k / |G^k|$, so

$\alpha_{k+1} G^{(k+1)T} p^{k+1} = \alpha_k G^{(k)T} p^k \quad \Rightarrow \quad \alpha_{k+1} = \alpha_k \, \frac{G^{(k)T} p^k}{G^{(k+1)T} p^{k+1}}$
A variant of Newton's method – Method of Steepest descent

• Alternately, an analytic formula for αk can also be found by assuming a quadratic J in x with G = −b and H = A calculated at xk:

$J(x) = \tfrac{1}{2} x^T A x - x^T b$

$J(x^k + \alpha p^k) = \tfrac{1}{2}(x^k + \alpha p^k)^T A (x^k + \alpha p^k) - (x^k + \alpha p^k)^T b = \tfrac{1}{2} \alpha^2 p^{(k)T} A p^k + \alpha p^{(k)T} A x^k - \alpha p^{(k)T} b + \text{constants}$

  as A is an (n × n) symmetric and positive definite matrix

• To minimise J with respect to α, we set dJ/dα = 0, which gives

$\alpha \, p^{(k)T} A p^k + p^{(k)T} A x^k - p^{(k)T} b = 0 \quad \text{or} \quad \alpha = -\frac{p^{(k)T}(A x^k - b)}{p^{(k)T} A p^k}$

A numerical sketch of this step length follows.
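A small numerical sketch of this exact step length for an illustrative quadratic; the A, b and starting point below are assumptions chosen for demonstration.

```python
# Exact step length alpha = -p^T (A x - b) / (p^T A p) for J = 1/2 x^T A x - x^T b.
import numpy as np

A = np.array([[1.0,  0.0],
              [0.0, 10.0]])            # symmetric positive definite
b = np.zeros(2)

x = np.array([10.0, 1.0])
g = A @ x - b                          # gradient of the quadratic at x
p = -g / np.linalg.norm(g)             # steepest-descent direction
alpha = -p @ (A @ x - b) / (p @ A @ p) # exact minimiser along p
print(alpha, x + alpha * p)
```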
Method of Steepest descent

• Justification for the quasi-Newton step

$X^{k+1} = X^k - \alpha I \, \nabla J(X^k), \quad 0 < \alpha < 1$

• Consider the Taylor expansion about Xk:

$J(X^k + \Delta X) \approx J(X^k) + \nabla J(X^k)^T \Delta X$

• For the update to decrease J, the change on the LHS (and hence the RHS increment) must be negative:

$\nabla J(X^k)^T \Delta X < 0$

• It can be seen that the method of steepest descent involves the negative of the gradient of the objective function as the search direction, which guarantees this condition
• It can be shown that the method does not give fast convergence when close to the local minima
Method of Steepest descent

Input: Initial guess, X0; convergence tolerances εg, εa and εr
Output: Optimum design variables, X*
k ← 0
repeat
    Compute J(Xk) and G(Xk) ≡ ∇J(Xk); if |G(Xk)| < εg, stop;
    otherwise compute the search direction, Sk ← −G(Xk)/|G(Xk)|
    Perform a line search to find the step length αk
    Update the current point, Xk+1 ← Xk + αk Sk
    k ← k + 1
until |J(Xk) − J(Xk−1)| ≤ εa + εr |J(Xk−1)|

εg: absolute tolerance on the gradient (typically 10⁻⁶)
εa: absolute tolerance on the objective function (typically 10⁻²)
εr: relative tolerance on the objective function (typically 10⁻²)

A Python sketch of this algorithm follows.
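Below is a hedged sketch of this algorithm. A simple backtracking line search (with an illustrative sufficient-decrease constant of 1e-4) stands in for the unspecified line-search routine, and the tolerance values are illustrative rather than the typical values quoted above.

```python
# Steepest descent with a backtracking line search and the stopping tests above.
import numpy as np

def steepest_descent(J, G, X0, eps_g=1e-6, eps_a=1e-8, eps_r=1e-6,
                     max_iter=10000):
    X = np.asarray(X0, dtype=float)
    J_prev = J(X)
    for k in range(max_iter):
        g = G(X)
        if np.linalg.norm(g) < eps_g:                        # gradient tolerance
            break
        S = -g / np.linalg.norm(g)                           # normalised steepest direction
        alpha, J_new = 1.0, J(X + S)
        while J_new > J_prev + 1e-4 * alpha * (g @ S) and alpha > 1e-12:
            alpha *= 0.5                                     # backtrack until J decreases
            J_new = J(X + alpha * S)
        X = X + alpha * S
        if abs(J_new - J_prev) <= eps_a + eps_r * abs(J_prev):  # function tolerance
            break
        J_prev = J_new
    return X

# Example on the ill-conditioned quadratic J = 1/2 (x1^2 + 10 x2^2):
J = lambda x: 0.5 * (x[0]**2 + 10.0 * x[1]**2)
G = lambda x: np.array([x[0], 10.0 * x[1]])
print(steepest_descent(J, G, [10.0, 1.0]))                   # slowly approaches [0, 0]
```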
Method of Steepest descent

• |J(Xk+1) − J(Xk)| ≤ εa + εr |J(Xk)| is a check on the successive reductions in J
• If J is of order 1, εr dominates; if J is smaller than 1, then the absolute tolerance dominates
• The method of steepest descent has the problem that, with an exact line search, the steepest descent direction at each iteration is orthogonal to the previous one:

$\frac{dJ(X^{k+1})}{d\alpha_k} = 0 \;\Rightarrow\; \frac{\partial J(X)}{\partial X}\bigg|_{X^{k+1}} \cdot \frac{\partial (X^k + \alpha_k S^k)}{\partial \alpha_k} = 0 \;\Rightarrow\; \nabla^T J(X^{k+1}) S^k = 0 \;\Rightarrow\; G^T(X^{k+1}) G(X^k) = 0$
Method of Steepest descent

$J(X) = \tfrac{1}{2}(X_1^2 + 10 X_2^2)$

• The method is inefficient as successive search directions are perpendicular to each other
• The error decreases in the first few iterations, but the method is slow near the minimum
• The algorithm is guaranteed to converge, but it may take a very large number of iterations. The rate of convergence is linear.

Steepest descent – graphical interpretation

[Figure: steepest descent iterates zigzagging across the contours of J; the method suffers from poor convergence]
Lecture Ends
