Lec3 Gradient Based Method Part I

This document discusses optimization methods for problems with more than one design variable. It begins by introducing Rosenbrock's banana function as a test function for optimization algorithms. It then provides an overview of deterministic optimization methods, including gradient-based and non-gradient-based approaches. The rest of the document focuses on gradient-based methods, outlining the general procedure, which involves identifying a search direction and performing a line search to minimize the objective function. It also discusses concepts like gradients, Hessians, convexity, and the use of gradients and Hessians in optimization methods.

Deterministic Unconstrained Optimisation – Part I

Rosenbrock's banana function

$J = f(x_1, x_2) = (a - x_1)^2 + b(x_2 - x_1^2)^2$

The global minimum is at $x_1 = a$ and $x_2 = a^2$, where the minimum value is f = 0.

(Usually a and b are set to 1 and 100 respectively.) A minimal Python sketch of the function follows.
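The sketch below assumes a = 1 and b = 100; the function and variable names are illustrative, not taken from the lecture.

```python
# Rosenbrock's banana function, J = (a - x1)^2 + b*(x2 - x1^2)^2
import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    x1, x2 = x
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2

# The global minimum J = 0 is at (a, a^2) = (1, 1) when a = 1.
print(rosenbrock(np.array([1.0, 1.0])))  # 0.0
```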


Contents

• Comments on characteristics of real-life problems
• Classification of optimization problems
• Deterministic optimization methods
• General procedure
• Gradient & Hessian
• General line search methods
• Steepest descent method
• Conjugate gradient method
• Some other methods
Unconstrained minimization

• Characteristics of real-life problems:
  o Design variables are invariably more than one
  o The objective function may be non-linear
  o The objective function may be non-deterministic (not an issue for the time being)
  o Evaluation of the objective function may be expensive
  o The gradient or Hessian of the objective function may not be available
• We discuss various deterministic methods of optimization when the number of design variables is more than one
• We also assume that the design variables have only side constraints (unconstrained optimisation)
Brute force method

[Figure: line search along the unit vectors e1 and e2]

• Choose ei (unit vectors) as the set of search directions
• Minimize J by searching along the unit vectors one after the other till the function is minimum
• The method fails if J has a narrow valley at an angle to the unit vectors
• Note that a better set of directions than the ei's should be possible. Such directions should permit large step sizes along narrow valleys and be "non-interfering" directions
Powell’s method
 Powell’s method is an extension of brute force line
search method which uses basis vectors as the
search directions.
 Powell’s method starts with initial guess P0 and uses
each of the basis vector direction, one after the
other, to minimise the function in n steps to locate
Pn. This step is identical to brute force line search
method.
 It then locates the optimal point by line search
method using the vector given by (Pn − P0)
 The method is iterative and the each iteration
requires (n+1) line searches
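As a usage sketch, SciPy's built-in implementation of Powell's method can be applied to the Rosenbrock function defined earlier; SciPy is assumed to be available and the starting point is illustrative.

```python
# Powell's method via SciPy, applied to the Rosenbrock function.
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x, a=1.0, b=100.0):
    x1, x2 = x
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2

P0 = np.array([-1.2, 1.0])                  # initial guess P0 (illustrative)
res = minimize(rosenbrock, P0, method="Powell")
print(res.x)                                # expected to approach [1, 1]
```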
Gradient based multidimensional unconstrained minimization

Optimization methods in n dimensions:

• Gradient based methods
  o Methods that do not require the Hessian
  o Methods that require the Hessian
• Non gradient based methods
  o Deterministic: Nelder–Mead Simplex, Divided Rectangles Method
  o Non-deterministic: Genetic Algorithms, Simulated Annealing, Particle Swarm Optimization
Gradient based multidimensional unconstrained minimization

General procedure

• Assume that the mathematical statement of the problem is ready, involving
  o the objective function
  o the design variables (they must be independent) and
  o other parameters
• Iteratively search for the optima, involving
  o identifying the search direction along which the optima lies
  o searching in that direction for locating the position of the optima by using a line search method
• Most procedures require the objective function and its gradient G
• Some procedures also require the Hessian H
Convex design space

• Most optimisation algorithms assume a convex design space
• A real-valued function defined on an n-dimensional interval is convex if the line segment between any two points on the graph of the function is above or on the graph in a Euclidean space
• In reality the design space can be non-convex
• It is essential to find out if the design space is convex before attempting optimisation
Convex/concave Design Space in 2D

[Figure: a convex 2D domain and a concave 2D domain, with points (x1¹, x2¹) and (x1², x2²) marked in each]

Convex Sets

$a, b \in S \;\Rightarrow\; \lambda a + (1 - \lambda) b \in S, \quad \forall\, \lambda \in [0, 1]$
Convex vs non-convex function

Condition for convexity:

$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2), \quad 0 \le \lambda \le 1$

[Figure: a convex function, whose local optimum is also the global optimum, versus a non-convex function with a local optimum and a separate global optimum]
Gradient and Hessian

• The gradient of a function is

$\nabla J = G(J) = \begin{bmatrix} \partial J / \partial X_1 \\ \partial J / \partial X_2 \\ \vdots \\ \partial J / \partial X_n \end{bmatrix}$

• The gradient vector is perpendicular to the hyperplane tangent to the contour surfaces of constant J
• For n = 1 and 2, the hyperplanes are points and contour lines respectively; a finite-difference sketch of the gradient follows
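Where an analytic gradient is unavailable, a central-difference approximation can be used. The sketch below (the step size h is an illustrative choice) checks it against the analytic gradient of the Rosenbrock function.

```python
# Central-difference approximation of the gradient G(J).
import numpy as np

def grad_fd(J, X, h=1e-6):
    G = np.zeros_like(X, dtype=float)
    for i in range(X.size):
        e = np.zeros_like(X, dtype=float)
        e[i] = h
        G[i] = (J(X + e) - J(X - e)) / (2.0 * h)   # dJ/dX_i
    return G

# Check against the analytic gradient of the Rosenbrock function (a=1, b=100):
def rosen(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosen_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

X = np.array([0.5, -0.3])
print(grad_fd(rosen, X), rosen_grad(X))    # the two should agree closely
```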
Hessian and its use

• The second derivative of the objective function produces n(n + 1)/2 distinct partial derivatives:

$\frac{\partial^2 J}{\partial x_i \partial x_j} \ \text{ if } i \ne j \qquad \text{and} \qquad \frac{\partial^2 J}{\partial x_i^2} \ \text{ if } i = j$

• The second-order partial derivatives represent the Hessian matrix

$H = \begin{bmatrix} \dfrac{\partial^2 J}{\partial x_1^2} & \dfrac{\partial^2 J}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 J}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 J}{\partial x_2 \partial x_1} & \dfrac{\partial^2 J}{\partial x_2^2} & \cdots & \dfrac{\partial^2 J}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 J}{\partial x_n \partial x_1} & \dfrac{\partial^2 J}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 J}{\partial x_n^2} \end{bmatrix}$

• The Hessian matrix is a real square (n × n) symmetric matrix. We note that any real square symmetric matrix
  o has only real eigenvalues
  o has real, distinct, orthogonal eigenvectors if the eigenvalues are distinct
Hessian and its use

• Near the minimum, J can be approximated to be a quadratic in X and can be expressed in terms of the gradient G and Hessian matrix H:

$J(X) = \tfrac{1}{2} X^T H X + G^T X + C$

• It can be seen that the condition for a minimum is

$\nabla J(X) = \tfrac{1}{2}(H^T X + H X) + G = 0$

  If H is symmetric then $H X + G = 0$

• Thus minimisation of J is identical to the solution of the linear algebraic equations usually written as AX = b (with H = A and b = −G)
Hessian and its use

• Expanding J(x) about the stationary point x* in a direction p and noting that G(x*) = 0, at the stationary point the behaviour of the function is determined by H:

$J(x^* + \varepsilon p) = J(x^*) + G(x^*)^T \varepsilon p + \tfrac{1}{2} \varepsilon^2 p^T H p = J(x^*) + \tfrac{1}{2} \varepsilon^2 p^T H p$

• H is a symmetric matrix, and therefore it has real orthogonal eigenvectors, i.e.

$H u_i = \lambda_i u_i, \quad |u_i| = 1$

$J(x^* + \varepsilon u_i) = J(x^*) + \tfrac{1}{2} \varepsilon^2 u_i^T H u_i = J(x^*) + \tfrac{1}{2} \varepsilon^2 \lambda_i$
Gradient and Hessian

• Thus J(x* + εu_i) increases or decreases over and above J(x*) depending on whether λ_i is positive, negative or zero
• For J to be a minimum, H must be positive definite, i.e. all its eigenvalues must be positive; a numerical check is sketched below
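A minimal numerical sketch of this check using NumPy's symmetric eigenvalue routine; the example Hessian is illustrative.

```python
# Check positive definiteness of a (symmetric) Hessian via its eigenvalues.
import numpy as np

H = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])          # an illustrative symmetric Hessian
eigvals = np.linalg.eigvalsh(H)       # eigvalsh: for real symmetric matrices
print(eigvals)                        # [1. 3.]
print(np.all(eigvals > 0))            # True -> H is positive definite
```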
Gradient based methods

• Assume that J is quadratic and G and H are constants:

$J(x) = a + G^T x + \tfrac{1}{2} x^T H x \quad \text{and} \quad \nabla J = G + H x$

• Therefore the unique minimum of J will be given by

$\nabla J = G + H x^* = 0 \quad \text{or} \quad x^* = -H^{-1} G$

• If n is very large, the method is not feasible as it requires the inverse of the (n × n) H matrix
• Realistic methods minimize the n-dimensional function through several 1D line-minimizations; a small linear-solve sketch follows this list
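A minimal sketch of the observation above: rather than forming H⁻¹ explicitly, solve H x* = −G directly. The H and G values below are illustrative.

```python
# Minimiser of a quadratic J(x) = a + G^T x + 1/2 x^T H x via a linear solve.
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite Hessian
G = np.array([-1.0, 4.0])             # constant gradient term

x_star = np.linalg.solve(H, -G)       # equivalent to x* = -H^{-1} G
print(x_star)
print(H @ x_star + G)                 # gradient at x*: approximately [0, 0]
```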
Line search methods

• Start with X0 and a direction (a vector S0 in n dimensions)
• Use a 1D minimization method and minimize J(α) = J(X0 + αS0), where S0 (or p0) is the initial search direction and α is the step size
• Sk is the search direction for major iteration k; αk is the step length from the line search
• The important distinguishing feature of a gradient-based algorithm is its search direction
• Any line search that satisfies sufficient decrease can be used, but one that satisfies the Strong Wolfe conditions (on step size) is recommended; a backtracking sketch follows this list
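For illustration, a sketch of a backtracking line search that enforces the sufficient-decrease (Armijo) condition; the constants rho and c1 are typical textbook choices, not values given in the lecture.

```python
# Backtracking line search enforcing sufficient decrease (Armijo condition).
import numpy as np

def backtracking(J, G, X, S, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until J(X + alpha*S) <= J(X) + c1*alpha*G(X)^T S."""
    alpha = alpha0
    J0, slope = J(X), G(X) @ S        # slope must be negative (descent direction)
    while J(X + alpha * S) > J0 + c1 * alpha * slope:
        alpha *= rho
    return alpha

# Example with the Rosenbrock function and the steepest-descent direction:
def rosen(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosen_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

X = np.array([-1.2, 1.0])
S = -rosen_grad(X)
print(backtracking(rosen, rosen_grad, X, S))
```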
A general gradient based method

[Flowchart: start → choose search direction → line search → update X → is J a minimum? → stop]

Input: Initial guess, X0
Output: Optimum, X*
k ← 0
while not converged do
    Compute a search direction Sk
    Find a step length αk such that J(Xk + αk Sk) < J(Xk)
    (the curvature condition may also be included)
    Update the design variables: Xk+1 ← Xk + αk Sk
    k ← k + 1
end while

A Python sketch of this loop follows.
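This is a hedged sketch of the loop above with the search-direction rule and line search passed in as functions; the convergence test on |G| and all constants are illustrative choices, not prescribed by the lecture.

```python
# Generic gradient-based minimisation loop with pluggable direction and line search.
import numpy as np

def gradient_based_minimize(J, G, X0, direction, line_search,
                            tol=1e-6, max_iter=500):
    X = np.asarray(X0, dtype=float)
    for k in range(max_iter):
        g = G(X)
        if np.linalg.norm(g) < tol:          # "is J a minimum?" check via |G|
            break
        S = direction(X, g)                  # compute search direction S_k
        alpha = line_search(J, G, X, S)      # find a step length alpha_k
        X = X + alpha * S                    # update the design variables
    return X

# Usage sketch on a simple quadratic with the steepest-descent direction and a
# fixed step length (purely illustrative):
A = np.array([[1.0, 0.0], [0.0, 10.0]])
J = lambda x: 0.5 * x @ A @ x
G = lambda x: A @ x
steepest = lambda x, g: -g
fixed_step = lambda J, G, x, s: 0.05
print(gradient_based_minimize(J, G, np.array([10.0, 1.0]), steepest, fixed_step))
```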
Standard procedure (flow chart)

[Flowchart: Input X0 → Analysis: calculate J(X), G(X), H(X) (some methods do not need H(X)) → Sensitivity analysis: calculate the search direction S^q → Perform 1D search: X^q = X^(q-1) + α S^q, q = q + 1 → if not converged, return to the analysis step; otherwise stop]
The search direction

• There are many algorithms:
  o Random search
  o Powell's method
  o Steepest descent
  o Fletcher–Reeves (FR) method
  o Davidon–Fletcher–Powell (DFP) method
  o Broyden–Fletcher–Goldfarb–Shanno (BFGS) method
  o Newton's method
• Some of the above are explained in what follows
Newton's method – the simplest variant

• If J is twice differentiable, J can be expressed using Taylor's series in terms of G and H:

$G(X^{k+1}) \approx G(X^k) + H(X^k)(X^{k+1} - X^k)$

  but $G(X^{k+1}) = 0$ (condition for optimality), so

$\Delta X = X^{k+1} - X^k = -H^{-1} G(X^k) \quad \text{or} \quad X^{k+1} = X^k - H^{-1} G(X^k) = X^k - H^{-1} \nabla J(X^k)$

• The above expression is similar to Newton's method in 1D:

$y'(x^{k+1}) \approx y'(x^k) + y''(x^k)(x^{k+1} - x^k) = 0 \quad \text{or} \quad x^{k+1} = x^k - y'(x^k)/y''(x^k)$

A sketch of the iteration on the Rosenbrock function follows.
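As an illustration, here is a sketch of the Newton iteration applied to the Rosenbrock function (a = 1, b = 100) with its analytic gradient and Hessian; the starting point (-1.2, 1) is a common test choice, not taken from the lecture.

```python
# Newton iteration X_{k+1} = X_k - H^{-1} G(X_k) on the Rosenbrock function.
import numpy as np

def rosen_grad(x):
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

def rosen_hess(x):
    return np.array([[2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
                     [-400*x[0],                   200.0   ]])

X = np.array([-1.2, 1.0])
for k in range(20):
    G = rosen_grad(X)
    if np.linalg.norm(G) < 1e-8:
        break
    X = X - np.linalg.solve(rosen_hess(X), G)   # Newton step, no explicit inverse
print(X)                                        # approaches [1, 1]
```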
A variant of Newton's method – Method of Steepest descent

• In the quasi-Newton method, the Hessian matrix is approximated to be the identity matrix:

$X^{k+1} = X^k - \alpha I \, \nabla J(X^k)$

• This is the method of steepest descent. It uses the negative of the gradient of the objective function (the steepest direction) as the search direction
• Choose 0 < α < 1 for stability (as is usually done)
• We may assume that the change in the magnitude of X is the same as the one obtained in the previous iteration. Note that $p^k = G^k / |G^k|$, so

$\alpha_{k+1} G^{(k+1)T} p^{k+1} = \alpha_k G^{(k)T} p^k \quad \Rightarrow \quad \alpha_{k+1} = \alpha_k \, \frac{G^{(k)T} p^k}{G^{(k+1)T} p^{k+1}}$
A variant of Newton's method – Method of Steepest descent

• Alternately, an analytic formula for αk can also be found by assuming a quadratic J in x with G = −b and H = A calculated at xk:

$J(x) = \tfrac{1}{2} x^T A x - x^T b$

$J(x^k + \alpha p^k) = \tfrac{1}{2}(x^k + \alpha p^k)^T A (x^k + \alpha p^k) - (x^k + \alpha p^k)^T b = \tfrac{1}{2} \alpha^2 p^{(k)T} A p^k + \alpha p^{(k)T} A x^k - \alpha p^{(k)T} b + \text{constants}$

  as A is an (n × n) symmetric and positive definite matrix

• To minimise J with respect to α, we set dJ/dα = 0, which gives

$\alpha \, p^{(k)T} A p^k + p^{(k)T} A x^k - p^{(k)T} b = 0 \quad \text{or} \quad \alpha = -\frac{p^{(k)T}(A x^k - b)}{p^{(k)T} A p^k}$

A numerical sketch of this step length follows.
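A small numerical sketch of this exact step length for an illustrative quadratic; the A, b and starting point below are assumptions chosen for demonstration.

```python
# Exact step length alpha = -p^T (A x - b) / (p^T A p) for J = 1/2 x^T A x - x^T b.
import numpy as np

A = np.array([[1.0,  0.0],
              [0.0, 10.0]])            # symmetric positive definite
b = np.zeros(2)

x = np.array([10.0, 1.0])
g = A @ x - b                          # gradient of the quadratic at x
p = -g / np.linalg.norm(g)             # steepest-descent direction
alpha = -p @ (A @ x - b) / (p @ A @ p) # exact minimiser along p
print(alpha, x + alpha * p)
```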
Method of Steepest descent

• Justification for the quasi-Newton step

$X^{k+1} = X^k - \alpha I \, \nabla J(X^k), \quad 0 < \alpha < 1$

• Consider the Taylor expansion about Xk:

$J(X^k + \Delta X) \approx J(X^k) + \nabla J(X^k)^T \Delta X$

• For the update to decrease J, the change on the LHS (and hence the RHS increment) must be negative:

$\nabla J(X^k)^T \Delta X < 0$

• It can be seen that the method of steepest descent involves the negative of the gradient of the objective function as the search direction, which guarantees this condition
• It can be shown that the method does not give fast convergence when close to the local minima
Method of Steepest descent

Input: Initial guess, X0; convergence tolerances εg, εa and εr
Output: Optimum design variables, X*
k ← 0
repeat
    Compute J(Xk) and G(Xk) ≡ ∇J(Xk); if |G(Xk)| < εg, stop;
    otherwise compute the search direction, Sk ← −G(Xk)/|G(Xk)|
    Perform a line search to find the step length αk
    Update the current point, Xk+1 ← Xk + αk Sk
    k ← k + 1
until |J(Xk) − J(Xk−1)| ≤ εa + εr |J(Xk−1)|

εg: absolute tolerance on the gradient (typically 10⁻⁶)
εa: absolute tolerance on the objective function (typically 10⁻²)
εr: relative tolerance on the objective function (typically 10⁻²)

A Python sketch of this algorithm follows.
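Below is a hedged sketch of this algorithm. A simple backtracking line search (with an illustrative sufficient-decrease constant of 1e-4) stands in for the unspecified line-search routine, and the tolerance values are illustrative rather than the typical values quoted above.

```python
# Steepest descent with a backtracking line search and the stopping tests above.
import numpy as np

def steepest_descent(J, G, X0, eps_g=1e-6, eps_a=1e-8, eps_r=1e-6,
                     max_iter=10000):
    X = np.asarray(X0, dtype=float)
    J_prev = J(X)
    for k in range(max_iter):
        g = G(X)
        if np.linalg.norm(g) < eps_g:                        # gradient tolerance
            break
        S = -g / np.linalg.norm(g)                           # normalised steepest direction
        alpha, J_new = 1.0, J(X + S)
        while J_new > J_prev + 1e-4 * alpha * (g @ S) and alpha > 1e-12:
            alpha *= 0.5                                     # backtrack until J decreases
            J_new = J(X + alpha * S)
        X = X + alpha * S
        if abs(J_new - J_prev) <= eps_a + eps_r * abs(J_prev):  # function tolerance
            break
        J_prev = J_new
    return X

# Example on the ill-conditioned quadratic J = 1/2 (x1^2 + 10 x2^2):
J = lambda x: 0.5 * (x[0]**2 + 10.0 * x[1]**2)
G = lambda x: np.array([x[0], 10.0 * x[1]])
print(steepest_descent(J, G, [10.0, 1.0]))                   # slowly approaches [0, 0]
```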
Method of Steepest descent

• |J(Xk+1) − J(Xk)| ≤ εa + εr |J(Xk)| is a check on the successive reductions in J
• If J is of order 1, εr dominates; if J is smaller than 1, then the absolute tolerance dominates
• The method of steepest descent has the problem that, with an exact line search, the steepest descent direction at each iteration is orthogonal to the previous one:

$\frac{dJ(X^{k+1})}{d\alpha_k} = 0 \;\Rightarrow\; \frac{\partial J(X)}{\partial X}\bigg|_{X^{k+1}} \cdot \frac{\partial (X^k + \alpha_k S^k)}{\partial \alpha_k} = 0 \;\Rightarrow\; \nabla^T J(X^{k+1}) S^k = 0 \;\Rightarrow\; G^T(X^{k+1}) G(X^k) = 0$
Method of Steepest descent

$J(X) = \tfrac{1}{2}(X_1^2 + 10 X_2^2)$

• The method is inefficient as successive search directions are perpendicular to each other
• The error decreases in the first few iterations, but the method is slow near the minimum
• The algorithm is guaranteed to converge, but it may take a very large number of iterations. The rate of convergence is linear.

Steepest descent – graphical interpretation

[Figure: steepest descent iterates zigzagging across the contours of J; the method suffers from poor convergence]
Lecture Ends
