3 Gradient-Based Optimization

3.1 Optimality Conditions


Consider a function $f(x)$ where $x$ is a vector, $x^T = [x_1, x_2, \ldots, x_n]$.
The gradient vector of this function is given by the partial derivatives with respect to each of
the independent variables,
$$\nabla f(x) \equiv g(x) \equiv \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix} \qquad (1)$$
In the multivariate case, the gradient vector is perpendicular to the hyperplane tangent to the contour surfaces of constant $f$.
Higher derivatives of multi-variable functions are defined as in the single-variable case, but note that the number of gradient components increases by a factor of $n$ for each differentiation.
While the gradient of a function of $n$ variables is an $n$-vector, the second derivative of an $n$-variable function is defined by $n^2$ partial derivatives of the $n$ first partial derivatives with respect to the $n$ variables:

$$\frac{\partial^2 f}{\partial x_i \, \partial x_j}, \; i \neq j; \qquad \frac{\partial^2 f}{\partial x_i^2}, \; i = j. \qquad (2)$$
If the partial derivatives $\partial f/\partial x_i$, $\partial f/\partial x_j$ and $\partial^2 f/\partial x_i \,\partial x_j$ are continuous, then $\partial^2 f/\partial x_i \,\partial x_j$ exists and $\partial^2 f/\partial x_i \,\partial x_j = \partial^2 f/\partial x_j \,\partial x_i$. These second order partial derivatives can be represented by a square symmetric matrix called the Hessian matrix,

$$\nabla^2 f(x) \equiv H(x) \equiv \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \, \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \qquad (3)$$
If f is a quadratic function, the Hessian of f is constant and f can be expressed as
$$f(x) = \frac{1}{2} x^T H x + g^T x + \alpha, \qquad (4)$$
where $\alpha$ is a scalar constant.
As in the single-variable case, the optimality conditions can be derived from the Taylor-series expansion of $f$ about $x^*$,
$$f(x^* + \varepsilon p) = f(x^*) + \varepsilon p^T g(x^*) + \frac{1}{2} \varepsilon^2 p^T H(x^* + \varepsilon \theta p)\, p, \qquad (5)$$
where $0 \le \theta \le 1$, $\varepsilon$ is a scalar, and $p$ is an $n$-vector.
For $x^*$ to be a local minimum, then for any vector $p$ there must be a finite $\varepsilon$ such that $f(x^* + \varepsilon p) \ge f(x^*)$, i.e. there is a neighborhood in which this condition holds. If this condition is satisfied, then $f(x^* + \varepsilon p) - f(x^*) \ge 0$, and the first and second order terms in the Taylor-series expansion must be greater than or equal to zero.
As in the single variable case, and for the same reason, we start by considering the first order terms. Since $p$ is an arbitrary vector and $\varepsilon$ can be either positive or negative, every component of the gradient vector $g(x^*)$ must be zero.
Now we have to consider the second order term, $\frac{1}{2}\varepsilon^2 p^T H(x^* + \varepsilon \theta p)\, p$. For this term to be non-negative, $H(x^* + \varepsilon \theta p)$ has to be positive semi-definite, and by continuity, the Hessian at the optimum, $H(x^*)$, must also be positive semi-definite.
Necessary conditions:
$$g(x^*) = 0; \quad H(x^*) \text{ is positive semi-definite.} \qquad (6)$$
Sufficient conditions:
$$g(x^*) = 0; \quad H(x^*) \text{ is positive definite.} \qquad (7)$$
3.2 General Algorithm for Smooth Functions
All algorithms for unconstrained gradient-based optimization can be described as follows. We
start with $k = 0$ and an estimate of $x^*$, $x_k$.
1. Test for convergence. If the conditions for convergence are satisfied, then we can stop and $x_k$ is the solution. Else, go to step 2.
2. Compute a search direction. Compute the vector $p_k$ that defines the direction in $n$-space along which we will search.
3. Compute the step length. Find a positive scalar, $\alpha_k$, such that $f(x_k + \alpha_k p_k) < f(x_k)$.
4. Update the design variables. Set $x_{k+1} = x_k + \alpha_k p_k$, $k = k + 1$ and go back to step 1.
$$x_{k+1} = x_k + \underbrace{\alpha_k p_k}_{\Delta x_k} \qquad (8)$$
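As a rough illustration, this general algorithm maps onto a short loop in code. The sketch below is not a specific method: direction_fn and step_fn are hypothetical placeholders that a particular algorithm (steepest descent, Newton, quasi-Newton) would supply.

```python
import numpy as np

def gradient_based_minimize(f, grad, x0, direction_fn, step_fn,
                            tol=1e-6, max_iter=200):
    """Generic loop of Section 3.2; direction_fn and step_fn are supplied
    by the specific method (steepest descent, Newton, quasi-Newton, ...)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:      # 1. convergence test
            break
        p = direction_fn(x, g)            # 2. search direction p_k
        alpha = step_fn(f, x, p)          # 3. step length alpha_k
        x = x + alpha * p                 # 4. update design variables
    return x
```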
3.3 Steepest Descent Method
The steepest descent method uses the negative of the gradient vector at each point as the
search direction for each iteration. As mentioned previously, the gradient vector is orthogonal to
the plane tangent to the isosurfaces of the function.
The gradient vector at a point, $g(x_k)$, is also the direction of maximum rate of change (maximum increase) of the function at that point. This rate of change is given by the norm, $\|g(x_k)\|$.
Steepest descent algorithm:
1. Select starting point $x_0$, and convergence parameters $\varepsilon_g$, $\varepsilon_a$ and $\varepsilon_r$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, compute the normalized search direction $p_k = -g(x_k)/\|g(x_k)\|$.
3. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized.
4. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$.
5. Evaluate $f(x_{k+1})$. If the condition $|f(x_{k+1}) - f(x_k)| \le \varepsilon_a + \varepsilon_r |f(x_k)|$ is satisfied for two successive iterations then stop. Otherwise, set $k = k + 1$ and return to step 2.
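A minimal sketch of this algorithm follows. The exact line search of step 3 is delegated to scipy.optimize.minimize_scalar for brevity, the quadratic test function is an arbitrary example, and the two-iteration stopping test of step 5 is omitted; a careful implementation would also enforce $\alpha_k > 0$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, eps_g=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:                       # step 2: gradient test
            break
        p = -g / np.linalg.norm(g)                           # normalized descent direction
        alpha = minimize_scalar(lambda a: f(x + a * p)).x    # step 3: 1-D line search
        x = x + alpha * p                                    # step 4: update
    return x

# Example on a simple quadratic: converges to [0, 0].
f = lambda x: x[0]**2 + 5 * x[1]**2
grad = lambda x: np.array([2 * x[0], 10 * x[1]])
print(steepest_descent(f, grad, [3.0, 1.0]))
```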
Note that the steepest descent direction at each iteration is orthogonal to the previous one, i.e., $g^T(x_{k+1})\, g(x_k) = 0$. Therefore the method zigzags in the design space and is rather inefficient.
The algorithm is guaranteed to converge, but it may take an infinite number of iterations. The rate of convergence is linear.
Usually, a substantial decrease is observed in the first few iterations, but the method is very slow after that.
3.4 Conjugate Gradient Method
A small modication to the steepest descent method takes into account the history of the
gradients to move more directly towards the optimum.
1. Select starting point $x_0$, and convergence parameters $\varepsilon_g$, $\varepsilon_a$ and $\varepsilon_r$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, set the first search direction to the steepest descent direction, $p_k = -g(x_k)$, and go to step 5.
3. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise continue.
4. Compute the new conjugate gradient direction $p_k = -g(x_k) + \beta_k p_{k-1}$, where
$$\beta_k = \left( \frac{\|g_k\|}{\|g_{k-1}\|} \right)^2 = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}.$$
5. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized.
6. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$.
7. Evaluate $f(x_{k+1})$. If the condition $|f(x_{k+1}) - f(x_k)| \le \varepsilon_a + \varepsilon_r |f(x_k)|$ is satisfied for two successive iterations then stop. Otherwise, set $k = k + 1$ and return to step 3.
Usually, a restart is performed every n iterations for computational stability, i.e. we start with a
steepest descent direction.
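A minimal sketch of this algorithm with the Fletcher-Reeves $\beta$ and the periodic restart mentioned above. The line search is again delegated to scipy.optimize.minimize_scalar, and the absolute/relative stopping test is omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, eps_g=1e-6, max_iter=500):
    """Nonlinear conjugate gradient with a steepest descent restart every n iterations."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    g = grad(x)
    p = -g                                                   # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps_g:
            break
        alpha = minimize_scalar(lambda a: f(x + a * p)).x    # line search along p
        x = x + alpha * p
        g_new = grad(x)
        if (k + 1) % n == 0:                                 # periodic restart for stability
            p = -g_new
        else:
            beta = (g_new @ g_new) / (g @ g)                 # Fletcher-Reeves formula
            p = -g_new + beta * p
        g = g_new
    return x
```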
3.5 Newton's Method
The steepest descent and conjugate gradient methods only use first order information, i.e. the first derivative term in the Taylor series, to obtain a local model of the function.
Newton's method uses a second-order Taylor series expansion of the function about the current design point, i.e. a quadratic model,
$$f(x_k + s_k) \approx f_k + g_k^T s_k + \frac{1}{2} s_k^T H_k s_k, \qquad (9)$$
where $s_k$ is the step to the minimum. Differentiating this with respect to $s_k$ and setting it to zero, we can obtain the step that minimizes this quadratic,
$$H_k s_k = -g_k. \qquad (10)$$
This is a linear system which yields a Newton step, $s_k$, as a solution.
If $H_k$ is positive definite, only one iteration is required for a quadratic function, from any starting point. For a general nonlinear function, Newton's method converges quadratically if $x_0$ is sufficiently close to $x^*$ and the Hessian is positive definite at $x^*$.
As in the single variable case, difficulties and even failure may occur when the quadratic model is a poor approximation of $f$ far from the current point. If $H_k$ is not positive definite, the quadratic model might not have a minimum or even a stationary point. For some nonlinear functions, the Newton step might be such that $f(x_k + p_k) > f(x_k)$ and the method is not guaranteed to converge.
Another disadvantage of Newton's method is the need to compute not only the gradient, but also the Hessian, i.e. $n(n + 1)/2$ second order derivatives.
3.5.1 Modified Newton's Method
A small modification to Newton's method is to perform a line search along the Newton direction, rather than accepting the step size that would minimize the quadratic model.
1. Select starting point $x_0$, and convergence parameter $\varepsilon_g$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, continue.
3. Compute $H(x_k) \equiv \nabla^2 f(x_k)$ and the search direction, $p_k = -H^{-1} g_k$.
4. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized. (Start with $\alpha_k = 1$.)
5. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$, and return to step 2.
Although this modification increases the probability that $f(x_k + \alpha_k p_k) < f(x_k)$, it is still vulnerable to the problem of a Hessian that is not positive definite, and it has all the other disadvantages of the pure Newton's method.
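A minimal sketch of the modified Newton iteration, with the line search along the Newton direction again delegated to scipy.optimize.minimize_scalar; the gradient and Hessian callables are assumed to be supplied by the user.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def modified_newton(f, grad, hess, x0, eps_g=1e-6, max_iter=100):
    """Newton direction plus a line search (Section 3.5.1)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:
            break
        p = np.linalg.solve(hess(x), -g)                     # Newton direction
        alpha = minimize_scalar(lambda a: f(x + a * p)).x    # line search; alpha = 1 recovers pure Newton
        x = x + alpha * p
    return x
```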
3.6 Quasi-Newton Methods
This class of methods uses first order information only, but builds second order information, an approximate Hessian, based on the sequence of function values and gradients from previous iterations. The method also forces $H$ to be symmetric and positive definite, greatly improving its convergence properties.
When using quasi-Newton methods, we usually start with the Hessian initialized to the identity matrix and then update it at each iteration. Since what we actually need is the inverse of the Hessian, we will work with $V_k \equiv H_k^{-1}$. The update at each iteration is written as $\Delta V_k$ and is added to the current one,
$$V_{k+1} = V_k + \Delta V_k. \qquad (11)$$
Let $s_k$ be the step taken from $x_k$, and consider the Taylor-series expansion of the gradient function about $x_k$,
$$g(x_k + s_k) = g_k + H_k s_k + \cdots \qquad (12)$$
Truncating this series and setting the variation of the gradient to $y_k = g(x_k + s_k) - g_k$ yields
$$H_k s_k = y_k. \qquad (13)$$
Then, the new approximate inverse of the Hessian, $V_{k+1}$, must satisfy the quasi-Newton condition,
$$V_{k+1} y_k = s_k. \qquad (14)$$
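For a function with constant Hessian, relation (13), and hence the quasi-Newton condition (14), holds exactly. The short check below illustrates this; the matrix, vector, and step are arbitrary numerical examples chosen for illustration.

```python
import numpy as np

# Quadratic test function with constant Hessian H: g(x) = H x + b.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
grad = lambda x: H @ x + b

x_k = np.array([0.5, 0.5])
s_k = np.array([0.2, -0.1])                        # an arbitrary step
y_k = grad(x_k + s_k) - grad(x_k)                  # change in the gradient

print(np.allclose(H @ s_k, y_k))                   # equation (13): True for a quadratic
print(np.allclose(np.linalg.inv(H) @ y_k, s_k))    # quasi-Newton condition (14)
```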
3.6.1 Davidon-Fletcher-Powell (DFP) Method
One of the first quasi-Newton methods was devised by Davidon (in 1959) and modified by Fletcher and Powell (1963).
1. Select starting point $x_0$, and convergence parameter $\varepsilon_g$. Set $k = 0$ and $V_0 = H_0^{-1} = I$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, continue.
3. Compute the search direction, $p_k = -V_k g_k$.
4. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized. (Start with $\alpha_k = 1$.)
5. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$, set $s_k = \alpha_k p_k$, and compute the change in the gradient, $y_k = g_{k+1} - g_k$.
6. Update $V_{k+1}$ by computing
$$A_k = \frac{V_k y_k y_k^T V_k}{y_k^T V_k y_k}, \qquad B_k = \frac{s_k s_k^T}{s_k^T y_k},$$
$$V_{k+1} = V_k - A_k + B_k.$$
7. Set $k = k + 1$ and return to step 2.
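A minimal sketch of the DFP method. The rank-two update of step 6 is written directly in terms of outer products, and the line search is again a scipy.optimize.minimize_scalar call standing in for a proper step-length procedure.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x0, eps_g=1e-6, max_iter=200):
    """DFP quasi-Newton method, keeping V as the approximate inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    V = np.eye(x.size)                                       # V_0 = I
    g = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps_g:
            break
        p = -V @ g                                           # step 3: search direction
        alpha = minimize_scalar(lambda a: f(x + a * p)).x    # step 4: line search
        s = alpha * p
        x = x + s
        g_new = grad(x)
        y = g_new - g
        Vy = V @ y
        # Step 6: V <- V - A_k + B_k (assumes s @ y > 0, as with an exact line search)
        V = V - np.outer(Vy, Vy) / (y @ Vy) + np.outer(s, s) / (s @ y)
        g = g_new
    return x
```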
3.6.2 Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method
The BFGS method is also a quasi-Newton method, with a different updating formula,
$$V_{k+1} = \left[ I - \frac{s_k y_k^T}{s_k^T y_k} \right] V_k \left[ I - \frac{y_k s_k^T}{s_k^T y_k} \right] + \frac{s_k s_k^T}{s_k^T y_k}. \qquad (15)$$
The relative performance between the DFP and BFGS methods is problem dependent.
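In the DFP sketch above, only the update of $V$ needs to change to obtain BFGS. A minimal version of (15), written with $\rho = 1/(s_k^T y_k)$:

```python
import numpy as np

def bfgs_update(V, s, y):
    """BFGS update (15) of the approximate inverse Hessian V."""
    n = V.shape[0]
    rho = 1.0 / (s @ y)
    A = np.eye(n) - rho * np.outer(s, y)      # I - s y^T / (s^T y)
    return A @ V @ A.T + rho * np.outer(s, s)
```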
3.7 Trust Region Methods
Trust region, or restricted-step, methods are a different approach to resolving the weaknesses of the pure form of Newton's method, which arise from a Hessian that is not positive definite or a highly nonlinear function.
One way to interpret these problems is to say that they arise from the fact that we are stepping outside the region for which the quadratic approximation is reasonable. Thus we can overcome these difficulties by minimizing the quadratic function within a region around $x_k$ within which we trust the quadratic model.
1. Select starting point $x_0$, a convergence parameter $\varepsilon_g$ and the initial size of the trust region, $h_0$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, continue.
3. Compute $H(x_k) \equiv \nabla^2 f(x_k)$ and solve the quadratic subproblem
$$\begin{aligned} \text{minimize} \quad & q(s_k) = f(x_k) + g(x_k)^T s_k + \tfrac{1}{2} s_k^T H(x_k)\, s_k & (16) \\ \text{w.r.t.} \quad & s_k & (17) \\ \text{s.t.} \quad & -h_k \le (s_k)_i \le h_k, \quad i = 1, \ldots, n. & (18) \end{aligned}$$
4. Evaluate $f(x_k + s_k)$ and compute the ratio that measures the accuracy of the quadratic model,
$$r_k = \frac{\Delta f}{\Delta q} = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - q(s_k)}.$$
5. Compute the size of the new trust region as follows:
$$h_{k+1} = \frac{\|s_k\|}{4} \quad \text{if } r_k < 0.25, \qquad (19)$$
$$h_{k+1} = 2 h_k \quad \text{if } r_k > 0.75 \text{ and } h_k = \|s_k\|, \qquad (20)$$
$$h_{k+1} = h_k \quad \text{otherwise.} \qquad (21)$$
6. Determine the new point:
$$x_{k+1} = x_k \quad \text{if } r_k \le 0, \qquad (22)$$
$$x_{k+1} = x_k + s_k \quad \text{otherwise.} \qquad (23)$$
7. Set $k = k + 1$ and return to step 2.
The initial value of h is usually 1. The same stopping criteria used in other gradient-based
methods are applicable.
[1, 2, 3]
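A minimal sketch of this restricted-step algorithm. Two implementation choices are made here that are not prescribed by the notes: the box-constrained subproblem (16)-(18) is solved approximately with scipy's L-BFGS-B, and the infinity norm is used to decide whether the step reached the trust-region boundary.

```python
import numpy as np
from scipy.optimize import minimize

def trust_region(f, grad, hess, x0, h0=1.0, eps_g=1e-6, max_iter=100):
    """Restricted-step method with a box-constrained quadratic subproblem."""
    x = np.asarray(x0, dtype=float)
    h = h0
    for k in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= eps_g:
            break
        q = lambda s: f(x) + g @ s + 0.5 * s @ H @ s          # quadratic model (16)
        dq = lambda s: g + H @ s
        res = minimize(q, np.zeros(x.size), jac=dq,
                       method="L-BFGS-B", bounds=[(-h, h)] * x.size)  # constraint (18)
        s = res.x
        r = (f(x) - f(x + s)) / (f(x) - q(s))                 # model accuracy ratio
        step_norm = np.linalg.norm(s, np.inf)
        if r < 0.25:                                          # shrink trust region (19)
            h = step_norm / 4.0
        elif r > 0.75 and np.isclose(h, step_norm):           # expand trust region (20)
            h = 2.0 * h
        if r > 0:                                             # accept step (22)-(23)
            x = x + s
    return x
```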
3.8 Convergence Characteristics for a Quadratic Function
3.9 Convergence Characteristics for the Rosenbrock Function
3.10 Convergence Characteristics for the Rosenbrock Function
References
[1] A. D. Belegundu and T. R. Chandrupatla. Optimization Concepts and Applications in
Engineering, chapter 3. Prentice Hall, 1999.
[2] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization, chapter 4. Academic
Press, 1981.
[3] C. Onwubiko. Introduction to Engineering Design Optimization, chapter 4. Prentice Hall,
2000.