ADMAT-Automatic Differentiation Toolbox For Use With MATLAB
1 Introduction 1
2 Installation of ADMAT 5
2.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Obtaining ADMAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Installation Instructions for Windows Users . . . . . . . . . . . . . . . 5
2.4 Installation Instructions for Unix (Linux) Users . . . . . . . . . . . . . 6
5 Advanced ADMAT 33
5.1 Computing Sparsity Patterns of Jacobian and Hessian Matrices . . . . 33
5.2 Efficient Computation of Structured Gradients . . . . . . . . . . . . . . 35
5.3 Forward Mode AD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.4 Storage of deriv Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.5 Reverse Mode AD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.6 Computing Second-Order Derivatives . . . . . . . . . . . . . . . . . . . 51
5.7 1-D Interpolation in ADMAT . . . . . . . . . . . . . . . . . . . . . . . 56
6 Newton Computations 59
6.1 Traditional Newton Computation . . . . . . . . . . . . . . . . . . . . . 59
9 Troubleshooting 89
A Applications of ADMAT 91
A.1 Quasi-Newton Computation . . . . . . . . . . . . . . . . . . . . . . . . 91
A.2 A Sensitivity Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Introduction
This toolbox is designed to help a MATLAB user compute first and second derivatives,
and related structures, efficiently, accurately, and automatically. ADMAT employs
many sophisticated techniques, exploiting sparsity and structure, to gain efficiency
in the calculation of derivative structures (e.g., gradients, Jacobians, and Hessians).
Moreover, ADMAT can directly calculate Newton steps for nonlinear systems, often
with great efficiency.
To use ADMAT, a MATLAB user need only supply an M-file to evaluate a smooth
nonlinear ‘objective function’ at a given argument. On request and when appropriate,
ADMAT will ensure that in addition to the objective function evaluation, the Jaco-
bian matrix (the gradient is a special case) and the Hessian matrix (i.e., the symmetric
matrix of second derivatives) and possibly the Newton step will also be evaluated at
the supplied argument. The user need not supply derivative codes or approximation
schemes.
• A template design for the efficient calculation of ‘structured’ Jacobian and Hessian matrices.
• Efficient direct computation of Newton steps (in some cases avoiding the full
computation of the Jacobian and/or Hessian matrix).
Limitations of ADMAT
ADMAT supports the most frequently used computations for first and second derivatives.
However, the current version of ADMAT does not support second-derivative computation
with respect to a matrix. In other words, any matrix appearing in a function subject to
second-derivative computation must be a constant, not a variable.
Guide Organization
• Chapter 2: ADMAT installation.
• Bibliography.
Acknowledgement
ADMAT 2.0 builds on the original work of Thomas Coleman and Arun Verma (AD-
MAT [12], ADMIT-2[14]). ADMAT 2.0 has increased functionality and is more ef-
ficient than those pioneering efforts. The technology behind ADMAT 2.0 is derived
from research published by Coleman and colleagues over a number of years; many of
the most relevant publications are listed in the Bibliography.
We thank Arun Verma for several illuminating discussions over the past years.
Installation of ADMAT
In this chapter, the installation of ADMAT on a Unix (Linux) and Windows platform
is discussed.
2.1 Requirements
ADMAT belongs to the “operator overloading” class of AD tools and uses object
oriented programming features. Thus, ADMAT requires MATLAB 6.5 or above.
A package of reduced functionality, ADMAT 2.0 Student, can be obtained from Cayuga
Research and evaluated free of charge for a period of up to 3 weeks. Note that ADMAT
2.0 Student computes only first derivatives, by the forward and reverse modes
of automatic differentiation. Hence a user of ADMAT 2.0 Student can only use the
functionality described in §5.3, 5.4 and 5.5 and test the corresponding demos in the
Demos\Chapter 5 directory.
1. Place the ADMAT package in an appropriate directory and unzip it with any
unzip utility.
ADMAT computes the gradient ∇f (x) automatically given both the MATLAB M-file
to evaluate f (x) and the value of the argument x (an n-vector). ADMAT can compute
the gradient using the overloaded MATLAB function ‘feval’.
“Reverse mode” computes the gradient in time proportional to the time required
to evaluate the function f itself. This is optimal; reverse-mode can be approximately
n-times faster than forward mode when applied to the computation of gradients. The
downside of using reverse mode is that the space requirements can be quite large since
the entire computational graph required to evaluate f must be saved and then ac-
cessed in reverse order. See §5.2 for advice on how to use the inherent structure in f to reduce these space requirements.
We first illustrate with an example and then describe the general situation, followed
by a second example.
See DemoGrad.m
This example shows how to compute the gradient of the Brown function in ADMAT
in three different ways: default (reverse), forward, and reverse. The definition of the
Brown function is as follows:
$$y = \sum_{i=1}^{n-1} \left[ \left(x_i^2\right)^{x_{i+1}^2+1} + \left(x_{i+1}^2\right)^{x_i^2+1} \right].$$
The following is an illustration of the use of ADMAT with the Brown function:
1. Set the function to be differentiated to be the Brown function.
>> myfun = ADfun(’brown’, 1);
Note: the second input argument in ADfun, ‘1’, is a flag indicating a scalar
mapping, f : Rn → R1 ; more generally, the second argument is set to ‘m’ for a
vector-valued function, F : Rn → Rm .
2. Set the problem size.
>> n = 5;
3. Initialize vector x.
>> x = ones(n,1)
x=
1
1
1
1
1
4. Call feval to get the function value and the gradient of Brown function, allowing
ADMAT to choose the mode (by default ADMAT chooses to use reverse mode
for computing the gradients).
5. Use the forward mode to compute the gradient of f . In this case the third input
argument of feval is set to the empty array, ‘[ ]’ since no parameters are stored
in the input variable Extra.
6. Use the reverse mode to compute the gradient g. As the above case, the input
Extra is ‘[]’.
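As a minimal sketch (printed output omitted), the default call of step 4 looks like the line below; the forward- and reverse-mode calls of steps 5 and 6 additionally pass an options structure built with setgradopt, whose arguments are not reproduced here:

>> [f, g] = feval(myfun, x, [])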
fun= ADfun(infun,scalar)
Input arguments
‘infun’ is the function to be differentiated; a string representing the function name.
‘scalar’ is the dimension of the objective function value; a scalar
Output arguments
‘fun ’ is the function to overload feval; a MATLAB cell structure.
Input arguments
‘fun’ is the function to be differentiated; an ADMAT ADfun class object.
‘x’ is a vector of the independent variables; either a row or column vector.
vector x is the ‘point’ at which the function and its gradient will be evaluated.
‘Extra’ stores parameters required in function fun; a MATLAB cell structure.
‘options’ allows the user to choose the forward mode or reverse mode of auto-
matic differentiation. It has to be defined through ADMAT function setgradopt.
The default mode for computing gradients for feval is the reverse mode.
Output arguments
‘f ’ is the function value at point x; a scalar.
‘g ’ is the gradient of f ; a row vector.
The function to be differentiated must have the two-input form
y = functionName(x, Extra),
where y is the output, x is the independent input variable and Extra is a MATLAB cell
structure, which stores all other parameters required in functionName. When there
are no parameters stored in Extra, the empty array, ‘[ ]’ is passed. The definitions of
the Broyden and Brown functions in this chapter satisfy the requirement.
What can be done when a function has more than two input arguments? For ex-
ample, suppose y = f(x, mu, gamma) where x is the independent variable. In this
case the second and third input arguments can be encapsulated as Extra.mu, and Ex-
tra.gamma. Thus, the new function call is y = f(x, Extra). Users can store scalars,
vectors and even matrices as fields in Extra, but currently, matrices stored in MATLAB
sparse format are not supported. We illustrate how to rewrap the original function to
satisfy the interface requirement in Example 3.1.2.
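For instance, a small wrapper along the following lines (the names f3 and f3_wrap are hypothetical) repackages a three-argument function so that it meets the two-argument interface:

function y = f3_wrap(x, Extra)
% unpack the additional parameters stored as fields of Extra
mu    = Extra.mu;
gamma = Extra.gamma;
% call the original three-argument function (hypothetical name f3)
y = f3(x, mu, gamma);

The caller sets Extra.mu and Extra.gamma once, and then passes f3_wrap wherever ADMAT expects the function to be differentiated.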
See DemoFeval.m
This example illustrates the use of input parameter ‘Extra’. The weighted mean of a
vector is computed.
Now the function mean_feval satisfies the two-input-argument requirement, so we can
use feval to compute its gradient.
1. Set the problem size.
>> n = 5;
2. Initialize the random seed.
>> rand('seed',0);
3. Define a vector x.
>> x = rand(n,1)
x=
0.2190
0.0470
0.6789
0.6793
0.9347
>> mu = rand(n,1)
mu =
0.3835
0.5194
0.8310
0.0346
0.0535
>> mu = mu/sum(mu);
mu =
0.2105
0.2851
0.4561
0.0190
0.0293
Given a matrix V with n rows, the ‘forward mode’ of ADMAT differentiates F along the columns of V to obtain the result J(x) · V . So if a user chooses
V to be the n-by-n identity matrix I, then the forward-mode result is the Jacobian
matrix J(x). Alternatively, given an arbitrary matrix W with m rows, the ‘reverse
mode’ of ADMAT will produce, directly and accurately, the product J^T(x) · W . So
if the user chooses W to be the m-by-m identity matrix, then the reverse mode of
ADMAT produces the transpose of the Jacobian matrix, J^T(x).
When the ‘forward mode’ in ADMAT is used, given the MATLAB function to evaluate
F , the ‘current point’ x, and a matrix V (with n rows), then the product J(x) · V is
determined automatically in time proportional to cV · ω(F ), where cV is the number of
columns of V and ω(F ) is the time taken for a single evaluation of F . There are no sig-
nificant additional space requirements when the ‘forward mode’ version of ADMAT is
used. Alternatively, when the ‘reverse mode’ in ADMAT is used, given the MATLAB
function to evaluate F , the ‘current point’ x, and a matrix W (with m rows), then
the product J^T(x) · W is determined automatically in time proportional to cW · ω(F ),
where cW is the number of columns of W and ω(F ) is the time taken for a single evaluation
of F . The ‘reverse mode’ accesses the computational graph representing the evalua-
tion of F in reverse order; therefore, ‘reverse mode’ requires that the computational
graph be saved - this can result in serious additional space demands. Thus, from a
strict time complexity point of view, when m << n, the reverse mode is preferable
to the forward mode; however, this advantage can sometimes be mitigated when the
space requirements become excessive. See Chapters 4 and 5 to see how sparsity and
structure can be used to reduce the space requirements for the ‘reverse mode’ option.
The following example illustrates how to compute Jacobians and their products.
See DemoJac.m
The Broyden function is derived from a chemical engineering application [7]. Its
definition is as follows.
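The definition itself does not survive in this extract; the standard Broyden tridiagonal function, which is consistent with the function values and Jacobian printed below, is
$$\begin{aligned}
F_1(x) &= (3-2x_1)x_1 - 2x_2 + 1,\\
F_i(x) &= (3-2x_i)x_i - x_{i-1} - 2x_{i+1} + 1, \qquad i = 2,\dots,n-1,\\
F_n(x) &= (3-2x_n)x_n - x_{n-1} + 1.
\end{aligned}$$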
>> n = 5
>> x = ones(n,1)
x=
1
1
1
1
1
4. Call feval to compute the function value and the Jacobian matrix at x. We omit
the input argument (Extra) when calling feval, given that it is empty.
F =
0
−1
−1
−1
1
J=
−1 −2 0 0 0
−1 −1 −2 0 0
0 −1 −1 −2 0
0 0 −1 −1 −2
0 0 0 −1 −1
5. Using the forward mode AD, compute the product J(x) ×V , where V is an n×3
‘all ones’ matrix. Set the third input argument to ‘[ ]’, since no parameters are
stored in Extra.
>> V = ones(n,3);
>> options = setopt(’forwprod’, V); % set options to forward mode AD
>> [F,JV] = feval(myfun, x, [ ], options)
F=
0
−1
−1
−1
1
JV =
−3 −3 −3
−4 −4 −4
−4 −4 −4
−4 −4 −4
−2 −2 −2
6. Using the reverse mode AD, compute the product J^T(x) × W , where W is an n × 3 ‘all ones’ matrix.
>> W = ones(n,3);
>> options = setopt(’revprod’, W); % set options to reverse mode AD
>> [F, JTW] = feval(myfun, x, [], options)
F=
0
−1
−1
−1
1
JTW =
−2 −2 −2
−4 −4 −4
−4 −4 −4
−4 −4 −4
−3 −3 −3
ADMAT can compute the Hessian matrix, using feval, given the M-file for defining
the scalar-valued function f and an argument x.
Function feval has two interfaces for Hessian computation. One interface returns
the function value, gradient, and Hessian matrix; the other returns the product of the
Hessian, H, and a matrix V .
• [f, grad, H] = feval(fun, x, Extra)
Input arguments
‘fun’ is the function to be differentiated; a string representing the function
name.
‘x’ is the vector of the independent variables; either a row or column vector.
‘Extra’ stores parameters required in function fun; a MATLAB cell structure.
Output arguments
‘f ’ is the function value at point x; a scalar.
‘grad’ is the gradient at point x; a row vector.
‘H’ is the Hessian matrix evaluated at point x; a matrix.
Output Argument
‘HV ’ is the product H × V , a matrix.
See DemoHess.m
>> n = 5;
>> x = ones(n,1)
x=
1
1
1
1
1
4. Call feval to get the function value, gradient and Hessian of the Brown function.
We omit the third input argument, Extra, since it is empty.
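A minimal sketch of this call (the printed output, i.e., the value, gradient and Hessian of the Brown function at x, is not reproduced here):

>> [f, grad, H] = feval(myfun, x)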
5. Compute the product of the Hessian and an n × 3 ‘all ones’ matrix V .
>> V = ones(n,3);
>> options = setopt(’htimesv’, V); % set options for H times V
>> HV = feval(myfun, x, [], options)
HV =
20 20 20
40 40 40
40 40 40
40 40 40
20 20 20
We recommend calling the ADMAT utility function cons immediately after a dependent
variable's definition. This will ensure that the data type of the dependent variable is
consistent with that of the input independent variable. Otherwise, undetected errors in
the derivative computation may occur.
y = cons(y, x),
Input arguments
“y” is the dependent variable, whose data type should be made consistent with that
of x.
“x” is the independent variable.
Output arguments
“y” is the dependent variable, returned with the data type of x.
First, we define two similar functions: sample2 calls the data-type consistency function
cons, while sample1 does not.

sample1:
y = zeros(m,1);
for i = 1:m
    y(i) = x(i)*x(i);
end

sample2:
y = zeros(m,1);
y = cons(y,x);
for i = 1:m
    y(i) = x(i)*x(i);
end
In the above function definitions, the only difference is that function sample2 calls
cons to make the data type of y consistent with that of x. Now we compare the results
from these two functions.
The result from function sample1 is the function value (only) even though the input
x is an ADMAT forward mode type. This occurs because the MATLAB assignment
operation in the for loop automatically converts the result of x(i)*x(i) into a double
type, matching the data type of y(i). Thus, ADMAT did nothing in this function call.
However, with the use of function cons in sample2, the data type of y is consistent with
that of x, which is the ADMAT forward mode type, deriv. Sample2 returns the function
value and the derivative simultaneously, as desired. Therefore, we highly recommend that
users call cons after initializing dependent variables in order to avoid such undetectable
errors in the derivative computation by ADMAT.
Efficient computation of sparse Jacobian and Hessian matrices is one of the key fea-
tures of ADMAT. The overall strategy is based on graph coloring techniques to exploit
matrix sparsity [9, 10, 13].
Informally, a matrix is viewed as sparse if most of its entries are zero. There is
no precise definition of a sparse matrix; the pragmatic view, which we adopt, is that a
matrix is regarded as sparse if it is cost-effective to treat it as such, that is, if the use
of sparse techniques results in significant time savings.
Description of getjpi
Function “getjpi” computes the sparsity information for the efficient computation of a
sparse Jacobian matrix. It is invoked as follows:
Input arguments
“fun” is the function to be differentiated; a string representing the function
name.
“n” is the number of columns of the Jacobian matrix; a scalar.
“m” is the number of rows of the Jacobian matrix; by default, m = n; a scalar.
“Extra” stores parameters required in function fun apart from the independent vari-
able; a MATLAB cell structure.
“method” sets the technique used to obtain the coloring information.
“SPJ” is the user specified sparsity pattern of the Jacobian, in MATLAB sparse format.
Output arguments
“JPI” includes the sparsity pattern, coloring information and other information re-
quired for efficient Jacobian computation.
“SPJ” is the sparsity pattern of the Jacobian, represented in the MATLAB sparse
format.
Note that ADMAT computes the sparsity pattern, ‘SPJ’, of the Jacobian first be-
fore calculating ‘JPI’. This is an expensive operation. If the user already knows the
sparsity pattern ‘SPJ’, then it can be passed to the function ‘getjpi’ as an input argu-
ment, so that ADMAT will calculate ‘JPI’ based on the user-specified sparsity pattern.
There is no need to recalculate the sparsity pattern, ‘SPJ’; it is inefficient to do so.
Description of evalj
Function evalj computes the sparse Jacobian matrix based on the sparsity pattern
and coloring information obtained from getjpi. It is invoked as follows:
Input arguments
Output arguments
See DemoSprJac.m
function y = arrowfun(x, Extra)
y = x.*x;
y(1) = y(1) + x'*x;
y = y + x(1)*x(1);
[Figure: spy plot of the sparsity pattern of the Jacobian of arrowfun (n = 50, nz = 148).]
2. Initialize x.
>> x = ones(n,1)
x=
1
1
1
1
1
4. Compute the function value and the Jacobian (in MATLAB sparse format) based
on the computed “JPI”. Set the input argument, Extra, to ‘[ ]’ since it is empty.
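The calls themselves are not reproduced in this extract; based on the getjpi/evalj argument lists used elsewhere in this guide (§5.2 and the newton code in Chapter 6), they look roughly like this:

>> JPI = getjpi('arrowfun', n, n, []);         % sparsity pattern and coloring information (computed once)
>> [F, J] = evalj('arrowfun', x, [], n, JPI)   % function value and sparse Jacobian at x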
F=
7
2
2
2
2
J=
(1, 1) 6
(2, 1) 2
(3, 1) 2
(4, 1) 2
(5, 1) 2
(1, 2) 2
(2, 2) 2
(1, 3) 2
(3, 3) 2
(1, 4) 2
(4, 4) 2
(1, 5) 2
(5, 5) 2
The function “getjpi” determines the sparsity pattern of the Jacobian of arrowfun. It
only needs to be executed once for a given function. In other words, once the sparsity
pattern is determined by “getjpi”, ADMAT calculates the Jacobian at a given point
based on the pattern.
Description of gethpi
Function gethpi computes the sparsity pattern of a Hessian matrix, and corresponding
coloring information. It is invoked as follows:
Input arguments
• method = ‘i-a’: the default; ignore the symmetry and compute exactly using AD.
• method = ‘i-f’: ignore the symmetry and use finite differences (FD).
Note that details for the different methods are given in [12, 14].
Output arguments
“HPI” includes the sparsity pattern, coloring information and other information re-
quired for efficient Hessian computation.
“SPH” is the sparsity pattern of the Hessian, represented in the MATLAB sparse format.
Similar to ‘getjpi’, users can pass ‘SPH’ to function ‘gethpi’ when the sparsity pattern
of the Hessian is already known. The computation of ‘HPI’ will then be based on the
user-specified sparsity pattern.
Description of evalh
Input arguments
Output arguments
See DemoSprHess.m
>> n = 5
>> x = ones(n,1)
x=
1
1
1
1
1
4. Evaluate the function value, gradient and Hessian at x. We set the input argument
Extra to ‘[ ]’ since it is empty.
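The exact calls are in DemoSprHess.m; by analogy with getjpi/evalj (and with the evalH call used in Chapter 6), they presumably have roughly the following shape — the argument lists here are assumptions, not documentation:

>> HPI = gethpi('brown', n, [], 'i-a');        % assumed argument order: function, size, Extra, method
>> [f, grad, H] = evalh('brown', x, [], HPI)   % assumed signature: function, point, Extra, HPI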
grad =
4 8 8 8 4
H=
(1, 1) 12
(2, 1) 8
(1, 2) 8
(2, 2) 24
(3, 2) 8
(2, 3) 8
(3, 3) 24
(4, 3) 8
(3, 4) 8
(4, 4) 24
(5, 4) 8
(4, 5) 8
(5, 5) 12
Similar to the sparse Jacobian situation, function gethpi encapsulates the sparsity
structure and relevant coloring information for efficient calculation of the sparse Hes-
sian H. Function gethpi only needs to be executed once for a given function. In other
words, once the sparsity pattern is determined by “gethpi”, ADMAT calculates the
Hessian at a given point based on the pattern.
4.3 Reporting
For both evalj and evalh, two different levels of display output are possible. Users can
choose the input argument verb to set the display level. We illustrate the use of
verb with evalj, repeating Example 4.1.1 with different values of verb.
• verb = 0. No information is displayed.
• verb = 1. Verbose mode. For example, below is the output produced by the
direct bi-coloring method:
[Figure: three spy plots produced in verbose mode (nz = 28, nz = 9 and nz = 19). The first
shows the whole sparsity pattern of the Jacobian, while the other two show the
sparsity patterns computed by the forward mode AD and the reverse mode AD
in the bi-coloring method, respectively.]
Advanced ADMAT
In the previous chapters, we introduced ADMAT fundamentals that allow for the
computation of gradients, Jacobians, and Hessians. In this chapter we introduce a
number of advanced features: Jacobian and Hessian sparsity pattern computation,
efficient structured gradient computation, and derivative computation by forward and
reverse mode with the direct use of the overloaded operations. These features can
help users compute the sparsity patterns without forming the Jacobian or Hessian
explicitly, reduce the space requirement for gradient computation by reverse mode,
and avoid the function definition restrictions on “feval” calls.
Input arguments
Output argument
See DemoSP.m
>> n = 5;
Input arguments
Output argument
See DemoSP.m
>> n = 5;
and this structure can be exploited to reduce the practical computing time [8].
$$J_E = \begin{bmatrix} (\tilde{F}_E)_x & (\tilde{F}_E)_y \\ \nabla_x \bar{f}^{\,T} & \nabla_y \bar{f}^{\,T} \end{bmatrix}.$$
The key point is that reverse mode automatic differentiation can be applied to the
various structured steps indicated above, in turn. This can result in considerable space
savings, and in practice, a shorter running time.
Next, we illustrate how to use this technique to reduce the memory requirements
of reverse mode AD.
See DemoStct.m
%
% compute the gradient of f(x) by sparse blocks
%
% compute the sparsity pattern of function S
JPI = getjpi('funcS', n, n, Extra);
% evaluate J1, J2, ..., Jp, each with the same sparsity
J = sparse(eye(n));
y = x;
for i = 1 : Extra.p
    [y, J1] = evalj('funcS', y, Extra, n, JPI);
    J = J1*J;
end
[Figure 5.1: The time ratio ω(∇f)/ω(f) for Euler's method. Horizontal axis: Number of Steps (k); vertical axis: Time Ratio.]
Figure 5.1 plots the ratio ω(∇f )/ω(f ), where ω(f ) is the execution time of evaluating
the function f , on a fixed problem size n = 800 with a varying number of Euler steps.
The vertical axis is the time ω(∇f ) taken to compute the gradient divided by the time
ω(f ) taken to evaluate the function. All calculations are done through ADMAT. The
horizontal axis is the number of steps of Euler's method.
Figure 5.1 illustrates that the straightforward reverse mode performs very well for
small problems, but the computing time spikes upward when the internal memory
limitation is reached. The straightforward forward mode can outperform the reverse
mode when the memory requirements of reverse mode become excessive. The third
approach exploits the sparsity of Ji . This approach is the fastest of the three proce-
dures. For more details about the efficient computation of structured gradients, please
refer to [8].
Define a deriv class object for the forward mode. Each object of deriv class
is a MATLAB struct array with two fields: val and deriv.
• y = deriv(x, V),
Input arguments
“x” is the value of independent variable, value for the field val.
“V” is the value V of the product J × V , where J is the first derivative of the
function to be differentiated at point x, value for the field deriv.
Output arguments
• val = getval(y),
Input arguments
Output arguments
• ydot = getydot(y),
Input arguments
Output arguments
See DemoFwd1.m
This example shows how to compute the function value and the first order deriva-
tive of y = x2 by using forward mode AD.
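The initialization itself is not shown in this extract; a definition consistent with the output below (value 3, seed derivative 1) would be:

>> x = deriv(3, 1)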
2. Compute y = x2 .
y=xˆ2
val =
9
deriv =
6
See DemoFwd1.m
1. Size of matrix A.
>> n = 5;
2. A is a 5 × 5 random matrix.
>> A = rand(n)
A=
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389
>> x = ones(n,1)
x=
1
1
1
1
1
4. Initialize x as a deriv class object (see the note below), then compute y = A*x.
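The conversion itself is not shown in this extract; a call consistent with the deriv field printed below (which equals A, i.e., V = I) would be:

>> x = deriv(x, eye(n));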
>> y = A*x
val =
2.7913
2.7679
3.2772
2.4657
2.5448
deriv =
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389
5. Get the value of y.
>> yval = getval(y)
yval =
2.7913
2.7679
3.2772
2.4657
2.5448
Note that the forward mode AD computes a product of the Jacobian matrix and V .
Users can specify their own matrix V when defining a deriv object. We set V to the
identity matrix in the above example so that the Jacobian matrix, J, is obtained. If V
is not an identity matrix, then the forward mode returns the product J × V , rather
than J itself. If V is not specified in defining a deriv object, say x = deriv(a), then V
is set to zero by default. The storage of the deriv class is discussed in §5.4.
See DemoFwd2.m
This example illustrates how to use the ADMAT forward mode on the Broyden func-
tion.
1 0 0
0 1 0
0 0 1
• If x.val is a scalar, then its x.deriv field is a row vector whose length equals globp;
• If x.val is a row/column vector, then its x.deriv field is a matrix whose number
of columns equals globp and whose number of rows equals the length of x.val;
• If x.val is a matrix, then its x.deriv field is a 3-D matrix, e.g., A(i, j, k). The
value of k varies from 1 to globp.
See DemoStorage.m
This example illustrates the storage of a deriv class object without the user-specified
second input argument, V, when the input value is a scalar, vector or matrix, for
different values of globp.
• globp = 1
>> x = deriv(1)
val =
1
deriv =
0
>> x = deriv(ones(3,1))
val =
1
1
1
deriv =
0
0
0
>> x = deriv(ones(3))
val =
1 1 1
1 1 1
1 1 1
deriv =
0 0 0
0 0 0
0 0 0
• globp = 2
>> x = deriv(1)
val =
1
deriv =
0 0
>> x = deriv(ones(3,1))
val =
1
1
1
deriv =
0 0
0 0
0 0
>> x = deriv(ones(3))
val =
1 1 1
1 1 1
1 1 1
deriv(:,:,1) =
0 0 0
0 0 0
0 0 0
deriv(:,:,2) =
0 0 0
0 0 0
0 0 0
Define a derivtape class for the reverse mode. Each derivtape object is a MAT-
LAB struct array with two fields: val and varcount.
Input arguments
“x” is the value of the independent variable, value for the field val.
“flag” is a flag for the beginning of tape.
“1” - create a derivtape object x , save x on the tape and set the number of
the cell storing x as the beginning of the tape.
“0” - create a derivtape object and save the value to the tape;
None - create a derivtape object, but do not save the value on the tape.
Output arguments
• val = getval(y),
Input arguments
Output arguments
• JTW = parsetape(W),
Input arguments
Output arguments
See DemoRvs1.m
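Step 1, defining the independent variable, is not reproduced in this extract; an initialization consistent with the output below (value 3, recorded at the start of the tape) would be:

>> x = derivtape(3, 1)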
2. Compute y = x2 .
>> y = x ˆ 2
val =
9
varcount =
2
3. Get value of y.
>> yval = getval(y)
yval =
9
See DemoRvs1.m
>> A = rand(n)
A=
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389
3. Define x as a derivtape object and set x as the beginning of the tape.
>> x = derivtape(ones(n,1),1)
val =
1
1
1
1
1
varcount =
1
4. Compute Ax.
>> y = A*x
val =
2.7913
2.7679
3.2772
2.4657
2.5448
varcount =
2
5. Get the value of y and then parse the tape to obtain the transpose of the Jacobian.
>> JT = parsetape(eye(n))
JT =
0.9501 0.2311 0.6068 0.4860 0.8913
0.7621 0.4565 0.0185 0.8214 0.4447
0.6154 0.7919 0.9218 0.7382 0.1763
0.4057 0.9355 0.9169 0.4103 0.8936
0.0579 0.3529 0.8132 0.0099 0.1389
Note that the reverse mode AD computes the product of the transpose of the Jacobian
matrix with W , that is, J^T W . Users can specify their own W when parsing the tape
with “parsetape(W)”. We set W to the identity matrix in the above example so that J^T
is obtained.
See DemoRvs2.m
>> x = derivtape([1,1,1],1)
val =
1 1 1
varcount =
1
>> y = broyden(x)
val =
0
−1
1
varcount =
40
yval =
0
−1
1
5. Get the transpose of the Jacobian of y.
>> JT = parsetape(eye(3))
JT =
−1 −1 0
−2 −1 −1
0 −2 −1
Note that if a user-specified function is a mapping from R^n to R, the Jacobian matrix
reduces to the gradient. Theoretically, the reverse mode AD computes the gradient
much faster than the forward mode AD does [9], but the reverse mode requires a large
amount of memory since it records each operation on a tape. This drawback sometimes
forces the use of low-speed storage media, which slows down the computation
significantly.
• y = derivtapeH(x, flag, V),
Input arguments
“x” is the value of the independent variable, the value for the field val.
“flag” is a flag for the beginning of the tape.
“1” - create a derivtapeH object x, save x on the tape and set the cell storing
x as the beginning of the tape.
“0” - create a derivtapeH object and save the value to the tape;
None - create a derivtapeH object, but do not save the value on the tape.
“V” is the matrix for computing the product g T × V , where g is the first deriva-
tive of the function to be differentiated.
Output argument
“y” is an initialized derivtapeH class object, which takes two cells on the tape: one
is for the independent variable x, the other is for V.
• val = getval(y),
Input argument
Output argument
• ydot = getydot(y),
Input argument
Output argument
• HW = parsetape(W),
Input argument
Output argument
See Demo2nd1.m
1. First define a derivtapeH object with value 3.
>> x = derivtapeH(3,1,1)
val =
3
varcount =
1
val =
1
varcount =
2
2. Compute y = x2 .
>> y = x ˆ 2
val =
9
varcount =
3
val =
6
varcount =
6
See Demo2nd2.m
2. Initialize x.
>> x = ones(n,1)
x=
1
1
1
1
1
3. Define x as a derivtapeH class object, setting x as the beginning of the tape.
>> x = derivtapeH(x,1,eye(n))
val =
1
1
1
1
1
varcount =
7
val =
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
varcount =
8
4. Compute y = brown(x).
>> y = brown(x)
val =
8
varcount =
95
val =
4 8 8 8 4
varcount =
96
5. Parse the tape to obtain the Hessian of the Brown function.
>> H = parsetape(eye(n))
H=
12 8 0 0 0
8 24 8 0 0
0 8 24 8 0
0 0 8 24 8
0 0 0 8 12
See DemoInter1.m
In this example, we use two methods, linear interpolation and cubic spline
interpolation, to estimate the function value and Jacobian at a point x0
with the ADMAT 1-D interpolation function, ‘interp1_AD’.
>> x = 0 : 10;
>> y = sin(x);
2. Function points.
>> n = length(x0);
>> xi = derivtape(x0,1);
>> yi = interp1_AD(x,y,xi);
>> y0 = getval(yi)
y0 =
0.1683 0.3366 0.5049 0.6732 0.8415
>> J = parsetape(eye(n))
J=
0.8415 0 0 0 0
0 0.8415 0 0 0
0 0 0.8415 0 0
0 0 0 0.8415 0
0 0 0 0 0.0678
Cubic spline interpolation method.
(a) Cubic spline interpolation at point x0 by reverse mode AD.
>> xi = derivtape(x0,1);
>> yi = interp1_AD(x,y,xi,'spline');
>> y0 = getval(yi)
y0 =
0.2181 0.4134 0.5837 0.7270 0.8415
>> J = parsetape(eye(n))
J=
1.0351 0 0 0 0
0 0.9155 0 0 0
0 0 0.7859 0 0
0 0 0 0.6462 0
0 0 0 0 0.4965
Note that some other interpolation methods, such as piecewise cubic Hermite interpolation
and nearest-neighbor interpolation, are also supported in ‘interp1_AD’. Users can refer
to the MATLAB ‘interp1’ help documentation for more details on the usage of ‘interp1_AD’.
Newton Computations
ADMAT provides functions for the Newton step computation for nonlinear systems
and optimization. There are two basic options for users. There is the ‘traditional’
Newton computation - the Jacobian (or Hessian) is first computed and then the New-
ton step is determined by solving the linear Newton system - and there is the use of
an expanded Jacobian (Hessian) matrix formed through the use of structure [16]. The
latter can yield significant cost benefits. We will illustrate both approaches in this
chapter.
The Newton step s_N at the current point x is obtained by solving the linear system
J(x)·s_N = −F(x), where J(x) is the n-by-n Jacobian of F(x). ADMAT is well suited to
computing the Jacobian J(x).
If the Jacobian matrix J(x) is sparse, i.e., most of the entries of J remain zero for all x,
then ADMAT can be used both to determine this sparsity structure (once) and then
efficiently determine the values of the non-zero components of J(x) at each successive
point x. Each Jacobian evaluation is followed by a sparse linear solver to determine
the Newton step.
The pure Newton process is not recommended for general nonlinear systems and arbitrary
starting points, since it may not converge, but we include it here because it illustrates
the use of ADMAT in the context of a sparse nonlinear process. For general nonlinear
systems, we recommend procedures that force convergence from arbitrary starting points
(i.e., using a line search or trust region strategy). The MATLAB functions fsolve and
fminunc are examples of such procedures.
Function newton requires the user to pass several arguments: the name of the function
to be solved, starting point x, a solution tolerance, and a bound on the number of
iterations. newton will then apply the pure Newton process and return the solution
(with the given tolerance) or will return the iterate achieved upon exceeding the iter-
ation bound.
% set defaults for optional arguments that were not supplied (fragment of the argument handling)
case 3
    itNum = 100;
    Extra = [];
case 4
    Extra = [];
otherwise
end
x = x0;  n = length(x0);
% compute the sparsity pattern and coloring information once; JPI is reused at every iterate
JPI = getjpi(func, n, n, Extra);
% initialize the iteration counter and the residual norm
it = 0;
normy = inf;
% Newton steps
while (normy > tol) && (it < itNum)
    % evaluate the function value and the sparse Jacobian matrix at x
    [y, J] = evalj(func, x, Extra, [], JPI);
    % Newton step: solve J*delta = -y and update x
    delta = -J\y;
    normy = norm(y);
    x = x + delta;
    it = it + 1;
end
See DemoNewton.m
1. Set the problem size.
>> n = 5;
2. Initialize the random seed.
>> rand('seed',0);
3. Set starting point.
>> x = rand(n,1)
x=
0.2190
0.0470
0.6789
0.6793
0.9347
4. Apply the Newton process to the system arrowfun(x) = 0
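The call itself is not reproduced here; given the argument list described above (function name, starting point, tolerance, iteration bound), it presumably looks like this, assuming tol and itNum have been set earlier in the demo:

>> x = newton('arrowfun', x, tol, itNum)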
However, many (perhaps most) nonlinear systems with expensive dense Jacobians
show a certain structure in their computations and if the code to evaluate F is written
to expose this structure (illustrated below) then it turns out that the Newton step can
often be computed without fully computing the actual Jacobian matrix; this technique
can result in great cost savings [16].
Suppose that the computation z = F (x) can be broken down into the following ‘macro’
steps, performed in top-down order:
$$\begin{aligned}
\text{Solve for } y_1:&\quad F_1^E(x, y_1) = 0\\
\text{Solve for } y_2:&\quad F_2^E(x, y_1, y_2) = 0\\
&\quad\;\vdots\\
\text{Solve for } y_p:&\quad F_p^E(x, y_1, y_2, \cdots, y_p) = 0\\
\text{``Solve'' for output } z:&\quad z - F_{p+1}^E(x, y_1, y_2, \cdots, y_p) = 0
\end{aligned} \qquad (6.1)$$
where the square (extended) Jacobian matrix J^E is a block lower Hessenberg matrix:
$$J^E = \begin{bmatrix}
\frac{\partial F_1}{\partial x} & \frac{\partial F_1}{\partial y_1} & & & \\
\frac{\partial F_2}{\partial x} & \frac{\partial F_2}{\partial y_1} & \frac{\partial F_2}{\partial y_2} & & \\
\vdots & \vdots & \ddots & \ddots & \\
\frac{\partial F_p}{\partial x} & \frac{\partial F_p}{\partial y_1} & \cdots & \cdots & \frac{\partial F_p}{\partial y_p} \\
\frac{\partial F_{p+1}}{\partial x} & \frac{\partial F_{p+1}}{\partial y_1} & \cdots & \cdots & \frac{\partial F_{p+1}}{\partial y_p}
\end{bmatrix} \qquad (6.3)$$
Note that in (6.3) the pair (F^E, J^E) is evaluated at the current point x, and the current
vector y is implicitly defined by (6.1). If we label J^E consistently with the partition
illustrated in (6.3),
$$J^E = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \qquad (6.4)$$
then, assuming the uniqueness of the Newton step, matrix B is nonsingular and the
Newton step for the system F(x) = 0 at the current point can be computed from this partition.
Below is the description of newton_expand. The subsequent example illustrates the
use of this function to solve the above ODE.
[x, fval, it] = newton_expand(func, func_y, funG, x0, tol, itNum, Extra, p)
Input arguments
Output arguments
>> xT = rand(N,1)
xT =
0.2190
0.0470
0.6789
0.6793
0.9347
0.3835
0.5194
0.8310
Note that, we set the right hand side function of the ODE to be f (x) = x
so that the actual solution of the above ODE is y = ex .
6. Set the starting point x.
>> x = ones(n,1);
7. Solve the ODE by the expanded Newton computation.
>> [x, fval, it] = newton_expand(func, func_y, x, tol, itNum, Extra, p)
x=
1.2448
1.0482
1.9716
1.9725
2.5464
1.4674
1.6810
2.2955
fval =
1.0415e-015
it =
2
8. Compare the computed solution with the target function value.
The difference between the computed solution and the target function value:
norm(x - exp(xT)) = 2.605725e-007
Example 6.2.2. Newton step comparisons.
See DemoRawExp.m
Consider the composite function, F(x) = F̄(A⁻¹ F̃(x)), where F̄ and F̃ are Broyden
functions [7] (their Jacobian matrices are tridiagonal) and the structure of A is
based on a 5-point Laplacian defined on a square (√n + 2)-by-(√n + 2) grid. For each
nonzero element of A, A_ij is defined as a function of x, specifically A_ij = x_j. Thus,
the nonzero elements of A depend on x; the structure of A_x · v, for any v, is equal to
the structure of A.
The extended Jacobian for this computation is
$$J^E = \begin{bmatrix} \tilde{J} & -I & 0 \\ A_x y_2 & -I & A \\ 0 & 0 & \bar{J} \end{bmatrix}. \qquad (6.9)$$
The Newton step can be obtained by solving
$$J^E \begin{bmatrix} \delta x \\ \delta y_1 \\ \delta y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -F(x) \end{bmatrix}. \qquad (6.10)$$
In this example, we consider two approaches to the Newton step computation.
• Straight Newton computation. In this approach, the structure is not exploited.
Specifically, the Jacobian matrix J is formed by differentiating F using forward-mode
automatic differentiation (AD) (equivalent in cost to obtaining J column-by-column
using forward finite differences) [4]. Finally, the dense system J·sN = −F is solved
using the Matlab linear system solver ‘\’.
• Structured Newton computation. This approach involves forming J^E (6.9)
via structured automatic differentiation. Then the expanded system (6.10)
is solved using the Matlab linear system solver ‘\’.
Figure 6.1 plots the running times in seconds of one single step of the two Newton
approaches. The experiment was carried out using Matlab 6.5 (R13) on a laptop with
Intel 1.66 GHz Duo Core CPU and 1GB RAM. All the matrices in the experiments
are sparse except for the matrix J. Clearly, the Newton step computation is greatly
accelerated by exploiting the structure of F .
The structured Newton concept can also be applied to solving minimization prob-
lems. Suppose that the (sufficiently) smooth minimization problem,
$$\min_x f(x) \qquad (6.11)$$
is to be solved by Newton's method. The Newton step s_N at a point x satisfies
$$H(x)\, s_N = -\nabla f(x), \qquad (6.12)$$
where ∇f(x) is the gradient of f,
$$\nabla f(x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_n \end{pmatrix}, \qquad (6.13)$$
[Figure 6.1: Computation times of one step of the straight (“Raw”) Newton computation and one step of the structured Newton computation. Horizontal axis: Problem Size (n); vertical axis: CPU time (Sec), log scale.]
i.e., ∇f^T is the Jacobian of f; H(x) is the Hessian matrix, i.e., the symmetric matrix
of second derivatives of f:
$$H(x) = \nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \qquad (6.14)$$
Following the form of (6.1) we define a structured scalar-valued function z = f(x):
$$\begin{aligned} \text{Solve for } y:&\quad \tilde{F}^E(x, y) = 0\\ \text{``Solve'' for } z:&\quad z - \bar{f}(x, y_1, \cdots, y_p) = 0 \end{aligned} \qquad (6.15)$$
where
$$\tilde{F}^E = \tilde{F}^E(x, y_1, y_2, \cdots, y_p) = \begin{pmatrix} \tilde{F}_1^E(x, y_1, \cdots, y_p) \\ \tilde{F}_2^E(x, y_1, \cdots, y_p) \\ \vdots \\ \tilde{F}_p^E(x, y_1, \cdots, y_p) \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}.$$
Note that each yi , i = 1, · · · , p, is itself a vector (varying lengths). By our structured
assumption, F̃ E represents a triangular computation:
$$\begin{aligned}
\text{Solve for } y_1:&\quad \tilde{F}_1^E(x, y_1) = 0\\
\text{Solve for } y_2:&\quad \tilde{F}_2^E(x, y_1, y_2) = 0\\
&\quad\;\vdots\\
\text{Solve for } y_p:&\quad \tilde{F}_p^E(x, y_1, y_2, \cdots, y_p) = 0
\end{aligned} \qquad (6.16)$$
Similar to the Newton step computation for systems of nonlinear equations, a larger
but sparse system based on differentiating (6.15, 6.16) can be solved in order to obtain
the Newton step (6.12). The analogy to system (6.2) is, solve
$$H^E \begin{bmatrix} \delta w \\ \delta y \\ \delta x \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -\nabla_x f \end{bmatrix} \qquad (6.17)$$
where sN , the Newton step defined in (6.12), satisfies sN = δx, and H E is a sym-
metric Hessian matrix,
$$H^E = \begin{bmatrix} 0 & \tilde{F}_y^E & \tilde{F}_x^E \\ (\tilde{F}_y^E)^T & (\tilde{F}_{yy}^E)^T w + \nabla^2_{yy}\bar{f} & (\tilde{F}_{yx}^E)^T w + \nabla^2_{yx}\bar{f} \\ (\tilde{F}_x^E)^T & (\tilde{F}_{xy}^E)^T w + \nabla^2_{xy}\bar{f} & (\tilde{F}_{xx}^E)^T w + \nabla^2_{xx}\bar{f} \end{bmatrix} \qquad (6.18)$$
Permuting the blocks of H^E gives
$$H_P^E = \begin{bmatrix} \tilde{F}_x^E & \tilde{F}_y^E & 0 \\ 0 & (\tilde{F}_{yy}^E)^T w + \nabla^2_{yy}\bar{f} & (\tilde{F}_y^E)^T \\ (\tilde{F}_{xx}^E)^T w + \nabla^2_{xx}\bar{f} & 0 & (\tilde{F}_x^E)^T \end{bmatrix} = \begin{bmatrix} A & L \\ B & M \end{bmatrix},$$
The key point is that a cost-effective alternative to forming the Hessian matrix H
and in turn solving HsN = −∇x f is to compute H E via sparse AD or finite-difference
technology and then solve (6.17) with a sparse solver; sN = δx.
See DemoExpHess.m
We minimize z = F̄(F̃(x)) + xᵀx, where F̃(x) is the Broyden function and F̄(y) is a scalar-valued function,
$$\bar{F}(y) = \sum_{i=1}^{n} y_i^2 + \sum_{i=1}^{n-1} 5\, y_i y_{i+1}.$$
The triangular computation system (6.15) of z can be constructed as follows:
$$\begin{aligned} \text{Solve for } y:&\quad \tilde{F}^E(x, y) = y - \tilde{F}(x) = 0,\\ \text{``Solve'' for } z:&\quad z - \bar{f}(x, y) = z - [\bar{F}(y) + x^T x] = 0. \end{aligned}$$
$$H^E = \begin{bmatrix} 0 & -\tilde{J} & I \\ -\tilde{J}^T & \nabla^2_{yy}\bar{f} & 0 \\ I & 0 & (\tilde{F}_{xx}^E)^T w + \nabla^2_{xx}\bar{f} \end{bmatrix},$$
where I is the n-by-n identity matrix and w is equal to −∇_y f̄. Thus, one step of the
structured Newton computation for this minimization problem can be written as follows.
1. Initial value of the problem size.
>> Extra.y = x;
>> [z, grady, H2]= evalH(BarF, y, Extra, hpif);
(d) Compute the function value, gradient and Hessian of (F̃ E )T w with respect
to x .
>> Extra.y = y;
>> [z, grad, Ht] = evalH(TildeFw, x, Extra, hpiFw);
(f) Compute the function value and gradient of the original function.
>> HE = sparse(HE);
>> d = -HE\[zeros(2*n,1); gradx(:)];
>> x = x + d(2*n+1:3*n);
Note that this section gives only samples of the structured Newton computation for solving
nonlinear equations and minimization problems. Users can refer to [16] for details
about its advantages, applications, performance and parallelism.
The MATLAB optimization toolbox includes solvers for many nonlinear problems,
such as multidimensional nonlinear minimization, nonlinear least squares with upper
and lower bounds, nonlinear system of equations, and so on. Typically, these solvers
use derivative information such as gradients, Jacobians, and Hessians. In this chap-
ter, we illustrate how to conveniently use ADMAT to accurately and automatically
compute derivative information for use with the MATLAB Optimization Toolbox.
The nonlinear least squares problem is
$$\min_x \|F(x)\|_2^2,$$
where F(x) maps R^n to R^m and ‖·‖₂ is the 2-norm. By default this solver employs the finite
difference method to estimate the required derivatives (unless users specify an alternative).
ADMAT, for example, can be specified as an alternative to finite differences.
Users just need to set a flag in the input argument ‘options’ without changing any original
code. The following examples illustrate how to solve nonlinear least squares with
ADMAT used to compute derivatives. For the details of the input and output arguments
of the MATLAB solver, ‘lsqnonlin’, please refer to the MATLAB help documentation.
See DemoLSq.m
1. Set the problem size.
>> n = 5;
3. Initialize x.
>> x0 = rand(n,1)
x0 =
0.2190
0.0470
0.6789
0.6793
0.9347
4. Set the lower bound for x.
>> l = -ones(n,1);
5. Set the upper bound for x.
>> u = ones(n,1);
7. Turn on the Jacobian flag in input argument ‘options’. This means that the user
will provide the method to compute Jacobians (In this case, the use of ADMAT).
9. Call ‘lsqnonlin’ to solve the nonlinear least squares problem using ADMAT to
compute derivatives.
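A rough sketch of steps 7–9 (the exact calls are in DemoLSq.m; the use of the Broyden function and of ADfun here are assumptions based on the rest of this guide, while optimset and lsqnonlin are standard Optimization Toolbox calls):

>> options = optimset('Jacobian', 'on');               % step 7: the user supplies Jacobians
>> myfun = ADfun('broyden', n);                        % step 8 (assumed): overload feval so ADMAT computes them
>> [x, resnorm] = lsqnonlin(myfun, x0, l, u, options)  % step 9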
min f(x)
s.t. A·x ≤ b, Aeq·x = beq (linear constraints)
     c(x) ≤ 0, ceq(x) = 0 (nonlinear constraints)
     l ≤ x ≤ u,
where f(x) maps R^n to R^1. The MATLAB solver for this problem is ‘fmincon’.
min f(x),
where f(x) maps R^n to a scalar. This unconstrained problem can be solved by ‘fminunc’.
ADMAT can be used to compute the gradient or both the gradient and the
Hessian matrix.
See DemoFminunc.m
min brown(x),
where brown(x) is the Brown function defined in §3.1. In this example, we solve the
problem twice. The first time, ADMAT is used to compute gradients only, and
Hessians are estimated by the default finite difference method. The second time,
ADMAT is used to compute both gradients and Hessians.
>> n = 5;
3. Initial value of x.
>> x0 = rand(n,1)
x0 =
0.2190
0.0470
0.6780
0.6793
0.9347
4. Set the function to be differentiated by ADMAT. The function call ‘feval’ is over-
loaded by the one defined in ADMAT. It can return the function value, gradient
and Hessian in each ‘feval’ call (See §3.1 for details).
6. Solve the problem using ADMAT to determine gradients (but not Hessians).
(a) Turn on the gradient flag in input argument ‘options’ (but not the Hessian
flag). Thus, the solver will use ADMAT to compute gradients, but will
estimate Hessians by the finite difference method.
(b) Call the MATLAB unconstrained nonlinear minimization solver ‘fminunc’ with
ADMAT used to determine gradients only.
7. Solve the problem using ADMAT to compute both gradients and Hessians.
(a) Turn on both gradient and Hessian flags in input argument ‘options’. Thus,
the solver will use the user-specified method (ADMAT) to compute both
gradients and Hessians.
(b) Call the MATLAB unconstrained nonlinear minimization solver ‘fminunc’ using
ADMAT to compute derivatives.
−0.1487
−0.1217
0.2629
−0.0173
−0.5416
FVAL =
4.8393e − 009
Example 7.2.2 Solve the constrained nonlinear minimization
problems using ADMAT.
See DemoFmincon.m
>> n = 5;
2. Initialize the random seed.
>> rand('seed',0);
3. Initialize x.
>> x0 = rand(n,1)
x0 =
0.2190
0.0470
0.6789
0.6793
0.9347
4. Set the lower bound for x.
>> l = -ones(n,1);
5. Set the upper bound for x.
>> u = ones(n,1);
8. Set up ‘options’ so that both gradients and Hessians are computed by ADMAT.
1. Set up the Jacobian, gradient and Hessian flags in the input argument ‘options’.
2. Set the function to be differentiated by ADMAT using the ‘ADfun’ function call
to overload ‘feval’.
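In code, these two steps look roughly like this (the flag names are the standard optimset options; which of them are needed depends on the solver being called):

>> options = optimset('Jacobian', 'on', 'GradObj', 'on', 'Hessian', 'on');
>> myfun = ADfun('brown', 1);   % scalar-valued objective; use ADfun(name, m) for a mapping into R^m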
ADMAT can differentiate any function defined in an M-file. ADMAT cannot be ap-
plied to any external files, such as MEX files. However, ADMAT can be combined
with finite-differencing to enable M-file/external file combinations.
The C file, mexbroy.c, for the Broyden function and its MEX function are as fol-
lows.
/*************************************************************
%
% Evaluate the Broyden nonlinear equations test function.
%
%
% INPUT:
% x - The current point (row vector).
%
%
% OUTPUT:
% y - The (row vector) function value at x.
%
****************************************************************/
#include "mex.h"
#define x_IN 0
#define y_OUT 0
n = mxGetM(prhs[x_IN]);
if (n == 1)
    n = mxGetN(prhs[x_IN]);
x = mxGetPr(prhs[x_IN]);
y = mxGetPr(plhs[y_OUT]);
mexbroy(n, x, y);
return;
}
Once the compilation succeeds, the Broyden function can be called as a MATLAB
function. File CBroy.m integrates mexbroy.c file into ADMAT via finite-differencing.
function y = CBroy(x)
global globp;
global fdeps;
n = length(x);
if isa(x, 'deriv')                    % x is an object of the deriv class
    val = getval(x);                  % get the value of x
    drv = getydot(x);                 % get the derivative part of x
    y = mexbroy(val);                 % compute the function value at x
    ydot = zeros(getval(n), globp);   % initialize the derivative of y
    % compute the derivative by the finite difference method
    for i = 1 : globp
        tmp = mexbroy(val + fdeps*drv(:,i));
        ydot(:,i) = (tmp - y)/fdeps;
    end
    y = deriv(y, ydot);               % return a deriv object carrying value and derivative
else
    y = mexbroy(x);                   % plain double input: just evaluate the MEX function
end
>> n = 5;
>> x = ones(n,1);
>> x = deriv(x, eye(n));
>> y = CBroy(x)
val =
0
−1
−1
−1
1
deriv =
−1.0000 −2.0000 0 0 0
−1.0000 −1.0000 −2.0000 0 0
0 −1.0000 −1.0000 −2.0000 0
0 0 −1.0000 −1.0000 −2.0000
0 0 0 −1.0000 −1.0000
4. Extract Jacobian matrix from the finite differencing.
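The extraction call is not shown in this extract; since y is a deriv object, its derivative field can be read with getydot (described in §5.3):

>> J = getydot(y)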
Troubleshooting
Below we list some potential problems that may occur in the use of ADMAT.
This usually means a deriv class object is assigned to a double class variable.
Check both sides on the assignment and make sure both sides are of the same
type.
Some MATLAB functions are not overloaded in ADMAT yet. Please contact
Cayuga Research to extend ADMAT to the MATLAB functions of your
interest.
ADMAT is not installed yet. Please refer to Chapter 2 to make sure ADMAT
is properly installed.
ADMAT detects a possible license error. Please restart ADMAT. If you get
the following message,
“The ADMAT 2.0 license has expired. Please contact Cayuga Research for a license extension”,
it means the license for ADMAT 2.0 has expired. Please contact us to renew the
license.
5. Do not use Matlab command “clear all” to clear your workspace while using
ADMAT. This will remove all ADMAT global variables from memory: unpre-
dictable errors may then occur. Instead, use “clear” selectively as needed.
6. ADMAT cannot perform 3-D or higher-dimensional operations. ADMAT only
supports 1-D and 2-D matrix operations.
7. Derivatives are incorrect. Please make sure the following issues have been checked.
If there is still an error, please contact Cayuga Research for further help.
Applications of ADMAT
In this chapter we illustrate two applications of ADMAT. First, we show how to trigger
the quasi-Newton computation in MATLAB unconstrained nonlinear minimization
solver ‘fminunc’ with ADMAT used to determine gradients. Second, we present a
sensitivity problem.
See DemoQNFminunc.m
1. Set the problem size.
>> n = 5;
2. Initialize the random seed.
>> rand('seed',0);
3. Initialize x.
>> x0 = rand(n,1)
x0=
0.2190
0.0470
0.6789
0.6793
0.9347
6. Turn on the gradient flag of input argument ‘options’. Thus, the solver uses the
user-specified method (i.e., ADMAT) to compute gradients.
7. Set the ‘LargeScale’ flag to ‘off’, so that the quasi-Newton method will be used.
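A sketch of steps 6–7 and the solver call (DemoQNFminunc.m has the exact code; passing the ADfun object as the objective is how ADMAT's overloaded feval gets invoked, and the Brown function is assumed as in §3.1):

>> options = optimset('GradObj', 'on', 'LargeScale', 'off');
>> myfun = ADfun('brown', 1);
>> [X, FVAL] = fminunc(myfun, x0, options)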
−0.1487
−0.1217
0.2629
−0.0173
−0.5416
FVAL =
4.8393e-009
See DemoSens.m
The function brownv(x, V ) is similar to the Brown function defined in §3.1. Its defi-
nition is as follows.
function f = brownv(x,V)
% length of input x
n = length(x);
% if either input is a 'deriv' class object, convert n to the 'deriv' class as well
>> n = 5;
2. Set the initial value of V to 0.5. We first solve the minimization problem,
min brownv(x, V ), at V = 0.5, then analyze the sensitivity of brownv(xopt , V )
with respect to V .
>> V = 0.5
>> rand('seed',0);
>> x0 = 0.1*rand(n,1)
x0 =
0.2190
0.0470
0.6789
0.6793
0.9347
6. Define V as a ‘deriv’ class object.
>> V = deriv(V,1);
7. Compute the function value of brownv(x, V) at the optimal point x1 with the ‘deriv’
class object V.
>> f = brownv(x1, V)
val =
0.0795
deriv =
-0.0911
• Differentiate the function f (x, µ) with respect to µ at the optimal point xopt to
get the sensitivity of f (xopt , µ) with respect to µ.
[11] T. F. Coleman and A. Verma, Structure and efficient Hessian calculation, Advances
in Nonlinear Programming, Yaxiang Yuan (ed.), 1998, 57-72.
[16] T. F. Coleman and W. Xu, Fast Newton computations, SIAM J. Sci. Comput.,
Vol.31, 2008, 1175-1191.
[20] A. Griewank, D. Juedes and J. Utke, Algorithm 755: ADOL-C: A Package for the
Automatic Differentiation of Algorithms Written in C/C++, ACM Transactions
on Mathematical Software, vol 22, 1996, 131–167.
[24] S. Stamatiadis, R. Prosmiti and S. C. Farantos, auto deriv: Tool for automatic
differentiation of a fortran code, Comput. Phys. Commun., Vol. 127, 2000, 343-
355.