
ADMAT: Automatic Differentiation Toolbox

For use with MATLAB

User’s Guide

Cayuga Research

Version 2.0

© 2008-2010 Cayuga Research. All Rights Reserved.
Contents

1 Introduction

2 Installation of ADMAT
2.1 Requirements
2.2 Obtaining ADMAT
2.3 Installation Instructions for Windows Users
2.4 Installation Instructions for Unix (Linux) Users

3 Computing Gradients, Jacobians and Hessians
3.1 Computing Gradients
3.2 Computing Jacobians
3.3 Computing Hessians
3.4 Data Type Consistency in Assignment Operation

4 Sparse Jacobian and Hessian Matrices
4.1 Sparse Jacobian Matrix Computation
4.2 Sparse Hessian Computation
4.3 Reporting

5 Advanced ADMAT
5.1 Computing Sparsity Patterns of Jacobian and Hessian Matrices
5.2 Efficient Computation of Structured Gradients
5.3 Forward Mode AD
5.4 Storage of deriv Class
5.5 Reverse Mode AD
5.6 Computing Second-Order Derivatives
5.7 1-D Interpolation in ADMAT

6 Newton Computations
6.1 Traditional Newton Computation
6.2 Structured Newton Computation

7 Using ADMAT with the MATLAB Optimization Toolbox
7.1 Nonlinear Least Squares Solver ‘lsqnonlin’
7.2 Multidimensional Nonlinear Minimization Solvers ‘fmincon’ and ‘fminunc’

8 Combining C/Fortran with ADMAT

9 Troubleshooting

A Applications of ADMAT
A.1 Quasi-Newton Computation
A.2 A Sensitivity Problem



Chapter 1

Introduction

Many scientific computing tasks require the repeated computation of derivatives.


Hand-coding of derivative functions can be tedious, complex, and error-prone. Moreover, the computation of first and second derivatives, and sometimes the Newton step, is often a dominant expense in a scientific computing code. Derivative approximations such as finite differences involve additional errors (and heuristic choices of parameters).

This toolbox is designed to help a MATLAB user compute first and second derivatives,
and related structures, efficiently, accurately, and automatically. ADMAT employs
many sophisticated techniques, exploiting sparsity and structure, to gain efficiency
in the calculation of derivative structures (e.g., gradients, Jacobians, and Hessians).
Moreover, ADMAT can directly calculate Newton steps for nonlinear systems, often
with great efficiency.

To use ADMAT, a MATLAB user need only supply an M-file to evaluate a smooth
nonlinear ‘objective function’ at a given argument. On request and when appropriate,
ADMAT will ensure that, in addition to the objective function evaluation, the Jacobian matrix (the gradient is a special case) and the Hessian matrix (i.e., the symmetric matrix of second derivatives), and possibly the Newton step, will also be evaluated at the supplied argument. The user need not supply derivative codes or approximation schemes.

ADMAT 2.0 Features:


• Efficient gradient computation (by ‘reverse mode’).

• Efficient evaluation of sparse Jacobian and Hessian matrices.

• A template design for the efficient calculation of ‘structured’ Jacobian and Hessian matrices.

• Efficient direct computation of Newton steps (in some cases avoiding the full computation of the Jacobian and/or Hessian matrix).

• Mechanisms and procedures for combining automatic differentiation of M-files with finite-difference approximation for MEX files (for C and Fortran subfunctions).

... and for the aficionado


• “Forward” mode of automatic differentiation: a new MATLAB class “deriv”, which overloads more than 100 MATLAB built-in functions.

• “Reverse” mode of automatic differentiation: a new MATLAB class “derivtape”, which uses a virtual tape to record all operations and overloads more than 100 MATLAB built-in functions.

• MATLAB interpolation function INTERP1 is available.

Limitations of ADMAT

ADMAT supports the most frequently used computations for first and second derivatives. However, a limitation of the current version of ADMAT is that it does not support second-derivative computation for a matrix. In other words, any matrix appearing in a function for which second derivatives are computed must be a constant, not a variable.

Guide Organization
• Chapter 2: ADMAT installation.

• Chapter 3: Computing gradients, Jacobians and Hessians.

• Chapter 4: Sparse and structured computations.

• Chapter 5: Advanced use of ADMAT; Additional sparsity computations, etc.

• Chapter 6: Newton computations.

• Chapter 7: Using ADMAT with the MATLAB Optimization Toolbox.

• Chapter 8: Connecting with MEX files (Fortran, C) and finite-differencing.

• Chapter 9: Errors that may occur. Troubleshooting.


• Appendix A : ADMAT application examples.

• Bibliography.

Acknowledgement

ADMAT 2.0 builds on the original work of Thomas Coleman and Arun Verma (ADMAT [12], ADMIT-2 [14]). ADMAT 2.0 has increased functionality and is more efficient than those pioneering efforts. The technology behind ADMAT 2.0 is derived from research published by Coleman and colleagues over a number of years; many of the most relevant publications are listed in the Bibliography.

We thank Arun Verma for several illuminating discussions over the past years.



Chapter 2

Installation of ADMAT

In this chapter, the installation of ADMAT on Unix (Linux) and Windows platforms is discussed.

2.1 Requirements
ADMAT belongs to the “operator overloading” class of AD tools and uses object-oriented programming features. Thus, ADMAT requires MATLAB 6.5 or above.

2.2 Obtaining ADMAT


The complete ADMAT package, ADMAT 2.0 Professional, can be licensed from
Cayuga Research. See www.cayugaresearch.com for details. ADMAT 2.0 Professional
can be evaluated free of charge for a period of up to 3 weeks.

A package of reduced functionality, ADMAT 2.0 Student, can be obtained from Cayuga
Research and evaluated free of charge for a period of up to 3 weeks. Note that ADMAT 2.0 Student computes only first derivatives, by the forward and reverse modes of automatic differentiation. Hence a user of ADMAT 2.0 Student can only use the functionality described in §5.3, 5.4 and 5.5 and test the corresponding demos in the Demos\Chapter 5 directory.

2.3 Installation Instructions for Windows Users


ADMAT is supplied in the zipped file ADMAT-2.0.zip. Please follow these installation
instructions.


1. Place the ADMAT package in an appropriate directory and unzip it using any unzip utility.

2. There are two ways to set the search path in MATLAB.

******************* Method 1. *******************

(a) Click “File” in the MATLAB window.
(b) Choose the “Set Path” option.
(c) Click the “Add with Subfolders” button.
(d) Find the target directory of the ADMAT package in the “Browse for Folders” window and click “OK”.
(e) Click the “Save” button to save the path for ADMAT and click the “Close” button.
(f) Type “startup” at the MATLAB prompt, or restart MATLAB.

******************* Method 2. **********************


Access the ADMAT directory; edit the startup.m file; manually add ALL subdirectories of ADMAT to the search paths in the file; save the file; and type startup at the MATLAB prompt to set up the paths for the package (a minimal startup.m sketch is given after these steps).

3. A “success” message is displayed when ADMAT is correctly installed. Users can type “help ADMAT-2.0” at the MATLAB prompt to get a list of the main functions in the package.
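As an illustration of Method 2, a minimal startup.m might look like the sketch below; the folder name is only a placeholder for wherever the package was unzipped:

% startup.m -- a minimal sketch; adjust the root folder to your own installation
admatRoot = 'C:\ADMAT-2.0';      % hypothetical location of the unzipped package
addpath(genpath(admatRoot));     % add ADMAT and ALL of its subdirectories to the path
savepath;                        % optional: keep the paths across MATLAB sessions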

2.4 Installation Instructions for Unix (Linux) Users


1. Unzip ADMAT-2.0.zip using “unzip ADMAT-2.0.zip” in the Unix (Linux) prompt.

2. Follow Method 2 listed above.



Chapter 3

Computing Gradients, Jacobians and Hessians

3.1 Computing Gradients


One of the most common computations in scientific computing is the calculation of
the gradient of a scalar-valued function. That is, if f is a differentiable function that
maps n-vectors x to scalars, i.e., f : Rn → R1 , the gradient of f at a point x is the
vector of first derivatives:
$$\nabla f(x) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}.$$

ADMAT computes the gradient ∇f (x) automatically given both the MATLAB M-file
to evaluate f (x) and the value of the argument x (an n-vector). ADMAT can compute
the gradient using the overloaded MATLAB function ‘feval’.

If “forward mode” is chosen, the gradient is computed in time comparable to that


required by forward finite differencing. However, ADMAT does not require a differencing parameter and incurs no truncation error. The advantage of forward mode, compared to reverse mode, is that there are no ‘extra’ space requirements.

“Reverse mode” computes the gradient in time proportional to the time required
to evaluate the function f itself. This is optimal; reverse-mode can be approximately
n-times faster than forward mode when applied to the computation of gradients. The
downside of using reverse mode is that the space requirements can be quite large since
the entire computational graph required to evaluate f must be saved and then accessed in reverse order. See §5.2 for advice on how to use the inherent structure in the program that evaluates f to significantly reduce the space requirements.

The overloaded ADMAT function ‘feval’ provides a universal interface to compute gradients. The function to be differentiated is prepared for this interface through the function ‘ADfun’.

We first illustrate with an example and then describe the general situation, followed
by a second example.

Example 3.1.1. Compute the gradient of the Brown function.

See DemoGrad.m

This example shows how to compute the gradient of the Brown function in ADMAT
in three different ways: default (reverse), forward, and reverse. The definition of the
Brown function is as follows:
$$y = \sum_{i=1}^{n-1} \left\{ \left(x_i^2\right)^{x_{i+1}^2+1} + \left(x_{i+1}^2\right)^{x_i^2+1} \right\}.$$

A MATLAB function to evaluate the Brown function is given below:

function value = brown(x,Extra)


% Evaluate the problem size.
n = length(x);
% Initialize intermediate variable y.
y=zeros(n,1);
i=1:(n-1);
y(i)=(x(i).^2).^(x(i+1).^2+1);
y(i)=y(i)+(x(i+1).^2).^(x(i).^2+1);
value=sum(y);

The following is an illustration of the use of ADMAT with the Brown function:
1. Set the function to be differentiated to be the Brown function.
>> myfun = ADfun(’brown’, 1);

Note: the second input argument in ADfun, ‘1’, is a flag indicating a scalar
mapping, f : Rn → R1 ; more generally, the second argument is set to ‘m’ for a
vector-valued function, F : Rn → Rm .

2. Set the dimension of x.


>> n = 5;

3. Initialize vector x.

>> x = ones(n,1)
x=
1
1
1
1
1

4. Call feval to get the function value and the gradient of Brown function, allowing
ADMAT to choose the mode (by default ADMAT chooses to use reverse mode
for computing the gradients).

>> [f, grad] = feval(myfun, x)


f=
8
grad =
4 8 8 8 4

5. Use the forward mode to compute the gradient of f . In this case the third input
argument of feval is set to the empty array, ‘[ ]’ since no parameters are stored
in the input variable Extra.

% Compute the gradient by forward mode


%
% set options to forward mode AD. Input n is the problem size.
>> options = setgradopt(’forwprod’, n);
% compute gradient
>> [f,grad] = feval(myfun, x, [ ], options)
f=
8
grad =
4 8 8 8 4

6. Use the reverse mode to compute the gradient g. As in the above case, the input Extra is ‘[ ]’.


% Compute the gradient by reverse mode


%
% set options to reverse mode AD. Input n is the problem size.
>> options = setgradopt(’revprod’, n);
% compute gradient
>> [f, grad] = feval(myfun, x, [], options)
f=
8
grad =
4 8 8 8 4
Functions ADfun and feval can be summarized as follows.

Description of ADfun and feval

fun= ADfun(infun,scalar)

Input arguments
‘infun’ is the function to be differentiated; a string representing the function name.
‘scalar’ is the dimension of the objective function value; a scalar
Output arguments
‘fun ’ is the function to overload feval; a MATLAB cell structure.

[f, g] = feval(fun, x, Extra, options).

Input arguments
‘fun’ is the function to be differentiated; an ADMAT ADfun class object.
‘x’ is a vector of the independent variables; either a row or column vector.
vector x is the ‘point’ at which the function and its gradient will be evaluated.
‘Extra’ stores parameters required in function fun; a MATLAB cell structure.
‘options’ allows the user to choose the forward mode or reverse mode of auto-
matic differentiation. It has to be defined through ADMAT function setgradopt.
The default mode for computing gradients for feval is the reverse mode.
Output arguments
‘f ’ is the function value at point x; a scalar.
‘g ’ is the gradient of f ; a row vector.


options = setgradopt(mode, siz)

Input arguments

‘mode’ sets up the ‘mode’ of automatic differentiation, a string.

• mode = ‘forwprod’: compute gradients by the forward mode.

• mode = ‘revprod’: compute gradients by the reverse mode.

‘siz ’ is the size of the problem, a scalar.

There is a requirement on defining the function to be differentiated when invoking


ADMAT in this manner: the function interface must contain just one output and two
input arguments. For example,

y = functionName(x, Extra),

where y is the output, x is the independent input variable and Extra is a MATLAB cell
structure, which stores all other parameters required in functionName. When there
are no parameters stored in Extra, the empty array, ‘[ ]’ is passed. The definitions of
the Broyden and Brown functions in this chapter satisfy the requirement.

What can be done when a function has more than two input arguments? For example, suppose y = f(x, mu, gamma), where x is the independent variable. In this case the second and third input arguments can be encapsulated as Extra.mu and Extra.gamma. Thus, the new function call is y = f(x, Extra). Users can store scalars, vectors and even matrices as fields in Extra, but currently, matrices stored in MATLAB sparse format are not supported. We illustrate how to rewrap the original function to satisfy the interface requirement in Example 3.1.2.

Example 3.1.2. Compute the gradient of a weighted mean function.

See DemoFeval.m

This example illustrates the use of input parameter ‘Extra’. The weighted mean of a
vector is computed.


function val = mean_weighted(x, mu, n)


%
% Compute the mean of the n vector x
% with the weight mu.
%
% INPUT
% x – a vector x
% mu – weight for mean computation
% n – length of x
%
% OUTPUT
% val – weighted mean of x
%
y = mu .* x;
val = sum(y)/n;
Obviously, the above function does not satisfy the two-input requirement mentioned above for function feval. So we revise the function as indicated to satisfy the requirement.

function val = mean_feval(x, Extra)


%
% Compute the mean of the n vector x
% with the weight mu.
%
% Note that: this function satisfies
% the requirement for feval.
%
% INPUT
% x – vector x
% Extra – stores other parameters, mu and n
%
% OUTPUT
% val – weighted mean of x
%
mu = Extra.mu;
n = Extra.n;
y = mu .* x;
val = sum(y)/n;


Now, the function mean_feval satisfies the two-input argument requirement, so we can use feval to compute its gradient.

1. Set problem size.

>> n = 5;

2. Initialize the random seed.

>> rand(‘seed’,0);

3. Define a vector x.

>> x = rand(n,1)
x=
0.2190
0.0470
0.6789
0.6793
0.9347

4. Define a weight vector mu.

>> mu = rand(n,1)
mu =
0.3835
0.5194
0.8310
0.0346
0.0535
>> mu = mu/sum(mu)
mu =
0.2105
0.2851
0.4561
0.0190
0.0293

5. Assign variables mu and n to Extra.

>> Extra.mu = mu;


>> Extra.n = n;


6. Set mean_feval as the function to be differentiated.

>> myfun = ADfun('mean_feval', 1);

7. Compute the gradient of mean_feval

>> [f, grad ] = feval(myfun, x, Extra);


f=
0.0819
grad =
0.0421 0.0570 0.0912 0.0038 0.0059
This two-input argument requirement can limit the flexibility of ADMAT. For example, if users would like to compute the derivatives of the function y = f(x, mu, gamma) with respect to x, mu and gamma, respectively, the function must be rewrapped three times to satisfy the requirement before calling feval. In this situation, users can use the advanced features of ADMAT described in §5.3 and §5.5 to compute the derivatives by the forward or reverse mode without the need to rewrap the function interface, as sketched below.
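For instance, a minimal sketch along the following lines (it assumes the forward-mode deriv class of §5.3 and uses the weighted-mean function above; the numerical values are arbitrary) differentiates the original three-input function with respect to mu, while x is treated as a constant:

% A minimal sketch, assuming the deriv class of Section 5.3 is available.
n  = 5;
x  = rand(n,1);                   % x is held constant here
mu = deriv(rand(n,1), eye(n));    % mu is the active variable, seeded with the identity
val = mean_weighted(x, mu, n);    % call the original three-input interface directly
dval_dmu = getydot(val);          % derivative of the weighted mean with respect to mu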

The gradient is a special case of the first derivative of a function F : Rn → Rm


(i.e., m = 1). In the next section we discuss the general situation when m > 1.

3.2 Computing Jacobians


The Jacobian is the first order derivative of a vector-valued function F . That is, if
F is a differentiable vector-valued function, F : Rn → Rm , then its corresponding
Jacobian, evaluated at a point x, is an m-by-n matrix:
$$J(x) = \begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & & & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix},$$
where $y = F(x)$, $y = (y_1, y_2, \cdots, y_m)^T$ and $x = (x_1, x_2, \cdots, x_n)^T$.

Given a user-supplied MATLAB function (M-file) to evaluate the function F , and


the current vector x, ADMAT can automatically (and accurately) determine the Jacobian matrix J(x) at the point x. In fact, ADMAT is more general than this. Given an arbitrary matrix V with n rows, ADMAT can directly and accurately compute the product J(x) · V, using the ‘forward mode’ version of automatic differentiation. F is differentiated along the columns of V to obtain the result J(x) · V. So if a user chooses V to be the n-by-n identity matrix I, then the forward-mode result is the Jacobian matrix J(x). Alternatively, given an arbitrary matrix W with m rows, the ‘reverse mode’ of ADMAT will produce, directly and accurately, the product J^T(x) · W. So if the user chooses W to be the m-by-m identity matrix, then the reverse mode of ADMAT produces the transpose of the Jacobian matrix, J^T(x).

When the ‘forward mode’ in ADMAT is used, given the MATLAB function to evaluate F, the ‘current point’ x, and a matrix V (with n rows), the product J(x) · V is determined automatically in time proportional to c_V · ω(F), where c_V is the number of columns of V and ω(F) is the time taken for a single evaluation of F. There are no significant additional space requirements when the ‘forward mode’ version of ADMAT is used. Alternatively, when the ‘reverse mode’ in ADMAT is used, given the MATLAB function to evaluate F, the ‘current point’ x, and a matrix W (with m rows), the product J^T(x) · W is determined automatically in time proportional to c_W · ω(F), where c_W is the number of columns of W. The ‘reverse mode’ accesses the computational graph representing the evaluation of F in reverse order; therefore, ‘reverse mode’ requires that the computational graph be saved, and this can result in serious additional space demands. Thus, from a strict time-complexity point of view, when m << n, the reverse mode is preferable to the forward mode; however, this advantage can sometimes be mitigated when the space requirements become excessive. See Chapters 4 and 5 to see how sparsity and structure can be used to reduce the space requirements for the ‘reverse mode’ option.

The following example illustrates how to compute Jacobians and their products.

Example 3.2.1. Compute the Jacobian of the Broyden function by feval.

See DemoJac.m

The Broyden function is derived from a chemical engineering application [7]. Its
definition is as follows.


function fvec = broyden(x, Extra)


% Evaluate the length of the input
n = length(x);
% Initialize fvec. It has to be allocated in memory first in ADMAT.
fvec=zeros(n,1);
i=2:(n-1);
fvec(i)= (3-2.*x(i)).*x(i)-x(i-1)-2*x(i+1) + 1;
fvec(n)= (3-2.*x(n)).*x(n)-x(n-1)+1;
fvec(1)= (3-2.*x(1)).*x(1)-2*x(2)+1;

1. Set the problem size.

>> n = 5

2. Set ‘broyden’ as the function to be differentiated.

>> myfun = ADfun(’broyden’, n);

Note that the Broyden function is a vector-valued function which maps Rn to Rn. Thus, the second input argument in ADfun is set to n, the dimension of the function value (the row dimension of the Jacobian).

3. Set the independent variable x as an ‘all ones’ vector.

>> x = ones(n,1)
x=
1
1
1
1
1

4. Call feval to compute the function value and the Jacobian matrix at x. We omit
the input argument (Extra) when calling feval, given that it is empty.

>> [F, J] = feval(myfun, x)


F=


0
−1
−1
−1
1
J=
−1 −2 0 0 0
−1 −1 −2 0 0
0 −1 −1 −2 0
0 0 −1 −1 −2
0 0 0 −1 −1

5. Using the forward mode AD, compute the product J(x) ×V , where V is an n×3
‘all ones’ matrix. Set the third input argument to ‘[ ]’, since no parameters are
stored in Extra.

>> V = ones(n,3);
>> options = setopt(’forwprod’, V); % set options to forward mode AD
>> [F,JV] = feval(myfun, x, [ ], options)
F=
0
−1
−1
−1
1
JV =
−3 −3 −3
−4 −4 −4
−4 −4 −4
−4 −4 −4
−2 −2 −2

6. Using the reverse mode AD compute the product of J T (x) × W , where W is an


n × 3 ‘all ones’ matrix. We pass ‘[ ]’ to the third input argument of feval since
no parameters are stored in Extra.

>> W = ones(n,3);
>> options = setopt(’revprod’, W); % set options to reverse mode AD
>> [F, JTW] = feval(myfun, x, [], options)
F=


0
−1
−1
−1
1
JTW =
−2 −2 −2
−4 −4 −4
−4 −4 −4
−4 −4 −4
−3 −3 −3

3.3 Computing Hessians


The Hessian matrix, H, is the symmetric matrix of second derivatives of a twice
continuously-differentiable scalar-valued function f : Rn → R1 :
$$H = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}.$$

ADMAT can compute the Hessian matrix, using feval, given the M-file for defining
the scalar-valued function f and an argument x.

Function feval has two interfaces for Hessian computation. One interface returns
the function value, gradient, and Hessian matrix; the other returns the product of the
Hessian, H, and a matrix V .
• [f, grad, H] = feval(fun, x, Extra)
Input arguments
‘fun’ is the function to be differentiated; a string representing the function
name.
‘x’ is the vector of the independent variables; either a row or column vector.
‘Extra’ stores parameters required in function fun; a MATLAB cell structure.

Output arguments
‘f ’ is the function value at point x; a scalar.
‘grad’ is the gradient at point x; a row vector.
‘H’ is the Hessian matrix evaluated at point x; a matrix.


• HV = feval(fun, x, Extra, options)


Input arguments
‘fun’ is the function to be differentiated; a string representing the function
name.
‘x’ is a vector of the independent variables (can be either a column or row
vector).
‘Extra’ stores parameters required in function fun.
‘options’ is the flag for computing H × V.
− options = ‘HtimesV’: compute H × V, where H is the Hessian. The matrix V is one of the entries of options; a MATLAB cell structure.

Output Argument
‘HV ’ is the product H × V , a matrix.

Example 3.3.1. Compute the gradient and Hessian of the Brown function using function “feval”.

See DemoHess.m

1. Set the Brown function as the function to be differentiated.


>> myfun = ADfun(’brown’, 1);

2. Set the problem size.

>> n = 5;

3. Set the independent variable x to be an ’all ones’ vector.

>> x = ones(n,1)
x=
1
1
1
1
1
4. Call feval to get the function value, gradient and Hessian of the Brown function.
We omit the third input argument, Extra, since it is empty.


>> [v, grad, H] = feval(myfun, x)


v=
8
grad =
4 8 8 8 4
H=
12 8 0 0 0
8 24 8 0 0
0 8 24 8 0
0 0 8 24 8
0 0 0 8 12

5. Compute the product of H and V , where V is an n × 3 ‘all ones’ matrix. We


pass ‘[ ]’ to the third input argument of feval since no parameters are stored in
Extra.

>> V = ones(n,3);
>> options = setopt(’htimesv’, V); % set options for H times V
>> HV = feval(myfun, x, [], options)
HV =
20 20 20
40 40 40
40 40 40
40 40 40
20 20 20

In summary, function feval provides a common interface to compute gradients, as


well as Jacobian and Hessian matrices. When there is sparsity or structure present
the use of feval in the manner described above may not be efficient. In Chapter 4,
we will discuss the efficient computation of sparse Hessian and Jacobian matrices,
especially when evaluating Hessian or Jacobian matrices at different points with the
same structure. Efficiency in the presence of more general structure, beyond sparsity,
can be achieved using the techniques described in §5.2.

3.4 Data Type Consistency in Assignment Operation
Due to automatic data type conversion in MATLAB 7.3 and above, we highly recommend that users call the function cons after initializing the dependent variable in their own function definition. This will ensure that the data type of the dependent variable is consistent with that of the input independent variable. Otherwise, undetected errors in the derivative computation may occur.

Descriptions of cons function

y = cons(y, x),

Make the data type of y consistent with that of x.

Input arguments

“y” is the dependent variable, whose data type should be consistent with that of x.
“x” is the independent variable.

Output arguments

“y” is the dependent variable, consistent with x in data type.

The following example illustrates the benefit of using function cons.

Example 3.4.1. Compare the results with and without calling cons.
See DemoCons.m

First, we define two similar functions. Sample2 calls the data-type consistency function cons, while sample1 does not.

function y = sample1(x,m)
y = zeros(m,1);
for i = 1:m
    y(i) = x(i)*x(i);
end

function y = sample2(x,m)
y = zeros(m,1);
y = cons(y,x);
for i = 1:m
    y(i) = x(i)*x(i);
end

In the above function definitions, the only difference is that function sample2 calls cons to make the data type of y consistent with that of x. Now we compare the results from these two functions.

1. Set problem size.


>> n = 3;

2. Set independent variable.


% Define a ‘deriv’ class variable x, which is the ADMAT forward
% mode type. Please refer to Section 5.3 for details
>> x = deriv([1;2;3], eye(n));

3. Call sample1 function


>> y1 = sample1(x,n)
y1 =
1
4
9
4. Call sample2 function
>> y2 = sample2(x, n)
val =
1
4
9
deriv =
2 0 0
0 4 0
0 0 6

The result from function sample1 is the function value (only), even though the input x is an ADMAT forward mode type. This occurs because the MATLAB assignment operation in the for loop automatically converts the result of x(i)*x(i) into a double type, due to the data type of y(i). Thus, ADMAT did nothing in this function call. However, with the use of function cons in sample2, the data type of y is consistent with that of x, which is the ADMAT forward mode type, deriv. Sample2 returns the function value and the derivative simultaneously, as desired. Therefore, we highly recommend that users call cons after initializing the dependent variables in order to avoid undetected errors in the derivative computation by ADMAT.



Chapter 4

Sparse Jacobian and Hessian Matrices

Efficient computation of sparse Jacobian and Hessian matrices is one of the key features of ADMAT. The overall strategy is based on graph coloring techniques to exploit matrix sparsity [9, 10, 13].

Informally, a matrix is viewed as sparse if most of its entries are zero. There is no precise definition of a sparse matrix; however, the pragmatic view, which we adopt, is that a matrix is regarded as sparse if it is cost-effective to treat it as such. That is, the use of sparse techniques can result in significant time savings.

4.1 Sparse Jacobian Matrix Computation


In this section, we illustrate how to evaluate a sparse Jacobian matrix. We give detailed descriptions of the two popular sparsity functions, “getjpi” and “evalj”.

Description of getjpi

Function “getjpi” computes the sparsity information for the efficient computation of a
sparse Jacobian matrix. It is invoked as follows:

[JPI, SPJ] = getjpi(fun, n, m, Extra, method, SPJ)

Input arguments

“fun” is the function to be differentiated; a string variable representing the function name.
“n” is the number of columns of the Jacobian matrix; a scalar.
“m” is the number of rows of the Jacobian matrix; by default, m = n; a scalar.
“Extra” stores parameters required in function fun, apart from the independent variable; a MATLAB cell structure.
“method” sets the technique used to obtain the coloring information:

• method = ‘d’: direct bi-coloring (the default),

• method = ‘s’: substitution bi-coloring,

• method = ‘c’: one-sided column method,

• method = ‘r’: one-sided row method,

• method = ‘f’: sparse finite-difference.

Detailed background on the various methods is given in [12, 14].

“SPJ” is the user specified sparsity pattern of the Jacobian, in MATLAB sparse format.

Output arguments

“JPI” includes the sparsity pattern, coloring information and other information required for efficient Jacobian computation.
“SPJ” is the sparsity pattern of the Jacobian, represented in the MATLAB sparse
format.

Note that ADMAT computes the sparsity pattern, ‘SPJ’, of the Jacobian first, before calculating ‘JPI’. This is an expensive operation. If the user already knows the sparsity pattern ‘SPJ’, then it can be passed to the function ‘getjpi’ as an input argument, so that ADMAT will calculate ‘JPI’ based on the user-specified sparsity pattern. There is no need to recalculate the sparsity pattern, ‘SPJ’; it is inefficient to do so.
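For instance, a minimal sketch along these lines (the tridiagonal pattern shown is that of the Broyden function from §3.2) supplies a known pattern so that only the coloring analysis is performed:

% A minimal sketch, assuming the Jacobian sparsity pattern is already known.
n   = 5;
SPJ = spdiags(ones(n,3), -1:1, n, n);         % tridiagonal pattern (e.g., the Broyden function)
JPI = getjpi('broyden', n, n, [], 'd', SPJ);  % reuse the known pattern; skip pattern detection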

Description of evalj

Function evalj computes the sparse Jacobian matrix based on the sparsity pattern
and coloring information obtained from getjpi. It is invoked as follows:

[F, J] = evalj(fun, x, Extra, m, JPI, verb, stepsize)

Input arguments


“fun” is the function to be differentiated; a string representing the function name.


“x” is the vector of independent variables; a vector.
“Extra” contains parameters required in function fun; a MATLAB cell structure.
“m” is the row dimension of the vector mapping, that is F : Rn → Rm ; a scalar.
“JPI” is the sparsity pattern and coloring information recorded for the current sparse Jacobian.
“verb” holds the flag for the display level. (See §4.3 for details)
• verb = 0: no display (the default).
• verb = 1: display the number of groups used.
• verb ≥ 2: display graphs upon termination.
“stepsize” is the step size for finite differencing and is only needed when JPI is computed by the finite-difference method, ‘f’. By default, stepsize = 1e-5.

Output arguments

“F ” is the function value at point x; a vector.


“J” is the Jacobian matrix at point x; a sparse matrix.

Example 4.1.1. Compute the Jacobian matrix of an arrowhead function.

See DemoSprJac.m

Let y = F(x), F : Rn → Rn, where
$$y(1) = 2x(1)^2 + \sum_{i=1}^{n} x(i)^2, \qquad y(i) = x(i)^2 + x(1)^2, \quad i = 2:n.$$
The following is the arrowhead function in MATLAB.

function y= arrowfun(x,Extra)
y = x.*x;
y(1) = y(1)+x’*x;
y = y + x(1)*x(1);

The corresponding Jacobian matrix has an arrowhead sparsity structure, as shown in Figure 4.1 for n = 50 with 148 nonzeros. The procedure to evaluate the Jacobian J at an ‘all ones’ vector for n = 5 is as follows.


Figure 4.1: Jacobian sparsity pattern of arrowfun (n = 50, nz = 148).

1. Set problem size.


>> n = 5;

2. Initialize x.
>> x = ones(n,1)
x=
1
1
1
1
1

3. Compute “JPI” corresponding to function arrowfun.

>> JPI = getjpi(’arrowfun’, n);

4. Compute the function value and the Jacobian (in MATLAB sparse format) based
on the computed “JPI”. Set the input argument, Extra, to ‘[ ]’ since it is empty.

>> [F, J] = evalj(’arrowfun’, x, [], n, JPI)

F=


7
2
2
2
2
J=
(1, 1) 6
(2, 1) 2
(3, 1) 2
(4, 1) 2
(5, 1) 2
(1, 2) 2
(2, 2) 2
(1, 3) 2
(3, 3) 2
(1, 4) 2
(4, 4) 2
(1, 5) 2
(5, 5) 2

The function “getjpi” determines the sparsity pattern of the Jacobian of arrowfun. It only needs to be executed once for a given function. In other words, once the sparsity pattern is determined by “getjpi”, ADMAT calculates the Jacobian at any given point based on that pattern, as sketched below.
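For example, a minimal sketch of this reuse (the evaluation points below are arbitrary) computes “JPI” once and then evaluates the Jacobian at several points:

% A minimal sketch: one sparsity/coloring analysis, many Jacobian evaluations.
n   = 5;
JPI = getjpi('arrowfun', n);                      % expensive step, performed once
for k = 1:3
    xk = rand(n,1);                               % a new evaluation point
    [Fk, Jk] = evalj('arrowfun', xk, [], n, JPI); % reuses the stored pattern and coloring
end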

4.2 Sparse Hessian Computation


The process for determining sparse Hessian matrices is similar to the process for sparse
Jacobians.

Description of gethpi

Function gethpi computes the sparsity pattern of a Hessian matrix, and corresponding
coloring information. It is invoked as follows:

[HPI, SPH] = gethpi(fun, n, Extra, method, SPH)

Input arguments

“fun” is the function to be differentiated; a string representing the function name.


“n” is the order of the Hessian; a scalar.


“Extra” stores parameters required in function fun (apart from the independent variable); a MATLAB cell structure.
“method” sets techniques used to get the coloring information.

• method = ‘i-a’: the default; ignore the symmetry and compute exactly using AD.

• method= ‘d-a’: direct method, using AD.

• method= ‘s-a’: substitution method using AD.

• method= ‘i-f’: ignore the symmetry and use finite differences (FD)

• method= ‘d-f’: direct method with FD.

• method= ‘s-f’: substitution method with FD.

Note that details for the different methods are given in [12, 14].

“SPH” is the user-specified sparsity pattern of Hessian in MATLAB sparse format.

Output arguments

“HPI” includes the sparsity pattern, coloring information and other information re-
quired for efficient Hessian computation.
“SPH” is sparsity pattern of Hessian, represented in the MATLAB sparse format.

Similar to ‘getjpi’, users can pass ‘SPH’ to function ‘gethpi’ when the sparsity pattern of the Hessian is already known. The computation of ‘HPI’ will then be based on the user-specified sparsity pattern, as sketched below.
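A brief sketch, paralleling the getjpi case (the tridiagonal pattern shown is that of the Brown function), is:

% A minimal sketch, assuming the Hessian sparsity pattern is already known.
n   = 5;
SPH = spdiags(ones(n,3), -1:1, n, n);       % tridiagonal pattern (e.g., the Brown function)
HPI = gethpi('brown', n, [], 'i-a', SPH);   % reuse the known pattern for the coloring step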

Description of evalh

[v, grad, H] = evalh(fun, x, Extra, HPI, verb)

Input arguments

“fun” is the function to be differentiated; a string representing the function name.
“x” is the independent variable; a vector.
“Extra” stores parameters required in function fun.
“HPI” is the sparsity pattern and coloring information of the Hessian.
“verb” is the flag for the display level. (See §4.3)


• verb = 0: no display (the default).

• verb = 1: display the number of groups used.

• verb ≥ 2: display graphs upon termination.

Output arguments

“v ” is the function value at point x; a scalar.


“grad” is the gradient at point x; a vector.
“H” is the Hessian at point x; a sparse matrix.

Example 4.2.1. Compute the Hessian of the Brown function at the point x′ = [1, 1, 1, 1, 1].

See DemoSprHess.m

1. Set problem size.

>> n = 5

2. Set independent variable.

>> x = ones(n,1)
x=
1
1
1
1
1

3. Compute relevant sparsity information encapsulated in “HPI”.

>> HPI = gethpi(‘brown’, n);

4. Evaluate the function value, gradient and Hessian at x. We set the input argument Extra to ‘[ ]’ since it is empty.

>> [v, grad, H] = evalh(‘brown’, x, [], HPI)


v=
8


grad =
4 8 8 8 4
H=
(1, 1) 12
(2, 1) 8
(1, 2) 8
(2, 2) 24
(3, 2) 8
(2, 3) 8
(3, 3) 24
(4, 3) 8
(3, 4) 8
(4, 4) 24
(5, 4) 8
(4, 5) 8
(5, 5) 12
Similar to the sparse Jacobian situation, function gethpi encapsulates the sparsity structure and relevant coloring information for efficient calculation of the sparse Hessian H. Function gethpi only needs to be executed once for a given function. In other words, once the sparsity pattern is determined by “gethpi”, ADMAT calculates the Hessian at a given point based on the pattern.

4.3 Reporting
For both evalj and evalh, different levels of display output are possible. Users can choose the input argument verb to set the display level. We illustrate the use of verb with evalj.

We will repeat Example 4.1.1 here, but with different values of verb.
• verb = 0. There is no information displayed.
• verb = 1. Verbose mode. For example, below is the output produced by the direct bi-coloring method:

Number of Row groups = 1
Number of column groups = 2
Total Number of groups = 3
• verb ≥ 2. Additional sparsity patterns are shown. For example, for the bi-coloring method, three subplots are shown in Figure 4.2. The upper left subplot shows the whole sparsity pattern of the Jacobian, while the other two show the sparsity patterns computed by the forward mode AD and the reverse mode AD in the bi-coloring method, respectively.

Figure 4.2: Jacobian sparsity patterns of arrowfun computed by the bi-coloring method with problem size 10 (sparsity structure of J: nz = 28; Jr, the part computed by reverse AD: nz = 9; Jc, the part computed by forward AD: nz = 19).



Chapter 5

Advanced ADMAT

In the previous chapters, we introduced ADMAT fundamentals that allow for the
computation of gradients, Jacobians, and Hessians. In this chapter we introduce a
number of advanced features: Jacobian and Hessian sparsity pattern computation,
efficient structured gradient computation, and derivative computation by forward and
reverse mode with the direct use of the overloaded operations. These features can
help users compute the sparsity patterns without forming the Jacobian or Hessian
explicitly, reduce the space requirement for gradient computation by reverse mode,
and avoid the function definition restrictions on “feval” calls.

5.1 Computing Sparsity Patterns of Jacobian and Hessian Matrices
ADMAT provides functions to compute the sparsity patterns of Jacobian and Hessian
matrices.

Sparsity Pattern of Jacobian

SPJ = jacsp(fun, m, n, Extra),

Input arguments

“fun” is the function to be differentiated; a string representing the function name.


“m” is the row dimension of the Jacobian; a scalar
“n” is the column dimension of the Jacobian; a scalar
“Extra” stores parameters required in fun; a MATLAB cell structure.


Output argument

“SPJ” is the sparsity pattern of the Jacobian; in MATLAB sparse format

Example 5.1.1. Compute the sparsity pattern of the Broyden function.

See DemoSP.m

1. Set problem size.

>> n = 5;

2. Compute the Jacobian sparsity pattern of the Broyden function.

>> SPJ = jacsp(’broyden’, n,n)


SPJ =
(1, 1) 1
(2, 1) 1
(1, 2) 1
(2, 2) 1
(3, 2) 1
(2, 3) 1
(3, 3) 1
(4, 3) 1
(3, 4) 1
(4, 4) 1
(5, 4) 1
(4, 5) 1
(5, 5) 1

Sparsity Pattern of Hessian

SPH = hesssp(fun, n, Extra)

Input arguments

“fun” is the function to be differentiated; a string representing the function name.


“n” is the order of the Hessian matrix; a scalar.
“Extra” stores parameters required in fun; a MATLAB cell structure.


Output argument

“SPH” is the sparsity pattern of the Hessian; in MATLAB sparse format.

Example 5.1.2. Compute the sparsity pattern of the Hessian of the Brown function.

See DemoSP.m

1. Set problem size.

>> n = 5;

2. Compute the sparsity pattern of the Hessian of the Brown function.

>> SPH = hesssp(’brown’, n)


SPH =
(1, 1) 1
(2, 1) 1
(1, 2) 1
(2, 2) 1
(3, 2) 1
(2, 3) 1
(3, 3) 1
(4, 3) 1
(3, 4) 1
(4, 4) 1
(5, 4) 1
(4, 5) 1
(5, 5) 1

5.2 Efficient Computation of Structured Gradients


As mentioned in Chapter 3, the reverse mode is preferable to the forward mode when computing gradients, in that the required computing time, in theory, is proportional to the time required to compute the objective function f itself. This is optimal. However, reverse mode requires a large amount of memory since all intermediate results must be saved. Thus, on occasion the memory requirement of the reverse mode surpasses the internal (fast) memory available and the effective performance is significantly degraded. Fortunately, many practical computing problems exhibit ‘structure’, and this structure can be exploited to reduce the practical computing time [8].

First we define a structured function. Given a scalar-valued function $z = f(x)$, $f: R^n \to R$, a structured computation is defined as follows:
$$\begin{aligned}
\text{Solve for } y_1:&\quad F_1(x, y_1) = 0,\\
\text{Solve for } y_2:&\quad F_2(x, y_1, y_2) = 0,\\
&\qquad\vdots\\
\text{Solve for } y_p:&\quad F_p(x, y_1, \cdots, y_p) = 0,\\
\text{Solve for } z:&\quad z - \bar{f}(x, y_1, \cdots, y_p) = 0,
\end{aligned}$$
where $\bar{f}$ is a scalar-valued function. If we define the “extended” function $\tilde{F}_E^T = (F_1^T, F_2^T, \cdots, F_p^T)$, then the program to evaluate f can be simply rewritten as
$$\begin{aligned}
\text{Solve for } y:&\quad \tilde{F}_E(x, y) = 0,\\
\text{“Solve” for output } z:&\quad z - \bar{f}(x, y_1, y_2, \cdots, y_p) = 0.
\end{aligned}$$
Thus, the corresponding Jacobian of $F_E$ is of the form
$$J_E = \begin{bmatrix} (\tilde{F}_E)_x & (\tilde{F}_E)_y \\ \nabla_x \bar{f}^T & \nabla_y \bar{f}^T \end{bmatrix}.$$
The gradient of f, $\nabla_x f$, can be obtained from $J_E$ through a Schur decomposition: eliminate the (2,2)-block, $\nabla_y \bar{f}^T$, using a block Gaussian transformation, and then the transformed (2,1)-block will hold the desired result, i.e., $\nabla_x f^T$ [8].

The key point is that reverse mode automatic differentiation can be applied to the
various structured steps indicated above, in turn. This can result in considerable space
savings, and in practice, a shorter running time.

Next, we illustrate how to use this technique to reduce the memory requirements
of reverse mode AD.

Example 5.2.1. Computing the gradient while exploiting the structure.

See DemoStct.m

Consider the autonomous ODE
$$y' = F(y),$$
where F is the Broyden function. Suppose that for an initial state $y_0 = x$ we employ an explicit Euler method to compute the approximation $y_k$ to a desired final state $y(T)$. Then we estimate the error $z = f_0(y_k - y(T))$, where $f_0$ is the 2-norm.

The function $z = f(x)$ is defined as follows:
$$y_0 = x, \qquad y_i = S(y_{i-1}) \ \text{ for } i = 1, 2, \cdots, k, \qquad z = f_0(y_k),$$
where $S: R^n \to R^n$ is one step of Euler's method and $f_0: R^n \to R$ is the 2-norm. The gradient of $f(x)$ is
$$[\nabla f(x)]^T = [\nabla f_0(y_k)]^T J_{k-1} J_{k-2} \cdots J_1 J_0,$$
where $J_i$ is the Jacobian of S at $y_i$. There are at least three different methods to compute the gradient.
1. Straightforward use of reverse mode. The reverse mode is used to calculate the gradient of f(x) (refer to §5.5).
2. Straightforward use of forward mode. The forward mode is used to calculate the gradient (refer to §5.3).
3. Sparse block computation with reverse mode. Since each $J_i$ is tridiagonal, the efficient sparse Jacobian computation introduced in Chapter 4 is used to compute each matrix $J_i$. Reverse mode is used to compute $\nabla f_0(y_k)$; finally, we calculate the product $[\nabla f_0(y_k)]^T J_{k-1} \cdots J_0 = [\nabla f(x)]^T$. The corresponding source code for the implementation is as follows.

%
% compute the gradient of f(x) by sparse blocks
%
%
% compute the sparsity of function S
JPI = getjpi(‘funcS’, n, n, Extra);
% evaluate J1, J2, ...., Jp, each with the same sparsity.
J = sparse(eye(n));
y = x;
for i = 1 : Extra.p
[y, J1] = evalj(‘funcS’, y, Extra, n, JPI);
J = J1*J;
end


% compute the gradient of function f0, where f0 is 2-norm


myfun = ADfun(‘funcF0’,1);
options = setopt(‘revprod’, 1);
[f, gSpr]= feval(myfun, y, Extra, options);
gSpr = gSpr*J;

In the above source code, ADMAT is used to calculate $J_0, J_1, \cdots, J_{k-1}$ through ‘evalj’. Since the matrices $J_0, J_1, \cdots, J_{k-1}$ have the same structure, the sparsity information ‘JPI’ only needs to be computed once. Then the gradient $\nabla f_0(y_k)$ is evaluated by the reverse mode AD through ‘feval’.

Figure 5.1: The time ratio ω(∇f)/ω(f) for Euler's method (straightforward reverse mode, straightforward forward mode, and sparse blocks).

Figure 5.1 plots the ratio ω(∇f)/ω(f), where ω(f) is the execution time of evaluating the function f, for a fixed problem size n = 800 and different numbers of Euler steps. The vertical axis is the time ω(∇f) taken to compute the gradient divided by the time ω(f) taken to evaluate the function. All calculations are done through ADMAT. The horizontal axis is the number of steps for Euler's method.

Figure 5.1 illustrates that the straightforward reverse mode performs very well for small problems, but the computing time spikes upward when the internal memory limitation is reached. The straightforward forward mode can outperform the reverse mode when the memory requirements of reverse mode become excessive. The third approach exploits the sparsity of the $J_i$; this approach is the fastest of the three procedures. For more details about the efficient computation of structured gradients, please refer to [8].

5.3 Forward Mode AD


Using forward mode AD, ADMAT can easily compute the first derivatives of functions which are defined using arithmetic operations and intrinsic functions of MATLAB. The forward mode AD provides users with more flexibility than just using ‘feval’. Users can define their own functions as usual, with no restriction on the number of input arguments. When there is more than one input argument, the derivative with respect to any input argument can be computed by the forward mode without any change to the function definition. In this section, we will give several examples of how to use the forward mode AD in ADMAT.

Descriptions of forward mode AD functions


• y = deriv(x, V),

Define a deriv class object for the forward mode. Each object of deriv class
is a MATLAB struct array with two fields: val and deriv.

Input arguments

“x” is the value of the independent variable; the value for the field val.
“V” is the matrix V of the product J × V, where J is the first derivative of the function to be differentiated at point x; the value for the field deriv.

Output arguments

“y” is an initialized object of deriv class.

• val = getval(y),

Get the value of the deriv class object y.

Input arguments

“y” is the object of class deriv.


Output arguments

“val” is the value of y, that is y.val.

• ydot = getydot(y),

Get the derivative of the deriv class object y.

Input arguments

“y” is the object of class deriv.

Output arguments

“ydot” is the derivative of y, that is y.deriv.

Example 5.3.1. Compute the first derivative of f(x) = x^2 at x = 3.

See DemoFwd1.m

This example shows how to compute the function value and the first-order derivative of y = x^2 using forward mode AD.

1. Define input argument x to be a deriv object with value 3 and derivative 1.

>> x = deriv(3,1) % create a deriv object with value 3 and derivative 1


val =
3
deriv =
1

2. Compute y = x^2.

y = x^2
val =
9
deriv =
6


3. Get the value of y = x^2.

>> yval = getval(y)


yval =
9

4. Get the first-order derivative of y = x^2.

>> ydot = getydot(y)


ydot =
6

Example 5.3.2. Compute the first-order derivative of matrix-vector multiplication A ∗ x. (Of course the answer is the matrix A itself!)

See DemoFwd1.m

Differentiating a matrix-vector multiplication:

1. Size of matrix A.

>> n = 5;

2. A is a 5 × 5 random matrix.

>> A = rand(n)
A=
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389

3. x is an all ones vector.

>> x = ones(n,1)


x=
1
1
1
1
1

4. Initialization

>> x = deriv(x, eye(n))


val =
1
1
1
1
1
deriv =
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1

5. Compute the multiplication of y = Ax.

>> y = A *x
val =
2.7913
2.7679
3.2772
2.4657
2.5448
deriv =
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389

6. Get the value of y = Ax.

>> yval = getval(y)


yval =
2.7913
2.7679
3.2772
2.4657
2.5448

7. Get the first-order derivative of y, i.e., the Jacobian matrix.

>> ydot = getydot(y)


ydot =
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389

Note that the forward mode AD computes a product of the Jacobian matrix and V. Users can specify their own matrix V when defining a deriv object. We set V to an identity matrix in the above example so that the Jacobian matrix, J, is obtained. If V is not an identity matrix, then the forward mode returns the product J × V rather than J itself, as sketched below. If V is not specified when defining a deriv object, say x = deriv(a), then V is set to zero by default. The storage of the deriv class is discussed in §5.4.
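For example, a minimal sketch along the following lines (the direction v is arbitrary and chosen only for illustration) seeds a single column to obtain one Jacobian-vector product of the Broyden function without forming the full Jacobian:

% A minimal sketch: forward mode with a single-column V yields J*v directly.
n  = 5;
v  = ones(n,1)/n;           % an arbitrary direction (illustrative only)
x  = deriv(ones(n,1), v);   % seed the deriv object with V = v
y  = broyden(x);            % overloaded evaluation of the Broyden function
Jv = getydot(y);            % n-by-1 product J(x)*v, obtained without forming J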

Example 5.3.3. Compute the Jacobian matrix of the Broyden function at x = [1, 1, 1].

See DemoFwd2.m

This example illustrates how to use the ADMAT forward mode on the Broyden function.

Compute the user-specified function

1. Create a deriv class object


>> x = deriv([1,1,1], eye(3))
val =
1 1 1
deriv =


1 0 0
0 1 0
0 0 1

2. Call Broyden function at x


>> y = broyden(x)
val =
0
−1
1
deriv =
−1 −2 0
−1 −1 −2
0 −1 −1

3. Get the value of the Broyden function


>> yval = getval(y)
yval =
0
−1
1

4. Get the Jacobian matrix of the Broyden function


>> J = getydot(y)
J=
−1 −2 0
−1 −1 −2
0 −1 −1

5.4 Storage of deriv Class


After users specify matrix V , the number of columns of V is assigned to a global
variable globp, whose initial value is one.

• If x.val is a scalar, then its x.deriv field is a row vector, whose length equals globp;

• If x.val is a row/column vector, then its x.deriv field is a matrix, whose number of columns equals globp and whose number of rows equals the length of x.val;

• If x.val is a matrix, then its x.deriv field is a 3-D matrix, e.g., A(i, j, k). The
value of k varies from 1 to globp.


• ADMAT does not support multi-dimensional arrays above 2D, so it returns an


error when the dimension of x.val is 3 or above.

Example 5.4.1. Storage examples for the deriv class objects.

See DemoStorage.m

This example illustrates the storage of a deriv class object when the second input argument, V, is not specified and the input value is a scalar, vector or matrix, for different values of globp.

• globp = 1

>> x = deriv(1)
val =
1
deriv =
0
>> x = deriv(ones(3,1))
val =
1
1
1
deriv =
0
0
0
>> x = deriv(ones(3))
val =
1 1 1
1 1 1
1 1 1
deriv =
0 0 0
0 0 0
0 0 0
• globp = 2

>> x = deriv(1)
val =
1


deriv =
0 0
>> x = deriv(ones(3,1))
val =
1
1
1
deriv =
0 0
0 0
0 0
>> x = deriv(ones(3))
val =
1 1 1
1 1 1
1 1 1
deriv(:,:,1) =
0 0 0
0 0 0
0 0 0
deriv(:,:,2) =
0 0 0
0 0 0
0 0 0

5.5 Reverse Mode AD


In the reverse mode, ADMAT uses a virtual tape to record all the intermediate values and operations performed in the function evaluation. Computation of the derivative starts from the end of the tape (a MATLAB global variable) and goes backward through the tape. The requested derivative is finally recorded at the beginning of the tape. In this section, we will give examples of computing the first-order derivative by the reverse mode AD.

Descriptions of reverse mode AD functions


• y = derivtape(x, flag),

Define a derivtape class object for the reverse mode. Each derivtape object is a MATLAB struct array with two fields: val and varcount.


Input arguments

“x” is the value of the independent variable; the value for the field val.
“flag” is a flag for the beginning of the tape.

• “1” - create a derivtape object x, save x on the tape, and set the number of the cell storing x as the beginning of the tape.
• “0” - create a derivtape object and save the value to the tape.
• None - create a derivtape object, but do not save the value on the tape.

Output arguments

“y” is an initialized derivtape class object.

• val = getval(y),

Get the value of a derivtape class object.

Input arguments

“y” is a derivtape class object.

Output arguments

“val” is the value of y, that is y.val.

• JTW = parsetape(W),

Compute the product J^T × W, where J^T is the transpose of the first derivative of the differentiated function.

Input arguments

“W” is the user input value.

Output arguments

“JTW” is the product of J T × W .


Example 5.5.1. Compute the first-order derivative of the MATLAB operation y = x^2.

See DemoRvs1.m

1. Define a derivtape object and record data on a tape.


>> x = derivtape(3,1) % record value 3 at the beginning of the tape
val = <-- value recorded on the tape
3
varcount = <-- place where the value is stored on the tape
1

2. Compute y = x^2.
>> y = x^2
val =
9
varcount =
2

3. Get value of y.
>> yval = getval(y)
yval =
9

4. Get the first order derivative of y.


>> ydot = parsetape(1)
ydot =
6

Example 5.5.2. Compute the first-order derivative of matrix-vector multiplication Ax.

See DemoRvs1.m

1. Set the problem size.

>> n = 5 % size of the matrix

2. Get a random matrix.

>> A = rand(n)


A=
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389
3. Define x as a derivtape object and set x as the beginning of the tape.

>> x = derivtape(ones(n,1),1)
val =
1
1
1
1
1
varcount =
1
4. Compute Ax.

>> y = A*x
val =
2.7913
2.7679
3.2772
2.4657
2.5448
varcount =
2
5. Get the value of y

>> yval = getval(y)


yval =
2.7913
2.7679
3.2772
2.4657
2.5448
6. Get the transpose of the first order derivative of y


>> JT = parsetape(eye(n))
JT =
0.9501 0.2311 0.6068 0.4860 0.8913
0.7621 0.4565 0.0185 0.8214 0.4447
0.6154 0.7919 0.9218 0.7382 0.1763
0.4057 0.9355 0.9169 0.4103 0.8936
0.0579 0.3529 0.8132 0.0099 0.1389

Note that the reverse mode AD computes the product of the transpose of the Jacobian matrix with W, that is, J^T W. Users can specify their own W when parsing the tape by “parsetape(W)”. We set W to an identity matrix in the above example so that J^T is obtained.
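For instance, when only a single product J^T*w is required rather than the full transpose of the Jacobian, W can be a single column. A minimal sketch, reusing the matrix A above and assuming a hypothetical weight vector w, is:

w = ones(n,1);                % hypothetical weight vector
x = derivtape(ones(n,1),1);   % restart the tape at x
y = A*x;                      % record y = A*x on the tape
JTw = parsetape(w);           % one reverse sweep returns J^T*w = A'*w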

Example 5.5.3. Compute the transpose of the Jacobian matrix, J^T, of the Broyden function at x = [1, 1, 1] in the reverse mode.

See DemoRvs2.m

1. Define the Broyden function fvec = broyden(x, Extra) as in §3.1.

2. Create a derivtape class object.

>> x = derivtape([1,1,1],1)
val =
1 1 1
varcount =
1

3. Call the Broyden function with x

>> y = broyden(x)
val =
0
−1
1
varcount =
40

4. Get the value of y

>> yval = getval(y)


yval =
0
−1
1
5. Get the transpose of Jacobian of y

>> JT = parsetape(eye(3))
JT =
−1 −1 0
−2 −1 −1
0 −2 −1
Note that if a user-specified function is a mapping from R^n to R, the Jacobian matrix reduces to the gradient. Theoretically, the reverse mode AD computes the gradient much faster than the forward mode AD does [9], but the reverse mode requires a large amount of memory since it records each operation on a tape. This drawback sometimes leads to accessing low-speed storage media, which slows down the computation significantly.
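To make the scalar case concrete, the following minimal sketch (assuming the Brown function brown from §3.1) obtains the gradient with a single reverse sweep by parsing the tape with W = 1:

n = 5;
x = derivtape(ones(n,1),1);   % start the tape at the point x
y = brown(x);                 % record the scalar-valued evaluation
grad = parsetape(1);          % J^T*1 reduces to the gradient of brown at x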

5.6 Computing Second-Order Derivatives


Both the forward and reverse modes are used in the second-order derivative computation. For a scalar-valued function f(x) : R^n → R, the forward mode is used to compute w = (∇f)^T V; then the reverse mode computes (∂w/∂x)^T W, that is, HV·W, where H is the Hessian. This is efficient since W usually has fewer columns than the number of variables in x.

Descriptions of functions for computing second order derivatives
• y = derivtapeH(x, flag, V),

Define a derivtapeH class for computing the second order derivative.

Input arguments

“x” is the value of the independent variable.


“flag” is a flag for the beginning of the tape.

– “1” - create a derivtapeH object x, save x on the tape and set the cell storing
x as the beginning of the tape.


– “0” - create a derivtapeH object and save the value on the tape.
– None - create a derivtapeH object, but do not save the value on the tape.

“V” is the matrix for computing the product g^T × V, where g is the first derivative (gradient) of the function to be differentiated.

Output argument

“y” is an initialized derivtapeH class object, which takes two cells on tape. One
is for the independent variable x, the other is for V.

• val = getval(y),

Get the value of the derivtapeH class object.

Input argument

“y” is a derivtapeH class object.

Output argument

“val” is the value of derivtapeH class object.

• ydot = getydot(y),

Get the product g^T × V of a derivtapeH class object.

Input argument

“y” is a derivtapeH class object.

Output argument

“ydot” is the product g^T × V of the derivtapeH class object.

• HW = parsetape(W),

Compute the product H × W, where H is the second order derivative (Hessian) of the function being differentiated.

Input argument


“W” is the user-supplied matrix in the product H × W.

Output argument

“HW” is the product H × W.


Example 5.6.1. Compute the first and second order derivatives of y = x^2.

See Demo2nd1.m
1. First define a derivtapeH object with value 3.

>> x = derivtapeH(3,1,1)
val =
3
varcount =
1
val =
1
varcount =
2
2. Compute y = x^2.

>> y = x^2
val =
9
varcount =
3
val =
6
varcount =
6

3. Get the value of y.

>> yval = getval(y)


yval =
9


4. Get the first order derivative of y.

>> y1d = getydot(y)


y1d =
6

5. Get the second order derivative of y.

>> y2d = parsetape(1)


y2d =
2

Example 5.6.2. Compute the gradient and Hessian of the Brown function.

See Demo2nd2.m

1. Set problem size.

>> n = 5 % problem size

2. >> x = ones(n,1)
x=
1
1
1
1
1

3. Define a derivtapeH object.

>> x = derivtapeH(x,1,eye(n))
val =
1
1
1
1
1
varcount =
7
val =


1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
varcount =
8

4. Call Brown function.

>> y = brown(x)
val =
8
varcount =
95
val =
4 8 8 8 4
varcount =
96

5. Get the value of y.

>> yval = getval(y)


yval =
8

6. Get the gradient of Brown function at x.

>> grad = getydot(y)


grad =
4 8 8 8 4

7. Get the Hessian of Brown function at x.

>> H = parsetape(eye(n))
H=
12 8 0 0 0
8 24 8 0 0
0 8 24 8 0
0 0 8 24 8
0 0 0 8 12
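When only a Hessian-vector product H*v is needed, the identity matrices above are unnecessary. A minimal sketch (assuming the same Brown function, a hypothetical direction v, and parsing the tape with the scalar weight 1 as in Example 5.6.1) is:

n = 5;
v = ones(n,1);                     % hypothetical direction vector
x = derivtapeH(ones(n,1), 1, v);   % forward mode carries (grad f)^T * v
y = brown(x);
Hv = parsetape(1);                 % reverse sweep returns the product H*v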


5.7 1-D Interpolation in ADMAT


One-dimensional (1-D) interpolation is available in the current version of ADMAT under ‘interp1 AD’, with the same input and output interfaces as the MATLAB 1-D interpolation function, ‘interp1.m’. We did not overload the MATLAB interpolation function ‘interp1’; instead, we made a few changes to the MATLAB function ‘interp1’ and its dependencies so that ‘interp1 AD’ is consistent for use with ADMAT. The following example illustrates the use of ‘interp1 AD’.

Example 5.7.1. 1-Dimension interpolation in ADMAT.

See DemoInter1.m

In this example, we use two methods, linear interpolation and cubic spline interpolation, to estimate the function value and the Jacobian at the points x0 with the ADMAT 1-D interpolation function, ‘interp1 AD’.

1. Initial value for the interpolation.

>> x = 0 : 10;
>> y = sin(x);

2. Points at which to interpolate.

>> x0 = 0.2 : 0.2 : 1;

3. Number of interpolation points.

>> n = length(x0);

4. Call ‘interp1 AD’ for interpolation.

Linear interpolation method.

(a) Linear interpolation at point x0 by forward mode AD.

>> xi = deriv(x0, eye(n));


>> yi = interp1 AD(x,y,xi);
>> y0 = getval(yi)
y0 =


0.1683 0.3366 0.5049 0.6732 0.8415


>> J = getydot(yi)
J=
0.8415 0 0 0 0
0 0.8415 0 0 0
0 0 0.8415 0 0
0 0 0 0.8415 0
0 0 0 0 0.0678
(b) Linear interpolation at point x0 by reverse mode AD.

>> xi = derivtape(x0,1);
>> yi = interp1 AD(x,y,xi);
>> y0 = getval(yi)
y0 =
0.1683 0.3366 0.5049 0.6732 0.8415
>> J = parsetape(eye(n))
J=
0.8415 0 0 0 0
0 0.8415 0 0 0
0 0 0.8415 0 0
0 0 0 0.8415 0
0 0 0 0 0.0678
Cubic spline interpolation method.
(a) Cubic spline interpolation at point x0 by forward mode AD.

>> xi = deriv(x0, eye(n));


>> yi = interp1 AD(x,y,xi, ‘spline’);
>> y0 = getval(yi)
y0 =
0.2181 0.4134 0.5837 0.7270 0.8415
>> J = getydot(yi)
J=
1.0351 0 0 0 0
0 0.9155 0 0 0
0 0 0.7859 0 0
0 0 0 0.6462 0
0 0 0 0 0.4965
(b) Cubic spline interpolation at point x0 by reverse mode AD.


>> xi = derivtape(x0,1);
>> yi = interp1 AD(x,y,xi,‘spline’);
>> y0 = getval(yi)
y0 =
0.2181 0.4134 0.5837 0.7270 0.8415
>> J = parsetape(eye(n))
J=
1.0351 0 0 0 0
0 0.9155 0 0 0
0 0 0.7859 0 0
0 0 0 0.6462 0
0 0 0 0 0.4965
Note that some other interpolation methods, such as piecewise cubic Hermite interpolation and nearest neighbor interpolation, are also supported in ‘interp1 AD’. Users can refer to the MATLAB ‘interp1’ help documentation for more usage details of ‘interp1 AD’.



Chapter 6

Newton Computations

ADMAT provides functions for the Newton step computation for nonlinear systems and optimization. There are two basic options for users. There is the ‘traditional’ Newton computation - the Jacobian (or Hessian) is first computed and then the Newton step is determined by solving the linear Newton system - and there is the use of an expanded Jacobian (Hessian) matrix formed through the use of structure [16]. The latter can yield significant cost benefits. We will illustrate both approaches in this chapter.

6.1 Traditional Newton Computation


The Newton computation is widely used in solving nonlinear equations and optimization problems. For example, with respect to a nonlinear equation F(x) = 0, where F : R^n → R^n, a Newton iteration defined at the ‘current’ point x is given by

    Solve   J(x) s_N = -F(x),
    Update  x = x + s_N,

where J(x) is the n-by-n Jacobian of F(x). ADMAT is a good tool to compute the Jacobian J(x).

If the Jacobian matrix J(x) is sparse, i.e., most of the entries of J remain zero for all x,
then ADMAT can be used both to determine this sparsity structure (once) and then
efficiently determine the values of the non-zero components of J(x) at each successive
point x. Each Jacobian evaluation is followed by a sparse linear solver to determine
the Newton step.

The following function, newton.m, is a pure Newton process. We do not recommend it


for general nonlinear systems and arbitrary starting points, since it may not converge,
but we include it here because it illustrates the use of ADMAT in the context of a
sparse nonlinear process. For general nonlinear systems, we recommend procedures
that force convergence from arbitrary starting points (i.e., use line search or trust
region strategy). MATLAB functions fsolve and fminunc are examples of such proce-
dures.

Function newton requires the user to pass several arguments: the name of the function
to be solved, starting point x, a solution tolerance, and a bound on the number of
iterations. newton will then apply the pure Newton process and return the solution (with the given tolerance) or will return the iterate achieved upon exceeding the iteration bound.

function [x, normy, it] = newton(func, x0, tol, itNum, Extra)


%
% A straight Newton process. Termination occurs when norm of the vector
% function func is less than the tolerance ’tol’ or the iteration count reaches the
% maximum number of iterations ’itNum’.
%
% INPUT
% func - nonlinear vector function
% x0 – initial value of x
% tol - stopping tolerance
% itNum - iteration count limit
% Extra – parameters for function ’func’
%
% OUTPUT
% x - approximate root of func as produced by the Newton process
% normy – norm of function value at x
% it – number of iterations
%
if (nargin < 2)
error(’At least two input arguments are required.’);
end
switch (nargin)
case 2
tol = 1e-13;
itNum = 100;
Extra = [];


case 3
itNum = 100;
Extra = [];
case 4
Extra = [];
otherwise
end

x = x0; n = length(x0);
% initialize the iteration counter
it = 0;

% determine sparse Jacobian structure


if nargin < 5
JPI = getjpi(func,n);
else % Extra is not empty
JPI = getjpi(func, n,[], Extra);
end
normy = 1.0;

% Newton steps
while ( normy > tol) && (it < itNum)
% evaluate the function value and the Jacobian matrix at x
[y, J] = evalj(func, x, Extra, [],JPI);
delta = -J\y;
normy = norm(y);
x = x + delta;
it = it + 1;
end

Example 6.1.1. Apply the Newton process to the nonlinear equation arrowfun(x) = 0, where arrowfun is defined in §4.1.

See DemoNewton.m

1. Set problem size.

>> n = 5;

2. Initialize the random seed.


>> rand(‘seed’,0);
3. Set starting point.

>> x = rand(n,1)
x=
0.2190
0.0470
0.6789
0.6793
0.9347
4. Apply the Newton process to the system arrowfun(x) = 0

>> [x, normy, it] = newton(’arrowfun’, x, 1e-8, 50)


x=
1.0e-004 *
0.0668
0.0144
0.2072
0.2073
0.2852
normy =
8.4471e-009
it =
15

6.2 Structured Newton Computation


The main expense in the determination of a Newton step for a nonlinear system is often the evaluation of the Jacobian (Hessian) matrix. In particular, when the Jacobian is dense, evaluating the Jacobian matrix can be an expensive proposition (e.g., the cost of computing the Jacobian can be a factor of n times the cost of evaluating the function F itself, where n is the number of columns in J).

However, many (perhaps most) nonlinear systems with expensive dense Jacobians show a certain structure in their computations, and if the code to evaluate F is written to expose this structure (illustrated below), then the Newton step can often be computed without fully computing the actual Jacobian matrix; this technique can result in great cost savings [16].


Suppose that the computation z = F (x) can be broken down into the following ‘macro’
steps, performed in top-down order:

    Solve for y_1 :  F_1^E(x, y_1) = 0
    Solve for y_2 :  F_2^E(x, y_1, y_2) = 0
        ...                                                          (6.1)
    Solve for y_p :  F_p^E(x, y_1, y_2, ..., y_p) = 0
    “Solve” for output z :  z - F_{p+1}^E(x, y_1, y_2, ..., y_p) = 0

For convenience define the ‘extended’ function,


    F^E(x, y_1, ..., y_p) = [ F_1^E(x, y_1, ..., y_p)     ]
                            [ F_2^E(x, y_1, ..., y_p)     ]
                            [           ...               ]
                            [ F_p^E(x, y_1, ..., y_p)     ]
                            [ F_{p+1}^E(x, y_1, ..., y_p) ]

The Newton step for (6.1) can then be written as: solve

    J^E [ δx   ]           [  0  ]
        [ δy_1 ]           [  0  ]
        [ δy_2 ]  = -F^E = [  0  ]                                   (6.2)
        [ ...  ]           [ ... ]
        [ δy_p ]           [ -F  ]

where the square (extended) Jacobian matrix J^E is a block lower Hessenberg matrix:

          [ ∂F_1/∂x        ∂F_1/∂y_1                                          ]
          [ ∂F_2/∂x        ∂F_2/∂y_1      ∂F_2/∂y_2                           ]
    J^E = [   ...             ...            ...         ...                  ]    (6.3)
          [ ∂F_p/∂x        ∂F_p/∂y_1        ...          ...    ∂F_p/∂y_p     ]
          [ ∂F_{p+1}/∂x    ∂F_{p+1}/∂y_1    ...          ...    ∂F_{p+1}/∂y_p ]

Note that in (6.3) the pair (F^E, J^E) is evaluated at the current point x, and the current vector y is implicitly defined by (6.1). If we label J^E consistent with the partition illustrated in (6.3),

    J^E = [ A  B ]                                                   (6.4)
          [ C  D ]


then, assuming the uniqueness of the Newton step, matrix B is nonsingular and the Newton step for the system F(x) = 0, at the current point, is

    s_N = -(C - D B^{-1} A)^{-1} F.                                  (6.5)

Indeed, at the point x, the Jacobian of F is the (Schur-complement) matrix J = C - D B^{-1} A, where all quantities are evaluated at x. Note that despite any possible sparsity present in any of the matrices A, B, C, D, the Jacobian J is almost surely dense due to the application of B^{-1}. However, in many real applications matrix B, especially, will be very sparse, and it is cost effective in these cases to compute s_N = δx by solving the larger (but very sparse) system (6.2), taking advantage of both sparsity and block structure in J^E. Users can refer to [16] for more details.
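As a small numerical sanity check of (6.2)-(6.5), not specific to ADMAT, the following sketch with arbitrary random blocks (hypothetical data) verifies that the δx block of the expanded solve matches the Schur-complement step:

n = 4; m = 6;                  % m is the total size of the y-blocks
A = rand(m,n); B = rand(m,m);  % B square and (generically) nonsingular
C = rand(n,n); D = rand(n,m);
F = rand(n,1);
JE = [A B; C D];               % extended Jacobian, partitioned as in (6.4)
s = JE \ [zeros(m,1); -F];     % expanded Newton solve (6.2)
sN = s(1:n);                   % the delta_x block
sN2 = -(C - D*(B\A)) \ F;      % Schur-complement step (6.5)
norm(sN - sN2)                 % agrees to roundoff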

We illustrate the ‘structure’ ideas with the following example.

Consider the autonomous ODE,

    y' = f(y),

and suppose y(0) = f(x_0). We use an explicit one-step Euler method to compute an approximation y_p to a desired final state y(T) = φ(u_0). This leads to a recursive function,

    y_0 = x
    for i = 1, ..., p
        Solve for y_i :  y_i - F(y_{i-1}) = 0                        (6.6)
    Solve for z :  z - y_p = 0,

where F(y_i) = y_i + h·f(y_i) and h is the step size of the Euler method. The corresponding expanded function is

    F^E(x, y_1, ..., y_p) = [ y_1 - F(y_0)     ]
                            [ y_2 - F(y_1)     ]
                            [       ...        ]
                            [ y_p - F(y_{p-1}) ]
                            [ y_p              ]

The subsequent Newton process can be written as

    J^E [ δx   ]           [  0  ]
        [ δy_1 ]           [  0  ]
        [ δy_2 ]  = -F^E = [  0  ]                                   (6.7)
        [ ...  ]           [ ... ]
        [ δy_p ]           [ -G  ]


where J^E is the Jacobian matrix of F^E(x, y_1, ..., y_p) with respect to (x, y_1, ..., y_p), and G is the function in the nonlinear equation G(x) = 0, that is, y - φ(u_0) in this example.

Below is the description of newton expand. The subsequent example illustrates the use of this function to solve the above ODE.

Description of newton expand

[x, fval, it] = newton expand (func, func y, funG, x0, tol, itNum, Extra, p)

Input arguments

“func” is the expanded function, F E , whose intermediate variables y1 , ..., yp


are treated as independent variables; a string.
“func y” is the function, F , which returns intermediate variables, y1 , · · · , yp
with input argument x; a string
“funG” is function G (the right hand side of the Newton process); a string
function
“x0” is the initial value of x; a vector
“tol” is the stopping tolerance; a scalar
“itNum” is the maximum number of iterations; a scalar
“Extra” stores parameters for function func; a MATLAB cell structure
“p” is the number of intermediate variables; a scalar

Output arguments

“x” is the computed solution; a vector


“fval” is function value at the solution; a scalar
“it” is the number of iterations; a scalar

Example 6.2.1. Solve the ODE using newton expand.

See DemoNewton Exp.m

All functions called in DemoNewton Exp.m can be found in Demos\Chapter 6 directory.

1. Set some initial values.

>> p = 5; % number of intermediate variables


>> N = 8; % size of original problem


>> tol = 1e-13; % convergence tolerance


>> itNum = 40; % maximum number of iterations
>> h = 1e-8; % step size
2. Initialize the random seed.

>> rand(‘seed’, 0);


3. Set the target function’s input point xT .

>> xT = rand(N,1)
xT =
0.2190
0.0470
0.6789
0.6793
0.9347
0.3835
0.5194
0.8310

4. Set the expanded function F E , F and G.

>> func = ‘Exp DS’; % Expanded function with independent variables, x,


>> % y1, y2,...., yM
>> func y = ‘func DS’; % function revealing relation between x, y1,y2,...,yM
>> funG = ’Gx’; % function on the right hand side of Newton process
5. Set parameters required in func = ‘Exp DS’ and func y.

>> Extra.N = N; % size of original problem


>> Extra.M = p; % number of intermediate variables
>> Extra.u0 = xT; % input variable for the target function φ(xT )
>> Extra.fkt = @fx; % function f on the right hand side of the ODE
>> Extra.phi = @exp; % target function φ
>> Extra.h = h; % step size of one step Euler method.

Note that we set the right-hand-side function of the ODE to be f(x) = x, so that the actual solution of the above ODE is y = e^x.
6. Set the starting point x.


>> x = ones(N,1);
7. Solve the ODE by the expanded Newton computation.

>> [x, fval, it] = newton expand(func, func y, funG, x, tol, itNum, Extra, p)
x=

1.2448
1.0482
1.9716
1.9725
2.5464
1.4674
1.6810
2.2955
fval =
1.0415e-015
it =
2
8. Compare the computed solution with the target function value.

The difference between the computed solution and the target function value:
norm(x - exp(xT)) = 2.605725e-007
Example 6.2.2. Newton step comparisons.

See DemoRawExp.m

Consider the composite function, F(x) = F̄(A^{-1} F̃(x)), where F̄ and F̃ are Broyden functions [7] (their Jacobian matrices are tridiagonal) and the structure of A is based on the 5-point Laplacian defined on a square (√n + 2)-by-(√n + 2) grid. For each nonzero element of A, A_ij is defined as a function of x, specifically, A_ij = x_j. Thus, the nonzero elements of A depend on x; the structure of A_x·v, for any v, is equal to the structure of A.

The evaluation of z = F(x) is a structured computation, F^E(x, y_1, y_2), defined by the following three steps:

    (1) Solve for y_1 :  y_1 - F̃(x) = 0
    (2) Solve for y_2 :  A y_2 - y_1 = 0                             (6.8)
    (3) Solve for z  :  z - F̄(y_2) = 0


Differentiating F^E defined by (6.8) with respect to x, y_1, y_2 yields

          [  -J̃        I    0 ]
    J^E = [ A_x·y_2   -I    A ] .                                    (6.9)
          [   0        0   J̄  ]

The Newton step can be obtained by solving

    J^E [ δx   ]   [   0    ]
        [ δy_1 ] = [   0    ] .                                      (6.10)
        [ δy_2 ]   [ -F(x)  ]
In this example, we consider two approaches to the Newton step computation.

• Straight Newton computation. In this approach, the structure is not exploited. Specifically, the Jacobian matrix J is formed by differentiating F using forward-mode automatic differentiation (AD), equivalent in cost to obtaining J column-by-column using forward finite differences [4]. Finally, the dense system J s_N = -F is solved using the MATLAB linear system solver ‘\’.

• Structured Newton computation. This approach involves forming J^E (6.9) via structured automatic differentiation. Then the expanded system (6.10) is solved using the MATLAB linear system solver ‘\’.

Figure 6.1 plots the running time in seconds of a single step of each of the two Newton approaches. The experiment was carried out using MATLAB 6.5 (R13) on a laptop with an Intel 1.66 GHz Core Duo CPU and 1 GB RAM. All the matrices in the experiments are sparse except for the matrix J. Clearly, the Newton step computation is greatly accelerated by exploiting the structure of F.

The structured Newton concept can also be applied to solving minimization problems. Suppose that the (sufficiently) smooth minimization problem,

    min_x f(x),                                                      (6.11)

yields a corresponding Newton step with respect to the gradient,

    s_N = -H^{-1}(x) ∇f(x).                                          (6.12)

The gradient, ∇f(x), is the n-by-1 vector of first derivatives,

    ∇f(x) = [ ∂f/∂x_1 ]
            [ ∂f/∂x_2 ]                                              (6.13)
            [   ...   ]
            [ ∂f/∂x_n ]


[Figure 6.1: log-scale plot of CPU time (sec) versus problem size n, comparing one step of the raw (straight) Newton computation and one step of the structured Newton computation.]

Figure 6.1: Computation times of one step of the straight Newton computation and one step of the structured Newton computation.

i.e., ∇f^T is the Jacobian of f; H(x) is the Hessian matrix, i.e., the symmetric matrix of second derivatives of f:

    H(x) = ∇²f(x) = [ ∂²f/∂x_1²        ...     ∂²f/∂x_1∂x_n ]
                    [      ...         ...          ...     ]       (6.14)
                    [ ∂²f/∂x_n∂x_1     ...     ∂²f/∂x_n²    ]

Following the form of (6.1) we define a structured scalar-valued function z = f(x):

    Solve for y :   F̃^E(x, y) = 0
    “Solve” for z :  z - f̄(x, y_1, ..., y_p) = 0                     (6.15)


where

    F̃^E = F̃^E(x, y_1, y_2, ..., y_p) = [ F̃_1^E(x, y_1, ..., y_p) ]      and    y = [ y_1 ]
                                        [ F̃_2^E(x, y_1, ..., y_p) ]                 [ y_2 ]
                                        [           ...           ]                 [ ... ]
                                        [ F̃_p^E(x, y_1, ..., y_p) ]                 [ y_p ]

Note that each y_i, i = 1, ..., p, is itself a vector (of varying length). By our structured assumption, F̃^E represents a triangular computation:

    Solve for y_1 :  F̃_1^E(x, y_1) = 0
    Solve for y_2 :  F̃_2^E(x, y_1, y_2) = 0
        ...                                                          (6.16)
    Solve for y_p :  F̃_p^E(x, y_1, y_2, ..., y_p) = 0

Similar to the Newton step computation for systems of nonlinear equations, a larger but sparse system based on differentiating (6.15, 6.16) can be solved in order to obtain the Newton step (6.12). The analogue of system (6.2) is: solve

    H^E [ δw ]   [    0    ]
        [ δy ] = [    0    ]                                         (6.17)
        [ δx ]   [ -∇_x f  ]

where s_N, the Newton step defined in (6.12), satisfies s_N = δx, and H^E is a symmetric Hessian matrix,

          [      0                  F̃_y^E                         F̃_x^E              ]
    H^E = [ (F̃_y^E)^T    (F̃_yy^E)^T w + ∇²_yy f̄        (F̃_yx^E)^T w + ∇²_yx f̄       ]    (6.18)
          [ (F̃_x^E)^T    (F̃_xy^E)^T w + ∇²_xy f̄        (F̃_xx^E)^T w + ∇²_xx f̄       ]

and w is the (vector) solution to the (typically) sparse system,

    (F̃_y^E)^T w = -∇_y f̄.                                           (6.19)

Using (6.19), H^E in (6.18) can be rewritten

          [      0                  F̃_y^E                  F̃_x^E           ]
    H^E = [ (F̃_y^E)^T    (F̃_yy^E)^T w + ∇²_yy f̄             0              ] .
          [ (F̃_x^E)^T               0              (F̃_xx^E)^T w + ∇²_xx f̄  ]


If we permute the columns and rows of H^E,

            [         0                      F̃_x^E            F̃_y^E     ]
    H_P^E = [         0             (F̃_yy^E)^T w + ∇²_yy f̄   (F̃_y^E)^T ]  =  [ A  L ]
            [ (F̃_xx^E)^T w + ∇²_xx f̄          0              (F̃_x^E)^T ]     [ B  M ]

then the Hessian matrix of f is H = B - M L^{-1} A.

The key point is that a cost-effective alternative to forming the Hessian matrix H and in turn solving H s_N = -∇_x f is to compute H^E via sparse AD or finite-difference technology and then solve (6.17) with a sparse solver; s_N = δx.

Example 6.2.3. Structured Newton computation for solving a minimization problem.

See DemoExpHess.m

We minimize z = F̄(F̃(x)) + x^T x, where F̃(x) is the Broyden function and F̄(y) is a scalar-valued function, F̄(y) = Σ_{i=1}^{n} y_i² + Σ_{i=1}^{n-1} 5 y_i y_{i+1}. The triangular computation system (6.15) of z can be constructed as follows,

    Solve for y :   F̃^E(x, y) = y - F̃(x) = 0,
    “Solve” for z :  z - f̄(x, y) = z - [F̄(y) + x^T x] = 0.

The corresponding expanded Hessian matrix H^E is,

          [   0        I                -J̃                  ]
    H^E = [   I     ∇²_yy f̄              0                  ] ,
          [ -J̃^T      0       (F̃_xx^E)^T w + ∇²_xx f̄        ]

where I is an n-by-n identity matrix and w is equal to -∇_y f̄. Thus, one step of the structured Newton computation for this minimization problem can be written as follows.
1. Initial value of the problem size.


>> n = 25; % problem size


>> I = eye(n);

2. Initialization. Variable ‘tildeFw’ represents the inner product, (F̃^E)^T w.

>> TildeF = ’broyden’;


>> TildeFw = ’tildeFw’;
>> BarF = ’barF’;

3. Initialize x and values in Extra.

>> xstart = rand(n,1);


>> Extra.w = ones(n,1);
>> Extra.y = ones(n,1);
>> Extra.n = n;

4. Compute the sparsities of the constituent Jacobian and Hessian matrices.

>> JPI = getjpi(TildeF, n);


>> hpif = gethpi(BarF, n, Extra);
>> hpiFw = gethpi(TildeFw, n,Extra);

5. One step of structured Newton computations.

(a) Compute y, F̃_x^E and F̃_y^E = I.

>> [y, FEx] = evalj(TildeF, x, [], [], JPI);

(b) Compute ∇_y f̄ and ∇²_yy f̄.

>> Extra.y = x;
>> [z, grady, H2]= evalH(BarF, y, Extra, hpif);

(c) Set w to -∇_y f̄.

>> Extra.w = -grady(:);

(d) Compute the function value, gradient and Hessian of (F̃^E)^T w with respect to x.


>> Extra.y = y;
>> [z, grad, Ht] = evalH(TildeFw, x, Extra, hpiFw);

(e) Construct the expanded Hessian matrix H E .

>> HE(1:n, n+1: 3*n) = [I, -FEx];


>> HE(n+1:3*n, 1:n) = [I; -FEx’];
>> HE(n+1:2*n, n+1:2*n) = H2;
>> HE(2*n+1:3*n, 2*n+1:3*n) = Ht+2*I;

(f) Compute the function value and gradient of the original function.

>> myfun = ADfun(‘OptmFun’, 1);


>> [nf , gradx] = feval(myfun, x, Extra);

(g) Solve the Newton system and update x.

>> HE = sparse(HE);
>> d = -HE\[zeros(2*n,1); gradx(:)];
>> x = x + d(2*n+1:3*n);

Note that this section only gives samples of structured Newton computation for solving nonlinear equations and minimization problems. Users can refer to [16] for details about its advantages, applications, performance and parallelism.



Chapter 7

Using ADMAT with the MATLAB


Optimization Toolbox

The MATLAB optimization toolbox includes solvers for many nonlinear problems,
such as multidimensional nonlinear minimization, nonlinear least squares with upper
and lower bounds, nonlinear system of equations, and so on. Typically, these solvers
use derivative information such as gradients, Jacobians, and Hessians. In this chapter, we illustrate how to conveniently use ADMAT to accurately and automatically compute derivative information for use with the MATLAB Optimization Toolbox.

7.1 Nonlinear Least Squares Solver ‘lsqnonlin’


MATLAB provides a nonlinear least squares solver, ‘lsqnonlin’, for solving nonlinear least squares problems,

    min ||F(x)||_2^2   s.t.  l ≤ x ≤ u,

where F(x) maps R^m to R^n and ||·||_2 is the 2-norm. By default this solver employs the finite difference method to estimate gradients and Hessians (unless users specify an alternative). ADMAT, for example, can be specified as an alternative to finite differences. Users just need to set a flag in the input argument ‘options’ without changing any original code. The following examples illustrate how to solve nonlinear least squares problems with ADMAT used to compute derivatives. For the details of the input and output arguments of the MATLAB solver ‘lsqnonlin’, please refer to the MATLAB help documentation.


Example 7.1.1. Solving a nonlinear least squares problem using ADMAT to compute the Jacobian matrix.

See DemoLSq.m

Solve the nonlinear least squares problem,

    min ||F(x)||_2^2   s.t.  l ≤ x ≤ u,

where F(x) is the Broyden function defined in §3.2.

1. Set problem size.

>> n = 5;

2. Initialize the random seed.

>> rand(‘seed’ ,0);

3. Initialize x.

>> x0 = rand(n,1)
x0 =
0.2190
0.0470
0.6789
0.6793
0.9347
4. Set the lower bound for x.

>> l = -ones(n,1);

5. Set the upper bound for x.

>> u = ones(n,1);

6. Get the default value of ‘options’ of MATLAB ‘lsqnonlin’ solver.

>> options = optimset(‘lsqnonlin’);

7. Turn on the Jacobian flag in input argument ‘options’. This means that the user
will provide the method to compute Jacobians (In this case, the use of ADMAT).


>> options = optimset(options, ‘Jacobian’, ‘on’);

8. Set the function to be differentiated by ADMAT. The function call ‘feval’ is


overloaded by the one defined in ADMAT, which returns the function value and
Jacobian on each ‘feval’ call (See Chapter 3.2).

>> myfun = ADfun(’broyden’, n);

9. Call ‘lsqnonlin’ to solve the nonlinear least squares problem using ADMAT to
compute derivatives.

>> [x, RNORM] = lsqnonlin(myfun, x0, [],[], options)


x=
−0.1600
0.2584
0.9885
1.0000
0.2120
RNORM =
0.7375

7.2 Multidimensional Nonlinear Minimization Solvers


‘fmincon’ and ‘fminunc’
Consider a multidimensional constrained nonlinear minimization problem,

    min f(x)
    s.t.  A·x ≤ b,  Aeq·x = beq    (linear constraints)
          c(x) ≤ 0,  ceq(x) = 0    (nonlinear constraints)
          l ≤ x ≤ u,

where f(x) maps R^n to R^1. The MATLAB solver for this problem is ‘fmincon’.

The unconstrained minimization problem is simply

    min f(x),

where f(x) maps R^n to a scalar. This unconstrained problem can be solved by ‘fminunc’. ADMAT can be used to compute the gradient or both the gradient and the Hessian matrix.


Example 7.2.1. Solve unconstrained nonlinear minimization problems using ADMAT.

See DemoFminunc.m

Solve the nonlinear minimization problem,

min brown(x),

where brown(x) is the Brown function defined in §3.1. In this example, we solve the problem twice. The first time, ADMAT is used to compute gradients only, and Hessians are estimated by the default finite difference method. The second time, ADMAT is used to compute both gradients and Hessians.

1. Set problem size.

>> n = 5;

2. Initialize random seed.

>> rand(‘seed’, 0);

3. Initial value of x.

>> x0 = rand(n,1)
x0 =
0.2190
0.0470
0.6780
0.6793
0.9347

4. Set the function to be differentiated by ADMAT. The function call ‘feval’ is overloaded by the one defined in ADMAT. It can return the function value, gradient and Hessian in each ‘feval’ call (see §3.1 for details).

>> myfun = ADfun(‘brown’,1);

5. Get the default value of ‘options’.

>> options = optimset(‘fminunc’);


6. Solve the problem using ADMAT to determine gradients (but not Hessians).

(a) Turn on the gradient flag in input argument ‘options’ (but not the Hessian
flag). Thus, the solver will use ADMAT to compute gradients, but will
estimate Hessians by finite difference method.

>> options = optimset(options, ‘GradObj’, ‘on’);

(b) Call the MATLAB unconstrained nonlinear minimization solver ‘fminunc’ with ADMAT used to determine gradients only.

>> [x,FVAL] = fminunc(myfun,x0,options)


x=
1.0e − 004 *
−0.1487
−0.1217
0.2629
−0.0173
−0.5416
FVAL =
4.8393e − 009

7. Solve the problem using ADMAT to compute both gradients and Hessians.

(a) Turn on both gradient and Hessian flags in input argument ‘options’. Thus,
the solver will use the user specified method (ADMAT) to compute both
gradients and Hessians.

>> options = optimset(options, ‘GradObj’, ‘on’);


>> options = optimset(options, ‘Hessian’, ‘on’);

(b) Call the MATLAB unconstrained nonlinear minimization solver ‘fminunc’ using ADMAT to compute derivatives.

>> [x,FVAL] = fminunc(myfun,x0,options)


x=
1.0e − 004 *


−0.1487
−0.1217
0.2629
−0.0173
−0.5416
FVAL =
4.8393e − 009
Example 7.2.2. Solve a constrained nonlinear minimization problem using ADMAT.

See DemoFmincon.m

Solve the nonlinear minimization problem,


min brown(x), l ≤ x ≤ u,
where brown(x) is the Brown function defined in Chapter 3.
1. Set problem size.

>> n = 5;
2. Initialize the random seed.

>> rand(‘seed’, 0);


3. Initial value of x.

>> x0 = rand(n,1)
x0 =
0.2190
0.0470
0.6789
0.6793
0.9347
4. Set the lower bound for x.

>> l = -ones(n,1);
5. Set the upper bound for x.

>> u = ones(n,1);


6. Set the function to be differentiated by ADMAT.

>> myfun = ADfun(‘brown’,1);

7. Get the default value of ‘options’.

>> options = optimset(‘fmincon’);

8. Set up ‘options’ so that both gradients and Hessians are computed by ADMAT.

>> options = optimset(options, ‘GradObj’, ‘on’);
>> options = optimset(options, ‘Hessian’, ‘on’);

9. Call the MATLAB constrained nonlinear minimization solver ‘fmincon’ with ADMAT to compute derivatives.

>> [x,FVAL] = fmincon(myfun,x0,[],[],[],[],l,u,[], options)


x=
1.0e − 007 *
−0.0001
0.0000
−0.1547
−0.1860
0.0869
FVAL =
1.2461e − 015

In summary, ADMAT can be conveniently linked to the MATLAB nonlinear least squares solver ‘lsqnonlin’ and the nonlinear minimization solvers ‘fminunc’ and ‘fmincon’ in two steps:

1. Set up the Jacobian, gradient and Hessian flags in the input argument ‘options’.

2. Set the function to be differentiated by ADMAT using the ‘ADfun’ function call
to overload ‘feval’.



Chapter 8

Combining C/Fortran with


ADMAT

ADMAT can differentiate any function defined in an M-file. ADMAT cannot be applied to any external files, such as MEX files. However, ADMAT can be combined with finite-differencing to enable M-file/external file combinations.

Example 8.1.1. Compute the Jacobian of the Broyden function (which is programmed in C).

See mexbroy.c and CBroy.m

The C file, mexbroy.c, for the Broyden function and its MEX function are as follows.

/*************************************************************
%
% Evaluate the Broyden nonlinear equations test function.
%
%
% INPUT:
% x - The current point (row vector).
%
%
% OUTPUT:
% y - The (row vector) function value at x.
%
****************************************************************/


#include "mex.h"
#define x_IN 0
#define y_OUT 0

extern void mexbroy(int n, double *x, double *y)


{
int i;
y[0] = (3.0-2.0*x[0])*x[0]-2.0*x[1]+1.0;
y[n-1] = (3.0-2.0*x[n-1])*x[n-1]-x[n-2]+1.0;

for(i=1; i<n-1; i++)
y[i] = (3.0 - 2.0*x[i])*x[i] - x[i-1] - 2.0*x[i+1] + 1.0;
}

// MEX Interface function


void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// define input and output variables
double *x, *y;
int n;
if (nrhs != 1)
mexErrMsgTxt("One input required");
else if (nlhs > 1)
mexErrMsgTxt("Too many output arguments");

n = mxGetM(prhs[x_IN]);
if (n == 1)
n = mxGetN(prhs[x_IN]);

plhs[y_OUT] = mxCreateDoubleMatrix(n, 1, mxREAL);

x = mxGetPr(prhs[x_IN]);
y = mxGetPr(plhs[y_OUT]);

mexbroy(n, x, y);

return;
}


Note that mexFunction is an interface to integrate C/Fortran functions with MATLAB. It transfers input and output arguments between MATLAB and C/Fortran. After defining the Broyden function and its MEX interface, type ‘mex mexbroy.c’ at the MATLAB prompt to compile the MEX file.

Once the compilation succeeds, the Broyden function can be called as a MATLAB function. The file CBroy.m integrates the compiled mexbroy.c routine into ADMAT via finite differencing.

function y = CBroy(x, Extra)


% y = CBroy(x, Extra)
%
% compute the Broyden function at x by the C subroutine, mexbroy.c
% Mapping, CBroy : R^n ----> R^n
%
%
% INPUT:
% x - The current point (column vector). When it is an
% object of deriv class, the fundamental operations
% will be overloaded automatically.
% Extra - Parameters required in CBroy.
%
%
% OUTPUT:
% y - The (vector) function value at x. When x is an object
% of deriv class, y will be an object of deriv class as well.
% There are two fields in y. One is the function value
% at x, the other is the Jacobian matrix at x.
%
%
% This is an example of how to use the finite difference method to add
% existing C subroutine into ADMAT package. Users can call this function
% as any forward mode functions in ADMAT .
%
%
% Cayuga Research
% July 2008
%

global globp;


global fdeps;

n = length(x);
if isa(x, 'deriv') % x is an object of deriv class
val = getval(x); % get the value of x
drv = getydot(x); % get the derivative part of x
y = mexbroy(val); % compute the function value at x
ydot = zeros(getval(n), globp); % initialize the derivative of y
% compute the derivative by finite difference method
for i = 1 : globp
tmp = mexbroy(val + fdeps*drv(:,i));
ydot(:,i) = (tmp - y)/fdeps;
end

% set y as an object of deriv class


y =deriv(y, ydot);
else
y = mexbroy(x);
end
The global variable fdeps is the step size used in finite differencing. Its default value
is 1e-6. Users can specify their own step size.
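For example, a different step size can be set before the calculation below (a sketch; 1e-7 is an arbitrary choice):

global fdeps
fdeps = 1e-7;   % override the default finite-difference step size of 1e-6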

The final finite-difference calculation is illustrated below.

1. Set problem size.

>> n = 5;

2. Define a deriv input variable.

>> x = ones(n,1);
>> x = deriv(x, eye(n));

3. Compute the Jacobian by finite differencing.

>> y = CBroy(x)
val =


0
−1
−1
−1
1
deriv =
−1.0000 −2.0000 0 0 0
−1.0000 −1.0000 −2.0000 0 0
0 −1.0000 −1.0000 −2.0000 0
0 0 −1.0000 −1.0000 −2.0000
0 0 0 −1.0000 −1.0000
4. Extract Jacobian matrix from the finite differencing.

>> JFD = getydot(y)


JFD =
−1.0000 −2.0000 0 0 0
−1.0000 −1.0000 −2.0000 0 0
0 −1.0000 −1.0000 −2.0000 0
0 0 −1.0000 −1.0000 −2.0000
0 0 0 −1.0000 −1.0000



Chapter 9

Troubleshooting

Below we list some potential problems that may occur in the use of ADMAT.

1. ??? Conversion to double from deriv is not possible.

This usually means a deriv class object has been assigned to a double class variable. Check both sides of the assignment and make sure both sides are of the same type.
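A typical trigger and fix, a sketch of the pattern used in the brownv function of Appendix A.2 (and described in §3.4), looks like:

n = 3;
x = deriv(ones(n,1), eye(n));   % independent variable of deriv class
if isa(x, 'deriv')              % keep the dependent variable type-consistent
    n = deriv(n);               % so that zeros(n,1) below is also a deriv object
end
y = zeros(n,1);
y(1) = x(1)^2;                  % the assignment no longer converts deriv to double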

2. ??? Error using ==> XXX


Function ‘XXX’ not defined for variables of class ’deriv’.

Some MATLAB functions are not overloaded in ADMAT yet. Please contact Cayuga Research about extending ADMAT to the MATLAB function of your interest.

3. ??? Undefined function or variable ‘deriv’.

ADMAT is not installed yet. Please refer to Chapter 2 to make sure ADMAT
is properly installed.

4. ??? Error using ==> deriv/deriv


Please restart ADMAT. There may be a license problem.

ADMAT detects a possible license error. Please restart ADMAT. If you get the following message,

“The ADMAT 2.0 license has expired. Please contact Cayuga Research for a license extension”,


it means the license for ADMAT 2.0 has expired. Please contact us to renew the license.

5. Do not use the MATLAB command “clear all” to clear your workspace while using ADMAT. This will remove all ADMAT global variables from memory: unpredictable errors may then occur. Instead, use “clear” selectively as needed.

6. ADMAT cannot perform 3-D or higher operations. ADMAT only performs 1-D and 2-D matrix operations.

7. Derivatives are incorrect. Please make sure the following issues have been checked.

• “clear all” was not called before using ADMAT.

• The data type of the dependent variable is consistent with that of the input independent variable in the user-defined function (see §3.4 for details).

If there is still an error, please contact Cayuga Research for further help.



Appendix A

Applications of ADMAT

In this chapter we illustrate two applications of ADMAT. First, we show how to trigger
the quasi-Newton computation in MATLAB unconstrained nonlinear minimization
solver ‘fminunc’ with ADMAT used to determine gradients. Second, we present a
sensitivity problem.

A.1 Quasi-Newton Computation


The MATLAB multidimensional unconstrained nonlinear minimization solver, ‘fminunc’, uses the quasi-Newton approach when the user chooses the medium-scale option (to classify the size of the problem being solved). This method updates the inverse of the Hessian directly by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula and, by default, estimates gradients by finite differencing. However, it allows users to specify their own gradient computation method to replace the finite difference method. The following example shows how to trigger the quasi-Newton computation in ‘fminunc’ with ADMAT used to determine gradients.

Example A.1.1. Find a minimizer of the Brown function using the quasi-Newton approach in ‘fminunc’ (with gradients determined by ADMAT).

See DemoQNFminunc.m

1. Set the problem size.

>> n = 5;


2. Initialize the random seed

>> rand(’seed’,0);

3. Initialize x.

>> x0 = rand(n,1)
x0=
0.2190
0.0470
0.6789
0.6793
0.9347

4. Set the function to be differentiated by ADMAT.

>> myfun = ADfun(‘brown’,1);


Note: the second input argument in ADfun, ‘1’, is a flag indicating a scalar mapping, f : R^n → R^1; more generally, the second argument is set to ‘m’ for a vector-valued function, F : R^n → R^m.

5. Get the default value of argument ‘options’ .

>> options = optimset(‘fmincon’);

6. Turn on the gradient flag of input argument ‘options’. Thus, the solver uses the user-specified method to compute gradients (i.e., ADMAT).

>> options = optimset(options, ‘GradObj’, ‘on’);

7. Set the flag of ‘LargeScale’ to off, so that the quasi-Newton method will be used.

>> options = optimset(options, ‘LargeScale’, ‘off’);

8. Solve the unconstrained nonlinear minimization by the quasi-Newton method in ‘fminunc’ with ADMAT used to compute gradients.

>> [x,FVAL] = fminunc(myfun,x0, options)


x=
1.0e-004 *


−0.1487
−0.1217
0.2629
−0.0173
−0.5416
FVAL =
4.8393e-009

A.2 A Sensitivity Problem


This example is concerned with sensitivity with respect to problem parameters at the solution point. Consider a nonlinear scalar-valued function f(x, µ), where x is the independent variable and µ is a vector of problem parameters. This example illustrates that ADMAT can be used both to compute derivatives with respect to x, in order to minimize f(x, µ) with respect to x (for a fixed value of µ), and to analyze the sensitivity of f(x_opt, µ) with respect to µ at the solution point.

Example A.2.1. Analyze the sensitivity of the brownv(x, V) function with respect to V at the optimal point x_opt with V = 0.5.

See DemoSens.m

The function brownv(x, V) is similar to the Brown function defined in §3.1. Its definition is as follows.

function f = brownv(x,V)
% length of input x
n=length(x);
% if any input is a ‘deriv’ class object, set n to the ‘deriv’ class as well
if isa(x, 'deriv') || isa(V, 'deriv')


n = deriv(n);
end
y=zeros(n,1);
i=1:(n-1);
y(i) = (x(i).^2).^(V*x(i+1).^2+1) + ((x(i+1)+0.25).^2).^(x(i).^2+1) + ...
(x(i)+0.2*V).^2;
f = sum(y);
tmp = V’*x;
f = f - .5*tmp’*tmp;


1. Set the problem size.

>> n = 5;

2. Set the initial value of V to 0.5. We first solve the minimization problem,
min brownv(x, V ), at V = 0.5, then analyze the sensitivity of brownv(xopt , V )
with respect to V .

>> V = 0.5

3. Initialize the random seed.

>> rand(‘seed’,0);

4. Set the initial value of x.

>> x0 = 0.1*rand(n,1)
x0 =
0.2190
0.0470
0.6789
0.6793
0.9347

5. Solve the unconstrained minimization problem, min brownv(x, V ), using the


MATLAB optimization solver ‘fminunc’ and with ADMAT as the derivative computation method (see §7.2 for details).

>> options = optimset(‘fminunc’);


>> myfun = ADfun(’brownv’, 1);
>> options = optimset(options, ’GradObj’, ’on’);
>> options = optimset(options, ’Hessian’, ’on’);
>> [ x1, FVAL1 ] = fminunc(myfun, x0, options, V);

6. Set the parameter V to ‘deriv’ class.

>> V = deriv(V,1);

7. Compute the function value of brownv(x, V) at the optimal point x1 with the ‘deriv’ class object V.

>> f = brownv(x1, V)


val =
0.0795
deriv =
-0.0911

8. Get the sensitivity with respect to V at the optimal point x_opt.

>> sen = getydot(f)


sen =
-0.0911

In summary, analyzing the sensitivity of f(x, µ) with respect to µ at x = x_opt requires two steps.

• Solve the minimization problem of f(x, µ) with respect to x at a fixed value of µ.

• Differentiate the function f(x, µ) with respect to µ at the optimal point x_opt to get the sensitivity of f(x_opt, µ) with respect to µ.



Bibliography

[1] autodiff.org, www.autodiff.org, 2008.


[2] C. H. Bischof, H. Martin Bücker, B. Lang, A. Rasch and A. Vehreschild, Com-
bining source transformation and operator overloading techniques to compute
derivatives for MATLAB programs, Proceedings of the Second IEEE Interna-
tional Workshop on Source Code Analysis and Manipulation (SCAM 2002), IEEE
Computer Society, 2002, 65–72.
[3] C. H. Bischof, B. Lang and A. Vehreschild, Automatic differentiation for MAT-
LAB programs, Proceedings in Applied Mathematics and Mechanics, 2003, 50–53
[4] C. H. Bischof, A. Carle, G. F. Corliss, A. Griewank and P. D. Hovland, ADIFOR:
generating derivative codes from Fortran programs, Scientific Programming, Vol.
1, 1992, 11–29.
[5] C. H. Bischof, A. Carle, P. Khademi and A. Mauer, ADIFOR 2.0: automatic
differentiation of Fortran 77 programs, IEEE Computational Science and Engi-
neering, Vol. 3, 1996, 18–32.
[6] C. H. Bischof, L. Roh and A. Mauer, ADIC — an extensible automatic differentiation tool for ANSI-C, Software–Practice and Experience, Vol. 27, 1997, 1427–1456.
[7] C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, Vol. 19, No. 92, 1965, 577-593.
[8] T.F. Coleman and G.F. Jonsson, The efficient computation of structured gradients using automatic differentiation, SIAM J. Sci. Comput., Vol. 20, 1999, 1430–1437.
[9] T.F. Coleman and J.J. Moré, Estimation of sparse Jacobian matrices and graph
coloring problems, SIAM J. Numer. Anal., Vol. 20, No.1, 1983, 187-209.
[10] T.F. Coleman and J.J. Moré, Estimation of sparse Hessian matrices and graph
coloring problems , Math Programming, Vol. 28, 1984, 243-270.


[11] T. F. Coleman and A. Verma, Structure and efficient Hessian calculation, Ad-
vances in Nonlinear Programming, Yaxiang Yuan(ed.), 1998, 57-72.

[12] T. F. Coleman and A. Verma, ADMAT: An automatic differentiation toolbox for


MATLAB, Proceedings of the SIAM Workshop on Object Oriented Methods for
Inter-Operable Scientific and Engineering Computing, SIAM, Philadelphia, PA,
1998.

[13] T. F. Coleman and A. Verma, The efficient computation of sparse Jacobian matrices using automatic differentiation, SIAM J. Sci. Comput., Vol. 19, 1998, 1210-1233.

[14] T. F. Coleman and A. Verma, ADMIT-1: Automatic differentiation and MAT-


LAB interface toolbox, ACM Transactions on Mathematical Software, Vol. 26,
2000, 150-175.

[15] T. F. Coleman, F. Santosa and A. Verma, Semi-Automatic differentiation, Pro-


ceedings of Optimal Design and Control Workshop, VPI, 1997.

[16] T. F. Coleman and W. Xu, Fast Newton computations, SIAM J. Sci. Comput.,
Vol.31, 2008, 1175-1191.

[17] S. A. Forth, An Efficient Overloaded Implementation of Forward Mode Automatic


Differentiation in MATLAB, ACM Transactions on Mathematical Software, vol.
32, 2006, 195–222.

[18] A. Griewank, Some bounds on the complexity of gradients, Complexity in Nonlinear Optimization, P. Pardalos, Ed., World Scientific Publishing Co., Inc., River Edge, NJ, 1993, 128-161.

[19] A. Griewank and G.F. Corliss, Eds, Automatic Differentiation of Algorithms:


Theory, Implementation and Applications, SIAM, Philadelphia, PA, 1991.

[20] A. Griewank, D. Juedes and J. Utke, Algorithm 755: ADOL-C: A Package for the
Automatic Differentiation of Algorithms Written in C/C++, ACM Transactions
on Mathematical Software, vol 22, 1996, 131–167.

[21] Mathematics and Computer Science Division, Argonne National Laboratory, Center for High Performance Software Research, Rice University, and Computational Engineering Research Group, RWTH Aachen, Germany, http://www-unix.mcs.anl.gov/OpenAD, 2008.


[22] The MathWorks Inc., 3 Apple Hill Drive, Natick MA 01760-2098, C and Fortran API Reference, July 2008, http://www.mathworks.com/access/helpdesk/help/pdf_doc/matlab/apiref.pdf.

[23] C. W. Straka, ADF95: Tool for automatic differentiation of a FORTRAN code


designed for large numbers of independent variables, Computer Physics Commu-
nications, Vol. 168, 2005, 123–139.

[24] S. Stamatiadis, R. Prosmiti and S. C. Farantos, auto deriv: Tool for automatic
differentiation of a fortran code, Comput. Phys. Commun., Vol. 127, 2000, 343-
355.

