
Automatic Differentiation

Hamid Reza Ghaffari, Jonathan Li, Yang Li, Zhenghua Nie
Instructor: Prof. Tamas Terlaky
School of Computational Engineering and Science
McMaster University

March 23, 2007

Outline

  Introductions
  Forward and Reverse Mode
    Forward methods
    Reverse methods
    Comparison
    Extended knowledge
    Case Study
  Complexity Analysis
    Forward Mode Complexity
    Reverse Mode Complexity
  AD Software
    AD tools in MATLAB
    AD in C/C++ (ADIC): developers, introduction, ADIC anatomy, ADIC process, example, handling side effects
  References

Introductions

Why Do We Need Derivatives?

  Optimization via gradient methods:
    unconstrained optimization, minimize $y = f(x)$, requires the gradient or the Hessian;
    constrained optimization, minimize $y = f(x)$ subject to $c(x) = 0$, also requires the Jacobian $J_c(x) = [\partial c_j / \partial x_i]$.
  Solution of nonlinear equations $f(x) = 0$ by Newton's method,
    $x^{n+1} = x^n - \left( \frac{\partial f(x^n)}{\partial x} \right)^{-1} f(x^n),$
  which requires the Jacobian $J_f = [\partial f / \partial x]$.
  Parameter estimation, data assimilation, sensitivity analysis, inverse problems, ...


How Do We Obtain Derivatives?

Whatever the approach, we care about:
  Reliability: the correctness and numerical accuracy of the derivative results;
  Computational cost: the amount of runtime and memory required by the derivative code;
  Development time: the time it takes to design, implement, and verify the derivative code, beyond the time to implement the code for the underlying function.


Main Approaches

  Hand coding
  Divided differences
  Symbolic differentiation
  Automatic differentiation

Hand Coding

An analytic expression for the derivative is identified first and then implemented by hand in any high-level programming language.

Advantages:
  Accuracy up to machine precision, if care is taken.
  Highly optimized implementation, depending on the skill of the implementer.
Disadvantages:
  Only applicable to "simple" functions, and error-prone.
  Requires considerable human effort.


Divided Differences

Approximate the derivative of a function $f$ w.r.t. the $i$-th component of $x$ at a particular point $x_0$ by a difference quotient, e.g.
  $\frac{\partial f(x)}{\partial x_i}\Big|_{x_0} \approx \frac{f(x_0 + h e_i) - f(x_0)}{h},$
where $e_i$ is the $i$-th Cartesian unit vector.

Divided Differences (Ctd.)

  $\frac{\partial f(x)}{\partial x_i}\Big|_{x_0} \approx \frac{f(x_0 + h e_i) - f(x_0)}{h}$

Advantages:
  Only $f$ itself is needed; easy to implement; $f$ can be used as a "black box".
  Easy to parallelize.
Disadvantages:
  Accuracy is hard to assess and depends on the choice of $h$.
  Computational complexity is bounded below by $(n + 1)\,\mathrm{cost}(f)$ for a full gradient.
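A minimal C++ sketch of this forward-difference scheme (the helper name fd_gradient, the default step h, and the use of std::function are illustrative choices, not from the slides):

#include <cmath>
#include <functional>
#include <vector>

// Forward-difference approximation of the gradient of f at x0.
// Each partial derivative costs one extra evaluation of f, so the
// full gradient costs (n + 1) evaluations in total.
std::vector<double> fd_gradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x0,
    double h = 1e-6) {
  const double f0 = f(x0);
  std::vector<double> grad(x0.size());
  for (std::size_t i = 0; i < x0.size(); ++i) {
    x0[i] += h;                  // evaluate at x0 + h*e_i
    grad[i] = (f(x0) - f0) / h;  // truncation error is O(h)
    x0[i] -= h;                  // restore x0
  }
  return grad;
}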


Symbolic Differentiation

Find an explicit derivative expression using a computer algebra system.

Disadvantages:
  The length of the representation of the resulting derivative expressions grows rapidly with the number $n$ of independent variables.
  Inefficient in computing time, due to the rapid growth of the underlying expressions.
  Unable to deal with constructs such as branches, loops, or subroutines that are inherent in computer codes.

Automatic Differentiation

What is automatic differentiation?

Algorithmic, or automatic, differentiation (AD) is concerned with the accurate and efficient evaluation of derivatives for functions defined by computer programs. No truncation errors are incurred, and the resulting numerical derivative values can be used in all scientific computations that are based on linear, quadratic, or higher-order approximations to nonlinear scalar or vector functions.

Automatic Differentiation (Cont.)

What's the idea behind automatic differentiation?

AD techniques rely on the fact that every function, no matter how complicated, is executed on a computer as a (potentially very long) sequence of elementary operations, such as additions and multiplications, and elementary functions, such as sin and cos. By repeatedly applying the chain rule of calculus to the composition of those elementary operations, one can compute derivatives in a completely mechanical fashion.
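For instance, $f(x_1, x_2) = \sin(x_1 x_2) + x_2$ is executed as the elementary sequence

  $v_1 = x_1, \quad v_2 = x_2, \quad v_3 = v_1 v_2, \quad v_4 = \sin v_3, \quad y = v_4 + v_2,$

and repeated application of the chain rule propagates derivatives through exactly this sequence: $\dot v_3 = v_2 \dot v_1 + v_1 \dot v_2$, $\dot v_4 = \cos(v_3)\, \dot v_3$, $\dot y = \dot v_4 + \dot v_2$.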

How Good Is AD?

  Reliability: accurate to machine precision; no truncation error.
  Computational cost:
    forward mode: roughly 2 to $3n$ times $\mathrm{cost}(f)$;
    reverse mode: at most about 5 times $\mathrm{cost}(f)$ for a gradient.
  Human effort: less time is spent preparing a code for differentiation, in particular in situations where computer models are bound to change frequently.

How Widely Is AD Used?

  Sensitivity analysis of a mesoscale weather model (climate modeling)
  Data assimilation for ocean circulation (oceanography)
  Intensity-modulated radiation therapy (biomedicine)
  Multidisciplinary design of aircraft (computational fluid dynamics)
  The NEOS server (optimization)
  ...
Source: http://www.autodiff.org/?module=Applications&submenu=&category=all

Forward and Reverse Mode

AD Methods: Simple Example

(Figure slides: a small worked example; the program's variables are first unified into a single sequence $u_i$.)

Forward Method

Rewrite the code with unified variables:
  $u_i = x_i, \qquad i = 1, \dots, n$
  $u_i = \varphi_i(\{u_j\}_{j<i}), \qquad i = n+1, \dots, N$
Differentiate:
  $\nabla u_i = e_i, \qquad i = 1, \dots, n$
  $\nabla u_i = \sum_{j<i} c_{i,j} \nabla u_j, \qquad i = n+1, \dots, N$
where $c_{i,j} = \partial \varphi_i / \partial u_j$ are the local partial derivatives.
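A toy C++ sketch of this recurrence, carrying a length-$n$ gradient vector with every value (the struct and helper names are illustrative, not any particular tool's API):

#include <cmath>
#include <vector>

// One u_i paired with the length-n vector grad(u_i).
struct Var {
  double val;
  std::vector<double> grad;
};

// Each elementary operation propagates
// grad(u_i) = sum_j c_{i,j} * grad(u_j) with local partials c_{i,j}.
Var mul(const Var& a, const Var& b) {  // c_{i,a} = b.val, c_{i,b} = a.val
  Var r{a.val * b.val, std::vector<double>(a.grad.size())};
  for (std::size_t k = 0; k < r.grad.size(); ++k)
    r.grad[k] = b.val * a.grad[k] + a.val * b.grad[k];
  return r;
}

Var vsin(const Var& a) {               // c_{i,a} = cos(a.val)
  Var r{std::sin(a.val), std::vector<double>(a.grad.size())};
  for (std::size_t k = 0; k < r.grad.size(); ++k)
    r.grad[k] = std::cos(a.val) * a.grad[k];
  return r;
}

// Seed the independents with grad(u_i) = e_i, then e.g.
//   Var x1{2.0, {1, 0}}, x2{3.0, {0, 1}};
//   Var y = vsin(mul(x1, x2));   // y.grad is the gradient at (2, 3)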

Reverse Method

Compute the adjoint of the code,
  $\bar u_j = \frac{\partial y}{\partial u_j} = \frac{\partial (y_1, y_2, \dots, y_m)}{\partial u_j}.$
Compute for the dependent variables:
  $\bar u_{n+p+j} = \frac{\partial (y_1, y_2, \dots, y_m)}{\partial u_{n+p+j}} = e_j, \qquad j = 1, \dots, m.$
Compute for the intermediates and independents, $j = n+p, \dots, 1$:
  $\bar u_j = \frac{\partial y}{\partial u_j} = \sum_{i>j} \bar u_i \, c_{i,j}.$
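A minimal C++ sketch of this adjoint sweep for a scalar output, $m = 1$ (the tape layout and names are illustrative assumptions):

#include <vector>

// One tape entry per statement u_i = phi_i(...): the indices j of
// its arguments and the local partials c_{i,j}.
struct Node {
  std::vector<int> args;
  std::vector<double> partials;
};

// Reverse sweep: seed the output adjoint with 1, then accumulate
// ubar_j += ubar_i * c_{i,j} in reverse order over the tape.
std::vector<double> reverse_sweep(const std::vector<Node>& tape) {
  std::vector<double> ubar(tape.size(), 0.0);
  ubar.back() = 1.0;  // dy/dy = 1
  for (int i = static_cast<int>(tape.size()) - 1; i >= 0; --i)
    for (std::size_t k = 0; k < tape[i].args.size(); ++k)
      ubar[tape[i].args[k]] += ubar[i] * tape[i].partials[k];
  return ubar;  // the first n entries hold the gradient
}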

Forward Methods

  Method: compute the gradient of each variable, and use the chain rule to pass the gradient forward.
  Size of the computed objects: each step computes a vector of length $n$, the input size.
  The computation of each variable's gradient proceeds together with the computation of the variable itself.
  Easy to implement.

Forward Methods (Ctd.)

(Figure slide, two panels: "Computing Variable Value" and "Computing Gradient Value", evaluated side by side in a single forward pass.)
Automatic Differentiation
Forward and Reverse Mode
Reverse methods

Reverse methods

Reverse method
Method : Compute Adjoint of each variable, pass the
Adjoint
The size of computed object: In each computation, it
computes the vectors with output size m. (Note,usually the
output size is 1 in optimization application.)
The computation of Adjoint of each variable proceed after
the completion of the computation of all variables.

Reverse Methods (Ctd.)

  Traverse the computational graph in reverse and obtain the parents of each variable, so as to compute the adjoints.
  Obtain the gradient by computing the partial derivatives one by one.
  Harder to implement.

Reverse Methods (Ctd.)

(Figure slide, two panels: "Computing Variable Value" in a forward pass, then "Computing Adjoint Value" in a subsequent reverse pass.)

Implementation of Reverse Mode

As mentioned above, the implementation of forward mode is relatively straightforward. For reverse mode we compare the key feature of the two implementation techniques:
  Source transformation: re-order the code upside down.
  Operator overloading: record the computation on a "tape".

Implementation of Reverse Mode (Ctd.)

Re-ordering the code upside down:

(Figure slide: the original statements are listed in reverse order, each replaced by its adjoint statement; a hand-written sketch follows.)
Automatic Differentiation
Forward and Reverse Mode
Reverse methods

Implementation of Reverse mode

Record computation on a "tape"


Record:Operation,operands
Related technique: Checkpointing
If the number of operations going large, Checkpointing
prevent the program from exhausting all the memory

Comparison

The following topics are discussed in the comparison between forward mode and reverse mode:
  Computational complexity
  Memory required
  Time to develop

Cost of Forward Propagation of Derivatives

Define
  $N_{|c|=1}$ : the number of unit local derivatives $c_{i,j} = \pm 1$;
  $N_{|c|\ne 1}$ : the number of non-unit local derivatives $c_{i,j} \ne 0, \pm 1$.
Solve for the derivatives in forward order $\nabla u_{n+1}, \nabla u_{n+2}, \dots, \nabla u_N$:
  $\nabla u_i = \sum_{j \prec i} c_{i,j} \nabla u_j, \qquad i = n+1, \dots, N,$
with each $\nabla u_i = (\partial u_i/\partial x_1, \dots, \partial u_i/\partial x_n)$ a length-$n$ vector.

The flop count flops(fwd) is given by
  flops(fwd) $= n N_{|c|\ne 1}$   (multiplications $c_{i,j} \nabla u_j$ with $c_{i,j} \ne \pm 1, 0$)
    $+\; n (N_{|c|\ne 1} + N_{|c|=1})$   (additions/subtractions $\pm\, c_{i,j} \nabla u_j$)
    $-\; n (p + m)$   (the first term of each of the $p+m$ statements is an assignment, not an addition)
so
  flops(fwd) $= n \left( 2 N_{|c|\ne 1} + N_{|c|=1} - p - m \right)$.

Cost of Reverse Propagation of Adjoints

Solve for the adjoints in reverse order $\bar u_{n+p}, \bar u_{n+p-1}, \dots, \bar u_1$:
  $\bar u_j = \sum_{i \succ j} \bar u_i \, c_{i,j},$
with each $\bar u_j = \partial (y_1, y_2, \dots, y_m)/\partial u_j$ a length-$m$ vector.

The flop count flops(rev) is given by
  flops(rev) $= m N_{|c|\ne 1}$   (multiplications $\bar u_i c_{i,j}$ with $c_{i,j} \ne \pm 1, 0$)
    $+\; m (N_{|c|=1} + N_{|c|\ne 1})$   (additions/subtractions $\pm (\bar u_i c_{i,j})$)
so
  flops(rev) $= m \left( 2 N_{|c|\ne 1} + N_{|c|=1} \right)$.

Memory Required

It is not certain which mode uses more memory; usually, reverse mode uses more.
The memory cost of forward mode comes from:
  storing one value per variable;
  storing a gradient of the input size $n$ per variable.
The memory cost of reverse mode comes from:
  storing one value per variable;
  storing an adjoint of the output size $m$ per variable;
  storing the DAG (the directed acyclic graph that represents the function).

Memory Required (Ctd.)

Forward mode is more likely to use less memory:
  if the original function re-uses variables;
  if the computation involves so many operations that reverse mode needs a lot of memory to store the DAG.
Reverse mode is more likely to use less memory:
  if $n$ is relatively large, so that storing the gradients costs more than storing the adjoints.

Time to Develop

Usually it is harder to develop reverse-mode code than forward-mode code, especially when using the source transformation technique.

Conclusion

  Use forward mode when $n \ll m$, e.g. sensitivity of many outputs to a few inputs.
  Use reverse mode when $m \ll n$, e.g. the gradient of a scalar objective in optimization.

Extended Knowledge

Directional derivatives (forward mode):
  seed $d = (d_1, \dots, d_n)^T$;
  seeding $\dot x_i = d_i$;
  calculates $J_f \, d$.
Multi-directional derivatives: replace $d$ by a matrix $D$, where $D = [d_{ij}]$, $i = 1, \dots, n$, $j = 1, \dots, q$.

Extended Knowledge (Ctd.)

Directional adjoints (reverse mode):
  seed $v = (v_1, \dots, v_m)$;
  seeding $\bar y_j = v_j$;
  calculates $v \, J_f$.
Multi-directional adjoints: replace $v$ by a matrix $V$, where $V = [v_{ij}]$, $i = 1, \dots, q$, $j = 1, \dots, m$.

Case Study

Using FADBAD++:
  FADBAD++ was developed by Ole Stauning and Claus Bendtsen.
  Flexible automatic differentiation using templates and operator overloading in ANSI C++.
  Source code only; no additional library required.
  Free to use.
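A forward-mode usage sketch in the style of the FADBAD++ documentation; the F<> type and the diff()/x()/d() accessors are taken from that documentation, but the header name and namespace may differ between versions:

#include "fadiff.h"   // FADBAD++ forward-mode header
using fadbad::F;      // older versions place F<> in the global namespace

// f(x) = prod_i x_i, the test function used in this case study.
F<double> f(const F<double>* x, int n) {
  F<double> p = 1.0;
  for (int i = 0; i < n; ++i) p *= x[i];
  return p;
}

int main() {
  const int n = 3;
  F<double> x[n];
  for (int i = 0; i < n; ++i) {
    x[i] = double(i + 1);
    x[i].diff(i, n);      // seed dx_i/dx_i = 1 among n inputs
  }
  F<double> y = f(x, n);
  double val = y.x();     // function value: 1*2*3 = 6
  double d0  = y.d(0);    // dy/dx_0 = x_1*x_2 = 6
  return (val == 6.0 && d0 == 6.0) ? 0 : 1;
}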

Case Study (Ctd.)

Using FADBAD++:
  Test function: $f(x) = \prod_i x_i$.
  Objective: test different codings of the function in forward mode, trying to re-use variables.
  Result: essentially no matter how the function is coded, the memory cost is about $n \times n \times 8$ bytes; re-using variables makes no difference.

Case Study (Ctd.)

Using FADBAD++:
  Test function: $f(x) = \prod_i x_i$.
  Objective: test reverse mode.
  Result: tested up to $n = 6500$, where forward mode ran out of memory; reverse mode was 127 times faster and took only a few MB.
  Remark: we could not see how much memory the DAG takes in reverse mode; this is more likely to be observable with fewer independent variables but a more complicated function.

Complexity Analysis

Code List

A code list is obtained by re-writing the code as a sequence of elementary binary and unary operations/functions, e.g. for

  $y_1 = \log^2(x_1 x_2) + (x_2 x_3^2 - a)\, x_2$
  $y_2 = \sqrt{b \log(x_1 x_2) + x_2 / x_3} - (x_2 x_3^2 - a)$

the code list is

  v1 = x1           v7  = v6 * v2     v13 = v8 * v2
  v2 = x2           v8  = v7 - a      v14 = v5^2
  v3 = x3           v9  = 1/v3        v15 = sqrt(v12)
  v4 = v1 * v2      v10 = v2 * v9     v16 = v14 + v13
  v5 = log(v4)      v11 = b * v5      v17 = v15 - v8
  v6 = v3^2         v12 = v11 + v10

with outputs $y_1 = v_{16}$ and $y_2 = v_{17}$.

Code-List (Ctd.)

Assume the code list contains
  $N_\pm$ additions/subtractions, e.g. $v_{14} + v_{13}$;
  $N_\times$ multiplications, e.g. $v_1 v_2$;
  $N_f$ nonlinear functions/operations, e.g. $\log(v_4)$, $1/v_3$;
a total of $p + m = N_\pm + N_\times + N_f$ statements.

Then
  each addition/subtraction generates two $c_{i,j} = \pm 1$;
  each multiplication generates two $c_{i,j} \ne 0, \pm 1$;
  each nonlinear function generates one $c_{i,j} \ne 0, \pm 1$, requiring one nonlinear function evaluation, e.g. $v_5 = \log(v_4)$ gives $c_{5,4} = 1/v_4$.

So we have
  $N_{|c|=1} = 2 N_\pm$
  $N_{|c|\ne 1} = 2 N_\times + N_f$.

Complexity of Forward Mode

  flops$(J_f)$ = flops$(f)$ + flops$(c_{i,j})$ + flops(fwd).
Assume flops(nonlinear function) $= w$, with $w > 1$.
Cost of evaluating the function:
  flops$(f) = N_\pm + N_\times + w N_f$.
Cost of evaluating the local derivatives $c_{i,j}$:
  flops$(c_{i,j}) = w N_f$.
Cost of the forward propagation of derivatives:
  flops(fwd) $= n (2 N_{|c|\ne 1} + N_{|c|=1} - p - m) = n (3 N_\times + N_\pm + N_f)$.

Complexity of Forward Mode (Ctd.)

Then for forward mode,
  $\frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} = 1 + \frac{w N_f + n (3 N_\times + N_\pm + N_f)}{N_\pm + N_\times + w N_f} = 1 + 3 n \hat N_\times + n \hat N_\pm + n \left( \frac{1}{w} + \frac{1}{n} \right) w \hat N_f,$
where
  $(\hat N_\pm, \hat N_\times, w \hat N_f) = \frac{(N_\pm, N_\times, w N_f)}{N_\pm + N_\times + w N_f}.$
Since $\hat N_\pm + \hat N_\times + w \hat N_f = 1$ and all coefficients are positive,
  $\frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} \le 1 + n \max\!\left( 3,\ 1,\ \frac{1}{w} + \frac{1}{n} \right) = 1 + 3n.$
When $n \ll m$, forward mode is preferred.

Complexity of Reverse Mode

  flops(rev) $= m (4 N_\times + 2 N_\pm + 2 N_f)$,
giving
  $\frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} = 1 + 4 m \hat N_\times + 2 m \hat N_\pm + m \left( \frac{2}{w} + \frac{1}{m} \right) w \hat N_f$
and
  $\frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} \le 1 + m \max\!\left( 4,\ 2,\ \frac{2}{w} + \frac{1}{m} \right) = 1 + 4m.$
For $m = 1$,
  flops$(\nabla f) \le 5\,$flops$(f)$.

AD Software
AD Tools in MATLAB

Differentiation Arithmetic

Represent each quantity as the pair $\tilde u = (u, u')$, where $u$ denotes the value of the function $u: \mathbb{R} \to \mathbb{R}$ evaluated at the point $x_0$, and $u'$ denotes the value $u'(x_0)$.

  $\tilde u + \tilde v = (u + v,\ u' + v')$
  $\tilde u - \tilde v = (u - v,\ u' - v')$
  $\tilde u \cdot \tilde v = (u v,\ u v' + u' v)$
  $\tilde u / \tilde v = (u/v,\ (u' - (u/v) v')/v)$
  $\tilde x = (x, 1)$
  $\tilde c = (c, 0)$

Ref: http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf
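This table of rules is precisely what an operator-overloading AD tool implements. A self-contained C++ sketch of the pair type (illustrative only, not the implementation used by the MATLAB tools below):

// A (value, derivative) pair u~ = (u, u') with the arithmetic above.
struct Dual {
  double v, d;  // u and u' at the evaluation point
};

Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
Dual operator-(Dual a, Dual b) { return {a.v - b.v, a.d - b.d}; }
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.v * b.d + a.d * b.v}; }
Dual operator/(Dual a, Dual b) {
  double q = a.v / b.v;                 // u/v
  return {q, (a.d - q * b.d) / b.v};    // (u' - (u/v) v')/v
}

Dual variable(double x) { return {x, 1.0}; }  // x~ = (x, 1)
Dual constant(double c) { return {c, 0.0}; }  // c~ = (c, 0)

// Example: the rational function of the next slide.
Dual f(Dual x) {
  return (x + constant(1.0)) * (x - constant(2.0)) / (x + constant(3.0));
}

With x = variable(3.0), f(x) returns (2/3, 13/18), matching the worked example on the next slide.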


Example of a Rational Function

  $f(x) = \frac{(x+1)(x-2)}{x+3}, \qquad f(3) = 2/3, \qquad f'(3) = ?$

  $f(\tilde x) = \frac{((x,1) + (1,0)) \cdot ((x,1) - (2,0))}{(x,1) + (3,0)}$

Inserting the value $\tilde x = (3, 1)$ into $f$ produces

  $f(3,1) = \frac{((3,1) + (1,0)) \cdot ((3,1) - (2,0))}{(3,1) + (3,0)} = \frac{(4,1) \cdot (1,1)}{(6,1)} = \frac{(4,5)}{(6,1)} = \left( \frac{2}{3},\ \frac{13}{18} \right).$

Derivatives of Element Functions

Chain rule:
  $(g \circ u)'(x) = u'(x) \cdot (g' \circ u)(x)$
  $g(\tilde u) = g((u, u')) = (g(u),\ u' g'(u))$

  $\sin \tilde u = \sin(u, u') = (\sin u,\ u' \cos u)$
  $\cos \tilde u = \cos(u, u') = (\cos u,\ -u' \sin u)$
  $e^{\tilde u} = e^{(u, u')} = (e^u,\ u' e^u)$
  ...


Example of Sin

(Figure slide: the overloaded sin implementation from ../Intlab/gradient/@gradient/sin.m.)
Automatic Differentiation
AD Softwares
AD tools in MATLAB

Example for Element Functions

Evaluate the derivative at x=0.


f (x) = (1 + x + ex ) sin x

f ( x ) = ( 1 + x + e x )sin x



f (0, 1) =
(1, 0) + (0, 1) + e(0,1) sin(0, 1)


=
(1, 1) + (e0 , e0 ) (sin 0, cos 0)
= (2, 2)(0, 1) = (0, 2).


High-Order Derivatives

Extend the pairs to triples $\tilde u = (u, u', u'')$:

  $\tilde u + \tilde v = (u + v,\ u' + v',\ u'' + v'')$
  $\tilde u - \tilde v = (u - v,\ u' - v',\ u'' - v'')$
  $\tilde u \cdot \tilde v = (u v,\ u v' + u' v,\ u v'' + 2 u' v' + u'' v)$
  $\tilde u / \tilde v = \big( u/v,\ (u' - (u/v) v')/v,\ (u'' - 2 (u/v)' v' - (u/v) v'')/v \big)$
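A C++ sketch of this second-order arithmetic, in the same illustrative style as the first-order pair above:

// A (value, first, second derivative) triple u~ = (u, u', u'').
struct Dual2 {
  double v, d1, d2;
};

Dual2 operator+(Dual2 a, Dual2 b) {
  return {a.v + b.v, a.d1 + b.d1, a.d2 + b.d2};
}
Dual2 operator*(Dual2 a, Dual2 b) {
  // (uv)'' = u v'' + 2 u' v' + u'' v
  return {a.v * b.v,
          a.v * b.d1 + a.d1 * b.v,
          a.v * b.d2 + 2.0 * a.d1 * b.d1 + a.d2 * b.v};
}
Dual2 operator/(Dual2 a, Dual2 b) {
  double q  = a.v / b.v;                           // u/v
  double q1 = (a.d1 - q * b.d1) / b.v;             // (u/v)'
  double q2 = (a.d2 - 2.0 * q1 * b.d1 - q * b.d2) / b.v;
  return {q, q1, q2};
}

// A variable seeds (x, 1, 0); e.g. x~ * x~ yields (x^2, 2x, 2).
Dual2 variable2(double x) { return {x, 1.0, 0.0}; }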

INTLAB

  Developers: Institute for Reliable Computing, Hamburg University of Technology
  Mode: forward
  Method: operator overloading
  Language: MATLAB
  URL: http://www.ti3.tu-harburg.de/rump/intlab/
  Licensing: open source

Rosenbrock Function

The gradient components of the Rosenbrock function:
  $y_1 = 400 x_1 (x_1^2 - x_2) + 2 (x_1 - 1)$
  $y_2 = -200 (x_1^2 - x_2)$

One Step of Newton's Method with INTLAB

(Figure slide: MATLAB code for one Newton step computed with INTLAB.)

TOMLAB/MAD

  Developers: Marcus M. Edvall and Kenneth Holmstrom, Tomlab Optimization Inc. (TOMLAB/MAD integration); Shaun A. Forth and Robert Ketzscher, Cranfield University (MAD)
  Mode: forward
  Method: operator overloading
  Language: MATLAB
  URL: http://tomlab.biz/products/mad/
  Licensing: commercial license

One Step of Newton's Method with MAD

(Figure slide: the same Newton step computed with TOMLAB/MAD.)

ADiMat

  Developers: Andre Vehreschild, Institute for Scientific Computing, RWTH Aachen University
  Mode: forward
  Method: source transformation combined with operator overloading
  Language: MATLAB
  URL: http://www.sc.rwth-aachen.de/vehreschild/adimat/index.html
  Licensing: under discussion

ADiMat's Example

function [result1, result2]= f(x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for
% Scientific Computing,
% RWTH Aachen University, D-52056 Aachen,
% Germany.
% [email protected]
result1= sin(x);
result2= sqrt(x*2);

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

ADiMat's Example (cont.)

>> addiff(@f, x, result1,result2);
>> p=magic(5);
>> g_p=createFullGradients(p);
>> [g_r1, r1, g_r2, r2]= g_f(g_p, p);
>> J1= [g_r1{:}];
>> J2= [g_r2{:}];

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

ADiMat's Example (cont.)

function [g_result1, result1, g_result2, result2] = g_f(g_x, x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% [email protected]
g_result1= ((g_x).* cos(x));
result1= sin(x);
g_tmp_f_00000= g_x* 2;
tmp_f_00000= x* 2;
g_result2= ((g_tmp_f_00000)./ (2.* sqrt(tmp_f_00000)));
result2= sqrt(tmp_f_00000);

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

Matrix Calculus

Definition: if $X$ is $p \times q$ and $Y$ is $m \times n$, then $dY\!: = \frac{dY}{dX}\, dX\!:$, where the derivative $dY/dX$ is a large $mn \times pq$ matrix and $A\!:$ denotes the columnwise vectorization of $A$.

  $d(X^2)\!: = (X\, dX + dX\, X)\!:$
  $d(\det(X)) = d(\det(X^T)) = \det(X)\, (X^{-T})\!:^T\, dX\!:$
  $d(\ln(\det(X))) = (X^{-T})\!:^T\, dX\!:$

Ref: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html
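The determinant identities follow from Jacobi's formula together with the vectorization identity $\mathrm{tr}(A^T B) = (A\!:)^T (B\!:)$, applied with $A = X^{-T}$:

  $d(\det X) = \det(X)\, \mathrm{tr}(X^{-1}\, dX) = \det(X)\, \big( (X^{-T})\!: \big)^T dX\!:,$
  $d(\ln \det X) = \frac{d(\det X)}{\det X} = \big( (X^{-T})\!: \big)^T dX\!: .$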

Vandermonde Function

(Figure slide: MATLAB code for the Vandermonde test function.)
Source: Shaun A. Forth, "An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB", ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.

Vandermonde Function (cont.)

(Figure slide: timing plot. Experiment on a P4 3.0 GHz PC (Windows XP), MATLAB version 6.5.)
Source: Shaun A. Forth, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.

Vandermonde Function (cont.)

Method        n=10   n=20   n=40   n=80   n=160  n=320  n=640   n=1280
Function      0.000  0.000  0.000  0.000  0.000  0.010  0.000   0.000
MAD (Full)    0.070  0.060  0.070  0.130  0.581  2.664  10.535  45.535
MAD (Sparse)  0.071  0.050  0.060  0.060  0.060  0.070  0.100   0.881
INTLAB        0.050  0.040  0.040  0.090  0.040  0.050  0.071   0.120
ADiMat        0.231  0.140  0.271  0.601  1.362  3.044  7.340   21.611

CPU time in seconds. Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

Arrowhead Function

(Figure slide: MATLAB code for the arrowhead test function.)
Source: Shaun A. Forth, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.

Arrowhead Function (cont.)

(Figure slide: timing plot. Experiment on a P4 3.0 GHz PC (Windows XP), MATLAB version 6.5.)
Source: Shaun A. Forth, ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.

Arrowhead Function (cont.)

Method        n=20   n=40   n=80   n=160  n=320  n=640  n=1280
Function      0.010  0.000  0.000  0.000  0.000  0.000  0.000
MAD (Full)    0.180  0.050  0.070  0.200  1.111  4.367  17.796
MAD (Sparse)  0.060  0.060  0.060  0.070  0.080  0.100  0.160
INTLAB        0.090  0.051  0.050  0.050  0.081  0.140  0.340
ADiMat        0.911  0.311  0.651  1.262  2.704  6.028  14.581

CPU time in seconds. Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

BDQRTIC_mod

(Figure slide: MATLAB code for the modified BDQRTIC test function.)

BDQRTIC_mod (cont.)

Method        n=20    n=40   n=80   n=160  n=320  n=640   n=1280
Function      12.809  0.010  0.000  0.000  0.000  0.010   0.000
MAD (Full)    2.604   0.121  0.150  0.490  2.513  10.926  43.162
MAD (Sparse)  0.270   0.120  0.130  0.150  0.201  0.260   0.371
INTLAB        2.293   0.080  0.100  0.110  0.150  0.230   0.481
ADiMat        3.455   0.621  1.152  2.544  5.778  14.641  42.671

CPU time in seconds. Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

Summary of AD Software in MATLAB

  The operator overloading method for AD forward mode is easy to implement via differentiation arithmetic.
  All of the AD tools in MATLAB are easy to use.
  Sparse storage provides a good way to improve the performance of AD tools.

AD in C/C++ (ADIC)

The Computational Differentiation Group at Argonne National Laboratory

ADIC was introduced in 1997 by:
  Christian Bischof (Scientific Computing, RWTH Aachen University),
  Lucas Roh (founder, president, and CEO of Hostway Co.),
  and the other team members.

State of ADIC

  ADIC is an automatic differentiation tool for ANSI C/C++.
  ADIC was introduced in 1997.
  Last updated: June 10, 2005.
  Official web site: www-new.mcs.anl.gov/adic/down-2.htm
  ADIC uses the forward method.
  Supported platforms: Unix/Linux.
  Selected application: NEOS.
  Related research group: Argonne National Laboratory, USA.

ADIC Anatomy

(Figure slide: the components of ADIC.)

ADIC Process

(Figure slide: the source-to-source differentiation workflow.)

func.c

#include "func.h"
#include <math.h>

void func(data_t *pdata)
{
    int i;
    double *x = pdata->x;
    double *y = pdata->y;
    double s = 0.0;   /* accumulator for the inner product */
    double temp;
    for (i = 0; i < pdata->len; i++) {
        s = s + x[i] * y[i];
    }
    temp = exp(s);
    pdata->r = temp;  /* r = exp(sum_i x[i]*y[i]) */
}
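Since func computes $r = \exp(\sum_i x_i y_i)$, the analytic gradient with respect to $x$ is $\partial r / \partial x_i = y_i \exp(s)$. A small hand-coded routine (illustrative; the name func_grad is not part of ADIC) can be used to validate the ADIC-generated derivative code:

#include <math.h>

/* Analytic gradient of r = exp(sum_i x[i]*y[i]) w.r.t. x:
 * dr/dx[i] = y[i] * exp(s). */
void func_grad(const double *x, const double *y, int len,
               double *grad /* length len */)
{
    int i;
    double s = 0.0;
    for (i = 0; i < len; i++)
        s += x[i] * y[i];
    for (i = 0; i < len; i++)
        grad[i] = y[i] * exp(s);
}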

driver.c

(Figure slide: the driver program that calls the ADIC-generated derivative code.)

Commands

(Figure slide: the two ADIC invocation commands.)
  The first command generates the header file ad_deriv.h and the derivative function func.ad.c.
  The second command compiles and links all needed functions, generating ad_func.

Handling Side Effects

(Figure slides: examples of how ADIC handles side effects in the input code.)

For Further Reading on ADIC

  Christian H. Bischof, Paul D. Hovland, and Boyana Norris. "Implementation of Automatic Differentiation Tools." PEPM '02, January 14-15, 2002, Portland, OR, USA.
  Paul D. Hovland and Boyana Norris. "User's Guide to ADIC 1.1."
  C. H. Bischof, L. Roh, and A. J. Mauer-Oats. "ADIC: An Extensible Automatic Differentiation Tool for ANSI-C." Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.

References

  C. H. Bischof and H. M. Bücker. "Computing Derivatives of Computer Programs." In Modern Methods and Algorithms of Quantum Chemistry: Proceedings, 2nd ed., edited by J. Grotendorst, NIC Directors, 2000, pp. 315-327.
  C. Bischof, A. Carle, P. Khademi, and G. Pusch. "Automatic Differentiation: Obtaining Fast and Reliable Derivatives - Fast." In Control Problems in Industry, edited by I. Lasiecka and B. Morton, 1995, pp. 1-16.
  Andreas Griewank. "On Automatic Differentiation." In Mathematical Programming: Recent Developments and Applications, edited by M. Iri and K. Tanabe, Kluwer Academic Publishers, 1989.
  Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Applied Mathematics, SIAM, Philadelphia, PA, 2000.
  Shaun Forth. "Introduction to Automatic Differentiation." Presentation slides for the 4th International Conference on Automatic Differentiation, July 19-23, University of Chicago, Gleacher Center, Chicago, USA, 2004.
  G. F. Corliss. Automatic Differentiation.
  Warwick Tucker. http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf
  http://www.autodiff.org/
  http://www.ti3.tu-harburg.de/rump/intlab/
  http://tomopt.com/tomlab/products/mad/
  http://www.sc.rwth-aachen.de/vehreschild/adimat/index.html
  Shaun A. Forth. "An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB." ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.
  Siegfried M. Rump. "INTLAB - INTerval LABoratory." In Developments in Reliable Computing, Kluwer Academic Publishers, 1999, pp. 77-104.
  Christian H. Bischof, H. Martin Bücker, Bruno Lang, A. Rasch, and Andre Vehreschild. "Combining Source Transformation and Operator Overloading Techniques to Compute Derivatives for MATLAB Programs." In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002), IEEE Computer Society, 2002.

Thanks & Questions

Thanks!
Questions?
