Automatic Differentiation Lecture Slides
Computational Differentiation
WS 16/17
Uwe Naumann
LuFG Informatik 12: Software and Tools for
Computational Engineering, RWTH Aachen
STCE

static program

void my_pow(double z_7, double& X123) {
  X123 = z_7 * z_7;
  X123 = X123 * X123;
}

???
Computational Differentiation, WS 16/17
I do ...
Motivation
Who knows how to differentiate ...

dynamic expression

  y = ( Σ_{i=0}^{n-1} x_i² )²

dynamic program

void f(const int n, const double* const x, double& y) {
  y = 0;
  for (int i = 0; i < n; i++)
    y += x[i] * x[i];
  y = y * y;
}

???
... you will!
Aim
Outline I
Formalities
Lecture on Monday
Outline
Diffusion
We aim to analyze sensitivities of the predicted T (x, c(x)) with respect to c(x)
or even calibrate the uncertain c(x) to given observations O(x) for T (x, c(x))
at time t = 1 by solving the (possibly constrained) least squares problem
  min_{c(x)} f(c(x), x)   where   f = ∫ ( T(1, x, c(x)) − O(x) )² dx ,

with discretized c ∈ IRⁿ. This requires df/dc ∈ IRⁿ, dT(1)/dc, and possibly
d²f/dc² ∈ IR^{n×n}.
...
For differentiation, is there anything else?
Perturbing the inputs: can't imagine this fails.
I pick a small epsilon, and I wonder ...
...
from: "Optimality" (Lyrics: Naumann; Music: Think of Fool's Garden's "Lemon
Tree") in Naumann: The Art of Differentiating Computer Programs. An
Introduction to Algorithmic Differentiation. Number 24 in Software,
Environments, and Tools, SIAM, 2012. Page xvii.
Diffusion

Run times (s) for n = 100, m = 50:

                 central FD    adjoint AD
  df/dc              15.2           0.7
  d²f/dc²            63.6           3.9
Nice to have?
optimization result

solver (gradient descent):  λ_{n+1} = λ_n − α · dJ/dλ (λ_n)

requires dJ/dλ_n computed by dco/c++ / AMPI
The Jülich-Aachen Dynamic Optimization Environment (JADE) targets DAEO

  y'_d(t) = f(y_d(t), y_a(t), t, p),  for given y_d(0)
  y_a(t) = argmin_{y_a ∈ IR^{n_a}} h(y_d(t), y_a, t, p)
  s.t. ...
recovery of atmospheric state in upper troposphere / lower stratosphere for
given radiance measurements along line-of-sight

primal model y_i = g(x, θ_i) with simulated radiances y ∈ IRᵐ, atmospheric
parameters x ∈ IRⁿ, and line-of-sight elevations θ ∈ IRᵐ

residual IRᵐ ∋ F = ( o_i − g(x, θ_i) )_{i=1...m} for measured radiances o ∈ IRᵐ

objective G(x, θ) = Fᵀ S⁻¹ F + (x − x_a)ᵀ S_a⁻¹ (x − x_a) with measurement error
correlation matrix S ∈ IR^{m×m}, typical atmospheric state values from historical
data x_a ∈ IRⁿ, and (Tikhonov) regularization matrix S_a ∈ IR^{n×n}

Gauss-Newton solver requires ∇_x F computed by dco/c++
[7]: A 3-D tomographic retrieval approach with advection compensation for the air-borne limb-imager
GLORIA, Atmos. Meas. Tech., 2011.
[3]: A Case Study in Adjoint Sensitivity Analysis of Parameter Calibration, Submitted, 2016.
Sensitivity of local shear stress with respect to geometry (top) and roughness of
sediment (bottom) generated by dco/fortran.
[4]: Reverse engineering of initial and boundary conditions with Telemac and algorithmic differentiation,
Wasserwirtschaft, 2013
[9]: Estimation of Data Assimilation Error: A Shallow-Water Model Study, Monthly Weather Review, 2014
Scenario
- 10⁴ paths
- 360 Euler steps
- 62 uncertain parameters
- pricer takes ≈ 1s
Greeks
- first order (dco/c++)
[?]: Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational
Finance, NAG, 2014.
[Figure: branch-and-bound relaxations of F(x) and of E[F(x)] + 1/2·Var[F(x)];
each panel shows the original function, its natural interval extension
underestimator, a convex underestimator, an affine underestimator, and the
global/local upper bounds.]
[1]: Adjoint Mode Computations of Subgradients for McCormick Relaxations, AD Conf., LNCSE, 2012.
M. Beckers: Toward Global Robust Optimization, Ph.D. Thesis, RWTH Aachen, 2014.
For example, nag_zero_cont_func_brent locates a simple zero x of a continuous
function f in a given interval [a,b] using Brent's method. The adjoint version
enables computation of sensitivities of the solution x wrt. all relevant input
parameters (potentially passed through comm).

void nag_zero_cont_func_brent_dco_a1s(
  dco::a1s::type a,    // in: lower bound
  dco::a1s::type b,    // in: upper bound
  dco::a1s::type eps,  // in: termination tolerance on x
  dco::a1s::type eta,  // in: acceptance tolerance for vanishing f(x)
  dco::a1s::type (*f)(dco::a1s::type x, Nag_Comm_dco_a1s *comm), // in: f
  dco::a1s::type *x,   // out: x
  Nag_Comm_dco_a1s *comm, // inout: parameters
  NagError *fail       // out: error code
);

We develop first- and higher-order tangent and adjoint versions of a growing
number of numerical methods using dco/c++, dco/fortran, and hand-coding.
y = log(exp(sin(x1)) * (x1 * x2)) / 10 + x2 / 100;

[Figure: lDAG of the SAC v0, v1, ..., v11 for this expression, annotated per
vertex with value intervals and adjoint intervals, e.g. v6 = sin(v0) with
values in [-1, 1] and v11 = v10 + v3 with values in [-0.09, 362].]
Recall ...
Basic Mathematical Terminology
continuity
differentiability
Taylor expansion
chain rule
Continuity

f is right-continuous at x0 if

  lim_{h→0, h>0} f(x0 + h) = f(x0) .

f is left-continuous at x0 if

  lim_{h→0, h>0} f(x0 − h) = f(x0) .
Continuity
Univariate Scalar Functions: Alternative Formulation
Continuity
Univariate Scalar Functions: Example
  lim_{h→0, h>0} f(0 − h) = f(0) = 0

  lim_{h→0, h>0} f(0 + h) = f(0) = 0
Continuity
  F = (f_1, ..., f_m)ᵀ : D → IRᵐ
Differentiability
f is differentiable at x0 if the one-sided limits

  lim_{h→0} ( f(x0 + h) − f(x0) ) / h   and   lim_{h→0} ( f(x0) − f(x0 − h) ) / h

exist and are equal.
Differentiability
Univariate Scalar Function: Example
For f(x) = |x| at x0 = 0,

  lim_{h→0, h>0} ( f(0) − f(0 − h) ) / h = lim_{h→0, h>0} −h / h = −1

  lim_{h→0, h>0} ( f(0 + h) − f(0) ) / h = lim_{h→0, h>0} h / h = 1 .

The limits are distinct, proving that |x| is not differentiable at the origin.
However, |x| is differentiable everywhere else in its domain IR.
Differentiability
Univariate Scalar Function: Alternative Formulation
  f(x) = f(x0) + df/dx (x0) · (x − x0) + r(x),  where  lim_{x→x0} r(x) / |x − x0| = 0 .
Differentiability
  f(x) = f(x0) + df/dx (x0) · (x − x0) + r(x),  where  lim_{x→x0} r(x) / ||x − x0|| = 0 .
Gradients

  ∇f(x0) = ( f_{x_0}(x0), ..., f_{x_{n−1}}(x0) )ᵀ ∈ IRⁿ,  where  f_{x_j}(x0) = df/dx_j (x0)
Gradients
Example
Differentiability
F is differentiable at x0 with Jacobian dF/dx (x0) if

  F(x) = F(x0) + dF/dx (x0) · (x − x0) + r(x),  where  lim_{x→x0} ||r(x)|| / ||x − x0|| = 0 .
Jacobians

  ∇F(x0) = ( ∂F_i/∂x_j (x0) )_{i,j} ∈ IR^{m×n}
Assumption: Differentiability
[Figure: y = F(x); perturbation of the input x by ±eps and the resulting
perturbation of the output.]
Computational Differentiation, WS 16/17
40
STCE
The Hessian collects all second partial derivatives:

  ∇²F(x0) = ( ∂²F_i / (∂x_j ∂x_k) (x0) )_{i,j,k} ∈ IR^{m×n×n}
Hessians
Example
Taylor Expansion

  f(x + h) = f(x) + (h/1!)·f'(x) + (h²/2!)·f''(x) + (h³/3!)·f'''(x) + ...

It follows

  f(x − h) = f(x) − (h/1!)·f'(x) + (h²/2!)·f''(x) − (h³/3!)·f'''(x) + ...
           = f(x) − h·f'(x) + O(h²)
Taylor Expansion
Illustration

#include <iostream>
#include <cmath>
using namespace std;
int main() {
  double x=1;
  for (double h=1e-1;h>=1e-4;h=h/10)
    cout << h << "\t"
         << abs(sin(x+h)-(sin(x)+h*cos(x)))
         << endl;
  cout << endl;
  for (double h=1e-1;h>=1e-4;h=h/10)
    cout << h << "\t"
         << abs(sin(x+h)
                -(sin(x)+h*cos(x)-h*h/2*sin(x)))
         << endl;
  cout << endl;
}

Output (first-order remainder, O(h²)):
0.1     0.00429385533327507
0.01    4.21632485627078e-05
0.001   4.20825507812877e-07
0.0001  4.2074449518501e-09

Output (second-order remainder, O(h³)):
0.1     8.65004092356039e-05
0.01    8.96993222595979e-08
0.001   9.00153702661569e-11
0.0001  9.0023962676794e-14
Chain Rule

Let f = g ∘ h be such that both g and h are continuously differentiable over
their respective domains D_g = I_h and D_h = D_f. Then f is continuously
differentiable over D_f and

  df/dx (x*) = dg/dv (v*) · dh/dx (x*)

for all x* ∈ D_f and v* = h(x*).
Chain Rule
Multivariate Vector Functions (Standard Formulation)

Let F = G ∘ H be such that both G and H are continuously differentiable over
their respective domains D_G = I_H and D_H = D_F. Then F is continuously
differentiable over D_F and

  dF/dx (x*) = dG/dz (z*) · dH/dx (x*)

for all x* ∈ D_F and z* = H(x*).
The proof follows immediately from the product of the two Jacobians.
Chain Rule
Multivariate Vector Functions (Generalization)
  ∂G/∂x : incomplete derivative;   dG/dx : [complete] derivative
Chain Rule
Multivariate Vector Functions (Generalization): Proof
The composite F(x) = G(H(x), x) is embedded into a chain of augmented maps
whose (block) Jacobians are padded with identity and zero blocks; multiplying
them and projecting with P_n = (I_n 0) ∈ IR^{n×(n+p+m)} and
Q_m = (0 I_m) ∈ IR^{m×(n+p+m)} yields

  dF/dx = Q_m · (dF/du) · P_nᵀ = ∂G/∂x + dG/dz · dH/dx ,

where I_k denotes the identity matrix in IR^{k×k}, k ∈ {n, m}, and appropriate
numbers of zero padding columns are used.
Chain Rule
Complete vs. Incomplete Derivatives: Example
Let

  y = f(x) = g(h1(x), h2(x)) = sin(x) · cos(x).

By the chain rule the complete [derivative] becomes

  df/dx = cos(x)² − sin(x)² .

The following 2² = 4 incomplete derivatives exist:

  ∂f/∂x ∈ { 0, cos(x)², −sin(x)², cos(x)² − sin(x)² },

where, admittedly, the vanishing derivative, due to assuming no dependence on
x, could be considered obsolete.
Chain Rule
Graphical Illustration
Notation
For F : IRⁿ → IRᵐ, y = F(x), we use the following equivalent notations for
total derivatives

  F'(x) ≡ ∇F(x) ≡ dF(x)/dx,
  F''(x) ≡ ∇²F(x) ≡ d²F(x)/dx²,
  F'''(x), ...

and for partial derivatives

  F_{x_i}(x) ≡ ∇_{x_i} F(x) ≡ dF(x)/dx_i,
  F_{x_i,x_j}(x) ≡ ∇_{x_i,x_j} F(x) ≡ d²F(x)/(dx_i dx_j)   ( = d²F(x)/dx_i² for i = j )
Algorithm:
1: y = F(x)
2: while ||y|| > ε do
3:   A = F'(x)
4:   dx = s(y, A)
5:   x ← x + dx
6:   y = F(x)
7: end while
Algorithm:
 1: repeat
 2:   (y, g) = f'(x)
 3:   if ||g|| > ε then
 4:     α ← 1
 5:     ŷ ← y
 6:     while ŷ ≥ y do
 7:       x̂ ← x − α·g
 8:       ŷ = f(x̂)
 9:       α ← α/2
10:     end while
11:     x ← x̂
12:   end if
13: until ||g|| ≤ ε
Consider

  argmin_{x ∈ IRⁿ} ( Σ_{i=0}^{n-1} x_i² )²

Run times (s):

  n              100   200   300   400   500   1000
  f(x + e_i h)    13    47   104   184   284   1129
  f^(1)            8    28    63   113   173    689
  f_(1)           <1     1     2   2.5     3      6
In:
- implementation of the tangent-linear residual F^(1) for computing the
  residual y ← F(x) and its directional derivative y^(1) ← F'(x) · x^(1) in the
  tangent-linear direction x^(1) ∈ IRⁿ at the current point x ∈ IRⁿ:
    F^(1) : IRⁿ × IRⁿ → IRⁿ × IRⁿ,  (y, y^(1)) = F^(1)(x, x^(1))
- starting point for the Newton step: dx ← x^(1) ∈ IRⁿ
- upper bound ε ∈ IR on the norm of the residual ||−y − F'(x) · dx|| at the
  approximate solution for the Newton step
Out:
- approximate solution for the Newton step: dx ∈ IRⁿ
Algorithm:
 1: x^(1) ← dx
 2: (y, y^(1)) ← F^(1)(x, x^(1))
 3: p ← −y − y^(1)
 4: r ← p
 5: while ||r|| ≥ ε do
 6:   x^(1) ← p
 7:   (y, y^(1)) ← F^(1)(x, x^(1))
 8:   α ← rᵀ·r / (pᵀ·y^(1))
 9:   dx ← dx + α·p
10:   r_prev ← r
11:   r ← r − α·y^(1)
12:   β ← rᵀ·r / (rᵀ_prev·r_prev)
13:   p ← r + β·p
14: end while
Consider

  argmin_{x ∈ IRⁿ} ( Σ_{i=0}^{n-1} x_i² )²

Run times for second-order derivative computations:

  n              100   200   300   400   500   1000
  f(x + e_i h)    <1     2     7    17    36    365
  f^(1,2)         <1     1     3     9    21    231
  f_(1)^(2)       <1    <1     1     4    10    138
  f_(1)^(2) v     <1    <1    <1    <1    <1     <1
Optimality
Poem / Song ...
Approximate the gradient of F : IRⁿ → IR (m = 1, to be relaxed later) by

  forward:   ( F(x* + h·e_i) − F(x*) ) / h,             i = 0, ..., n−1
  central:   ( F(x* + h·e_i) − F(x* − h·e_i) ) / (2h),  i = 0, ..., n−1
  backward:  ( F(x*) − F(x* − h·e_i) ) / h,             i = 0, ..., n−1
Taylor expansion gives

  f(x0 ± h) = f(x0) ± (h/1!)·(df/dx)(x0) + (h²/2!)·(d²f/dx²)(x0) ± (h³/3!)·(d³f/dx³)(x0) + ...

Truncation after the respective first derivative terms yields scalar univariate
versions of forward and backward finite difference quotients, e.g., from

  f(x0 + h) = f(x0) + h·(df/dx)(x0) + O(h²) .

For 0 < h << 1 the truncation error is dominated by the value of the h² term,
which implies that only accuracy up to the order of h (= h¹ and hence
first-order accuracy) can be expected, e.g.,

  (df/dx)(x0) = ( f(x0 + h) − f(x0) + O(h²) ) / h = ( f(x0 + h) − f(x0) ) / h + O(h)
Subtracting the expansions of f(x0 + h) and f(x0 − h) cancels the even-order
terms. Truncation after the first derivative term yields the scalar univariate
version of the central finite difference quotient. For small values of h the
truncation error is dominated by the value of the h³ term, which implies that
accuracy up to the order of h² (second-order accuracy) can be expected, i.e.,

  (df/dx)(x0) = ( f(x0 + h) − f(x0 − h) + O(h³) ) / (2h)
              = ( f(x0 + h) − f(x0 − h) ) / (2h) + O(h²)
FFD                    CFD                    EXACT
0.497363752535389      0.540077208046432      0.54030230586814
0.536085981011869      0.540300054611342      0.54030230586814
0.539881480360327      0.540302283355554      0.54030230586814
0.540260231418621      0.540302305643836      0.54030230586814
0.540298098505865      0.540302305873652      0.54030230586814
0.54030188512133       0.540302305895857      0.54030230586814
0.540302264040449      0.540302306228924      0.54030230586814
0.540302302898254      0.540302291796024      0.54030230586814
0.540302358409406      0.540302358409406      0.54030230586814
0.540302247387103      0.540303357610128      0.54030230586814
0.540301137164079      0.540301137164079      0.54030230586814
0.540345546085064      0.540345546085064      0.54030230586814
0.539568389967826      0.539568389967826      0.54030230586814
0.544009282066327      0.544009282066327      0.54030230586814
0.555111512312578      0.555111512312578      0.54030230586814
Real Numbers
Floating-Point Format
Floating-Point Numbers
Example

  1.01_2 · 2^{-1} = 0.625_10
  1.11_2 · 2^{-1} = 0.875_10
  1.01_2 · 2^0  = 1.25_10
  1.11_2 · 2^0  = 1.75_10
  1.00_2 · 2^1  = 2_10
  1.01_2 · 2^1  = 2.5_10
  1.10_2 · 2^1  = 3_10
  1.11_2 · 2^1  = 3.5_10
Floating-Point Numbers
[Figure: "fps_ex.gnuplot" - the floating-point numbers of the example system
plotted on the real axis.]
Impact of Perturbation
Case Study (h.cpp)

The above happens to be reasonably representative of the general case. Hence,
perturbation of half of the mantissa appears to be a good rule of thumb.

#include <iostream>
#include <cmath>
#include <cfloat>
using namespace std;
int main() {
  cout.precision(15);
  double x1=1, x2=111111111111111;
  cout << x1+sqrt(DBL_EPSILON) << endl;
  cout << x2+sqrt(DBL_EPSILON) << endl;
  cout << x1+abs(x1)*sqrt(DBL_EPSILON) << endl;
  cout << x2+abs(x2)*sqrt(DBL_EPSILON) << endl;
}

Output:
1.00000001490116
111111111111111
1.00000001490116
111111112766796
Let

  y = ( Σ_{i=0}^{n-1} x_i² )²

be implemented in C++ as

template<class T>
void f(const vector<T>& x, T &y) {
  y=0;
  for (size_t i=0; i<x.size(); i++) y=y+x[i]*x[i];
  y=y*y;
}
We are looking for a routine fg(...) returning for a given vector x of length n
the value y of f and its gradient g.

int main(int argc, char* argv[]) {
  assert(argc==2); cout.precision(15);
  size_t n=atoi(argv[1]);
  vector<double> x(n,0), g(n,0); double y=0;
  for (size_t i=0;i<n;i++) x[i]=cos(static_cast<double>(i));
  fg(x,y,g);
  cout << y << endl;
  for (size_t i=0;i<n;i++) cout << g[i] << endl;
  return 0;
}
Live for y = ( Σ_{i=0}^{n-1} x_i² )² :

Central finite differences take more than 1 minute for n = 10⁵ to produce the
gradient with second-order accuracy.
Motivation
I can do better ... (ga1s.cpp)

The adjoint

#include "dco.hpp"
using namespace dco;
template<typename T>
void fg(const vector<T> &xv, T &yv, vector<T> &g) {
  typedef ga1s<T> DCO_M;
  typedef typename DCO_M::type DCO_T;
  typedef typename DCO_M::tape_t DCO_TAPE_T;
  ...
}
int main(int argc, char* argv[]) {
  ...
  fg(x,y,g);
  ...
}

... takes less than 1 second for n = 10⁵ to produce the gradient with machine
accuracy.
Chain Rule
Recall ...

Let F(x) = G(H(x), x) be such that both G and H are continuously differentiable
over their respective domains D_G = I_H × D_F and D_H ⊆ D_F. Then F is
continuously differentiable over D_F and

  dF/dx (x*) = dG/dx (z*, x*) = dG/dz (z*, x*) · dH/dx (x*) + ∂G/∂x (z*, x*)

for all x* ∈ D_F and z* = H(x*).

Notation:  ∂G/∂x partial derivative;  dG/dx total derivative
  v_i = φ_i(x_i) = x_i,               i = 0, ..., n−1
  v_j = φ_j((v_k)_{k≺j}),             j = n, ..., n+q−1

with elemental partial derivatives dφ_j/dv_i ((v_k)_{k≺j})
AD Graphically

SAC:
  z := H(x)
  y := G(z, x)

DAG / lDAG: vertices 1: x, 2: z[H], 3: y[G]; edge labels dH/dx on (1, 2),
dG/dz on (2, 3), and ∂G/∂x on (1, 3).

  F'(x) = dy/dx = Σ_{path ∈ lDAG} Π_{(i,j) ∈ path} d_{j,i}
Proof (Formally) I

Consider the SAC

  v_i = φ_i(x_i) = x_i,               i = 0, ..., n−1
  v_j = φ_j((v_k)_{k≺j}),             j = n, ..., n+q−1

with elemental partial derivatives dφ_j/dv_i ((v_k)_{k≺j}) as introduced above.
Consider

  Φ : IR^{n+q} → IR^{n+q}

defined as

  v^q = Φ(v^0) = Φ_q(Φ_{q−1}(. . . (Φ_1(v^0)) . . .)),
with components

  v_k^j = φ_k((v_i^{j−1})_{i≺k})  if k − n + 1 = j,
        = v_k^{j−1}               if k − n + 1 < j,
        = 0                       if k − n + 1 > j,      k = 0, ..., n+q−1,

so that

  v^0 = (x_0 ... x_{n−1} 0 ... 0)ᵀ,
  v^q = (x_0 ... x_{n−1} v_n ... v_{n+p−1} y_0 ... y_{m−1})ᵀ .

By the chain rule

  dy/dx = dy/dv^q · dv^q/dv^0 · dv^0/dx
        = dy/dv^q · dΦ_q/dv^{q−1} · ... · dΦ_1/dv^0 · dv^0/dx
The Jacobian entries are sums over paths,

  [F'(x)]_{j,i} = Σ_{paths (x_i → y_j) in G(F)} Π_{(k,l) ∈ path} d_{l,k} ,

where x_i ∈ X and y_j ∈ Y.
Based on the obvious correctness for the product of two matrices, we assume
correctness for chains of length k. Proof by induction requires us to show
correctness for chains of length k + 1.
Let B denote the result of evaluating the chain of length k. W.l.o.g.,³ consider
C = A · B. To obtain c_{j,i} the inner product of a_{j,•} and b_{•,i} needs to
be computed. For corresponding pairs of nonzero entries a_{j,ν} and b_{ν,i} we
get c_{j,i} = Σ_ν a_{j,ν} · b_{ν,i}. With

  b_{ν,i} = Σ_{paths (i → ν)} Π_{(κ,λ) ∈ path} d_{λ,κ}
Proof (Formally) IV

we get

  a_{j,ν} · b_{ν,i} = a_{j,ν} · Σ_{paths (i → ν)} Π_{(κ,λ) ∈ path} d_{λ,κ}

and hence

  c_{j,i} = Σ_ν a_{j,ν} · Σ_{paths (i → ν)} Π_{(κ,λ) ∈ path} d_{λ,κ}
          = Σ_{paths (i → j)} Π_{(κ,λ) ∈ path} d_{λ,κ} ,

where ν ranges over the index set induced by corresponding nonzero pairs
a_{j,ν} and b_{ν,i}.

³ A similar ...
Consider

  y = F(x) = sin(x) / x

with z = H(x) = sin(x) and y = G(z, x) = z / x.

[Figure: lDAG 1: x → 2: v[H] → 3: y[G] with edge labels cos(x) on (1, 2),
1/x on (2, 3), and −sin(x)/x² on (1, 3).]

  dF/dx (x*) = dG/dv (v*, x*) · dH/dx (x*) + ∂G/∂x (v*, x*) = cos(x)/x − sin(x)/x²
Consider

  y = F(x) = ( y0, y1 ) = ( F0(x0, x1), F1(x0, x1) ) = ( G0(x0, H(x0, x1)), G1(H(x0, x1)) )

[Figure: lDAG with vertices 0: x0, 1: x1, 2: z[H], 3: y0[G0], 4: y1[G1];
edge labels dH/dx0 and dH/dx1 into 2: z[H], dG0/dz into 3: y0[G0],
dG1/dz into 4: y1[G1], and ∂G0/∂x0 on (0, 3).]
Eliminating the intermediate vertex 2: z[H] yields the bipartite lDAG of F
with edge labels

  dF0/dx0 = ∂G0/∂x0 + dG0/dz · dH/dx0
  dF0/dx1 = dG0/dz · dH/dx1
  dF1/dx0 = dG1/dz · dH/dx0
  dF1/dx1 = dG1/dz · dH/dx1

[Figures: the lDAG before, during, and after elimination of vertex 2: z[H].]
Mathematicians' View

  y^(1) := F'(x) · x^(1)

... definition of the whole Jacobian column-wise by input directions
x^(1) ∈ IRⁿ equal to the Cartesian basis vectors in IRⁿ.
In

  y^(1) = dF/dx · x^(1)

the superscript denotes first directional differentiation of F performed in
tangent mode in direction x^(1) ∈ IRⁿ.
Subscripts will be used later to denote adjoints.
Larger values of superscripts will become relevant in the context of higher
derivatives.
A first-order tangent code

  F^(1) : IRⁿ × IRⁿ × IRᵐ × IRᵐ → IRᵐ × IRᵐ,
  (z, z^(1), y, y^(1)) := F^(1)(x, x^(1), z, z^(1))

computes the function value

  (z, y) := F(x, z)

together with the directional derivative

  (z^(1), y^(1)) := F'(x, z) · (x^(1), z^(1)) .
Variables for which derivatives are computed are referred to as active; x and z
are active inputs; z and y are active outputs.
Variables which depend on active inputs are referred to as varied.
Consider the lighthouse function

  (y1, y2) := F(ω, ν, t, γ)

implemented as

  h1 := tan(ω · t);  h2 := ν · h1 / (γ − h1);  y1 := h2;  y2 := γ · h2

Its tangent code computes

  (y1^(1), y2^(1)) := dF/d(ω, ν, t, γ) · (ω^(1), ν^(1), t^(1), γ^(1)) .
Computational Differentiation, WS 16/17
114
STCE
STCE
Define
dv
ds
for v {x, y} and some auxiliary s IR assuming that F (x(s)) is continuously
differentiable over its domain.
v (1)
y[F ]
dy
dy dx
=
= F (x) x(1)
ds
dx ds
and hence
y(1) = F (x) x(1) .
x(1)
s
Graphically

[Figure: tangent-augmented lDAG 0: s → 1: x → 2: y[F] with edge labels x^(1)
and F', and the corresponding tangent DAG 5: y^(1)[<·,·>] with inputs
4: F' and 2: x^(1).]
By the chain rule

  y^(1) = ( dG/dz · dH/dx + ∂G/∂x ) · x^(1)
        = dG/dz · z^(1) + ∂G/∂x · x^(1)

with dF/ds = dF/dx · x^(1) = y^(1), built up by back eliminating the edges in
topological order:

  (1, 2):  z^(1) = dH/dx · x^(1)
  (1, 3):  ∂G/∂x · x^(1)
  (2, 3):  dG/dz · z^(1)
Computational Differentiation, WS 16/17
119
STCE
[Figure: successive back elimination of the edges of the tangent-augmented
lDAG of y = G(H(x), x); the tangents x^(1), z^(1), y^(1) propagate from 0: s
through 1: x and 2: z[H] to 3: y[G].]
An edge is back eliminated by multiplying its label with the label(s) of the
incoming edge(s) of its source, followed by its removal. If the source has no
further emanating edges, then it is also removed.
A first-order tangent code back eliminates all back-eliminatable edges in
topological order (no storage of the lDAG).
Tangent SAC

For i = 0, ..., n−1                                             (seed)

  ( v_i , v_i^(1) ) := ( x_i , x_i^(1) )

For i = n, ..., q−1                                             (propagate)

  ( v_i , v_i^(1) ) := ( φ_i((v_k)_{k≺i}) , Σ_{j≺i} dφ_i/dv_j ((v_k)_{k≺i}) · v_j^(1) )

For i = 0, ..., m−1                                             (harvest)

  ( y_i , y_i^(1) ) := ( v_{n+p+i} , v_{n+p+i}^(1) )
We consider

  ( y0, y1 ) = ( x0 · sin(x0·x1)/x1 ,  sin(x0·x1)/x1 · c )

implemented as

  t := sin(x0 * x1) / x1
  y0 := x0 * t;  y1 := t * c

yielding SAC

  v2 := x0 * x1
  v3 := sin(v2)
  v4 := v3 / x1
  y0 := x0 * v4;  y1 := v4 * c

[Figure: tangent lDAG with edge labels x1 and x0 into 2: v2, cos(v2) into
3: v3, 1/x1 and −v4/x1 into 4: v4, v4 and x0 into 5: y0, c into 6: y1.]
Seed:  x0^(1) := ?,  x1^(1) := ?

Propagate through the tangent SAC:

  v2 := x0 * x1;        v2^(1) := x1 * x0^(1) + x0 * x1^(1)
  v3 := sin(v2);        v3^(1) := cos(v2) * v2^(1)
  v4 := v3 / x1;        v4^(1) := (v3^(1) − v4 * x1^(1)) / x1
  y0 := x0 * v4;        y0^(1) := v4 * x0^(1) + x0 * v4^(1)
  y1 := v4 * c;         y1^(1) := c * v4^(1)

[Figures: the lDAG collapses step by step as the edges are back eliminated in
topological order, until only y0^(1) and y1^(1) remain.]
For i = 0, ..., n−1

  ( y, [F'(x)]_{•,i} ) := F^(1)(x, e_i) ,
For scalar tangent mode AD, a class dco_t1s_type (tangent 1st-order scalar type)
is defined with double precision members v (value) and t (tangent).

// tangent 1st-order scalar derivative type
class dco_t1s_type {
public:
  double v; // value
  double t; // tangent
  dco_t1s_type(const double&);
  dco_t1s_type();
  dco_t1s_type& operator=(const dco_t1s_type&);
};
...
#include "dco_t1s_type.hpp" // tangent type definition
const int n=4;
void f(dco_t1s_type* x, dco_t1s_type &y) { ... }
int main() {
  dco_t1s_type x[n], y;
  for (int i=0;i<n;i++) x[i]=1;
  for (int i=0;i<n;i++) {
    x[i].t=1; // seed
    f(x,y);
    x[i].t=0; // reset for next Cartesian basis direction
    cout << y.t << endl; // harvest
  }
  return 0;
}
Live for y = ( Σ_{i=0}^{n-1} x_i² )² :

see case_studies/race/gt1s
  Y^(1) := F'(x) · X^(1)

... harvesting of the whole Jacobian by seeding the input directions
X^(1)[i] ∈ IRⁿ, i = 0, ..., n−1, with the Cartesian basis vectors in IRⁿ.
Note the concurrency!
Computational Differentiation, WS 16/17
140
STCE
template<typename T>
void f(const vector<T>& x, vector<T>& y) {
  T v = tan(x[2] * x[3]);
  T w = x[1] - v;
  y[0] = x[0] * v / w;
  y[1] = y[0] * x[1];
}
Example: Lighthouse I

[2×4 Jacobian of the lighthouse function at the sample point; its entries
include 2.79402, −5.01252, −7.80654, 11.025, 14.2435, and 11.4495.]
Example: Lighthouse
First-Order Scalar Tangent Code (lighthouse/gt1s.cpp)

void driver(
  const vector<double>& xv, const vector<double>& xt,
  vector<double>& yv, vector<double>& yt
){
  typedef gt1s<double>::type DCO_T;
  const int n=xv.size(), m=yv.size();
  vector<DCO_T> x(n), y(m);
  for (int i=0;i<n;i++) { value(x[i])=xv[i]; derivative(x[i])=xt[i]; } // seed
  f(x,y); // overloaded primal
  for (int i=0;i<m;i++) { yv[i]=value(y[i]); yt[i]=derivative(y[i]); } // harvest
}
Example: Lighthouse
First-Order Vector Tangent Code (lighthouse/gt1v.cpp)
and where <·,·>_IRⁿ and <·,·>_IRᵐ denote appropriate scalar products in IRⁿ
and IRᵐ, respectively.

Theorem: (∇F)* = (∇F)ᵀ, since

  < (∇F)ᵀ · y_(1) , x^(1) >_IRⁿ = < y_(1) , ∇F · x^(1) >_IRᵐ
       [ =: x_(1) ]                      [ =: y^(1) ]
Mathematicians' View

  x_(1) := F'(x)ᵀ · y_(1)

... definition of the whole Jacobian row-wise through input directions
y_(1) ∈ IRᵐ equal to the Cartesian basis vectors in IRᵐ.
Notation

In

  x_(1) = (dF/dx)ᵀ · y_(1)

the subscript on x denotes the first-order adjoint.
STCE
x(1)
z(1)
y(1)
T
computes a shifted transposed Jacobian vector product alongside with the function
value:
z
z
m
m
, z, z
)
IR IR 3 := F (x, x
y
y
!
!
!
x(1)
x(1)
z(1)
T
:=
, z, z
)
+ F (x, x
z(1)
0
y(1)
y(1) := 0
The whole (dense) Jacobian can be harvested from the active input adjoints

  (x_(1), z_(1))

row-wise by seeding the active output adjoints

  (z_(1), y_(1)) ∈ IRᵐ × IRᵐ

with the Cartesian basis vectors in IRᵐ and for x_(1) := 0 on input.
Consider the lighthouse function

  (y1, y2) := F(ω, ν, t, γ)

as

  h1 := tan(ω · t);  h2 := ν · h1 / (γ − h1);  y1 := h2;  y2 := γ · h2

Its adjoint code computes

  (ω_(1), ν_(1), t_(1), γ_(1)) := (ω_(1), ν_(1), t_(1), γ_(1))
                                  + ( dF/d(ω, ν, t, γ) )ᵀ · (y1_(1), y2_(1));
  y_(1) := 0

in addition to the function value; see later for details.
Define

  v_(1) = (dt/dv)ᵀ

for v ∈ {x, y} and some auxiliary t ∈ IR, assuming that t(F(x)) is continuously
differentiable over its domain. Then

  (dt/dx)ᵀ = ( (dt/dy) · (dy/dx) )ᵀ = F'(x)ᵀ · y_(1)

and hence

  x_(1) = (dt/dx)ᵀ = F'(x)ᵀ · y_(1) .
Graphically

[adjoint-augmented lDAG: 1: x → 2: y[F] → 3: t with edge labels dF/dx and y_(1); adjoint DAG: 1: x, 2: y_(1) → 4: ∇F → 5: x_(1) = ⟨·,·⟩]

Note the inner product notation ⟨y_(1), ∇F(x)⟩ ≡ ∇F(x)^T · y_(1).
Let

  y = F(x) = G(H(x), x)

with F : IR^n → IR^m, H : IR^n → IR^k, z = H(x), and G : IR^{n+k} → IR^m continuously differentiable over their respective domains. By the chain rule

  x_(1) = (dt/dx)^T = (dF/dx)^T · y_(1)
        = ( (dH/dx)^T · (dG/dz)^T + (∂G/∂x)^T ) · y_(1)
        = (dH/dx)^T · z_(1) + (∂G/∂x)^T · y_(1)

with z_(1) := (dG/dz)^T · y_(1).

[lDAG annotations: 3: (dG/d(z, x))^T · y_(1); 2: (dH/dx)^T · z_(1)]
[adjoint-augmented lDAG for y = G(H(x), x): 1: x → 2: z[H] → 3: y[G] → 4: t, with edge labels dH/dx, dG/dz, ∂G/∂x, and y_(1)^T; eliminating vertex 3 yields z_(1)^T = y_(1)^T · dG/dz and x_(1)^T = y_(1)^T · ∂G/∂x + z_(1)^T · dH/dx]

A vertex is eliminated by multiplying the labels of its incoming edges with the label(s) of the edge(s) emanating from it (resulting in new edges or incrementation of existing edge labels), followed by its removal.

First-order adjoint code eliminates all eliminable vertices in reverse topological order (it reverses the primal data flow).
  v_j(1) += dΦ_i((v_k)_{k≺i}) / dv_j · v_i(1)   for j ≺ i

For i = 0, . . . , m − 1:
  y_i := v_{n+p+i}
We consider

  y0 = x0 · sin(x0 · x1) / x1
  y1 = sin(x0 · x1) / x1 · c

implemented as

  t := sin(x0 * x1) / x1;  y0 := x0 * t;  y1 := t * c

yielding the SAC

  v2 := x0 * x1
  v3 := sin(v2)
  v4 := v3 / x1
  y0 := x0 * v4
  y1 := v4 * c

[Adjoint lDAG: vertices 0: x0, 1: x1, 2: *, 3: sin, 4: /, 5: y0, 6: y1; edge labels [x1], [x0] on 2; [cos(v2)] on 3; [1/x1], [-v4/x1] on 4; [v4], [x0] on 5; [v4], [c] on 6]
The tape is recorded statement by statement; the lDAG grows with each executed SAC statement (inputs are registered first):

  x0 :=?
  x1 :=?
  v2 := x0 * x1
  v3 := sin(v2)
  v4 := v3 / x1
  y0 := x0 * v4
  y1 := v4 * c
With the tape recorded, the adjoints are initialized: output and input adjoints are to be seeded (:=?), the adjoints of the intermediates are set to zero:

  y0(1) :=?   y1(1) :=?
  x0(1) :=?   x1(1) :=?
  v2(1) := 0
  v3(1) := 0
  v4(1) := 0
Interpret (Tape)

The tape is interpreted in reverse order of recording; each SAC statement contributes incremental adjoint statements:

  v2 := x0 * x1
  v3 := sin(v2)
  v4 := v3 / x1
  y0 := x0 * v4
  y1 := v4 * c

  v4(1) += c * y1(1)
  v4(1) += x0 * y0(1)
  x0(1) += v4 * y0(1)
  u := 1/x1
  v3(1) += u * v4(1)
  x1(1) -= v4 * u * v4(1)
  v2(1) += cos(v2) * v3(1)
  x0(1) += x1 * v2(1)
  x1(1) += x0 * v2(1)
#include <iostream>
#include "dco_a1s_type.hpp" // adjoint type definition
using namespace std;
const int n=4;
extern dco_a1s_tape_entry dco_a1s_tape[DCO_A1S_TAPE_SIZE]; // tape
// overloaded primal
void f(dco_a1s_type* x, dco_a1s_type& y) {
  y=0;
  for (int i=0;i<n;i++) y=y+x[i]*x[i];
  y=y*y;
}
...
int main() {
  dco_a1s_type x[n], y;
  for (int j=0;j<n;j++) x[j]=1;
  f(x,y); // overloaded primal builds tape
  dco_a1s_tape[y.va].a=1; // seed
  dco_a1s_interpret_tape(); // tape interpreter
  for (int i=0;i<n;i++)
    cout << i << "\t"
         << dco_a1s_tape[x[i].va].a << endl; // harvest
  dco_a1s_reset_tape(); // here obsolete ...
  return 0;
}
- value tape
- gradient tape
- partial tape
- mixtures ...
For i = 0, . . . , n − 1:
  x_i(1) := 0
For i = 0, . . . , m − 1:
  (y, [∇F]_{i,*}) := F_(1)(x, x_(1), e_i)
Live for y = (Σ_{i=0}^{n−1} x_i²)²:
template<typename T>
void f(const vector<T>& x, vector<T>& y) {
  T v = tan(x[2] * x[3]);
  T w = x[1] - v;
  y[0] = x[0] * v / w;
  y[1] = y[0] * x[1];
}
Example: Lighthouse
First-Order Scalar Adjoint Code (lighthouse/ga1s.cpp) I
void driver(
  const vector<double>& xv, vector<double>& xa,
  vector<double>& yv, vector<double>& ya
){
  // generic adjoint 1st-order scalar dco mode
  typedef ga1s<double> DCO_M;
  typedef DCO_M::type DCO_T; // dco type
  typedef DCO_M::tape_t DCO_TAPE_T; // dco tape type
  DCO_M::global_tape=DCO_TAPE_T::create(); // tape creation
  int n=xv.size(), m=yv.size();
  vector<DCO_T> x(n), y(m);
  for (int i=0;i<n;i++) { // independent tape entries
    x[i]=xv[i];
    DCO_M::global_tape->register_variable(x[i]);
  }
  f(x,y); // overloaded primal
  for (int i=0;i<m;i++) {
    DCO_M::global_tape->register_output_variable(y[i]); // dependent tape entries
    yv[i]=value(y[i]); derivative(y[i])=ya[i]; // seed
  }
Example: Lighthouse
First-Order Scalar Adjoint Code (lighthouse/ga1s.cpp) II
Example: Lighthouse
First-Order Vector Adjoint Code (lighthouse/ga1v.cpp) I
Example: Lighthouse
First-Order Vector Adjoint Code (lighthouse/ga1v.cpp) II
  yv[i]=value(y[i]);
  for (int j=0;j<m;j++) derivative(y[i])[j] = ya[i][j]; // vector adjoints
}
for (int i=0;i<n;i++) {
  for (int j=0;j<m;j++) derivative(x[i])[j] = xa[i][j];
}
DCO_M::global_tape->interpret_adjoint();
for (int i=0;i<n;i++) {
  for (int j=0;j<m;j++) xa[i][j]=derivative(x[i])[j];
}
for (int i=0;i<m;i++) {
  for (int j=0;j<m;j++) ya[i][j]=derivative(y[i])[j];
}
DCO_TAPE_T::remove(DCO_M::global_tape);
}
As before, the tape records the SAC statement by statement; here the local gradient of v4 = sin(x0 · x1)/x1 with respect to x0 and x1 is preaccumulated during recording:

  v2 := x0 * x1
  v3 := sin(v2)
  v4 := v3 / x1
  v4_v3(1) := 1/x1 * 1
  v4_v2(1) := cos(v2) * v4_v3(1)
  v4_x0(1) := v4_v2(1) * x1
  v4_x1(1) := -v4/x1 + v4_v2(1) * x0
  y0 := x0 * v4
  y1 := v4 * c

local gradient code exposed to compiler (optimization); only the edges [v4_x0(1)] and [v4_x1(1)] remain on the lDAG for v4. Before interpretation the adjoints are seeded:

  y0(1) :=?   y1(1) :=?
  x0(1) :=?   x1(1) :=?
  v2(1) := 0
  v3(1) := 0
  v4(1) := 0
The preaccumulated tape is interpreted in reverse order:

  v4(1) += c * y1(1)
  v4(1) += x0 * y0(1)
  x0(1) += v4 * y0(1)
  x1(1) += v4_x1(1) * v4(1)
  x0(1) += v4_x0(1) * v4(1)
Outline
d²F(x)/dx².
(w.l.o.g. m = 1)

  d²f/(dx_i dx_j) (x⁰)
    ≈ [ df/dx_i (x⁰ + e_j h) − df/dx_i (x⁰ − e_j h) ] / (2h)
    ≈ [ (f(x⁰ + e_j h + e_i h) − f(x⁰ + e_j h − e_i h)) / (2h)
      − (f(x⁰ − e_j h + e_i h) − f(x⁰ − e_j h − e_i h)) / (2h) ] / (2h).
For

  y = ( Σ_{i=0}^{n−1} x_i² )²

we are looking for a routine fgh(...) returning for a given vector x of length n the value y of f, its gradient g, and Hessian h.
int main(int argc, char* argv[]) {
  assert(argc==2); cout.precision(15);
  size_t n=atoi(argv[1]);
  vector<double> x(n,0), g(n,0), h(n*(n+1)/2,0); double y=0;
  for (size_t i=0;i<n;i++) x[i]=cos(static_cast<double>(i));
  fgh(x,y,g,h);
  cout << y << endl;
  for (size_t i=0;i<n;i++) cout << g[i] << endl;
  int ii=0;
  for (int i=0;i<n;i++)
    for (int j=0;j<=i;j++,ii++)
      cout << h[ii] << endl;
  return 0;
}
template<typename T>
void fgh(const vector<T>& x, T& y, vector<T>& g, vector<T>& h) {
  size_t n=x.size();
  int ii=0;
  for (int i=0;i<n;i++) {
    vector<T> x_pp(x), x_mp(x), g_pp(n,0), g_mp(n,0);
    double p=(x_mp[i]==0) ? sqrt(sqrt(DBL_EPSILON))
                          : sqrt(sqrt(DBL_EPSILON))*abs(x_mp[i]);
    x_mp[i]-=p; fg(x_mp,y,g_mp);
    x_pp[i]+=p; fg(x_pp,y,g_pp);
    for (int j=0;j<=i;j++,ii++)
      h[ii]=(g_pp[j]-g_mp[j])/(2*p);
  }
  fg(x,y,g);
}
Motivation
I can do better ... (t2s_a1s.cpp)

... takes about 0.5s for n = 10³ to produce the Hessian with machine accuracy.
The second-order tangent model

  (y, y^(2), y^(1), y^(1,2)) := F^(1,2)(x, x^(2), x^(1), x^(1,2))

computes

  y := F(x)
  y^(2) := ∇F(x) · x^(2)
  y^(1) := ∇F(x) · x^(1)
  y^(1,2) := x^(1)T · ∇²F(x) · x^(2) + ∇F(x) · x^(1,2).
Notation

Differentiating the first-order tangent model y^(1) = ∇F(x) · x^(1) ≡ (dF(x)/dx) · x^(1)

. . . with respect to x yields dy/dx = ∇F(x);
. . . with respect to x yields dy^(1)/dx = x^(1)T · ∇²F(x);
. . . with respect to x^(1) yields dy/dx^(1) = 0;
. . . with respect to x^(1) yields dy^(1)/dx^(1) = ∇F(x).
Derivation

Directional differentiation of (y, y^(1)) = F^(1)(x, x^(1)) with respect to (x, x^(1)) in direction (x^(2), x^(1,2)) gives

  (y^(2), y^(1,2)) = d(y, y^(1))/d(x, x^(1)) · (x^(2), x^(1,2)):

  y^(2) = (dy/dx) · x^(2) + (dy/dx^(1)) · x^(1,2) = ∇F(x) · x^(2)           (dy/dx^(1) = 0)
  y^(1,2) = (dy^(1)/dx) · x^(2) + (dy^(1)/dx^(1)) · x^(1,2)
          = x^(1)T · ∇²F(x) · x^(2) + ∇F(x) · x^(1,2)

with y^(1) = x^(1)T · (dF(x)/dx)^T and symmetry d²F(x)/dx² = (d²F(x)/dx²)^T.
  (dy^(1)/dx) · x^(2) = x^(1)T · (d²F(x)/dx²) · x^(2)
Accumulation of Hessian
Define

  v^(2) ≡ dv/ds

for v ∈ {x, x^(1), y, y^(1)}. Differentiation of y^(1) = ∇F(x) · x^(1) = x^(1)T · ∇F(x)^T along s yields

  dy^(1)/ds = ∇F(x) · dx^(1)/ds + x^(1)T · (d²F(x)/dx²) · dx/ds.
  dy/ds = ∇F(x) · x^(2)
  dy^(1)/ds = ∇F(x) · x^(1,2) + x^(1)T · (d²F(x)/dx²) · x^(2)

[lDAG: 0: s → 1: x, 2: x^(1); 3: y[F]; 5: y^(1) with edge labels ∇F and ∇²F]

Comments: x, x^(2) → y^(2); x^(1,2) → y^(1,2).
Live for y = (Σ_{i=0}^{n−1} x_i²)²:
The second-order adjoint (tangent-over-adjoint) model

  (y, y^(2), x_(1), x_(1)^(2)) := F_(1)^(2)(x, x^(2), y_(1), y_(1)^(2))

computes

  y := F(x)
  y^(2) := ∇F(x) · x^(2)
  x_(1) := ∇F(x)^T · y_(1)
  x_(1)^(2) := y_(1)^T · ∇²F(x) · x^(2) + ∇F(x)^T · y_(1)^(2).
Notation

Differentiating the first-order adjoint model x_(1) = ∇F(x)^T · y_(1) ≡ (dF(x)/dx)^T · y_(1)

. . . with respect to x yields dy/dx = ∇F(x);
. . . with respect to y_(1) yields dy/dy_(1) = 0;
. . . with respect to x yields dx_(1)/dx = y_(1)^T · ∇²F(x);
. . . with respect to y_(1) yields dx_(1)/dy_(1) = ∇F(x)^T.
Derivation

Directional differentiation of (y, x_(1)) = F_(1)(x, y_(1)) with respect to (x, y_(1)) in direction (x^(2), y_(1)^(2)) gives

  y^(2) = (dy/dx) · x^(2) + (dy/dy_(1)) · y_(1)^(2) = ∇F(x) · x^(2)          (dy/dy_(1) = 0)
  x_(1)^(2) = (dx_(1)/dx) · x^(2) + (dx_(1)/dy_(1)) · y_(1)^(2)
            = y_(1)^T · ∇²F(x) · x^(2) + ∇F(x)^T · y_(1)^(2)

with x_(1) = y_(1)^T · dF(x)/dx and symmetry d²F(x)/dx² = (d²F(x)/dx²)^T.
  (dx_(1)/dx) · x^(2) = y_(1)^T · (d²F(x)/dx²) · x^(2)
Accumulation of Hessian
Define

  v^(2) ≡ dv/ds

for v ∈ {x, x_(1), y, y_(1)}, assuming that F_(1)(x(s), x_(1)(s), y_(1)(s)) is continuously differentiable over its domain.
  dy/ds = ∇F(x) · x^(2)
  dx_(1)/ds = y_(1)^T · ∇²F(x) · x^(2) + ∇F(x)^T · y_(1)^(2)

[lDAG: 0: x, 1: y_(1) → 4: y[F], 5: ∇F^T, 6: x_(1), with edge labels ∇F, ∇F^T, ∇²F, and y_(1)^T]

Comments: x, x^(2) → y^(2); x^(2), y_(1)^(2) → x_(1)^(2).
Live for y = (Σ_{i=0}^{n−1} x_i²)²:
User Guide: x_(1)^(2) := y_(1)^T · ∇²F(x) · x^(2) + ∇F(x)^T · y_(1)^(2)
template<typename T>
void fgh(const vector<T>& xv,
         T& yv, vector<T>& g, vector<vector<T> >& h) {
  // generic tangent over adjoint scalar dco mode
  typedef ga1s<typename gt1s<T>::type> DCO_M;
  typedef typename DCO_M::type DCO_T;
  typedef typename DCO_M::tape_t DCO_TAPE_T;
  size_t n=xv.size();
  DCO_M::global_tape=DCO_TAPE_T::create();
  for (size_t i=0;i<n;i++) {
    vector<DCO_T> x(n,0); DCO_T y=0;
    for (size_t j=0;j<n;j++) {
      x[j]=xv[j];
      DCO_M::global_tape->register_variable(x[j]);
    }
    derivative(value(x[i]))=1; // seed tangent
    f(x,y); // overloaded primal
    DCO_M::global_tape->register_output_variable(y);
    yv=passive_value(y); // harvest tangent
    g[i]=derivative(value(y));
    value(derivative(y))=1; // seed adjoint
    DCO_M::global_tape->interpret_adjoint();
    for (size_t j=0;j<n;j++) h[i][j]=derivative(derivative(x[j]));
    // harvest adjoint
    DCO_M::global_tape->reset(); // reset tape to start position
  }
  DCO_TAPE_T::remove(DCO_M::global_tape);
}
The second-order adjoint (adjoint-over-tangent) model

  (y, y^(1), x_(2), x_(2)^(1)) := F_(2)^(1)(x, x^(1), y_(2), y_(2)^(1))

computes

  y := F(x)
  y^(1) := ∇F(x) · x^(1)
  x_(2)^(1) := ∇F(x)^T · y_(2)^(1)
  x_(2) := y_(2)^(1) · ∇²F(x) · x^(1) + ∇F(x)^T · y_(2).
Derivation

Adjoint differentiation of the tangent model (y, y^(1)) = F^(1)(x, x^(1)) with respect to (x, x^(1)) with output adjoints (y_(2), y_(2)^(1)) gives

  x_(2) = (dy/dx)^T · y_(2) + (dy^(1)/dx)^T · y_(2)^(1)
        = ∇F(x)^T · y_(2) + y_(2)^(1) · ∇²F(x) · x^(1)
  x_(2)^(1) = (dy/dx^(1))^T · y_(2) + (dy^(1)/dx^(1))^T · y_(2)^(1)
            = ∇F(x)^T · y_(2)^(1)                                (dy/dx^(1) = 0)

with dy^(1)/dx = x^(1)T · d²F(x)/dx² and symmetry d²F(x)/dx² = (d²F(x)/dx²)^T.
Essential Activity

  (dy^(1)/dx)^T · y_(2)^(1) = y_(2)^(1) · (d²F(x)/dx²) · x^(1)
  (dt/dx^(1))^T = ∇F(x)^T · y_(2)^(1) = x_(2)^(1)
  (dt/dx)^T = x_(2)

[adjoint lDAG of the tangent model: 0: x, 2: x^(1) → 1: y[F], 3: ∇F, 4: y^(1), with edge labels ∇F, ∇²F, y_(2)^T, and y_(2)^(1)T]

since actually dy^(1)/d∇F ∈ IR, dF/dx ∈ IR^{1×(1·n)}, and d²F/dx² ∈ IR^{(1·n)×n}.

- x, y_(2)^(1) → x_(2)
- x^(1) varied? y useful?
Live for y = (Σ_{i=0}^{n−1} x_i²)²:

- compare with second-order ToT and ToA codes and with second-order finite differences
User Guide: x_(2) := ∇F(x)^T · y_(2) + y_(2)^(1) · ∇²F(x) · x^(1)
  }
  value(derivative(x[i]))=1; // seed
  f(x,y); // overloaded primal generates tape
  DCO_BASE_MODE::global_tape->register_output_variable(value(y));
  DCO_BASE_MODE::global_tape->register_output_variable(derivative(y));
  derivative(derivative(y))=1; // seed
  DCO_BASE_MODE::global_tape->interpret_adjoint();
  for (size_t j=0;j<n;j++)
    H[j][i] = derivative(value(x_indep[j])); // harvest
  DCO_BASE_MODE::global_tape->reset(); // reset tape to start position
  g[i]=value(derivative(y)); // harvest
}
yv=passive_value(y);
DCO_TAPE_TYPE::remove(DCO_BASE_MODE::global_tape);
}
The second-order adjoint (adjoint-over-adjoint) model

  (y, x_(1), x_(2), y_(1,2)) := F_(1,2)(x, x^(1,2), y_(1), y_(2))

computes

  y := F(x)
  x_(1) := ∇F(x)^T · y_(1)
  x_(2) := y_(1) · ∇²F(x) · x^(1,2) + ∇F(x)^T · y_(2)
  y_(1,2) := ∇F(x) · x^(1,2).
Notation

Differentiating the first-order adjoint model x_(1) = ∇F(x)^T · y_(1) ≡ (dF(x)/dx)^T · y_(1)

. . . with respect to x yields dx_(1)/dx = y_(1) · ∇²F(x);
. . . with respect to x yields dy/dx = ∇F(x);
. . . with respect to y_(1) yields dy/dy_(1) = 0;
. . . with respect to y_(1) yields dx_(1)/dy_(1) = ∇F(x)^T.
Derivation

Adjoint differentiation of the first-order adjoint model (y, x_(1)) = F_(1)(x, y_(1)) with respect to (x, y_(1)) with output adjoints (y_(2), x^(1,2)) gives

  x_(2) = (dy/dx)^T · y_(2) + (dx_(1)/dx)^T · x^(1,2)
        = ∇F(x)^T · y_(2) + y_(1) · ∇²F(x) · x^(1,2)
  y_(1,2) = (dy/dy_(1))^T · y_(2) + (dx_(1)/dy_(1))^T · x^(1,2)
          = ∇F(x) · x^(1,2)                                      (dy/dy_(1) = 0)

using symmetry d²F(x)/dx² = (d²F(x)/dx²)^T.
  (dx_(1)/dx) · x^(1,2) = y_(1) · (d²F(x)/dx²) · x^(1,2)
  (dt/dx)^T = ∇F(x)^T · y_(2) + y_(1) · ∇²F(x) · x^(1,2) = x_(2)
  (dt/dy_(1))^T = ∇F(x) · x^(1,2) = y_(1,2)

[adjoint lDAG of the adjoint model: 0: x, 2: y_(1) → 1: y[F], 3: ∇F, 4: x_(1), with edge labels ∇F, ∇F^T, ∇²F, y_(2)^T, and x^T_(1,2)]

Comments: dx_(1)/d∇F ∈ IR^{n×(1·n)}, and dF/dx ∈ IR^{(1·n)×n}; x, x^(1,2) → y_(1,2).
Live for y = (Σ_{i=0}^{n−1} x_i²)²:

- compare with second-order ToT, ToA and AoT codes and with second-order finite differences
    x_in[j]=x[j];
  }
  f(x,y);
  derivative(y)=1.0;
  DCO_BASE_MODE::global_tape->register_variable(derivative(y));
  DCO_MODE::global_tape->interpret_adjoint();
  for (size_t j=0;j<n;j++)
    g[j]=value(derivative(x_in[j]));
  // repeated interpretation of same tape
  for (size_t i=0;i<n;i++) {
    derivative(derivative(x_in[i]))=1;
    DCO_BASE_MODE::global_tape->interpret_adjoint();
    for (size_t j=0;j<n;j++)
      H[i][j]=derivative(value(x_in[j]));
    // zero adjoints prior to reinterpretation
    DCO_BASE_MODE::global_tape->zero_adjoints();
  }
  yv=passive_value(y);
  DCO_BASE_TAPE_TYPE::remove(DCO_BASE_MODE::global_tape);
  DCO_TAPE_TYPE::remove(DCO_MODE::global_tape);
}
template<typename T>
void fgh(const vector<T>& x, T& y, vector<T>& g, vector<T>& h) {
  size_t n=x.size();
  int ii=0;
  for (int i=0;i<n;i++) {
    vector<T> x_pp(x), x_mp(x), g_pp(n,0), g_mp(n,0);
    double p=(x_mp[i]==0) ? sqrt(DBL_EPSILON)
                          : sqrt(DBL_EPSILON)*abs(x_mp[i]);
    x_mp[i]-=p; fg(x_mp,y,g_mp);
    x_pp[i]+=p; fg(x_pp,y,g_pp);
    for (int j=0;j<=i;j++,ii++)
      h[ii]=(g_pp[j]-g_mp[j])/(2*p);
  }
  fg(x,y,g);
}