
Automatic Reverse-Mode Differentiation:

Lecture Notes
William W. Cohen
October 17, 2016

1 Background: Automatic Differentiation


1.1 Why automatic differentiation is important
Most neural network packages (e.g., Torch or Theano) don't require a user to
actually derive the gradient updates for a neural model. Instead they allow
the user to define a model in a "little language", which supports common
neural-network operations like matrix multiplication, "softmax", etc., and
automatically derive the gradients. Typically, the user will define a loss
function L in this sublanguage: the loss function takes as inputs the training
data, current parameter values, and hyperparameters, and outputs a scalar
value. From the definition of L, the system will compute the partial derivative
of the loss function with respect to every parameter. Using these gradients,
it's straightforward to implement gradient-based learning methods.
Going from the definition of L to its partial derivatives is called automatic
differentiation. In this assignment you will start with a simple automatic
differentiation system written in Python, and use it to implement a neural-
network package.

1.2 Wengert lists


Automatic differentiation proceeds in two stages. First, function definitions
f (x1 , . . . , xk ) in the sublanguage are converted to a format called a Wengert
list. The second stage is to evaluate the function and its gradients using the
Wengert list.

A Wengert list defines a function f(x1, . . . , xk) of some inputs x1, . . . , xk.
Syntactically, it is just a list of assignment statements, where the right-hand-
side (RHS) of each statement is very simple: a call to a function g (where
g is one of a set of primitive functions G = {g1, . . . , gl} supported by the
sublanguage), and where the arguments to g are either inputs x1, . . . , xk, or the
left-hand-side (LHS) of a previous assignment. (The functions in G will be called
operators here.) The output of the function is the LHS of the last item in
the list. For example, a Wengert list for
f(x1, x2) ≡ (2x1 + x2)^2    (1)
with the functions G = {add, multiply, square} might be
z1 = add(x1 , x1 )
z2 = add(z1 , x2 )
f = square(z2 )
A Wengert list for
f(x) ≡ x^3
might be
z1 = multiply(x, x)
f = multiply(z1 , x)
The set of functions G defines the sublanguage. It’s convenient if they
have a small fixed number of arguments, and are differentiable with respect
to all their arguments. But there’s no reason that they have to be scalar
functions! For instance, if G contains the appropriate matrix and vector
operations, the loss for logistic regression (for an example x with label y and
weight matrix W) could be written as
f (x, y, W ) ≡ crossEntropy(softmax(x · W ), y) + frobeniusNorm(W )
and be compiled to the Wengert list
z1 = dot(x, W )
z2 = softmax(z1 )
z3 = crossEntropy(z2 , y)
z4 = frobeniusNorm(W )
f = add(z3 , z4 )

This implements k-class logistic regression if x is a d-dimensional row vector,
dot is matrix product, W is a d × k weight matrix, and
softmax(⟨a1, . . . , ad⟩) ≡ ⟨ e^{a1} / Σ_{i=1..d} e^{ai}, . . . , e^{ad} / Σ_{i=1..d} e^{ai} ⟩

crossEntropy(⟨a1, . . . , ad⟩, ⟨b1, . . . , bd⟩) ≡ − Σ_{i=1..d} ai log bi

frobeniusNorm(A) ≡ sqrt( Σ_{i=1..d} Σ_{j=1..k} a_{i,j}^2 )
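For concreteness, these operators could be implemented with numpy roughly as
follows; the use of numpy and the standalone function loss below are illustrative
assumptions, not part of the package described later in these notes.

import numpy as np

# illustrative numpy versions of the operators above (assumed, not from xman.py)
def softmax(a):
    e = np.exp(a - np.max(a))        # subtract the max for numerical stability
    return e / e.sum()

def crossEntropy(a, b):
    return -np.sum(a * np.log(b))    # exactly the formula above

def frobeniusNorm(A):
    return np.sqrt(np.sum(A * A))

def loss(x, y, W):
    # the uncompiled loss f(x, y, W), for comparison with the Wengert list
    return crossEntropy(softmax(np.dot(x, W)), y) + frobeniusNorm(W)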

1.3 Backpropagation through a Wengert list


We’ll first discuss how to use a Wengert list, and then below, discuss how to
construct one.
Given a Wengert list for f , it’s obvious how to evaluate f : just step
through the assignment statements in order, computing each value as needed.
To be perfectly clear about this, the procedure is as follows. We will encode
each assignment in Python as a nested tuple

(z, g, (y1 , . . . , yk ))

where z is a string that names the LHS variable, g is a string that names the
operator, and the yi ’s are strings that name the arguments to g. So the list
for the function of Equation 1 would be encoded in Python as

[ ("z1", "add", ("x1","x1")),
  ("z2", "add", ("z1","x2")),
  ("f", "square", ("z2",)) ]

(Note the trailing comma in ("z2",), which is needed to make a one-element
Python tuple rather than a plain string.)
We also store functions for each operator in a Python dictionary G:
G = { "add" : lambda a,b: a+b,
"square": lambda a:a*a }
Before we evaluate the function, we will store the parameters in a dictionary
val: e.g., to evaluate f at x1 = 3, x2 = 7 we will initialize val to
val = { "x1" : 3, "x2" : 7 }
The pseudo-code to evaluate f is:

def eval(f):
    initialize val to the inputs at which f should be evaluated
    for (z, g, (y1, ..., yk)) in the list:
        op = G[g]
        val[z] = op(val[y1], ..., val[yk])
    return the last entry stored in val

Some Python hints: (1) to convert (y1, ..., yk) to (val[y1], ..., val[yk])
you might use Python's map function. (2) If args is a length-2 Python list
and g is a function that takes two arguments (like G["add"] above) then
g(*args) will call g with the elements of that list as the two arguments to
g.
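Putting the pseudo-code and these hints together, a minimal runnable sketch of
eval might look like the following (repeating the list and G from above for
completeness). The name eval_list, chosen to avoid shadowing Python's built-in
eval, and the choice to return the filled-in val dictionary along with the
output value are assumptions of this sketch, not part of the assignment's code.

G = { "add"    : lambda a, b: a + b,
      "square" : lambda a: a * a }

wengert_list = [ ("z1", "add",    ("x1", "x1")),
                 ("z2", "add",    ("z1", "x2")),
                 ("f",  "square", ("z2",)) ]

def eval_list(wlist, inputs):
    # inputs maps input names to values, e.g. {"x1": 3, "x2": 7}
    val = dict(inputs)                    # copy so the caller's dict is untouched
    for (z, g, args) in wlist:
        op = G[g]
        val[z] = op(*map(lambda y: val[y], args))   # look up each argument, then apply g
    return val[wlist[-1][0]], val         # value of the last LHS, plus all intermediate values

fval, val = eval_list(wengert_list, {"x1": 3, "x2": 7})   # fval is (2*3 + 7)**2 = 169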
To differentiate, we will use a generalization of backpropagation (backprop).
We'll assume that eval has already been run and val has been populated, and we
will compute, in reverse order, a value delta(zi) for each variable zi that
appears in the list. We initialize delta by setting delta(f) = 1 (where f is
the string that names the function output).

Informally you can think of delta(zi) as the "sensitivity" of f to the
variable zi, at the point where we're evaluating f (i.e., the point a that
corresponds to the initial dictionary entries we stored in val). Here zi can
be an intermediate variable or an input. If it's an input x that we're treating
as a parameter, delta is the gradient of the cost function, evaluated at a:
i.e., delta(x) = df/dx(a).
To compute these sensitivities we need to “backpropagate” through the
list: when we encounter the assignment (z, g, (y1 , . . . , yk )) we will use delta(z)
and the derivatives of g with respect to its inputs to compute the sensitivities
of the y’s. We will store derivatives for each operator in a Python dictionary
DG.
Note that if g has k inputs, then we need k partial derivatives, one for
each input, so the entries in DG are lists of functions. For the functions used
in this example, we’ll need these entries in DG.
DG = { "add" : [ (lambda a,b: 1), (lambda a,b: 1) ],
"square": [ lambda a:2*a ] }
To figure out these functions we used some high-school calculus rules:
d(x + y)/dx = 1, d(x + y)/dy = 1, and d(x^2)/dx = 2x.
Finally, the pseudo-code to compute the deltas is below. Note that we
don’t just store values in delta: we accumulate them additively.

def backprop(f, val):
    initialize delta: delta[f] = 1
    for (z, g, (y1, ..., yk)) in the list, in reverse order:
        for i = 1, ..., k:
            op_i = DG[g][i]
            if delta[yi] is not defined, set delta[yi] = 0
            delta[yi] = delta[yi] + delta[z] * op_i(val[y1], ..., val[yk])
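A direct Python translation of this pseudo-code, reusing the G and DG
dictionaries and the val dictionary produced by the eval_list sketch above,
might look like this; the name backprop_list and the 0-based argument indexing
(instead of the 1-based i above) are assumptions of the sketch.

def backprop_list(wlist, val, out_name):
    # returns delta, mapping each variable name to d(out)/d(variable) at val
    delta = { out_name: 1.0 }
    for (z, g, args) in reversed(wlist):
        arg_vals = [val[y] for y in args]
        for i, y in enumerate(args):
            op_i = DG[g][i]                       # derivative of g w.r.t. its i-th argument
            delta[y] = delta.get(y, 0.0) + delta[z] * op_i(*arg_vals)
    return delta

# continuing the example at x1 = 3, x2 = 7:
# delta["x1"] comes out as 8*3 + 4*7 = 52 and delta["x2"] as 4*3 + 2*7 = 26
delta = backprop_list(wengert_list, val, "f")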

1.4 Examples
Let's look at Equation 1. In freshman calculus you'd probably just do this:

f(x1, x2) = (2x1 + x2)^2 = 4x1^2 + 4x1x2 + x2^2

df/dx1 = 8x1 + 4x2
df/dx2 = 4x1 + 2x2

Here we'll instead use the Wengert list, in reverse order, and the chain rule. The list is

z1 = add(x1 , x1 )
z2 = add(z1 , x2 )
f = square(z2 )
Table 1 contains a detailed derivation of df/dx1, where in each step we either
plug in the definition of a variable in the list, or use the derivative of one of
the operators (square or add). Table 2 contains an analogous derivation of
df/dx2. Notice that these derivations are nearly identical. In fact, they are very
analogous to the computations carried out by the backprop algorithm: can
you see how?

Finally, Table 3 shows a slightly less detailed derivation for the second
sample function, f = x^3. It is instructive to step through the backprop
algorithm for these functions as well: for example, the list z1 = x · x; f = z1 · x
leads to the delta updates shown below the tables.

Derivation Step                                                   Reason
df/dx1 = d(z2^2)/dz2 · dz2/dx1                                    f = z2^2
df/dx1 = 2z2 · dz2/dx1                                            d(a^2)/da = 2a
df/dx1 = 2z2 · d(z1 + x2)/dx1                                     z2 = z1 + x2
df/dx1 = 2z2 · (1 · dz1/dx1 + 1 · dx2/dx1)                        d(a+b)/da = d(a+b)/db = 1
df/dx1 = 2z2 · (1 · d(x1 + x1)/dx1 + 1 · dx2/dx1)                 z1 = x1 + x1
df/dx1 = 2z2 · (1 · (1 · dx1/dx1 + 1 · dx1/dx1) + 1 · dx2/dx1)    d(a+b)/da = d(a+b)/db = 1
df/dx1 = 2z2 · (1 · (1 · 1 + 1 · 1) + 1 · 0)                      da/da = 1 and da/db = 0 for inputs a, b
df/dx1 = 2z2 · 2 = 8x1 + 4x2                                      simplify

Table 1: A detailed derivation of df/dx1 for f = z2^2; z2 = z1 + x2; z1 = x1 + x1

Derivation Step                                                   Reason
df/dx2 = d(z2^2)/dz2 · dz2/dx2                                    f = z2^2
df/dx2 = 2z2 · dz2/dx2                                            d(a^2)/da = 2a
df/dx2 = 2z2 · d(z1 + x2)/dx2                                     z2 = z1 + x2
df/dx2 = 2z2 · (1 · dz1/dx2 + 1 · dx2/dx2)                        d(a+b)/da = d(a+b)/db = 1
df/dx2 = 2z2 · (1 · d(x1 + x1)/dx2 + 1 · dx2/dx2)                 z1 = x1 + x1
df/dx2 = 2z2 · (1 · (1 · dx1/dx2 + 1 · dx1/dx2) + 1 · dx2/dx2)    d(a+b)/da = d(a+b)/db = 1
df/dx2 = 2z2 · (1 · (1 · 0 + 1 · 0) + 1 · 1)                      da/da = 1 and da/db = 0 for inputs a, b
df/dx2 = 2z2 · 1 = 4x1 + 2x2                                      simplify

Table 2: A detailed derivation of df/dx2 for f = z2^2; z2 = z1 + x2; z1 = x1 + x1

Derivation Step                                                   Reason
df/dx = dz1/dx · x + z1 · dx/dx                                   f = z1 · x and d(ab)/dx = (da/dx) · b + a · (db/dx)
df/dx = (dx/dx · x + x · dx/dx) · x + z1 · dx/dx                  z1 = x · x and d(ab)/dx = (da/dx) · b + a · (db/dx)
df/dx = (x + x) · x + x^2 = 3x^2                                  z1 = x · x and simplify

Table 3: A derivation of df/dx for f = z1 · x; z1 = x · x

delta[f]  = 1
delta[z1] += delta[f] · x = x           (arg 1 of f = mul(z1, x))
delta[x]  += delta[f] · z1 = x^2        (arg 2 of f = mul(z1, x))
delta[x]  += delta[z1] · x = x^2        (arg 1 of z1 = mul(x, x))
delta[x]  += delta[z1] · x = x^2        (arg 2 of z1 = mul(x, x))

Summing the three updates to delta[x] gives 3x^2, which is df/dx as expected.
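As a check, the hypothetical eval_list/backprop_list sketches from earlier
reproduce these updates numerically once a multiply operator and its two
partial derivatives are added; the operator name and the choice of x = 2.0
below are just for illustration.

G["multiply"]  = lambda a, b: a * b
DG["multiply"] = [ (lambda a, b: b), (lambda a, b: a) ]   # d(ab)/da = b, d(ab)/db = a

cube_list = [ ("z1", "multiply", ("x", "x")),
              ("f",  "multiply", ("z1", "x")) ]

fval, val = eval_list(cube_list, {"x": 2.0})          # fval = 8.0
delta = backprop_list(cube_list, val, "f")
# delta["x"] accumulates x**2 three times, so it equals 3 * 2.0**2 = 12.0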

1.5 Discussion
What’s going on here? Let’s simplify for a minute and assume that the list
is of the form

z1 = f1 (z0 )
z2 = f2 (z1 )
...
zm = fm (zm−1 )

so f = zm = fm(fm−1(. . . f1(z0) . . .)). We'll assume we can compute the fi
functions and their derivatives fi'. We know that one way to find dzm/dz0 would
be to repeatedly use the chain rule:

dzm/dz0 = (dzm/dzm−1) · (dzm−1/dz0)
        = (dzm/dzm−1) · (dzm−1/dzm−2) · (dzm−2/dz0)
        . . .
        = (dzm/dzm−1) · (dzm−1/dzm−2) · . . . · (dz1/dz0)
Let’s take some time to unpack what this means. When we do derivations
by hand, we are working symbolically: we are constructing a symbolic rep-
resentation of the derivative function. This is an interesting problem—it’s
called symbolic differentiation—but it’s not the same task as automatic dif-
ferentiation. In automatic differentiation, we want instead an algorithm for
evaluating the derivative function.
To simplify notation, let hi,j be the function dzi/dzj. (I'm doing this so that I
can use hi,j(a) to denote the result of evaluating the function dzi/dzj at a:
the notation dzi/dzj(a) is hard to read.) Notice that there are m^2 of these
hi,j functions (quite a few!) but for machine learning applications we won't
care about most of them: typically we just care about the partial derivative of
the cost function (the final variable in the list) with respect to the
parameters, so we only need hm,i for certain i's.
Let's look at evaluating hm,0 at some point z0 = a (say z0 = 53.4). Again
to simplify, define

a1 = f1 (a)
a2 = f2 (f1 (a))
...
am = fm (fm−1 (fm−2 (. . . f1 (a) . . .)))

When we write

dzm/dz0 = (dzm/dzm−1) · (dzm−1/dz0)

we mean that, for all a,

hm,0(a) = fm'(am−1) · hm−1,0(a)

That's a useful step because we have assumed we have available a routine
to evaluate fm'(am−1): in the code this would be the function DG[fm][1].
Continuing, when we write

dzm/dz0 = (dzm/dzm−1) · (dzm−1/dzm−2) · . . . · (dz1/dz0)

it means that

hm,0(a) = fm'(am−1) · fm−1'(am−2) · . . . · f2'(a1) · f1'(a)

When we execute the backprop code above, this is what we do: in particular
we group the operations as

hm,0(a) = (fm'(am−1) · fm−1'(am−2) · . . . · f2'(a1)) · f1'(a)

and the delta's are the partial products: specifically

delta[zi] = fm'(am−1) · . . . · fi+1'(ai)
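For example, for the pure chain z1 = square(z0); z2 = square(z1) with f = z2,
evaluated at z0 = a, we have a1 = a^2 and a2 = a^4. Backprop sets
delta[z2] = 1, then delta[z1] = 1 · f2'(a1) = 2a^2, and finally
delta[z0] = 2a^2 · f1'(a) = 2a^2 · 2a = 4a^3, which is indeed d(a^4)/da.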

1.6 Constructing Wengert lists
Wengert lists are useful but tedious to program in. Usually they are con-
structed using some sort of programming-language extension. You will be pro-
vided a package, xman.py, to construct Wengert lists from Python expres-
sions: xman is short for "expression manager". Here is an example of using
xman:

from xman import *
...
class f(XManFunctions):
    @staticmethod
    def half(a):
        ...
class Triangle(XMan):
    h = f.input()
    w = f.input()
    area = f.half(h*w)
    ...
xm = Triangle().setup()
print xm.operationSequence(xm.area)

In the definition of Triangle, the variables h, w, and area are called registers.
Note that after creating an instance of a subclass of xman.XMan, you need to
call setup(), which returns the newly-created instance. After the setup you
can call the operationSequence method to construct a Wengert list, which
will be encoded in Python as

[('z1', 'mul', ['h', 'w']),
 ('area', 'half', ['z1'])]
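This list can be fed straight into the eval_list/backprop_list sketches from
earlier, assuming 'mul' and 'half' entries in G and DG; the lambdas below
(with half(a) = a/2, as the register name area suggests) are guesses, not the
actual xman operators.

G.update( { "mul": lambda a, b: a * b, "half": lambda a: 0.5 * a } )
DG.update({ "mul": [ (lambda a, b: b), (lambda a, b: a) ],
            "half": [ lambda a: 0.5 ] })

triangle_list = [ ("z1", "mul", ["h", "w"]),
                  ("area", "half", ["z1"]) ]

areaval, val = eval_list(triangle_list, {"h": 3.0, "w": 4.0})   # areaval = 6.0
delta = backprop_list(triangle_list, val, "area")
# d(area)/dh = w/2 = 2.0 and d(area)/dw = h/2 = 1.5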

Internally this works as follows. There are two types of Python objects,
called Registers and Operations. A Register corresponds to a variable, and
an Operation corresponds to a function call.
The base XManFunctions class defines a method input() which creates a
register object that is marked as an “input”, meaning that it has no definition.
(A similar method param() creates a register object that is marked as a
“parameter”, which like an input has no definition.) It also defines a few
functions that correspond to operators, like mul and add:

class XManFunctions(object):
    @staticmethod
    def input(default=None):
        return Register(role='input',default=default)
    ...
    @staticmethod
    def mul(a,b):
        return XManFunctions.registerDefinedByOperator('mul',a,b)
    ...
    @staticmethod
    def registerDefinedByOperator(fun,*args):
        reg = Register(role='operationOutput')
        op = Operation(fun,*args)
        reg.definedAs = op
        op.outputReg = reg
        return reg

Figure 1: The Python data structures created by the Triangle class. Blue
objects are Registers, and green ones are Operations. Arrows are pointers.

Each of these operator functions returns a Register object that is cross-linked
to an Operation object, as illustrated in Figure 1. (The Register class also
uses Python's operator overloading so that syntax like h*w is expanded to
XManFunctions.mul(h,w).)
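The overloading itself presumably amounts to defining methods like __mul__ on
the Register class; the fragment below is a guess at the relevant part of that
class, not the actual xman.py source.

class Register(object):
    def __init__(self, role, default=None):
        self.role = role
        self.default = default
        self.definedAs = None        # filled in by registerDefinedByOperator

    def __mul__(self, other):
        # h * w builds an Operation via XManFunctions.mul instead of multiplying numbers
        return XManFunctions.mul(self, other)

    # other overloaded operators (e.g. __add__) would be defined the same way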
To produce the Wengert list, the setup() command uses Python in-
trospection methods to add names to each register, based on the Python
variable that points to it, and generates new variable names for any reach-
able registers that cannot be named with Python variables. Finally, the
operationSequence method does a pre-order traversal of the data structure to
create a Wengert list.
