Lecture 8: Computational Graphs, PyTorch, TensorFlow
DD2424
Outline
• First Part
• Computation Graphs
• TensorFlow
• PyTorch
• Notes
• Second Part
Frameworks
O’Reilly Poll: Most popular framework for machine learning
[Source: https://www.techrepublic.com/google-amp/article/most-popular-programming-language-frameworks-and-tools-for-machine-learning/]
What are computation graphs?
Computation Graph
• Nodes
• Variables
• Mathematical Operations
• Edges
• Feeding input to ops
(Figure: "var" and "op" nodes connected by edges)
Computation Graph
• $c = a + b$
Computation Graph
• $c = a + b \cdot 2$, built in the graph as $z = b \cdot 2$ and $c = a + z$
Computation Graph
$z = Wx$, $a = z + b$
Computation Graph
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, with inputs $x$, $W_1$, $b_1$
Computation Graph
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, then $z_2 = W_2 s_1$, $a_2 = z_2 + b_2$, $s_2 = \sigma(a_2)$
Python (NumPy)
$z = Wx$, $a = z + b$
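A minimal NumPy sketch of this forward pass; the shapes are illustrative assumptions, not from the slide:

```python
import numpy as np

# Forward pass z = Wx, a = z + b.
# Assumed shapes: W is 3x4, x has 4 entries, b has 3.
W = np.random.randn(3, 4)
x = np.random.randn(4)
b = np.random.randn(3)

z = W.dot(x)  # matrix-vector product
a = z + b     # elementwise addition
```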
PyTorch
• NumPy vs. PyTorch: the same graph, $z = Wx$, $a = z + b$, in both libraries
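The same forward pass as a PyTorch sketch; the tensor API closely mirrors NumPy (shapes again illustrative):

```python
import torch

# The same forward pass, now with torch tensors.
W = torch.randn(3, 4)
x = torch.randn(4)
b = torch.randn(3)

z = W.mv(x)  # matrix-vector product
a = z + b    # elementwise addition
```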
PyTorch
• The PyTorch tensor API mirrors NumPy, but not always!
PyTorch-NumPy
• Convert with torch.from_numpy(arr) and tensor.numpy()
• Shared memory: conversion does not copy the data
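A small sketch of the shared-memory behavior; the printed values follow from the in-place updates:

```python
import numpy as np
import torch

# torch.from_numpy wraps the array without copying, so in-place
# changes on either side are visible on the other.
arr = np.ones(3)
t = torch.from_numpy(arr)   # no copy
arr[0] = 42.0
print(t)                    # tensor([42., 1., 1.], dtype=torch.float64)

back = t.numpy()            # also no copy
t.add_(1.0)                 # in-place add on the tensor
print(back)                 # [43.  2.  2.]
```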
“Define by Run” Computation Graphs
“Define and Run” Computation Graphs
• Define graph $G$: $z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$
• Run $G$ with $x_1$, $W_1$, $b_1$
• Run $G$ with $x_2$, $W_2$, $b_2$
• …
• Define graph once, run graph many times
TensorFlow
• Data loop: feed a new batch through the same graph on every run (see the sketch below)
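A minimal define-and-run sketch, assuming TensorFlow 1.x as used in this lecture; names and shapes are illustrative:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Define the graph once: z1 = W1 x, a1 = z1 + b1, s1 = sigma(a1).
x  = tf.placeholder(tf.float32, shape=(4, 1))
W1 = tf.Variable(tf.random_normal((3, 4)))
b1 = tf.Variable(tf.zeros((3, 1)))
s1 = tf.sigmoid(tf.matmul(W1, x) + b1)

# Run it many times: the data loop feeds a new batch on every call.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        batch = np.random.randn(4, 1)
        out = sess.run(s1, feed_dict={x: batch})
```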
Why computation graphs at all?!
Why computation graphs?
• Is it feasible?
Let’s look at examples in PyTorch and TensorFlow
Computation Graph
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$
Computation Graph
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, $l = |s_1 - y|^2$
Backprop in Computation Graph
• Learnable parameters: $W_1$, $b_1$
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, $l = |s_1 - y|^2$
Backprop in Computation Graph
• Goal: compute $\frac{\partial l}{\partial W_1}$ and $\frac{\partial l}{\partial b_1}$
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, $l = |s_1 - y|^2$
Backprop in Computation Graph
• Chain rule along the graph:
$\frac{\partial l}{\partial W_1} = \frac{\partial z_1}{\partial W_1}\,\frac{\partial a_1}{\partial z_1}\,\frac{\partial s_1}{\partial a_1}\,\frac{\partial l}{\partial s_1}$ and $\frac{\partial l}{\partial b_1} = \frac{\partial a_1}{\partial b_1}\,\frac{\partial s_1}{\partial a_1}\,\frac{\partial l}{\partial s_1}$
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, $l = |s_1 - y|^2$
Backprop in Computation Graph
• Each node combines its local derivative (e.g. $\frac{\partial a_1}{\partial b_1}$) with the upstream gradient to obtain $\frac{\partial l}{\partial b_1}$
Backprop in Computation Graph
• Addition Node
• Forward pass: $a = b + c$
• Backward pass: $\frac{\partial a}{\partial b} = 1$ and $\frac{\partial a}{\partial c} = 1$
Backprop in Computation Graph
• Max Node
• Forward pass: $a = \max(b, c)$
• Backward pass:
• If $b < c$: $\frac{\partial a}{\partial b} = 0$ and $\frac{\partial a}{\partial c} = 1$
• If $b > c$: $\frac{\partial a}{\partial b} = 1$ and $\frac{\partial a}{\partial c} = 0$
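As a quick sanity check, PyTorch's autograd (introduced later in this lecture) reproduces these local gradients; a sketch with values chosen so that $b < c$:

```python
import torch

# Check the max node's local gradients with autograd (b < c here).
b = torch.tensor(2.0, requires_grad=True)
c = torch.tensor(5.0, requires_grad=True)
a = torch.max(b, c)    # forward pass: a = max(b, c) = 5
a.backward()           # backward pass
print(b.grad, c.grad)  # tensor(0.) tensor(1.)
```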
Variables and Ops
• Ops
• Intermediate or final nodes
• Variables
• Intrinsic parameters of the model
• Input to the model
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, $l = |s_1 - y|^2$
Variables and Ops
• Variables
• Intrinsic parameters of the model
• Input to the model
• TensorFlow
• Variables
• Placeholders
• PyTorch
• Variables
$z_1 = W_1 x$, $a_1 = z_1 + b_1$, $s_1 = \sigma(a_1)$, $l = |s_1 - y|^2$
PyTorch Autograd
• Package: torch.autograd
• A Variable wraps:
• .data: the underlying data tensor
• .grad: the gradient w.r.t. this variable
• .grad_fn: the function that created this variable
PyTorch Autograd
• var.backward() computes the gradients of var w.r.t. all learnable Variables in its graph (see the sketch below)
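A minimal autograd sketch of the one-layer graph above, using the modern requires_grad API rather than the old Variable wrapper; shapes are illustrative:

```python
import torch

# Build the one-layer graph and let autograd produce dl/dW1 and dl/db1.
x  = torch.randn(4, 1)
y  = torch.randn(3, 1)
W1 = torch.randn(3, 4, requires_grad=True)
b1 = torch.zeros(3, 1, requires_grad=True)

z1 = W1.mm(x)               # z1 = W1 x
a1 = z1 + b1                # a1 = z1 + b1
s1 = torch.sigmoid(a1)      # s1 = sigma(a1)
l  = ((s1 - y) ** 2).sum()  # l = |s1 - y|^2

l.backward()                         # backprop through the whole graph
print(W1.grad.shape, b1.grad.shape)  # torch.Size([3, 4]) torch.Size([3, 1])
```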
TensorFlow gradients
• tf.gradients(loss, [W1, b1]) adds gradient nodes to the graph
• And evaluate them with sess.run (see the sketch below)
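A minimal tf.gradients sketch, assuming TensorFlow 1.x; the model and shapes are illustrative:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Build a small graph and ask TF for symbolic gradient nodes.
x = tf.placeholder(tf.float32, shape=(4, 1))
W = tf.Variable(tf.random_normal((3, 4)))
l = tf.reduce_sum(tf.matmul(W, x) ** 2)

grad_W, = tf.gradients(l, [W])  # symbolic node for dl/dW

# ... and evaluate the gradient like any other node.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g = sess.run(grad_W, feed_dict={x: np.random.randn(4, 1)})
```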
How to use the GPU?
PyTorch GPU
• var = var.cuda(i) copies var to GPU number i
PyTorch GPU
• var = var.cpu() copies it back to the CPU
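A minimal sketch of moving tensors between devices, guarded so it also runs on CPU-only machines:

```python
import torch

# Move tensors to the GPU, compute there, bring the result back.
x = torch.randn(4, 1)
W = torch.randn(3, 4)

if torch.cuda.is_available():
    x = x.cuda(0)  # copy to GPU 0
    W = W.cuda(0)

z = W.mm(x)        # runs on whichever device the tensors live on
z = z.cpu()        # copy the result back to the CPU
```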
TensorFlow GPU
• tf.device('/gpu:0')
• tf.device('/gpu:1')
• …
• tf.device('/cpu:0')
TensorFlow GPU
tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
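A minimal device-placement sketch, assuming TensorFlow 1.x; the matrix sizes are arbitrary:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Pin ops to a device; allow_soft_placement lets TF fall back if the
# device is unavailable, log_device_placement prints where ops run.
with tf.device('/gpu:0'):
    a = tf.random_normal((1000, 1000))
    b = tf.random_normal((1000, 1000))
    c = tf.matmul(a, b)

config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(c)
```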
How to implement complicated models in practice?
PT High-Level Library
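A minimal sketch with torch.nn, PyTorch's high-level module library; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# The one-layer model from earlier slides, written with torch.nn.
model = nn.Sequential(
    nn.Linear(4, 3),  # z = Wx + b
    nn.Sigmoid(),     # s = sigma(z)
)

x = torch.randn(8, 4)  # a batch of 8 inputs
y = torch.randn(8, 3)
s = model(x)           # forward pass

loss = ((s - y) ** 2).mean()
loss.backward()        # gradients for all model parameters
```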
TF High-Level Libraries
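A minimal Keras sketch, one of several high-level options for TF in this era (e.g. tf.layers, Slim, Sonnet); layer sizes are illustrative:

```python
import tensorflow as tf

# The same one-layer model with the Keras API bundled in TF.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='sigmoid', input_shape=(4,)),
])
model.compile(optimizer='sgd', loss='mse')
# model.fit(x_train, y_train, epochs=10)  # given suitable NumPy arrays
```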
Data, storage, and loading
!!!Important!!!
Use Visualization
• Always monitor the loss on the training and validation sets visually
• Monitor all other important scalars, such as the learning rate, regularization loss, layer-activation summaries, how full your data queues are, …
• If you work with images, visualize samples from the batch from time to time; if you do data augmentation, visualize the original sample as well as the augmented one (see the sketch after this list)
• TensorBoard for TF
• TensorBoardX, matplotlib, seaborn, … for PT
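A minimal TensorBoard logging sketch, assuming TensorFlow 1.x; the summary names and shapes are illustrative:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Log scalars and image samples for TensorBoard.
loss = tf.placeholder(tf.float32)
images = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))

tf.summary.scalar('train_loss', loss)
tf.summary.image('batch_samples', images, max_outputs=4)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('./logs', sess.graph)
    # inside the training loop:
    # summary = sess.run(merged, feed_dict={loss: ..., images: ...})
    # writer.add_summary(summary, step)
```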
Which one is better? PyTorch or TensorFlow?
Pros and cons
TensorFlow Eager execution
• Eager Execution
• Dynamic! Define-by-run, like PyTorch
• tf.enable_eager_execution()
• https://www.tensorflow.org/guide/eager
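A minimal eager-execution sketch, assuming a TensorFlow 1.x version where eager is available:

```python
import tensorflow as tf  # assumes TF 1.x with eager support

tf.enable_eager_execution()  # must run before any graph ops

# Ops now execute immediately, define-by-run style, no Session needed.
a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0], [4.0]])
print(tf.matmul(a, b))  # tf.Tensor([[11.]], shape=(1, 1), dtype=float32)
```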
Caffe(2)
Summary
• Don’t take the following statements too seriously! It depends on many factors.
• If you want to use pretrained classic deep networks (AlexNet, VGG, ResNet, …) for feature extraction and/or fine-tuning → use Caffe and/or Caffe2
• If you have a mobile application in mind → use Caffe/Caffe2 or TensorFlow
• If you want something more Pythonic → use PyTorch
• If you are familiar with Matlab and don’t need much flexibility or advanced layers → use MatConvNet
• If you don’t need much flexibility but still want Python → use Keras
• If you are working on NLP applications or complicated RNNs → use PyTorch
• If you want a large community and sustainable learning of a framework → use TensorFlow
• If you want to work on bleeding-edge papers → see which framework has the original and/or cleanest implementation (most likely TensorFlow)
• If you want to prototype many different novel setups → use PyTorch or TF Eager