Lecture 8 – Computational Graphs, PyTorch and TensorFlow

The document discusses computation graphs and deep learning frameworks like PyTorch and TensorFlow. It defines a computation graph as a directed acyclic graph with nodes for variables and operations. Computation graphs allow automatic calculation of gradients using backpropagation. Frameworks like PyTorch and TensorFlow implement computation graphs with dynamic "define by run" graphs or static "define and run" graphs. This allows automatic calculation of gradients for training neural networks.


Lecture 8 – Computational Graphs; PyTorch and TensorFlow

DD2424

April 11, 2019

DD2424 - Lecture 8 1
Outline

• First Part
  • Computation Graphs
  • TensorFlow
  • PyTorch
  • Notes

• Second Part

DD2424 - Lecture 8 2
Frameworks

DD2424 - Lecture 8 3
Frameworks

DD2424 - Lecture 8 4
O’Reilly Poll: Most popular framework for machine learning

[ Source: https://www.techrepublic.com/google-amp/article/most-popular-programming-language-frameworks-and-tools-for-machine-learning/ ]

DD2424 - Lecture 8 5
What are computation graphs?

DD2424 - Lecture 8 6
Computation Graph

• DAG (directed acyclic graph)

• Nodes
  • Variables
  • Mathematical Operations

• Edges
  • Feeding input

[Figure: var → op → var]

DD2424 - Lecture 8 7
Computation Graph

• c = a + b

[Figure: inputs a and b feed an addition node that outputs c = a + b]

DD2424 - Lecture 8 8
Computation Graph

• c = a + b * 2

[Figure: b feeds the node z = b * 2; then a and z feed the node c = a + z]

DD2424 - Lecture 8 9
Computation Graph

• Tensors: Multi-dimensional arrays


• a = Wx + b

[Figure: W and x feed the node z = Wx; then z and b feed the node a = z + b]

DD2424 - Lecture 8 10
Computation Graph

• A feed-forward neural network

[Figure: inputs W1, x, b1 feed the chain z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1)]

DD2424 - Lecture 8 11
Computation Graph

• A multi-layer feed-forward neural network

[Figure: z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → z2 = W2 s1 → a2 = z2 + b2 → s2 = σ(a2), with inputs x, W1, b1, W2, b2]

DD2424 - Lecture 8 12
Python (NumPy)

[Code screenshot: the graph z = Wx, a = z + b written directly in NumPy]
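The code screenshot from this slide is not preserved in the transcript. A minimal NumPy sketch of the same two-node graph, with shapes chosen only for illustration:

import numpy as np

# toy shapes, chosen only for illustration
W = np.random.randn(3, 4)   # weight matrix
x = np.random.randn(4)      # input vector
b = np.random.randn(3)      # bias vector

z = W.dot(x)                # node 1: z = W x
a = z + b                   # node 2: a = z + b
print(a.shape)              # (3,)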
DD2424 - Lecture 8 13
PyTorch

[Code screenshots: the same graph z = Wx, a = z + b written in NumPy and in PyTorch, side by side]
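The screenshots are likewise missing; a minimal PyTorch sketch of the same computation (the tensor API is deliberately close to NumPy), again with illustrative shapes:

import torch

W = torch.randn(3, 4)       # weight matrix
x = torch.randn(4)          # input vector
b = torch.randn(3)          # bias vector

z = torch.matmul(W, x)      # node 1: z = W x
a = z + b                   # node 2: a = z + b
print(a.shape)              # torch.Size([3])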
DD2424 - Lecture 8 14
PyTorch

[Code screenshots: NumPy vs. PyTorch comparison]

Not always!

DD2424 - Lecture 8 15
PyTorch-NumPy

• Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

DD2424 - Lecture 8 16
PyTorch-NumPy

• Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

Shared Memory
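A minimal sketch of the conversion in both directions. On the CPU the Torch tensor and the NumPy array share the same underlying memory, so an in-place change to one is visible in the other:

import numpy as np
import torch

t = torch.ones(3)
n = t.numpy()               # Torch tensor -> NumPy array (shares memory on CPU)
t.add_(1)                   # in-place change to the tensor ...
print(n)                    # ... is visible in the array: [2. 2. 2.]

a = np.ones(3)
t2 = torch.from_numpy(a)    # NumPy array -> Torch tensor (also shares memory)
np.add(a, 1, out=a)
print(t2)                   # tensor([2., 2., 2.], dtype=torch.float64)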

DD2424 - Lecture 8 17
PyTorch-NumPy

• Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

DD2424 - Lecture 8 18
“Define by Run” Computation Graphs

This kind of computation graph is called “define by run”: the graph is built on the fly as the operations execute.

Also referred to as “dynamic”.
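An illustrative sketch (not from the slides): in PyTorch the graph is built while the Python code executes, so ordinary control flow can give a different graph on every run, depending on the data:

import torch

x = torch.randn(3, requires_grad=True)

# The graph is recorded as this code runs: which branch is taken,
# and hence which ops end up in the graph, depends on the data.
if x.sum() > 0:
    y = (x * 2).sum()
else:
    y = (x ** 2).sum()

y.backward()     # backprop through whatever graph was actually built
print(x.grad)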

DD2424 - Lecture 8 19
“Define and Run” Computation Graphs

• First define the graph structure

• Then run it by feeding in the (input) variables.

Define graph G: inputs W1, x, b1 feed the chain z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1)

Run the graph G:
• Run G with x1, W1, b1
• Run G with x2, W2, b2
• …

Also known as “static graphs” (see the TensorFlow sketch below)
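A minimal TensorFlow 1.x sketch of this pattern (an assumption about what the lecture's code looked like, not a reproduction of it): the graph for s1 = σ(W1 x + b1) is defined once with placeholders and then run repeatedly with different inputs:

import numpy as np
import tensorflow as tf   # TensorFlow 1.x API

# Define graph G once
x  = tf.placeholder(tf.float32, shape=[4, 1])
W1 = tf.placeholder(tf.float32, shape=[3, 4])
b1 = tf.placeholder(tf.float32, shape=[3, 1])

z1 = tf.matmul(W1, x)     # z1 = W1 x
a1 = z1 + b1              # a1 = z1 + b1
s1 = tf.sigmoid(a1)       # s1 = sigma(a1)

# Run G many times, feeding in different values each time
with tf.Session() as sess:
    for _ in range(2):
        out = sess.run(s1, feed_dict={
            x:  np.random.randn(4, 1).astype(np.float32),
            W1: np.random.randn(3, 4).astype(np.float32),
            b1: np.random.randn(3, 1).astype(np.float32)})
        print(out)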

DD2424 - Lecture 8 20
Define the graph, then run the graph many times

DD2424 - Lecture 8 21
TensorFlow
Data loop

• Dynamic Graph • Static Graph

DD2424 - Lecture 8 22
Why computation graphs at all?!

DD2424 - Lecture 8 23
Why computation graphs?

• In lecture 3, you’ve learnt how to do backprop using the chain rule

DD2424 - Lecture 8 24
Why computation graphs?

• Is it feasible to do this by hand for a large network?

DD2424 - Lecture 8 25
Why computation graphs?

• Automatic chain rule

• Automatic back-prop using implemented operations
  • Each operation has its gradient already implemented
  • If you want to use a novel operation, you have to provide its gradient w.r.t. its inputs and its learnable parameters (if any); see the sketch below
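As an illustration of supplying the gradient for a novel operation, here is a hedged PyTorch sketch using torch.autograd.Function; the "Cube" op is invented purely for this example (in TF 1.x the analogous mechanism is a custom op with a registered gradient):

import torch

class Cube(torch.autograd.Function):
    # An invented "novel" op: f(x) = x^3, with a hand-written gradient.

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)          # keep what backward will need
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 3 * x ** 2   # dL/dx = dL/df * df/dx

x = torch.randn(4, requires_grad=True)
y = Cube.apply(x).sum()
y.backward()
print(torch.allclose(x.grad, 3 * x.detach() ** 2))   # True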

DD2424 - Lecture 8 26
Let’s look at examples in PyTorch and TensorFlow

DD2424 - Lecture 8 27
Computation Graph

• A feed-forward neural network

[Figure: inputs W1, x, b1 feed the chain z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1)]

DD2424 - Lecture 8 28
Computation Graph

• A feed-forward neural network with squared 𝐿2 loss

[Figure: inputs W1, x, b1, y feed the chain z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2]

DD2424 - Lecture 8 29
Backprop in Computation Graph

• Learnable parameters

[Figure: the graph z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2, with the learnable parameters W1 and b1 highlighted]

DD2424 - Lecture 8 30
Backprop in Computation Graph

[Figure: the graph z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2, annotated with ∂l/∂W1 at W1 and ∂l/∂b1 at b1]

DD2424 - Lecture 8 31
Backprop in Computation Graph

∂l/∂W1 = (∂l/∂s1)(∂s1/∂a1)(∂a1/∂z1)(∂z1/∂W1)        ∂l/∂b1 = (∂l/∂s1)(∂s1/∂a1)(∂a1/∂b1)

[Figure: the graph z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2, with the local gradients ∂z1/∂W1, ∂a1/∂z1, ∂s1/∂a1, ∂l/∂s1 and ∂a1/∂b1 written on the corresponding edges]

DD2424 - Lecture 8 32
Backprop in Computation Graph

A deep learning framework provides automatic calculation of the gradients of its output variables w.r.t. its input variables.

[Figure: the same graph with the local gradients ∂z1/∂W1, ∂a1/∂z1, ∂s1/∂a1, ∂l/∂s1 and ∂a1/∂b1 on the edges; the framework chains them together automatically into ∂l/∂W1 and ∂l/∂b1]

DD2424 - Lecture 8 33
Backprop in Computation Graph

• Addition Node
  • Forward pass: a = b + c
  • Backward pass: ∂a/∂b = 1 and ∂a/∂c = 1

DD2424 - Lecture 8 34
Backprop in Computation Graph

• Max Node
  • Forward pass: a = max(b, c)
  • Backward pass:
    • If b < c: ∂a/∂b = 0 and ∂a/∂c = 1
    • If b > c: ∂a/∂b = 1 and ∂a/∂c = 0

DD2424 - Lecture 8 35
Variables and Ops

• Ops
  • Intermediate or final nodes

• Variables
  • Intrinsic parameters of the model
  • Input to the model

[Figure: the graph z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2, with the variables W1, x, b1, y as inputs and the remaining nodes as ops]

DD2424 - Lecture 8 36
Variables and Ops

• Ops
  • Intermediate or final nodes

• Variables
  • Intrinsic parameters of the model
  • Input to the model

[Figure: the graph z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2, with the variables W1, x, b1, y as inputs and the remaining nodes as ops]

DD2424 - Lecture 8 37
Variables and Ops

• Variables
  • Intrinsic parameters of the model
  • Input to the model

• TensorFlow
  • Variables (for the model parameters)
  • Placeholders (for the inputs)

• PyTorch
  • Variables

[Figure: the graph z1 = W1 x → a1 = z1 + b1 → s1 = σ(a1) → l = |s1 − y|^2 with inputs W1, x, b1, y]

DD2424 - Lecture 8 38
PyTorch Autograd

• package: torch.autograd

[Figure: a Variable wraps the data Tensor, the gradient w.r.t. this variable, and the Function that created this variable]

DD2424 - Lecture 8 39
PyTorch Autograd

DD2424 - Lecture 8 40
PyTorch Autograd

DD2424 - Lecture 8 41
PyTorch Autograd

DD2424 - Lecture 8 42
PyTorch Autograd

DD2424 - Lecture 8 43
PyTorch Autograd

• Calculate gradients using the backward() method of a Variable

• var.backward()
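A minimal sketch of the pattern for the network above (in recent PyTorch versions the Variable wrapper is merged into Tensor, so requires_grad=True plays the same role):

import torch

W1 = torch.randn(3, 4, requires_grad=True)
b1 = torch.randn(3, requires_grad=True)
x  = torch.randn(4)
y  = torch.randn(3)

s1 = torch.sigmoid(torch.matmul(W1, x) + b1)   # forward pass
l  = torch.sum((s1 - y) ** 2)                  # squared L2 loss

l.backward()            # backprop from the scalar loss
print(W1.grad.shape)    # gradients accumulate in .grad
print(b1.grad.shape)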

DD2424 - Lecture 8 44
TensorFlow gradients

• Add gradient nodes to the graph where necessary using

  tf.gradients(ys, xs, grad_ys)

• And evaluate them in a session
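A hedged TF 1.x sketch of this workflow, with illustrative shapes: placeholders hold the data and the current parameter values, tf.gradients adds the gradient nodes, and the parameter update (next slide) is done outside the graph with the evaluated gradients:

import numpy as np
import tensorflow as tf   # TensorFlow 1.x API

x  = tf.placeholder(tf.float32, [4, 1])
y  = tf.placeholder(tf.float32, [3, 1])
W1 = tf.placeholder(tf.float32, [3, 4])
b1 = tf.placeholder(tf.float32, [3, 1])

s1   = tf.sigmoid(tf.matmul(W1, x) + b1)
loss = tf.reduce_sum(tf.square(s1 - y))

# Add gradient nodes for dloss/dW1 and dloss/db1 to the graph
grad_W1, grad_b1 = tf.gradients(loss, [W1, b1])

W1_val = np.random.randn(3, 4).astype(np.float32)
b1_val = np.random.randn(3, 1).astype(np.float32)
lr = 0.1

with tf.Session() as sess:
    for _ in range(3):
        feed = {x:  np.random.randn(4, 1).astype(np.float32),
                y:  np.random.randn(3, 1).astype(np.float32),
                W1: W1_val, b1: b1_val}
        gW, gb = sess.run([grad_W1, grad_b1], feed_dict=feed)
        W1_val -= lr * gW     # "then update the parameters", here in NumPy
        b1_val -= lr * gb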

DD2424 - Lecture 8 45
TensorFlow gradients

• Then update the parameters

DD2424 - Lecture 8 46
TensorFlow gradient

• Use tf.Variable instead

DD2424 - Lecture 8 47
How to use GPU?

DD2424 - Lecture 8 48
PyTorch GPU

Turn variables into “GPU” variables by the following command:

• var = var.cuda(#)

DD2424 - Lecture 8 49
PyTorch GPU

Turn back variables into “CPU” variables by the following command:

• var = var.cpu()
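A minimal sketch combining the two commands, guarded so it also runs on a CPU-only machine (newer PyTorch code typically uses .to(device) instead):

import torch

var = torch.randn(3, 4)

if torch.cuda.is_available():
    var = var.cuda(0)       # move to GPU 0
    print(var.device)       # cuda:0

var = var.cpu()             # move back to the CPU
print(var.device)           # cpu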

DD2424 - Lecture 8 50
TensorFlow GPU

• In TF, variables and operations can sit on a specific device

• tf.device('/gpu:0')
• tf.device('/gpu:1')
• …
• tf.device('/cpu:0')
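A short TF 1.x sketch of device placement (note the device strings are quoted):

import tensorflow as tf   # TensorFlow 1.x API

with tf.device('/gpu:0'):              # ops created here are pinned to GPU 0
    a = tf.random_normal([2, 3])
    b = tf.random_normal([3, 2])
    c = tf.matmul(a, b)

with tf.device('/cpu:0'):              # and this op to the CPU
    d = tf.reduce_sum(c)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                      log_device_placement=True)) as sess:
    print(sess.run(d))                 # soft placement falls back to CPU if no GPU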

DD2424 - Lecture 8 51
TensorFlow GPU

• In TF, variables and operations can sit on a specific device


tf.Session(config=tf.ConfigProto(log_device_placement=True))

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0


2018-04-10 12:59:09.508497: I tensorflow/core/common_runtime/placer.cc:874] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
add: (Add): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-10 12:59:09.508513: I tensorflow/core/common_runtime/placer.cc:874] add: (Add)/job:localhost/replica:0/task:0/device:GPU:0
Maximum: (Maximum): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-10 12:59:09.508525: I tensorflow/core/common_runtime/placer.cc:874] Maximum: (Maximum)/job:localhost/replica:0/task:0/device:GPU:0
Maximum/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-10 12:59:09.508537: I tensorflow/core/common_runtime/placer.cc:874] Maximum/y: (Const)/job:localhost/replica:0/task:0/device:GPU:0
Placeholder_2: (Placeholder): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-10 12:59:09.508548: I tensorflow/core/common_runtime/placer.cc:874] Placeholder_2: (Placeholder)/job:localhost/replica:0/task:0/device:GPU:0
Placeholder_1: (Placeholder): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-10 12:59:09.508558: I tensorflow/core/common_runtime/placer.cc:874] Placeholder_1: (Placeholder)/job:localhost/replica:0/task:0/device:GPU:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-10 12:59:09.508567: I tensorflow/core/common_runtime/placer.cc:874] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/device:GPU:0

DD2424 - Lecture 8 52
TensorFlow GPU

• Some TF operations do not have a CUDA implementation

tf.Session(config=tf.ConfigProto(
allow_soft_placement=True, log_device_placement=True))

DD2424 - Lecture 8 53
How to implement complicated models in practice?

DD2424 - Lecture 8 54
PT High-Level Library

• PyTorch package called nn and class called Module
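A minimal sketch of the nn.Module pattern; the two-layer network and its sizes are invented for illustration:

import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super(TwoLayerNet, self).__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)    # W1 x + b1
        self.fc2 = nn.Linear(d_hidden, d_out)   # W2 s1 + b2

    def forward(self, x):
        s1 = torch.sigmoid(self.fc1(x))
        return self.fc2(s1)

net = TwoLayerNet(4, 8, 3)
out = net(torch.randn(2, 4))    # a batch of 2 inputs
print(out.shape)                # torch.Size([2, 3])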

DD2424 - Lecture 8 55
TF High-Level Libraries

• Keras: highest abstraction (see the sketch below)

• SLIM: best pre-trained models
• TFLearn
• Sonnet
• Pretty Tensor
• …
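A minimal Keras sketch of the "highest abstraction" point; the layer sizes and random data are purely illustrative:

import numpy as np
import tensorflow as tf

# A small fully connected classifier, defined and trained in a few lines
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='sigmoid', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')

x = np.random.randn(32, 4)
y = np.random.randint(0, 3, size=32)
model.fit(x, y, epochs=1, batch_size=8)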

DD2424 - Lecture 8 56
Data, storage, and loading

!!!Important!!!

• Always monitor CPU/GPU usage (Linux: nvidia-smi, top)

• Make storage more efficient (TFRecords, etc.)

• Make the reading pipeline more efficient (parallel readers, prefetching, etc.); see the sketch below
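A hedged sketch of an efficient tf.data input pipeline with parallel preprocessing and prefetching; the preprocess function is invented, and the PyTorch counterpart would be a DataLoader with num_workers > 0:

import tensorflow as tf

def preprocess(x):
    # invented placeholder for decoding / augmentation work
    return tf.cast(x, tf.float32) / 255.0

dataset = (tf.data.Dataset.range(1000)
           .map(preprocess, num_parallel_calls=4)  # parallel readers/decoders
           .batch(32)
           .prefetch(1))                           # overlap loading with training
# In TF 1.x you would then create an iterator, e.g. dataset.make_one_shot_iterator()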

DD2424 - Lecture 8 57
Use Visualization

• Always monitor the loss function on the training and validation sets visually
• Monitor all other important scalars, such as the learning rate, regularization loss,
  layer activation summaries, how full your data queues are, and …

• If you have an imbalanced classification problem, visualize the CE loss separately
  for each class

• If you work with images, visualize samples from the batch from time to time; if you do
  data augmentation, visualize the original sample as well as the augmented one

• TensorBoard for TF (see the sketch below)
• TensorBoardX, matplotlib, seaborn, … for PT
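A minimal TF 1.x sketch of logging a scalar to TensorBoard (a tensorboardX SummaryWriter gives PyTorch users essentially the same API); the tag, log directory, and loss values are illustrative:

import tensorflow as tf   # TensorFlow 1.x API

loss = tf.placeholder(tf.float32, [])
tf.summary.scalar('train_loss', loss)        # scalar curve shown in TensorBoard
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('./logs', sess.graph)
    for step in range(10):
        summary = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        writer.add_summary(summary, step)
    writer.close()
# then inspect with: tensorboard --logdir ./logs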
DD2424 - Lecture 8 58
Use Visualization

You can have the configuration shown as a text file in TensorBoard!

DD2424 - Lecture 8 59
Which one is better? PyTorch or TensorFlow?

DD2424 - Lecture 8 60
pros and cons

• PyTorch: easier for prototyping


• PyTorch: much easier to implement flexible graphs
• PyTorch: different structures in each iteration (dependent on data). This is possible with TF too, but is a pain.
• PyTorch: manipulating weight and gradients
• PyTorch: code-level debugging (breakpoints, imperative, tracing your own code instead of TF kernels)
• PyTorch: probably better abstractions for dataset, variable, parallelism, etc. but TF has many high-level wrappers with better abstractions
• Tie?!: faster run-time (NHWC vs. NCHW)
• TF: TensorBoard
• TF: research-level debugging (TensorBoard)
• TF: windows
• TF: distributed training (PyTorch has it now too, but seems not as developed as the TF version)
• TF: easier with distributing the code over multiple devices (GPUs/CPU) (maybe not anymore)
• TF: online community is noticeably larger
• TF: data readers
• TF: supposedly more optimizations of the graph (done by the engine)
• TF: documentation and tutorials
• TF: more models available
• TF: Serialization, code and portability (saving and loading models across platforms, or checkpoints)
• TF: Deployment: Server, Mobile, etc. (TensorFlow Serving, TensorFlow Lite)
• TF: Richer API (e.g. FFT)
• TF: Automatic shape inference
• TF has a MOOC: https://eu.udacity.com/course/deep-learning--ud730

DD2424 - Lecture 8 61
TensorFlow Eager execution

• Eager Execution
• Dynamic!

• tf.enable_eager_execution() (see the sketch below)

• Considerably Slower (being worked on)

• https://www.tensorflow.org/guide/eager
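A minimal sketch (TF 1.x only; in TF 2.x eager execution is on by default and this call is gone):

import tensorflow as tf   # TensorFlow 1.x

tf.enable_eager_execution()      # must be called once, at program startup

# Ops now run immediately and return concrete values, no Session needed
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
print(b.numpy())                 # [[ 7. 10.] [15. 22.]]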

DD2424 - Lecture 8 62
Caffe(2)

• Portability is seamless (e.g. mobile apps)

• Simplest framework for fine-tuning or feature extraction

• Used to be fastest (Caffe)

DD2424 - Lecture 8 63
Summary

• Don’t take the following statements too seriously! It depends on many factors
  • If you want to use pretrained classic deep networks (AlexNet, VGG, ResNet, …) for feature extraction and/or fine-tuning → use Caffe and/or Caffe2
  • If you have a mobile application in mind → use Caffe/Caffe2 or TensorFlow
  • If you want something more pythonic → use PyTorch
  • If you are familiar with Matlab and don’t need much flexibility or advanced layers → use MatConvNet
  • If you don’t need so much flexibility and still want to use Python → use Keras
  • If you are working on NLP applications or complicated RNNs → use PyTorch
  • If you want large community help and sustainable learning of a framework → use TensorFlow
  • If you want to work on bleeding-edge papers → see what framework has the original and/or cleanest implementation (most likely TensorFlow)
  • If you want to prototype many different novel setups → use PyTorch or TF Eager

DD2424 - Lecture 8 64
