
1

Lesson 6:
Software and hardware for
deep learning

2
Outline
1. Hardware for deep learning (CPU vs GPU)
2. Deep learning frameworks
3. Accelerators and compression tools

3
Hardware for deep learning
(CPU vs GPU)

4
A not-so-common computer

5
CPU vs GPU

• CPU: fewer cores, but each core is much faster and much more capable; great at sequential tasks.
• GPU: many more cores, but each core is much slower and “dumber”; great for parallel tasks.

6
Example: Matrix multiplication

• Much more efficient on a GPU: each output element is an independent dot product, so the work can be spread across the GPU's many cores (a small sketch follows).
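A minimal sketch comparing the same matrix multiplication on CPU and GPU with PyTorch; matrix sizes are placeholders and actual speedups depend on the hardware.

import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

c_cpu = a @ b                        # runs on the CPU

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    c_gpu = a_gpu @ b_gpu            # same operation, parallelized across GPU cores
    torch.cuda.synchronize()         # wait for the asynchronous GPU kernel to finish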

7
GigaFLOPS per dollar

8
CPU vs GPU in practice

9
CPU vs GPU in practice (2)

• cuDNN is much faster than hand-written, “unoptimized” CUDA kernels

10
CPU vs GPU vs TPU

• TPU: Specialized hardware for deep learning

11
GigaFLOPS per dollar

12
NVIDIA DGX-2

13
NVIDIA edge computing

14
Google Coral

15
ARM edge computing

16
ARM NPU

17
Programming GPUs
• CUDA (NVIDIA only)
  • Write C-like code that runs directly on the GPU
  • Optimized APIs: cuBLAS, cuFFT, cuDNN, etc.
• OpenCL
  • Similar to CUDA, but runs on anything
  • Usually slower on NVIDIA hardware
• HIP: https://github.com/ROCm-Developer-Tools/HIP
  • Newer project that automatically converts CUDA code into something that can run on AMD GPUs
CPU / GPU Communication

• If you aren't careful, training can bottleneck on reading data and transferring it to the GPU!
• Solutions:
  • Read all data into RAM
  • Use an SSD instead of an HDD
  • Use multiple CPU threads/processes to prefetch data (a DataLoader sketch follows below)
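A minimal sketch of the prefetching remedy using torch.utils.data.DataLoader; the random dataset and batch size are placeholders.

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10000, 3, 32, 32), torch.randint(0, 10, (10000,)))
loader = DataLoader(dataset,
                    batch_size=128,
                    shuffle=True,
                    num_workers=4,      # background CPU workers read/preprocess batches
                    pin_memory=True)    # page-locked host memory speeds up host-to-GPU copies

# On platforms that spawn worker processes, wrap this loop in `if __name__ == '__main__':`.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for x, y in loader:
    x = x.to(device, non_blocking=True)   # overlap the copy with computation
    y = y.to(device, non_blocking=True)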
Deep learning frameworks

20
There are many ...

21
Computational graphs
Computational graphs (2)
Computational graphs (3)
• Pros (of writing the graph directly in plain numpy):
  • Clean API, easy to write numeric code
• Cons:
  • Have to compute our own gradients
  • Can't run on GPU

24
Computational graphs (4)

• Programming is just like Numpy!


25
Computational graphs (5)

• PyTorch can compute gradients automatically (autograd)

26
The point of deep learning frameworks
• Quick to develop and test new ideas
• Automatically compute gradients
• Run it all efficiently on a GPU (wrapping cuDNN, cuBLAS, OpenCL, etc.)
PyTorch: Tensors
• Like a numpy array, but can run on the GPU
• The data model and API are almost the same as numpy's
• A sketch of training a two-layer neural network using only PyTorch Tensors follows below
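A minimal sketch in the cs231n style the slide refers to: a two-layer ReLU network trained with plain PyTorch Tensors and hand-written gradients. Sizes and the learning rate are placeholders.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

lr = 1e-6
for t in range(500):
    h = x.mm(w1)                          # forward pass
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    grad_y_pred = 2.0 * (y_pred - y)      # backward pass, written by hand
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    w1 -= lr * grad_w1                    # gradient descent update
    w2 -= lr * grad_w2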

28
PyTorch: Autograd
• Create Tensors with requires_grad=True to enable autograd
• Operations inside torch.no_grad() are not recorded in the computational graph
• We need to zero the gradients before each backward pass, because PyTorch accumulates gradients across successive backward passes (this accumulation is convenient when training RNNs)
• A sketch of the autograd version of the two-layer network follows below
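A minimal sketch of the same network with autograd: weights are created with requires_grad=True, loss.backward() fills .grad, and the update runs inside torch.no_grad(). Sizes and the learning rate are placeholders.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

lr = 1e-6
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    loss.backward()                  # autograd computes w1.grad and w2.grad

    with torch.no_grad():            # keep the update step out of the graph
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()              # zero the grads, otherwise they accumulate
        w2.grad.zero_()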

29
PyTorch: nn
• Higher-level wrapper for working with neural networks
• Makes programming easier (sketch below)
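A minimal sketch using the torch.nn wrapper: the model and loss are built from predefined modules instead of raw tensor operations. Sizes and the learning rate are placeholders.

import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

lr = 1e-4
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    model.zero_grad()
    loss.backward()
    with torch.no_grad():                      # manual SGD step on every parameter
        for param in model.parameters():
            param -= lr * param.grad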

30
PyTorch: nn (2)
• It is possible to define new modules in PyTorch
• Modules can contain weights or other modules
• PyTorch automatically handles autograd for new modules (sketch below)
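A minimal sketch of a user-defined module: weights live in sub-modules declared in __init__, the forward pass goes in forward(), and autograd handles the backward pass. Names and sizes are placeholders.

import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)   # sub-modules own their weights
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

model = TwoLayerNet(1000, 100, 10)
y_pred = model(torch.randn(64, 1000))             # backward works with no extra code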

31
PyTorch: optim
• Ready-made optimization algorithms such as Adam are available in torch.optim (sketch below)
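A minimal sketch of torch.optim: the optimizer applies the update and zeroes the gradients, replacing the manual parameter loop above. Model and data are placeholders.

import torch

model = torch.nn.Linear(1000, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x, y = torch.randn(64, 1000), torch.randn(64, 10)
for t in range(500):
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()     # clear old gradients
    loss.backward()           # compute new gradients
    optimizer.step()          # Adam update of all parameters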

32
PyTorch: Pretrained models
• PyTorch (via torchvision) ships several pre-trained models.
• These models can be used directly (sketch below).
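A minimal sketch assuming the torchvision package: loading a pretrained ResNet-18 and running it on a dummy batch. Note that newer torchvision versions replace pretrained=True with a weights= argument.

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)   # weights are downloaded on first use
model.eval()

with torch.no_grad():
    scores = model(torch.randn(1, 3, 224, 224))         # 1000 ImageNet class scores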

33
PyTorch: Visdom
• A tool that helps visualize the training process in the browser
• Currently does not support visualizing the structure of computational graphs

34
PyTorch: tensorboardX
• A Python wrapper around TensorFlow's web-based visualization tool (TensorBoard); usage sketch below
• pip install tensorboardX
• https://github.com/lanpa/tensorboardX
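A minimal sketch of logging a scalar with tensorboardX; the log directory name is a placeholder, and the loss here is random stand-in data. View the curves with "tensorboard --logdir runs".

import torch
from tensorboardX import SummaryWriter

writer = SummaryWriter('runs/experiment1')
for step in range(100):
    loss = torch.rand(1).item()               # placeholder for the real training loss
    writer.add_scalar('train/loss', loss, step)
writer.close()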

35
PyTorch: Dynamic computational graph
• Create tensors

36
PyTorch: Dynamic computational graph (2)
• Build graph and perform computation

37
PyTorch: Dynamic computational graph (3)
• Build graph and perform computation

38
PyTorch: Dynamic computational graph (4)
• Search the graph for the path from the objective function back to w1 and w2, then compute the gradients along it (backprop)

39
PyTorch: Dynamic computational graph (5)
• On the next iteration, the graph and backprop bookkeeping from the previous step are thrown away and everything is rebuilt from scratch
• This seems inefficient, especially when the same graph is built many times... (a sketch of why rebuilding can still be useful follows below)
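A minimal sketch of the upside of rebuilding the graph each iteration: ordinary Python control flow can change the graph from one step to the next. Shapes and the condition are placeholders.

import torch

w1 = torch.randn(10, 10, requires_grad=True)
w2 = torch.randn(10, 10, requires_grad=True)
x = torch.randn(5, 10)

for t in range(3):
    h = x.mm(w1)
    if h.sum() > 0:            # the graph built this iteration depends on the data
        y = h.mm(w2)
    else:
        y = h
    y.sum().backward()         # backprop through whichever graph was just built
    w1.grad = None             # discard gradients before the next, possibly different, graph
    w2.grad = None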

40
PyTorch: Static computation graphs
Static graph:
• Step 1: Build a computational graph describing our computation (including finding paths for backprop)
• Step 2: Reuse the same graph on every iteration

41
Tensorflow Pre2.0
• Step 1: Build a calculation graph
• Step 2: Run this calculation graph several times (sketch below)
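A hedged sketch of the pre-2.0 workflow, written against the tf.compat.v1 API so it also runs under TensorFlow 2.x: first build the graph, then run it repeatedly in a session. Shapes are placeholders.

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Step 1: build the graph (no computation happens yet)
x = tf.placeholder(tf.float32, shape=(64, 1000))
w = tf.Variable(tf.random_normal((1000, 10)))
y = tf.matmul(x, w)

# Step 2: run the same graph several times inside a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        out = sess.run(y, feed_dict={x: np.random.randn(64, 1000)})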

42
Tensorflow 2.0
• TensorFlow's Eager Execution mode is an imperative programming environment that executes operations immediately, without first building a computation graph
  • operations return concrete values instead of building a graph of the calculation to run later
• This makes it easier to get started with TensorFlow models and easier to debug (sketch below)
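A minimal sketch of eager execution: operations run immediately and return concrete values, so results can be printed and debugged like ordinary Python.

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.ones((2, 2))
c = tf.matmul(a, b)        # executed right away, no session needed
print(c.numpy())           # [[3. 3.] [7. 7.]]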

43
Tensorflow 2.0 vs Pre2.0

44
Tensorflow 2.0 vs Pre2.0

45
Tensorflow 2.0: Neural Network
• Turn numpy arrays into TF tensors

46
Tensorflow 2.0: Neural Network
• Use tf.GradientTape() to build dynamic computational
graphs

47
Tensorflow 2.0: Neural Network
• All operations in the forward step are tracked for later
gradient calculations.

48
Tensorflow 2.0: Neural Network
• tape.gradient() uses the previously tracked calculation
graph to calculate the gradient.

49
Tensorflow 2.0: Neural Network
• Neural network training: loop over computational
graphs, use gradients to update weights

50
Tensorflow 2.0: Neural Network
• A built-in optimization algorithm (optimizer) can be used to apply the gradients and update the weights

51
Tensorflow 2.0: Neural Network
• A predefined objective (loss) function can also be used (a combined sketch follows below)
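A minimal sketch tying the previous slides together: convert numpy data to tensors, record the forward pass with tf.GradientTape, and let a predefined optimizer and loss do the update. Sizes and the learning rate are placeholders.

import numpy as np
import tensorflow as tf

N, D_in, H, D_out = 64, 1000, 100, 10
x = tf.convert_to_tensor(np.random.randn(N, D_in), dtype=tf.float32)
y = tf.convert_to_tensor(np.random.randn(N, D_out), dtype=tf.float32)

w1 = tf.Variable(tf.random.normal((D_in, H)))
w2 = tf.Variable(tf.random.normal((H, D_out)))

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-6)
loss_fn = tf.keras.losses.MeanSquaredError()

for t in range(500):
    with tf.GradientTape() as tape:                   # forward ops are tracked here
        h = tf.maximum(tf.matmul(x, w1), 0.0)
        y_pred = tf.matmul(h, w2)
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, [w1, w2])             # gradients from the tracked graph
    optimizer.apply_gradients(zip(grads, [w1, w2]))   # update the weights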

52
Keras: High-Level wrapper
• Keras is a layer on top of TensorFlow that makes common things easy to do (it used to be third-party, and is now merged into TensorFlow as tf.keras); a sketch follows below
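A minimal sketch of the same two-layer network in Keras: define, compile, fit. Layer sizes and the random data are placeholders.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer='adam', loss='mse')
model.fit(np.random.randn(64, 1000), np.random.randn(64, 10), epochs=5, verbose=0)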

53
Keras: High-Level wrapper

54
Tensorflow 2.0: @tf.function
• The tf.function decorator (implicitly) compiles Python functions to a static graph for better performance
• The slide compares the forward-pass time of the same model under dynamic graph mode and static graph mode (a small timing sketch follows below)
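A minimal sketch of such a comparison: the same forward pass is timed eagerly and after wrapping it with tf.function, which traces it into a reusable static graph. Sizes are placeholders and the measured gap varies by model and hardware.

import timeit
import tensorflow as tf

w = tf.Variable(tf.random.normal((1000, 1000)))

def forward_eager(x):
    return tf.matmul(tf.nn.relu(tf.matmul(x, w)), w)

forward_graph = tf.function(forward_eager)    # same code, compiled to a graph on first call

x = tf.random.normal((64, 1000))
print('eager:', timeit.timeit(lambda: forward_eager(x), number=100))
print('graph:', timeit.timeit(lambda: forward_graph(x), number=100))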

55
TensorFlow: Pretrained Models
• tf.keras: https://www.tensorflow.org/api_docs/python/tf/keras/applications
• TF-Slim: https://github.com/tensorflow/models/tree/master/research/slim

56
TensorFlow: Tensorboard
• Add logging calls in the code to record the objective function, parameters, ...
• Run the tensorboard server and view the results in the browser

57
Static vs Dynamic

• With static graphs, the framework can optimize the graph for you before it runs!
58
Static vs Dynamic: Serialization
• Static: once the graph is built, it can be serialized and run without the code that built it
• Dynamic: graph building and execution are intertwined, so the code always needs to be kept around

59
Static PyTorch
• Caffe2: https://caffe2.ai/
• ONNX: https://github.com/onnx/onnx (an export sketch follows below)
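A hedged sketch of exporting a PyTorch model to ONNX, which yields a static graph that other runtimes (e.g. Caffe2 or ONNX Runtime) can load and execute; the output filename is a placeholder.

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)            # example input fixes the graph shapes
torch.onnx.export(model, dummy_input, 'resnet18.onnx')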

60
Accelerators and compression tools

61
Tensorflow Lite
• TensorFlow Lite is a set of tools for optimizing TensorFlow models, making them more compact and faster at inference on mobile platforms (conversion sketch below).
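A hedged sketch of converting a Keras model to TensorFlow Lite with optional post-training quantization to shrink it; the model and output filename are placeholders.

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(1000,))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables post-training quantization
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)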

62
NVIDIA TensorRT
• TensorRT is NVIDIA's SDK for high-performance inference: it optimizes trained networks (e.g. layer fusion, reduced FP16/INT8 precision) and runs them efficiently on NVIDIA GPUs.

63
Other tools
• PocketFlow: https://github.com/Tencent/PocketFlow
• Tencent NCNN: https://github.com/Tencent/ncnn

64
References
1. The lecture is based on Stanford's cs231n: http://cs231n.stanford.edu
2. TensorFlow vs Keras vs PyTorch: https://databricks.com/session/a-tale-of-three-deep-learning-frameworks-tensorflow-keras-pytorch
3. NVIDIA TensorRT: Fast Neural Network Inference with TensorRT on Autonomous
4. ARM chip: Design And Reuse 2018 Keynote

65
Thank you for your attention!!!

66
