Lesson 6:
Software and hardware for
deep learning
2
Outline
1. Hardware for deep learning (CPU vs GPU)
2. Deep learning frameworks
3. Accelerator and compression tools
3
Hardware for deep learning
(CPU vs GPU)
4
A not-so-common computer
5
CPU vs GPU
6
Example: Matrix multiplication
7
GigaFLOPs per 1$
8
CPU vs GPU in practice
9
CPU vs GPU in practice (2)
10
CPU vs GPU vs TPU
11
GigaFLOPs per 1$
12
NVIDIA DGX-2
13
NVIDIA edge computing
14
Google Coral
15
ARM edge computing
16
ARM NPU
17
Programming GPUs
• CUDA (NVIDIA only)
• Write C-like code that runs directly on the GPU
• Optimized APIs: cuBLAS, cuFFT, cuDNN, etc.
• OpenCL
• Similar to CUDA, but runs on anything
• Usually slower on NVIDIA hardware
• HIP https://github.com/ROCm-Developer-Tools/HIP
• New project that automatically converts CUDA code to something that can run on AMD GPUs
CPU / GPU Communication
20
There are many ...
21
Computational graphs
Computational graphs (2)
Computational graphs (3)
• Pros:
• Clean API, easy to write numeric code
• Cons:
• Have to compute our own gradients
• Can’t run on GPU
24
Computational graphs (4)
26
The point of deep learning frameworks
• Quick to develop and test new ideas
• Automatically compute gradients
• Run it all efficiently on GPU (wrapping cuDNN, cuBLAS, OpenCL, etc.)
PyTorch: Tensors
• Like a numpy array, but can run on GPU
• The data model and API are almost the same as numpy's
• Here is an example of training a two-layer neural network using PyTorch Tensors (see the sketch below)
28
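The example code from the slide is not reproduced in the text, so below is a minimal sketch of the same idea, assuming random data and arbitrary layer sizes and learning rate: a two-layer ReLU network trained purely with PyTorch tensor operations and hand-written gradients.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Random training data: N samples, D_in inputs, H hidden units, D_out outputs
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Randomly initialized weights of the two-layer network
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

learning_rate = 1e-6
for t in range(500):
    # Forward pass, written entirely with tensor operations
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: gradients computed by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent step
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2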
PyTorch: Autograd
• Create Tensors with requires_grad=True to enable autograd
• Operations wrapped in torch.no_grad() are not recorded in the computational graph
• We need to set the gradients to zero before starting backpropagation, because PyTorch accumulates gradients over successive backward passes. This behaviour is convenient when training RNNs.
29
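A minimal sketch of the same two-layer network using autograd (sizes and learning rate are again arbitrary): the backward pass is now done by loss.backward(), the weight update is wrapped in torch.no_grad(), and the gradients are zeroed after each step.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# requires_grad=True asks autograd to track operations on these tensors
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: autograd records the graph for us
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: populates w1.grad and w2.grad
    loss.backward()

    # The update itself should not be recorded in the graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Zero the gradients, otherwise the next backward() accumulates into them
        w1.grad.zero_()
        w2.grad.zero_()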
PyTorch: nn
• Higher-level wrapper for working with neural networks
• Makes programming easier
30
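A minimal sketch of the nn package (layer sizes, loss, and learning rate are arbitrary choices): the model is a Sequential stack of layers, nn owns the weights, and the update loop simply iterates over model.parameters().

import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Define the model as a sequence of layers; nn manages the weights
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    model.zero_grad()
    loss.backward()

    # Manual SGD step over all model parameters
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad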
PyTorch: nn (2)
• It is possible to define new modules in PyTorch
• Modules can contain weights or other modules
• PyTorch automatically handles autograd for new modules
31
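A minimal sketch of a custom module (the class name and sizes are illustrative): the module contains two Linear sub-modules, defines forward(), and autograd handles the backward pass automatically.

import torch

class TwoLayerNet(torch.nn.Module):
    # A module containing two Linear sub-modules; autograd is handled for us
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

model = TwoLayerNet(D_in=1000, H=100, D_out=10)
x = torch.randn(64, 1000)
y_pred = model(x)   # forward pass through the custom module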
PyTorch: optim
• Optimization algorithms such as Adam are available in PyTorch via torch.optim
32
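A minimal sketch using torch.optim (model, loss, and learning rate are arbitrary): the optimizer is built from model.parameters(), and zero_grad() / step() replace the manual update.

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(64, 1000)
y = torch.randn(64, 10)

for t in range(500):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()   # clear accumulated gradients
    loss.backward()         # compute new gradients
    optimizer.step()        # let Adam update the weights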
PyTorch: Pretrained models
• PyTorch has several pre-trained models available.
• These models can be used directly.
33
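The exact models shown on the slide are not reproduced, so here is a hedged sketch assuming the usual route via the separate torchvision package (newer torchvision versions use a weights= argument instead of pretrained=True).

import torch
import torchvision

# Load a ResNet-18 with ImageNet weights (downloaded on first use)
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Run it on a dummy batch of one 224x224 RGB image
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    scores = model(x)          # shape (1, 1000): ImageNet class scores
print(scores.argmax(dim=1))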
PyTorch: Visdom
• Tool to help visualize the calculation process
• Currently does not support the visualization of computational graph structures
34
PyTorch: tensorboardx
• A Python wrapper around TensorFlow's web-based visualization tool (TensorBoard)
• pip install tensorboardx
• https://github.com/lanpa/tensorboardX
35
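A minimal usage sketch (the log directory and tag name are arbitrary): scalars logged through SummaryWriter can then be viewed by pointing tensorboard --logdir at the log directory.

from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir='runs/experiment1')
for step in range(100):
    # Log a scalar; view it with:  tensorboard --logdir runs
    writer.add_scalar('train/loss', 1.0 / (step + 1), step)
writer.close()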
PyTorch: Dynamic computational graph
• Create tensor
36
PyTorch: Dynamic computational graph (2)
• Build graph and perform computation
37
PyTorch: Dynamic computational graph (3)
• Build graph and perform computation
38
PyTorch: Dynamic computational graph (4)
• Find the path on the graph from the objective function to w1 and w2 for backprop, then do the calculation
39
PyTorch: Dynamic computational graph (5)
• On the next iteration, the graph (and its backward paths) from the previous step is discarded and everything is rebuilt from scratch
• Seems inefficient, especially when building the same graph multiple times...
40
PyTorch: Static computation graphs
Static graph
Step 1: Build a computational graph describing our computation (including finding paths for backprop)
Step 2: Reuse the same graph on every iteration
41
Tensorflow Pre2.0
• Step 1: Build a calculation graph
• Step 2: Run this calculation graph several times
42
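A minimal sketch of the pre-2.0 style (written here with the tf.compat.v1 API so it also runs under TF 2.x; shapes and values are arbitrary): the graph is defined once and then executed repeatedly inside a session.

import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# Step 1: build the graph (no computation happens yet)
x = tf.compat.v1.placeholder(tf.float32, shape=(None, 3))
w = tf.Variable(tf.ones((3, 1)))
y = tf.matmul(x, w)

# Step 2: run the same graph several times inside a session
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    for _ in range(3):
        out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]})
        print(out)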
Tensorflow 2.0
• TensorFlow's eager execution mode is an imperative programming environment that executes operations immediately, without first building a computation graph
• Operations return concrete values instead of building a graph of the calculation to run later
• This makes it easier to get started with TensorFlow models and easier to debug
43
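A tiny illustration of eager execution (the values are arbitrary): the matmul runs immediately and returns a concrete tensor rather than a graph node.

import tensorflow as tf

# Eager execution is on by default in TF 2.x: operations run immediately
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
print(b.numpy())   # a concrete value, not a graph node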
Tensorflow 2.0 vs Pre2.0
44
Tensorflow 2.0 vs Pre2.0
45
Tensorflow 2.0: Neural Network
• Turn numpy array into TF tensor
46
Tensorflow 2.0: Neural Network
• Use tf.GradientTape() to build dynamic computational
graphs
47
Tensorflow 2.0: Neural Network
• All operations in the forward step are tracked for later
gradient calculations.
48
Tensorflow 2.0: Neural Network
• tape.gradient() uses the previously tracked calculation
graph to calculate the gradient.
49
Tensorflow 2.0: Neural Network
• Neural network training: build the graph inside a loop and use the gradients to update the weights (see the sketch below)
50
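A minimal sketch tying the previous slides together (random data, arbitrary sizes and learning rate): numpy arrays are converted to TF tensors, the forward pass is recorded under tf.GradientTape(), tape.gradient() returns the gradients, and the weights are updated in a loop.

import numpy as np
import tensorflow as tf

N, D_in, H, D_out = 64, 1000, 100, 10
# Turn numpy arrays into TF tensors
x = tf.convert_to_tensor(np.random.randn(N, D_in).astype(np.float32))
y = tf.convert_to_tensor(np.random.randn(N, D_out).astype(np.float32))

w1 = tf.Variable(tf.random.normal((D_in, H)) * 0.01)
w2 = tf.Variable(tf.random.normal((H, D_out)) * 0.01)

learning_rate = 1e-6
for step in range(500):
    # Operations inside the tape are tracked for gradient computation
    with tf.GradientTape() as tape:
        h = tf.maximum(tf.matmul(x, w1), 0.0)
        y_pred = tf.matmul(h, w2)
        loss = tf.reduce_sum((y_pred - y) ** 2)

    # tape.gradient() walks the recorded graph backwards
    grad_w1, grad_w2 = tape.gradient(loss, [w1, w2])
    w1.assign_sub(learning_rate * grad_w1)
    w2.assign_sub(learning_rate * grad_w2)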
Tensorflow 2.0: Neural Network
• A built-in optimization algorithm (optimizer) can be used to apply the gradients and update the weights
51
Tensorflow 2.0: Neural Network
• Predefined objective (loss) functions can also be used
52
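A minimal sketch using a built-in optimizer and a predefined loss (sizes and learning rate are arbitrary): tf.keras.optimizers.Adam applies the gradients and tf.keras.losses.MeanSquaredError serves as the objective.

import tensorflow as tf

x = tf.random.normal((64, 1000))
y = tf.random.normal((64, 10))

w1 = tf.Variable(tf.random.normal((1000, 100)) * 0.01)
w2 = tf.Variable(tf.random.normal((100, 10)) * 0.01)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()   # predefined objective

for step in range(500):
    with tf.GradientTape() as tape:
        h = tf.nn.relu(tf.matmul(x, w1))
        y_pred = tf.matmul(h, w2)
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, [w1, w2])
    # The optimizer applies the gradients and updates the weights
    optimizer.apply_gradients(zip(grads, [w1, w2]))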
Keras: High-Level wrapper
• Keras is a layer on top of TensorFlow that makes common things easy to do (it used to be a third-party library and is now merged into TensorFlow as tf.keras)
53
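A minimal Keras sketch on random data (layer sizes, loss, and epoch count are arbitrary): the model is defined, compiled, and trained in a few calls.

import tensorflow as tf

x = tf.random.normal((64, 1000))
y = tf.random.normal((64, 10))

# Keras builds, compiles, and trains the model with a few calls
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='mse')
model.fit(x, y, epochs=5, batch_size=64)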
Keras: High-Level wrapper
54
Tensorflow 2.0: @tf.function
• The tf.function decorator (implicitly) compiles Python functions to a static graph for better performance
• Here we compare the forward-pass time of the same model under dynamic graph mode and static graph mode
55
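A minimal sketch of such a comparison (model size and iteration count are arbitrary): the same forward pass is timed once eagerly and once wrapped in @tf.function, which traces it into a static graph on its first call.

import time
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10),
])
x = tf.random.normal((64, 1000))
model(x)                       # build the model's weights eagerly

def eager_forward(x):
    return model(x)

@tf.function                   # traced into a static graph on first call
def graph_forward(x):
    return model(x)

graph_forward(x)               # warm-up call triggers the tracing

for name, fn in [('eager', eager_forward), ('tf.function', graph_forward)]:
    start = time.time()
    for _ in range(100):
        fn(x)
    print(name, time.time() - start)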
TensorFlow: Pretrained Models
• tf.keras: https://www.tensorflow.org/api_docs/python/tf/keras/applications
• TF-Slim: https://github.com/tensorflow/models/tree/master/research/slim
56
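A minimal sketch using tf.keras.applications (the model choice and dummy input are illustrative; real images would need the model's preprocessing): ResNet-50 is loaded with ImageNet weights and run on one dummy image.

import tensorflow as tf

# ResNet-50 with ImageNet weights from tf.keras.applications
model = tf.keras.applications.ResNet50(weights='imagenet')
x = tf.random.normal((1, 224, 224, 3))
scores = model(x)       # shape (1, 1000): ImageNet class scores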
TensorFlow: Tensorboard
• Add logging calls in the code to record the objective function, parameters, ...
• Run the tensorboard server and view the results
57
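A minimal logging sketch with the tf.summary API (log directory, tag, and the stand-in loss values are arbitrary):

import tensorflow as tf

writer = tf.summary.create_file_writer('logs/run1')
with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)                # stand-in for a real training loss
        tf.summary.scalar('train/loss', loss, step=step)
# Then run:  tensorboard --logdir logs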
Static vs Dynamic
59
Static PyTorch
• Caffe2:
https://caffe2.ai/
• ONNX:
https://github.com/onnx/onnx
60
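As a hedged illustration of getting a static graph out of PyTorch, here is a minimal ONNX export sketch (the model and file name are arbitrary): the exported graph can then be run by other backends such as Caffe2 or ONNX Runtime.

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
# Trace the model and export the static graph in ONNX format
torch.onnx.export(model, dummy_input, 'resnet18.onnx')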
Accelerator and compression tools
61
Tensorflow Lite
• TensorFlow Lite is a set of tools for optimizing TensorFlow models, making them more compact and faster at inference on mobile platforms
62
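A minimal conversion sketch (the toy model is arbitrary): a Keras model is converted into a compact .tflite flatbuffer with the default optimizations enabled.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(1000,)),
])

# Convert the Keras model to a compact TensorFlow Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable e.g. weight quantization
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)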
NVIDIA TensorRT
• NVIDIA TensorRT is a library for optimizing trained networks for high-performance inference on NVIDIA GPUs (see reference 3)
63
Other tools
• PocketFlow: https://github.com/Tencent/PocketFlow
• Tencent NCNN: https://github.com/Tencent/ncnn
64
References
1. The lecture is based on Stanford's cs231n: http://cs231n.stanford.edu
2. TensorFlow vs Keras vs PyTorch: https://databricks.com/session/a-tale-of-three-deep-learning-frameworks-tensorflow-keras-pytorch
3. NVIDIA TensorRT: Fast Neural Network Inference with TensorRT on Autonomous
4. ARM chip: Design And Reuse 2018 Keynote
65
Thank you for your attention!
66