3 Lecture 21 01 25

The document outlines a lecture on deep neural networks (DNNs), covering topics such as convolution, pooling, normalization layers, and commonly used datasets and models. It includes detailed explanations of convolutional layer parameters, fully connected layers, and various DNN architectures like LeNet-5, AlexNet, and VGG-16. The lecture emphasizes the importance of techniques like batch normalization and the use of smaller stacked filters for efficient training and accuracy in DNNs.


E0 294: Systems for Machine Learning

Lecture #3
21st January 2025
Previous Class
▪ DNN Training and Inference
▪ CNN Basics

2
Today’s Agenda
▪ Example of convolution
▪ Non-linear operation
▪ Pooling operation
▪ Normalization layer
▪ Commonly used dataset
▪ Commonly used DNN models

3
CNN Parameters (Recap)

4
Conv Layer
▪ Filters are 4-dimensional
– R × S × C × M
▪ What are R, S, C, M?
– R: Height of the filter
– S: Width of the filter
– C: Number of channels
– M: Number of filters

5
Conv Layer Implementation
▪ Naïve 7-layer for-loop implementation:
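The listing on the slide is not reproduced in this text, so the following is a minimal NumPy sketch of the seven nested loops, assuming stride 1, no padding, and illustrative array names (inp, filters, out):

import numpy as np

# Shapes (small hypothetical example): N input fmaps of C x H x W,
# M filters of C x R x S, producing N output fmaps of M x E x F.
N, C, H, W = 2, 3, 8, 8
M, R, S = 4, 3, 3
E, F = H - R + 1, W - S + 1            # output height/width (stride 1, no padding)

inp = np.random.rand(N, C, H, W)
filters = np.random.rand(M, C, R, S)
out = np.zeros((N, M, E, F))

for n in range(N):                     # loop 1: batch
    for m in range(M):                 # loop 2: output channels (filters)
        for e in range(E):             # loop 3: output rows
            for f in range(F):         # loop 4: output columns
                for c in range(C):     # loop 5: input channels
                    for r in range(R):         # loop 6: filter rows
                        for s in range(S):     # loop 7: filter columns
                            out[n, m, e, f] += inp[n, c, e + r, f + s] * filters[m, c, r, s]

In practice this loop nest is exactly what convolution libraries reorder, tile, and vectorize.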

6
Conv Layer Computation

7
Conv Layer Computation

8
Conv Layer Computation

9
Conv Layer Computation

10
Conv Layer Computation

▪ Both channel 1 and channel 2 of the input on the left are used to generate channel 1 of the output

11
Conv Layer Computation

▪ We can flatten both the filters and the input feature map
– The computation is vector-matrix multiplication
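As a hedged illustration of this flattening (an im2col-style layout, not necessarily the exact ordering drawn on the slide), each receptive field of the input is unrolled into one column so that the convolution becomes a vector-matrix product:

import numpy as np

C, H, W = 2, 4, 4
R, S = 2, 2
E, F = H - R + 1, W - S + 1            # stride 1, no padding

inp = np.random.rand(C, H, W)
filt = np.random.rand(C, R, S)

# Build the flattened input: one column per output position,
# each column holds the C*R*S input values that position sees.
cols = np.zeros((C * R * S, E * F))
for e in range(E):
    for f in range(F):
        patch = inp[:, e:e + R, f:f + S]
        cols[:, e * F + f] = patch.ravel()

flat_filter = filt.ravel()             # flattened filter: 1 x (C*R*S)
out = flat_filter @ cols               # vector-matrix multiplication
out_fmap = out.reshape(E, F)           # back to a 2-D output channel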
12
Conv Layer Computation

▪ Filter 2 takes (channel 1, channel 2) of the input feature map, but produces channel 2 of the output feature map
▪ Similarly, Filter 1 takes (channel 1, channel 2) of the input and produces channel 1 of the output
13
Conv Layer Computation

▪ We can flatten both filters and input feature maps in the same way as before
▪ This naturally generates flattened output feature maps for the two channels

14
Conv Layer Computation

▪ What are R, S, C, M?
▪ R: Height of the filter
▪ S: Width of the filter
▪ C: Number of channels
▪ M: Number of filters

15
A Quantitative Example
An Example

17
Converting Filter Traces into Matrix

18
Filter Flattened into a Vector
▪ The matrix of weights for the convolutional layer can be flattened into a vector
K = [1., 2., 3., 4.]
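Continuing the slide's example, a minimal check that the flattened filter K = [1., 2., 3., 4.] applied as a dot product over every 2×2 window reproduces the convolution (the 3×3 input values below are made up for illustration):

import numpy as np

K = np.array([1., 2., 3., 4.])                  # flattened 2x2 filter
inp = np.arange(9, dtype=float).reshape(3, 3)   # hypothetical 3x3 input fmap

out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        patch = inp[i:i + 2, j:j + 2].ravel()   # flatten the 2x2 window
        out[i, j] = K @ patch                   # dot product with K
print(out)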

19
Fully Connected Layer: Extracting Parallelism
Batch of N Input fmaps
▪ A batch has N input fmaps, so we apply the same M filters of size C×H×W and generate N output fmaps of size 1×1×M

21
Fully Connected Layer

▪ We flatten three dimensions into one
– C (number of channels of the input fmap) and H×W (the filter size)
– All the filter weights for one output channel are flattened into one row of length CHW; we have M such rows, one per output channel
▪ To perform matrix multiplication, the input fmaps also need to be flattened to have CHW rows
– After multiplication, the output contains one point for each of the M output channels
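A minimal sketch of this flattening with made-up sizes: the M filters form an M × CHW weight matrix, a batch of N flattened input fmaps forms a CHW × N matrix, and one multiplication produces all M output points for every input in the batch.

import numpy as np

N, C, H, W, M = 4, 3, 7, 7, 10           # batch, input fmap dims, number of filters

filters = np.random.rand(M, C, H, W)     # FC filters cover the whole input fmap
inputs  = np.random.rand(N, C, H, W)

Wmat = filters.reshape(M, C * H * W)     # each row: one flattened filter
X    = inputs.reshape(N, C * H * W).T    # each column: one flattened input fmap

out = Wmat @ X                           # M x N: one point per output channel per input
assert out.shape == (M, N)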

22
Fully Connected Layer

▪ We flatten three dimensions into one
– C (number of channels of the input fmap) and H×W (the filter size)
– All the filter weights for one output channel are flattened into one row of length CHW; we have M such rows, one per output channel
▪ To perform matrix multiplication, the input fmaps also need to be flattened to have CHW rows
– After multiplication, the output contains one point for each of the M output channels

23
Fully Connected Layer

▪ We flatten three dimensions into one
– C (number of channels of the input fmap) and H×W (the filter size)
– All the filter weights for one output channel are flattened into one row of length CHW; we have M such rows, one per output channel
▪ To perform matrix multiplication, the input fmaps also need to be flattened to have CHW rows
– After multiplication, the output contains one point for each of the M output channels

24
Flattened Fully Connected Layer

▪ After flattening, having a batch size of N turns the matrix-vector operation into a
matrix-matrix multiplication

25
Flattened Fully Connected Layer

▪ After flattening, having a batch size of N turns the matrix-vector operation into a
matrix-matrix multiplication

26
Flattened Fully Connected Layer

▪ After flattening, having a batch size of N turns the matrix-vector operation into a
matrix-matrix multiplication

27
Flattened Fully Connected Layer

▪ After flattening, having a batch size of N turns the matrix-vector operation into a
matrix-matrix multiplication

28
Flattened Fully Connected Layer

▪ After flattening, having a batch size of N turns the matrix-vector operation into a
matrix-matrix multiplication
▪ How much temporal locality (reuse of data within a time frame) does this implementation have?
– None
29
Tiled Fully Connected Layer
▪ Matrix multiplication is tiled to fit in cache
▪ Computation is ordered to maximize reuse of data in the cache
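A minimal Python sketch of the tiling idea (the tile size T and the triple-blocked loop order below are illustrative, not the course's reference implementation): each T×T tile of the operands is reused across a whole block of the output while it is likely still resident in cache.

import numpy as np

def tiled_matmul(A, B, T=32):
    """Blocked GEMM: C = A @ B computed tile by tile."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for i0 in range(0, M, T):
        for j0 in range(0, N, T):
            for k0 in range(0, K, T):
                # Each tile of A and B is reused for the whole output block.
                C[i0:i0 + T, j0:j0 + T] += (
                    A[i0:i0 + T, k0:k0 + T] @ B[k0:k0 + T, j0:j0 + T]
                )
    return C

A = np.random.rand(100, 80)
B = np.random.rand(80, 60)
assert np.allclose(tiled_matmul(A, B), A @ B)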

30
Tiled Fully Connected Layer
▪ Implementation: Matrix Multiplication (GEMM)
– CPU: OpenBLAS, Intel MKL, etc.
– GPU: cuBLAS, cuDNN, etc.
▪ The library notes the shape of the matrix multiplication and selects an implementation optimized for that shape
▪ Optimization usually involves proper tiling to the storage hierarchy

31
GV100 – “Tensor Core”

▪ New opcodes
– Matrix Multiply Accumulate (HMMA)
▪ FP16 operands
– 48 inputs / 16 outputs
▪ 64 multiplies
▪ 64 adds
▪ 120 TFLOPS (FP16)
▪ 400 GFLOPS/W (FP16)
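A rough back-of-the-envelope check of where these numbers come from (an interpretation, not NVIDIA documentation): one HMMA step computes D = A×B + C on 4×4 matrices.

# One 4x4x4 matrix multiply-accumulate, D = A @ B + C
inputs  = 3 * (4 * 4)   # A, B and C operands: 48 input values
outputs = 4 * 4         # 16 output values
muls    = 4 * 4 * 4     # 64 multiplies
adds    = 4 * 4 * 4     # 64 adds (3 per dot product + 1 to accumulate C, times 16 outputs)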
32
Tensor Processing Unit

33
Today’s Agenda
▪ Example of convolution
▪ Non-linear operation
▪ Pooling operation
▪ Normalization layer
▪ Commonly used dataset
▪ Commonly used DNN models

34
Non-linear Operation

35
More Activation Functions
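The plots for these slides are not reproduced in this text; as a minimal sketch, a few commonly used non-linearities written directly in NumPy (this particular selection is illustrative and may differ from the figure):

import numpy as np

def relu(x):
    # max(0, x): the most common choice in modern CNNs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # like ReLU, but keeps a small slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    # squashes values into (0, 1); used in LeNet-5-era networks
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
print(relu(x), leaky_relu(x), sigmoid(x), np.tanh(x))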

36
Today’s Agenda
▪ Non-linear operation
▪ Pooling operation
▪ Normalization layer
▪ Commonly used dataset
▪ Commonly used DNN models

37
Pooling (Pool) Layer
▪ Reduce resolution of each channel independently
▪ Overlapping or non-overlapping
– Depends on stride
▪ Increases translational invariance and noise resilience

38
Translational Invariance

Case 1 vs Case 2: the output fmaps are similar

39
Translational Invariance
▪ Provides the same output independent of the location of the object within the
image
▪ Pooling helps to provide the invariance

40
Pooling Layer Implementation
▪ Naïve 6-layer for-loop implementation for max-pool
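The listing on the slide is not reproduced in this text; below is a minimal NumPy sketch of the six nested loops for 2×2 max-pooling with stride 2 (non-overlapping), using illustrative array names:

import numpy as np

N, C, H, W = 2, 3, 8, 8
R = S = 2                              # pooling window
stride = 2                             # non-overlapping
E, F = H // stride, W // stride

inp = np.random.rand(N, C, H, W)
out = np.full((N, C, E, F), -np.inf)

for n in range(N):                     # loop 1: batch
    for c in range(C):                 # loop 2: channels (pooled independently)
        for e in range(E):             # loop 3: output rows
            for f in range(F):         # loop 4: output columns
                for r in range(R):     # loop 5: window rows
                    for s in range(S): # loop 6: window columns
                        val = inp[n, c, e * stride + r, f * stride + s]
                        out[n, c, e, f] = max(out[n, c, e, f], val)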

41
Today’s Agenda
▪ Non-linear operation
▪ Pooling operation
▪ Normalization layer
▪ Commonly used dataset
▪ Commonly used DNN models

42
Normalization Layer
▪ Batch Normalization
– Normalizes activations towards mean = 0 and std dev = 1, based on the statistics of the training data set
– Put between conv/FC and activation function
▪ Believed to be key to getting high accuracy and faster training for DNNs

43
Normalization Layer
▪ The normalized values are further scaled and shifted
– The parameters are learnt through training
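A minimal sketch of batch normalization at training time, assuming NCHW activations and per-channel statistics; gamma and beta are the learned scale and shift mentioned above, and eps is a small constant to avoid division by zero:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W); statistics computed per channel over N, H, W
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var  = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)        # mean 0, std dev 1
    return gamma * x_hat + beta                    # learned scale and shift

x = np.random.rand(8, 4, 6, 6)
gamma = np.ones((1, 4, 1, 1))
beta  = np.zeros((1, 4, 1, 1))
y = batch_norm(x, gamma, beta)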

44
Today’s Agenda
▪ Non-linear operation
▪ Pooling operation
▪ Normalization layer
▪ Commonly used dataset
▪ Commonly used DNN models

45
Commonly used Dataset
▪ MNIST
– Digit Classification
– 28×28 pixels (B&W)
– 10 Classes
– 60,000 training
– 10,000 testing

46
LeNet-5
▪ Conv layers: 2
▪ Fully connected layers: 2
▪ Weights: 60k
▪ MACs: 341k
▪ Sigmoid used for non-linearity

47
LeNet-5

48
ImageNet
▪ Image classification
– 256×256 colour images
– 1000 classes
– 1.3M training images
– 100,000 testing images
▪ For the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
– Accuracy of the classification task is reported based on top-1/top-5 error
– What is top-K error?
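To answer the question on the slide: a prediction is counted as correct under the top-K metric if the true label appears among the K highest-scoring classes. A minimal sketch with made-up scores:

import numpy as np

def top_k_error(scores, labels, k=5):
    # scores: (num_samples, num_classes), labels: (num_samples,)
    topk = np.argsort(scores, axis=1)[:, -k:]          # k best-scoring classes per sample
    correct = np.any(topk == labels[:, None], axis=1)  # is the true label among them?
    return 1.0 - correct.mean()

scores = np.random.rand(100, 1000)    # hypothetical ILSVRC-style scores
labels = np.random.randint(0, 1000, size=100)
print(top_k_error(scores, labels, k=1), top_k_error(scores, labels, k=5))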

49
AlexNet (Krizhevsky et al., NeurIPS 2012)
▪ ILSVRC12 Winner
▪ Uses local response normalization (LRN)
▪ Structure
– 5 conv layers
– 3 fully connected layers
– Weights: 61M
– MACs: 724M
– ReLU used for non-linearity

50
AlexNet (Krizhevsky et al., NeurIPS 2012)
▪ ILSVRC12 Winner
▪ Uses local response normalization (LRN)
▪ Structure
– 5 conv layers
– 3 fully connected layers
– Weights: 61M
– MACs: 724M
– ReLU used for non-linearity

51
AlexNet: Large Sizes with Varying Shapes

52
VGG-16 (Simonyan et al., ICLR 2015)
▪ Conv layers: 13
▪ FC layers: 3
▪ Weights: 138M
▪ MACs: 15.5G
▪ There is a 19-layer version too (VGG-19)

53
Stacked Filters
▪ Deeper networks mean more weights
▪ Use a stack of smaller filters (3×3) to cover the same receptive field with fewer filter weights

54
Stacked Filters
▪ Deeper networks mean more weights
▪ Use a stack of smaller filters (3×3) to cover the same receptive field with fewer filter weights

55
Stacked Filters
▪ Deeper networks mean more weights
▪ Use a stack of smaller filters (3×3) to cover the same receptive field with fewer filter weights
▪ Non-linear activations are inserted between the stacked filters
– 5×5 filter (25 weights) → two 3×3 filters (18 weights)
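A quick check of the weight counts, generalized to a hypothetical layer with C input and C output channels (the 25-vs-18 comparison on the slide is the per-channel-pair case):

C = 64                                  # hypothetical channel count
weights_5x5     = 5 * 5 * C * C         # one 5x5 conv layer: 25 * C^2
weights_two_3x3 = 2 * (3 * 3 * C * C)   # two stacked 3x3 layers, same 5x5 receptive field: 18 * C^2
print(weights_5x5, weights_two_3x3)     # 102400 vs 73728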

56
Deep into Inception

57
1×1 Convolution

58
1×1 Convolution

59
1×1 Convolution

60
Inception V1
▪ Apply 1×1 before ‘large’ convolution filters
▪ Reduce weights such that the entire DNN can be trained on one GPU
▪ Number of multiplications reduced from 854M → 358M
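A worked example of how the 1×1 "bottleneck" cuts multiplications (the channel counts and fmap size below are illustrative, not the actual Inception layer sizes behind the 854M → 358M figure):

H = W = 28                                   # hypothetical output fmap size
C_in, C_out, C_mid = 192, 128, 64            # hypothetical channel counts

direct     = H * W * C_out * (5 * 5 * C_in)               # 5x5 conv directly on C_in channels
bottleneck = (H * W * C_mid * (1 * 1 * C_in)              # 1x1 conv to reduce channels first
              + H * W * C_out * (5 * 5 * C_mid))          # then 5x5 conv on the reduced channels
print(direct, bottleneck)                                 # ~482M vs ~170M multiplies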

61
THANK YOU
