Lec 06
1
References and Slide Credits
• Slides from Deep Learning for Computer Vision, Prof. Yu-Chiang Frank Wang, National Taiwan University
• Slides from Machine Learning, Prof. Hung-Yi Lee, EE, National Taiwan University
• Slides from CE 5554 / ECE 4554: Computer Vision, Prof. J.-B. Huang, Virginia Tech
• http://cs231n.stanford.edu/syllabus.html
• Marc'Aurelio Ranzato, Tutorial in CVPR 2014
• Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning (https://www.deeplearningbook.org/)
• Bishop, Pattern Recognition and Machine Learning
• Reference papers
2
Outline
• Introduction of neural network
• Go deeper
• Introduction of convolutional neural network (CNN)
• Modern CNN models
3
History of Neural Networks and Deep Learning [Prof. Hung-Yi Lee]
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey, “Deep learning,” Nature, 2015.
4
How Powerful?
• Object Recognition: [comparison figure, not deep-learning vs. deep-learning based approaches]
Source:
https://devblogs.nvidia.com/parallelforall/mocha-jl-deep-learning-julia/
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
5
Biological neuron and Perceptrons
9
Recap: Linear Classification
• Linear Classifier
• Take the input image as a vector x and the linear classifier as a matrix W.
We compute y = Wx + b as a 10-dimensional output vector, giving one score per class.
• For example, for an image with 2 x 2 pixels and 3 classes of interest,
we need to learn a linear classifier W (plus a bias b)
so that desirable outputs y = Wx + b can be expected.
11
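To make the shapes concrete, here is a small NumPy sketch of the linear classifier y = Wx + b; the image size, class count, and random weights are placeholders for illustration.

```python
import numpy as np

# Hypothetical sizes: a 2 x 2 RGB image flattened to 12 values, 3 classes.
num_pixels = 2 * 2 * 3
num_classes = 3

x = np.random.rand(num_pixels)                        # flattened input image
W = np.random.randn(num_classes, num_pixels) * 0.01   # weights (learned in practice)
b = np.zeros(num_classes)                             # bias

y = W @ x + b                                         # one score per class
predicted_class = int(np.argmax(y))                   # pick the highest-scoring class
```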
Multi-Layer Perceptron: A Nonlinear Classifier (cont’d)
12
Layer 1 in MLP
13
Layer 2 in MLP
14
Multi-Layer Perceptron: A Nonlinear Classifier (cont’d)
15
Let’s Get a Closer Look…
• A single neuron
[Diagram: inputs to neuron → activity of neuron → output of neuron; sigmoid curve rising from 0 to 1 over the range −5 to 5]
16
Input-Output Function of a Single Neuron
w = [0,1]
[Surface and contour plot of the neuron output over z1, z2 in [−5, 5]]
x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2)))
17
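The surfaces on these slides can be reproduced with a few lines of NumPy; the helper below is a sketch of the neuron function x(z1, z2) with the slide's example weights and no bias term.

```python
import numpy as np

def neuron_output(z1, z2, w):
    """Sigmoid neuron with weights w = [w1, w2] and no bias."""
    return 1.0 / (1.0 + np.exp(-(w[0] * z1 + w[1] * z2)))

# Evaluate over the grid shown on the slides, z1, z2 in [-5, 5]
z1, z2 = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
surface = neuron_output(z1, z2, w=[0, 1])
```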
Input-Output Function of a Single Neuron
w = [0.2,1]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
18
Input-Output Function of a Single Neuron
w = [0.3,0.9]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
19
Input-Output Function of a Single Neuron
w = [0.5,0.9]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
20
Input-Output Function of a Single Neuron
w = [0.6,0.8]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
21
Input-Output Function of a Single Neuron
w = [0.8,0.6]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
22
Input-Output Function of a Single Neuron
w = [0.9,0.5]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
23
Input-Output Function of a Single Neuron
w = [0.9,0.3]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
24
Input-Output Function of a Single Neuron
w = [1,0.2]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
25
Input-Output Function of a Single Neuron
w = [1,0]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
26
Input-Output Function of a Single Neuron (cont’d)
w = [0,1]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
27
Input-Output Function of a Single Neuron (cont’d)
w = [0,2]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
28
Input-Output Function of a Single Neuron (cont’d)
w = [0,3]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
29
Input-Output Function of a Single Neuron (cont’d)
w = [0,4]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
30
Input-Output Function of a Single Neuron (cont’d)
w = [0,5]
[Surface and contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))) over z1, z2 in [−5, 5]]
31
Input-Output Function of a Single Neuron (cont’d)
w = [0,1]
[Contour plot of x(z1, z2) = 1 / (1 + exp(−(w1 z1 + w2 z2))): the direction of w sets the direction of the decision boundary, and its magnitude sets the steepness of the boundary]
32
Weight Space of a Single Neuron
W = [2,2]
[Grid of neuron output surfaces x(z1, z2) for weight vectors sampled across the (W1, W2) weight space; W = [2,2] highlighted]
33
Training a Single Neuron
[Scatter plot of two classes of 2-D training data]
34
Training a Single Neuron
[Scatter plot of two classes of 2-D training data]
35
Training a Single Neuron
[Scatter plot of two classes of 2-D training data, each point labelled by its class]
objective function:
w = [0,−1]
[Left: training data and decision boundary for the current w; right: neuron output surface x(z1, z2); bottom: objective value vs. iteration]
38
Training a Single Neuron
w = [0.4,−0.7]
[Decision boundary and output surface for the current w; objective value vs. iteration]
39
Training a Single Neuron
w = [0.9,−0.2]
[Decision boundary and output surface for the current w; objective value vs. iteration]
40
Training a Single Neuron
w = [1.1,0.1]
[Decision boundary and output surface for the current w; objective value vs. iteration]
41
Training a Single Neuron
w = [1.4,0.4]
[Decision boundary and output surface for the current w; objective value vs. iteration]
42
Training a Single Neuron
w = [5.2,12.6]
[Decision boundary and output surface for the current w; objective value vs. iteration]
43
Training a Single Neuron
w = [9.7,25.3]
[Decision boundary and output surface for the current w; objective value vs. iteration, up to 50 iterations]
44
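A minimal sketch of what this sequence of slides illustrates: fitting a single sigmoid neuron to two-class data by gradient descent. The data points, learning rate, and iteration count are placeholders, and the objective is taken to be the negative log-likelihood of the labels.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical two-class training data in 2-D (placeholders for the slide's data)
Z = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, 0, 0])          # class labels

w = np.zeros(2)                      # initial weights
lr = 0.5                             # learning rate (placeholder)

for it in range(50):
    x = sigmoid(Z @ w)               # neuron outputs for all training points
    grad = Z.T @ (x - t)             # gradient of the negative log-likelihood
    w -= lr * grad                   # gradient descent step
```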
Overfitting and Weight Decay
objective function:
[Two-class training data with fitted decision boundaries; objective value vs. iteration for the original and regularised objectives]
46
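The "regularised" curves correspond to adding a weight-decay penalty to the objective, which keeps the weights from growing without bound. A sketch under the same placeholder assumptions as the training sketch above, with a hypothetical penalty strength lam:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

Z = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -2.0], [-2.0, -1.0]])  # placeholder data
t = np.array([1, 1, 0, 0])

w, lr, lam = np.zeros(2), 0.5, 0.1   # lam is the weight-decay strength (placeholder)

for it in range(50):
    x = sigmoid(Z @ w)
    # gradient of: negative log-likelihood + (lam/2) * ||w||^2
    grad = Z.T @ (x - t) + lam * w
    w -= lr * grad                    # the penalty keeps w from growing without bound
```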
Training a Single Neuron (cont’d)
[Decision boundaries for the original vs. regularised objective; objective value vs. iteration]
47
Training a Single Neuron (cont’d)
[Decision boundaries for the original vs. regularised objective; objective value vs. iteration]
48
Training a Single Neuron (cont’d)
[Decision boundaries for the original vs. regularised objective; objective value vs. iteration]
49
Training a Single Neuron (cont’d)
[Decision boundaries for the original vs. regularised objective; objective value vs. iteration]
50
Training a Single Neuron (cont’d)
[Decision boundaries for the original vs. regularised objective; objective value vs. iteration]
51
Single Hidden Layer Neural Networks
[Diagram: inputs → hidden layer → output layer]
52
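A small NumPy sketch of the forward pass of such a network; the layer sizes (2 inputs, 3 hidden units, 1 output) and random weights are placeholders.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(z, W1, b1, W2, b2):
    """Forward pass of a single-hidden-layer network with sigmoid units."""
    h = sigmoid(W1 @ z + b1)     # hidden layer activations
    y = sigmoid(W2 @ h + b2)     # output layer
    return y

# Hypothetical sizes: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
y = mlp_forward(np.array([1.0, -2.0]), W1, b1, W2, b2)
```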
Sampling Random Neural Network Classifiers
[Output surfaces x(z1, z2) of randomly sampled single-hidden-layer network classifiers over z1, z2 in [−5, 5]]
53
Training a Neural Network with a Single Hidden Layer
objective function: likelihood, same as before
54
Training a Neural Network with a Single Hidden Layer
Networks with hidden layers can be fit by gradient descent, using an algorithm called back-propagation.
objective function: likelihood, same as before
55
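A hedged sketch of back-propagation for a single-hidden-layer network with sigmoid units, assuming the same likelihood (cross-entropy) objective; the data, layer sizes, and learning rate are placeholders.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Placeholder batch: 8 points with 2 features, binary targets
rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 2))
t = rng.integers(0, 2, size=(8, 1)).astype(float)

W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # 3 hidden units -> 1 output
lr = 0.1

for it in range(100):
    # forward pass
    h = sigmoid(Z @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # backward pass (gradients of the negative log-likelihood)
    dy = y - t                          # output-layer error
    dW2, db2 = h.T @ dy, dy.sum(0)
    dh = (dy @ W2.T) * h * (1 - h)      # back-propagate through the sigmoid
    dW1, db1 = Z.T @ dh, dh.sum(0)
    # gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```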
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
56
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
57
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
58
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
59
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
60
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
61
Training a Neural Network with a Single Hidden Layer
[Decision surface x(z1, z2) of the network at this stage of training]
62
Hierarchical Models with Many Layers
[Diagram: inputs → multiple hidden layers → output layer]
63
Convolutional Neural Networks (CNN):
Local Connectivity
[Diagram: input layer and hidden layer with local connectivity; filter weights w1…w9 shared across positions; two input channels and two filters (Filter 1, Filter 2)]
71
Ref: Marc'Aurelio Ranzato, Tutorial in CVPR2014
Generalized to 2D Cases:
72
Ref: Marc'Aurelio Ranzato, Tutorial in CVPR2014
Generalized to 2D Cases:
73
Ref: Marc'Aurelio Ranzato, Tutorial in CVPR2014
Convolutional Layer
Input Output
74
Convolutional Layer
Input Output
75
Convolutional Layer
Input Output
76
Convolutional Layer
Input Output
77
Convolutional Layer
Input Output
78
Convolutional Layer
Input Output
79
Convolutional Layer
Input Output
80
81
Ref: Marc'Aurelio Ranzato, Tutorial in CVPR2014
Putting them together → CNN
• Local connectivity
• Weight sharing
• Handling multiple input channels
• Handling multiple output maps
Weight sharing
Local connectivity
86
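The two ideas in the bullets can be written out directly: every output value looks only at a local patch (local connectivity), and the same filter weights are reused at every position (weight sharing). The helper below is a naive sketch with placeholder sizes, not an efficient implementation.

```python
import numpy as np

def conv2d(x, filters):
    """x: (C_in, H, W) input; filters: (C_out, C_in, k, k). Valid convolution, stride 1."""
    c_in, H, W = x.shape
    c_out, _, k, _ = filters.shape
    out = np.zeros((c_out, H - k + 1, W - k + 1))
    for f in range(c_out):                            # one output map per filter
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                patch = x[:, i:i+k, j:j+k]            # local connectivity: a k x k patch
                out[f, i, j] = np.sum(patch * filters[f])  # same weights at every (i, j)
    return out

x = np.random.rand(3, 32, 32)                         # e.g. a 32 x 32 RGB image
filters = np.random.randn(6, 3, 5, 5)                 # six 5 x 5 x 3 filters
y = conv2d(x, filters)                                # shape (6, 28, 28): 32 -> 28 spatially
```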
Putting them together (cont’d)
• The brain/neuron view of CONV layer
90
Putting them together (cont’d)
• The brain/neuron view of CONV layer
91
Putting them together (cont’d)
• The brain/neuron view of CONV layer
92
Putting them together (cont’d)
• Image input with 32 x 32 pixels convolved repeatedly with 5 x 5 x 3
filters shrinks volumes spatially (32 -> 28 -> 24 -> …).
93
Variations of Convolution
• Zero Padding
• Output is the same size as input (doesn’t shrink as the network gets deeper).
94
Variations of Convolution
• Stride
• Step size across signals
95
Variations of Convolution
• Stride
• Step size across signals
96
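Combining kernel size, zero padding, and stride, the spatial output size follows the standard relation below; the helper name is just for illustration.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution layer."""
    return (in_size + 2 * padding - kernel) // stride + 1

conv_output_size(32, kernel=5)                        # 28: shrinks, as on the earlier slide
conv_output_size(32, kernel=5, padding=2)             # 32: zero padding keeps the size
conv_output_size(32, kernel=5, stride=2, padding=2)   # 16: stride 2 halves the size
```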
Nonlinearity Layer in CNN
99
Nonlinearity Layer
• E.g., ReLU (Rectified Linear Unit)
• Element-wise computation of max(0, x)
100
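As a one-line sketch, ReLU applied element-wise to a placeholder activation map:

```python
import numpy as np

feature_map = np.random.randn(6, 28, 28)   # placeholder activation maps
relu_out = np.maximum(0.0, feature_map)    # max(0, x), applied element by element
```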
Receptive Field
• For a convolution with kernel size n x n,
each entry in the output layer depends on an n x n receptive field in the input layer.
• Thus, for large images, many layers are needed before each output entry can “see” the entire input image.
Possible solution → downsample the image/feature map (see the pooling layer next).
104
Pooling Layer
• Makes the representations smaller and more manageable
• Operates over each activation map independently
• E.g., Max Pooling
105
Pooling Layer for 2D Cases
• Reduces the spatial size and provides spatial invariance
106
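A sketch of 2 x 2 max pooling with stride 2 applied to a single activation map (each map is pooled independently); the map size is a placeholder.

```python
import numpy as np

def max_pool2x2(x):
    """x: (H, W) with even H, W. Returns (H//2, W//2) of 2x2 block maxima."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

a = np.random.randn(28, 28)     # one activation map
p = max_pool2x2(a)              # shape (14, 14)
```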
Fully Connected (FC) Layer in CNN
109
FC Layer
• Contains neurons that connect to the entire input volume,
as in ordinary neural networks
110
FC Layer
• Contains neurons that connect to the entire input volume,
as in ordinary neural networks
111
CNN
112
LeNet
• Proposed by Yann LeCun in the 1990s for handwritten digit recognition
• Already contains the elements of modern architectures
113
LeNet [LeCun et al. 1998]
115
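A PyTorch-style sketch of a LeNet-like network; the layer sizes follow the commonly cited LeNet-5 configuration, but the original used average pooling and tanh/sigmoid-like activations, so treat this as an approximation rather than the exact published model.

```python
import torch.nn as nn

class LeNetLike(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```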
AlexNet [Krizhevsky et al., 2012]
• Repopularized CNNs by winning the ImageNet Challenge 2012
• 7 hidden layers, 650,000 neurons, 60M parameters
• Error rate of 16% vs. 26% for the 2nd-place entry
116
Krizhevsky et al. “ImageNet classification with deep convolutional neural networks,” NIPS, 2012.
AlexNet
• Parameters
• Convolution: 1.89M parameters = 7.56MB
• Fully connected: 58.62M parameters = 234.49MB
• Computation
• Convolution: 591M Floating MAC
• Fully connected: 58.62M Floating MAC
• Full-HD 30fps: 805 GFLOPS (no overlap)
117
Krizhevsky et al. “ImageNet classification with deep convolutional neural networks,” NIPS, 2012.
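The conv-vs-FC split above can be sanity-checked by counting parameters layer by layer; the helper below uses AlexNet's first convolutional layer and its largest fully connected layer (sizes as commonly reported) as examples, and shows why the FC layers dominate the parameter count.

```python
def conv_params(c_in, c_out, k):
    return c_out * (c_in * k * k + 1)          # +1 for each filter's bias

def fc_params(n_in, n_out):
    return n_out * (n_in + 1)                  # +1 for each output unit's bias

conv_params(3, 96, 11)        # AlexNet conv1: ~35K parameters
fc_params(256 * 6 * 6, 4096)  # AlexNet's largest FC layer: ~37.7M parameters
```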
Deep or Not?
• Depth of the network is critical for performance.
118
Ultra Deep Network
[Chart: ImageNet top-5 error vs. network depth]
AlexNet (2012): 8 layers, 16.4%
VGG (2014): 19 layers, 7.3%
GoogleNet (2014): 22 layers, 6.7%
Residual Net (2015): far deeper (its depth is compared to Taipei 101 on the slide)
Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
VGG (2014)
• Parameters:
• Convolution: ~14M, 56MB
• Fully connected: ~124M, 496MB
• Computation:
• Convolution: 15.52G Floating MAC
• Fully connected: 123.63M Floating MAC
• Full-HD 30fps: 19.3 TFLOPS (no overlap)
125
Simonyan and Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556v6, Sept. 2014
ResNet (2016)
• Can we improve performance just by adding more layers?
128
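ResNet's answer is to add identity skip connections, so each block only needs to learn a residual on top of its input, which makes much deeper networks trainable. A simplified PyTorch-style sketch of a basic residual block (the published blocks also handle stride and channel changes in the shortcut):

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)      # skip connection: output = F(x) + x
```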
ResNeXt (2017)
• Deeper and wider → better… what else?
• Increase cardinality (the number of parallel transformation paths within each block)
132
Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." CVPR, 2017.
Squeeze-and-Excitation Net (SENet)
• How can we improve accuracy without much overhead?
• Feature recalibration (channel attention)
133
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." CVPR, 2018.
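A simplified PyTorch-style sketch of the squeeze-and-excitation idea: globally average-pool each channel (squeeze), pass the result through a small bottleneck MLP (excitation), and rescale the channels with the resulting weights.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: global average per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = self.pool(x).view(n, c)                      # (N, C) channel descriptors
        w = self.fc(s).view(n, c, 1, 1)                  # per-channel attention weights
        return x * w                                     # excitation: recalibrate channels
```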
Various Deep Learning Models…
131
Ref: Bianco et al., "Benchmark Analysis of Representative Deep Neural Network Architectures," arXiv:1810.00736.