cs231n 2018 Lecture09
Administrative
A2 due Wed May 2
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 9 - 2 May 1, 2018
Last time: Deep learning frameworks
- PaddlePaddle (Baidu)
- Chainer
- Caffe (UC Berkeley) -> Caffe2 (Facebook)
- Torch -> PyTorch
- CNTK (Microsoft)
- MXNet (Amazon): developed by U Washington, CMU, MIT, Hong Kong U, etc. but main framework of choice at AWS
Today: CNN Architectures
Case Studies
- AlexNet
- VGG
- GoogLeNet
- ResNet
Also....
- NiN (Network in Network)
- Wide ResNet
- ResNeXT
- Stochastic Depth
- Squeeze-and-Excitation Network
- DenseNet
- FractalNet
- SqueezeNet
- NASNet
Review: LeNet-5
[LeCun et al., 1998]
Case Study: AlexNet
[Krizhevsky et al. 2012]
Architecture:
CONV1
MAX POOL1
NORM1
CONV2
MAX POOL2
NORM2
CONV3
CONV4
CONV5
MAX POOL3
FC6
FC7
FC8
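The spatial-size arithmetic behind this stack can be sketched in a few lines of Python, using the standard conv output formula (W - F + 2P)/S + 1. The filter sizes, strides, and paddings below are the usual AlexNet values and the lecture's 227x227 input convention; they are assumptions, since the list above names only the layers:

```python
# Sketch: spatial output sizes through AlexNet's conv/pool stack.
# Layer hyperparameters are the standard AlexNet ones (an assumption;
# the slide lists only the layer names).

def out_size(w, f, s, p=0):
    """Output width of a conv/pool: filter f, stride s, padding p."""
    return (w - f + 2 * p) // s + 1

w = 227                      # input image: 227x227x3
w = out_size(w, 11, 4)       # CONV1: 11x11 filters, stride 4 -> 55
assert w == 55
w = out_size(w, 3, 2)        # MAX POOL1: 3x3, stride 2 -> 27
w = out_size(w, 5, 1, p=2)   # CONV2: 5x5, pad 2 -> 27
w = out_size(w, 3, 2)        # MAX POOL2 -> 13
w = out_size(w, 3, 1, p=1)   # CONV3 -> 13
w = out_size(w, 3, 1, p=1)   # CONV4 -> 13
w = out_size(w, 3, 1, p=1)   # CONV5 -> 13
w = out_size(w, 3, 2)        # MAX POOL3 -> 6
print(w)                     # 6: POOL3 output feeds FC6
```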
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Bar chart of winning top-5 error by year: 19 layers (VGG), 22 layers (GoogLeNet), 152 layers (ResNet)]
ZFNet (2013): improved hyperparameters over AlexNet
ZFNet [Zeiler and Fergus, 2013]
AlexNet but:
CONV1: change from (11x11 stride 4) to (7x7 stride 2)
CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
ImageNet top 5 error: 16.4% -> 11.7%
Case Study: VGGNet
[Simonyan and Zisserman, 2014]
8 layers (AlexNet) -> 16-19 layers (VGG16Net)
Case Study: VGGNet
[Simonyan and Zisserman, 2014]
Q: Why use smaller filters? A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer, but is deeper, has more non-linearities, and uses fewer parameters
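A quick check of the parameter arithmetic behind VGG's all-3x3 design (C = 256 channels is chosen only for illustration):

```python
# Parameter count (ignoring biases): three stacked 3x3 conv layers vs one
# 7x7 conv layer, each with C input and C output channels. Both options
# cover a 7x7 effective receptive field.

def conv_params(f, c_in, c_out):
    return f * f * c_in * c_out

C = 256
three_3x3 = 3 * conv_params(3, C, C)   # 3 * (9 C^2) = 27 C^2
one_7x7 = conv_params(7, C, C)         # 49 C^2
print(three_3x3, one_7x7)              # 1769472 3211264
```

So for the same receptive field, the 3x3 stack needs 27C^2 parameters vs 49C^2 for the single 7x7 layer.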
INPUT: [224x224x3] memory: 224*224*3=150K params: 0 (not counting biases)
CONV3-64: [224x224x64] memory: 224*224*64=3.2M params: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64] memory: 224*224*64=3.2M params: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800K params: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M params: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6M params: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400K params: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K params: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K params: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K params: 0
FC: [1x1x4096] memory: 4096 params: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 params: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 params: 4096*1000 = 4,096,000
TOTAL memory: 24M * 4 bytes ~= 96MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters (VGG16)
Note: Most memory is in the early CONV layers; most params are in the late FC layers.
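The ~138M total can be re-derived directly from the per-layer counts in the table (biases not counted, matching the table's convention):

```python
# Re-derive the VGG16 parameter total from the per-layer counts above.

def conv_params(f, c_in, c_out):
    return f * f * c_in * c_out          # no bias term, as in the table

convs = [(3, 64), (64, 64),                         # conv block 1
         (64, 128), (128, 128),                     # conv block 2
         (128, 256), (256, 256), (256, 256),        # conv block 3
         (256, 512), (512, 512), (512, 512),        # conv block 4
         (512, 512), (512, 512), (512, 512)]        # conv block 5
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

total = sum(conv_params(3, ci, co) for ci, co in convs)
total += sum(i * o for i, o in fcs)
print(total)  # 138344128, i.e. ~138M parameters
```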
Case Study: VGGNet
[Simonyan and Zisserman, 2014]
Details:
- ILSVRC'14 2nd in classification, 1st in localization
- Similar training procedure as Krizhevsky 2012
- No Local Response Normalisation (LRN)
- Use VGG16 or VGG19 (VGG19 only slightly better, more memory)
- Use ensembles for best results
- FC7 features generalize well to other tasks
[Architecture diagrams: AlexNet, VGG16, VGG19]
Case Study: GoogLeNet
[Szegedy et al., 2014]
- 22 layers
- Efficient “Inception” module
- No FC layers
- Only 5 million parameters! 12x less than AlexNet
- ILSVRC'14 classification winner (6.7% top 5 error)
[Inception module diagram]
Case Study: GoogLeNet
[Szegedy et al., 2014]
Naive Inception module: parallel filter operations on the input, concatenated together depth-wise
Q: What is the problem with this? [Hint: computational complexity]
Example: module input 28x28x256
Naive output: 28x28x(128+192+96+256) = 28x28x672
Reminder: 1x1 convolutions
1x1 CONV with 32 filters: input 56x56x64 -> output 56x56x32
(each filter has size 1x1x64, and performs a 64-dimensional dot product)
This preserves spatial dimensions, but reduces depth!
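The per-position dot product is easy to see in a tiny pure-Python sketch; the 2x2x4 input and the two filters below are made-up illustration values, not anything from the lecture:

```python
# A 1x1 convolution is a dot product over the depth axis at every spatial
# position: HxWxC_in -> HxWxC_out. Tiny sketch: 2x2x4 input, 2 filters.

H, W, C_in = 2, 2, 4
x = [[[1.0] * C_in for _ in range(W)] for _ in range(H)]  # all-ones input
filters = [[0.5] * C_in, [1.0] * C_in]                    # two 1x1xC_in filters

out = [[[sum(x[i][j][c] * f[c] for c in range(C_in)) for f in filters]
        for j in range(W)] for i in range(H)]

# Spatial dims preserved (2x2), depth reduced 4 -> 2:
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
print(out[0][0])                              # [2.0, 4.0]
```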
Case Study: GoogLeNet
[Szegedy et al., 2014]
Solution: 1x1 conv "bottleneck" layers to reduce feature depth before the expensive convs
Case Study: GoogLeNet
[Szegedy et al., 2014]
Using same parallel layers as naive example, and adding "1x1 conv, 64 filter" bottlenecks:
Module input: 28x28x256; module output: 28x28x480
Conv Ops:
[1x1 conv, 64] 28x28x64x1x1x256
[1x1 conv, 64] 28x28x64x1x1x256
[1x1 conv, 128] 28x28x128x1x1x256
[3x3 conv, 192] 28x28x192x3x3x64
[5x5 conv, 96] 28x28x96x5x5x64
[1x1 conv, 64] 28x28x64x1x1x256
Total: 358M ops
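Each line above multiplies (output positions) x (num filters) x (filter area) x (input depth). A small helper makes the pattern explicit, and also shows why the bottleneck matters: the same 5x5 branch applied directly to the 256-deep input (as in the naive module) would cost roughly four times more. The helper name and the comparison are mine, only the formula comes from the table:

```python
# Multiply count for one conv layer, following the pattern in the table:
# (output H x W) x (num filters) x (filter H x W) x (input depth).

def conv_ops(out_hw, n_filters, f, in_depth):
    return out_hw * out_hw * n_filters * f * f * in_depth

# First bottleneck in the list: [1x1 conv, 64] on the 28x28x256 input
print(conv_ops(28, 64, 1, 256))   # 12845056 (~12.8M multiplies)

# 5x5 conv, 96 filters on the bottlenecked 28x28x64 input:
print(conv_ops(28, 96, 5, 64))    # 120422400 (~120M)
# ...vs the same 5x5 branch on the raw 28x28x256 input (naive module):
print(conv_ops(28, 96, 5, 256))   # 481689600 (~482M)
```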
Case Study: GoogLeNet
[Szegedy et al., 2014]
Full GoogLeNet architecture:
- Stem Network: Conv-Pool-2x Conv-Pool
- Stacked Inception Modules
- Classifier output (removed expensive FC layers!)
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
"Revolution of Depth": 152 layers (ResNet) vs. 19 layers (VGG) and 22 layers (GoogLeNet)
Case Study: ResNet
[He et al., 2015]
Very deep networks using residual connections
- 152-layer model for ImageNet
- ILSVRC'15 classification winner (3.57% top 5 error)
- Swept all classification and detection competitions in ILSVRC'15 and COCO'15!
[Residual block diagram: output relu(F(x) + x), identity skip connection around the weight layers]
Case Study: ResNet
[He et al., 2015]
Solution: Use network layers to fit a residual mapping instead of directly trying to fit a desired underlying mapping
H(x) = F(x) + x
Use layers to fit the residual F(x) = H(x) - x instead of H(x) directly
["Plain" layers fit H(x) directly; the residual block computes F(x) + x with an identity skip connection]
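The residual mapping can be sketched in a few lines of pure Python; `F` here is a stand-in function for the block's weight layers (an illustration of the idea, not the paper's implementation):

```python
# Minimal sketch of a residual block: output = relu(F(x) + x), where F is
# the learned residual. If the optimal mapping is the identity, the layers
# only need to drive F toward zero.

def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, F):
    fx = F(x)                                    # learned residual F(x)
    return relu([a + b for a, b in zip(fx, x)])  # F(x) + x, then relu

x = [1.0, 2.0, 3.0]
zero_residual = lambda v: [0.0] * len(v)         # F ~ 0 => block ~ identity
print(residual_block(x, zero_residual))          # [1.0, 2.0, 3.0]
```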
Case Study: ResNet
[He et al., 2015]
- Periodically, double # of filters and downsample spatially using stride 2 (/2 in each dimension)
- Additional conv layer at the beginning
- No FC layers at the end (only FC 1000 to output classes)
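The stride-2 downsampling arithmetic: halving each spatial dimension while doubling the filter count keeps per-layer compute roughly constant (ops scale with H*W*C^2, and (H/2)(W/2)(2C)^2 = H*W*C^2). The 3x3/pad-1 conv shape and the 56 -> 28 -> 14 -> 7 stage sizes below are the standard ResNet values for 224x224 inputs, stated here as an assumption:

```python
# Stride-2 downsampling in ResNet: a 3x3 conv with stride 2, pad 1
# halves each spatial dimension (/2 in each dimension).

def out_size(w, f, s, p):
    return (w - f + 2 * p) // s + 1

print(out_size(56, 3, 2, 1))  # 28
print(out_size(28, 3, 2, 1))  # 14
print(out_size(14, 3, 2, 1))  # 7
```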
Case Study: ResNet
[He et al., 2015]
Experimental Results
- Able to train very deep networks without degrading (152 layers on ImageNet, 1202 on CIFAR)
- Deeper networks now achieve lower training error as expected
- Swept 1st place in all ILSVRC and COCO 2015 competitions
ILSVRC 2015 classification winner (3.6% top 5 error) -- better than "human performance"! (Russakovsky 2014)
Comparing complexity...
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with permission.
- Inception-v4: ResNet + Inception!
- VGG: Highest memory, most operations
- GoogLeNet: most efficient
- AlexNet: Smaller compute, still memory heavy, lower accuracy
- ResNet: Moderate efficiency depending on model, highest accuracy
Forward pass time and power consumption
Other architectures to know...
Network in Network (NiN)
[Lin et al. 2014]
Improving ResNets...
Aggregated Residual Transformations for Deep
Neural Networks (ResNeXt)
[Xie et al. 2016]
Improving ResNets...
Deep Networks with Stochastic Depth
[Huang et al. 2016]
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners: network ensembling
Improving ResNets...
“Good Practices for Deep Feature Fusion”
[Shao et al. 2016]
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners: adaptive feature map reweighting
Improving ResNets...
Squeeze-and-Excitation Networks (SENet)
[Hu et al. 2017]
Beyond ResNets...
FractalNet: Ultra-Deep Neural Networks without Residuals
[Larsson et al. 2017]
Beyond ResNets...
Densely Connected Convolutional Networks
[Huang et al. 2017]
Efficient networks...
SqueezeNet: AlexNet-level Accuracy With 50x Fewer
Parameters and <0.5MB Model Size
[Iandola et al. 2017]
Meta-learning: Learning to learn network architectures...
Neural Architecture Search with Reinforcement Learning (NAS)
[Zoph et al. 2016]
Meta-learning: Learning to learn network architectures...
Learning Transferable Architectures for Scalable Image
Recognition
[Zoph et al. 2017]
- Applying neural architecture search (NAS) to a large dataset like ImageNet is expensive
- Design a search space of building blocks ("cells") that can be flexibly stacked
- NASNet: Use NAS to find best cell structure on smaller CIFAR-10 dataset, then transfer architecture to ImageNet
Summary: CNN Architectures
Case Studies
- AlexNet
- VGG
- GoogLeNet
- ResNet
Also....
- NiN (Network in Network)
- Wide ResNet
- ResNeXT
- Stochastic Depth
- Squeeze-and-Excitation Network
- DenseNet
- FractalNet
- SqueezeNet
- NASNet
Summary: CNN Architectures
- VGG, GoogLeNet, ResNet all in wide use, available in model zoos
- ResNet current best default, also consider SENet when available
- Trend towards extremely deep networks
- Significant research centers around design of layer / skip connections and improving gradient flow
- Efforts to investigate necessity of depth vs. width and residual connections
- Even more recent trend towards meta-learning