
CV801: Week 3 Lecture 1 and 2

CNN Architectures
Project Discussion
ImageNet Classification Challenge
[Chart: top-5 error rate (%) of the ILSVRC winner by year]
2010  Lin et al                     28.2  (shallow)
2011  Sanchez & Perronnin           25.8  (shallow)
2012  Krizhevsky et al (AlexNet)    16.4  (8 layers)
2013  Zeiler & Fergus               11.7  (8 layers)
2014  Simonyan & Zisserman (VGG)     7.3  (19 layers)
2014  Szegedy et al (GoogLeNet)      6.7  (22 layers)
2015  He et al (ResNet)              3.6  (152 layers)
2016  Shao et al                     3.0  (152 layers)
2017  Hu et al (SENet)               2.3  (152 layers)
Human (Russakovsky et al)            5.1
AlexNet

Layer   | In C  | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (K) | flop (M)
conv1   | 3     | 227    | 64      | 11     | 4      | 2   | 64    | 56      | 784         | 23         | 73
pool1   | 64    | 56     |         | 3      | 2      | 0   | 64    | 27      | 182         | 0          | 0
conv2   | 64    | 27     | 192     | 5      | 1      | 2   | 192   | 27      | 547         | 307        | 224
pool2   | 192   | 27     |         | 3      | 2      | 0   | 192   | 13      | 127         | 0          | 0
conv3   | 192   | 13     | 384     | 3      | 1      | 1   | 384   | 13      | 254         | 664        | 112
conv4   | 384   | 13     | 256     | 3      | 1      | 1   | 256   | 13      | 169         | 885        | 145
conv5   | 256   | 13     | 256     | 3      | 1      | 1   | 256   | 13      | 169         | 590        | 100
pool5   | 256   | 13     |         | 3      | 2      | 0   | 256   | 6       | 36          | 0          | 0
flatten | 256   | 6      |         |        |        |     | 9216  |         | 36          | 0          | 0
fc6     | 9216  |        | 4096    |        |        |     | 4096  |         | 16          | 37,749     | 38
fc7     | 4096  |        | 4096    |        |        |     | 4096  |         | 16          | 16,777     | 17
fc8     | 4096  |        | 1000    |        |        |     | 1000  |         | 4           | 4,096      | 4
AlexNet

• 224 x 224 inputs
• 5 convolutional layers
• 3 max pooling layers
• 3 fully-connected layers
• ReLU nonlinearities applied to the output of every conv and fc layer
• Used “Local response normalization”; not used anymore
• Trained on two GTX 580 GPUs – only 3GB of memory each! The model was split over the two GPUs.
• The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels.
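To make the cost columns above concrete, here is a minimal Python sketch (mine, not from the slides) that computes the output size, activation memory, parameter count and FLOPs of a single conv layer under the same conventions as the table; plugging in conv1's configuration reproduces the 784 KB / ~23K params / ~73 MFLOP row.

```python
def conv_layer_cost(c_in, h_in, c_out, kernel, stride, pad):
    """Output size, activation memory, params, and FLOPs for one conv layer.

    Conventions follow the lecture table: memory counts the float32 output
    activations, and FLOPs counts one multiply-accumulate per weight per
    output position.
    """
    h_out = (h_in + 2 * pad - kernel) // stride + 1
    memory_kb = c_out * h_out * h_out * 4 / 1024            # 4 bytes per float
    params = c_out * (c_in * kernel * kernel + 1)            # weights + biases
    flops = (c_out * h_out * h_out) * (c_in * kernel * kernel)
    return h_out, memory_kb, params, flops

# AlexNet conv1: 3 x 227 x 227 input, 64 filters of 11x11, stride 4, pad 2
h, mem, p, f = conv_layer_cost(3, 227, 64, 11, 4, 2)
print(h, mem, p / 1e3, f / 1e6)   # ~56, ~784 KB, ~23K params, ~73 MFLOPs
```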
ImageNet Classification Challenge

[Recap of the ImageNet error-rate-by-year chart shown above.]
VGG: Deeper Networks, Regular Design

VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

[Diagram: layer stacks of AlexNet, VGG-16 and VGG-19, from Input through the conv/pool stages to FC 4096, FC 4096, FC 1000, Softmax.]

Simonyan and Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, ICLR 2015
VGG: Deeper Networks, Regular Design
VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels
Network has 5 convolutional stages:
Stage 1: conv-conv-pool
Stage 2: conv-conv-pool
Stage 3: conv-conv-conv-[conv]-pool
Stage 4: conv-conv-conv-[conv]-pool
Stage 5: conv-conv-conv-[conv]-pool
(VGG-16 has 3 conv in stages 3, 4 and 5)
(VGG-19 has 4 conv in stages 3, 4 and 5)
VGG: Deeper Networks, Regular Design
VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

Option 1:
Conv(5x5, C -> C)

Params: 25C²
FLOPs: 25C²HW
VGG: Deeper Networks, Regular Design

VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

Option 1:            Option 2:
Conv(5x5, C -> C)    Conv(3x3, C -> C)
                     Conv(3x3, C -> C)

Params: 25C²         Params: 18C²
FLOPs: 25C²HW        FLOPs: 18C²HW

Two 3x3 conv have the same receptive field as a single 5x5 conv, but have fewer parameters and take less computation!

[Diagram: the AlexNet layer stack (Input, 11x11 conv 96, pool, 5x5 conv 256, ..., FC layers, Softmax) shown alongside for reference.]
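A quick way to verify these parameter counts is to build both options and count weights; a small PyTorch sketch of mine (the channel count C = 64 is just an example, not from the slide):

```python
import torch.nn as nn

C = 64  # example channel count

# Option 1: a single 5x5 conv
opt1 = nn.Conv2d(C, C, kernel_size=5, padding=2, bias=False)

# Option 2: two stacked 3x3 convs (same 5x5 receptive field)
opt2 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(opt1), 25 * C ** 2)   # 102400 = 25C²
print(count(opt2), 18 * C ** 2)   # 73728  = 18C²
```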
VGG: Deeper Networks, Regular Design
VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

Input: C x 2H x 2W
Layer: Conv(3x3, C->C)

FLOPs: 36HWC²
VGG: Deeper Networks, Regular Design
VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

Input: C x 2H x 2W Input: 2C x H x W
Layer: Conv(3x3, C->C) Conv(3x3, 2C -> 2C)

FLOPs: 36HWC²            FLOPs: 36HWC²


VGG: Deeper Networks, Regular Design

VGG Design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

Most of the Conv layers at EACH spatial resolution take the same amount of computation!

Input: C x 2H x 2W         Input: 2C x H x W
Layer: Conv(3x3, C->C)     Conv(3x3, 2C -> 2C)

FLOPs: 36HWC²              FLOPs: 36HWC²

Simonyan and Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, ICLR 2015
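A quick check of this invariance (worked out here, not on the slide): before the pool, Conv(3x3, C->C) on a C x 2H x 2W input costs (3·3·C)·C·(2H)·(2W) = 36HWC² multiply-adds; after the pool, Conv(3x3, 2C->2C) on a 2C x H x W input costs (3·3·2C)·(2C)·H·W = 36HWC². Halving the resolution while doubling the channels leaves the per-layer cost unchanged.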
AlexNet vs VGG-16: Much bigger network!

Memory (KB)                     Params (M)                    MFLOPs
AlexNet total: 1.9 MB           AlexNet total: 61M            AlexNet total: 0.7 GFLOP
VGG-16 total: 48.6 MB (25x)     VGG-16 total: 138M (2.3x)     VGG-16 total: 13.6 GFLOP (19.4x)
ImageNet Classification Challenge

[Recap of the ImageNet error-rate-by-year chart shown above.]
GoogLeNet: Focus on Efficiency

Many innovations for efficiency: reduce parameter count, memory usage, and computation.

Szegedy et al, “Going deeper with convolutions”, CVPR 2015


GoogLeNet: Aggressive Stem
Stem network at the start aggressively downsamples input
(In VGG-16: Most of the compute was at the start)

Szegedy et al, “Going deeper with convolutions”, CVPR 2015


GoogLeNet: Aggressive Stem
Stem network at the start aggressively downsamples input

Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (K) | flop (M)
conv     | 3    | 224    | 64      | 7      | 2      | 3   | 64    | 112     | 3136        | 9          | 118
max-pool | 64   | 112    |         | 3      | 2      | 1   | 64    | 56      | 784         | 0          | 2
conv     | 64   | 56     | 64      | 1      | 1      | 0   | 64    | 56      | 784         | 4          | 13
conv     | 64   | 56     | 192     | 3      | 1      | 1   | 192   | 56      | 2352        | 111        | 347
max-pool | 192  | 56     |         | 3      | 2      | 1   | 192   | 28      | 588         | 0          | 1

Total from 224 to 28 spatial resolution:


Memory: 7.5 MB
Params: 124K
MFLOP: 418
Szegedy et al, “Going deeper with convolutions”, CVPR 2015
GoogLeNet: Aggressive Stem
Stem network at the start aggressively downsamples input
(Recall in VGG-16: Most of the compute was at the start)
Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (K) | flop (M)
conv     | 3    | 224    | 64      | 7      | 2      | 3   | 64    | 112     | 3136        | 9          | 118
max-pool | 64   | 112    |         | 3      | 2      | 1   | 64    | 56      | 784         | 0          | 2
conv     | 64   | 56     | 64      | 1      | 1      | 0   | 64    | 56      | 784         | 4          | 13
conv     | 64   | 56     | 192     | 3      | 1      | 1   | 192   | 56      | 2352        | 111        | 347
max-pool | 192  | 56     |         | 3      | 2      | 1   | 192   | 28      | 588         | 0          | 1

Total from 224 to 28 spatial resolution: Compare VGG-16:


Memory: 7.5 MB Memory: 42.9 MB (5.7x)
Params: 124K Params: 1.1M (8.9x)
MFLOP: 418 MFLOP: 7485 (17.8x)
Szegedy et al, “Going deeper with convolutions”, CVPR 2015
GoogLeNet: Global Average Pooling
No large FC layers at the end! Instead uses global average pooling to
collapse spatial dimensions, and one linear layer to produce class scores
(Recall VGG-16: Most parameters were in the FC layers!)

Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (k) | flop (M)
avg-pool | 1024 | 7      |         | 7      | 1      | 0   | 1024  | 1       | 4           | 0          | 0
fc       | 1024 |        | 1000    |        |        |     | 1000  |         | 0           | 1025       | 1

Compare with VGG-16:
Layer    | In C  | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (K) | flop (M)
flatten  | 512   | 7      |         |        |        |     | 25088 |         | 98          |            |
fc6      | 25088 |        | 4096    |        |        |     | 4096  |         | 16          | 102760     | 103
fc7      | 4096  |        | 4096    |        |        |     | 4096  |         | 16          | 16777      | 17
fc8      | 4096  |        | 1000    |        |        |     | 1000  |         | 4           | 4096       | 4
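The parameter gap between the two heads is easy to reproduce; a hedged PyTorch sketch (the channel counts 512x7x7 and 1024 come from the tables above, everything else is my own illustration):

```python
import torch.nn as nn

# VGG-16-style head: flatten 512x7x7, then three big FC layers
vgg_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),
)

# GoogLeNet-style head: global average pool, then one small linear layer
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # collapse the 7x7 spatial dimensions to 1x1
    nn.Flatten(),
    nn.Linear(1024, 1000),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(vgg_head) / 1e6, count(gap_head) / 1e6)   # ~123.6M vs ~1.0M params
```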
AlexNet vs VGG-16

[Bar charts: per-layer Memory (KB), MFLOPs and Params (M) for AlexNet vs VGG-16, over each conv and fc layer.]

AlexNet total: 1.9 MB           AlexNet total: 0.7 GFLOP             AlexNet total: 61M
VGG-16 total: 48.6 MB (25x)     VGG-16 total: 13.6 GFLOP (19.4x)     VGG-16 total: 138M (2.3x)
GoogLeNet: Global Average Pooling

No large FC layers at the end! Instead uses “global average pooling” to collapse spatial dimensions, and one linear layer to produce class scores. (Recall VGG-16: Most parameters were in the FC layers!)

[Repeated from the earlier slide: the avg-pool + fc head table, compared with the VGG-16 flatten/fc6/fc7/fc8 head.]
GoogLeNet: Inception Module

Inception module: a local unit with parallel branches. This local structure is repeated many times throughout the network.

Uses 1x1 “bottleneck” layers to reduce the channel dimension before expensive conv (also used in ResNet!).

Szegedy et al, “Going deeper with convolutions”, CVPR 2015
GoogLeNet: Auxiliary Classifiers

Training using only a loss at the end of the network didn’t work well: the network is too deep, and gradients don’t propagate cleanly.

As a hack, attach “auxiliary classifiers” at several intermediate points in the network that also try to classify the image and receive a loss.

GoogLeNet was before batch normalization! With BatchNorm there is no longer any need for this trick.
ImageNet Classification Challenge

[Recap of the ImageNet error-rate-by-year chart shown above.]
Residual Networks
Outline

• Residual Networks (ResNet)


ImageNet Classification Challenge

[Recap of the ImageNet error-rate-by-year chart shown above.]
Residual Networks
Once we have Batch Normalization, we can train networks with 10+ layers.
What happens as we go deeper?

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks

Once we have Batch Normalization, we can train networks with 10+ layers. What happens as we go deeper?

[Plot: test error vs. iterations for a 20-layer and a 56-layer network; the 56-layer curve sits above the 20-layer curve.]

The deeper model does worse than the shallow model!

Initial guess: the deep model is overfitting, since it is much bigger than the other model.

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016

Residual Networks

Once we have Batch Normalization, we can train networks with 10+ layers. What happens as we go deeper?

[Plots: training error and test error vs. iterations; the 56-layer network is worse than the 20-layer network on both.]

In fact, the deep model seems to be underfitting, since it also performs worse than the shallow model on the training set!

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks

A deeper model can emulate a shallower model: copy the layers from the shallower model and set the extra layers to the identity.

Thus deeper models should do at least as well as shallow models.

Hypothesis: This is an optimization problem. Deeper models are harder to optimize, and in particular they don’t learn identity functions to emulate shallow models.

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks

Solution: Change the network so learning identity functions with extra layers is easy!

“Plain” block: x -> conv -> relu -> conv -> H(x)
Residual block: x -> conv -> relu -> conv -> F(x), then output = relu(F(x) + x) via an additive “shortcut” that adds the input x back in.

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Residual Networks

Solution: Change the network so learning identity functions with extra layers is easy!

[Same diagram as above: the “Plain” block computes H(x); the Residual block computes relu(F(x) + x) via the additive shortcut.]

If you set the conv weights to 0, the whole residual block computes the identity function!

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
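A minimal PyTorch sketch of the residual block just described (my own; the BatchNorm placement follows the common post-activation design and is an assumption, not something shown on this slide):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Residual block: out = relu(F(x) + x), with F = conv-bn-relu-conv-bn."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # additive "shortcut"

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```

With the conv weights set to zero, F(x) vanishes and the block passes x straight through, which is exactly the easy-to-learn identity the slide argues for.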
Residual Networks

A residual network is a stack of many residual blocks.

Regular design, like VGG: each residual block has two 3x3 conv.

The network is divided into stages: the first block of each stage halves the resolution (with a stride-2 conv) and doubles the number of channels.

[Diagram: full ResNet column from Input through 7x7 conv /2 and pool, then stages of 3x3 conv 64 / 128 / 256 / 512 residual blocks, global pool, FC 1000, Softmax.]

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks

Similar to GoogLeNet, downsample the input 4x before applying residual blocks:

Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W
conv     | 3    | 224    | 64      | 7      | 2      | 3   | 64    | 112
max-pool | 64   | 112    |         | 3      | 2      | 1   | 64    | 56

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks

Like GoogLeNet, no big fully-connected layers: instead use global average pooling and a single linear layer at the end.

[Diagram: the same ResNet column as above, with the Pool / FC 1000 / Softmax head at the top.]

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks

ResNet-18:
Stem: 1 conv layer
Stage 1 (C=64):  2 res. blocks = 4 conv
Stage 2 (C=128): 2 res. blocks = 4 conv
Stage 3 (C=256): 2 res. blocks = 4 conv
Stage 4 (C=512): 2 res. blocks = 4 conv
Linear

ImageNet top-5 error: 10.92
GFLOP: 1.8

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision
Residual Networks

ResNet-18:                                 ResNet-34:
Stem: 1 conv layer                         Stem: 1 conv layer
Stage 1 (C=64):  2 res. blocks = 4 conv    Stage 1: 3 res. blocks = 6 conv
Stage 2 (C=128): 2 res. blocks = 4 conv    Stage 2: 4 res. blocks = 8 conv
Stage 3 (C=256): 2 res. blocks = 4 conv    Stage 3: 6 res. blocks = 12 conv
Stage 4 (C=512): 2 res. blocks = 4 conv    Stage 4: 3 res. blocks = 6 conv
Linear                                     Linear

ImageNet top-5 error: 10.92                ImageNet top-5 error: 8.58
GFLOP: 1.8                                 GFLOP: 3.6

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision
Residual Networks

ResNet-18:                                 ResNet-34:
Stem: 1 conv layer                         Stem: 1 conv layer
Stage 1 (C=64):  2 res. blocks = 4 conv    Stage 1: 3 res. blocks = 6 conv
Stage 2 (C=128): 2 res. blocks = 4 conv    Stage 2: 4 res. blocks = 8 conv
Stage 3 (C=256): 2 res. blocks = 4 conv    Stage 3: 6 res. blocks = 12 conv
Stage 4 (C=512): 2 res. blocks = 4 conv    Stage 4: 3 res. blocks = 6 conv
Linear                                     Linear

ImageNet top-5 error: 10.92                ImageNet top-5 error: 8.58
GFLOP: 1.8                                 GFLOP: 3.6

VGG-16 (for comparison):
ImageNet top-5 error: 9.62
GFLOP: 13.6

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision


Residual Networks: Basic Block

Conv(3x3, C->C)

Conv(3x3, C->C)

“Basic”
Residual block

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks: Basic Block

Conv(3x3, C->C)   FLOPs: 9HWC²
Conv(3x3, C->C)   FLOPs: 9HWC²

“Basic” residual block. Total FLOPs: 18HWC²

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Residual Networks: Bottleneck Block

“Basic” residual block:               “Bottleneck” residual block:
Conv(3x3, C->C)   FLOPs: 9HWC²        Conv(1x1, C->4C)
Conv(3x3, C->C)   FLOPs: 9HWC²        Conv(3x3, C->C)
Total FLOPs: 18HWC²                   Conv(1x1, 4C->C)

(The bottleneck diagram reads bottom-to-top: the 4C-channel input is first reduced to C channels by a 1x1 conv, processed by a 3x3 conv, then expanded back to 4C channels.)

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Residual Networks: Bottleneck Block

More layers, less computational cost!

“Basic” residual block:               “Bottleneck” residual block:
Conv(3x3, C->C)   FLOPs: 9HWC²        Conv(1x1, C->4C)   FLOPs: 4HWC²
Conv(3x3, C->C)   FLOPs: 9HWC²        Conv(3x3, C->C)    FLOPs: 9HWC²
                                      Conv(1x1, 4C->C)   FLOPs: 4HWC²
Total FLOPs: 18HWC²                   Total FLOPs: 17HWC²

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
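A PyTorch sketch of the bottleneck block (mine; normalization and activation placement are assumptions in the spirit of the original ResNet, not copied from the slide):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand, plus shortcut.
    The block operates on 4*c channels; the 3x3 conv only sees c channels."""
    def __init__(self, c):
        super().__init__()
        self.reduce = nn.Conv2d(4 * c, c, 1, bias=False)        # 4C -> C
        self.conv = nn.Conv2d(c, c, 3, padding=1, bias=False)   # C -> C
        self.expand = nn.Conv2d(c, 4 * c, 1, bias=False)        # C -> 4C
        self.bn1 = nn.BatchNorm2d(c)
        self.bn2 = nn.BatchNorm2d(c)
        self.bn3 = nn.BatchNorm2d(4 * c)

    def forward(self, x):
        out = F.relu(self.bn1(self.reduce(x)))
        out = F.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.expand(out))
        return F.relu(out + x)

x = torch.randn(1, 256, 56, 56)   # 4C = 256, i.e. C = 64
print(Bottleneck(64)(x).shape)    # torch.Size([1, 256, 56, 56])
```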
Residual Networks

[Diagram: the full ResNet column, as before.]

           | Block  | Stem   | Stage 1        | Stage 2        | Stage 3        | Stage 4        | FC     |       | ImageNet
           | type   | layers | Blocks  Layers | Blocks  Layers | Blocks  Layers | Blocks  Layers | layers | GFLOP | top-5 error
ResNet-18  | Basic  | 1      | 2  4           | 2  4           | 2  4           | 2  4           | 1      | 1.8   | 10.92
ResNet-34  | Basic  | 1      | 3  6           | 4  8           | 6  12          | 3  6           | 1      | 3.6   | 8.58
ResNet-50  | Bottle | 1      | 3  9           | 4  12          | 6  18          | 3  9           | 1      | 3.8   | 7.13
ResNet-101 | Bottle | 1      | 3  9           | 4  12          | 23 69          | 3  9           | 1      | 7.6   | 6.44
ResNet-152 | Bottle | 1      | 3  9           | 8  24          | 36 108         | 3  9           | 1      | 11.3  | 5.94

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision


Residual Networks

ResNet-50 is the same as ResNet-34, but replaces Basic blocks with Bottleneck blocks. This is a great baseline architecture for many tasks even today!

[Same ResNet-18/34/50/101/152 table as above.]

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision


Residual Networks

Deeper ResNet-101 and ResNet-152 models are more accurate, but also more computationally heavy.

[Same ResNet-18/34/50/101/152 table as above.]

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision


Residual Networks

Ø The spatial downsampling is achieved by the residual block at the start of each stage, using a 3x3 conv with stride 2.

Ø What about the shortcut connection of such a downsampling residual block?
Ø (A 1x1 conv with stride 2 is used at the shortcut connection.)
Residual Networks

- Able to train very deep networks


- Deeper networks do better than
shallow networks (as expected)
- Swept 1st place in all ILSVRC and
COCO 2015 competitions
- Still widely used today!

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016


Improving Residual Networks: Block Design

Original ResNet block:                         “Pre-Activation” ResNet block:
Conv -> BatchNorm -> ReLU -> Conv ->           BatchNorm -> ReLU -> Conv ->
BatchNorm -> (+ shortcut) -> ReLU              BatchNorm -> ReLU -> Conv -> (+ shortcut)

Note the ReLU after the residual addition in the original block: it cannot actually learn the identity function, since its outputs are nonnegative!

With the ReLU inside the residual branch (pre-activation), the block can learn a true identity function by setting the conv weights to zero!

He et al, ”Identity mappings in deep residual networks”, ECCV 2016


Improving Residual Networks: Block Design

Original ResNet block vs. “Pre-Activation” ResNet block:

Slight improvement in accuracy (ImageNet top-1 error):
ResNet-152: 21.3 vs 21.1
ResNet-200: 21.8 vs 20.7

Not actually used that much in practice.

He et al, ”Identity mappings in deep residual networks”, ECCV 2016


Improving ResNets

Conv(1x1, C->4C)   FLOPs: 4HWC²
Conv(3x3, C->C)    FLOPs: 9HWC²
Conv(1x1, 4C->C)   FLOPs: 4HWC²

“Bottleneck” residual block. Total FLOPs: 17HWC²
Grouped Convolution (recap)

Convolution with groups=1: normal convolution.
Input: Cin x H x W    Weight: Cout x Cin x K x K    Output: Cout x H’ x W’
FLOPs: Cout·Cin·K²·H’·W’
All convolutional kernels touch all Cin channels of the input.

Convolution with groups=G: G parallel conv layers; each “sees” Cin/G input channels and produces Cout/G output channels.
Input: Cin x H x W, split into G x [(Cin/G) x H x W]
Weight: G x (Cout/G) x (Cin/G) x K x K (G parallel convolutions)
Output: G x [(Cout/G) x H’ x W’], concatenated to Cout x H’ x W’
FLOPs: Cout·Cin·K²·H’·W’ / G
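A small PyTorch sketch (mine) showing the factor-G saving in weights, and hence FLOPs, when switching to grouped convolution:

```python
import torch.nn as nn

Cin, Cout, K = 256, 256, 3

normal = nn.Conv2d(Cin, Cout, K, padding=1, bias=False)             # groups=1
grouped = nn.Conv2d(Cin, Cout, K, padding=1, groups=8, bias=False)  # groups=8

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(normal))    # Cout*Cin*K*K       = 589,824
print(count(grouped))   # Cout*(Cin/8)*K*K   = 73,728  (8x fewer weights and FLOPs)
```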


Improving ResNets: ResNeXt

“Bottleneck” residual block:           G parallel pathways, each:
Conv(1x1, C->4C)   FLOPs: 4HWC²        Conv(1x1, c->4C)
Conv(3x3, C->C)    FLOPs: 9HWC²        Conv(3x3, c->c)
Conv(1x1, 4C->C)   FLOPs: 4HWC²        Conv(1x1, 4C->c)
Total FLOPs: 17HWC²

Xie et al, “Aggregated residual transformations for deep neural networks”, CVPR 2017
Improving ResNets: ResNeXt

“Bottleneck” residual block:           G parallel pathways, each:
Conv(1x1, C->4C)   FLOPs: 4HWC²        Conv(1x1, c->4C)   FLOPs: 4HWCc
Conv(3x3, C->C)    FLOPs: 9HWC²        Conv(3x3, c->c)    FLOPs: 9HWc²
Conv(1x1, 4C->C)   FLOPs: 4HWC²        Conv(1x1, 4C->c)   FLOPs: 4HWCc
Total FLOPs: 17HWC²                    Total FLOPs: (8Cc + 9c²)·HWG

Xie et al, “Aggregated residual transformations for deep neural networks”, CVPR 2017
Improving ResNets: ResNeXt

“Bottleneck” residual block (Total FLOPs: 17HWC²) vs. G parallel pathways (Total FLOPs: (8Cc + 9c²)·HWG).

Equal cost when 9Gc² + 8GCc – 17C² = 0
Example: C=64, G=4, c=24;  C=64, G=32, c=4

Xie et al, “Aggregated residual transformations for deep neural networks”, CVPR 2017
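A quick numerical check of the equal-cost condition (my own sketch; the quadratic in c is solved with the standard formula):

```python
import math

def equal_cost_width(C, G):
    """Positive root of 9*G*c^2 + 8*G*C*c - 17*C^2 = 0: the per-pathway width c
    at which G parallel pathways cost the same as one bottleneck block."""
    a, b, k = 9 * G, 8 * G * C, -17 * C ** 2
    return (-b + math.sqrt(b * b - 4 * a * k)) / (2 * a)

print(equal_cost_width(64, 4))    # ~23.9 -> matches the slide's example c=24
print(equal_cost_width(64, 32))   # ~3.97 -> matches the slide's example c=4
```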
Improving ResNets: ResNeXt

Equivalent formulation with grouped convolution:

ResNeXt block (grouped convolution):     G parallel pathways, each:
Conv(1x1, Gc->4C)                        Conv(1x1, c->4C)   FLOPs: 4HWCc
Conv(3x3, Gc->Gc, groups=G)              Conv(3x3, c->c)    FLOPs: 9HWc²
Conv(1x1, 4C->Gc)                        Conv(1x1, 4C->c)   FLOPs: 4HWCc
                                         Total FLOPs: (8Cc + 9c²)·HWG

Equal cost when 9Gc² + 8GCc – 17C² = 0
Example: C=64, G=4, c=24;  C=64, G=32, c=4

Xie et al, “Aggregated residual transformations for deep neural networks”, CVPR 2017
ResNeXt: Maintain computation by adding groups!

Model Groups Group width Top-1 Error Model Groups Group width Top-1 Error
ResNet-50 1 64 23.9 ResNet-101 1 64 22.0
ResNeXt-50 2 40 23 ResNeXt-101 2 40 21.7
ResNeXt-50 4 24 22.6 ResNeXt-101 4 24 21.4
ResNeXt-50 8 14 22.3 ResNeXt-101 8 14 21.3
ResNeXt-50 32 4 22.2 ResNeXt-101 32 4 21.2

Adding groups improves performance with same computational complexity!

Xie et al, “Aggregated residual transformations for deep neural networks”, CVPR 2017
ImageNet Classification Challenge

[Recap of the ImageNet error-rate-by-year chart shown above.]
Squeeze-and-Excitation Networks

Adds a “Squeeze-and-excite” branch to each residual block that performs global pooling and fully-connected layers, and multiplies the result back onto the feature map.

Adds global context to each residual block!

Won ILSVRC 2017 with ResNeXt-152-SE.

Hu et al, “Squeeze-and-Excitation networks”, CVPR 2018
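A minimal PyTorch sketch of such a squeeze-and-excite branch (mine; the reduction ratio r=16 and other details are assumptions):

```python
import torch
import torch.nn as nn

class SEBranch(nn.Module):
    """Squeeze-and-excite: global pool -> FC -> ReLU -> FC -> sigmoid,
    then rescale the feature map channel-wise with the result."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)
        self.fc2 = nn.Linear(channels // r, channels)

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                                  # squeeze: global average pool -> (N, C)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))    # excite: per-channel weights in [0, 1]
        return x * s.view(n, c, 1, 1)                           # multiply back onto the feature map

x = torch.randn(2, 256, 14, 14)
print(SEBranch(256)(x).shape)   # torch.Size([2, 256, 14, 14])
```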


ImageNet Classification Challenge

Completion of the challenge: the annual ImageNet competition was no longer held after 2017; it has now moved to Kaggle.

[ImageNet error-rate-by-year chart, as above.]
Densely Connected Neural Networks

Dense blocks, where each layer is connected to every other layer in a feedforward fashion.

Alleviates vanishing gradients, strengthens feature propagation, encourages feature reuse.

[Diagram: a Dense Block (Conv layers whose outputs are repeatedly concatenated with their inputs) and the full network: Input -> Conv -> Dense Block 1 -> Pool -> Dense Block 2 -> Pool -> Dense Block 3 -> Pool -> FC -> Softmax.]

Huang et al, “Densely connected neural networks”, CVPR 2017
MobileNets: Tiny Networks (For Mobile Devices)

Standard Convolution Block                Depthwise Separable Convolution
Total cost: 9C²HW                         Total cost: (9C + C²)HW

Conv(3x3, C->C)           9C²HW           Conv(3x3, C->C, groups=C)   9CHW    “Depthwise Convolution”
Batch Norm                                Batch Norm
ReLU                                      ReLU
                                          Conv(1x1, C->C)             C²HW    “Pointwise Convolution”
                                          Batch Norm
                                          ReLU

Speedup = 9C²/(9C + C²) = 9C/(9 + C) => 9 (as C >> 9)

Howard et al, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017
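A PyTorch sketch of the two blocks (mine), confirming the parameter and FLOP saving:

```python
import torch.nn as nn

def standard_block(C):
    """Standard 3x3 convolution block: ~9C²HW FLOPs."""
    return nn.Sequential(nn.Conv2d(C, C, 3, padding=1, bias=False),
                         nn.BatchNorm2d(C), nn.ReLU(inplace=True))

def depthwise_separable_block(C):
    """Depthwise separable convolution: 3x3 depthwise (groups=C, ~9CHW FLOPs)
    followed by a 1x1 pointwise conv (~C²HW FLOPs)."""
    return nn.Sequential(
        nn.Conv2d(C, C, 3, padding=1, groups=C, bias=False),   # depthwise
        nn.BatchNorm2d(C), nn.ReLU(inplace=True),
        nn.Conv2d(C, C, 1, bias=False),                        # pointwise
        nn.BatchNorm2d(C), nn.ReLU(inplace=True),
    )

count = lambda m: sum(p.numel() for p in m.parameters())
C = 256
print(count(standard_block(C)) / count(depthwise_separable_block(C)))
# ~8.6x fewer weights, close to the 9C/(9 + C) ≈ 8.7 speedup from the slide
```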
MobileNets: Tiny Networks (For Mobile Devices)

Also related:
ShuffleNet: Zhang et al, CVPR 2018
MobileNetV2: Sandler et al, CVPR 2018
ShuffleNetV2: Ma et al, ECCV 2018
MobileOne: CVPR 2023

[Depthwise Separable Convolution block, as on the previous slide. Total cost: (9C + C²)HW.]

Howard et al, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017
Inverted Residual Block

(a) A conventional residual block connects the layers with a high number of channels.
(b) An inverted residual block connects the bottlenecks instead of the layers with a high number of channels.

The inverted residual block is more memory efficient.

Inverted Residual Block

Residual (bottleneck) block:    Inverted residual block:
Conv(1x1, C->4C)                Conv(1x1, 4C->C)
Conv(3x3, C->C)                 Conv(3x3, 4C->4C)
Conv(1x1, 4C->C)                Conv(1x1, C->4C)

[Diagram shows two consecutive blocks of each type: the conventional shortcut connects the wide (4C-channel) tensors, while the inverted shortcut connects the narrow (C-channel) bottleneck tensors.]
Select your Project mentor
Assessment Items: Reminder-1

• Project Problem Statement Slide submission: September 09 EoD (11.59pm UAE time)

What to include in the Project Problem Statement Slides?
• The problem identified
• Baselines, reproduced baseline results (if any)
• Discussion on potential directions to solve
• Discussion on potential challenges and risks involved.
Assessment Items: Reminder-2

• Peer Review Report Submission for Project presentations: September 18, EoD (11.59pm UAE time)

• Attend the Project Problem Statement sessions in person
• Ask questions to peers
• Submit a peer-review report for the project presentations
  (Follow the CVPR review format: summary, strengths, weaknesses, suggestions to improve the proposed idea, and an overall rating of the proposed problem statement on a scale of 1-10. Do you have some questions to be answered by the team in the next presentation?)
ConvNeXts: A ConvNet for the 2020s, CVPR 2022
Citations: >5000
Why ConvNeXt?
• Identify several key components that contribute to the performance gain.
• ConvNeXt maintains the efficiency of standard ConvNets, and the fully-convolutional
nature for both training and testing.
• Re-examine the design spaces of ConvNet and test the limits of what a pure ConvNet
can achieve.
ConvNet vs ViT

• Were the ViT comparisons with CNNs fair, especially in the post-ViT era?
• ConvNeXt claims to bridge the gap between the pre-ViT and post-ViT eras for ConvNets.
• They test the limits of what a pure ConvNet can achieve.
• “Modernize” a standard ResNet toward the design of a vision Transformer.
ConvNeXt: Modernizing a ConvNet

• How do design decisions in Transformers impact ConvNets’ performance?


• Key idea: adopt designs at different levels from a Swin Transformer, while maintaining the network’s simplicity as a standard ConvNet.
• From ResNet to a ‘ConvNet that bears a resemblance to Transformers’.
Why ConvNeXt?

“Constructed entirely from standard ConvNet modules,


ConvNeXts compete favorably with Transformers in terms
of accuracy and scalability, achieving 87.8% ImageNet
top-1 accuracy and outperforming Swin Transformers on
COCO detection and ADE20K segmentation, while
maintaining the simplicity and efficiency of standard
ConvNets.”
ConvNeXt: Key Ideas
Key changes

Ø Advanced Training techniques


Ø Study a series of design decisions that are summarized as
• Macro design
• ResNeXt-ify
• Inverted bottleneck
• Large kernel size
• Various layer-wise micro designs
1. Advanced Training Techniques

Follow the advanced training techniques used in the Swin Transformer to obtain a baseline based on ResNet-50:
• The training is extended to 300 epochs from the original 90 epochs for ResNets.
• Use the AdamW optimizer.
• Use data augmentation techniques such as Mixup, CutMix, RandAugment, and Random Erasing.
• Use regularization schemes such as Label Smoothing.

• This enhanced training recipe increased the performance of the ResNet-50 model from 76.1% to 78.8% (+2.7%).
2. Macro Design

1. Changing stage compute ratio in ResNet

• The original design of the computation


distribution across stages in ResNet was
empirical.
• The heavy “res4” stage is generally used for
downstream tasks like object detection
• Following Swin-T design, the number of
blocks in each stage of ResNet is adjusted
from (3, 4, 6, 3) in ResNet-50 to (3, 3, 9, 3),
which also aligns the FLOPs with Swin-T.
• This improves the model accuracy
from 78.8% to 79.4%.
2. Macro Design: Changing stage compute ratio in ResNet

Residual Networks (recap)

[Same ResNet-18/34/50/101/152 stage table and architecture diagram as in the ResNet section above.]

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Error rates are 224x224 single-crop testing, reported by torchvision
2. Macro Design

2. Modification to the stem in ResNet:
- Use a 4x4, stride 4 convolution with 96 filters.

Why?
• This simplified stem slightly increased the performance (by 0.1%) while slightly reducing the computation compared to the ResNet stem.
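A small PyTorch sketch (mine) of the two stems for comparison; the ConvNeXt paper additionally places a LayerNorm after its stem, which is omitted here:

```python
import torch
import torch.nn as nn

# ConvNeXt-style "patchify" stem: one 4x4, stride-4 conv with 96 filters,
# downsampling 224x224 -> 56x56 in a single step.
convnext_stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)

# ResNet stem for comparison: 7x7 stride-2 conv + 3x3 stride-2 max pool.
resnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 3, 224, 224)
print(convnext_stem(x).shape)   # torch.Size([1, 96, 56, 56])
print(resnet_stem(x).shape)     # torch.Size([1, 64, 56, 56])
```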
3. ResNeXt-ify: Using depthwise convolutions
ResNeXt (recap)
• ResNeXt has a better FLOPs/accuracy trade-off than ResNet.
• The core component in ResNeXt is grouped convolution, where the
convolutional filters are separated into different groups.
• ResNeXt’s guiding principle is to “use more groups, expand width”.
• More precisely, ResNeXt employs grouped convolution for the 3x3 conv layer in
a bottleneck block. As this significantly reduces the FLOPs, the network width is
expanded to compensate for the capacity loss.
3. ResNeXt-ify: Using depthwise convolutions

• Use depthwise convolutions (grouped convolution with the number of groups equal to the number of channels).
  --- This reduces the number of FLOPs, but also the accuracy.

• Increase the network width, similar to ResNeXt,
  --- to the same number of channels as Swin-T’s (from 64 to 96).
4. Inverted Residual Block (recap)

(a) A conventional residual block connects the layers with a high number of channels.
(b) An inverted residual block connects the bottlenecks instead of the layers with a high number of channels.

The inverted residual block is more memory efficient.

Residual (bottleneck) block:    Inverted residual block:
Conv(1x1, C->4C)                Conv(1x1, 4C->C)
Conv(3x3, C->C)                 Conv(3x3, 4C->4C)
Conv(1x1, 4C->C)                Conv(1x1, C->4C)


4. Inverted Bottleneck

[Diagram: the ConvNeXt block with an inverted bottleneck, channel dimensions 384 and 96.]

How can this reduce the number of FLOPs?
Due to the significant FLOPs reduction in the downsampling residual blocks’ shortcut 1x1 conv layer.
→ It also increased the accuracy.
4. Inverted Bottleneck

Moving up depthwise convolution.


4. Inverted Residual Block

1. Residual block:    2. Inverted residual block:    3. Moving up depthwise convolution:
Conv(1x1, C->4C)      Conv(1x1, 4C->C)               Conv(1x1, 4C->C)
Conv(3x3, C->C)       Conv(3x3, 4C->4C)              Conv(1x1, C->4C)
Conv(1x1, 4C->C)      Conv(1x1, C->4C)               Conv(3x3, C->C)

(Blocks are drawn with the output at the top; in variant 3 the 3x3 depthwise conv runs first, on the narrow C channels, before the two 1x1 layers.)
4. Inverted Bottleneck

Moving up the depthwise convolution.

This reduced the accuracy. Then why did they do it? (The next step explains it: with the depthwise conv moved to the narrow end of the block, a larger kernel becomes affordable.)
5. Large Kernel Sizes

Increase the kernel size from 3x3, while maintaining the FLOPs:
• 7x7 provides the optimum accuracy.
• Beyond 7x7, performance won’t increase.
• 7x7 here has nearly the same FLOPs as 3x3.

Why a large kernel size?
- To have a global/larger receptive field, similar to ViT (different to what we learned from VGG: use only 3x3).
6. Micro Design Choices: Activation Function

1. Replace ReLU with GELU: same accuracy


--> similar to ViT

2. Fewer activation functions.


6. Micro Design Choices: Fewer Normalization Layers
1. Fewer Normalization layers: Improved the performance

2. Replacing BN with LN

• Directly substituting LN for BN in the original ResNet will result in suboptimal performance.
• With all the modifications in network architecture and training techniques, ConvNext model
does not have any difficulties training with LN. It provides slightly better accuracy of 81.5%.
Spatial downsampling in ResNet (recap)

Ø The spatial downsampling is achieved by the residual block at the start of each stage, using a 3x3 conv with stride 2.

Ø What about the shortcut connection of such a downsampling residual block?
• A 1x1 conv with stride 2 is used at the shortcut connection.
6. Micro Design Choices: Separate Downsampling Layers

• ConvNeXt first explored a strategy that uses 2x2 conv layers with stride 2 for spatial downsampling. This modification leads to diverged training.

• How to solve this?
Ø Adding normalization layers wherever the spatial resolution is changed can help stabilize training.
Ø These include several LN layers (also used in Swin Transformers): one before each downsampling layer, one after the stem, and one after the final global average pooling.
Ø This improves the accuracy to 82.0%.
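Putting the modernization steps together, here is a hedged sketch of a ConvNeXt-style block (mine): a 7x7 depthwise conv on the narrow dimension, one LayerNorm, an inverted 1x1 bottleneck (dim -> 4*dim -> dim) with a single GELU, and a residual connection. Details such as the channels-last LayerNorm, layer scale and stochastic depth are simplified or omitted assumptions.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """ConvNeXt-style block: depthwise 7x7 -> LayerNorm -> 1x1 expand (GELU) -> 1x1 project,
    with a residual connection around the whole thing."""
    def __init__(self, dim=96):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # normalizes over the channel dimension
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # 1x1 conv written as a linear layer
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # N,C,H,W -> N,H,W,C for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to N,C,H,W
        return shortcut + x

x = torch.randn(1, 96, 56, 56)
print(ConvNeXtBlock(96)(x).shape)   # torch.Size([1, 96, 56, 56])
```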
Summary of Changes
Throughput Comparison
Experiments

• Scalability
• Used as a backbone for downstream applications such as detection,
segmentation.
Limitations.
CNN Architectures Summary
Early work (AlexNet -> VGG) shows that bigger networks work better

GoogLeNet one of the first to focus on efficiency (aggressive stem, 1x1 bottleneck
convolutions, global avg pool instead of FC layers)

ResNet showed us how to train extremely deep networks – limited only by GPU
memory! Started to show diminishing returns as networks got bigger

After ResNet: Efficient networks became central: how can we improve the accuracy
without increasing the complexity?

Lots of tiny networks aimed at mobile devices: MobileNet, ShuffleNet, etc

Neural Architecture Search promises to automate architecture design


Which Architecture should I use?

• If you care about accuracy, ResNet-50 or ResNet-101 are good choices among CNNs (try ConvNeXt also!)

• If you want an efficient network (real-time, run on mobile, etc.), try MobileNets and ShuffleNets.
Summary: what we learned

• How to compute FLOPs, parameters and memory requirements for a given network

• Computationally efficient convolutional operators
  - depthwise separable convolution and grouped convolution

• Key design principles of AlexNet, VGG, ResNet, ResNeXt, DenseNet, SENet, and MobileNets

• Detailed discussion of ConvNeXt.
