
ResNet

Natalie Lang Tomer Malach

Deep Residual Learning for Image Recognition


Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun
Microsoft Research

Deep Learning and Its Applications to Signal and Image Processing and Analysis

Dr. Tammy Riklin Raviv


Spring 2019
Motivation
• Deep learning is continuously changing the world around us

• Its applications are everywhere:

• Image recognition - classification, detection
• Healthcare - breast or skin-cancer diagnostics
• Finance - predicting the stock market
• Predicting earthquakes - vital for saving lives
Benchmarks
• Assess the relative performance of the nets

• Check them all on the same datasets


MNIST
• 60,000 images
• 28 x 28
• 10 labels
• Classification

CIFAR-10
• 60,000 images
• 32 x 32
• 10 labels
• Classification
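
For concreteness, here is a minimal sketch of loading these two benchmarks in Python with torchvision (assumed to be installed; the "./data" download path is a placeholder):

# Sketch: loading the MNIST and CIFAR-10 benchmarks with torchvision.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 60,000 training images, 28 x 28 grayscale, 10 classes
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)

# CIFAR-10: 60,000 images in total (50,000 train / 10,000 test), 32 x 32 RGB, 10 classes
cifar = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

print(len(mnist), mnist[0][0].shape)   # 60000 torch.Size([1, 28, 28])
print(len(cifar), cifar[0][0].shape)   # 50000 torch.Size([3, 32, 32])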
Grading The Networks
• Evaluating network performance

• The Top-5 error rate is the percentage of test examples for which the correct class is not among the five highest-scoring predicted classes.
• The Top-1 error rate is the percentage of test examples for which the correct class is not the highest-scoring predicted class.
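
To make the definitions concrete, here is a small sketch of how both error rates could be computed from a batch of predicted scores (PyTorch is assumed; logits and labels are placeholder names for the model scores and the ground-truth classes):

# Sketch: Top-1 / Top-5 error rates from predicted class scores.
# logits: (N, num_classes) scores, labels: (N,) true class indices.
import torch

def topk_error(logits, labels, k):
    topk = logits.topk(k, dim=1).indices              # k highest-scoring classes per example
    hit = (topk == labels.unsqueeze(1)).any(dim=1)    # correct class among the top k?
    return 1.0 - hit.float().mean().item()            # fraction of misses

logits = torch.randn(8, 1000)            # e.g. 8 test examples, 1000 ImageNet classes
labels = torch.randint(0, 1000, (8,))
print("Top-1 error:", topk_error(logits, labels, 1))
print("Top-5 error:", topk_error(logits, labels, 5))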

Having datasets and a grading metric,

the competition begins…
ImageNet
• ~1,200,000 images
• 1000 labels
• Classification

(Figure: ILSVRC winners' Top-5 Error Rate [%] by year, with network depth growing from shallow and 8-layer models to 19, 22, and finally 152 layers)

http://www.image-net.org/challenges/LSVRC/
LeNet-5 [LeCun et al., 1998]

• First use of convolutional layers

• Handwritten character recognition

• Components: conv., pooling and FC layers
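
A rough sketch of such a conv / pooling / FC stack in PyTorch (layer sizes follow the commonly cited LeNet-5 description; an illustration, not the original implementation):

# Sketch of a LeNet-5-style network for 28x28 character images.
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2),   # conv: 1x28x28 -> 6x28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                             # pool: -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5),             # conv: -> 16x10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                             # pool: -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),                  # FC layers
    nn.Tanh(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),                           # 10 character classes
)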


Overview 2012

First CNN-based winner

(Figure: ILSVRC Top-5 error by year; the 2012 winner is an 8-layer CNN)
AlexNet [Krizhevsky et al., 2012]

• Deeper – more layers

• Bigger – more filters

• First to use ReLU activation function


Overview 2014

Deeper networks

(Figure: ILSVRC Top-5 error by year; the 2014 entries reach 19 and 22 layers)
VGGNet [Simonyan and Zisserman, 2014]

• Depth is a critical component.
• Uses only 3x3 convolutions.
• Two versions: VGG16 and VGG19.
• The deeper, the better.

(Figure: AlexNet vs. VGG16 vs. VGG19 architectures)

Up to Now

• Motivation

• Benchmarks

• LeNet

• ILSVRC
• AlexNet
• VGG
Problems In Deep Networks: Overfitting?
What happens when we continue stacking deeper layers
on a “plain” convolutional neural network?

The 56-layer model performs worse in terms of both training and test error.


Conclusion:
The deeper model performs worse, but it’s not caused
by overfitting!
Vanishing Gradient

Let's demonstrate the problem with a short computational graph example:

$$F = W_2 \cdot (X \cdot W_1 + Z)$$

with values $X = 0.2$, $W_1 = 0.4$, $Z = 0.1$ (bias), $W_2 = 0.3$.

Forward pass:

$$A = X \cdot W_1 = 0.08, \qquad B = A + Z = 0.18, \qquad F = B \cdot W_2 = 0.054$$

Backward pass (chain rule):

$$\frac{\partial F}{\partial F} = 1, \qquad
\frac{\partial F}{\partial W_2} = B = 0.18, \qquad
\frac{\partial F}{\partial B} = W_2 = 0.3$$

$$\frac{\partial F}{\partial A} = \frac{\partial F}{\partial B} \cdot \frac{\partial B}{\partial A} = W_2 \cdot 1 = 0.3, \qquad
\frac{\partial F}{\partial Z} = \frac{\partial F}{\partial B} \cdot \frac{\partial B}{\partial Z} = W_2 \cdot 1 = 0.3$$

$$\frac{\partial F}{\partial X} = \frac{\partial F}{\partial B} \cdot \frac{\partial B}{\partial A} \cdot \frac{\partial A}{\partial X} = W_2 \cdot 1 \cdot W_1 = 0.12, \qquad
\frac{\partial F}{\partial W_1} = \frac{\partial F}{\partial B} \cdot \frac{\partial B}{\partial A} \cdot \frac{\partial A}{\partial W_1} = W_2 \cdot 1 \cdot X = 0.06$$

Gradient descent update of $W_1$ with learning rate $lr = 10^{-4}$:

$$W_1 \leftarrow W_1 - \frac{\partial F}{\partial W_1} \cdot lr = 0.4 - 0.06 \cdot 10^{-4} = 0.399994 \approx 0.4$$
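
The update is negligible: small local gradients multiplied along the chain, times a small learning rate, leave W1 essentially unchanged, and in much deeper chains the effect compounds. The same toy graph can be checked numerically, e.g. with PyTorch autograd (a sketch using the values from the slide):

# Sketch: the toy graph F = W2 * (X*W1 + Z) checked with autograd.
import torch

X  = torch.tensor(0.2)
W1 = torch.tensor(0.4, requires_grad=True)
Z  = torch.tensor(0.1)
W2 = torch.tensor(0.3, requires_grad=True)

A = X * W1          # 0.08
B = A + Z           # 0.18
F = B * W2          # 0.054
F.backward()

print(W1.grad)      # dF/dW1 = W2 * X = 0.06
print(W2.grad)      # dF/dW2 = B = 0.18

lr = 1e-4
print(W1.item() - lr * W1.grad.item())   # 0.399994 -- the weight barely moves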
The Hypothesis
The problem is an optimization problem: deeper models are harder to optimize!

• The deeper model should be able to perform at least as well as the shallower model.
• A solution by construction: copy the learned layers from the shallower model and set the additional layers to the identity mapping.

Deeper has to be at least as good!

(Diagram: input → Shallow Network → Identity → Identity → Identity → output)
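
This construction is easy to express in code; a minimal sketch (shallow_net is just a placeholder for an already-trained shallower model):

# Sketch of the "solution by construction": reuse the learned shallow layers and
# append extra layers fixed to the identity mapping, so the deeper network
# computes exactly the same function and can, in principle, do no worse.
import torch.nn as nn

shallow_net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))  # placeholder

deeper_net = nn.Sequential(
    shallow_net,     # copied learned layers
    nn.Identity(),   # additional layers set to identity
    nn.Identity(),
    nn.Identity(),
)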
Overview 2015

(Figure: ILSVRC Top-5 error by year; the 2015 winner, ResNet, has 152 layers)
ResNet [He et al., 2015]
Use network layers to fit a residual mapping instead of
directly trying to fit a desired underlying mapping.
(Diagram: "plain" layers stack conv + relu to fit H(x) directly; a residual block fits F(x) with its conv + relu layers and outputs F(x) + x, where x is carried around the block by an identity shortcut, followed by a final relu)
ResNet - Architecture

• Stack residual blocks.
• Every residual block has two 3x3 conv layers.
• Periodically, double the number of filters and downsample spatially using stride 2 (/2 in each spatial dimension).
• Additional conv layer at the beginning.
• No FC layers at the end (only an FC-1000 layer to the output classes).
• Total depths of 34, 50, 101, or 152 layers for ImageNet.

(Diagram: a single residual block — input X, two conv + relu layers, identity shortcut added back before the final relu)
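
A minimal PyTorch sketch of one such block, with two 3x3 conv layers, batch norm, and an identity shortcut (an illustrative basic block, not the exact reference implementation; the stride-2 / doubled-filter case that needs a projection shortcut is omitted):

# Sketch of a basic residual block: out = relu(F(x) + x),
# where F(x) is two 3x3 conv layers (with batch norm) and x is the identity shortcut.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                             # skip connection
        out = F.relu(self.bn1(self.conv1(x)))    # first 3x3 conv
        out = self.bn2(self.conv2(out))          # second 3x3 conv
        return F.relu(out + identity)            # add the input, then relu

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 56, 56))            # output has the same shape as the input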
ResNet - Training
• Batch Normalization after every conv layer (to avoid vanishing gradients).
• Xavier/2 initialization from He et al. (to avoid vanishing gradients).
• SGD + Momentum (0.9).
• Learning rate: 0.1, divided by 10 when the validation error plateaus.
• Mini-batch size 256.
• Weight decay of 1e-4.
• No dropout used.
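
A sketch of the corresponding optimizer setup in PyTorch (model is a placeholder; ReduceLROnPlateau stands in for the "divide by 10 when the validation error plateaus" rule, and the mini-batch size of 256 would be set in the DataLoader):

# Sketch: SGD + momentum 0.9, lr 0.1 reduced 10x on plateau, weight decay.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.BatchNorm2d(64))  # placeholder

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

# inside the training loop, after each validation pass (val_error is a placeholder metric):
# scheduler.step(val_error)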
ResNet - Results

• Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR).
• Deeper networks now achieve lower training error, as expected.
• Swept 1st place in all ILSVRC and COCO 2015 competitions.


Comparing ResNet to others

ILSVRC 2015 classification winner


(3.57% top-5 error)
better than “human performance”!
[Russakovsky 2014]
Comparing ResNet to the performance of other networks

Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017.


To Summarize

• Problems in Deep Networks


• Overfitting

• Vanishing gradient

• ResNet
• Architecture

• Training

• Results

• Comparing ResNet to others


Thank You!
