Winter 1516, Lecture 5
...but when using the ReLU nonlinearity it breaks.
He et al., 2015 (note the additional /2 inside the square root: for ReLU units the recommended weight variance is 2/fan_in rather than 1/fan_in)
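As a hedged numpy sketch of the point (the layer sizes are arbitrary, not from the slide): Xavier initialization scales weights by 1/sqrt(fan_in), while He et al. add the extra factor of 2 for ReLU units.

```python
import numpy as np

fan_in, fan_out = 4096, 4096  # example layer sizes, chosen arbitrarily

# Xavier/Glorot-style initialization: weight variance ~ 1/fan_in (works well for tanh)
W_xavier = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

# He et al. 2015: ReLU zeroes out roughly half the inputs, so compensate with a
# factor of 2 in the variance, i.e. the "additional /2" under the square root
W_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```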
Proper initialization is an active area of research…
- Understanding the difficulty of training deep feedforward neural networks, by Glorot and Bengio, 2010
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, by Saxe et al., 2013
- Random walk initialization for training very deep feedforward networks, by Sussillo and Abbott, 2014
[Ioffe and Szegedy, 2015]
Batch Normalization
“you want unit gaussian activations? just make them so.”
this is a vanilla differentiable function, applied independently to each dimension k of a batch of activations:
x̂(k) = (x(k) − E[x(k)]) / sqrt(Var[x(k)])
[Figure: a batch of activations is an N×D matrix; the empirical mean and variance are computed independently for each of the D dimensions.]
[Figure: a stack of FC → BN → tanh layers.]
Problem: do we necessarily want a unit gaussian input to a tanh layer?
Normalize: x̂(k) = (x(k) − E[x(k)]) / sqrt(Var[x(k)])
And then allow the network to squash the range if it wants to: y(k) = γ(k) · x̂(k) + β(k)
Note that the network can learn γ(k) = sqrt(Var[x(k)]) and β(k) = E[x(k)] to recover the identity mapping.
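A minimal numpy sketch (not the reference implementation from the paper) of the batch-norm forward pass at training time, assuming x is an N×D batch of activations; the function name and the eps constant are illustrative.

```python
import numpy as np

def batchnorm_forward_train(x, gamma, beta, eps=1e-5):
    """Batch norm forward pass at training time.

    x: (N, D) mini-batch of activations
    gamma, beta: (D,) learned scale and shift
    """
    # per-dimension empirical mean and variance over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)

    # normalize each dimension to zero mean / unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)

    # learned scale and shift; gamma = sqrt(var), beta = mu would
    # recover the identity mapping
    return gamma * x_hat + beta

# quick check: the output is roughly unit gaussian per dimension
x = 3.0 * np.random.randn(128, 50) + 1.0
out = batchnorm_forward_train(x, np.ones(50), np.zeros(50))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # ~0 and ~1
```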
- Improves gradient flow through the network
- Allows higher learning rates
- Reduces the strong dependence on initialization
- Acts as a form of regularization in a funny way, and slightly reduces the need for dropout, maybe
Note: at test time the BatchNorm layer functions differently: the mean and std are not computed from the current batch; instead, a single fixed empirical mean and variance (e.g. estimated with running averages during training) are used.
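Under the same assumptions as the training-time sketch above, here is a sketch of the test-time behaviour; running_mu and running_var are illustrative names for the statistics accumulated during training.

```python
import numpy as np

def batchnorm_forward_test(x, gamma, beta, running_mu, running_var, eps=1e-5):
    # at test time the batch statistics are NOT recomputed; fixed empirical
    # statistics collected during training are used instead. During training
    # one would maintain, for example:
    #   running_mu  = momentum * running_mu  + (1 - momentum) * mu
    #   running_var = momentum * running_var + (1 - momentum) * var
    x_hat = (x - running_mu) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta
```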
Babysitting the Learning Process
Step 1: Preprocess the data
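A hedged sketch of what this step typically looks like for CIFAR-10 (the variable names and the random stand-in data are illustrative; for images, zero-centering alone is common).

```python
import numpy as np

# stand-in data: flattened CIFAR-10 images, one example per row (3072 numbers)
X_train = np.random.rand(50000, 3072)
X_val = np.random.rand(10000, 3072)

# zero-center using statistics computed on the training set only
mean_image = X_train.mean(axis=0)
X_train = X_train - mean_image
X_val = X_val - mean_image   # reuse the training mean; do not refit on val/test
```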
Step 2: Choose the architecture: say we start with a net with one hidden layer.
[Figure: input layer (CIFAR-10 images, 3072 numbers) → hidden layer → output layer with 10 output neurons, one per class.]
Double check that the loss is reasonable:
- disable regularization
- the forward pass returns the loss and the gradient for all parameters
- the loss comes out at ~2.3, which is "correct" for 10 classes: with random weights every class gets probability ≈ 1/10, and −ln(1/10) ≈ 2.3, as in the sketch below
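This is not the course's two-layer-net code, just a minimal hedged sketch of the same sanity check with a linear softmax classifier: small random weights, no regularization, and the loss should come out near ln(10) ≈ 2.3.

```python
import numpy as np

def softmax_loss(scores, y):
    # numerically stable softmax cross-entropy, averaged over the batch
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

N, D, C = 100, 3072, 10               # batch size, input dim, number of classes
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = 0.0001 * np.random.randn(D, C)    # small random weights
b = np.zeros(C)

loss = softmax_loss(X.dot(W) + b, y)  # no regularization term
print(loss, np.log(10))               # both should be ~2.3
```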
Double check that the loss is reasonable: now crank up regularization; the loss should go up (see the follow-up snippet below).
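Reusing X, W, b, y and softmax_loss from the sketch above (still hypothetical names), adding a large L2 penalty should push the loss above the ~2.3 baseline.

```python
reg = 1e3  # deliberately large L2 regularization strength
loss_with_reg = softmax_loss(X.dot(W) + b, y) + 0.5 * reg * np.sum(W * W)
print(loss_with_reg)   # the loss goes up once the penalty is added
```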
Let's try to train now…