Winter 1516, Lecture 5
[Figure: data cloud with an active ReLU and a dead ReLU]
dead ReLU: will never activate => never update
Activation Functions
Leaky ReLU
[Maas et al., 2013] [He et al., 2015]
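A minimal numpy sketch (not from the slides) of the Leaky ReLU forward pass; alpha = 0.01 is the commonly used slope:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs get a small slope alpha instead of a hard zero,
    # so the unit always has a nonzero local gradient and cannot "die".
    return np.where(x > 0, x, alpha * x)
```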
Activation Functions
Exponential Linear Units (ELU)
[Clevert et al., 2015]
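A minimal numpy sketch (not from the slides) of the ELU forward pass; alpha = 1.0 is a common default:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Linear for x > 0; saturates smoothly toward -alpha for negative x,
    # which keeps mean activations closer to zero than ReLU.
    # np.minimum guards exp() against overflow on large positive x,
    # since np.where evaluates both branches.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))
```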
Maxout “Neuron”
[Goodfellow et al., 2013]
- Does not have the basic form of dot product -> nonlinearity
- Generalizes ReLU and Leaky ReLU
- Linear regime! Does not saturate! Does not die!
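A minimal numpy sketch of a maxout unit, max(w1^T x + b1, w2^T x + b2); the function and variable names are illustrative:

```python
import numpy as np

def maxout(x, W1, b1, W2, b2):
    # Elementwise max of two affine maps of the input.
    # W2 = 0, b2 = 0 recovers ReLU; W2 = alpha*W1, b2 = alpha*b1
    # recovers Leaky ReLU.
    return np.maximum(x.dot(W1) + b1, x.dot(W2) + b2)
```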
TLDR: In practice:
- Use ReLU. Be careful with your learning rates
- Try out Leaky ReLU / Maxout / ELU
- Try out tanh but don't expect much
- Don't use sigmoid
Data Preprocessing
Step 1: Preprocess the data
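A short numpy sketch of the usual zero-center / normalize step; the data matrix X here is illustrative ([N x D], one example per row):

```python
import numpy as np

X = np.random.randn(100, 3072)   # illustrative data matrix, one example per row

X -= np.mean(X, axis=0)          # zero-center: subtract the per-feature mean
X /= np.std(X, axis=0)           # normalize: scale each feature to unit std
```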
TLDR: In practice for Images: center only
e.g. consider CIFAR-10 example with [32,32,3] images
- Subtract the mean image (e.g. AlexNet)
(mean image = [32,32,3] array)
- Subtract per-channel mean (e.g. VGGNet)
(mean along each channel = 3 numbers)
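A short numpy sketch of the two centering options; the batch below is illustrative:

```python
import numpy as np

X = np.random.rand(500, 32, 32, 3)   # illustrative batch of CIFAR-10-like [32,32,3] images

# AlexNet-style: subtract the mean image (a [32, 32, 3] array)
mean_image = X.mean(axis=0)
X_centered_image = X - mean_image

# VGGNet-style: subtract the per-channel mean (3 numbers)
channel_mean = X.mean(axis=(0, 1, 2))
X_centered_channel = X - channel_mean
```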
Weight Initialization
- Q: what happens when W=0 init is used?
- First idea: Small random numbers
(gaussian with zero mean and 1e-2 standard deviation)
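As a sketch (the layer sizes are illustrative), this first idea is just:

```python
import numpy as np

fan_in, fan_out = 500, 500                     # illustrative layer sizes
W = 0.01 * np.random.randn(fan_in, fan_out)    # zero-mean gaussian, std = 0.01
```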
Let's look at some activation statistics.
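A minimal sketch of such an experiment, assuming a 10-layer net of 500 tanh units fed random unit-gaussian data, with the small-random-numbers initialization above:

```python
import numpy as np

X = np.random.randn(1000, 500)        # random unit-gaussian input batch
hidden_sizes = [500] * 10             # 10 hidden layers of 500 units

H = X
for i, fan_out in enumerate(hidden_sizes):
    fan_in = H.shape[1]
    W = 0.01 * np.random.randn(fan_in, fan_out)   # small random init
    H = np.tanh(H.dot(W))                         # tanh nonlinearity
    print('layer %d: mean %+.6f, std %.6f' % (i + 1, H.mean(), H.std()))
```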
All activations become zero!
Q: think about the backward pass. What do the gradients look like?
*1.0 instead of *0.01:
Almost all neurons completely saturated, either -1 or 1. Gradients will be all zero.
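In the sketch above, this case corresponds to changing the weight scale from 0.01 to 1.0:

```python
W = 1.0 * np.random.randn(fan_in, fan_out)   # large init: tanh outputs pile up at -1 and +1
```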
“Xavier initialization”
[Glorot et al., 2010]
Reasonable initialization.
(Mathematical derivation assumes linear activations)
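A minimal numpy sketch of Xavier initialization, which scales a unit-gaussian init by 1/sqrt(fan_in) so the output variance roughly matches the input variance:

```python
import numpy as np

fan_in, fan_out = 500, 500                                # illustrative layer sizes
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)    # Xavier init
```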