Deep Learning
Lecture slides for Chapter 7 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-09-27
Definition
(Goodfellow 2016)
Weight Decay as Constrained Optimization
[Figure 7.1: contour plot showing the minimum w* of the unregularized objective and the regularized solution w̃, which the L2 penalty pulls toward the origin; axes w1, w2.]
(Goodfellow 2016)
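As a minimal numerical companion to the figure (values invented for illustration, not from the slides): with a quadratic approximation of the objective around its minimum w*, adding the penalty (α/2)‖w‖² moves the solution to w̃ = (H + αI)⁻¹ H w*, which a few lines of NumPy can check.

```python
import numpy as np

# Minimal sketch (illustrative values, not from the slides): gradient descent
# on a quadratic objective J(w) = 0.5 * (w - w_star)^T H (w - w_star) plus an
# L2 weight-decay penalty (alpha/2) * ||w||^2.  The regularized minimizer
# w_tilde is pulled from w_star toward the origin, as in Figure 7.1.
H = np.array([[3.0, 0.0], [0.0, 0.5]])   # curvature of the unregularized objective
w_star = np.array([2.0, 2.0])            # unregularized minimum
alpha = 1.0                              # weight-decay strength

def grad(w):
    return H @ (w - w_star) + alpha * w  # gradient of the penalized objective

w = np.zeros(2)
for _ in range(10_000):
    w -= 0.01 * grad(w)

# Closed-form regularized solution: w_tilde = (H + alpha*I)^{-1} H w_star
w_closed = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)
print("gradient descent:", w)
print("closed form     :", w_closed)
```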
Norm Penalties
(Goodfellow 2016)
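The title refers to penalized objectives of the form J̃(θ) = J(θ) + αΩ(θ) from the book. As a small illustrative sketch (not part of the original slide), the two most common penalties and the gradient terms they add are:

```python
import numpy as np

# Minimal sketch of the two most common norm penalties Omega(w) and the
# (sub)gradient contribution each adds to J_tilde(w) = J(w) + alpha * Omega(w).
# Values are illustrative only.
def l2_penalty(w):
    return 0.5 * np.sum(w ** 2), w            # value, gradient

def l1_penalty(w):
    return np.sum(np.abs(w)), np.sign(w)      # value, subgradient

w = np.array([0.5, -2.0, 0.0, 3.0])
alpha = 0.1
for name, penalty in [("L2", l2_penalty), ("L1", l1_penalty)]:
    value, g = penalty(w)
    print(name, "penalty:", alpha * value, "gradient term:", alpha * g)
```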
Dataset Augmentation
Affine Distortion
Elastic Deformation
Noise
Horizontal flip
Random Translation
Hue Shift
(Goodfellow 2016)
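A minimal NumPy sketch of three of the transformations named above, applied to an image stored as a (height, width, channels) array. The translation here wraps around via np.roll, a simplification of a real implementation.

```python
import numpy as np

# Minimal sketch of three dataset-augmentation transforms from the slide.
rng = np.random.default_rng(0)

def horizontal_flip(img):
    return img[:, ::-1, :]

def random_translation(img, max_shift=4):
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)   # wraps at edges

def add_noise(img, sigma=0.05):
    return img + sigma * rng.standard_normal(img.shape)

img = rng.random((32, 32, 3))
augmented = add_noise(random_translation(horizontal_flip(img)))
print(augmented.shape)
```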
Multi-Task Learning
1. Task-specific parameters (which only benefit from the examples of their task to achieve good generalization). These are the upper layers of the neural network in figure 7.2.
2. Generic parameters, shared across all the tasks (which benefit from the pooled data of all the tasks). These are the lower layers of the neural network in figure 7.2.
[Figure 7.2: Multi-task learning can be cast in several ways in deep learning frameworks; a shared representation h(shared) feeds task-specific outputs y(1) and y(2).]
(Goodfellow 2016)
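A minimal NumPy sketch of the architecture in Figure 7.2: a shared lower layer feeding two task-specific heads. Layer sizes and parameter names are illustrative, not from the slides.

```python
import numpy as np

# Minimal sketch of the Figure 7.2 architecture: a shared representation
# h_shared feeds two task-specific output heads.
rng = np.random.default_rng(0)
n_in, n_shared, n_out1, n_out2 = 10, 16, 3, 1

W_shared = rng.standard_normal((n_in, n_shared)) * 0.1    # generic, shared parameters
W_task1  = rng.standard_normal((n_shared, n_out1)) * 0.1  # task-1 specific parameters
W_task2  = rng.standard_normal((n_shared, n_out2)) * 0.1  # task-2 specific parameters

def forward(x):
    h_shared = np.maximum(0.0, x @ W_shared)   # shared lower layer (ReLU)
    y1 = h_shared @ W_task1                    # task-specific upper layers
    y2 = h_shared @ W_task2
    return y1, y2

x = rng.standard_normal((4, n_in))             # a batch of 4 examples
y1, y2 = forward(x)
print(y1.shape, y2.shape)                      # (4, 3) (4, 1)
```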
Learning Curves
Early stopping: terminate training when validation set performance stops improving, and return the parameters from the point where it was best
[Learning-curve plot: loss (negative log-likelihood), 0.00-0.20, versus time in epochs, 0-250; the validation curve turns upward while the training curve keeps decreasing.]
[Figure 7.4: two contour plots comparing early stopping (left) and L2 regularization (right); in each, the solution w̃ lies between the origin and the unregularized minimum w*; axes w1, w2.]
Figure 7.4
(Goodfellow 2016)
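A minimal sketch of the early-stopping procedure the curves motivate (the training step and validation loss below are placeholders, not a real model): keep the parameters that gave the best validation loss and stop after a fixed number of evaluations without improvement.

```python
import numpy as np

# Minimal sketch of the early-stopping meta-algorithm: store the best
# parameters seen on the validation set and stop after `patience`
# evaluations without improvement.  train_step and val_loss are stand-ins.
rng = np.random.default_rng(0)
w = np.zeros(5)

def train_step(w):            # placeholder training update
    return w + 0.1 * rng.standard_normal(w.shape)

def val_loss(w):              # placeholder validation loss
    return float(np.sum((w - 1.0) ** 2))

best_w, best_loss, patience, bad_evals = w.copy(), val_loss(w), 10, 0
for step in range(1, 1001):
    w = train_step(w)
    loss = val_loss(w)
    if loss < best_loss:
        best_w, best_loss, bad_evals = w.copy(), loss, 0
    else:
        bad_evals += 1
    if bad_evals >= patience:
        print(f"stopping at step {step}; best validation loss {best_loss:.3f}")
        break
w = best_w                    # return the parameters from the best point
```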
Sparse Representations
\[
\underbrace{\begin{bmatrix} -14 \\ 1 \\ 19 \\ 2 \\ 23 \end{bmatrix}}_{y \in \mathbb{R}^m}
=
\underbrace{\begin{bmatrix}
 3 & -1 &  2 & -5 &  4 &  1 \\
 4 &  2 & -3 & -1 &  1 &  3 \\
-1 &  5 &  4 &  2 & -3 & -2 \\
 3 &  1 &  2 & -3 &  0 & -3 \\
-5 &  4 & -2 &  2 & -5 & -1
\end{bmatrix}}_{B \in \mathbb{R}^{m \times n}}
\underbrace{\begin{bmatrix} 0 \\ 2 \\ 0 \\ 0 \\ -3 \\ 0 \end{bmatrix}}_{h \in \mathbb{R}^n}
\qquad (7.47)
\]
In the first expression, we have an example of a sparsely parametrized linear regression model. In the second, we have linear regression with a sparse representation h of the data x. That is, h is a function of x that, in some sense, represents the information present in x, but does so with a sparse vector.
Representational regularization is accomplished by the same sorts of mechanisms that we have used in parameter regularization. Norm penalty regularization of representations is performed by adding to the loss function J a norm penalty on the representation, denoted Ω(h).
(Goodfellow 2016)
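A minimal sketch of how a sparse representation can be obtained in practice: minimize a squared reconstruction error plus an L1 penalty on h with iterative soft-thresholding (ISTA). The matrix below is a random stand-in, not the B of equation (7.47).

```python
import numpy as np

# Minimal sketch: recover a sparse representation h of y under y = B h by
# minimizing 0.5*||y - B h||^2 + alpha*||h||_1 with iterative
# soft-thresholding (ISTA).  B is a random stand-in.
rng = np.random.default_rng(0)
m, n = 5, 6
B = rng.standard_normal((m, n))
h_true = np.array([0.0, 2.0, 0.0, 0.0, -3.0, 0.0])     # sparse, as in (7.47)
y = B @ h_true

alpha = 0.1
step = 1.0 / np.linalg.norm(B, 2) ** 2                 # safe step size (1/L)
h = np.zeros(n)
for _ in range(500):
    g = B.T @ (B @ h - y)                              # gradient of the fit term
    z = h - step * g
    h = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft threshold
print(np.round(h, 2))                                  # close to the sparse h_true
```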
Bagging
Original dataset
Figure 7.5: A cartoon depiction of how bagging works. Suppose we train an 8 detector on the dataset depicted above, containing an 8, a 6, and a 9. Suppose we make two different resampled datasets. The bagging training procedure is to construct each of these datasets by sampling with replacement. The first dataset omits the 9 and repeats the 8; on this dataset, the detector learns that a loop on top of the digit corresponds to an 8.
(Goodfellow 2016)
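A minimal sketch of bagging with ordinary least squares as the base model: train k copies on bootstrap resamples (sampling with replacement) and average their predictions. Data and model are illustrative stand-ins.

```python
import numpy as np

# Minimal sketch of bagging: k models, each fit on a bootstrap resample,
# with predictions averaged at test time.
rng = np.random.default_rng(0)
n, d, k = 100, 3, 10
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.standard_normal(n)

weights = []
for _ in range(k):
    idx = rng.integers(0, n, size=n)           # resample with replacement
    Xi, yi = X[idx], y[idx]
    w, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    weights.append(w)

x_test = rng.standard_normal(d)
ensemble_prediction = np.mean([x_test @ w for w in weights])
print(ensemble_prediction)
```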
Dropout
[Figure 7.6: the base network (inputs x1, x2; hidden units h1, h2; output y) and the ensemble of subnetworks obtained by removing non-output units from it.]
(Goodfellow 2016)
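A minimal sketch of dropout applied to one layer of activations, using the common "inverted dropout" variant in which surviving units are rescaled at training time so the network is used unchanged at test time (this differs slightly from the weight-scaling rule described in the book).

```python
import numpy as np

# Minimal sketch of (inverted) dropout on one hidden layer: each unit is kept
# with probability keep_prob at training time and rescaled, so test-time
# inference uses the full network as-is.
rng = np.random.default_rng(0)

def dropout(h, keep_prob=0.5, train=True):
    if not train:
        return h
    mask = rng.random(h.shape) < keep_prob      # sample a binary mask
    return h * mask / keep_prob                 # rescale surviving units

h = rng.standard_normal((4, 8))                 # a batch of hidden activations
print(dropout(h, keep_prob=0.5, train=True))
print(dropout(h, train=False))                  # test time: no units dropped
```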
Adversarial Examples
[Figure 7.8 panels: x, classified as y = "panda" with 57.7% confidence; + .007 × sign(∇x J(θ, x, y)), itself classified as "nematode" with 8.2% confidence; = x + ε sign(∇x J(θ, x, y)), classified as "gibbon" with 99.3% confidence.]
Figure 7.8: A demonstration of adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet. By adding an imperceptibly small vector whose elements are equal to the sign of the elements of the gradient of the cost function with respect to the input, we can change GoogLeNet's classification of the image. Reproduced with permission from Goodfellow et al. (2014b).
Training on adversarial examples is mostly intended to improve security, but can sometimes provide generic regularization.
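A minimal sketch of the fast gradient sign perturbation shown in Figure 7.8, x_adv = x + ε · sign(∇x J(θ, x, y)), applied to a tiny logistic-regression classifier so the input gradient has a closed form. All values are illustrative stand-ins.

```python
import numpy as np

# Minimal sketch of the fast gradient sign method:
#   x_adv = x + epsilon * sign(grad_x J(theta, x, y))
# for a logistic-regression model, where the input gradient is analytic.
rng = np.random.default_rng(0)
d = 10
w, b = rng.standard_normal(d), 0.0            # stand-in model parameters
x, y = rng.standard_normal(d), 1.0            # one example with label 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_x_loss(x, y):
    # d/dx of the cross-entropy loss of sigmoid(w.x + b) against label y
    return (sigmoid(w @ x + b) - y) * w

epsilon = 0.25                                # Figure 7.8 uses .007 on pixel-scale inputs
x_adv = x + epsilon * np.sign(grad_x_loss(x, y))

print("clean score      :", sigmoid(w @ x + b))
print("adversarial score:", sigmoid(w @ x_adv + b))   # pushed toward the wrong class
```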
Tangent Propagation
[Figure 7.9: a point on the data manifold with its normal and tangent directions indicated; axes x1, x2.]
Figure 7.9: Illustration of the main idea of the tangent prop algorithm (Simard et al., 1992) and manifold tangent classifier (Rifai et al., 2011c), which both regularize the output function f(x) to change little as x moves along the manifold tangent directions.
(Goodfellow 2016)
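A minimal sketch of the tangent prop penalty: penalize the directional derivative of the output along known tangent vectors of the data manifold, Ω(f) = Σᵢ ((∇x f(x))ᵀ v⁽ⁱ⁾)². A linear model keeps the input gradient trivial; all values below are illustrative stand-ins.

```python
import numpy as np

# Minimal sketch of the tangent prop penalty
#   Omega(f) = sum_i ( (grad_x f(x))^T v_i )^2
# for a linear model f(x) = w.x, whose input gradient is simply w.
rng = np.random.default_rng(0)
d = 5
w = rng.standard_normal(d)                 # stand-in model parameters
x = rng.standard_normal(d)                 # an input on the data manifold
tangents = rng.standard_normal((2, d))     # known tangent directions at x

grad_f_x = w                               # grad_x f(x) for f(x) = w.x
omega = float(np.sum((tangents @ grad_f_x) ** 2))
print("tangent prop penalty:", omega)
```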