Lecture 11: Neural Nets 2
Neural Network

[Diagram: x (3072) → W1 → h (100) → W2 → s (10 class scores)]
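A minimal sketch of the two-layer network in the diagram, written in NumPy; the ReLU hidden nonlinearity and the weight initialization are assumptions, not specified on the slide.

    import numpy as np

    # Two-layer net from the diagram: x (3072) -> W1 -> h (100) -> W2 -> s (10)
    D, H, C = 3072, 100, 10
    W1 = 0.01 * np.random.randn(D, H)
    W2 = 0.01 * np.random.randn(H, C)

    x = np.random.randn(D)        # one flattened 32x32x3 image
    h = np.maximum(0, x @ W1)     # hidden layer (ReLU assumed)
    s = h @ W2                    # class scores for the 10 classes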
Preview [from recent Yann LeCun slides]

ImageNet (slide from Kaiming He’s recent presentation)
Working with CNNs in practice:
• Data augmentation
• Transfer learning
• Autoencoders
Data Augmentation

[Diagram (classification pipeline): load image ("cat") and label → CNN → compute loss]
Data Augmentation

[Diagram: load image ("cat") and label → transform image → CNN → compute loss]
Data Augmentation

1. Horizontal flips
Data Augmentation

2. Random crops/scales

Training: sample random crops / scales

ResNet:
1. Pick random L in range [256, 480]
2. Resize training image, short side = L
3. Sample random 224 x 224 patch
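A minimal sketch of the ResNet-style training-time crop/scale sampling above (with a horizontal flip from augmentation 1), assuming PIL images; the helper name is illustrative.

    import random
    from PIL import Image

    def random_crop_scale(img, crop=224, scale_range=(256, 480)):
        # 1. Pick random L in [256, 480]
        L = random.randint(*scale_range)
        # 2. Resize so the short side equals L
        w, h = img.size
        if w < h:
            img = img.resize((L, int(h * L / w)))
        else:
            img = img.resize((int(w * L / h), L))
        # 3. Sample a random 224 x 224 patch
        w, h = img.size
        left = random.randint(0, w - crop)
        top = random.randint(0, h - crop)
        patch = img.crop((left, top, left + crop, top + crop))
        # Optional: random horizontal flip (augmentation 1)
        if random.random() < 0.5:
            patch = patch.transpose(Image.FLIP_LEFT_RIGHT)
        return patch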
Data Augmentation

3. Color jitter
Simple: randomly jitter contrast
Complex: apply PCA to all [R, G, B] pixels in the training set

Random mix/combinations of:
- translation
- rotation
- stretching
- shearing
- lens distortions, … (go crazy)
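A minimal sketch of combining color jitter with random geometric transforms using torchvision; the specific parameter values are illustrative, not from the slides.

    import torchvision.transforms as T

    # Illustrative augmentation pipeline for PIL images: color jitter plus
    # random translation / rotation / stretching (shear) and a flip.
    augment = T.Compose([
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
        T.RandomAffine(degrees=15, translate=(0.1, 0.1),
                       scale=(0.8, 1.2), shear=10),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
    ])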
Data Augmentation: Takeaway
Transfer Learning
Transfer Learning with CNNs

1. Train on ImageNet
2. Small dataset: use the CNN as a feature extractor
Freeze these (the pretrained layers); train this (a new top classifier layer).
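A minimal PyTorch/torchvision sketch of the small-dataset case; the pretrained model choice (ResNet-18) and the 10-class output layer are assumptions.

    import torch.nn as nn
    import torchvision.models as models

    model = models.resnet18(pretrained=True)   # 1. trained on ImageNet
                                               # (newer torchvision uses weights=... instead)

    for p in model.parameters():               # "Freeze these": no gradients for
        p.requires_grad = False                # the pretrained layers

    model.fc = nn.Linear(model.fc.in_features, 10)   # "Train this": new top layer
                                                     # (10 classes is an assumption)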
3. Medium dataset: finetuning
More data = retrain more of the network (or all of it).
Freeze these (the early layers); train this (the later layers and the new top layer).

Tip: use only ~1/10th of the original learning rate when finetuning the top layer, and ~1/100th on intermediate layers.
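A minimal sketch of the medium-dataset case: fine-tune later layers with reduced learning rates, roughly matching the tip above (~1/10th on the top layer, ~1/100th on intermediate layers); the base rate, layer grouping, and optimizer settings are illustrative.

    import torch.nn as nn
    import torch.optim as optim
    import torchvision.models as models

    model = models.resnet18(pretrained=True)          # 1. trained on ImageNet
    model.fc = nn.Linear(model.fc.in_features, 10)    # new top layer for our classes

    base_lr = 0.1   # illustrative "original" learning rate

    # Earlier layers are left out of the optimizer (effectively frozen);
    # intermediate layers get ~1/100th of the rate, the new top layer ~1/10th.
    optimizer = optim.SGD([
        {"params": model.layer3.parameters(), "lr": base_lr / 100},
        {"params": model.layer4.parameters(), "lr": base_lr / 100},
        {"params": model.fc.parameters(), "lr": base_lr / 10},
    ], lr=base_lr / 100, momentum=0.9)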
CNN Features off-the-shelf: an Astounding Baseline for Recognition [Razavian et al., 2014]

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition [Donahue*, Jia*, et al., 2013]
[Table: choosing a transfer-learning strategy for a very similar dataset vs. a very different dataset; earlier features are more generic]
Overview

                       Caffe      Torch      Theano   TensorFlow
Readable source code   Yes (C++)  Yes (Lua)  No       No
Good at RNN            No         Mediocre   Yes      Yes (best)
Supervised vs Unsupervised

Supervised Learning: data is (x, y), where x is data and y is label.
Unsupervised Learning: data is just x, no labels!
Unsupervised Learning
• Autoencoders
• Traditional: feature learning
Autoencoders

[Diagram: input data x → encoder → features z]

Encoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN.

z is usually smaller than x (dimensionality reduction).
Autoencoders

[Diagram: input data x → encoder → features z → decoder → reconstructed input data x̂]
41
Originally: Linear +
Reconstructed
input data
xx
Decoder Encoder: 4-layer conv
Decoder: 4-layer upconv
Features z
Encoder
Input data x
42
Encoder and decoder sometimes share weights.
Example: dim(x) = D, dim(z) = H, w_e: H x D, w_d: D x H = w_e^T.

Train for reconstruction with no labels!
Autoencoders

Loss function (often L2): ||x̂ - x||²

Train for reconstruction with no labels!
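A minimal PyTorch sketch of training an autoencoder with an L2 reconstruction loss and no labels; the sizes, architecture, and optimizer are illustrative, and the decoder weights are not tied to the encoder's here.

    import torch
    import torch.nn as nn

    D, H = 784, 64                              # dim(x) = D, dim(z) = H (illustrative)
    encoder = nn.Sequential(nn.Linear(D, H), nn.ReLU())
    decoder = nn.Linear(H, D)

    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)

    x = torch.rand(32, D)                       # a batch of unlabeled data
    z = encoder(x)                              # features z
    x_hat = decoder(z)                          # reconstructed input x̂
    loss = ((x_hat - x) ** 2).mean()            # L2 reconstruction loss, no labels
    opt.zero_grad(); loss.backward(); opt.step()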
After training, throw away the decoder!
Use the encoder to initialize a supervised model: put a classifier on top of features z to predict a label ŷ (bird, plane, dog, deer, truck, ...), with a loss function (softmax, etc.).

Train for the final task (sometimes with small data); fine-tune the encoder jointly with the classifier.
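Continuing the sketch above: throw away the decoder, put a classifier on top of features z, and fine-tune the encoder jointly with the classifier using a softmax (cross-entropy) loss; the class count and the random labels are illustrative.

    classifier = nn.Linear(H, 10)                     # predicts label ŷ from features z
    model = nn.Sequential(encoder, classifier)        # decoder is thrown away

    opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # fine-tune encoder + classifier jointly

    y = torch.randint(0, 10, (32,))                   # (possibly small) labeled data
    loss = nn.functional.cross_entropy(model(x), y)   # softmax loss for the final task
    opt.zero_grad(); loss.backward(); opt.step()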
Autoencoders

Autoencoders can reconstruct data, and can learn features to initialize a supervised model.