Lecture 6
James Zou
10/12/16
Recap: architectures
• Feedforward
• Convnets
• RNN, LSTM
Learning a nonlinear mapping from inputs to outputs. Predicting: TF binding, gene expression, disease status from images, risk from SNPs, protein structure, …
How to train your neural network
Regularization—prevent overfitting
Optimization—overcome underfitting
How to train your neural network
Regularization—prevent overfitting
• Early stopping
• L2 regularization (aka weight decay)
• Multi-task learning; data augmentation
• Dropout
Optimization—overcome underfitting
• SGD, SGD with momentum
• RMSProp
Empirical loss vs true loss
Early Stopping
Split the entire dataset into train, validation, and test sets.
[Figure: training error and validation error vs. # of steps; the validation error flattens and then rises, marking the point at which to stop training.]
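A minimal sketch of the early-stopping rule, assuming hypothetical `train_step` and `eval_loss` callables (one parameter update and one validation-set evaluation); the exact stopping criterion used in the lecture is not specified, so a simple "patience" rule is shown.

```python
# Early stopping with patience: stop once the validation error has not
# improved for several consecutive evaluations.
import numpy as np

def train_with_early_stopping(train_step, eval_loss, max_steps=10_000,
                              eval_every=100, patience=10):
    best_val, best_step, bad_evals = np.inf, 0, 0
    for step in range(max_steps):
        train_step()                         # one update on the training set
        if step % eval_every == 0:
            val = eval_loss()                # error on the held-out validation set
            if val < best_val:
                best_val, best_step, bad_evals = val, step, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:    # validation error stopped improving
                    break                    # -> stop training here
    return best_step, best_val
```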
Regularization—prevent overfitting
• Early stopping
• L2 regularization (aka weight decay)
• Multi-task learning; data augmentation
• Dropout
Optimization—overcome underfitting
• SGD, SGD with momentum
• RMSProp
Weight decay
Optimize the loss plus a regularization penalty:
\theta^* = \arg\min_\theta \sum_i L(f(x_i, \theta), y_i) + \lambda \|\theta\|_2^2
In gradient descent:
g_t = \nabla_\theta \sum_i L(f(x_i, \theta_t), y_i) + 2\lambda \theta_t
\theta_{t+1} = \theta_t - \epsilon \cdot g_t = (1 - 2\epsilon\lambda)\,\theta_t - \epsilon \nabla_\theta \sum_i L(f(x_i, \theta_t), y_i)
Each update first shrinks (decays) the weights by the factor (1 - 2\epsilon\lambda), then takes the usual gradient step.
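A small numpy sketch of the decayed update above; the `grad_loss` callable, learning rate, and value of lambda are illustrative assumptions.

```python
# One gradient-descent step with L2 weight decay.
# grad_loss (hypothetical) returns the gradient of the data loss at theta.
import numpy as np

def weight_decay_step(theta, grad_loss, lr=0.1, lam=1e-3):
    # theta_{t+1} = (1 - 2*lr*lam) * theta_t - lr * grad of the data loss
    g = grad_loss(theta) + 2.0 * lam * theta
    return theta - lr * g

# Example: data loss 0.5*||theta - 1||^2, whose gradient is (theta - 1).
theta = np.zeros(3)
for _ in range(200):
    theta = weight_decay_step(theta, lambda th: th - 1.0)
print(theta)  # settles slightly below 1: the penalty pulls the weights toward 0
```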
Increase your training set: multitask learning
[Figure: a network in which a shared representation h_shared feeds hidden units h1, h2, h3, which in turn feed the task-specific predictions y1 and y2.]
Because the tasks share the lower layers, the shared representation leverages all the data.
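A small numpy sketch of the shared-trunk architecture in the figure; the layer sizes and names (W_shared, W_task1, W_task2) are assumptions for illustration.

```python
# Multitask network: one shared hidden layer feeding two task-specific heads.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 20, 32

W_shared = rng.normal(0, 0.1, (d_in, d_hidden))   # trained on data from all tasks
W_task1 = rng.normal(0, 0.1, (d_hidden, 1))       # head producing y1
W_task2 = rng.normal(0, 0.1, (d_hidden, 1))       # head producing y2

def forward(x):
    h_shared = np.maximum(0.0, x @ W_shared)       # ReLU shared representation
    return h_shared @ W_task1, h_shared @ W_task2  # task-specific predictions

y1, y2 = forward(rng.normal(size=(5, d_in)))       # a batch of 5 examples
```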
Increase your training set: data augmentation
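One hedged example of data augmentation for genomic inputs (an assumption, since the slide's own example did not survive extraction): for one-hot encoded DNA, the reverse complement is an equally valid training sequence and doubles the effective training set.

```python
# Reverse-complement augmentation for one-hot DNA (columns ordered A, C, G, T).
import numpy as np

def reverse_complement(onehot):
    # Reverse the positions and swap A<->T, C<->G by reversing the channel axis.
    return onehot[::-1, ::-1]

seq = np.eye(4)[np.array([0, 1, 3, 2])]               # "ACTG" as one-hot
augmented = np.stack([seq, reverse_complement(seq)])  # two examples from one
```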
Dropout
During training, each hidden unit is randomly dropped (its activation set to zero) with some probability on every forward pass; at test time the full network is used.
Dropout is approximately training and averaging an exponentially large ensemble of networks.
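A minimal numpy sketch of (inverted) dropout on one layer of activations; `keep_prob` and the train-time rescaling convention are assumptions, since the slide does not fix them.

```python
# Inverted dropout: zero units at training time and rescale so the expected
# activation matches the full network used at test time.
import numpy as np

def dropout(h, keep_prob=0.5, train=True, rng=np.random.default_rng(0)):
    if not train:
        return h                                  # full network at test time
    mask = rng.random(h.shape) < keep_prob        # keep each unit w.p. keep_prob
    return h * mask / keep_prob                   # rescale the surviving units

h = np.ones((2, 5))
print(dropout(h))                # some units zeroed, the rest scaled by 1/keep_prob
print(dropout(h, train=False))   # unchanged at test time
```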
Summary: regularization
Regularization—prevent overfitting
• Early stopping
• L2 regularization (aka weight decay)
• Multi-task learning; data augmentation
• Dropout
Optimization—overcome underfitting
• SGD, SGD with momentum
• RMSProp
Stochastic gradient descent
For t = 1, 2, …:
Sample a minibatch \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\} with labels y^{(i)}.
g_t = \frac{1}{m} \nabla_\theta \sum_i L(f(x^{(i)}, \theta_t), y^{(i)})
\theta_{t+1} = \theta_t - \epsilon\, g_t
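A small numpy sketch of minibatch SGD on a toy least-squares problem; the data, loss, batch size, and step size are placeholders, not the lecture's.

```python
# Minibatch SGD: sample a batch, compute the gradient estimate, step downhill.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_theta + 0.1 * rng.normal(size=1000)

theta, lr, m = np.zeros(5), 0.05, 32
for t in range(2000):
    idx = rng.integers(0, len(X), size=m)        # sample a minibatch of m examples
    Xb, yb = X[idx], y[idx]
    g = Xb.T @ (Xb @ theta - yb) / m             # gradient of (1/2m)||Xb theta - yb||^2
    theta -= lr * g                              # SGD update
print(theta)                                     # close to true_theta
```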
SGD with momentum
Sample a minibatch \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\} with labels y^{(i)}.
g_t = \frac{1}{m} \nabla_\theta \sum_i L(f(x^{(i)}, \theta_t), y^{(i)})
v_{t+1} = \alpha v_t - \epsilon\, g_t
\theta_{t+1} = \theta_t + v_{t+1}
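A small sketch of the momentum update itself; the momentum coefficient alpha and the toy quadratic objective are illustrative assumptions.

```python
# Momentum update: v_{t+1} = alpha*v_t - lr*g_t ; theta_{t+1} = theta_t + v_{t+1}.
import numpy as np

def momentum_step(theta, v, grad, lr=0.05, alpha=0.9):
    v = alpha * v - lr * grad(theta)
    return theta + v, v

# Example on f(theta) = 0.5*||theta||^2, whose gradient is theta.
theta, v = np.full(3, 10.0), np.zeros(3)
for _ in range(200):
    theta, v = momentum_step(theta, v, lambda th: th)
print(theta)   # the velocity accumulates past gradients and carries theta toward 0
```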
What are the limitations of gradient-based methods?
One limitation: gradients can be very differently scaled across parameters, so a single global learning rate fits poorly. RMSProp adapts the step size per parameter:
Sample a minibatch \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\} with labels y^{(i)}.
g_t = \frac{1}{m} \nabla_\theta \sum_i L(f(x^{(i)}, \theta_t), y^{(i)})
r_t = \rho\, r_{t-1} + (1 - \rho)\, g_t \odot g_t
\Delta\theta_t = -\frac{\epsilon}{\sqrt{\delta + r_t}} \odot g_t
\theta_{t+1} = \theta_t + \Delta\theta_t
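A small numpy sketch of the RMSProp update above; rho, delta, the learning rate, and the toy objective are illustrative choices.

```python
# RMSProp: keep a running average r of squared gradients and divide the step
# by sqrt(delta + r), giving each parameter its own effective step size.
import numpy as np

def rmsprop_step(theta, r, grad, lr=0.01, rho=0.9, delta=1e-8):
    g = grad(theta)
    r = rho * r + (1.0 - rho) * g * g             # running average of g^2
    theta = theta - lr * g / np.sqrt(delta + r)   # per-parameter scaled step
    return theta, r

# Example with badly scaled gradients: 100x larger in the first coordinate.
theta, r = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(500):
    theta, r = rmsprop_step(theta, r, lambda th: np.array([100.0, 1.0]) * th)
print(theta)   # both coordinates shrink at a comparable rate despite the scaling
```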
Example: DeepBind
DeepBind optimization
Objective function:
\theta^* = \arg\min_\theta \sum_i L(f(x_i, \theta), y_i) + \lambda \|\theta\|_2^2
Initialization: weights drawn from a Gaussian N(0, \sigma^2); other settings sampled from fixed ranges [a, b].
Dropout.
Hyperparameter optimization.
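A hedged sketch of random hyperparameter search in the spirit of the last bullet; the parameter names, ranges, and the `train_and_validate` callable are assumptions, not DeepBind's exact calibration settings.

```python
# Random search: sample hyperparameter settings, train a model for each,
# and keep the setting with the best validation loss.
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparams():
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),   # log-uniform range
        "momentum": rng.uniform(0.5, 0.99),
        "weight_decay": 10 ** rng.uniform(-6, -2),    # lambda in the objective
        "dropout_rate": rng.uniform(0.0, 0.75),
        "init_std": 10 ** rng.uniform(-3, -1),        # sigma for the N(0, sigma^2) init
    }

def random_search(train_and_validate, n_trials=30):
    # train_and_validate(hp) -> validation loss (hypothetical callable)
    best_loss, best_hp = np.inf, None
    for _ in range(n_trials):
        hp = sample_hyperparams()
        loss = train_and_validate(hp)
        if loss < best_loss:
            best_loss, best_hp = loss, hp
    return best_loss, best_hp
```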