Deep Learning Hands-On
Imry Kissos
Deep Learning Meetup
TLV August 2015
Outline
● Problem Definition
● Training a DNN
Deep Convolution Network
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Tutorial
● Goal: Detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available:
https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
[Diagram, shown in three steps: labeled face images train the model; the trained model then predicts landmarks such as the "Mouth Corners" on new images]
Python Deep Learning Framework
[Diagram: the framework stack, from low level up to the high-level API]
Training a Deep Neural Network
1. Data Analysis
a. Exploration + Validation
b. Pre-Processing
c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image (?)
Data validation:
● Some landmarks are missing
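A minimal loading-and-validation sketch in the spirit of the tutorial's kfkd.py, assuming the Kaggle Facial Keypoints training.csv (each row stores the image as a space-separated pixel string, so df.count() exposes the missing landmarks):

    import numpy as np
    import pandas as pd

    def load(fname='training.csv'):
        df = pd.read_csv(fname)
        # each image is a space-separated string of 96*96 gray values
        df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
        print(df.count())    # per-column counts reveal the missing landmarks
        df = df.dropna()     # simplest validation policy: keep complete rows only
        X = np.vstack(df['Image'].values)   # shape (n_samples, 9216)
        y = df[df.columns[:-1]].values      # 30 values: 15 landmarks * (x, y)
        return X.astype(np.float32), y.astype(np.float32)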
Pre-Processing
Data Normalization
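The tutorial scales the inputs to [0, 1] and the target coordinates to [-1, 1]; a minimal sketch of that normalization:

    X = X / 255.       # pixel intensities: [0, 255] -> [0, 1]
    y = (y - 48) / 48  # coordinates: [0, 96] -> [-1, 1]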
Batch
One epoch's data is split into:
- train batches
- validation batches
- test batches
The train / validation / test splits are constant.
Train / Validation Split
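nolearn can hold out the validation split for you; done by hand with scikit-learn it would look like the sketch below (module path as of 2015), where a fixed seed keeps the split constant across runs:

    from sklearn.cross_validation import train_test_split

    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=42)  # fixed seed -> constant split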
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
a. Layers Definition
b. Layers Implementation
3. Optimization
4. Training
Architecture
[Diagram: the network maps the input image X to the landmark coordinates Y]
Layers Definition
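A condensed net definition with nolearn/Lasagne, close in spirit to the tutorial's convolutional net (the exact layer sizes here are illustrative):

    from lasagne import layers
    from lasagne.updates import nesterov_momentum
    from nolearn.lasagne import NeuralNet

    net = NeuralNet(
        layers=[
            ('input',   layers.InputLayer),
            ('conv1',   layers.Conv2DLayer),
            ('pool1',   layers.MaxPool2DLayer),
            ('hidden',  layers.DenseLayer),
            ('dropout', layers.DropoutLayer),
            ('output',  layers.DenseLayer),
        ],
        input_shape=(None, 1, 96, 96),   # gray-scale 96x96 images
        conv1_num_filters=32, conv1_filter_size=(3, 3),
        pool1_pool_size=(2, 2),
        hidden_num_units=500,
        dropout_p=0.5,
        output_num_units=30,             # 15 landmarks, (x, y) each
        output_nonlinearity=None,        # linear output: this is regression
        update=nesterov_momentum,
        update_learning_rate=0.01,
        update_momentum=0.9,
        regression=True,
        max_epochs=400,
        verbose=1,
    )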
Activation Function
ReLU
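ReLU simply clips negative pre-activations to zero, elementwise:

    import numpy as np

    def relu(x):
        return np.maximum(0, x)   # max(0, x), elementwise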
Dense Layer
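A dense (fully connected) layer is an affine transform followed by the activation; a numpy sketch:

    import numpy as np

    def dense(x, W, b, nonlinearity=lambda z: np.maximum(0, z)):
        return nonlinearity(W.dot(x) + b)   # y = f(Wx + b)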
Dropout
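At training time dropout zeroes each unit with probability p; the "inverted" variant below also rescales the survivors so nothing changes at test time (a sketch of what Lasagne's DropoutLayer does):

    import numpy as np

    def dropout(x, p=0.5):
        mask = np.random.rand(*x.shape) > p   # keep each unit with prob 1 - p
        return x * mask / (1 - p)             # rescale so the expectation is unchanged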
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
a. Back Propagation
b. Objective
c. SGD
d. Updates
e. Convergence Tuning
4. Training the DNN
Back Propagation
1. Forward path: the input X flows through the Conv and Dense layers to the output points
2. The output points are compared with the training points (the targets Y)
3. Backward path: the error gradient flows back through the Dense and Conv layers
4. Update: for all layers, the weights are adjusted using their gradients
Objective
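The learning curves later in the deck are plotted in RMSE, consistent with a mean-squared-error objective over the predicted landmark coordinates:

    L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \lVert \hat{y}_i(\theta) - y_i \rVert^2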
SGD: updates the network after each batch
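With learning rate \eta and momentum \mu, each batch applies roughly (classical momentum shown; the tutorial's nesterov_momentum evaluates the gradient at a look-ahead point):

    v_{t+1} = \mu v_t - \eta \nabla_w L(w_t), \qquad w_{t+1} = w_t + v_{t+1}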
Adjusting Learning Rate & Momentum
Linear in epoch
(Animation: Alec Radford)
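The tutorial implements the linear schedule as an on_epoch_finished callback that interpolates between a start and a stop value over max_epochs; a sketch along those lines (it assumes the hyper-parameter was created as a Theano shared variable):

    import numpy as np

    class AdjustVariable(object):
        """Linearly anneal a shared hyper-parameter over the epochs."""
        def __init__(self, name, start=0.03, stop=0.0001):
            self.name, self.start, self.stop = name, start, stop
            self.ls = None

        def __call__(self, nn, train_history):
            if self.ls is None:
                self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
            epoch = train_history[-1]['epoch']
            getattr(nn, self.name).set_value(np.float32(self.ls[epoch - 1]))

    # e.g. on_epoch_finished=[AdjustVariable('update_learning_rate', 0.03, 0.0001),
    #                         AdjustVariable('update_momentum', 0.9, 0.999)]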
Convergence Tuning
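One common tuning device, also used in the tutorial, is early stopping: halt when the validation loss has not improved for `patience` epochs. A sketch as a nolearn callback:

    import numpy as np

    class EarlyStopping(object):
        def __init__(self, patience=100):
            self.patience = patience
            self.best_valid = np.inf
            self.best_valid_epoch = 0

        def __call__(self, nn, train_history):
            current_valid = train_history[-1]['valid_loss']
            current_epoch = train_history[-1]['epoch']
            if current_valid < self.best_valid:
                self.best_valid = current_valid
                self.best_valid_epoch = current_epoch
            elif self.best_valid_epoch + self.patience < current_epoch:
                print('Early stopping: best valid loss {:.6f} at epoch {}.'.format(
                    self.best_valid, self.best_valid_epoch))
                raise StopIteration()   # nolearn catches this and stops fitting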
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
a. Fit
b. Fine Tune Pre-Trained
c. Learning Curves
Fit
[Diagram: each epoch runs Forward+BackProp over the train batches and Forward only over the validation batches]
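Training then reduces to one call; `X_test` below is illustrative:

    X, y = load()                 # data-analysis step from earlier
    net.fit(X, y)                 # runs forward + backprop per train batch
    y_pred = net.predict(X_test)  # forward pass only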
Fine Tune Pre-Trained
[Plot: RMSE vs. epochs when training resumes from pre-trained weights]
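With nolearn, fine-tuning means starting a new fit from another net's learned weights instead of from a random initialization; a sketch (method names as in the tutorial-era nolearn API):

    # copy the parameters from a previously trained net ...
    pretrained = pretrained_net.get_all_params_values()
    new_net.load_params_from(pretrained)
    # ... then continue training, often with a lower learning rate and fewer epochs
    new_net.fit(X, y)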
Learning Curves Analysis
[Two RMSE-vs-epochs plots: Net 1 converges, with some jittering; Net 2 overfits, its validation curve pulling away from its train curve]
Part 1 Summary
Training a DNN: Data Analysis → Architecture Engineering → Optimization → Training the DNN
Part 1 End
Break
Part 2
Beyond Training
Outline
● Problem Definition
● Motivation
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Beyond Training
1. Improving the DNN
a. Analysis Capabilities
b. Augmentation
c. Forward - Backward Path
d. Monitor Layers’ Training
2. Open Source Packages
3. Summary
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: Theory ↔ Practice
Reduce Overfitting
[Plot: Net 2's validation RMSE diverges from its train RMSE over the epochs: overfitting]
Solution: Data Augmentation
Data Augmentation
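The cheapest augmentation for faces is a horizontal flip: mirror the image and negate the normalized x coordinates. A simplified sketch (a full version must also swap the symmetric left/right landmark indices, as the tutorial's FlipBatchIterator does):

    import numpy as np

    def flip_half_batch(Xb, yb):
        """Xb: (n, 1, 96, 96); yb in [-1, 1] with x at the even indices."""
        idx = np.random.choice(len(Xb), len(Xb) // 2, replace=False)
        Xb[idx] = Xb[idx, :, :, ::-1]   # mirror the image columns
        yb[idx, ::2] = -yb[idx, ::2]    # negate the x coordinates
        return Xb, yb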
Advanced Augmentation
http://benanne.github.io/2015/03/17/plankton.html
Convergence Challenges
[Two RMSE-vs-epochs plots of stalled training: one caused by a normalization problem, one by a data error]
Need to monitor the forward + backward path
Forward - Backward Path
Forward: the layer activations
Backward: the gradients w.r.t. the parameters
Monitor Layers’ Training
nolearn - visualize.py
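Typical usage of nolearn's plotting helpers (function names per the nolearn version of the time):

    from nolearn.lasagne.visualize import plot_conv_weights, plot_conv_activity

    plot_conv_weights(net.layers_['conv1'])           # the learned conv1 filters
    plot_conv_activity(net.layers_['conv1'], X[0:1])  # conv1 activations for one image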
Monitor Layers’ Training
X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks:
“Monitoring activation and gradients across layers and training
iterations is a powerful investigation tool”
Weight Initialization matters (1)
Layer 1: gradients are close to zero (vanishing gradients)
Weight Initialization matters (2)
Network returns close to zero values for all inputs
Monitoring Activation
Plateaus are sometimes seen when training neural networks: for most epochs the network returns a close-to-zero output for all inputs.
[Two plots vs. epoch: the max network output (scale ~1e-1) and the max of the updates of Conv1 (scale 1e-3 to 3e-3)]
http://cs231n.github.io/neural-networks-3/#baby
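A hypothetical callback for reproducing the second plot, tracking the max absolute weight update of conv1 per epoch:

    import numpy as np

    class MonitorUpdates(object):
        """Record max |delta W| of one layer after each epoch (illustrative)."""
        def __init__(self, layer_name='conv1'):
            self.layer_name, self.prev, self.history = layer_name, None, []

        def __call__(self, nn, train_history):
            W = nn.layers_[self.layer_name].W.get_value()
            if self.prev is not None:
                self.history.append(np.abs(W - self.prev).max())
            self.prev = W.copy()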
Beyond Training
1. Improving the DNN
2. Open Source Packages
a. Hardware and OS
b. Python Framework
c. Deep Learning Open Source Packages
d. Effort Estimation
3. Summary
Hardware and OS
● Amazon Cloud GPU:
AWS Lasagne GPU Setup
Spot ~ $0.0031 per GPU Instance Hour
● IBM Cloud GPU:
http://www-03.ibm.com/systems/platformcomputing/products/symphony/gpuharvesting.html
● Your Linux machine GPU:
pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
● Windows install:
http://deeplearning.net/software/theano/install_windows.html#install-windows
Starting Tips
● Sanity Checks:
○ DNN Architecture: “Overfit a tiny subset of data” (Karpathy)
○ Check: regularization ↗ ⇒ training loss ↗
● Use a pre-trained VGG as a baseline
● Start with ~3 conv layers with ~16 filters each, and iterate quickly
Python
● Rich eco-system
● State-of-the-art
● Easy to port from prototype to production
Podcast: http://www.reversim.com/2015/10/277-scientific-python.html
Python Deep Learning Framework
Deep Learning
Open Source Packages
Open source progresses rapidly → impossible to predict the industry's standard
Caffe for applications
Torch and Theano for research on Deep Learning itself
http://fastml.com/torch-vs-theano/
References
Hinton's Coursera Neural Networks course
https://www.coursera.org/course/neuralnets
Technion Deep Learning course
http://moodle.technion.ac.il/course/view.php?id=4128
Oxford Deep Learning course
https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
CS231n CNN for Visual Recognition
http://cs231n.github.io/
Deep Learning Book
http://www.iro.umontreal.ca/~bengioy/dlbook/
Montreal DL summer school
http://videolectures.net/deeplearning2015_montreal/
Questions?
Deep Convolution Regression Network