UDRC RNN LSTM Libraries Tutorial
Muhammad Awais
[email protected]
Outline
- Recurrent Neural Networks (RNN)
- Long Short Term Memory (LSTM)
- Deep Learning Libraries: Caffe, Torch, Theano, TensorFlow
Recurrent Neural Networks

We can process a sequence of input vectors x by applying a recurrence at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.

Notice: the same function and the same set of parameters are used at every time step.
Recurrent Neural Networks

In a vanilla RNN the state consists of a single "hidden" vector h:

h_t = tanh(W_hh * h_{t-1} + W_xh * x_t)
y_t = W_hy * h_t
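A minimal numpy sketch of this recurrence; the matrix names follow the notation used in these slides (Wxh, Whh), and the sizes are arbitrary toy choices:

import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why, bh, by):
    # update the hidden state with the same weights at every time step
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
    # compute output scores from the new hidden state
    y = Why @ h + by
    return h, y

# toy sizes: 4-dim input, 8-dim hidden state
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((8, 4)) * 0.01
Whh = rng.standard_normal((8, 8)) * 0.01
Why = rng.standard_normal((4, 8)) * 0.01
bh, by = np.zeros(8), np.zeros(4)

h = np.zeros(8)
for x in np.eye(4):          # a dummy sequence of one-hot inputs
    h, y = rnn_step(x, h, Wxh, Whh, Why, bh, by)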
Recurrent Neural Networks

Character-level language model example
Vocabulary: [h, e, l, o]
Example training sequence: "hello"

[Diagram: each character is fed in as a one-hot input vector x; the hidden layer is updated at every step and the output layer y gives scores over the vocabulary for the next character.]
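A minimal numpy sketch of one forward pass of this character-level model, assuming one-hot inputs over the vocabulary [h, e, l, o]; the weights are random placeholders, not a trained model:

import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

rng = np.random.default_rng(1)
Wxh = rng.standard_normal((8, 4)) * 0.01   # input-to-hidden
Whh = rng.standard_normal((8, 8)) * 0.01   # hidden-to-hidden
Why = rng.standard_normal((4, 8)) * 0.01   # hidden-to-output

h = np.zeros(8)
for ch, target in zip("hell", "ello"):     # each input character should predict the next one
    h = np.tanh(Wxh @ one_hot(ch) + Whh @ h)
    scores = Why @ h                        # unnormalized scores over [h, e, l, o]
    probs = np.exp(scores) / np.exp(scores).sum()
    loss = -np.log(probs[char_to_ix[target]])   # cross-entropy loss for this step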
Recurrent Neural Networks

Image Captioning

[Diagram: a test image is summarized by a feature vector v, which will condition the RNN; generation starts from a special <START> token as the first input x0.]
Recurrent Neural Networks

To condition the recurrence on the test image, its feature vector v enters the hidden-state update through an extra weight matrix Wih:

before: h = tanh(Wxh * x + Whh * h)
now:    h = tanh(Wxh * x + Whh * h + Wih * v)

The first input x0 is the <START> token, and the first output y0 gives scores for the first word of the caption.
Recurrent Neural Networks

At test time the caption is generated by sampling: sample a word from y0 (here "straw"), feed it back in as the next input x1, compute h1 and y1, sample again ("hat"), and repeat until the special <END> token is sampled, which finishes the caption.
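A hedged numpy sketch of this generation loop; the tiny vocabulary, the random weights, and the stand-in image feature v are all dummy placeholders rather than a trained model:

import numpy as np

words = ['<START>', 'straw', 'hat', '<END>']   # toy vocabulary
rng = np.random.default_rng(2)
Wxh, Whh, Wih = (rng.standard_normal(s) * 0.01 for s in [(8, 4), (8, 8), (8, 16)])
Why = rng.standard_normal((4, 8)) * 0.01
v = rng.standard_normal(16)                    # stand-in for the image feature vector

def one_hot(i, n=4):
    e = np.zeros(n)
    e[i] = 1.0
    return e

h = np.zeros(8)
token = 0                                      # start from <START>
caption = []
for _ in range(20):                            # hard cap on caption length
    h = np.tanh(Wxh @ one_hot(token) + Whh @ h + Wih @ v)   # image-conditioned update
    scores = Why @ h
    probs = np.exp(scores) / np.exp(scores).sum()
    token = rng.choice(len(words), p=probs)    # sample the next word
    if words[token] == '<END>':
        break
    caption.append(words[token])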
Recurrent Neural Networks
Image Sentence Datasets
Microsoft COCO
[Tsung-Yi Lin et al. 2014]
mscoco.org
currently: ~120K images, ~5 sentences each
Recurrent Neural Networks

RNNs can be stacked into multiple layers: the recurrence unrolls along the time axis, and the recurrent layers stack along the depth axis. The same layout applies when each recurrent layer is an LSTM.
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]

At each time step the LSTM takes the vector from below (x, the input) and the vector from before (h, the previous hidden state), each of size n, and passes their concatenation through a single weight matrix W of size 4n x 2n, giving a 4n-dimensional vector that is split into four gates:

i = sigmoid(.)   input gate
f = sigmoid(.)   forget gate
o = sigmoid(.)   output gate
g = tanh(.)      candidate update

i.e. [i; f; o; g] is W [x; h] with sigmoid applied to the i, f, o blocks and tanh to the g block.
Long Short Term Memory (LSTM)

The cell state c is updated additively and then gated to produce the new hidden state h (☉ denotes elementwise multiplication):

c_t = f ☉ c_{t-1} + i ☉ g
h_t = o ☉ tanh(c_t)

h_t goes to the higher layer (or to the prediction), and both h_t and c_t are carried forward to the next time step, where the same gating is applied again.
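A minimal numpy sketch of one LSTM step, following the single W of size 4n x 2n described above; the gate ordering (i, f, o, g) and the toy sizes are assumptions made for the sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # x and h_prev each have size n; W has shape (4n, 2n): all four gates in one matrix
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b        # 4n pre-activations
    i = sigmoid(z[0*n:1*n])                        # input gate
    f = sigmoid(z[1*n:2*n])                        # forget gate
    o = sigmoid(z[2*n:3*n])                        # output gate
    g = np.tanh(z[3*n:4*n])                        # candidate update
    c = f * c_prev + i * g                         # additive cell-state update
    h = o * np.tanh(c)                             # gated hidden state
    return h, c

# toy usage with n = 8
n = 8
rng = np.random.default_rng(3)
W = rng.standard_normal((4 * n, 2 * n)) * 0.01
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.standard_normal((5, n)):              # dummy 5-step input sequence
    h, c = lstm_step(x, h, c, W, b)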
Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don't work very well
- Common to use LSTM or GRU: their additive interactions improve gradient flow
- Backward flow of gradients in RNNs can explode or vanish. Exploding is controlled with gradient clipping (see the sketch after this list); vanishing is controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed
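A minimal numpy sketch of the gradient clipping mentioned above; the threshold of 5.0 is an arbitrary illustrative value:

import numpy as np

def clip_gradient(grad, max_norm=5.0):
    # rescale grad so its L2 norm never exceeds max_norm (global-norm clipping)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# e.g. an exploding gradient gets rescaled before the parameter update
g = np.full(10, 100.0)
g = clip_gradient(g)          # now has norm 5.0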
Deep Learning Libraries
Caffe, Torch, Theano, TensorFlow
Caffe
http://caffe.berkeleyvision.org
Caffe overview

Protocol Buffers: a "typed JSON" format from Google. A .proto file defines the message types, and the protobuf compiler generates classes (e.g. a Java class) for reading and writing them.
https://developers.google.com/protocol-buffers/

All Caffe proto types are defined in caffe.proto, with good documentation:
https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto
Caffe
Training / Finetuning
No need to write code!
1. Convert data (run a script)
2. Define net (edit prototxt)
3. Define solver (edit prototxt)
4. Train (with pretrained weights) (run a script)
Caffe

Each layer's param blocks set per-parameter learning rates and regularization, separately for the weight and the bias. Set these multipliers to 0 to freeze a layer.
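A sketch of how this looks inside a layer's prototxt; the layer name and output size are illustrative, and lr_mult / decay_mult are the learning-rate and regularization multipliers for the weights and the bias:

layer {
  name: "fc6"                             # illustrative layer name
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param { lr_mult: 0 decay_mult: 0 }      # weights: frozen
  param { lr_mult: 0 decay_mult: 0 }      # bias: frozen
  inner_product_param { num_output: 4096 }
}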
Caffe
● ResNet-152 prototxt is 6775 lines long!
https://github.com/KaimingHe/deep-residual-networks/blob/master/prototxt/ResNet-152-deploy.prototxt
Caffe

Step 4: Train!

./build/tools/caffe train \
    -gpu 0 \
    -model path/to/trainval.prototxt \
    -solver path/to/solver.prototxt \
    -weights path/to/pretrained_weights.caffemodel

Pass -gpu -1 to run in CPU mode, or -gpu all to train on all available GPUs.

https://github.com/BVLC/caffe/blob/master/tools/caffe.cpp
Caffe
Pros / Cons
(+) Good for feedforward networks
(+) Good for finetuning existing networks
(+) Train models without writing any code!
(+) Python and MATLAB interfaces are pretty useful!
(-) Need to write C++ / CUDA for new GPU layers
(-) Not good for recurrent networks
(-) Cumbersome for big networks (GoogLeNet, ResNet)
Torch
http://torch.ch
Torch

Lua
High-level scripting language, easy to interface with C.
Similar to JavaScript:
- One data structure: table == JS object
- Prototypical inheritance: metatable == JS prototype
- First-class functions
Some gotchas:
- 1-indexed =(
- Variables global by default =(
- Small standard library
http://tylerneylon.com/a/learn-lua/
Torch

Tensors
Torch tensors are just like numpy arrays.
Like numpy, you can easily change the data type.
Unlike numpy, the GPU is just a datatype away (e.g. torch.CudaTensor).
Documentation on GitHub:
https://github.com/torch/torch7/blob/master/doc/tensor.md
https://github.com/torch/torch7/blob/master/doc/maths.md
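As a rough illustration of the numpy side of this comparison (Torch provides analogous tensor calls in Lua; the GPU step, e.g. calling :cuda(), has no numpy equivalent):

import numpy as np

a = np.zeros((4, 4))            # create a 4x4 array of zeros
b = np.random.randn(4, 4)       # random Gaussian entries
c = a + 2.0 * b                 # elementwise arithmetic
d = b.dot(b.T)                  # matrix multiply
e = d.astype(np.float32)        # easily change the data type
# in Torch, moving to the GPU is just another type change, which numpy cannot do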
Torch

nn
The nn module lets you easily build and train neural nets: stack layer modules together, run a forward pass, compute a loss, backpropagate, and update the weights.
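As a rough, library-agnostic illustration of what such a small network computes, here is a two-layer net with a manual forward/backward pass and an SGD step in numpy; all sizes and the L2 loss are arbitrary choices for the sketch:

import numpy as np

rng = np.random.default_rng(4)
N, D_in, H, D_out = 64, 1000, 100, 10           # batch size and layer sizes (arbitrary)
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H)) * 0.01
w2 = rng.standard_normal((H, D_out)) * 0.01

lr = 1e-4
for step in range(100):
    h = np.maximum(0.0, x @ w1)                 # first layer + ReLU
    pred = h @ w2                               # second layer (scores)
    loss = np.square(pred - y).sum()            # L2 loss

    grad_pred = 2.0 * (pred - y)                # backprop through the loss
    grad_w2 = h.T @ grad_pred
    grad_h = grad_pred @ w2.T
    grad_h[h <= 0] = 0.0                        # backprop through ReLU
    grad_w1 = x.T @ grad_h

    w1 -= lr * grad_w1                          # plain SGD updates
    w2 -= lr * grad_w2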
Torch

cunn
Running on GPU is easy: cast the network, the criterion, and the data to CUDA tensors (e.g. with :cuda()) and the same training code runs on the GPU.
Torch

optim
The optim package implements different update rules: momentum, Adam, etc.
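As a sketch of the update rules themselves (not the optim API), here are SGD with momentum and Adam written out in numpy; the hyperparameter values are typical defaults, not prescriptions:

import numpy as np

def sgd_momentum(w, dw, v, lr=1e-3, mu=0.9):
    # momentum update: accumulate a velocity, then step along it
    v = mu * v - lr * dw
    return w + v, v

def adam(w, dw, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam update: first/second moment estimates with bias correction
    m = beta1 * m + (1 - beta1) * dw
    s = beta2 * s + (1 - beta2) * dw * dw
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# toy usage
w, dw = np.zeros(5), np.ones(5)
v = np.zeros(5)
w, v = sgd_momentum(w, dw, v)
m, s = np.zeros(5), np.zeros(5)
w, m, s = adam(w, dw, m, s, t=1)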
Torch

Modules
Caffe has Nets and Layers; Torch just has Modules. A Module implements:
- updateOutput: forward; compute the output from the input
- updateGradInput: backward; compute the gradient of the loss with respect to the input
- accGradParameters: backward; accumulate the gradient with respect to the weights
See nn.Linear for a readable example:
https://github.com/torch/nn/blob/master/Linear.lua

Tons of built-in modules and loss functions:
https://github.com/torch/nn

Writing your own modules is easy!
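A rough numpy sketch of what a Linear module's methods compute; the class below only mirrors the interface named above, it is not the actual Lua implementation:

import numpy as np

class Linear:
    # minimal numpy module mirroring the Torch Module interface for a linear layer
    def __init__(self, in_dim, out_dim):
        self.W = np.random.randn(out_dim, in_dim) * 0.01
        self.b = np.zeros(out_dim)
        self.gradW = np.zeros_like(self.W)
        self.gradb = np.zeros_like(self.b)

    def updateOutput(self, x):
        # forward: y = W x + b
        return self.W @ x + self.b

    def updateGradInput(self, x, gradOutput):
        # backward: gradient of the loss with respect to the input
        return self.W.T @ gradOutput

    def accGradParameters(self, x, gradOutput):
        # backward: accumulate gradients with respect to the weights and bias
        self.gradW += np.outer(gradOutput, x)
        self.gradb += gradOutput

# toy usage for a single example
layer = Linear(4, 3)
x = np.ones(4)
y = layer.updateOutput(x)
grad_in = layer.updateGradInput(x, np.ones(3))
layer.accGradParameters(x, np.ones(3))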
Torch

Modules
Container modules allow you to combine multiple modules: for example chaining mod1 into mod2 to produce one output, feeding the same input x through mod1 and mod2 side by side, or applying mod1 and mod2 to separate inputs x1 and x2 (in Torch these correspond to containers such as nn.Sequential, nn.ConcatTable, and nn.ParallelTable).
Torch

nngraph
Use nngraph to build modules that combine their inputs in complex ways.
Example graph:
Inputs: x, y, z
Outputs: c
a = x + y
b = a ☉ z
c = a + b
Pretrained Models
loadcaffe: Load pretrained Caffe models: AlexNet, VGG, some others
https://github.com/szagoruyko/loadcaffe
Package Management
After installing torch, use luarocks
to install or update Lua packages
Pros / Cons
(-) Lua
(-) Less plug-and-play than Caffe
You usually write your own training code
(+) Lots of modular pieces that are easy to combine
(+) Easy to write your own layer types and run on GPU
(+) Most of the library code is in Lua, easy to read
(+) Lots of pretrained models!
(-) Not great for RNNs
Theano
http://deeplearning.net/software/theano/
Theano

Computational Graphs
The running example is the same graph as before: inputs x, y, z with a = x + y, b = a ☉ z, c = a + b.
1. Define symbolic variables; these are inputs to the graph.
2. Compute intermediates and outputs symbolically.
3. Compile a function that produces c from x, y, z (this generates code).
4. Run the compiled function on concrete numpy arrays.
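A minimal Theano sketch of these four steps, using the graph above (assumes Theano is installed):

import numpy as np
import theano
import theano.tensor as T

# 1. define symbolic variables (inputs to the graph)
x = T.vector('x')
y = T.vector('y')
z = T.vector('z')

# 2. compute intermediates and outputs symbolically
a = x + y
b = a * z          # elementwise product
c = a + b

# 3. compile a function that produces c from x, y, z
f = theano.function(inputs=[x, y, z], outputs=c)

# 4. run the compiled function on numpy arrays
xx = np.ones(4, dtype=theano.config.floatX)
yy = 2 * np.ones(4, dtype=theano.config.floatX)
zz = 3 * np.ones(4, dtype=theano.config.floatX)
print(f(xx, yy, zz))   # a = 3, b = 9, c = 12 elementwise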
Theano
Simple Neural Net
Define symbolic variables:
x = data
y = labels
w1 = first-layer weights
w2 = second-layer weights
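A hedged Theano sketch of such a two-layer net; the hidden size, the ReLU + softmax structure, and the cross-entropy loss are assumptions added for the example:

import numpy as np
import theano
import theano.tensor as T

# symbolic variables: data, labels, and the two weight matrices
x = T.matrix('x')      # data, shape (N, D)
y = T.ivector('y')     # integer class labels, shape (N,)
w1 = T.matrix('w1')    # first-layer weights, shape (D, H)
w2 = T.matrix('w2')    # second-layer weights, shape (H, C)

# forward pass built symbolically
hidden = T.maximum(0.0, x.dot(w1))                      # ReLU hidden layer
probs = T.nnet.softmax(hidden.dot(w2))                  # class probabilities
loss = T.nnet.categorical_crossentropy(probs, y).mean()

# Theano can also derive gradients symbolically
grad_w1, grad_w2 = T.grad(loss, [w1, w2])

# compile: maps concrete numpy inputs to the loss and gradients
f = theano.function(inputs=[x, y, w1, w2], outputs=[loss, grad_w1, grad_w2])

# toy usage with arbitrary sizes
N, D, H, C = 5, 4, 3, 2
out = f(np.random.randn(N, D).astype(theano.config.floatX),
        np.random.randint(C, size=N).astype('int32'),
        np.random.randn(D, H).astype(theano.config.floatX),
        np.random.randn(H, C).astype(theano.config.floatX))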