
RNN, LSTM and Deep Learning Libraries

UDRC Summer School

Muhammad Awais
[email protected]
Outline

➢ Recurrent Neural Network


➢ Applications of RNN
➢ LSTM
➢ Caffe
➢ Torch
➢ Theano
➢ TensorFlow
Flexibility of Recurrent Neural Networks

RNNs relax the fixed input/output structure of vanilla neural networks:

Vanilla Neural Networks: one fixed-size input to one fixed-size output (one to one)
Image Captioning: image -> sequence of words (one to many)
Sentiment Classification: sequence of words -> sentiment (many to one)
Machine Translation: sequence of words -> sequence of words (many to many)
Video classification on the frame level: a prediction at every time step (many to many)

Recurrent Neural Networks

An RNN reads an input x and maintains an internal state; we usually want to predict an output vector y at some (or all) time steps.

We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at that time step, and f_W is some function with parameters W.

Notice: the same function and the same set of parameters are used at every time step.

In the simplest case the state consists of a single "hidden" vector h, and the vanilla RNN uses

h_t = tanh(Whh * h_{t-1} + Wxh * x_t)
y_t = Why * h_t
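
A minimal numpy sketch of one vanilla RNN step (my own illustration, not the lecture's code; sizes and names are arbitrary):

import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why):
    # h_t = tanh(Whh * h_{t-1} + Wxh * x_t);  y_t = Why * h_t
    h = np.tanh(Wxh.dot(x) + Whh.dot(h_prev))
    y = Why.dot(h)
    return h, y

# The same weights are reused at every time step:
D, H = 10, 20                       # input and hidden sizes (arbitrary)
Wxh = 0.01 * np.random.randn(H, D)
Whh = 0.01 * np.random.randn(H, H)
Why = 0.01 * np.random.randn(D, H)
h = np.zeros(H)
for x in np.random.randn(5, D):     # a length-5 input sequence
    h, y = rnn_step(x, h, Wxh, Whh, Why)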
Recurrent Neural Networks

Character-level language model example

Vocabulary: [h, e, l, o]

Example training sequence: "hello"

At each time step the RNN receives the current character (one-hot encoded over the vocabulary) and is trained to assign a high score to the next character in the sequence.
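
A small numpy sketch (mine, not from the slides) of how the training sequence is encoded with this vocabulary:

import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

# Inputs are "h", "e", "l", "l"; targets are the next characters "e", "l", "l", "o"
inputs  = [one_hot(ch) for ch in 'hell']
targets = [char_to_ix[ch] for ch in 'ello']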
Recurrent Neural Networks
Image Captioning

Explain Images with Multimodal Recurrent Neural Networks, Mao et al.


Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
Show and Tell: A Neural Image Caption Generator, Vinyals et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
Recurrent Neural Networks

Image captioning combines a Convolutional Neural Network, which encodes the image, with a Recurrent Neural Network, which generates the caption one word at a time.

Recurrent Neural Networks

Image captioning at test time:
- Run the test image through the CNN to get an image feature vector v.
- Feed the special <START> token as the first input x0.
- Condition the recurrence on the image:
  before: h = tanh(Wxh * x + Whh * h)
  now:    h = tanh(Wxh * x + Whh * h + Wih * v)
- Sample a word from the output distribution y0 (e.g. "straw"), feed it back in as the next input x1, and sample again (e.g. "hat").
- Repeat until the <END> token is sampled, then finish.
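
A minimal numpy sketch of the image-conditioned step described above (my own illustration; v stands for the CNN feature of the test image):

import numpy as np

def captioning_step(x, h_prev, v, Wxh, Whh, Wih, Why):
    # now: h = tanh(Wxh * x + Whh * h + Wih * v)
    h = np.tanh(Wxh.dot(x) + Whh.dot(h_prev) + Wih.dot(v))
    y = Why.dot(h)      # unnormalized scores over the vocabulary
    return h, y

At test time you would take a softmax over y, sample a word, and feed that word's vector back in as the next input x.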
Recurrent Neural Networks
Image Sentence Datasets

Microsoft COCO
[Tsung-Yi Lin et al. 2014]
mscoco.org

currently:
~120K images
~5 sentences each
Recurrent Neural Networks

RNN layers can be stacked, so the network extends in depth as well as in time. An LSTM keeps the same depth/time layout but changes the recurrence used inside each cell.
Long Short Term Memory (LSTM)
[Hochreiter and Schmidhuber, 1997]

At every time step the LSTM takes the vector from below (x) and the vector from before (h), multiplies their concatenation by a single weight matrix W (size 4n x 2n for state size n, giving a 4n-dimensional output), and splits the result into four n-dimensional pieces:

i = sigmoid(...)   input gate
f = sigmoid(...)   forget gate
o = sigmoid(...)   output gate
g = tanh(...)      candidate update

These are used to update the cell state c and the hidden state h:

c_t = f ☉ c_{t-1} + i ☉ g
h_t = o ☉ tanh(c_t)

The hidden state h is sent to the higher layer (or to the prediction) and, along with c, forward to the next time step; the same cell, with the same weights, is applied at every time step.
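
A minimal numpy sketch of one LSTM step following these equations (my own illustration; it assumes x and h both have size n, so the concatenation has size 2n):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    n = h_prev.shape[0]
    z = W.dot(np.concatenate([x, h_prev]))   # (4n x 2n) matrix times 2n vector -> 4n
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2*n])        # forget gate
    o = sigmoid(z[2*n:3*n])      # output gate
    g = np.tanh(z[3*n:4*n])      # candidate update
    c = f * c_prev + i * g       # additive update of the cell state
    h = o * np.tanh(c)           # hidden state passed up and to the next step
    return h, c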
Long Short Term Memory (LSTM)

Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don't work very well
- Common to use LSTM or GRU: their additive interactions improve gradient flow
- Backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed
Deep Learning Libraries
Caffe, Torch, Theano, TensorFlow
Caffe
http://caffe.berkeleyvision.org
Caffe overview

From U.C. Berkeley


Written in C++
Has Python and MATLAB bindings
Good for training or finetuning feedforward models
Caffe

Main classes
- Blob: stores data and diffs (derivatives)
- Layer: transforms bottom blobs to top blobs
- Net: many layers; computes gradients via forward / backward
- Solver: uses gradients to update the weights

Example net: DataLayer -> InnerProductLayer (fc1) -> SoftmaxLossLayer, with blobs (data + diffs) for the data, the weights W, the inputs X and the labels y.
Caffe

Protocol Buffers
- "Typed JSON" from Google
- Define "message types" in .proto files
- Serialize instances to text files (.prototxt), e.g.:

  name: "John Doe"
  id: 1234
  email: "[email protected]"

- Compile classes for different languages (C++, Java, ...)

All Caffe proto types are defined here, with good documentation:
https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto

https://developers.google.com/protocol-buffers/
Caffe

Training / Finetuning
No need to write code!
1. Convert data (run a script)
2. Define net (edit prototxt)
3. Define solver (edit prototxt)
4. Train (with pretrained weights) (run a script)
Caffe

Step 1: Convert Data
- A DataLayer reading from LMDB is the easiest option
- Create the LMDB using convert_imageset; it needs a text file where each line is "[path/to/image.jpeg] [label]"
- Alternatively, create an HDF5 file yourself using h5py
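
A minimal h5py sketch for the HDF5 route (my own illustration; it assumes the net's HDF5 data layer expects datasets named "data" and "label"):

import h5py
import numpy as np

# Fake data: N x C x H x W images and integer labels
X = np.random.randn(100, 3, 227, 227).astype(np.float32)
y = np.random.randint(0, 10, size=100).astype(np.float32)

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=X)
    f.create_dataset('label', data=y)

# Caffe's HDF5 data layer takes a text file listing one .h5 path per line
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')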
Caffe

Step 2: Define Net
- The net is written as a .prototxt file
- Layers and Blobs often have the same name!
- Each layer has its own learning rates and regularization settings (separate values for weight and bias); set the learning rates to 0 to freeze a layer
- num_output of the last InnerProduct layer is the number of output classes

- .prototxt can get ugly for big models
- The ResNet-152 prototxt is 6775 lines long!
- Not "compositional": you can't easily define a residual block and reuse it

https://github.com/KaimingHe/deep-residual-networks/blob/master/prototxt/ResNet-152-deploy.prototxt
Caffe

Step 2: Define Net (finetuning)

Pretrained weights are stored by layer name:
  "fc7.weight": [values]
  "fc7.bias":   [values]
  "fc8.weight": [values]
  "fc8.bias":   [values]

Original prototxt:

layer {
  name: "fc7"
  type: "InnerProduct"
  inner_product_param {
    num_output: 4096
  }
}
[... ReLU, Dropout]
layer {
  name: "fc8"
  type: "InnerProduct"
  inner_product_param {
    num_output: 1000
  }
}

Modified prototxt:

layer {
  name: "fc7"        # same name: weights copied
  type: "InnerProduct"
  inner_product_param {
    num_output: 4096
  }
}
[... ReLU, Dropout]
layer {
  name: "my-fc8"     # different name: weights reinitialized
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
}
Caffe

Step 3: Define Solver
- Write a prototxt file defining a SolverParameter
- If finetuning, copy an existing solver.prototxt file and:
  - change net to be your net
  - change snapshot_prefix to your output
  - reduce the base learning rate (divide by 100)
  - maybe change max_iter and snapshot
Caffe

Step 4: Train!

./build/tools/caffe train \
  -gpu 0 \
  -model path/to/trainval.prototxt \
  -solver path/to/solver.prototxt \
  -weights path/to/pretrained_weights.caffemodel

Pass -gpu -1 instead to train on the CPU, or -gpu all to use all available GPUs.

https://github.com/BVLC/caffe/blob/master/tools/caffe.cpp
Caffe

Pros / Cons
(+) Good for feedforward networks
(+) Good for finetuning existing networks
(+) Train models without writing any code!
(+) Python and MATLAB interfaces are pretty useful!
(-) Need to write C++ / CUDA for new GPU layers
(-) Not good for recurrent networks
(-) Cumbersome for big networks (GoogLeNet, ResNet)
Torch
http://torch.ch
Torch

From NYU + IDIAP

Written in C and Lua
Used a lot at Facebook and DeepMind
Torch

Lua
- High-level scripting language, easy to interface with C
- Similar to JavaScript:
  - one data structure: table == JS object
  - prototypical inheritance: metatable == JS prototype
  - first-class functions
- Some gotchas:
  - 1-indexed =(
  - variables global by default =(
  - small standard library

http://tylerneylon.com/a/learn-lua/
Torch

Tensors
- Torch tensors are just like numpy arrays
- Like numpy, you can easily change the data type
- Unlike numpy, the GPU is just a datatype away (e.g. torch.CudaTensor)

Documentation on GitHub:
https://github.com/torch/torch7/blob/master/doc/tensor.md
https://github.com/torch/torch7/blob/master/doc/maths.md
Torch

nn
The nn module lets you easily build and train neural nets:
- Build a two-layer ReLU net (e.g. an nn.Sequential containing nn.Linear and nn.ReLU modules)
- Get the weights and gradients for the entire network (e.g. with getParameters())
- Use a softmax loss function
- Generate random data
- Forward pass: compute scores and loss
- Backward pass: compute gradients; remember to set the weight gradients to zero first!
- Update: make a gradient descent step
Torch

cunn
Running on GPU is easy:
- Import a few new packages (cutorch, cunn)
- Cast the network and criterion to CUDA
- Cast the data and labels too
Torch

optim
The optim package implements different update rules: momentum, Adam, etc.
- Import the optim package
- Write a callback function that returns the loss and the gradients
- A state table holds hyperparameters, cached values, etc.; pass it to the update rule (e.g. optim.adam)
Torch

Modules
Caffe has Nets and Layers; Torch just has Modules.
- Modules are classes written in Lua; easy to read and write
- Forward / backward are written in Lua using Tensor methods, so the same code runs on CPU / GPU
- updateOutput: forward pass; compute the output
- updateGradInput: backward pass; compute the gradient of the input
- accGradParameters: backward pass; compute the gradient of the weights

https://github.com/torch/nn/blob/master/Linear.lua
Torch

Modules
Tons of built-in modules and loss functions

https://github.com/torch/nn
Torch

Modules
Writing your own modules is easy!
Torch

Modules
Container modules allow you to combine multiple modules:
- a sequential container feeds the output of mod1 into mod2 to produce a single output
- a table container can apply mod1 and mod2 to the same input x, returning {out[1], out[2]} (e.g. nn.ConcatTable)
- or apply mod1 and mod2 to separate inputs x1 and x2 (e.g. nn.ParallelTable)
Torch

nngraph
Use nngraph to build modules that combine their inputs in complex ways.

Example:
Inputs: x, y, z
Output: c
a = x + y
b = a ☉ z
c = a + b
Torch

Pretrained Models
- loadcaffe: load pretrained Caffe models: AlexNet, VGG, some others
  https://github.com/szagoruyko/loadcaffe
- GoogLeNet v1: https://github.com/soumith/inception.torch
- GoogLeNet v3: https://github.com/Moodstocks/inception-v3.torch
- ResNet: https://github.com/facebook/fb.resnet.torch
Torch

Package Management
After installing Torch, use luarocks to install or update Lua packages (similar to pip install in Python).

Torch

Torch: Other useful packages
- torch.cudnn: bindings for NVIDIA cuDNN kernels
  https://github.com/soumith/cudnn.torch
- torch-hdf5: read and write HDF5 files from Torch
  https://github.com/deepmind/torch-hdf5
- lua-cjson: read and write JSON files from Lua
  https://luarocks.org/modules/luarocks/lua-cjson
- cltorch, clnn: OpenCL backend for Torch, and port of nn
  https://github.com/hughperkins/cltorch, https://github.com/hughperkins/clnn
- torch-autograd: automatic differentiation; sort of like a more powerful nngraph, similar to Theano or TensorFlow
  https://github.com/twitter/torch-autograd
- fbcunn: Facebook extensions: FFT convolutions, multi-GPU (DataParallel, ModelParallel)
  https://github.com/facebook/fbcunn
Torch

Pros / Cons
(-) Lua
(-) Less plug-and-play than Caffe
You usually write your own training code
(+) Lots of modular pieces that are easy to combine
(+) Easy to write your own layer types and run on GPU
(+) Most of the library code is in Lua, easy to read
(+) Lots of pretrained models!
(-) Not great for RNNs
Theano
http://deeplearning.net/software/theano/
Theano

From Yoshua Bengio’s group at University of Montreal

Embracing computation graphs, symbolic computation

High-level wrappers: Keras, Lasagne


Theano

Computational Graphs

Same example as before: inputs x, y, z; a = x + y, b = a ☉ z, c = a + b.
- Define symbolic variables; these are the inputs to the graph
- Compute intermediates and outputs symbolically
- Compile a function that produces c from x, y, z (this generates code)
- Run the function, passing in some numpy arrays (may run on the GPU)
- The same computation can be repeated with numpy operations (runs on the CPU)
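
A minimal Theano sketch of this graph (my own illustration, not the slide's exact code):

import numpy as np
import theano
import theano.tensor as T

# Define symbolic inputs to the graph
x = T.vector('x')
y = T.vector('y')
z = T.vector('z')

# Compute intermediates and the output symbolically
a = x + y
b = a * z           # elementwise product
c = a + b

# Compile a function that produces c from x, y, z
f = theano.function(inputs=[x, y, z], outputs=c)

# Run it on actual numpy arrays
print(f(np.ones(3), 2 * np.ones(3), 3 * np.ones(3)))   # -> [12. 12. 12.]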
Theano

Simple Neural Net
- Define symbolic variables: x = data, y = labels, w1 = first-layer weights, w2 = second-layer weights
- Forward: compute scores (symbolically)
- Forward: compute probs and loss (symbolically)
- Compile a function that computes loss and scores
- Stuff actual numpy arrays into the function
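
A minimal Theano sketch of such a two-layer net (my own illustration; shapes and names are arbitrary):

import numpy as np
import theano
import theano.tensor as T

N, D, H, C = 64, 1000, 100, 10

# Symbolic variables: data, labels, and the two weight matrices
x = T.matrix('x')
y = T.ivector('y')
w1 = T.matrix('w1')
w2 = T.matrix('w2')

# Forward pass (symbolic): ReLU hidden layer, class scores, probs, loss
hidden = T.maximum(0, x.dot(w1))
scores = hidden.dot(w2)
probs = T.nnet.softmax(scores)
loss = T.nnet.categorical_crossentropy(probs, y).mean()

# Compile a function that computes loss and scores
f = theano.function(inputs=[x, y, w1, w2], outputs=[loss, scores])

# Stuff actual numpy arrays into the function
loss_val, scores_val = f(
    np.random.randn(N, D),
    np.random.randint(C, size=N).astype(np.int32),
    1e-2 * np.random.randn(D, H),
    1e-2 * np.random.randn(H, C),
)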
Theano

Computing Gradients
- Same as before: define variables, compute scores and loss symbolically
- Theano computes gradients for us symbolically!
- Now the compiled function returns loss, scores, and gradients
- Use the function to perform gradient descent!
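
Continuing the sketch from the previous slide, gradients come from T.grad and can drive a plain gradient descent loop (learning rate is illustrative):

# Gradients of the loss with respect to the weights, computed symbolically
dw1, dw2 = T.grad(loss, [w1, w2])
f_grad = theano.function(inputs=[x, y, w1, w2],
                         outputs=[loss, scores, dw1, dw2])

# Gradient descent using the compiled function
w1_val = 1e-2 * np.random.randn(D, H)
w2_val = 1e-2 * np.random.randn(H, C)
x_val = np.random.randn(N, D)
y_val = np.random.randint(C, size=N).astype(np.int32)
lr = 1e-1
for t in range(20):
    loss_val, _, dw1_val, dw2_val = f_grad(x_val, y_val, w1_val, w2_val)
    w1_val -= lr * dw1_val
    w2_val -= lr * dw2_val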
Theano
Pros / Cons
(+) Python + numpy
(+) Computational graph is nice abstraction
(+) RNNs fit nicely in computational graph
(-) Raw Theano is somewhat low-level
(+) High level wrappers (Keras, Lasagne) ease the pain
(-) Error messages can be unhelpful
(-) Large models can have long compile times
(-) Much “fatter” than Torch; more magic
(-) Patchy support for pretrained models
TensorFlow
https://www.tensorflow.org
TensorFlow

From Google

Very similar to Theano - all about computation graphs

Easy visualizations (TensorBoard)

Multi-GPU and multi-node training


TensorFlow

TensorFlow: Two-Layer Net
- Create placeholders for the data and labels: these will be fed to the graph
- Create Variables to hold the weights (similar to Theano shared variables) and initialize them with numpy arrays
- Forward: compute scores, probs, and loss (symbolically)
- Running train_step will use SGD to minimize the loss
- Create an artificial dataset; y is one-hot, like Keras
- Actually train the model
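
A minimal sketch in the TensorFlow 1.x style of the time (my own illustration, not the slide's exact code):

import numpy as np
import tensorflow as tf

N, D, H, C = 64, 1000, 100, 10

# Placeholders for data and labels: fed to the graph at run time
x = tf.placeholder(tf.float32, shape=[None, D])
y = tf.placeholder(tf.float32, shape=[None, C])    # one-hot labels

# Variables hold the weights, initialized from numpy arrays
w1 = tf.Variable(1e-2 * np.random.randn(D, H).astype(np.float32))
w2 = tf.Variable(1e-2 * np.random.randn(H, C).astype(np.float32))

# Forward pass: scores and softmax cross-entropy loss (symbolic)
hidden = tf.nn.relu(tf.matmul(x, w1))
scores = tf.matmul(hidden, w2)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=y))

# Running train_step performs one SGD update
train_step = tf.train.GradientDescentOptimizer(1e-1).minimize(loss)

# Artificial dataset; y is one-hot
x_val = np.random.randn(N, D).astype(np.float32)
y_val = np.zeros((N, C), dtype=np.float32)
y_val[np.arange(N), np.random.randint(C, size=N)] = 1.0

# Actually train the model
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(20):
        loss_val, _ = sess.run([loss, train_step],
                               feed_dict={x: x_val, y: y_val})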


TensorFlow

TensorFlow: Multi-GPU
- Data parallelism: synchronous or asynchronous
- Model parallelism: split the model across GPUs
TensorFlow

TensorFlow: Distributed
- Single machine: like other frameworks
- Many machines: not open source (yet) =(
TensorFlow
TensorFlow: Pros / Cons
(+) Python + numpy
(+) Computational graph abstraction, like Theano; great for RNNs
(+) Much faster compile times than Theano
(+) Slightly more convenient than raw Theano?
(+) TensorBoard for visualization
(+) Data AND model parallelism; best of all frameworks
(+/-) Distributed models, but not open-source yet
(-) Slower than other frameworks right now
(-) Much “fatter” than Torch; more magic
(-) Not many pretrained models
Comparison between Libraries

                           Caffe         Torch                         Theano          TensorFlow
Language                   C++, Python   Lua                           Python          Python
Pretrained models          Yes ++        Yes ++                        Yes (Lasagne)   Inception
Multi-GPU: data parallel   Yes           Yes (cunn.DataParallelTable)  Yes (platoon)   Yes
Multi-GPU: model parallel  No            Yes (fbcunn.ModelParallel)    Experimental    Yes (best)
Readable source code       Yes (C++)     Yes (Lua)                     No              No
Good at RNN                No            Mediocre                      Yes             Yes (best)


Any Questions?
Thanks
