Long Short-Term Memory Networks
Akshay Sood
Introduction
● Feedforward neural networks: information flows in one direction from input to output, with no feedback connections
Recurrent Neural Networks (RNNs)
● Networks with feedback loops (recurrent edges)
● Output at current time step depends on current input as well as previous state (via recurrent edges)
[Figure: the RNN unfolded in time]
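A minimal sketch of the recurrence described above, assuming a single tanh hidden layer; the parameter names (W_xh, W_hh, b_h) are illustrative, not from the slides:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: the new state depends on the current input
    and the previous state (the recurrent edge)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Unfolding in time: the same weights are applied at every time step."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states
```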
Training RNNs
● Backpropagation Through Time (BPTT)
○ Regular (feedforward) backprop applied to RNN unfolded in time
○ Truncated BPTT: an approximation that backpropagates through only a limited number of recent time steps
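A sketch of what BPTT computes for the simple tanh RNN above, with a per-step linear readout and squared-error loss; the readout, the loss, and all parameter names are illustrative assumptions. Truncated BPTT simply limits the chunk length so gradients never flow past the start of the chunk.

```python
import numpy as np

def bptt_chunk(xs, ys, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """BPTT over one unrolled chunk (truncated BPTT: gradients stop at the
    chunk boundary). Loss is a squared error on a linear per-step readout;
    all names here are illustrative assumptions, not from the slides."""
    hs, h = [h_prev], h_prev
    # Forward pass through the unfolded chunk.
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    grads = {name: np.zeros_like(p) for name, p in
             [("W_xh", W_xh), ("W_hh", W_hh), ("W_hy", W_hy),
              ("b_h", b_h), ("b_y", b_y)]}
    dh_next = np.zeros_like(h_prev)
    # Backward pass: walk the unfolded graph from the last step to the first.
    for t in reversed(range(len(xs))):
        y_hat = W_hy @ hs[t + 1] + b_y
        dy = y_hat - ys[t]
        grads["W_hy"] += np.outer(dy, hs[t + 1]); grads["b_y"] += dy
        dh = W_hy.T @ dy + dh_next
        dz = (1.0 - hs[t + 1] ** 2) * dh            # derivative of tanh
        grads["W_xh"] += np.outer(dz, xs[t])
        grads["W_hh"] += np.outer(dz, hs[t])
        grads["b_h"] += dz
        dh_next = W_hh.T @ dz                        # gradient sent to earlier steps
    return grads, hs[-1]   # carry the final state into the next chunk
```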
Training RNNs
● Problem: can't capture long-term dependencies because gradients vanish or explode as they are backpropagated through many time steps
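A toy illustration of the problem: for a scalar linear recurrence h_t = w·h_{t-1}, the gradient of h_T with respect to h_0 is w^T, which vanishes for |w| < 1 and explodes for |w| > 1 (the recurrence and the numbers are purely illustrative).

```python
# Toy illustration: the gradient flowing back T steps through a linear
# recurrence h_t = w * h_{t-1} picks up a factor of w at every step.
for w in (0.9, 1.1):
    T = 50
    grad = w ** T   # d h_T / d h_0 in the scalar linear case
    print(f"w = {w}: gradient after {T} steps = {grad:.2e}")

# w = 0.9 -> ~5e-03 (vanishing); w = 1.1 -> ~1e+02 (exploding)
```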
Long Short-Term Memory networks (LSTMs)
● A type of RNN architecture that addresses the vanishing/exploding gradient problem and allows learning of long-term dependencies
[Figure: simplified schematic of the LSTM memory cell, shown for reference]
Cell state vector
● Represents the memory of the LSTM
● Undergoes changes via forgetting of old memory (forget gate) and addition of new memory (input gate)
● Gates are controlled by the concatenation of the previous time step's output and the current input, and optionally by the cell state vector itself (peephole connections)
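In the standard formulation (notation assumed here, not given on the slides), each gate applies a sigmoid to an affine function of that concatenation, producing values between 0 and 1 that scale how much information passes through:

g_t = \sigma\left( W_g \, [h_{t-1}, x_t] + b_g \right), \qquad g \in \{f, i, o\}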
Forget Gate
● Controls what information to throw away from memory
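In the usual notation (an assumption, consistent with the schematic):

f_t = \sigma\left( W_f \, [h_{t-1}, x_t] + b_f \right)

Entries of f_t near 0 erase the corresponding components of the old cell state; entries near 1 keep them.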
Input Gate
● Controls what new information is added to cell state from current input
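In the same assumed notation, the input gate i_t scales a tanh layer's candidate memory \tilde{C}_t:

i_t = \sigma\left( W_i \, [h_{t-1}, x_t] + b_i \right), \qquad \tilde{C}_t = \tanh\left( W_C \, [h_{t-1}, x_t] + b_C \right)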
Memory Update
● The cell state vector aggregates the two components: old memory retained via the forget gate and new memory added via the input gate
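Putting the two components together (⊙ denotes element-wise multiplication; notation assumed as above):

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t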
Output Gate
● Conditionally decides what to output from the memory
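In the same assumed notation, the output gate filters a squashed copy of the memory to produce the new output/hidden state:

o_t = \sigma\left( W_o \, [h_{t-1}, x_t] + b_o \right), \qquad h_t = o_t \odot \tanh(C_t)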
LSTM Memory Cell Summary
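The gate equations above combine into a single forward step. The sketch below is a plain NumPy version under those assumptions; the parameter layout and names are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W and b hold the parameters of the three gates and the
    candidate-memory layer, keyed by "f", "i", "C", "o" (illustrative layout)."""
    z = np.concatenate([h_prev, x_t])          # gate input: [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])     # candidate memory
    C = f * C_prev + i * C_tilde               # memory update
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    h = o * np.tanh(C)                         # new output / hidden state
    return h, C
```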
LSTM Training
● Backpropagation Through Time (BPTT) most common
● What weights are learned?
○ Gates (input/output/forget)
○ Input tanh layer (candidate memory)
● Outputs depend on the task:
○ A single output prediction for the whole sequence (e.g. sequence classification)
○ One output at each time step (sequence labeling); both regimes are sketched below
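Building on the lstm_step sketch above (names still illustrative), the two output regimes differ only in which hidden states are read out:

```python
# Assumes the lstm_step sketch defined earlier.
def run_lstm(xs, h0, C0, W, b):
    h, C, outputs = h0, C0, []
    for x_t in xs:
        h, C = lstm_step(x_t, h, C, W, b)
        outputs.append(h)
    return outputs

# Single prediction for the whole sequence: read out from the last state.
# Per-step labels (sequence labeling): read out from every state.
# (W_out is a hypothetical readout matrix.)
# outputs = run_lstm(xs, h0, C0, W, b)
# y_sequence = W_out @ outputs[-1]            # one output per sequence
# y_per_step = [W_out @ h for h in outputs]   # one output per time step
```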
Deep LSTMs
● Bidirectional RNNs can better exploit context in both directions; e.g. bidirectional LSTMs perform better than unidirectional ones in speech recognition (Graves et al. 2013); see the sketch below
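A sketch of the bidirectional idea, reusing run_lstm from above with a second, independently parameterized LSTM run over the reversed sequence (all names are illustrative):

```python
import numpy as np

def bidirectional_lstm(xs, h0, C0, W_fwd, b_fwd, W_bwd, b_bwd):
    """Run one LSTM left-to-right and another right-to-left over the same
    sequence, then concatenate the two states at each time step."""
    fwd = run_lstm(xs, h0, C0, W_fwd, b_fwd)
    bwd = run_lstm(xs[::-1], h0, C0, W_bwd, b_bwd)[::-1]
    # Each position now sees both past and future context.
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```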
LSTMs for Machine Translation (Sutskever et al. 2014)