# RNN, LSTM, and BiRNN Notes
## Recurrent Neural Networks (RNNs)

### Overview
Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle
sequential data by using feedback loops. Unlike feedforward neural networks, RNNs
have connections that cycle back, enabling them to maintain a hidden state and
process data sequences of arbitrary length.
### Architecture
- At each time step \( t \):
  - Input vector \( x_t \)
  - Hidden state \( h_t \)
  - Output vector \( o_t \)
- Transition equations:
\[
h_t = f(W_{hx} x_t + W_{hh} h_{t-1} + b_h)
\]
\[
o_t = g(W_{ho} h_t + b_o)
\]
where \( f \) is typically a non-linear activation function (e.g., tanh or ReLU),
and \( g \) is an output activation (e.g., softmax for classification tasks).
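
As a concrete reference, here is a minimal NumPy sketch of one such update step. The function name `rnn_step`, the toy dimensions, and the random weights are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hx, W_hh, W_ho, b_h, b_o):
    """One RNN time step: h_t = tanh(W_hx x_t + W_hh h_{t-1} + b_h), o_t = softmax(W_ho h_t + b_o)."""
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)   # hidden-state update
    logits = W_ho @ h_t + b_o
    o_t = np.exp(logits - logits.max())               # numerically stable softmax
    o_t /= o_t.sum()
    return h_t, o_t

# Toy setup: 4-dim inputs, 8-dim hidden state, 3 output classes
rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 3
W_hx = rng.normal(size=(d_h, d_in))
W_hh = rng.normal(size=(d_h, d_h))
W_ho = rng.normal(size=(d_out, d_h))
b_h, b_o = np.zeros(d_h), np.zeros(d_out)

h = np.zeros(d_h)                                     # initial hidden state
for x in rng.normal(size=(5, d_in)):                  # a sequence of 5 input vectors
    h, o = rnn_step(x, h, W_hx, W_hh, W_ho, b_h, b_o)
```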
### Limitations
1. **Vanishing/Exploding Gradients**: Gradients diminish or grow exponentially
during backpropagation for long sequences, making it difficult to learn
dependencies over extended time horizons.
2. **Short-Term Memory**: Standard RNNs struggle with long-term dependencies due to
their simplistic hidden state update mechanism.
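
A quick numeric illustration of point 1: backpropagation through time repeatedly multiplies the gradient by the recurrent Jacobian, so its norm shrinks or grows geometrically with sequence length. The sketch below uses arbitrary scaling factors (0.5 and 1.5) and omits the non-linearity for simplicity; it only demonstrates the two regimes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, T = 8, 50
grad = np.ones(d_h)

for scale in (0.5, 1.5):
    # Orthogonal matrix rescaled by `scale` stands in for the recurrent weights W_hh
    W_hh = scale * np.linalg.qr(rng.normal(size=(d_h, d_h)))[0]
    g = grad.copy()
    for _ in range(T):
        g = W_hh.T @ g                     # repeated Jacobian products in BPTT
    print(f"scale={scale}: gradient norm after {T} steps = {np.linalg.norm(g):.3e}")
# scale=0.5 -> norm ~1e-15 (vanishing); scale=1.5 -> norm ~1e+9 (exploding)
```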
---
## Long Short-Term Memory (LSTM)

### Overview
LSTMs are a type of RNN designed to address the vanishing gradient problem and
better capture long-term dependencies in sequences. They achieve this through a
more complex architecture, including gates to control the flow of information.
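
For reference, a single step of the standard LSTM cell (forget, input, and output gates plus a candidate cell update) can be sketched in NumPy as below; the parameter layout, dictionary keys, and toy dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold parameters for the forget (f), input (i),
    candidate (g), and output (o) transforms, keyed by letter."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate: what to drop from the cell state
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate: what new information to write
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate cell update
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate: what to expose as h_t
    c_t = f * c_prev + i * g                              # cell state carries long-term memory
    h_t = o * np.tanh(c_t)                                # hidden state for this time step
    return h_t, c_t

# Toy usage: 4-dim input, 8-dim hidden/cell state
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = {k: rng.normal(size=(d_h, d_in)) for k in "figo"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "figo"}
b = {k: np.zeros(d_h) for k in "figo"}
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x, h, c, W, U, b)
```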
### Advantages
1. **Handles Long-Term Dependencies**: The cell state and gating mechanisms allow
LSTMs to capture dependencies across long sequences.
2. **Flexible Memory Management**: Gates provide control over what to remember,
update, or forget.
### Applications
- Text generation
- Machine translation
- Speech recognition
- Time series forecasting
---
## Bidirectional RNNs (BiRNNs)

### Overview
Bidirectional RNNs process sequences in both forward and backward directions,
allowing them to capture past and future context simultaneously.
### Architecture
- Two RNNs are used:
1. **Forward RNN**: Processes the input sequence from the beginning to the end.
2. **Backward RNN**: Processes the input sequence from the end to the beginning.
- The outputs of both RNNs are combined at each time step:
\[
h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]
\]
where \( \overrightarrow{h_t} \) and \( \overleftarrow{h_t} \) are the hidden
states from the forward and backward RNNs, respectively.
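
A minimal NumPy sketch of this combination, assuming a plain tanh RNN for each direction; the helper names `run_rnn` and `birnn` and the toy shapes are illustrative.

```python
import numpy as np

def run_rnn(xs, W_hx, W_hh, b_h):
    """Run a plain tanh RNN over a sequence, returning the hidden state at each step."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_hx @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)                              # shape (T, d_h)

def birnn(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states: h_t = [h_t_fwd ; h_t_bwd]."""
    h_fwd = run_rnn(xs, *fwd_params)                 # left-to-right pass
    h_bwd = run_rnn(xs[::-1], *bwd_params)[::-1]     # right-to-left pass, re-aligned to original order
    return np.concatenate([h_fwd, h_bwd], axis=1)    # shape (T, 2 * d_h)

# Toy usage: sequence of 5 four-dimensional inputs, 8-dim hidden state per direction
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
make_params = lambda: (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
xs = rng.normal(size=(5, d_in))
H = birnn(xs, make_params(), make_params())          # each row is [forward ; backward] for that time step
print(H.shape)                                       # (5, 16)
```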
### Advantages
1. **Rich Context**: BiRNNs capture information from both past and future context
in the sequence.
2. **Improved Performance**: Particularly beneficial for tasks like speech
recognition and sequence labeling.
### Applications
- Named Entity Recognition (NER)
- Part-of-Speech (POS) tagging
- Text classification
- Speech-to-text systems
---
## Summary Table
| Feature | RNN | LSTM | BiRNN |
|---------|-----|------|-------|
| **Handles Long-Term Dependencies** | Limited | Yes | Yes |
| **Gradient Issues** | Vanishing/Exploding | Largely mitigated | Same as the underlying cell (RNN, LSTM, or GRU) |
| **Bidirectional Context** | No | No | Yes |
| **Architecture Complexity** | Simple | Complex | Complex |
| **Applications** | Basic sequential tasks | Long-sequence tasks | Context-rich tasks |
---
## Key Takeaways
- RNNs are foundational for sequence modeling but are limited by gradient-related
issues.
- LSTMs address these limitations by introducing gates and memory cells to capture
long-term dependencies effectively.
- Bidirectional RNNs enhance sequence modeling by incorporating both past and
future context, making them powerful for NLP and speech-related tasks.