
# Detailed Notes on RNN, LSTM, and Bidirectional RNN

## 1. Recurrent Neural Network (RNN)

### Overview
Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle
sequential data by using feedback loops. Unlike feedforward neural networks, RNNs
have connections that cycle back, enabling them to maintain a hidden state and
process data sequences of arbitrary length.

### Key Concepts


- **Sequential Data Processing**: RNNs are suitable for tasks like time series
prediction, natural language processing (NLP), and speech recognition.
- **Hidden State**: The hidden state acts as a memory that captures information
about previous time steps.
- **Backpropagation Through Time (BPTT)**: RNNs are trained with BPTT, which unrolls
the network across the sequence and propagates gradients back through every time step
(see the sketch below).
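
As a concrete illustration of BPTT, the minimal sketch below (assuming PyTorch is available; the layer sizes and toy loss are arbitrary) unrolls a vanilla RNN over a 20-step sequence and calls `backward()`, which propagates gradients through every time step.

```python
# Illustrative sketch of BPTT via autograd (all sizes are arbitrary).
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)   # single-layer vanilla RNN
x = torch.randn(20, 1, 8)                    # (seq_len, batch, input_size)
output, h_n = rnn(x)                         # hidden states for all 20 time steps
loss = output[-1].sum()                      # toy loss on the final hidden state
loss.backward()                              # gradients flow back through all 20 steps (BPTT)
```

For very long sequences, training is often done with truncated BPTT, splitting the sequence into chunks so the gradient computation stays tractable.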

### Architecture
- At each time step:
  - Input vector \( x_t \)
  - Hidden state \( h_t \)
  - Output vector \( o_t \)
- Transition equations:
\[
h_t = f(W_{hx} x_t + W_{hh} h_{t-1} + b_h)
\]
\[
o_t = g(W_{ho} h_t + b_o)
\]
where \( f \) is typically a non-linear activation function (e.g., tanh or ReLU),
and \( g \) can be softmax for classification tasks.
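
To make the transition equations concrete, here is a minimal NumPy sketch of a single forward step; the weight shapes, random initialization, and softmax output head are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hx, W_hh, W_ho, b_h, b_o):
    """One RNN step: h_t = tanh(W_hx x_t + W_hh h_{t-1} + b_h), o_t = softmax(W_ho h_t + b_o)."""
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)
    logits = W_ho @ h_t + b_o
    o_t = np.exp(logits - logits.max())        # softmax, shifted for numerical stability
    o_t /= o_t.sum()
    return h_t, o_t

# Toy dimensions: 4 input features, 3 hidden units, 2 output classes.
rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 3, 2
W_hx = rng.normal(size=(d_h, d_in))
W_hh = rng.normal(size=(d_h, d_h))
W_ho = rng.normal(size=(d_out, d_h))
b_h, b_o = np.zeros(d_h), np.zeros(d_out)

h = np.zeros(d_h)                                      # initial hidden state h_0
for x in [rng.normal(size=d_in) for _ in range(5)]:    # a length-5 input sequence
    h, o = rnn_step(x, h, W_hx, W_hh, W_ho, b_h, b_o)
print(o)                                               # class probabilities at the final step
```

Note how the same weights are reused at every time step; only the hidden state carries information forward.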

### Limitations
1. **Vanishing/Exploding Gradients**: Gradients diminish or grow exponentially
during backpropagation for long sequences, making it difficult to learn
dependencies over extended time horizons.
2. **Short-Term Memory**: Standard RNNs struggle with long-term dependencies due to
their simplistic hidden state update mechanism.

---

## 2. Long Short-Term Memory (LSTM)

### Overview
LSTMs are a type of RNN designed to address the vanishing gradient problem and
better capture long-term dependencies in sequences. They achieve this through a
more complex architecture, including gates to control the flow of information.

### Key Components


1. **Cell State (\( C_t \))**: A memory mechanism that retains long-term
information.
2. **Hidden State (\( h_t \))**: Short-term memory used for current computations.
3. **Gates**: Mechanisms to regulate the flow of information:
   - **Forget Gate**: Decides what information to discard from the cell state.
     \[
     f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
     \]
   - **Input Gate**: Decides what information to add to the cell state.
     \[
     i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
     \]
     \[
     \tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)
     \]
   - **Output Gate**: Decides the output of the LSTM cell.
     \[
     o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
     \]

### Update Mechanism


- Update cell state:
\[
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\]
- Update hidden state:
\[
h_t = o_t \odot \tanh(C_t)
\]
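
Putting the gate and update equations together, the following minimal NumPy sketch implements one LSTM step, with \( [h_{t-1}, x_t] \) realized as vector concatenation; the dimensions and random parameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step following the gate and update equations above."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # forget gate
    i_t = sigmoid(W_i @ z + b_i)               # input gate
    C_tilde = np.tanh(W_c @ z + b_c)           # candidate cell state
    o_t = sigmoid(W_o @ z + b_o)               # output gate
    C_t = f_t * C_prev + i_t * C_tilde         # element-wise (Hadamard) products
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

# Toy dimensions: 4 input features, 3 hidden units.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = lambda: rng.normal(size=(d_h, d_h + d_in))
W_f, W_i, W_c, W_o = W(), W(), W(), W()
b_f = b_i = b_c = b_o = np.zeros(d_h)

h, C = np.zeros(d_h), np.zeros(d_h)            # initial hidden and cell states
for x in [rng.normal(size=d_in) for _ in range(5)]:
    h, C = lstm_step(x, h, C, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
print(h)                                       # hidden state after the sequence
```

The additive cell-state update \( C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \) is what lets gradients flow over long spans without vanishing as quickly as in a plain RNN.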

### Advantages
1. **Handles Long-Term Dependencies**: The cell state and gating mechanisms allow
LSTMs to capture dependencies across long sequences.
2. **Flexible Memory Management**: Gates provide control over what to remember,
update, or forget.

### Applications
- Text generation
- Machine translation
- Speech recognition
- Time series forecasting

---

## 3. Bidirectional RNN (BiRNN)

### Overview
Bidirectional RNNs process sequences in both forward and backward directions,
allowing them to capture past and future context simultaneously.

### Architecture
- Two RNNs are used:
  1. **Forward RNN**: Processes the input sequence from the beginning to the end.
  2. **Backward RNN**: Processes the input sequence from the end to the beginning.
- The outputs of both RNNs are combined at each time step:
\[
h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]
\]
where \( \overrightarrow{h_t} \) and \( \overleftarrow{h_t} \) are the hidden
states from the forward and backward RNNs, respectively.
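
A minimal NumPy sketch of this combination (using a simple tanh RNN cell; all shapes and parameters are illustrative assumptions): run one pass left-to-right, a second pass right-to-left, and concatenate the aligned hidden states.

```python
import numpy as np

def rnn_pass(xs, W_hx, W_hh, b_h):
    """Run a simple tanh RNN over a list of input vectors, returning all hidden states."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_hx @ x + W_hh @ h + b_h)
        states.append(h)
    return states

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 5
xs = [rng.normal(size=d_in) for _ in range(T)]

# Separate parameter sets for the forward and backward RNNs.
params_f = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
params_b = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))

h_forward = rnn_pass(xs, *params_f)                 # processes t = 1 .. T
h_backward = rnn_pass(xs[::-1], *params_b)[::-1]    # processes T .. 1, then re-aligned to t = 1 .. T

# h_t = [forward h_t ; backward h_t]
h_combined = [np.concatenate([hf, hb]) for hf, hb in zip(h_forward, h_backward)]
print(h_combined[0].shape)    # (6,) == 2 * d_h
```

In practice, deep learning frameworks expose this as a flag on their recurrent layers (e.g. `bidirectional=True` in PyTorch's `nn.RNN`/`nn.LSTM`), which doubles the per-step output width in the same way.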

### Advantages
1. **Rich Context**: BiRNNs capture information from both past and future context
in the sequence.
2. **Improved Performance**: Particularly beneficial for tasks like speech
recognition and sequence labeling.

### Applications
- Named Entity Recognition (NER)
- Part-of-Speech (POS) tagging
- Text classification
- Speech-to-text systems

---

## Summary Table
| Feature | RNN | LSTM | BiRNN |
|---------|-----|------|-------|
| **Handles Long-Term Dependencies** | Limited | Yes | Yes |
| **Gradient Issues** | Vanishing/Exploding | Largely mitigated | Same as underlying RNN (LSTM or GRU) |
| **Bidirectional Context** | No | No | Yes |
| **Architecture Complexity** | Simple | Complex | Complex |
| **Applications** | Basic sequential tasks | Long-sequence tasks | Context-rich tasks |

---

## Key Takeaways
- RNNs are foundational for sequence modeling but are limited by gradient-related
issues.
- LSTMs address these limitations by introducing gates and memory cells to capture
long-term dependencies effectively.
- Bidirectional RNNs enhance sequence modeling by incorporating both past and
future context, making them powerful for NLP and speech-related tasks.
