
RECURRENT NEURAL NETWORK

### Recurrent Neural Networks (RNNs): A Complete Description

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle
sequential data and time-series problems. Unlike feedforward networks, RNNs have
internal memory that enables them to retain information from previous inputs and utilize
it in processing subsequent inputs. This memory makes them well-suited for tasks where
the current input is dependent on prior inputs, such as natural language processing,
speech recognition, and time-series prediction.

#### 1. **The Architecture of RNNs**

The core of an RNN is a loop that allows information to be passed from one step of the
network to the next. This is achieved by maintaining a hidden state (or internal state) at
each time step. The hidden state acts as memory, holding information about previous time
steps.

In a simple RNN:
- The input at time step \( t \) is denoted by \( x_t \).
- The hidden state at time \( t \) is denoted by \( h_t \), which is updated based on the
current input \( x_t \) and the hidden state from the previous time step \( h_{t-1} \).
- The output at time \( t \) is \( o_t \), which is computed from the current hidden state \( h_t \).

The equations governing the RNN are as follows:


\[
h_t = \sigma(W_h h_{t-1} + W_x x_t + b_h)
\]
\[
o_t = \sigma(W_o h_t + b_o)
\]
where:
- \( W_h \), \( W_x \), and \( W_o \) are weight matrices,
- \( b_h \) and \( b_o \) are bias vectors,
- \( \sigma \) is a non-linear activation function; tanh or ReLU is typical for the hidden state, while the output activation (for example softmax or sigmoid) depends on the task.

The network is "recurrent" because the hidden state at time \( t \) depends on the hidden
state from the previous time step \( t-1 \), creating a feedback loop. This feedback
mechanism enables the network to process sequences of data.
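
To make the recurrence concrete, the sketch below implements a single RNN step and unrolls it over a short sequence in NumPy. The dimensions, random weight initialization, and the sigmoid output activation are illustrative assumptions, not part of any specific library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev, W_h, W_x, b_h, W_o, b_o):
    """One recurrent step: update the hidden state, then compute the output."""
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b_h)   # h_t = sigma(W_h h_{t-1} + W_x x_t + b_h)
    o_t = sigmoid(W_o @ h_t + b_o)                  # o_t = sigma(W_o h_t + b_o)
    return h_t, o_t

# Illustrative dimensions (assumed): 4-dim inputs, 8-dim hidden state, 3-dim outputs
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, seq_len = 4, 8, 3, 5
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_o = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h, b_o = np.zeros(hidden_dim), np.zeros(output_dim)

h = np.zeros(hidden_dim)            # initial hidden state
for t in range(seq_len):            # unroll the recurrence over the sequence
    x_t = rng.normal(size=input_dim)
    h, o = rnn_step(x_t, h, W_h, W_x, b_h, W_o, b_o)
```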

#### 2. **Challenges in RNNs: Vanishing and Exploding Gradients**

One of the major issues with training RNNs is the vanishing and exploding gradient
problem. This occurs when gradients during backpropagation through time (BPTT) either
shrink to near-zero values or grow exponentially, leading to instability. The vanishing
gradient issue hampers the network's ability to learn long-term dependencies, while the
exploding gradient problem makes the learning process unstable due to large weight
updates.

The vanishing gradient problem arises from the repeated multiplication of the gradient by
small values (due to the chain rule), especially when using non-linear activation functions
like tanh or sigmoid. Exploding gradients, on the other hand, occur when gradients
become too large, causing erratic updates to the weights during training.
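
The repeated multiplication can be written out explicitly. Backpropagating from step \( t \) to an earlier step \( k \) involves a product of one Jacobian per intervening step:

\[
\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
= \prod_{i=k+1}^{t} \operatorname{diag}\big(\sigma'(W_h h_{i-1} + W_x x_i + b_h)\big)\, W_h
\]

If the norms of these factors are consistently below one, the product shrinks exponentially in \( t - k \) (vanishing gradients); if they are consistently above one, it grows exponentially (exploding gradients).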

Several solutions have been proposed to mitigate these issues:


- **Gradient Clipping**: This technique prevents gradients from growing too large by rescaling them when their norm exceeds a chosen threshold (a short sketch follows this list).
- **Gated Architectures**: RNN variants like Long Short-Term Memory (LSTM) and Gated
Recurrent Units (GRU) are designed to address these problems more effectively.
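
A minimal sketch of gradient clipping by global norm, assuming the gradients are already available as NumPy arrays; the threshold value is an illustrative choice.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads

# Illustrative use: gradients for W_h, W_x, and b_h (values assumed for demonstration)
rng = np.random.default_rng(1)
grads = [rng.normal(size=(8, 8)), rng.normal(size=(8, 4)), rng.normal(size=8)]
clipped = clip_by_global_norm(grads, max_norm=5.0)
```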

#### 3. **Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)**

LSTMs and GRUs are extensions of standard RNNs and are widely used because they
overcome the limitations of vanilla RNNs in learning long-term dependencies.
- **LSTM**: An LSTM introduces additional gates (input gate, forget gate, and output gate)
to control the flow of information through the network. It also maintains a cell state that
can retain information over long periods. The cell state is updated based on how much
information should be "forgotten" from the past and how much new information should
be added. This gating mechanism allows LSTMs to capture long-range dependencies
effectively.

The key equations for an LSTM are:


\[
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \quad \text{(forget gate)}
\]
\[
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \quad \text{(input gate)}
\]
\[
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \quad \text{(output gate)}
\]
\[
\tilde{C_t} = \tanh(W_c [h_{t-1}, x_t] + b_c) \quad \text{(candidate memory cell)}
\]
\[
C_t = f_t * C_{t-1} + i_t * \tilde{C_t} \quad \text{(new cell state)}
\]
\[
h_t = o_t * \tanh(C_t) \quad \text{(new hidden state)}
\]
where:
- \( f_t \) is the forget gate,
- \( i_t \) is the input gate,
- \( o_t \) is the output gate,
- \( C_t \) is the cell state,
- \( h_t \) is the hidden state.
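
The gate equations translate almost line for line into code. The following is a single LSTM step in NumPy, with one weight matrix per gate applied to the concatenation \( [h_{t-1}, x_t] \); the shapes and random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W and b are dicts keyed by gate name: 'f', 'i', 'o', 'c'."""
    concat = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ concat + b['f'])      # forget gate
    i_t = sigmoid(W['i'] @ concat + b['i'])      # input gate
    o_t = sigmoid(W['o'] @ concat + b['o'])      # output gate
    C_tilde = np.tanh(W['c'] @ concat + b['c'])  # candidate memory cell
    C_t = f_t * C_prev + i_t * C_tilde           # new cell state
    h_t = o_t * np.tanh(C_t)                     # new hidden state
    return h_t, C_t

# Illustrative shapes (assumed): 4-dim input, 8-dim hidden/cell state
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W = {k: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)) for k in 'fioc'}
b = {k: np.zeros(hidden_dim) for k in 'fioc'}
h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, C = lstm_step(rng.normal(size=input_dim), h, C, W, b)
```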

- **GRU**: A GRU simplifies the LSTM by combining the forget and input gates into a
single update gate, and it merges the cell state and hidden state into one. Despite having
fewer gates, GRUs have been shown to perform similarly to LSTMs in many applications.

The key equations for GRU are:


\[
z_t = \sigma(W_z [h_{t-1}, x_t] + b_z) \quad \text{(update gate)}
\]
\[
r_t = \sigma(W_r [h_{t-1}, x_t] + b_r) \quad \text{(reset gate)}
\]
\[
\tilde{h_t} = \tanh(W_h [r_t * h_{t-1}, x_t] + b_h) \quad \text{(candidate hidden state)}
\]
\[
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h_t} \quad \text{(new hidden state)}
\]
where:
- \( z_t \) is the update gate,
- \( r_t \) is the reset gate,
- \( \tilde{h_t} \) is the candidate hidden state.
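
For comparison, a GRU step in the same style, again with illustrative shapes and initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, b):
    """One GRU step. W and b are dicts keyed by gate name: 'z', 'r', 'h'."""
    concat = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    z_t = sigmoid(W['z'] @ concat + b['z'])             # update gate
    r_t = sigmoid(W['r'] @ concat + b['r'])             # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])  # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W['h'] @ concat_reset + b['h'])   # candidate hidden state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde          # new hidden state
    return h_t

# Illustrative shapes (assumed): 4-dim input, 8-dim hidden state
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W = {k: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)) for k in 'zrh'}
b = {k: np.zeros(hidden_dim) for k in 'zrh'}
h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim), W, b)
```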

#### 4. **Applications of RNNs**

RNNs, LSTMs, and GRUs are used in various applications that require sequential data
processing:
- **Natural Language Processing (NLP)**: RNNs are used in language models, machine
translation, and sentiment analysis. They are capable of processing variable-length text
sequences and capturing contextual relationships between words.
- **Speech Recognition**: RNNs can model sequences of audio frames to recognize
spoken words, as they can capture the temporal dependencies in speech.
- **Time-Series Forecasting**: RNNs are used for stock-price prediction, weather forecasting, and other applications that rely on historical data to make future predictions.
- **Image Captioning**: When combined with Convolutional Neural Networks (CNNs),
RNNs can be used to generate descriptions for images, where the CNN extracts features
and the RNN generates a sequence of words to describe the image.

#### 5. **Bidirectional RNNs**

A Bidirectional RNN (BRNN) consists of two RNNs: one processes the input sequence in the
forward direction (left to right), and the other processes the sequence in the backward
direction (right to left). By doing this, the BRNN can capture dependencies from both past
and future context, improving performance on tasks where context in both directions is
important, such as in NLP.

The output of a BRNN is computed by concatenating the hidden states from the forward
and backward passes at each time step.
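
In code, this amounts to running two independent RNNs, one over the reversed sequence, and concatenating their per-step hidden states. The sketch below reuses a simple tanh RNN cell; the shapes and initialization are illustrative assumptions.

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, b_h):
    """Run a simple tanh RNN over a sequence and return the hidden state at each step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W_h @ h + W_x @ x_t + b_h)
        states.append(h)
    return states

def birnn(xs, fwd_params, bwd_params):
    """Bidirectional pass: concatenate forward and backward hidden states at each time step."""
    h_fwd = rnn_forward(xs, *fwd_params)
    h_bwd = rnn_forward(xs[::-1], *bwd_params)[::-1]   # process right-to-left, then realign
    return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]

# Illustrative shapes (assumed): five 4-dim inputs, 8-dim hidden states per direction
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8

def make_params():
    return (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
            rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
            np.zeros(hidden_dim))

xs = [rng.normal(size=input_dim) for _ in range(5)]
outputs = birnn(xs, make_params(), make_params())      # each element has size 2 * hidden_dim
```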

#### 6. **Attention Mechanism**

Although RNNs can handle sequential data, they may struggle with very long sequences
due to the difficulty in retaining important information over many time steps. The
attention mechanism addresses this issue by allowing the network to focus on specific
parts of the input sequence when making predictions. This has been particularly effective
in machine translation and other sequence-to-sequence tasks.
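
As a minimal illustration of the weighting idea, the sketch below computes dot-product attention over a set of encoder hidden states for a single query vector. The shapes are assumptions, and this is only one of several attention variants in the literature.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attend(query, encoder_states):
    """Score each encoder state against the query, then return the weighted sum (context vector)."""
    scores = np.array([query @ h for h in encoder_states])   # dot-product relevance scores
    weights = softmax(scores)                                # attention weights sum to 1
    context = sum(w * h for w, h in zip(weights, encoder_states))
    return context, weights

# Illustrative use (shapes assumed): six 8-dim encoder states and one 8-dim query
rng = np.random.default_rng(0)
encoder_states = [rng.normal(size=8) for _ in range(6)]
query = rng.normal(size=8)
context, weights = attend(query, encoder_states)
```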

### Conclusion

RNNs have made significant contributions to sequence modeling tasks, but their
limitations, such as vanishing gradients and difficulty in handling long-term dependencies,
have led to the development of more advanced architectures like LSTMs, GRUs, and
attention mechanisms. These models continue to be essential tools in various domains
where sequence data plays a critical role.
