AI Foundation Application-RNN
Thien Huynh-The
Department of Computer and Communications Engineering
HCMC University of Technology and Education
• Output calculation:
yt = softmax(Why ht + by )
where:
• xt : Input at time t
• ht : Hidden state at time t
• yt : Output at time t
• Wxh , Whh , Why : Weight matrices
• bh , by : Bias vectors
• σ: Activation function (e.g., tanh, ReLU)
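As a concrete illustration, here is a minimal NumPy sketch of one forward step of this recurrence. The dimensions, random initialization, and helper names (rnn_step, softmax) are illustrative assumptions, not part of the slides.

import numpy as np

# Illustrative sizes only; not taken from the slides.
input_dim, hidden_dim, output_dim = 4, 8, 3

rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
Why = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden-to-output weights
bh = np.zeros(hidden_dim)
by = np.zeros(output_dim)

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev):
    # h_t = sigma(Wxh x_t + Whh h_{t-1} + bh), with sigma = tanh here
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
    # y_t = softmax(Why h_t + by)
    y_t = softmax(Why @ h_t + by)
    return h_t, y_t

h_t, y_t = rnn_step(rng.normal(size=input_dim), np.zeros(hidden_dim))
print(y_t.sum())               # the output is a probability vector (sums to 1)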
• Many-to-many RNNs are used when both the input and output are sequences of arbitrary length.
• Examples:
• Machine Translation (e.g., English sentence to French sentence)
• Video captioning (sequence of frames to a description)
• Part-of-speech tagging (sequence of words to sequence of tags)
• The network processes each input in the sequence and produces an output at each time step.
3. An output yt is produced:
yt = softmax(Why ht + by )
• Key Points:
• The same weight matrices (Wxh , Whh , Why ) are used at every time step. This is crucial for
handling variable-length sequences and greatly reduces the number of parameters.
• The hidden state ht carries information from previous time steps, enabling the network to
learn dependencies in the sequence.
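Continuing the NumPy sketch above (reusing rnn_step and the shared weight matrices defined there), the loop below unrolls the recurrence over a whole input sequence and emits one output per time step; the sequence length of 5 is arbitrary.

def rnn_many_to_many(xs, h0=None):
    # The same Wxh, Whh, Why, bh, by are reused at every step, which is
    # what lets one set of parameters handle sequences of any length.
    h_t = np.zeros(hidden_dim) if h0 is None else h0
    ys = []
    for x_t in xs:                       # xs: sequence of input vectors
        h_t, y_t = rnn_step(x_t, h_t)    # one output per time step
        ys.append(y_t)
    return np.stack(ys), h_t

ys, h_last = rnn_many_to_many(rng.normal(size=(5, input_dim)))  # toy sequence, T = 5
print(ys.shape)                          # (5, output_dim): an output at every step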
where C is the number of classes, ŷt,i is the true probability for class i at time t, and yt,i is
the predicted probability for class i at time t.
• Mean Squared Error (MSE): Used for regression tasks where the output is a continuous
value.
L_t = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{t,i} - y_{t,i})^2
Total Loss
The total loss over the entire sequence is usually the sum of the individual time-step losses:
L = \sum_{t=1}^{T} L_t
or the average:
L = \frac{1}{T} \sum_{t=1}^{T} L_t
where T is the length of the sequence. This total loss is what is minimized during training using
backpropagation through time (BPTT).
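A small self-contained sketch of how the per-step losses combine into the sequence loss. The random predictions and one-hot targets are placeholders, and the per-step cross-entropy L_t = -Σ_i ŷ_{t,i} log y_{t,i} follows the definitions above; either the sum or the average would be minimized with BPTT.

import numpy as np

# Toy sequence of length T = 3 with C = 4 classes; values are placeholders.
T, C = 3, 4
rng = np.random.default_rng(1)
y_pred = rng.random(size=(T, C))
y_pred /= y_pred.sum(axis=1, keepdims=True)      # predicted probabilities y_{t,i}
y_true = np.eye(C)[rng.integers(0, C, size=T)]   # true probabilities (one-hot)

step_losses = -(y_true * np.log(y_pred)).sum(axis=1)  # per-step cross-entropy L_t

total_loss = step_losses.sum()     # L = sum over t of L_t
average_loss = step_losses.mean()  # L = (1/T) * sum over t of L_t
print(total_loss, average_loss)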
• Parameter Efficiency: Sharing weights significantly reduces the number of parameters the model
needs to learn. This is especially important when dealing with long sequences. Imagine if each
time step had its own set of weights; the number of parameters would explode.
• Generalization: Weight sharing allows the model to generalize across different positions in the
sequence. The model learns features that are useful regardless of where they appear in the input.
For example, in language modeling, the model learns grammatical rules that apply throughout a
sentence, not just at specific word positions.
• This is a key characteristic that distinguishes RNNs from other neural network architectures.
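To make the parameter-efficiency point concrete, the arithmetic below compares a shared-weight RNN with a hypothetical variant that has its own weights at every time step; the layer sizes are illustrative assumptions.

# Back-of-the-envelope parameter count showing why weight sharing keeps the
# model size independent of sequence length.
n_in, n_hidden, n_out = 100, 256, 50
T = 1000                               # sequence length

shared = (n_hidden * n_in              # Wxh
          + n_hidden * n_hidden        # Whh
          + n_out * n_hidden           # Why
          + n_hidden + n_out)          # bh, by

per_step_weights = T * shared          # hypothetical: a separate weight set per time step
print(shared, per_step_weights)        # roughly 1e5 vs. 1e8 parameters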
2. After processing the entire sequence, the final hidden state hT (where T is the length of the
sequence) is used to compute the output:
y = g (Why hT + by )
where g is an appropriate output activation function (e.g., softmax for multi-class
classification, sigmoid for binary classification).
• Key Aspects:
• The same weight matrices (Wxh , Whh , Why ) are used at each time step.
• The final hidden state hT summarizes the information from the entire input sequence.
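For comparison with the many-to-many case, here is a many-to-one sketch in the same NumPy setup as above (reusing the earlier weights): the recurrence runs over the full sequence, but only the final hidden state h_T is passed through the output layer.

def rnn_many_to_one(xs):
    # Only the hidden state is carried across steps; a single output
    # is read from h_T at the end.
    h_t = np.zeros(hidden_dim)
    for x_t in xs:
        h_t = np.tanh(Wxh @ x_t + Whh @ h_t + bh)
    return softmax(Why @ h_t + by)       # g = softmax applied to h_T

y = rnn_many_to_one(rng.normal(size=(7, input_dim)))   # e.g. sequence classification
print(y.shape)                                          # (output_dim,)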
• In image captioning:
• The input x could be a feature vector extracted from an image using a Convolutional Neural
Network (CNN).
• The RNN then generates a sequence of words (the caption) based on this image feature
vector.
• The initial hidden state h0 is initialized based on the image features, setting the context for
the caption generation.
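The sketch below mirrors that setup under heavy simplification: the "CNN features" are just a random vector, and W_init, W_vocab, and the greedy decoding loop are hypothetical names introduced only for illustration, reusing the recurrence weights from the earlier sketch.

# Rough sketch: image features set h_0, then the RNN emits word IDs step by step.
feature_dim, vocab_size = 16, 10
W_init = rng.normal(scale=0.1, size=(hidden_dim, feature_dim))   # image features -> h_0
W_vocab = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))   # hidden state -> word scores

image_features = rng.normal(size=feature_dim)   # stand-in for CNN output
h_t = np.tanh(W_init @ image_features)          # h_0 initialized from the image

caption_ids = []
x_t = np.zeros(input_dim)                       # stand-in for a start-token embedding
for _ in range(5):                              # generate a 5-word toy caption
    h_t = np.tanh(Wxh @ x_t + Whh @ h_t + bh)
    word_id = int(softmax(W_vocab @ h_t).argmax())   # greedy word choice
    caption_ids.append(word_id)
    x_t = rng.normal(size=input_dim)            # stand-in for the chosen word's embedding
print(caption_ids)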
• Forget Gate: Decides what information to throw away from the cell state.
ft = σ(Wf [ht−1 , xt ] + bf )
where:
• σ: Sigmoid function.
• Wf : Weight matrix for the forget gate.
• [ht−1 , xt ]: Concatenation of ht−1 and xt .
• bf : Bias for the forget gate.
• Input Gate: Decides what new information to store in the cell state.
it = σ(Wi [ht−1 , xt ] + bi )
• Candidate Cell State: Creates a vector of new candidate values that could be added to the cell state.
C̃t = tanh(WC [ht−1 , xt ] + bC )
• Where:
• Wi : Weight matrix for the input gate.
• WC : Weight matrix for the candidate cell state.
• bi , bC : Biases.
• tanh: Hyperbolic tangent function.
• Cell State Update: Combines the forget gate, previous cell state, input gate, and
candidate cell state.
Ct = ft ⊙ Ct−1 + it ⊙ C̃t
where ⊙ represents element-wise multiplication.
• Output Gate: Decides what parts of the cell state to output.
ot = σ(Wo [ht−1 , xt ] + bo )
• Hidden State:
ht = ot ⊙ tanh(Ct )
where:
• Wo : Weight matrix for the output gate.
• bo : Bias for the output gate.
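A self-contained NumPy sketch of one LSTM cell step following the gate equations above; the sizes and random weights are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
x_dim, h_dim = 4, 6
concat_dim = h_dim + x_dim                    # size of [h_{t-1}, x_t]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate / candidate, as in the equations above.
Wf, Wi, WC, Wo = (rng.normal(scale=0.1, size=(h_dim, concat_dim)) for _ in range(4))
bf = bi = bC = bo = np.zeros(h_dim)

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)                # forget gate
    i_t = sigmoid(Wi @ z + bi)                # input gate
    C_tilde = np.tanh(WC @ z + bC)            # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde        # cell state update (element-wise)
    o_t = sigmoid(Wo @ z + bo)                # output gate
    h_t = o_t * np.tanh(C_t)                  # hidden state
    return h_t, C_t

h_t, C_t = lstm_step(rng.normal(size=x_dim), np.zeros(h_dim), np.zeros(h_dim))
print(h_t.shape, C_t.shape)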
• Reset Gate: Controls how much of the previous hidden state is used to compute the candidate
hidden state.
rt = σ(Wr [ht−1 , xt ] + br )
where:
• Wr : Weight matrix for the reset gate.
• br : Bias for the reset gate.
• Candidate Hidden State: Computes a new hidden state based on the current input and the
(potentially reset) previous hidden state.
h̃t = tanh(W [rt ⊙ ht−1 , xt ] + b)
where:
• tanh: Hyperbolic tangent function.
• W : Weight matrix for the candidate hidden state.
• rt ⊙ ht−1 : Element-wise multiplication of the reset gate and the previous hidden state.
• b: Bias for the candidate hidden state.
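A short NumPy sketch of the two GRU pieces described above (reset gate and candidate hidden state); the sizes and weights are illustrative. The full GRU also uses an update gate to blend the candidate with the previous hidden state, which is not shown here.

import numpy as np

rng = np.random.default_rng(0)
x_dim, h_dim = 4, 6
Wr = rng.normal(scale=0.1, size=(h_dim, h_dim + x_dim))   # reset-gate weights
W = rng.normal(scale=0.1, size=(h_dim, h_dim + x_dim))    # candidate weights
br, b = np.zeros(h_dim), np.zeros(h_dim)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_t, h_prev = rng.normal(size=x_dim), rng.normal(size=h_dim)

r_t = sigmoid(Wr @ np.concatenate([h_prev, x_t]) + br)          # reset gate
h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]) + b)  # candidate hidden state
print(h_tilde.shape)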
• Stock Market Prediction: LSTMs can be used to predict stock prices based on historical data.
They can capture temporal patterns and trends in financial data.
• Weather Forecasting: LSTMs can forecast weather conditions based on historical weather data.
• Healthcare: LSTMs can analyze medical time series data, such as ECG or EEG signals, for
disease detection and prediction.
• Handling Long-Term Dependencies: LSTMs are specifically designed to address the vanishing
gradient problem, enabling them to capture long-range dependencies in sequential data, which
traditional RNNs struggle with.
• Superior Performance in Sequence Tasks: Compared to traditional machine learning methods
like Hidden Markov Models (HMMs) or Conditional Random Fields (CRFs), LSTMs often achieve
better performance in tasks involving sequential data, especially when long-range dependencies are
important.
• Flexibility in Input and Output: LSTMs can handle variable-length input and output sequences,
making them suitable for a wide range of tasks.
• Computational Cost: LSTMs are computationally more expensive than simpler RNNs due to the
multiple gates and complex computations within each cell.
• Difficulty in Parallelization: The sequential nature of LSTMs makes it difficult to parallelize
computations, which can limit training speed on large datasets.
• Still Sensitive to Hyperparameters: LSTMs still require careful tuning of hyperparameters, such
as the number of layers, hidden units, and learning rate.
• Limited Context in Very Long Sequences: Although LSTMs mitigate the vanishing gradient
problem, they can still struggle to capture dependencies that span extremely long sequences.
• More Efficient Architectures: Research is ongoing to develop more efficient RNN architectures,
such as GRUs or other novel gating mechanisms, that can achieve similar performance to LSTMs
with reduced computational cost.
• Attention Mechanisms: Integrating attention mechanisms with LSTMs allows the model to
focus on relevant parts of the input sequence, further improving performance on tasks with long
sequences.
• Transformer Networks: Transformer networks, which rely on attention mechanisms and do not
have the inherent sequential limitations of RNNs, have shown great success in NLP and other
sequence tasks and are a major area of research. However, RNNs and LSTMs are still relevant in
many contexts.
• Combining with other architectures: LSTMs can be combined with Convolutional Neural Networks
(CNNs) or other architectures for multimodal tasks or to capture different types of features.
\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V
where:
• Q: Queries matrix.
• K : Keys matrix.
• V : Values matrix.
• dk : Dimension of the keys. Scaling by √dk prevents gradients from becoming too small.
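A self-contained NumPy sketch of scaled dot-product attention for a single sequence of length 4; d_k = d_v = 8 and the random Q, K, V matrices are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, d_k, d_v = 4, 8, 8
Q = rng.normal(size=(n, d_k))    # queries
K = rng.normal(size=(n, d_k))    # keys
V = rng.normal(size=(n, d_v))    # values

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

scores = Q @ K.T / np.sqrt(d_k)     # (n, n) similarity scores, scaled by sqrt(d_k)
weights = softmax(scores, axis=-1)  # each row is a distribution over the keys
output = weights @ V                # (n, d_v) weighted sum of the values
print(output.shape)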