21CSE356T-NLP-Unit 4.1
NATURAL LANGUAGE PROCESSING
Instructor:
Ms. S. Rama,
Assistant Professor
Department of Information Technology,
SRM Institute of Science and Technology,
Unit IV- (Language Models)
• Recurrent Neural Network
• Long Short Term Memory
• Attention Mechanism
• Transformer Based Models
• Self attention
• Multihead attention
• BERT
• RoBERTa
• Fine-Tuning for Downstream Tasks
• Text Classification and Generation
Language Model
• What is a Language Model?
• A Language Model (LM) is a computational model designed to
understand, generate, and predict human language. It is a
fundamental concept in Natural Language Processing (NLP) and
serves as the backbone for many AI applications, including chatbots,
machine translation, text summarization, and speech recognition.
Types of Language Models
• Language models can be categorized based on their architecture and
how they process text:
• Statistical Language Models (Before Deep Learning Era)
• These models use probabilities to predict the next word in a sentence
based on previous words.
• Examples:
• N-gram models (bigram, trigram)
• Hidden Markov Models (HMM)
• Latent Dirichlet Allocation (LDA) for topic modeling
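To make the n-gram idea concrete, here is a minimal sketch of a bigram model estimated by maximum likelihood on a toy corpus (the corpus and helper names are illustrative, not from the slides):

```python
from collections import Counter

# Toy corpus; in practice this would be a large text collection.
corpus = [
    "i love natural language processing",
    "i love deep learning",
    "language models predict the next word",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev_word, word):
    """P(word | prev_word) by maximum-likelihood estimation."""
    if unigrams[prev_word] == 0:
        return 0.0
    return bigrams[(prev_word, word)] / unigrams[prev_word]

print(bigram_prob("i", "love"))        # 1.0 in this toy corpus
print(bigram_prob("love", "natural"))  # 0.5
```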
Neural Language Models (Deep Learning Era)
• These models use deep learning techniques to learn complex patterns
in text.
• Examples:
• Recurrent Neural Networks (RNNs) – Capture sequential dependencies.
• Long Short-Term Memory (LSTM) – Handles long-range dependencies better
than RNNs.
• Gated Recurrent Units (GRU) – A simplified version of LSTMs.
• Transformers (State-of-the-Art) – Used in modern NLP applications.
Different Ways to Model Text
• A Classic Approach for Text Classification: Bag-of-Words Model
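The bag-of-words model represents a document as an unordered vector of word counts. Below is a minimal sketch using scikit-learn's CountVectorizer for a tiny text classification task; the example texts and labels are illustrative, not from the slides:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset: 1 = positive, 0 = negative.
texts = ["a very nice person", "what an evil plan",
         "nice and helpful", "truly evil behaviour"]
labels = [1, 0, 1, 0]

# Bag-of-words: each document becomes a vector of word counts,
# ignoring word order entirely.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["such a nice person"])))  # likely [1]
```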
Recurrent Neural Networks (RNNs)
Recurrent Neural Networks
• Like traditional neural networks, such as feedforward neural networks
and convolutional neural networks (CNNs), recurrent neural networks
use training data to learn. They are distinguished by their “memory” as
they take information from prior inputs to influence the current input
and output.
• A Recurrent Neural Network (RNN) is a type of neural network that processes sequential
data by maintaining a memory of previous inputs.
Overview
• The networks we used previously are also called feedforward neural
networks; they usually want to predict a vector at some time steps.
Different Types of Sequence Modeling Tasks
• Sequential Representation:
In an RNN, the same network cell is applied at every time step, processing
one element of the sequence at a time. However, because each time step's
computation depends on the previous ones (via the hidden state), the
network inherently forms a loop. Unrolling in time means "unfolding" this
loop so that each time step is represented as a separate layer in a deep
feedforward network.
• Temporal Layers: Imagine you have a sequence with T time steps. Unrolling
the RNN creates T copies of the network cell arranged sequentially.
Although these cells share the same weights, they are shown as separate
layers corresponding to time steps t = 1, 2, …, T.
Why Unroll the RNN?
• Visualization: It makes it easier to understand how the hidden state
propagates through time and how each time step contributes to the
final output.
• Training with Backpropagation Through Time (BPTT): Unrolling allows
us to apply a variant of backpropagation known as Backpropagation Through
Time, in which gradients are propagated backward through every unrolled time step.
Forward Pass:
• Time Step 1: The RNN cell processes the first input x1 and computes the first hidden
state h1; each subsequent step t processes xt together with h(t-1).
• Result: You end up with a chain of hidden states h1, h2, …, hT and possibly outputs y1, y2,
…, yT.
• Backward Pass:
• Error Propagation: When training, the error from the output is propagated backward
through each of these unrolled steps. This allows the network to adjust its weights based
on the entire sequence context.
• Shared Weights: Despite the unrolled structure, the weight matrices remain the same
across all time steps. Gradients computed at each step are accumulated for the shared
parameters.
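A minimal NumPy sketch of the unrolled forward pass described above: the same weight matrices are reused at every time step, and the hidden state carries context forward. Dimensions, weight names, and the random inputs are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2

# Shared parameters, reused at every unrolled time step.
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

xs = rng.normal(size=(T, input_dim))   # input sequence x1 ... xT
h = np.zeros(hidden_dim)               # initial hidden state h0

hs, ys = [], []
for t in range(T):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
    # y_t = W_hy h_t + b_y
    y = W_hy @ h + b_y
    hs.append(h)
    ys.append(y)

print(len(hs), hs[-1].shape)  # T hidden states h1 ... hT
```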
Backpropagation and Vanishing Gradient
Loss in RNN
• Loss
• Definition:
In machine learning, the loss (or cost) function quantifies the
difference between the model's predicted output and the actual
target values. It measures how "wrong" the predictions are.
• In RNNs:
Since RNNs handle sequential data, the loss is often computed at each
time step and then aggregated (for example, by summing or
averaging) over the entire sequence. This aggregated loss then guides
how the model should adjust its parameters during training.
Backpropagation in RNNs
• Definition:
Backpropagation is the algorithm used to compute the gradient of the loss function with
respect to each weight in the network, allowing for weight updates that minimize the loss.
• Backpropagation Through Time (BPTT):
• Process:
In RNNs, the standard backpropagation algorithm is extended to account for the sequential nature of the
data. The network is "unrolled" over time, creating a copy of the network for each time step. The
gradients are then computed for each of these time steps and aggregated.
• Challenges:
The process can suffer from the vanishing gradient problem because the gradients from later time steps
(which might carry important long-term dependencies) become exponentially smaller as they are
propagated back to earlier time steps.
• Significance:
BPTT is essential for training RNNs effectively, but its challenges have led to the development of
alternative architectures and methods to better capture long-range dependencies in sequences.
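To make BPTT concrete, the following PyTorch sketch (shapes and toy targets are illustrative) sums the per-time-step losses over the whole sequence and lets a single backward pass propagate gradients through every unrolled step:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, batch, input_dim, hidden_dim, num_classes = 6, 2, 3, 8, 4

rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, num_classes)
criterion = nn.CrossEntropyLoss()

x = torch.randn(batch, T, input_dim)              # input sequence
targets = torch.randint(0, num_classes, (batch, T))

outputs, _ = rnn(x)                               # hidden states for all T steps
logits = head(outputs)                            # (batch, T, num_classes)

# Aggregate the loss over every time step, then backpropagate through time:
loss = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))
loss.backward()                                   # gradients flow back through all T steps
print(loss.item())
```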
Vanishing Gradient
• It occurs when gradients—the values used to update the network's weights—become
exceedingly small as they are propagated backward through the network during
training. Here's a detailed breakdown:
• What Happens During Backpropagation
• Chain Rule Multiplication:
In deep networks or RNNs, the backpropagation algorithm relies on the chain rule to
compute gradients. This involves multiplying the derivatives of the activation
functions across many layers or time steps.
• Exponential Decay of Gradients:
If these derivatives are less than one, as is common with activation functions like
sigmoid or tanh, repeated multiplication can cause the gradient to shrink
exponentially. This results in very small gradient values for earlier layers or time steps.
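A tiny numeric illustration of this exponential decay, assuming each backward step multiplies the gradient by the maximum sigmoid derivative of 0.25:

```python
# Repeatedly multiplying by a derivative < 1 shrinks the gradient exponentially.
max_sigmoid_derivative = 0.25   # sigma'(x) <= 0.25 for the sigmoid function
gradient = 1.0
for step in range(1, 21):
    gradient *= max_sigmoid_derivative
    if step in (5, 10, 20):
        print(f"after {step} steps: {gradient:.2e}")
# after 5 steps:  ~9.77e-04
# after 10 steps: ~9.54e-07
# after 20 steps: ~9.09e-13
```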
Consequences of Vanishing
Gradients
• Slow Learning:
When gradients become very small, the corresponding weights receive
almost no update during training. This makes it extremely difficult for
the network to learn from the data, especially the information relevant
to earlier layers or earlier parts of the sequence.
• Difficulty Capturing Long-Term Dependencies:
In the context of RNNs, vanishing gradients hinder the network’s ability
to capture long-term dependencies. The influence of an input from
many time steps ago diminishes rapidly, leading the model to "forget"
important contextual information.
RNN
An RNN is basically a black box with an “internal state” that is updated as
a sequence is processed. At every single timestep, we feed an input vector
into the RNN, which modifies that state as a function of what it receives.
When we tune the RNN weights, the RNN will show different behaviors in terms of
how its state evolves as it receives these inputs. We are also interested in
producing an output based on the RNN state, so we can produce these
output vectors on top of the RNN.
If we unroll an RNN model, then there are inputs (e.g. video frames) at
different timesteps x1, x2, …, xt.
At each timestep, the RNN takes in two inputs – an input frame (xi) and the previous
representation of what it has seen so far (i.e. its history) – to generate an output
and update its history, which gets forward-propagated over time. All the
RNN blocks are the same block and share the same parameters, but they have
different inputs and history at each timestep.
Advantages
• An RNN remembers each piece of information through time. This ability to
remember previous inputs is what makes it useful in time series prediction,
and it is extended further in Long Short-Term Memory networks.
• Recurrent neural networks are even used with convolutional layers to extend
the effective pixel neighborhood.
Disadvantages
• Vanishing and exploding gradient problems.
• Training an RNN is a very difficult task.
• It cannot process very long sequences when using tanh or ReLU as the activation
function.
LSTM (Long Short-Term Memory)
Motivation: capturing long-range dependencies (e.g. the context needed to
detect sarcasm) is hard for a plain RNN.
LSTM (Long Short-Term Memory)
• What is?
• LSTM is a recurrent neural network (RNN) architecture widely
used in Deep Learning. It excels at capturing long-term
dependencies, making it ideal for sequence prediction
tasks.
• Unlike traditional neural networks, LSTM incorporates feedback
connections, allowing it to process entire sequences of data, not
just individual data points. This makes it highly effective in
understanding and predicting patterns in sequential data like
time series, text, and speech.
• LSTM has become a powerful tool in artificial intelligence and
deep learning, enabling breakthroughs in various fields by
uncovering valuable insights from sequential data
Selective Read, Selective Write, Selective Forget
• Selective Write – we select what to write
• Selective Read – we select what to read
• Selective Forget – we select what to forget
LSTM Architecture
• At a high level, LSTM works very much like an RNN cell. Here is the
internal functioning of the LSTM network. The LSTM network
architecture consists of three parts, as shown in the image below, and
each part performs an individual function.
• The first part chooses whether the information coming from the previous
timestamp is to be remembered or is irrelevant and can be forgotten.
• In the second part, the cell tries to learn new information from the input to this
cell.
• At last, in the third part, the cell passes the updated information from the current
timestamp to the next timestamp.
LONG SHORT-TERM MEMORY
• These three parts of an LSTM unit are known as gates. They
control the flow of information in and out of the memory
cell or LSTM cell.
• The first gate is called Forget gate, the second gate is
known as the Input gate, and the last one is the Output
gate.
• An LSTM unit, consisting of these three gates and a memory
cell (LSTM cell), can be thought of as a layer of neurons in a
traditional feedforward neural network, with each unit
maintaining a hidden state and a current cell state.
LSTM
• LSTM also has a hidden state, where H(t-1)
represents the hidden state of the previous
timestamp and H(t) is the hidden state of the
current timestamp. In addition to that, LSTM also
has a cell state, represented by C(t-1) and C(t)
for the previous and current timestamps,
respectively.
• Here the hidden state is known as Short
term memory, and the cell state is known
as Long term memory.
The Logic Behind LSTM
Example of LSTM working. Input: "A is a nice person. But B is evil."
• Let's take an example to understand how LSTM works. Here we have
two sentences separated by a full stop. The first sentence is "A is a
nice person," and the second sentence is "But B is evil." It is very
clear that in the first sentence we are talking about A, and as soon as
we encounter the full stop (.), we start talking about B.
As we move from the first sentence to the second sentence, our network
should realize that we are no longer talking about A; our subject is now B.
Here, the Forget gate of the network allows it to forget about A. Let's
understand the roles played by these gates in LSTM.
Forget Gate
• The information that is no longer useful in the cell state is removed with the forget gate. Two
inputs, xt (the input at the current timestep) and ht-1 (the previous hidden state), are fed to the gate and
multiplied with weight matrices, followed by the addition of a bias.
• The resultant is passed through a sigmoid activation function, which gives an output between 0 and 1. If for a
particular cell state the output is close to 0, the piece of information is forgotten; if the output is close to 1,
the information is retained for future use.
The equation for the forget gate is:
ft = σ(Wf ⋅ [ht−1, xt] + bf)
where:
• Wf represents the weight matrix associated with the forget gate.
• [ht-1, xt] denotes the concatenation of the current input and the
previous hidden state.
• bf is the bias term of the forget gate.
The addition of useful information to the cell state is done by the input gate. First, the information
is regulated using the sigmoid function, which filters the values to be remembered (similar to the
forget gate) using the inputs ht-1 and xt. Then, a vector of candidate values is created using the tanh
function, which gives an output from -1 to +1 based on ht-1 and xt. Finally, the regulated values and
the candidate values are multiplied to obtain the useful information. The equations for the input
gate and the candidate values are:
it = σ(Wi ⋅ [ht−1, xt] + bi)
Ĉt = tanh(Wc ⋅ [ht−1, xt] + bc)
We multiply the previous cell state by ft, disregarding the
information we had previously chosen to forget. Next, we
add it ⊙ Ĉt. This represents the candidate values, scaled by
how much we chose to update each state value.
Ct = ft ⊙ Ct−1 + it ⊙ Ĉt
where
• ⊙ denotes element-wise multiplication
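Putting the forget gate, input gate, candidate values, and cell-state update together, here is a minimal NumPy sketch of a single LSTM time step (the output gate is included for completeness; dimensions, initialisation, and the toy sequence are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
concat_dim = hidden_dim + input_dim

# One weight matrix and bias per gate, acting on [h_{t-1}, x_t].
W_f, b_f = rng.normal(scale=0.1, size=(hidden_dim, concat_dim)), np.zeros(hidden_dim)
W_i, b_i = rng.normal(scale=0.1, size=(hidden_dim, concat_dim)), np.zeros(hidden_dim)
W_c, b_c = rng.normal(scale=0.1, size=(hidden_dim, concat_dim)), np.zeros(hidden_dim)
W_o, b_o = rng.normal(scale=0.1, size=(hidden_dim, concat_dim)), np.zeros(hidden_dim)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)            # forget gate
    i_t = sigmoid(W_i @ z + b_i)            # input gate
    c_hat = np.tanh(W_c @ z + b_c)          # candidate values
    c_t = f_t * c_prev + i_t * c_hat        # Ct = ft ⊙ Ct-1 + it ⊙ Ĉt
    o_t = sigmoid(W_o @ z + b_o)            # output gate
    h_t = o_t * np.tanh(c_t)                # new hidden state (short-term memory)
    return h_t, c_t

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # a 5-step toy sequence
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)
```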
LSTM vs. RNN
• Directionality: LSTM can be trained to process sequential data in both forward and
backward directions; an RNN can only be trained to process sequential data in one direction.
• Training: LSTM is more difficult to train than an RNN due to the complexity of the gates
and memory unit; an RNN is easier to train than an LSTM.
• Applications: LSTM – machine translation, speech recognition, text summarization,
natural language processing, time series forecasting; RNN – natural language processing,
machine translation, speech recognition, image processing, video processing.
Attention Mechanism
Motivation
Recurrent Neural Networks (LSTM/GRU) are the model of choice when working
with variable-length inputs and are thus a natural fit for text
processing. But:
• the sequential nature of RNNs prohibits parallelization,
• the context is computed from past only,
• there is no explicit distinction between short- and long-range dependencies
(everything is dealt with via the context),
• training is tricky: how can we efficiently do transfer learning?
On the other hand, Convolution can
• operate on both time-series (1D convolution), and images,
• be massively parallelized,
• exploit local dependencies (within the kernel) and long-range dependencies
(using multiple layers),
but:
• it can’t deal with variable-size inputs,
• the position of these dependencies is fixed (determined by the kernel size and
the number of layers).
Attention
• An attention model is a mechanism used in neural networks that dynamically
focuses on the most relevant parts of the input data when making
predictions or generating outputs.
• Instead of processing all parts of the input equally, the model assigns different
weights to different elements, allowing it to "attend" more to critical features.
This is especially useful in tasks like machine translation, text summarization,
and image recognition.
• Key Points:
• Dynamic Focus: Traditional models might compress input into a fixed-length
representation, but attention mechanisms evaluate and weigh each input
element based on its relevance to the current task.
• Applications: Originally introduced for neural machine translation (e.g.,
Bahdanau attention), attention mechanisms now play a central role in many
state-of-the-art architectures such as Transformers.
• As an example, consider "The animal didn't cross the street because it was too wide" (Sentence 1)
and "The animal didn't cross the street because it was too tired" (Sentence 2).
• In Sentence 1, the attention mechanism assigns more weight to "street" when resolving "it."
• In Sentence 2, the attention mechanism assigns more weight to "animal" when resolving "it."
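A minimal sketch of scaled dot-product attention, one common way to compute the weights described above: each query is scored against every key, the scores are turned into weights with a softmax, and the output is the weighted sum of the values (shapes and the random inputs are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                      # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # e.g. one query per token in a 4-token sentence
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(Q, K, V)
print(output.shape, attn.shape)   # (4, 8) (4, 4); each row of attn sums to 1
```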
Transformer has no recurrence
• The Transformer is an attention-based model that learns contextual
relations between words. It includes two components:
1. An Encoder that reads the text input
2. A Decoder that produces a prediction for the task
• The model has no recurrence – self-attention is used to represent the
input/output without an RNN
• This allows more parallelism and gives high translation quality: state-of-
the-art results
• Training took 12 hours on eight P100 GPUs
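As a rough illustration of the encoder side, here is a sketch using PyTorch's built-in Transformer layers (hyperparameters are illustrative; this is not the exact architecture from the original paper):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model, nhead, num_layers, seq_len, batch = 1000, 64, 4, 2, 10, 2

embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=128, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

tokens = torch.randint(0, vocab_size, (batch, seq_len))   # toy token ids
x = embedding(tokens)                                     # (batch, seq_len, d_model)

# Self-attention lets every position attend to every other position,
# and all positions are processed in parallel (no recurrence).
contextual = encoder(x)
print(contextual.shape)   # torch.Size([2, 10, 64])
```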