06-DL-Deep Learning For Text Data (LSTM Seq2Seq Models)
• Observe that in the RNN case we are now more interested in the next state, h_t, not exactly in the output, y_t (a minimal step is sketched below).
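As a minimal sketch in numpy (the weight names W_xh, W_hh, W_hy are illustrative, not from any specific library), one vanilla RNN step looks like this; note how h_t is what gets carried to the next time step, while y_t is just a readout:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla RNN step: the new state h_t is carried forward in time,
    while the output y_t is only a readout of that state."""
    h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)   # next hidden state (what gets reused)
    y_t = h_t @ W_hy + b_y                            # per-step output (optional)
    return h_t, y_t
```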
The Problem of Long-Term Dependencies
• Sometimes, we only need to look at recent information to perform the present task. For example,
consider a language model trying to predict the next word based on the previous ones. If we are
trying to predict the last word in “the clouds are in the sky,” we don’t need any further context –
it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the
relevant information and the place that it’s needed is small, RNNs can learn to use the past
information.
The Problem of Long-Term Dependencies
• But there are also cases where we need more context. Consider trying to predict the last word in
the text “I grew up in France… I speak fluent French.” Recent information suggests that the next
word is probably the name of a language, but if we want to narrow down which language, we
need the context of France, from further back. It’s entirely possible for the gap between the
relevant information and the point where it is needed to become very large.
• Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
Long Short Term Memory networks (LSTM)
• LSTM provides a different recurrent formula f_W that is more powerful than the vanilla RNN's, because its more complex f_W adds "residual information" to the next state instead of just transforming each state. You can think of LSTMs as the "residual" version of RNNs (a minimal step is sketched below).
• In other words, LSTMs suffer much less from vanishing gradients than normal RNNs. Remember that the plus gates distribute the gradients.
• By suffering less from vanishing gradients, LSTMs can remember much further into the past. So from now on, just use an LSTM whenever you think about an RNN.
• Put differently, LSTMs are better at remembering long-term dependencies.
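To make the "residual information" idea concrete, here is a minimal numpy sketch of one LSTM step (the gate layout and weight names are illustrative, not a specific library's API). The key line is the additive cell-state update, the "plus gate" that lets gradients flow back through many time steps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x_t] to the 4 gate pre-activations.
    The cell-state update is an elementwise *addition*, which is what
    lets gradients flow further back in time."""
    z = np.concatenate([h_prev, x_t]) @ W + b
    H = h_prev.shape[0]
    f = sigmoid(z[0*H:1*H])        # forget gate
    i = sigmoid(z[1*H:2*H])        # input gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate ("new information")
    c_t = f * c_prev + i * g       # additive "residual" cell-state update
    h_t = o * np.tanh(c_t)         # hidden state passed to the next step
    return h_t, c_t
```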
Long Short Term Memory networks (LSTM)
• The vanishing gradient problem can be mitigated with LSTMs, but another problem that can happen with every recurrent neural network is the exploding gradient problem.
• To fix the exploding gradient problem, people normally apply gradient clipping, which caps the gradients at a maximum value or norm (see the sketch after this list).
• This highway for the gradients is called the cell state. So, one difference compared to the RNN, which has only the hidden state flowing through time, is that in the LSTM we have both the hidden states and the cell state.
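A minimal sketch of gradient clipping by global norm, in plain numpy (the threshold of 5.0 is just an illustrative choice; most frameworks offer an equivalent built-in option):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so that their combined L2 norm
    does not exceed max_norm (5.0 here is purely illustrative)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)
        grads = [g * scale for g in grads]
    return grads
```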
Long Short Term Memory networks (LSTM)
• In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.
Long Short Term Memory networks (LSTM)
• The key to LSTMs is the cell state, the horizontal line running through the top of
the diagram.
• The cell state is kind of like a conveyor belt. It runs straight down the entire chain,
with only some minor linear interactions. It’s very easy for information to just
flow along it unchanged.
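One way to see why this "conveyor belt" helps the gradients: along the direct cell-state path, the derivative of the new cell state with respect to the old one is just an elementwise product with the forget gate, with no repeated matrix multiplication or tanh squashing. A sketch of that step, using the standard cell-state update:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\qquad\Longrightarrow\qquad
\left.\frac{\partial c_t}{\partial c_{t-1}}\right|_{\text{direct path}} = \operatorname{diag}(f_t)
```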
Long Short Term Memory networks (LSTM)
• LSTM Gate
• Zooming in on an LSTM gate. This view also makes it clearer how backpropagation flows through it.
Variants on Long Short Term Memory
• In fact, it seems like almost every paper involving LSTMs uses a slightly different version. The
differences are minor, but it’s worth mentioning some of them.
• One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding “peephole
connections.” This means that we let the gate layers look at the cell state.
• The above diagram adds peepholes to all the gates, but many papers will give some peepholes and not others.
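As a sketch, one common form of the peephole gates (following the write-up this section is based on) lets the forget and input gates see the previous cell state and the output gate see the new one:

```latex
f_t = \sigma\big(W_f\,[c_{t-1},\,h_{t-1},\,x_t] + b_f\big),\quad
i_t = \sigma\big(W_i\,[c_{t-1},\,h_{t-1},\,x_t] + b_i\big),\quad
o_t = \sigma\big(W_o\,[c_{t},\,h_{t-1},\,x_t] + b_o\big)
```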
Variants on Long Short Term Memory
• Another variation is to use coupled forget and input gates. Instead of separately
deciding what to forget and what we should add new information to, we make
those decisions together. We only forget when we’re going to input something in
its place. We only input new values to the state when we forget something older.
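Written as an equation, the coupled variant described above ties the two decisions through a single forget gate:

```latex
c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{c}_t
```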
GRU (Gated Recurrent Unit)
• The GRU cell can be considered a variant of the LSTM cell (it also wants to fight vanishing gradients), but it is more computationally efficient. In this cell the forget and input gates are merged into a single update gate (a minimal step is sketched below).
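A minimal numpy sketch of one GRU step under one common convention (the weight names are illustrative); note that there is no separate cell state, and a single update gate plays the role of the coupled forget/input gates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step: the update gate z blends the old state with the
    candidate state, so forgetting and inputting are a single decision."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(hx @ W_z + b_z)                                       # update gate
    r = sigmoid(hx @ W_r + b_r)                                       # reset gate
    h_tilde = np.tanh(np.concatenate([r * h_prev, x_t]) @ W_h + b_h)  # candidate state
    h_t = (1 - z) * h_prev + z * h_tilde                              # blend old and new
    return h_t
```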
Bidirectional RNN
• Bidirectional recurrent neural networks (RNN) are really just putting two independent RNNs together. The input sequence is fed in normal time order for one network, and in reverse time order for the other. The outputs of the two networks are usually concatenated at each time step, though there are other options, e.g. summation (see the sketch after this list).
• This structure allows the networks to have both backward and forward information about the sequence at every time step. The concept seems easy enough, but when it comes to actually implementing a neural network which utilizes the bidirectional structure, confusion arises…
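As a sketch, assuming TensorFlow/Keras as the framework (the course may use a different one), the Bidirectional wrapper implements exactly this: one copy of the layer reads the sequence forward, the other backward, and merge_mode selects concatenation, summation, etc.:

```python
import tensorflow as tf

# Wrap an LSTM so one copy reads the sequence forward and another reads it
# backward; merge_mode='concat' joins the two outputs at each time step
# (merge_mode='sum' is the summation option mentioned above).
bi_lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True),
    merge_mode='concat',
)

x = tf.random.normal([8, 20, 32])   # (batch, time steps, features) -- illustrative shapes
y = bi_lstm(x)
print(y.shape)                      # (8, 20, 128): forward 64 + backward 64 concatenated
```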
RNN and LSTM application
Processing text data