M5 Topic 1 - Encoder-Decoder Model
Sequence-to-Sequence (Seq2Seq) Tasks
• A Seq2Seq task involves taking an input sequence (of any length) and mapping it to an output sequence (which can be of a different length).
• Challenge: the model must understand the meaning of the entire input sequence before generating the correct
output.
• Examples:
a. Machine Translation: “I love cats” → “J’aime les chats”
b. Speech Recognition: Audio waveform → “Hello, how are you?”
c. Text Summarization: Long document → Short summary
d. Question Answering: “Who discovered gravity?” → “Isaac Newton”
e. Chatbots: “How are you?” → “I’m doing well, thanks!”
Early Approach: Rule-Based Machine Translation
• Early systems relied on manually written grammatical rules and dictionaries for translating text.
• Example: When translating English to French, “I eat an apple” → “Je mange une pomme” required predefined grammar rules.
• Disadvantage: Needed separate rules for different languages and for every sentence structure (see the toy sketch below).
• Problem: Could not handle new or complex sentences if their grammatical rules were not predefined.
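To make the limitation concrete, here is a toy, hypothetical word-for-word translator built from a hand-written dictionary and a single hard-coded rule (the dictionary, the rule, and the function names are invented for this sketch; they are not taken from any real rule-based system):

```python
# Toy rule-based "translator": a hand-written English-to-French dictionary
# plus one hard-coded grammar rule. Purely illustrative.
en_fr = {"i": "je", "eat": "mange", "an": "une", "apple": "pomme",
         "love": "aime", "cats": "chats"}

def translate(sentence):
    words = sentence.lower().rstrip(".").split()
    out = [en_fr.get(w, f"<unknown:{w}>") for w in words]
    # Hand-written rule: "je" before a vowel-initial verb contracts to "j'<verb>".
    if len(out) >= 2 and out[0] == "je" and out[1][0] in "aeiou":
        out = ["j'" + out[1]] + out[2:]
    return " ".join(out)

print(translate("I eat an apple"))  # je mange une pomme
print(translate("I love cats"))     # j'aime chats  <- drops the article "les"
```

Even this tiny example shows the problem: each new sentence structure (here, the missing article) would need yet another hand-written rule.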
Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)
• Proposed an end-to-end neural network approach for Seq2Seq tasks with minimal assumptions about sequence structure.
• Used multilayered LSTMs: one to encode the input into a fixed-dimensional vector and another to decode the output sequence from it.
• Achieved a BLEU score of 34.8 on the WMT’14 English-to-French dataset, outperforming a Statistical Machine Translation baseline (33.3).
BLEU (Bilingual Evaluation Understudy) score: a metric for evaluating machine-generated translations by comparing them with human reference translations.
A higher BLEU score means better translation quality (a small example follows).
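As a small illustration of the metric (not part of the slides), BLEU can be computed with NLTK's sentence_bleu; the tokenized sentences below reuse the cat example, and the 1-/2-gram weights are chosen only because the sentences are short:

```python
# Comparing candidate translations against a human reference with BLEU.
from nltk.translate.bleu_score import sentence_bleu

reference = [["j'aime", "les", "chats"]]   # human reference translation(s)
good = ["j'aime", "les", "chats"]          # machine output: exact match
bad = ["j'aime", "les", "chiens"]          # machine output: wrong last word

weights = (0.5, 0.5)                       # use only 1-gram and 2-gram precision
print(sentence_bleu(reference, good, weights=weights))  # 1.0
print(sentence_bleu(reference, bad, weights=weights))   # ~0.58 (lower = worse)
```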
What is the Encoder-Decoder Model?
• In the seq2seq model, an encoder and a decoder work together to convert input sequences into output sequences.
• According to Daniel Jurafsky & James H. Martin in their book “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”:
The key idea underlying these networks is the use of an encoder network that takes an input sequence and creates a contextualized representation of it, often called the context. This representation is then passed to a decoder which generates a task-specific output sequence.
Understanding LSTM: A Step-by-Step Example
Example: our goal is to remember the stock value from Day 1 in order to accurately predict the value on Day 5.
At each step, the LSTM model goes through three key stages (see the sketch after this list):
Stage 1: Forget Gate – Determines what percentage of the long-term memory should be retained.
Stage 2: Input Gate - Creates a potential long-term memory and decides how much of it should be added to
the existing long-term memory.
Stage 3: Output Gate – Updates the short-term memory by starting with the new long-term memory and
determining how much of it should be passed on to the next step.
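A rough NumPy sketch of one LSTM step, organized around these three stages (the weight matrices W, U and biases b are illustrative placeholders, not values from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b map gate names ('f', 'i', 'c', 'o') to weights."""
    # Stage 1: forget gate - what fraction of the long-term memory (cell state) to keep.
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Stage 2: input gate - build a candidate memory and decide how much of it to add.
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f * c_prev + i * c_tilde          # updated long-term memory
    # Stage 3: output gate - how much of the new memory becomes the short-term memory.
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    h_t = o * np.tanh(c_t)                  # new short-term memory (hidden state)
    return h_t, c_t

# Tiny usage example with random weights (input size 3, hidden size 4).
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 3)) for k in "fico"}
U = {k: rng.normal(size=(4, 4)) for k in "fico"}
b = {k: np.zeros(4) for k in "fico"}
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
```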
Encoder-Decoder Architecture
1. Encoder:
• Takes an input sequence (e.g. a sentence in English) and processes it using layers like RNNs, LSTMs, etc.
• Converts the input into a fixed-length representation called a context vector. This vector captures the meaning
of the entire input sequence.
2. Context Vector:
• This is the compressed form of the input sequence.
• Contains the ‘context’ or meaning of the input sequence.
3. Decoder:
• Takes the context vector and generates the output sequence one step at a time.
• At each step, it predicts the next token (word/character) using previous outputs and the context vector.
• Continues until it generates the full output sequence (a minimal code sketch of the encoder and decoder follows).
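A minimal PyTorch sketch of these three pieces (class names, sizes, and the choice of the encoder's final hidden/cell states as the context vector are illustrative assumptions, not the slides' exact setup):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len) of token ids
        _, (h, c) = self.lstm(self.embed(src))   # final states act as the context vector
        return h, c

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, h, c):         # one step: prev_token is (batch, 1)
        out, (h, c) = self.lstm(self.embed(prev_token), (h, c))
        return self.out(out), h, c               # raw scores over the vocabulary
```

Generation then starts the decoder from the encoder's (h, c) and a start token, feeding each predicted word back in, as the next steps illustrate.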
5. Convert Scores to Probabilities (Softmax Layer). Example scores and probabilities for a 5-word vocabulary at one time step:

Word     Score s_t    Probability P(y_t)
Apple    2.1          0.32
Cat      1.5          0.20
Dog      1.8          0.24
Pizza    0.5          0.10
Run      0.3          0.08

6. The Predicted Word is used as Input for the Next Time Step.
7. Repeat Until an End-of-Sentence (EOS) Token is Generated.
(Steps 5–7 are sketched in code below.)
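Steps 5–7 as a greedy decoding loop (a hedged sketch: the decoder_step callable, the <SOS>/<EOS> tokens, and the tiny vocabulary are assumptions for illustration; the slide's probabilities are rounded example values):

```python
import numpy as np

vocab = ["Apple", "Cat", "Dog", "Pizza", "Run", "<EOS>"]   # example vocabulary + EOS

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return e / e.sum()

def greedy_decode(decoder_step, state, max_len=20):
    """decoder_step(prev_word, state) -> (scores over vocab, new state)."""
    word, output = "<SOS>", []
    for _ in range(max_len):
        scores, state = decoder_step(word, state)
        probs = softmax(scores)                 # step 5: scores -> probabilities
        word = vocab[int(np.argmax(probs))]     # step 6: predicted word becomes next input
        if word == "<EOS>":                     # step 7: stop at the end-of-sentence token
            break
        output.append(word)
    return output
```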
Training vs Testing - Teacher Forcing Rule
During Training:
• Instead of using the predicted word, we force the correct word
from the training data as input.
• This helps the model learn faster and prevents it from getting stuck
in errors.
Mathematically, in training the decoder predicts P(y_t | y*_1, …, y*_{t-1}, c), where y*_1, …, y*_{t-1} are the ground-truth previous tokens and c is the context vector (both the training and testing cases are sketched in code below).
During Testing:
• The decoder uses its own predicted words as inputs.
• Errors accumulate if a wrong word is predicted, leading to
poor output quality.
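A rough sketch contrasting the two regimes, assuming a decoder with the one-step interface from the earlier architecture sketch (shapes and names are illustrative):

```python
import torch.nn.functional as F

def run_decoder(decoder, h, c, target, teacher_forcing=True):
    """target: (batch, tgt_len) ground-truth tokens, with target[:, 0] the start token.
    Returns the summed cross-entropy loss over the output sequence."""
    loss, prev = 0.0, target[:, :1]
    for t in range(1, target.size(1)):
        scores, h, c = decoder(prev, h, c)               # scores: (batch, 1, vocab)
        loss = loss + F.cross_entropy(scores.squeeze(1), target[:, t])
        if teacher_forcing:                              # training: feed the correct word
            prev = target[:, t:t + 1]
        else:                                            # testing: feed the model's own prediction
            prev = scores.argmax(dim=-1)
    return loss
```

With teacher_forcing=True this matches the training setup above; setting it to False reproduces the test-time behaviour where prediction errors can accumulate.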
Advantages and Disadvantages of LSTM-Based Encoder-Decoder

ADVANTAGES
• Handles long-range dependencies better than regular RNNs.
• Can work with different input and output lengths.
• The encoder compresses the input into a fixed-size context vector, which acts as a summary of the sentence.

DISADVANTAGES
• Fixed-size context vector – loss of information.
• Slow for long sequences.
• High memory usage.
• Modern NLP models, including GPT and BERT, are built on these advancements, making deep learning-based language understanding more powerful than ever.
THANK YOU