Encoder-Decoder Differences

The paper discusses the advantages of using an encoder-decoder model over a
decoder-only model in the context of transfer learning for NLP tasks. Here are
the key points:
● Generative Tasks: Encoder-decoder models are more effective for
generative tasks such as translation or summarization, since they can
process the input with an encoder before generating the output with a
decoder [1].
● Classification Tasks: Encoder-only models like BERT are designed for
classification or span prediction tasks and are not suitable for
generative tasks [2].
● Flexibility: The encoder-decoder structure delivers strong results on
both generative and classification tasks, making it a more versatile
choice across NLP applications.
● Unified Approach: The paper leverages a unified text-to-text framework,
allowing the same model, objective, training procedure, and decoding
process to be used across diverse tasks, which the encoder-decoder
architecture facilitates (see the sketch after this list).
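To make the unified text-to-text point concrete, here is a minimal sketch, not taken from the paper, that sends a generative task and a classification-style task through the same encoder-decoder model as plain text. It assumes the Hugging Face transformers library and the public t5-small checkpoint; the task prefixes ("translate English to German:", "sst2 sentence:") follow T5's prompt convention.

# Minimal sketch: one encoder-decoder model, two very different tasks,
# both framed as text in / text out. Assumes Hugging Face `transformers`
# and the public `t5-small` checkpoint.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",   # generative task
    "sst2 sentence: this movie was a complete waste of time",  # classification task as text
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Both answers come from the same generate() call; the classification label is simply produced as text, which is what lets one objective, training procedure, and decoding process cover both kinds of task.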
Main Reason: The specific drawback of decoder-only models compared to
encoder-decoder models, as described in the paper, comes from the limits of
uni-directional self-attention. In a decoder-only model such as GPT-2, each
token can attend only to previous tokens, never to future ones, so the
model's representation of an input sequence is constrained: no token's
representation can depend on the tokens that follow it.
In contrast, encoder-decoder models use bi-directional self-attention in
the encoder, allowing each token to attend to all other tokens in the
sequence and thereby giving a more complete view of the context. This
bi-directional context is crucial for tasks that require understanding the
entire input, such as translation or summarization, which is why the
uni-directional nature of decoder-only models tends to yield less effective
representations for these sequence-to-sequence tasks.
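The masking difference described above can be shown in a few lines of plain NumPy (no transformer library assumed); the sequence length and random scores below are arbitrary illustrations.

# Sketch of the structural difference: a decoder-only model applies a
# causal mask so token i attends only to positions <= i, while an encoder
# applies no such mask and every token attends to every other token.
import numpy as np

def attention_weights(scores, causal):
    length = scores.shape[0]
    if causal:
        # Block attention to future positions (the upper triangle).
        allowed = np.tril(np.ones((length, length), dtype=bool))
        scores = np.where(allowed, scores, -np.inf)
    # Softmax over the attended-to positions.
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(0).normal(size=(4, 4))  # raw scores for 4 tokens
print(attention_weights(scores, causal=True))   # lower-triangular: uni-directional
print(attention_weights(scores, causal=False))  # dense: bi-directional

With causal=True each row of the weight matrix stops at the diagonal, so a token's representation cannot draw on later tokens; with causal=False every row spans the whole sequence, which is the bi-directional view the encoder provides.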
Why are encoder-only models not suited to Seq2Seq problems?
Because encoder-only models require the output length to be known a priori,
they are a poor fit for sequence-to-sequence tasks.
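A quick way to see that constraint, sketched here under the assumption of the Hugging Face transformers library and the bert-base-uncased checkpoint: an encoder-only model returns exactly one hidden state per input token, so the number of output positions is dictated by the input rather than chosen by the model.

# Sketch: the encoder's output length equals its input length, so there is
# no natural way to emit a target sequence of a different, unknown length.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A short input sentence.", return_tensors="pt")
hidden = encoder(**inputs).last_hidden_state
print(hidden.shape)  # (1, input_length, 768): one state per input token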

Source(s)
1. We mainly ...
2. This makes...
