Encoder-Decoder Differences

The paper discusses the advantages of using an encoder-decoder model over a
decoder-only model in the context of transfer learning for NLP tasks. Here are
the key points:
● Generative Tasks: Encoder-decoder models are more effective for
generative tasks such as translation or summarization, since they can
process the input with an encoder before generating the output with a
decoder [1].
● Classification Tasks: Encoder-only models like BERT are designed for
classification or span prediction tasks and are not suitable for
generative tasks [2].
● Flexibility: The encoder-decoder structure delivers strong results on
both generative and classification tasks, making it a more versatile
choice across NLP applications.
● Unified Approach: The paper leverages a unified text-to-text framework,
allowing the same model, objective, training procedure, and decoding
process to be used across diverse tasks, which the encoder-decoder
architecture facilitates (see the sketch after this list).
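To make the unified text-to-text point concrete, here is a minimal sketch, not taken from the paper, that sends a generative task and a classification-style task through the same encoder-decoder model as plain text. It assumes the Hugging Face transformers library and the public t5-small checkpoint; the task prefixes ("translate English to German:", "sst2 sentence:") follow T5's prompt convention.

# Minimal sketch: one encoder-decoder model, two very different tasks,
# both framed as text in / text out. Assumes Hugging Face `transformers`
# and the public `t5-small` checkpoint.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",   # generative task
    "sst2 sentence: this movie was a complete waste of time",  # classification task as text
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Both answers come from the same generate() call; the classification label is simply produced as text, which is what lets one objective, training procedure, and decoding process cover both kinds of task.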
Main Reason: The specific drawback of decoder-only models compared to
encoder-decoder models, as described in the paper, comes from the limits of
uni-directional self-attention. In a decoder-only model such as GPT-2, each
token can attend only to previous tokens, never to future ones, so the
model's representation of an input sequence is constrained: no token's
representation can depend on the tokens that follow it.
In contrast, encoder-decoder models use bi-directional self-attention in
the encoder, allowing each token to attend to all other tokens in the
sequence and thereby giving a more complete view of the context. This
bi-directional context is crucial for tasks that require understanding the
entire input, such as translation or summarization, which is why the
uni-directional nature of decoder-only models tends to yield less effective
representations for these sequence-to-sequence tasks.
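The masking difference described above can be shown in a few lines of plain NumPy (no transformer library assumed); the sequence length and random scores below are arbitrary illustrations.

# Sketch of the structural difference: a decoder-only model applies a
# causal mask so token i attends only to positions <= i, while an encoder
# applies no such mask and every token attends to every other token.
import numpy as np

def attention_weights(scores, causal):
    length = scores.shape[0]
    if causal:
        # Block attention to future positions (the upper triangle).
        allowed = np.tril(np.ones((length, length), dtype=bool))
        scores = np.where(allowed, scores, -np.inf)
    # Softmax over the attended-to positions.
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(0).normal(size=(4, 4))  # raw scores for 4 tokens
print(attention_weights(scores, causal=True))   # lower-triangular: uni-directional
print(attention_weights(scores, causal=False))  # dense: bi-directional

With causal=True each row of the weight matrix stops at the diagonal, so a token's representation cannot draw on later tokens; with causal=False every row spans the whole sequence, which is the bi-directional view the encoder provides.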
Why are encoder-only models not suited to Seq2Seq problems?
Because encoder-only models require the output length to be known a priori,
they are a poor fit for sequence-to-sequence tasks.
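A quick way to see that constraint, sketched here under the assumption of the Hugging Face transformers library and the bert-base-uncased checkpoint: an encoder-only model returns exactly one hidden state per input token, so the number of output positions is dictated by the input rather than chosen by the model.

# Sketch: the encoder's output length equals its input length, so there is
# no natural way to emit a target sequence of a different, unknown length.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A short input sentence.", return_tensors="pt")
hidden = encoder(**inputs).last_hidden_state
print(hidden.shape)  # (1, input_length, 768): one state per input token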

Source(s)
1. We mainly ...
2. This makes...
