Transformers

Transformers Origin
In 2017, researchers at Google published a paper that proposed a novel neural
network architecture for sequence modeling.
Called the Transformer, this architecture outperformed recurrent neural networks (RNNs) on
machine translation tasks, both in terms of translation quality and training cost.
In parallel, an effective transfer learning method called ULMFiT showed that
pretraining long short-term memory (LSTM) networks with a language
modeling objective on a very large and diverse corpus, and then fine-tuning them on a
target task, could produce robust text classifiers with little labeled data.
These advances were the catalysts for two of the most well-known transformers
today: GPT and BERT.
Transfer Learning
• Transfer learning (TL) is a machine learning (ML) technique where a model pre-
trained on one task is fine-tuned for a new, related task. Training a new ML model
is a time-consuming and intensive process that requires a large amount of data,
computing power, and several iterations before it is ready for production.
But we are getting ahead of ourselves. To understand what is novel
about transformers, we first need to explain the three ideas behind combining very
large datasets with a novel architecture:
1. The encoder-decoder framework
2. Attention mechanisms
3. Transfer learning
The encoder-decoder architecture
• Prior to transformers, recurrent architectures such as LSTMs were the state of the art in NLP.
These architectures contain a cycle, or feedback loop, in the network connections that allows
information to propagate from one step to the next, making them ideal for
modeling sequential data like language.
As the name suggests, the job of the encoder is to encode the information from the
input sequence into a numerical representation that is often called the last hidden
state. This state is then passed to the decoder, which generates the output sequence.
In general, the encoder and decoder components can be any kind of neural network
architecture that is suited for modeling sequences.
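To make that handoff concrete, here is a minimal sketch of a recurrent encoder-decoder in PyTorch. The use of GRUs, the tensor sizes, and the teacher-forced decoder input are all illustrative assumptions for this sketch, not the setup of any particular published model:

import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 100, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)

src = torch.randint(0, vocab_size, (1, 7))   # input sequence of 7 token IDs
_, last_hidden = encoder(embed(src))         # last hidden state: shape [1, 1, 64]

# The decoder is initialized with that single fixed-size state and generates
# the output sequence from it (teacher forcing shown here); this single vector
# is the bottleneck discussed next.
tgt = torch.randint(0, vocab_size, (1, 5))
dec_out, _ = decoder(embed(tgt), last_hidden)
print(dec_out.shape)                         # torch.Size([1, 5, 64])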
One weakness of this architecture is that the final hidden state of the encoder
creates an information bottleneck: it has to capture the meaning of the whole input
sequence, because this is all the decoder has access to when generating the output.
This is especially challenging for long sequences, where information at the start of
the sequence might be lost in the process of creating a single, fixed representation.
Fortunately, there is a way out of this bottleneck: allowing the decoder to have
access to all of the encoder’s hidden states. The general mechanism for this is
called attention, and it is a key component in many modern neural network
architectures.
Attention Mechanism
The main idea behind attention is that instead of producing a single hidden state
for the input sequence, the encoder outputs a hidden state at each step that the
decoder can access.
However, using all the states at the same time creates a huge input for the decoder, so
some mechanism is needed to prioritize which states to use.
This is where attention comes in: it lets the decoder assign a weight, or “pay
attention”, to the specific states in the past (and the context length can be very long,
several thousand words for recent models like GPT or Reformer)
that are most relevant for producing the next element in the output sequence.
• The Transformer architecture took this idea several steps further and replaced
the recurrent units inside the encoder and decoder entirely with self-attention
layers and simple feed-forward networks.
• Moving from sequential to fully parallel processing unlocked large
computational efficiency gains, making it possible to train on corpora that are orders of
magnitude larger for the same computational cost.
• At the same time, removing the sequential processing bottleneck makes the
Transformer architecture more effective on tasks that require aggregating
information over long time spans.
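As a rough sketch of the core computation, here is a single-head scaled dot-product self-attention function in PyTorch; the tensor sizes are made up for illustration, and this simplified version omits the learned projections and multi-head structure of the full architecture:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    # query, key, value: [batch, seq_len, dim]
    dim_k = query.size(-1)
    # similarity of every position with every other position
    scores = torch.bmm(query, key.transpose(1, 2)) / dim_k**0.5
    # attention weights: how much each position "pays attention" to the others
    weights = F.softmax(scores, dim=-1)
    # weighted sum of the value vectors
    return torch.bmm(weights, value)

x = torch.randn(1, 5, 64)                     # a sequence of 5 token embeddings
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
print(out.shape)                              # torch.Size([1, 5, 64])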
Encoder-decoder architecture of the
original Transformer
Transfer Learning
• Transfer Learning is crucial in Natural Language Processing (NLP) due to its
ability to leverage knowledge learned from one task or domain and apply it to
another, typically related, task or domain.
• Data efficiency
• Resource savings
• Performance improvement
• Domain adaptation
• Continual learning
Transfer learning
ULMFiT
Pretraining
The initial training objective is quite simple: predict the next word based on the previous
words. This task is referred to as language modeling. The elegance of this approach lies in
the fact that no labeled data is required, and one can make use of abundantly available
text from sources such as Wikipedia.
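As a quick illustration of the language-modeling objective, a pretrained causal language model can be asked to continue a prefix with likely next words; the GPT-2 checkpoint and the prompt below are illustrative choices, not part of the ULMFiT work itself:

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# the model predicts the next tokens given the previous ones
print(generator("Transfer learning in NLP is", max_new_tokens=10)[0]["generated_text"])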
Domain adaptation
Once the language model is pretrained on a large-scale corpus, the next step is to adapt it
to the in-domain corpus. This stage still uses language modeling, but now the model has
to predict the next word in the target corpus.
Fine-tuning
In this step, the language model is fine-tuned with a classification layer for the target task.
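A minimal sketch of such a fine-tuning step using the Transformers Trainer; note that ULMFiT itself used LSTMs, so the DistilBERT checkpoint, the IMDb dataset, and the training settings here are illustrative assumptions only:

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")   # example target task: sentiment classification
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

# reuse the pretrained body; a fresh classification layer is added on top
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-review-classifier",
                           num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()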
Machine Learning Architecture
• Implement the model architecture in code, typically based on PyTorch or
TensorFlow.
• Load the pretrained weights (if available) from a server.
• Preprocess the inputs, pass them through the model, and apply some task-specific
postprocessing.
• Implement data loaders and define loss functions and optimizers to train the
model.
Each of these steps requires custom logic for each model and task.
Traditionally (but not always!), when research groups publish a new article, they
will also release the code along with the model weights. However, this code is
rarely standardized and often requires days of engineering to adapt to new use cases.
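For illustration, here is roughly what those steps look like when done by hand with the Transformers library; the sentiment-analysis checkpoint and the input sentence are assumptions made for this sketch:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)                  # load pretrained weights
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer("I love my new headphones!", return_tensors="pt")  # preprocessing
with torch.no_grad():
    logits = model(**inputs).logits                                   # forward pass
probs = torch.softmax(logits, dim=-1)                                 # postprocessing
print(model.config.id2label[int(probs.argmax())], float(probs.max()))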
Transformer Applications
• Text Classification
• Named Entity Recognition
• Question Answering
• Summarization
• Translation
• Text Generation
Text Classification
• Transformers has a layered API that allows you to interact with the library at
various levels of abstraction.
• In Transformers, we instantiate a pipeline by calling the pipeline() function and
providing the name of the task we are interested in:
from transformers import pipeline
classifier = pipeline("text-classification")
The first time you run this code you’ll see a few progress bars appear because the pipeline
automatically downloads the model weights from the Hugging Face Hub.
The second time you instantiate the pipeline, the library will notice that you’ve already downloaded
the weights and will use the cached version instead. By default, the text classification pipeline uses a
model that’s designed for sentiment analysis, but it also supports multiclass and multilabel
classification.
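Passing some text to the pipeline returns a list of prediction dictionaries, one per input; the customer review below is made up for illustration, and it (together with the pandas import) is reused in the examples that follow:

import pandas as pd

text = "I ordered a model train set but it arrived two weeks late and with a part missing."
outputs = classifier(text)
pd.DataFrame(outputs)   # e.g. one row with a NEGATIVE label and a confidence score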
Named Entity Recognition
• Predicting the sentiment of customer feedback is a good first step, but you often want to know if
the feedback was about a particular item or service.
• In NLP, real-world objects like products, places, and people are called named entities, and
extracting them from text is called named entity recognition (NER). We can apply NER by loading
the corresponding pipeline and feeding our customer review to it:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)
Question Answering
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])
Summarization
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45,
clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])
The Hugging Face Ecosystem
The Hugging Face ecosystem consists mainly of two parts: a family of
libraries and the Hub.
The Hugging Face Hub
• Transfer learning is one of the key factors driving the success of transformers because it makes it
possible to reuse pretrained models for new tasks. Consequently, it is crucial to be able to load
pretrained models quickly and run experiments with them.
• The Hugging Face Hub hosts over 20,000 freely available models.
• In addition to model weights, the Hub also hosts datasets and scripts for computing metrics, which
let you reproduce published results or leverage additional data for your application.
• The Hub also provides model and dataset cards to document the contents of models and datasets
and help you make an informed decision about whether they’re the right ones for you.
Hugging Face Tokenizers
• Tokenization splits raw text into smaller units called tokens. Transformer models are trained on
numerical representations of these tokens, so getting this step right is pretty important for the whole NLP project!
• Tokenizers provides many tokenization strategies and is extremely fast at tokenizing text thanks to
its Rust backend. It also takes care of all the pre- and postprocessing steps, such as normalizing
the inputs and transforming the model outputs to the required format.
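A small sketch of what this looks like in practice through the AutoTokenizer interface (which is backed by the Tokenizers library for fast tokenizer checkpoints); the checkpoint name is an illustrative choice:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoded = tokenizer("Tokenizing text is a core step in every NLP project!")
print(encoded.input_ids)                                    # numerical token IDs
print(tokenizer.convert_ids_to_tokens(encoded.input_ids))   # the underlying subword tokens
print(tokenizer.decode(encoded.input_ids))                  # back to (normalized) text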
Hugging Face Datasets
• Loading, processing, and storing datasets can be a cumbersome process, especially when the
datasets get too large to fit in your laptop’s RAM. In addition, you usually need to implement
various scripts to download the data and transform it into a standard format.
• It also provides smart caching (so you don’t have to redo your preprocessing each time you run
your code) and avoids RAM limitations by leveraging a special mechanism called memory
mapping that stores the contents of a file in virtual memory and enables multiple processes to
modify a file more efficiently.
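A brief sketch of the Datasets workflow; the "emotion" dataset and the added column are illustrative examples:

from datasets import load_dataset

emotions = load_dataset("emotion")   # downloaded once, then served from the local cache
print(emotions)                      # DatasetDict with train/validation/test splits
print(emotions["train"][0])          # a single example: a text field and a label

# map() applies a function across the dataset and caches the result,
# so the preprocessing is not redone on every run
emotions = emotions.map(lambda example: {"n_words": len(example["text"].split())})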
Hugging Face Accelerate
• If you’ve ever had to write your own training script in PyTorch, chances are that you’ve had some
headaches when trying to port the code that runs on your laptop to the code that runs on your
organization’s cluster.
• Accelerate adds a layer of abstraction to your normal training loops that takes care of all the
custom logic necessary for the training infrastructure. This literally accelerates your workflow by
simplifying the change of infrastructure when necessary.
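A bare-bones sketch of the Accelerate pattern: a normal PyTorch training loop in which device placement and any distributed setup are delegated to the Accelerator object. The toy model and random data exist only to keep the snippet self-contained:

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

accelerator = Accelerator()   # detects CPU / single GPU / multi-GPU setups automatically
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for features, labels in dataloader:
    loss = torch.nn.functional.cross_entropy(model(features), labels)
    accelerator.backward(loss)   # replaces the usual loss.backward()
    optimizer.step()
    optimizer.zero_grad()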
Main Challenges with Transformers
1. Language
NLP research is dominated by the English language. There are several models for other languages, but it is harder to find
pretrained models for rare or low-resource languages.
2. Data availability
Although we can use transfer learning to dramatically reduce the amount of labeled training data our models need, it is still a
lot compared to how much a human needs to perform the same task.
3. Working with long documents
Self-attention works extremely well on paragraph-long texts, but it becomes very expensive when we move to longer texts like
whole documents.
4. Opacity
As with other deep learning models, transformers are to a large extent opaque. It is hard or impossible to unravel “why” a
model made a certain prediction. This is an especially hard challenge when these models are deployed to make critical
decisions.
5. Bias
Transformer models are predominantly pretrained on text data from the internet. This imprints all the biases that are present in
the data into the models.
