
Large Language Models :: CHEAT SHEET

Large Language Models (LLMs)

LLMs are artificial intelligence models that can generate human-like text, based on patterns found in massive amounts of training data. They are used in applications such as language translation, chatbots, and content creation.

Some popular LLMs

Some popular LLMs include GPT-3 (Generative Pre-trained Transformer 3) by OpenAI, BERT (Bidirectional Encoder Representations from Transformers) by Google, and XLNet, a generalized autoregressive pretraining model by Carnegie Mellon University and Google.

Choose between LLMs

When comparing different models, it's important to consider their architecture, the size of the model, the amount of training data used, and their performance on specific NLP tasks.

Components of LLMs

LLMs typically consist of an encoder, a decoder, and attention mechanisms. The encoder takes in input text and converts it into a set of hidden representations, while the decoder generates the output text. The attention mechanisms help the model focus on the most relevant parts of the input text.

Applications of LLMs

• LLMs are used in a wide range of applications, including language translation, chatbots, content creation, and text summarization.
• They can also be used to improve search engines, voice assistants, and virtual assistants.

How are LLMs trained?

LLMs are trained using a process called unsupervised (self-supervised) learning. This involves feeding the model massive amounts of text data, such as books, articles, and websites, and having the model learn the patterns and relationships between words and phrases in the text. The model is then fine-tuned on a specific task, such as language translation or text summarization.

Preprocessing

Text normalization is the process of converting text to a standard format, such as lowercasing all text, removing special characters, and converting numbers to their written form.

Tokenization is the process of breaking down text into individual units, such as words or phrases. This is an important step in preparing text data for NLP tasks.

Stop words are common words that are usually removed during text processing, as they do not carry much meaning and can introduce noise or affect the results of NLP tasks. Examples of stop words include "the," "a," "an," "in," and "is."

Lemmatization is the process of reducing words to their base or dictionary form by taking into account their part of speech and context. It is a more sophisticated technique than stemming and produces more accurate results, but it is computationally more expensive.

Stemming and lemmatization are techniques used to reduce words to their base form. This helps to reduce the dimensionality of the data and improve the performance of models.
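As a hedged illustration of these preprocessing steps, the sketch below uses NLTK; the sample sentence is made up, and the resource names (punkt, stopwords, wordnet) are the usual NLTK downloads.

# A minimal preprocessing sketch with NLTK: normalization, tokenization,
# stop-word removal, stemming, and lemmatization. Sample text is illustrative.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The 3 cats were running quickly through the gardens!"

# Normalization: lowercase and strip special characters.
normalized = re.sub(r"[^a-z0-9\s]", "", text.lower())

# Tokenization: split the text into individual word tokens.
tokens = word_tokenize(normalized)

# Stop-word removal: drop very common, low-information words.
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t not in stop_words]

# Stemming vs. lemmatization: both reduce words to a base form;
# lemmatization also uses vocabulary and part-of-speech information.
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content_tokens])
print([lemmatizer.lemmatize(t, pos="v") for t in content_tokens])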
Fine-Tuning

Fine-tuning is the process of training a pre-trained large language model on a specific task using a smaller dataset. This allows the model to learn task-specific features and improve its performance. The fine-tuning process typically involves freezing the weights of the pre-trained model and only training the task-specific layers.

When fine-tuning a model, it's important to consider factors such as the size of the fine-tuning dataset, the choice of optimizer and learning rate, and the choice of evaluation metrics.
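The snippet below is a minimal sketch of that workflow using Hugging Face Transformers: it loads a small pre-trained model, freezes the pre-trained encoder weights, and trains only the task-specific classification head. The checkpoint name (distilbert-base-uncased) and the tiny in-memory dataset are illustrative choices, not recommendations from this cheat sheet.

# Minimal fine-tuning sketch with Hugging Face Transformers (PyTorch backend).
# The model name and the two-example dataset are illustrative only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze the pre-trained encoder; only the classification head stays trainable.
for param in model.distilbert.parameters():
    param.requires_grad = False

# Tiny illustrative dataset; a real task would use thousands of labelled examples.
raw = Dataset.from_dict({
    "text": ["great movie, loved it", "terrible plot and acting"],
    "label": [1, 0],
})
tokenized = raw.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                         padding="max_length", max_length=64))

args = TrainingArguments(
    output_dir="finetune-demo",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-5,   # learning rate is one of the key fine-tuning choices
    logging_steps=1,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized)
trainer.train()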
Example of fine-tuning LLMs

• Model cost: $500 - $5,000 per month, depending on the size and complexity of the language model.
• GPU size: NVIDIA GeForce RTX 3080 or higher.
• Number of GPUs: 1-4, depending on the size of the language model and the desired speed of fine-tuning. For example, fine-tuning the GPT-3 model, which is one of the largest language models available, would require a minimum of 4 GPUs.
• The size of the data that GPT-3 is fine-tuned on can vary greatly depending on the specific use case and the size of the model itself. GPT-3 is one of the largest language models available, with over 175 billion parameters, so it typically requires a large amount of data for fine-tuning to see a noticeable improvement in performance. The size of the fine-tuning data will also depend on the specific NLP task the model is being fine-tuned for and the desired level of accuracy.

Note: fine-tuning GPT-3 on a small dataset of only a few gigabytes may not result in a significant improvement in performance, while fine-tuning on a much larger dataset of several terabytes could result in a substantial improvement.

This is just one example, and actual costs and GPU specifications may vary depending on the language model, the data it is being fine-tuned on, and other factors. It's always best to check with the language model provider for the latest information and specific recommendations for fine-tuning.
Input Representations

• Word embeddings: Each token is replaced by a vector that represents its meaning in a continuous vector space. Common methods for word embeddings include Word2Vec, GloVe, and fastText.
• Subword embeddings: Each token is broken down into smaller subword units (e.g., characters or character n-grams), and each subword is replaced by a vector that represents its meaning. This approach can handle out-of-vocabulary (OOV) words and can improve the model's ability to capture morphological and semantic similarities. Common methods for subword embeddings include Byte Pair Encoding (BPE), Unigram Language Model (ULM), and SentencePiece.
• Positional encodings: Since LLMs operate on sequences of tokens, they need a way to encode the position of each token in the sequence. Positional encodings are vectors that are added to the word or subword embeddings to provide information about the position of each token (see the sketch after this list).
• Segment embeddings: In some LLMs, such as BERT, the input sequence can be divided into multiple segments (e.g., sentences or paragraphs). Segment embeddings are added to the word or subword embeddings to indicate which segment each token belongs to.
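As a concrete illustration of positional encodings, the sketch below implements the fixed sinusoidal scheme from the original Transformer paper; learned positional embeddings are an equally common alternative, and the dimensions used here are arbitrary.

# Sinusoidal positional encodings as in "Attention Is All You Need":
#   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                    # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                       # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dimensions use cosine
    return pe

# The encoding is simply added to the token (word/subword) embeddings.
token_embeddings = np.random.randn(10, 64)                 # e.g. 10 tokens, d_model = 64
inputs = token_embeddings + positional_encoding(10, 64)
print(inputs.shape)  # (10, 64)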

Attention Mechanisms

Self-Attention:
• A mechanism that allows a sequence to weigh the importance of all other elements in the sequence when computing a representation for each element.
• Can capture relationships between different elements in the sequence, making it well-suited for tasks that require modeling long-range dependencies.
• Popularized by the Transformer model.

Multi-Head Attention:
• A variation of self-attention where the attention mechanism is applied multiple times in parallel with different sets of weights.
• Allows the model to attend to different aspects of the input sequence, improving its ability to capture complex patterns and dependencies.
• Each "head" produces a separate output that is concatenated and linearly transformed to produce the final output.
• Also commonly used in the Transformer model.
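A minimal sketch of both ideas, assuming the standard scaled dot-product formulation softmax(QK^T / sqrt(d)) V; the dimensions and random weight matrices are illustrative only (in a real model they are learned parameters).

# Scaled dot-product self-attention and a simple multi-head wrapper (NumPy).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model). Each position attends to every other position."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v                         # weighted sum of value vectors

def multi_head_attention(x, heads):
    """heads: list of (w_q, w_k, w_v) projections; head outputs are concatenated."""
    outputs = [self_attention(x, w_q, w_k, w_v) for (w_q, w_k, w_v) in heads]
    concat = np.concatenate(outputs, axis=-1)
    w_o = np.random.randn(concat.shape[-1], x.shape[-1]) / np.sqrt(concat.shape[-1])
    return concat @ w_o                        # final linear transformation

seq_len, d_model, d_head, n_heads = 5, 16, 4, 4
x = np.random.randn(seq_len, d_model)
heads = [tuple(np.random.randn(d_model, d_head) for _ in range(3)) for _ in range(n_heads)]
print(multi_head_attention(x, heads).shape)    # (5, 16)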
Evaluating LLMs

• Accuracy measures the proportion of correctly classified instances out of all instances. This metric is commonly used for text classification tasks such as sentiment analysis, where the goal is to correctly classify a text as positive, negative, or neutral.
• F1-score is a metric that takes into account both precision and recall. Precision is the proportion of true positive results out of all predicted positive results, while recall is the proportion of true positive results out of all actual positive results. The F1-score is the harmonic mean of precision and recall, and it provides a balanced measure of model performance on text classification, question answering, and other tasks.
• Perplexity measures how well a language model predicts the next word in a sequence. The lower the perplexity, the better the model is at predicting the next word. Perplexity is calculated as 2 to the power of the cross-entropy (when cross-entropy is measured in bits), which is a measure of how well the model's predicted probabilities match the true probabilities of the next word in the sequence.

Generated text can also be evaluated using metrics such as BLEU or ROUGE scores.
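A short sketch of these metrics, using scikit-learn for accuracy, precision, recall, and F1, and computing perplexity directly from per-token probabilities; the labels and probabilities below are toy values invented for illustration.

# Classification metrics with scikit-learn, plus perplexity from cross-entropy.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]          # gold sentiment labels (toy data)
y_pred = [1, 0, 0, 1, 0, 1]          # model predictions (toy data)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))

# Perplexity = 2 ** cross-entropy (in bits). token_probs holds the probability
# the language model assigned to each actual next token in a held-out text.
token_probs = np.array([0.20, 0.05, 0.40, 0.10])
cross_entropy_bits = -np.mean(np.log2(token_probs))
perplexity = 2 ** cross_entropy_bits
print("perplexity:", perplexity)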
Example - ChatGPT

ChatGPT is a large language model developed by OpenAI that has been trained on a diverse range of internet text, including articles, forums, and social media. This training allows ChatGPT to generate human-like text and answer a wide variety of questions on many topics.

Use Cases of ChatGPT

Question Answering: ChatGPT can answer factual questions based on the information it has been trained on. Example:
Human: What is the capital of France?
ChatGPT: The capital of France is Paris.

Conversational: ChatGPT can engage in a conversation with a user. Example:
Human: Hi, how are you today?
ChatGPT: Hello! I'm just an AI, so I don't have emotions, but I'm functioning well today. How can I assist you?

Text Generation: ChatGPT can generate text based on a prompt or seed text. Example:
Human: Write a short story about a magical world.
ChatGPT: Once upon a time, in a land far away, there was a magical world filled with wonder and enchantment. The sky was always a brilliant shade of purple, and the stars twinkled like diamonds in the sky. The people of this world possessed magical powers that they used to create incredible wonders and to help those in need. One day, a young woman discovered that she too had magical powers, and set out on an incredible journey to explore the wonders of this magical world and to hone her skills.

Summarization: ChatGPT can summarize a long text or article. Example:
Human: Please summarize this article about the benefits of exercise.
ChatGPT: The article discusses the various physical and mental health benefits of exercise, including improved cardiovascular health, stronger muscles and bones, better weight management, and reduced stress and anxiety. It also highlights the importance of finding an exercise routine that is enjoyable and sustainable.

Translation: ChatGPT can translate text from one language to another. Example:
Human: Translate this sentence from English to Spanish: "The sky is blue."
ChatGPT: El cielo es azul.
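ChatGPT itself is accessed through OpenAI's hosted service, so as a rough, self-contained stand-in the sketch below uses a small open model through the Hugging Face pipeline API to show the same prompt-in, text-out pattern; the model name (gpt2) and generation settings are illustrative choices, not what powers ChatGPT.

# Prompt-in, text-out generation with a small open model as a local stand-in
# for a hosted assistant. Model name and sampling settings are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Write a short story about a magical world."
outputs = generator(prompt, max_new_tokens=60, num_return_sequences=1, do_sample=True)
print(outputs[0]["generated_text"])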

Challenges and limitations with LLMs

• One of the main challenges with LLMs is the potential for biased or offensive language, as the models learn from the patterns found in the training data.
• Ethical considerations, such as gender and racial biases.
• The amount of computational resources needed to train and run LLMs, which can be expensive and energy-intensive.
• Handling out-of-vocabulary words.
• Improving interpretability. While large language models have shown impressive performance on a variety of NLP tasks, they may not perform as well on specific tasks, such as those that require a deeper understanding of the underlying context.
Tools & Libraries supporting LLMs

a. Popular NLP libraries and frameworks, such as TensorFlow, PyTorch, spaCy, Hugging Face Transformers, AllenNLP, spaCy Transformers, and AllenAI's ELMo, provide tools for working with large language models. These libraries allow for easy fine-tuning and deployment of models.

b. Some large language models, such as GPT-3, provide APIs (e.g., the OpenAI GPT-3 API) for access to their models. This can simplify the process of integrating a large language model into a real-world application.
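As a hedged example of the library route, the snippet below loads a pre-trained model and tokenizer with Hugging Face Transformers and runs one sentence through it; the checkpoint name (bert-base-uncased) is an illustrative choice.

# Loading a pre-trained model and tokenizer with Hugging Face Transformers.
from transformers import AutoModel, AutoTokenizer

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("Large language models are trained on massive text corpora.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (batch, seq_len, hidden_size)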
Future of LLMs

The future of LLMs is promising, with ongoing research focused on improving their accuracy, reducing bias, and making them more accessible and energy-efficient. As the demand for AI-driven applications continues to grow, LLMs will play an increasingly important role in shaping the future of human-machine interaction.
Architecture Diagrams (figures): Transformer Architecture, GPT Architecture, BERT Architecture
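Since the diagrams themselves are not reproduced in this text version, the sketch below gives a compact stand-in for the repeated building block of all three architectures: a single Transformer encoder block with multi-head self-attention and a feed-forward network, each wrapped in a residual connection and layer normalization. Dimensions are illustrative only.

# A compact single Transformer encoder block (PyTorch). Real models stack many
# such blocks; GPT uses a decoder-style (causally masked) variant.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)     # self-attention: queries = keys = values = x
        x = self.norm1(x + attn_out)         # residual connection + layer norm
        x = self.norm2(x + self.ff(x))       # feed-forward sub-layer + residual + norm
        return x

block = EncoderBlock()
tokens = torch.randn(1, 10, 64)              # (batch, seq_len, d_model)
print(block(tokens).shape)                   # torch.Size([1, 10, 64])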

Ashish Patel • Principal Research Scientist • [email protected] Abonia Sojasingarayar • Machine Learning Scientist • [email protected] Updated: 2023-02
