
Large Language Models :: CHEAT SHEET

Large Language Models (LLMs)

LLMs are artificial intelligence models that can generate human-like text, based on patterns found in massive amounts of training data. They are used in applications such as language translation, chatbots, and content creation.

Some popular LLMs

Some popular LLMs include GPT-3 (Generative Pre-trained Transformer 3) by OpenAI, BERT (Bidirectional Encoder Representations from Transformers) by Google, and XLNet, a generalized autoregressive pretraining model by Carnegie Mellon University and Google.

Choose between LLMs

When comparing different models, it's important to consider their architecture, the size of the model, the amount of training data used, and their performance on specific NLP tasks.

Components of LLMs

LLMs typically consist of an encoder, a decoder, and attention mechanisms. The encoder takes in input text and converts it into a set of hidden representations, while the decoder generates the output text. The attention mechanisms help the model focus on the most relevant parts of the input text.

Applications of LLMs

• LLMs are used in a wide range of applications, including language translation, chatbots, content creation, and text summarization.
• They can also be used to improve search engines, voice assistants, and virtual assistants.

How are LLMs trained?

LLMs are trained using a process called unsupervised (self-supervised) learning. This involves feeding the model massive amounts of text data, such as books, articles, and websites, and having the model learn the patterns and relationships between words and phrases in the text. The model is then fine-tuned on a specific task, such as language translation or text summarization.

Preprocessing

Text normalization is the process of converting text to a standard format, such as lowercasing all text, removing special characters, and converting numbers to their written form.

Tokenization is the process of breaking down text into individual units, such as words or phrases. This is an important step in preparing text data for NLP tasks.

Stop words are common words that are usually removed during text processing, as they do not carry much meaning and can introduce noise or affect the results of NLP tasks. Examples of stop words include "the," "a," "an," "in," and "is."

Lemmatization is the process of reducing words to their base or dictionary form by taking into account their part of speech and context. It is a more sophisticated technique than stemming and produces more accurate results, but it is computationally more expensive.

Stemming and lemmatization are techniques used to reduce words to their base form. This helps to reduce the dimensionality of the data and improve the performance of models.
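As a hedged illustration of these preprocessing steps, the sketch below uses NLTK; the sample sentence is made up, and the resource names (punkt, stopwords, wordnet) are the usual NLTK downloads.

# A minimal preprocessing sketch with NLTK: normalization, tokenization,
# stop-word removal, stemming, and lemmatization. Sample text is illustrative.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The 3 cats were running quickly through the gardens!"

# Normalization: lowercase and strip special characters.
normalized = re.sub(r"[^a-z0-9\s]", "", text.lower())

# Tokenization: split the text into individual word tokens.
tokens = word_tokenize(normalized)

# Stop-word removal: drop very common, low-information words.
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t not in stop_words]

# Stemming vs. lemmatization: both reduce words to a base form;
# lemmatization also uses vocabulary and part-of-speech information.
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content_tokens])
print([lemmatizer.lemmatize(t, pos="v") for t in content_tokens])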
Fine-Tuning

Fine-tuning is the process of training a pre-trained large language model on a specific task using a smaller dataset. This allows the model to learn task-specific features and improve its performance. The fine-tuning process typically involves freezing the weights of the pre-trained model and only training the task-specific layers.

When fine-tuning a model, it's important to consider factors such as the size of the fine-tuning dataset, the choice of optimizer and learning rate, and the choice of evaluation metrics.
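The snippet below is a minimal sketch of that workflow using Hugging Face Transformers: it loads a small pre-trained model, freezes the pre-trained encoder weights, and trains only the task-specific classification head. The checkpoint name (distilbert-base-uncased) and the tiny in-memory dataset are illustrative choices, not recommendations from this cheat sheet.

# Minimal fine-tuning sketch with Hugging Face Transformers (PyTorch backend).
# The model name and the two-example dataset are illustrative only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze the pre-trained encoder; only the classification head stays trainable.
for param in model.distilbert.parameters():
    param.requires_grad = False

# Tiny illustrative dataset; a real task would use thousands of labelled examples.
raw = Dataset.from_dict({
    "text": ["great movie, loved it", "terrible plot and acting"],
    "label": [1, 0],
})
tokenized = raw.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                         padding="max_length", max_length=64))

args = TrainingArguments(
    output_dir="finetune-demo",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-5,   # learning rate is one of the key fine-tuning choices
    logging_steps=1,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized)
trainer.train()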
Example of fine-tuning LLMs

• Model cost: $500 - $5,000 per month, depending on the size and complexity of the language model.
• GPU size: NVIDIA GeForce RTX 3080 or higher.
• Number of GPUs: 1-4, depending on the size of the language model and the desired speed of fine-tuning. For example, fine-tuning the GPT-3 model, which is one of the largest language models available, would require a minimum of 4 GPUs.
• The size of the data that GPT-3 is fine-tuned on can vary greatly depending on the specific use case and the size of the model itself. GPT-3 is one of the largest language models available, with over 175 billion parameters, so it typically requires a large amount of data for fine-tuning to see a noticeable improvement in performance. The size of the fine-tuning data will also depend on the specific NLP task the model is being fine-tuned for and the desired level of accuracy.

Note: fine-tuning GPT-3 on a small dataset of only a few gigabytes may not result in a significant improvement in performance, while fine-tuning on a much larger dataset of several terabytes could result in a substantial improvement.

This is just one example, and actual costs and GPU specifications may vary depending on the language model, the data it is being fine-tuned on, and other factors. It's always best to check with the language model provider for the latest information and specific recommendations for fine-tuning.
Input Representations

• Word embeddings: Each token is replaced by a vector that represents its meaning in a continuous vector space. Common methods for word embeddings include Word2Vec, GloVe, and fastText.
• Subword embeddings: Each token is broken down into smaller subword units (e.g., characters or character n-grams), and each subword is replaced by a vector that represents its meaning. This approach can handle out-of-vocabulary (OOV) words and can improve the model's ability to capture morphological and semantic similarities. Common methods for subword embeddings include Byte Pair Encoding (BPE), Unigram Language Model (ULM), and SentencePiece.
• Positional encodings: Since LLMs operate on sequences of tokens, they need a way to encode the position of each token in the sequence. Positional encodings are vectors that are added to the word or subword embeddings to provide information about the position of each token (see the sketch after this list).
• Segment embeddings: In some LLMs, such as BERT, the input sequence can be divided into multiple segments (e.g., sentences or paragraphs). Segment embeddings are added to the word or subword embeddings to indicate which segment each token belongs to.
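As a concrete illustration of positional encodings, the sketch below implements the fixed sinusoidal scheme from the original Transformer paper; learned positional embeddings are an equally common alternative, and the dimensions used here are arbitrary.

# Sinusoidal positional encodings as in "Attention Is All You Need":
#   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                    # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                       # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dimensions use cosine
    return pe

# The encoding is simply added to the token (word/subword) embeddings.
token_embeddings = np.random.randn(10, 64)                 # e.g. 10 tokens, d_model = 64
inputs = token_embeddings + positional_encoding(10, 64)
print(inputs.shape)  # (10, 64)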

Attention Mechanisms

Self-Attention:
• A mechanism that allows a sequence to weigh the importance of all other elements in the sequence when computing a representation for each element.
• Can capture relationships between different elements in the sequence, making it well-suited for tasks that require modeling long-range dependencies.
• Popularized by the Transformer model.

Multi-Head Attention:
• A variation of self-attention where the attention mechanism is applied multiple times in parallel with different sets of weights.
• Allows the model to attend to different aspects of the input sequence, improving its ability to capture complex patterns and dependencies.
• Each "head" produces a separate output that is concatenated and linearly transformed to produce the final output.
• Also commonly used in the Transformer model.
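A minimal sketch of both ideas, assuming the standard scaled dot-product formulation softmax(QK^T / sqrt(d)) V; the dimensions and random weight matrices are illustrative only (in a real model they are learned parameters).

# Scaled dot-product self-attention and a simple multi-head wrapper (NumPy).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model). Each position attends to every other position."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v                         # weighted sum of value vectors

def multi_head_attention(x, heads):
    """heads: list of (w_q, w_k, w_v) projections; head outputs are concatenated."""
    outputs = [self_attention(x, w_q, w_k, w_v) for (w_q, w_k, w_v) in heads]
    concat = np.concatenate(outputs, axis=-1)
    w_o = np.random.randn(concat.shape[-1], x.shape[-1]) / np.sqrt(concat.shape[-1])
    return concat @ w_o                        # final linear transformation

seq_len, d_model, d_head, n_heads = 5, 16, 4, 4
x = np.random.randn(seq_len, d_model)
heads = [tuple(np.random.randn(d_model, d_head) for _ in range(3)) for _ in range(n_heads)]
print(multi_head_attention(x, heads).shape)    # (5, 16)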
Evaluating LLMs

• Accuracy measures the proportion of correctly classified instances out of all instances. This metric is commonly used for text classification tasks such as sentiment analysis, where the goal is to correctly classify a text as positive, negative, or neutral.
• F1-score is a metric that takes into account both precision and recall. Precision is the proportion of true positive results out of all predicted positive results, while recall is the proportion of true positive results out of all actual positive results. The F1-score is the harmonic mean of precision and recall, and it provides a balanced measure of model performance on text classification, question answering, and other tasks.
• Perplexity measures how well a language model predicts the next word in a sequence. The lower the perplexity, the better the model is at predicting the next word. Perplexity is calculated as 2 to the power of the cross-entropy (when cross-entropy is measured in bits), which is a measure of how well the model's predicted probabilities match the true probabilities of the next word in the sequence.

Generated text can also be evaluated using metrics such as BLEU or ROUGE scores.
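A short sketch of these metrics, using scikit-learn for accuracy, precision, recall, and F1, and computing perplexity directly from per-token probabilities; the labels and probabilities below are toy values invented for illustration.

# Classification metrics with scikit-learn, plus perplexity from cross-entropy.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]          # gold sentiment labels (toy data)
y_pred = [1, 0, 0, 1, 0, 1]          # model predictions (toy data)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))

# Perplexity = 2 ** cross-entropy (in bits). token_probs holds the probability
# the language model assigned to each actual next token in a held-out text.
token_probs = np.array([0.20, 0.05, 0.40, 0.10])
cross_entropy_bits = -np.mean(np.log2(token_probs))
perplexity = 2 ** cross_entropy_bits
print("perplexity:", perplexity)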
Example - ChatGPT

ChatGPT is a large language model developed by OpenAI that has been trained on a diverse range of internet text, including articles, forums, and social media. This training allows ChatGPT to generate human-like text and answer a wide variety of questions on many topics.

Use Cases of ChatGPT

Question Answering: ChatGPT can answer factual questions based on the information it has been trained on. Example:
Human: What is the capital of France?
ChatGPT: The capital of France is Paris.

Conversational: ChatGPT can engage in a conversation with a user. Example:
Human: Hi, how are you today?
ChatGPT: Hello! I'm just an AI, so I don't have emotions, but I'm functioning well today. How can I assist you?

Text Generation: ChatGPT can generate text based on a prompt or seed text. Example:
Human: Write a short story about a magical world.
ChatGPT: Once upon a time, in a land far away, there was a magical world filled with wonder and enchantment. The sky was always a brilliant shade of purple, and the stars twinkled like diamonds in the sky. The people of this world possessed magical powers that they used to create incredible wonders and to help those in need. One day, a young woman discovered that she too had magical powers, and set out on an incredible journey to explore the wonders of this magical world and to hone her skills.

Summarization: ChatGPT can summarize a long text or article. Example:
Human: Please summarize this article about the benefits of exercise.
ChatGPT: The article discusses the various physical and mental health benefits of exercise, including improved cardiovascular health, stronger muscles and bones, better weight management, and reduced stress and anxiety. It also highlights the importance of finding an exercise routine that is enjoyable and sustainable.

Translation: ChatGPT can translate text from one language to another. Example:
Human: Translate this sentence from English to Spanish: "The sky is blue."
ChatGPT: El cielo es azul.
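ChatGPT itself is accessed through OpenAI's hosted service, so as a rough, self-contained stand-in the sketch below uses a small open model through the Hugging Face pipeline API to show the same prompt-in, text-out pattern; the model name (gpt2) and generation settings are illustrative choices, not what powers ChatGPT.

# Prompt-in, text-out generation with a small open model as a local stand-in
# for a hosted assistant. Model name and sampling settings are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Write a short story about a magical world."
outputs = generator(prompt, max_new_tokens=60, num_return_sequences=1, do_sample=True)
print(outputs[0]["generated_text"])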

Challenges and limitations with LLMs

• One of the main challenges with LLMs is the potential for biased or offensive language, as the models learn from the patterns found in the training data.
• Ethical considerations, such as gender and racial biases.
• The amount of computational resources needed to train and run LLMs, which can be expensive and energy-intensive.
• Handling out-of-vocabulary words.
• Improving interpretability. While large language models have shown impressive performance on a variety of NLP tasks, they may not perform as well on specific tasks, such as those that require a deeper understanding of the underlying context.
Tools & Libraries supporting LLMs

a. Popular NLP libraries and frameworks, such as TensorFlow, PyTorch, spaCy, Hugging Face Transformers, AllenNLP, spaCy Transformers, and AllenAI's ELMo, provide tools for working with large language models. These libraries allow for easy fine-tuning and deployment of models.

b. Some large language models, such as GPT-3, provide APIs (e.g., the OpenAI GPT-3 API) for access to their models. This can simplify the process of integrating a large language model into a real-world application.
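As a hedged example of the library route, the snippet below loads a pre-trained model and tokenizer with Hugging Face Transformers and runs one sentence through it; the checkpoint name (bert-base-uncased) is an illustrative choice.

# Loading a pre-trained model and tokenizer with Hugging Face Transformers.
from transformers import AutoModel, AutoTokenizer

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("Large language models are trained on massive text corpora.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (batch, seq_len, hidden_size)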
Future of LLMs

The future of LLMs is promising, with ongoing research focused on improving their accuracy, reducing bias, and making them more accessible and energy-efficient. As the demand for AI-driven applications continues to grow, LLMs will play an increasingly important role in shaping the future of human-machine interaction.
Architecture Diagrams (figures): Transformer Architecture, GPT Architecture, BERT Architecture
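Since the diagrams themselves are not reproduced in this text version, the sketch below gives a compact stand-in for the repeated building block of all three architectures: a single Transformer encoder block with multi-head self-attention and a feed-forward network, each wrapped in a residual connection and layer normalization. Dimensions are illustrative only.

# A compact single Transformer encoder block (PyTorch). Real models stack many
# such blocks; GPT uses a decoder-style (causally masked) variant.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)     # self-attention: queries = keys = values = x
        x = self.norm1(x + attn_out)         # residual connection + layer norm
        x = self.norm2(x + self.ff(x))       # feed-forward sub-layer + residual + norm
        return x

block = EncoderBlock()
tokens = torch.randn(1, 10, 64)              # (batch, seq_len, d_model)
print(block(tokens).shape)                   # torch.Size([1, 10, 64])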

Ashish Patel • Principal Research Scientist • [email protected] Abonia Sojasingarayar • Machine Learning Scientist • [email protected] Updated: 2023-02
