Slides

Uploaded by

Rafael Bosso

LLMs, GPT, and Prompt Engineering for Developers

Sinan Ozdemir
Data Scientist, Entrepreneur,
Author, Lecturer
Session 1: Introduction

Brief History of Modern NLP
2001: Neural Language Models
2013: Encoding semantic meaning with Word2vec
2014–2017: Seq2seq + Attention
2017–Present: Transformers + Large Language Models

Bengio et al. https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
Mikolov et al. https://arxiv.org/abs/1301.3781
Xu et al. http://proceedings.mlr.press/v37/xuc15.pdf
Vaswani et al. https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
2017 – Transformers
“Attention is all you need”

• Introduced the transformer architecture

• A sequence-to-sequence model (takes text in and writes text back)

• The parent model of GPT, BERT, T5, and many more

Source: https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Language Models
Consider the following example:

If you don’t ___ at the sign, you will get a ticket.


[Figure: two candidate words for the blank, with predicted probabilities of 95% and 5%]
Language Models
In a language modeling task, a model is trained
to predict a missing word in a sequence of words.

In general, there are two types of language models:

• Auto-regressive

• Auto-encoding
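
The fill-in-the-blank task above can be pictured as scoring candidate words and turning the scores into probabilities. The sketch below is only an illustration: the candidate words and their scores are made up, and a real language model scores its entire vocabulary.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores a model might assign to candidates for the blank in
# "If you don't ___ at the sign, you will get a ticket."
candidate_scores = {"stop": 4.0, "yell": 1.0, "sing": 0.5}

probabilities = softmax(candidate_scores)
best_word = max(probabilities, key=probabilities.get)
print(best_word, round(probabilities[best_word], 3))
```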
Auto-__ Language Models
Auto-regressive Models: Predict a future token (word) given either the past tokens or the future tokens, but not both.
Example (forward prediction): If you don’t ___

Auto-encoding Models: Learn representations of the entire sequence by predicting tokens given both the past and future tokens.
Example: If you don’t ___ at the sign, you will get a ticket.
Auto-__ Language Model Use Cases
Auto-regressive Models

1. Predicting next word in a sentence (auto-complete)
2. Natural language generation (NLG)
3. GPT family

Auto-encoding Models

1. Comprehensive understanding and encoding of entire sequences of tokens
2. Natural language understanding (NLU)
3. BERT family
How Large is Large?
• Large language models (LLMs) are language models with many parameters (generally 100M+) that are pre-trained on large corpora to process and generate natural language text for a wide variety of tasks. Examples include BERT, GPT, T5, and many more.

• Massively large language models (like ChatGPT) have billions of parameters and are pre-trained on much larger datasets.

• LLMs are trained on vast amounts of text data, capturing the complexities and nuances of human language. LLMs can perform a range of language-related tasks, from text classification to text generation, with high accuracy, fluency, and style.
Massive LLM Playgrounds
Massive language models like GPT-3 and ChatGPT are too large to run on a typical personal machine. They are instead available via Playgrounds and APIs.

• Playgrounds are graphical interfaces to play with and iterate on inputs to the model.

• APIs are programmatic interfaces to the LLM.


GPT-3’s Playground
Tweak inference parameters

Write the input to the model here (your prompt)
Using the ChatGPT Playground
Write an instruction to the LLM, and see the response
Prompt

LLM Response
Source: ChatGPT Playground
Tradeoffs Between Different LLMs
• Auto-encoding models like BERT are fast at encoding semantic meaning for understanding tasks but cannot generate free text.

• Auto-regressive (aka causal) models like GPT are slower to process text but can generate fluent free text for generation tasks.

• Combination models like T5 can both encode quickly and generate text but generally require more data to train.

Popular Modern LLMs


BERT
Bidirectional Encoder Representations from Transformers

• Auto-encoding language model
• Uses only the encoder from the transformer
• Relying on attention
• The encoder is taken from the transformer architecture

Developed by Google in 2018, BERT was one of the first large language models based on the transformer, specifically on the encoder. It excels at natural language understanding (NLU) tasks like sequence/token classification and semantic search.
Pre-training BERT – Corpus
English Wikipedia (2.5B words)
https://en.wikipedia.org/wiki/English_Wikipedia

BookCorpus (800M words)
huggingface.co/datasets/bookcorpus
Pre-training BERT – What LLMs learn
Head 8-10 relates objects to verbs, e.g. love -> it

Source: https://nlp.stanford.edu/pubs/clark2019what.pdf
T5
Text-to-Text Transfer Transformer

• A sequence-to-sequence model
• Relying on transfer learning
• A pure transformer using both the encoder and decoder
• And a fifth “T”!

Introduced by Google in 2019, T5 is a pure transformer (both encoder and decoder) and can both process text quickly and generate free text, making it one of the first models to tout the ability to solve multiple NLP problems out of the box.
Pre-training T5
Common Crawl web-extracted text (commoncrawl.org)

Source: https://arxiv.org/pdf/1910.10683.pdf
GPT
Generative Pre-trained Transformers

• Auto-regressive language model
• Decoders are trained on huge corpora of data
• The decoder is taken from the transformer architecture

Developed by OpenAI in 2018, GPT relies on the transformer’s decoder to excel at natural language generation (NLG) tasks such as summarization, creative writing, and much more.
It’s about Family
GPT-1 released in 2018 – 0.117B (117M) params

GPT-2 released in 2019 – 1.5B params

GPT-3 released in 2020 – 175B params

GPT-3.5 + ChatGPT released in 2022 – included reinforcement learning for alignment

GPT-4 released in 2023 – larger and more capable, with a promise of multi-modality
GPT sees tokens left to right
“My friend was right about this class. It is so fun!”
GPT sees tokens left to right
Notice how tokens cannot attend to tokens that come after them. This is because of the masking.

Said another way, notice that no lines are drawn from tokens on the left to tokens that come afterwards on the right.
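
A causal (left-to-right) attention mask can be sketched as a lower-triangular grid: each token may look at itself and at earlier tokens, never at later ones. This is a minimal illustration with a made-up four-token input, not the masking code of any particular GPT implementation.

```python
def causal_mask(num_tokens):
    """Row i marks which positions token i may attend to (1) or not (0)."""
    return [[1 if j <= i else 0 for j in range(num_tokens)]
            for i in range(num_tokens)]

tokens = ["My", "friend", "was", "right"]
mask = causal_mask(len(tokens))

for token, row in zip(tokens, mask):
    visible = [t for t, allowed in zip(tokens, row) if allowed]
    print(f"{token!r} can attend to {visible}")
```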
GPT sees tokens left to right
Weights from GPT-2 for Layer 9, Head 0
Alignment Makes LLMs Do What We Want
Alignment in LLMs – Refers to how a language model understands and responds to input prompts in a way that aligns with the user's expectations. Humans (or AI) in the loop judge and reward LLM outputs to ensure that the model's responses are "in line with" what the user intended or expected.
Pre-training GPT
GPT-2 is pre-trained on the auto-regressive language modeling task using WebText (40 gigabytes of text).

“We scraped all outbound links from Reddit ... which received at least 3 karma ... [resulting in] 45 million links”

GPT-3 was pre-trained on a filtered mix of CommonCrawl (about 45TB of compressed plaintext before filtering), WebText2, and more!

Sources:
GPT2 paper: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
GPT3 paper: https://arxiv.org/abs/2005.14165
LLaMA 2
Large Language Model Meta AI

• An open-source LLM
• Meta AI: the company that made LLaMA

Initially developed by Meta in 2023, LLaMA is one of the more capable open-source LLMs. The family of models was trained similarly to GPT models but can be hosted locally and with custom infrastructure to save on costs.
LLaMA 2
LLaMA 2 was trained on instruction data (“Supervised fine-tuning”) and aligned using RLHF (“Human Preferences”), just as modern GPT models are.

Source: https://ai.meta.com/resources/models-and-libraries/llama/
Pre-training LLaMA 2 – Corpus
Meta reports that LLaMA 2 was trained on “2 trillion tokens” of data, but the paper never specifies exactly what data it was trained on, just that it was from the “web, mostly in English”.

This speaks to the biases found in LLMs and may also speak to the legal controversies surrounding data used to train LLMs.

Anyone who wants to use LLMs commercially or even privately should be aware of how their models were trained and whether they were trained ethically and fairly.
Evaluating Size of LLMs
• BERT has around 110 million parameters, which is considered large.

• GPT-3 has 175 billion parameters, which is comparatively massive.

• GPT-4 is rumored to have even more parameters, though OpenAI has not published a figure.

• Size is not the only factor.

• BERT achieves strong results on a number of tasks and is faster at processing text at scale.
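
To get a feel for what these parameter counts mean in practice, the memory needed just to hold the weights can be estimated from the number of bytes per parameter. This is a rough sketch: the bytes-per-parameter table is an assumption about common numeric formats, and real deployments also need memory for activations and other overhead.

```python
# Assumed storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# Parameter counts taken from the bullets above.
models = {"BERT": 110e6, "GPT-3": 175e9}
for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params, 'fp16'):.2f} GB in fp16")
```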
Applying LLMs
We can use LLMs in (generally) three ways:

1. Encode text into semantic vectors with little/no fine-tuning
   a. For example, creating an information retrieval system using BERT vectors

2. Fine-tune a pre-trained LLM to perform a very specific task using Transfer Learning
   a. For example, fine-tuning BERT to classify sequences with labels

3. Ask an LLM to solve a task it was pre-trained to solve or could intuit
   a. For example, prompting GPT-3 to write a blog post
   b. For example, prompting T5 to perform language translation
Challenges with LLMs
• LLMs are larger than classic ML models and can be more difficult to manage without proper MLOps / LLMOps.

• Choosing which LLM to use for a specific task requires knowledge about the particular LLM.

• Encoded knowledge in LLMs may bias output to produce untrue or harmful statements.
LLM Testing Harnesses
Testing multiple examples against a grid of:

1. Models (e.g., GPT 3.5 vs GPT 4 vs Anthropic’s Claude, etc.)
2. Prompt Versions (e.g., with or without chain of thought [CoT])

Performance on test set:

Prompt version      | GPT 3.5 | GPT 4 | Claude
3-shot / no CoT     | 90%     | 70%   | 67%
5-shot / with CoT   | 84%     | 78%   | 93%
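
A testing harness of this kind can be sketched as a loop over every (model, prompt version) pair. Everything named here is hypothetical: `always_four` and `echo_last_word` are stand-ins for real model API calls, and the tiny test set exists only so the sketch runs.

```python
from itertools import product

def accuracy(predict, prompt_template, test_set):
    """Fraction of test examples the model answers correctly."""
    correct = sum(
        predict(prompt_template.format(question=q)).strip() == answer
        for q, answer in test_set
    )
    return correct / len(test_set)

# Hypothetical stand-ins: real code would call each provider's API here.
def always_four(prompt):
    return "4"

def echo_last_word(prompt):
    return prompt.split()[-1]

models = {"model_a": always_four, "model_b": echo_last_word}
prompts = {
    "plain": "Answer briefly: {question}",
    "with_cot": "Think step by step, then answer: {question}",
}
test_set = [("What is 2 + 2 ?", "4"), ("Repeat the word: 7", "7")]

# One score per cell of the grid.
results = {
    (m, p): accuracy(models[m], prompts[p], test_set)
    for m, p in product(models, prompts)
}
for (m, p), score in sorted(results.items()):
    print(f"{m:8s} {p:9s} {score:.0%}")
```

Keeping the grid in one dictionary makes it easy to spot cases where a prompt version helps one model and hurts another.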
Evaluating LLMs
Accuracy/Precision/Recall work for classification-like tasks.

Metrics like Semantic Similarity can compare free text to see if the LLM got the “gist” of the output right.

Latency (a measure of speed): how fast the model can solve these tasks.

Cost (we will explore this in more detail next week): open-source tends to be far cheaper in the long run.
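
For a classification-like task, the first and third metrics can be computed with a few lines of plain Python. This is a minimal sketch with made-up labels; the `timed` helper is a hypothetical convenience for measuring latency around any call.

```python
import time

def precision_recall(y_true, y_pred, positive="yes"):
    """Precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def timed(fn, *args):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical gold labels and model predictions.
y_true = ["yes", "no", "yes", "no"]
y_pred = ["yes", "yes", "no", "no"]

(precision, recall), seconds = timed(precision_recall, y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, accuracy)
```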
Code Time!

Intro to LLMs
Session 2: Working with Pre-trained LLMs

Encoding eBay’s Recommendations with BERT
eBay uses BERT to generate more relevant recommendations than traditional search techniques.

Source: https://tech.ebayinc.com/engineering/how-ebay-created-a-language-model-with-three-billion-item-titles
Semantic Search

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Semantic Search
Semantic Search System – A system that
understands the meaning and context of a search
query and matches it against the meaning and context
of available documents for retrieval. It can find
relevant results without having to rely on exact
keyword or n-gram matching, often using a pre-trained
large language model (LLM) to understand the
nuances of the query and the documents.
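
The matching step of a semantic search system can be sketched as ranking documents by the similarity of their vectors to the query vector. The three-dimensional vectors below are invented for illustration; a real system would obtain them from an embedding model and they would have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document embeddings keyed by document text.
documents = {
    "vintage film camera": [0.9, 0.1, 0.0],
    "running shoes": [0.0, 0.8, 0.6],
    "digital camera lens": [0.8, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded query

def search(query_vec, docs, top_k=2):
    """Return the top_k document keys, most similar first."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, docs[d]),
        reverse=True,
    )
    return ranked[:top_k]

results = search(query_vector, documents)
print(results)
```

Note that no keyword overlap is needed: ranking depends only on how close the vectors are.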
Semantic Search
Retrieving relevant documents from a natural language query

Source: https://www.sbert.net/examples/applications/semantic-search/README.html
Semantic Search
Document Corpora (context)

Query / Question -> Retrieve Candidates -> Retrieve Results -> Optional Re-ranking -> List of Results

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Types of Search
Asymmetric Search – A search scenario where there is an imbalance in the semantic information (or size) of the input query and the documents or information that the search system has to retrieve. This typically refers to situations where one (usually the search query) is much shorter than the other.

Symmetric Search – The input query and the documents or information that the search system has to retrieve are of comparable semantic complexity or size. Similar to its asymmetric counterpart, symmetric search may also use advanced techniques like semantic understanding rather than relying solely on exact keyword or n-gram matches.
Types of Search
Asymmetric Search: Matching a user’s eBay item query with paragraph descriptions of the item

Symmetric Search: Matching a Google query with the titles of websites


Embeddings
Text Embeddings – A way to represent words or
phrases as machine-readable numerical vectors in a
multi-dimensional space, typically based on their
contextual meaning. The principle is that similar
phrases (in terms of semantic meaning) will have
vectors that are close together by some measure (like
Euclidean distance), and vice versa. We will start by
using OpenAI’s embedding feature and then work our
way to using some open-source LLMs.
OpenAI’s Embedding Feature
Off the shelf closed-source embedding models like OpenAI’s embedding
product have a fixed context window (input size) and embedding (output)
size. We cannot change this and have to work around it.

We will use OpenAI’s text-embedding-ada-002 model:

Source: OpenAI’s documentation


Open-source Embedding Models
Off the shelf open-source models also have a fixed context window and output vector size, but we can alter these to fit our needs.
Chunking
Turning large documents into smaller “chunks”
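
One simple way to chunk a document is a sliding window over its words, with a little overlap so that sentences cut at a boundary still appear whole in one chunk. This is a sketch under assumptions: the function name, the word-based splitting, and the default sizes are all illustrative, and production systems often count tokens instead of words.

```python
def chunk_words(text, max_words=50, overlap=10):
    """Split text into word windows that share `overlap` words with the next."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# A synthetic 120-word document: "w0 w1 ... w119".
document = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(document, max_words=50, overlap=10)
print(len(chunks), [len(c.split()) for c in chunks])
```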

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Transfer Learning
Transfer Learning – A model trained for one task is reused as the starting point for a model for a second task.

1. Select a source model from a repository of models (like Hugging Face)

2. Reuse and train the model for a second task using task-specific data
Transfer Learning
1. Pre-train on a self-supervised task to teach the model a general concept (like language)

2. Source Model: e.g. BERT for NLP or the Vision Transformer for images

3. Fine-tune model on task/domain specific supervised task:
   • Downstream task 1: e.g. sequence classification
   • Downstream task 2: e.g. question/answering
   • Downstream task 3: e.g. token classification
   • ...
Transfer Learning with BERT
Selecting a source model: Pre-trained BERT

Reusing and training model: Pre-trained BERT + Additional Task Layers, trained on training data for the second task



Fine-tuning LLMs
Why Fine-Tune?
1. Improves task-specific performance by enabling the model to tailor its
knowledge to specific tasks, leading to improved performance and accuracy.

2. Custom data ensures that the model is trained on information that is relevant
and specific to your use-case, making its output more applicable and
accurate.

3. Fine-tuning with custom data enables the model to better understand and
respond to industry-specific jargon, regional language nuances, or other
unique data aspects.

4. Fine-tuning a pre-trained model saves time and computational resources, while still yielding excellent results.
Basic Fine-Tuning Process

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Case Study: Predicting with Amazon Reviews
Our Data

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
How (GPT-like) LLMs Expect Fine-Tuning Data

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Sequence Classification with BERT
positive 0.9

negative 0.1

Feedforward + Softmax

R[CLS] RIstanbul Ris Ra Rgreat Rcity R[SEP]

Encoder 12
.......
Pre-trained BERT Encoder 1

E[CLS] EIstanbul Eis Ea Egreat Ecity E[SEP]


Using Token Probabilities in GPT-3

Source: Quick Start Guide to LLMs


by Sinan Ozdemir

RLHF - A Primer
LLM Alignment - Reinforcement Learning from Feedback
- Fine-tuning to subtly adjust an LLM’s output.

- This is the current state-of-the-art process to align LLMs by adjusting them to produce outputs that are more in line with training data / what a human might expect.

- Works by using a secondary reward pipeline that judges and scores an LLM’s output and uses the reward to adjust the model.

- Example: Adjusting a news summarizer to be more “neutral”.

Source: Quick Start Guide to LLMs by Sinan Ozdemir


GPT-3 – Before and After RLHF Alignment

Source: OpenAI Playground


Few-Shot versus Open-Source Alignment

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Simplified Process for RLHF
Reinforcement Learning from Human Feedback
(RLHF) – A method of fine-tuning machine learning
models, particularly language models, using feedback
from human evaluators. This feedback is generally used
as a signal to optimize the model's performance,
effectively aligning the model's behavior with complex
human values.

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Simplified Process for RLHF

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
RL from F

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
RL from F - Detailed

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
RL from F - More ”Neutral” Summarizations

Source: Quick Start Guide to LLMs


by Sinan Ozdemir

Fine-tuning LLMs in PyTorch


Using Hugging Face’s Trainer Object
Dataset – The collection of data used for machine learning,
consisting of input data (e.g., synopses) and target labels (e.g.,
genres) for the model to learn from. In this context, it's the
MyAnimeList dataset.

Data Collator – A tool for processing and preparing input data for a
model. It transforms raw input data into a format that the model
can understand, which may involve tokenization, padding, and
batching.

TrainingArguments – A configuration object provided by the Hugging Face library that holds our hyperparameters and options for the training process, such as learning rate, batch size, and epochs.

Trainer – A utility provided by the Hugging Face library that manages the fine-tuning process of a model. It handles tasks such as loading data, updating model weights, and evaluating model performance.
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Dynamic Padding
Dynamic Padding – A technique
used in the processing of
variable-length sequences (like text
data) to reduce wasted computational
resources. Unlike traditional padding
methods (top) which pad every
sequence to the length of the longest
one in the dataset, dynamic padding
(bottom) adjusts padding for each
batch separately. This results in a
more efficient use of computational
resources.
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
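
The saving from dynamic padding can be estimated by counting pad tokens under each strategy. The sequence lengths and batch size below are hypothetical; in the Hugging Face library, a collator such as `DataCollatorWithPadding` pads each batch to its own longest sequence.

```python
def padding_needed(lengths, target):
    """Number of pad tokens required to bring every sequence up to target."""
    return sum(target - n for n in lengths)

sequence_lengths = [5, 7, 6, 48, 50, 45]  # tokens per example (hypothetical)
batch_size = 3
batches = [
    sequence_lengths[i:i + batch_size]
    for i in range(0, len(sequence_lengths), batch_size)
]

# Static padding: every sequence is padded to the longest in the dataset.
static_pad = padding_needed(sequence_lengths, max(sequence_lengths))

# Dynamic padding: each batch is padded only to its own longest sequence.
dynamic_pad = sum(padding_needed(batch, max(batch)) for batch in batches)

print(static_pad, dynamic_pad)
```

Grouping examples of similar length into the same batch, as in this toy data, is what makes the difference so large.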
Comparing Different Fine-Tuning Techniques

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Freezing Model Weights

When freezing model weights, it’s generally better to freeze lower weights near the beginning of the model, as seen here. The model shown here has only six encoding layers.

Option 1 (top) doesn’t freeze anything, option 2 (middle) partially freezes some lower weights, and option 3 (bottom) freezes the entire model except for any additional layers we add.
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
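
Choosing which weights to freeze can be sketched as a rule over parameter names. The names below are hypothetical, loosely modeled on a six-layer encoder with an added task head; in PyTorch, freezing then amounts to setting `requires_grad = False` on each selected parameter.

```python
import re

def should_freeze(param_name, num_frozen_layers):
    """Freeze embeddings and the first `num_frozen_layers` encoder layers."""
    if param_name.startswith("embeddings."):
        return num_frozen_layers > 0
    match = re.match(r"encoder\.layer\.(\d+)\.", param_name)
    if match:
        return int(match.group(1)) < num_frozen_layers
    return False  # anything else (e.g. a new classifier head) stays trainable

# Hypothetical parameter names for a six-layer encoder plus a task head.
param_names = ["embeddings.word.weight"]
param_names += [f"encoder.layer.{i}.attention.weight" for i in range(6)]
param_names += ["classifier.weight"]

# Option 2 from the slide: partially freeze the lower layers.
frozen = [n for n in param_names if should_freeze(n, num_frozen_layers=3)]
trainable = [n for n in param_names if n not in frozen]
print(len(frozen), "frozen;", len(trainable), "trainable")
```

Setting `num_frozen_layers` to 0 corresponds to option 1, and setting it to 6 corresponds to option 3.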
Prompt Engineering for Open-Source LLMs
We will be fine-tuning GPT-2 using a specially designed
prompt to teach this old dog some new tricks.

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Case Study: LLM Instruction Alignment for “Sinan’s Attempt at Wise Yet Engaging Responses” – SAWYER
SAWYER–Approach

Sinan’s Attempt at
Wise Yet Engaging
Responses

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
SAWYER–Supervised Fine-Tuning

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
SAWYER–Reward Mechanism (Feedback)

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
SAWYER–The RL Loop

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Cost Projecting + Deploying LLMs to Production
Cost Projecting with Closed-Sourced LLM APIs
Most closed-source LLMs charge per token (or per batch of tokens), so projecting the cost of an API implementation of a prompt for a closed-source LLM comes down to counting the number of input and output tokens and matching that against pricing.

For example, thinking back to OpenAI’s embedding product:

Assume they charge $0.0004 per 1000 tokens for the embedding engine we used
(Ada-002).

If we assume an average of 500 tokens per document (roughly a page of text), the
cost per document would be $0.0002.

If we wanted to embed 1 million documents, it would cost approximately $200.
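
The arithmetic above can be wrapped in a small helper so the projection is easy to rerun with different assumptions. The price and token counts are the assumed figures from this slide, not current pricing.

```python
def embedding_cost(num_documents, tokens_per_document, price_per_1k_tokens):
    """Projected cost in dollars for embedding a set of documents."""
    total_tokens = num_documents * tokens_per_document
    return total_tokens / 1000 * price_per_1k_tokens

# Assumed figures from the slide: $0.0004 per 1000 tokens, 500 tokens per document.
per_document = embedding_cost(1, 500, 0.0004)
one_million = embedding_cost(1_000_000, 500, 0.0004)

print(f"per document: ${per_document:.4f}")
print(f"1M documents: ${one_million:,.2f}")
```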


Cost Projecting with Open-Sourced LLMs
The costs for open-source LLMs are mainly in the compute cost for hosting the models and in training costs.

Instead of thinking about a cost per token we would want to estimate things like:

- The cost to train a model
    - Data gathering (person-hours for labeling included)
    - Compute cost to fine-tune/align (could easily be in the thousands of dollars for modern LLMs)

- The cost to host the model
    - E.g., compute cost for a REST API (Hugging Face has solutions as low as $40/month for models the size of BERT-base)

- The cost to update the model
    - Combination of more data gathering, training, etc.
Deploying models with Hugging Face

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Interoperability
Interoperability – The ability of models to function across various frameworks. It's advantageous because it enhances flexibility and adaptability of models in different environments and platforms.

ONNX (Open Neural Network Exchange) – An open standard format for machine learning models that promotes interoperability. It enables models to be exported from one framework (like PyTorch) and imported into another (like TensorFlow) for inference.

Hugging Face's utility package, Optimum, leverages ONNX to load models into an ONNX format:

    #!pip install optimum

    from optimum.onnxruntime import ORTModelForSequenceClassification

    # Load our fine-tuned model and convert it to ONNX while loading.
    # Note: newer Optimum releases use export=True in place of from_transformers=True.
    ort_model = ORTModelForSequenceClassification.from_pretrained(
        "genre-prediction-bert",
        from_transformers=True
    )
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Optimizing Models
Quantization – Reducing the computational requirements of a neural
network by lowering the precision of its weights and biases. This might
slightly decrease the model's accuracy but it leads to a smaller model
size and faster computation times.
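
As a rough illustration of lowering precision, the sketch below maps floating-point weights to 8-bit integers with a single scale factor and then maps them back. This symmetric scheme is one simple choice among many, and the weight values are made up.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.4]  # hypothetical weights
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, max_error)
```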

Pruning – Minimizes the complexity of a neural network. This technique
involves removing the least contributing weights in the network,
decreasing the model's size and enhancing its computational efficiency.
Pruning is especially beneficial when deploying models in environments
with limited resources.
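
One common way to decide which weights contribute least is to use their absolute value, known as magnitude pruning. The sketch below assumes that criterion, which the slide does not specify, and uses made-up weights.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    num_to_prune = int(len(weights) * sparsity)
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(smallest_first[:num_to_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]

weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.05]  # hypothetical weights
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```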

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Knowledge Distillation
Knowledge Distillation – A method used in machine
learning where a smaller, more efficient model (known as the
student model) is trained to reproduce the behavior of a
larger, more complex model (known as the teacher model) or
an ensemble of models. The goal of this process is to create
a compact model that performs nearly as well as the more
complex model but is more efficient in terms of computational
resources, making it more practical for deployment in
resource-constrained environments.

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Knowledge Distillation
Task-Specific Distillation – A smaller, more
efficient model (student model) is fine-tuned on
both ground truth labels and the larger, original
model's (teacher model) output. This approach
aims to enhance the performance of the student
model by providing it with multiple sources of
knowledge.

Task-Agnostic Distillation – A student model
is trained from scratch using labeled data to
predict the output of a teacher model. The
weights of the student model are adjusted
based on the teacher model's output and the
ground truth labels. This method is called
task-agnostic as the model is distilled before
seeing any task-related data.
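
Learning from both ground truth labels and a teacher's output is often expressed as a blended loss. The sketch below uses one common formulation (cross-entropy on the label plus a temperature-softened KL divergence to the teacher); the `temperature` and `alpha` defaults and the logits are illustrative assumptions, not values from the source.

```python
import math

def softmax(logits, temperature=1.0):
    """Probabilities from logits; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend of hard-label cross-entropy and soft-label KL divergence."""
    # Hard loss: cross-entropy against the ground-truth label.
    hard = -math.log(softmax(student_logits)[true_label])
    # Soft loss: KL(teacher || student) on temperature-softened distributions.
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    return alpha * hard + (1 - alpha) * soft * temperature ** 2

teacher_logits = [3.0, 0.5, -1.0]  # hypothetical teacher output
agreeing_student = [3.0, 0.5, -1.0]
disagreeing_student = [-1.0, 0.5, 3.0]

good = distillation_loss(agreeing_student, teacher_logits, true_label=0)
bad = distillation_loss(disagreeing_student, teacher_logits, true_label=0)
print(good < bad)
```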

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Session 3: Prompt Engineering

Prompt Engineering LLMs
Prompt Engineering – The process of carefully designing inputs for massively large language models such as GPT-3 and ChatGPT to guide them to produce relevant and coherent outputs.

Many AI researchers consider prompt engineering a “bug” in AI and expect it to go away in the next few years.
Just Ask
The Just Ask Principle – Most LLMs are great at processing and reasoning through tasks if you just ask the LLM to solve a task with clear instructions.
A Prompt for GPT-3 to Reply to an Email

JUST ASK

A specific and useful output

Source: OpenAI Playground Source: Quick Start Guide to LLMs


by Sinan Ozdemir
A Prompt for GPT-3 to Reply to an Email
Defining a persona/style: “match their energy”

Clearly stating what you want: “reply with interest”

A specific and useful output

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Just Ask

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Ask First, Shoot Later
Remember attention and how LLMs predict? They predict one token/word at a time.

That means that order matters: put your instruction FIRST and context SECOND so that when the LLM reads the context, it has already read the instruction and is “thinking” about the task the whole time.
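
In code, this ordering is just a matter of how the prompt string is assembled. The helper name, the section labels, and the example email below are hypothetical.

```python
def build_prompt(instruction, context):
    """Place the instruction before the context it applies to."""
    return f"{instruction.strip()}\n\nContext:\n{context.strip()}\n\nResponse:"

prompt = build_prompt(
    instruction="Summarize the email below in one sentence.",
    context="Hi team, the launch moved from Tuesday to Thursday. Thanks!",
)
print(prompt)
```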
Just asking LLMs

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Few-shot Learning / In-context Learning
Few-shot learning – Giving an LLM examples of a task being solved to teach the LLM how to reason through a problem and also to format the answer in a desired format.
Pre-training GPT - How Few-Shot Works
The GPT-3 paper’s title (“Language Models are Few-Shot Learners”) called out few-shot learning as a primary source of in-context learning: on-the-job training for an LLM.

Source: OpenAI
Few-Shot Learning with GPT-3
Given a description of a book, output:

a. “yes” if the description is subjective or
b. “no” if the description is objective
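
A few-shot prompt for this task can be assembled by listing solved examples in a fixed format and leaving the final answer blank for the model to complete. The two labeled examples and the field names are hypothetical; the query is one of the descriptions used on the following slides.

```python
def few_shot_prompt(task, examples, query):
    """Build a prompt with solved examples followed by the unsolved query."""
    lines = [task, ""]
    for text, label in examples:
        lines += [f"Description: {text}", f"Subjective: {label}", ""]
    lines += [f"Description: {query}", "Subjective:"]
    return "\n".join(lines)

# Hypothetical labeled examples showing the desired output format.
examples = [
    ("The plot was gripping from start to finish.", "yes"),
    ("The novel has 300 pages.", "no"),
]
prompt = few_shot_prompt(
    "Is the book description subjective? Answer yes or no.",
    examples,
    "The book was about WWII",
)
print(prompt)
```

Ending the prompt on the empty label nudges the model to answer with just "yes" or "no", matching the examples.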

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Few-Shot Learning with GPT-3

“The book was about WWII”

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Few-Shot Learning with GPT-3

“The book was not amazing”

Source: Quick Start Guide to LLMs


by Sinan Ozdemir
Validating LLM Inputs and Outputs
Input Validation – Checking the integrity and correctness of
data input before the data is processed by a system or used
by a machine learning model. The goal is to prevent
incorrectly formed or improper data from entering and
potentially corrupting the system. It may involve checking for
format consistency, logical errors, security risks, or other
criteria defined as necessary for the input.

Output Validation – Examining the output or results
generated by a system or a machine learning model to
ensure they meet certain criteria or expectations. This could
involve checking for logical correctness, adherence to certain
rules or constraints, or other context-specific factors.
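
Both kinds of validation can start as small guard functions around the model call. This is a minimal sketch: the allowed label set, the length limit, and the function names are assumptions chosen for a yes/no classification prompt.

```python
ALLOWED_LABELS = {"yes", "no"}

def validate_input(text, max_chars=2000):
    """Reject empty or oversized inputs before they reach the model."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("input is empty")
    if len(cleaned) > max_chars:
        raise ValueError("input is too long")
    return cleaned

def validate_output(raw_output):
    """Normalize a model response and confirm it is an expected label."""
    label = raw_output.strip().lower().rstrip(".")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected model output: {raw_output!r}")
    return label

clean_text = validate_input("  The book was not amazing  ")
label = validate_output(" Yes. ")
print(clean_text, label)
```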
Using NLI to Perform Output Validation
Natural Language Inference (NLI) – An NLP task that involves
determining the relationship between a premise and a
hypothesis to identify whether the hypothesis is entailed by
(logically follows from), contradicted by, or neutral to the premise.

Premise – In the context of natural language inference, the
premise is the initial statement or fact. It's compared with a
hypothesis.

Hypothesis – In the context of natural language inference, the
hypothesis is a statement that is checked against the premise.
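One possible way to use NLI for output validation, sketched under assumptions: the LLM's output is treated as the premise, and a hypothesis such as "This text is offensive." is checked against it. The `nli` argument stands for any model that returns entailment/neutral/contradiction scores; `toy_nli` below is a keyword-based stand-in so the sketch runs without a model, and it is not a real NLI classifier.

```python
def violates_policy(llm_output, hypothesis, nli, threshold=0.5):
    """Flag the output if the hypothesis is entailed by it (output = premise)."""
    scores = nli(premise=llm_output, hypothesis=hypothesis)
    return scores["entailment"] >= threshold

def toy_nli(premise, hypothesis):
    """Stand-in for a real NLI model: keyword matching only, for demonstration.

    It ignores the hypothesis; a real model would score the pair jointly.
    """
    rude_words = {"stupid", "idiot"}
    if any(word in premise.lower() for word in rude_words):
        return {"entailment": 0.9, "neutral": 0.08, "contradiction": 0.02}
    return {"entailment": 0.05, "neutral": 0.75, "contradiction": 0.2}

hypothesis = "This text is offensive."
print(violates_policy("You are an idiot.", hypothesis, toy_nli))
print(violates_policy("Happy to help with that!", hypothesis, toy_nli))
```

In practice `toy_nli` would be swapped for a model trained on an NLI dataset; the threshold is a tunable assumption.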
Using NLI to Perform Output Validation
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Batch Prompting
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Prompt Chaining
Prompt Chaining involves using multiple calls to an
LLM to reason through more complex tasks.
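A minimal sketch of a two-step chain, where the output of the first call becomes part of the second prompt. The email task, the prompt wording, and the `llm` callable are assumptions for illustration; `fake_llm` only records prompts so the flow can be inspected without a real model.

```python
def chain(llm, email_text):
    """Two chained calls: summarize first, then draft a reply from the summary."""
    summary = llm(f"Summarize the main request in this email:\n{email_text}")
    reply = llm(f"Write a short, polite reply that addresses this request:\n{summary}")
    return summary, reply

prompts_seen = []

def fake_llm(prompt):
    """Placeholder for a real LLM call; records each prompt it receives."""
    prompts_seen.append(prompt)
    return f"[output {len(prompts_seen)}]"

summary, reply = chain(fake_llm, "Hi, my order arrived damaged. Can I get a refund?")
print(prompts_seen[1])  # the second prompt contains the first call's output
```

Splitting the task this way keeps each prompt focused and makes intermediate results available for validation.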
Prompt Chaining
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Chain of Thought Prompting
Chain of Thought Prompting forces an LLM to
generate reasoning for an answer alongside an
answer. This usually leads to better/more actionable
results.
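A small sketch of what this can look like in code: the prompt asks for reasoning followed by a clearly marked final answer, and a parser extracts that answer. The instruction wording, the `Answer:` convention, and the sample completion are assumptions made for this example.

```python
import re

COT_SUFFIX = (
    "Reason through the problem step by step, then give the final "
    "answer on its own line in the form 'Answer: <number>'."
)

def build_cot_prompt(question):
    """Append the reasoning instruction to the question."""
    return f"Question: {question}\n{COT_SUFFIX}"

def extract_answer(completion):
    """Pull the final numeric answer out of a reasoning-plus-answer completion."""
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return float(match.group(1)) if match else None

# Hand-written completion standing in for a real model response.
sample_completion = (
    "There are 3 boxes with 4 pens each, so 3 * 4 = 12 pens.\n"
    "Answer: 12"
)
print(build_cot_prompt("There are 3 boxes with 4 pens each. How many pens?"))
print(extract_answer(sample_completion))
```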
Chain of Thought Prompting
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
ChatGPT versus Math–Chain of Thought
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
ChatGPT versus Math–Few-Shot
huggingface.co/datasets/gsm8k

“GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality
linguistically diverse grade school math word problems. The
dataset was created to support the task of question answering
on basic mathematical problems that require multi-step reasoning.”

Source: Quick Start Guide to LLMs
by Sinan Ozdemir
ChatGPT versus Math–Few-Shot
This is another application of semantic search. We can
store examples in a vector database and retrieve them
as people ask questions.

Source: Quick Start Guide to LLMs
by Sinan Ozdemir
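A toy sketch of retrieving the closest solved examples for a new question. Here `embed` is a bag-of-words stand-in for a real embedding model and a Python list stands in for the vector database; both are simplifying assumptions, and the stored word problems are made up.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_examples(question, solved_examples, k=3):
    """Return the k stored (question, solution) pairs most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(
        solved_examples,
        key=lambda pair: cosine(query_vec, embed(pair[0])),
        reverse=True,
    )
    return ranked[:k]

# Made-up examples in the style of grade school word problems.
STORE = [
    ("Tom has 3 apples and buys 2 more apples. How many apples does he have?", "3 + 2 = 5. Answer: 5"),
    ("A train travels 60 miles per hour for 2 hours. How far does it go?", "60 * 2 = 120. Answer: 120"),
    ("Sara has 10 apples and eats 4 apples. How many apples are left?", "10 - 4 = 6. Answer: 6"),
]

top = closest_examples(
    "Ann has 7 apples and buys 5 more apples. How many apples does she have?", STORE, k=2
)
for question, solution in top:
    print(question, "->", solution)
```

The retrieved pairs would then be placed in the prompt as few-shot examples ahead of the new question.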
ChatGPT versus Math–Combo of Techniques
Chain of Thought + 3 closest semantic examples

Prompt Variant          ChatGPT   DaVinci
Closest K=3 (CoT)       0.816     0.602
Closest K=5 (CoT)       0.788     0.601
Closest K=7 (CoT)       0.774     0.574
Random K=3 (CoT)        0.744     0.585
Closest K=1 (CoT)       0.709     0.519
Just Ask (with CoT)     0.628     0.382
Closest K=3 (no CoT)    0.27      0.18
Just Ask (no CoT)       0.2       0.09
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Injecting Personas into Prompts
Source: Quick Start Guide to LLMs
by Sinan Ozdemir
Prompt Injection
Addressing malicious attacks on LLMs

Prompt Injection – Feeding a prompt to an LLM to steer it toward an
unintended output

Malicious prompt injection attack intending to steal proprietary prompts
Prompt Injection

Input and output validation are the best and easiest ways to
protect against this. For example, check the semantic similarity
between your LLM’s output and your prompt.
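A rough sketch of that check: if the output closely resembles the proprietary prompt, it is probably leaking it. `difflib.SequenceMatcher` compares characters, so it is only a stand-in for the embedding-based semantic similarity the slide suggests; the threshold and the example strings are assumptions.

```python
from difflib import SequenceMatcher

def looks_like_prompt_leak(system_prompt, llm_output, threshold=0.6):
    """Flag outputs that closely resemble the proprietary prompt.

    Character-level similarity is used here for simplicity; a semantic
    check would compare embeddings of the two texts instead.
    """
    ratio = SequenceMatcher(None, system_prompt.lower(), llm_output.lower()).ratio()
    return ratio >= threshold

SYSTEM_PROMPT = "You are a helpful wine assistant. Recommend wines from the list only."
leaked = "My instructions: You are a helpful wine assistant. Recommend wines from the list only."
normal = "I suggest the 2019 Pinot Noir."

print(looks_like_prompt_leak(SYSTEM_PROMPT, leaked))
print(looks_like_prompt_leak(SYSTEM_PROMPT, normal))
```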
How GPT predicts in real-time (inference)
Next token predictions happen one token at a time

Source: https://jalammar.github.io/illustrated-gpt2/
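The one-token-at-a-time loop can be sketched with a toy lookup table in place of a real model. The table and its probabilities are invented for illustration; a real GPT conditions on the whole context and scores every token in its vocabulary, and this sketch always picks the most likely token (greedy decoding).

```python
# Toy "model": maps the last token to a distribution over next tokens.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 1.0},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 1.0},
    "sat": {"<end>": 1.0},
}

def generate(max_new_tokens=10):
    """Greedy autoregressive decoding: predict, append, repeat."""
    tokens = ["<start>"]
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # most likely next token
        if next_token == "<end>":
            break
        tokens.append(next_token)  # fed back in on the next step
    return tokens[1:]

print(generate())
```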
Parameters for generating text
temperature (float) - Lower values (below 1) make the model more
confident and less random. Higher values make generated
text more random.

top_k (int) - How many of the most likely tokens the model
considers at each generation step. 0 to deactivate.

top_p (float) - Only considers the smallest set of most likely
tokens whose cumulative probability reaches top_p.

beams (int) - How many candidate sequences beam search keeps
in parallel while looking ahead.
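A self-contained sketch of how temperature, top_k, and top_p can shape the next-token distribution before sampling. It is not the implementation of any particular library; the logits and function names are made up, and beam search is left out.

```python
import math
import random

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    # Temperature divides the logits before the softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)

    if top_k > 0:  # top_k=0 deactivates the filter
        ranked = ranked[:top_k]

    total = sum(w for _, w in ranked)
    kept, cumulative = [], 0.0
    for tok, w in ranked:  # keep the smallest set whose mass reaches top_p
        kept.append((tok, w))
        cumulative += w / total
        if cumulative >= top_p:
            break

    norm = sum(w for _, w in kept)
    return {tok: w / norm for tok, w in kept}

def sample(logits, rng=random, **params):
    """Draw one token from the filtered distribution."""
    dist = filter_distribution(logits, **params)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

# Hypothetical scores for "If you don't ___ at the sign, you will get a ticket."
logits = {"stop": 2.0, "yield": 1.0, "look": 0.5, "dance": -1.0}
print(filter_distribution(logits, top_k=2))
print(filter_distribution(logits, temperature=0.5))
print(sample(logits, top_p=0.8))
```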


Temperature
Normal probability distribution

With temperature < 1, probabilities are “sharper”

Source: https://huggingface.co/blog/how-to-generate
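The sharpening effect follows from the usual temperature-scaled softmax. The symbols below (logits $z_i$, temperature $T$) and the two-token example are introduced here for illustration and do not appear on the slide.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}

% Worked example with two tokens, z = (2, 1):
%   T = 1:    p_1 = 1 / (1 + e^{-1})   \approx 0.731
%   T = 0.5:  p_1 = 1 / (1 + e^{-2})   \approx 0.881   (sharper)
%   T = 2:    p_1 = 1 / (1 + e^{-0.5}) \approx 0.622   (flatter)
```

Dividing by a temperature below 1 widens the gaps between logits, so the most likely token takes a larger share of the probability mass.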
Temperature - Continued
Panels: Baseline performance; More random outputs, more creative;
Less random outputs, less creative

Source: Quick Start Guide to LLMs - Sinan Ozdemir


Assessing an LLM’s built-in knowledge
Does the LLM Know Enough for My Task?
A. Yes, it has all knowledge encoded and it is ready to solve my task.
a. May still need to format output to make it easier to work with

B. Mostly. It knows the general topic but lacks critical information (the information is
too new to be in the model, or it knows a topic but not the specifics that I need).
a. Create a secondary system to retrieve information on demand (e.g., semantic
search)
b. Few-shot examples and chain of thought to help teach nuances/specifics

C. No, not at all, I need to teach it pretty much everything from scratch.
a. Just ask with comprehensive instructions + frameworks
b. Few-shot / chain of thought prompting
c. Fine-tuning for long term cost savings/speed
Does the LLM Know Enough for My Task?
A. Yes, it has all knowledge encoded and it is ready to solve my task.
a. Summarizing news articles
b. Recommending news articles from a list of articles

B. Mostly. It knows the general topic but lacks critical information (the information is
too new to be in the model, or it knows a topic but not the specifics that I need).
a. Recommending news articles that came out this morning

C. No, not at all, I need to teach it pretty much everything from scratch.
a. Recommending proprietary frameworks for thinking about marketing
strategies
Sinan Ozdemir’s Framework for Prototyping with Aligned
LLMs with a Mind for Production
Sinan’s LLM Prototyping Framework
1. Define Inputs and Outputs

● Identify and document the specific inputs and outputs for your LLM application.
● Example: Given a user's taste and a list of book descriptions, the model should output a ranked list of
book recommendations with reasons.
● Remember, requirements might change during testing or in different contexts.

2. Define Success/Failure States

● Clearly define what constitutes a success or a failure for your model.
● Example of success: The model returns at least 3 recommendations drawn from the given book list,
with a rationale for each.
● Example of failure: The model doesn't provide 3 recommendations, or the suggestions aren't from the
given list.
● Failures are binary: they indicate whether the model meets the basic requirements, not the quality
of the output.
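The success and failure states in step 2 can be turned into a binary check like the sketch below. The data shape (a list of dicts with `title` and `reason` keys) and the function name are assumptions for illustration.

```python
def meets_basic_requirements(recommendations, book_list, minimum=3):
    """Binary pass/fail check; says nothing about recommendation quality."""
    if len(recommendations) < minimum:
        return False              # fewer than the required number of recommendations
    for rec in recommendations:
        if rec.get("title") not in book_list:
            return False          # suggestion is not from the given list
        if not rec.get("reason", "").strip():
            return False          # missing rationale
    return True

BOOKS = ["Dune", "Emma", "Ulysses", "Beloved"]
good = [
    {"title": "Dune", "reason": "Matches the user's taste for sci-fi."},
    {"title": "Emma", "reason": "Light, character-driven story."},
    {"title": "Beloved", "reason": "Literary fiction the user rated highly."},
]
bad = good[:2] + [{"title": "Hamlet", "reason": "A classic."}]

print(meets_basic_requirements(good, BOOKS))
print(meets_basic_requirements(bad, BOOKS))
```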
Sinan’s LLM Prototyping Framework
3. Consider Potential Bias

● Examine if the model's outputs can be influenced by subjective bias or unnecessary
information.
● Example: The model might utilize past knowledge or context about the books, leading to
bias. Ensure it's "staying on script" and relying on the input given.

4. Create Comprehensive Examples (to be used as few-shot later)

● Develop at least two detailed examples for training (few-shot) or testing.
● Example: a real list of wines from a dataset.
● This step helps to classify the model's knowledge requirement (Class A, B, or C).
Sinan’s LLM Prototyping Framework
5. Determine the Model's Knowledge Requirement

● Assess if the model has the necessary information to perform the task.
● Class A: The model has all the required information encoded.
● Class B: The model mostly has the necessary information but lacks specific details or updated
data.
● Class C: The model lacks the majority of required knowledge and needs extensive training.

6. Write an MVP (Minimum Viable Product) Prompt

● Create various versions of a prompt and experiment with them in the model's playground. This helps to
refine the prompts and assess the model's knowledge requirement.

7. Iterate on Prompt Techniques and Parameters

● Adjust parameters like temperature and top_p to refine the model's responses.
Sinan’s LLM Prototyping Framework
8. Evaluate and Plan for Scale/Production/Cost/Testing

● Assess the performance of the model, including its computational demands, and plan for
potential scaling and production deployment.
● Also, consider the cost of deployment, which includes financial costs (like cloud resources
and potential fine-tuning) and resource costs (like time and personnel for testing and
maintenance).

9. Prototyping and Iteration

● Create a basic version of the application using tools like Streamlit for quick testing and user
feedback.
● Iterate on the model by refining the prompts, adjusting parameters, and fine-tuning the
model based on feedback.
Sinan’s LLM Prototyping Framework
10. Labeling Data and Fine-tuning

● Plan for potential data labeling and fine-tuning. This includes considering the cost and time required for
these steps.
● Remember, fine-tuning not only requires labeled data but also extensive computational resources, which
can add to the overall cost.

11. Evaluation

● Consistently evaluate the model's performance using relevant metrics like semantic similarity, precision,
recall, etc. These evaluations will guide the iterations and improvements.
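As one concrete instance of these metrics, precision and recall for a recommendation task can be computed as in this sketch. The item names and the set-based treatment (order ignored) are assumptions for illustration.

```python
def precision_recall(recommended, relevant):
    """Precision: share of recommendations that are relevant.
    Recall: share of relevant items that were recommended."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical model output versus a hand-labeled ground truth.
precision, recall = precision_recall(
    recommended=["wine_a", "wine_b", "wine_c", "wine_d"],
    relevant=["wine_a", "wine_b", "wine_e"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```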

The above framework is not exhaustive but provides a good starting point for designing applications with
LLMs like ChatGPT. Each application will have unique needs and constraints, so this framework should be
adapted accordingly.
Week 1 Assignment
1. Come prepared with at least 2 examples of a task to solve with an LLM
a. Should fit within the idea of a larger product
2. For 1 example, complete the first 5 steps of designing an LLM application/feature

The examples I will walk through (inspired by a recent trip to my favorite wine bar):

Product: A platform for sommeliers to keep track of their customers/clients to help give
recommendations

LLM Task 1: Given a list of wines my client liked with descriptions for the wines plus a list of
wines I have with descriptions, output an ordered subset of recommendations with reasoning
(My hunch is B)

LLM Task 2: Given a lengthy wine description, output a summarization of the wine (Hunch: A)

Extra Credit: Write an MVP prompt (Step 6)


Summary + Next Steps
• The invention of the transformer in 2017 revitalized the field
of NLP and led to an explosion of large language models.

• There are many types of LLMs with pros/cons, and knowing
which to use and how to use it makes all the difference.

• LLMs are not perfect and will eventually produce untrue and
harmful statements if left unchecked.

• Reinforcement learning can further align LLMs.

• Attention seems to be (mostly) all we need... for now.


Summary + Next Steps
Check out this playlist for more LLM content:
https://learning.oreilly.com/playlists/2953f6c7-0e13-49ac-88e2-b951e11388de

Includes:

Quick start guide to LLMs!
https://learning.oreilly.com/library/view/quick-start-guide/9780138199425
