Molly Ruby · Published in Towards Data Science · Jan 31 · 8 min read
How ChatGPT Works: The Model Behind The Bot
A brief introduction to the intuition and methodology behind the chatbot you can’t stop hearing about.

This gentle introduction to the machine learning models that power ChatGPT will start with the introduction of Large Language Models, dive into the revolutionary self-attention mechanism that enabled GPT-3 to be trained, and then burrow into Reinforcement Learning From Human Feedback, the novel technique that made ChatGPT exceptional.

Large Language Models


ChatGPT is an extrapolation of a class of machine learning Natural Language Processing models known as Large Language Models (LLMs). LLMs digest huge quantities of text data and infer relationships between words within the text. These models have grown over the last few years as we’ve seen advancements in computational power. LLMs increase their capability as the size of their input datasets and parameter space increases.

The most basic training of language models involves predicting a word in a sequence of words. Most commonly, this is observed as either next-token prediction or masked language modeling.

Arbitrary example of next-token-prediction and masked-language-modeling generated by the author.
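To make the two objectives concrete, here is a toy sketch of how training pairs might be formed for each one. It is an assumed illustration, not taken from the author's figure; the sentence and split points are arbitrary.

```python
# Toy illustration of the two basic training objectives; the sentence and
# split points are arbitrary assumptions.
sentence = ["Jacob", "hates", "reading", "books"]

# Next-token prediction: given a prefix, predict the token that follows it.
next_token_pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]
# e.g. (["Jacob", "hates"], "reading")

# Masked language modeling: hide a token anywhere in the sequence and
# predict it from the context on both sides.
masked_example = (["Jacob", "[MASK]", "reading", "books"], "hates")

print(next_token_pairs)
print(masked_example)
```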

In this basic sequencing technique, often deployed through a Long Short-Term Memory (LSTM) model, the model fills in the blank with the most statistically probable word given the surrounding context. There are two major limitations with this sequential modeling structure.

1. The model is unable to value some of the surrounding words more than
others. In the above example, while ‘reading’ may most often be associated
with ‘hates’, in the database ‘Jacob’ may be such an avid reader that the
model should give more weight to ‘Jacob’ than to ‘reading’ and choose ‘loves’
instead of ‘hates’.

2. The input data is processed individually and sequentially rather than as a
whole corpus. This means that when an LSTM is trained, the window of
context is fixed, extending only beyond an individual input for several
steps in the sequence. This limits the complexity of the relationships
between words and the meanings that can be derived.

In response to this issue, in 2017 a team at Google Brain introduced
transformers. Unlike LSTMs, transformers can process all input data
simultaneously. Using a self-attention mechanism, the model can give varying
weight to different parts of the input data in relation to any position of the
language sequence. This feature enabled massive improvements in infusing
meaning into LLMs and made it possible to process significantly larger datasets.

GPT and Self-Attention


Generative Pre-trained Transformer (GPT) models were first launched in 2018
by OpenAI as GPT-1. The models continued to evolve over 2019 with GPT-2,
2020 with GPT-3, and most recently in 2022 with InstructGPT and ChatGPT.
Prior to integrating human feedback into the system, the greatest advancement
in the GPT model evolution was driven by achievements in computational
efficiency, which enabled GPT-3 to be trained on significantly more data than
GPT-2, giving it a more diverse knowledge base and the capability to perform a
wider range of tasks.

Comparison of GPT-2 (left) and GPT-3 (right). Generated by the author.

All GPT models leverage the transformer architecture. The original transformer
pairs an encoder that processes the input sequence with a decoder that
generates the output sequence; GPT models use only the decoder stack. A
multi-head self-attention mechanism allows the model to differentially weight
parts of the sequence to infer meaning and context, while masked (causal)
self-attention restricts each position to attend only to earlier tokens, which
is what lets the model generate coherent text one token at a time.

The self-attention mechanism that drives GPT works by converting tokens
(pieces of text, which can be a word, part of a word, or another grouping of
characters) into vectors that represent the importance of the token in the
input sequence. To do this, the model:

1. Creates a query, key, and value vector for each token in the input sequence.

2. Calculates the similarity between the query vector from step one and the
key vector of every other token by taking the dot product of the two vectors.

3. Generates normalized weights by feeding the output of step 2 into a
softmax function.

4. Generates a final vector, representing the importance of the token within
the sequence, by multiplying the weights generated in step 3 by the value
vectors of each token (see the sketch after this list).
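A minimal NumPy sketch of these four steps follows. It is an illustration only, with arbitrary dimensions and random weights rather than GPT’s actual implementation; the scaling by the square root of the key dimension is an added detail from the original transformer paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 8                  # embedding size and query/key/value size
X = rng.normal(size=(4, d_model))    # embeddings for 4 tokens in the sequence

# Step 1: create query, key, and value vectors for each token.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Step 2: dot-product similarity between each query and every key
# (scaled by sqrt(d_k), as in the transformer paper).
scores = Q @ K.T / np.sqrt(d_k)

# Step 3: a softmax turns the scores into normalized attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Step 4: each token's output is a weighted sum of all value vectors.
output = weights @ V
print(output.shape)    # (4, 8): one context-aware vector per token
```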

The ‘multi-head’ attention mechanism that GPT uses is an evolution of self-
attention. Rather than performing steps 1–4 once, the model performs them
several times in parallel, each time generating a new linear projection of
the query, key, and value vectors. By expanding self-attention in this way, the
model is capable of grasping sub-meanings and more complex relationships
within the input data.
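Continuing the same hedged sketch (again with made-up dimensions, not GPT’s real configuration), multi-head attention can be pictured as running the single-head computation several times with different learned projections and concatenating the results:

```python
import numpy as np

rng = np.random.default_rng(1)

def sdpa(Q, K, V):
    """Scaled dot-product attention: steps 2-4 of the previous sketch."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

d_model, n_heads = 8, 2
d_head = d_model // n_heads
X = rng.normal(size=(4, d_model))            # embeddings for 4 tokens

heads = []
for _ in range(n_heads):
    # Each head has its own linear projection of queries, keys, and values.
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    heads.append(sdpa(X @ W_q, X @ W_k, X @ W_v))

# Concatenate the per-head outputs and mix them with a final projection.
W_o = rng.normal(size=(d_model, d_model))
output = np.concatenate(heads, axis=-1) @ W_o
print(output.shape)    # (4, 8)
```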

Screenshot from ChatGPT generated by the author.

Although GPT-3 introduced remarkable advancements in natural language
processing, it is limited in its ability to align with user intentions. For example,
GPT-3 may produce outputs that:

• Lack helpfulness, meaning they do not follow the user’s explicit instructions.

• Contain hallucinations that reflect non-existing or incorrect facts.

• Lack interpretability, making it difficult for humans to understand how the
model arrived at a particular decision or prediction.

• Include toxic or biased content that is harmful or offensive and spreads
misinformation.

Innovative training methodologies were introduced in ChatGPT to counteract
some of these inherent issues of standard LLMs.

ChatGPT
ChatGPT is a spinoff of InstructGPT, which introduced a novel approach to
incorporating human feedback into the training process to better align the
model outputs with user intent. Reinforcement Learning from Human
Feedback (RLHF) is described in depth in OpenAI’s 2022 paper Training
language models to follow instructions with human feedback and is simplified
below.

Step 1: Supervised Fine Tuning (SFT) Model


The first development involved fine-tuning the GPT-3 model by hiring 40
contractors to create a supervised training dataset, in which the input has a
known output for the model to learn from. Inputs, or prompts, were collected
from actual user entries into the OpenAI API. The labelers then wrote an
appropriate response to each prompt, thus creating a known output for each
input. The GPT-3 model was then fine-tuned using this new, supervised
dataset to create GPT-3.5, also called the SFT model.

In order to maximize diversity in the prompts dataset, only 200 prompts could
come from any given user ID and any prompts that shared long common
prefixes were removed. Finally, all prompts containing personally identifiable
information (PII) were removed.
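As a rough, assumed sketch (not OpenAI's actual pipeline), the per-user cap and common-prefix filtering described above might look something like this; the prefix length used to flag near-duplicates is an invented threshold:

```python
from collections import defaultdict

MAX_PER_USER = 200   # cap described in the article
PREFIX_LEN = 100     # what counts as a "long common prefix" is an assumption

def filter_prompts(records):
    """records: iterable of (user_id, prompt) pairs; returns the kept pairs."""
    per_user = defaultdict(int)
    seen_prefixes = set()
    kept = []
    for user_id, prompt in records:
        prefix = prompt[:PREFIX_LEN]
        if per_user[user_id] >= MAX_PER_USER or prefix in seen_prefixes:
            continue              # over the per-user cap, or a near-duplicate
        per_user[user_id] += 1
        seen_prefixes.add(prefix)
        kept.append((user_id, prompt))
    return kept                   # PII removal would be a separate pass
```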

After aggregating prompts from the OpenAI API, labelers were also asked to create
sample prompts to fill out categories in which there was only minimal real
sample data. The categories of interest included:

• Plain prompts: any arbitrary ask.

• Few-shot prompts: instructions that contain multiple query/response pairs.

• User-based prompts: correspond to a specific use case that was requested
for the OpenAI API.

When generating responses, labelers were asked to do their best to infer what
the user’s instruction was. The paper describes the three main ways that
prompts request information.

1. Direct: “Tell me about…”

2. Few-shot: Given these two examples of a story, write another story about
the same topic.

3. Continuation: Given the start of a story, finish it.

The compilation of prompts from the OpenAI API and those hand-written by labelers
resulted in 13,000 input/output samples to leverage for the supervised model.
Image (left) from Training language models to follow instructions with human feedback,
OpenAI et al., 2022, https://arxiv.org/pdf/2203.02155.pdf. Additional context added in red (right) by the
author.
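For intuition, a minimal sketch of this kind of supervised fine-tuning with the Hugging Face Trainer is shown below. The base model ("gpt2"), the single example, and the hyperparameters are placeholders and are not what OpenAI actually used.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Each example is a user prompt with a labeler-written demonstration appended.
pairs = [{"text": "Explain the moon landing to a 6 year old.\n\n"
                  "People went to the moon, and they took pictures..."}]
dataset = Dataset.from_list(pairs)

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True, padding="max_length",
                    max_length=128)
    # Standard causal-LM labels: predict each next token; ignore padding (-100).
    enc["labels"] = [
        [tok if mask == 1 else -100 for tok, mask in zip(ids, attn)]
        for ids, attn in zip(enc["input_ids"], enc["attention_mask"])
    ]
    return enc

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-model", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
)
trainer.train()   # the fine-tuned result plays the role of the SFT model
```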

Step 2: Reward Model


After the SFT model is trained in step 1, the model generates better aligned
responses to user prompts. The next refinement comes in the form of training
a reward model, in which the model input is a series of prompts and responses
and the output is a scalar value, called a reward. The reward model is required
in order to leverage Reinforcement Learning, in which a model learns to
produce outputs to maximize its reward (see step 3).

To train the reward model, labelers are presented with 4 to 9 SFT model
outputs for a single input prompt. They are asked to rank these outputs from
best to worst, creating combinations of output ranking as follows.
Example of response ranking combinations. Generated by the author.

Including each combination in the model as a separate datapoint led to
overfitting (failure to extrapolate beyond seen data). To solve this, the model was
built leveraging each group of rankings as a single batch datapoint.
Image (left) from Training language models to follow instructions with human feedback,
OpenAI et al., 2022, https://arxiv.org/pdf/2203.02155.pdf. Additional context added in red (right) by the
author.
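The pairwise comparison loss this implies can be sketched as follows. This is a hedged illustration of the general idea (score each higher-ranked response above each lower-ranked one, over all pairs in the group), not OpenAI's exact code.

```python
import torch
import torch.nn.functional as F

def ranking_loss(rewards: torch.Tensor) -> torch.Tensor:
    """Average -log sigmoid(r_better - r_worse) over all K-choose-2 pairs.

    `rewards` holds the reward model's scores for one prompt's K responses,
    already sorted from best-ranked to worst-ranked by the labeler.
    """
    k = rewards.shape[0]
    losses = []
    for i in range(k):                 # response i is ranked above response j
        for j in range(i + 1, k):
            losses.append(-F.logsigmoid(rewards[i] - rewards[j]))
    return torch.stack(losses).mean()

# e.g. a group of 4 responses scored by the reward model
print(ranking_loss(torch.tensor([2.1, 1.3, 0.2, -0.5])))
```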

Step 3: Reinforcement Learning Model


In the final stage, the model is presented with a random prompt and returns a
response. The response is generated using the ‘policy’ that the model has
learned. The policy represents a strategy that the machine has
learned to use to achieve its goal; in this case, maximizing its reward. Based on
the reward model developed in step 2, a scalar reward value is then determined
for the prompt and response pair. The reward then feeds back into the model
to evolve the policy.

In 2017, Schulman et al. introduced Proximal Policy Optimization (PPO), the
methodology that is used in updating the model’s policy as each response is
generated. PPO incorporates a per-token Kullback–Leibler (KL) penalty from
the SFT model. The KL divergence measures the distance between two distribution
functions and penalizes extreme distances. In this case, using a KL penalty
reduces the distance that the responses can be from the SFT model outputs
trained in step 1, to avoid over-optimizing the reward model and deviating too
drastically from the human intention dataset.
Image (left) from Training language models to follow instructions with human feedback,
OpenAI et al., 2022, https://arxiv.org/pdf/2203.02155.pdf. Additional context added in red (right) by the
author.
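A common way to implement this per-token KL penalty, sketched below under assumed shapes and an illustrative coefficient beta, is to subtract a scaled KL estimate from the reward at every token and add the reward model's score at the final token of the response:

```python
import torch

def penalized_rewards(reward_score: torch.Tensor,   # scalar from the reward model
                      logp_policy: torch.Tensor,    # per-token log-probs, current policy
                      logp_sft: torch.Tensor,       # per-token log-probs, frozen SFT model
                      beta: float = 0.02) -> torch.Tensor:
    """Per-token rewards fed to the PPO update (illustrative values only)."""
    kl = logp_policy - logp_sft        # per-token KL estimate
    rewards = -beta * kl               # penalize drifting away from the SFT model
    rewards[-1] += reward_score        # reward-model score added at the last token
    return rewards

# Example with made-up numbers for a 4-token response
print(penalized_rewards(torch.tensor(1.7),
                        torch.tensor([-0.5, -1.2, -0.8, -0.3]),
                        torch.tensor([-0.6, -1.0, -0.9, -0.4])))
```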

Steps 2 and 3 of the process can be iterated repeatedly, though in practice this
has not been done extensively.
Screenshot from ChatGPT generated by the author.

Evaluation of the Model


Evaluation of the model is performed by setting aside a test set during training
that the model has not seen. On the test set, a series of evaluations is
conducted to determine whether the model is better aligned than its predecessor,
GPT-3.

Helpfulness: the model’s ability to infer and follow user instructions. Labelers
preferred outputs from InstructGPT over GPT-3 85 ± 3% of the time.

Truthfulness: the model’s tendency for hallucinations. The PPO model
produced outputs that showed minor increases in truthfulness and
informativeness when assessed using the TruthfulQA dataset.
Harmlessness: the model’s ability to avoid inappropriate, derogatory, and
denigrating content. Harmlessness was tested using the RealToxicityPrompts dataset.
The test was performed under three conditions.

1. Instructed to provide respectful responses: resulted in a significant decrease in
toxic responses.

2. Instructed to provide responses, without any setting for respectfulness: no
significant change in toxicity.

3. Instructed to provide toxic responses: responses were in fact significantly more
toxic than the GPT-3 model.

For more information on the methodologies used in creating ChatGPT and
InstructGPT, read the original paper published by OpenAI, Training language models
to follow instructions with human feedback, 2022, https://arxiv.org/pdf/2203.02155.pdf.



Screenshot from ChatGPT generated by the author.

Happy learning!

Sources
1. https://openai.com/blog/chatgpt/

2. https://arxiv.org/pdf/2203.02155.pdf

3. https://deepai.org/machine-learning-glossary-and-terms/softmax-layer

4. https://www.assemblyai.com/blog/how-chatgpt-actually-works/

5. https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b
