Artificial Intelligence - Assignment 3
Ashutosh - A18PT2-33
Jigyasa - 19PT1-12
Navneet - 19PT1-17
Pankhuri - 19PT1-18
Deep - A18PT2-37
● Prior to the release of GPT-3, the largest language model was Microsoft's Turing NLG,
introduced in February 2020, with 17 billion parameters, less than 10 percent of GPT-3's capacity.
● The quality of the text generated by GPT-3 is so high that it is difficult to distinguish from
that written by a human, which has both benefits and risks.
● David Chalmers, an Australian philosopher, described GPT-3 as "one of the most
interesting and important AI systems ever produced."
● GPT-n models are based on the Transformer deep learning neural network architecture.
● GPT-3 is based on a specific neural network architecture type called Transformer that,
simply put, is more effective than other architectures like RNNs (Recurrent Neural
Networks).
Language Model?
“The diversity of tasks the model is able to perform in a zero-shot setting suggests that high-capacity models trained to maximize the likelihood of a sufficiently varied text corpus begin to learn how to perform a surprising amount of tasks without the need for explicit supervision.”
GPT-3 Architecture
GPT-3 is a neural-network-powered language model and sequence transduction model based on deep
learning. It uses a Transformer-based architecture similar to GPT-2, including the modified
initialization, pre-normalization, and reversible tokenization described therein, with the exception
that it uses alternating dense and locally banded sparse attention patterns in the layers of the
transformer, similar to the Sparse Transformer. As a Transformer-based model in the same family as
BERT, the architecture itself is not especially novel. However, with 175 billion parameters it is the
largest language model trained so far, which lets it perform specific tasks without special fine-tuning
and with only a few training examples. GPT-3 is trained on Common Crawl, Wikipedia, Books1, and Books2.
GPT-3 175B has a lower data-compression ratio (499/175 ≈ 2.85) than GPT-2 1.5B (10/1.5 ≈ 6.66),
measured as billions of training tokens per billion parameters.
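The quoted compression ratio is simply the training-token count (in billions) divided by the parameter count (in billions); a quick check of the arithmetic:

```python
# Training tokens (billions) divided by parameters (billions), as quoted above.
gpt3_ratio = 499 / 175    # GPT-3 175B trained on ~499B tokens
gpt2_ratio = 10 / 1.5     # GPT-2 1.5B trained on ~10B tokens
print(round(gpt3_ratio, 2), round(gpt2_ratio, 2))  # 2.85 6.67
```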
Essentially, the architecture is the same as GPT-2, but the model became proportionally larger: more
layers (up to 96), more units in each bottleneck layer (up to 12288), and a larger context window
(2048 tokens, compared to 1024 in GPT-2 and 512 in GPT).
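For reference, the headline hyperparameters quoted above can be summarised as follows (the head count comes from the GPT-3 paper; treat this as an illustrative summary, not a complete configuration):

```python
# Headline hyperparameters of the largest GPT-3 model (illustrative summary only).
GPT3_175B = {
    "n_parameters": 175_000_000_000,
    "n_layers": 96,           # transformer layers
    "d_model": 12288,         # units in each bottleneck layer
    "n_heads": 96,            # attention heads per layer (128 dims per head)
    "context_window": 2048,   # tokens (vs 1024 in GPT-2, 512 in GPT)
    "vocab_size": 50257,      # byte-level BPE vocabulary
}

for name, value in GPT3_175B.items():
    print(f"{name:>15}: {value:,}")
```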
I/O - GPT-3 operates on input and output sequences. The input is a sequence of up to 2048 words (a.k.a.
tokens). The output is a guess for the word most likely to follow the input sequence: in fact, a
probability for every candidate word.
Encoding - As GPT can't understand words directly, each word is represented as a vector of numbers over
the GPT vocabulary (50257 tokens), i.e., a one-hot encoding; a full input is a 2048 x 50257 matrix of
ones and zeroes. GPT-3 actually uses byte-level Byte Pair Encoding (BPE) tokenization.
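A minimal sketch of this one-hot view, using toy sizes so it runs quickly (GPT-3's real sizes are a 2048-token context and a 50257-token vocabulary; the token IDs below are made up, not real BPE output):

```python
import numpy as np

# Toy sizes for illustration; GPT-3 uses SEQ_LEN = 2048 and VOCAB_SIZE = 50257.
SEQ_LEN, VOCAB_SIZE = 8, 1000

# Hypothetical token IDs that a byte-level BPE tokenizer might produce for a short prompt.
token_ids = [464, 329, 318, 257, 922]

# One-hot encoding: a SEQ_LEN x VOCAB_SIZE matrix of ones and zeroes.
one_hot = np.zeros((SEQ_LEN, VOCAB_SIZE), dtype=np.float32)
for position, token_id in enumerate(token_ids):
    one_hot[position, token_id] = 1.0

print(one_hot.shape)        # (8, 1000)
print(one_hot.sum(axis=1))  # each used position contains exactly one 1
```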
Embedding - 50257 is pretty big for a vector, and it's mostly filled with zeroes; that's a lot of wasted
space. To solve this, we learn an embedding function: a neural network that takes a 50257-length vector
of ones and zeroes and outputs an n-length vector of numbers. Here, we are trying to store (or project)
the information of the word's meaning into a smaller-dimensional space (for intuition, imagine projecting
an N-size vector down to a 2-d vector). In GPT-3 the embedding is not 2-dimensional but 12288-dimensional:
we multiply the 2048 x 50257 sequence-encodings matrix with the learned 50257 x 12288 embedding-weights
matrix and end up with a 2048 x 12288 sequence-embeddings matrix.
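A sketch of this projection step with toy sizes (GPT-3's real matrices are 2048 x 50257 and 50257 x 12288; the weights here are random placeholders, not learned values):

```python
import numpy as np

# Toy sizes; GPT-3 uses SEQ_LEN = 2048, VOCAB_SIZE = 50257, D_MODEL = 12288.
SEQ_LEN, VOCAB_SIZE, D_MODEL = 8, 1000, 16

embedding_weights = (np.random.randn(VOCAB_SIZE, D_MODEL) * 0.02).astype(np.float32)

one_hot = np.zeros((SEQ_LEN, VOCAB_SIZE), dtype=np.float32)
one_hot[0, 464] = 1.0  # made-up token ID at position 0

# (SEQ_LEN x VOCAB) @ (VOCAB x D_MODEL) -> (SEQ_LEN x D_MODEL) sequence embeddings.
sequence_embeddings = one_hot @ embedding_weights
print(sequence_embeddings.shape)

# A one-hot row times the weight matrix is simply a row lookup:
assert np.allclose(sequence_embeddings[0], embedding_weights[464])
```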
Positional Encoding: To encode the position of the current token in the sequence, the authors take the
token's position (a scalar i, in [0-2047]) and pass it through 12288 sinusoidal functions, each with a
different frequency. The resulting sequence-positional-encodings matrix has the same shape as the
sequence-embeddings matrix and is simply added to it.
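A minimal sketch of sinusoidal positional encodings with toy sizes; this follows the classic sinusoidal scheme from the original Transformer paper and is meant as an illustration of the idea described above rather than GPT-3's exact implementation:

```python
import numpy as np

# Toy sizes; GPT-3 uses SEQ_LEN = 2048 and D_MODEL = 12288.
SEQ_LEN, D_MODEL = 8, 16

def sinusoidal_positional_encodings(seq_len, d_model):
    """Each of the d_model channels is a sine/cosine of the position at a different frequency."""
    positions = np.arange(seq_len)[:, None]    # (seq_len, 1)
    channels = np.arange(d_model)[None, :]     # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (channels // 2)) / d_model)
    angles = positions * angle_rates
    encodings = np.zeros((seq_len, d_model), dtype=np.float32)
    encodings[:, 0::2] = np.sin(angles[:, 0::2])
    encodings[:, 1::2] = np.cos(angles[:, 1::2])
    return encodings

positional_encodings = sinusoidal_positional_encodings(SEQ_LEN, D_MODEL)
# Same shape as the sequence-embeddings matrix, so the two are simply added:
# model_input = sequence_embeddings + positional_encodings
print(positional_encodings.shape)
```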
Attention: For each output in the sequence, predict which input tokens to focus on and how much.
Here, imagine a sequence of 3 tokens, each represented with a 512-value embedding. The first two
matrices ("queries" and "keys") are multiplied together (QKᵀ), which yields a 3x3 matrix. This matrix
(normalized through a softmax) represents the importance of each token to each other token.
Note: This (QKᵀ) is the only operation in GPT which operates across words in the sequence; it is the only
operation where matrix rows interact.
The third matrix ("values") is multiplied with this importance matrix, resulting in, for each token, a mix of
all other token values weighted by the importance of their respective tokens.
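A minimal sketch of this attention step for the 3-token, 512-dimension example (random matrices stand in for the learned query/key/value projections, and the causal mask GPT uses to block attention to future tokens is omitted for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy example matching the text: 3 tokens, 512-value embeddings.
SEQ_LEN, D_K = 3, 512
rng = np.random.default_rng(0)
Q = rng.standard_normal((SEQ_LEN, D_K))   # queries
K = rng.standard_normal((SEQ_LEN, D_K))   # keys
V = rng.standard_normal((SEQ_LEN, D_K))   # values

# QK^T gives a 3x3 matrix of token-to-token importances
# (scaled by sqrt(d_k) as in the Transformer paper, then softmax-normalised).
importance = softmax(Q @ K.T / np.sqrt(D_K), axis=-1)

# Each output row is a mix of all value rows, weighted by importance.
attention_output = importance @ V
print(importance.shape, attention_output.shape)  # (3, 3) (3, 512)
```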
Multi-Head Attention is attention applied many times in parallel (96 heads per layer in GPT-3); GPT-3
also uses sparse attention in alternating layers.
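A sketch of splitting attention across several heads (toy sizes; GPT-3 uses 96 heads of 128 dimensions each, and its sparse-attention variant is not reproduced here):

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy sizes; GPT-3 uses 96 heads and d_model = 12288 (128 dims per head).
SEQ_LEN, N_HEADS, D_HEAD = 3, 4, 8
D_MODEL = N_HEADS * D_HEAD
rng = np.random.default_rng(1)
x = rng.standard_normal((SEQ_LEN, D_MODEL))

# Illustrative projection weights (learned in the real model).
W_q, W_k, W_v, W_o = (rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(4))

def split_heads(m):
    # (seq, d_model) -> list of (seq, d_head) slices, one per head
    return [m[:, h * D_HEAD:(h + 1) * D_HEAD] for h in range(N_HEADS)]

q, k, v = x @ W_q, x @ W_k, x @ W_v
head_outputs = [attention(qh, kh, vh)
                for qh, kh, vh in zip(split_heads(q), split_heads(k), split_heads(v))]

# Concatenate the heads and project back to d_model.
multi_head_output = np.concatenate(head_outputs, axis=-1) @ W_o
print(multi_head_output.shape)  # (3, 32)
```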
Feed-Forward: The feed-forward block is a good old multi-layer perceptron with 1 hidden layer: take the
input, multiply with learned weights, add a learned bias, apply a non-linearity, do it again, and get a result.
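A sketch of the feed-forward block with toy sizes (in GPT-3 the hidden layer is 4 x d_model; GELU is the activation used in GPT models, and the weights here are random placeholders):

```python
import numpy as np

# Toy sizes; in GPT-3 d_model = 12288 and the hidden layer is 4 * d_model.
SEQ_LEN, D_MODEL, D_HIDDEN = 3, 16, 64
rng = np.random.default_rng(2)
x = rng.standard_normal((SEQ_LEN, D_MODEL))

W1, b1 = rng.standard_normal((D_MODEL, D_HIDDEN)), np.zeros(D_HIDDEN)
W2, b2 = rng.standard_normal((D_HIDDEN, D_MODEL)), np.zeros(D_MODEL)

def gelu(x):
    # Tanh approximation of the GELU activation used in GPT models.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

# "Multiply with learned weights, add learned bias, do it again":
feed_forward_output = gelu(x @ W1 + b1) @ W2 + b2
print(feed_forward_output.shape)  # (3, 16)
```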
Add & Norm: After both the Multi-Head Attention and the feed-forward blocks, the input of the block is
added to its output, and the result is normalized. This is common in deep learning models.
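A minimal sketch of the residual add followed by layer normalization (learned scale/shift parameters omitted; note that GPT-2/GPT-3 apply the normalization in a pre-norm arrangement, as mentioned earlier, but the Add & Norm idea is the same):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each position's vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(3)
block_input = rng.standard_normal((3, 16))
block_output = rng.standard_normal((3, 16))   # stand-in for attention / FFN output

# Residual connection followed by normalisation ("Add & Norm").
result = layer_norm(block_input + block_output)
print(result.mean(axis=-1).round(6), result.std(axis=-1).round(3))
```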
Decoding: After passing through all 96 layers of GPT-3's attention/neural net machinery, the input has
been processed into a 2048 x 12288 matrix. This matrix is supposed to contain, for each of the 2048
output positions in the sequence, a 12288-vector of information about which word should appear. But
how do we extract this information?
As stated in the Embedding section, we learned a mapping which transforms a given (one-hot encoding
of a) word into a 12288-vector embedding. It turns out, we can just reverse this mapping to transform
our output 12288-vector embedding back into a 50257-word-encoding.
In addition, the GPT papers mention the parameter top-k, which limits the number of possible words to
sample in the output to the k most likely predicted words. For example, with a top-k parameter of 1, we
always pick the most likely word.
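A sketch of the un-embedding and top-k step with toy sizes (random values stand in for the learned embedding weights and for the final hidden vector):

```python
import numpy as np

# Toy sizes; GPT-3 uses VOCAB_SIZE = 50257 and D_MODEL = 12288.
VOCAB_SIZE, D_MODEL = 1000, 16
rng = np.random.default_rng(4)

embedding_weights = rng.standard_normal((VOCAB_SIZE, D_MODEL))
final_hidden = rng.standard_normal(D_MODEL)   # vector for the last output position

# "Reverse" the embedding: project back onto the vocabulary to get one score per word.
logits = embedding_weights @ final_hidden      # (VOCAB_SIZE,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Top-k sampling: keep only the k most likely words, renormalise, and sample.
def sample_top_k(probs, k=40):
    top_ids = np.argsort(probs)[-k:]
    top_probs = probs[top_ids] / probs[top_ids].sum()
    return np.random.choice(top_ids, p=top_probs)

print(sample_top_k(probs, k=40))   # a random token ID from the 40 most likely
print(np.argmax(probs))            # with top-k = 1 we always pick the most likely word
```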
Step 1 & 2:
Step 3 & 4
● GPT-3 is the largest language model trained to date, and OpenAI's latest and greatest natural
language prediction model.
● The basic operating mode of GPT-3 is to generate text responses based on the input text, e.g., to
answer a question or to write an essay based on a title.
● OpenAI now provides a developer API to interact with GPT-3 and build applications on top of it (see
the sketch after this list).
● GPT-3 is a few-shot learner. It requires priming with a few examples to work in a specific
context.
● Once primed correctly, GPT-3 can perform math calculations and generate answers in
programming languages, although it has not learned either explicitly.
● GPT-3 shows that language model performance scales as a power-law of model size, dataset
size, and the amount of computation.
● GPT-3 demonstrates that a language model trained on enough data can solve NLP tasks that it
has never encountered. That is, GPT-3 positions the language model as a general solution for many
downstream tasks without fine-tuning.
● The cost of AI is increasing exponentially. Training GPT-3 would cost over $4.6M using a Tesla
V100 cloud instance.
● The size of state-of-the-art (SOTA) language models is growing by at least a factor of 10 every
year. This outpaces the growth of GPU memory. For NLP, the days of "embarrassingly parallel" training
are coming to an end; model parallelization will become indispensable.
● GPT-3 is pre-trained with a large amount of natural language text from the Internet (45TB of
training text with 499 billion words). It cost at least 4.6 million US dollars (some estimates run as
high as $12 million) to train on GPUs. The resulting model has 175 billion parameters.
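As an illustration of the developer API and few-shot priming mentioned in the list above, here is a minimal sketch using the legacy openai Python client (0.x); the engine name, prompt, and parameter values are assumptions for illustration, and the current API surface differs:

```python
# pip install openai  (legacy 0.x client; the current API surface differs)
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Few-shot "priming": a handful of examples sets the context for the task.
prompt = (
    "English: Hello, how are you?\nFrench: Bonjour, comment allez-vous ?\n"
    "English: I love machine learning.\nFrench: J'adore l'apprentissage automatique.\n"
    "English: Where is the library?\nFrench:"
)

response = openai.Completion.create(
    engine="davinci",        # GPT-3 base model exposed by the API at launch
    prompt=prompt,
    max_tokens=32,
    temperature=0.3,
    stop="\n",
)
print(response["choices"][0]["text"].strip())
```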
GPT-3: Reaping the Benefits and Mitigating the Risks
OpenAI’s GPT-3 is a massive step forward for the AI space, specifically for natural language generation,
and it is going to exponentially help multiple industries. Selecting the right application and combining it
with other components can really help leverage this model to a great extent.
GPT-3 is great; however, it also has its own setbacks and flaws. The algorithm can sometimes go
completely off-topic, and it can get offensive too. These drawbacks may make people apprehensive about
using this model in production, which leads them to ignore the benefits that come with it.
Consequently, there is also a need to balance the risks and benefits of using generative models. We can
mitigate the risks and reap the benefits of the algorithm in the following ways:
● Use it for use cases where the risk doesn’t have major repercussions: Instead of always using
it for an end-user application, it can be used for internal use cases which can boost productivity.
This can control the risk and will still reap the benefits.
● Build components to control the content: There is a way to control the content by architecting
different ML models on top of GPT-3, which will flag content that is not correct. This can help
prevent incorrect or inappropriate content from going out when it’s not supposed to. These
components on top can err on the side of caution and be more strict because their role is to
prevent bad content from going out, even if it is at the risk of restricting some good content. This
can lead to more false positives; however, it provides reassurance that the content going out is safe.
This allows the generative model to be creative while still having an architecture in place to prevent
things from going off track (see the sketch after this list).
● Adapt it to domain-specific data: OpenAI does provide access to training APIs (on request), which
allow adapting GPT-3 to a particular task/domain and making it more relevant for the task at hand.
● Political design and governance of AI systems is key.
● The results are amazing, but at what cost?
● Finally, more data doesn't necessarily mean better data. We need quality data; in fact, unbiased
and diverse data.
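To illustrate the "components to control the content" idea above, here is a minimal sketch of wrapping a generative model with a separate safety check; the generate and is_safe callables are hypothetical placeholders, not part of GPT-3 or the OpenAI API:

```python
from typing import Callable

def moderated_generation(
    generate: Callable[[str], str],   # hypothetical wrapper around a GPT-3 call
    is_safe: Callable[[str], bool],   # hypothetical classifier flagging bad content
    prompt: str,
    max_attempts: int = 3,
    fallback: str = "[response withheld by content filter]",
) -> str:
    """Generate text, but only release it if a separate model deems it safe.

    The filter deliberately errs on the side of caution: a few good completions
    may be blocked (false positives) in exchange for keeping bad content from
    going out.
    """
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if is_safe(candidate):
            return candidate
    return fallback

# Example usage with trivial stand-ins for the two models:
if __name__ == "__main__":
    fake_generate = lambda p: "A perfectly harmless completion."
    fake_is_safe = lambda text: "offensive" not in text.lower()
    print(moderated_generation(fake_generate, fake_is_safe, "Write a greeting."))
```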
****************************************************************