OpenAI Generative Pre-Trained Transformer 3 (GPT-3) For Developers

Artificial intelligence is making huge progress in mimicking the cognitive capabilities of humans.

One such breakthrough in this area is the Generative Pre-trained Transformer 3 (GPT-3) language
model, which can perform a wide range of natural language tasks. GPT-3 can easily solve problems
including text summarization, classification, question answering, writing articles, generating
functioning code, and many more.

This course will help learners understand the GPT-3 language model and its possible applications,
and will enable them to use GPT-3 for their specific language tasks. Learners will also be
introduced to the details of the transformer model, which is the core of GPT-3.

After completing the course, learners will be able to:

 Understand GPT-3 and its possible applications

 Become familiar with the GPT-3 playground and its configurations

 Use GPT-3 to solve specific natural language tasks using the playground and the Python
programming language

 Understand the building blocks of a language model

 Understand the working of the transformer model, which is the core of GPT-3

In artificial intelligence/machine learning, a model is an algorithm trained on a huge corpus of data
and human input to perform tasks that would otherwise be performed by humans given the same
information. Models augment human intelligence by performing tasks quickly, at scale, and
efficiently.

Generative Pre-trained Transformer 3 (GPT-3) is a language model trained to produce human-like
text. At its core, GPT-3 is a neural-network-based deep learning model called a "transformer".
The model produces an output text sequence for a given input text sequence and is primarily
designed to perform language tasks like text summarization, text classification, language
translation, question answering, and so on.

GPT-3 was developed by OpenAI, a San Francisco-based artificial intelligence research laboratory,
and was introduced in May 2020. It is the third-generation language model in the GPT-n series,
succeeding GPT-2.

Alongside GPT-2, GPT-3 joins the list of other pre-trained language models like Google’s BERT,
Facebook’s RoBERTa and Microsoft’s Turing-NLG, among others. These pre-trained language models
are trained on massive generic data sets and can solve a variety of language tasks.

GPT-3 has become popular and has attracted attention due to the following factors:

Size:

1. GPT-3 is the largest language model created to date. The largest version of GPT-3 has 175
billion parameters and is pre-trained on diverse and enormous text data sets consisting of
billions of words.
2. The training data includes text from Common Crawl, a publicly available dataset created by
crawling the internet, along with other texts selected by OpenAI, including the text of
Wikipedia. This vast size makes GPT-3 perform significantly better than other language
models.

Simplicity & Performance:

1. Most pre-trained language models are trained on a diverse and large corpus of unlabeled
text data, and the pre-trained model is then fine-tuned to perform specific language tasks
like summarization, classification, etc.

2. In contrast, GPT-3 goes one step further and does not require fine-tuning of the pre-trained
model. Users can interact with GPT-3 by giving any text prompt, such as a phrase or a
sentence, and GPT-3 returns a text completion in natural language. Users can also program
GPT-3 by showing it a few examples to perform more complex language tasks.

3. GPT-3 can perform a range of tasks like writing articles, answering questions, classifying text,
summarizing text, creating SQL queries from a natural language description, generating
functioning code, code translation, and so on.

Currently, GPT-3 is not open source, and OpenAI makes the model available through a commercial
API which you can find here: https://beta.openai.com/. Users can interact with the API through
HTTP requests from different programming languages. OpenAI officially supports Python bindings.
Users can also use other languages like C#/.NET, Crystal, Dart, Go, Java, JavaScript/Node,
Ruby and Unity through community libraries built and maintained by the broader developer
community.
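Because the API is reached over plain HTTP, a request can be sketched in Python without any SDK. The sketch below only builds the request (it does not send it); the endpoint path and field names follow the public API documentation of the time, and YOUR_API_KEY is a placeholder for a real secret key:

```python
import json

def build_completion_request(prompt, engine="davinci",
                             max_tokens=64, temperature=0.7):
    """Build (url, headers, body) for a POST to the completions endpoint.

    Nothing is sent here -- this only shows the shape of the request.
    """
    url = "https://api.openai.com/v1/engines/%s/completions" % engine
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    }
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })
    return url, headers, body

url, headers, body = build_completion_request("i saw a cat drinking milk")
```

With the official Python bindings, roughly the same request reduces to a single library call instead of hand-built HTTP.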

To work with GPT-3 you will need the following:

 OpenAI Account

 OpenAI GPT-3 license.

 Python 3.6 or above (optional; needed when you want to create standalone GPT-3
applications with Python)

The Playground is a web-based interface that allows users to experiment with and iterate over a
range of use cases or problems using the GPT-3 language model. Before exploring the Playground,
one needs to understand the basic constructs of the OpenAI API.

There are three concepts that are core to the API: prompt, completion, and tokens.

 The “prompt” is the text input to the API.

 The “completion” is the text that the API generates based on the prompt.

 “Tokens” can be thought of as pieces of words.

For example:

Prompt: “i saw a cat drinking milk”


Completion: i saw a cat drinking milk in the street. He was a white cat with black spots, a very large
cat. He was not afraid of people, and he sat there, licking his whiskers, and cleaning his paws. "That's
a strange cat," said the old man. "He acts as if he owned the street."

To use the API, the user gives a text prompt (the text-based input or “instructions” you provide to
the API) and it returns a text completion, attempting to match the context or pattern you gave it.
A well-written prompt provides enough information for the API to know what you are asking for and
how it should respond.

Note: One limitation to keep in mind is that, combined, the text prompt and generated completion
must be below 2048 tokens (roughly 1500 words). Read more about the key concepts and prompt
design in the “Documentation” section.
You can explore additional examples across categories like Translation, Conversation, Generation,
Classification, Answering, etc.

The examples can be accessed at https://beta.openai.com/examples.

The API provided by OpenAI is a text-in, text-out interface that can be used for various language
tasks. To use the API, one enters a text prompt: a text-based input, a set of instructions, or a
set of examples provided to the API. Based on the text prompt, the GPT-3 model generates the
completion text in relation to the context and examples given in the prompt.

The GPT-3 model is trained on a variety of data from the internet, including Wikipedia. Training
was stopped in October 2019, so the GPT-3 model may not be aware of events after October 2019.
OpenAI plans to add continuous training in the future.

Three main concepts form the core of the API:

Prompt: Prompt is the text input given to the API.

Completion: Completion is the text generated by the API based on the text input given in the
Prompt.

It is important to know that the completion generated by the API is stochastic in nature, which
means that even though the text prompt is the same, the API might generate slightly different
completions each time you call it.

Tokens: The API turns the input text prompt into tokens (pieces of words) prior to processing. For
example, the word “Descartes” can be broken into three tokens: “Desc”, “art”, “es”. Simple words
like “pear” will not be broken further and are treated as a single token.

Note: One limitation to keep in mind is that the total number of tokens from the combined text
input and completion must be less than 2048 (~1500 words).
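A rough budget check against this limit can be sketched in plain Python. The 4-characters-per-token ratio below is only a commonly cited rule of thumb for English text, not the tokenizer's real output:

```python
def estimate_tokens(text):
    """Very rough token estimate (~4 characters per token for English).
    The real count comes from the model's byte-pair-encoding tokenizer,
    so treat this only as a sanity check."""
    return max(1, round(len(text) / 4))

def fits_in_context(prompt, max_completion_tokens, limit=2048):
    """Check that the prompt plus the requested completion length
    stay under the combined 2048-token limit."""
    return estimate_tokens(prompt) + max_completion_tokens <= limit
```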

Playground: The Playground is a simple text-box-like interface, where one can write the input text
prompt and click the Submit button to generate the text completion.
Prompt design:

There are three basic guidelines for making the best use of prompts:

1. Show and tell: If you want GPT-3 to perform a certain task, such as sentiment classification,
the prompt can contain a clear instruction stating the task, followed by some examples.

2. Quality of data: Make sure the data provided in the examples is clear and free from errors.

3. Check settings: If the answer expected from the GPT-3 model is deterministic (there is only one
right answer), it is recommended to use lower values for temperature and top_p in the settings. If
the expected answer is open-ended, it is recommended to use higher values for temperature and
top_p.
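The "show and tell" guideline can be illustrated by assembling a few-shot prompt in Python. The instruction wording and example sentences below are hypothetical, not taken from the API documentation:

```python
def sentiment_prompt(examples, new_text):
    """Build a 'show and tell' prompt: a clear task instruction,
    followed by labelled examples, ending with the text to classify."""
    lines = ["Classify the sentiment of each sentence as Positive or Negative.", ""]
    for text, label in examples:
        lines.append("Sentence: " + text)
        lines.append("Sentiment: " + label)
        lines.append("")
    lines.append("Sentence: " + new_text)
    lines.append("Sentiment:")  # the model completes from here
    return "\n".join(lines)

prompt = sentiment_prompt(
    [("I loved the movie!", "Positive"), ("The food was awful.", "Negative")],
    "What a wonderful day.",
)
```

Ending the prompt with "Sentiment:" nudges the model to answer with just a label rather than free-form text.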

Engines:

OpenAI offers access to four different engines: Ada, Babbage, Curie and Davinci. Davinci is the
most capable engine and can provide nearly accurate predictions for many language tasks, so it is
recommended during experimentation and development. The other engines have the advantage of lower
latency compared to Davinci and can be chosen if they perform well enough for a particular
application.
Step 2: Word Embedding

 Token IDs are just numbers; they do not represent any meaning. These token IDs are
transformed into fixed-length vectors called word embeddings.

 Word embeddings are useful, but they do not capture contextual information. For example,

Sentence 1: There is a tree next to river bank.

Sentence 2: I deposited some money to my bank account.

 The word “bank” is used in different contexts in the above sentences, but the word
embedding will not capture this contextual information: “bank” will have the same embedding
in both sentences.

 Therefore, to generate context aware vectors, we use a mechanism called attention.
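The limitation above can be made concrete with a toy lookup table in Python. The token IDs and vector values are entirely hypothetical:

```python
# Toy embedding table: each token ID maps to one fixed vector,
# so "bank" gets the same vector in both sentences -- the
# surrounding context plays no role in the lookup.
embeddings = {
    101: [0.2, -0.5, 0.1],   # "bank"  (hypothetical token ID)
    102: [0.7, 0.3, -0.2],   # "river"
    103: [-0.1, 0.4, 0.6],   # "money"
}

sentence1_ids = [102, 101]   # "... river bank"
sentence2_ids = [103, 101]   # "... money ... bank ..."

vec1 = embeddings[sentence1_ids[-1]]
vec2 = embeddings[sentence2_ids[-1]]
# Identical vectors despite different contexts -- this is the gap
# that the attention mechanism fills.
```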

In GPT-3 architecture, before the attention block, there is a step called positional encoding.

Step 3: Positional encoding

Positional encoding is a simple vector addition: positional vectors are added to the word
embeddings, and the resulting vectors have positional information encoded in them.
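A sketch of this step, using the fixed sinusoidal encoding from the original Transformer paper (GPT-3 itself learns its position embeddings during training, but the idea of a position-dependent vector added to each word embedding is the same):

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal positional encoding: even dimensions use sine,
    odd dimensions cosine, at wavelengths that vary with the index."""
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

def add_position(word_embedding, position):
    """Positional encoding is just vector addition."""
    pe = positional_encoding(position, len(word_embedding))
    return [w + p for w, p in zip(word_embedding, pe)]
```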

Step 4: Attention

Positionally encoded vectors are mapped into different dimensions by matrix multiplication with
the Q, K and V matrices; the model is free to learn these matrices during training. The scaled dot
product of Q and K produces scores. A new context-aware vector is calculated as a weighted sum of
the values (V), where the weights are the scores generated by the dot product of Q and K. Each
word embedding is now represented by a context-aware vector.

In GPT-3, instead of one attention block, there are multiple attention heads with different Q, K, V
matrices, helping the model learn more complex relationships between words. Since it is multi-head
attention, each attention head produces a set of context-aware vectors corresponding to each word
embedding. The context-aware vectors from each head are concatenated together and multiplied by a
weight matrix to generate one context-aware vector per word embedding.
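A single attention head can be sketched in plain Python (real implementations use optimized tensor libraries, and the learned projections that produce Q, K, V are omitted here):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention(Q, K, V):
    """Scaled dot-product attention: weights = softmax(QK^T / sqrt(d_k)),
    output = weights @ V. Each output row is a context-aware vector,
    a weighted sum of the value vectors."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]          # transpose K
    scores = matmul(Q, K_t)
    scores = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scores]
    return weights, matmul(weights, V)
```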

Step 5: Add and normalize, FFNN

The context-aware vectors are passed into an add-and-normalize block, where they are added to the
positionally encoded word embeddings and normalized so that there is no information decay. The
output of this block is fed into a feed-forward neural network (FFNN) followed by another
add-and-normalize layer.

The self-attention block, the add-and-normalize block, the FFNN and another add-and-normalize
block together form one transformer block. There can be many such transformer blocks stacked.

The output after many such transformer blocks is a set of context-aware vectors (also called
hidden states). To generate the next word, the hidden state of the last word is used.

How to convert this context aware vector into a next word prediction?

The last word’s hidden state (context-aware vector) is mapped into a vector the size of the
vocabulary by passing it through a FFNN. The resulting vector is passed through a softmax
function, producing another vector of the same size whose values correspond to probabilities for
each word in the vocabulary.

The word (token) with the highest probability can be chosen as the prediction for the next word in
the sentence. The generated word can then be fed back as input to the model to generate the next
few words.
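The final projection and greedy pick can be sketched as follows; the tiny vocabulary and logit values are hypothetical stand-ins for the real 50,000-plus-token vocabulary:

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical vocabulary and logits for the last word's hidden state
# after the final FFNN projection.
vocab = ["cat", "milk", "street", "tree"]
logits = [1.2, 0.3, 2.1, -0.5]

probs = softmax(logits)
next_word = vocab[probs.index(max(probs))]  # greedy: pick the argmax
```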
As we can see, lower values of temperature produce more deterministic completions. As the
temperature increases, the GPT-3 model takes more risks and produces more varied, interesting
completions.

top_p

This nucleus sampling parameter is an alternative to the temperature parameter. If this parameter
is set to 0.1, the model considers only the tokens comprising the top 10% of probability mass.

It is recommended to change either temperature or top_p but not both.
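The effect of both parameters can be sketched in plain Python; this is an illustration of the sampling idea, not the API's internal implementation:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution (more deterministic
    completions); higher temperature flattens it (more varied ones)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_filter(probs, p=0.9):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / total
    return out
```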

n

Number of completions to be generated for each prompt.


Note: This parameter should be set appropriately, since higher values of n generate many
completions, which quickly consume the 2048-token quota available.

Stream

If set to true, tokens are sent in the response as they become available. Once the response is
complete, the stream is terminated with a data: [DONE] message.

Echo

If set to true, this will echo back the input prompt text along with the generated completion.

stop

Up to 4 sequences where the API will stop generating further tokens. The returned text will not
contain the stop sequence.

Presence penalty

Number between 0 and 1 that penalizes new tokens based on whether they appear in the text so far.
Increases the model's likelihood to talk about new topics.

Frequency penalty

Number between 0 and 1 that penalizes new tokens based on their existing frequency in the text so
far. Decreases the model's likelihood to repeat the same line verbatim.
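The effect of both penalties can be sketched as an adjustment to the logits before sampling. This is a simplified illustration of the idea (the API applies it internally on real BPE tokens, not whitespace-split words):

```python
def apply_penalties(logits, vocab, generated_text,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Subtract penalties from the logits of tokens already generated:
    the presence penalty once for any token that has appeared at all,
    and the frequency penalty scaled by how often it has appeared."""
    counts = {}
    for tok in generated_text.split():
        counts[tok] = counts.get(tok, 0) + 1
    adjusted = []
    for logit, tok in zip(logits, vocab):
        c = counts.get(tok, 0)
        if c > 0:
            logit -= presence_penalty + frequency_penalty * c
        adjusted.append(logit)
    return adjusted
```

Lowering the logits of repeated tokens makes them less likely to be sampled again, which is why these settings discourage verbatim repetition.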

best_of

GPT-3 generates the number of completions specified in the best_of parameter and returns the best
one, i.e. the completion with the highest log probability per token.

Results generated cannot be streamed.

Note: This parameter should be set appropriately, since higher values of best_of generate many
completions, which quickly consume the available token quota. best_of should be greater than n.

logprobs

Returns the log probabilities of the most likely tokens. For example, if the logprobs parameter is
set to 6, the API response includes the six most likely tokens along with their log probabilities.
