Prompting - Survey On Prompting Techniques in LLMs
Prabin Bhandari
Department of Computer Science
George Mason University
Fairfax, Virginia, USA
[email protected]
Abstract—Autoregressive Large Language Models have transformed the landscape of Natural Language Processing. The pre-train and prompt paradigm has replaced the conventional approach [...]

[...] modeling through pre-training. Recently, the pre-train and fine-tune paradigm has evolved into a pre-train and prompt approach, mainly due to the emergence of Large Language [...]
[...] training. Notable examples of Encoder-Decoder PLMs include Meta's BART [17] and Google's T5 [18].

4) Large language models (LLM): The scaling law [2] dictates that an increase in the model's size, dataset size, and computational resources used during training often results in enhanced performance on the downstream tasks. Researchers have constantly tried to push the boundaries of this law by continually increasing the model's size. For instance, GPT-3 [5] has a massive 175 billion parameters, while PaLM [19] surpasses even that with 540 billion parameters. Despite having training methodologies similar to other PLMs, these large PLMs exhibit emergent abilities [3]. For example, GPT-3 can learn a task from a description and a few examples passed as context, whereas its predecessor, GPT-2, cannot. In contemporary times, the term "Large language models (LLMs)" primarily refers to these massive language models, having tens or even hundreds of billions of parameters and trained on vast datasets. These LLMs predominantly adopt a left-to-right (decoder-only) Transformer architecture, and they commonly exhibit an autoregressive nature.
B. Prompting

Prompting refers to providing a specific input or instruction to guide the model's output. Basically, an input x is converted with the help of a prompt template f into a new representation f(x), which is then fed into the model to obtain the desired output y. We generally employ two kinds of prompts: cloze prompts and prefix prompts.
Cloze prompts are popular with masked language models, where the objective is to fill in the blanks. For example, if our task is to find the capital city of a country, our cloze-style prompt will be as follows:

The capital of Nepal is [BLANK].

The model is expected to fill the blank with the correct answer.
Prefix prompts are generally employed with autoregressive LLMs, where the goal is for the model to produce the continuation of the string input. For example, if our task is to convert a sentence in English to a sentence in Nepali, our prefix-style prompt will be as follows:

Convert the following English sentences to Nepali:
English: All the world's a stage, and all the men and women merely players.
Nepali:

The model is expected to produce a continuation of this input where the output will be the Nepali translation of the input English sentence.

If we provide these templates without additional examples, it is referred to as zero-shot prompting. However, if we provide a few illustrative examples of the correct inputs and outputs, it is referred to as few-shot prompting.
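Viewed programmatically, the template f is simply a function that maps an input x to the string f(x) fed to the model. A minimal Python sketch of the two styles above (the function names are ours, for illustration only):

def cloze_prompt(country):
    # Cloze-style: the model fills in the blank.
    return f"The capital of {country} is [BLANK]."

def prefix_prompt(english_sentence):
    # Prefix-style: the model continues the string.
    return ("Convert the following English sentences to Nepali:\n"
            f"English: {english_sentence}\n"
            "Nepali:")

print(prefix_prompt("All the world's a stage, and all the men and women merely players."))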
III. AREA TAXONOMY

Figure 2 presents the area taxonomy of prompting methods in autoregressive LLMs. The classification of prompts is based on two key dimensions: the level of human involvement in prompt creation and the specific types of these prompts. In terms of human effort, prompts are categorized into two groups: "Hand-Crafted" and "Automated", reflecting the extent of manual input required in the prompt creation process. Additionally, prompts are categorized into three distinct groups based on their intended objectives. These categories include "Task-Based", "Generate-Auxiliary", and "Resource/Tools Augmented". It is important to note that this classification is based on the goals and purpose of the prompts themselves, rather than the ultimate objective of the downstream tasks. In the next section, we delve into existing research within each of these classifications to provide a comprehensive overview of the field.

IV. TAXONOMY-BASED SURVEY

In this section, we offer a survey of the existing literature about prompting within the domain of autoregressive LLMs, structured according to the taxonomy introduced in Section III.
[Figure 2: Area taxonomy of prompting methods in autoregressive LLMs. By human effort: Hand-crafted [5, 20–31] and Automated, the latter including Continuous prompts [41–43]. By type: Task-Based, Generate-Auxiliary (Chain of Thought [20–24, 39] and Generate-Knowledge [25, 26]), and Resource/Tools Augmented [26, 28–31, 40].]
It is noteworthy that while some of the research discussed may not be exclusive to autoregressive LLMs, or may have originally targeted PLMs, many of these approaches are applicable and adaptable for effective use with autoregressive LLMs.
A. Human Effort

On the basis of the amount of human effort required to create the prompts, they can be classified into Hand-crafted and Automated.

1) Hand-crafted Prompts: Hand-crafted prompts are the most natural way of creating prompts, where a prompt template is created based on human intuition. Brown et al. [5] introduced hand-crafted prefix prompts for solving a variety of tasks. We provide an example of a hand-crafted prompt taken from [5]:
Translate English to French:
Cheese =>

The prompt above is called a zero-shot prompt, as we have only provided the task description along with the input. If we provide a few input-output examples along with the task description, we call them few-shot prompts. An example of a few-shot prompt is provided below:
Translate English to French:
Sea otter => loutre de mer
plush girafe => girafe peluche
Cheese =>
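Such prompts are straightforward to assemble programmatically. A minimal sketch of a few-shot prompt builder (illustrative only, not code from [5]):

def few_shot_prompt(task_description, examples, query):
    # examples: list of (input, output) pairs provided in-context.
    lines = [task_description]
    for x, y in examples:
        lines.append(f"{x} => {y}")
    lines.append(f"{query} =>")  # the model completes this line
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("Sea otter", "loutre de mer"), ("plush girafe", "girafe peluche")],
    "Cheese",
)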
2) Automated Prompts: Although manual prompt creation is intuitive, such hand-created prompts may have limitations. One such limitation is that these hand-crafted prompts may be sub-optimal. Another issue is that hand-crafted prompts are domain-specific, and it can be an arduous task to hand-craft a prompt for some complex downstream tasks. Consequently, researchers are exploring automated methods for prompt template design. These automatically generated prompts can be further classified into discrete and continuous prompts.

a) Discrete Prompts: Discrete prompts, also referred to as "hard prompts", are those prompts where the prompt input to the underlying LLM is still actual text. These prompts are named discrete prompts because our search space for the prompt is limited to the discrete space of the tokens of the underlying LLM. Different techniques, including mining, paraphrasing, and searching, have been explored to generate discrete prompts.

In their work, Jiang et al. [32] proposed a mining-based approach to find discrete prompts. Originally proposed for masked language models, this approach is adaptable to autoregressive LLMs. Given an input-output pair (x, y), the method scrapes a large text corpus, identifying strings containing both x and y, and subsequently extracts the middle word or dependency path between them to determine a relation (r) for use as a prompt: "[x] r ...". Additionally, Jiang et al. [32] also proposed the use of paraphrasing for creating discrete prompts. The proposed solution is to translate a seed prompt into another language and then back-translate it to the original language. Other paraphrasing-based solutions include synonym substitution using a thesaurus [33] and employing a neural model for rewriting the prompt [34].
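A simplified sketch of the middle-word variant of this mining idea (the corpus handling here is a toy stand-in; the original method also uses dependency paths over large corpora):

from collections import Counter

def mine_relations(corpus, pairs, top_k=5):
    # Collect the text appearing between x and y in corpus sentences;
    # frequent middles serve as the relation r in a prompt "[x] r ...".
    middles = Counter()
    for sent in corpus:
        for x, y in pairs:
            if x in sent and y in sent:
                i, j = sent.index(x), sent.index(y)
                if i < j:
                    middles[sent[i + len(x):j].strip()] += 1
    return [r for r, _ in middles.most_common(top_k)]

corpus = ["Kathmandu is the capital of Nepal."]
print(mine_relations(corpus, [("Kathmandu", "Nepal")]))
# ['is the capital of']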
Wallace et al. [35] proposed a gradient-based search technique to find discrete prompts. They employ a gradient-guided search across tokens of the underlying LLM to identify short trigger sequences capable of inducing the desired output from the LLM. Some approaches score the prompt using another LM. For example, Davison et al. [36] hand-crafted a set of potential templates, which were filled with input and output data from the training set. These filled prompt templates were scored using GPT-2, with the highest-scoring prompt being selected for use. Lu et al. [37] also proposed a scoring-based method, addressing the problem of order sensitivity within the few-shot setting. Their approach involves considering all possible ordering permutations of the provided few-shot examples. They then use the underlying LLM to generate a probing set from these permutations. The probing set is scored using entropy-based measures to rank the effectiveness of different permutations.
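In spirit, LM-based scoring of this kind ranks candidate templates by the likelihood a scoring LM assigns to the filled-in text. A hedged sketch using the Hugging Face transformers library (the template set and helper are ours, not the exact procedure of [36] or [37]):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def score(text):
    # Mean token log-likelihood of `text` under GPT-2.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)
    return -out.loss.item()  # loss is mean negative log-likelihood

templates = ["The capital of {x} is {y}.",
             "{y} is the capital of {x}."]
best = max(templates,
           key=lambda t: score(t.format(x="Nepal", y="Kathmandu")))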
As the adoption of reinforcement learning (RL) within LLMs continues to grow, efforts have been made to leverage RL for the optimization of discrete prompts. One notable contribution is RLPrompt [38], which introduces a parameter-efficient policy network. This network is trained with rewards to generate optimized discrete prompts.
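As a toy illustration of the underlying idea only (not the actual RLPrompt architecture, which trains a policy network on top of a frozen LM), a REINFORCE-style update that learns to pick a high-reward prompt token might look like:

import numpy as np

rng = np.random.default_rng(0)
vocab = ["Translate", "Summarize", "Answer", "Classify"]
logits = np.zeros(len(vocab))        # policy parameters

def reward(token):
    # Stand-in for measured downstream performance with this prompt.
    return 1.0 if token == "Classify" else 0.0

for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(len(vocab), p=probs)
    r = reward(vocab[a])
    grad = -probs
    grad[a] += 1.0                   # d log pi(a) / d logits
    logits += 0.1 * r * grad         # REINFORCE update

print(vocab[int(np.argmax(logits))])  # converges to "Classify"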
b) Continuous Prompts: Continuous prompts, also referred to as "soft prompts", are prompts that are defined in the embedding space of the LLM and therefore are not in a human-readable format. The templates of soft prompts have their own parameters that can be tuned. These prompts are called continuous because the prompt to the LLM is a sequence of continuous vectors instead of discrete tokens. We discuss some of the seminal works below.
Prefix Tuning [41] involves the addition of task-specific prefixes to the beginning of input sequences. These prefixes are free parameters (Pθ), which are reparameterized through a small multi-layer perceptron during training. The log-likelihood objective is then optimized with the LLM parameters frozen, updating only the prefix parameters.
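A minimal PyTorch-style sketch of this setup, assuming a Hugging Face-style frozen decoder-only LM (names and shapes are illustrative):

import torch
import torch.nn as nn

class PrefixTuning(nn.Module):
    def __init__(self, lm, prefix_len=10, hidden=512):
        super().__init__()
        self.lm = lm                              # frozen LLM
        for p in self.lm.parameters():
            p.requires_grad = False
        d = lm.config.hidden_size
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden))
        self.mlp = nn.Sequential(                 # reparameterizes P_theta
            nn.Linear(hidden, hidden), nn.Tanh(), nn.Linear(hidden, d))

    def forward(self, input_embeds):
        # Prepend the trained prefix vectors to the input embeddings.
        batch = input_embeds.size(0)
        p = self.mlp(self.prefix).unsqueeze(0).expand(batch, -1, -1)
        return self.lm(inputs_embeds=torch.cat([p, input_embeds], dim=1))

For simplicity, this sketch prepends the prefix at the embedding layer only; the original method injects prefix activations at every layer of the LM.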
Mathematically,

\max_{\theta} \log P(y \mid x; \theta; \phi) = \max_{\theta} \sum_{y_i} \log P(y_i \mid h_{<i}; \theta; \phi)

[...] contextually appropriate responses. In classification tasks, the goal is to discriminate between different class labels and assign the proper class label to an input. The broad objective of aligning with the goals of the downstream task they support is shared by all prompting approaches. However, in practice, prompts might often serve additional auxiliary purposes or use additional tools to facilitate the downstream task objective. Based on these additional or auxiliary purposes or tools of prompts, we can classify them into different categories. It is important to note that these categories may encompass both hand-crafted and automated prompts. We provide a description of each category below:

1) Task-Based: Task-based prompts are the most straightforward category within the taxonomy of prompts based on their objective. These prompts do not serve any auxiliary goal and are characterized by their single objective of the downstream task. All the different prompting techniques that we discussed under Hand-Crafted and Automated prompts fall under this category.

2) Generate-Auxiliary: Generate-Auxiliary prompts are the types of prompts that generate auxiliary output text in order to facilitate the downstream tasks. Generate-Auxiliary prompts can be further classified into chain-of-thought and generate-knowledge prompts.
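For illustration, a chain-of-thought prompt augments a few-shot exemplar with the intermediate reasoning steps that lead to the answer, e.g.:

Q: Roger has 5 tennis balls. He buys 2 more cans of
tennis balls. Each can has 3 tennis balls. How many
tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis
balls each is 6 tennis balls. 5 + 6 = 11. The answer
is 11.

The generated reasoning text is auxiliary: only the final answer serves the downstream task.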