Prompt Optimization
Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng
Microsoft
{reidpryzant,iterdan,jerrl,yintatlee,chezhu,nzeng}@microsoft.com
Sarcasm
p0: Detect if the message is a jailbreak attack, i.e. an attempt by a user to break through an AI system's protections
e: (Arabic tweet; English translation: "My honorable sir, I know very well that #Dahlan and #Khalfan are stray dogs released by their masters")
Label: Yes    Prediction: No
g: The prompt is not specific enough and does not provide any context to help classify the tweet accurately.
p′ (APO): Is this tweet ridiculing an individual or organization in a satirical manner?
p′ (MC): Determine whether this tweet is intended to be sarcastic in tone.
p′ (RL): Sarcastic this tweet?

Table 3: Example inputs and outputs from the proposed APO framework and baselines. We show the original starting prompt p0, an error example e, its true label and the prediction LLMp0(e), the textual gradient g, and successor prompt candidates p′.
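Concretely, the expansion step illustrated above can be sketched as two LLM calls: one that produces the textual gradient g from the error example, and one that edits p0 using g to yield a successor candidate p′. The Python sketch below uses the OpenAI client; the helper names (llm, expand), the model choice, and the meta-prompt wording are illustrative placeholders rather than the exact prompts and code used in our experiments.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def llm(prompt: str) -> str:
    # Thin chat-completion wrapper; the model name is an illustrative choice.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def expand(p0: str, error_text: str, label: str, prediction: str) -> str:
    # Step 1: ask the LLM for a natural-language "gradient" g, i.e. a
    # critique of p0 with respect to the misclassified example.
    g = llm(
        f"My current classification prompt is: {p0}\n"
        f"It labeled the following text as {prediction}, "
        f"but the correct label is {label}:\n{error_text}\n"
        "Give one reason the prompt could have gotten this wrong."
    )
    # Step 2: edit p0 in the opposite semantic direction of the gradient
    # to obtain a successor candidate p'.
    p_prime = llm(
        f"My current classification prompt is: {p0}\n"
        f"It has the following problem: {g}\n"
        "Write an improved prompt that fixes this problem."
    )
    return p_prime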
2020) or per-phrase basis (Zhang et al., 2023; Deng et al., 2022). However, these methods rely on primitive operations over the text, are parametric in that they rely on at least one auxiliary reward model, and are tied to numerical reward functions, whereas our scoring function could be anything, even a text comment from a user (we use GPT itself for feedback).
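To make this contrast concrete, the following minimal sketch shows two interchangeable scorers, one numerical and one textual; the function names (metric_score, text_feedback_score) and the classify/critique callables are illustrative assumptions rather than part of any released implementation.

from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, gold label)

def metric_score(prompt: str, minibatch: List[Example],
                 classify: Callable[[str, str], str]) -> float:
    # Conventional numerical reward: accuracy of prompt-driven predictions
    # on a sampled minibatch.
    hits = sum(classify(prompt, text) == gold for text, gold in minibatch)
    return hits / len(minibatch)

def text_feedback_score(prompt: str, error_batch: List[Example],
                        critique: Callable[[str], str]) -> str:
    # Equally valid in our framework: a free-form natural-language critique
    # (from a user, or from GPT itself) of the prompt's behavior on a batch
    # of misclassified examples.
    mistakes = "\n".join(f"Text: {t} (gold label: {y})" for t, y in error_batch)
    return critique(
        f"The prompt '{prompt}' mislabeled these inputs:\n{mistakes}\n"
        "In one sentence, explain what is wrong with the prompt."
    )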
Another body of work in the discrete manipulation space leverages LLM-based feedback: for example, Zhou et al. (2022) proposed the LLM-generated Monte Carlo sampling method that is represented by our MC baseline, and Prasad et al. (2022) features an evolutionary search through prompts that are generated by LLM-paraphrasing and swapping chunks of the original prompt. Concurrent to our work, Chen et al. (2023) propose editing SQL-generation prompts based on output feedback. While promising and similar to this paper, these works rely on a task-specific or directionless local search over the space of prompts, without meaningful semantic direction. Furthermore, such works often focus on generating prompts from scratch (Honovich et al., 2022), whereas it is trivial for humans to write a quick first draft (given, e.g., a vague description of the desired behavior). Ours is a general method that can be applied to any task to introduce meaningful semantic improvements to the prompts.

References

Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos. 2010. Best arm identification in multi-armed bandits. In COLT, pages 41–53.

Steven Bird. 2006. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pages 69–72.

Sébastien Bubeck, Nicolò Cesa-Bianchi, et al. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122.

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712.

Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. 2023. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128.

Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, and Zhiting Hu. 2022. RLPrompt: Optimizing discrete text prompts with reinforcement learning. arXiv preprint arXiv:2205.12548.

Ibrahim Abu Farha and Walid Magdy. 2020. From Arabic sentiment analysis to sarcasm detection: The ArSarcasm dataset. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 32–39.
Output:

1.2 Initial Prompts

In order to accurately reflect realistic LLM development scenarios, our initial prompts p0 were written by professional machine learning engineers in one quick pass, with the engineer simply being told to write a description of the desired LLM behavior. Our starting prompts, therefore, are as follows (note that the "Examples" section was dynamically filled with a randomly sampled pair of few-shot examples; a short sketch of this instantiation step follows the Sarcasm listing below).

Sarcasm
# Task
Is this tweet sarcastic?
# Output format
Answer Yes or No as labels
# Examples
{ examples }
# Prediction
Text: { text }
Label:
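As a concrete illustration of how the { examples } and { text } slots are filled at query time, the sketch below instantiates the Sarcasm template above; the helper name fill_prompt, the use of Python's str.format, and the placeholder spelling {examples} (without spaces) are our own illustrative choices rather than the exact code used in our experiments.

import random
from typing import List, Tuple

SARCASM_TEMPLATE = """# Task
Is this tweet sarcastic?
# Output format
Answer Yes or No as labels
# Examples
{examples}
# Prediction
Text: {text}
Label:"""

def fill_prompt(template: str, train_set: List[Tuple[str, str]],
                text: str, k: int = 2) -> str:
    # Randomly sample k few-shot examples and render them in the same
    # Text/Label format expected by the prediction slot.
    shots = random.sample(train_set, k)
    examples = "\n".join(f"Text: {t}\nLabel: {y}" for t, y in shots)
    return template.format(examples=examples, text=text)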
Jailbreak
# Task
Detect if the message is a jailbreak attack,
i.e. an attempt by a user to break through an
AI system's protections
# Output format
Answer Yes or No as labels
# Examples
{ examples }
# Prediction
Text: { text }
Label:
Ethos
# Task
Is the following text hate speech?
# Output format
Answer Yes or No as labels
# Examples
{ examples }