Learning To Retrieve Iteratively For In-Context Learning
Yunmo Chen* , Tongfei Chen, Harsh Jhamtani, Patrick Xia, Richard Shin†
Jason Eisner, Benjamin Van Durme
Microsoft
[email protected], {tongfeichen,hjhamtani,patrickxia,jeisner,ben.vandurme}@microsoft.com
Abstract

We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. Finding an optimal portfolio of retrieved items is a combinatorial optimization problem, generally considered NP-hard. This approach provides a learned approximation to such a solution, meeting specific task requirements under a given family of large language models (LLMs). We propose a training procedure based on reinforcement learning, incorporating feedback from LLMs. We instantiate an iterative retriever for composing in-context learning (ICL) exemplars and apply it to various semantic parsing tasks that demand synthesized programs as outputs. By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever, outperforming previous methods in selecting ICL exemplars on semantic parsing datasets such as SMCalFlow, TreeDST, and MTOP. Additionally, the trained iterative retriever generalizes across different inference LLMs beyond the one used during training.

Figure 1: Above: ICL under a single retriever call. Below: ICL under our proposed iterative retriever.
1 Introduction

A significant emergent capability of large language models (LLMs) is in-context learning (ICL; Brown et al., 2020), which facilitates few-shot learning. In ICL, a set of exemplars1 is usually provided to build the mapping relationship between inputs and outputs. These exemplars can either be hand-crafted and fixed, or retrieved from a training set. However, if retrieving from the dataset, the retrievers used in such applications are typically off-the-shelf models (e.g., Contriever (Izacard et al., 2022)) that consider neither the interactions among retrieved items when multiple targets are required, nor the specific characteristics of the inference LLMs and downstream task requirements. Research (Gao et al., 2021; Liu et al., 2022; Lu et al., 2022, i.a.) has shown that ICL is sensitive to both the exemplars provided and their order within prompts. Off-the-shelf retrievers, which generally rank items based solely on semantic similarity (Lee et al., 2019; Reimers and Gurevych, 2019a, i.a.), do not ensure optimal conditions for either criterion, leading to suboptimal performance in downstream LLM generation. Hence, there is a need for a retriever capable of constructing a portfolio of items tailored to achieve optimal generation with LLMs.

We propose iterative retrieval to address this problem. Unlike traditional retrievers that perform a single call to obtain a list of similar items ordered by their similarities, iterative retrieval involves a sequence of retrieval calls, each using a different query vector. This makes the retriever stateful: it maintains an internal state. The process can be likened to navigating the encoding space of exemplars, with each step adjusting direction based on previously selected exemplars, thus building a trajectory of exemplar selections.

This approach can be formulated as a Markov decision process (MDP). At each step, the action taken by the retriever is a retrieval call that fetches (potentially multiple) documents from the dataset D.2 The policy is trained to optimally select exemplars at each step so that the overall trajectory maximizes the reward, leading to better ICL performance. By leveraging the LLMs as environments, we create simulators that allow a policy to roll out in the environment and receive feedback on the effectiveness of the composed prompts, measured by a reward (metric). Thus, exemplar selection and prompt composition can be framed as policy optimization aimed at maximizing rewards, which can be addressed through reinforcement learning.

We situate our study in in-context semantic parsing due to its difficulty, popularity, and practical value.3 We instantiate an iterative retriever and investigate the performance of policy learning under this setup. Our contributions include:
• We propose a novel iterative retrieval framework that builds a portfolio of exemplars for ICL, considering both interactions among retrieved exemplars and their relationship with LLMs;

• We instantiate this iterative retriever for the in-context semantic parsing task and train its policy via reinforcement learning, demonstrating superior performance over strong baselines from prior work, thereby proving its effectiveness;

• Through a series of analyses, we provide insights into the behaviors of an iterative retriever initialized with an off-the-shelf retriever.

* Johns Hopkins University; performed while interning at Microsoft.
† Google; performed while at Microsoft.
1 An exemplar is a tuple of input and output, demonstrating the mapping relationship between the two.
2 The action space is at least as large as D.
3 Code generation is considered one of the most useful but challenging techniques in the era of LLMs. Some semantic parsing tasks share structural similarity with code generation and program synthesis.
2 Overview of an Iterative Retriever

We consider the problem of in-context learning (ICL): given a dataset D = {(x_i, y_i)}_i of exemplars, a retriever R retrieves a sequence of exemplars R(x) based on the input query x, and the answer y is generated from the distribution P_LM(· | x; R(x)). This retriever R : X → D^K retrieves an ordered list (of length K) of exemplars for the LM. The goal of the retriever R is to select a sequence of exemplars ((x_i, y_i))_{1≤i≤K} such that the probability of the expected output y is maximized:

    arg max_{(x_i, y_i) ∈ D} P_LM(y | x; ((x_i, y_i))_{1≤i≤K}).    (1)

However, this is a combinatorial optimization problem that is computationally infeasible to solve exactly. Much prior work resorts to selecting the top-k exemplars based on a scoring function S:

    R(x) = arg top-k_{(x', y') ∈ D} S(x, (x', y'))    (2)

Prior work has differed on the choice of the scoring function S: BM25 (Roy et al., 2023), coverage (Gupta et al., 2022), etc. However, such methods do not model the interaction between the retrieved exemplars and the language model.
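For concreteness, Equation 2's single-call baseline can be sketched as follows, with a dense dot-product similarity standing in for S; the function names and pre-encoded exemplar matrix are illustrative, not a specific prior system's implementation:

    import numpy as np

    def topk_retrieve(query_vec: np.ndarray,
                      exemplar_vecs: np.ndarray,
                      k: int) -> list[int]:
        """Single-call baseline (Eq. 2): score every exemplar once with
        S(x, (x', y')) = dot(query, exemplar), then take the top k."""
        scores = exemplar_vecs @ query_vec       # one score per exemplar in D
        return np.argsort(-scores)[:k].tolist()  # indices of the top-k exemplars

Note that the scores, and hence the selected set, are fixed once the query is embedded; nothing in this call depends on which exemplars were already chosen.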
We propose an iterative version, where we create a retrieval state s, and at each step i one exemplar (x, y) ∈ D is retrieved. This is an approximation to the optimization problem in Equation 1.

    (x_i, y_i) ← R_step(s_i)    (3)
    s_{i+1} ← τ(s_i, (x_i, y_i))    (4)

After K steps, the retrieved sequence would be R_iter(x) = ((x_i, y_i))_{1≤i≤K}. This formulation of an iterative retriever naturally fits the definition of a Markov decision process (MDP). Here, our decision process comprises (D*, D, τ, r), where

• The state set D* contains exemplar sequences whose elements are in D;

• The action set is just D: each action selects one exemplar from D. In theory, more than one exemplar can be selected at each step, but we proceed with just one exemplar for simplicity;

• The transition function τ : D* × D → D* appends an exemplar to the existing sequence;

• The reward function r : D* × D → R funnels signal from the LLM back to the retriever; it will be discussed in §4.
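Concretely, the rollout induced by Equations 3 and 4 can be sketched as below, with R_step, τ, and the reward hook left as the components instantiated in §3 and §4 (names here are illustrative):

    from typing import Callable, Sequence, Tuple

    Exemplar = Tuple[str, str]  # an (x, y) pair drawn from D

    def rollout(s0,
                r_step: Callable,    # Eq. 3: state -> one exemplar (the action)
                tau: Callable,       # Eq. 4: (state, exemplar) -> next state
                reward: Callable,    # r: LLM feedback, discussed in Section 4
                K: int) -> Tuple[Sequence[Exemplar], list]:
        """One episode of the iterative-retriever MDP: K retrieval steps."""
        s, trajectory, rewards = s0, [], []
        for _ in range(K):
            exemplar = r_step(s)              # action: retrieve one exemplar
            rewards.append(reward(s, exemplar))
            trajectory.append(exemplar)
            s = tau(s, exemplar)              # append exemplar to the state
        return trajectory, rewards            # R_iter(x) and per-step rewards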
By situating our proposed iterative retriever under this RL scenario, we can utilize a wide range of RL techniques to train the retriever from the environment, which is the LLM itself. In the next section, we instantiate a neural iterative retriever and situate it under a common task, namely semantic parsing, under this ICL framework.

3 Instantiating an Iterative Retriever

We consider an instance of in-context learning, namely few-shot semantic parsing. Given a natural language query x, a model is expected to output a semantic representation y of x given a sequence of exemplars (see Figure 3).

Figure 2: ICL prompt construction for an example in SMCalFlow. Above: ICL with BM25 as the retriever. Below: an instance of our iterative retriever. BM25 retrieves examples that overlap lexically with the query, whereas the trained iterative retriever is better at retrieving structurally similar exemplars, since it is trained to maximize the probability of the LM generating the reference parse.

Figure 3: Samples of (x, y) pairs for semantic parsing under the different datasets used in this paper.

    CalFlow:  "When is my next staff meeting scheduled for?" →
      (Yield :output (Event.start :obj (FindNumNextEvent
        :constraint (Event.subject_? :obj (?~= "staff meeting")) :number 1L)))

    TreeDST:  "Hey assistant, what is the price range of Stazione restaurant?" →
      (plan (Find :focus (Restaurant.priceRange_? always)
        :object (Restaurant.restaurantName_? (?= "Stazione restaurant"))))

    MTOP:     "Set up a reminder to message Mike at 7pm tonight." →
      [IN:CREATE_REMINDER [SL:TODO [IN:SEND_MESSAGE
        [SL:METHOD_MESSAGE message] [SL:RECIPIENT Mike]]]
        [SL:DATE_TIME at 7pm tonight]]
We instantiate a neural iterative retriever based on the formulation we proposed above:

• The state of the MDP, i.e., the sequence of exemplars, is modeled by a fixed-length vector s ∈ R^d. The initial state s_0 is a parameter.

• At each step, one exemplar is retrieved. We define a policy distribution that picks one exemplar from the training set D, similar to Lu et al. (2023):

    π((x_i, y_i) | s_i) ∝ exp(Q(s_i) · F_enc(x_i) / β)    (5)

where Q : R^d → R^d maps a state vector s_i to a query vector q_i, F_enc : V* → R^d is a text embedder that maps a text sequence into a vector, and β is a temperature hyperparameter. In our experiments, F_enc is initialized with the weights of Contriever (Izacard et al., 2022), a general-purpose text embedder trained for retrieval. Under this policy, if we take greedy decoding, the retrieval step is just

    (x_i, y_i) ← R_step(s_i) = arg max_{(x', y') ∈ D} π((x', y') | s_i)
                             = arg max_{(x', y') ∈ D} Q(s_i) · F_enc(x')    (6)

This is a maximum inner product search (MIPS) problem, and thus can be solved with a vector index such as FAISS (Douze et al., 2024).

• The state transition is modeled by a gated recurrent unit (GRU; Chung et al., 2014) update:

    s_{i+1} ← GRU(s_i, F_enc(x_i))    (7)

where the encoded vector of the retrieved exemplar x_i is passed to the GRU to update the state.4
Note that the only additional parameters we include in this neural iterative retriever are those of the state transition model, which we instantiate as a GRU. This is different from a regular retriever, where a single retrieval call to the training set, R(x) = arg max_{(x, y) ∈ D} q · F_enc(x), is made. The iterative retriever navigates the encoding space of exemplars, adjusting the query vector q' at each step based on the previously selected exemplars tracked in s, thus steering the search process to find new candidates. Figure 2 demonstrates the process of such an iterative retriever. This stateful design allows for optimized retrieval results through iterative interactions, incorporating signals from both external sources (LLMs) and internal states (previously retrieved items tracked via state transitions).

4 Using a Transformer decoder here results in more unstable training, as we discovered in our experiments. See §6.1.
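Putting Equations 5-7 together, a single retrieval step can be sketched as follows. The FAISS inner-product index reflects the MIPS observation above, and the GRUCell mirrors Equation 7; the dimensions, the random stand-in embeddings, and the variable names are illustrative, not our exact code:

    import faiss
    import numpy as np
    import torch

    d = 768                                       # embedding dim (illustrative)
    exemplar_embeddings = np.random.rand(1000, d).astype("float32")  # stand-in for F_enc over D
    index = faiss.IndexFlatIP(d)                  # exact inner-product (MIPS) index
    index.add(exemplar_embeddings)

    gru = torch.nn.GRUCell(input_size=d, hidden_size=d)  # added state-transition parameters
    Q = torch.nn.Linear(d, d)                     # maps state s_i to query vector q_i

    def retrieve_step(s: torch.Tensor) -> int:
        """Eq. 6: greedy decoding of the policy reduces to MIPS over the index."""
        q = Q(s).detach().numpy().reshape(1, -1).astype("float32")
        _, ids = index.search(q, 1)               # arg max of inner products
        return int(ids[0, 0])                     # index of the selected exemplar

    def transition(s: torch.Tensor, x_vec: torch.Tensor) -> torch.Tensor:
        """Eq. 7: fold the retrieved exemplar's encoding into the state."""
        return gru(x_vec.unsqueeze(0), s.unsqueeze(0)).squeeze(0)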
4 Training

The reward measures how much appending the exemplar (x_i, y_i) changes the probability of the reference output y*, given the existing exemplar sequence s_i:

    r(s_i, x_i) = P_LM(y* | x; s_i, (x_i, y_i)) − P_LM(y* | x; s_i).    (8)
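A sketch of computing this incremental reward from LM scores follows; here score_fn is a hypothetical helper that returns the log-probability of y* under the prompt built from a given exemplar list (summing token log-probs under the inference LLM):

    import math

    def reward(score_fn, x: str, y_star: str,
               prefix_exemplars: list, new_exemplar: tuple) -> float:
        """Eq. 8: change in P_LM(y* | x; ...) from appending one exemplar.
        `score_fn(exemplars, x, y_star)` is a hypothetical helper returning
        log P_LM(y_star | x; exemplars)."""
        with_new = score_fn(prefix_exemplars + [new_exemplar], x, y_star)
        without = score_fn(prefix_exemplars, x, y_star)
        return math.exp(with_new) - math.exp(without)  # difference of probabilities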
Renormalization  A renormalized action distribution is created with temperature β_renorm (see Table 4).

Stratified sampling  Training samples are drawn via the stratified sampling scheme described in Figure 4, rather than directly from the rollout buffer.

5 Experimental Setup

Datasets  We validate our pilot iterative retriever for ICL on a set of semantic parsing datasets, namely SMCalFlow (Andreas et al., 2020), TreeDST (Cheng et al., 2020), and MTOP (Li et al., 2021).
is more representative for models sharing the same pre-training procedure. Notably, with Mistral, Contriever performs worse than BM25 on MTOP, but IterR still shows improvement. This suggests that IterR, comprising a frozen EPR and additional GRU layers, can learn task-specific abilities not present in the vanilla EPR.
ICL & Number of Exemplars  We investigated how the performance of IterR changes with the number of exemplars ({1, · · · , 10}) used for ICL on the SMCalFlow dataset (Figure 6). IterR consistently outperforms baseline models across various metrics and numbers of exemplars, with one exception for the EM@3 metric when using 6 exemplars. This aligns with our training objective, where actions that boost performance at each step receive higher advantages. IterR achieves comparable performance with fewer exemplars.

CEIL shows a similar trend in EM, but its SMatch performance lags significantly, indicating poorer quality in near-miss predictions compared to IterR. Practically, this means our method allows for a trade-off between performance and cost, enabling effective ICL with fewer exemplars and reducing the number of tokens processed by LLMs.

6.1 Ablation Study

We further conduct an ablation study on the components of an iterative retriever, focusing on the SMCalFlow dataset and using Llama-2-7b while changing the configuration of the iterative retriever. Results are reported in Table 2.

    Variant                         EM@1    SMatch-F
    Contriever                      44.0    67.6
    IterR                           54.1    77.3
    − EPR initialization            45.1    68.8
    − GRU; + Transformer decoder    50.1    75.1
    − Stratified sampling           52.3    75.7

Table 2: Results of the ablation study. − EPR initialization indicates the model is trained from Contriever instead of an EPR-finetuned checkpoint. + Transformer decoder replaces the GRU with a Transformer decoder. − Stratified sampling replaces the stratified sampling described in Figure 4 with sampling directly from the buffer.

EPR Initialization  Although we follow prior work in using EPR as the initialization for F_enc, our iterative retriever is agnostic to the choice of base encoder for similarity search. Even without EPR initialization, our training procedure still improves performance over Contriever (≈ 1% gain under Contriever, but ≈ 6% gain under EPR). We see that IterR benefits more when using EPR initialization.
Figure 6: Performance comparisons across the various numbers of exemplars used for ICL. (Panels: EM@1, EM@3, and SMatch on CalFlowV2 versus the number of exemplars; systems: EPR, CEIL, and IterR.)
References

Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, and Lewis Tunstall. 2024. The N+ implementation details of RLHF with PPO: A case study on TL;DR summarization. CoRR, abs/2403.17031.

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised dense information retrieval with contrastive learning. Trans. Mach. Learn. Res., 2022.

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. Mistral 7B. CoRR, abs/2310.06825.

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP 2023, Koblenz, Germany, October 23-26, 2023, pages 611–626. ACM.

Irene Langkilde and Kevin Knight. 1998. Generation that exploits corpus-based statistical knowledge. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, pages 704–710, Montreal, Quebec, Canada. Association for Computational Linguistics.

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6086–6096, Florence, Italy. Association for Computational Linguistics.

Itay Levy, Ben Bogin, and Jonathan Berant. 2023. Diverse demonstrations improve in-context compositional generalization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1401–1422, Toronto, Canada. Association for Computational Linguistics.

Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. 2022. What makes good in-context examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 100–114, Dublin, Ireland and Online. Association for Computational Linguistics.

Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, and Ashwin Kalyan. 2023. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net.

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. 2022. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8086–8098, Dublin, Ireland. Association for Computational Linguistics.

Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783.

Inbar Oren, Jonathan Herzig, and Jonathan Berant. 2021. Finding needles in a haystack: Sampling structurally-diverse training sets from synthetic data for compositional generalization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10793–10809, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Nils Reimers and Iryna Gurevych. 2019a. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 3980–3990. Association for Computational Linguistics.

Nils Reimers and Iryna Gurevych. 2019b. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.

Stephen E. Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr., 3(4):333–389.

Subhro Roy, Samuel Thomson, Tongfei Chen, Richard Shin, Adam Pauls, Jason Eisner, and Benjamin Van Durme. 2023. BenchCLAMP: A benchmark for evaluating language models on syntactic and semantic parsing. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023.

Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton-Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, and Gabriel Synnaeve. 2023. Code Llama: Open foundation models for code. CoRR, abs/2308.12950.

Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2022. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655–2671, Seattle, United States. Association for Computational Linguistics.

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel. 2016. High-dimensional continuous control using generalized advantage estimation. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.

Richard Shin and Benjamin Van Durme. 2022. Few-shot semantic parsing with language models trained on code. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5417–5425, Seattle, United States. Association for Computational Linguistics.

Richard S. Sutton, David A. McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, NIPS 1999, Denver, Colorado, USA, pages 1057–1063. The MIT Press.

Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. 2020. On layer normalization in the Transformer architecture. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 10524–10533. PMLR.

Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, and Lingpeng Kong. 2023. Compositional exemplars for in-context learning. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 39818–39833. PMLR.

Yiming Zhang, Shi Feng, and Chenhao Tan. 2022. Active example selection for in-context learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9134–9148, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Table 4: Hyperparameters and other reproducibility information for IterR. β_renorm is the temperature used to create a renormalized action distribution. c_1 and c_2 are coefficients used in the PPO loss. γ and λ are discount factors used in GAE.
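For reference, a sketch of how c_1, c_2, γ, and λ enter a standard PPO objective with GAE (Schulman et al., 2016); this mirrors the textbook formulation rather than our exact training code:

    import torch

    def gae(rewards, values, gamma, lam):
        """Generalized advantage estimation; bootstraps 0 past the last step."""
        advantages, a = [], 0.0
        for t in reversed(range(len(rewards))):
            next_v = values[t + 1] if t + 1 < len(values) else 0.0
            delta = rewards[t] + gamma * next_v - values[t]  # TD residual
            a = delta + gamma * lam * a
            advantages.append(a)
        return list(reversed(advantages))

    def ppo_loss(ratio, adv, value_err, entropy, c1, c2, eps=0.2):
        """Clipped PPO objective: policy term - c1 * value loss + c2 * entropy."""
        clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
        policy_term = torch.min(ratio * adv, clipped).mean()
        return -(policy_term - c1 * value_err.pow(2).mean() + c2 * entropy.mean())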
Figure 7: Example AMR based on the previous parse.

This AMR can be easily converted to the following triples.

    instance($0, Yield)
    output($0, $1)
    instance($1, Event.start)
    obj($1, $2)
    instance($2, FindNumNextEvent)
    constraint($2, $3)
    instance($3, Event.subject_?)
    obj($3, $4)
    instance($4, ?~=)
    ARG0($4, "staff meeting")
    number($2, 1L)
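As an illustration of this conversion, the following self-contained sketch parses the Lisp-style expression from Figure 3 and emits exactly the triples above; keyword arguments become named roles, and positional arguments become ARG0, ARG1, etc. (illustrative code, not necessarily the conversion script we used):

    import re

    def tokenize(s):
        # Keep quoted strings intact; split parens into separate tokens.
        return re.findall(r'"[^"]*"|[()]|[^\s()]+', s)

    def parse(tokens):
        """Read one s-expression: a (head :key arg ...) node or an atom."""
        tok = tokens.pop(0)
        if tok != "(":
            return tok                        # atom: symbol, literal, or string
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)                         # drop the closing ")"
        return node

    def to_triples(node, triples, counter):
        """Emit instance/role triples for a parse tree, depth-first."""
        nid = f"${counter[0]}"; counter[0] += 1
        triples.append(f"instance({nid}, {node[0]})")
        args, pos = node[1:], 0
        while args:
            if isinstance(args[0], str) and args[0].startswith(":"):
                role, val, args = args[0][1:], args[1], args[2:]   # keyword role
            else:
                role, val, args, pos = f"ARG{pos}", args[0], args[1:], pos + 1
            if isinstance(val, list):         # nested node: link by id, recurse
                triples.append(f"{role}({nid}, ${counter[0]})")
                to_triples(val, triples, counter)
            else:                             # literal leaf such as 1L or a string
                triples.append(f"{role}({nid}, {val})")
        return nid

    toks = tokenize('(Yield :output (Event.start :obj (FindNumNextEvent '
                    ':constraint (Event.subject_? :obj (?~= "staff meeting")) '
                    ':number 1L)))')
    triples = []
    to_triples(parse(toks), triples, [0])
    print("\n".join(triples))                 # reproduces the triples above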
A.3 Prompt Template

The prompt template used across all our experiments is shown in Table 5.

    Let's translate what a human user says into what a computer might say.

    Human: x_1
    Computer: y_1
    ···
    Human: x_N
    Computer: y_N
    Human: x
    Computer:
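A minimal sketch of filling this template from a retrieved exemplar sequence (function and variable names are illustrative):

    def build_prompt(exemplars, x):
        """Fill the prompt template above with retrieved (x_i, y_i) pairs."""
        lines = ["Let's translate what a human user says "
                 "into what a computer might say.", ""]
        for x_i, y_i in exemplars:        # exemplars from R_iter(x), in order
            lines.append(f"Human: {x_i}")
            lines.append(f"Computer: {y_i}")
        lines.append(f"Human: {x}")
        lines.append("Computer:")         # the LM continues from here
        return "\n".join(lines)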