
Dynamic Planning with a LLM

Gautier Dagan Frank Keller Alex Lascarides


School of Informatics
University of Edinburgh, UK
[email protected], {keller, alex}@inf.ed.ac.uk

Abstract

While Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, applications involving embodied agents remain problematic. In particular, complex plans that require multi-step reasoning become difficult and too costly as the context window grows. Planning requires understanding the likely effects of one's actions and identifying whether the current environment satisfies the goal state. While symbolic planners find optimal solutions quickly, they require a complete and accurate representation of the planning problem, severely limiting their use in practical scenarios. In contrast, modern LLMs cope with noisy observations and high levels of uncertainty when reasoning about a task. Our work presents LLM Dynamic Planner (LLM-DP): a neuro-symbolic framework where an LLM works hand-in-hand with a traditional planner to solve an embodied task. Given action-descriptions, LLM-DP solves Alfworld faster and more efficiently than a naive LLM ReAct baseline.

1 Introduction

Large Language Models (LLMs), like GPT-4 (OpenAI, 2023), have proven remarkably effective at various natural language processing tasks, particularly in zero-shot or few-shot settings (Brown et al., 2020). However, employing LLMs in embodied agents, which interact with dynamic environments, presents substantial challenges. LLMs tend to generate incorrect or spurious information, a phenomenon known as hallucination, and their performance is brittle to the phrasing of prompts (Ji et al., 2022). Moreover, LLMs are ill-equipped for naive long-term planning, since managing an extensive context over multiple steps is complex and resource-consuming (Silver et al., 2022; Liu et al., 2023).

Various approaches have aimed to mitigate some of these limitations. For instance, methods like Chain-of-Thought (Wei et al., 2022) and Self-Consistency (Wang et al., 2023b) augment the context with reasoning traces. Other, agent-based approaches, such as ReAct (Yao et al., 2023), integrate feedback from the environment iteratively, giving the agent the ability to take 'thinking' steps to augment its context with a reasoning trace. However, these approaches frequently involve high computational costs due to the iterated invocations of LLMs, and they still face challenges in dealing with the limits of the context window and in recovering from hallucinations, which can compromise the quality of the plans.

Conversely, traditional symbolic planners, such as the Fast-Forward planner (Hoffmann and Nebel, 2001) or the BFS(f) planner (Lipovetzky et al., 2014), excel at finding optimal plans efficiently. But symbolic planners require problem and domain descriptions as prerequisites (McDermott, 2000), which hampers their applicability in real-world scenarios where it may be infeasible to meet these high informational demands. For instance, knowing a complete and accurate description of the goal may not be possible before exploring the environment through actions.

Previous work by Liu et al. (2023) has shown that LLMs can generate valid problem files in the Planning Domain Definition Language (PDDL) for many simple examples. Yet the problem of incomplete information remains: agents often need to interact with the world to discover their surroundings before optimal planning can be applied. Some versions of PDDL have been proposed in the past to deal with probabilities or Task and Motion Planning, such as PPDDL and PDDLStream (Younes and Littman, 2004; Garrett et al., 2018), but these still assume a human designer encoding the agent's understanding of the domain and the planning problem, rather than the agent learning from interactions. Therefore, where modern LLMs need minimal information to figure out a task, e.g. through few-shot or in-context learning (Honovich et al., 2022; Chen et al., 2022; Min et al., 2022), traditional planners need maximal information.
Figure 1: LLM Dynamic Planner (LLM-DP). The LLM grounds observations and processes natural language instructions into PDDL to use with a symbolic planner. This model can solve plans for unobserved or previously unknown objects because the LLM generates plausible predicates for relevant objects through semantic and pragmatic inference. Through sampling possible predicates, multiple plans can be found, and an Action Selector decides whether to act, review its understanding of the problem, or ask clarification questions.

In this work, we introduce the LLM Dynamic Planner (LLM-DP), a neuro-symbolic framework that integrates an LLM with a symbolic planner to solve embodied tasks (our code is available at github.com/itl-ed/llm-dp). LLM-DP capitalises on the LLM's ability to understand actions and their impact on the environment and combines it with the planner's efficiency in finding solutions. Using domain knowledge, LLM-DP solves the Alfworld test set faster and more efficiently than an LLM-only (ReAct) approach. The remainder of this paper explores the architecture of LLM-DP, discusses how to combine the strengths of LLMs and symbolic planning, and presents potential research avenues for future work in LLM-driven agents.

2 Related Work

Symbolic Planners. Symbolic planners have been a cornerstone of automated planning and artificial intelligence for decades (Fikes and Nilsson, 1971). Based on formal logic, they operate over symbolic representations of the world to find a sequence of actions that transition from an initial state to a goal state. Since the introduction of PDDL (McDermott, 2000), the AI planning community has developed an array of efficient planning algorithms. For example, the Fast-Forward planner (FF) (Hoffmann and Nebel, 2001) employs heuristics derived from a relaxed version of the planning problem. Similarly, the BFS(f) planner (Lipovetzky et al., 2014) combines breadth-first search and specialised heuristics. These planners find high-quality or optimal solutions quickly in well-defined domains. However, their up-front requirement for comprehensive problem and domain descriptions limits their applicability in complex real-world settings where complete information may not be available.

LLMs in Planning and Reasoning. In contrast to symbolic planners, LLMs have shown promise in adapting to noisy planning and reasoning tasks through various methods. Some general approaches, such as Chain-of-Thought (Wei et al., 2022), Self-Consistency (Wang et al., 2023b), and Reasoning via Planning (Hao et al., 2023), augment the context with a reasoning trace that the LLM generates to improve its final prediction. Alternatively, an LLM can be given access to tools/APIs (Schick et al., 2023; Patil et al., 2023), outside knowledge or databases (Peng et al., 2023; Hu et al., 2023), code (Surís et al., 2023), and even symbolic reasoners (Yang et al., 2023) to enrich its context and ability to reason. The LLM can trigger these external sources of information or logic (through fine-tuning or prompting) to obtain additional context and improve its downstream performance.

Embodied Agents with LLMs. In a parallel direction, recent works such as ReAct (Yao et al., 2023), Reflexion (Shinn et al., 2023), AutoGPT (Significant-Gravitas, 2023), and Voyager (Wang et al., 2023a) take an agent-based approach and augment the reasoning process through a closed 'while' loop that feeds environment observations back to the LLM. ReAct (Yao et al., 2023) allows the LLM agent to either take an action or a 'thinking' step. This allows the LLM to augment its context with its own reasoning, which can be seen as agent-driven Chain-of-Thought prompting.
Voyager (Wang et al., 2023a) incrementally builds an agent's capabilities from its interactions with the environment and an accessible memory component (skill library). While many of these works show promising results in building general executable agents in embodied environments (Wang et al., 2023a), they still require many expensive calls to the LLMs, are limited by the LLM's context window, and do not guarantee optimal plans.

3 Alfworld

Alfworld (Shridhar et al., 2020) is a text-only home environment where an agent is given one of seven possible task types, such as interacting with one or more objects and placing them in a specific receptacle. At the start of each episode, the goal is given in natural language, and the initial observation does not include the location of any objects. Therefore, an agent must navigate the environment to search for the relevant objects and perform the correct actions. The possible locations of the environment are known, and the agent can navigate to any receptacle by using a 'go to' action. However, since none of the objects' locations are initially observed, the agent must be able to plan around uncertainty, estimate where objects are likely to be observed, and adjust accordingly.

4 LLM-DP

To tackle an embodied environment like Alfworld, we introduce the Large Language Model Dynamic Planner (LLM-DP), which operates as a closed-loop agent. LLM-DP uses a combination of language understanding and symbolic reasoning to plan and solve tasks in the simulated environment. The model tracks a world state W and beliefs B about predicates in the environment, uses an LLM to translate the task description into an executable goal state, and samples its beliefs to generate plausible world states. We describe the working of the LLM-DP agent as pseudo-code in Algorithm 1.

Algorithm 1: LLM-DP pseudo-code

Require: LLM, PG, AS, Domain, task, obs_0
goal ← LLM(Domain, task)
W, B ← observe(goal, obs_0)
while goal not reached do
    plans ← ∅
    for i = 1, ..., N do
        w_belief ← LLM(B, W)
        plans ← plans ∪ PG(w_belief ∪ W)
    end for
    action ← AS(plans)
    obs ← Env(action)
    W, B ← observe(action, obs)
end while

4.1 Assumptions

We make several simplifying assumptions when applying the LLM-DP framework to Alfworld:

1. Known action-descriptions and predicates: our input to the planner and the LLM requires the PDDL domain file, which describes what actions can be taken, their pre- and post-conditions, and what predicates exist.

2. Perfect observations: the Alfworld environment provides a perfect textual description of the current location. This observation also contains the intrinsic attributes of observed objects and receptacles, such as whether or not a given receptacle can be opened.

3. Causal environment: changes in the environment are entirely caused by the agent.

4. Valid actions always succeed.

4.2 Generating the Goal State

LLM-DP uses an LLM to generate a PDDL goal, given the natural language instruction (task) and the valid predicates defined by the PDDL domain file. Figure 1 shows an example task converted to a valid PDDL goal. For each episode, we use a set of three in-context examples that are fixed for the entire evaluation duration. We use the OpenAI gpt-3.5-turbo-0613 model with a temperature of 0 in all our LLM-DP experiments.
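As an illustration only (not the authors' exact implementation), the goal-generation call could look roughly like the sketch below, assuming the pre-1.0 openai Python client; DOMAIN_PREDICATES and FEW_SHOT_EXAMPLES stand in for the prompts shown in Table 3 and Table 4, and generate_goal is our own illustrative helper name.

    import openai

    # System prompt: the PDDL domain predicates shown in Table 3 (truncated here).
    DOMAIN_PREDICATES = "(define (domain alfred) (:predicates (isReceptacle ?o - object) ...))"

    # Fixed few-shot examples mapping a task to a PDDL (:goal ...), as in Table 4.
    FEW_SHOT_EXAMPLES = [
        ("Your task is to: put a clean plate in microwave.",
         "(:goal (exists (?t - plate ?r - microwave) (and (inReceptacle ?t ?r) (isClean ?t))))"),
    ]

    def generate_goal(task: str) -> str:
        """Translate a natural-language task into a PDDL :goal using the chat model."""
        messages = [{"role": "system", "content": DOMAIN_PREDICATES}]
        for example_task, example_goal in FEW_SHOT_EXAMPLES:
            messages.append({"role": "user", "content": example_task})
            messages.append({"role": "assistant", "content": example_goal})
        messages.append({"role": "user", "content": f"Your task is to: {task}"})
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",  # temperature 0 for deterministic output
            messages=messages,
            temperature=0,
        )
        return response["choices"][0]["message"]["content"]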
Average Accuracy (%)

Model                     clean  cool   examine  heat   put    puttwo  overall (↑)  LLM Tokens (↓)
LLM-DP                    0.94   1.00   1.00     0.87   1.00   0.94    0.96         633k
LLM-DP-random             0.94   1.00   1.00     0.87   0.96   1.00    0.96         67k
ReAct (Yao et al., 2023)  0.61   0.81   0.89     0.30   0.79   0.47    0.64         —*
ReAct (ours)              0.35   0.90   0.33     0.65   0.71   0.29    0.54         9.16M

(a) The average accuracy and number of LLM tokens processed (context + generation) for each model. *Not reported.

Average Episode Length

Model          clean  cool   examine  heat   put    puttwo  overall (↓)
LLM-DP         12.00  13.67  12.06    12.30  12.75  17.59   13.16
LLM-DP-random  15.06  17.14  10.56    14.04  14.62  18.94   15.02
ReAct (ours)   25.10  9.86   21.67    14.70  15.33  24.94   18.69

(b) The average episode length for each model, where the length of an episode denotes how many actions the agent has taken or attempted to take to complete a task. We do not count the 'thinking' action of ReAct as an action in this metric.

Table 1: Summary of model performance on the Alfworld test set. LLM-DP and LLM-DP-random differ in the sampling strategy of the belief: LLM-DP uses an LLM to generate n = 3 plausible world states, while LLM-DP-random randomly samples n = 3 plausible world states.

4.3 Sampling Beliefs

We parse the initial scene description into a structured representation of the environment W and a set of beliefs B. The internal representation of the world W contains all known information: for instance, all receptacles (possible locations) in the scene from the initial observation and their intrinsic attributes are known (e.g. a fridge holds the isFridge predicate). The set of beliefs B, in contrast, contains possible valid predicates that can be true or false and which the model does not have enough information to disambiguate. In Alfworld, the objects' locations are unknown; therefore, the set of possible predicates for each object includes all possible locations.

LLM-DP uses the stored observations W, the beliefs B, and an LLM to construct different planning problem files in PDDL. A PDDL problem file includes the objects observed (:objects), a representation of the current state of the world and the object attributes (:init), and the goal to be achieved (:goal). The goal is derived from the LLM (Section 4.2), while the objects and their attributes are obtained from W (observations) and the beliefs B holds about the objects.

Since B includes possible predicates whose truth values are unknown, we sample from B using an LLM to obtain w_belief. For instance, our belief could be that (inReceptacle tomato ?x), where ?x can be countertop, cabinet, fridge, etc. Since we want to condition the sampling of where the tomato can appear, we pass the known world state W, along with the predicate (in this case inReceptacle) and its options, to the LLM. This sampling leverages the LLM to complete a world state and is extendable to any unknown predicate from which a set of beliefs can be deduced. We also compare LLM sampling with random sampling (llmdp-random).
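The following is a minimal sketch of this sampling step under our own simplified representation (sets of predicate tuples); it is not the released LLM-DP code, and llm_choose_location is a stub standing in for the conditioned LLM call described above.

    import random

    # W: ground predicates known to be true from observations.
    world_state = {("isFridge", "fridge-1"), ("openable", "fridge-1")}

    # B: unknown predicates, each paired with the options it could take, e.g. the
    # tomato could be in any receptacle that has not been checked yet.
    beliefs = {("inReceptacle", "tomato-1"): ["countertop-1", "cabinet-1", "fridge-1"]}

    def llm_choose_location(predicate, obj, options, world_state):
        """Stub for the LLM call: LLM-DP prompts the chat model with W and the
        candidate options and asks for a plausible completion; here we simply
        return the first option so the sketch runs without an API key."""
        return options[0]

    def sample_world(beliefs, world_state, use_llm=True):
        """Complete W into one plausible world state (w_belief united with W)."""
        sampled = set(world_state)
        for (predicate, obj), options in beliefs.items():
            if use_llm:
                value = llm_choose_location(predicate, obj, options, world_state)
            else:
                value = random.choice(options)  # the llmdp-random baseline
            sampled.add((predicate, obj, value))
        return sampled

    # n = 3 sampled world states give three candidate planning problems.
    candidate_worlds = [sample_world(beliefs, world_state) for _ in range(3)]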
We describe our likely world state as the union of a sampled set of beliefs w_belief and the known world state W. By sampling i = 1, ..., N different sets of beliefs during the planning loop, we obtain N likely world states. Finally, we convert each likely world state to a list of predicates to interface with the PDDL planner.

4.4 Plan Generator

Upon constructing the different PDDL problems, the agent uses a Plan Generator (PG) to solve each problem and obtain a plan. We use the BFS(f) solver (Lipovetzky et al., 2014), implemented as an executable by LAPKT (Ramirez et al., 2015). A generated plan is a sequence of actions, where each action is represented in a symbolic form which, if executed, would lead to the goal state from the initial state.

4.5 Action Selector

The Action Selector (AS) module decides the agent's immediate next action. It takes the planner's output, a set of plans, and selects an action from them. In our Alfworld experiments, the Action Selector simply selects the shortest plan returned. If no valid plans are returned, then either all sampled world states already satisfy the goal, there is a mistake in the constructed domain or problem files, or the planner has failed to find a path to the goal. In the first case, we re-sample random world states and re-run the planners once.

We also propose exploring different strategies for when valid plans cannot be found. For instance, similarly to self-reflection (Shinn et al., 2023), the Action Selector could prompt an update to the agent's belief about the world state if none of the generated problem descriptions are solvable. The Action Selector could also interact with a human teacher or oracle to adjust its understanding of the environment (problem) or its logic (domain).
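A rough sketch of how the Plan Generator and Action Selector could be wired together is shown below. The planner invocation assumes a LAPKT BFS(f) binary named bfs_f whose exact command-line flags may differ from a given build, and to_pddl_problem is our own illustrative helper (object types omitted for brevity).

    import subprocess
    import tempfile

    def to_pddl_problem(world, goal: str) -> str:
        """Serialise one sampled world state into a PDDL problem string."""
        objects = " ".join(sorted({item for fact in world for item in fact[1:]}))
        init = " ".join("(" + " ".join(fact) + ")" for fact in sorted(world))
        return (f"(define (problem alfworld) (:domain alfred)\n"
                f"  (:objects {objects})\n  (:init {init})\n  {goal})")

    def plan(domain_file: str, problem_pddl: str) -> list[str] | None:
        """Plan Generator: run the external BFS(f) planner on one problem."""
        with tempfile.NamedTemporaryFile("w", suffix=".pddl", delete=False) as f:
            f.write(problem_pddl)
            problem_file = f.name
        # Flag names are assumed; adjust to the actual CLI of your planner build.
        result = subprocess.run(
            ["bfs_f", "--domain", domain_file, "--problem", problem_file],
            capture_output=True, text=True, timeout=30,
        )
        lines = result.stdout.strip().splitlines()
        return lines or None  # None means no plan was found for this world state

    def select_action(plans: list) -> str | None:
        """Action Selector: first action of the shortest valid plan (or None)."""
        valid = [p for p in plans if p]
        return min(valid, key=len)[0] if valid else None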
4.6 Observation Processing

LLM-DP uses the result of each action to update its internal state representation. It uses the symbolic effects of the action to infer changes in the state of the objects and receptacles. It then integrates the information from the new observation, which might reveal additional details not directly inferred from the action itself. For instance, opening an unseen drawer might reveal new objects inside. Observing also updates the beliefs: if an object is observed at a location, it cannot be elsewhere, and if an object is not observed at a location, it cannot be there. In this way, observations incorporate beliefs into W.

If the agent detects new information from the scene, such as discovering new objects, it triggers a re-planning process. The agent then generates a new set of possible PDDL problems using the updated state representation, and corresponding plans using the Plan Generator. This approach is similar to some Task and Motion Planning (TAMP) methods (Garrett et al., 2018; Chen et al., 2023), enabling the agent to adapt to environmental changes and unexpected outcomes of actions.
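Continuing the simplified representation from the earlier sketches, the belief update described above can be expressed as plain set operations (again our own illustration rather than the released code):

    def process_observation(location, seen_objects, world_state, beliefs):
        """Update W and B after observing a receptacle, e.g. after 'go to' or 'open'.
        Returns True if new information was discovered, which triggers re-planning."""
        new_information = False
        for obj in seen_objects:
            fact = ("inReceptacle", obj, location)
            if fact not in world_state:
                world_state.add(fact)                      # the object is here...
                beliefs.pop(("inReceptacle", obj), None)   # ...so it cannot be elsewhere
                new_information = True
        # Objects we believed might be here but did not see cannot be at this location.
        for (predicate, obj), options in beliefs.items():
            if predicate == "inReceptacle" and obj not in seen_objects and location in options:
                options.remove(location)
        return new_information

    # Example: the agent opens fridge-1 and only sees a potato inside.
    # process_observation("fridge-1", {"potato-1"}, world_state, beliefs)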
5 Results

We contrast the LLM-DP approach with ReAct (an LLM-only baseline) from the original implementation by Yao et al. (2023). Since we use a different backbone LLM (gpt-3.5-turbo rather than text-davinci-002) than the ReAct baseline for cost purposes, we also reproduce their results using gpt-3.5-turbo and adapt the ReAct prompts to a chat format.

As shown in Table 1, LLM-DP solves Alfworld almost perfectly (96%) compared to our baseline reproduction of ReAct (53%). LLM-DP can translate the task description into an executable PDDL goal 97% of the time, but sampling reduces the accuracy further when it fails to select a valid set of possible world states, for instance by sampling states where the goal is already satisfied.

We note that the ReAct baseline makes different assumptions about the problem: while it does not require a domain file containing the action-descriptions and object predicates, it uses two separate human-annotated episodes per example to bootstrap its in-context logic. ReAct also switches out which examples to use in-context based on the type of task, such that two examples of the same type of task being solved are always shown. We also find that our reproduction of ReAct is worse than the original and attribute this to the gpt-3.5-turbo model being more conversational than text-davinci-002, and thus less likely to output valid actions as it favours fluency over following the templated action language.

We also measure the length of each successful episode and find that LLM-DP reaches the goal state faster on average (13.16 actions) than ReAct (18.69 actions) and a random search strategy (15.02 actions). The Average Episode Length measures the number of actions taken in the environment and thus how efficient the agent is.

6 Conclusion

The LLM-DP agent effectively integrates language understanding, symbolic planning, and state tracking in a dynamic environment. It uses the language model to understand tasks and scenes expressed in natural language, constructs and solves planning problems to decide on a course of action, and keeps track of the world state to adapt to changes and make informed decisions. This workflow enables the agent to perform complex tasks in the Alfworld environment, making it a promising approach for embodied tasks that involve language understanding, reasoning, and decision-making.

LLM-DP offers a cost and efficiency trade-off between a wholly symbolic solution and an LLM-only model. The LLM's semantic knowledge of the world is leveraged to translate the problem into PDDL while guiding the search process through belief instantiation. We find that not only is LLM-DP cheaper on a per-token comparison, but it is also faster and more successful at long-term planning in an embodied environment. LLM-DP validates the need for LLM research to incorporate specialised tools, such as PDDL solvers, in embodied agents to promote valid planning.

Despite these promising results, numerous topics and unresolved issues remain open for future investigation. Key among these is devising strategies to encode the world model and belief, currently handled symbolically, and managing uncertain observations, particularly from an image model, along with propagating any uncertainty to the planner and Action Selector. We intentionally kept the Action Selector simple for our experiments, but future work may also explore different strategies to encourage self-reflection within the agent loop.
For instance, if all plans prove invalid, beliefs may be updated, or it might indicate an incorrect domain definition. Such instances may necessitate agents to interact with an instructor who can provide insights about action pre-conditions and effects. This direction could lead us from a static domain file towards an agent truly adaptable to new environments, fostering continual learning and adaptation.

Acknowledgements

This work was supported in part by the UKRI Centre for Doctoral Training in Natural Language Processing, funded by the UKRI (grant EP/S022481/1) at the University of Edinburgh, School of Informatics and School of Philosophy, Psychology & Language Sciences, and by the UKRI-funded TAS Governance Node (grant number EP/V026607/1).

References

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.

Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis, and He He. 2022. Meta-learning via language model in-context tuning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 719-730, Dublin, Ireland. Association for Computational Linguistics.

Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas A. Roy, and Chuchu Fan. 2023. AutoTAMP: Autoregressive task and motion planning with LLMs as translators and checkers. ArXiv, abs/2306.06531.

Richard E. Fikes and Nils J. Nilsson. 1971. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3):189-208.

Caelan Reed Garrett, Tomas Lozano-Perez, and Leslie Pack Kaelbling. 2018. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In International Conference on Automated Planning and Scheduling.

Shibo Hao, Yilan Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, and Zhiting Hu. 2023. Reasoning with language model is planning with world model. ArXiv, abs/2305.14992.

Jörg Hoffmann and Bernhard Nebel. 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14:253-302.

Or Honovich, Uri Shaham, Samuel R. Bowman, and Omer Levy. 2022. Instruction induction: From few examples to natural language task descriptions. ArXiv, abs/2205.10782.

Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Jake Zhao, and Hang Zhao. 2023. ChatDB: Augmenting LLMs with databases as their symbolic memory. ArXiv, abs/2306.03901.

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Wenliang Dai, Andrea Madotto, and Pascale Fung. 2022. Survey of hallucination in natural language generation. ACM Computing Surveys, 55:1-38.

Nir Lipovetzky, Miquel Ramirez, Christian Muise, and Hector Geffner. 2014. Width and inference based planners: SIW, BFS(f), and PROBE. In Proceedings of the 8th International Planning Competition (IPC-2014), page 43.

B. Liu, Yuqian Jiang, Xiaohan Zhang, Qian Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. 2023. LLM+P: Empowering large language models with optimal planning proficiency. ArXiv, abs/2304.11477.

Drew McDermott. 2000. The 1998 AI planning systems competition. AI Magazine, 21(2):35-55.

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the role of demonstrations: What makes in-context learning work? In Conference on Empirical Methods in Natural Language Processing.

OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.

Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. 2023. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334.

Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Lidén, Zhou Yu, Weizhu Chen, and Jianfeng Gao. 2023. Check your facts and try again: Improving large language models with external knowledge and automated feedback. ArXiv, abs/2302.12813.

Miquel Ramirez, Nir Lipovetzky, and Christian Muise. 2015. Lightweight Automated Planning ToolKiT. http://lapkt.org/. Accessed: 2020.
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. ArXiv, abs/2302.04761.

Noah Shinn, Beck Labash, and Ashwin Gopinath. 2023. Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366.

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew J. Hausknecht. 2020. ALFWorld: Aligning text and embodied environments for interactive learning. CoRR, abs/2010.03768.

Significant-Gravitas. 2023. An experimental open-source attempt to make GPT-4 fully autonomous. https://github.com/significant-gravitas/auto-gpt. Accessed: 2023-06-09.

Tom Silver, Varun Hariprasad, Reece S. Shuttleworth, Nishanth Kumar, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. 2022. PDDL planning with pretrained large language models. In NeurIPS 2022 Foundation Models for Decision Making Workshop.

Dídac Surís, Sachit Menon, and Carl Vondrick. 2023. ViperGPT: Visual inference via Python execution for reasoning. ArXiv, abs/2303.08128.

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi (Jim) Fan, and Anima Anandkumar. 2023a. Voyager: An open-ended embodied agent with large language models. ArXiv, abs/2305.16291.

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Huai hsin Chi, and Denny Zhou. 2023b. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (ICLR).

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS.

Zhun Yang, Adam Ishay, and Joohyung Lee. 2023. Coupling large language models with logic programming for robust and general reasoning from text. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, pages 5186-5219. Association for Computational Linguistics.

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR).

Håkan L. S. Younes and Michael L. Littman. 2004. PPDDL1.0: An extension to PDDL for expressing planning domains with probabilistic effects. Techn. Rep. CMU-CS-04-162, 2:99.

A Prompts and Few-shot details

See Table 3 and Table 4 for the LLM-DP prompts used.

B ReAct

B.1 Reproduction with Chat Model

We slightly modify the 'system' prompt of the original ReAct (see Table 5) to guide the model away from its conversational tendencies. gpt-3.5-turbo apologises significantly more than the text-davinci-002 model, and we found that it would often get stuck in loops of apologising. We also modify the code so that we replace all generated instances of 'in' and 'on' with 'in/on' if the model did not generate it correctly, since Alfworld expects 'in/on' but gpt-3.5-turbo tends to generate only the correct preposition. Without these changes, ReAct would be significantly worse than our reported metric.
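This 'in'/'on' normalisation could be done with a small regular-expression post-processing step over the generated action string, roughly as in the sketch below (our own illustration, not the exact code; here the replacement is applied only once per action string).

    import re

    def normalise_preposition(action: str) -> str:
        """Rewrite the first bare 'in' or 'on' to the 'in/on' form Alfworld expects."""
        return re.sub(r"\b(in|on)\b(?!/)", "in/on", action, count=1)

    # Example: "put mug 1 in coffeemachine 1" -> "put mug 1 in/on coffeemachine 1"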

C LLM-DP

C.1 Generated Goal Examples

See Table 6 for examples of generated goals, both valid and invalid.

C.2 Varying n

See Table 2 for results when varying n and the fallback. The fallback applies when no plans are sampled successfully through the LLM; LLM-DP then re-samples n plans randomly.

Model                    SR    EL
LLM-DP (n=3)             0.96  13.16
LLM-DP (n=3) - fallback  0.92  12.80
LLM-DP (n=5)             0.96  12.54
LLM-DP (n=5) - fallback  0.94  12.24

Table 2: We compare the average Success Rate (SR) and average Episode Length (EL) for different sampling sizes n, with or without a fallback to random sampling. The random sampling fallback affects the success rate, as the LLM sampler can more often sample n world states which are already satisfied. However, as n increases, it becomes more likely for the sampling procedure to find at least one plan, and therefore the SR increases when no fallback (- fallback) is used.
(define (domain alfred)
(:predicates
(isReceptacle ?o - object) ; true if the object is a receptacle
(atReceptacleLocation ?r - object) ; true if the robot is at the receptacle location
(inReceptacle ?o - object ?r - object) ; true if object ?o is in receptacle ?r
(openable ?r - object) ; true if a receptacle is openable
(opened ?r - object) ; true if a receptacle is opened
(isLight ?o - object) ; true if an object is light source
(examined ?o - object ?l - object) ; whether the object has been looked at with light
(holds ?o - object) ; object ?o is held by robot
(isClean ?o - object) ; true if the object has been cleaned in sink
(isHot ?o - object) ; true if the object has been heated up
(isCool ?o - object) ; true if the object has been cooled
(isSink ?o - object) ; true if the object is a sink
(isMicrowave ?o - object) ; true if the object is a microwave
(isFridge ?o - object) ; true if the object is a fridge
))

Table 3: System Prompt used by gpt-3.5-turbo for generating the :goal in LLM-DP

Your task is to: put a clean plate in microwave.


(:goal
(exists (?t - plate ?r - microwave)
(and (inReceptacle ?t ?r)
(isClean ?t)
)))

Your task is to: examine an alarmclock with the desklamp


(:goal
(exists (?t - alarmclock ?l - desklamp)
(and (examined ?t ?l) (holds ?t)
)))

Your task is to: put two cellphone in bed


(:goal
(exists (?t1 - cellphone ?t2 - cellphone ?r - bed)
(and (inReceptacle ?t1 ?r)
(inReceptacle ?t2 ?r)
(not (= ?t1 ?t2))
)))

Table 4: Fixed Few-shot examples used by gpt-3.5-turbo for generating the :goal in LLM-DP

Interact with a household to solve a task.


Only reply with > followed by the action to take or 'think'.
Do not apologize.
Follow the format of the two examples below.

Table 5: System Prompt used by gpt-3.5-turbo in our reproduction of ReAct


task: put a clean mug in coffeemachine.
Generated:
(:goal
 (exists (?t - mug ?r - coffeemachine)
  (and (inReceptacle ?t ?r)
       (isClean ?t)
)))
VALID ✓

task: put some peppershaker on drawer.
Generated:
(:goal
 (exists (?t - peppershaker ?r - drawer)
  (inReceptacle ?t ?r)
))
VALID ✓

task: put two cd in safe.
Generated:
(:goal
 (exists (?t1 - cd ?t2 - cd ?r - safe)
  (and (inReceptacle ?t1 ?r)
       (inReceptacle ?t2 ?r)
       (not (= ?t1 ?t2))
)))
VALID ✓

task: heat some mug and put it in coffeemachine.
Generated:
(:goal
 (exists (?m - mug ?c - coffeemachine)
  (and (isReceptacle ?m)
       (isHot ?m)
       (inReceptacle ?m ?c)
)))
INVALID ✗

Table 6: Sample of generated PDDL goals from LLM-DP. The generation gets confused by the semantics of 'receptacle' and identifies a mug as a receptacle. While a mug is a receptacle in everyday terms, in our defined logic receptacles are fixed, immovable objects which can contain other objects; a mug is therefore not a Receptacle, which subsequently causes planning to fail.
