Lecture 17-18: LLM-Based Agent
• Code Interpreter
https://openai.com/blog/chatgpt-plugins#code-interpreter
https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/
LLM with Tools
[2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools (arxiv.org)
• Keep an API call only when conditioning on the call and its result improves the model's prediction of the following tokens, compared with receiving no API call at all or receiving only the call without its result
• Inference mode
• The New England Journal of Medicine is a registered trademark of ...
• “The New England Journal of Medicine is a registered trademark of <API> QA(“Who
is the publisher of The New England Journal of Medicine?”) →
• Tool mode: entered when the model emits the token “→”; decoding pauses while the API is called (a decoding sketch follows the example below)
• The New England Journal of Medicine is a registered trademark of <API> QA(“Who
is the publisher of The New England Journal of Medicine?”) → Massachusetts
Medical Society </API>
• Inference mode
• The New England Journal of Medicine is a registered trademark of <API> QA(“Who
is the publisher of The New England Journal of Medicine?”) → Massachusetts
Medical Society </API> Massachusetts Medical Society.
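The interleaving above can be sketched as a two-mode decoding loop. Below is a minimal Python sketch, assuming hypothetical generate_until and qa_tool helpers as stand-ins for the LM decoder and the QA tool; it is not Toolformer's actual implementation:

def qa_tool(question: str) -> str:
    # Stand-in QA tool; a real system would call a QA model or search engine
    return "Massachusetts Medical Society"

def generate_until(prefix: str, stop_tokens: list[str]) -> tuple[str, str]:
    # Stand-in for LM decoding that stops at the first stop token;
    # returns (generated_text, stop_token_hit)
    raise NotImplementedError

def decode_with_tools(prompt: str) -> str:
    text = prompt
    while True:
        chunk, stop = generate_until(text, ["<API>", "<EOS>"])
        text += chunk
        if stop == "<EOS>":                      # plain inference mode: finished
            return text
        # Tool mode: the model opened an API call; let it finish writing it
        call, _ = generate_until(text + "<API>", ["→"])
        text += "<API>" + call + "→ "
        # Execute the call (only QA supported here), splice the result in,
        # then resume inference mode after "</API>"
        question = call.split('QA("', 1)[1].split('")', 1)[0]
        text += qa_tool(question) + " </API>"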
[2307.16789] ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (arxiv.org)
• Instruction Generation
• First sample APIs and then prompt ChatGPT to generate diverse instructions
• Both single-tool and multi-tool scenarios
• The prompt is composed of
1. a general description of the instruction generation task
2. comprehensive documentation of each sampled API
3. three in-context seed examples {seed1, seed2, seed3}, each an ideal instruction-generation example written by human experts
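As an illustration, the three-part prompt can be assembled as below. This is a sketch under assumed API-documentation fields (name, description, parameters) and placeholder wording, not the paper's exact prompt:

TASK_DESCRIPTION = (
    "You are given documentation for several real-world APIs. "
    "Generate diverse user instructions that require calling one or more of them."
)

def build_instruction_prompt(sampled_apis: list[dict], seeds: list[str]) -> str:
    # 1. general task description, 2. docs for each sampled API,
    # 3. three human-written seed examples
    api_docs = "\n\n".join(
        f"API: {api['name']}\nDescription: {api['description']}\n"
        f"Parameters: {api['parameters']}"
        for api in sampled_apis
    )
    seed_block = "\n\n".join(f"Seed example {i + 1}:\n{s}" for i, s in enumerate(seeds))
    return f"{TASK_DESCRIPTION}\n\n{api_docs}\n\n{seed_block}\n\nNew instructions:"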
[2303.17580] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (arxiv.org)
HuggingGPT
• Model selection
• Distributes the tasks to expert models
• LLM is presented with a list of models to choose from
• Task execution
• Expert models execute the assigned tasks and log the results
• Response generation
• LLM receives the execution results and returns a summarized response to the user (see the pipeline sketch below)
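A minimal sketch of these stages, with call_llm and run_expert_model as hypothetical stand-ins for the controller LLM and the Hugging Face expert models:

def call_llm(prompt: str) -> str:
    raise NotImplementedError    # stand-in for the controller LLM

def run_expert_model(model_id: str, task: dict) -> str:
    raise NotImplementedError    # stand-in for a Hugging Face expert model

def hugginggpt(tasks: list[dict], model_catalog: list[str]) -> str:
    results = {}
    for task in tasks:
        # Model selection: the LLM picks an expert from a list of candidates
        model_id = call_llm(
            f"Task: {task}\nCandidate models: {model_catalog}\n"
            "Return the id of the best model for this task."
        )
        # Task execution: the expert model runs the task and logs its result
        results[task["id"]] = run_expert_model(model_id, task)
    # Response generation: the LLM summarizes the execution results for the user
    return call_llm(f"Summarize these execution results for the user: {results}")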
ReAct
• Thought
• Compose useful information by reasoning over the current context, and update the
context to support future reasoning or acting
• Examples
• Decomposing task goals and creating action plans
• Injecting commonsense knowledge relevant to task solving
• Extracting important parts from observations
• Tracking progress and transitioning between action plans
• Handling exceptions and adjusting action plans
Example:
Question: Musician and satirist Allie Goertz wrote a song about the
"The Simpsons" character Milhouse, who Matt Groening named after
who?
Thought 1: The question simplifies to "The Simpsons" character
Milhouse is named after who. I only need to search Milhouse and
find who it is named after.
Action 1: Search[Milhouse]
Observation 1: Milhouse Mussolini Van Houten is a recurring character
in the Fox animated television series The Simpsons voiced by Pamela
Hayden and created by Matt Groening.
…
Thought 3: Milhouse was named after U.S. president Richard Nixon, so
the answer is Richard Nixon.
Action 3: Finish[Richard Nixon]
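The loop behind this example can be sketched as below, restricted to the Search/Finish action space; call_llm and wiki_search are hypothetical stand-ins for the LLM and the Wikipedia environment:

import re

def call_llm(prompt: str) -> str:
    raise NotImplementedError    # stand-in for the LLM

def wiki_search(entity: str) -> str:
    raise NotImplementedError    # stand-in for the Wikipedia search tool

def react(question: str, max_steps: int = 7) -> str:
    trace = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        thought = call_llm(trace + f"Thought {step}:")
        action = call_llm(trace + f"Thought {step}: {thought}\nAction {step}:")
        trace += f"Thought {step}: {thought}\nAction {step}: {action}\n"
        match = re.match(r"(Search|Finish)\[(.*)\]", action.strip())
        if match is None:
            continue                             # malformed action; try again
        if match.group(1) == "Finish":
            return match.group(2)                # terminal action: the answer
        # Act in the environment and append the observation to the context
        trace += f"Observation {step}: {wiki_search(match.group(2))}\n"
    return "no answer found"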
Self-Refine
• Improving initial outputs from LLMs through iterative feedback and refinement
• The same LLM provides feedback for its output and uses it to refine itself,
iteratively
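A minimal sketch of this generate-feedback-refine loop, assuming a hypothetical call_llm helper and a simplified stopping test:

def call_llm(prompt: str) -> str:
    raise NotImplementedError    # stand-in for the LLM

def self_refine(task: str, max_iters: int = 4) -> str:
    output = call_llm(task)                      # initial output
    for _ in range(max_iters):
        # The same LLM critiques its own output...
        feedback = call_llm(
            f"Task: {task}\nOutput: {output}\n"
            "Give concrete, actionable feedback, or reply DONE if none is needed."
        )
        if "DONE" in feedback:
            break
        # ...and refines the output conditioned on that feedback
        output = call_llm(
            f"Task: {task}\nPrevious output: {output}\n"
            f"Feedback: {feedback}\nRevised output:"
        )
    return output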
[2303.17651] Self-Refine: Iterative Refinement with Self-Feedback (arxiv.org)
madaan/self-refine: LLMs can generate feedback on their work, use it to improve the output (github.com)
Reflexion
[2303.11366] Reflexion: Language Agents with Verbal Reinforcement Learning (arxiv.org)
noahshinn/reflexion: [NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning (github.com)
ReWOO
[2305.18323] ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models (arxiv.org)
billxbf/ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models (github.com)
AutoGPT
[2306.02224] Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions (arxiv.org)
Significant-Gravitas/AutoGPT (github.com)
Stanford Smallville
• Generative agents
• Computational software agents that simulate believable human behavior
• Take their current environment and past experiences as input and generate
behavior as output.
• Behaviors like
• Wake up, cook breakfast, and head to work
• Artists paint, while authors write
• They form opinions, notice each other, and initiate conversations
• They remember and reflect on days past as they plan the next day
Architecture
• Memory stream
• A database that maintains a comprehensive record of an agent’s experience
• Relevant records are retrieved to plan the agent’s actions and to react appropriately to the environment
• Records are recursively synthesized into higher- and higher-level reflections that
guide behaviors
• Each record contains a natural language description, a creation timestamp, and a
most recent access timestamp
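A record and a retrieval step might be sketched as follows. The recency-only ranking here is a simplification for illustration; the paper's retrieval combines recency, importance, and relevance scores:

from dataclasses import dataclass, field
import time

@dataclass
class MemoryRecord:
    description: str                # natural language description of the event
    created_at: float = field(default_factory=time.time)
    last_accessed_at: float = field(default_factory=time.time)

def retrieve(stream: list[MemoryRecord], k: int = 5) -> list[MemoryRecord]:
    # Simplified ranking by recency of last access; a faithful implementation
    # would also weigh importance and relevance to the current situation
    ranked = sorted(stream, key=lambda r: r.last_accessed_at, reverse=True)[:k]
    now = time.time()
    for record in ranked:
        record.last_accessed_at = now            # update most-recent-access time
    return ranked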
Architecture
• Reflection
• Higher-level, more abstract thoughts generated by the agent
• Generate reflections when the sum of the importance scores for the latest events
perceived by the agents exceeds a threshold (roughly two or three times a day)
• First, determine what to reflect on, by identifying questions that can be asked
given the agent’s recent experiences.
• E.g., prompt the language model: “Given only the information above, what are 3 most salient high-level questions we can answer about the subjects in the statements?”
• Then, prompt LLM to extract insights and cite the particular records that
served as evidence for the insights
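The trigger and the two prompting steps can be sketched as below; call_llm, the threshold value, and the second prompt's wording are assumptions for illustration:

def call_llm(prompt: str) -> str:
    raise NotImplementedError    # stand-in for the LLM

IMPORTANCE_THRESHOLD = 150       # illustrative value, not from the paper

def maybe_reflect(recent_events: list[str], importance_scores: list[float]) -> str:
    if sum(importance_scores) < IMPORTANCE_THRESHOLD:
        return ""                                # not enough has happened yet
    numbered = "\n".join(f"{i + 1}. {e}" for i, e in enumerate(recent_events))
    # Step 1: decide what to reflect on
    questions = call_llm(
        numbered + "\nGiven only the information above, what are 3 most salient "
        "high-level questions we can answer about the subjects in the statements?"
    )
    # Step 2: extract insights, citing the records that serve as evidence
    return call_llm(
        numbered + f"\nQuestions:\n{questions}\n"
        "For each question, state an insight and cite the statement numbers "
        "that serve as evidence."
    )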
Architecture
• Planning
• A plan includes a location, a starting time, and a duration
• Consider observations, reflections, and plans all together when deciding how to
behave
• Starts top-down and then recursively generates more detail
• First step is to create a plan that outlines the day’s agenda in broad strokes
• Prompt the language model with the agent’s summary description (e.g., name,
traits, and a summary of their recent experiences) and a summary of their
previous day
• Recursively decomposes it to create finer-grained actions: first into hour-long chunks of actions, then recursively decomposes these again into 5–15 minute chunks (sketched below)
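The top-down decomposition can be sketched as a pair of recursive prompting passes; call_llm and the prompt wording are assumptions for illustration:

def call_llm(prompt: str) -> str:
    raise NotImplementedError    # stand-in for the LLM

def plan_day(agent_summary: str, previous_day_summary: str) -> list[str]:
    # Broad-strokes agenda from the agent's summary description and a
    # summary of their previous day
    agenda = call_llm(
        f"{agent_summary}\nYesterday: {previous_day_summary}\n"
        "Outline today's agenda in broad strokes."
    ).splitlines()
    plan = []
    for item in agenda:
        # First decompose into hour-long chunks of actions...
        for hour in call_llm(f"Break into hour-long actions: {item}").splitlines():
            # ...then again into 5-15 minute chunks, each with a location,
            # a starting time, and a duration
            plan += call_llm(
                "Break into 5-15 minute actions, each with a location, "
                f"a starting time, and a duration: {hour}"
            ).splitlines()
    return plan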
Architecture
• Reacting and Updating Plans
• Prompt the language model with these observations to decide whether the agent should continue with their existing plan, or react
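A sketch of that decision prompt, with call_llm as a hypothetical stand-in and illustrative wording:

def call_llm(prompt: str) -> str:
    raise NotImplementedError    # stand-in for the LLM

def decide_reaction(agent_summary: str, current_action: str, observation: str) -> str:
    # The LLM decides whether to continue the existing plan or to react
    return call_llm(
        f"{agent_summary}\n"
        f"Current plan step: {current_action}\n"
        f"Observation: {observation}\n"
        "Should the agent continue with their existing plan, or react? "
        "If react, what is an appropriate reaction?"
    )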