LLM Agents and Tool Use Data and Web Science Group

IE686 Large Language Models and Agents

University of Mannheim | IE686 LLMs and Agents | LLM Agents and Tool Use| Version 04.10.2024 1

Credits

• This slide set is based on slides from
– Shunyu Yao
– Yankai Lin
– Yang Deng, An Zhang et al.

• Many thanks to all of you!

Outline

• Recap: Prompt Engineering and Efficient Adaptation
• What is an Agent?
• Tool Usage for LLMs
• The ReAct Paradigm
• Unified Framework for LLM Agents
• Evaluating Agents

Recap: Prompting

• For many tasks, supervised fine-tuning data may not be
available or may be costly to obtain
• Due to emergent abilities coupled with instruction tuning,
we can simply prompt or instruct models to do a task!
• Prompts are written in natural language
• Prompting is non-invasive:
– No additional parameters are introduced
– No tuning of existing parameters
– No need to inspect model’s embeddings

Recap: Fine-tuning Methods

• Given enough data and computing resources
• Overall performance on T5-base: Full fine-tuning > LoRA > Adapters >
Prefix Tuning > Prompt Tuning

Ding, N., et al., 2022. Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models. arXiv
preprint arXiv:2203.06904.

Recap: Evaluating LLMs

• Benchmark-based evaluation
– Format problem into prompt and generate result
– Parse result and compute standard metrics like accuracy
– Good for close-ended evaluation
• Model-based evaluation
– Use LLM like GPT-4 as surrogate for human evaluation
– Shown to achieve high agreement with human evaluators
• Human-based evaluation
– Human evaluators judge answer of LLMs
• Pair-wise comparison of two answers from different models
• Single-answer grading: score a single answer from an LLM
– Good for open-ended evaluation
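
The benchmark-based steps above (format the problem into a prompt, generate, parse, compute accuracy) can be sketched roughly as follows. This is a minimal illustration only: `fake_model` is a hypothetical stand-in for a real LLM call, and the two-item dataset is made up for the example.

```python
import re

def format_prompt(question: str) -> str:
    # Step 1: turn the benchmark item into a prompt
    return f"Question: {question}\nAnswer with a single number.\nAnswer:"

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (canned replies, one of them wrong)."""
    canned = {"2 + 2": "The answer is 4.", "3 * 5": "I think it is 16."}
    for key, reply in canned.items():
        if key in prompt:
            return reply
    return "I do not know."

def parse_answer(reply: str):
    # Step 2: parse the free-text reply into a comparable value
    match = re.search(r"-?\d+", reply)
    return int(match.group()) if match else None

def accuracy(dataset) -> float:
    # Step 3: compute a standard metric over the parsed answers
    correct = sum(
        parse_answer(fake_model(format_prompt(question))) == gold
        for question, gold in dataset
    )
    return correct / len(dataset)

dataset = [("What is 2 + 2?", 4), ("What is 3 * 5?", 15)]
print(accuracy(dataset))  # 0.5: one of the two canned replies is wrong
```

Parsing is often the fragile part of such a pipeline, which is one reason this style suits close-ended tasks better than open-ended ones.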

Outline

• Recap: Prompt Engineering and Efficient Adaptation
• What is an Agent?
• Tool Usage for LLMs
• The ReAct Paradigm
• Unified Framework for LLM Agents
• Evaluating Agents

What is an Agent?

• LLM-powered agents are artificial entities that enhance LLMs with
essential capabilities, enabling them to sense their environment, make
decisions, and take actions.

What is an Agent?

• An “intelligent” system that interacts with some “environment”
– Physical environments: robot, autonomous car, …
– Digital environments: DQN for Atari, Siri, AlphaGo
– Humans as environment: Chatbots

What is an Agent?

• Sam Altman said in one of his keynotes: “GPTs and
Assistants are precursors to agents. They will gradually be
able to plan and to perform more complex actions on your
behalf. These are our first steps toward AI Agents.”
• Bill Gates wrote on his blog: “Agents are not only going to
change how everyone interacts with computers. They’re
also going to upend the software industry, bringing about
the biggest revolution in computing since we went from
typing commands to tapping on icons.”

Financial Times. “The advent of the AI agent”
GatesNotes. “The Future of Agents: AI is about to completely change how you use computers”

LLM Agents over Time

A brief history of LLM agents

Wang, L., et al., 2024. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), p.186345.

Outline

• Recap: Prompt Engineering and Efficient Adaptation
• What is an Agent?
• Tool Usage for LLMs
• The ReAct Paradigm
• Unified Framework for LLM Agents
• Evaluating Agents

Example Task: Question Answering

• Various solutions were developed for the different QA tasks

Supporting LLMs with Tools

• How did humanity develop over time to where we are now?
• An important factor: Usage of Tools
– Spears, the plow, electricity, computers, …
➔Today we have many complex tools to help us solve problems, e.g.
calculators, search engines, …

Mialon, G., et al. 2023, Augmented Language Models: a Survey. Transactions on Machine Learning Research.

Example: Code Generation for Computational Problems

• Leverages an external tool (Python interpreter) to decouple
computation from reasoning
• LLM can make calls to the interpreter to run generated code
Chen, W. et al., 2023 Program of Thoughts Prompting: Disentangling Computation from
Reasoning for Numerical Reasoning Tasks. Transactions on Machine Learning Research.
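
A minimal sketch of this decoupling, assuming the model emits a short program that stores its result in a variable named `ans`: the LLM only writes the program, and the interpreter does the arithmetic. `fake_llm` is a hypothetical stand-in for a real model call, and the restricted `exec` is a toy, not a secure sandbox.

```python
def fake_llm(question: str) -> str:
    """Hypothetical stand-in: returns the program an LLM might generate."""
    return (
        "principal = 1000\n"
        "rate = 0.05\n"
        "years = 3\n"
        "ans = principal * (1 + rate) ** years\n"
    )

def run_program(code: str):
    # The interpreter, not the LLM, performs the computation.
    namespace = {}
    exec(code, {"__builtins__": {}}, namespace)  # toy isolation only, not secure
    return namespace["ans"]

question = "What is 1000 worth after 3 years at 5% compound interest?"
answer = run_program(fake_llm(question))
print(answer)
```

The reasoning (which formula to apply) stays in the generated text, while the numeric result comes from execution, so arithmetic slips by the model do not affect the answer.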

Retrieval-augmented Generation for Knowledge Problems

• Answer knowledge-intensive questions with
– Extra corpora
– A retriever (e.g. BM25 or Dense Passage Retrieval)

• What if there is no corpus?
– Example Question: Who are the two candidates for the 2024 US
presidential election?
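
When a corpus is available, the retriever step can be sketched as below. This is a small from-scratch BM25 scorer written for illustration (whitespace tokenization, common default values `k1=1.5` and `b=0.75`, a three-sentence toy corpus); it is not taken from the slides or from any particular library.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: in how many documents does each term occur?
        df = Counter(term for d in self.docs for term in set(d))
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        doc = self.docs[index]
        tf = Counter(doc)
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Term frequency saturation with document-length normalization
            norm = tf[term] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf[term] * tf[term] * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=1):
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]

corpus = [
    "Mannheim is a city in Germany",
    "BM25 is a ranking function used by search engines",
    "Large language models generate text",
]
retriever = BM25(corpus)
best = retriever.top_k("which ranking function do search engines use", k=1)[0]
# The retrieved passage is placed in the prompt ahead of the question.
prompt = f"Context: {corpus[best]}\nQuestion: Which ranking function do search engines use?"
print(prompt)
```

Without a corpus covering the question, there is nothing for this step to retrieve, which is where live tools such as a search engine come in.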

Teaching LLMs to use Tools

• Add special tokens to invoke tool calls for
– Search engines, calculators, etc.
– Task-specific models (translation)
– APIs
• Unnatural format requires
task/tool-specific fine-tuning

Parisi, A., et al., 2022. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255.

Schick, T., et al., 2024. Toolformer: Language models can teach themselves to use tools. Advances in Neural
Information Processing Systems, 36.
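
To make the special-token idea concrete, the sketch below scans generated text for bracketed calls of the form `[Tool(argument)]`, runs the tool, and splices the result back into the text. The bracket syntax loosely imitates the Toolformer style; the regular expression, the `TOOLS` registry and the `->` result marker are assumptions made for this example.

```python
import re

# Hypothetical tool registry; eval() on a restricted namespace is a toy calculator only.
TOOLS = {
    "Calculator": lambda expr: str(round(eval(expr, {"__builtins__": {}}, {}), 2)),
}

CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_tool_calls(text: str) -> str:
    def replace(match):
        name, arg = match.group(1), match.group(2)
        if name not in TOOLS:
            return match.group(0)  # leave unknown calls untouched
        return f"[{name}({arg}) -> {TOOLS[name](arg)}]"
    return CALL.sub(replace, text)

generated = "Out of 1400 participants, 400 [Calculator(400 / 1400)] passed the test."
print(execute_tool_calls(generated))
# Out of 1400 participants, 400 [Calculator(400 / 1400) -> 0.29] passed the test.
```

Because a pretrained model does not produce such markers on its own, it has to be fine-tuned on text that contains them, which is the cost noted in the last bullet.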

Tool Usage: General Process

Tool Learning: Tutorial

• Tutorial Learning
– Have a model tuned for tool use read tool manuals (tutorials), so that
it understands the tool's functions and how to invoke them
– Works well with powerful LLMs

Tool Learning Prompt

Tool Learning: RL

• Reinforcement Learning
– Autonomous exploration and correction of errors based on
environmental feedback through reinforcement learning
– Action space defined by tools
– Agent learns to select appropriate tool
– Correct actions maximize the reward signal
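
A toy illustration of these points, assuming the simplest possible setting: the action space is a list of two tool names, the environment returns reward 1 for the suitable tool and 0 otherwise, and the agent keeps a running mean reward per tool with epsilon-greedy exploration. Real tool-learning setups are far richer; the names and the reward function here are hypothetical.

```python
import random

TOOLS = ["search", "calculator"]  # action space defined by the tools

def reward(tool: str) -> float:
    """Toy environment feedback for an arithmetic task."""
    return 1.0 if tool == "calculator" else 0.0

def train(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {tool: 0.0 for tool in TOOLS}  # estimated reward per tool
    count = {tool: 0 for tool in TOOLS}
    for step in range(steps):
        if step < len(TOOLS):
            tool = TOOLS[step]                # try every tool once
        elif rng.random() < epsilon:
            tool = rng.choice(TOOLS)          # explore
        else:
            tool = max(value, key=value.get)  # exploit the best estimate
        r = reward(tool)
        count[tool] += 1
        value[tool] += (r - value[tool]) / count[tool]  # running mean
    return value

values = train()
best_tool = max(values, key=values.get)
print(best_tool)  # calculator
```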

Tool Learning: Self-supervised

• Self-supervised Tool Learning
– Pre-defined tool APIs
– Encourage models to call and execute tool APIs
– Design self-supervised loss to evaluate tool execution helpfulness
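
As a sketch of what such a loss can look like, the filtering criterion described in the Toolformer paper compares the language-modeling loss on the following tokens with and without the tool result. The notation below follows that paper rather than these slides: $c_i$ is a candidate API call at position $i$, $r_i$ its result, $e(\cdot)$ the text encoding of a call, $\varepsilon$ the empty sequence, $w$ a weighting over subsequent tokens, and $\tau_f$ a filtering threshold.

```latex
\begin{align}
L_i(z) &= -\sum_{j=i}^{n} w_{j-i}\,\log p_M\!\left(x_j \mid z,\, x_{1:j-1}\right) \\
L_i^{+} &= L_i\big(e(c_i, r_i)\big) \\
L_i^{-} &= \min\big(L_i(\varepsilon),\; L_i(e(c_i, \varepsilon))\big) \\
\text{keep } c_i &\iff L_i^{-} - L_i^{+} \ge \tau_f
\end{align}
```

In words: a call is kept as training data only if seeing its result makes the subsequent tokens easier to predict than making no call, or making the call without its result, by at least the margin $\tau_f$.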

Schick, T., et al., 2024. Toolformer: Language models can teach themselves to
use tools. Advances in Neural Information Processing Systems, 36.

Early Example: WebGPT

• Supervised Learning performed at OpenAI
– Trying to copy human behavior to use search engines
– Supervised fine-tuning + reinforcement learning
– Only 6000 annotated data instances

Nakano, R., et al., 2021. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332.

Early Example: WebGPT

• Excellent performance in long-form QA, sometimes even surpassing
human experts

What if Both External Knowledge and Reasoning are needed?

• Some methods combine tool use/RAG and reasoning
methods for specific tasks
Trivedi, H., et al., 2023, July. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions.
In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 10014-10037).

Press, O., et al., 2023, December. Measuring and Narrowing the Compositionality Gap in Language Models. In Findings of the Association
for Computational Linguistics: EMNLP 2023 (pp. 5687-5711).

Reasoning OR Acting

Outline

• Recap: Prompt Engineering and Efficient Adaptation
• What is an Agent?
• Tool Usage for LLMs
• The ReAct Paradigm
• Unified Framework for LLM Agents
• Evaluating Agents

The ReAct Paradigm

Yao, S., et al., 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh
International Conference on Learning Representations.

ReAct is Simple and Intuitive to Use

Zero-shot ReAct Prompt

• Synergy
– Acting supports reasoning
– Reasoning guides acting
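
The interplay can be sketched as a loop that alternates Thought, Action and Observation lines. This is a simplified illustration rather than the implementation from the ReAct paper: `scripted_llm` is a hypothetical stand-in that replays two fixed steps, and the `Search[...]` / `Finish[...]` action syntax and the one-entry knowledge base are assumptions for the example.

```python
import re

KNOWLEDGE = {"Germany": "Germany's capital is Berlin."}

def search(query: str) -> str:
    return KNOWLEDGE.get(query, "No results.")

TOOLS = {"Search": search}

SCRIPT = [
    "Thought: I should look up Germany.\nAction: Search[Germany]",
    "Thought: The result names the capital.\nAction: Finish[Berlin]",
]

def scripted_llm(prompt: str) -> str:
    """Hypothetical stand-in: picks the next step by counting observations so far."""
    return SCRIPT[prompt.count("Observation:")]

def react(question: str, llm, max_steps: int = 5):
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)                    # reasoning guides acting
        prompt += step + "\n"
        name, arg = re.search(r"Action: (\w+)\[(.*)\]", step).groups()
        if name == "Finish":
            return arg, prompt
        observation = TOOLS[name](arg)        # acting supports reasoning
        prompt += f"Observation: {observation}\n"
    return None, prompt

answer, trace = react("What is the capital of Germany?", scripted_llm)
print(trace)
print(answer)  # Berlin
```

Each observation is appended to the prompt, so the next thought is conditioned on what the previous action returned.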

Converting Tasks to Text

• Many tasks can be turned into natural language for LLM agents
• “LLM grounding”: Supplementing the LLM with use-case specific
information, e.g., a data store that is part of a RAG system
Brohan, A., et al., 2023, March. Do as i can, not as i say: Grounding language in robotic
affordances. In Conference on robot learning (pp. 287-318). PMLR.
Huang, W., et al., 2023, March. Inner Monologue: Embodied Reasoning through Planning with
Language Models. In Conference on Robot Learning (pp. 1769-1782). PMLR.

Acting without Reasoning

• Cannot explore systematically or incorporate feedback

ReAct Enables Systematic Exploration

ReAct is general and effective

Yao, S., et al., 2023, ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh
International Conference on Learning Representations.

ReAct vs. Traditional Agents

Outline

• Recap: Prompt Engineering and Efficient Adaptation
• What is an Agent?
• Tool Usage for LLMs
• The ReAct Paradigm
• Unified Framework for LLM Agents
• Evaluating Agents

Unified Framework for LLM-powered Agents

• LLMs pave the way for the use of AI agents to simulate
users and other entities, as well as their interactions

Observation and Action

The “Brain”

• Memory: stores sequences of the agent’s past observations,
thoughts and actions
– Long-term and short-term memory
– Long-term memory is abstract
– Used to retrieve relevant past memory
• Decision Making Process:
– Planning: Subgoal and decomposition – Break down large tasks into
smaller, manageable subgoals, enabling efficient handling of complex tasks
– Reasoning: Self-criticism and self-reflection over past actions, learn from
mistakes and refine for future steps
• Personalized memory and reasoning lead to diversity and
independence of AI Agents.
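
A minimal sketch of the memory component, under simplifying assumptions: short-term memory is a bounded window of recent records, long-term memory keeps everything, and retrieval ranks long-term records by word overlap with the query. The class and method names are hypothetical, and a real agent would more likely store summaries or embeddings in long-term memory than raw text.

```python
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size: int = 3):
        self.short_term = deque(maxlen=short_term_size)  # recent steps only
        self.long_term = []                              # full history

    def store(self, kind: str, text: str) -> None:
        record = (kind, text)  # kind: "observation", "thought" or "action"
        self.short_term.append(record)
        self.long_term.append(record)

    def retrieve(self, query: str, k: int = 2):
        words = set(query.lower().split())
        scored = []
        for position, (kind, text) in enumerate(self.long_term):
            overlap = len(words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, position, kind, text))
        scored.sort(reverse=True)  # most overlap first, newer records break ties
        return [(kind, text) for _, _, kind, text in scored[:k]]

memory = AgentMemory(short_term_size=2)
memory.store("observation", "the door is locked")
memory.store("thought", "i need a key for the door")
memory.store("action", "search the kitchen")
memory.store("observation", "found a key in the kitchen")

relevant = memory.retrieve("where is the key", k=2)
print(relevant)
print(list(memory.short_term))
```

Only the retrieved records and the short-term window would be placed in the prompt, which keeps the context small even when the history grows.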

Collaboration

• Diverse agents interact with each other to solve problems
in fully autonomous systems
• Human-in-the-loop in cooperative systems

Unified Framework for LLM Agents

Example: Agent creation with OpenAI

Example: Long-term Memory

Long-term Memory for Reflexion

Example: Voyager - Procedural Memory of Skills

Wang, G., et al., 2024. Voyager: An Open-Ended Embodied Agent with Large Language Models. Transactions
on Machine Learning Research.

Multi-Agent Orchestration

• Usually a “Manager” or “Commander” for orchestrating many agents
• Context may be shared or isolated
• Cooperative vs. competitive environments
• Centralized vs. decentralized communication
• Human intervention vs. full automation

Example: Retrieval-Augmented QA

• Two agents
• User Proxy processes documents into a vector store
• User question and relevant context are passed to an assistant that
generates an answer
• Conversation continues until a satisfactory answer is reached
Wu, Q., et al., 2024, AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent
Conversation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents.

Example: Decision Making

• Two agents: one suggests the next step, an Executor performs the action
and provides feedback
• Three agents: additional agent that provides commonsense
facts about the domain when needed
Wu, Q., et al., 2024. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent
Conversation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents.

Example: Multi-Agent Coding

• Commander receives user questions and executes code
• Writer writes code
• Safeguard ensures no information leakage or malicious code
Wu, Q., et al., 2024. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent
Conversation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents.
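
The three roles can be sketched as plain functions. This is an illustration of the division of labor only and does not use the AutoGen API: `writer` is a hypothetical stand-in for an LLM that returns code, `safeguard` is a naive keyword filter, and `commander` runs the code only if the safeguard approves.

```python
import contextlib
import io

BANNED = ("import os", "import subprocess", "open(", "__import__")

def writer(question: str) -> str:
    """Hypothetical stand-in for the code-writing agent."""
    return "print(sum(range(1, 11)))"

def safeguard(code: str) -> bool:
    """Naive check: True if no banned pattern appears in the code."""
    return not any(token in code for token in BANNED)

def commander(question: str) -> str:
    code = writer(question)             # delegate code writing
    if not safeguard(code):             # ask for a safety verdict
        return "Rejected by safeguard."
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})                  # execute and capture the output
    return buffer.getvalue().strip()

result = commander("What is the sum of the integers from 1 to 10?")
print(result)  # 55
```

A keyword filter like this is easy to bypass; in the setup described above the safeguard is itself an agent, and execution would normally happen in an isolated environment.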

Example: GPT-Researcher

• Multi-agent system for online research
• Uses “Plan-and-Solve” prompting to divide the task into subtasks,
which are carried out by multiple agents in parallel using web
crawling as a tool.
• Each resource is stored and filtered, and a selection is summarized
to aggregate a final report after the crawler agents have finished.
https://docs.gptr.dev/blog/building-gpt-researcher
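
The plan, parallel crawl, filter and aggregate stages described above can be sketched as follows. This is not GPT-Researcher's code: `plan` and `crawl` are hypothetical stand-ins that return fixed strings instead of calling an LLM or fetching web pages, and the relevance flag is an arbitrary placeholder for a real filtering step.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(task: str):
    """Hypothetical planner: a real system would ask an LLM for subtasks."""
    return [f"{task}: background", f"{task}: recent work", f"{task}: open problems"]

def crawl(subtask: str):
    """Hypothetical crawler agent: returns a fake resource instead of fetching."""
    return {
        "subtask": subtask,
        "text": f"Notes on {subtask}",
        "relevant": "open" not in subtask,  # placeholder relevance decision
    }

def research(task: str) -> str:
    subtasks = plan(task)                                # plan
    with ThreadPoolExecutor(max_workers=3) as pool:
        resources = list(pool.map(crawl, subtasks))      # solve in parallel
    selected = [r for r in resources if r["relevant"]]   # filter
    return "\n".join(r["text"] for r in selected)        # aggregate the report

report = research("LLM agents")
print(report)
```

`pool.map` returns results in the order of the subtasks, so the report stays deterministic even though the crawlers run concurrently.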

Summary: LLM Agents

• Current hot topic in research and application
• Combination of tool use and reasoning allows enhancement
of LLM abilities while mitigating problematic behavior like
hallucinations
➔Reasoning Agents
• Orchestrating agents with different capabilities
(specializations) makes it possible to solve complex problems

For more application examples, see the following surveys:
Guo, T., et al., 2024. Large language model based multi-agents: A survey of progress and
challenges. arXiv preprint arXiv:2402.01680.
Liu, J., et al., 2024. Large Language Model-Based Agents for Software Engineering: A Survey. arXiv
preprint arXiv:2409.02977.

Outline

• Recap: Prompt Engineering and Efficient Adaptation
• What is an Agent?
• Tool Usage for LLMs
• The ReAct Paradigm
• Unified Framework for LLM Agents
• Evaluating Agents

Evaluating (Multi-)Agent Systems

• LLM-powered agents enable a rich set of capabilities but
also amplify potential risks
– How to evaluate agent performance and awareness of safety risks?
• Potential Risks: leaking private data or causing financial loss
• Identifying these risks is labor-intensive as testing becomes difficult
with increased agent complexity
• Benchmarks for Agents need to cover a broad space
including
– Tools
– External resources
– Correct behavioral traces or labels

Example: AgentBench

• Simulate interactive environments for LLMs to operate as
autonomous agents
• 8 distinct environments of 3 types (Coding, Games, Web)
• Evaluation of agent core abilities like logical reasoning
Liu, X., et al., 2024. AgentBench: Evaluating LLMs as Agents. In The Twelfth International Conference on Learning Representations.

Example: ToolEmu

• Goal: Identify risky behavior of agents
• Emulates tool execution and enables scalable testing of
agents

Ruan, Y., et al., 2024. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. In The
Twelfth International Conference on Learning Representations.

Example: WebShop

• Large-scale complex environment based on 1.18M Amazon products
• Challenges language and visual understanding and decision-making
Yao, S., et al., 2022. Webshop: Towards scalable real-world web interaction with grounded language
agents. Advances in Neural Information Processing Systems, 35, pp.20744-20757.

Example: WebArena

• Simulates a web environment that closely resembles popular
real-world websites
• Embeds tools and knowledge resources as independent
websites
• Benchmark for concrete web-based actions
Zhou, S., et al., 2024. WebArena: A Realistic Web Environment for Building Autonomous
Agents. In The Twelfth International Conference on Learning Representations.

See you next week!

• Next time: Introduction to LangGraph
– Exercise: learn to apply things
– Learn how to use tools with LLMs
– Learn how to build complex interactions between Agents
