Newwhitepaper Agents2
Newwhitepaper Agents2
Acknowledgements
Evan Huang
Emily Xue
Olcan Sercinoglu
Sebastian Riedel
Satinder Baveja
Antonio Gulli
Anant Nawalgaria
Antonio Gulli
Anant Nawalgaria
Grace Mollison
Technical Writer
Joey Haymaker
Designer
Michael Lanning
September 2024 2
代理⼈
Agents
致谢
Acknowledgements
审稿⼈和贡献者
Reviewers and Contributors
Evan
EvanHuang
Huang
艾⽶莉·薛
Emily Xue
Olcan
OlcanSercinoglu
Sercinoglu
塞巴斯蒂安·⾥德尔
Sebastian Riedel
萨廷德·巴维贾
Satinder Baveja
安东尼奥·古利
Antonio Gulli
Anant
AnantNawalgaria
Nawalgaria
策展⼈和编辑
Curators and Editors
安东尼奥·古利
Antonio Gulli
Anant
AnantNawalgaria
Nawalgaria
格雷斯·莫利森
Grace Mollison
技术写作⼈员
Technical Writer
乔伊·海梅克
Joey Haymaker
设计师
Designer
迈克尔·兰宁
Michael Lanning
2024 年 9 ⽉ 2024
September 22
Table of contents
Introduction 4
What is an agent? 5
The model 6
The tools 7
Extensions 13
Sample Extensions 15
Functions 18
Use cases 21
Data stores 27
Tools recap 32
Summary 40
Endnotes 42
⽬录
Table of contents
介绍
Introduction 44
什么是代理?
What is an agent? 55
模型
The model 66
⼯具
The tools 77
编排层
The orchestration layer 77
代理与模型
Agents vs. models 88
认知架构:代理如何运作
Cognitive architectures: How agents operate 88
⼯具:我们通往外部世界的钥匙
Tools: Our keys to the outside world 12
12
扩展
Extensions 13
样本扩展
Sample Extensions 15
函数
Functions 18
使⽤案例
Use cases 21
函数示例代码
Function sample code 24
24
数据存储
Data stores 27
27
实施与应⽤
Implementation and application 28
28
⼯具回顾
Tools recap 32
32
通过有针对性的学习提升模型性能
Enhancing model performance with targeted learning 33
33
LangChain 代理快速⼊⻔
Agent quick start with LangChain 35
35
使⽤ Vertex AIapplications
Production 代理的⽣产应⽤程序 with Vertex AI agents 38
38
摘要
Summary 40
40
尾注
Endnotes 42
42
Agents
Introduction
Humans are fantastic at messy pattern recognition tasks. However, they often rely on tools
- like books, Google Search, or a calculator - to supplement their prior knowledge before
arriving at a conclusion. Just like humans, Generative AI models can be trained to use tools
to access real-time information or suggest a real-world action. For example, a model can
leverage a database retrieval tool to access specific information, like a customer's purchase
history, so it can generate tailored shopping recommendations. Alternatively, based on a
user's query, a model can make various API calls to send an email response to a colleague
or complete a financial transaction on your behalf. To do so, the model must not only have
access to a set of external tools, it needs the ability to plan and execute any task in a self-
directed fashion. This combination of reasoning, logic, and access to external information
that are all connected to a Generative AI model invokes the concept of an agent, or a
program that extends beyond the standalone capabilities of a Generative AI model. This
whitepaper dives into all these and associated aspects in more detail.
September 2024 4
代理⼈
Agents
这种推理、逻辑和对外部信息的访问与
This combination of reasoning,
⽣成式⼈⼯智能模型相连接的组合引发
logic, and access to external
了代理的概念。
information that are all connected
to a Generative AI model invokes
the concept of an agent.
介绍
Introduction
⼈类在复杂的模式识别任务中表现出⾊。然⽽,他们常常依赖⼯具。
Humans are fantastic at messy pattern recognition tasks. However, they often rely on tools
-- like
像书籍、Google 搜索或计算器⼀样
books, Google Search, or a -calculator
在之前补充他们的知识- to supplement their prior knowledge before
得出结论。就像⼈类⼀样,⽣成式⼈⼯智能模型可以被训练使⽤⼯具。
arriving at a conclusion. Just like humans, Generative AI models can be trained to use tools
访问实时信息或建议现实世界的⾏动。例如,⼀个模型可以
to access real-time information or suggest a real-world action. For example, a model can
利⽤数据库检索⼯具访问特定信息,例如客户的购买记录
leverage a database retrieval tool to access specific information, like a customer's purchase
历史,因此它可以⽣成量身定制的购物推荐。或者,基于⼀个
history, so it can generate tailored shopping recommendations. Alternatively, based on a
⽤户的查询,模型可以进⾏各种 API various
user's query, a model can make 调⽤,以向同事发送电⼦邮件回复
API calls to send an email response to a colleague
或代表您完成⾦融交易。为此,模型不仅必须具备
or complete a financial transaction on your behalf. To do so, the model must not only have
访问⼀组外部⼯具,它需要能够⾃主规划和执⾏任何任务的能⼒
access to a set of external tools, it needs the ability to plan and execute any task in a self-
有针对性的⽅式。这种推理、逻辑和获取外部信息的结合
directed fashion. This combination of reasoning, logic, and access to external information
与⽣成性⼈⼯智能模型相关的所有内容都涉及代理的概念,或者⼀个
that are all connected to a Generative AI model invokes the concept of an agent, or a
超越⽣成式⼈⼯智能模型独⽴能⼒的程序。这个
program that extends beyond the standalone capabilities of a Generative AI model. This
⽩⽪书更详细地探讨了所有这些及相关⽅⾯。
whitepaper dives into all these and associated aspects in more detail.
2024 年 9 ⽉ 2024
September 44
Agents
What is an agent?
In its most fundamental form, a Generative AI agent can be defined as an application that
attempts to achieve a goal by observing the world and acting upon it using the tools that it
has at its disposal. Agents are autonomous and can act independently of human intervention,
especially when provided with proper goals or objectives they are meant to achieve. Agents
can also be proactive in their approach to reaching their goals. Even in the absence of
explicit instruction sets from a human, an agent can reason about what it should do next to
achieve its ultimate goal. While the notion of agents in AI is quite general and powerful, this
whitepaper focuses on the specific types of agents that Generative AI models are capable of
building at the time of publication.
In order to understand the inner workings of an agent, let’s first introduce the foundational
components that drive the agent’s behavior, actions, and decision making. The combination
of these components can be described as a cognitive architecture, and there are many
such architectures that can be achieved by the mixing and matching of these components.
Focusing on the core functionalities, there are three essential components in an agent’s
cognitive architecture as shown in Figure 1.
September 2024 5
代理⼈
Agents
什么是代理?
What is an agent?
在其最基本的形式中,⽣成式 AI 代理可以被定义为⼀个应⽤程序,旨在
In its most fundamental form, a Generative AI agent can be defined as an application that
通过观察世界并利⽤⼯具采取⾏动来实现⽬标的尝试
attempts to achieve a goal by observing the world and acting upon it using the tools that it
拥有⾃主权。代理是⾃主的,可以独⽴于⼈类⼲预进⾏⾏动,
has at its disposal. Agents are autonomous and can act independently of human intervention,
尤其是在提供了他们要实现的适当⽬标或⽬的时。代理
especially when provided with proper goals or objectives they are meant to achieve. Agents
也可以在实现⽬标的过程中采取积极主动的态度。即使在缺乏的情况下
can also be proactive in their approach to reaching their goals. Even in the absence of
来⾃⼈类的明确指令集,代理可以推理它接下来应该做什么
explicit instruction sets from a human, an agent can reason about what it should do next to
实现其最终⽬标。虽然 AI 中代理的概念相当普遍且强⼤,但这
achieve its ultimate goal. While the notion of agents in AI is quite general and powerful, this
⽩⽪书专注于⽣成性⼈⼯智能模型能够实现的特定类型的代理
whitepaper focuses on the specific types of agents that Generative AI models are capable of
出版时的建筑。
building at the time of publication.
为了理解⼀个代理的内部⼯作原理,我们⾸先介绍基础知识
In order to understand the inner workings of an agent, let’s first introduce the foundational
驱动代理⾏为、⾏动和决策的组件。组合
components that drive the agent’s behavior, actions, and decision making. The combination
这些组件可以被描述为⼀种认知架构,并且有很多
of these components can be described as a cognitive architecture, and there are many
这样的架构可以通过这些组件的混合和匹配来实现。
such architectures that can be achieved by the mixing and matching of these components.
专注于核⼼功能,代理的三个基本组成部分是
Focusing on the core functionalities, there are three essential components in an agent’s
如图 1 所示的认知架构。
cognitive architecture as shown in Figure 1.
2024 年 9 ⽉ 2024
September 55
Agents
The model
In the scope of an agent, a model refers to the language model (LM) that will be utilized as
the centralized decision maker for agent processes. The model used by an agent can be one
or multiple LM’s of any size (small / large) that are capable of following instruction based
reasoning and logic frameworks, like ReAct, Chain-of-Thought, or Tree-of-Thoughts. Models
can be general purpose, multimodal or fine-tuned based on the needs of your specific agent
architecture. For best production results, you should leverage a model that best fits your
desired end application and, ideally, has been trained on data signatures associated with the
tools that you plan to use in the cognitive architecture. It’s important to note that the model is
typically not trained with the specific configuration settings (i.e. tool choices, orchestration/
reasoning setup) of the agent. However, it’s possible to further refine the model for the
agent’s tasks by providing it with examples that showcase the agent’s capabilities, including
instances of the agent using specific tools or reasoning steps in various contexts.
September 2024 6
代理⼈
Agents
图 1. 通⽤代理架构和组件
Figure 1. General agent architecture and components
模型
The model
在代理的范围内,模型指的是将被使⽤的语⾔模型(LM)
In the scope of an agent, a model refers to the language model (LM) that will be utilized as
代理过程的集中决策者。代理使⽤的模型可以是⼀个
the centralized decision maker for agent processes. The model used by an agent can be one
或多个任何⼤⼩的
or multiple LM’s LM(⼩型/⼤型),能够根据指令进⾏操作
of any size (small / large) that are capable of following instruction based
推理和逻辑框架,如 ReAct、Chain-of-Thought
reasoning and logic frameworks, like ReAct, 或 Tree-of-Thoughts。模型
Chain-of-Thought, or Tree-of-Thoughts. Models
可以是通⽤的、多模态的或根据您特定代理的需求进⾏微调的
can be general purpose, multimodal or fine-tuned based on the needs of your specific agent
架构。为了获得最佳的⽣产结果,您应该利⽤最适合您的模型。
architecture. For best production results, you should leverage a model that best fits your
期望的最终应⽤,并且理想情况下,已经在与之相关的数据签名上进⾏了训练
desired end application and, ideally, has been trained on data signatures associated with the
您计划在认知架构中使⽤的⼯具。重要的是要注意,该模型是
tools that you plan to use in the cognitive architecture. It’s important to note that the model is
通常没有使⽤特定的配置设置进⾏训练(即⼯具选择、编排/
typically not trained with the specific configuration settings (i.e. tool choices, orchestration/
代理的推理设置。然⽽,可以进⼀步优化模型以适应
reasoning setup) of the agent. However, it’s possible to further refine the model for the
通过提供展示代理能⼒的示例来完成代理的任务,包括
agent’s tasks by providing it with examples that showcase the agent’s capabilities, including
代理在各种上下⽂中使⽤特定⼯具或推理步骤的实例。
instances of the agent using specific tools or reasoning steps in various contexts.
2024 年 9 ⽉ 2024
September 66
Agents
The tools
Foundational models, despite their impressive text and image generation, remain constrained
by their inability to interact with the outside world. Tools bridge this gap, empowering agents
to interact with external data and services while unlocking a wider range of actions beyond
that of the underlying model alone. Tools can take a variety of forms and have varying
depths of complexity, but typically align with common web API methods like GET, POST,
PATCH, and DELETE. For example, a tool could update customer information in a database
or fetch weather data to influence a travel recommendation that the agent is providing to
the user. With tools, agents can access and process real-world information. This empowers
them to support more specialized systems like retrieval augmented generation (RAG),
which significantly extends an agent’s capabilities beyond what the foundational model can
achieve on its own. We’ll discuss tools in more detail below, but the most important thing
to understand is that tools bridge the gap between the agent’s internal capabilities and the
external world, unlocking a broader range of possibilities.
The orchestration layer describes a cyclical process that governs how the agent takes in
information, performs some internal reasoning, and uses that reasoning to inform its next
action or decision. In general, this loop will continue until an agent has reached its goal or a
stopping point. The complexity of the orchestration layer can vary greatly depending on the
agent and task it’s performing. Some loops can be simple calculations with decision rules,
while others may contain chained logic, involve additional machine learning algorithms, or
implement other probabilistic reasoning techniques. We’ll discuss more about the detailed
implementation of the agent orchestration layers in the cognitive architecture section.
September 2024 7
代理⼈
Agents
⼯具
The tools
基础模型尽管在⽂本和图像⽣成⽅⾯表现出⾊,但仍然受到限制
Foundational models, despite their impressive text and image generation, remain constrained
由于他们⽆法与外部世界互动。⼯具弥补了这⼀差距,使代理能够发挥作⽤。
by their inability to interact with the outside world. Tools bridge this gap, empowering agents
与外部数据和服务进⾏交互,同时解锁更⼴泛的操作范围
to interact with external data and services while unlocking a wider range of actions beyond
仅仅是基础模型的。⼯具可以采取多种形式,并具有不同的
that of the underlying model alone. Tools can take a variety of forms and have varying
复杂性的深度,但通常与常⻅的⽹络
depths of complexity, but typicallyAPI ⽅法如
align withGET、POST 对⻬,
common web API methods like GET, POST,
PATCH
PATCH,和and
DELETE。例如,⼀个⼯具可以在数据库中更新客户信息。
DELETE. For example, a tool could update customer information in a database
或获取天⽓数据以影响代理提供的旅⾏推荐
or fetch weather data to influence a travel recommendation that the agent is providing to
⽤户。通过⼯具,代理可以访问和处理现实世界的信息。这赋予了
the user. With tools, agents can access and process real-world information. This empowers
他们⽀持更多专业化的系统,如检索增强⽣成(RAG),
them to support more specialized systems like retrieval augmented generation (RAG),
这⼤⼤扩展了代理的能⼒,超出了基础模型所能提供的范围
which significantly extends an agent’s capabilities beyond what the foundational model can
独⽴实现。我们将在下⾯更详细地讨论⼯具,但最重要的事情
achieve on its own. We’ll discuss tools in more detail below, but the most important thing
理解⼯具是弥合代理内部能⼒与
to understand is that tools bridge the gap between the agent’s internal capabilities and the
外部世界,开启更⼴泛的可能性。
external world, unlocking a broader range of possibilities.
编排层
The orchestration layer
编排层描述了⼀个循环过程,管理代理如何接收
The orchestration layer describes a cyclical process that governs how the agent takes in
信息,进⾏⼀些内部推理,并利⽤该推理来指导其下⼀步
information, performs some internal reasoning, and uses that reasoning to inform its next
⾏动或决策。⼀般来说,这个循环将持续到代理达到其⽬标或⼀个
action or decision. In general, this loop will continue until an agent has reached its goal or a
停⽌点。编排层的复杂性可能会因情况⽽异。
stopping point. The complexity of the orchestration layer can vary greatly depending on the
代理和它正在执⾏的任务。⼀些循环可以是带有决策规则的简单计算,
agent and task it’s performing. Some loops can be simple calculations with decision rules,
⽽其他可能包含链式逻辑,涉及额外的机器学习算法,或
while others may contain chained logic, involve additional machine learning algorithms, or
实现其他概率推理技术。我们将详细讨论更多内容。
implement other probabilistic reasoning techniques. We’ll discuss more about the detailed
认知架构部分中代理编排层的实现。
implementation of the agent orchestration layers in the cognitive architecture section.
2024 年 9 ⽉ 2024
September 77
Agents
To gain a clearer understanding of the distinction between agents and models, consider the
following chart:
Models Agents
Knowledge is limited to what is available in their Knowledge is extended through the connection
training data. with external systems via tools
Single inference / prediction based on the Managed session history (i.e. chat history) to
user query. Unless explicitly implemented for allow for multi turn inference / prediction based
the model, there is no management of session on user queries and decisions made in the
history or continuous context. (i.e. chat history) orchestration layer. In this context, a ‘turn’ is
defined as an interaction between the interacting
system and the agent. (i.e. 1 incoming event/
query and 1 agent response)
No native logic layer implemented. Users can Native cognitive architecture that uses reasoning
form prompts as simple questions or use frameworks like CoT, ReAct, or other pre-built
reasoning frameworks (CoT, ReAct, etc.) to agent frameworks like LangChain.
form complex prompts to guide the model in
prediction.
Imagine a chef in a busy kitchen. Their goal is to create delicious dishes for restaurant
patrons which involves some cycle of planning, execution, and adjustment.
September 2024 8
代理⼈
Agents
代理与模型
Agents vs. models
为了更清楚地理解代理和模型之间的区别,请考虑
To gain a clearer understanding of the distinction between agents and models, consider the
以下图表:
following chart:
模型
Models 代理⼈
Agents
知识仅限于他们训练数据中可⽤的内容。
Knowledge is limited to what is available in their 知识通过与外部系统的⼯具连接⽽得以扩展
Knowledge is extended through the connection
training data. with external systems via tools
基于⽤户查询的单次推理/预测。除⾮为模型明确实现,
Single inference / prediction based on the 管理会话历史(即聊天历史),以便根据⽤户查询和在编
Managed session history (i.e. chat history) to
否则不管理会话历史或连续上下⽂。(即聊天历史)
user query. Unless explicitly implemented for 排层做出的决策进⾏多轮推理/预测。在此上下⽂中,“轮
allow for multi turn inference / prediction based
次”被定义为交互系统与代理之间的交互。(即 1 个输⼊事
the model, there is no management of session on user queries and decisions made in the
件/查询和 1 个代理响应)
history or continuous context. (i.e. chat history) orchestration layer. In this context, a ‘turn’ is
defined as an interaction between the interacting
system and the agent. (i.e. 1 incoming event/
query and 1 agent response)
没有本地⼯具实现。
No native tool implementation. ⼯具在代理架构中是原⽣实现的。
Tools are natively implemented in agent
architecture.
未实现本地逻辑层。⽤户可以
No native logic layer implemented. Users can 使⽤推理的本⼟认知架构
Native cognitive architecture that uses reasoning
将提示形式化为简单问题或使⽤推理框架(CoT、
form prompts as simple questions or use 像 CoT、ReAct
frameworks 或其他预构建代理框架如
like CoT, ReAct, or otherLangChain
pre-built
ReAct 等)来
reasoning frameworks (CoT, ReAct, etc.) to 的框架。
agent frameworks like LangChain.
形成复杂的提示以引导模型进⾏预测。
form complex prompts to guide the model in
prediction.
认知架构:代理如何运作
Cognitive architectures: How agents operate
想象⼀下⼀个忙碌厨房⾥的厨师。他们的⽬标是为餐厅制作美味的菜肴。
Imagine a chef in a busy kitchen. Their goal is to create delicious dishes for restaurant
涉及⼀些规划、执⾏和调整周期的赞助者。
patrons which involves some cycle of planning, execution, and adjustment.
2024 年 9 ⽉ 2024
September 88
Agents
• They gather information, like the patron’s order and what ingredients are in the pantry
and refrigerator.
• They perform some internal reasoning about what dishes and flavor profiles they can
create based on the information they have just gathered.
• They take action to create the dish: chopping vegetables, blending spices, searing meat.
At each stage in the process the chef makes adjustments as needed, refining their plan as
ingredients are depleted or customer feedback is received, and uses the set of previous
outcomes to determine the next plan of action. This cycle of information intake, planning,
executing, and adjusting describes a unique cognitive architecture that the chef employs to
reach their goal.
Just like the chef, agents can use cognitive architectures to reach their end goals by
iteratively processing information, making informed decisions, and refining next actions
based on previous outputs. At the core of agent cognitive architectures lies the orchestration
layer, responsible for maintaining memory, state, reasoning and planning. It uses the rapidly
evolving field of prompt engineering and associated frameworks to guide reasoning and
planning, enabling the agent to interact more effectively with its environment and complete
tasks. Research in the area of prompt engineering frameworks and task planning for
language models is rapidly evolving, yielding a variety of promising approaches. While not an
exhaustive list, these are a few of the most popular frameworks and reasoning techniques
available at the time of this publication:
• ReAct, a prompt engineering framework that provides a thought process strategy for
language models to Reason and take action on a user query, with or without in-context
examples. ReAct prompting has shown to outperform several SOTA baselines and improve
human interoperability and trustworthiness of LLMs.
September 2024 9
代理⼈
Agents
•• 他们收集信息,⽐如顾客的订单和储藏室⾥的⻝材
They gather information, like the patron’s order and what ingredients are in the pantry
和冰箱。
and refrigerator.
•• 他们进⾏⼀些内部推理,关于他们可以提供的菜肴和⻛味特征
They perform some internal reasoning about what dishes and flavor profiles they can
根据他们刚刚收集的信息创建。
create based on the information they have just gathered.
•• 他们采取⾏动来制作菜肴:切菜、调配⾹料、煎⾁。
They take action to create the dish: chopping vegetables, blending spices, searing meat.
在过程的每个阶段,厨师根据需要进⾏调整,完善他们的计划
At each stage in the process the chef makes adjustments as needed, refining their plan as
当原料耗尽或收到客户反馈时,使⽤之前的设置
ingredients are depleted or customer feedback is received, and uses the set of previous
结果以确定下⼀步⾏动计划。这个信息获取、规划的循环,
outcomes to determine the next plan of action. This cycle of information intake, planning,
执⾏和调整描述了厨师所采⽤的⼀种独特的认知架构
executing, and adjusting describes a unique cognitive architecture that the chef employs to
达到他们的⽬标。
reach their goal.
就像厨师⼀样,代理可以使⽤认知架构来实现他们的最终⽬标
Just like the chef, agents can use cognitive architectures to reach their end goals by
迭代处理信息,做出明智的决策,并优化下⼀步⾏动
iteratively processing information, making informed decisions, and refining next actions
基于之前的输出。在代理认知架构的核⼼是编排
based on previous outputs. At the core of agent cognitive architectures lies the orchestration
层,负责维护记忆、状态、推理和规划。它使⽤快速
layer, responsible for maintaining memory, state, reasoning and planning. It uses the rapidly
不断发展的提示⼯程领域及相关框架,以指导推理和
evolving field of prompt engineering and associated frameworks to guide reasoning and
规划,使代理能够更有效地与其环境互动并完成
planning, enabling the agent to interact more effectively with its environment and complete
任务。研究提示⼯程框架和任务规划领域的内容。
tasks. Research in the area of prompt engineering frameworks and task planning for
语⾔模型正在迅速发展,产⽣了多种有前景的⽅法。虽然不是⼀个
language models is rapidly evolving, yielding a variety of promising approaches. While not an
详尽的列表,这些是⼀些最受欢迎的框架和推理技术
exhaustive list, these are a few of the most popular frameworks and reasoning techniques
在本出版物发布时可⽤:
available at the time of this publication:
•• ReAct,⼀个提供思维过程策略的提示⼯程框架
ReAct, a prompt engineering framework that provides a thought process strategy for
语⾔模型根据⽤户查询进⾏推理和采取⾏动,⽆论是否在上下⽂中
language models to Reason and take action on a user query, with or without in-context
示例。ReAct 提示已显示出优于多个
examples. ReAct SOTA 基准并改善
prompting has shown to outperform several SOTA baselines and improve
⼈类互操作性和LLMs的可信度。
human interoperability and trustworthiness of LLMs.
2024 年 9 ⽉ 2024
September 99
Agents
Agents can utilize one of the above reasoning techniques, or many other techniques, to
choose the next best action for the given user request. For example, let’s consider an agent
that is programmed to use the ReAct framework to choose the correct actions and tools for
the user query. The sequence of events might go something like this:
3. The agent provides a prompt to the model, asking it to generate one of the next ReAct
steps and its corresponding output:
a. Question: The input question from the user query, provided with the prompt
ii. For example, an action could be one of [Flights, Search, Code, None], where the first
3 represent a known tool that the model can choose, and the last represents “no
tool choice”
September 2024 10
代理⼈
Agents
•• 链式思维(CoT),⼀种促进推理的提示⼯程框架
Chain-of-Thought (CoT), a prompt engineering framework that enables reasoning
通过中间步骤的能⼒。CoT 有多种⼦技术,包括
capabilities through intermediate steps. There are various sub-techniques of CoT including
⾃⼀致性、主动提示和多模态 CoT,各⾃具有优势和
self-consistency, active-prompt, and multimodal CoT that each have strengths and
根据具体应⽤的弱点。
weaknesses depending on the specific application.
•• 思维树(ToT),⼀个适合的提示⼯程框架
Tree-of-thoughts (ToT),, a prompt engineering framework that is well suited for
探索或战略前瞻任务。它对思维链提示进⾏了概括。
exploration or strategic lookahead tasks. It generalizes over chain-of-thought prompting
并允许模型探索作为中间步骤的各种思维链
and allows the model to explore various thought chains that serve as intermediate steps
⽤于语⾔模型的⼀般问题解决。
for general problem solving with language models.
代理可以利⽤上述推理技术之⼀,或许多其他技术,来
Agents can utilize one of the above reasoning techniques, or many other techniques, to
为给定⽤户请求选择下⼀个最佳⾏动。例如,让我们考虑⼀个代理
choose the next best action for the given user request. For example, let’s consider an agent
该程序被编程为使⽤ ReAct
that is programmed 框架来选择正确的⾏动和⼯具
to use the ReAct framework to choose the correct actions and tools for
⽤户查询。事件的顺序可能是这样的:
the user query. The sequence of events might go something like this:
1. ⽤户向代理发送查询
1. User sends query to the agent
2. 代理开始
2. ReActthe
Agent begins 序列 ReAct sequence
3. 代理向模型提供提示,要求其⽣成下⼀个
3. ReAct 之⼀
The agent provides a prompt to the model, asking it to generate one of the next ReAct
步骤及其相应输出:
steps and its corresponding output:
a.
a. 问题:⽤户查询中提供的输⼊问题,附带提示
Question: The input question from the user query, provided with the prompt
b.
b. 思考:模型关于接下来应该做什么的想法
Thought: The model’s thoughts about what it should do next
c.
c. ⾏动:模型对接下来采取何种⾏动的决策
Action: The model’s decision on what action to take next
i. 这就是⼯具选择可以发⽣的地⽅
This is where tool choice can occur
2024 年 9 ⽉ 2024
September 10
10
Agents
d. Action input: The model’s decision on what inputs to provide to the tool (if any)
i. This thought / action / action input / observation could repeat N-times as needed
f. Final answer: The model’s final answer to provide to the original user query
4. The ReAct loop concludes and a final answer is provided back to the user
As shown in Figure 2, the model, tools, and agent configuration work together to provide
a grounded, concise response back to the user based on the user’s original query. While
the model could have guessed at an answer (hallucinated) based on its prior knowledge,
it instead used a tool (Flights) to search for real-time external information. This additional
information was provided to the model, allowing it to make a more informed decision based
on real factual data and to summarize this information back to the user.
September 2024 11
代理⼈
Agents
d.
d. ⾏动输⼊:模型对提供给⼯具的输⼊(如果有的话)的决策
Action input: The model’s decision on what inputs to provide to the tool (if any)
e.
e. 观察:动作/动作输⼊序列的结果
Observation: The result of the action / action input sequence
i. 这个想法 / ⾏动 //action
This thought ⾏动输⼊ / 观察可以根据需要重复
/ action N次
input / observation could repeat N-times as needed
最终答案:模型对原始⽤户查询的最终答案
f. Final answer: The model’s final answer to provide to the original user query
4. ReAct
4. 循环结束,最终答案返回给⽤户
The ReAct loop concludes and a final answer is provided back to the user
图 2. 在编排层中具有
Figure 2. ExampleReAct
agent推理的示例代理
with ReAct reasoning in the orchestration layer
如图 2 所示,模型、⼯具和代理配置协同⼯作以提供
As shown in Figure 2, the model, tools, and agent configuration work together to provide
根据⽤户的原始查询,给出⼀个切实、简洁的回复。虽然
a grounded, concise response back to the user based on the user’s original query. While
模型可能基于其先前的知识猜测了⼀个答案(幻觉)
the model could have guessed at an answer (hallucinated) based on its prior knowledge,
它改为使⽤⼀个⼯具(Flights)来搜索实时外部信息。这个额外的
it instead used a tool (Flights) to search for real-time external information. This additional
信息被提供给模型,使其能够做出更明智的决策
information was provided to the model, allowing it to make a more informed decision based
基于真实的事实数据,并将这些信息总结回⽤户。
on real factual data and to summarize this information back to the user.
2024 年 9 ⽉ 2024
September 11
Agents
In summary, the quality of agent responses can be tied directly to the model’s ability to
reason and act about these various tasks, including the ability to select the right tools, and
how well that tools has been defined. Like a chef crafting a dish with fresh ingredients and
attentive to customer feedback, agents rely on sound reasoning and reliable information to
deliver optimal results. In the next section, we’ll dive into the various ways agents connect
with fresh data.
While they go by many names, tools are what create a link between our foundational models
and the outside world. This link to external systems and data allows our agent to perform a
wider variety of tasks and do so with more accuracy and reliability. For instance, tools can
enable agents to adjust smart home settings, update calendars, fetch user information from
a database, or send emails based on a specific set of instructions.
As of the date of this publication, there are three primary tool types that Google models are
able to interact with: Extensions, Functions, and Data Stores. By equipping agents with tools,
we unlock a vast potential for them to not only understand the world but also act upon it,
opening doors to a myriad of new applications and possibilities.
September 2024 12
代理⼈
Agents
总之,代理响应的质量可以直接与模型的能⼒相关联
In summary, the quality of agent responses can be tied directly to the model’s ability to
关于这些各种任务的原因和⾏动,包括选择合适⼯具的能⼒,以及
reason and act about these various tasks, including the ability to select the right tools, and
这个⼯具定义得多么好。就像⼀个厨师⽤新鲜的⻝材制作菜肴⼀样。
how well that tools has been defined. Like a chef crafting a dish with fresh ingredients and
关注客户反馈,代理商依靠合理的推理和可靠的信息来
attentive to customer feedback, agents rely on sound reasoning and reliable information to
提供最佳结果。在下⼀部分,我们将深⼊探讨代理连接的各种⽅式。
deliver optimal results. In the next section, we’ll dive into the various ways agents connect
使⽤新数据。
with fresh data.
⼯具:我们通往外部世界的钥匙
Tools: Our keys to the outside world
虽然语⾔模型在处理信息⽅⾯表现出⾊,但它们缺乏直接的能⼒
While language models excel at processing information, they lack the ability to directly
感知和影响现实世界。这限制了它们在需要的情况下的实⽤性。
perceive and influence the real world. This limits their usefulness in situations requiring
与外部系统或数据的交互。这意味着,从某种意义上说,语⾔模型
interaction with external systems or data. This means that, in a sense, a language model
仅仅取决于它从训练数据中学到的内容。但⽆论多少
is only as good as what it has learned from its training data. But regardless of how much
我们投给模型的数据,他们仍然缺乏与外界互动的基本能⼒
data we throw at a model, they still lack the fundamental ability to interact with the outside
世界。那么我们如何能够使我们的模型具备实时、上下⽂感知的互动能⼒呢?
world. So how can we empower our models to have real-time, context-aware interaction with
外部系统?功能、扩展、数据存储和插件都是提供这⼀点的⽅式
external systems? Functions, Extensions, Data Stores and Plugins are all ways to provide this
对模型的关键能⼒。
critical capability to the model.
虽然它们有很多名称,但⼯具是连接我们基础模型的桥梁
While they go by many names, tools are what create a link between our foundational models
与外部世界的联系。这个与外部系统和数据的链接使我们的代理能够执⾏⼀个
and the outside world. This link to external systems and data allows our agent to perform a
更⼴泛的任务种类,并且能够以更⾼的准确性和可靠性完成。例如,⼯具可以
wider variety of tasks and do so with more accuracy and reliability. For instance, tools can
使代理能够调整智能家居设置、更新⽇历、获取⽤户信息
enable agents to adjust smart home settings, update calendars, fetch user information from
⼀个数据库,或根据特定指令发送电⼦邮件。
a database, or send emails based on a specific set of instructions.
截⾄本出版物的⽇期,Google 模型有三种主要⼯具类型
As of the date of this publication, there are three primary tool types that Google models are
能够与:Extensions、Functions 和 Data
able to interact with: Extensions, Stores 进⾏交互。通过为代理配备⼯具,
Functions, and Data Stores. By equipping agents with tools,
我们为他们解锁了巨⼤的潜⼒,不仅能够理解世界,还能够对其采取⾏动,
we unlock a vast potential for them to not only understand the world but also act upon it,
为⽆数新应⽤和可能性打开⼤⻔。
opening doors to a myriad of new applications and possibilities.
2024 年 9 ⽉ 2024
September 12
12
Agents
Extensions
The easiest way to understand Extensions is to think of them as bridging the gap between
an API and an agent in a standardized way, allowing agents to seamlessly execute APIs
regardless of their underlying implementation. Let’s say that you’ve built an agent with a goal
of helping users book flights. You know that you want to use the Google Flights API to retrieve
flight information, but you’re not sure how you’re going to get your agent to make calls to this
API endpoint.
One approach could be to implement custom code that would take the incoming user query,
parse the query for relevant information, then make the API call. For example, in a flight
booking use case a user might state “I want to book a flight from Austin to Zurich.” In this
scenario, our custom code solution would need to extract “Austin” and “Zurich” as relevant
entities from the user query before attempting to make the API call. But what happens if the
user says “I want to book a flight to Zurich” and never provides a departure city? The API call
would fail without the required data and more code would need to be implemented in order
to catch edge and corner cases like this. This approach is not scalable and could easily break
in any scenario that falls outside of the implemented custom code.
September 2024 13
代理⼈
Agents
扩展
Extensions
理解扩展的最简单⽅法是将它们视为弥合之间的差距
The easiest way to understand Extensions is to think of them as bridging the gap between
以标准化的⽅式使⽤ APIin和代理,使代理能够⽆缝执⾏
an API and an agent API agents to seamlessly execute APIs
a standardized way, allowing
⽆论其底层实现如何。假设你已经构建了⼀个⽬标为
regardless of their underlying implementation. Let’s say that you’ve built an agent with a goal
帮助⽤户预订航班。你知道你想使⽤
of helping users book flights. YouGoogle Flights
know that youAPI 来检索
want to use the Google Flights API to retrieve
航班信息,但你不确定如何让你的代理打电话给这个
flight information, but you’re not sure how you’re going to get your agent to make calls to this
API
API端点。
endpoint.
图 3. 代理如何与外部
Figure API 交互?
3. How do Agents interact with External APIs?
⼀种⽅法是实现⾃定义代码,以处理传⼊的⽤户查询,
One approach could be to implement custom code that would take the incoming user query,
解析查询以获取相关信息,然后进⾏ API 调⽤。例如,在航班中
parse the query for relevant information, then make the API call. For example, in a flight
预订⽤例,⽤户可能会说:“我想预订⼀趟从奥斯丁到苏黎世的航班。”在此
booking use case a user might state “I want to book a flight from Austin to Zurich.” In this
在这种情况下,我们的⾃定义代码解决⽅案需要提取“Austin”和“Zurich”作为相关内容
scenario, our custom code solution would need to extract “Austin” and “Zurich” as relevant
在尝试进⾏ APIthe
entities from 调⽤之前,从⽤户查询中提取实体。但如果发⽣什么情况,
user query before attempting to make the API call. But what happens if the
⽤户说“我想预订⼀趟⻜往苏黎世的航班”,但从未提供出发城市?API
user says “I want to book a flight to Zurich” and never provides调⽤
a departure city? The API call
如果没有所需的数据,将会失败,并且需要实现更多的代码
would fail without the required data and more code would need to be implemented in order
捕捉像这样的边缘和极端情况。这个⽅法不可扩展,可能很容易失效。
to catch edge and corner cases like this. This approach is not scalable and could easily break
在任何不属于已实施⾃定义代码的场景中。
in any scenario that falls outside of the implemented custom code.
2024 年 9 ⽉ 2024
September 13
13
Agents
A more resilient approach would be to use an Extension. An Extension bridges the gap
between an agent and an API by:
1. Teaching the agent how to use the API endpoint using examples.
2. Teaching the agent what arguments or parameters are needed to successfully call the
API endpoint.
Extensions can be crafted independently of the agent, but should be provided as part of the
agent’s configuration. The agent uses the model and examples at run time to decide which
Extension, if any, would be suitable for solving the user’s query. This highlights a key strength
of Extensions, their built-in example types, that allow the agent to dynamically select the
most appropriate Extension for the task.
September 2024 14
代理⼈
Agents
⼀种更具韧性的⽅法是使⽤扩展。扩展弥补了差距。
A more resilient approach would be to use an Extension. An Extension bridges the gap
通过代理和
between anAPI 之间:
agent and an API by:
1. 教代理如何使⽤
1. API 端点的示例。
Teaching the agent how to use the API endpoint using examples.
2. 教代理需要哪些参数或参数才能成功调⽤
2. Teaching the agent what arguments or parameters are needed to successfully call the
API
API端点。
endpoint.
图 4. 扩展将代理连接到外部
Figure API
4. Extensions connect Agents to External APIs
扩展可以独⽴于代理进⾏制作,但应作为的⼀部分提供
Extensions can be crafted independently of the agent, but should be provided as part of the
代理的配置。代理在运⾏时使⽤模型和示例来决定哪个
agent’s configuration. The agent uses the model and examples at run time to decide which
扩展(如有)将适合解决⽤户的查询。这突显了⼀个关键优势。
Extension, if any, would be suitable for solving the user’s query. This highlights a key strength
扩展的内置示例类型,使代理能够动态选择
of Extensions, their built-in example types, that allow the agent to dynamically select the
最适合该任务的扩展。
most appropriate Extension for the task.
图 5. 代理、扩展和
Figure API 之间的⼀对多关系
5. 1-to-many relationship between Agents, Extensions and APIs
2024 年 9 ⽉ 2024
September 14
14
Agents
Think of this the same way that a software developer decides which API endpoints to use
while solving and solutioning for a user’s problem. If the user wants to book a flight, the
developer might use the Google Flights API. If the user wants to know where the nearest
coffee shop is relative to their location, the developer might use the Google Maps API. In
this same way, the agent / model stack uses a set of known Extensions to decide which one
will be the best fit for the user’s query. If you’d like to see Extensions in action, you can try
them out on the Gemini application by going to Settings > Extensions and then enabling any
you would like to test. For example, you could enable the Google Flights extension then ask
Gemini “Show me flights from Austin to Zurich leaving next Friday.”
Sample Extensions
To simplify the usage of Extensions, Google provides some out of the box extensions that
can be quickly imported into your project and used with minimal configurations. For example,
the Code Interpreter extension in Snippet 1 allows you to generate and run Python code from
a natural language description.
September 2024 15
代理⼈
Agents
把这看作软件开发⼈员决定使⽤哪些
Think of this the same way that aAPI 端点的⽅式
software developer decides which API endpoints to use
在为⽤户的问题进⾏解决和⽅案制定时。如果⽤户想要预订航班,
while solving and solutioning for a user’s problem. If the user wants to book a flight, the
开发者可能会使⽤
developer might Google
use theFlights
Google API。如果⽤户想知道最近的
Flights API. If the user wants to know where the nearest
咖啡店与其位置相关,开发者可能会使⽤ Google
coffee shop is relative to their location, Maps API。
the developer might use the Google Maps API. In
以这种⽅式,代理/模型堆栈使⽤⼀组已知的扩展来决定使⽤哪⼀个
this same way, the agent / model stack uses a set of known Extensions to decide which one
将最适合⽤户查询的内容。如果您想查看扩展的实际效果,可以尝试
will be the best fit for the user’s query. If you’d like to see Extensions in action, you can try
在 Gemini
them 应⽤程序中通过转到设置
out on > 扩展,然后启⽤任何功能
the Gemini application by going to Settings > Extensions and then enabling any
您想要测试。例如,您可以启⽤ Google Flights
you would like to test. For example, 扩展,然后询问
you could enable the Google Flights extension then ask
双⼦座
Gemini“给我查⼀下下周五从奥斯丁到苏黎世的航班。”
“Show me flights from Austin to Zurich leaving next Friday.”
样本扩展
Sample Extensions
为了简化扩展的使⽤,⾕歌提供了⼀些开箱即⽤的扩展
To simplify the usage of Extensions, Google provides some out of the box extensions that
可以快速导⼊到您的项⽬中,并以最⼩的配置使⽤。例如,
can be quickly imported into your project and used with minimal configurations. For example,
代码解释器扩展在⽚段
the Code Interpreter 1extension
中允许您⽣成并运⾏in SnippetPython 代码
1 allows you to generate and run Python code from
⾃然语⾔描述。
a natural language description.
2024 年 9 ⽉ 2024
September 15
15
Agents
Python
import vertexai
import pprint
PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "us-central1"
vertexai.init(project=PROJECT_ID, location=REGION)
extension_code_interpreter = Extension.from_hub("code_interpreter")
CODE_QUERY = """Write a python method to invert a binary tree in O(n) time."""
response = extension_code_interpreter.execute(
operation_id = "generate_and_execute",
operation_params = {"query": CODE_QUERY}
)
print("Generated Code:")
pprint.pprint({response['generated_code']})
September 2024 16
代理⼈
Agents
Python
Python
导⼊ vertexai
import vertexai
import
导⼊ pprint
pprint
PROJECT_ID
PROJECT_ID= ="YOUR_PROJECT_ID"
"YOUR_PROJECT_ID"
REGION
REGION =="us-central1"
"us-central1"
vertexai.init(project=PROJECT_ID,
vertexai.init(project=PROJECT_ID, location=REGION)
location=REGION)
从 vertexai.preview.extensions
from 导⼊ Extension
vertexai.preview.extensions import Extension
extension_code_interpreter
extension_code_interpreter = Extension.from_hub("code_interpreter")
= Extension.from_hub("code_interpreter") CODE_QUERY = """编写
⼀个 Python ⽅法,以
CODE_QUERY O(n) 时间反转⼆叉树。"""
= """Write a python method to invert a binary tree in O(n) time."""
response= =extension_code_interpreter.execute(
response extension_code_interpreter.execute(
operation_id = "generate_and_execute",
operation_id = "generate_and_execute",
operation_params = {"query": CODE_QUERY} )
operation_params = {"query": CODE_QUERY}
)
print("Generated Code:")
打印("⽣成的代码:")
pprint.pprint({response['generated_code']})
pprint.pprint({response['generated_code']})
## 上⾯的代码⽚段将⽣成以下代码。
The above snippet will ``` ⽣成的代码:class
generate TreeNode:
the following code.
```
Generated Code:
class TreeNode:
def __init__(self,
def __init__(self, val=0,
val=0, left=None,
left=None, right=None):
right=None):
self.val = val self.left
self.val = val = left self.right = right
self.left = left
self.right = right
继续下⼀⻚...
Continues next page...
2024 年 9 ⽉ 2024
September 16
16
Agents
Python
def invert_binary_tree(root):
"""
Inverts a binary tree.
Args:
root: The root of the binary tree.
Returns:
The root of the inverted binary tree.
"""
if not root:
return None
return root
# Example usage:
# Construct a sample binary tree
root = TreeNode(4)
root.left = TreeNode(2)
root.right = TreeNode(7)
root.left.left = TreeNode(1)
root.left.right = TreeNode(3)
root.right.left = TreeNode(6)
root.right.right = TreeNode(9)
Snippet 1. Code Interpreter Extension can generate and run Python code
September 2024 17
代理⼈
Agents
Python
Python
def
def invert_binary_tree(root):
invert_binary_tree(root):
"""
反转⼆叉树。参数:
Inverts a binary tree.
Args:
root: The root of the binary tree.
根:⼆叉树的根。返回:
Returns:
The root """
反转⼆叉树的根。 of the inverted binary tree.
"""
if not root:
如果不是根节点:
NoneNone
return
返回
return root
返回根节点
Example #
## 示例⽤法: usage:
构造⼀个示例⼆叉树 root =
TreeNode(4)
# Constructroot.left
a sample= binary
TreeNode(2)
tree
root.right = TreeNode(7)
root = TreeNode(4)
root.left.left = TreeNode(1)
root.left = TreeNode(2)
root.left.right = TreeNode(3)
root.right = TreeNode(7)
root.right.left = TreeNode(6)
root.right.right = TreeNode(9)
root.left.left TreeNode(1)
root.left.right = TreeNode(3)
root.right.left = TreeNode(6)
root.right.right = TreeNode(9)
## 反转⼆叉树 inverted_root
Invert the =
binary tree
invert_binary_tree(root) ```
inverted_root = invert_binary_tree(root)
```
代码解释器扩展可以⽣成并运⾏
Snippet 1. Code InterpreterPython 代码 can generate and run Python code
Extension
2024 年 9 ⽉ 2024
September 17
17
Agents
To summarize, Extensions provide a way for agents to perceive, interact, and influence the
outside world in a myriad of ways. The selection and invocation of these Extensions is guided
by the use of Examples, all of which are defined as part of the Extension configuration.
Functions
Functions work very similarly in the world of agents, but we can replace the software
developer with a model. A model can take a set of known functions and decide when to use
each Function and what arguments the Function needs based on its specification. Functions
differ from Extensions in a few ways, most notably:
1. A model outputs a Function and its arguments, but doesn’t make a live API call.
Using our Google Flights example again, a simple setup for functions might look like the
example in Figure 7.
September 2024 18
代理⼈
Agents
总之,扩展为代理提供了⼀种感知、互动和影响的⽅式
To summarize, Extensions provide a way for agents to perceive, interact, and influence the
以多种⽅式影响外部世界。这些扩展的选择和调⽤是由
outside world in a myriad of ways. The selection and invocation of these Extensions is guided
通过使⽤示例,所有这些都被定义为扩展配置的⼀部分。
by the use of Examples, all of which are defined as part of the Extension configuration.
函数
Functions
在软件⼯程的世界中,函数被定义为⾃包含的模块
In the world of software engineering, functions are defined as self-contained modules
完成特定任务并可以根据需要重复使⽤的代码。当软件
of code that accomplish a specific task and can be reused as needed. When a software
开发者在编写程序时,通常会创建许多函数来执⾏各种任务。
developer is writing a program, they will often create many functions to do various tasks.
它们还将定义何时调⽤
They will also define function_a 与 when
the logic for function_b 的逻辑,以及
to call function_a versus function_b, as well as the
预期的输⼊和输出。
expected inputs and outputs.
在代理的世界中,函数的⼯作⽅式⾮常相似,但我们可以替换软件
Functions work very similarly in the world of agents, but we can replace the software
带有模型的开发者。模型可以接受⼀组已知函数并决定何时使⽤。
developer with a model. A model can take a set of known functions and decide when to use
每个函数及其根据规范所需的参数。函数
each Function and what arguments the Function needs based on its specification. Functions
在⼏个⽅⾯与扩展不同,最显著的是:
differ from Extensions in a few ways, most notably:
1. ⼀个模型输出⼀个函数及其参数,但不进⾏实时
1. API 调⽤。
A model outputs a Function and its arguments, but doesn’t make a live API call.
2. 功能在客户端执⾏,⽽扩展在
2. Functions are executed on the client-side, while Extensions are executed on
代理端。
the agent-side.
再次以我们的 GoogleFlights
Using our Google Flightsexample
示例为例,函数的简单设置可能看起来像这样
again, a simple setup for functions might look like the
图 7 中的示例。
example in Figure 7.
2024 年 9 ⽉ 2024
September 18
18
Agents
Note that the main difference here is that neither the Function nor the agent interact directly
with the Google Flights API. So how does the API call actually happen?
With functions, the logic and execution of calling the actual API endpoint is offloaded away
from the agent and back to the client-side application as seen in Figure 8 and Figure 9 below.
This offers the developer more granular control over the flow of data in the application. There
are many reasons why a Developer might choose to use functions over Extensions, but a few
common use cases are:
• API calls need to be made at another layer of the application stack, outside of the direct
agent architecture flow (e.g. a middleware system, a front end framework, etc.)
• Security or Authentication restrictions that prevent the agent from calling an API directly
(e.g API is not exposed to the internet, or non-accessible by agent infrastructure)
• Timing or order-of-operations constraints that prevent the agent from making API calls in
real-time. (i.e. batch operations, human-in-the-loop review, etc.)
September 2024 19
代理⼈
Agents
图 7. 函数如何与外部
Figure API 交互? interact with external APIs?
7. How do functions
请注意,这⾥的主要区别在于函数和代理都没有直接交互
Note that the main difference here is that neither the Function nor the agent interact directly
使⽤
with Google Flights
the Google API。那么
Flights APIhow
API. So 调⽤实际上是如何发⽣的呢?
does the API call actually happen?
通过函数,调⽤实际
With functions, theAPI 端点的逻辑和执⾏被转移出去
logic and execution of calling the actual API endpoint is offloaded away
从代理到客户端应⽤程序,如下图
from the agent and back to the8client-side
和图 9 所示。 application as seen in Figure 8 and Figure 9 below.
这为开发者提供了对应⽤程序中数据流的更细粒度控制。
This offers the developer more granular control over the flow of data in the application. There
有很多原因可以解释为什么开发者可能选择使⽤函数⽽不是扩展,但有⼀些
are many reasons why a Developer might choose to use functions over Extensions, but a few
常⻅的⽤例包括:
common use cases are:
•• API
API调⽤需要在应⽤程序堆栈的另⼀个层次上进⾏,位于直接之外
calls need to be made at another layer of the application stack, outside of the direct
代理架构流程(例如,中间件系统、前端框架等)
agent architecture flow (e.g. a middleware system, a front end framework, etc.)
•• 安全或身份验证限制,阻⽌代理直接调⽤ API
Security or Authentication restrictions that prevent the agent from calling an API directly
(例如,API 未暴露于互联⽹,或⽆法通过代理基础设施访问)
(e.g API is not exposed to the internet, or non-accessible by agent infrastructure)
•• 时间或操作顺序约束,阻⽌代理进⾏ API 调⽤ that prevent the agent from making API calls in
Timing or order-of-operations constraints
实时。(即批处理操作、⼈机协作审查等)
real-time. (i.e. batch operations, human-in-the-loop review, etc.)
2024 年 9 ⽉ 2024
September 19
19
Agents
• Additional data transformation logic needs to be applied to the API Response that the
agent cannot perform. For example, consider an API endpoint that doesn’t provide a
filtering mechanism for limiting the number of results returned. Using Functions on the
client-side provides the developer additional opportunities to make these transformations.
While the difference in internal architecture between the two approaches is subtle as seen in
Figure 8, the additional control and decoupled dependency on external infrastructure makes
Function Calling an appealing option for the Developer.
Figure 8. Delineating client vs. agent side control for extensions and function calling
September 2024 20
代理⼈
Agents
•• 开发者希望在不部署额外资源的情况下迭代代理开发
The developer wants to iterate on agent development without deploying additional
API 端点的基础设施(即函数调⽤可以像“存根”API
infrastructure ⼀样⼯作)
for the API endpoints (i.e. Function Calling can act like “stubbing” of APIs)
)While
尽管两种⽅法之间的内部架构差异如所⻅微妙
the difference in internal architecture between the two approaches is subtle as seen in
图 8,额外的控制和对外部基础设施的解耦依赖使得
Figure 8, the additional control and decoupled dependency on external infrastructure makes
函数调⽤是开发者的⼀个吸引⼈的选择。
Function Calling an appealing option for the Developer.
图 8. 划分客户端与代理端控制的扩展和函数调⽤
Figure 8. Delineating client vs. agent side control for extensions and function calling
2024 年 9 ⽉ 2024
September 20
20
Agents
Use cases
A model can be used to invoke functions in order to handle complex, client-side execution
flows for the end user, where the agent Developer might not want the language model to
manage the API execution (as is the case with Extensions). Let’s consider the following
example where an agent is being trained as a travel concierge to interact with users that want
to book vacation trips. The goal is to get the agent to produce a list of cities that we can use
in our middleware application to download images, data, etc. for the user’s trip planning. A
user might say something like:
I’d like to take a ski trip with my family but I’m not sure where to go.
In a typical prompt to the model, the output might look like the following:
Sure, here’s a list of cities that you can consider for family ski trips:
• Zermatt, Switzerland
While the above output contains the data that we need (city names), the format isn’t ideal
for parsing. With Function Calling, we can teach a model to format this output in a structured
style (like JSON) that’s more convenient for another system to parse. Given the same input
prompt from the user, an example JSON output from a Function might look like Snippet
5 instead.
September 2024 21
代理⼈
Agents
使⽤案例
Use cases
可以使⽤模型来调⽤函数,以处理复杂的客户端执⾏
A model can be used to invoke functions in order to handle complex, client-side execution
最终⽤户的流程,其中代理开发者可能不希望语⾔模型去
flows for the end user, where the agent Developer might not want the language model to
管理 API 执⾏(与扩展的情况相同)。让我们考虑以下内容
manage the API execution (as is the case with Extensions). Let’s consider the following
示例,其中⼀个代理被训练为旅⾏礼宾,与想要互动的⽤户交流
example where an agent is being trained as a travel concierge to interact with users that want
预订度假旅⾏。⽬标是让代理⽣成我们可以使⽤的城市列表。
to book vacation trips. The goal is to get the agent to produce a list of cities that we can use
在我们的中间件应⽤程序中下载图像、数据等,以便⽤户进⾏旅⾏规划。
in our middleware application to download images, data, etc. for the user’s trip planning. A
⽤户可能会说类似于:
user might say something like:
我想和家⼈⼀起去滑雪,但我不确定去哪⾥。
I’d like to take a ski trip with my family but I’m not sure where to go.
在对模型的典型提示中,输出可能如下所示:
In a typical prompt to the model, the output might look like the following:
当然,这⾥有⼀份适合家庭滑雪旅⾏的城市列表:
Sure, here’s a list of cities that you can consider for family ski trips:
•• 科罗拉多州克雷斯特德⽐特,美国
Crested Butte, Colorado, USA
•• 惠斯勒,加拿⼤不列颠哥伦⽐亚省
Whistler, BC, Canada
•• 瑞⼠采尔⻢特
Zermatt, Switzerland
虽然上述输出包含我们需要的数据(城市名称),但格式并不理想
While the above output contains the data that we need (city names), the format isn’t ideal
⽤于解析。通过函数调⽤,我们可以教模型以结构化的⽅式格式化此输出。
for parsing. With Function Calling, we can teach a model to format this output in a structured
更⽅便另⼀个系统解析的样式(如 JSON)。给定相同的输⼊
style (like JSON) that’s more convenient for another system to parse. Given the same input
⽤户的提示,来⾃函数的示例 JSON 输出可能看起来像
prompt from the user, an example Snippet
JSON output from a Function might look like Snippet
55 代替。
instead.
2024 年 9 ⽉ 2024
September 21
21
Agents
Unset
function_call {
name: "display_cities"
args: {
"cities": ["Crested Butte", "Whistler", "Zermatt"],
"preferences": "skiing"
}
}
Snippet 5. Sample Function Call payload for displaying a list of cities and user preferences
This JSON payload is generated by the model, and then sent to our Client-side server to do
whatever we would like to do with it. In this specific case, we’ll call the Google Places API to
take the cities provided by the model and look up Images, then provide them as formatted
rich content back to our User. Consider this sequence diagram in Figure 9 showing the above
interaction in step by step detail.
September 2024 22
代理⼈
Agents
未设置
Unset
function_call
function_call{ { name:
"display_cities" args: {
name: "display_cities"
args: {
"城市": ["克雷斯特德·巴特",
"cities": ["Crested"惠斯勒",
Butte","采尔⻢特"], "偏好":
"Whistler", "滑
"Zermatt"],
雪" }"preferences":
} "skiing"
}
}
⽚段 5. 显示城市列表和⽤户偏好的示例函数调⽤有效负载
Snippet 5. Sample Function Call payload for displaying a list of cities and user preferences
此 JSON
This 负载由模型⽣成,然后发送到我们的客户端服务器进⾏处理
JSON payload is generated by the model, and then sent to our Client-side server to do
我们可以对它做任何我们想做的事情。在这个特定的例⼦中,我们将调⽤
whatever we would like to do with it. In this specific case, we’ll callGoogle PlacesPlaces
the Google API 来 API to
获取模型提供的城市并查找图像,然后将其格式化后提供
take the cities provided by the model and look up Images, then provide them as formatted
将丰富的内容返回给我们的⽤户。请考虑图
rich content back to our User. Consider9this
中的此序列图,显示上述内容。
sequence diagram in Figure 9 showing the above
逐步详细的互动。
interaction in step by step detail.
2024 年 9 ⽉ 2024
September 22
22
Agents
The result of the example in Figure 9 is that the model is leveraged to “fill in the blanks” with
the parameters required for the Client side UI to make the call to the Google Places API. The
Client side UI manages the actual API call using the parameters provided by the model in the
returned Function. This is just one use case for Function Calling, but there are many other
scenarios to consider like:
• You want a language model to suggest a function that you can use in your code, but you
don't want to include credentials in your code. Because function calling doesn't run the
function, you don't need to include credentials in your code with the function information.
September 2024 23
代理⼈
Agents
图 9. 显示函数调⽤⽣命周期的序列图
Figure 9. Sequence diagram showing the lifecycle of a Function Call
图 9 中的示例结果是模型被⽤来“填补空⽩”与
The result of the example in Figure 9 is that the model is leveraged to “fill in the blanks” with
客户端 UI 调⽤ Google
the parameters Places
required for API
the 所需的参数。
Client side UI to make the call to the Google Places API. The
客户端 UI 使⽤模型提供的参数管理实际的
Client side UI manages the actual APIAPI
call调⽤
using the parameters provided by the model in the
返回函数。这只是函数调⽤的⼀个⽤例,但还有许多其他⽤例。
returned Function. This is just one use case for Function Calling, but there are many other
需要考虑的场景,例如:
scenarios to consider like:
•• 您希望⼀个语⾔模型建议⼀个可以在您的代码中使⽤的函数,但您
You want a language model to suggest a function that you can use in your code, but you
不想在代码中包含凭据。因为函数调⽤不会运⾏
don't want to include credentials in your code. Because function calling doesn't run the
函数,您不需要在代码中包含凭据与函数信息。
function, you don't need to include credentials in your code with the function information.
2024 年 9 ⽉ 2024
September 23
23
Agents
• You are running asynchronous operations that can take more than a few seconds. These
scenarios work well with function calling because it's an asynchronous operation.
• You want to run functions on a device that's different from the system producing the
function calls and their arguments.
One key thing to remember about functions is that they are meant to offer the developer
much more control over not only the execution of API calls, but also the entire flow of data
in the application as a whole. In the example in Figure 9, the developer chose to not return
API information back to the agent as it was not pertinent for future actions the agent might
take. However, based on the architecture of the application, it may make sense to return the
external API call data to the agent in order to influence future reasoning, logic, and action
choices. Ultimately, it is up to the application developer to choose what is right for the
specific application.
To achieve the above output from our ski vacation scenario, let’s build out each of the
components to make this work with our gemini-1.5-flash-001 model.
September 2024 24
代理⼈
Agents
•• 您正在运⾏可能需要超过⼏秒钟的异步操作。这些
You are running asynchronous operations that can take more than a few seconds. These
场景与函数调⽤配合良好,因为这是⼀种异步操作。
scenarios work well with function calling because it's an asynchronous operation.
•• 您想在与⽣成该系统不同的设备上运⾏函数
You want to run functions on a device that's different from the system producing the
函数调⽤及其参数。
function calls and their arguments.
关于函数,有⼀点关键的事情需要记住,那就是它们旨在为开发者提供
One key thing to remember about functions is that they are meant to offer the developer
不仅对 API 调⽤的执⾏有更多控制,还对整个数据流有更多控制
much more control over not only the execution of API calls, but also the entire flow of data
在整个应⽤程序中。在图 9 的示例中,开发者选择不返回
in the application as a whole. In the example in Figure 9, the developer chose to not return
API
API信息返回给代理,因为它与代理可能采取的未来⾏动⽆关
information back to the agent as it was not pertinent for future actions the agent might
采取。然⽽,根据应⽤程序的架构,返回可能是有意义的。
take. However, based on the architecture of the application, it may make sense to return the
将外部 APIAPI
external 调⽤数据传递给代理,以影响未来的推理、逻辑和⾏动
call data to the agent in order to influence future reasoning, logic, and action
选择。最终,应⽤程序开发者需要选择什么是适合的。
choices. Ultimately, it is up to the application developer to choose what is right for the
特定应⽤。
specific application.
函数示例代码
Function sample code
为了从我们的滑雪假期场景中实现上述输出,让我们构建每⼀个
To achieve the above output from our ski vacation scenario, let’s build out each of the
使其与我们的
components gemini-1.5-flash-001 模型⼀起⼯作的组件。
to make this work with our gemini-1.5-flash-001 model.
⾸先,我们将把 display_cities
First, we’ll define 函数定义为⼀个简单的
our display_cities function as aPython
simple⽅法。
Python method.
2024 年 9 ⽉ 2024
September 24
24
Agents
Python
Args:
preferences (str): The user's preferences for the search, like skiing,
beach, restaurants, bbq, etc.
cities (list[str]): The list of cities being recommended to the user.
Returns:
list[str]: The list of cities being recommended to the user.
"""
return cities
Snippet 6. Sample python method for a function that will display a list of cities.
Next, we’ll instantiate our model, build the Tool, then pass in our user’s query and tools to
the model. Executing the code below would result in the output as seen at the bottom of the
code snippet.
September 2024 25
代理⼈
Agents
Python
Python
def
def display_cities(cities:
display_cities(cities:list[str], preferences:
list[str], Optional[str]
preferences: = None): = None):
Optional[str]
"""Provides a list of cities based on the user's search query and preferences.
提供基于⽤户搜索查询和偏好的城市列表。
参数:preferences(str):⽤户的搜索偏好,如滑雪、海滩、餐厅、烧烤等。
Args:
preferences (str): The user's preferences for the search, like skiing,
beach, restaurants, bbq, etc.
城市 (list[str]):
cities 推荐给⽤户的城市列表。
(list[str]): The list of cities being recommended to the user.
返回城市
return cities
⽚段 6. 显示城市列表的函数的示例
Snippet Pythonfor
6. Sample python method ⽅法。a function that will display a list of cities.
接下来,我们将实例化我们的模型,构建⼯具,然后传⼊⽤户的查询和⼯具
Next, we’ll instantiate our model, build the Tool, then pass in our user’s query and tools to
模型。执⾏下⾯的代码将产⽣如底部所示的输出。
the model. Executing the code below would result in the output as seen at the bottom of the
代码⽚段。
code snippet.
2024 年 9 ⽉ 2024
September 25
25
Agents
Python
model = GenerativeModel("gemini-1.5-flash-001")
display_cities_function = FunctionDeclaration.from_func(display_cities)
tool = Tool(function_declarations=[display_cities_function])
message = "I’d like to take a ski trip with my family but I’m not sure where
to go."
Snippet 7. Building a Tool, sending to the model with a user query and allowing the function call to take place
September 2024 26
代理⼈
Agents
Python
Python
从 vertexai.generative_models
from 导⼊ GenerativeModel、Tool、FunctionDeclaration
vertexai.generative_models import GenerativeModel, Tool, FunctionDeclaration
model
model ==GenerativeModel("gemini-1.5-flash-001")
GenerativeModel("gemini-1.5-flash-001")
display_cities_function
display_cities_function = FunctionDeclaration.from_func(display_cities) tool =
= FunctionDeclaration.from_func(display_cities)
Tool(function_declarations=[display_cities_function])
tool = Tool(function_declarations=[display_cities_function])
message
message= ="我想和家⼈⼀起去滑雪,但我不确定去哪⾥。"
"I’d like to take a ski trip with my family but I’m not sure where
to go."
res
res == model.generate_content(message,
model.generate_content(message,tools=[tool])
tools=[tool])
打印(f"函数名称: {res.candidates[0].content.parts[0].function_call.name}")
print(f"Function 打印(f"函数参数:
Name: {res.candidates[0].content.parts[0].function_call.name}")
{res.candidates[0].content.parts[0].function_call.args}")
print(f"Function Args: {res.candidates[0].content.parts[0].function_call.args}")
>> 函数名称:display_cities
Function Name: display_cities
> '滑雪', 'cities':
> Function Args: {'preferences':
函数参数:{'preferences': ['阿斯彭',
'skiing', '维尔', '公园城']}
'cities': ['Aspen', 'Vail',
'Park City']}
⽚段 7. 构建⼀个⼯具,向模型发送⽤户查询并允许函数调⽤发⽣
Snippet 7. Building a Tool, sending to the model with a user query and allowing the function call to take place
总之,函数提供了⼀个简单明了的框架,使应⽤程序能够
In summary, functions offer a straightforward framework that empowers application
开发⼈员可以对数据流和系统执⾏进⾏细粒度控制,同时有效地
developers with fine-grained control over data flow and system execution, while effectively
利⽤代理/模型进⾏关键输⼊⽣成。开发者可以选择性地选择
leveraging the agent/model for critical input generation. Developers can selectively choose
是否通过返回外部数据来让代理“保持在循环中”,或根据情况省略它
whether to keep the agent “in the loop” by returning external data, or omit it based on
特定应⽤架构要求。
specific application architecture requirements.
2024 年 9 ⽉ 2024
September 26
26
Agents
Data stores
Imagine a language model as a vast library of books, containing its training data. But unlike
a library that continuously acquires new volumes, this one remains static, holding only the
knowledge it was initially trained on. This presents a challenge, as real-world knowledge is
constantly evolving. Data Stores address this limitation by providing access to more dynamic
and up-to-date information, and ensuring a model’s responses remain grounded in factuality
and relevance.
Consider a common scenario where a developer might need to provide a small amount of
additional data to a model, perhaps in the form of spreadsheets or PDFs.
Figure 10. How can Agents interact with structured and unstructured data?
September 2024 27
代理⼈
Agents
数据存储
Data stores
想象⼀个语⾔模型就像⼀个庞⼤的图书馆,⾥⾯包含了它的训练数据。但与
Imagine a language model as a vast library of books, containing its training data. But unlike
⼀个不断获取新书籍的图书馆,⽽这个则保持静态,仅保留了
a library that continuously acquires new volumes, this one remains static, holding only the
知识它最初训练的。这带来了⼀个挑战,因为现实世界的知识是
knowledge it was initially trained on. This presents a challenge, as real-world knowledge is
不断发展。数据存储通过提供对更动态的访问来解决这⼀限制。
constantly evolving. Data Stores address this limitation by providing access to more dynamic
并且确保信息是最新的,并确保模型的响应保持在事实基础上
and up-to-date information, and ensuring a model’s responses remain grounded in factuality
和相关性。
and relevance.
考虑⼀个常⻅场景,开发者可能需要提供少量的
Consider a common scenario where a developer might need to provide a small amount of
将额外数据提供给模型,可能以电⼦表格或 PDFform
additional data to a model, perhaps in the 的形式。of spreadsheets or PDFs.
图 10. 代理如何与结构化和⾮结构化数据互动?
Figure 10. How can Agents interact with structured and unstructured data?
2024 年 9 ⽉ 2024
September 27
27
Agents
Data Stores allow developers to provide additional data in its original format to an agent,
eliminating the need for time-consuming data transformations, model retraining, or fine-
tuning. The Data Store converts the incoming document into a set of vector database
embeddings that the agent can use to extract the information it needs to supplement its next
action or response to the user.
Figure 11. Data Stores connect Agents to new real-time data sources of various types.
In the context of Generative AI agents, Data Stores are typically implemented as a vector
database that the developer wants the agent to have access to at runtime. While we won’t
cover vector databases in depth here, the key point to understand is that they store data
in the form of vector embeddings, a type of high-dimensional vector or mathematical
representation of the data provided. One of the most prolific examples of Data Store usage
with language models in recent times has been the implementation of Retrieval Augmented
September 2024 28
代理⼈
Agents
数据存储允许开发者以原始格式向代理提供额外数据,
Data Stores allow developers to provide additional data in its original format to an agent,
消除对耗时的数据转换、模型重训练或微调的需求,
eliminating the need for time-consuming data transformations, model retraining, or fine-
调优。数据存储将传⼊的⽂档转换为⼀组向量数据库
tuning. The Data Store converts the incoming document into a set of vector database
代理可以使⽤的嵌⼊,以提取其所需的信息,以补充其下⼀个
embeddings that the agent can use to extract the information it needs to supplement its next
对⽤户的⾏动或响应。
action or response to the user.
图 11. 数据存储将代理连接到各种类型的新实时数据源。
Figure 11. Data Stores connect Agents to new real-time data sources of various types.
实施与应⽤
Implementation and application
在⽣成性⼈⼯智能代理的背景下,数据存储通常被实现为向量
In the context of Generative AI agents, Data Stores are typically implemented as a vector
开发者希望代理在运⾏时访问的数据库。虽然我们不会
database that the developer wants the agent to have access to at runtime. While we won’t
在这⾥深⼊探讨覆盖向量数据库,关键点是它们存储数据
cover vector databases in depth here, the key point to understand is that they store data
以向量嵌⼊的形式,⼀种⾼维向量或数学类型
in the form of vector embeddings, a type of high-dimensional vector or mathematical
所提供数据的表示。数据存储使⽤的最丰富的例⼦之⼀
representation of the data provided. One of the most prolific examples of Data Store usage
最近语⾔模型的⼀个发展是实现了检索增强
with language models in recent times has been the implementation of Retrieval Augmented
2024 年 9 ⽉ 2024
September 28
28
Agents
Generation (RAG) based applications. These applications seek to extend the breadth and
depth of a model’s knowledge beyond the foundational training data by giving the model
access to data in various formats like:
• Website content
• Structured Data in formats like PDF, Word Docs, CSV, Spreadsheets, etc.
Figure 12. 1-to-many relationship between agents and data stores, which can represent various types of
pre-indexed data
The underlying process for each user request and agent response loop is generally modeled
as seen in Figure 13.
1. A user query is sent to an embedding model to generate embeddings for the query
2. The query embeddings are then matched against the contents of the vector database
using a matching algorithm like SCaNN
3. The matched content is retrieved from the vector database in text format and sent back to
the agent
4. The agent receives both the user query and retrieved content, then formulates a response
or action
September 2024 29
代理⼈
Agents
基于⽣成(RAG)的应⽤程序。这些应⽤程序旨在扩展范围和
Generation (RAG) based applications. These applications seek to extend the breadth and
模型知识的深度超越基础训练数据,通过给模型
depth of a model’s knowledge beyond the foundational training data by giving the model
访问各种格式的数据,如:
access to data in various formats like:
•• ⽹站内容
Website content
•• PDF、Word ⽂档、CSV、电⼦表格等格式中的结构化数据。
Structured Data in formats like PDF, Word Docs, CSV, Spreadsheets, etc.
•• HTML、PDF、TXT
Unstructured Data等格式的⾮结构化数据。
in formats like HTML, PDF, TXT, etc.
图 12. 代理与数据存储之间的⼀对多关系,可以表示各种类型的
Figure 12. 1-to-many relationship between agents and data stores, which can represent various types of
预先索引的数据
pre-indexed data
每个⽤户请求和代理响应循环的基本过程通常被建模为
The underlying process for each user request and agent response loop is generally modeled
如图 13 所示。
as seen in Figure 13.
1. ⽤户查询被发送到嵌⼊模型,以⽣成查询的嵌⼊
1. A user query is sent to an embedding model to generate embeddings for the query
2. 查询嵌⼊随后与向量数据库的内容进⾏匹配
2. The query embeddings are then matched against the contents of the vector database
使⽤像
using aSCaNN 这样的匹配算法
matching algorithm like SCaNN
3. 匹配的内容以⽂本格式从向量数据库中检索并发送回
3. The matched content is retrieved from the vector database in text format and sent back to
代理
the agent
4. 代理接收⽤户查询和检索到的内容,然后制定响应
4. The agent receives both the user query and retrieved content, then formulates a response
或⾏动
or action
2024 年 9 ⽉ 2024
September 29
29
Agents
Figure 13. The lifecycle of a user request and agent response in a RAG based application
The end result is an application that allows the agent to match a user’s query to a known data
store through vector search, retrieve the original content, and provide it to the orchestration
layer and model for further processing. The next action might be to provide a final answer to
the user, or perform an additional vector search to further refine the results.
A sample interaction with an agent that implements RAG with ReAct reasoning/planning can
be seen in Figure 14.
September 2024 30
代理⼈
Agents
5. 最终响应已发送给⽤户
5. A final response is sent to the user
图 13. 基于
Figure 13.RAG
The的应⽤中⽤户请求和代理响应的⽣命周期
lifecycle of a user request and agent response in a RAG based application
最终结果是⼀个应⽤程序,允许代理将⽤户的查询与已知数据匹配
The end result is an application that allows the agent to match a user’s query to a known data
通过向量搜索存储,检索原始内容,并将其提供给编排
store through vector search, retrieve the original content, and provide it to the orchestration
进⼀步处理的层和模型。下⼀步可能是提供最终答案给
layer and model for further processing. The next action might be to provide a final answer to
⽤户,或执⾏额外的向量搜索以进⼀步优化结果。
the user, or perform an additional vector search to further refine the results.
与实现 RAGinteraction
A sample 的代理进⾏的示例交互,使⽤ ReAct
with an agent that 推理/规划可以
implements RAG with ReAct reasoning/planning can
如图 14 所示。
be seen in Figure 14.
2024 年 9 ⽉ 2024
September 30
30
Agents
September 2024 31
代理⼈
Agents
图 14. 基于
Figure 14.RAG 的应⽤示例
Sample RAG w/ ReAct
based 推理/规划 w/ ReAct reasoning/planning
application
2024 年 9 ⽉ 2024
September 31
31
Agents
Tools recap
To summarize, extensions, functions and data stores make up a few different tool types
available for agents to use at runtime. Each has their own purpose and they can be used
together or independently at the discretion of the agent developer.
September 2024 32
代理⼈
Agents
⼯具回顾
Tools recap
总⽽⾔之,扩展、函数和数据存储构成了⼏种不同的⼯具类型
To summarize, extensions, functions and data stores make up a few different tool types
可供代理在运⾏时使⽤。每个都有其⾃⼰的⽬的,可以被使⽤。
available for agents to use at runtime. Each has their own purpose and they can be used
由代理开发者⾃⾏决定,可以⼀起或独⽴进⾏。
together or independently at the discretion of the agent developer.
扩展
Extensions 函数调⽤
Function Calling 数据存储
Data Stores
执⾏
Execution 代理端执⾏
Agent-Side Execution 客户端执⾏
Client-Side Execution 代理端执⾏
Agent-Side Execution
⽤例
Use Case •• 开发者想要
Developer wants •• 安全或
Security or 开发者想要
Developer wants to
代理以控制与的交互
agent to control 身份验证限制阻⽌代理调⽤
Authentication 实现检索增强⽣成
implement Retrieval
interactions with the ⼀个
restrictions prevent the Augmented Generation
API
API端点endpoints agent from calling an (RAG)
(RAG)与任何的
with any of the
API
API直接
directly 以下数据类型:
following data types:
•• 在利⽤本地预处理时很有
Useful when
⽤
leveraging native pre- •• 时间约束或操作顺序约
Timing constraints or •• 来⾃预先索引的域名和⽹
Website Content from
构建扩展(即 Vertex
built Extensions (i.e. 束,阻⽌代理
order-of-operations 址的⽹站内容
pre-indexed domains
Search、Code
Vertex Search, Code constraints that and URLs
Interpreter 等)
Interpreter, etc.) prevent the agent
•• 结构化数据在
Structured Data in
从实时进⾏
from makingAPIAPI调⽤。(即
calls
•• 多跳规划
Multi-hop planning 像 PDF 这样的格式,
formats like PDF,
批量操作,⼈类参与的审查
in real-time. (i.e. batch
和
andAPI 调⽤(即下⼀个
API calling 等。) Word
Word ⽂档,
Docs,CSV,
CSV,
代理 operations, human-in-
(i.e. the next agent 电⼦表格等。
Spreadsheets, etc.
the-loop review, etc.)
⾏动取决于
action depends on
•• 关系 / ⾮- / Non-
Relational
输出的
the outputs of the •• 不对外公开的 API,或者
API that is not exposed
关系数据库
Relational Databases
上⼀个动作 /
previous action / to the internet, or
API
API调⽤)
call) 不可访问的
non-accessible by •• ⾮结构化数据在
Unstructured Data in
⾕歌系统
Google systems 格式如
formatsHTML、PDF、TXT
like HTML, PDF,
等。
TXT, etc.
2024 年 9 ⽉ 2024
September 32
32
Agents
To help the model gain access to this type of specific knowledge, several approaches exist:
• In-context learning: This method provides a generalized model with a prompt, tools, and
few-shot examples at inference time which allows it to learn ‘on the fly' how and when to
use those tools for a specific task. The ReAct framework is an example of this approach in
natural language.
• Fine-tuning based learning: This method involves training a model using a larger dataset
of specific examples prior to inference. This helps the model understand when and how to
apply certain tools prior to receiving any user queries.
To provide additional insights on each of the targeted learning approaches, let’s revisit our
cooking analogy.
September 2024 33
代理⼈
Agents
通过增强模型性能
Enhancing model performance with
针对性学习
targeted learning
有效使⽤模型的⼀个关键⽅⾯是它们能够在需要时选择合适的⼯具
A crucial aspect of using models effectively is their ability to choose the right tools when
⽣成输出,特别是在⽣产中⼤规模使⽤⼯具时。虽然⼀般训练
generating output, especially when using tools at scale in production. While general training
帮助模型发展这⼀技能,现实世界的场景通常需要超出该知识的了解
helps models develop this skill, real-world scenarios often require knowledge beyond the
训练数据。想象⼀下,这就像基本烹饪技能与精通之间的区别。
training data. Imagine this as the difference between basic cooking skills and mastering
⼀种特定的烹饪⻛格。两者都需要基础的烹饪知识,但后者要求更⾼。
a specific cuisine. Both require foundational cooking knowledge, but the latter demands
针对性学习以获得更细致的结果。
targeted learning for more nuanced results.
为了帮助模型获取这种特定知识,有⼏种⽅法可供选择:
To help the model gain access to this type of specific knowledge, several approaches exist:
•• 上下⽂学习:该⽅法提供了⼀个带有提示、⼯具和的通⽤模型
In-context learning: This method provides a generalized model with a prompt, tools, and
推理时的少量示例,使其能够“即时”学习如何以及何时去
few-shot examples at inference time which allows it to learn ‘on the fly' how and when to
将这些⼯具⽤于特定任务。ReAct 框架就是这种⽅法的⼀个例⼦。
use those tools for a specific task. The ReAct framework is an example of this approach in
⾃然语⾔。
natural language.
•• 基于检索的上下⽂学习:该技术动态填充模型
Retrieval-based in-context learning: This technique dynamically populates the model
通过检索最相关的信息、⼯具和相关示例的提示
prompt with the most relevant information, tools, and associated examples by retrieving
它们来⾃外部存储器。⼀个例⼦是
them from external memory. AnVertex AI 中的“示例商店”。
example of this would be the ‘Example Store’ in Vertex AI
扩展或之前提到的基于 RAG
extensions or the data 架构的数据存储。
stores RAG based architecture mentioned previously.
•• 基于微调的学习:该⽅法涉及使⽤更⼤的数据集训练模型
Fine-tuning based learning: This method involves training a model using a larger dataset
在推理之前的具体示例。这有助于模型理解何时以及如何
of specific examples prior to inference. This helps the model understand when and how to
在接收任何⽤户查询之前应⽤某些⼯具。
apply certain tools prior to receiving any user queries.
为了对每种⽬标学习⽅法提供更多⻅解,让我们重新审视我们的
To provide additional insights on each of the targeted learning approaches, let’s revisit our
烹饪类⽐。
cooking analogy.
2024 年 9 ⽉ 2024
September 33
33
Agents
• Imagine a chef has received a specific recipe (the prompt), a few key ingredients (relevant
tools) and some example dishes (few-shot examples) from a customer. Based on this
limited information and the chef’s general knowledge of cooking, they will need to figure
out how to prepare the dish ‘on the fly’ that most closely aligns with the recipe and the
customer’s preferences. This is in-context learning.
• Now let’s imagine our chef in a kitchen that has a well-stocked pantry (external data
stores) filled with various ingredients and cookbooks (examples and tools). The chef is now
able to dynamically choose ingredients and cookbooks from the pantry and better align
to the customer’s recipe and preferences. This allows the chef to create a more informed
and refined dish leveraging both existing and new knowledge. This is retrieval-based
in-context learning.
• Finally, let’s imagine that we sent our chef back to school to learn a new cuisine or set of
cuisines (pre-training on a larger dataset of specific examples). This allows the chef to
approach future unseen customer recipes with deeper understanding. This approach is
perfect if we want the chef to excel in specific cuisines (knowledge domains). This is fine-
tuning based learning.
Each of these approaches offers unique advantages and disadvantages in terms of speed,
cost, and latency. However, by combining these techniques in an agent framework, we can
leverage the various strengths and minimize their weaknesses, allowing for a more robust and
adaptable solution.
September 2024 34
代理⼈
Agents
•• 想象⼀下,⼀个厨师收到了⼀个特定的⻝谱(提示),⼀些关键成分(相关的
Imagine a chef has received a specific recipe (the prompt), a few key ingredients (relevant
⼯具)和来⾃客户的⼀些示例菜肴(少量示例)。基于此
tools) and some example dishes (few-shot examples) from a customer. Based on this
有限的信息和厨师对烹饪的⼀般知识,他们需要弄清楚
limited information and the chef’s general knowledge of cooking, they will need to figure
快速准备与⻝谱最接近的菜肴的⽅法
out how to prepare the dish ‘on the fly’ that most closely aligns with the recipe and the
客户的偏好。这是在上下⽂中学习。
customer’s preferences. This is in-context learning.
•• 现在让我们想象⼀下我们的厨师在⼀个储备丰富的⻝品储藏室的厨房⾥
Now let’s imagine our chef in a kitchen that has a well-stocked pantry (external data
商店)⾥装满了各种⻝材和⻝谱(示例和⼯具)。厨师现在
stores) filled with various ingredients and cookbooks (examples and tools). The chef is now
能够动态选择储藏室中的⻝材和⻝谱,并更好地对⻬
able to dynamically choose ingredients and cookbooks from the pantry and better align
根据客户的配⽅和偏好。这使得厨师能够更有根据地创作。
to the customer’s recipe and preferences. This allows the chef to create a more informed
并利⽤现有和新知识精炼的菜肴。这是基于检索的
and refined dish leveraging both existing and new knowledge. This is retrieval-based
上下⽂学习。
in-context learning.
•• 最后,让我们想象⼀下,我们把厨师送回学校学习⼀种新的烹饪⻛格或⼀套新的技能
Finally, let’s imagine that we sent our chef back to school to learn a new cuisine or set of
菜肴(在更⼤数据集的特定示例上进⾏预训练)。这使得厨师能够
cuisines (pre-training on a larger dataset of specific examples). This allows the chef to
以更深⼊的理解来接触未来未⻅的客户需求。这种⽅法是
approach future unseen customer recipes with deeper understanding. This approach is
如果我们希望厨师在特定的烹饪⻛格(知识领域)中出⾊,这很好。
perfect if we want the chef to excel in specific cuisines (knowledge domains). This is fine-
基于调优的学习。
tuning based learning.
这些⽅法在速度⽅⾯各有独特的优缺点
Each of these approaches offers unique advantages and disadvantages in terms of speed,
成本和延迟。然⽽,通过在代理框架中结合这些技术,我们可以
cost, and latency. However, by combining these techniques in an agent framework, we can
利⽤各种优势,最⼩化其劣势,从⽽实现更强⼤的和
leverage the various strengths and minimize their weaknesses, allowing for a more robust and
适应性解决⽅案。
adaptable solution.
2024 年 9 ⽉ 2024
September 34
34
Agents
The tools we are using are the SerpAPI (for Google Search) and the Google Places API. After
executing our program in Snippet 8, you can see the sample output in Snippet 9.
September 2024 35
代理⼈
Agents
LangChain 代理快速⼊⻔
Agent quick start with LangChain
为了提供⼀个真实可执⾏的代理实例,我们将快速构建⼀个
In order to provide a real-world executable example of an agent in action, we’ll build a quick
使⽤ LangChain
prototype 和 LangGraph
with the LangChain库的原型。这些流⾏的开源库
and LangGraph libraries. These popular open source libraries
允许⽤户通过“链接”逻辑和推理的序列来构建客户代理
allow users to build customer agents by “chaining” together sequences of logic, reasoning,
和⼯具调⽤来回答⽤户的查询。我们将使⽤我们的 use our gemini-1.5-flash-001
and tool calls to answer a user’s query. We’ll gemini-1.5-flash-001 模型和 model and
⼀些简单⼯具,⽤于回答⽤户的多阶段查询,如在⽚段
some simple tools to answer a multi-stage query 8from
中所示。
the user as seen in Snippet 8.
我们使⽤的⼯具是
The tools we areSerpAPI(⽤于
using are the Google
SerpAPI搜索)和 Google
(for Google Placesand
Search) API。之后
the Google Places API. After
在执⾏我们在代码⽚段 8 中的程序时,您可以在代码⽚段
executing our program 9 sample
in Snippet 8, you can see the 中看到示例输出。
output in Snippet 9.
2024 年 9 ⽉ 2024
September 35
35
Agents
Python
os.environ["SERPAPI_API_KEY"] = "XXXXX"
os.environ["GPLACES_API_KEY"] = "XXXXX"
@tool
def search(query: str):
"""Use the SerpAPI to run a Google Search."""
search = SerpAPIWrapper()
return search.run(query)
@tool
def places(query: str):
"""Use the Google Places API to run a Google Places Query."""
places = GooglePlacesTool()
return places.run(query)
model = ChatVertexAI(model="gemini-1.5-flash-001")
tools = [search, places]
query = "Who did the Texas Longhorns play in football last week? What is the
address of the other team's stadium?"
September 2024 36
代理⼈
Agents
Python
Python
from
from langgraph.prebuilt
langgraph.prebuiltimport create_react_agent
import from
create_react_agent
langchain_core.tools import tool from
from langchain_core.tools import tool
langchain_community.utilities import SerpAPIWrapper from
from langchain_community.utilities import SerpAPIWrapper
langchain_community.tools import GooglePlacesTool
from langchain_community.tools import GooglePlacesTool
os.environ["SERPAPI_API_KEY"]
os.environ["SERPAPI_API_KEY"] = "XXXXX"
= "XXXXX"
os.environ["GPLACES_API_KEY"]
os.environ["GPLACES_API_KEY"] = "XXXXX"
= "XXXXX"
@tool def places(query: str): """使⽤ Google Places API 运⾏ Google Places
@tool
查询。""" places = GooglePlacesTool()
def places(query: str): return places.run(query)
"""Use the Google Places API to run a Google Places Query."""
places = GooglePlacesTool()
return places.run(query)
model ==ChatVertexAI(model="gemini-1.5-flash-001")
model ChatVertexAI(model="gemini-1.5-flash-001")tools
=tools
[search, places]
= [search, places]
query =="德克萨斯⻓⻆⽜队上周踢了哪⽀球队的⾜球?另⼀⽀球队的体育场地址是什么?"
query "Who did the Texas Longhorns play in football last week? What is the
address of the other team's stadium?"
agent
agent ==create_react_agent(model,
create_react_agent(model,tools)
tools)
input
input =={"messages":
{"messages":[("human", query)]}
[("human", query)]}
for ss 在
对于 inagent.stream(input,
agent.stream(input, stream_mode="values"):
stream_mode="values") 中:
message = s["messages"][-1]
message = s["messages"][-1]
if isinstance(message,
如果 tuple):
isinstance(message, tuple):
print(message) else:
print(message)
message.pretty_print()
else:
message.pretty_print()
⽚段 8. 基于
Snippet 8. LangChain 和 LangGraph
Sample LangChain and的代理示例与⼯具
LangGraph based agent with tools
2024 年 9 ⽉ 2024
September 36
36
Agents
Unset
While this is a fairly simple agent example, it demonstrates the foundational components
of Model, Orchestration, and tools all working together to achieve a specific goal. In the
final section, we’ll explore how these components come together in Google-scale managed
products like Vertex AI agents and Generative Playbooks.
September 2024 37
代理⼈
Agents
未设置
Unset
⽚段 9. 来⾃我们在⽚段
Snippet 8 中程序的输出
9. Output from our program in Snippet 8
虽然这是⼀个相当简单的代理示例,但它展示了基础组件
While this is a fairly simple agent example, it demonstrates the foundational components
模型、编排和⼯具共同协作以实现特定⽬标。 在
of Model, Orchestration, and tools all working together to achieve a specific goal. In the
最后⼀部分,我们将探讨这些组件如何在
final section, we’ll explore how theseGoogle 规模的管理中结合在⼀起
components come together in Google-scale managed
像 Vertex AI
products like代理和⽣成性剧本这样的产品。
Vertex AI agents and Generative Playbooks.
2024 年 9 ⽉ 2024
September 37
37
Agents
In Figure 15 we’ve provided a sample architecture of an agent that was built on the Vertex
AI platform using various features such as Vertex Agent Builder, Vertex Extensions, Vertex
Function Calling and Vertex Example Store to name a few. The architecture includes many of
the various components necessary for a production ready application.
September 2024 38
代理⼈
Agents
使⽤ Vertex 的⽣产应⽤程序
Production applications with Vertex
AI
AI 代理
agents
虽然这份⽩⽪书探讨了代理的核⼼组件,但构建⽣产级
While this whitepaper explored the core components of agents, building production-grade
应⽤程序需要将它们与额外的⼯具集成,如⽤户界⾯、评估
applications requires integrating them with additional tools like user interfaces, evaluation
框架和持续改进机制。Google 的improvement
frameworks, and continuous Vertex AI 平台mechanisms. Google’s Vertex AI platform
通过提供⼀个完全托管的环境,简化了这个过程,包含所有基本要素
simplifies this process by offering a fully managed environment with all the fundamental
之前提到的元素。通过⾃然语⾔接⼝,开发⼈员可以快速
elements covered earlier. Using a natural language interface, developers can rapidly
定义其代理的关键要素 - ⽬标、任务指令、⼯具、任务的⼦代理
define crucial elements of their agents - goals, task instructions, tools, sub-agents for task
委托和示例
delegation,- and
以便轻松构建所需的系统⾏为。此外,
examples - to easily construct the desired system behavior. In addition, the
该平台配备了⼀套开发⼯具,允许进⾏测试、评估和测量
platform comes with a set of development tools that allow for testing, evaluation, measuring
代理性能、调试和提⾼开发代理的整体质量。这个
agent performance, debugging, and improving the overall quality of developed agents. This
允许开发者专注于构建和完善他们的代理,同时处理复杂性
allows developers to focus on building and refining their agents while the complexities of
基础设施、部署和维护由平台本身管理。
infrastructure, deployment and maintenance are managed by the platform itself.
在图 15 中,我们提供了⼀个基于
In Figure 15 we’ve provided a Vertex
sample构建的代理的示例架构
architecture of an agent that was built on the Vertex
AI
AI 平台使⽤各种功能,如 Vertex
platform using various Agentsuch
features Builder、Vertex Extensions、Vertex
as Vertex Agent Builder, Vertex Extensions, Vertex
函数调⽤和顶点示例存储等。该架构包括许多
Function Calling and Vertex Example Store to name a few. The architecture includes many of
⽣产就绪应⽤程序所需的各种组件。
the various components necessary for a production ready application.
2024 年 9 ⽉ 2024
September 38
38
Agents
You can try a sample of this prebuilt agent architecture from our official documentation.
September 2024 39
代理⼈
Agents
图 15. 基于
Figure 15.Vertex
SampleAI 平台构建的端到端代理架构示例
end-to-end agent architecture built on Vertex AI platform
您可以从我们的官⽅⽂档中尝试这个预构建代理架构的示例。
You can try a sample of this prebuilt agent architecture from our official documentation.
2024 年 9 ⽉ 2024
September 39
39
Agents
Summary
In this whitepaper we’ve discussed the foundational building blocks of Generative AI
agents, their compositions, and effective ways to implement them in the form of cognitive
architectures. Some key takeaways from this whitepaper include:
1. Agents extend the capabilities of language models by leveraging tools to access real-
time information, suggest real-world actions, and plan and execute complex tasks
autonomously. agents can leverage one or more language models to decide when and
how to transition through states and use external tools to complete any number of
complex tasks that would be difficult or impossible for the model to complete on its own.
2. At the heart of an agent’s operation is the orchestration layer, a cognitive architecture that
structures reasoning, planning, decision-making and guides its actions. Various reasoning
techniques such as ReAct, Chain-of-Thought, and Tree-of-Thoughts, provide a framework
for the orchestration layer to take in information, perform internal reasoning, and generate
informed decisions or responses.
3. Tools, such as Extensions, Functions, and Data Stores, serve as the keys to the outside
world for agents, allowing them to interact with external systems and access knowledge
beyond their training data. Extensions provide a bridge between agents and external APIs,
enabling the execution of API calls and retrieval of real-time information. functions provide
a more nuanced control for the developer through the division of labor, allowing agents
to generate Function parameters which can be executed client-side. Data Stores provide
agents with access to structured or unstructured data, enabling data-driven applications.
The future of agents holds exciting advancements and we’ve only begun to scratch the
surface of what is possible. As tools become more sophisticated and reasoning capabilities
are enhanced, agents will be empowered to solve increasingly complex problems.
Furthermore, the strategic approach of ‘agent chaining’ will continue to gain momentum. By
September 2024 40
代理⼈
Agents
摘要
Summary
在本⽩⽪书中,我们讨论了⽣成性⼈⼯智能的基础构建模块
In this whitepaper we’ve discussed the foundational building blocks of Generative AI
代理、它们的组成以及以认知形式有效实施它们的⽅法
agents, their compositions, and effective ways to implement them in the form of cognitive
架构。该⽩⽪书的⼀些关键要点包括:
architectures. Some key takeaways from this whitepaper include:
1. 代理通过利⽤⼯具访问真实的语⾔模型能⼒
1. Agents extend the capabilities of language models by leveraging tools to access real-
时间信息,建议现实世界的⾏动,并计划和执⾏复杂任务
time information, suggest real-world actions, and plan and execute complex tasks
⾃主地。代理可以利⽤⼀个或多个语⾔模型来决定何时和
autonomously. agents can leverage one or more language models to decide when and
如何在状态之间过渡并使⽤外部⼯具完成任意数量的
how to transition through states and use external tools to complete any number of
复杂的任务对于模型来说,单独完成将是困难或不可能的。
complex tasks that would be difficult or impossible for the model to complete on its own.
2. 在代理操作的核⼼是编排层,这是⼀种认知架构
2. At the heart of an agent’s operation is the orchestration layer, a cognitive architecture that
结构推理、规划、决策并指导其⾏动。各种推理
structures reasoning, planning, decision-making and guides its actions. Various reasoning
技术如 ReAct、Chain-of-Thought
techniques 和 Tree-of-Thoughts,提供了⼀个框架
such as ReAct, Chain-of-Thought, and Tree-of-Thoughts, provide a framework
为了让编排层接收信息、进⾏内部推理并⽣成
for the orchestration layer to take in information, perform internal reasoning, and generate
知情的决策或回应。
informed decisions or responses.
3. ⼯具,如扩展、函数和数据存储,作为通往外部的钥匙
3. Tools, such as Extensions, Functions, and Data Stores, serve as the keys to the outside
为代理提供的世界,使它们能够与外部系统互动并获取知识
world for agents, allowing them to interact with external systems and access knowledge
超出他们的训练数据。扩展提供了代理与外部 API 之间的桥梁,
beyond their training data. Extensions provide a bridge between agents and external APIs,
启⽤ API 调⽤的执⾏和实时信息的检索。功能提供
enabling the execution of API calls and retrieval of real-time information. functions provide
通过分⼯为开发者提供更细致的控制,允许代理
a more nuanced control for the developer through the division of labor, allowing agents
⽣成可以在客户端执⾏的函数参数。数据存储提供
to generate Function parameters which can be executed client-side. Data Stores provide
具有访问结构化或⾮结构化数据的代理,⽀持数据驱动的应⽤程序。
agents with access to structured or unstructured data, enabling data-driven applications.
代理的未来充满了令⼈兴奋的进展,我们才刚刚开始探索
The future of agents holds exciting advancements and we’ve only begun to scratch the
可能性的表⾯。随着⼯具变得更加复杂和推理能⼒
surface of what is possible. As tools become more sophisticated and reasoning capabilities
将增强,代理将被赋予解决⽇益复杂问题的能⼒。
are enhanced, agents will be empowered to solve increasingly complex problems.
此外,“代理链”的战略⽅法将继续获得动⼒。通过
Furthermore, the strategic approach of ‘agent chaining’ will continue to gain momentum. By
2024 年 9 ⽉ 2024
September 40
40
Agents
combining specialized agents - each excelling in a particular domain or task - we can create
a ‘mixture of agent experts’ approach, capable of delivering exceptional results across
various industries and problem areas.
It’s important to remember that building complex agent architectures demands an iterative
approach. Experimentation and refinement are key to finding solutions for specific business
cases and organizational needs. No two agents are created alike due to the generative nature
of the foundational models that underpin their architecture. However, by harnessing the
strengths of each of these foundational components, we can create impactful applications
that extend the capabilities of language models and drive real-world value.
September 2024 41
代理⼈
Agents
通过结合专⻔的代理 - 每个代理在特定领域或任务中表现出⾊
combining specialized - 我们可以创建
agents - each excelling in a particular domain or task - we can create
⼀种“代理专家混合”⽅法,能够在各个领域提供卓越的成果
a ‘mixture of agent experts’ approach, capable of delivering exceptional results across
各个⾏业和问题领域。
various industries and problem areas.
重要的是要记住,构建复杂的代理架构需要⼀个迭代的过程
It’s important to remember that building complex agent architectures demands an iterative
⽅法。实验和改进是寻找特定业务解决⽅案的关键。
approach. Experimentation and refinement are key to finding solutions for specific business
案例和组织需求。由于⽣成性质,没有两个代理是相同的。
cases and organizational needs. No two agents are created alike due to the generative nature
⽀撑其架构的基础模型。然⽽,通过利⽤
of the foundational models that underpin their architecture. However, by harnessing the
这些基础组件的优势,我们可以创建有影响⼒的应⽤程序
strengths of each of these foundational components, we can create impactful applications
扩展语⾔模型的能⼒并带来实际价值。
that extend the capabilities of language models and drive real-world value.
2024 年 9 ⽉ 2024
September 41
41
Agents
Endnotes
1. Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at:
https://arxiv.org/abs/2210.03629
2. Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'.
Available at: https://arxiv.org/pdf/2201.11903.pdf.
3. Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'.
Available at: https://arxiv.org/abs/2203.11171.
4. Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at:
https://arxiv.org/pdf/2302.12246.pdf.
5. Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at:
https://arxiv.org/abs/2302.00923.
6. Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at:
https://arxiv.org/abs/2305.10601.
7. Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Available at:
https://arxiv.org/abs/2305.08291.
10. Xie, M., 2022, 'How does in-context learning work? A framework for understanding the differences from
traditional supervised learning'. Available at: https://ai.stanford.edu/blog/understanding-incontext/.
September 2024 42
代理⼈
Agents
尾注
Endnotes
1.
1. Shafran,
Shafran,I.,I.,Cao,
Cao,Y. Y.
等,et2022, 'ReAct:
al., 2022, 在语⾔模型中协同推理与⾏动'.
'ReAct: 可在以下⽹址获取:
Synergizing Reasoning and Acting in Language Models'. Available at:
https://arxiv.org/abs/2210.03629
https://arxiv.org/abs/2210.03629
2. 魏,
2. J., J.,
Wei, 王,Wang,
X. 等, 2023,
X. et '链式思维提示引发⼤型语⾔模型的推理'.
al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'.
可在以下地址获取: https://arxiv.org/pdf/2201.11903.pdf。
Available at: https://arxiv.org/pdf/2201.11903.pdf.
3. 王,
3. X. 等,X.2022,
Wang, '⾃洽性提⾼了语⾔模型中的思维链推理'。
et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'.
可在以下⽹址获取: https://arxiv.org/abs/2203.11171。
Available at: https://arxiv.org/abs/2203.11171.
4. Diao,
4. Diao,S.S.等,
et2023, '针对⼤型语⾔模型的链式思维主动提示'.
al., 2023, 可在以下⽹址获取:
'Active Prompting with Chain-of-Thought for Large Language Models'. Available at:
https://arxiv.org/pdf/2302.12246.pdf.
https://arxiv.org/pdf/2302.12246.pdf.
5. Zhang,
5. Zhang,H.H.等,et2023, '语⾔模型中的多模态思维链推理'.
al., 2023, 可在以下⽹址获取:
'Multimodal Chain-of-Thought Reasoning https://arxiv.org/abs/2302.00923.
in Language Models'. Available at:
https://arxiv.org/abs/2302.00923.
6. 姚,
6. Yao,S. S.
等, et
2023, '思维树:使⽤⼤型语⾔模型进⾏深思熟虑的问题解决'。可在以下⽹址获取:
al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at:
https://arxiv.org/abs/2305.10601。
https://arxiv.org/abs/2305.10601.
7. Long,
7. Long,X.,X.,2023,
2023,'⼤型语⾔模型引导的思维树'.
'Large Language Model 可在以下⽹址获取:
Guided Tree-of-Thought'. Available at:
https://arxiv.org/abs/2305.08291.
https://arxiv.org/abs/2305.08291.
8. ⾕歌。
8. 'Google
Google. Gemini
'Google 应⽤程序'。
Gemini 可在:http://gemini.google.com
Application'. 获取。
Available at: http://gemini.google.com.
9. Swagger.
9. Swagger.'OpenAPI
'OpenAPI规范'. 可在以下⽹址获取:
Specification'. https://swagger.io/specification/.
Available at: https://swagger.io/specification/.
10. 谢,M.,2022
10. Xie, M., 2022,年,'上下⽂学习是如何⼯作的?理解与传统监督学习的差异的框架'。可在以下⽹址获取:
'How does in-context learning work? A framework for understanding the differences from
https://ai.stanford.edu/blog/understanding-incontext/.
traditional supervised learning'. Available at: https://ai.stanford.edu/blog/understanding-incontext/.
11. ⾕歌研究。'ScaNN(可扩展最近邻)'。可在以下⽹址获取:
11. Google Research. 'ScaNN (Scalable Nearest Neighbors)'. Available at:
https://github.com/google-research/google-research/tree/master/scann.
https://github.com/google-research/google-research/tree/master/scann.
12. LangChain.
12. LangChain.'LangChain'. 可在以下⽹址获取:
'LangChain'. https://python.langchain.com/v0.2/docs/introduction/.
Available at: https://python.langchain.com/v0.2/docs/introduction/.
2024 年 9 ⽉ 2024
September 42
42