Introducing Transformers Agents 20
Introducing Transformers Agents 20
0
medium.com/@amanatulla1606/introducing-transformers-agents-2-0-14a5601ade0b
Amanatullah
What is an agent?
Large Language Models (LLMs) can tackle a wide range of tasks, but they often struggle
with specific tasks like logic, calculation, and search. When prompted in these domains in
which they do not perform well, they frequently fail to generate a correct answer.
One approach to overcome this weakness is to create an agent, which is just a program
driven by an LLM. The agent is empowered by tools to help it perform actions. When the
agent needs a specific skill to solve a particular problem, it relies on an appropriate tool
from its toolbox.
Thus when during problem-solving the agent needs a specific skill, it can just rely on an
appropriate tool from its toolbox.
1/8
Experimentally, agent frameworks generally work very well, achieving state-of-the-art
performance on several benchmarks.
Clarity through simplicity: reduce abstractions to the minimum. Simple error logs
and accessible attributes let you easily inspect what’s happening and give you more
clarity.
Modularity: prefer to propose building blocks rather than full, complex feature sets.
You are free to choose whatever building blocks are best for your project.
For instance, since any agent system is just a vehicle powered by an LLM engine,
they decided to conceptually separate the two, which lets you create any agent type
from any underlying LLM.
On top of that, they have sharing features that let you build on the shoulders of giants!
Main elements
Tool: this is the class that lets you use a tool or implement a new one. It is
composed mainly of a callable forward method that executes the tool action, and a
set of a few essential attributes: name, descriptions, inputs and output_type.
These attributes are used to dynamically generate a usage manual for the tool and
insert it into the LLM’s prompt.
Toolbox: It's a set of tools that are provided to an agent as resources to solve a
particular task. For performance reasons, tools in a toolbox are already instantiated
and ready to go. This is because some tools take time to initialize, so it’s usually
better to re-use an existing toolbox and just swap one tool, rather than re-building a
set of tools from scratch at each agent initialization.
CodeAgent: a very simple agent that generates its actions as one single blob of
Python code. It will not be able to iterate on previous observations.
ReactAgent: ReAct agents follow a cycle of Thought ⇒ Action ⇒ Observation until
they’ve solve the task. We propose two classes of ReactAgent:
ReactCodeAgent generates its actions as python blobs.
ReactJsonAgent generates its actions as JSON blobs.
In essence, what an agent does is “allowing an LLM to use tools”. Agents have a key
agent.run() method that:
2/8
Provides information about tool usage to your LLM in a specific prompt. This way,
the LLM can select tools to run to solve the task.
Parses the tool calls from the LLM output (can be via code, JSON format, or any
other format).
Executes the calls.
If the agent is designed to iterate on previous outputs, it keeps a memory with
previous tool calls and observations. This memory can be more or less fine-grained
depending on how long-term you want it to be.
pip install
Self-correcting Retrieval-Augmented-Generation
Quick definition: Retrieval-Augmented-Generation (RAG) is “using an LLM to answer a
user query, but basing the answer on information retrieved from a knowledge base”. It has
many advantages over using a vanilla or fine-tuned LLM: to name a few, it allows to
ground the answer on true facts and reduce confabulations, it allows to provide the LLM
with domain-specific knowledge, and it allows fine-grained control of access to
information from the knowledge base.
3/8
Let’s say we want to perform RAG, and some parameters must be dynamically
generated. For example, depending on the user query we could want to restrict the
search to specific subsets of the knowledge base, or we could want to adjust the number
of documents retrieved. The difficulty is: how to dynamically adjust these parameters
based on the user query?
We first load a knowledge base on which we want to perform RAG: this dataset is a
compilation of the documentation pages for many huggingface packages, stored as
markdown.
Now we prepare the knowledge base by processing the dataset and storing it into a
vector database to be used by the retriever. We are going to use LangChain, since it
features excellent utilities for vector databases:
source_docs = [
Document(
page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]}
) for doc in knowledge_base
]
docs_processed =
RecursiveCharacterTextSplitter(chunk_size=500).split_documents(source_docs)[:1000]
embedding_model = HuggingFaceEmbeddings(model_name=)vectordb =
FAISS.from_documents( documents=docs_processed, embedding=embedding_model)
Now that we have the database ready, let’s build a RAG system that answers user
queries based on it!
We want our system to select only from the most relevant sources of information,
depending on the query.
4/8
Our documentation pages come from the following sources:
👉 Let us build our RAG system as an agent that will be free to choose its sources!
We create a retriever tool that the agent can call with the parameters of its choice:
5/8
import json
from transformers.agents import Tool
from langchain_core.vectorstores import VectorStore
classRetrieverTool(Tool):
name = "retriever"
description = "Retrieves some documents from the knowledge base that have the
closest embeddings to the input query."
inputs = {
"query": {
"type": "text",
"description": "The query to perform. This should be semantically close to your
target documents. Use the affirmative form rather than a question.",
},
"source": {
"type": "text",
"description": ""
},
}
output_type = "text"
if source:
ifisinstance(source, str) and"["notinstr(source): # if the source is not
representing a list
source = [source]
source = json.loads(str(source).replace("'", '"'))
6/8
tools: a list of tools that the agent will be able to call.
llm_engine: the LLM that powers the agent.
Our llm_engine must be a callable that takes as input a list of messages and returns text.
It also needs to accept a stop_sequences argument that indicates when to stop its
generation. For convenience, we directly use the HfEngine class provided in the package
to get a LLM engine that calls our Inference API.
llm_engine = HfEngine("meta-llama/Meta-Llama-3-70B-Instruct")
agent = ReactJsonAgent(
tools=[RetrieverTool(vectordb, all_sources)],
llm_engine=llm_engine
)
()(agent_output)
Then when its .run() method is launched, the agent takes care of calling the LLM
engine, parsing the tool call JSON blobs and executing these tool calls, all in a loop that
ends only when the final answer is provided.
Web browsing requires diving deeper into subpages and scrolling through lots of text
tokens that will not be necessary for the higher-level task-solving. We assign the web-
browsing sub-tasks to a specialized web surfer agent. We provide it with some tools to
browse the web and a specific prompt (check the repo to find specific implementations).
Defining these tools is outside the scope of this post: but you can check the repository to
find specific implementations.
7/8
from transformers.agents import ReactJsonAgent, HfEngine
WEB_TOOLS = [
SearchInformationTool(),
NavigationalSearchTool(),
VisitTool(),
DownloadTool(),
PageUpTool(),
PageDownTool(),
FinderTool(),
FindNextTool(),
]
websurfer_llm_engine = HfEngine(
model="CohereForAI/c4ai-command-r-plus"
) # We choose Command-R+ for its high context length
To allow this agent to be called by a higher-level task solving agent, we can simply
encapsulate it in another tool:
classSearchTool(Tool):
name = "ask_search_agent"
description = "A search agent that will browse the internet to answer a
question. Use it to gather informations, not for problem-solving."
inputs = {
"question": {
"description": "Your question, as a natural language sentence. You are talking to
an agent, so provide them with as much context as possible.",
"type": "text",
}
}
output_type = "text"
() -> : websurfer_agent.run(question)
8/8