PRINCIPLES OF BUILDING AI AGENTS
SAM BHAGWAT
Cofounder & CEO, Mastra.ai
• The key building blocks of agents: providers, models, prompts, tools, memory
• How to break down complex tasks with agentic workflows
• Giving agents access to knowledge bases with RAG (retrieval-augmented generation)
FOREWORD
PROMPTING A LARGE LANGUAGE MODEL (LLM)
1
AI has been a perennial on-the-horizon technology for over forty years. There have been notable advances over the 2000s and 2010s: chess engines, speech recognition, self-driving cars.
The bulk of the progress on “generative AI” has
come since 2017, when eight researchers from
Google wrote a paper called “Attention is All You
Need”.
It described an architecture for generating text
where a “large language model” (LLM) was given a
set of “tokens” (words and punctuation) and was
focused on predicting the next “token”.
The next big step forward happened in
November 2022. A chat interface called ChatGPT,
produced by a well-funded startup called OpenAI,
went viral overnight.
One of the first choices you'll need to make when building an AI application is which model to build on. Here are some considerations:
Hosted vs open-source
Reasoning models
One of the foundational skills in AI engineering is writing good prompts. LLMs will follow instructions, if you know how to specify them well. Here are a few tips and techniques that will help:
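To make these ideas concrete, here is a rough sketch of a well-specified prompt using the AI SDK's generateText (the store policy, model name, and example are invented for illustration); it separates the role, the constraints, and a worked example so the model knows exactly what "good" looks like.

import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";

// A prompt that spells out the role, the constraints, and one worked example.
const systemPrompt = `You are a support assistant for an e-commerce store.
Answer in two sentences or fewer.
If you don't know the answer, say so and offer to escalate to a human.

Example:
Q: Where is my order?
A: You can track it from the Orders page; I can escalate if it hasn't shipped in 5 days.`;

const { text } = await generateText({
  model: openai("gpt-4o-mini"), // any chat model works here
  system: systemPrompt,
  prompt: "Can I return a pair of shoes I bought three weeks ago?",
});
console.log(text);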
BUILDING AN AGENT
4
AGENTS 101
You can use direct LLM calls for one-shot transformations: "given a video transcript, write a draft description."
For ongoing, complex interactions, you typically need to build an agent on top. Think of agents as AI employees rather than contractors: they maintain context, have specific roles, and can use tools to accomplish tasks.
Levels of Autonomy
Code Example
It's useful to be able to quickly test and experiment with different models without needing to learn multiple provider SDKs. This is known as model routing.
Here's a JavaScript example with the AI SDK library:
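The original listing isn't reproduced here, but a minimal sketch of model routing with the AI SDK looks like the following (model names are examples); the point is that only the model line changes between providers.

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// The same generateText call works against any provider; only `model` changes.
const prompt = "Summarize the plot of The Martian in one sentence.";
const gpt = await generateText({ model: openai("gpt-4o"), prompt });
const claude = await generateText({ model: anthropic("claude-3-5-sonnet-20241022"), prompt });
console.log(gpt.text);
console.log(claude.text);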
Structured output
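Structured output means asking the model for data that conforms to a schema rather than free-form text. Here is a minimal sketch using the AI SDK's generateObject with a Zod schema (the schema fields are invented for illustration):

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// The model's reply is validated against the schema and returned as an object.
const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    title: z.string(),
    tags: z.array(z.string()),
    estimatedReadingMinutes: z.number(),
  }),
  prompt: "Extract metadata for a blog post about vector databases.",
});
console.log(object.title, object.tags);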
TOOL CALLING
Tools are functions that agents can call to perform specific tasks, whether that's fetching weather data, querying a database, or processing calculations.
The key to effective tool use is clear communication with the model about what each tool does and when to use it.
Here's an example of creating and using a tool:
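The listing below is a sketch rather than the book's original example; it uses the AI SDK's tool() helper with v4-style parameters and maxSteps fields, and the weather data is a placeholder.

import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// A tool is a plain function plus a description and an input schema,
// so the model knows what it does and when to call it.
const weatherTool = tool({
  description: "Get the current temperature in Celsius for a city",
  parameters: z.object({ city: z.string().describe("City name, e.g. Paris") }),
  execute: async ({ city }) => {
    // Placeholder implementation; a real tool would call a weather API here.
    return { city, temperatureC: 21 };
  },
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  tools: { getWeather: weatherTool },
  maxSteps: 2, // let the model call the tool, then answer using the result
  prompt: "Should I bring a jacket in Paris today?",
});
console.log(text);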
Best practices:
AGENT MEMORY
Memory is crucial for creating agents that maintain meaningful, contextual conversations over time. While LLMs can process individual messages effectively, they need help managing longer-term context and historical interactions.
Working memory
Hierarchical memory
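As a rough sketch of the hierarchical idea (plain TypeScript, not Mastra's memory API; the class and limits are invented), recent messages can be kept verbatim as working memory while older ones are compacted into a running summary.

// Illustrative only: recent messages stay verbatim ("working memory"),
// older ones are folded into a summary ("hierarchical memory").
type Message = { role: "user" | "assistant"; content: string };

class ConversationMemory {
  private recent: Message[] = [];
  private summary = "";
  constructor(private maxRecent = 20) {}

  add(message: Message) {
    this.recent.push(message);
    if (this.recent.length > this.maxRecent) {
      const oldest = this.recent.shift()!;
      // In practice an LLM would summarize `oldest`; truncation keeps the sketch self-contained.
      this.summary = `${this.summary} ${oldest.role}: ${oldest.content}`.slice(-2000);
    }
  }

  // The context string prepended to the next LLM call.
  context(): string {
    return `Summary of earlier conversation: ${this.summary}\nRecent messages:\n` +
      this.recent.map((m) => `${m.role}: ${m.content}`).join("\n");
  }
}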
GRAPH-BASED WORKFLOWS
8
WORKFLOWS 101
We've seen how individual agents can work.
At every step, agents have flexibility to call any tool (function).
Sometimes, this is too much freedom.
Graph-based workflows have emerged as a
useful technique for building with LLMs when
agents don’t deliver predictable enough output.
Sometimes, you’ve just gotta break a problem
down, define the decision tree, and have an agent (or
agents) make a few binary decisions instead of one
big decision.
A workflow primitive is helpful for defining
branching logic, parallel execution, checkpoints, and
adding tracing.
Let’s dive in.
9
So, what's the best way to build workflow graphs?
Let's walk through the basic operations, and then we can get to best practices.
Branching
Chaining
Merging
Conditions
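As a plain-TypeScript sketch of these operations (the step names are made up and this is not Mastra's workflow API), chaining is just ordering steps, a condition picks a branch, and the branches merge back into a single result.

// Illustrative graph: classify -> (billing | general) -> merge.
type Ticket = { text: string; category?: "billing" | "general"; reply?: string };

const classify = async (t: Ticket): Promise<Ticket> => ({
  ...t,
  category: t.text.includes("invoice") ? "billing" : "general",
});
const billingStep = async (t: Ticket): Promise<Ticket> => ({ ...t, reply: "Routing to billing..." });
const generalStep = async (t: Ticket): Promise<Ticket> => ({ ...t, reply: "Here's a general answer." });

async function runWorkflow(input: Ticket): Promise<Ticket> {
  const classified = await classify(input);           // chaining: one step feeds the next
  const handled = classified.category === "billing"   // condition: pick a branch
    ? await billingStep(classified)
    : await generalStep(classified);
  return handled;                                      // merging: branches rejoin here
}
console.log(await runWorkflow({ text: "Question about my invoice" }));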
Sometimes workflows need to pause execution while waiting for a third party (like a human-in-the-loop) to provide input.
Because the third party can take arbitrarily long to respond, you don't want to keep a running process.
Instead, you want to persist the state of the workflow, and have a function that you can call to pick up where you left off.
Let's diagram out a simple example with Mastra, which has .suspend() and .resume() functions:
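The original Mastra listing isn't reproduced here; as a minimal sketch of the semantics in plain TypeScript (the run store and step names are invented), suspending means persisting the run's state and exiting, and resuming means reloading that state and continuing.

// Sketch of suspend/resume: a step that needs human input saves its state and
// stops; a later call with the human's answer picks up where it left off.
type ApprovalState = { draft: string; status: "suspended" | "done"; finalCopy?: string };
const store = new Map<string, ApprovalState>(); // stand-in for durable storage

async function startRun(runId: string, draft: string): Promise<ApprovalState> {
  const state: ApprovalState = { draft, status: "suspended" }; // like .suspend(): persist and stop
  store.set(runId, state);
  return state;
}

async function resumeRun(runId: string, approved: boolean): Promise<ApprovalState> {
  const state = store.get(runId);
  if (!state || state.status !== "suspended") throw new Error("nothing to resume");
  // like .resume(): reload the persisted state and finish the workflow
  const finished: ApprovalState = {
    ...state,
    status: "done",
    finalCopy: approved ? state.draft : "Draft rejected; needs another pass.",
  };
  store.set(runId, finished);
  return finished;
}

await startRun("run-42", "Announcing our new feature...");
// ...hours later, once the human responds:
console.log(await resumeRun("run-42", true));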
STREAMING UPDATES
One of the keys to making LLM applications that feel performant is letting the user know what's happening while the reasoning process is running.
As an example, my partner and I have for the last
year been (unsuccessfully) trying to take a trip to
Hawaii.
So I recently pulled up two different tabs with
reasoning models — on the left, OpenAI’s o1 pro, on
the right, OpenAI’s Deep Research — and asked
them to plan me a vacation.
TRACING
Because LLMs are non-deterministic, the question isn't whether your application will go off the rails.
It's when and how much.
Teams that have shipped agents into production
typically talk about how important it is to look at
production data for every step, of every run, of each
of their workflows.
Agent frameworks like Mastra that let you write
your code as structured workflow graphs will also
emit telemetry that enables this.
We'll talk about this more later (see the Observability section) but some quick notes:
RETRIEVAL-AUGMENTED GENERATION (RAG)
13
RAG 101
RAG lets agents ingest user data and synthesize it with their global knowledge base to give users high-quality responses.
Here's how it works.
Chunking: You start by taking a document (although other kinds of sources work as well) and chunking it, splitting it into bite-sized pieces for search.
Embedding: After chunking, you'll want to embed your data – transform each chunk into a vector, an array of floating-point numbers (1,536 of them for OpenAI's default embedding model) representing the meaning of the text.
We do this with language models because they make the embeddings much more accurate than older techniques; OpenAI has an API for this, and there are other providers like Voyage and Cohere.
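For instance, embedding a set of chunks with the AI SDK might look like the sketch below (the chunk text and model choice are examples); embedMany sends every chunk to the embedding model in one call.

import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

// Turn each chunk of text into a vector using a hosted embedding model.
const chunks = [
  "Mastra is a TypeScript agent framework.",
  "RAG retrieves relevant chunks and feeds them to the model.",
];
const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"), // 1,536-dimensional vectors
  values: chunks,
});
console.log(embeddings.length, embeddings[0].length); // 2 chunks, 1536 numbers each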
You need to use a vector DB, which can store and query these embeddings.
One of the biggest questions people have around RAG is how they should think about a vector DB.
There are multiple form factors of vector databases:
Chunking
Chunking is the process of breaking down large documents into smaller, manageable pieces for processing.
The key thing you'll need to choose here is a strategy and an overlap window. Good chunking balances context preservation with retrieval granularity.
Chunking strategies include recursive, character-based, token-aware, and format-specific (Markdown, HTML, JSON, LaTeX) splitting. Mastra supports all of them.
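To make the two knobs concrete, here is a minimal character-based chunker with an overlap window (illustrative plain TypeScript, not Mastra's chunker).

// Split text into fixed-size chunks, with each chunk overlapping the previous one.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // reached the last chunk
  }
  return chunks;
}
const doc = "A long document about vector databases. ".repeat(50);
console.log(chunkText(doc).length);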
Embedding
Upsert
Indexing
Querying
Reranking
Code Example
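The original listing isn't reproduced here; as a sketch of the query side (the VectorStore interface is a placeholder for whatever database you use), you embed the question with the same model used at indexing time and pull the nearest chunks.

import { embed } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical vector store client; swap in pgvector, Pinecone, etc.
interface VectorStore {
  query(args: { vector: number[]; topK: number }): Promise<{ text: string; score: number }[]>;
}

async function retrieve(store: VectorStore, question: string) {
  // 1. Embed the user's question.
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: question,
  });
  // 2. Pull the nearest chunks; these get added to the prompt as context.
  return store.query({ vector: embedding, topK: 5 });
}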
MULTI-AGENT SYSTEMS
16
MULTI-AGENT 101
Think about a multi-agent system like a specialized team at a company, such as marketing or engineering. Different AI agents work together, each with their own specialized role, to ultimately accomplish more complex tasks.
Interestingly, if you’ve used a code-generation
tool like Replit agent that’s deployed in production,
you’ve actually already been using a multi-agent
system.
One agent works with you to plan / architect
your code. After you’ve worked with the agent to
plan it out, you work with a “code manager” agent
that passes instructions to a code writer, then
executes the resulting code in a sandbox and passes
any errors back to the code writer.
Each of these agents has different memories,
AGENT SUPERVISOR
Agent supervisors are specialized agents that coordinate and manage other agents. The most straightforward way to do this is to pass in the other agents wrapped as tools.
For example, in a content creation system, a publisher agent might supervise both a copywriter and an editor:
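The sketch below shows the wiring with the AI SDK rather than the book's original Mastra listing (agent instructions and names are invented): each sub-agent is just an LLM call wrapped as a tool, and the supervisor decides when to invoke which one.

import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Each sub-agent is wrapped as a tool the supervisor can call.
const copywriter = tool({
  description: "Write a first draft of a blog post on a topic",
  parameters: z.object({ topic: z.string() }),
  execute: async ({ topic }) =>
    (await generateText({
      model: openai("gpt-4o-mini"),
      system: "You are a copywriter. Write a punchy 200-word draft.",
      prompt: topic,
    })).text,
});

const editor = tool({
  description: "Edit a draft for clarity and tone",
  parameters: z.object({ draft: z.string() }),
  execute: async ({ draft }) =>
    (await generateText({
      model: openai("gpt-4o-mini"),
      system: "You are an editor. Tighten the prose and fix errors.",
      prompt: draft,
    })).text,
});

// The publisher (supervisor) coordinates the two sub-agents.
const { text } = await generateText({
  model: openai("gpt-4o"),
  system: "You are a publisher. Use the copywriter, then the editor, then return the final post.",
  tools: { copywriter, editor },
  maxSteps: 5,
  prompt: "Publish a blog post about why agents need memory.",
});
console.log(text);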
18
CONTROL FLOW
When building complex AI applications, you need a structured way to manage how agents think and work through tasks. Just as a project manager wouldn't start coding without a plan, agents should establish an approach before diving into execution.
Just like how it’s common practice for PMs to
spec out features, get stakeholder approval, and only
then commission engineering work, you shouldn’t
expect to work with agents without first aligning on
what the desired work is.
We recommend engaging with your agents on
architectural details first — and perhaps adding a
few checkpoints for human feedback in their
workflows.
19
WORKFLOWS AS TOOLS
Hopefully, by now, you're starting to see that all multi-agent architecture comes down to primitives and how they're arranged. It's particularly important to remember this framing when trying to build more complex tasks into agents.
Let’s say you want your agent(s) to accomplish 3
separate tasks. You can’t do this easily in a single
LLM call. But you can turn each of those tasks into
individual workflows. There’s more certainty in
doing it this way because you can stipulate a workflow's order of steps and provide more structure.
Each of these workflows can then be passed
along as tools to the agent(s).
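As a short sketch of that pattern (hypothetical step names, AI SDK wiring rather than Mastra's), the workflow's steps are fixed in code, and the agent only decides when to trigger the whole pipeline.

import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// A deterministic multi-step workflow: the order of steps is fixed in code.
async function publishReportWorkflow(topic: string): Promise<string> {
  const outline = `Outline for ${topic}`;     // step 1 (placeholder)
  const draft = `${outline}\nDraft body...`;  // step 2 (placeholder)
  return `${draft}\n[reviewed]`;              // step 3 (placeholder)
}

// Wrapped as a tool, the agent can trigger the pipeline but not reorder its steps.
const publishReport = tool({
  description: "Run the full outline, draft, and review pipeline for a topic",
  parameters: z.object({ topic: z.string() }),
  execute: async ({ topic }) => publishReportWorkflow(topic),
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  tools: { publishReport },
  maxSteps: 3,
  prompt: "Prepare a report on evals for LLM agents.",
});
console.log(text);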
20
If you've played around with code writing tools like Repl.it and Lovable.dev, you'll notice that they have planning agents and a code writing agent. (And in fact the code writing agent is two different agents, a reviewer and a writer that work together.)
It's critical for these tools to have planning agents if they're to create any good deliverables for you at all. The planning agent proposes an architecture for the app you desire and asks you how it sounds. You get to give it feedback until you and the agent are aligned enough on the plan that it can pass it along to the code writing agents.
In this example, agents embody different steps in a workflow. They are responsible for planning, coding, or review, and each works in a specific order.
In the previous example, you’ll notice that work-
EVALS
21
EVALS 101
While traditional software tests have clear pass/fail conditions, AI outputs are non-deterministic — they can vary with the same input. Evals help bridge this gap by providing quantifiable metrics for measuring agent quality.
Instead of binary pass/fail results, evals return scores between 0 and 1.
Think about evals sort of like including, say,
performance testing in your CI pipeline. There’s
going to be some randomness in each result, but on
the whole and over time there should be a correla-
tion between application performance and test
results.
When writing evals, it’s important to think about
what exactly you want to test.
TEXTUAL EVALS
Textual evals can feel a bit like a grad student TA grading your homework with a rubric. They are going to be a bit pedantic, but they usually have a point.
Understanding context
Output
Code Example
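The original listing isn't reproduced here; as a sketch of a textual eval (an LLM-as-judge rubric; the rubric wording and pass threshold are invented), the judge returns a score between 0 and 1 that your test suite can threshold.

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Grade an answer against a rubric and return a 0-to-1 score plus a reason.
async function relevancyEval(question: string, answer: string): Promise<number> {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({ score: z.number().min(0).max(1), reason: z.string() }),
    prompt: `Rubric: 1 means the answer fully addresses the question, 0 means it is off-topic.
Question: ${question}
Answer: ${answer}
Return a score between 0 and 1 and a one-sentence reason.`,
  });
  return object.score;
}

// In CI, assert the score clears a threshold rather than matching exactly.
const score = await relevancyEval("What is RAG?", "RAG retrieves relevant chunks before generating.");
console.log(score >= 0.7 ? "pass" : "fail");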
OTHER EVALS
There are a few other types of evals as well.
A/B testing
DEVELOPMENT & DEPLOYMENT
24
LOCAL DEVELOPMENT
When developing AI applications, it's important to see what your agents are doing, make sure your tools work, and be able to quickly iterate on your prompts.
Some things that we've seen be helpful for local agent development:
SERVERLESS DEPLOYMENT
Nobody really wants to manage infrastructure.
Over the last decade, serverless (in the form of Vercel, Render, AWS Lambda, etc.) has caught on. We don't want to worry about spiky loads, scaling up, configuring nginx, running Kubernetes.
But what works well for web request/response
cycles can work less well for long-lived agent calls.
Long-running processes can cause function
timeouts. Bundle sizes are too large. Some serverless
hosts don’t support the full Node.js runtime.
But from a developer's perspective, the ideal runtime for agents and workflows is serverless: it autoscales based on demand, with state maintained across agent invocations by storage built into the platform.
But it can take a lot of work coordinating plat-
OBSERVABILITY
Observability is a word that gets a lot of airplay, but since its meaning has been largely diluted and generalized by self-interested vendors, let's go to the root.
The term was initially popularized by Honeycomb's Charity Majors in the late 2010s to describe the quality of being able to visualize application traces.
Tracing
Evals
EVERYTHING ELSE
If you're building a real-world system, there are a large number of considerations that we haven't yet discussed.
I'll list them here; we'll likely expand on them in future editions: