0% found this document useful (0 votes)
100 views33 pages

Akka Infoq Agentic Ai Design Patterns

The document discusses the evolution and challenges of agentic AI systems, highlighting the importance of reliable outcomes from distributed AI agents despite the inherent unpredictability of large language models (LLMs). It outlines the need for effective system engineering practices to address issues of uncertainty, privacy, and scalability in deploying AI applications. The agenda includes topics on agentic AI, system engineering challenges, and next steps for development and implementation.

Uploaded by

abulaala360
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
100 views33 pages

Akka Infoq Agentic Ai Design Patterns

The document discusses the evolution and challenges of agentic AI systems, highlighting the importance of reliable outcomes from distributed AI agents despite the inherent unpredictability of large language models (LLMs). It outlines the need for effective system engineering practices to address issues of uncertainty, privacy, and scalability in deploying AI applications. The agenda includes topics on agentic AI, system engineering challenges, and next steps for development and implementation.

Uploaded by

abulaala360
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Design Patterns for TYLER JEWELL

Agentic AI: CEO @ Akka

Building Scalable, Event-Driven Systems

May 1, 2025

Moderated by: Erik Costlow, InfoQ Editor


RICHARD LI

Founder of Amorphous
Data
Today’s agenda

LLMs, agents, agentic systems


01

System engineering challenges


02

System engineering practices


03

Resources and next steps


04

Q&A
05

2
Poll question

3
Agentic is real, but…there is a lot to learn
Visit akka.io

The basics: What is agentic AI?


User stories: Agentic AI customer stories
Webinar: A blueprint for agentic AI services
Samples Production-ready agents
Blogs: Agentic AI blogs
News: Akka launches new deployment options for agentic AI at scale
Get Started: Develop your own agentic app

4
Agents and agentic systems are
distributed systems, powered by AI
…that must deliver reliable outcomes
…while depending upon unreliable LLMs.

5
AI Agency
Capacity to make meaning from your environment

RPA Agents Agentic Humans


➔ low autonomy ➔ partial autonomy ➔ high autonomy
➔ coded ➔ LLM advice ➔ distributed
decisions ➔ human guidance decisions
➔ human control ➔ group
coordination
Low Agency High Agency

static adaptive
“A big gap exists between current
LLM-based assistants and full-fledged
reactive AI agents, but this gap will close as proactive
we learn how to build, govern and
trust agentic AI solutions.”
tasks goals
–Gartner
supervised autonomous

economic productivity
cost
A paradigm shift to AI-fueled app ecosystems
AI agents and apps become part of a symbiotic existence
By 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024.
Gartner, TSP 2025 Trends: Agentic AI — The Evolution of Experience, 24 February 2025

AI agents personalize
App Enhanced User
Experience
interactions to increase
satisfaction
Ecosystem
Cloud-native
AI agents automate routine
Applications Operational
tasks to allow humans to focus
Efficiency
on strategic initiatives
+
Agentic AI Scalability
AI-driven SaaS adapt to
business needs without
Services proportional increases in cost

7
LLM-powered app services are intelligent
Models can be prompted to perform a range of user & system tasks
Input Response
LLM automation varies SaaS app use cases and
by data type behavior

audio / video LLM interpret / summarize / analyze

App Ecosystem metrics ML trend projection

questions LLM answers

Cloud-native state / data changes LLM recommendations

Applications template LLM populate fields

document LLM validate against a schema


+ LLM personalization
user behavior

Agentic AI parameters LLM routing decision (path a or b?)

Services functions LLM tool selection + invocation

algorithm description LLM compilable, runnable code


Rethinking how your system makes decisions
Solve problems where deterministic and rule-based approaches fall short

Multi-faceted Workflows involving judgement,


decision making exceptions or context-sensitive decisions,
for example when to escalate a support
ticket

Constantly Systems whose rulesets frequently


changing rules change, have extensive conditions, or
burdensome to maintain, such as
identifying inappropriate language

Reliance on Extracting meaning from content,


unstructured interpreting language, audio or images,
data and conversational responses, such as
with a support chatbot
From LLMs to Agentic Systems
Agents give structure to LLMs; agentic systems give scale to agents

chunked
response
Stateless, long-running, computationally
LLM client LLM intensive resources that can analyze,
prompt reason, and plan

LLM

Tools, APIs Structured enrichment loop that builds


Agent agent Vector DB context, invokes tools, takes action, and
Memory
gathers human feedback
Humans

Agentic agent agent


Networks of multiple agents orchestrated
System multi-agent
protocol to solve complex tasks
Patterns for agentic systems create intelligence
Agent collaboration enables reliable, goal-driven reasoning
pass
sub-agent out Tasks that can easily be decomposed to subtasks:
prompt chaining agent 1 check
exit e.g. write a blog then translate to French.
fail

task agent 1
routing
classifier
which
An LLM router classifies a task for routing to an LLM specialist:
agent out
task agent 2 e.g. classify this support call as either sales or technical

task agent 1 LLM subtasks can be divided for speed or multiple runs:
parallelization blast vote out
task agent 2 e.g. execute security tests from different povs, with success voting

orchestrator
task agent 1 An orchestrator LLM breaks down tasks not known in advance:
synthesizer agent
combine out
task agent 2 e.g. gathering information from targets identified by orchestrator LLM

solution

generator evaluator One LLM generates a response while another provides feedback:
evaluator-optimizer out
agent agent e.g. a translation LLM that has nuance checking from evaluator LLM
accepted
feedback

action
Create and execute a complex plan while staying “grounded” with feedback:
service agent LLM loop environment out e.g. create a travel itinerary and book all reservations for a vacation
stop
feedback
Multi-agent systems are orchestrated
Traceable, auditable, debuggable, with point-in-time recovery
Agentic systems are workflows
reliable execution of AI tasks with visibility into request /
response data, built-in retries, and error compensation
agent
workflow
sequential 1. Pick locale 2. Share dates 3. Travel plan

flight search
parallel done build itinerary
hotel search
monitor

timeout rain recommend


event-driven forecasted
indoor
activities
trigger storage

policies rules rules rules

domain logic state machine fave


budget itinerary
activities
plug-ins

dates

proposed adjust
human-in-the-loop itinerary budget

plan
Single-agent enrichment loop
Prompt → retrieve → enrich → repeat is a repetitive cycle & pattern
agentic enrichment loop
in
prompt
agent LLM memory tool response
out
vector DB, context DB MCP, APIs

1. augment save prompt gather env or


prompt human feedback
to send back to the
call LLM agent
2. initial LLM
pass
call tool, please

save response
API,
functions, or
call tool programs
3. call tool to
take action
results
4. add results update prompt w/ results
to prompt

call LLM again


5. augmented
LLM call
more results

save response

repeat, if multiple
tools called
LLMs are stateless. Context is assembled.
Agentic services augment prompts with data from many sources.

retrieval type for available in

vector db Facts, knowledge, and related


agentic service contains knowledge, facts information about various
semantic search places that one can travel to
prompt
Given my travel history and the
weather, what is the best location
for me to travel. Book the same this interaction’s history Here is the hotel that I
type of hotel we discussed in our working memory recommend, can I have your
memory db permission to book it?
previous interaction, but do not
book it before getting my
permission.
Cancel the reservation if the In the last conversation we had,
weather changes.
long-term memory a 5 star hotel was
Use the Weather Underground previous interactions recommended by the LLM.
API to gather weather info.
external
API Instruct the agent to call the
calling APIs or code Weather Underground API
retrieve & augment tools with specific parameters and
location.

event stream
updating context There is a storm alert for the
events Bahamas next week.
Agentic systems are distributed systems
Architectural techniques and practices required for scale and resilience

chunked
response ➔ Async, non-blocking invocation
LLM client LLM ➔ Event-based, streaming responses
prompt
➔ Backpressure

LLM
➔ Event-driven architecture
➔ Human-in-the-loop interaction
Tools, APIs
➔ Streaming real-time ingest
Agent agent Vector DB ➔ Retries, circuit breakers, timeouts
Memory ➔ Memory & tool integration
➔ CQRS
Humans
➔ Replication and failover

➔ Durable workflows
Agentic agent agent
➔ Distributed tracing
System multi-agent
protocol ➔ Discovery & mesh networking
➔ Multi-agent protocols: A2A, BeeAI
Poll question

16
Today’s agenda

LLM, agents, agentic systems


01

System engineering challenges


02

System engineering practices


03

Resources and next steps


04

Q&A
05

17
Bumpy path from POC to production
Top three enterprise challenges: uncertainty, privacy, and scale

52% 8+ months
fail to reach POC to
production production

“Leaders reported that only 48% of AI


POCs (Proof Of Concept) make it into
production, and they take an average of
8.2 months to go from POC to production.”
Uncertainty: From deterministic to stochastic
Randomness cannot be eliminated and must be embraced

LLMs are not deterministic components Scaling makes it even harder


● Same prompt ≠ same output. ● LLMs are slow, expensive, and limited by token windows.
● They predict tokens, not answers. ● You need streaming, chunking, caching, windowing, reranking, fallback.
● You don’t pass parameters – you design prompts. ● You’re not calling a model – your orchestrating a distributed system.
● Hard to predict outputs, validate correctness, or reproduce behavior. ● Cough, cough - why Akka :)

Prompting isn’t programming Debugging is a black box


● No function signatures. No modular reuse. ● No stack traces.
● Tiny prompt changes can break results. ● No explanations.
● Long prompts increase latency. ● Logs give you input/output, not reasons.
● And prompts don’t always work the same across workflows or chains. ● Prompt tweaks can cause side effects far from where you made change.

Retrieval adds more uncertainty Expectations ≠ reality


● In RAG, you’re combining semantic search, reranking, and formatting. ● People expect memory, perfect instructions, stable outputs, and truth.
● Each step adds noise. ● LLMs forget, hallucinate, and drift based on sampling.
● Generating over possibly irrelevant context. ● Without scaffolding, they will be (and feel) brittle and inconsistent.
● Now the system is doubly stochastic: retrieval + generation.

Testing LLMs isn’t straightforward If you’re looking for vibes, it will be short lived
● There’s no .assertEqual(). ● LLMs are probabilistic pattern matchers - not deterministic APIs.
● Heuristic metrics are flawed. ● Building with them means thinking in systems, not functions.
● Human evals are expensive and inconsistent. ● It means controlling chaos, not eliminating it.
● Even stable outputs might still be wrong.
Privacy and compliance horror show
LLMs are leaky sieves creating numerous holes for security to plug

LLMs memorize or reproduce PII


Establish clear security &
from training data compliance guidelines
Private or proprietary data Enable enterprise-grade data
mishandled by agents
controls
Training data lacking explicit consent
from individuals Implement agentic service
interaction and logging policies
Insecure data handling in APIs
Choose agentic platform with
Anonymized data can be
reconstructed by LLMs
tracing and reasoning auditing
Regulations with data deletion Implement risk mitigation with
requirements are not LLM-enforced content filtering
LLM decisions using personal data
can introduce bias and ethics issues Implement agent identity with
roles and permissions
LLMs may expose PII by integrating
with unapproved systems
Memory hardening with trust
controls and min-access-policy
Enterprise agentic scale requires efficiency
More txs: each slower, less predictable and more costly

SaaS Agentic

Users billions 20x

TPS 10,000 100x

p(99) Latency 10-80ms 15-400x

Cost / LLM tx cheap 10-10,000x


Mar 25: the best performing LLM @ 86% MMLU accuracy costs $98 / 1M tokens, or ~850,000x more expensive
than the average database transaction. The worst performing LLM @ 36% MMLU accuracy costs $.01 / 1M
tokens, or 7x more expensive.
Poll question

22
Today’s agenda

LLM, agents, agentic systems


01

System engineering challenges


02

System engineering practices


03

Resources and next steps


04

Q&A
05

23
Agentic systems engineering for reliability
1. Execute a DDD and AI-DD process ➔ Produce context map, ubiquitous language, and bounded contexts
➔ Define overall orchestration and flow across bounded contexts
➔ Develop localized workflows for each bounded context
2. Define data sovereignty and scope ➔ Company-specific requirements (e.g., retention policies, audit logging)
➔ Country or regional regulations (e.g., GDPR, HIPAA, financial data rules)
3. Establish evaluation strategy ➔ Make reasoning visible and measurable from the start
➔ Build synthetic evaluation sets to test reasoning steps
4. Select the right AI models ➔ Reasoning models: OpenAI o3, Claude Sonnet, DeepSeek
➔ General models: OpenAI GPT-4o, Gemini Pro, LLaMa
➔ Small language models: Phi-4, Mistral 7B, Claude Haiku, Gemini Flash
➔ Fine-tuned industry models: DeepSeek-Coder, CodeLlama
5. Select agentic platform architecture ➔ Choose platform that enables services that transact and reason
➔ Rqmts: Durable execution, event-driven, memory, streaming, and tools support
➔ Rqmts: Elastic, <20ms p99 latencies, resilient, multi-region failover
6. Build developer workflow and agents ➔ Refine developer workflow
➔ Build initial versions of your agent(s)
7. Deploy and observe ➔ Release, monitor, and refine agent based on real-world behavior
Techniques for reducing uncertainty
Design to anticipate randomness while embracing failure as expected

Leverage Create reasoning layers that break


strategies that complex plans into stages, steps, or
create layers of sub-tasks that can be validated or checked
certainty by downstream agents or humans.

Incorporate Continuous testing and experimentation


eval-driven of different inputs (real-world, synthetic,
development adversarial) to track and validate accuracy.

Choose an Agentic Leverage a framework and platform based


AI Platform proven upon proven runtime that supports
to operate services distributed orchestration, event-driven
scalably, safely behaviors, backpressure, streaming, and
embedded memory.
Create layers of certainty
Incorporate multi-agent and human verification strategies

Human in the loop Delegate decisions to humans with workflow

Agentic awareness Take more LLM thinking time when observing uncertainty

Check and balance Get second opinions from other agents

Specialization Limit LLMs to making decisions in one area of expertise

Restricted decisioning Limit LLMs to a finite set of outcomes


Incorporate evaluation-driven
development
A system is only as reliable and accurate as its evaluation framework

… is the output accurate


Guardrails
and reliable?

Post-Inference

Model(s)

vary inputs (adversarial,


synthetic, real-world) and
measure output validity
Prompt(s)
Given a
set of Context(s)
inputs …
Choose a proven Agentic AI Platform
LLMs unlock reasoning – but there is no free lunch

LLMs are stateless Need a memory system


No recall of prior interactions

LLMs need context Tools integration


Must be told everything upfront Knowledge integration (e.g., vector databases)
LLMs are stochastic Rely on deterministic workflows as much as possible
Same input, different outputs Design for uncertainty
LLMs are unreliable Adopt a distributed systems mindset
May fail to respond or timeout Use a durable execution framework
under load

LLMs are slow Use streaming to improve responsiveness


High latency, limited concurrency

28
Humans
IOT Devices
Audio / Video
Metrics

Streaming Endpoints
Any Protocol | In/Out | Custom APIs Efficient
Agent Lifecycle Agent Memory 70% less compute
Mgmt Orchestration Database API + agentic combo

Elastic
Data, API and
1 2 3
P R 5M TPS
akka clustering

Agentic AI
Services Agent Connectivity & Adapters
Non-Blocking | Backpressure | Load Balanced
Agile
Prod in days
Secure | Observable | Scalable Semantic Search Multi-LLM, A2A Integration & MCP SDK + ops envs

Prompt 2
3
Events Resilient
1 0-0 RTO/RPO
multi-region,
multi-master data
Other replication
Vector
DB
LLMs Systems
The Akka agentic advantage
✓ Agentic, AI, apps & data Streaming endpoints Memory database
➔ Shared compute: agentic ➔ Agentic sessions with infinite context
✓ Hardened runtime ➔ Context snapshot pruning to avoid
co-execution with API services
✓ Simple, expressive SDK ➔ HTTP and gRPC custom API LLM token caps
endpoints ➔ In-memory context sharding, load
✓ Multi-region balancing, and traffic routing
➔ Custom protocols, media types,
✓ Automated ops and edge deployments ➔ Multi-region context replication
➔ Real-time streaming ingest, ➔ Memory filters for region-pinning and
benchmarked to over 1TB cross-session context creation
➔ Embedded context persistence with
Postgres event store

Agent connectivity & adapters Agent orchestration Agent lifecycle management


➔ Non-blocking, streaming LLM ➔ Event-driven workflow ➔ Agent versioning
inference adapters with back pressure benchmarked to 10M TPS ➔ Agent replay
➔ Multi-LLM selection ➔ SDK with AI workflow component ➔ Event, workflow, and agent
➔ LLM adapters & 100s of ML algos ➔ Serial, parallel, state machine, & debugger
➔ Agent-to-agent brokerless messaging human-in-the-loop flows ➔ No downtime agent upgrades
➔ 100s of 3rd party integrations ➔ Sub-tasking agents and
30
multi-agent coordination
2B people experience Akka daily
SMILE Swiggy Tubi [Fox]
A fast ML engine with 100s of ML & LLM API-driven predictions with multi-model Tubi applies ML models to real-time streams of
inference, powering Google Earth fan-out and ultra-low latency data with in-memory, durable journals

400K downloads / mo 3+ million TPS 2 month time to delivery


6K GitHub stars 71ms p(99) latency
user big data from many sources
“Akka is used for streaming and back pressure
- critical for hosted AI API inference. Akka prediction API
enables event-driven inference exposed as
personalization +
HTTP efficiently, with low latency.” model orchestration + batching multi-model inference

Haifeng Li – maintainer

Horn Coho AI
“Zero problems” augmenting high-performance “With Akka, we got to market 75% faster compared
audio and video streams on demand to other agentic solutions we had considered.”
Tomasz Wujec - Lead Developer Michael Ehrlich – CTO
Agentic is real
Let’s make it real for you

Additional resources Have a project?


Webpage: What is agentic AI?
Case Studies: Agentic AI customer stories
concept
Webinar: A blueprint for agentic AI services
Samples Production-ready agents
proof
Blogs: Agentic AI blogs
News: Akka launches new deployment
options for agentic AI at scale
48 hours
Get Started: Develop your own agentic app

32
Q&A

Contact Us:
Tyler Jewell: [email protected]

Richard Li: [email protected]

InfoQ: [email protected]

You might also like