Architecting Scalable AI RAG Systems

RAG Architecture

Uploaded by

민냥
Copyright
© All Rights Reserved

1

Experiences of tomorrow.
Engineered together.
We transform how people experience the
business. All through next generation technology.

2002 founded
4000+ professionals
20+ offices
300+ clients

What we do:
● Product Engineering
● Intelligent Automation
● Data & Analytics

Leading companies choose us:

2
Our Global Delivery Centres
Global Reach, Local Insight - Ciklum bridges the best in tech from the three key IT regions

Central & Eastern Europe

Bulgaria
Czech Republic
Poland
Romania
Slovakia
Spain
Ukraine
United Kingdom

Asia

India
Pakistan

LATAM

Argentina
Uruguay

3
Our speakers

Lucian Gruia
Principal AI Technology Lead

AI Tech Lead with over 11 years of hands-on experience in Telecom, Fintech, and Aerospace. He specializes in AI, data integrity, fraud detection, system performance, architecting frameworks and solutions for real-time systems.
Develops an AI upskill program for 300 engineers at Ciklum.

Ivan Shelonik
Expert Data Scientist

Certified Professional Machine Learning Expert with 7 years of commercial experience in developing Machine Learning projects from the ground to delivery into the Cloud (AWS 5+YoE).
Has worked and delivered primarily for customers from S&P 500.

Daniel-Mihai Gorgan
Senior JS Developer

Tech enthusiast specializing in Node.js, SQL/NoSQL and Cloud technologies with 5+ years of experience.
Hands-on experience in projects across outsourcing and product companies, contributing to the development of in-house products, smart chatbots, and voicebots by leveraging different AI technologies.

4
Our speakers

Saikumar Uckoo
Conversational AI expert

Cloud Architect specialized in building, deploying, and maintaining AI solutions on Microsoft cloud platforms.
Leads deliveries on platforms like Microsoft PVA, KoreAI, and custom GenAI solutions built on open-source tech.

Mariana Batiuk
Principal TCoE Lead

Mariana leads the technical council on the QA maturity assessment, test strategy, pre-sales, new services development, initiatives, and quality engineering activities.
Has proficient experience in QA Management, Agile Methodologies, Testing, Team Management, and Coaching.

Maksym Lypivskyi
Global Head of Cloud Platforms

Specializing in cloud computing architectures and generative AI applications, he focuses on creating, deploying, and optimizing cutting-edge solutions across global platforms.
A mentor and community builder, he actively shares his insights on generative AI, cloud technologies, and leadership.

5
Playing in all parts of the AI stack
Emerging Stack Trends

● Applications
The rise of cloud-based generative AI and LLMs, accessible via APIs and embedded in other applications, will allow companies to use them as-is or customize with their data
● Fine-Tuning
The need for model fine-tuning will drive demand for a diverse skill set, such as software engineering, psychology, linguistics, etc.
● Foundation Models
The market will evolve and diversify with the emergence of more pre-trained models, offering options for size, transparency, versatility & performance balance
● Data
Mastery of new and diverse data types and volumes will be crucial for success, with GenAI features in modern data platforms facilitating adoption at scale
● Infrastructure
Essential for GenAI deployment, cloud infrastructure will help manage costs and carbon emissions, necessitating data center retrofitting and advancements in chipset architectures, hardware & algorithms

[Figure: AI stack diagram. Labels: User Experiences & Engagement, Apps, Applications, Operating Systems & API Layers, Model Hubs, End-to-End Apps, Hyper-local AI Models, Closed Source, Specific AI Models, Open-Source, Foundational AI Models, Cloud Platforms, Computer Hardware, Partners. Legend: Applications, Models, Infrastructure, Where we work]

6
Agenda

01 What is RAG
02 LLM Wrappers and Docker
03 Build with Java
04 Build with Python
05 Build with JavaScript
06 Deploy RAG app in AWS
07 Deploy RAG app in Azure
08 Challenges in QA and more

7
Session’s Tech map

Programming languages Databases

FAISS

Infrastructure

8
What is RAG

A RAG system essentially correlates a user's prompt with a relevant data chunk. It does this by identifying the most semantically similar chunk from the database.
This chunk then becomes the context for the prompt.
When passed to the Large Language Model (LLM), it enables the system to provide a relevant answer within the given context.
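
The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' implementation: `embed` is a toy bag-of-words stand-in for a real embedding model (so it measures word overlap rather than true semantic similarity), and the function names and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt: str, chunks: list[str]) -> str:
    """Return the stored chunk that is most similar to the prompt."""
    query = embed(prompt)
    return max(chunks, key=lambda chunk: cosine(query, embed(chunk)))


def build_augmented_prompt(prompt: str, chunks: list[str]) -> str:
    """Place the retrieved chunk in front of the prompt as its context."""
    context = retrieve(prompt, chunks)
    return f"Context:\n{context}\n\nQuestion: {prompt}"


# Hypothetical sample data standing in for the database of chunks.
chunks = [
    "The warranty covers battery defects for two years.",
    "Shipping to Europe takes five business days.",
]
print(build_augmented_prompt("How long does shipping take?", chunks))
```

The string returned by `build_augmented_prompt` is what would be sent to the LLM; in a real system the chunk vectors would be computed once and stored in a vector database instead of being recomputed per query.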

9 9
LLM Wrapper
● Build with Java
● Deploy locally
● Integrate a 3rd party client

10
Why do we need RAG?

● Expands Knowledge Base


RAG accesses a vast external database, enriching its knowledge
beyond initial training data
● Improves Accuracy
Enhances response precision by integrating relevant, real-time
information
● Adaptable
Effectively handles novel and niche queries
● Increases Efficiency
Streamlines information retrieval and generation process
● Versatile Applications
Useful across various fields, from customer support to research

Source: What Is Retrieval-Augmented Generation, aka RAG?

11 11
AWS
● Build with Python
● Build Docker images
● Semantic search with FAISS
● Deploy on AWS

12
Data Chunking and LLMs
LLMs also have a limited capacity for context.
Just as humans cannot digest unlimited context, these models have a specific size limit for the content they
can process.

So, what about situations involving very large amounts of data?


Consider a specific use case, such as a book. It's too large to pass the entire book as the context for the
current prompt, so it needs to be divided before being stored in the database.

This process is known as data chunking.

Types of Data chunking (by size):


● Fixed-size
● Variable Chunking
● Semantic Chunking
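
As an illustration of the first type, fixed-size chunking might look like the sketch below. It splits on whitespace and counts words; the `overlap` parameter is an assumption added here (a common way to keep some shared context between neighbouring chunks), and the function name is hypothetical.

```python
def chunk_fixed_size(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    `overlap` (an assumption, not from the slides) is the number of words
    repeated at the start of each following chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


book = "one two three four five six seven"
print(chunk_fixed_size(book, chunk_size=3))
print(chunk_fixed_size(book, chunk_size=3, overlap=1))
```

Variable and semantic chunking would replace the fixed word count with boundaries such as sentences, paragraphs, or topic shifts.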


13
JavaScript
● Build with TypeScript
● Semantic Search with pgvector

14
Embeddings. Similarity
● Embeddings
Numerical representations of concepts, in a high-dimensional space,
capturing semantic meaning.

● Similarity:
○ Lexical: entities are alike in appearance
○ Semantic: entities are alike in meaning

● In RAG we represent entities by describing them.


This is a form of knowledge representation.

Example: Mountain, River, Canal

One-hot encoding:
Mountain: 1
River: 2
Canal: 3

2-Dimensional Space [Natural vs Artificial, Mobility]:
Mountain: [-0.7, -0.8]
River: [-0.3, 0.7]
Canal: [ 0.4, 0.5]
Read more: Wikipedia - Cosine Similarity
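
To make the example concrete, the sketch below computes cosine similarity over the three 2-dimensional vectors from the slide. The helper names are hypothetical; only the vector values come from the example.

```python
import math

# 2-dimensional embeddings from the example: [Natural vs Artificial, Mobility]
vectors = {
    "Mountain": [-0.7, -0.8],
    "River": [-0.3, 0.7],
    "Canal": [0.4, 0.5],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def most_similar(name: str) -> str:
    """Return the other entity whose vector points in the closest direction."""
    others = (other for other in vectors if other != name)
    return max(others, key=lambda other: cosine_similarity(vectors[name], vectors[other]))


for name in vectors:
    print(name, "->", most_similar(name))
```

With these values River and Canal are each other's nearest neighbour (cosine similarity of roughly 0.47), while Mountain and Canal point in almost opposite directions. The integer labels 1, 2, 3 carry no such information about meaning.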

15
Azure
● Deploy on Azure
● Semantic Search with Qdrant
● Conversation history

16
RAG Architecture

17
Benefits of RAG

1. Providing up-to-date and accurate responses


RAG ensures that the response of an LLM is not based solely on static, stale training data. Rather, the model uses up-to-date external
data sources to provide responses.

2. Reducing inaccurate responses, or hallucinations


By grounding the LLM's output in relevant, external knowledge, RAG attempts to mitigate the risk of responding with incorrect
or fabricated information (also known as hallucinations). Outputs can include citations of original sources, allowing human verification.

3. Providing domain-specific, relevant responses


Using RAG, the LLM will be able to provide contextually relevant responses tailored to an organization's proprietary or domain-specific
data.

4. Being efficient and cost-effective


Compared to other approaches to customizing LLMs with domain-specific data, RAG is simple and cost-effective. Organizations can
deploy RAG without needing to customize the model. This is especially beneficial when models need to be updated frequently with
new data.

18
QA & Testing
● SW characteristics
● Top 5 risks
● Methods and tools
● Balanced success factors

19
Software Characteristics
ISO 25010 Product Quality Model

Functional Testing: What does the system do?
● Functional Suitability

Non-Functional Testing: How does the system do this?
● Performance Efficiency
● Compatibility
● Usability
● Reliability
● Security
● Maintainability
● Portability

AI-specific Characteristics
● Flexibility & Adaptability
● Autonomy
● Evolution
● Bias
● Side-effects & Reward Hacking
● Ethics
● Transparency, Interpretability & Explainability
● Safety

20
Top 5 current shortcomings and risks

Impact on Testing / QA:

● Lack of Transparency
Difficult to design test cases; low interpretability
● Ethical & Bias Concerns
Unfair outputs; challenging bias testing
● Dynamic Learning
Unpredictable system behavior; complex testing updates & maintenance
● Loss of Control
Impaired predictability; difficulties in oversight
● Hallucinations
Misleading or inaccurate outputs; unreliable testing & results

21
Methods & tools
that help us mitigate risks and ensure proper testing of AI

● Experience-Based testing
● Stress testing
● Pairwise testing
● Transfer Learning testing
● Black-Box testing
● Exploratory testing
● Robustness testing
● Combinatorial testing
● Metamorphic testing
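
As one illustration of these methods, metamorphic testing checks a relation between outputs for related inputs instead of comparing against a single known-correct answer. The sketch below is an assumption-laden example: `retrieve` is a hypothetical word-overlap retriever standing in for the system under test, and the chosen relation (changing letter case, padding, or word order of the query must not change the retrieved chunk) is only one possible relation.

```python
def retrieve(query: str, chunks: list[str]) -> str:
    """Hypothetical system under test: pick the chunk sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda chunk: len(query_words & set(chunk.lower().split())))


def check_metamorphic_relation(query: str, chunks: list[str]) -> bool:
    """Metamorphic relation: transformed queries must retrieve the same chunk as the original."""
    baseline = retrieve(query, chunks)
    follow_ups = [
        query.upper(),                        # change letter case
        f"  {query}  ",                       # add surrounding whitespace
        " ".join(reversed(query.split())),    # reverse the word order
    ]
    return all(retrieve(variant, chunks) == baseline for variant in follow_ups)


# Hypothetical sample data.
chunks = [
    "The warranty covers battery defects for two years.",
    "Shipping to Europe takes five business days.",
]
print(check_metamorphic_relation("shipping time to Europe", chunks))
```

For a real RAG system the follow-up inputs would more likely be paraphrases, and the relation might be relaxed to "the same chunk appears in the top results".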
Some essential elements
that should be considered when verifying AI systems

BALANCED SUCCESS FACTORS

● KNOW THE ALGORITHM
● TEST THE ALGORITHM
● MAKE SURE TO HAVE ENOUGH DATA
● BRING THE RIGHT PEOPLE
Interact
● Prompt Engineering
● Fine-tuning

24
The optimization flow

[Figure: chart with two axes. Context optimization: what the model needs to know. LLM optimization: how the model needs to act. Regions shown: Prompt engineering, RAG, Fine tuning, All together.]

25
What is a good prompt
Act as an experienced Learning specialist. I need to improve my
upselling skills. Prepare an educational program for me to improve
those skills. The program should run for 2 months with 4 hours of effort per
week.
Please provide the answer in the following format:
Topic: Name
● blocks
● …
Books:
Example:
Topic: Negotiation basics
● Win-win strategy
● Active listening strategy
Books: "Getting to Yes" by Roger Fisher and William Ury

Prompt components highlighted on the slide: Instruction, Context, Role, Formatting, Tone, Examples
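
One way to keep such prompts consistent is to assemble them from named components. The sketch below is a hypothetical template, not something shown in the session: the `PromptSpec` class, its field names, and the order in which `render` joins them are assumptions, and `tone` is left optional because the sample prompt does not state one explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the components of a prompt."""
    role: str
    context: str
    instruction: str
    formatting: str
    examples: list[str] = field(default_factory=list)
    tone: str = ""  # optional; assumption, not stated in the sample prompt

    def render(self) -> str:
        """Join the components into a single prompt string."""
        parts = [f"Act as {self.role}.", self.context, self.instruction]
        if self.tone:
            parts.append(f"Use a {self.tone} tone.")
        parts.append(f"Please provide the answer in the following format:\n{self.formatting}")
        if self.examples:
            parts.append("Example:\n" + "\n".join(self.examples))
        return "\n".join(part for part in parts if part)


spec = PromptSpec(
    role="an experienced Learning specialist",
    context="I need to improve my upselling skills.",
    instruction="Prepare an educational program for 2 months with 4 hours of effort per week.",
    formatting="Topic: Name\n- blocks\nBooks:",
    examples=["Topic: Negotiation basics\n- Win-win strategy\n- Active listening strategy"],
)
print(spec.render())
```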
26
Prompt tactics

* Shot Prompting

Zero
Add 2+2:

One
Add 3+3: 6
Add 2+2:

Few
Add 3+3: 6
Add 5+5: 10
Add 2+2:

Model-guided prompting
Before answering, I want you to first ask for any extra information that helps you produce a better answer.
If you have no questions, please provide an answer instead.

Self-evaluating prompting
Can this program be improved?
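
The zero-, one-, and few-shot variants differ only in how many solved examples precede the question, which a small helper can make explicit. The function name below is hypothetical; the example pairs are the ones from the slide.

```python
def build_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Prefix the question with solved examples: 0 = zero-shot, 1 = one-shot, more = few-shot."""
    lines = [f"{task} {answer}" for task, answer in examples]
    lines.append(question)
    return "\n".join(lines)


zero_shot = build_shot_prompt([], "Add 2+2:")
one_shot = build_shot_prompt([("Add 3+3:", "6")], "Add 2+2:")
few_shot = build_shot_prompt([("Add 3+3:", "6"), ("Add 5+5:", "10")], "Add 2+2:")

print(few_shot)
```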

27
Chain of thought
Virma has three bags, each of which
fits five shirts. How many shirts can
Virma fit in her bags?
Let's think step-by-step.

28
Thread-of-Thought

Virma has three bags, each of which


fits five shirts. How many shirts can
Virma fit in her bags?
Walk me through this context in
manageable parts step by step,
summarizing and analyzing as we go.
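
Both reasoning tactics above work by appending a trigger phrase to the question. A minimal sketch, assuming the two phrases from the slides and hypothetical names for the dictionary and helper:

```python
# Trigger phrases taken from the two slides; the dictionary keys are hypothetical labels.
REASONING_TRIGGERS = {
    "chain_of_thought": "Let's think step-by-step.",
    "thread_of_thought": (
        "Walk me through this context in manageable parts step by step, "
        "summarizing and analyzing as we go."
    ),
}


def with_reasoning_trigger(question: str, style: str) -> str:
    """Append the trigger phrase for the chosen reasoning style to the question."""
    return f"{question}\n{REASONING_TRIGGERS[style]}"


question = (
    "Virma has three bags, each of which fits five shirts. "
    "How many shirts can Virma fit in her bags?"
)
print(with_reasoning_trigger(question, "chain_of_thought"))
print(with_reasoning_trigger(question, "thread_of_thought"))
```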

29
Share your
feedback!

31
Join our team

Product Engineering
From custom platform and product development to scaled agile delivery, we join forces to build advanced technology solutions

32
