Architecting Scalable AI RAG Systems

RAG Architecture

Uploaded by

민냥


1

Experiences of tomorrow.
Engineered together.
We transform how people experience the
business. All through next generation technology.

Founded: 2002
Professionals: 4000+
Offices: 20+
Clients: 300+

What we do:
● Product Engineering
● Intelligent Automation
● Data & Analytics

Leading companies choose us:

2
Our Global Delivery Centres
Global Reach, Local Insight - Ciklum bridges the best in tech from the three key IT regions

Central & Eastern Europe

Bulgaria
Czech Republic
Poland
Romania
Slovakia
Spain
Ukraine
United Kingdom

Asia

India
Pakistan

LATAM

Argentina
Uruguay

3
Our speakers

Lucian Gruia
Principal AI Technology Lead
AI Tech Lead with over 11 years of hands-on experience in Telecom, Fintech, and Aerospace. He specializes in AI, data integrity, fraud detection, system performance, architecting frameworks and solutions for real-time systems.
Develops an AI upskill program for 300 engineers at Ciklum.

Ivan Shelonik
Expert Data Scientist
Certified Professional Machine Learning Expert with 7 years of commercial experience in developing Machine Learning projects from the ground to delivery into the Cloud (AWS 5+YoE).
Has worked and delivered primarily for customers from S&P 500

Daniel-Mihai Gorgan
Senior JS Developer
Tech enthusiast specializing in Node.js, SQL/NoSQL and Cloud technologies with 5+ years of experience
Hands-on experience in projects across outsourcing and product companies, contributing to the development of in-house products, smart chatbots, and voicebots by leveraging different AI technologies

4
Our speakers

Saikumar Uckoo
Conversational AI expert
Cloud Architect specialized in building, deploying, and maintaining AI solutions on Microsoft cloud platforms. Leads deliveries on platforms like Microsoft PVA, KoreAI, and custom GenAI solutions built on open-source tech.

Mariana Batiuk
Principal TCoE Lead
Mariana leads the technical council on the QA maturity assessment, test strategy, pre-sales, new services development, initiatives, and quality engineering activities.
Has proficient experience in QA Management, Agile Methodologies, Testing, Team Management, and Coaching.

Maksym Lypivskyi
Global Head of Cloud Platforms
Specializing in cloud computing architectures and generative AI applications, he focuses on creating, deploying, and optimizing cutting-edge solutions across global platforms.
A mentor and community builder, he actively shares his insights on generative AI, cloud technologies, and leadership.

5
Playing in all parts of the AI stack

[Figure: AI stack diagram. Labels: User Experiences & Engagement, Apps, Applications, Operating Systems & API Layers, Model Hubs, Fine-Tuning, End-to-End Apps, Hyper-local AI Models, Specific AI Models, Closed Source and Open-Source Foundation Models, Foundational AI Models, Data, Cloud Platforms, Computer Hardware, Infrastructure, Partners. Legend: Applications, Models, Infrastructure, Where we work.]

Emerging Stack Trends

● The rise of cloud-based generative AI and LLMs, accessible via APIs and embedded in other applications, will allow companies to use them as-is or customize with their data
● The need for model fine-tuning will drive demand for a diverse skill set, such as software engineering, psychology, linguistics, etc.
● The market will evolve and diversify with the emergence of more pre-trained models, offering options for size, transparency, versatility & performance balance
● Mastery of new and diverse data types and volumes will be crucial for success, with GenAI features in modern data platforms facilitating adoption at scale
● Essential for GenAI deployment, cloud infrastructure will help manage costs and carbon emissions, necessitating data center retrofitting and advancements in chipset architectures, hardware & algorithms

6
Agenda

01 What is RAG
02 LLM Wrappers and Docker
03 Build with Java
04 Build with Python
05 Build with JavaScript
06 Deploy RAG app in AWS
07 Deploy RAG app in Azure
08 Challenges in QA and more

7
Session’s Tech map

Programming languages Databases

FAISS

Infrastructure

8
What is RAG

A RAG system essentially correlates a user's prompt with a relevant data chunk. It does this
by identifying the most semantically similar chunk from the database.
This chunk then becomes the context for the prompt.
When passed to the Large Language Model (LLM), it enables the system to provide a
relevant answer within the given context.
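
The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and `CHUNKS`, `retrieve`, and `build_llm_input` are hypothetical names introduced here.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(prompt: str, chunks: list[str]) -> str:
    """Return the stored chunk most similar to the user's prompt."""
    query = embed(prompt)
    return max(chunks, key=lambda chunk: cosine(query, embed(chunk)))

def build_llm_input(prompt: str, chunks: list[str]) -> str:
    """Attach the retrieved chunk as context; the result is what would be sent to the LLM."""
    context = retrieve(prompt, chunks)
    return f"Context:\n{context}\n\nQuestion: {prompt}\nAnswer using only the context above."

# Hypothetical knowledge base, already split into chunks.
CHUNKS = [
    "FAISS is a library for similarity search over dense vectors.",
    "Docker packages an application and its dependencies into an image.",
    "Qdrant is a vector database that can be deployed on Azure.",
]

if __name__ == "__main__":
    print(build_llm_input("How does Docker package an application?", CHUNKS))
```

In a real system the toy `embed` would be replaced by an embedding model, and the linear scan in `retrieve` by a vector index.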

9
LLM Wrapper
● Build with Java
● Deploy locally
● Integrate a 3rd party client

10
Why do we need RAG?

● Expands Knowledge Base
RAG accesses a vast external database, enriching its knowledge
beyond initial training data
● Improves Accuracy
Enhances response precision by integrating relevant, real-time
information
● Adaptable
Effectively handles novel and niche queries
● Increases Efficiency
Streamlines information retrieval and generation process
● Versatile Applications
Useful across various fields, from customer support to research

Source: What Is Retrieval-Augmented Generation, aka RAG?

11
AWS
● Build with Python
● Build Docker images
● Semantic search with FAISS
● Deploy on AWS

12
Data Chunking and LLMs
LLMs also have a limited capacity for context.
Just as humans cannot digest unlimited context, these models have a specific size limit for the content they
can process.

So, what about situations involving very large amounts of data?

Consider a specific use case, such as a book. It's too large to pass the entire book as the context for the
current prompt, so it needs to be divided before being stored in the database.

This process is known as data chunking.

Types of Data chunking (by size):

● Fixed-size
● Variable Chunking
● Semantic Chunking
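
As an illustration of the first type in the list above, here is a minimal sketch of fixed-size chunking. It assumes character-based sizes and an optional overlap between neighbouring chunks; both the function name `chunk_fixed` and the overlap parameter are choices made for this example, not something the slides specify. Variable and semantic chunking are not shown.

```python
def chunk_fixed(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a chunk boundary still appears intact in at least one chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]

if __name__ == "__main__":
    print(chunk_fixed("abcdefghij", size=4))             # no overlap
    print(chunk_fixed("abcdefghij", size=4, overlap=1))  # one shared character
```

Production pipelines often measure size in tokens rather than characters, because the model's context limit is expressed in tokens.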


13
JavaScript
● Build with TypeScript
● Semantic Search with Pg vector

14
Embeddings. Similarity
● Embeddings
Numerical representations of concepts, in a high-dimensional space,
capturing semantic meaning.

● Similarity:
○ Lexical: entities are alike in appearance
○ Semantic: entities are alike in meaning

● In RAG we represent entities by describing them.

This is a form of knowledge representation.

Example: Mountain, River, Canal

Entity     One-hot encoding     2-Dimensional Space
           (category index)     [Natural vs Artificial, Mobility]
Mountain   1                    [-0.7, -0.8]
River      2                    [-0.3,  0.7]
Canal      3                    [ 0.4,  0.5]
Read more: Wikipedia - Cosine Similarity
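
To make the semantic-similarity idea concrete, the sketch below computes cosine similarity between the three 2-dimensional example vectors from the slide. The helper names `cosine_similarity` and `most_similar` are introduced here for illustration only.

```python
import math

# 2-dimensional example embeddings from the slide:
# [Natural vs Artificial, Mobility]
VECTORS = {
    "Mountain": [-0.7, -0.8],
    "River": [-0.3, 0.7],
    "Canal": [0.4, 0.5],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def most_similar(name: str) -> str:
    """Return the other entity whose vector points in the closest direction."""
    others = (other for other in VECTORS if other != name)
    return max(others, key=lambda other: cosine_similarity(VECTORS[name], VECTORS[other]))

if __name__ == "__main__":
    for entity in VECTORS:
        print(entity, "->", most_similar(entity))
```

With these vectors, River and Canal come out as each other's nearest neighbours, which matches the intuition that both are moving bodies of water even though one is natural and the other artificial.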

15
Azure
● Deploy on Azure
● Semantic Search with Qdrant
● Conversation history

16
RAG Architecture

17
Benefits of RAG

1. Providing up-to-date and accurate responses

RAG ensures that the response of an LLM is not based solely on static, stale training data. Rather, the model uses up-to-date external
data sources to provide responses.

2. Reducing inaccurate responses, or hallucinations

By grounding the LLM's output in relevant, external knowledge, RAG attempts to mitigate the risk of responding with incorrect
or fabricated information (also known as hallucinations). Outputs can include citations of original sources, allowing human verification.

3. Providing domain-specific, relevant responses

Using RAG, the LLM can provide contextually relevant responses tailored to an organization's proprietary or domain-specific
data.

4. Being efficient and cost-effective

Compared to other approaches to customizing LLMs with domain-specific data, RAG is simple and cost-effective. Organizations can
deploy RAG without needing to customize the model. This is especially beneficial when models need to be updated frequently with
new data.
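
One way to support the citation and verification point in benefit 2 is to number the retrieved passages in the prompt and ask the model to cite them. The sketch below is an assumption about how this could be done; the function name `build_cited_prompt`, the instruction wording, and the sample sources are all hypothetical.

```python
def build_cited_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Build a grounded prompt from (source, text) pairs.

    Each passage gets a number so the answer can cite it and a human
    can check the claim against the original source.
    """
    numbered = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(passages, start=1)
    ]
    return (
        "Answer the question using only the numbered passages below. "
        "Cite the passage number after each claim, e.g. [1]. "
        "If the passages do not contain the answer, say so.\n\n"
        + "\n".join(numbered)
        + f"\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    # Hypothetical retrieved passages.
    passages = [
        ("handbook.pdf", "Employees accrue 2 vacation days per month."),
        ("faq.md", "Unused vacation days expire after 18 months."),
    ]
    print(build_cited_prompt("How many vacation days do I get per year?", passages))
```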

18
QA & Testing
● SW characteristics
● Top 5 risks
● Methods and tools
● Balanced success factors

19
Software Characteristics
ISO 25010 Product Quality Model

Functional Testing: What does the system do?
● Functional Suitability

Non-Functional Testing: How does the system do this?
● Performance Efficiency
● Compatibility
● Usability
● Reliability
● Security
● Maintainability
● Portability

AI-specific Characteristics
● Flexibility & Adaptability
● Autonomy
● Evolution
● Bias
● Side-effects & Reward Hacking
● Ethics
● Transparency, Interpretability & Explainability
● Safety

20
Top 5 current shortcomings and risks

Each risk is listed with its impact on Testing / QA:

● Lack of Transparency: difficult to design test cases; low interpretability
● Ethical & Bias Concerns: unfair outputs; challenging bias testing
● Dynamic Learning: unpredictable system behavior; complex testing updates & maintenance
● Loss of Control: impaired predictability; difficulties in oversight
● Hallucinations: misleading or inaccurate outputs; unreliable testing & results

21
Methods & tools
that help us mitigate risks and ensure proper testing of AI

● Experience-Based testing
● Stress testing
● Pairwise testing
● Transfer Learning testing
● Black-Box testing
● Exploratory testing
● Robustness testing
● Combinatorial testing
● Metamorphic testing
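
Of the methods listed above, metamorphic testing is worth a small illustration because it works without knowing the "correct" answer: instead of checking an output against an oracle, it checks that a known relation holds between outputs for related inputs. The sketch below is an assumption about how it could be applied to a similarity-based retriever; the toy `embed` function, the `CORPUS`, and the two chosen relations (order invariance and query-repetition invariance) are illustrative choices made here.

```python
import math
import random
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption, stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str]) -> str:
    """System under test: return the document most similar to the query."""
    q = embed(query)
    return max(corpus, key=lambda doc: cosine(q, embed(doc)))

def check_metamorphic_relations(query: str, corpus: list[str], seed: int = 0) -> bool:
    """Check two relations that should hold whatever the 'right' answer is."""
    baseline = retrieve(query, corpus)

    # Relation 1: the order in which documents are stored must not matter.
    shuffled = corpus[:]
    random.Random(seed).shuffle(shuffled)
    order_invariant = retrieve(query, shuffled) == baseline

    # Relation 2: repeating the query only rescales its vector, and cosine
    # similarity ignores scale, so the top result must stay the same.
    repeat_invariant = retrieve(f"{query} {query}", corpus) == baseline

    return order_invariant and repeat_invariant

# Hypothetical corpus for the example.
CORPUS = [
    "stress testing pushes the system beyond normal load",
    "pairwise testing covers every pair of input parameters",
    "exploratory testing relies on tester experience",
]

if __name__ == "__main__":
    print(check_metamorphic_relations("which pair of parameters", CORPUS))
```

Both relations assume there is a single best match; with exact ties the order relation can legitimately fail, so test data should avoid ties or the check should compare scores rather than documents.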
Some essential elements
that should be considered when verifying AI systems

BALANCED SUCCESS FACTORS
● KNOW THE ALGORITHM
● TEST THE ALGORITHM
● MAKE SURE TO HAVE ENOUGH DATA
● BRING THE RIGHT PEOPLE
Interact
● Prompt Engineering
● Fine-tuning

24
The optimization flow

[Figure: two-axis chart. Vertical axis: Context optimization (what the model needs to know). Horizontal axis: LLM optimization (how the model needs to act). Regions: Prompt engineering, RAG, Fine tuning, All together.]

25
What is a good prompt

Elements: Instruction, Context, Role, Formatting, Tone, Examples

Act as an experienced Learning specialist. I need to improve my
upselling skills. Prepare an educational program for me to improve
those skills. The program should run for 2 months with 4 hours of effort per
week.
Please provide the answer in the following format:
Topic: Name
● blocks
● …
Books:
Example:
Topic: Negotiation basics
● Win-win strategy
● Active listening strategy
Books: "Getting to Yes" by Roger Fisher and William Ury
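
The prompt elements named on this slide (role, context, instruction, formatting, tone, examples) can be kept as separate fields and assembled on demand. The following is a minimal sketch under that assumption; the `PromptSpec` class and its field layout are hypothetical and only mirror the slide's labels.

```python
from dataclasses import dataclass, field

@dataclass
class PromptSpec:
    """Holds the elements of a prompt as separate, reusable fields."""
    role: str
    context: str
    instruction: str
    formatting: str
    tone: str = ""
    examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the elements into a single prompt string."""
        parts = [f"Act as {self.role}.", self.context, self.instruction]
        if self.tone:
            parts.append(f"Tone: {self.tone}")
        parts.append(f"Please provide the answer in the following format:\n{self.formatting}")
        if self.examples:
            parts.append("Example:\n" + "\n".join(self.examples))
        return "\n".join(parts)

if __name__ == "__main__":
    spec = PromptSpec(
        role="an experienced learning specialist",
        context="I need to improve my upselling skills.",
        instruction="Prepare a 2-month educational program with 4 hours of effort per week.",
        formatting="Topic: Name\n- blocks\nBooks:",
        examples=["Topic: Negotiation basics\n- Win-win strategy\n- Active listening strategy"],
    )
    print(spec.render())
```

Keeping the elements separate makes it easy to vary one of them, for example the tone, while holding the rest constant when comparing prompt variants.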
26
Prompt tactics

* Shot Prompting

Zero
Add 2+2:

One
Add 3+3: 6
Add 2+2:

Few
Add 3+3: 6
Add 5+5: 10
Add 2+2:

Model-guided prompting

Before answering, I want you to first ask for any extra information that helps
you produce a better answer.
If you have no questions, please provide an answer instead.

Self-evaluating prompting

Can this program be improved?

27
Chain-of-Thought

Virma has three bags, each of which
fits five shirts. How many shirts can
Virma fit in her bags?
Let's think step-by-step.

28
Thread-of-Thought

Virma has three bags, each of which
fits five shirts. How many shirts can
Virma fit in her bags?
Walk me through this context in
manageable parts step by step,
summarizing and analyzing as we go.
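
Both techniques shown on these two slides amount to appending a fixed trigger phrase to the question. The sketch below uses the exact phrases from the slides; the constant and function names are introduced here for illustration.

```python
# Trigger phrases taken from the slides.
COT_TRIGGER = "Let's think step-by-step."
THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step, "
    "summarizing and analyzing as we go."
)

def with_trigger(question: str, trigger: str) -> str:
    """Append a reasoning trigger phrase on its own line after the question."""
    return f"{question.strip()}\n{trigger}"

if __name__ == "__main__":
    question = (
        "Virma has three bags, each of which fits five shirts. "
        "How many shirts can Virma fit in her bags?"
    )
    print(with_trigger(question, COT_TRIGGER))
    print()
    print(with_trigger(question, THOT_TRIGGER))
```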

29
Share your
feedback!

31
Join our team

Product Engineering
From custom platform and
product development to
scaled agile delivery, we join
forces to build advanced
technology solutions
