Architecture Patterns For Building Generative AI Applications
Praveen Jayakumar
Head of AI/ML Solutions Architecture
AWS India
• Jurassic-2 Ultra, Jurassic-2 Mid
• Titan Text Embeddings, Titan Multimodal Embeddings, Titan Text Lite, Titan Text Express, Titan Image Generator
• Claude 2, Claude 2.1, Claude Instant
• Command + Embed: Cohere Command Light, Cohere Embed English, Cohere Embed Multilingual
• Llama 2: Llama 2 13B, Llama 2 70B
• Stable Diffusion XL1.0
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.
Why customize?
• E.g. Healthcare – Understand medical terminology and provide accurate responses related to patient's health complaints
• E.g. Finance – Teach financial & accounting terms to provide good analysis for earnings reports
• E.g. Customer Service – Improve ability to understand and respond to customers' inquiries
• E.g. Legal Services – Better understand case facts and law to provide useful insights for attorneys
Common approaches for customizing foundation models (FMs)
In increasing order of complexity, quality, cost, and time:
• Prompt Engineering
• Retrieval Augmented Generation (RAG): augment knowledge without changing pre-trained model weights
• Customize: adjust behavior of a pre-trained FM
• Train FM from scratch
Customize vs. augment
[Diagram labels]
• External data sources or up-to-date info
• Consolidated or historical info
• Task information
RAG use cases
• E.g., helps in reducing hallucinations and connecting with recent knowledge including enterprise data
• E.g., enhance chatbot capabilities by integrating with real-time data
• E.g., searching based on user previous search history and persona
• E.g., retrieving and summarizing transactional data from databases, or API calls
Types of retrieval
What are embeddings?
• Numerical representation of text (vectors) that captures semantics and relationships between words.
New York   0.027  -0.011  …  -0.023
Paris      0.025  -0.009  …  -0.025
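One common way to compare two embedding vectors is cosine similarity. The sketch below is only an illustration: the four-dimensional vectors are hypothetical toy values (the slide's vectors are elided, and real embedding models return far more dimensions), and the "banana" entry is added purely for contrast.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, NOT real model output.
toy_embeddings = {
    "New York": [0.9, 0.8, 0.1, 0.0],
    "Paris": [0.8, 0.9, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9, 0.8],
}

city_pair = cosine_similarity(toy_embeddings["New York"], toy_embeddings["Paris"])
city_fruit = cosine_similarity(toy_embeddings["New York"], toy_embeddings["banana"])
print(f"New York vs Paris:  {city_pair:.3f}")
print(f"New York vs banana: {city_fruit:.3f}")
```

With these toy values the two cities score much closer to each other than a city and a fruit, which is the property retrieval relies on.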
Why are embeddings important for RAG?
Translates text inputs (words, phrases) into numerical representations (embeddings). Comparing embeddings produces more relevant and contextual responses than word matching.
• Titan Text Embeddings offers fast, cost-effective, high-performance, accurate embeddings in 25 languages.
• Optimized for text retrieval tasks, semantic similarity and clustering.
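A minimal sketch of requesting an embedding from Titan Text Embeddings through the Bedrock runtime `invoke_model` call. The model ID below is an assumption to verify for your account and Region, and `embed_text` is a hypothetical helper name. The client is passed in so the function can be exercised without AWS access.

```python
import json

# Assumed model ID; confirm the identifier available in your Region.
TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v1"

def embed_text(client, text, model_id=TITAN_EMBED_MODEL_ID):
    """Return the embedding vector for `text` using a Bedrock runtime client."""
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    # The response body is a stream containing a JSON document.
    payload = json.loads(response["body"].read())
    return payload["embedding"]

# Usage (requires boto3, AWS credentials, and access to the model):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   vector = embed_text(client, "What is Retrieval Augmented Generation?")
```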
RAG in Action
User Input
However, when it comes to implementing RAG,
there are challenges…
Data Ingestion Workflow
KNOWLEDGE BASES FOR AMAZON BEDROCK
Fully managed data ingestion workflow: new data → data source → document chunks → embeddings model → vector store
• Choose your data source (Amazon S3)
  - Support for incremental updates
  - Multiple data file formats supported
• Choose your chunking strategy
  - Fixed chunks
  - No chunking
  - Default (200 tokens)
• Choose your embedding model
  - Amazon Titan
• Choose your vector store
  - OpenSearch Serverless
  - Pinecone
  - Redis
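The chunking options above can be sketched roughly as follows. This is an illustration only, not how the managed service chunks documents: it approximates tokens by whitespace-separated words, and the function name and strategy strings are hypothetical.

```python
def chunk_document(text, strategy="default", max_tokens=200):
    """Split `text` into chunks according to a simple strategy.

    strategy:
      "none"    -> the whole document as a single chunk
      "default" -> chunks of 200 tokens (the default named on the slide)
      "fixed"   -> chunks of `max_tokens` tokens
    Tokens are approximated by whitespace-separated words (an assumption).
    """
    if strategy == "none":
        return [text]
    if strategy == "default":
        size = 200
    elif strategy == "fixed":
        size = max_tokens
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if size <= 0:
        raise ValueError("chunk size must be positive")
    tokens = text.split()
    # Step through the token list in windows of `size` tokens.
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]

# Example: a synthetic 450-word document.
document = " ".join(f"w{i}" for i in range(450))
for chunk in chunk_document(document):
    print(len(chunk.split()), "tokens")
```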
Fully managed data ingestion
KNOWLEDGE BASES FOR AMAZON BEDROCK
Knowledge Bases for Amazon Bedrock provides the fully managed data ingestion workflow: new data → data source → document chunks → embeddings model → vector store
• Data source (Amazon S3): support for incremental updates; multiple data file formats supported
• Chunking strategy: fixed chunks, no chunking, or default (200 tokens)
• Embedding model: Amazon Titan
• Vector store: OpenSearch Serverless, Pinecone, or Redis
RetrieveAndGenerate API
KNOWLEDGE BASES FOR AMAZON BEDROCK
Fully managed RAG: user input (user query) → RetrieveAndGenerate API → user response (generated response)
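A minimal sketch of calling RetrieveAndGenerate through the boto3 `bedrock-agent-runtime` client. The helper name `ask_knowledge_base` is hypothetical, and the knowledge base ID and model ARN in the usage comment are placeholders to replace with your own values.

```python
def ask_knowledge_base(client, question, knowledge_base_id, model_arn):
    """Send a question to a knowledge base and return the generated answer text."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]

# Usage (requires boto3, AWS credentials, and an existing knowledge base):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   answer = ask_knowledge_base(
#       client,
#       "What is our refund policy?",
#       knowledge_base_id="KB_ID_PLACEHOLDER",
#       model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
#   )
```

Retrieval, prompt augmentation, and generation all happen inside the single call, which is what makes this the fully managed path.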
Customize RAG workflows using Retrieve API
KNOWLEDGE BASES FOR AMAZON BEDROCK
Customized RAG workflow: user input (user query) → Retrieve API → retrieved documents (context)
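A sketch of a customized workflow built on the Retrieve API, assuming the boto3 `bedrock-agent-runtime` client. The helper names and the prompt template are hypothetical. The retrieved chunks become the context of a prompt that you can then send to any model you choose.

```python
def retrieve_chunks(client, query, knowledge_base_id, top_k=3):
    """Return the text of the top_k chunks retrieved for `query`."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]

def build_prompt(query, chunks):
    """Combine retrieved chunks and the user query into one prompt (illustrative template)."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Usage (requires boto3, AWS credentials, and an existing knowledge base):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   chunks = retrieve_chunks(client, "What is our refund policy?", "KB_ID_PLACEHOLDER")
#   prompt = build_prompt("What is our refund policy?", chunks)
```

Keeping retrieval separate from generation leaves room for custom steps such as filtering, re-ranking, or a different prompt format.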
Knowledge Bases integration with Agents
Flow: query → agent → search Knowledge Bases → retrieval → query + retrieval → large language model → response generation
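From the caller's side, an agent that has a knowledge base attached can be invoked as sketched below, assuming the boto3 `bedrock-agent-runtime` client. The helper name `ask_agent` is hypothetical and the IDs in the usage comment are placeholders. The response arrives as a stream of events, and only the `chunk` events carry answer text.

```python
import uuid

def ask_agent(client, agent_id, agent_alias_id, question, session_id=None):
    """Invoke an agent and return its streamed answer as a single string."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # A session ID groups turns of one conversation; generate one if not given.
        sessionId=session_id or str(uuid.uuid4()),
        inputText=question,
    )
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:  # skip non-text events such as traces
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

# Usage (requires boto3, AWS credentials, and a deployed agent alias):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   print(ask_agent(client, "AGENT_ID_PLACEHOLDER", "ALIAS_ID_PLACEHOLDER", "Where is my order?"))
```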
Customize
Instruction fine-tuning and continued pre-training
Instruction fine-tuning
• Instruction training dataset is available?
• Specific style, behavior required?
Continued pre-training
• Raw dataset (e.g. PDFs)
• Additional knowledge through domain adaptation
[Chart: domain adaptation (e.g. extend knowledge) vs. task specialization (e.g. behavior, style), with regions for continued pre-training, instruction fine-tuning, and continued pre-training + instruction fine-tuning]
Datasets for instruction fine-tuning and continued pre-training
Instruction fine-tuning:
{"prompt": "<prompt text>", "completion": "<expected generated text>"}
{"prompt": "<prompt text>", "completion": "<expected generated text>"}
{"prompt": "<prompt text>", "completion": "<expected generated text>"}
Continued pre-training:
{"input": "<raw text>"}
{"input": "<raw text>"}
{"input": "<raw text>"}
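A small sketch of producing both dataset shapes as one JSON object per line, following the record layout shown on the slide. The helper name `to_jsonl` and the sample records are hypothetical. Checking keys up front is a convenience here, not a documented requirement.

```python
import json

def to_jsonl(records, required_keys):
    """Serialize records to a string with one JSON object per line.

    Raises ValueError if any record lacks one of `required_keys`.
    """
    lines = []
    for index, record in enumerate(records):
        missing = set(required_keys) - set(record)
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
        lines.append(json.dumps(record, ensure_ascii=False) + "\n")
    return "".join(lines)

# Hypothetical sample records for illustration only.
fine_tuning_records = [
    {"prompt": "Summarize: The meeting moved to Friday.", "completion": "Meeting is now on Friday."},
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
]
pre_training_records = [
    {"input": "Raw domain text extracted from an internal document."},
]

fine_tuning_jsonl = to_jsonl(fine_tuning_records, required_keys=("prompt", "completion"))
pre_training_jsonl = to_jsonl(pre_training_records, required_keys=("input",))
print(fine_tuning_jsonl, end="")
print(pre_training_jsonl, end="")
```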
Example of Instruction fine-tuning dataset
Amazon Bedrock custom models New!
Components of a customization job
Customization architecture overview
[Diagram] Components shown:
• Amazon Bedrock service account and model deployment account (AWS owned and operated)
• All incoming network traffic arrives via the console, SDKs, and API
• Training orchestration
• Fine-tuned model S3 bucket
• Customer account: training data S3 bucket, AWS CloudTrail, Amazon CloudWatch, AWS IAM
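From the customer account, a customization job can be started through the boto3 `bedrock` client, as in the sketch below. The helper name is hypothetical, all ARNs, model IDs, and S3 URIs in the usage comment are placeholders, and the hyperparameter names and ranges depend on the base model, so check them before use.

```python
VALID_CUSTOMIZATION_TYPES = ("FINE_TUNING", "CONTINUED_PRE_TRAINING")

def start_customization_job(client, *, job_name, custom_model_name, role_arn,
                            base_model_id, training_s3_uri, output_s3_uri,
                            customization_type="FINE_TUNING", hyper_parameters=None):
    """Start a model customization job and return its job ARN."""
    if customization_type not in VALID_CUSTOMIZATION_TYPES:
        raise ValueError(f"customization_type must be one of {VALID_CUSTOMIZATION_TYPES}")
    response = client.create_model_customization_job(
        jobName=job_name,
        customModelName=custom_model_name,
        roleArn=role_arn,                               # IAM role that can read the training data
        baseModelIdentifier=base_model_id,
        customizationType=customization_type,
        trainingDataConfig={"s3Uri": training_s3_uri},  # training data in your S3 bucket
        outputDataConfig={"s3Uri": output_s3_uri},      # where job output is written
        hyperParameters=hyper_parameters or {},         # string values; names are model-specific
    )
    return response["jobArn"]

# Usage (requires boto3, AWS credentials, and suitable IAM permissions):
#   import boto3
#   client = boto3.client("bedrock")
#   job_arn = start_customization_job(
#       client,
#       job_name="my-job",
#       custom_model_name="my-custom-model",
#       role_arn="arn:aws:iam::111122223333:role/PLACEHOLDER",
#       base_model_id="BASE_MODEL_ID_PLACEHOLDER",
#       training_s3_uri="s3://PLACEHOLDER-bucket/train.jsonl",
#       output_s3_uri="s3://PLACEHOLDER-bucket/output/",
#       hyper_parameters={"epochCount": "1"},
#   )
```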
Security and privacy
You are always in control of your data
✓ Data not used to improve models, and not shared with model providers
✓ Custom models encrypted and stored with Service or Customer Managed Keys (CMK) - Only you have access to your models
Recap
Thank you!
Praveen Jayakumar
[email protected]