Handout Accelerate To Production With Serverless Compute For Generative AI Applications

Download as pdf or txt
Download as pdf or txt
You are on page 1of 43

26 SEP,2024

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Accelerate to production with serverless
compute for generative AI applications

Mai Nishitani Jad Goss Robert Louw


Senior Solutions Architect AI Engineer AI Engineer
AWS RACQ RACQ

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How do I prioritize my projects?

How can I lower my costs? How do I make this real?

What customization method should I use?

The Year of How I can I scale this? Which models should I use?

Production
Should I train my own model? How do I manage risks?

How can we move faster?

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How foundation models differ from other ML models

Tasks Tasks
Labeled
ML model Text generation Text generation
data

Labeled
ML model Summarization Summarization
data

Labeled Information Unlabeled Foundation Information


ML model
data extraction data model extraction

Labeled
ML model Q&A Q&A
data

Labeled
ML model Chatbot Chatbot
data

Train Deploy Pre-train Adapt

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is serverless?

Serverless services simplify the Business logic Customer


management and scaling of
cloud applications by shifting
undifferentiated operational tasks to the API
cloud provider so development teams
Messaging &
can focus on writing code that solve
orchestration
business problems
Storage &
AWS
databases

Compute

Physical
infrastructure

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Generative AI personas

Consumers

Tuners

Builders

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
RACQ - Claim Research Assistant
Business objectives

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
RACQ - continued

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Generative AI on AWS

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Generative AI stack
Applica tions that leverage LLMs and other foundation models (FMs)

Amazon Q Amazon Q Amazon Q in Amazon Q in


Business Developer QuickSight Connect

Tools to build with LLMs and other FMs

Amazon Bedrock

Guardrails Agents Studio Customization capabilities Custom Model Import

Infrastructure for FM training and inference

GPUs AWS Trainium AWS Inferentia Amazon SageMaker

Amazon EC2 Elastic Fabric Amazon EC2 AWS Nitro AWS Neuron
UltraClusters Adapter (EFA) Capacity Blocks

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Generative AI stack
Applica tions that leverage LLMs and other foundation models (FMs)

Tools to build with LLMs and other FMs

Amazon Bedrock

Guardrails Agents Studio Customization capabilities Custom Model Import

Infrastructure for FM training and inference

GPUs AWS Trainium AWS Inferentia Amazon SageMaker

Amazon EC2 Elastic Fabric Amazon EC2 AWS Nitro AWS Neuron
UltraClusters Adapter (EFA) Capacity Blocks

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Choice of leading FMs through a single API

Model customization
Amazon Bedrock
Retrieval Augmented Generation (RAG)
The easiest way to build and scale
generative AI applications with
foundation models (FMs) Agents that execute multistep tasks

Security, privacy, and safety

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Bedrock
simplifies

Choice Customization Integration Security and


governance

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Bedrock
Broad choice of models

Contextual answers, Text summarization, Summarization, Text generation, Q&A and reading Text summarization, High-quality images
summarization, generation, complex reasoning, search, classification comprehension text classification, and art
paraphrasing Q&A, search, writing, coding text completion,
image generation code generation, Q&A

Jamba-Instruct Amazon Titan Claude 3.5 Sonnet Command Llama 3 8B Mistral Small Stable Diffusion XL1.0
Text Premier
Jurassic-2 Ultra Claude 3 Opus Command Light Llama 3 70B Mistral Large Stable Diffusion
Amazon Titan XL 0.8
Jurassic-2 Mid Claude 3 Sonnet Embed English Llama 2 13B Mistral 7B
Text Lite
Claude 3 Haiku Embed Multilingual Llama 2 70B Mixtral 8x7B
Amazon Titan
Text Express Claude 2.1 Command R+
Amazon Titan Text Claude 2 Command R
Embeddings Claude Instant
Amazon Titan Text
Embeddings V2
Amazon Titan
Multimodal
Embeddings
Amazon Titan
Image Generator

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Anthropic Claude Models on Amazon Bedrock
Choose the exact combination of intelligence, speed, and cost to suit your needs

Claude 3.5 Sonnet Claude 3 Haiku Claude 3 Sonnet Claude 3 Opus

Balance between Second-most intelligent


Most intelligent, built for Fastest performance
Use case high-volume use cases at the lowest cost
intelligence, speed, and overall; most intelligent in
cost Claude 3 family

Context 200K 200K 200K 200K

Vision ✓ ✓ ✓ ✓
Input: $0.003 $0.00025 $0.003 $0.015
Cost*
Output: $0.015 $0.00125 $0.015 $0.075

*Per 1K tokens

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
RACQ – Claim Research Assistant –
Technical objectives

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Architecture
AWS Cloud

Enterprise Virtual private cloud (VPC)


application 1

AWS Amazon Amazon Athena


Glue Job Comprehend
2

Amazon Amazon AWS Lambda AWS Step


ECS API GW (Router) Functions
Amazon Amazon S3
DynamoDB

Amazon Bedrock AWS Lambda


(Orchestrator)

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
AWS Step Functions – Gather data

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
AWS Step Functions – Generate “Prompt Chunks”

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
AWS Step Functions – Process “Prompt Chunks”

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
AWS Step Functions – Process Eval “Prompt
Chunks”

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
AWS Step Functions – Generate summary

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
AWS Step Functions – Notify the user

Disclaimer: The following content is for information purposes only. RACQ provides no warranties as to completeness or accuracy and accepts no liability for any reliance on this content.
Why build generative AI
applications?

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Combining speed and power

Rapid delivery
of smarter
Power of generative applications and
Speed of serverless features with
AI
focus on
innovation

AWS Amazon AWS AWS Amazon Amazon Amazon Amazon Q


Lambda ECS Fargate Step EventBridge SageMaker Bedrock
Functions

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Serverless spectrum
AWS offers a wide portfolio of serverless services

Compute Storage

AWS AWS Amazon AWS App Amazon Simple Storage Amazon


Lambda Fargate ECS Runner Service (Amazon S3) EFS

Workflows and integrations

Amazon Simple Amazon Simple


Amazon AWS Step Amazon AWS Amazon
Queue Service Notification Service
EventBridge Functions API Gateway AppSync (Amazon SQS) (Amazon SNS) Kinesis

Databases and analytics

Amazon Aurora Amazon Amazon Amazon Amazon AWS Amazon


Serverless DynamoDB OpenSearch Service Bedrock QuickSight Glue Redshift

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Hosting and serving generative AI
applications with serverless

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Hosting and serving generative AI on serverless - Architecture
Source: Build and scale generative AI applications with Amazon Bedrock workshop

Generative AI layer

Amazon API Gateway AWS Lambda Amazon Bedrock


Gen AI core

AWS Lambda Amazon DynamoDB


Prompt management

AWS Lambda
Send mail Amazon SNS

Front-end layer

Amazon ECS Amazon ECR


Amazon Cognito Streamlit app

User
Amazon CloudFront Application Load
Balancer
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Focus on AI code

Why Flexible integrations

serverless
compute for Cost-effective
generative
AI? Agility with rapidly
evolving FMs

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Focus on AI code Built-in auto scaling, high
availability, and fault
tolerance to ensure
Why Flexible integrations developer productivity,
serverless alleviating teams from the
complexity of infrastructure
compute for planning and management
generative Cost-effective for high throughput
inference workloads.
AI?
Agility with rapidly
evolving FMs

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is a serverless
operational model?
Business logic

Customer
Serverless services simplify the API
management and scaling of cloud
applications by shifting undifferentiated Messaging & orchestration
operational tasks to the cloud provider so
development teams can focus on writing
Storage & databases
code that solve business problems

AWS
Compute

Physical infrastructure

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Focus on AI code

Quickly integrate and


Flexible integrations deploy FMs into your
Why serverless applications and workloads
compute for running on AWS using
familiar controls and
generative AI? Cost-effective
integrations with the
depth and breadth of AWS
capabilities and services
like Amazon SageMaker
and Amazon Bedrock
Agility with rapidly
evolving FMs

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Focus on AI code

Flexible integrations

Why serverless
compute for Cost-effective
Cost-effectively scale
infrastructure to train and run
generative AI? FMs containing hundreds of
billions of parameters with pay-
by-request, ensuring that you
Agility with rapidly only pay for the duration and
evolving FMs quantity of inferences.

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Focus on AI code

Flexible integrations

Why serverless Generative AI is in a very early


stage of technology, build a
compute for Cost-effective
company culture for continuous
experimentation and evaluation
generative AI? of different models.

Agility with rapidly


evolving FMs

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Emerging generative AI patterns

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Generative AI patterns for consumers and tuners
Building Generative AI applications with Serverless Solutions

Inference enrichment
Transform image text

Store summary table

Amazon Textract AWS Amazon Bedrock


Lambda
Amazon S3 AWS Step AWS Amazon DynamoDB
Functions Lambda

Amazon Transcribe Summarize image text

Model fine-tuning
Fine-tuning to
enhance generative
AWS AWS Step AI performance
Lambda Functions

Amazon S3 AWS Step Amazon S3 Proprietary FM


Functions

Amazon ECS

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Serverless and generative AI = Faster path to
production

1 2 3 4
Enhance reliability Reduced infrastructure Cost-effective Rapid development and
management pricing model composability

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Get started with serverless and generative AI

Build and scale generative AI applications


with Amazon Bedrock workshop

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Visit the Migrate. Modernize. Build. resource hub
Dive deeper into these resources:

• 6 steps to success with generative AI


• Understanding the costs of generative AI
• 5 ways a secure cloud infrastructure drives innovation
• 10 ways to optimize costs and innovate with AWS
• Containers and serverless recommendation guide https://tinyurl.com/migrate-modernize-build

• Running Windows workloads on AWS: Your questions answered


• Top 10 reasons to choose AWS for SAP

… and more!

Visit resource hub

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Training and Certification
Access 600+ free digital courses with AWS Skill Builder

Focus on the cloud skills and services that are most relevant to you across
30+ AWS solutions, including digital self-paced learning plans and ramp-up
guides

• Build your future in the AWS Cloud at your own pace


https://skillbuilder.aws/
• Advance your skills and knowledge with learning plans
• Validate your cloud expertise with AWS Certification

Learn your way skillbuilder.aws »

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you for attending AWS Innovate – Migrate. Modernize. Build.

We hope you found it interesting! A kind reminder to complete the survey.


Let us know what you thought of today’s event and how we can improve the event
experience for you in the future.

[email protected]

twitter.com/AWSCloud

facebook.com/AmazonWebServices

youtube.com/user/AmazonWebServices

linkedin.com/company/amazon-web-services

twitch.tv/aws

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

You might also like