High-Performance & Cost-Effective Model Deployment With Amazon SageMaker

The document discusses Amazon SageMaker and its capabilities for deploying machine learning models for inference at scale. It provides an overview of SageMaker's different inference options including real-time, asynchronous, batch, and serverless inference. It also discusses features such as automatic deployment recommendations, cost optimization strategies, and integration with MLOps workflows. Additionally, the document provides a simple guide to help choose the best inference option based on factors like payload size, processing time needs, and traffic patterns.


TORONTO | JUNE 22–23, 2022

AIM302

High-performance & cost-effective model deployment with Amazon SageMaker
Mani Khanuja
Sr. AI/ML Specialist Solutions Architect – Amazon SageMaker
AWS

© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Topic 1: Choosing the best inference option
• Introduction to Amazon SageMaker model deployment
• Overview of different inference options
• Simple guide to choose an inference option

Topic 2: Cost optimization options
• SageMaker Savings Plan
• Improving utilization
• Picking the right instance
• Auto scaling
• Optimize models

Topic 3: Demo

Deploy ML models for inference at scale with fully managed deployment

Wide selection of infrastructures
70+ instance types with varying levels of compute and memory to meet the needs of every use case

Automatic deployment recommendations
Optimal instance type/count and container parameters, and fully managed load testing

Breadth of deployment options
Real-time, asynchronous, batch, and serverless endpoints

Fully managed deployment strategies
Canary and linear traffic-shifting modes with built-in safeguards such as auto-rollbacks

Cost-effective deployment
Multi-model/multi-container endpoints, serverless inference, and elastic scaling

Built-in integration for MLOps
ML workflows, model monitoring, CI/CD, lineage tracking, and model registry
SageMaker model deployment options

Online: an inference for each request. SageMaker offers:
• Real-time inference
• Serverless inference
• Asynchronous inference

Batch: inference on a set of data. SageMaker offers batch inference.

Real-time inference

Properties
• Synchronous
• Instance-based (supports CPU/GPU)
• Low latency
• Payload size <6 MB, request timeout: 60 seconds

Key features
• Optimize cost and utilization by deploying multiple models/containers on an instance
• Flight changes with A/B testing
• Safely deploy changes with blue/green deployments
• Capture model inputs and outputs for later use

Example use cases: ad serving, personalized recommendations, fraud detection
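As a rough sketch of the request path, the snippet below assembles the arguments for the `sagemaker-runtime` `invoke_endpoint` call and enforces the 6 MB real-time payload limit client-side. The endpoint name and the `{"instances": ...}` JSON schema are illustrative placeholders, not part of the talk.

```python
import json

MAX_REALTIME_PAYLOAD = 6 * 1024 * 1024  # real-time endpoints cap payloads at 6 MB


def build_invoke_args(endpoint_name, features):
    """Build keyword arguments for sagemaker-runtime invoke_endpoint,
    rejecting payloads over the 6 MB real-time limit up front."""
    body = json.dumps({"instances": features}).encode("utf-8")
    if len(body) > MAX_REALTIME_PAYLOAD:
        raise ValueError(
            "payload exceeds the 6 MB real-time limit; "
            "consider asynchronous or batch inference instead"
        )
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": body,
    }
```

With boto3, the result would be passed along as `boto3.client("sagemaker-runtime").invoke_endpoint(**build_invoke_args("my-endpoint", rows))`.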

Serverless inference

Properties
• Synchronous
• No need to pick and choose instances
• Cost-effective for intermittent/unpredictable traffic
• Good for workloads that tolerate higher p99 latency
• Payload size <4 MB, request timeout: 60 seconds

Key features
• Pay only for the duration of each inference request
• No cost at idle
• Automatic and fast scaling
• Similar deploy/invoke model to real-time inference

Example use cases: analyzing data from documents, form processing, chatbots
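To make "no need to pick instances" concrete, the sketch below builds parameters for `sagemaker.create_endpoint_config` with a `ServerlessConfig` in place of instance settings; you size memory and concurrency instead. The config and model names are placeholders, and the allowed memory sizes reflect my understanding of the serverless inference limits at the time of the talk.

```python
def serverless_endpoint_config(config_name, model_name,
                               memory_mb=2048, max_concurrency=5):
    """Parameters for sagemaker.create_endpoint_config using a
    ServerlessConfig instead of instance type/count."""
    # Serverless inference accepts memory in 1 GB steps from 1 GB to 6 GB.
    allowed = {1024, 2048, 3072, 4096, 5120, 6144}
    if memory_mb not in allowed:
        raise ValueError(f"memory_mb must be one of {sorted(allowed)}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }
```

The resulting dict would be passed to `boto3.client("sagemaker").create_endpoint_config(**cfg)` before creating the endpoint as usual.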

Asynchronous inference

Properties
• Asynchronous
• Instance-based (supports CPU/GPU)
• Good for large payloads (up to 1 GB) of unstructured data (images, videos, text, etc.)
• Suitable when processing time is on the order of minutes (up to 15 minutes)

Key features
• Built-in queue for requests
• Configure auto scaling for queue drain rate
• Scale down to zero to optimize for costs
• Safely deploy changes with blue/green deployments

Example use cases: image synthesis, known entity extraction, anomaly detection with time-series data
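Unlike real-time inference, the asynchronous variant takes its payload from S3 rather than inline, which is how it supports requests up to 1 GB. The sketch below builds the arguments for `sagemaker-runtime` `invoke_endpoint_async`; the endpoint name and S3 URI are placeholders.

```python
def build_async_invoke_args(endpoint_name, input_s3_uri):
    """Arguments for sagemaker-runtime invoke_endpoint_async.
    The request payload must already be staged in S3 (up to 1 GB);
    the call returns immediately, and the result is written later to
    an S3 output location named in the response."""
    if not input_s3_uri.startswith("s3://"):
        raise ValueError("async inference reads its input from S3; "
                         "upload the payload first")
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": "application/json",
    }
```

The caller would then poll (or subscribe to a notification) for the object at the `OutputLocation` returned by the call.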

Batch inference

Properties
• High-throughput inference in batches
• Instance-based (supports CPU/GPU)
• Good for processing gigabytes of data of all data types
• Payload size in GBs and processing time in days

Key features
• Built-in features to split, filter, and join structured data
• Automatic distributed processing of structured tabular data for high performance
• Pay only for the duration of the job

Example use cases: propensity modeling, predictive maintenance, churn prediction
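A minimal sketch of how the split/join features surface in the API: the parameters below target `sagemaker.create_transform_job`, with `SplitType` and `AssembleWith` doing the built-in splitting and joining of line-delimited data. Job name, model name, S3 paths, and instance type are illustrative defaults, not values from the talk.

```python
def batch_transform_params(job_name, model_name, input_s3, output_s3,
                           instance_type="ml.m5.xlarge", instance_count=1):
    """Parameters for sagemaker.create_transform_job; billing covers
    only the time the job's instances are running."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3,
            }},
            "ContentType": "text/csv",
            "SplitType": "Line",  # built-in splitting of structured data
        },
        "TransformOutput": {
            "S3OutputPath": output_s3,
            "AssembleWith": "Line",  # join per-record results back together
        },
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,  # raise for distributed processing
        },
    }
```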

Choosing model deployment options

Start with these questions, in order:

1. Does your workload need to return an inference for each request to your model?
   No, I can wait until all requests are processed → Batch (payload size: GBs; runtime: days)
2. Would it be helpful to queue requests due to longer processing times or larger payloads?
   Yes → Asynchronous (payload size: 1 GB; runtime: 15 minutes)
3. Does your workload have intermittent traffic patterns or periods of no traffic?
   Yes → Serverless (payload size: 4 MB; runtime: 60 seconds)
4. Does your workload have sustained traffic and need lower and consistent latency?
   Yes → Real-time (payload size: 6 MB; runtime: 60 seconds)
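The decision flow above is mechanical enough to express directly in code. The sketch below walks the same questions in the same order; the flag names are my own shorthand for the questions on the slide.

```python
def choose_inference_option(per_request=True, queue_ok=False,
                            intermittent_traffic=False):
    """Walk the slide's decision flow: batch -> async -> serverless -> real-time.

    per_request:           must each request get its own inference?
    queue_ok:              would queuing help (longer processing, larger payloads)?
    intermittent_traffic:  does traffic drop to zero for stretches of time?
    """
    if not per_request:
        return "batch"        # payload: GBs, runtime: days
    if queue_ok:
        return "async"        # payload: 1 GB, runtime: 15 minutes
    if intermittent_traffic:
        return "serverless"   # payload: 4 MB, runtime: 60 seconds
    return "real-time"        # payload: 6 MB, runtime: 60 seconds
```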

Programmatically calling SageMaker

• AWS Command Line Interface (AWS CLI)
• SageMaker REST APIs
• AWS CloudFormation
• AWS Cloud Development Kit (AWS CDK)
• AWS SDKs
• SageMaker Python SDK

AWS SDKs and SageMaker Python SDK

AWS SDKs (low-level API)
• Language support: Java, C++, Go, JavaScript, .NET, Node.js, PHP, Ruby, Python
• AWS services supported: most AWS services
• Persona: DevOps, ML engineers
• Size: lightweight (~67 MB); pre-installed in AWS Lambda
• High-level features: more verbose but more transparent
• Code complexity: medium

SageMaker Python SDK (high-level API)
• Language support: Python
• AWS services supported: Amazon SageMaker
• Persona: data scientists
• Size: ~250 MB (may be lower with SageMaker SDK v2)
• High-level features: hides Docker images, copies scripts from local to Amazon S3, and creates the model and endpoint configurations for you; native support for sync/async API calls; simpler request/response schema; less code
• Code complexity: low

SageMaker model deployment cost optimizations

Cost optimizations

SageMaker Savings Plans apply across all options. Per option:

Real-time (instance-based)
• Auto scaling
• Pick the right instance
• Use multiple models/containers

Batch (instance-based)
• Pick the right instance

Asynchronous (instance-based)
• Auto scaling (can be zero)
• Pick the right instance

Serverless
• Choose the right memory size
Buy a SageMaker Savings Plan
• Reduce your costs by up to 64% with a Savings Plan
• 1- or 3-year term commitment to a consistent amount of usage ($/hour)
• Applies automatically to eligible SageMaker ML instance usage for:
• SageMaker Studio Notebook
• SageMaker on-demand notebook instances
• SageMaker processing
• SageMaker Data Wrangler
• SageMaker training
• SageMaker real-time inference
• SageMaker batch transform

Improve utilization of real-time inference

Multi-model endpoints
• Deploy thousands of models
• Works best when models are of similar size and latency
• Models must be able to run in the same container
• Dynamic model loading

Multi-container endpoints
• Up to 15 different containers
• Containers can be directly invoked
• Works best when containers exhibit similar usage and performance characteristics
• Always in memory

Serial inference pipeline
• Chain 2–15 containers
• Reuse the data transformers developed for training models
• Low latency: all containers run on the same underlying Amazon EC2 instance
• Pipeline is immutable
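At invocation time, the difference between a plain endpoint and a multi-model endpoint is one extra routing field. The sketch below builds `invoke_endpoint` arguments that select a specific artifact via `TargetModel`; the endpoint and artifact names are placeholders.

```python
def build_multi_model_invoke_args(endpoint_name, model_artifact, payload):
    """invoke_endpoint arguments for a multi-model endpoint: TargetModel
    names the model artifact (relative to the endpoint's S3 prefix) that
    should serve this request, loaded dynamically on first use.
    (Multi-container endpoints route similarly via TargetContainerHostname.)"""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": model_artifact,   # e.g. "churn-v3.tar.gz" (placeholder)
        "ContentType": "application/json",
        "Body": payload,
    }
```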

Inference recommender

• Run extensive load tests
• Get instance type recommendations (based on throughput, latency, and cost)
• Integrate with model registry
• Review performance metrics from SageMaker Studio
• Customize your load tests
• Fine-tune your model, model server, and containers
• Get detailed metrics from Amazon CloudWatch

Inference recommender job types
• Default: preliminary recommendations
• Advanced: custom load testing and granular control for performance tuning
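The two job types map to a single parameter on `sagemaker.create_inference_recommendations_job`. The sketch below shows the minimal shape as I understand it, using a model registry package as input; all names and ARNs are placeholders.

```python
def recommender_job_params(job_name, role_arn, model_package_arn,
                           job_type="Default"):
    """Parameters for sagemaker.create_inference_recommendations_job.
    'Default' runs preliminary recommendations; 'Advanced' runs a custom
    load test with granular control for performance tuning."""
    if job_type not in ("Default", "Advanced"):
        raise ValueError("job_type must be 'Default' or 'Advanced'")
    return {
        "JobName": job_name,
        "JobType": job_type,
        "RoleArn": role_arn,
        # Model registry integration: recommend against a registered package.
        "InputConfig": {"ModelPackageVersionArn": model_package_arn},
    }
```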

Auto scaling

• Distributes your instances across Availability Zones
• Dynamically adjusts the number of instances
• No traffic interruption while instances are being added or removed
• Scale-in and scale-out options suitable for different traffic patterns
• Support for predefined and custom metrics for the auto scaling policy
• Support for a cooldown period for scaling in and scaling out

[Diagram: a client application sends inference requests to a secure endpoint backed by {ProductionVariants}, with automatic scaling of instances across Availability Zones 1–3]
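SageMaker endpoint auto scaling is driven by Application Auto Scaling. As a sketch of the predefined-metric and cooldown features, the function below builds a target-tracking configuration for `application-autoscaling` `put_scaling_policy`; endpoint and variant names are placeholders, and the target value of 100 invocations per instance is an arbitrary illustrative choice.

```python
def variant_scaling_policy(endpoint_name, variant_name,
                           invocations_per_instance=100,
                           scale_out_cooldown=60, scale_in_cooldown=300):
    """Target-tracking scaling policy for a SageMaker production variant,
    to be passed to the application-autoscaling put_scaling_policy call."""
    return {
        "PolicyName": f"{variant_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleOutCooldown": scale_out_cooldown,  # add instances quickly
            "ScaleInCooldown": scale_in_cooldown,    # remove them conservatively
        },
    }
```

The endpoint variant must first be registered as a scalable target for the same `ResourceId` and dimension before the policy takes effect.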

Optimize models

Better-performing models mean you can run more on an instance over a shorter duration.

Automatically optimize models with SageMaker Neo.

https://aws.amazon.com/blogs/machine-learning/increasing-performance-and-reducing-the-cost-of-mxnet-inference-using-amazon-sagemaker-neo-and-amazon-elastic-inference/
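A Neo compilation is submitted as a job rather than run in-process. The sketch below builds parameters for `sagemaker.create_compilation_job`; the MXNet framework and input shape echo the linked blog post, while the job name, role ARN, S3 paths, target device, and timeout are placeholder assumptions.

```python
def neo_compilation_params(job_name, role_arn, model_s3, output_s3,
                           target_device="ml_c5"):
    """Parameters for sagemaker.create_compilation_job, which runs a
    SageMaker Neo compilation of a trained model for a target device."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3,  # trained model artifact (e.g. model.tar.gz)
            # Input shape the compiler should optimize for (NCHW here).
            "DataInputConfig": '{"data": [1, 3, 224, 224]}',
            "Framework": "MXNET",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3,
            "TargetDevice": target_device,  # instance family to compile for
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }
```

The compiled artifact written to `S3OutputLocation` is then deployed like any other model, typically with a Neo-compatible serving container.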

Learn in-demand AWS Cloud skills

AWS Skill Builder
• Access 500+ free digital courses and Learning Plans
• Explore resources with a variety of skill levels and 16+ languages to meet your learning needs
• Deepen your skills with digital learning on demand

AWS Certifications
• Earn an industry-recognized credential
• Receive Foundational, Associate, Professional, and Specialty certifications
• Join the AWS Certified community and get exclusive benefits
• Access new exam guides

Thank you!
Mani Khanuja
@mani_Khanuja

@manikhanuja
