High-Performance & Cost-Effective Model Deployment With Amazon SageMaker
AIM302
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Topic 1: Choosing the best inference option
• Introduction to Amazon SageMaker model deployment
• Overview of different inference options
• Simple guide to choosing an inference option
Topic 3: Demo
Wide selection of infrastructures
70+ instance types with varying levels of compute
and memory to meet the needs of every use case
Cost-effective deployment
Multi-model/multi-container endpoints, serverless
inference, and elastic scaling
SageMaker model deployment options
• Online
• Batch
Real-time inference
Properties: Synchronous
Example use cases: Personalized recommendations
Key features: Optimize cost and utilization by deploying multiple models/containers on an instance
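The multi-model key feature above can be sketched as a low-level API request. A minimal sketch, assuming hypothetical names (`my-mme`, the image URI, the S3 prefix, the role ARN); the field names follow the SageMaker `CreateModel` API:

```python
def multi_model_request(name: str, image_uri: str,
                        model_prefix: str, role_arn: str) -> dict:
    """Build a CreateModel request that serves many models from one container."""
    return {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "Mode": "MultiModel",          # serve multiple models from one endpoint
            "ModelDataUrl": model_prefix,  # S3 prefix holding model .tar.gz files
        },
    }

# Hypothetical placeholder values for illustration only.
req = multi_model_request(
    "my-mme",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    "s3://my-bucket/models/",
    "arn:aws:iam::123456789012:role/SageMakerRole",
)
# To create it (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_model(**req)
# At invocation time, pick the model per request with TargetModel:
#   boto3.client("sagemaker-runtime").invoke_endpoint(
#       EndpointName="my-endpoint", TargetModel="model-a.tar.gz", Body=payload)
```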
Serverless inference
Properties: Synchronous
Example use cases: Form processing
Key features: Pay only for the duration of each inference request; no cost at idle
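A serverless endpoint is declared by adding a `ServerlessConfig` to the endpoint configuration instead of an instance type. A minimal sketch with hypothetical model and endpoint names; the request shape follows the SageMaker `CreateEndpointConfig` API:

```python
def serverless_endpoint_config(model_name: str) -> dict:
    """Build a CreateEndpointConfig request for serverless inference."""
    return {
        "EndpointConfigName": f"{model_name}-serverless",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,  # memory is the sizing knob for serverless
                "MaxConcurrency": 5,     # cap on concurrent invocations
            },
        }],
    }

cfg = serverless_endpoint_config("my-model")  # "my-model" is a placeholder
# Apply it (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**cfg)
```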
Asynchronous inference
Properties: Asynchronous; instance-based (supports CPU/GPU)
Example use cases: Known entity extraction
Key features: Built-in queue for requests
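The built-in queue is enabled by attaching an `AsyncInferenceConfig` to the endpoint configuration; results land in S3 rather than in the HTTP response. A sketch with hypothetical names and paths, following the SageMaker `CreateEndpointConfig` API:

```python
def async_endpoint_config(name: str, model_name: str, s3_output: str) -> dict:
    """Build a CreateEndpointConfig request for asynchronous inference."""
    return {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.g4dn.xlarge",  # async supports CPU or GPU instances
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": s3_output},  # results written to S3
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
        },
    }

# All names below are placeholders.
cfg = async_endpoint_config("my-async-config", "my-model",
                            "s3://my-bucket/async-results/")
# Apply it (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**cfg)
```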
Batch inference
Properties: High-throughput inference in batches
Example use cases: Predictive maintenance
Key features: Built-in features to split, filter, and join structured data
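The split/filter/join features above map to fields of a batch transform job request. A sketch with hypothetical names and S3 paths; the field names follow the SageMaker `CreateTransformJob` API:

```python
def transform_job_request(job_name: str, model_name: str,
                          s3_in: str, s3_out: str) -> dict:
    """Build a CreateTransformJob request that splits, filters, and joins CSV data."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix", "S3Uri": s3_in}},
            "ContentType": "text/csv",
            "SplitType": "Line",       # split: one record per line
        },
        "TransformOutput": {
            "S3OutputPath": s3_out,
            "AssembleWith": "Line",
        },
        "DataProcessing": {
            "InputFilter": "$[1:]",    # filter: drop the first column (e.g. an ID)
            "JoinSource": "Input",     # join: attach predictions to input records
        },
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }

req = transform_job_request("my-job", "my-model",
                            "s3://my-bucket/in/", "s3://my-bucket/out/")
# Launch it (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_transform_job(**req)
```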
Choosing model deployment options
Start: Does your workload need to return an inference for each request to your model?
• No → use batch inference.
• Yes → Would it be helpful to queue requests due to longer processing times or larger payloads?
  • Yes → use asynchronous inference.
  • No → Does your workload have intermittent traffic patterns or periods of no traffic?
    • Yes → use serverless inference.
    • No → Does your workload have sustained traffic and need lower and consistent latency? Use real-time inference.
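The decision flow above can be encoded as a few sequential questions; a minimal sketch (function and parameter names are my own, not AWS terms):

```python
def choose_inference_option(per_request: bool,
                            queue_helpful: bool = False,
                            intermittent: bool = False) -> str:
    """Walk the slide's decision flow, one question at a time."""
    if not per_request:           # no per-request response needed
        return "batch inference"
    if queue_helpful:             # long processing times or large payloads
        return "asynchronous inference"
    if intermittent:              # intermittent traffic or idle periods
        return "serverless inference"
    return "real-time inference"  # sustained traffic, lower and consistent latency

# Examples:
#   choose_inference_option(per_request=False)            -> "batch inference"
#   choose_inference_option(True, queue_helpful=True)     -> "asynchronous inference"
#   choose_inference_option(True, intermittent=True)      -> "serverless inference"
#   choose_inference_option(True)                         -> "real-time inference"
```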
Programmatically calling SageMaker
AWS SDKs and SageMaker Python SDK
AWS SDKs: low-level API; languages: Java, C++, Go, JavaScript, .NET, Node.js, PHP, Ruby, Python; supports most AWS services
SageMaker Python SDK: high-level API; language: Python; supports Amazon SageMaker
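The two abstraction levels call the same endpoint; a sketch of both call shapes, with the endpoint name as a hypothetical placeholder (the live calls are commented out because they need AWS credentials):

```python
import json

# Shared payload; the exact shape depends on your model's serving container.
payload = json.dumps({"instances": [[0.5, 1.5]]})

# Low-level AWS SDK (boto3): you name every request field yourself.
#   import boto3
#   rt = boto3.client("sagemaker-runtime")
#   resp = rt.invoke_endpoint(EndpointName="my-endpoint",
#                             ContentType="application/json", Body=payload)
#   result = resp["Body"].read()

# High-level SageMaker Python SDK: a Predictor wraps the same InvokeEndpoint call.
#   from sagemaker.predictor import Predictor
#   result = Predictor("my-endpoint").predict(payload)
```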
SageMaker model deployment cost optimizations
Cost optimizations
• Real-time: auto scaling; pick the right instance; use multiple models/containers; optimize models
• Asynchronous: auto scaling (can scale to zero); pick the right instance
• Batch: pick the right instance
• Serverless: choose the right memory size
Buy a SageMaker Savings Plan
• Reduce your costs by up to 64% with a Savings Plan
• 1- or 3-year term commitment to a consistent amount of usage ($/hour)
• Applies automatically to eligible SageMaker ML instance usage for:
• SageMaker Studio notebooks
• SageMaker on-demand notebook instances
• SageMaker processing
• SageMaker Data Wrangler
• SageMaker training
• SageMaker real-time inference
• SageMaker batch transform
Improve utilization of real-time inference
Inference recommender
• Run extensive load tests
• Get instance type recommendations (based on throughput, latency, and cost)
• Integrate with the model registry
• Review performance metrics from SageMaker Studio
• Customize your load tests
• Fine-tune your model, model server, and containers
• Get detailed metrics from Amazon CloudWatch
Inference recommender job types:
• Default: preliminary recommendations
• Advanced: custom load testing and granular control for performance tuning
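The two job types are selected with a single field when starting a recommendation job. A sketch with hypothetical names and ARNs; the field names follow the SageMaker `CreateInferenceRecommendationsJob` API:

```python
def recommender_job_request(job_name: str, role_arn: str,
                            model_package_arn: str,
                            advanced: bool = False) -> dict:
    """Build a CreateInferenceRecommendationsJob request."""
    return {
        "JobName": job_name,
        "JobType": "Advanced" if advanced else "Default",  # the two job types
        "RoleArn": role_arn,
        # Model versions come from the model registry integration:
        "InputConfig": {"ModelPackageVersionArn": model_package_arn},
    }

# Placeholder ARNs for illustration only.
req = recommender_job_request(
    "my-recommender-job",
    "arn:aws:iam::123456789012:role/SageMakerRole",
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/my-pkg/1",
)
# Start it (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_inference_recommendations_job(**req)
```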
Auto scaling
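Endpoint auto scaling is configured through Application Auto Scaling. A sketch of the two request shapes involved, with a hypothetical endpoint name and the default `AllTraffic` variant (live calls commented out because they need AWS credentials):

```python
ENDPOINT = "my-endpoint"  # hypothetical endpoint name
resource_id = f"endpoint/{ENDPOINT}/variant/AllTraffic"

# 1) Register the variant's instance count as a scalable target.
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

# 2) Track invocations per instance with a target-tracking policy.
scaling_policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"},
    },
}

# Apply with Application Auto Scaling (requires AWS credentials):
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**scalable_target)
#   aas.put_scaling_policy(**scaling_policy)
```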
Optimize models
Better-performing models mean you can run more on an instance over a shorter duration.
Automatically optimize models with SageMaker Neo.
Case study: https://aws.amazon.com/blogs/machine-learning/increasing-performance-and-reducing-the-cost-of-mxnet-inference-using-amazon-sagemaker-neo-and-amazon-elastic-inference/
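Neo compilation is started as a job against a trained model artifact. A sketch with hypothetical names, paths, and an example MXNet input shape; the field names follow the SageMaker `CreateCompilationJob` API:

```python
def neo_compile_request(job_name: str, role_arn: str,
                        s3_model: str, s3_out: str) -> dict:
    """Build a CreateCompilationJob request for SageMaker Neo."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": s3_model,  # trained model artifact (model.tar.gz)
            "DataInputConfig": '{"data": [1, 3, 224, 224]}',  # example input shape
            "Framework": "MXNET",
        },
        "OutputConfig": {
            "S3OutputLocation": s3_out,
            "TargetDevice": "ml_c5",  # compile for ml.c5 inference instances
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }

req = neo_compile_request("my-neo-job",
                          "arn:aws:iam::123456789012:role/SageMakerRole",
                          "s3://my-bucket/model.tar.gz",
                          "s3://my-bucket/compiled/")
# Start it (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_compilation_job(**req)
```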
Learn in-demand AWS Cloud skills
• Deepen your skills with digital learning on demand: Train now
• Join the AWS Certified community and get exclusive benefits: Access new exam guides
Thank you!
Mani Khanuja
@mani_Khanuja
@manikhanuja