Optimize Amazon SageMaker Deployment Strategies
USE301
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
"By the end of 2024, 75% of enterprises will shift from piloting to operationalizing AI." – Gartner
Amazon SageMaker
Purpose-built tools so you can be 10x more productive
• Access ML data: connect to many data sources such as Amazon S3, Apache Spark, Amazon Redshift, CSV files, and more
• Prepare data: transform data; browse data sources, explore metadata and schemas, and write queries in popular languages
• Build ML models: Studio notebooks optimized with 150+ popular open-source models and frameworks such as TensorFlow and PyTorch
• Train and tune: correct ML model performance problems in real time to improve model quality
• Deploy and monitor results: create, automate, and manage end-to-end ML workflows
Why optimize model deployment
Predictions drive complexity and cost in production: roughly 10% of ML spend goes to training, while 90% goes to prediction.
How to choose your inference architecture
CONSIDERATIONS
• Business: efficient utilization, variable workload
• Engineering: control, updates, A/B testing, data drift
Deploying Models in SageMaker
SageMaker Deployment – Batch Inference
SageMaker Batch Transform
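This slide shows no code for Batch Transform. As a hedged sketch only: the bucket path, strategy choices, and instance type below are illustrative placeholders, and with the SageMaker Python SDK this configuration would typically feed `model.transformer(...)` followed by `transformer.transform(...)`:

```python
# Hypothetical Batch Transform configuration; paths and instance choice
# are illustrative, not from the slides. With the SageMaker Python SDK
# this dict would be passed as model.transformer(**transform_config),
# then transformer.transform(data=input_s3_uri, content_type="text/csv").
transform_config = dict(
    instance_count=1,
    instance_type="ml.c5.xlarge",
    output_path="s3://my-bucket/batch-output",  # hypothetical bucket
    strategy="MultiRecord",   # batch several records per request
    assemble_with="Line",     # join output records with newlines
)
```

Batch Transform spins instances up for the job and tears them down afterward, so you pay only for the duration of the transform rather than for an always-on endpoint.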
SageMaker Deployment – Real-time Inference
real_time_endpoint = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge", ...)

real_time_endpoint.predict(payload)

Autoscaling
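The autoscaling noted here is configured through Application Auto Scaling. A hedged sketch: the endpoint and variant names below are hypothetical placeholders, while the resource/dimension strings and the invocations-per-instance metric are the ones Application Auto Scaling defines for SageMaker variants.

```python
# Target-tracking scaling for a SageMaker production variant.
# Endpoint/variant names are hypothetical; target and cooldown values
# are illustrative and should come from load testing.
endpoint_name = "my-endpoint"   # hypothetical
variant_name = "AllTraffic"     # the SDK's default variant name

resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

scaling_policy = {
    "TargetValue": 100.0,  # target invocations per instance per minute
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
    },
    "ScaleInCooldown": 300,   # wait before removing instances (seconds)
    "ScaleOutCooldown": 60,   # wait before adding instances (seconds)
}

# With boto3, this would be registered roughly as:
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(
#     ServiceNamespace="sagemaker", ResourceId=resource_id,
#     ScalableDimension="sagemaker:variant:DesiredInstanceCount",
#     MinCapacity=1, MaxCapacity=4)
# aas.put_scaling_policy(
#     PolicyName="invocations-target-tracking",
#     ServiceNamespace="sagemaker", ResourceId=resource_id,
#     ScalableDimension="sagemaker:variant:DesiredInstanceCount",
#     PolicyType="TargetTrackingScaling",
#     TargetTrackingScalingPolicyConfiguration=scaling_policy)
```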
SageMaker Deployment – Async Inference
• Ideal for large payloads up to 1 GB
• Longer processing timeout, up to 15 min
• Autoscaling (down to 0 instances)

from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://{s3_bucket}/{bucket_prefix}/output",
    max_concurrent_invocations_per_instance=10,
    notification_config={
        "SuccessTopic": sns_success_topic_arn,
        "ErrorTopic": sns_error_topic_arn})

async_predictor = model.deploy(async_inference_config=async_config)

async_predictor.predict_async(input_path=input_s3_path)
SageMaker Deployment – Serverless Inference
Preview

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=10)

serverless_predictor = model.deploy(
    serverless_inference_config=serverless_config)
Cost Considerations
HOSTING INDIVIDUAL ENDPOINTS
EndpointName='endpoint-05'
SageMaker Deployment – Multi-model Endpoint
COST-SAVING OPPORTUNITY

A single SageMaker Multi-Model Endpoint hosts many models; each request names its target (TargetModel='model-007.tar.gz', 'model-013.tar.gz', ...).
• Direct invocation to target model
• Improves resource utilization

container = {
    'Image': mme-supported-image,
    'ModelDataUrl': 's3://my-bucket/folder-of-tar-gz',
    'Mode': 'MultiModel'}

sm.create_model(
    Containers=[container], ...)

smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetModel='model-007.tar.gz',
    Body=body, ...)
SageMaker Deployment – Multi-container Endpoint
COST-SAVING OPPORTUNITY

A single SageMaker multi-container endpoint hosts several containers; each request names its target (TargetContainerHostname='Container-05', ...).
• Direct or serial invocation
• No cold start vs. Multi-Model Endpoint

container1 = {
    'Image': container,
    'ContainerHostname': 'firstContainer'}; ...

sm.create_model(
    InferenceExecutionConfig={'Mode': 'Direct'},
    Containers=[container1, container2, ...], ...)

smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetContainerHostname='firstContainer',
    Body=body, ...)
SageMaker ML instance options
BALANCING BETWEEN COST AND PERFORMANCE
• C5: low throughput, low cost, most flexible
• P3 and G4: high throughput, and low-latency access to CUDA
• Inf1: high throughput, high performance, and lowest cost in the cloud
Endpoint Load testing
KNOW YOUR ENDPOINTS
[Diagram: artificial requests are sent to the endpoint; Elastic Load Balancing distributes them across ML instances in two Availability Zones]
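One way to generate those artificial requests is a small concurrent driver. This sketch measures per-request latency against a stand-in `invoke` function (a placeholder; in practice it would call `invoke_endpoint` on the SageMaker runtime client):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles

def invoke(payload):
    """Stand-in for smrt.invoke_endpoint(...); sleeps to simulate latency."""
    time.sleep(0.01)
    return "ok"

def load_test(num_requests=100, concurrency=10):
    """Fire num_requests invocations at the given concurrency and
    return per-request latencies in seconds."""
    def timed_call(_):
        start = time.perf_counter()
        invoke(b"payload")
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(num_requests)))

latencies = load_test()
cuts = quantiles(latencies, n=100)
p50, p90 = cuts[49], cuts[89]  # latency percentiles to compare per instance type
```

Repeating the run at increasing concurrency against each candidate instance type shows where latency degrades, which is the data you need to right-size the endpoint.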
How to choose your Deployment Strategy
A DECISION TREE
• Live predictions?
  • No (daily, hourly, weekly) → SageMaker Batch Transform
  • Yes → Can you tolerate cold start?
    • Yes → Payload > 6 MB or processing > 60 sec?
      • Yes → SageMaker async inference
      • No → Fluctuating traffic? Yes → SageMaker Serverless Inference
    • No → Multiple models/containers?
      • Yes → Single ML framework?
        • Yes → SageMaker multi-model endpoint
        • No (multiple containers) → SageMaker multi-container endpoint
      • No → Load testing to right-size the SageMaker endpoint; add auto-scaling
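The decision tree can be encoded directly as a helper. This is a sketch, with each question from the slide as a boolean parameter; the parameter names and returned labels are my own shorthand:

```python
def choose_deployment(live, tolerates_cold_start=False,
                      large_payload_or_long_processing=False,
                      fluctuating_traffic=False,
                      multiple_models_or_containers=False,
                      single_ml_framework=True):
    """Walk the slide's decision tree and return a deployment option.
    `large_payload_or_long_processing` means payload > 6 MB or
    processing > 60 sec."""
    if not live:
        return "Batch Transform"
    if tolerates_cold_start:
        if large_payload_or_long_processing:
            return "Async Inference"
        if fluctuating_traffic:
            return "Serverless Inference"
    if multiple_models_or_containers:
        return ("Multi-model endpoint" if single_ml_framework
                else "Multi-container endpoint")
    return "Real-time endpoint (load test to right-size, add auto-scaling)"
```

For example, `choose_deployment(live=False)` returns "Batch Transform", and a live workload that tolerates cold starts with a 15-minute processing window maps to "Async Inference".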
SageMaker Model Monitor
OPTIMIZING MODEL ACCURACY
SageMaker Clarify
Endpoint A/B testing
USING PRODUCTION VARIANTS
sm.update_endpoint_weights_and_capacities(
    EndpointName=endpoint_name,
    DesiredWeightsAndCapacities=[
        {
            "DesiredWeight": 0.1,
            "VariantName": "new-model"
        },
        {
            "DesiredWeight": 0.9,
            "VariantName": "existing-model"
        }
    ]
)
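The DesiredWeight values set the fraction of traffic each production variant serves. A quick simulation, pure Python with the variant names from the snippet, illustrates the 10/90 split:

```python
import random

def route(weights, rng):
    """Pick a variant name with probability proportional to its weight,
    mirroring how traffic is split across production variants."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]

weights = {"new-model": 0.1, "existing-model": 0.9}
rng = random.Random(0)  # seeded for reproducibility
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[route(weights, rng)] += 1
# counts["new-model"] lands close to 1,000 and
# counts["existing-model"] close to 9,000
```

Gradually raising the new variant's weight (0.1 → 0.5 → 1.0) while watching its metrics is the usual canary pattern with this API.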
Demo – Inference Recommender
Docebo® - SageMaker Journey
Docebo® - Introduction
Docebo® started with an idea to create a learning technology that would make a real impact. Since then, Docebo® has grown into a global company with 6 offices all over the world, 400+ employees, and 2000+ customers.

In October of 2019, Docebo® successfully launched their first IPO on the Toronto Stock Exchange (TSX: DCBO), followed by the launch on NASDAQ (NASDAQ: DCBO) in December of 2020.
https://www.docebo.com/
Docebo® Shape
Watch the magic happen: Shape takes your source and turns it into a short content pill with AI.
Share your content: you're ready to publish! Choose your content format and channel.
How AI behind Shape works
We had some problems…
Computer Vision
Document Layout Analysis
• We found a pre-trained Computer Vision
model for object instance segmentation:
this approach efficiently detects objects in
an image
• We apply a set of rules to identify the block type (paragraphs, titles, images, etc.), resolve colliding boxes, remove noise, and stabilize the reading order
• Once objects are detected, we pass them to
Amazon Textract to retrieve the text from
each block
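Stabilizing the reading order can be as simple as sorting detected boxes top-to-bottom, then left-to-right, snapping y-coordinates into rows so small vertical jitter from the detector does not flip the order. A hedged sketch only; the `(x, y, w, h)` box format and tolerance value are assumptions, not Docebo's actual representation:

```python
def reading_order(boxes, row_tolerance=10):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right.
    Boxes whose top edges fall within the same row_tolerance band are
    treated as one row, so detector jitter does not reorder them."""
    def key(box):
        x, y, w, h = box
        return (y // row_tolerance, x)
    return sorted(boxes, key=key)

# Two boxes on one visual row (y=103 vs y=100) plus one lower block:
boxes = [(300, 103, 200, 50), (10, 100, 200, 50), (10, 200, 400, 50)]
ordered = reading_order(boxes)
# → left box first, then its right-hand neighbor, then the lower block
```

Each ordered block's crop can then be sent on to Amazon Textract for text extraction, as described above.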
Deploy the model on SageMaker
Model Serving using
AWS CDK
• This architecture is designed to provide a real-time endpoint using the Computer Vision model.
• An Application Modernization approach was applied to develop and operate agile and stable services.
• We deployed these cloud resources using the CDK CLI.
• Each stack has a single responsibility, and the code is split into different folders so that each developer, computer scientist, or architect knows where their own code lives.
Project structure
Inference with SageMaker Endpoint
Choosing the instance type
The models were deployed on instances with different AI accelerators for inference, namely CPU (ml.c5.xlarge), GPU (ml.g4dn.xlarge), and AWS Inferentia chips (ml.inf1.xlarge).

Tests conducted by concurrent invocations of the model with one input image of A4 page size showed that AWS Inferentia instances had the highest throughput per second, the lowest latency, and the highest cost-efficiency.

[Charts: "Cost in Dollars" and "# of Transactions per hour (in millions)" compared across c5, g4dn, and inf1]
Multiple models deployment for A/B Testing
"ModelServing": {
"Name": "ModelServingStack",
"ModelList": [
{
"ModelName": "Dla-A-20210512",
"ModelS3Key": "models/a/dla",
"ModelDockerImage": »...ecr.eu-west-1.amazonaws.com/pytorch-inference:...",
...
},
{
"ModelName": "Dla-B",
"ModelS3Key": "models/b/dla",
"ModelDockerImage": »...ecr.eu-west-1.amazonaws.com/pytorch-inference:...",
...
},
],
}
Automatic Scaling
"ModelServing": {
"Name": "ModelServingStack",
"ModelList": [
{
"ModelName": "Dla-A-20210512",
...
...
},
Run the deployment (via CI/CD)
npm run model:archive
...
tar -zcvf $MODEL_FILE --directory=$MODEL_PATH/src .
mv $MODEL_FILE $MODEL_PATH/$MODEL_FILE
What we learned
• AWS CDK is the best tool for developers to create the infrastructure
and deploy it with the full control of the resources involved.
What’s next
What we covered today
Model monitoring
Learn in-demand AWS Cloud skills
• Deepen your skills with digital learning on demand – Train now
• Join the AWS Certified community and get exclusive benefits – Access new exam guides
Thank you!
Please complete the session survey in the mobile app (Android, iOS).