1 - Optimize Amazon SageMaker Deployment Strategies

The document discusses various strategies for deploying machine learning models with Amazon SageMaker. It describes SageMaker's built-in capabilities for batch inference, real-time inference, asynchronous inference, serverless inference, and multi-model endpoints. The key considerations for choosing a deployment architecture include business needs around SLAs, cost, compute resources, data collection frequency, and model updates, as well as engineering factors such as infrastructure management and data payload sizes. Deploying multiple models to a single endpoint can help reduce hosting costs compared to individual endpoints.


BRUSSELS | 31 MARCH 2022

USE301

Optimize Amazon SageMaker deployment strategies

Will Badr – Principal AI/ML Specialist, AWS
Larissa Becka – Sr. Specialist BDM AIML, AWS
Alberto Galimberti – Product Manager, Docebo

© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
By the end of 2024,
75% of enterprises will
shift from piloting to
operationalizing AI
–Gartner

Amazon SageMaker
Purpose-built tools so you can be 10x more productive

• Access ML data: Amazon SageMaker Studio notebooks connect to many data sources such as Amazon S3, Apache Spark, Amazon Redshift, CSV files, and more
• Prepare data: transform data to browse data sources, explore metadata, schemas, and write queries in popular languages
• Build ML models: optimized with 150+ popular open-source models and frameworks such as TensorFlow and PyTorch
• Train and tune: correct performance problems in real time to improve model quality
• Deploy and monitor results: create, automate, and manage end-to-end ML workflows

Why optimize model deployment

Predictions drive complexity and cost in production: roughly 10% of ML spend goes to training and 90% to prediction.
How to choose your inference architecture
CONSIDERATIONS

Business
• SLA: latency, throughput, variable workload
• Cost: budget, monitoring, control

Engineering
• Compute: CPU, GPU, AI accelerator; infrastructure management; efficient utilization
• Data: collection frequency, payload size, data drift
• Models: multiple models; size, runtime, frameworks; updates, A/B testing

Deploying Models in SageMaker

• Easy deployment of ML models
• Online and offline scoring
• Fully managed infrastructure

Real-time endpoint:

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.4xlarge')

prediction = predictor.predict(x_test)

Batch transform:

transformer = model.transformer(
    instance_count=1,
    instance_type='ml.m5.xlarge')

transformer.transform(
    test_data_s3,
    content_type='text/csv')

SageMaker Deployment – Batch Inference

SageMaker Batch Transform

• Fully managed mini-batching for large data
• Pay only for what you use
• Suitable for periodic arrival of large data

SageMaker Deployment – Real-time Inference

SageMaker Real-time Inference

real_time_endpoint = model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge', ...)

real_time_endpoint.predict(payload)

• Create a long-running microservice
• Instant response for payloads up to 6 MB
• Accessible from an external application
• Elastic Load Balancing and autoscaling

SageMaker Deployment – Async Inference

SageMaker Asynchronous Inference

from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://{s3_bucket}/{bucket_prefix}/output",
    max_concurrent_invocations_per_instance=10,
    notification_config={
        "SuccessTopic": sns_success_topic_arn,
        "ErrorTopic": sns_error_topic_arn})

async_predictor = model.deploy(async_inference_config=async_config)

async_predictor.predict_async(input_path=input_s3_path)

• Ideal for large payloads up to 1 GB
• Longer processing timeout, up to 15 min
• Autoscaling (down to 0 instances)
• Suitable for CV/NLP use cases

SageMaker Deployment – Serverless Inference (Preview)

SageMaker Serverless Inference

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=10
)

serverless_predictor = model.deploy(serverless_inference_config=serverless_config)

serverless_predictor.predict(data)

• Ideal for unpredictable prediction traffic
• Workload tolerant of cold starts
• Autoscaling (down to 0 instances)

Cost Considerations
HOSTING INDIVIDUAL ENDPOINTS

[Diagram: each model hosted on its own endpoint, e.g. EndpointName='endpoint-05']
SageMaker Deployment – Multi-model Endpoint
COST-SAVING OPPORTUNITY

SageMaker Multi-Model Endpoint

TargetModel='model-007.tar.gz' or 'model-013.tar.gz'

• Host multiple models in one container
• Direct invocation of the target model
• Improves resource utilization
• Dynamic loading of models from Amazon S3

SageMaker Deployment – Multi-model Endpoint
COST-SAVING OPPORTUNITY

container = {
    'Image': mme_supported_image,
    'ModelDataUrl': 's3://my-bucket/folder-of-tar-gz',
    'Mode': 'MultiModel'}

sm.create_model(
    Containers=[container], ...)

sm.create_endpoint_config(); sm.create_endpoint()

smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetModel='model-007.tar.gz',
    Body=body, ...)

• Host multiple models in one container
• Direct invocation of the target model
• Improves resource utilization
• Dynamic loading of models from Amazon S3

SageMaker Deployment – Multi-container Endpoint
COST-SAVING OPPORTUNITY

SageMaker multi-container endpoint

TargetContainerHostname='Container-05'

• Host up to 15 distinct containers
• Direct or serial invocation
• No cold start vs. multi-model endpoint

SageMaker Deployment – Multi-container Endpoint
COST-SAVING OPPORTUNITY

container1 = {
    'Image': container,
    'ContainerHostname': 'firstContainer'}; ...

sm.create_model(
    InferenceExecutionConfig={'Mode': 'Direct'},
    Containers=[container1, container2, ...], ...)

sm.create_endpoint_config(); sm.create_endpoint()

smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetContainerHostname='firstContainer',
    Body=body, ...)

• Host up to 15 distinct containers
• Direct or serial invocation
• No cold start vs. multi-model endpoint

SageMaker ML instance options
BALANCING BETWEEN COST AND PERFORMANCE

• CPU instances (C5): low throughput, low cost, most flexible
• GPU instances (P3, G4): high throughput, and low-latency access to CUDA
• Custom chip (Inf1): high throughput, high performance, and lowest cost in the cloud

Endpoint Load testing
KNOW YOUR ENDPOINTS

[Diagram: artificial requests hit an Amazon SageMaker endpoint through Elastic Load Balancing; ML instances run in an auto-scaling group across two Availability Zones, monitored by Amazon CloudWatch]
How to choose your Deployment Strategy
A DECISION TREE

• Live predictions? No (daily, hourly, weekly) → Batch Transform
• Live predictions? Yes:
  • Payload > 6 MB or processing > 60 sec? Yes → SageMaker async inference
  • Can tolerate cold start and fluctuating traffic? Yes → SageMaker Serverless Inference
  • Multiple models/containers? Yes:
    • Single ML framework → SageMaker multi-model endpoint
    • Multiple containers → SageMaker multi-container endpoint
  • Otherwise → real-time endpoint: load testing to right-size the endpoint, with auto-scaling
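The branches above can be written down as a small routing helper (the function and its flag names are this deck's vocabulary, not a SageMaker API):

```python
def choose_deployment(live, payload_mb=0, processing_sec=0,
                      tolerates_cold_start=False, fluctuating_traffic=False,
                      multiple_models=False, single_framework=True):
    """Walk the slide's decision tree and return the suggested option."""
    if not live:                                   # daily / hourly / weekly batches
        return "Batch Transform"
    if payload_mb > 6 or processing_sec > 60:      # large payloads, long jobs
        return "SageMaker async inference"
    if tolerates_cold_start and fluctuating_traffic:
        return "SageMaker Serverless Inference"
    if multiple_models:
        return ("SageMaker multi-model endpoint" if single_framework
                else "SageMaker multi-container endpoint")
    return "Real-time endpoint (load test to right-size, enable auto-scaling)"
```

This is just the slide's logic made explicit; real workloads usually weigh several branches at once (e.g. cost versus cold-start tolerance) before committing.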
SageMaker Model Monitor
OPTIMIZING MODEL ACCURACY

Detects model quality drift and data drift; SageMaker Clarify adds feature importance drift and data bias detection.
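Conceptually, these drift checks compare statistics of live traffic against a baseline captured at training time. A toy illustration of the idea (a simple mean-shift score, not Model Monitor's actual statistical tests):

```python
def drift_score(baseline, live):
    """Toy drift metric: shift of the live mean from the baseline mean,
    expressed in baseline standard deviations."""
    mean_b = sum(baseline) / len(baseline)
    mean_l = sum(live) / len(live)
    var_b = sum((x - mean_b) ** 2 for x in baseline) / len(baseline)
    std_b = var_b ** 0.5 or 1.0        # guard against a zero-variance baseline
    return abs(mean_l - mean_b) / std_b

baseline = [1.0, 1.2, 0.9, 1.1, 1.0]                 # feature values at training time
ok_score = drift_score(baseline, [1.0, 1.1, 0.95])   # live traffic still looks similar
drifted = drift_score(baseline, [3.0, 3.2, 2.9])     # distribution has shifted
```

Model Monitor runs far richer per-feature checks on a schedule, but the shape is the same: baseline statistics in, live statistics compared, alerts when a threshold is crossed.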

Endpoint A/B testing
USING PRODUCTION VARIANTS

sm.update_endpoint_weights_and_capacities(
    EndpointName=endpoint_name,
    DesiredWeightsAndCapacities=[
        {
            "DesiredWeight": 0.1,
            "VariantName": "new-model"
        },
        {
            "DesiredWeight": 0.9,
            "VariantName": "existing-model"
        }
    ]
)
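With those weights in place, SageMaker routes roughly 10% of requests to the new variant. The effect can be simulated with a weighted random choice (`route` is a hypothetical helper for illustration, not part of the SDK):

```python
import random

def route(variants, rng):
    """Pick a variant name with probability proportional to its DesiredWeight."""
    names = [v["VariantName"] for v in variants]
    weights = [v["DesiredWeight"] for v in variants]
    return rng.choices(names, weights=weights, k=1)[0]

variants = [{"DesiredWeight": 0.1, "VariantName": "new-model"},
            {"DesiredWeight": 0.9, "VariantName": "existing-model"}]
rng = random.Random(0)
counts = {"new-model": 0, "existing-model": 0}
for _ in range(10_000):
    counts[route(variants, rng)] += 1
# counts now show roughly a 10% / 90% split
```

Because routing is per request rather than per session, variant metrics should be compared over enough traffic for the split to stabilize before shifting more weight to the new model.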

Demo – Inference Recommender

Docebo® - SageMaker Journey

Docebo® - Introduction

Docebo® started with an idea to create a learning technology that would make a real impact. Since then Docebo® has grown into a global company with 6 offices all over the world, 400+ employees and 2000+ customers.

In October of 2019, Docebo® successfully launched their first IPO on the Toronto Stock Exchange (TSX: DCBO), followed by the launch on NASDAQ (NASDAQ: DCBO) in December of 2020.

The Docebo® Learning Suite is a full suite of enterprise learning solutions to create and manage engaging content, deliver training to customers, partners, or employees, and close the loop on your learning lifecycle by understanding how learning impacts your people and your business.

https://www.docebo.com/
Docebo® Shape

1. Plug in sources: add in your source material, like an online article, a lecture or a video, from your internal knowledge library or an external source.
2. Watch the magic happen: Shape takes your source and turns it into a short content pill with AI.
3. Add the final touches: optionally, put a bow on your content by making tweaks to the text, audio, video or other elements of your content.
4. Share your content: you're ready to publish! Choose your content format and channel.
5. Update your content: time to refresh your pill? Awesome, create a new version and follow the same process, sharing a new version within minutes.

How the AI behind Shape works

• A Step Function orchestrates the models that contribute to creating content automatically.
• Each step calls an endpoint that runs in a Kubernetes pod.
• The calls are asynchronous: when the process is complete, the AI endpoint calls a callback URL that resumes the execution of the State Machine.

We had some problems…

• The "Text extraction" service was not reliable:
  § The quality of the output was not good: no structure, flat texts, errors on some types of sources
  § Monitoring was lacking
  § Performance was quite good, but the service was not scalable
  § The in-house solution we used for deployment was not reliable

Computer Vision
Document Layout Analysis

• We found a pre-trained Computer Vision model for object instance segmentation: this approach efficiently detects objects in an image
• We apply a set of rules to identify the block type (paragraphs, titles, images, etc.), resolve colliding boxes, remove noise and stabilize a reading order
• Once objects are detected, we pass them to Amazon Textract to retrieve the text from each block

Deploy the model on SageMaker

Now we had a new working model, but we needed a way to deploy it to start running inferences:
§ We had little experience deploying a new ML model
§ We had very little time to create and validate a deployment strategy
§ We needed a solution that could be monitored and managed easily

→ That's why we opted for Amazon SageMaker

Model Serving using AWS CDK

• This architecture is designed to provide a real-time endpoint using the Computer Vision model.
• An application modernization approach was applied to develop and operate agile and stable services.
• We deployed these cloud resources using the CDK CLI.
• Each stack has a single responsibility and the code is split into different folders, so that each developer, computer scientist or architect knows where their own code is.

Project structure

Inference with SageMaker Endpoint

Choosing the instance type

The models were deployed on instances with different AI accelerators for inference, namely CPU (ml.c5.xlarge), GPU (ml.g4dn.xlarge), and AWS Inferentia chips (ml.inf1.xlarge).

Tests conducted by concurrent invocations of the model with one input image of A4 page size showed that AWS Inferentia instances had the highest throughput per second, the lowest latency, and the highest cost-efficiency.

[Charts: cost in dollars and number of transactions per hour (in millions) for c5, g4dn and inf1]
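The comparison behind those charts reduces to cost per transaction: hourly price divided by hourly throughput. A sketch of that arithmetic (the prices and throughputs below are illustrative placeholders, not the measured values behind the charts):

```python
# Illustrative hourly prices (USD) and throughputs (transactions/hour);
# placeholder numbers, not the values measured by Docebo.
instances = {
    "ml.c5.xlarge":   {"price_per_hour": 0.20, "tx_per_hour": 200_000},
    "ml.g4dn.xlarge": {"price_per_hour": 0.70, "tx_per_hour": 900_000},
    "ml.inf1.xlarge": {"price_per_hour": 0.30, "tx_per_hour": 1_200_000},
}

def cost_per_million(price_per_hour, tx_per_hour):
    """Dollars spent to serve one million transactions on one instance."""
    return price_per_hour / tx_per_hour * 1_000_000

# Rank instance types from cheapest to most expensive per transaction
ranked = sorted(instances, key=lambda name: cost_per_million(**instances[name]))
```

Note that the cheapest instance per hour is not necessarily the cheapest per transaction; throughput at acceptable latency is what decides the ranking.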
Multiple models deployment for A/B Testing
"ModelServing": {
    "Name": "ModelServingStack",
    "ModelList": [
        {
            "ModelName": "Dla-A-20210512",
            "ModelS3Key": "models/a/dla",
            "ModelDockerImage": "...ecr.eu-west-1.amazonaws.com/pytorch-inference:...",
            ...
        },
        {
            "ModelName": "Dla-B",
            "ModelS3Key": "models/b/dla",
            "ModelDockerImage": "...ecr.eu-west-1.amazonaws.com/pytorch-inference:...",
            ...
        },
    ],
}

Automatic Scaling
"ModelServing": {
    "Name": "ModelServingStack",
    "ModelList": [
        {
            "ModelName": "Dla-A-20210512",
            ...
            "AutoScalingEnable": true,  <-- an option to activate/deactivate auto-scaling
            "AutoScalingMinCapacity": 2,
            "AutoScalingMaxCapacity": 5,
            "AutoScalingTargetInvocation": 70
        }
    ],
    ...
},
Run the deployment (via CI/CD)

We had to create only two commands, responsible for deploying the model and the endpoint. These commands are launched in our CI/CD pipeline, managed with Github Actions.

npm run model:archive

...
tar -zcvf $MODEL_FILE --directory=$MODEL_PATH/src .
mv $MODEL_FILE $MODEL_PATH/$MODEL_FILE

npm run stage:deploy:{env}

... # Read the configuration from config.json
cdk deploy *-ModelArchivingStack --require-approval never -c \
    --profile $PROFILE_NAME environment="${STAGE}" --region $REGION

for row in $(echo $MODEL_LIST | jq -r '.[] | @base64'); do
    ... # for each model, upload the model file to the s3 bucket
done

cdk deploy *-ModelServingStack --require-approval never -c \
    --profile $PROFILE_NAME environment="${STAGE}" --region $REGION
cdk deploy *-ModelScalingStack --require-approval never -c \
    --profile $PROFILE_NAME environment="${STAGE}" --region $REGION

What we learned

• With Amazon SageMaker we were able to bring up a running endpoint for inferences in a matter of a few hours.
• CloudWatch gives us the ability to monitor the load on the endpoint and the status of the model in near real-time.
• AWS CDK is the best tool for developers to create the infrastructure and deploy it with full control of the resources involved.

What’s next

• Migrate the other Shape AI models to SageMaker Endpoint


• Migrate from Github Actions to CodePipeline
• Serverless SageMaker Endpoint

What we covered today

SageMaker deployment strategy

Inference load testing

Inference A/B testing

Model monitoring

Docebo’s journey with Amazon SageMaker

Learn in-demand AWS Cloud skills

AWS Skill Builder
• Access 500+ free digital courses and Learning Plans
• Explore resources with a variety of skill levels and 16+ languages to meet your learning needs
• Deepen your skills with digital learning on demand

AWS Certifications
• Earn an industry-recognized credential
• Receive Foundational, Associate, Professional, and Specialty certifications
• Join the AWS Certified community and get exclusive benefits
• Access new exam guides

Thank you!

Please complete the session survey in the mobile app
