AWS ML Notes - Domain 3 - Deployment
Amazon MWAA (Managed Workflows for Apache Airflow)
• When you're familiar with Apache Airflow and prefer DAG-based workflows
• For complex scheduling requirements
• When you need to integrate with both AWS and non-AWS services
b) Comparisons: AWS Controllers for Kubernetes (ACK) and SageMaker Components for Kubeflow Pipelines
Inference Option | Description | When to Choose
• Creates ML solutions that anonymize sensitive data, such as personally identifiable information
• Guides the configuration of least-privileged access to your data and resources
• Suggests configurations for your AWS account structures and Amazon Virtual Private Clouds to provide isolation boundaries around your workloads
• Helps construct ML solutions that are resistant to disruption and recover quickly
• Guides you to design data processing workflows that are resilient to failures by implementing error handling, retries, and fallback mechanisms
• Recommends data backups and versioning
CloudFormation
• Description: AWS-native IaC service
• Language Support: JSON, YAML
• Multi-Cloud Support: AWS only
• Typical Use Cases: AWS-only deployments; teams familiar with the AWS ecosystem; simple to moderate complexity deployments

CDK (Cloud Development Kit)
• Description: IaC framework that compiles to CloudFormation
• Language Support: TypeScript, Python, Java, C#, Go
• Multi-Cloud Support: AWS only (can be extended)
• Typical Use Cases: teams with strong programming skills; complex AWS infrastructures; reusable ML infrastructure components

Terraform
• Description: open-source IaC tool
• Language Support: HCL, JSON
• Multi-Cloud Support: excellent
• Typical Use Cases: multi-cloud ML deployments; hybrid cloud scenarios; teams preferring declarative syntax

Pulumi
• Description: modern IaC platform
• Language Support: TypeScript, Python, Go, .NET
• Multi-Cloud Support: excellent
• Typical Use Cases: when you require complex logic; teams preferring familiar programming languages; multi-cloud, complex architectures
Working with CloudFormation
a) Template
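A CloudFormation template declares resources in YAML or JSON. A minimal sketch (the resource name and bucket settings are illustrative, not from these notes):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal template declaring one S3 bucket for model artifacts
Resources:
  ModelArtifactBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled
Outputs:
  BucketName:
    Value: !Ref ModelArtifactBucket
```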
• AWS CDK Construct Library: This library contains a collection of pre-written modular and reusable
pieces of code called constructs. These constructs represent infrastructure resources and collections
of infrastructure resources.
• AWS CDK Toolkit: This is a command line tool for interacting with CDK apps. Use the AWS CDK Toolkit
to create, manage, and deploy your AWS CDK projects.
b) CDK LifeCycle
cdk init
When you begin your CDK project, you create a directory for it, run cdk init, and
specify the programming language used:
• mkdir my-cdk-app
• cd my-cdk-app
• cdk init app --language typescript
cdk bootstrap
You then run cdk bootstrap to prepare the environments into which the stacks
will be deployed. This creates the special dedicated AWS CDK resources for the
environments.
cdk synth
Next, you run cdk synth to synthesize a CloudFormation template from each stack defined in your app.
cdk deploy
Finally, you can run the cdk deploy command to have CloudFormation provision
resources defined in the synthesized templates.
Comparing CF and CDK
The AWS CDK consists of the Construct Library and the CDK Toolkit. Unlike CloudFormation, where you author JSON or YAML templates directly, the CDK lets you define infrastructure in a programming language and compiles it into CloudFormation templates.
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[input_data, processing_instance_type,
                processing_instance_count, training_instance_type,
                mse_threshold, model_approval_status],
    steps=[step_process, step_train, step_evaluate, step_conditional],
)
b) Automating common tasks with the SageMaker Python SDK
You use this configuration in a deploy() method. If the model artifact has already been created, you use the Model class to create a SageMaker model from it. You specify the model artifact's location in Amazon S3 and the inference code to use as the entry_point:
After deployment is complete, you can use the predictor’s predict() method to invoke the serverless
endpoint:
c) Building and Maintaining Containers
Training Container
Inference Container
Entry Point File | Description
serve.py
This Python script runs when the container is started for hosting. It starts the
inference server, including the nginx web server and Gunicorn as a Python web
server gateway interface.
predictor.py
This Python script contains the logic to load and perform inference with your
model. It uses Flask to provide the /ping and /invocations endpoints.
wsgi.py
This is a wrapper for the Gunicorn server.
nginx.conf
This is the configuration file for the nginx web server, including listening on port 8080.
It forwards requests containing either /ping or /invocations paths to the Gunicorn
server.
When creating or adapting a container for performing real-time inference, your container must
meet the following requirements:
• Your container must include the path /opt/ml/model. When the inference container starts,
SageMaker downloads the model artifact and stores it in this directory.
Note: This is the same directory that a training container uses to store the newly trained model
artifact.
• Your container must be configured to run as an executable. Your Dockerfile should include an
ENTRYPOINT instruction that defines an executable to run when the container starts, as
ENTRYPOINT ["<language>", "<executable>"]
e.g. ENTRYPOINT ["python", "serve.py"]
• Your container must accept POST requests to the /invocations and /ping real-time endpoints.
• Requests sent to these endpoints must return within 60 seconds and have a
payload size of less than 6 MB.
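A dependency-free sketch of that endpoint contract: the files above use nginx, Gunicorn, and Flask, but here Python's built-in http.server stands in, and the echo "model" inside /invocations is a placeholder:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: SageMaker probes /ping and expects a 200 response.
        if self.path == "/ping":
            self._respond(200, b"")
        else:
            self._respond(404, b"")

    def do_POST(self):
        # Inference: SageMaker sends POST requests to /invocations.
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Placeholder "model": echo the input back as the prediction.
            body = json.dumps({"prediction": payload}).encode()
            self._respond(200, body)
        else:
            self._respond(404, b"")

    def _respond(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging

def make_server(port=8080):
    # SageMaker expects the container to listen on port 8080.
    return HTTPServer(("127.0.0.1", port), InferenceHandler)
```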
Auto scaling strategy
a) SageMaker model auto scaling methods
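As an illustration, a target-tracking policy for a real-time endpoint variant (registered through Application Auto Scaling) can key off the built-in SageMakerVariantInvocationsPerInstance metric; the target value and cooldowns below are illustrative:

```json
{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
  },
  "ScaleOutCooldown": 60,
  "ScaleInCooldown": 300
}
```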
Feature | GitHub | GitLab
Project Management | Projects, Kanban boards | Issue boards, Epics, Roadmaps
Third-party Integrations | Extensive marketplace | Fewer, but strong built-in tools
• Unit tests: validate smaller components, such as individual functions or methods.
• Integration tests: check that pipeline stages, including data ingestion, training, and deployment, work together correctly. Other types of integration tests depend on your system or architecture.
• Regression tests: in practice, regression testing is re-running the same tests to make sure something that used to work was not broken by a change.
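A minimal sketch of the distinction, using a hypothetical normalize step and a toy two-stage "pipeline" (plain assert statements stand in for a test framework such as pytest):

```python
def normalize(values):
    """Scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for constant input
    return [(v - lo) / span for v in values]

def ingest(raw):
    """Toy ingestion stage: parse comma-separated numbers."""
    return [float(x) for x in raw.split(",")]

def pipeline(raw):
    """Toy two-stage pipeline: ingestion followed by preprocessing."""
    return normalize(ingest(raw))

# Unit test: one function exercised in isolation.
assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]

# Integration test: the stages work together end to end.
assert pipeline("0,5,10") == [0.0, 0.5, 1.0]

# Regression testing: re-run these same checks after every change.
```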
Feature | GitFlow | GitHub Flow
Feature Development | Feature branches from develop | Feature branches from main
Release Process | Dedicated release branches | Direct to main via pull requests
Suited For | Scheduled releases, larger projects | Continuous delivery, smaller projects
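The GitFlow feature-branch cycle from the table can be sketched with plain git commands (the repo and branch names are illustrative):

```shell
# Start from a repo that has a long-lived develop branch.
git init -q demo && cd demo
git config user.email "dev@example.com" && git config user.name "Dev"
git commit -q --allow-empty -m "initial commit"
git branch develop

# GitFlow: cut the feature branch from develop, not main.
git checkout -q -b feature/tune-model develop
git commit -q --allow-empty -m "work on the feature"

# Merge the finished feature back into develop.
git checkout -q develop
git merge -q --no-ff feature/tune-model -m "merge feature/tune-model"
git branch --merged develop
```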
c) GitFlow
3.3.3 AWS Software Release Processes
Continuous Delivery Services
a) AWS CI/CD Pipeline
CodeDeploy
• Description: automates application deployments to various compute platforms
• Setup:
  1. Set up IAM role
  2. Create CodeDeploy app
  3. Create deployment group
  4. Define deployment configuration
• Troubleshooting:
  • Review CodeDeploy logs
  • Verify CodeDeploy agent
  • Validate AppSpec file
  • Check instance health
  • Analyze the rollback reason
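For EC2/on-premises deployments, the AppSpec file is an appspec.yml at the root of the revision. A minimal sketch (the paths and hook script are illustrative):

```yaml
version: 0.0
os: linux
files:
  - source: /app
    destination: /var/www/my-app
hooks:
  AfterInstall:
    - location: scripts/restart_server.sh
      timeout: 120
      runas: root
```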
Feature | CodePipeline | Step Functions
Primary Purpose | CI/CD and release automation | Workflow orchestration and coordination
Workflow Type | Linear, predefined stages | Complex, branching workflows with conditional logic
Deployment strategy comparison

Blue/Green
• When you need instant rollback capability
• For critical applications requiring zero downtime
• When your application can handle sudden traffic shifts

Canary
• To test new features with a subset of users
• When you want to gather user feedback before full release
• For applications with high traffic where you want to minimize risk

Rolling
• When you have a stateful application
• For large-scale deployments where cost is an issue
• When you can tolerate having mixed versions temporarily
The baking period is a set time for monitoring the green fleet's performance before completing the full transition, making it possible to roll back if alarms trip. This period builds confidence in the new deployment before the permanent cutover.
3.3.4 Retraining models
a) Retraining mechanisms
Catastrophic forgetting is related to over-fitting: during retraining, the model learns the new training data so well that it no longer performs well on the data it learned previously.
Architectural
• Approach: modify the network architecture to accommodate new tasks
• Advantages: can be very effective; doesn't require old data
• Disadvantages: may increase model complexity; can be challenging to design
Training | Inferencing
Standalone item not integrated into the application stack | Integrated into application stack workflows
Runs in the cloud | Runs on different devices at the edge and in the cloud
Typically runs less frequently and on an as-needed basis | Runs for an indefinite amount of time
Compute capacity requirements are typically predictable, so auto scaling isn't required | Compute capacity requirements might be dynamic and unpredictable, so auto scaling is required