AWS ML Notes - Domain 3 - Deployment

Domain 3: Select a deployment infrastructure

3.1 Select a Deployment Infrastructure


3.1.1 Model building & Deployment Infra
Building a Repeatable Framework
a) Example pipeline sequences - Options
Workflow Orchestration Options
a) Comparisons

Deployment option and when to use it:

SageMaker Pipelines
• When working entirely within the AWS SageMaker ecosystem
• For end-to-end ML workflows that need to be automated and managed at scale
• For serverless orchestration of ML pipelines

AWS Step Functions
• When you need to integrate ML workflows with other AWS services
• For complex workflows with branching and parallel execution
• When you want a visual representation of the workflow

Amazon MWAA (Managed Workflows for Apache Airflow)
• When you're familiar with Apache Airflow and prefer DAG-based workflows
• For complex scheduling requirements
• When you need to integrate with both AWS and non-AWS services

MLflow
• When you need an open-source platform for the complete ML lifecycle
• For tracking experiments, packaging code into reproducible runs, and sharing and deploying models
• When working in a multi-cloud or hybrid cloud environment
• When you want a tool that integrates well with many ML frameworks and libraries

Kubernetes
• For container orchestration of ML workflows and deploying ML models at scale
• When you need fine-grained control over resource allocation and scheduling
• For multi-cloud or hybrid cloud deployments
• When you want to leverage Kubernetes' extensive ecosystem (e.g., Kubeflow for ML-specific workflows)

b) Comparisons: AWS Controllers for Kubernetes (ACK) and SageMaker Components for Kubeflow
Pipelines.

AWS Controllers for Kubernetes (ACK)
• SageMaker Operators for Kubernetes facilitate the processes for developers and data scientists who use Kubernetes to train, tune, and deploy ML models in SageMaker.
• You can install SageMaker Operators on your Kubernetes cluster in Amazon Elastic Kubernetes Service (Amazon EKS).
• You can create SageMaker jobs by using the Kubernetes API and command-line Kubernetes tools, such as kubectl.

SageMaker Components for Kubeflow Pipelines
• You can move your data processing and training jobs from the Kubernetes cluster to the SageMaker ML-optimized managed service.
• You have an alternative for launching your compute-intensive jobs from SageMaker.
• You can create and monitor your SageMaker resources as part of a Kubeflow Pipelines workflow.
• Each of the jobs in your pipelines runs on SageMaker instead of the local Kubernetes cluster, so you can take advantage of key SageMaker features.
3.1.2 Inference Infrastructure
Deployment Considerations & Deployment Infrastructure
a) Deployment Targets
SageMaker endpoints
• Benefits: Fully managed service; convenient to deploy and scale; built-in monitoring and logging; supports various ML frameworks
• Keep in mind: Not as customizable as other options; potentially higher cost than other options
• Choose when: You want a fully managed solution with minimal operational overhead and don't require advanced customization
• Use case: A bank decides to use SageMaker endpoints to deploy ML models that detect fraud.

EKS
• Benefits: Highly scalable and flexible; supports advanced deployment scenarios; supports custom configurations
• Keep in mind: Possible higher operational overhead; steeper learning curve to manage the tool effectively
• Choose when: You need advanced deployment scenarios and customized configurations, and you have the resources to manage the Kubernetes cluster
• Use case: A biomedical company uses EKS clusters to process DNA sequencing data.

ECS
• Benefits: Managed container orchestration service; convenient to scale; integrates well with other AWS services; can run in batch mode
• Keep in mind: Limited advanced features compared to Kubernetes; vendor lock-in
• Choose when: You want a managed container orchestration service with good AWS integration and you don't require advanced Kubernetes features
• Use case: A renewable energy firm uses Amazon ECS to scale solar energy forecasting workloads.

Lambda
• Benefits: Serverless; automatically scales; low operational overhead
• Keep in mind: Limited run time; cold starts can impact latency; not suitable for long-running or complex models
• Choose when: You have lightweight, low-latency models and want a serverless, pay-per-use solution
• Use case: A telehealth company uses Lambda functions for appointment reminders.
Choosing a model inference strategy
a) Amazon SageMaker inference options
SageMaker provides multiple inference options, including real-time, serverless, batch, and
asynchronous to suit different workloads.

Real-time: For low latency, high throughput requests
• When you need immediate responses (e.g., real-time fraud detection)
• For applications requiring consistent, low-latency predictions
• When your model can handle requests within milliseconds
• For high-traffic applications with steady request rates

Serverless: Handles intermittent traffic without managing infrastructure
• For unpredictable or sporadic workloads
• When you want to avoid managing and scaling infrastructure
• For cost optimization in scenarios with variable traffic
• For dev/test environments or proof-of-concept deployments

Asynchronous: Queues requests and handles large payloads
• For time-insensitive inference requests
• When dealing with large input payloads (e.g., high-resolution images)
• For long-running inference jobs (up to 15 minutes)
• When you need to decouple request submission from processing

Batch Transform: Processes large offline datasets
• For offline predictions on large datasets
• When you need to process data in bulk (e.g., nightly batch jobs)
• For scenarios where real-time predictions are not required
• When you want to precompute predictions for faster serving
Container and Instance Types for Inference
a) Choosing the right container for Inference

SageMaker managed container images: Pre-built containers with inference logic included
• When using standard ML frameworks (e.g., TensorFlow, PyTorch, Scikit-learn)
• For quick deployment without custom code
• When built-in inference logic meets your needs
• To leverage SageMaker's optimizations and best practices

Your own inference code: Custom containers with your specific inference logic
• When you need custom preprocessing or postprocessing
• For proprietary algorithms or frameworks not supported by SageMaker
• When you require specific dependencies or libraries
• For full control over the inference environment

b) Choosing the right compute resources (AWS instance)

Instance family Workload type


t family Short jobs or notebooks
m family Standard CPU to memory ratio
r family Memory-optimized
c family Compute-optimized
p family Accelerated computing, training, and inference
g family Accelerated inference, smaller training jobs
Amazon Elastic Inference Cost-effective inference accelerators

When to choose CPU, GPU, or Inf2

CPU-based instances:
• High serial performance
• Cost efficient for smaller models
• Broad support for models and frameworks

GPU-based instances (such as Amazon EC2 P5):
• High throughput at desired latency
• Cost efficient for high utilization
• Good for deep learning, large models

Inf2 instances:
• Accelerator designed for ML inference
• High throughput at lower cost than GPUs
• Ideal for models that the AWS Neuron SDK supports
Optimizing Deployment with Edge Computing
a) Using edge devices - AWS Options

AWS IoT Greengrass

Amazon SageMaker Neo

b) When to use which

AWS IoT Greengrass
• Run at the edge: Bring intelligence to edge devices, such as for anomaly detection in precision agriculture or powering autonomous devices.
• Manage applications: Deploy new or legacy apps across fleets using any language, packaging technology, or runtime.
• Control fleets: Manage and operate device fleets in the field locally or remotely using MQTT or other protocols.
• Process locally: Collect, aggregate, filter, and send data locally.

SageMaker Neo
• Optimize models for faster inference: SageMaker Neo can optimize models trained in frameworks like TensorFlow, PyTorch, and MXNet to run faster with no loss in accuracy.
• Deploy models to SageMaker and edge devices: SageMaker Neo can optimize and compile models to run on SageMaker hosted inference platforms, like SageMaker endpoints. As you've learned, it can also help you to run models on edge devices, such as phones, cameras, and IoT devices.
• Model portability: SageMaker Neo can convert compiled models between frameworks, such as TensorFlow and PyTorch. Compiled models can also run across different platforms and hardware, helping you to deploy models to diverse target environments.
• Compress model size: SageMaker Neo quantizes and prunes models to significantly reduce their size, lowering storage costs and improving load times. This works well for compressing large, complex models for production.
3.2 Create and Script Infrastructure
These Well-Architected Framework pillars provide consistent and scalable designs.

The security pillar

• create ML solutions that anonymize sensitive data, such as personally identifiable information
• guides the configuration of least-privileged access to your data and resources
• suggests configurations for your AWS account structures and Amazon Virtual Private Clouds to
provide isolation boundaries around your workloads.

The reliability pillar

• helps construct ML solutions that are resistant to disruption while recovering quickly
• guides you to design data processing workflows to be resilient to failures by implementing
error handling, retries, and fallback mechanisms
• recommends data backups, and versioning.

The performance efficiency pillar

• focuses on the efficient use of resources to meet requirements
• helps you optimize ML training and tuning jobs by selecting the most suitable EC2 instance types for a particular task, and by running model inference using edge computing to minimize latency and maximize performance

The cost optimization pillar

• focuses on building and operating systems that minimize costs


• In the data processing stage -> guides storage resource selection and tools for automation
such as Amazon SageMaker Data Wrangler.
• During model development -> rightsizing compute resources
• Finally, during model deployment -> auto scaling

The sustainability pillar

• focuses on environmental impacts (energy consumption, efficient resource usage)

The operational excellence pillar

• focuses on the efficient operation, performance visibility, and continuous improvement


3.2.1 Methods for Provisioning Resources
IaC (Infrastructure as Code)
a) Tools

CloudFormation: AWS-native IaC service
• Language support: JSON, YAML
• Multi-cloud support: AWS only
• Typical use cases: AWS-only deployments; teams familiar with the AWS ecosystem; simple to moderate complexity deployments

CDK (Cloud Development Kit): IaC framework that compiles to CloudFormation
• Language support: TypeScript, Python, Java, C#, Go
• Multi-cloud support: AWS only (can be extended)
• Typical use cases: Teams with strong programming skills; complex AWS infrastructures; reusable ML infrastructure components

Terraform: Open-source IaC tool
• Language support: HCL, JSON
• Multi-cloud support: Excellent
• Typical use cases: Multi-cloud ML deployments; hybrid cloud scenarios; teams preferring declarative syntax

Pulumi: Modern IaC platform
• Language support: TypeScript, Python, Go, .NET
• Multi-cloud support: Excellent
• Typical use cases: Deployments that require complex logic; teams preferring familiar programming languages; multi-cloud, complex architectures
Working with CloudFormation
a) Template

"AWSTemplateFormatVersion" : "2010-09-09" Format version


This first section identifies the AWS CloudFormation
template version to which the template conforms.
"Description" : "Write details on the template." Description
This text string describes the template.
"Metadata" : { Metadata
"Instances" : {"Description" : "Info on instances"}, These objects provide additional information about
"Databases" : {"Description" : " Info about dbs"} the template.
}
"Parameters" : { Parameters
"InstanceTypeParameter" : { Values passed to your template when you create or
"Type" : "String", update a stack. You can refer to parameters from the
"Default" : "t2.micro", Resources and Outputs sections of the template.
"AllowedValues" : ["t2.micro", "m1.small"],
"Description" : "Enter t2.micro or m1.small”
}
}
"Rules" : { Rules
"Rule01": { Rules validate parameter values passed to a template
"RuleCondition": { during a stack creation or stack update.
...
},
"Assertions": [
...
]}
}
"Mappings" : { Mappings
"Mapping01" : { These are map keys and associated values that you
"Key01" : { can use to specify conditional parameter values. This
"Name" : "Value01" is similar to a lookup table
}, ...
}}
"Conditions" : { Conditions
"MyLogicalID" : {Intrinsic function} Control whether certain resources are created, or
} whether certain resource properties are assigned a
value during stack creation or an update.
"Transform" : { Transform
set of transforms For serverless applications, transform specifies the
} version of the AWS SAM to use.
"Resources" : { Resources
"Logical ID of resource" : { This section specifies the stack resources, and their
"Type" : "Resource type", properties that you would like to provision. You can
"Properties" : { refer to resources in the Resources and Outputs
Set of properties sections of the template.
}} Note: This is the only required section of the template.
}
"Outputs" : { Outputs
"Logical ID of resource" : { Describe the values that are returned whenever you
"Description" : "Information on the value", view your stack's properties. For example, you can
"Value" : "Value to return", declare an output for an Amazon S3 bucket name and
"Export" : { then call the aws cloudformation describe-
"Name" : "Name of resource to export" stacks AWS CLI command to view the name.
}}}
b) CF Stacks

c) Provisioning stacks using CloudFormation templates

$ aws cloudformation create-stack \
    --stack-name myteststack \
    --template-body file:///home/testuser/mytemplate.json \
    --parameters ParameterKey=Parm1,ParameterValue=test1 \
                 ParameterKey=Parm2,ParameterValue=test2
Working with CDK
The AWS CDK consists of two primary parts:

• AWS CDK Construct Library: This library contains a collection of pre-written modular and reusable
pieces of code called constructs. These constructs represent infrastructure resources and collections
of infrastructure resources.
• AWS CDK Toolkit: This is a command line tool for interacting with CDK apps. Use the AWS CDK Toolkit
to create, manage, and deploy your AWS CDK projects.

a) CDK Construct level comparisons


Level | Description | Abstraction | Ease of Use | Typical Use Case
L1 | Direct CloudFormation resources representation | Low | Low | Full control over CloudFormation resources
L2 | Logical grouping of L1 resources | Medium | Medium | Most common
L3 | High-level abstractions that represent complete solutions | High | High | Quickly deploy common architectural patterns
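
For illustration, a minimal sketch of a CDK app that uses an L2 construct, written here in Python (the lifecycle example below initializes a TypeScript app, so the Python flavor and the stack/bucket names are assumptions for this sketch only):

# app.py - minimal CDK v2 app; one L2 construct expands into the underlying CloudFormation resources
from aws_cdk import App, Stack, RemovalPolicy
from aws_cdk import aws_s3 as s3
from constructs import Construct

class MlArtifactStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Hypothetical bucket for model artifacts; the L2 Bucket construct
        # generates the bucket plus sensible defaults in the synthesized template
        s3.Bucket(self, "ModelArtifactBucket",
                  versioned=True,
                  removal_policy=RemovalPolicy.DESTROY)

app = App()
MlArtifactStack(app, "MlArtifactStack")
app.synth()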

b) CDK LifeCycle

cdk init

When you begin your CDK project, you create a directory for it, run cdk init, and
specify the programming language used:

• mkdir my-cdk-app
• cd my-cdk-app
• cdk init app --language typescript
cdk bootstrap

You then run cdk bootstrap to prepare the environments into which the stacks
will be deployed. This creates the special dedicated AWS CDK resources for the
environments.
cdk synth

Creates the CloudFormation templates using the cdk synth command.


cdk deploy

Finally, you can run the cdk deploy command to have CloudFormation provision
resources defined in the synthesized templates.
Comparing CF and CDK

Authoring experience
• AWS CloudFormation: Uses only JSON or YAML templates to define your infrastructure resources.
• AWS CDK: Uses modern programming languages, like Python, TypeScript, Java, C#, and Go.

IaC approach
• AWS CloudFormation: Templates are declarative. You define the desired state of your infrastructure and CloudFormation handles the provisioning and updates.
• AWS CDK: Provides an imperative approach to generating CloudFormation templates (which remain declarative), meaning you can introduce logic and conditions that determine the resources to provision in your infrastructure.

Debugging and troubleshooting
• AWS CloudFormation: Troubleshooting templates requires learning specific CloudFormation error handling and messages.
• AWS CDK: You can use the debugging capabilities of your chosen programming language, making it more convenient to identify and fix issues in your infrastructure code.

Reusability and modularity
• AWS CloudFormation: You can create nested stacks and cross-stack references, resulting in modular and reusable infrastructure designs. However, this approach can become complex and difficult to manage as your infrastructure grows.
• AWS CDK: Supports programming languages that you can use to apply object-oriented programming principles. This makes it more convenient to create modular and reusable IaC code blocks for your infrastructure.

Community support
• AWS CloudFormation: Has been around for a longer time and has a larger community for support. It also has a variety of third-party tools and resources.
• AWS CDK: A newer offering than AWS CloudFormation, but it is rapidly gaining adoption.

Learning curve
• AWS CloudFormation: Steeper learning curve for developers who are used to a programmatic approach rather than a template-driven approach.
• AWS CDK: If you're already familiar with programming languages like Python or TypeScript, AWS CDK will have a gentler learning curve.
3.2.2 Deploying and Hosting Models

SageMaker Python SDK


a) Creating pipelines with the SageMaker Python SDK to orchestrate workflows

pipeline = Pipeline(
name=pipeline_name,
parameters=[input_data, processing_instance_type,
processing_instance_count, training_instance_type,
mse_threshold, model_approval_status],
steps = [step_process, step_train, step_evaluate, step_conditional]
)
b) Automating common tasks with the SageMaker Python SDK

Preparing data (the .run() method)


With Amazon SageMaker Processing, you can run processing jobs for data processing steps in your
machine learning pipeline. Processing jobs accept data from Amazon S3 as input and store data into
Amazon S3 as output.

I. Creating the Processor


To define a processing job, you first create a Processor. The following example instantiates
the SKLearnProcessor() class, which streamlines using scikit-learn in your data processing
step:
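
The example code itself was not captured in these notes; a minimal sketch of what it might look like, with the role, framework version, and instance settings as placeholder assumptions:

from sagemaker.sklearn.processing import SKLearnProcessor

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",      # assumed scikit-learn container version
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="preprocess-data",
)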

II. Running the Processor


You then use the .run() method on the processor to run a processing job.
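
A hedged sketch of the .run() call (script name and S3 paths are placeholders):

from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(
    code="preprocessing.py",   # your preprocessing script
    inputs=[ProcessingInput(source="s3://my-bucket/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/train",
                              destination="s3://my-bucket/train/")],
)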

III. Adding the data preprocessing step to a SageMaker Pipeline


Finally, you define a data preprocessing step in your pipeline using ProcessingStep():
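
A sketch of the pipeline step, reusing the processor above (the source could also be the input_data pipeline parameter from the pipeline definition earlier; names and paths are placeholders):

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

step_process = ProcessingStep(
    name="PreprocessData",
    processor=sklearn_processor,
    inputs=[ProcessingInput(source="s3://my-bucket/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name="train",
                              source="/opt/ml/processing/train")],
    code="preprocessing.py",
)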
Training models (the .fit() method)
You can run model training jobs using the SageMaker Python SDK. The following model training job
example manages the training script, framework, training instance, and training data input.

I. Creating the estimator for the training job


To define a model training job, you instantiate the estimator class. This class encapsulates
training on SageMaker. The following code creates a model training job using the MXNet() class to
train a model using the MXNet framework:
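
The original snippet was not captured; a hedged sketch of an MXNet estimator, where the entry point script, framework/Python versions, and instance settings are assumptions:

from sagemaker.mxnet import MXNet

mxnet_estimator = MXNet(
    entry_point="train.py",          # your training script
    role=role,
    framework_version="1.9.0",       # assumed MXNet container version
    py_version="py38",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    hyperparameters={"epochs": 10, "learning-rate": 0.01},
)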

II. Running the training job


After you create the estimator, you can then use the .fit() method to run the training job. This
method takes an argument that identifies the path to the training data. In this example, the training
dataset is stored in Amazon S3:
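
A sketch of the .fit() call (S3 path is a placeholder):

# Start the training job; the "train" channel points at the training data in Amazon S3
mxnet_estimator.fit({"train": "s3://my-bucket/train/"})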

III. Creating a model training step in SageMaker Pipelines


Finally, you define a model training step in your pipeline using the TrainingStep() method:
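
A sketch of the training step, wiring the estimator above to the output of the processing step (property path shown is the usual way to reference a processing output; treat the channel name as an assumption):

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

step_train = TrainingStep(
    name="TrainModel",
    estimator=mxnet_estimator,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri
        )
    },
)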

Deploying Models (deploy())


You can use SageMaker Python SDK to deploy a SageMaker model endpoint using
the deploy() and predict() methods. You start by defining your endpoint configuration. The following code
shows the configuration for a serverless endpoint:
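
The configuration code was not captured in these notes; a minimal sketch, with the memory size and concurrency values as placeholder assumptions:

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,   # placeholder memory allocation
    max_concurrency=5,        # placeholder concurrency limit
)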

You use this configuration in a deploy() method. If the model artifact is already created, you use the Model class to create a SageMaker model from it, specifying the model artifact location in Amazon S3 and the inference code as the entry_point:
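
A hedged sketch of creating the model and deploying it with the serverless configuration above (artifact path, container image, and entry point are placeholders):

from sagemaker.model import Model
from sagemaker.predictor import Predictor

model = Model(
    model_data="s3://my-bucket/model/model.tar.gz",   # placeholder artifact location
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest",  # placeholder image
    entry_point="inference.py",                        # placeholder inference code
    role=role,
    predictor_cls=Predictor,   # so deploy() returns a Predictor object
)

predictor = model.deploy(serverless_inference_config=serverless_config)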

After deployment is complete, you can use the predictor’s predict() method to invoke the serverless
endpoint:
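
A sketch of invoking the endpoint; the payload shape depends on what your inference script expects, so the example record below is a placeholder:

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

result = predictor.predict({"instances": [[0.5, 1.2, 3.4]]})  # placeholder payload
print(result)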
c) Building and Maintaining Containers

Training Container
Inference container

The example below is for a Python-based inference container.

serve.py
This Python script runs when the container is started for hosting. It starts the inference server, including the nginx web server and Gunicorn as a Python web server gateway interface.

predictor.py
This Python script contains the logic to load and perform inference with your model. It uses Flask to provide the /ping and /invocations endpoints.

wsgi.py
This is a wrapper for the Gunicorn server.

nginx.conf
This is a script to configure a web server, including listening on port 8080. It forwards requests containing either /ping or /invocations paths to the Gunicorn server.

When creating or adapting a container for performing real-time inference, your container must
meet the following requirements:

• Your container must include the path /opt/ml/model. When the inference container starts, it
will import the model artifact and store it in this directory.

Note: This is the same directory that a training container uses to store the newly trained model
artifact.

• Your container must be configured to run as an executable. Your Dockerfile should include an
ENTRYPOINT instruction that defines an executable to run when the container starts, as
ENTRYPOINT ["<language>", "<executable>"]
e.g. ENTRYPOINT ["python", "serve.py"]

• Your container must have a web server listening on port 8080.

• Your container must accept POST requests to the /invocations and /ping real-time endpoints.

• Requests sent to these endpoints must return within 60 seconds and have a maximum payload size of 6 MB.
Auto scaling strategy
a) SageMaker model auto scaling methods

Target tracking scaling policy: Adjusts capacity to maintain a specified metric near a target value
• Use case: When you want to maintain a specific metric (e.g., CPU utilization) at a target level
• Specify a metric and target value
• Automatically adds or removes capacity
• Good for maintaining consistent performance

Step scaling policy: Defines multiple policies for scaling based on specific metric thresholds
• Use case: When you need more granular control over scaling actions at different metric levels
• Define multiple thresholds and corresponding scaling actions
• More aggressive response to demand changes
• Allows fine-tuning of scaling behavior

Scheduled scaling policy: Scales resources based on a predetermined schedule
• Use case: When demand follows a predictable pattern (daily, weekly, monthly, yearly)
• Set one-time or recurring schedules
• Use cron expressions with start and end times
• Ideal for known traffic patterns

On-demand scaling: Manually increase or decrease the number of instances
• Use case: For unpredictable or one-off events that require manual intervention
• Full manual control over scaling
• Useful for new product launches, unexpected traffic spikes, or special promotions
• Flexibility to respond to unforeseen circumstances
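
For example, a target tracking policy for a SageMaker endpoint variant is registered through Application Auto Scaling. A minimal boto3 sketch, with the endpoint name, variant name, capacity bounds, and target value as placeholder assumptions:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder endpoint and variant names
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="InvocationsTargetTracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Keep roughly this many invocations per instance per minute
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)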
3.3 Automate Deployment
3.3.1 Introduction to DevOps
Code repositories
a) GitHub vs GitLab

Hosting options
• GitHub: Cloud-hosted, GitHub Enterprise (self-hosted)
• GitLab: Cloud-hosted, self-hosted (Community and Enterprise Editions)

CI/CD
• GitHub: GitHub Actions (built-in)
• GitLab: GitLab CI/CD (built-in)

Project management
• GitHub: Projects, Kanban boards
• GitLab: Issue boards, Epics, Roadmaps

Third-party integrations
• GitHub: Extensive marketplace
• GitLab: Fewer, but strong built-in tools

Open source
• GitHub: Many open-source projects
• GitLab: Fully open-source core


3.3.2 CI/CD: Applying DevOps to MLOps
Introduction to MLOps
a) CICD in ML Lifecycle

b) Teams in the ML process


• Data engineers: Data engineers are responsible for data sourcing, data cleaning, and data processing.
They transform data into a consumable format for ML and data scientist analysis.
• Data scientists: Responsible for model development including model training, evaluation, and
monitoring to drive insights from data.
• ML engineers: Responsible for MLOps - model deployment, production integration, and monitoring.
They standardize ML systems development (Dev) and ML systems deployment (Ops) for continuous
delivery of high-performing production ML models.
c) Nonfunctional requirements in ML
• Consistency
• Flexibility: Accommodates a wide range of ML frameworks and technologies to adapt to changing
requirements.
• Reproducibility
• Reusability:
• Scalability:
• Auditability: Provides comprehensive logs, versioning, and dependency tracking of all ML artifacts for
transparency and compliance.
• Explainability: Incorporates techniques that promote decision transparency and model interpretability.

d) Comparing the ML workflow with DevOps


Automating testing in CI/CD Pipelines
SageMaker projects with CI/CD practices

• Unit tests
validate smaller components like individual functions or methods.
• Integration tests
can check that pipeline stages, including data ingestion, training, and deployment, work together
correctly. Other types of integration tests depend on your system or architecture.
• Regression tests
In practice, regression testing is re-running the same tests to make sure something that used to work
was not broken by a change.

Version Control Systems: Getting started with Git


SageMaker projects with CI/CD practices
Continuous Flow Structures: Automate deployment
a) Key components

1) Model training and versioning:


2) Model packaging and containerization:
3) Continuous integration (CI):
4) Monitoring and observability:
5) Rollback and rollforward strategies:

b) Gitflow and GitHub flows

Feature Gitflow GitHub Flow

Complexity More complex Simpler

Main Branches main and develop Single main branch

Feature Development Feature branches from develop Feature branches from main

Release Process Dedicated release branches Direct to main via pull requests

Hotfixes Separate hotfix branches Treated like features

Suited For Scheduled releases, larger projects Continuous delivery, smaller projects

Integration Branch develop branch N/A (uses main)

Learning Curve Steeper Flatter

Flexibility More rigid structure More flexible

c) GitFlow
3.3.3 AWS Software Release Processes
Continuous Delivery Services
a) AWS CI/CD Pipeline

CodePipeline
Provides configurable manual approval gates to control releases, detailed monitoring capabilities, and granular permissions to manage pipeline access.
Service quotas:
• Pipelines per AWS account: 1,000
• Actions in a single pipeline: 500
• Size of the input artifact for a single action: 1 GB
• Custom actions per Region per account: 50
• Webhooks per Region per account: 300

CodeBuild
CodeBuild sets service quotas and limits on builds and compute fleets. These quotas apply per supported AWS Region for each AWS account.
• Detailed logging, auto-scaling capacity, and high availability for builds
• Integrates with other AWS services like CodePipeline and ECR for end-to-end CI/CD workflows
• Artifacts can be stored in S3 or other destinations
• Builds can be monitored through the CodeBuild console, Amazon CloudWatch, and other methods
• Fine-grained access controls for build projects using IAM policies

CodeDeploy
CodeDeploy is a deployment service that provides automated deployments, flexible deployment strategies, rollback capabilities, and integration with other AWS services to help manage the application lifecycle across environments.
• Facilitates automated deployments to multiple environments
• Supports deployment strategies like blue/green, in-place, and canary deployments
• Provides rollback capabilities
• Detailed monitoring and logging, and integration with services like EC2, Lambda, and ECS
Best Practices for Configuring & Troubleshooting
CodeBuild: Compiles source code, runs unit tests, and produces deployment-ready artifacts
Key configuration steps:
1. Create a CodeBuild project
2. Define the build specification
3. Configure the build environment
4. Set up build artifacts
5. Configure CloudWatch Logs (optional)
Troubleshooting tips: Validate the buildspec file; verify IAM permissions; review service limits; use CloudTrail
Unique to CodeBuild: Check for network issues; check CodeBuild logs

CodeDeploy: Automates application deployments to various compute platforms
Key configuration steps:
1. Set up the IAM role
2. Create the CodeDeploy application
3. Create a deployment group
4. Define the deployment configuration
Troubleshooting tips (unique to CodeDeploy): Review CodeDeploy logs; verify the CodeDeploy agent; validate the AppSpec file; check instance health; analyze the rollback reason

CodePipeline: Models, visualizes, and automates software release steps
Key configuration steps:
1. Create the CodePipeline pipeline
2. Add the source stage
3. Add the build stage
4. Add the deploy stage
5. Review and create the pipeline
Troubleshooting tips: Validate build specifications (buildspec.yml); verify input configuration; use CloudTrail; check IAM permissions
Unique to CodePipeline: Use CloudWatch; examine runtime details; check pipeline history

Automating Data Integration in ML Pipeline


Code Pipeline vs Step Functions

Primary purpose
• AWS CodePipeline: CI/CD and release automation
• AWS Step Functions: Workflow orchestration and coordination

Workflow type
• AWS CodePipeline: Linear, predefined stages
• AWS Step Functions: Complex, branching workflows with conditional logic

Best for
• AWS CodePipeline: Standard software deployment pipelines
• AWS Step Functions: Complex, multi-step processes and microservices orchestration
MLOps with Code Pipeline and Step Functions
• MLOps Overview:
o Set of practices and tools for streamlining ML model deployment, monitoring, and
management
o Focuses on automating ML workflows in production environments
• AWS Step Functions:
o Fully managed visual workflow service for building distributed applications
o Represents pipeline stages (preprocessing, training, evaluation, deployment) as task
states
o Manages control flow between stages
o Can integrate with Lambda, AWS Batch, and other AWS services
• AWS CodePipeline:
o Fully managed continuous delivery service
o Automates release pipelines for MLOps workflows
o Represents each stage of the pipeline as an action
• Integration of Step Functions and CodePipeline:
a) CodePipeline invokes MLOps pipeline based on events (e.g., new model version
commit)
b) Pipeline stages include source code management, model building, testing, and
deployment
c) CodePipeline starts Step Functions state machine to initiate MLOps workflow
d) Can pass input data or parameters to configure the workflow
• Benefits of Integration:
o Enables efficient movement of models through development lifecycle
o Automates the entire process from training to production deployment
o Provides flexibility in configuring and managing complex ML workflows
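
A hedged boto3 sketch of the hand-off described above, where a pipeline action starts the Step Functions state machine and passes workflow parameters; the state machine ARN and input fields are placeholders:

import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder state machine ARN and workflow parameters
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:mlops-training-workflow",
    input=json.dumps({
        "training_data_s3": "s3://my-bucket/train/",
        "model_package_group": "my-model-group",
    }),
)
print(response["executionArn"])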
Deployment strategies

Comparison deployment

Blue/green deployment
• Maintains two identical production environments, one blue (existing) and one green (new), and gradually shifts traffic between them.
• When to use:
o When you need instant rollback capability
o For critical applications requiring zero downtime
o When your application can handle sudden traffic shifts

Canary deployment
• Gradually rolls out a new model version to a small portion of users.
• When to use:
o To test new features with a subset of users
o When you want to gather user feedback before full release
o For applications with high traffic where you want to minimize risk

Rolling deployment
• Gradually replaces the previous model version with the new version by updating the endpoint in configurable batch sizes.
• When to use:
o When you have a stateful application
o For large-scale deployments where cost is an issue
o When you can tolerate having mixed versions temporarily

The baking period is a set time for monitoring the green fleet's performance before completing the full transition, making it possible to roll back if alarms trip. This period builds confidence in the new deployment before the permanent cutover.
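
On SageMaker endpoints, a blue/green update with a canary traffic shift, a baking period, and alarm-based rollback can be expressed in the endpoint update's deployment configuration. A sketch, assuming placeholder endpoint, config, and alarm names:

import boto3

sm = boto3.client("sagemaker")

sm.update_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config-v2",   # the "green" configuration
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                "WaitIntervalInSeconds": 600,      # baking period between traffic shifts
            },
            "TerminationWaitInSeconds": 300,       # keep the blue fleet around for rollback
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "my-endpoint-high-error-rate"}]
        },
    },
)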
3.3.4 Retraining models
Retraining models
a) Retraining mechanisms

Automated retraining pipelines: Invoke model retraining when new data becomes available.

Scheduled retraining: Periodic jobs retrain the model at regular intervals.

Drift detection and invoked retraining: Invoke retraining when the model's performance starts to degrade. (SageMaker Model Monitor can detect model drift, and Lambda can be used to initiate the retraining process.)

Incremental learning: Allows the model to be updated with new data without completely retraining the model from scratch. (SageMaker supports several algorithms for this, such as XGBoost and Linear Learner.)

Experimentation and A/B testing: Retraining can be paired with experimentation and A/B testing to compare various model versions. (SageMaker and Amazon Personalize can be used to deploy and manage these experiments.)
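
As an illustration of the drift-detection pattern above, a hedged sketch of a Lambda handler that starts a SageMaker training job when a Model Monitor alarm fires; the job name, container image, role, and S3 paths are all placeholders:

import time
import boto3

sm = boto3.client("sagemaker")

def handler(event, context):
    # Triggered (for example) by a CloudWatch alarm on a Model Monitor drift metric
    job_name = f"retrain-fraud-model-{int(time.time())}"
    sm.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
            "TrainingInputMode": "File",
        },
        RoleArn="arn:aws:iam::123456789012:role/SageMakerTrainingRole",
        InputDataConfig=[{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/latest-training-data/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-artifacts/"},
        ResourceConfig={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 30},
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )
    return {"training_job": job_name}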

b) Catastrophic forgetting during retraining and transfer learning


Catastrophic forgetting is a phenomenon that occurs in machine learning models, particularly in the context of
continual or lifelong learning.

Catastrophic forgetting is a type of over-fitting. The model learns the training data too well such that it no
longer performs well on other data.

Why does catastrophic forgetting happen?


a. Retraining and optimized training: The primary reason for catastrophic forgetting is that the parameters of
the model are typically updated to optimize for the current task. These updates effectively overwrite the
knowledge acquired from previous tasks.
b. Transfer Learning: Transfer learning is an ML approach where a pre-trained model, which was trained on
one task, is fine-tuned for a related task. Organizations can make use of transfer learning to retrain existing
models on new, related tasks using a smaller dataset.
Solving catastrophic forgetting

Rehearsal-based
• Main idea: Retrain on a subset of old data along with new data
• Advantages: Directly addresses forgetting; conceptually simple
• Limitations: Requires storing old data; can be computationally expensive

Architectural
• Main idea: Modify the network architecture to accommodate new tasks
• Advantages: Can be very effective; doesn't require old data
• Limitations: May increase model complexity; can be challenging to design

Replay-based
• Main idea: Generate synthetic data to represent old tasks
• Advantages: Doesn't need storing old data; works well with generative AI models
• Limitations: Quality depends on the model; may not capture all aspects of the old data

Regularization-based
• Main idea: Add constraints to limit changes to important parameters
• Advantages: Doesn't require old data or architecture changes; often computationally efficient
• Limitations: May limit learning of new tasks; determining importance can be challenging

Configuring Inferencing Jobs


a) Inference types

Differences between training and inferencing

Training Inferencing

Training | Inferencing
Requires high parallelism with large batch processing for higher throughput | Usually runs on a single input in real time
More compute- and memory-intensive | Less compute- and memory-intensive
Standalone item not integrated into the application stack | Integrated into application stack workflows
Runs in the cloud | Runs on different devices at the edge and in the cloud
Typically runs less frequently and on an as-needed basis | Runs for an indefinite amount of time
Compute capacity requirements are typically predictable, so auto scaling usually isn't required | Compute capacity requirements might be dynamic and unpredictable, so auto scaling is required
