
In this article, we will learn what MLOps, or Machine Learning Operations, is. It is for people who want to understand how an ML model is deployed to production: the stages, the process, and the tears it involves.

Let us begin!

What is MLOps?

Machine Learning Operations involves a set of processes, or rather a sequence of steps, implemented to deploy an ML model to the production environment. There are several steps to be undertaken before an ML model is production-ready. These processes ensure that your model can be scaled for a large user base and performs accurately.

Why do we need MLOps?

Creating an ML model that can predict what you want it to predict from the data you have fed it is easy. However, creating an ML model that is reliable, fast, accurate, and usable by a large number of users is difficult.

The necessity of MLOps can be summarized as follows:

 ML models rely on a huge amount of data, which is difficult for a single person to keep track of.
 It is difficult to keep track of the parameters we tweak in ML models. Small changes can lead to enormous differences in the results.
 We have to keep track of the features the model works with; feature engineering is a separate task that contributes largely to model accuracy.
 Monitoring an ML model isn’t like monitoring deployed software or a web app.
 Debugging an ML model is an extremely complicated art.
 Models rely on real-world data for prediction; as real-world data changes, so should the model. This means we have to keep track of new data changes and make sure the model learns accordingly.

Remember the old excuse Software Engineers used (“it works on my machine”). We want to avoid it.

Software Engineers vs ML Engineers


DevOps vs MLOps

You must have heard of good old DevOps, the process used to build and deploy software applications. You might wonder how MLOps is different.

DevOps Cycle (Image by Author)

The DevOps stages are targeted at developing a software application. You plan the features of the application you want to release, write the code, build it, test it, create a release plan, and deploy it. You then monitor the infrastructure where the app is deployed, and this cycle continues until the app is fully built.
In MLOps, things are different. We implement the following stages:

ML Project Lifecycle

Scoping — We define the project and check whether the problem requires Machine Learning to solve. We perform requirement engineering and check if the relevant data is available. We verify that the data is unbiased and reflects the real-world use case.

Data Engineering — This stage involves collecting data, establishing baselines, cleaning the data, formatting it, labelling it, and organizing it.

Modelling — Now we come to the coding part; here we create the ML model. We train the model with the processed data, perform error analysis, define error measurements, and track the model’s performance.
Deployment — Here we package the model and deploy it in the cloud or on edge devices as necessary. Packaging could mean the model wrapped with an API server exposing REST or gRPC endpoints, a Docker container deployed on cloud infrastructure, a deployment on a serverless cloud platform, or a mobile app for edge-based models.
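As a small illustration of the first option, here is a minimal sketch of wrapping a saved model with a REST endpoint. Nothing here is prescribed by any particular platform; the framework choice (FastAPI), the model filename, and the input format are all assumptions.

```python
# Minimal sketch: a trained Keras model behind a REST endpoint.
# FastAPI, the model path, and the input shape are all assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
import tensorflow as tf

app = FastAPI()
model = tf.keras.models.load_model("model.h5")  # hypothetical path

class PredictRequest(BaseModel):
    features: list[float]  # flattened, preprocessed model input

@app.post("/predict")
def predict(req: PredictRequest):
    batch = tf.constant([req.features])  # shape assumption: 1 x n_features
    probabilities = model.predict(batch)
    return {"prediction": probabilities.tolist()}
```

You would run this behind an ASGI server such as uvicorn and POST JSON to /predict.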

Monitoring — Once the deployment is done, we rely on a monitoring infrastructure to help us maintain and update the model. This stage has the following components:

1. Monitor the infrastructure where we deploy — for load, usage, storage, and health. This tells us about the environment where the ML model is deployed.

2. Monitor the model for its performance, accuracy, loss, bias, and data drift. This tells us whether the model is performing as expected and remains valid for real-world scenarios.

There will sometimes be a feedback loop, as some models might require learning from the user inputs and the predictions they make. This lifecycle is valid for most ML use cases.

Equipped with the knowledge of the basic lifecycle of an ML project, let’s take a look at what the infrastructure scene is like on the ML side.
ML Production Infrastructure

ML Infrastructure

Now we learn what infrastructure setup we would need for a model to be deployed in production. As you can see in the picture above, ML code is only a small part of it. Let us understand the components one by one.

Data Collection — This step involves collecting data from various sources. ML models require a lot of data to learn, so data collection involves consolidating all kinds of raw data related to the problem. For example, image classification might require you to collect all available images or scrape the web for them, while voice recognition may require you to collect tons of audio samples.
Data Verification — In this step we check the validity of the data: is the collected data up to date, reliable, and reflective of the real world? Is it in a proper, consumable format? Is the data structured properly?
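As a sketch of what those checks might look like in code (the column names and thresholds below are hypothetical, and real pipelines often use dedicated validation tools for this):

```python
import pandas as pd

def verify(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found (empty means OK)."""
    problems = []
    # Schema check: the expected columns are hypothetical examples.
    expected = {"image_path", "label", "captured_at"}
    missing = expected - set(df.columns)
    if missing:
        problems.append(f"missing columns: {missing}")
    # Completeness check: flag columns with too many nulls.
    for col, ratio in df.isna().mean().items():
        if ratio > 0.05:  # 5% threshold is an arbitrary assumption
            problems.append(f"{col} is {ratio:.0%} null")
    # Freshness check: is the newest record recent enough?
    if "captured_at" in df.columns:
        age = pd.Timestamp.now() - pd.to_datetime(df["captured_at"]).max()
        if age > pd.Timedelta(days=30):
            problems.append("data is more than 30 days old")
    return problems
```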

Feature Extraction — Here, we select the best features for the model to predict with. In other words, your model may not need all the data in its entirety to discover patterns; some columns or parts of the data might not be used at all, and some models perform better when a few columns are dropped. We usually rank the features by importance: features with high importance are included, while those with low or near-zero importance are dropped.
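A minimal sketch of that ranking step, using scikit-learn’s impurity-based feature importances on a synthetic stand-in dataset (the 0.05 cut-off is an arbitrary assumption):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a real tabular dataset.
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])

# Rank features by importance and drop the near-zero ones.
forest = RandomForestClassifier(random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
selected = importances[importances > 0.05].index.tolist()  # arbitrary cut-off
```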

Configuration — This step involves setting up the protocols for communication, the system integrations, and how the various components in the pipeline are supposed to talk to each other. You want your data pipeline connected to the database, your ML model connecting to the database with proper access, your model exposing prediction endpoints in a certain way, and your model inputs formatted in a certain way. All the configurations required for the system need to be properly finalized and documented.
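For illustration, here is a tiny sketch of such configuration gathered in one documented place; every value is a hypothetical example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    # All values are hypothetical examples; in practice they would be
    # loaded from environment variables or a config file, not hard-coded.
    database_url: str = "postgresql://ml_user@db.internal/features"
    predict_endpoint: str = "/predict"
    input_image_size: tuple = (128, 128)  # expected model input format
    batch_size: int = 32

config = PipelineConfig()
print(config.database_url)
```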

ML Code — Now we come to the actual coding part. In this stage, we develop a base model that can learn from the data and predict. There are tons of ML libraries out there with support for multiple languages, e.g. TensorFlow, PyTorch, scikit-learn, Keras, fastai, and many more. Once we have a model, we start improving its performance by tweaking the hyperparameters and testing different learning approaches until we are satisfied that the model performs better than its previous version.
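As a sketch of that tuning loop, here is a hyperparameter search with scikit-learn; the parameter grid and the scoring choice are arbitrary examples:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)

# Try a few hyperparameter combinations and keep the best model
# by cross-validated accuracy; the grid is an arbitrary example.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8, None]},
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```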

Machine Resource Management — This step involves planning the resources for the ML model. Usually, ML models require heavy resources in terms of CPU, memory, and storage, and deep learning models depend on GPUs and TPUs for computation. Training ML models costs both time and money: slower CPUs take more time, powerful CPUs are pricier, and the larger the model, the more storage you will have to invest in.

Analysis Tool — Once your model is ready, how do you know if it is performing up to the mark? We decide on model analysis in this stage. How do we compute the loss? What error measurement should we use? How do we check if the model is drifting? Are the prediction results proper? Has the model overfitted or underfitted? Usually, the libraries with which we implement the model ship with analysis kits and error measurements.
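For example, scikit-learn ships with ready-made error measurements; here is a quick sketch of checking accuracy and spotting overfitting by comparing train and test scores (the 0.1 gap threshold is an arbitrary assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(confusion_matrix(y_test, model.predict(X_test)))

# A large train/test gap is one rough signal of overfitting.
if train_acc - test_acc > 0.1:  # arbitrary threshold
    print("possible overfitting")
```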

Project Management Tool — Tracking an ML project is very important. It’s easy to get lost and mess up while dealing with huge data, features, ML code, and resource management. Luckily, there are a lot of project management tools out on the Internet to help us out.

Serving Infrastructure — Once the model is developed, tested, and ready to go, we need to deploy it somewhere the users can access it. The majority of models are deployed on the cloud. Public cloud providers like AWS, GCP, and Azure even have specific ML-related features for easy deployment of models. Depending on your budget, you can select the provider suited to your needs.

If we are dealing with an edge-based model, we need to decide how the ML model will be used. It could be a mobile application for use cases like image recognition or voice recognition. We could also have a custom chip and processor for certain use cases, such as autonomous driving in the case of Tesla. Here we have to take into account how much computing capability is available and how large our model is.

Monitoring — We need to implement a monitoring system to observe our deployed model and the system on which it runs. Collecting model logs, user access logs, and prediction logs will help in maintaining the model. There are several monitoring solutions available, such as Graylog, the Elastic Stack, and Fluentd, and cloud providers usually ship their own monitoring systems.

A Walk-through

Now that we have understood how the ML project lifecycle works and what the infrastructure scene looks like in ML production, we will learn how an ML model is deployed to production.

To understand this, we will look at Jen and her quest for an ML Engine.

Jen has a huge pumpkin patch; every year she sells her pumpkins to the townsfolk and the local pumpkin spice latte factory. Since she has huge demand every year, it became tedious for her to look at every pumpkin and check whether it is good or bad.

So she approaches you to help her develop an ML Engine that will help her predict whether a given pumpkin is good or bad.

Jen needs help classifying her pumpkins. (Image by Author)

Right off the bat, this is a simple classification problem.

Let’s discuss our approach to solving this problem.

1. First, we collect all the info about the pumpkins in Jen’s patch: all the photos of good pumpkins, OK pumpkins, and bad pumpkins. We then ask the good townsfolk to send in pictures of the pumpkins they bought from Jen. (Some of them send pictures of watermelons! Shame on them.) This step is EDA + Compiling the dataset.

2. Now that we have a good collection of pictures, it’s time to label them. With Jen’s help, we label several hundred pictures. We check the resolution of the pictures, set a standard resolution, discard the low-quality images, and adjust the image contrast and brightness for better readability. This step is Data Preparation.

3. Now we train a TensorFlow model to classify the images. Say we use a sequential neural net with ReLU activation: we define 1 input layer, 2 hidden layers, and 1 output layer, just a basic convolutional neural net. We split our image dataset into training and testing sets and provide the training data as input to the ConvNet. The model is trained. This step is Model Training (see the sketch after this list).

4. Once the model is trained, we evaluate it using the testing dataset. Based on the predictions, we compare the results and check the accuracy of the model. This step is Model Evaluation (also covered in the sketch below).

5. We tweak the hyperparameters of the model to increase the accuracy, retrain the model, and evaluate it again. We iterate until we are satisfied that the model is good enough for Jen. This step is Model Analysis.

6. Now that we have a working model, we deploy it so that Jen can use this ML Engine for her daily work. We create a server in the cloud with prediction APIs and build an app or a website where she can upload the images and get results in real time. This is Model Deployment.
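Here is a minimal sketch of steps 3 and 4, assuming the pictures have been resized to 128x128 RGB and labelled 0 (bad) or 1 (good); the random arrays are placeholders for Jen’s real dataset:

```python
import numpy as np
import tensorflow as tf

# Placeholder data: in reality these come from Jen's labelled photos,
# resized to a standard resolution (assumed here to be 128x128 RGB).
x_train = np.random.rand(500, 128, 128, 3).astype("float32")
y_train = np.random.randint(0, 2, size=500)
x_test = np.random.rand(100, 128, 128, 3).astype("float32")
y_test = np.random.randint(0, 2, size=100)

# A basic ConvNet with ReLU activations: an input layer, two hidden
# convolutional layers, and an output layer, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # good vs bad pumpkin
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, batch_size=32)  # step 3: training
loss, accuracy = model.evaluate(x_test, y_test)       # step 4: evaluation
print(f"test accuracy: {accuracy:.2f}")
```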
We have done all the work manually, from data preparation to deployment. Congratulations! This process is called Level-0 of MLOps. We have achieved our deployment, but everything is done manually. You can refer to the diagram below.

MLOps Level-0

Jen is happy.

Now the demand has begun to rise, and you cannot train the model manually every day. So you create an automation pipeline to validate the data, prep it, and train the model. You also try to fetch the best available model by comparing multiple error metrics. The pipeline takes care of it all. This process is Level-1 of MLOps: the training and analysis of the model are taken care of automatically. You just have to check that proper data is available and make sure the dataset isn’t skewed, so that the model is trained properly.
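A skeletal sketch of what such a Level-1 pipeline might look like; the stage functions here are trivial stubs standing in for the real steps from the walk-through:

```python
# Skeleton of an automated Level-1 pipeline. The stage functions are
# trivial stubs standing in for the real steps from the walk-through.
def validate(raw): return []                # data verification
def prep(raw): return raw                   # data preparation
def train(dataset): return "model"          # model training
def evaluate(model, dataset): return 0.90   # model evaluation

def run_pipeline(raw_data, current_best_score):
    problems = validate(raw_data)
    if problems:
        raise ValueError(f"bad input data: {problems}")
    dataset = prep(raw_data)
    model = train(dataset)
    score = evaluate(model, dataset)
    # Promote the new model only if it beats the current best.
    if score > current_best_score:
        print("new best model found, deploying")
    return score

run_pipeline(raw_data=[1, 2, 3], current_best_score=0.85)
```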

MLOps Level-1

Most companies achieve this level of MLOps. It is also achievable by an individual data scientist or ML engineer, and it is good enough when you are testing the model in your development environment.

Let’s now ask ourselves a few questions.

 Is your model able to replicate the results with different varieties of pumpkins?
 When new data is added to the dataset, can your model retrain?
 Can your model be used by hundreds of thousands of people at once? Does it scale well?
 How do you keep track of models when you deploy them across a large region, or even across the globe?

This leads us to Level-2.

It’s time to look at the bigger picture…

MLOps Level-2
Let’s break down the process.

 Everything we do above the red line in the flowchart is Level-1.
 This entire Orchestrated Experiment is now part of the Automated ML Pipeline.
 We introduce a Feature Store, which pulls data from various sources and transforms it into the features required by the model. The ML pipeline uses the data from the store in batches.
 The ML pipeline is connected to a Metadata Store. Think of it as bookkeeping: since you don’t train the model manually, this store holds the records of each stage in the pipeline. Once a stage is completed, the next stage looks up the record list, finds the previous stage’s records, and picks up from there.
 The models are then stored in a Model Registry. We have a bunch of models with various accuracies stored here. Based on the requirement, the appropriate model is sent to a CI/CD pipeline, which deploys it as a Prediction Service. Authorized users are able to access the prediction service when desired.
 This system is monitored for performance. Say you have a new bunch of genetically modified pumpkins. Your model isn’t aware of this: it is a new dataset with a high probability of being wrongly classified. The resulting drop in performance would set off a trigger that leads to retraining the model on the new data.
 This cycle continues.

We retrain the model when there is a drop in performance or when new data is available. It’s good to keep the model up to date so that real-world changes are reflected in it. For example, your model shouldn’t be recommending cassette tapes when the world has moved on to digital streaming.
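A toy sketch of such a retraining trigger; the accuracy threshold and the retrain() hook are hypothetical:

```python
# Toy performance trigger: kick off retraining when live accuracy
# drops below a threshold. The threshold and hook are hypothetical.
ACCURACY_THRESHOLD = 0.85

def retrain():
    print("retraining on fresh data...")  # would start the ML pipeline

def check_and_retrain(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    if accuracy < ACCURACY_THRESHOLD:
        print(f"accuracy {accuracy:.2f} below threshold")
        retrain()

check_and_retrain([1, 0, 1, 1], [1, 1, 0, 1])  # 0.50 -> triggers retrain
```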

Where to go from here?

We covered what MLOps is, why you would use it, what the production infrastructure setup looks like, and, once you have the infrastructure, how you would implement the process.

You can start by creating simple models and automating the steps. Remember, it is an iterative process; it will take time to get it right.

Do check out some of the MLOps tools like MLflow, Seldon Core, Metaflow, and Kubeflow Pipelines.
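As a taste, MLflow’s tracking API does the run bookkeeping mentioned earlier in a few lines; the parameter and metric values below are made up:

```python
import mlflow

# Log one training run's parameters and results; values are made up.
with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("test_accuracy", 0.91)
```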
