
CT1 - MLOps

Manaranjan Pradhan
Key Objectives
• Understand key challenges at every step in building ML Systems
• Design, Development, Evaluation, Deployment and Monitoring stages
• Equip you with tools, techniques and best practices to deal with these
challenges
• Design, develop and deploy end-to-end ML systems
• Learn some of the Industry Standard tools and platforms
• Focus is on practical lessons.
Exams and Grading

Component          Weightage   Coding Scheme
Group Assignment   50%         3N-a
Final Exam         50%         4N

Projects
• The project is to be completed by a team of up to 5 people.
• The project will be hosted on GitHub. You can showcase this
as an accomplishment.
• Plan to write a blog post as well.
• This will be a good learning experience!
Prepare for the class!
• Install and set up a Conda environment
• Set up Google Colab (it runs in the browser; no installation is needed)
• Create a folder on your Google Drive to store all the code and
materials I send you before each class
• Sign up for Azure
• You should get USD 100 of student credit if you sign up using your
student (ISB) email ID
Why MLOps and ML Systems Design
Netflix 1 Million Dollar Challenge!
Netflix 1M USD Prize

• https://www.wired.com/2012/04/netflix-prize-costs/
• https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429
Netflix Blog



https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
ML Lifecycle

Use Case Identification and Problem Formulation → Data Collection and Preparation → Exploratory Analysis & Feature Engineering → Building Machine Learning Models → Model Validation and Evaluation → Model Deployment → Model Monitoring

First Model Development: iterative steps

Continuous Model Update: iterative steps


Lifecycle – Engineering Skills are Leveraged

Use Case Identification and Problem Formulation → Data Collection and Preparation → Exploratory Analysis & Feature Engineering → Building Machine Learning Models → Model Validation and Evaluation → Model Deployment → Model Monitoring

• Data collection and preparation can take around 60% of the total effort.
• Exploratory analysis and feature engineering need some amount of statistics and business knowledge.
• Building machine learning models is being automated; AutoML is an initiative in that direction. One can develop frameworks for selecting models, or reuse/repurpose an existing model (transfer learning).
Machine Learning Algorithms
Machine Learning
Ability to learn without being explicitly programmed

Supervised: the features or factors and the outcome are known in the historical data.
• Classification: predicting labels which are categorical, for example customer churn or fraud detection.
• Regression: predicting labels which are continuous in nature, for example sales volume or stock price prediction.
• Forecasting: time series forecasting.
• Recommender Systems: recommending products or services.

Unsupervised: only the features are known; there is no ground truth.
• Clustering: grouping related items, for example finding customer groups.
• Dimensionality Reduction: projecting higher-dimensional data onto lower dimensions, for example images.
What exactly is a ML System?



Part of development: Continuous Integration (CI) and Continuous Delivery (CD).

Learning System
• Train/test data + code for exploration, preparation, modelling and evaluation.
• Several candidate models are built and compared, e.g. M1 Logistic Regression, M2 Decision Tree, M3 KNN, M4 Random Forest.
• The selected model (here M3) is transferred to the prediction system. How?

Prediction / Inference System
• Only the model and its required parameters are needed here, e.g. y = B0 + B1*X1 + ...
• Incoming requests go through transformation + algorithm to produce predictions.

What else? Serialization/Deserialization: we take the trained model and write it to a file in either pickle or ONNX (Open Neural Network Exchange) format, and this file needs to be version controlled. The inference system then loads the file into memory (deserialization); when an input arrives, we run the model and get the prediction values.
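A minimal sketch of this serialize/deserialize hand-off, using scikit-learn and Python's pickle module; the dataset, model choice and file name are illustrative assumptions (the slide also mentions ONNX as an alternative format).

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Learning system: train a model (M1 = logistic regression here)
X, y = make_classification(n_samples=500, n_features=4, random_state=42)
model = LogisticRegression().fit(X, y)

# Serialization: write the fitted model to a version-controlled artifact
with open("model_v1.pkl", "wb") as f:
    pickle.dump(model, f)

# Inference system: deserialize the artifact and serve predictions
with open("model_v1.pkl", "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict(X[:5]))        # class labels for incoming requests
print(loaded_model.predict_proba(X[:5]))  # prediction probabilities
```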
Different Types of ML systems

Batch prediction
• Frequency: periodical (e.g. every 4 hours)
• Useful for: processing accumulated data, when you don't need immediate results (e.g. recommendation systems)
• Optimized for: high throughput (execute tasks quickly)
• Examples: TripAdvisor hotel ranking, Netflix recommendations, customer churn analysis

Online prediction
• Frequency: as soon as requests come in
• Useful for: when the prediction result is needed immediately
• Optimized for: low latency (delay is minimized)
• Examples: Google Assistant speech recognition, fraud detection

https://stanford-cs329s.github.io/
MLOps Process Frameworks
Business Understanding
Problem Formulation

• Accuracy or business metrics to measure
• Cost or risk of model failure
• System constraints
• Interpretability or explainability
• Bias and fairness
• Regulatory requirements
All machine learning projects
should be single metric driven.
Cost of Model Failure

Example: is the customer going to default on a loan or not?

Confusion matrix: TP (true positives), FN (false negatives), FP (false positives), TN (true negatives).

Recall = TP / (TP + FN)

Minimize Total Cost = (C1 x FP + C2 x FN)
Where,
C1 = cost of each false positive
C2 = cost of each false negative
Cost of Model Failure

Example: spam filtering.
• False negatives: spam mails that appear in the inbox.
• False positives: good mails that end up in the spam box.

Precision = TP / (TP + FP)

Minimizing false positives is more important here, so precision is the right metric. For employee attrition (leave or not leave), false negatives are more important, so use recall.
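A small worked example of the metrics above; the confusion-matrix counts and the costs C1/C2 are made-up numbers for illustration only.

```python
# Hypothetical confusion-matrix counts for a loan-default model
TP, FN, FP, TN = 80, 20, 50, 850

recall = TP / (TP + FN)       # how many actual defaulters we caught
precision = TP / (TP + FP)    # how many flagged customers actually default

# Assumed business costs: C1 per false positive, C2 per false negative
C1, C2 = 100, 1000
total_cost = C1 * FP + C2 * FN

print(f"Recall     = {recall:.2f}")      # 0.80
print(f"Precision  = {precision:.2f}")   # 0.62
print(f"Total cost = {total_cost}")      # 25000
```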
The risk or cost of model failure
should be captured at the time of
problem formulation.
Business Metrics
The performance of the recommender system is measured by

take-rate = (number of quality plays) / (number of recommendations the user sees)

https://dl.acm.org/doi/pdf/10.1145/2843948
Deployment Constraints
• Latency
• Throughput
Latency requirement can
impact the choice of models
Inference / Prediction System: input data or features → model → prediction → decisions
Interpretability or explainability
• Model objective
• Prediction (black box): e.g. an attrition model answers "What is the likelihood of this employee leaving?"
• Inference: answers "What are the important factors influencing the employee's decision to leave?"
• Inference is key to creating strategy
• Black box vs. glass box models
Bias and Fairness

https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G

https://fortune.com/2018/10/10/amazon-ai-recruitment-bias-women-sexist/
Problem Formulation
• Inference or Prediction
• Local or Global Inference
• Model Risk Assessment
• Evaluation Metrics
• Model Interpretability or Explainability
• Bias and Fairness Requirements
• Compliance Requirements
• System Constraints
Data Understanding
What Do Practitioners Say?
1. Data + Schema
2. Storage Efficiency
3. Read Latency

Data format

Limitations of Traditional DWs
• Plain-text CSV - a good old friend of the data scientist
• Pickle - Python's way to serialize things
• HDF5 - a file format designed to store and organize large amounts of data
• https://en.wikipedia.org/wiki/Hierarchical_Data_Format
• Feather - a fast, lightweight, and easy-to-use binary file format for storing
data frames
• https://github.com/wesm/feather
• Parquet - Apache Hadoop's columnar storage format
• https://parquet.apache.org/
• Good for reading a subset of the columns. Mostly used as a data lake or data
warehouse storage format.
CSV and JSON are semi-structured data

Pros:
• Widely used
• Plain text file – can be opened on any computer and is readable by humans
• Can be read from and written to by most data software

Cons:
• Not the most efficient way to store or access data
• No formal standard, so there is room for user interpretation on how to handle edge cases
• No schema attached

Note:
• A great default option for most use cases
• < 100 MB data
Parquet or ORC (or Pickle): Data + Schema; Parquet and ORC are columnar storage formats

Pros:
• Very fast
• Naturally understands all dtypes used by pandas, including multi-index DataFrames
• Very common in “big data” systems like Hadoop or Spark
• Supports various compression algorithms

Cons:
• Binary storage format that is not human-readable

Note:
• > 100 MB data
• When many systems are accessing it (BI, ML Platforms, any other analysis tool)
• When too many columns are present, but you need to access only a subset of them
Row Oriented Vs. Columnar Format
Pickle
Pros:
• Python's native serialization format; highly optimized for reading and writing from Python

Cons:
• No other language or system can understand this.

Note:
• Use only when you know that only Python systems will read this file
• Can be used as a staging file during pipelines
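A quick sketch contrasting these formats with pandas; the file names and columns are arbitrary, and writing Parquet assumes pyarrow (or fastparquet) is installed.

```python
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2, 3],
                   "churn": [0, 1, 0],
                   "monthly_spend": [42.5, 13.0, 77.2]})

# CSV: human-readable, no schema; fine for small (< 100 MB) data
df.to_csv("customers.csv", index=False)

# Parquet: binary, columnar, keeps dtypes; good when many tools share the data
df.to_parquet("customers.parquet", index=False)
spend_only = pd.read_parquet("customers.parquet", columns=["monthly_spend"])

# Pickle: Python-only; useful as a staging file inside a Python pipeline
df.to_pickle("customers.pkl")
df_back = pd.read_pickle("customers.pkl")
```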
Data Profiling

• Missing values
  • How much is missing? If more than 20% of a column's data is missing, don't use that column.
  • Imputation: by a default value, by mean/median, or by models.
• Outliers (e.g. extreme outliers)
• Bad data quality
• Data sampling errors

https://careersatdoordash.com/blog/five-common-data-quality-gotchas-in-machine-learning-and-how-to-detect-them-quickly/
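A minimal profiling sketch for the missing-value rules above; the 20% threshold follows the note on this slide, and the column names and values are invented.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 35, np.nan],
                   "income": [50000, 60000, np.nan, 80000, 75000]})

# 1. How much is missing per column?
missing_ratio = df.isna().mean()
print(missing_ratio)

# Drop columns that exceed the (assumed) 20% missing-data threshold
usable_cols = missing_ratio[missing_ratio <= 0.20].index.tolist()

# 2. Impute the rest, e.g. with the median (default values, mean, or
#    model-based imputers are other options)
imputer = SimpleImputer(strategy="median")
df_imputed = pd.DataFrame(imputer.fit_transform(df[usable_cols]),
                          columns=usable_cols)
```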
Train-Test Split
Train Test

• Is not appropriate for model search


• Experimenting with multiple models (different algorithms)
• Searching for best model (same algorithm) with optimal
hyperparameters
• Test data is used multiple times for model validation
• Information gets leaked into the modelling process
• The test accuracy of the final model is optimistically biased
Train-Validation-Test Split
Train Val Test

• Is appropriate for model search


• Experimenting with multiple models (different algorithms)
• Searching for best model (same algorithm) with optimal
hyperparameters
• The validation set remains static
• Information gets leaked into the modelling process
• Validation accuracy may be optimistically biased
To find the best model we do cross-validation.

K-Fold Cross-Validation Split (5-fold here)

• Divide the training data into K folds and hold out one fold at a time.
• Train on the remaining folds (e.g. K1-K4) and test against the held-out fold (e.g. K5) to get its accuracy.
• Do this for all iterations and take the average of the accuracies.
• The same can be done with a different model (e.g. a decision tree), and the average accuracies of the two models are compared; based on this, a model is selected.
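A sketch of the 5-fold comparison described above using scikit-learn; the two candidate models (logistic regression vs. decision tree) and the synthetic dataset are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# 5-fold CV: train on 4 folds, score on the held-out fold, repeat, then average
for name, model in [("LogReg", LogisticRegression(max_iter=1000)),
                    ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
# The model with the better average accuracy is selected.
```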
Class Imbalance

Challenges:
● Not enough signal to learn about rare classes
● Statistically speaking, predicting the majority label has a higher chance of being right
● Imbalance often comes with differences in the cost of wrong predictions

Examples:
● Fraud detection
● Spam detection
● Disease screening
● Churn prediction
Class imbalance solution: Resampling
Undersampling: remove samples from the majority class. Can cause loss of information.
Oversampling: add more examples of the minority class. Can cause overfitting.
Over sampling: SMOTE
Synthetic Minority Oversampling Technique
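A minimal resampling sketch; it assumes the imbalanced-learn package is installed, and the class weights and sample sizes are arbitrary.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Imbalanced toy data: roughly 5% minority class
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=1)
print("original:", Counter(y))

# Oversampling with SMOTE (synthetic minority samples)
X_over, y_over = SMOTE(random_state=1).fit_resample(X, y)
print("after SMOTE:", Counter(y_over))

# Undersampling the majority class instead
X_under, y_under = RandomUnderSampler(random_state=1).fit_resample(X, y)
print("after undersampling:", Counter(y_under))

# Note: resample only the training split, never the test split (see data leakage).
```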
Types of data leakage
● Data Leakage
○ Premature featurization: creating features on the entire dataset instead of just the
training data
■ e.g. creating n-gram counts/vocabulary from the train + test sets
○ Oversampling before splits
■ Train splits might contain test samples
○ Time leakage
■ Randomly splitting data instead of using a temporal split can allow the training data to see
the future
○ Group leakage
■ e.g. a patient has 3 CT scans: 2 in train, 1 in test
How to avoid leakage?
● Check for duplication between the train and valid/test splits
● Split data temporally (if possible)
● Use only the train splits for feature engineering
● Train the model on a subset of features
○ If performance is very high on a subset, it is either a very good set of features or
leakage!
● Monitor model performance as more features are added
○ A sudden increase: either a very good feature or leakage!
● Involve subject matter experts in the process
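A sketch of the "premature featurization" fix: build the vocabulary from the training split only. The example texts and labels are made up.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

texts = ["win a free prize now", "meeting at noon", "free offer inside",
         "lunch tomorrow?", "claim your prize", "project status update"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = ham

# Split first...
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=0)

# ...then build the vocabulary from the training split only
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)   # fit + transform on train
X_test_vec = vectorizer.transform(X_test)         # transform only on test

# Fitting the vectorizer on all texts (train + test) would leak test vocabulary
# into the features and inflate validation scores.
```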
Keeping track of all things!
In software development, only code is version controlled. In machine learning, we need to keep track of Data + Pipeline/Code + Model:

• Data: any change in the samples will affect the performance of the model.
• Pipeline/Code: any change in imputation, scaling and encoding techniques will change the performance of the model.
• Model: any change in hyperparameters will affect the performance of the model.


Model Development



Do we always need to build an ML Model?



Model Baselining

https://eugeneyan.com/writing/first-rule-of-ml/



Google's Rules of Machine Learning

https://developers.google.com/machine-learning/guides/rules-of-ml



Before building an ML model, start with a baseline model.

Baselining
• Create a system with if/else rules from heuristics
• Build a simple ML model (e.g. linear regression) first
• Build a system with regex (hand-crafted regular expressions) for
classifying text data
• Benefits of creating a heuristic system

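A small sketch of a heuristic baseline next to a simple ML model; the churn rule (tenure under 6 months) and the synthetic data are invented for illustration and are not from the slides.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
tenure_months = rng.integers(1, 60, size=500)
X = tenure_months.reshape(-1, 1)
# Noisy churn labels loosely tied to tenure (purely synthetic)
y = ((tenure_months < 6) ^ (rng.random(500) < 0.1)).astype(int)

# Baseline 1: if/else heuristic rule
rule_pred = (tenure_months < 6).astype(int)

# Baseline 2: always predict the majority class
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)

# Simple ML model
logreg = LogisticRegression().fit(X, y)

for name, pred in [("heuristic rule", rule_pred),
                   ("majority class", dummy.predict(X)),
                   ("logistic regression", logreg.predict(X))]:
    print(name, accuracy_score(y, pred))
# An ML model is worth building only if it clearly beats these baselines.
```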


When are you forced to build ML systems?



Which model needs to be built?



Which model to select?

• Accuracy vs. explainability (the more complex, less explainable models are called black box models)
• White box vs. black box models
• Explainable AI (XAI)
• Bias and fairness

https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
https://www.bbc.com/news/technology-45809919
https://fortune.com/2018/10/10/amazon-ai-recruitment-bias-women-sexist/


Model Development is messy!
• Run a number of experiments to refine your model
• Experiments may involve
• Different Transformations
• Different Models
• Different Hyperparameters

• Easy to lose track of code, hyperparameters, and artifacts


• Fail to reproduce experiments (reproducibility)



Building and deploying a Pipeline

• Preprocessors: one-hot encode (OHE) the categorical variables and scale the numerical features.
• Train the model, test the pipeline, and deploy it to the prediction system.
• Steps: defining the preprocessor, defining the model, generating the pipeline.

Experiment tracking tools:
• MLflow: for local use
• Neptune.ai
• Weights and Biases: cloud based, widely used (https://wandb.ai/site/)
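A minimal sketch of the pipeline steps listed above (defining the preprocessor, the model, and the combined pipeline) in scikit-learn; the column names are placeholders.

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

num_features = ["age", "monthly_spend"]   # assumed numerical columns
cat_features = ["plan_type", "region"]    # assumed categorical columns

# Defining the preprocessor: scale numerical, one-hot encode categorical
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), num_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_features),
])

# Defining the model
model = LogisticRegression(max_iter=1000)

# Generating the pipeline: preprocessing + model in one deployable object
pipeline = Pipeline([("preprocess", preprocessor), ("model", model)])

# pipeline.fit(X_train, y_train); pipeline.predict(X_test)
# The whole pipeline (not just the model) is what gets serialized and deployed.
```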

Experiment Tracking
• Manually track the results of all model runs in a spreadsheet
• Use experiment tracking tools
• Weights and Biases
• MLFlow
• Neptune.ai

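A minimal experiment-tracking sketch with MLflow, one of the tools listed above; the experiment name, hyperparameters and dataset are illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("churn-model-experiments")   # assumed experiment name

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)

    mlflow.log_params(params)                                         # hyperparameters
    mlflow.log_metric("accuracy",
                      accuracy_score(y_test, model.predict(X_test)))  # result
    mlflow.sklearn.log_model(model, "model")                          # model artifact
```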


AutoML
• Finding right model can be time consuming
• Time to market can be critical
• Unavailability of expertise in enterprises
• Benefits of using AutoML:
• Improve efficiency by automatically running repetitive tasks. This allows data
scientists to focus more on problems (like data) instead of models.
• Automated ML pipelines also help avoid potential errors caused by manual
work.
• AutoML is a big step towards the democratization of machine learning and
allows everyone to use ML features.
AutoML Frameworks
• Two types of frameworks
• Searches possible models from traditional ML algorithms
• Linear, SVM, KNN, Bagging, Boosting, Naïve Bayes etc.
• Does hyperparameter tuning
• Works with mostly structured data
• Neural Network Search
• Searches for neural network architectures
• Number of neurons and layers
• Works with structured and unstructured data

Extra:

Ensembling Techniques:
1. Bagging
2. Boosting
3. Stacking: build all the different models, take the outcome from each model, pass these outputs through a meta-model, and get the final prediction. This is popular nowadays.

AutoML outputs a leaderboard that lists all the models it has tried and ranks them.

Leaderboards
AutoML Frameworks

Popular

• https://isg.beel.org/blog/2020/04/09/list-of-automl-tools-and-software-libraries/
• https://medium.com/swlh/8-automl-libraries-to-automate-machine-learning-pipeline-3da0af08f636
Searching parameters
• Max run time – Limits the time to experiment
• Max models – Limits the number of experiments
• Stopping metrics and tolerance, e.g. MSE, AUC, R_square <= 0.85
• Sorting metrics – for leaderboard creation
• Exclude algos, e.g. ["GLM", "DeepLearning"]
• Include algos, e.g. ["GBM", "XGBoost", "DRF"]
• Preprocessing
• for example scaling, various encodings (OHE, Target etc.) – not many frameworks support
this.
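A sketch of how these search parameters map onto one popular framework, H2O AutoML; the data file, target column and parameter values are placeholders chosen for illustration.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("train.csv")        # placeholder dataset
x = [c for c in train.columns if c != "target"]
train["target"] = train["target"].asfactor()

aml = H2OAutoML(
    max_runtime_secs=600,                   # max run time
    max_models=20,                          # max number of models to try
    stopping_metric="AUC",                  # stopping metric
    stopping_tolerance=0.001,               # tolerance
    sort_metric="AUC",                      # sorting metric for the leaderboard
    exclude_algos=["GLM", "DeepLearning"],  # or use include_algos instead
    seed=42,
)
aml.train(x=x, y="target", training_frame=train)

print(aml.leaderboard)                      # ranked list of all models tried
```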
How to use AutoML?
• Should be used as a guidance tool
• You may not want to take the suggested model directly to production
• Gives guidance on
• What models can be used
• What feature engineering can be used (though this is not a replacement of
actual feature engineering based on domain knowledge)
• Can be an indicator of what accuracy can be expected
