
Machine Learning Operations - MLOps

Getting from Good to Great

Michal Maciejewski, PhD

Acknowledgements: Dejan Golubovic, Ricardo Rocha, Christoph Obermair, Marek Grzenkowicz


Alice: input X → ML Model → Y = f(X)

Bob: Y = f(X)

Let’s share our model with users, aka let’s put it into production! 2
What Has to Go Right?

What is needed for an ML model to perform well in production? 3


What Can Go Wrong?

Concept and data drift are among the main challenges of production ML systems!

4
MLOps is about maintaining the trained model's performance* in production.
The performance may degrade due to factors outside of our control,
so we ought to monitor it and, if needed, roll out a new model to users.

*model performance = accuracy, latency, jitter, etc. 5


ML Model = Data + Code
  + Algorithm
  + Weights
  + Hyperparameters

MLOps = ML Model + Software
  + Scripts
  + Libraries
  + Infrastructure
  + DevOps

6
MLOps = ML Model + Software
[Figure, after D. Sculley et al.: your ML code (built on an ML framework) is only a small box at the centre of a real-world ML system, surrounded by configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, ML model serving infrastructure, and monitoring.]
Good news: most of these components come as ready-to-use frameworks

D. Sculley et al., Hidden Technical Debt in Machine Learning Systems, NIPS 2015 7
MLOps Pipeline

Data Engineering Modelling Deployment Monitoring

MLOps is a multi-stage, iterative process. 8


Data Engineering
Reproducibility
Traceability
Data-driven ML

Data Engineering Modelling Deployment Monitoring 9


[Figure: an ML model as a function, f(input) = output]
10
Exploratory Data Analysis

For structured data:
- schema: required tables, columns, and datatypes

For unstructured data:
- resolution, image extension
- frequency, duration, audio codec

Initial exploration allows identifying requirements for input data in production. 11
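As a minimal sketch of such a schema check for structured data (the column names and dtypes below are hypothetical), pandas can verify that production inputs match the requirements identified during exploration:

```python
import pandas as pd

# Hypothetical schema derived from exploratory data analysis:
# expected columns and their dtypes for production input data.
EXPECTED_SCHEMA = {
    "magnet_id": "int64",
    "voltage": "float64",
    "timestamp": "datetime64[ns]",
}

def check_schema(df: pd.DataFrame) -> list:
    """Return a list of schema violations; an empty list means the data conforms."""
    problems = []
    for column, dtype in EXPECTED_SCHEMA.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"{column}: expected {dtype}, got {df[column].dtype}")
    return problems
```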


Data Processing Pipeline

Data Ingestion: load from file, load from db
Data Validation: schema check, audio/video file check
Data Cleaning: filling NaNs, filtering, normalization, standardization
Feature Engineering: feature selection, feature crossover

We need to reproduce some of those steps (e.g. subtracting the mean) in production! 12
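One way to make a step such as mean subtraction reproducible in production is to fit the transformation once on the training data and persist it next to the model; a minimal sketch with scikit-learn and joblib (X_train and X_prod are placeholders):

```python
import joblib
from sklearn.preprocessing import StandardScaler

# Training time: fit the normalization on the training data only,
# then persist it so the exact same mean/std is used in production.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # X_train: training feature matrix
joblib.dump(scaler, "scaler.joblib")

# Production time: load the fitted scaler and only transform incoming data,
# never fit it again on production inputs.
scaler = joblib.load("scaler.joblib")
X_prod_scaled = scaler.transform(X_prod)         # X_prod: production feature matrix
```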


Reproducibility
[Figure: a dataset passing through Excel spreadsheets, various scripts, and notebooks on its way to a curated dataset; such ad-hoc processing is hard to reproduce.]

https://sites.google.com/princeton.edu/rep-workshop/ 13
Keeping Track of Data Processing

• Version Input Data – DVC framework
• Version Processing Script – GitLab
• Version Computing Environment – Docker

Data Provenance – where does the data come from?

Data Lineage – how is the data manipulated? 14
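As one possible sketch of data versioning with DVC (the repository URL, file path, and tag below are placeholders), a pinned revision of the input data can be read through DVC's Python API:

```python
import dvc.api

# Read a specific, versioned revision of the input data.
with dvc.api.open(
    "data/signals.csv",                              # path tracked by DVC in the repo
    repo="https://gitlab.example.com/team/project.git",
    rev="v1.0",                                      # Git tag/commit pinning the dataset version
) as f:
    raw = f.read()
```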
Notebook Good Practices
• Linear flow of execution
• Keep the amount of code small
• Extract reusable code into a package
• Use pre-commit hooks to clean notebooks before committing to a repository
• Set parameters at the top so that the notebook can be treated as a function (papermill and scrapbook packages; see the sketch below)

It is OK to do quick & dirty exploratory model development.

Once we start communicating the model outside, we need to clean it up! 15
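A minimal sketch of the parameterized-notebook idea with papermill (notebook names and parameters are illustrative; the notebook needs a cell tagged "parameters" whose defaults get overridden):

```python
import papermill as pm

# Execute the notebook like a function: the values below override the
# defaults defined in the notebook's "parameters" cell.
pm.execute_notebook(
    "train_model.ipynb",
    "out/train_model_lr_0.01.ipynb",
    parameters={"learning_rate": 0.01, "epochs": 20},
)
```

The scrapbook package can then be used inside the notebook to record results that the calling code reads back.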
From Model-driven to Data-driven ML

                   | Model-driven ML    | Data-driven ML
Fixed component    | Dataset            | Model architecture
Variable component | Model architecture | Dataset
Objective          | High accuracy      | Fairness, low bias
Explainability     | Limited            | Possible
https://datacentricai.org
16
https://spectrum.ieee.org/andrew-ng-data-centric-ai
Modelling
Training challenges
Rare events
Analyzing results

Data Engineering Modelling Deployment Monitoring 17


Selecting Data for Training

Dataset split: Training (80%) | Validation (20%)

Train on the training split, validate on the validation split, and feed the validation results back into hyperparameter tuning.

With this approach, the model eventually sees the entire dataset. 18
Selecting Data for Training

Dataset split: Training (75%) | Validation (15%) | Test (10%)

Train on the training split, use the validation split for hyperparameter tuning, and keep the test split for a final check.

Splitting the dataset in three allows a final check with unseen data. 19
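A minimal sketch of such a three-way split with scikit-learn (X and y are placeholders for features and labels); the test split is carved out first, then the remaining 90% is divided so that the final proportions are 75/15/10:

```python
from sklearn.model_selection import train_test_split

# 10% test split first, then 15/90 of the remainder for validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.90, random_state=42
)
```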
Balancing Datasets
Consider a binary classification problem with a dataset composed of 200 entries.
There are 160 negative examples (no failure) and 40 positive ones (failure).

Expected (stratified) split: Training 75% (120 negative + 30 positive) | Validation 15% (24 + 6) | Test 10% (16 + 4)

Random split: Training 75% (131 + 19) | Validation 15% (19 + 11) | Test 10% (10 + 10)

For continuous values it is important to preserve the statistical distribution across splits.

Although for big datasets this is not an issue, it is still low-hanging fruit. 20
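A sketch of how a stratified split preserves the 160/40 class balance of the example above (scikit-learn's stratify argument does the bookkeeping):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 200 entries: 160 negative (0) and 40 positive (1), as in the example above.
y = np.array([0] * 160 + [1] * 40)
X = np.arange(200).reshape(-1, 1)          # dummy features for illustration

# stratify preserves the 80/20 class ratio in every split.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.90, stratify=y_rest, random_state=0
)

print(np.bincount(y_train), np.bincount(y_val), np.bincount(y_test))
# roughly: [120  30] [24  6] [16  4]
```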
Rare Events

There were 3130 healthy signals (Y=False) and 112 faulty ones (Y=True)
C. Obermair, Extension of Signal Monitoring Applications with Machine Learning, Master Thesis, TU Graz
M. Brice, LHC tunnel Pictures during LS2, https://cds.cern.ch/images/CERN-PHOTO-201904-108-15 21
Rare Events

A naive model that always predicts "no failure" is guaranteed to achieve 97% average dataset accuracy?! 22


Rare Events

Confusion matrix of the naive model (it never predicts Y = True):

                 | Ground truth Y = True   | Ground truth Y = False
Model Y = True   | 0 (true positives)      | 0 (false positives)
Model Y = False  | 112 (false negatives)   | 3130 (true negatives)

Since TP = FP = 0, the average accuracy reduces to
Avg accuracy = TN / (TN + FN) = 3130 / (3130 + 112) ≈ 97%

Precision = TP / (TP + FP) = 0 / 0 (undefined)
Recall = TP / (TP + FN) = 0 / (0 + 112) = 0
F1 score = 2 / (1/Precision + 1/Recall)

It is a valuable conversation to decide whether precision or recall (or both) is more important. 23
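The same numbers can be reproduced with scikit-learn (a sketch; the naive model is simulated by predicting False for every signal):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Ground truth: 112 faulty (True) and 3130 healthy (False) signals.
y_true = np.array([True] * 112 + [False] * 3130)
# Naive model: always predict "no failure".
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))                     # ~0.965
print(precision_score(y_true, y_pred, zero_division=0))   # 0.0 (no positive predictions)
print(recall_score(y_true, y_pred, zero_division=0))      # 0.0
print(f1_score(y_true, y_pred, zero_division=0))          # 0.0
```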


Data Augmentation

- New examples obtained by shifting the region left and right
- New examples obtained by rotating/shifting/hiding

JH. Kim et al. Hybrid Integration of Solid-State Quantum Emitters on a Silicon Photonic Chip, Nano Letters 2017 24
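For signals, a minimal augmentation sketch in NumPy that creates new examples by shifting left/right (note that np.roll wraps values around the edges, which is acceptable here only for illustration):

```python
import numpy as np

def shift_signal(signal: np.ndarray, max_shift: int, n_copies: int, seed: int = 0) -> np.ndarray:
    """Create new training examples by shifting a 1-D signal by random offsets."""
    rng = np.random.default_rng(seed)
    shifts = rng.integers(-max_shift, max_shift + 1, size=n_copies)
    return np.stack([np.roll(signal, s) for s in shifts])

# Example: 5 shifted copies of a synthetic signal, each moved by at most 10 samples.
signal = np.sin(np.linspace(0, 4 * np.pi, 200))
augmented = shift_signal(signal, max_shift=10, n_copies=5)
```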
What else can we do?
When one of the values of Y is rare in the population, considerable
resources in data collection can be saved by randomly selecting within
categories of Y. […]
The strategy is to select on Y by collecting observations (randomly or all
those available) for which Y = 1 (the "cases") and a random selection of
observations for which Y = 0 (the "controls").

We can also collect more data of a particular class (if at all possible).
G. King and L. Zeng, “Logistic Regression in Rare Events Data,” Political Analysis, p. 28, 2001.
https://en.wikipedia.org/wiki/Cross-validation_(statistics) 25
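A sketch of the case-control idea in pandas (the dataframe and its "failure" label column are hypothetical): keep all rare positive cases and a random sample of negative controls. Note that this changes the class priors, and King and Zeng describe how to correct for that in logistic regression.

```python
import pandas as pd

def case_control_sample(df: pd.DataFrame, label: str = "failure",
                        controls_per_case: int = 3, seed: int = 0) -> pd.DataFrame:
    """Keep all rare positive cases and a random subset of negative controls."""
    cases = df[df[label] == 1]
    controls = df[df[label] == 0].sample(n=controls_per_case * len(cases), random_state=seed)
    return pd.concat([cases, controls]).sample(frac=1, random_state=seed)   # shuffle rows
```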
Training Tracking
1. Pen & Paper
2. Spreadsheet
3. Dedicated framework
- Weights and Biases
- Neptune.ai
- TensorBoard
- …

26
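As an illustration of dedicated experiment tracking (a sketch using Weights and Biases; the project name and the train_one_epoch helper are placeholders):

```python
import wandb

run = wandb.init(project="mlops-demo", config={"learning_rate": 0.01, "epochs": 20})
for epoch in range(run.config["epochs"]):
    # train_one_epoch is a placeholder for the actual training step.
    train_loss, val_accuracy = train_one_epoch()
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_accuracy": val_accuracy})
run.finish()
```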
Error Analysis

[Table: error analysis of individual examples. Each signal (Magnet 1, Magnet 2, Magnet 3) is tagged with the error categories it exhibits: noise, gap in signal, bias, wrong sampling.]

Such analysis may reveal issues with labelling or rare classes in the data.
For unstructured data, a cockpit (dashboard) could help in the analysis.
It is also useful for monitoring certain classes of inputs. 27
Deployment
Degrees of automation
Modes of deployment
Reproducible environments

Data Engineering Modelling Deployment Monitoring 29


Degrees of Automation
Human inspection → Shadow mode → Human in the loop → Full automation

Starting from Shadow mode we can collect more training data in production!
C. Obermair, Extension of Signal Monitoring Applications with Machine Learning, Master Thesis, TU Graz 30
Modes of Deployment

[Figure: a router splits traffic, sending 100-X% of requests to the old model version and X% to the new version.]

- In Canary deployment there is a gradual switch between versions


- In Blue/green deployment there is an on/off switch between versions

https://hbr.org/2017/09/the-surprising-power-of-online-experiments
https://en.wikipedia.org/wiki/Blue-winged_parrot 31
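In practice the traffic split is handled by the serving layer (e.g. KServe), but the canary idea itself fits in a few lines; a sketch with illustrative names, routing X% of requests to the new version:

```python
import random

CANARY_FRACTION = 0.05   # X% of traffic goes to the new model version

def route(request, old_model, new_model):
    """Send roughly CANARY_FRACTION of requests to the new version, the rest to the old one."""
    model = new_model if random.random() < CANARY_FRACTION else old_model
    return model.predict(request)
```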
Reproducible Environments

[Figure: two request/response serving setups. Left, Docker containers: an HTTP server exposing a REST API, the data pipeline, the ML model, and the computing environment (OS, Python, packages). Right, serverless compute: KServe with a config file and a pool of models running on shared computing infrastructure.]


We will play with those during the exercise sessions! 32
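A minimal sketch of the left-hand setup, an HTTP server with a REST API in front of a model (FastAPI is used here as one option; model.joblib and the feature layout are placeholders):

```python
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")        # placeholder for the trained model artifact

class PredictRequest(BaseModel):
    features: list[float]                  # flat list of numeric features

@app.post("/predict")
def predict(request: PredictRequest):
    x = np.array(request.features, dtype=float).reshape(1, -1)
    return {"prediction": model.predict(x).tolist()}
```

Run locally with, for example, uvicorn app:app and send POST requests to /predict.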
Monitoring
Useful metrics
Relevant frameworks

Data Engineering Modelling Deployment Monitoring 33


Relevant Metrics
• Model metrics
• Distribution of input features – data/concept drift
• Missing/malformed values in the input
• Average output accuracy/classification distribution – concept drift

• Infrastructure metrics
• Logging errors
• Memory and CPU utilization
• Latency and jitter

For each of the relevant metrics one should define warning/error thresholds. 35
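A sketch of a drift check for one input feature (the 0.01 threshold is illustrative): compare the production distribution against the training distribution with a two-sample Kolmogorov-Smirnov test.

```python
from scipy.stats import ks_2samp

def drifted(train_values, production_values, alpha: float = 0.01) -> bool:
    """Flag data drift when the two samples are unlikely to come from the same distribution."""
    statistic, p_value = ks_2samp(train_values, production_values)
    return p_value < alpha
```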
Monitoring Matters

C. Obermair, Extension of Signal Monitoring Applications with Machine Learning, Master Thesis, TU Graz 36
Data Engineering Modelling Deployment Monitoring

37
MLOps Pipeline with Tensorflow
Pipeline represented as a DAG (directed acyclic graph)
Data Engineering

Modelling

Deployment

https://www.tensorflow.org/tfx/guide 38
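A rough sketch of such a DAG in TFX, loosely following the public TFX tutorials (paths, names, and the trainer module are placeholders, and the exact API may differ between TFX versions):

```python
from tfx import v1 as tfx

# Two-step DAG: data ingestion followed by training.
example_gen = tfx.components.CsvExampleGen(input_base="data/")
trainer = tfx.components.Trainer(
    module_file="trainer_module.py",                  # user-provided training code
    examples=example_gen.outputs["examples"],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
)

pipeline = tfx.dsl.Pipeline(
    pipeline_name="mlops_demo",
    pipeline_root="pipeline_root/",
    components=[example_gen, trainer],
)
tfx.orchestration.LocalDagRunner().run(pipeline)
```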
MLOps Pipeline with Kubeflow

Data Engineering

Modelling

https://ml.cern.ch
Deployment
https://www.kubeflow.org/docs/started/ 39
Conclusion
                | Development ML       | Production ML
Objective       | High-accuracy model  | Efficiency of the overall system
Dataset         | Fixed                | Evolving
Code quality    | Secondary importance | Critical
Model training  | Optimal tuning       | Fast turn-arounds
Reproducibility | Secondary importance | Critical
Traceability    | Secondary importance | Critical

I do hope the presented MLOps concepts will allow your models to transition
from Good to Great. 40
Resources

Machine Learning Engineering for Production (MLOps) Specialization

41
