Machine Learning + DevOps using Azure ML Services
ML + DevOps: Together at Last!
Azure Nights Meetup, Thu 18 Jul 2019
Rolf Tesmer
Microsoft Cloud Architect | Azure | Data | Analytics | AI
Mr. Fox SQL Blog - https://mrfoxsql.wordpress.com/
LinkedIn - https://www.linkedin.com/in/rolftesmer/
https://mrfoxsql.wordpress.com/2019/06/11/machine-learning-devops-ml-devops-together-at-last/
What exactly is DevOps? And Why Should I Care?
DevOps is a software engineering practice that aims at unifying
software development and software operation. The main
characteristic of the DevOps movement is to strongly advocate
automation and monitoring at all steps of software construction,
from integration, testing, releasing to deployment and management.
GOAL: DevOps enables faster time to market, lower failure rate,
shortened lead times, automated compliance, release consistency.
Agile (method of development) != DevOps (method of deployment)
https://en.wikipedia.org/wiki/DevOps
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/team-data-science-process-for-devops
https://azure.microsoft.com/en-us/services/machine-learning-service/
Leverage your favorite frameworks
TensorFlow MS Cognitive Toolkit PyTorch Scikit-Learn ONNX Caffe2 MXNet Chainer
[Architecture diagram: ML + DevOps on Azure ML Services. Labels recoverable from the figure:]
Training: Raw Data in Azure Data Lake Store (Hot) → Databricks / IDE (Machine Learning) + Azure ML SDK → on-demand Training Compute (Batch AI) → Iterate
Experiment: Track Logs/Metrics, Training Run + Logs, Data Snapshot
Repo: Commit → model train file (train.py); Commit → conda file (env.yml); Commit → realtime score file (score.py)
Batch Score Path: Databricks Model Run (Batch Scoring) :: Registered Model :: Score File (batch.py); reads Data from Registered Tables, writes Scored Data
Realtime Score Path (optional): Trigger (score.py) / Manual, with an Authorise Gate, across three pipelines that each use the Azure ML SDK/CLI
Pipeline 1: Create Docker Image from artefacts: Registered Model + Score File (score.py) + Conda File (env.yml)
Pipeline 2: Deploy Docker Image → ACI; Run API Integration Tests (Test API)
Pipeline 3: Deploy Docker Image → AKS; Enable Azure ML model monitoring (SDK); Run API Integration Tests (Prod API)
Registries: ML Model Registry [MODELS]; Container Registry, docker [IMAGES]; Container Instance [TEST] [DEPLOYMENTS]; Kubernetes Services [PROD] [DEPLOYMENTS]
Model Monitor: App Insights (RT Stats History) → Statistics Reports; Azure Storage (RT Data History) → Usage Reports
[Future]: A/B Release, Traffic Redirection & Testing; Model Drain & Promotion
[FUTURE] Automated Retraining: Data Drift / Prediction Drift → "Trigger" Model Retraining (Azure Function); Azure Data Lake Store (Archive)
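The [FUTURE] automated retraining path (data drift or prediction drift triggering a retraining event) can be illustrated with a minimal sketch. It assumes drift is measured with a Population Stability Index (PSI) over binned feature values and that a PSI above 0.2 warrants retraining; the metric, the threshold and all function names are illustrative assumptions, not something the slides specify. In the architecture above, a check like this would run inside the Azure Function that triggers retraining.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Return the fraction of values in each bin defined by consecutive edges."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            # Values outside the edge range are clamped into the first or last bin.
            if v < edges[i + 1] or i == last:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between training-time and live data."""
    expected = histogram(baseline, edges)
    actual = histogram(current, edges)
    score = 0.0
    for e, a in zip(expected, actual):
        # eps avoids log(0) and division by zero for empty bins.
        e, a = max(e, eps), max(a, eps)
        score += (a - e) * math.log(a / e)
    return score


def should_retrain(baseline: Sequence[float], current: Sequence[float],
                   edges: Sequence[float], threshold: float = 0.2) -> bool:
    """Hypothetical trigger rule: retrain when PSI exceeds the threshold."""
    return psi(baseline, current, edges) > threshold


# Hypothetical feature values: training snapshot vs. recent scoring requests.
EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]
TRAINING = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
RECENT = [0.8, 0.9, 0.85, 0.95, 0.9, 0.8, 0.99, 0.7, 0.9]

if __name__ == "__main__":
    print("retrain:", should_retrain(TRAINING, RECENT, EDGES))
```

The training snapshot plays the role of the "Data Snapshot" in the diagram, and the recent values stand in for the realtime data history collected from the deployed service.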
Workflow Steps (Azure Machine Learning)
1. Develop ML training scripts in Python (train.py).
2. Create and configure a compute target.
3. Submit the scripts to the configured compute target to run in that environment. During training, the compute target stores run records to a datastore. There the records are saved to an experiment.
4. Query the experiment for logged metrics from the current and past runs. If the metrics do not indicate a desired outcome, loop back to step 1 and iterate on your scripts.
5. Once a satisfactory run is found, register the persisted model in the model registry.
6. Develop a scoring script (score.py).
7. Create a Docker image and register it in the image registry.
8. Deploy the image as a web service in Azure.
9. Monitor the deployed web service API for drift.
10. Trigger an ML model retraining event if required.
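Steps 4 and 5 amount to comparing logged metrics across runs and registering a model only when one run is good enough. The following is a minimal sketch that uses plain dictionaries in place of Azure ML run objects; the run records, the "accuracy" metric name and the 0.9 threshold are hypothetical, and the actual registration call from the Azure ML SDK is deliberately left out.

```python
from typing import Dict, List, Optional


def best_run(runs: List[Dict], metric: str) -> Optional[Dict]:
    """Return the run with the highest value for metric, or None if no run logged it."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        return None
    return max(scored, key=lambda r: r["metrics"][metric])


def ready_to_register(runs: List[Dict], metric: str = "accuracy",
                      minimum: float = 0.9) -> Optional[Dict]:
    """Return the best run if it meets the minimum, else None (iterate on train.py)."""
    candidate = best_run(runs, metric)
    if candidate is not None and candidate["metrics"][metric] >= minimum:
        return candidate
    return None


# Hypothetical records, standing in for metrics queried from an experiment.
RUNS = [
    {"run_id": "run-1", "metrics": {"accuracy": 0.81}},
    {"run_id": "run-2", "metrics": {"accuracy": 0.93}},
    {"run_id": "run-3", "metrics": {}},  # failed run, nothing logged
]

if __name__ == "__main__":
    chosen = ready_to_register(RUNS)
    if chosen is None:
        print("No satisfactory run yet: loop back to step 1.")
    else:
        print("Register the model persisted by", chosen["run_id"])
```

Keeping the promotion rule in a small pure function makes it easy to reuse as a gate in a release pipeline.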
https://docs.microsoft.com/en-us/azure/machine-learning/service/machine-learning-interpretability-explainability
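For step 6, an Azure ML scoring script conventionally exposes two functions: init(), called once when the container starts, and run(raw_data), called for each request. The sketch below shows only that shape. The ThresholdModel class is a hypothetical stand-in for a model loaded from the registry, and the {"data": [...]} request format is an assumption made for illustration.

```python
import json

# Populated by init(); shared across requests.
model = None


class ThresholdModel:
    """Hypothetical stand-in for a trained model loaded from the model registry."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, rows):
        # Classify each row as 1 when the sum of its features exceeds the threshold.
        return [1 if sum(row) > self.threshold else 0 for row in rows]


def init():
    """Called once at service start: load the model into memory."""
    global model
    # A real score.py would load the registered model file here instead.
    model = ThresholdModel(threshold=1.0)


def run(raw_data: str):
    """Called per request: parse the JSON payload and return predictions."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"predictions": model.predict(rows)}
    except Exception as exc:
        # Return the error to the caller rather than crashing the service.
        return {"error": str(exc)}


if __name__ == "__main__":
    init()
    print(run(json.dumps({"data": [[0.2, 0.3], [0.9, 0.8]]})))
```

Because init() and run() are ordinary functions, the same file can be exercised locally before it is packaged into the Docker image in step 7.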
Appendix – References
https://www.youtube.com/watch?v=nst3UAGpiBA
https://github.com/microsoft/MLOpsPython
https://docs.microsoft.com/en-au/azure/machine-learning/service/concept-model-management-and-deployment
https://mrfoxsql.wordpress.com/2019/06/11/machine-learning-devops-ml-devops-together-at-last/
https://docs.microsoft.com/en-us/azure/machine-learning/service/machine-learning-interpretability-explainability
https://azure.microsoft.com/en-us/services/machine-learning-service/
https://en.wikipedia.org/wiki/DevOps
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/team-data-science-process-for-devops
Azure Machine Learning
Enable collaboration between data scientists and data engineers with an interactive productive workspace
Prepare and clean data at massive scale with the language of your choice
Build and train models with pre-configured machine learning and deep learning optimized clusters
Track experiments for reproducibility and auditing needs
Identify and promote best performing models into production
Deploy and manage your models using containers to run them anywhere
https://azure.microsoft.com/en-us/services/databricks/
https://azure.microsoft.com/en-us/solutions/devops/?v=18.44