Machine Learning Operations
Databricks Academy
January 2025
©2024 Databricks Inc. — All rights reserved
Course Learning Objectives
● Explain modern machine learning operations within the frameworks of
DataOps, DevOps, and ModelOps.
● Relate MLOps activities to the features and tools available in Databricks,
and explore their practical applications in the machine learning lifecycle.
● Design and implement basic machine learning operations, including
setting up and executing a machine learning project on Databricks,
following best practices and recommended tools.
● Detail the implementation and monitoring capabilities of MLOps solutions
on Databricks.
Course agenda: MLOps on Databricks (15 min), Monitoring in Machine Learning (25 min), and additional modules of 20, 15, 15, and 30 minutes.
Defining MLOps

Business Problem → Define Success Criteria → Data Collection → Data Preprocessing / Feature Engineering → Model Training
DataOps

Data Preparation → EDA → Model Development → Model Validation → Model Serving → Data and Model Monitoring

Supporting tooling:
● Data Processing Solution: scalable, efficient, and performant data processing.
● CI/CD: code management, version control, and automatic testing.
● Workflows: DAG-based orchestration and job scheduling.
● Data Governance Solution: unified security, governance, and cataloging.
● Data Storage and Management Solution: unified data storage for reliability, quality, and sharing.
Data Preparation → EDA → Model Development → Model Validation → Model Serving → Data and Model Monitoring

The same stages, mapped to Databricks tooling:
● Apache Spark and Photon: scalable, efficient, and performant data processing.
● Repos: code management, version control, and automatic testing.
● Workflow Orchestration: DAG-based orchestration and job scheduling.
● Unity Catalog: unified security, governance, and cataloging.
● Delta Lake: unified data storage for reliability, quality, and sharing.
MLOps on Databricks

● Mosaic AI (Data Science & AI): create, tune, and serve custom LLMs.
● Delta Live Tables (ETL & Real-Time Analytics): automated data quality.
● Workflows (Orchestration): a fully-managed, cloud-based, general-purpose task orchestrator that runs on any cloud, with job cost optimized based on past runs.
● Databricks SQL (Data Warehousing): the freshest data with data warehouse performance; uses generative AI to understand the semantics of your data (Text-to-SQL).
Jobs consist of one or more Tasks. Supported task types include Databricks Notebooks, Python Scripts, Python Wheels, SQL Files/Queries, DBSQL Dashboards, Delta Live Tables Pipelines, dbt, Java JAR files, and Spark Submit (some task types are in Private Preview).

Jobs support different Triggers: Manual, Scheduled (Cron), API Trigger, File Arrival, Delta Table Update, and Continuous (Streaming). A sketch using the Python SDK follows.
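As an illustration, a minimal sketch of defining a multi-task job with a Scheduled (Cron) trigger through the Databricks Python SDK; the job name, notebook paths, and cluster ID are hypothetical placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads credentials from the environment or .databrickscfg

# Two notebook tasks: feature engineering, then model training
created_job = w.jobs.create(
    name="ml-pipeline-demo",
    tasks=[
        jobs.Task(
            task_key="featurize",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/demo/featurize"),
            existing_cluster_id="1234-567890-abcde123",  # placeholder cluster ID
        ),
        jobs.Task(
            task_key="train",
            depends_on=[jobs.TaskDependency(task_key="featurize")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/demo/train"),
            existing_cluster_id="1234-567890-abcde123",
        ),
    ],
    # Scheduled (Cron) trigger: run daily at 06:00 UTC
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",
        timezone_id="UTC",
    ),
)
print(f"Created job {created_job.job_id}")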
Key Features:
● Seamless Git integration.
● Collaborative coding environment.
● Simplified version control.

Automation tooling:
● Infrastructure as Code for Databricks resources and workflows.
● An easy-to-use interface for automation from the terminal, command prompt, or bash scripts.
● SDKs for Python, Java, Go, and R.
● A flexible tool to manage your Databricks workspaces and the associated cloud infrastructure.
• Databricks Asset Bundles, or DABs, are a collection of Databricks artifacts (e.g. jobs, ML
models, DLT pipelines, and clusters) and assets (e.g. Python files, notebooks, SQL queries,
and dashboards).
• These DABs (aka bundles) are configured through YAML files and can be co-versioned in
the same repository as the assets and artifacts referenced in the bundle.
• Using the Databricks CLI, these bundles can be materialized across multiple workspaces,
such as dev, staging, and production, enabling customers to integrate them into their
automation and CI/CD processes. A configuration sketch follows this list.
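For illustration, a minimal sketch of what a databricks.yml bundle configuration might look like; the bundle name, job definition, and workspace hosts are hypothetical placeholders:

# databricks.yml — hypothetical minimal bundle configuration
bundle:
  name: ml-pipeline-demo

resources:
  jobs:
    train_model:
      name: train-model
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ./notebooks/train.py

targets:
  dev:
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com   # placeholder
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # placeholder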
Setting Up and Managing Workflow Jobs Using the UI

Creating and Managing Workflow Jobs Using the UI
Lab Outline
Opinionated MLOps Principles

Recommended MLOps Architectures
Simplicity: when ML projects are well architected, the downstream management, maintenance, and monitoring of the project is simplified.
Efficiency: when ML projects are well architected, processes around the project become more efficient.
Scalability: when ML projects are well architected, they can easily be scaled to adapt to changing requirements for infrastructure and compute.
Collaboration: when ML projects are well architected, it's easy for different users and different types of users to collaborate effectively.
Infrastructure Workflow

Code Management: a single project code repository to be used throughout all environments.
Data/Artifact Management: a single data/artifact management solution with access to environment-specific catalogs (sketched below).
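To make "environment-specific catalogs" concrete, a hedged sketch using MLflow with Unity Catalog as the model registry; the catalog, schema, and model names (dev_catalog, prod_catalog, ml_models, churn_model) are hypothetical:

import mlflow

# Use Unity Catalog as the MLflow model registry
mlflow.set_registry_uri("databricks-uc")

# The same project code registers to an environment-specific catalog
env = "dev"  # switched to "prod" by the deployment pipeline
catalog = {"dev": "dev_catalog", "prod": "prod_catalog"}[env]

mlflow.register_model(
    model_uri="runs:/<run_id>/model",         # placeholder run ID
    name=f"{catalog}.ml_models.churn_model",  # three-level UC name: catalog.schema.model
)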
Infrastructure Setup (Optional*): organization and setup of infrastructure for a machine learning project.
1. Development: developing the EDA and ML pipelines of the ML solution.
2–3. Staging: establishing the automated testing setup for the ML solution.
4. Production: setting up the deployment and monitoring of the production-grade ML solution.
Staging: establishing the automated testing setup for the ML solution.
• Who: ML Engineer
• How often:
  • Set up and run every time a change is made in Development
  • Run every time a model is refreshed
• What:
  • Project merge
  • Project code testing
• How:
Production: setting up the deployment and monitoring of the production-grade ML solution.
• Who: ML Engineer
• How often:
  • When changes are made and tests are passed
  • When the model needs to be refreshed
• What:
  • Automated run/deployment of the solution
  • Monitoring of the solution
• How:
  • Centralized Git repository
  • MLOps Stacks project
Machine Learning Pipeline Workflow with Databricks SDK
Demo Outline
● Pipeline Configuration
○ Define and initialize a JSON payload for pipeline tasks (a sketch follows this outline).
● Notifications on Completion
○ Set up email notifications for workflow completion.
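A hedged sketch of what such a payload might look like for the Jobs API, combining both outline items; the job name, notebook path, cluster ID, and email address are hypothetical placeholders:

# Hypothetical Jobs API payload: one notebook task plus completion emails
job_payload = {
    "name": "ml-pipeline-demo",
    "tasks": [
        {
            "task_key": "train",
            "notebook_task": {"notebook_path": "/Workspace/demo/train"},
            "existing_cluster_id": "1234-567890-abcde123",  # placeholder
        }
    ],
    # Notifications on completion
    "email_notifications": {
        "on_success": ["ml-team@example.com"],
        "on_failure": ["ml-team@example.com"],
    },
}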
Deploying Models with Jobs and the Databricks CLI

Implementation of MLOps Stacks
Types of Model Monitoring

Note: The scenario does overlap with concept drift.
ML Model Retraining Triggers
● Scheduled Retraining:
○ Databricks recommends starting with scheduled, periodic retraining and moving to triggered
retraining when needed.
● Data Changes:
○ Changes in the data can explicitly trigger a retraining job, or retraining can be automated
when data drift is detected.
● Model Code Changes:
○ Retraining can be triggered by changes in the model code, often due to concept drift or other
factors that necessitate an update in the model.
● Model Configuration Changes:
○ Alterations in the model configuration can also initiate a retraining job.
● Monitoring and Alerts:
○ Jobs can monitor data and model drift, and Databricks SQL dashboards can display status and
send alerts (a trigger sketch follows this list).
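As a hedged illustration of triggered retraining, a sketch that launches an existing retraining job when a drift metric crosses a threshold; the drift computation, job ID, and threshold are hypothetical placeholders:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

RETRAIN_JOB_ID = 123456789  # placeholder: ID of an existing retraining job
DRIFT_THRESHOLD = 0.2       # placeholder threshold

def compute_drift_metric() -> float:
    # Hypothetical helper: in practice, read a drift metric from
    # Lakehouse Monitoring's metric tables or compute one (e.g. PSI)
    return 0.35  # dummy value for illustration

# Triggered retraining: kick off the job only when drift is detected
if compute_drift_metric() > DRIFT_THRESHOLD:
    w.jobs.run_now(job_id=RETRAIN_JOB_ID)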
Implementation and Monitoring of an MLOps Solution
LECTURE
Monitoring in Machine Learning

Why? Monitoring is used to help diagnose issues before they become severe or costly.

MLOps sits at the intersection of DataOps, DevOps, and ModelOps.
Monitor all of your data and AI assets, including a machine learning model's fairness & bias.

Python API:

from databricks import lakehouse_monitoring as lm

# Refresh monitoring metrics
lm.run_refresh("my_table")
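For context, a hedged sketch of creating an inference monitor before refreshing it; the table, schema, and column names are hypothetical, and exact arguments may vary by release:

from databricks import lakehouse_monitoring as lm

# Monitor an inference table that logs model inputs and predictions
lm.create_monitor(
    table_name="main.ml.inference_logs",  # hypothetical Unity Catalog table
    profile_type=lm.InferenceLog(
        timestamp_col="ts",
        model_id_col="model_version",
        prediction_col="prediction",
        label_col="label",                # optional; enables quality metrics
        problem_type="classification",
        granularities=["1 day"],
    ),
    output_schema_name="main.ml",         # where metric tables are written
)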
Simple Operations: Databricks managed compute eliminates infrastructure management and scaling complexity.

Lakehouse Monitoring Dashboard

Model Monitoring