Machine Learning Operations
Databricks Academy
January 2025
©2024 Databricks Inc. — All rights reserved
Course Learning Objectives
● Explain modern machine learning operations within the frameworks of
DataOps, DevOps, and ModelOps.
● Relate MLOps activities to the features and tools available in Databricks,
and explore their practical applications in the machine learning lifecycle.
● Design and implement basic machine learning operations, including
setting up and executing a machine learning project on Databricks,
following best practices and recommended tools.
● Detail the implementation and monitoring capabilities of MLOps solutions
on Databricks.
Course agenda: MLOps on Databricks (15 min), Monitoring in Machine Learning (25 min), and additional modules of 20, 15, 15, and 30 minutes.
Defining MLOps

Business Problem → Define Success Criteria → Data Collection → Data Preprocessing / Feature Engineering → Model Training
DataOps

Data Preparation → EDA → Model Development → Model Validation → Model Serving → Data and Model Monitoring

Supporting tooling:
● Data Processing Solution: scalable, efficient, and performant data processing.
● CI/CD: code management, version control, and automatic testing.
● Workflows: DAG-based orchestration and job scheduling.
● Data Governance Solution: unified security, governance, and cataloging.
● Data Storage and Management Solution: unified data storage for reliability, quality, and sharing.
Data Preparation → EDA → Model Development → Model Validation → Model Serving → Data and Model Monitoring

The same stages, mapped to Databricks tooling:
● Apache Spark and Photon: scalable, efficient, and performant data processing.
● Repos: code management, version control, and automatic testing.
● Workflow Orchestration: DAG-based orchestration and job scheduling.
● Unity Catalog: unified security, governance, and cataloging.
● Delta Lake: unified data storage for reliability, quality, and sharing.
MLOps on Databricks

● Mosaic AI (Data Science & AI): create, tune, and serve custom LLMs.
● Delta Live Tables (ETL & Real-Time Analytics): automated data quality.
● Workflows (Orchestration): a fully-managed, cloud-based, general-purpose task orchestrator that runs on any cloud, with job cost optimized based on past runs.
● Databricks SQL (Data Warehousing): the freshest data with data warehouse performance; uses generative AI to understand the semantics of your data (Text-to-SQL).
Jobs consist of one or more Tasks. Supported task types include Databricks Notebooks, Python Scripts, Python Wheels, SQL Files/Queries, DBSQL Dashboards, Delta Live Tables Pipelines, dbt, Java JAR files, and Spark Submit (some task types are in Private Preview).

Jobs support different Triggers: Manual, Scheduled (Cron), API Trigger, File Arrival, Delta Table Update, and Continuous (Streaming). A sketch using the Python SDK follows.
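As an illustration, a minimal sketch of defining a multi-task job with a Scheduled (Cron) trigger through the Databricks Python SDK; the job name, notebook paths, and cluster ID are hypothetical placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads credentials from the environment or .databrickscfg

# Two notebook tasks: feature engineering, then model training
created_job = w.jobs.create(
    name="ml-pipeline-demo",
    tasks=[
        jobs.Task(
            task_key="featurize",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/demo/featurize"),
            existing_cluster_id="1234-567890-abcde123",  # placeholder cluster ID
        ),
        jobs.Task(
            task_key="train",
            depends_on=[jobs.TaskDependency(task_key="featurize")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/demo/train"),
            existing_cluster_id="1234-567890-abcde123",
        ),
    ],
    # Scheduled (Cron) trigger: run daily at 06:00 UTC
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",
        timezone_id="UTC",
    ),
)
print(f"Created job {created_job.job_id}")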
Key Features:
● Seamless Git integration.
● Collaborative coding environment.
● Simplified version control.

Automation tooling:
● Infrastructure as Code for Databricks resources and workflows.
● An easy-to-use interface for automation from the terminal, command prompt, or bash scripts.
● SDKs for Python, Java, Go, and R.
● A flexible tool to manage your Databricks workspaces and the associated cloud infrastructure.
• Databricks Asset Bundles, or DABs, are a collection of Databricks artifacts (e.g. jobs, ML
models, DLT pipelines, and clusters) and assets (e.g. Python files, notebooks, SQL queries,
and dashboards).
• These DABs (aka bundles) are configured through YAML files and can be co-versioned in
the same repository as the assets and artifacts referenced in the bundle.
• Using the Databricks CLI, these bundles can be materialized across multiple workspaces,
such as dev, staging, and production, enabling customers to integrate them into their
automation and CI/CD processes. A configuration sketch follows this list.
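For illustration, a minimal sketch of what a databricks.yml bundle configuration might look like; the bundle name, job definition, and workspace hosts are hypothetical placeholders:

# databricks.yml — hypothetical minimal bundle configuration
bundle:
  name: ml-pipeline-demo

resources:
  jobs:
    train_model:
      name: train-model
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ./notebooks/train.py

targets:
  dev:
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com   # placeholder
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # placeholder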
Setting Up and Managing Workflow Jobs Using the UI

Creating and Managing Workflow Jobs Using the UI
Lab Outline
Opinionated MLOps Principles

Recommended MLOps Architectures
Simplicity: when ML projects are well architected, the downstream management, maintenance, and monitoring of the project is simplified.
Efficiency: when ML projects are well architected, processes around the project become more efficient.
Scalability: when ML projects are well architected, they can easily be scaled to adapt to changing requirements for infrastructure and compute.
Collaboration: when ML projects are well architected, it's easy for different users and different types of users to collaborate effectively.
Infrastructure Workflow

Code Management: a single project code repository to be used throughout all environments.
Data/Artifact Management: a single data/artifact management solution with access to environment-specific catalogs (sketched below).
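To make "environment-specific catalogs" concrete, a hedged sketch using MLflow with Unity Catalog as the model registry; the catalog, schema, and model names (dev_catalog, prod_catalog, ml_models, churn_model) are hypothetical:

import mlflow

# Use Unity Catalog as the MLflow model registry
mlflow.set_registry_uri("databricks-uc")

# The same project code registers to an environment-specific catalog
env = "dev"  # switched to "prod" by the deployment pipeline
catalog = {"dev": "dev_catalog", "prod": "prod_catalog"}[env]

mlflow.register_model(
    model_uri="runs:/<run_id>/model",         # placeholder run ID
    name=f"{catalog}.ml_models.churn_model",  # three-level UC name: catalog.schema.model
)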
Infrastructure Setup (Optional*): organization and setup of infrastructure for a machine learning project.
1. Development: developing the EDA and ML pipelines of the ML solution.
2–3. Staging: establishing the automated testing setup for the ML solution.
4. Production: setting up the deployment and monitoring of the production-grade ML solution.
Staging: establishing the automated testing setup for the ML solution.
• Who: ML Engineer
• How often:
  • Set up and run every time a change is made in Development
  • Run every time a model is refreshed
• What:
  • Project merge
  • Project code testing
• How:
Production: setting up the deployment and monitoring of the production-grade ML solution.
• Who: ML Engineer
• How often:
  • When changes are made and tests are passed
  • When the model needs to be refreshed
• What:
  • Automated run/deployment of the solution
  • Monitoring of the solution
• How:
  • Centralized Git repository
  • MLOps Stacks project
Machine Learning Pipeline Workflow with Databricks SDK
Demo Outline
● Pipeline Configuration
○ Define and initialize a JSON payload for pipeline tasks (a sketch follows this outline).
● Notifications on Completion
○ Set up email notifications for workflow completion.
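A hedged sketch of what such a payload might look like for the Jobs API, combining both outline items; the job name, notebook path, cluster ID, and email address are hypothetical placeholders:

# Hypothetical Jobs API payload: one notebook task plus completion emails
job_payload = {
    "name": "ml-pipeline-demo",
    "tasks": [
        {
            "task_key": "train",
            "notebook_task": {"notebook_path": "/Workspace/demo/train"},
            "existing_cluster_id": "1234-567890-abcde123",  # placeholder
        }
    ],
    # Notifications on completion
    "email_notifications": {
        "on_success": ["ml-team@example.com"],
        "on_failure": ["ml-team@example.com"],
    },
}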
Deploying Models with Jobs and the Databricks CLI

Implementation of MLOps Stacks
Types of Model Monitoring

Note: The scenario does overlap with concept drift.
ML Model Retraining Triggers
● Scheduled Retraining:
○ Databricks recommends starting with scheduled, periodic retraining and moving to triggered
retraining when needed.
● Data Changes:
○ Changes in the data can explicitly trigger a retraining job, or retraining can be automated
when data drift is detected.
● Model Code Changes:
○ Retraining can be triggered by changes in the model code, often due to concept drift or other
factors that necessitate an update in the model.
● Model Configuration Changes:
○ Alterations in the model configuration can also initiate a retraining job.
● Monitoring and Alerts:
○ Jobs can monitor data and model drift, and Databricks SQL dashboards can display status and
send alerts (a trigger sketch follows this list).
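As a hedged illustration of triggered retraining, a sketch that launches an existing retraining job when a drift metric crosses a threshold; the drift computation, job ID, and threshold are hypothetical placeholders:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

RETRAIN_JOB_ID = 123456789  # placeholder: ID of an existing retraining job
DRIFT_THRESHOLD = 0.2       # placeholder threshold

def compute_drift_metric() -> float:
    # Hypothetical helper: in practice, read a drift metric from
    # Lakehouse Monitoring's metric tables or compute one (e.g. PSI)
    return 0.35  # dummy value for illustration

# Triggered retraining: kick off the job only when drift is detected
if compute_drift_metric() > DRIFT_THRESHOLD:
    w.jobs.run_now(job_id=RETRAIN_JOB_ID)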
Implementation and Monitoring of an MLOps Solution
LECTURE
Monitoring in Machine Learning

Why? Monitoring is used to help diagnose issues before they become severe or costly.

MLOps sits at the intersection of DataOps, DevOps, and ModelOps.
Monitor all of your data and AI assets, including a machine learning model's fairness & bias.

Python API:

from databricks import lakehouse_monitoring as lm

# Refresh monitoring metrics
lm.run_refresh("my_table")
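For context, a hedged sketch of creating an inference monitor before refreshing it; the table, schema, and column names are hypothetical, and exact arguments may vary by release:

from databricks import lakehouse_monitoring as lm

# Monitor an inference table that logs model inputs and predictions
lm.create_monitor(
    table_name="main.ml.inference_logs",  # hypothetical Unity Catalog table
    profile_type=lm.InferenceLog(
        timestamp_col="ts",
        model_id_col="model_version",
        prediction_col="prediction",
        label_col="label",                # optional; enables quality metrics
        problem_type="classification",
        granularities=["1 day"],
    ),
    output_schema_name="main.ml",         # where metric tables are written
)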
Simple Operations: Databricks managed compute eliminates infrastructure management and scaling complexity.

Lakehouse Monitoring Dashboard

Model Monitoring