
Machine Learning

Operations

Databricks Academy
January, 2025
©2024 Databricks Inc. — All rights reserved
Course Learning Objectives
● Explain modern machine learning operations within the frameworks of
DataOps, DevOps, and ModelOps.
● Relate MLOps activities to the features and tools available in Databricks,
and explore their practical applications in the machine learning lifecycle.
● Design and implement basic machine learning operations, including
setting up and executing a machine learning project on Databricks,
following best practices and recommended tools.
● Detail Implementation and Monitoring capabilities of MLOps solutions
on Databricks.



Prerequisites/Technical Considerations
Things to keep in mind before you work through this course

Prerequisites:
1. Basic knowledge of traditional machine learning concepts
2. Beginner experience with traditional machine learning development on Databricks
3. Intermediate knowledge of Python for machine learning projects
4. Recommended: Beginner experience with basic DevOps concepts like CI/CD

Technical Considerations:
1. A cluster running on DBR ML 15.4+
2. Unity Catalog, Model Serving, and Lakehouse Monitoring enabled on the workspace
3. CLI authentication



AGENDA

01. Modern MLOps
- Defining MLOps (20 min): Lecture
- MLOps on Databricks (15 min): Lecture, Demo, Lab

02. Architecting MLOps Solutions
- Opinionated MLOps Principles (15 min): Lecture
- Recommended MLOps Architectures (15 min): Lecture, Demo, Lab

03. Implementation and Monitoring MLOps Solution
- MLOps Stacks Overview (10 min): Lecture
- Types of Model Monitoring (15 min): Lecture
- Monitoring in Machine Learning (25 min): Lecture, Demo, Lab


Modern MLOps

Machine Learning Operations



Learning Objectives
● Explain the significance of MLOps by integrating DataOps, DevOps, and
ModelOps in modern machine learning.
● Identify and understand the components of DataOps, DevOps, and
ModelOps within the context of machine learning.
● Describe Databricks' capabilities for handling tasks related to DataOps,
DevOps, and ModelOps.
● Relate Databricks features and services to practical applications in
DataOps, DevOps, and ModelOps tasks.



Modern MLOps
LECTURE

Defining MLOps



The Machine Learning Full Lifecycle
End to End Process from Business Problem to Deployment and Monitoring

Model Development (uses static historical data):
Business Problem → Define Success Criteria → Data Collection → Data Preprocessing/Feature Engineering → Model Training → Model Evaluation

Deployment & Production (deals with continuously changing new data):
Model Deployment → Model Monitoring


Defining MLOps
An all-inclusive, holistic approach to managing ML systems

The set of practices, processes, and technologies for managing data, code, and models to improve performance, stability, and long-term efficiency in ML systems.


Understanding the Components of MLOps
MLOps components each address a specific part of ML projects

DataOps: A set of practices, processes, and technologies to organize and improve processes around data to increase speed, governance, quality, and collaboration.

DevOps: A set of practices, processes, and technologies to integrate and automate software development workflows.

ModelOps: A set of practices, processes, and technologies to organize and govern the lifecycle of machine learning and artificial intelligence models.


Responsibilities of MLOps Components
Key Functions of DataOps, DevOps, and ModelOps in MLOps

DataOps:
• Optimized data processing
• Centralized data discovery, management, and governance
• Ensured data quality
• Traceable data lineage and monitoring

DevOps:
• Machine learning is code
• Continuous integration and continuous deployment (CI/CD)
• Version control via Git
• Production-grade workflows
• Orchestration
• Automation

ModelOps:
• Move beyond models as objects
• Treating model code as software
• Treating models as data
• Manage the model lifecycle


Comprehensive MLOps
Operationalizing the entire machine learning solution

MLOps combines DataOps, DevOps, and ModelOps: the set of processes and automation for managing data, code, and models to improve performance, stability, and long-term efficiency in ML systems.


A Simple Example ML Project
Retail Recommendation System

1. A business owner defines a problem to be solved with a


recommendation service.
2. A data scientist begins exploring governed data associated with the
service.
3. A data scientist develops a scalable ML solution using relevant data while
tracking the experiment.
4. A machine learning engineer implements CI/CD, automates the ML
solution, and establishes model performance monitoring; the data is
written to a production catalog.



Why does MLOps matter?
Success depends on quality data and operations practices

• Defining an effective strategy
• ML systems built on quality data
• Streamlining the process of taking solutions to production
• Operationalizing performance and effectiveness monitoring
• So what?
  • Time to realizing business value is accelerated
  • Reduction in manual oversight by high-value data science teams

Real-world Example: Databricks customer CareSource accelerated their model development and deployment, resulting in a self-service MLOps solution for data scientists that reduced ML project time from 8 weeks to 3-4 weeks. The CareSource team can extend this approach to other machine learning projects, realizing this benefit broadly. Learn more about the work here.


DataOps, DevOps, and ModelOps in MLOps
Effective machine learning involves managing data, code, and the model
lifecycle to maintain and improve performance.
[Diagram] Model lifecycle management pipeline: Data Preparation → EDA → Model Development → Model Validation → Model Serving → Data and Model Monitoring, connected by CI/CD workflows.

Supporting tooling layers:
• Data Processing Solution: scalable, efficient, and performant data processing
• Code management, version control, and automatic testing
• DAG-based orchestration, job scheduling
• Data Governance Solution: unified security, governance, and cataloging
• Data Storage and Management Solution: unified data storage for reliability, quality, and sharing
• Lakehouse Data Architecture and Storage


A Single Platform for Modern MLOps
Combining DataOps, DevOps, and ModelOps solutions

[Diagram] The same pipeline (Data Preparation → EDA → Model Development → Model Validation → Model Serving → Model Monitoring) mapped to Databricks products:
• Model Registry, Model Serving, and Lakehouse Monitoring: model lifecycle management
• Apache Spark and Photon: scalable, efficient, and performant data processing
• Repos: code management, version control, and automatic testing
• Workflow Orchestration: DAG-based orchestration, job scheduling
• Unity Catalog: unified security, governance, and cataloging
• Delta Lake: unified data storage for reliability, quality, and sharing
• Lakehouse Data Architecture and Storage


Modern MLOps
LECTURE

MLOps on
Databricks



DataOps Tasks and Tools in Databricks
The table lists common DataOps tasks and tools in Databricks:

• Ingest & transform data: Auto Loader and Apache Spark*
• Track data changes, including versioning & lineage: Delta tables*
• Build, manage, & monitor data processing pipelines: Delta Live Tables*
• Ensure data security & governance: Unity Catalog*
• Exploratory data analysis, dashboards, & general coding: Databricks SQL, Dashboards, and Databricks notebooks
• Schedule data pipelines & automate general workflows: Databricks Workflows
• Create, store, manage, & discover features: Databricks Feature Store*
• Data monitoring: Lakehouse Monitoring**

* Discussed in other associated machine learning courses.
** Covered later in the training.
Databricks SQL

Delivering analytics on the freshest data with data warehouse performance and data lake economics:
■ Better price/performance than other cloud data warehouses
■ Simplify discovery and sharing of new insights
■ Connect to familiar BI tools, like Tableau or Power BI
■ Simplified administration and governance

[Diagram] The Data Intelligence Platform: Mosaic AI (Data Science & AI), Delta Live Tables (ETL & Real-time Analytics), Workflows (Orchestration), and Databricks SQL (Data Warehousing) on top of the Data Intelligence Engine, Unity Catalog, Delta Lake, and an open data lake holding all raw data (logs, texts, audio, video, images).


A new home for Data Analysts

Enable data analysts to quickly perform ad-hoc and exploratory data analysis with a new SQL query editor, visualizations, and dashboards. Automatic alerts can be triggered for critical changes, allowing teams to respond to business needs faster.


Databricks Workflows

Workflows is a fully-managed, cloud-based, general-purpose task orchestration service for the entire Lakehouse.

Workflows is a service for data engineers, data scientists, and analysts to build reliable data, analytics, and AI workflows on any cloud.

[Diagram] Workflows in the Data Intelligence Platform, alongside Mosaic AI, Delta Live Tables, and Databricks SQL, with capabilities such as automated data quality and job cost optimization based on past runs, on top of Unity Catalog, Delta Lake, and the open data lake.


Databricks Workflows
Databricks has two main task orchestration services: Workflow Jobs and Delta Live Tables.

Workflow Jobs:
• Execute jobs on a predefined schedule as a series of interrelated tasks
• Perform machine learning operations by running tasks within job frameworks like MLflow
• Implement a variety of tasks within a job using notebooks, JARs, Delta Live Tables pipelines, or Python, Scala, Spark Submit, SQL, and Java applications
• Orchestrate dependent jobs, machine learning tasks, arbitrary code, external API calls, and custom tasks

Delta Live Tables:
• ETL processes
• Compatible with batch and streaming inputs
• Enforced data quality and consistency
• Tracking & logging of data ingestion and transformation

Note: A DLT pipeline can be a task in a workflow.


Workflows Jobs
Key Features

Workflow Job:
• Easy creation, scheduling, and orchestration of your code with a DAG (Directed Acyclic Graph) of tasks
• Key features:
  • Simplicity: easy creation and monitoring in the UI
  • Many task types suited to your workload
  • Fully integrated in the Databricks platform, making inspecting results and debugging faster
  • Reliability of the proven Databricks scheduler
  • Observability to easily monitor status



Building Blocks of Databricks Workflows Job
A unit of orchestration in Databricks Workflows is called a Job.

Jobs consist of one or more Tasks: Databricks Notebooks, Python Scripts, Python Wheels, SQL Files/Queries, DBSQL Dashboards, Delta Live Tables Pipeline, dbt, Java JAR file, and Spark Submit.

Control flows can be established between Tasks: Sequential, Parallel, Conditionals (Run If), and Jobs as a Task (Modular; Private Preview).

Jobs support different Triggers: Manual Trigger, Scheduled (Cron), API Trigger, File Arrival, Delta Table Update, and Continuous (Streaming).
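The building blocks above map directly onto the JSON job specification that the Jobs API and CLI consume. The following sketch shows a three-task job with a sequential dependency, a Run If condition, and a cron trigger; the job name, notebook paths, and schedule are illustrative placeholders, not values from the course.

```python
import json

# Hedged sketch of a multi-task job specification as accepted by the Jobs API
# (POST /api/2.1/jobs/create). Job name, notebook paths, and the cron schedule
# are illustrative placeholders.
job_spec = {
    "name": "nightly-training-job",
    "tasks": [
        {
            "task_key": "prepare_data",
            "notebook_task": {"notebook_path": "/Workspace/project/01_prepare_data"},
        },
        {
            # Sequential control flow: runs only after prepare_data.
            "task_key": "train_model",
            "depends_on": [{"task_key": "prepare_data"}],
            "notebook_task": {"notebook_path": "/Workspace/project/02_train_model"},
        },
        {
            # Conditional ("Run If") control flow on the upstream results.
            "task_key": "notify_on_success",
            "depends_on": [{"task_key": "train_model"}],
            "run_if": "ALL_SUCCESS",
            "notebook_task": {"notebook_path": "/Workspace/project/03_notify"},
        },
    ],
    # Scheduled (cron) trigger; file-arrival or continuous triggers use
    # different fields in the same payload.
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

payload = json.dumps(job_spec, indent=2)  # ready for the Jobs API or CLI
```

The same structure is what the Workflows UI produces behind the scenes when you add tasks and dependencies interactively.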


ModelOps Tasks and Tools in Databricks
The table lists common ModelOps tasks and tools provided by Databricks:

• Manage model lifecycle: Models in Unity Catalog*
• Track model development: MLflow model tracking*
• Model code version control and sharing: Databricks Repos
• No-code model development: Databricks AutoML*
• Model monitoring: Lakehouse Monitoring**

* Discussed in other associated machine learning courses.
** Covered later in the training.
Introduction to Databricks Repos
Databricks Repos provide a visual Git client & API within Databricks, allowing
users to manage code repositories, collaborate, and integrate with Git services.

Key Features:
● Seamless Git integration.
● Collaborative coding
environment.
● Simplified version control.



Databricks Repo Setup and Commands
Once set up, easily manage and perform Git operations within Databricks.

● From the Databricks UI


common Git operations:
■ Clone
■ Checkout
■ Commit
■ Push
■ Pull
■ Branch management
● Uses Personal Access Token or
equivalent to authenticate.



DevOps: Production and automation
The table lists common DevOps tasks and tools provided by Databricks:

• Data and model lineage, access control, and governance: Unity Catalog*
• Maintain a highly available, low-latency REST endpoint: Mosaic AI Model Serving*
• Automate and schedule workloads, from ETL to ML: Databricks Workflows (Databricks also supports integrations with popular third-party orchestrators like Airflow)
• Deployment infrastructure for inference and serving: Asset Bundles, Databricks SDKs, Terraform provider, Databricks CLI
• Establish CI/CD pipelines: Databricks Asset Bundles, Azure DevOps, Jenkins, or GitHub Actions
• Monitoring and maintaining your applications: Lakehouse Monitoring**

* Discussed in other associated machine learning courses.
** Covered later in the training.
Overview: Developer Tools

• Databricks Asset Bundles (Recommended): Infrastructure as Code for Databricks resources and workflows
• Databricks CLI: easy-to-use interface for automation from a terminal, command prompt, or bash scripts
• Databricks SDKs: SDKs for Python, Java, Go, and R
• Terraform provider: flexible tool to manage your Databricks workspaces and the associated cloud infrastructure


What is a DAB?

• Databricks Asset Bundles or DABs are a collection of Databricks artifacts 1 (e.g. jobs, ML
models, DLT pipelines, and clusters) and assets 2 (e.g. Python files, notebooks, SQL queries,
and dashboards).
• These DABs (aka bundles) are configured through YAML files and can be co-versioned in
the same repository as the assets and artifacts referenced in the bundle.
• Using the Databricks CLI, these bundles can be materialized across multiple workspaces,
like dev, staging, and production, enabling customers to integrate them into their
automation and CI/CD processes.

1 Artifacts are instantiations of sources that persist state.

2 Assets are file-like resources that exist on a workspace path and carry little or no state.
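As a rough illustration of what such a bundle expresses, here is the structure of a minimal databricks.yml mirrored as a Python dict (the actual bundle file is YAML; the bundle name, job, and workspace hosts are hypothetical placeholders):

```python
# Hedged sketch: the structure a minimal databricks.yml expresses, mirrored
# as a Python dict (the actual bundle file is YAML). The bundle name, job,
# and workspace hosts are hypothetical placeholders.
bundle_config = {
    "bundle": {"name": "recsys_project"},
    "resources": {
        "jobs": {
            "train_job": {
                "name": "recsys-training",
                "tasks": [
                    {
                        "task_key": "train",
                        "notebook_task": {"notebook_path": "./training/train"},
                    }
                ],
            }
        }
    },
    # Targets let the same co-versioned bundle be materialized in dev,
    # staging, and prod, e.g. `databricks bundle deploy -t staging`.
    "targets": {
        "dev": {"default": True, "workspace": {"host": "https://dev.example.databricks.com"}},
        "staging": {"workspace": {"host": "https://staging.example.databricks.com"}},
        "prod": {"workspace": {"host": "https://prod.example.databricks.com"}},
    },
}
```

Because the targets live in the same file as the resources, the whole deployment definition can be versioned alongside the project code, which is the co-versioning point made above.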


Modern MLOps
DEMONSTRATION

Setting Up and
Managing Workflow
Jobs using UI



Demo
Outline

What we’ll cover:


● Create and configure a Databricks Workflow job with multiple
task Notebooks.
● Set dependencies and conditional paths between tasks.
● Enable email notifications for successful job runs.
● Manually trigger the workflow.
● Monitor the job's execution and completion.



Modern MLOps
LAB EXERCISE

Creating and
Managing
Workflow Jobs
using UI
Lab
Outline

What you’ll do:


• Create and Configure a Workflow Job
• Set up multiple tasks using the UI.
• Enable Email Notifications
• Configure notifications for job status updates.
• Manually Trigger Deployment Workflow
• Initiate the job run manually.
• Monitor Job Run
• Observe job execution and monitor the workflow.



Architecting
MLOps Solutions

Machine Learning Operations



Learning objectives
Things you’ll be able to do after completing this module

• Explain the importance of using the right MLOps architecture.


• Explain the reasoning behind the opinionated MLOps principles
informing the recommended architecture of Databricks.
• Describe the Databricks-recommended MLOps architecture
approach.
• Architect basic machine learning operations solutions for traditional
machine learning applications based on Databricks-recommended best practices.
Architecting MLOps Solutions
LECTURE

Opinionated
MLOps Principles



Guiding Principle
A data-centric approach to machine learning

• ML projects are made up of data pipelines
• Operationalizing ML solutions requires the connection of a variety of data pipelines
• Data pipelines require storage, governance, orchestration, etc.
• Aligned to the Data Intelligence Platform vision

[Diagram] Machine learning pipelines (Data Prep → EDA → Dev → Validate → Model Serving → Monitor) running on Apache Spark and Photon, Repos, Workflows, Unity Catalog, Delta Lake, and the Lakehouse data architecture and storage.


Multi-environment Semantics
Defining Development, Staging, and Production environments

• Development: an environment where data scientists can explore, experiment, and develop.
• Staging: an environment where machine learning practitioners can test their solutions.
• Production: an environment where machine learning engineers can deploy and monitor their solutions.


Environment Separation
How many Databricks workspaces do we have?

Direct Separation (dev | staging | prod):
• Completely separate Databricks workspaces for each environment
• Simpler environments
• Scales well to multiple projects

Indirect Separation (dev, staging, prod in one workspace):
• One Databricks workspace with enforced separation
• Simpler overall infrastructure requiring fewer permissions
• More complex individual environment
• Doesn't scale well to multiple projects


Deployment Patterns
Moving from Deploy Model to Deploy Code

Deploy Model:
• Model is trained in the development environment
• Model artifact is moved from staging through production
• Separate process needed for other code (inference, monitoring, operational pipelines)

Deploy Code (recommended):
• Code is developed in the development environment
• Code is tested in the staging environment
• Code is deployed in the production environment
• Training pipeline is run in each environment; the model is deployed in production


Architecting MLOps Solutions
LECTURE

Recommended
MLOps
Architectures



Importance of MLOps Architecture
Setting an ML project up for success starts with architecture

• Simplicity: when ML projects are well architected, the downstream management, maintenance, and monitoring of the project are simplified.
• Efficiency: when ML projects are well architected, processes around the project become more efficient.
• Scalability: when ML projects are well architected, they can easily be scaled to adapt to changing requirements for infrastructure and compute.
• Collaboration: when ML projects are well architected, it's easy for different users and different types of users to collaborate effectively.


Dimensions of Architecture
Differentiating initial organization/setup and ongoing workflows

Infrastructure
The organization, governance, and setup of environments, data, compute, and other resources.
• Set up one time (per project or per team/organization)
• Crucial to downstream success of project(s)

Workflow
The processes that ML practitioners follow within a defined architecture to achieve success on an ML project.
• Repeatable, fluid processes specific to a project
• Aligned to organizational best practices


Recommended MLOps Architecture
A high-level view of code, data, and ML environments

Code Management
A single project code repository to be used throughout all environments.

• Development: a Databricks workspace (or environment) for exploratory data analysis, model training/tracking and validation, deployment for model selection, and monitoring.
• Staging: a Databricks workspace (or environment) for testing the efficacy of the project, including unit tests, integration tests, and performance regression tests.
• Production: a Databricks workspace (or environment) for production ML workflows, scaling, and monitoring.

Data/Artifact Management
A single data/artifact management solution with access to environment-specific catalogs.


MLOps Solution
Infrastructure through production

1. Infrastructure Setup (optional): organization and setup of infrastructure for a machine learning project.
2. Development: developing the EDA and ML pipelines of the ML solution.
3. Staging: establishing the automated testing setup for the ML solution.
4. Production: setting up the deployment and monitoring of the production-grade ML solution.


Infrastructure Setup
Getting set up for a machine learning project

• Who: Architect or Engineer
• How often: set up once
• What:
  • A Unity Catalog metastore
  • One or three Databricks workspaces
  • Three data catalogs: dev, staging, prod
  • Command line environment
  • MLOps Stacks project (per project)
  • Git repository (per project)
• How:
  • Manual setup/Terraform
  • MLOps Stacks


Infrastructure Setup Tasks
Using our DataOps + DevOps + ModelOps mental model

• Make Infrastructure Decisions:
  • Number of workspaces
  • Use existing vs. new infrastructure
  • Select CI/CD tooling
  • Configure Databricks CLI and IDE (e.g., VS Code extension)
• Create Databricks Environment:
  • Unity Catalog metastore
  • 1 or 3 workspaces
  • Unity Catalogs (dev, test, staging, prod)
  • Service principal permissions
  • Network configuration
• Optimize for Efficiency:
  • Project templates with guardrails and best practices configured
  • Git repository creation and connection
• Additional Considerations:
  • Monitoring and logging
  • Backup and recovery
  • Security best practices


Deep Dive: Infrastructure
A closer look at the Infrastructure stage

Development
Developing a machine learning project

• Who: Data Scientist
• How often:
  • Initial solution development
  • Solution updates
• What:
  • EDA
  • ML Development
  • ML Validation and Deployment
  • Monitoring solution
• How:
  • Develop ML pipelines within the project architecture by editing an MLOps Stacks project


Development Tasks
Using our DataOps + DevOps + ModelOps mental model

• Make Changes to Code:
  • Add and update code
  • Use project templates (DAB, MLOps Stacks, etc.)
  • Write notebooks and scripts
  • Create queries and alerts
  • Commit and pull code changes
• Validate Data and Code:
  • Ensure correct setup before deployment
  • Validate using DLT, Asset Bundles, etc.
  • Confirm data format compliance
  • Implement data quality checks
  • Review for security and compliance
• Deploy Solution:
  • Deploy all jobs to ensure successful execution
  • Use CLI, scripts, and automation for consistent deployments across environments
  • Establish rollback strategies
• Additional Considerations:
  • Implement version control
  • Optimize deployment workflow
  • Ensure scalability and performance tuning
  • Set up automated notifications
Deep Dive: Development
A closer look at the Development stage

Staging
Testing a machine learning project

• Who: ML Engineer
• How often:
  • Set up and run every time a change is made in Development
  • Run every time a model is refreshed
• What:
  • Project merge
  • Project code testing
• How:
  • Centralized Git repository
  • Automated CI/CD infrastructure tools


Staging Tasks
Using our DataOps + DevOps + ModelOps mental model

• Review CI/CD Workflows and Tests:
  • Examine existing CI/CD workflows
  • Add/change tests as needed
  • Ensure workflow alignment with the project
• Create Pull Request to Run Tests:
  • Establish a trigger, e.g., a pull request to merge the dev branch into the main branch
• Run Tests:
  • Check for conflicts
  • Validate CI/CD setup
  • Validate the project: unit, integration, and stress tests
• Analyze Test Results:
  • If all tests pass and the change is approved, merge into the main branch
  • If tests fail, return to the development stage
  • Review test reports and logs for insights
• Additional Considerations:
  • Implement automated notifications for test results
  • Monitor staging environment performance
  • Ensure data integrity and consistency during staging
Deep Dive: Staging
A closer look at the Staging stage



Production
Deploying and monitoring a machine learning project

• Who: ML Engineer
• How often:
  • When changes are made and tests are passed
  • When the model needs to be refreshed
• What:
  • Automated run/deployment of the solution
  • Monitoring of the solution
• How:
  • Centralized Git repository
  • MLOps Stacks project


Production Tasks
Using our DataOps + DevOps + ModelOps mental model

• Create/Merge to Release Branch:
  • Set up a release branch
  • Include deployment triggers
  • Merge updates into the release branch
• Deploy Solution:
  • Automatic deployment triggered by release branch updates
  • Deploy components: project code; data, model, and monitoring workflows; compute resources
• Monitor:
  • Track system performance
  • Monitor deployment health
  • Real-time alerting and notifications
  • Log analysis for error detection
  • Data and model drift detection
• Additional Considerations:
  • Optimize resource utilization
  • Ensure compliance with security policies
  • Conduct post-deployment reviews
  • Implement incident response protocols
  • Set up automated retraining pipelines


Deep Dive: Production
A closer look at the Production stage



Complete MLOps Architecture



Architecting MLOps Solutions
DEMONSTRATION

Machine Learning
Pipeline Workflow
with Databricks
SDK
Demo
Outline

What we’ll cover:


● Authentication Setup
○ Configure and authenticate access to the Databricks REST API.

● Pipeline Configuration
○ Define and initialize a JSON payload for pipeline tasks.

● Executing the Pipeline


○ Trigger the workflow using the Databricks REST API.

● Monitoring Task Progress


○ Track job status and task completion using REST API calls.

● Notifications on Completion
○ Set up email notifications for workflow completion.

● Retrieving and Displaying Outputs


○ Access and visualize JSON data and output files from completed tasks.
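The first steps of this demo can be sketched with the standard library alone; the workspace host, token, and job ID below are hypothetical placeholders, and the snippet assumes personal-access-token authentication against the Jobs REST API:

```python
import json
import urllib.request

# Hedged sketch of triggering a Databricks Workflows job via the REST API
# (POST /api/2.1/jobs/run-now). The host, token, and job ID below are
# hypothetical placeholders, not values from the course environment.
def build_run_now_request(host, token, job_id, notebook_params=None):
    """Return a ready-to-send urllib Request for the jobs/run-now endpoint."""
    body = {"job_id": job_id}
    if notebook_params:
        body["notebook_params"] = notebook_params  # surfaced to notebook widgets
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # personal access token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_now_request(
    "https://dbc-example.cloud.databricks.com", "dapi-XXXX", 1234,
    notebook_params={"env": "dev"},
)
# urllib.request.urlopen(req) would submit the run; the JSON response carries
# a run_id that the demo then polls with GET /api/2.1/jobs/runs/get.
```

The Databricks SDK wraps these same endpoints in typed methods, so the demo can use either style interchangeably.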
Architecting MLOps Solutions
DEMONSTRATION

Model Testing Job


with the
Databricks CLI



Demo
Outline

What we’ll cover:


• CLI Basics
• Execute the help command to explore functionalities
• Workflow Job Configuration
• Create a JSON configuration file for the workflow
• Creating and Running a Workflow Job
• Create the job using Databricks CLI
• Extract job ID and run the job
• Monitoring and Exploring Jobs
• Access the job console
• View and explore tasks and run output.
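The configuration step above can be sketched as follows; the job and notebook names are hypothetical placeholders:

```python
import json
import os
import tempfile

# Hedged sketch of the "Workflow Job Configuration" step: writing the JSON
# file that a command like `databricks jobs create --json @job_config.json`
# would consume. The job and notebook names are hypothetical placeholders.
job_config = {
    "name": "model-testing-job",
    "tasks": [
        {
            "task_key": "run_tests",
            "notebook_task": {
                "notebook_path": "/Workspace/project/tests/run_model_tests"
            },
        }
    ],
}

# Write the configuration to a file the CLI can read.
config_path = os.path.join(tempfile.gettempdir(), "job_config.json")
with open(config_path, "w") as f:
    json.dump(job_config, f, indent=2)
```

From there, the remaining demo steps are roughly: create the job from the file, extract the `job_id` from the CLI's JSON response, run it with `databricks jobs run-now`, and inspect the run in the job console.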
Architecting MLOps Solutions
LAB EXERCISE

Deploying Models
with Jobs and the
Databricks CLI



Lab
Outline

What you’ll do:

• Task 1: Identify and update a model's alias to "Champion".


• Task 2: Configure and use the Databricks CLI to manage jobs.
• Task 3: Create and run a workflow job for model deployment and
Batch Inferencing.
• Task 4: Monitor and explore the executing workflow job.



Implementation
and Monitoring
MLOps Solution

Machine Learning Operations



Learning objectives
Things you’ll be able to do after completing this module

• Understand the integration and application of Databricks MLOps Stacks to improve CI/CD practices and infrastructure management for machine learning environments.
• Develop skills to implement effective model monitoring strategies that encompass business requirements, resource utilization, model performance, and traceability.
• Develop expertise in diagnosing model drift types and setting up appropriate retraining triggers to maintain model accuracy and reliability.
• Develop proficiency in employing monitoring techniques that ensure data integrity, trace model performance, and automate alerts leveraging Databricks' Lakehouse Monitoring.
Implementation and Monitoring MLOps Solution
LECTURE

Implementation
of MLOps Stacks



How do we set all of this up?
We recommend using Databricks MLOps Stacks

Databricks MLOps Stacks


Out-of-the-box MLOps tooling:
● CI/CD via GitHub Actions or Azure DevOps
● Infrastructure-as-code with asset bundles and templates
● Orchestration with Workflows

● Eases the implementation and management of MLOps infrastructure and architecture
● Returns your focus to solving business problems
● Aligned to recommended deploy-code architecture best practices
● Current status: Public Preview

Built on existing Databricks infrastructure components like Workflows, MLflow experiments, MLflow models, Feature Store



What does MLOps Stacks actually do?
Creates a repo with a sample project structure for productionizing ML

├── README.md
├── requirements.txt
├── databricks.yml
├── training
├── validation
├── deployment
├── tests
├── .github/.azure
└── resources
    ├── inference.yml
    ├── training.yml
    ├── ml-artifacts.yml
    └── feature-engineering.yml

● training, validation, deployment, tests: project structure for organizing ML code
● .github/.azure: ML-tailored CI/CD for deploying ML systems across multiple environments
● resources/*.yml: infra-as-code for configuring and managing ML resources across multiple environments, including the model registry, training job, batch jobs, feature engineering, monitoring, serving endpoints, etc.

How do we use MLOps Stacks?
Set up and run the project from the command line (Public Preview)

Set up the project:

> databricks bundle init mlops-stacks
> # … answer the prompts

● Initialize the project
● Answer prompts with specific details
● Edit the project code
● Commit/merge the project code

Run the project:

> databricks bundle validate
> databricks bundle deploy -t <env-name>
> databricks bundle run -t <env-name> <job-name>
> databricks bundle destroy -t <env-name>

● Validate the project
● Deploy the project to an environment
● Run the project jobs
● Delete the project when complete
Implementation and Monitoring MLOps Solution
LECTURE

Types of Model
Monitoring



Types of Model Monitoring

Business Requirements
● Ensuring the ML solution aligns with and fulfills specific business objectives.
● Regular assessments to ensure continued relevance to evolving business needs.

Model Performance
● Tracking the accuracy and efficiency of the model over time.
● Detecting and addressing types of drift or degradation.

Resource Utilization
● Ensuring efficient resource utilization within the ML infrastructure.
● Compliance with Service Level Agreements (SLAs) for system performance and availability.

Traceability
● Facilitating audit trails for troubleshooting, regulatory compliance, and model improvement.
● Tracking data lineage to understand the origin, movement, and transformation of data.



Four Types of Drift

Data Drift:
● Occurs when the statistical properties of the input data change over time.
● Can impact the model's quality by introducing inconsistencies in the data patterns.

Concept Drift:
● Happens when the relationship between input features and the target variable changes.
● Forces models to adapt to new patterns to stay relevant.

Model Quality Drift:
● Reflects a decrease in the model's predictive performance over time.
● Can be detected through worsening metrics like accuracy, precision, recall, or F1 score.

Bias Drift:
● Involves shifts in model outcomes that could lead to unfair treatment of certain groups.
● Monitoring for bias drift is crucial to maintain fairness and ethical standards in model predictions.
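A common way to quantify the data drift described above is the Population Stability Index (PSI), which compares binned distributions of training and production data. The sketch below is a minimal, dependency-free version; the bin count and the conventional 0.1/0.25 thresholds are rules of thumb, not part of the course material:

```python
import math

def psi(expected, actual, buckets=10):
    """Population Stability Index between two numeric samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / buckets for i in range(buckets + 1)]
    edges[-1] = float("inf")  # catch production values above the training max

    def frac(sample, left, right):
        n = sum(1 for x in sample if left <= x < right)
        return max(n / len(sample), 1e-6)  # avoid log(0) on empty buckets

    total = 0.0
    for left, right in zip(edges, edges[1:]):
        e, a = frac(expected, left, right), frac(actual, left, right)
        total += (a - e) * math.log(a / e)
    return total

train = [float(i % 100) for i in range(1000)]     # training distribution
same = [float(i % 100) for i in range(1000)]      # production: unchanged
shifted = [float(i % 100) + 50 for i in range(1000)]  # production: shifted
print(psi(train, same) < 0.1, psi(train, shifted) > 0.25)  # -> True True
```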



Illustrating Data and Model Drift
Visualizing Changes in Data and Model Performance Over Time

[Figure: Data Drift — bar charts contrasting the training data distribution with the production data distribution. Model Drift — model accuracy right after deployment vs. after a period of time.]


Illustrating Concept and Bias Drift
Visualizing Changes in Concept and Bias Over Time

Concept Drift: change in group distribution and emergence of a new group.
Bias Drift: prediction accuracy varies across different groups.


Data Drift Scenario:
Event: Introduction of a New Product Line

Scenario: The company decides to introduce a new line of eco-friendly


products, heavily promoting them through social media and email campaigns.
This new product line attracts a different demographic compared to the
existing customer base.

Changes in Data:
● Demographic Shift
● Browsing Behavior
● Historical Sales Data
● Promotional Activities

Impact of Data Drift:
● Prediction Accuracy: Training data no longer represents the current customer behavior and product offerings.
● Sales Forecasting: Sales of eco-friendly products are underestimated and sales of other products are overestimated.

Addressing Data Drift:
● Collect New Data
● Retrain the Model
● Monitor Continuously
Note: The scenario does overlap with concept drift.
ML Model Retraining Triggers

● Scheduled Retraining:
○ Databricks recommends starting with scheduled, periodic retraining and moving to triggered
retraining when needed.
● Data Changes:
○ Changes in the data can either explicitly trigger a retraining job or it can be automated if data drift
is detected.
● Model Code Changes:
○ Retraining can be triggered by changes in the model code, often due to concept drift or other
factors that necessitate an update in the model.
● Model Configuration Changes:
○ Alterations in the model configuration can also initiate a retraining job.
● Monitoring and Alerts:
○ Jobs can monitor data and model drift, and Databricks SQL dashboards can display status and send alerts.
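The drift- and quality-based triggers above can be reduced to a simple threshold check. The thresholds and the idea of combining a drift score with an F1 drop are illustrative assumptions; in practice the inputs would come from monitoring metrics tables:

```python
def should_retrain(drift_score: float, f1_now: float, f1_baseline: float,
                   drift_threshold: float = 0.25, max_f1_drop: float = 0.05) -> bool:
    """Trigger retraining when data drift exceeds a threshold
    or model quality degrades beyond an acceptable drop."""
    quality_degraded = (f1_baseline - f1_now) > max_f1_drop
    return drift_score > drift_threshold or quality_degraded

print(should_retrain(0.05, 0.90, 0.91))  # stable -> False
print(should_retrain(0.40, 0.90, 0.91))  # drift detected -> True
print(should_retrain(0.05, 0.80, 0.91))  # quality drop -> True
```

In a deployment, a check like this would run on a schedule and, when it returns True, trigger the retraining job (for example via a Workflows task or a Jobs API call).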
Implementation and Monitoring MLOps Solution
LECTURE

Monitoring in
Machine Learning



Monitoring ML Systems
Continuous logging and review of key component/system metrics

Why?
Used to help diagnose issues before they become severe or costly

Data to Monitor:
● Input data (tricky with existing models)
● Data in feature stores and vector databases
● Human feedback data
● Model outputs

ML Assets to Monitor:
● Mid-training checkpoints for analysis
● Component evaluation metrics
● ML system evaluation metrics
● Performance/cost details



Lakehouse Monitoring
Manage, govern, evaluate, and switch models easily

• Monitor data and AI assets


• Centralized and standardized
mechanism for monitoring models in
production
• Simplified, built-in tool for monitoring
mechanisms to diagnose errors, detect
drift, etc.
• Allow for the creation of additional
custom metrics.
• Alerting to get notified on drift or
quality issues.
What does this look like in practice?

[Diagram: DataOps, DevOps, and ModelOps overlapping, with MLOps at their intersection.]



Lakehouse Monitoring Capabilities
Monitor Key Metrics
For fields such as nulls, null %, zeros, zero %, avg, distincts, distinct %, max, min, stdev, median, max/min/avg length, value frequencies, quantiles, row counts

Define (multiple) time granularities
Monitor metrics over time windows, e.g. every day, every 5 minutes, over n weeks

Monitor Data Slices
Slice metrics based on columns or predicates, e.g. state, product_class, "cart_total > 1000"

Monitor Tables, VIEWS and ML Models
Consistent quality & drift monitoring of all your production assets, including machine-learning models' fairness & bias

Create monitors via the UI (UC Data Explorer) or the Python API:

import databricks.lakehouse_monitoring as lm

# Set up monitoring parameters
lm.create_monitor(
    table_name="my_UC_table",
    …)

# Refresh monitoring metrics
lm.run_refresh("my_table")

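The per-window statistics a monitor computes can be illustrated with a small, self-contained sketch. The daily windowing and null-percentage metric mirror the "time granularities" and "key metrics" ideas above; all names and records are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

def windowed_null_pct(rows, column):
    """Group records by day and compute the null percentage per window,
    mirroring the per-window statistics a monitor computes."""
    counts = defaultdict(lambda: [0, 0])  # window key -> [nulls, total]
    for row in rows:
        key = row["ts"].date().isoformat()
        counts[key][0] += row[column] is None  # True counts as 1
        counts[key][1] += 1
    return {k: 100.0 * n / t for k, (n, t) in sorted(counts.items())}

rows = [
    {"ts": datetime(2024, 1, 1, 9), "amount": 10.0},
    {"ts": datetime(2024, 1, 1, 12), "amount": None},
    {"ts": datetime(2024, 1, 2, 9), "amount": 5.0},
]
print(windowed_null_pct(rows, "amount"))  # -> {'2024-01-01': 50.0, '2024-01-02': 0.0}
```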


Lakehouse Monitoring Capabilities
Dashboards and Alerts
Auto-generated DBSQL dashboards to visualize metrics & trends; SQL alerts for notifications

Open Monitoring Results


Monitoring results stored in open format Delta tables
to build custom analytics using your favorite BI tool

Simple Operations
Databricks managed compute eliminates
infrastructure management and scaling complexity



What are Databricks SQL alerts?

• Databricks SQL alerts periodically run queries, evaluate defined conditions, and send notifications if a condition is met.
• Scheduled Execution: Automatically runs
queries at defined intervals to check specific
conditions.
• Multi-Channel Notifications: Receive alerts via
Email, Slack, Webhook, MS Teams, PagerDuty,
and more.
• Explore the documentation for in-depth
setup and customization options.
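The evaluate-and-notify cycle that Databricks SQL alerts automate can be mimicked locally to make the mechanics concrete; the `Alert` class, the null-percentage condition, and the list-based `notify` stand-in are all invented for this sketch:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Alert:
    name: str
    condition: Callable[[float], bool]  # condition on the query's scalar result
    notify: Callable[[str], None]       # stand-in for Email/Slack/webhook delivery

def evaluate_alert(alert: Alert, query_result: float) -> bool:
    """Mimic one scheduled evaluation: fire the notification if the condition holds."""
    if alert.condition(query_result):
        alert.notify(f"Alert '{alert.name}' triggered (value={query_result})")
        return True
    return False

messages = []
drift_alert = Alert(
    name="null_pct_too_high",
    condition=lambda v: v > 5.0,   # e.g. null % of a monitored column above 5
    notify=messages.append,
)
evaluate_alert(drift_alert, 2.0)   # below threshold: no notification
evaluate_alert(drift_alert, 7.5)   # above threshold: notification recorded
print(messages)
```

The real service runs the query on a schedule and delivers the message through the configured channel; the logic is the same condition-then-notify loop.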



Implementation and Monitoring MLOps Solution
DEMONSTRATION

Lakehouse
Monitoring
Dashboard



Demo
Outline

What we’ll cover:


• Train and analyze a machine learning model's inference logs.
• Monitor the model's performance and detect anomalies or drift.
• Handle drift detection and trigger retraining when needed.
• Utilize Databricks Lakehouse Monitoring to continuously track and alert on model
performance metrics.



Implementation and Monitoring MLOps Solution
LAB EXERCISE

Model Monitoring



Lab
Outline

What you’ll do:


● Task 1: Save the Training Data as Reference for Drift
● Task 2: Processing and Monitoring Inference Data
○ 2.1: Monitoring the Inference Table
○ 2.2: Processing Inference Table Data
○ 2.3: Analyzing Processed Requests
● Task 3: Persisting Processed Model Logs
● Task 4: Setting Up and Monitoring Inference Data
○ 4.1: Creating an Inference Monitor with Databricks Lakehouse Monitoring
○ 4.2: Inspect and Monitor Metrics Tables



Summary and
Next Steps


