This document provides an introduction to Databricks Workflows, a fully managed, cloud-based, general-purpose task orchestration service. It describes key features of Workflows, such as orchestrating diverse workloads across data, analytics, and AI using notebooks, SQL, and custom code. Workflows enables building reliable data and AI workflows on any cloud, with deep platform integration and proven reliability. Common workflow patterns such as sequence, funnel, and fan-out are presented along with an example workflow. The document also covers creating workflow jobs with tasks and schedules, as well as monitoring, debugging, and navigating workflow runs.

Deploy Workloads with Databricks Workflows

Module 05

©2023 Databricks Inc. — All rights reserved 1


Module Agenda
Deploy Workloads with Databricks Workflows

Introduction to Workflows
Building and Monitoring Workflow Jobs
DE 5.1 - Scheduling Tasks with the Jobs UI
DE 5.2L - Jobs Lab



Introduction to
Workflows



Course Objectives

1 Describe the main features and use cases of Databricks Workflows

2 Create a task orchestration workflow composed of various task types

3 Utilize monitoring and debugging features of Databricks Workflows

4 Describe workflow best practices



Databricks Workflows

Workflows is a fully managed, cloud-based, general-purpose task orchestration service for the entire Lakehouse.

Workflows is a service for data engineers, data scientists, and analysts to build reliable data, analytics, and AI workflows on any cloud.

[Diagram: the Lakehouse Platform stack — Data Warehousing, Data Engineering, Data Streaming, and Data Science and ML on top of Unity Catalog (fine-grained governance for data and AI), Delta Lake (data reliability and performance), and the Cloud Data Lake (all structured and unstructured data)]


Databricks Workflows

Databricks has two main task orchestration services:
• Workflow Jobs (Workflows): workflows for every job
• Delta Live Tables (DLT): automated data pipelines for Delta Lake

Note: A DLT pipeline can be a task in a workflow
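The note above can be sketched as a Jobs API 2.1 job payload in which a DLT pipeline runs as one task of a larger job. This is a minimal illustration, not the course's own example; the pipeline ID and notebook path are placeholders.

```python
# Sketch of a Jobs API 2.1 payload: a DLT pipeline as one task of a job,
# followed by a notebook task that depends on it. IDs/paths are placeholders.
job_spec = {
    "name": "ingest-and-report",
    "tasks": [
        {
            "task_key": "dlt_ingest",
            # Runs an existing Delta Live Tables pipeline as a job task
            "pipeline_task": {"pipeline_id": "<your-dlt-pipeline-id>"},
        },
        {
            "task_key": "report",
            # This task only starts after the DLT pipeline task succeeds
            "depends_on": [{"task_key": "dlt_ingest"}],
            "notebook_task": {"notebook_path": "/Workspace/reports/daily"},
        },
    ],
}
```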



DLT versus Workflow Jobs
Considerations

Consideration          Delta Live Tables          Workflow Jobs
Source                 Notebooks only             JARs, notebooks, DLT, applications written in Scala, Java, Python
Dependencies           Automatically determined   Manually set
Cluster                Self-provisioned           Self-provisioned or existing
Timeouts and Retries   Not supported              Supported
Import Libraries       Not supported              Supported



DLT versus Jobs
Use Cases

Orchestration of Dependent Jobs: jobs running on a schedule, in a job containing dependent tasks/steps → Jobs Workflows

Machine Learning Tasks: run an MLflow notebook task in a job → Jobs Workflows

Arbitrary Code, External API Calls, Custom Tasks: run tasks in a job, which can contain a JAR file, Spark Submit, Python script, SQL task, or dbt → Jobs Workflows

Data Ingestion and Transformation: ETL jobs with support for batch and streaming, built-in data quality constraints, monitoring, and logging → Delta Live Tables



Workflows Features
Part 1 of 2

Orchestrate Anything Anywhere
Run diverse workloads for the full data and AI lifecycle, on any cloud. Orchestrate:
• Notebooks
• Delta Live Tables
• Jobs for SQL
• ML models, and more

Fully Managed
Remove operational overhead with a fully managed orchestration service, enabling you to focus on your workflows, not on managing your infrastructure.

Simple Workflow Authoring
An easy point-and-click authoring experience for all your data teams, not just those with specialized skills.


Workflows Features
Part 2 of 2

Deep Platform Integration
Designed and built into your lakehouse platform, giving you deep monitoring capabilities and centralized observability across all your workflows.

Proven Reliability
Have full confidence in your workflows, leveraging our proven experience running tens of millions of production workloads daily across AWS, Azure, and GCP.



How to Leverage Workflows

• Build simple ETL/ML task orchestration
• Reduce infrastructure overhead
• Easily integrate with external tools
• Enable non-engineers to build their own workflows using a simple UI
• Stay cloud-provider independent
• Reuse clusters to reduce cost and startup time



Common Workflow Patterns

Sequence
• Data transformation/processing/cleaning
• Bronze/silver/gold tables

Funnel
• Multiple data sources
• Data collection

Fan-out (star pattern)
• Single data source
• Data ingestion and distribution
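The three patterns above can be sketched as task lists using the Jobs API `depends_on` field. The task names below are hypothetical, chosen only to illustrate each shape.

```python
# Sequence: each task depends on the previous one (bronze -> silver -> gold)
sequence = [
    {"task_key": "bronze"},
    {"task_key": "silver", "depends_on": [{"task_key": "bronze"}]},
    {"task_key": "gold", "depends_on": [{"task_key": "silver"}]},
]

# Funnel: several source tasks feed one collection task
funnel = [
    {"task_key": "src_a"},
    {"task_key": "src_b"},
    {"task_key": "collect",
     "depends_on": [{"task_key": "src_a"}, {"task_key": "src_b"}]},
]

# Fan-out: a single ingestion task feeds several downstream consumers
fan_out = [
    {"task_key": "ingest"},
    {"task_key": "to_warehouse", "depends_on": [{"task_key": "ingest"}]},
    {"task_key": "to_ml", "depends_on": [{"task_key": "ingest"}]},
]
```

In each case the pattern is expressed purely through the dependency edges; the Workflows UI renders the same shapes when you configure "Depends on" for each task.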



Example Workflow

1 Data ingestion funnel (e.g. Auto Loader, DLT)

2 Data filtering, quality assurance, transformation (e.g. DLT, SQL, Python)

3 ML feature extraction (e.g. MLflow)

4 Persisting features and training a prediction model



Building and
Monitoring Workflow
Jobs



Workflows Job Components

TASKS SCHEDULE CLUSTER

What? When? How?



Creating a Workflow
Task Definition

When creating a task:
• Define the task type
• Choose the cluster type
  • Job clusters and all-purpose clusters can be used.
  • A cluster can be shared by multiple tasks, reducing cost and startup time.
  • To create a new cluster, you must have the required permissions.
• Define a task dependency if the task depends on another task
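The steps above can be sketched as a Jobs API 2.1 payload in which two notebook tasks share one job cluster via `job_cluster_key`. The cluster settings and notebook paths are illustrative placeholders, not values from the course.

```python
# Sketch of a job where two tasks share one job cluster, reducing cost
# and startup time. Cluster settings and paths are placeholders.
job_spec = {
    "name": "shared-cluster-demo",
    "job_clusters": [
        {
            "job_cluster_key": "shared",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "prepare",
            "job_cluster_key": "shared",  # reuses the shared job cluster
            "notebook_task": {"notebook_path": "/Workspace/etl/prepare"},
        },
        {
            "task_key": "load",
            "job_cluster_key": "shared",  # same cluster, no second startup
            "depends_on": [{"task_key": "prepare"}],
            "notebook_task": {"notebook_path": "/Workspace/etl/load"},
        },
    ],
}
```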



Monitoring and Debugging
Scheduling and Alerts

You can run your jobs immediately or periodically through an easy-to-use scheduling system.

You can specify alerts to be notified when runs of a job begin, complete, or fail. Notifications can be sent via email, Slack, or AWS SNS.
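As a sketch, the schedule and email alerts described above map to fields like these in a job's settings (Jobs API 2.1 field names; the cron expression and addresses are made-up examples for a daily 06:00 run):

```python
# Illustrative schedule and notification settings for a job.
# Quartz cron "0 0 6 * * ?" fires daily at 06:00 in the given timezone.
job_settings = {
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "email_notifications": {
        "on_start": ["team@example.com"],     # run began
        "on_success": ["team@example.com"],   # run completed
        "on_failure": ["oncall@example.com"], # run failed
    },
}
```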



Monitoring and Debugging
Access Control

Workflows integrates with existing resource access controls, enabling you to easily manage access across different teams.



Monitoring and Debugging
Job Run History

Workflows keeps track of job runs and saves information about the success or failure of each task in the job run.

Navigate to the Runs tab to view completed or active runs for a job.

[Screenshot: run history showing run duration, task status, and an individual job run]
Monitoring and Debugging
Repair a Failed Job Run

The repair feature allows you to re-run only the failed tasks and their dependent sub-tasks, reducing the time and resources required to recover from unsuccessful job runs.
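Beyond the UI's Repair button, the same operation is exposed by the Jobs API's run-repair endpoint. The helper below only builds the request body as an illustration; the run ID and task keys are hypothetical.

```python
# Hypothetical request body for the Jobs API run-repair endpoint
# (POST /api/2.1/jobs/runs/repair). Only the named failed tasks are
# re-run; previously successful tasks are left untouched.
def build_repair_payload(run_id, failed_task_keys):
    """Return a repair request body that re-runs only the given tasks."""
    return {"run_id": run_id, "rerun_tasks": list(failed_task_keys)}

# e.g. re-run only the failed "silver" and "gold" tasks of run 545
payload = build_repair_payload(545, ["silver", "gold"])
```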



Navigating the Jobs UI
Use breadcrumbs to navigate back to your job from a specific run page



Navigating the Jobs UI
Runs vs Tasks tabs on the job page

• Use the Runs tab to view completed or active runs for the job
• Use the Tasks tab to modify or add tasks to the job



DE 5.1.1: Task
Orchestration



Demo: Task Orchestration
DE 5.1.1 - Task Orchestration

• Schedule a notebook task in a Databricks Workflow job
• Describe job scheduling options and differences between cluster types
• Review job runs to track progress and see results
• Schedule a DLT pipeline task in a Databricks Workflow job
• Configure dependencies between tasks via the Databricks Workflows UI



DE 5.2.1.L: Task
Orchestration Lab



Lab: Task Orchestration
DE 5.2.1.L - Task Orchestration



