
Databricks Academy Course Catalog
UPDATED: AUGUST 2023

Welcome to the Databricks Academy
About the Databricks Academy
Training Offerings
Learning paths
Databricks Lakehouse Fundamentals
Data analysis
Data engineering
Machine learning
Platform administration (cloud agnostic)
Platform architecture - Azure
Platform architecture - AWS
Platform architecture - Google Cloud
Apache Spark
Databricks Academy Updates for this quarter (August - September - October)
Content published/updated
New courses currently under development
Content retired/to-be-retired
Databricks Academy Content Descriptions
Certification exam/accreditation descriptions
Instructor-led course descriptions
Self-paced course descriptions
Access Shared Data Externally with Delta Sharing
Advanced Data Engineering with Databricks
Apache Spark Programming with Databricks
AWS Databricks Cloud Integrations
AWS Databricks Networking and Security Fundamentals
AWS Databricks Platform Administration Fundamentals
Azure Databricks Cloud Integrations
Azure Databricks Networking and Security Fundamentals
Azure Databricks Workspace Administration Fundamentals
Build Data Pipelines with Delta Live Tables and Spark SQL
Build Data Pipelines with Delta Live Tables and PySpark
Certification Overview: Databricks Certified Associate Developer for Apache Spark Exam
Certification Overview: Databricks Certified Data Analyst Associate Exam
Certification Overview: Databricks Certified Data Engineer Associate Exam
Certification Overview: Databricks Certified Data Engineer Professional Exam
Certification Overview: Databricks Certified Machine Learning Associate Exam
Certification Overview: Databricks Certified Machine Learning Professional Exam
CI/CD Administration in Databricks
Compute Resources and Unity Catalog
Data Access Control in Unity Catalog
Data Administration in Databricks
Data Analysis with Databricks SQL
Data Engineering with Databricks V3
Data Visualization on Databricks SQL
Databricks Compute Resource Administration
Databricks Identity Administration
Databricks Platform Administrator Fundamentals
Databricks Workspace Administration and Security
Delta Sharing Best Practices
Deep Learning with Databricks
Deploy Workloads with Databricks Workflows
Getting Started with Databricks Engineering on Databricks
Get Started with Databricks for Business Leaders
Get Started with Machine Learning on Databricks
Getting Started with Data Analysis on Databricks
GCP Databricks Platform Administration Fundamentals
GCP Databricks Networking and Security Fundamentals
GCP Databricks Cloud Integrations
Generative AI Fundamentals
Generative AI Fundamentals Accreditation
How to Ingest Data for Databricks SQL
Introduction to Delta Sharing
Introduction to Photon
Introduction to Python for Data Science and Data Engineering
Introduction to Time Series Forecasting with AutoML
Introduction to Unity Catalog
Machine Learning in Production (V2)
Manage Data with Delta Lake
Manage Data Access with Unity Catalog
New Capability Overview: bamboolib
New Capability Overview: Data Lineage with Unity Catalog
New Capability Overview: Data Profiles in Databricks Notebooks
New Capability Overview: Databricks SQL Serverless
New Capability Overview: Global Search
New Capability Overview: Model Serving
New Capability Overview: ipywidgets
New Capability Overview: Personal Compute
New Capability Overview: VS Code Extension for Databricks
New Capability Overview: Workspace Browser for Databricks SQL
Optimizing Apache Spark on Databricks
Scalable Machine Learning with Apache Spark (V2)
Share Data within Databricks Using Delta Sharing
Transform Data with Spark
Unity Catalog Patterns and Best Practices
What is Big Data?

Welcome to the Databricks Academy


About the Databricks Academy
The Databricks Academy is the training arm of Databricks - our goal is to help our
users achieve efficient and effective use of the Databricks Lakehouse Platform to
reach their big data and AI goals.

Via the Databricks Academy, you can access self-paced e-learning and
instructor-led courses that help you prepare for Databricks certification exams and
focus on how to use the Databricks Lakehouse Platform for:

● Data engineering
● Machine learning and AI
● Data analysis
● Platform administration

Training Offerings
Self-paced e-learning is virtual training available 24/7 to individuals signed up for
the Databricks Academy. Databricks customers and partners are granted access to
self-paced e-learning for free. Those who are not Databricks customers or partners
can purchase a subset of the available content. Training currently consists mostly of
lectures and demos on how to use the Databricks Lakehouse Platform.

Instructor-led training is virtual training available to everyone (Databricks customers,
partners, and the general public) for a fee. Instructor-led courses consist of roughly
6 to 12 hours of training, including lectures, demos, and hands-on labs.

Accreditations are 30-minute quizzes available via the Databricks Academy after
completing a selection of Databricks courses or learning plans. Upon successful
completion of an accreditation, badges are issued that can be shared on social
media sites and professional networks.

Certifications are 1.5- to 2-hour exams available via our certification platform.
Upon successful completion of an exam, badges are issued that validate your data
and AI skills on the Databricks Lakehouse Platform and can be shared on social
media sites and professional networks.
Learning paths
Learning paths are designed to help guide users to the courses most relevant to
them.

Current pathways are available for Databricks fundamentals, data analysts, data
engineers, machine learning practitioners, and Apache Spark. The credential
milestones for each step within these pathways are shown in the images below.

Below, you’ll find a breakdown of the courses required for each of these steps. We
will update these regularly, as new courses are released.

Databricks Lakehouse Fundamentals


Click here for the customer enrollment link for this learning plan.
Data analysis
Click here for the customer enrollment link for this learning plan.

Data engineering
Click here for the customer enrollment link for this learning plan.
Machine learning
Click here for the customer enrollment link for this learning plan.
Platform administration (cloud agnostic)
Click here for the customer enrollment link for this learning plan.

Platform architecture - Azure


Platform architecture - AWS

Platform architecture - Google Cloud

Apache Spark
Click here for the customer enrollment link for this learning plan.
Databricks Academy Updates for this quarter (August - September - October)
SP = Self-paced | LP = Self-paced learning plan | ACCRED = Free accreditation exam via
Databricks Academy | CERT = Proctored certification exam via WebAssessor (for a fee)

Content published/updated
● Data Analysis with Databricks (SP - August 11)

New courses currently under development


● Databricks SQL Services and Capabilities
● Data Management in Databricks SQL
● Data Visualization and Dashboarding with Databricks SQL
● Software Engineering Practices with Delta Live Tables
● Automating Production Workflows
● Incremental Processing with Spark Structured Streaming
● Performance Optimization with Spark and Delta Lake
● Data Privacy Patterns in the Lakehouse
● New Capability Overview: UniForm Essentials
● New Capability Overview: Conditional Execution in Workflows Essentials
● New Capability Overview: Model Serving Essentials
● New Capability Overview: Volumes in Unity Catalog Essentials
● New Capability Overview: Marketplace Essentials

Content retired/to-be-retired
● New Capability Overview: Databricks SQL Serverless (SP - August 25)
● New Capability Overview: Workspace Browser for Databricks SQL (SP - August 25)
● Data Visualization on Databricks SQL (SP - August 25)
● Introduction to Time Series Forecasting with AutoML (SP - August 25)
● New Capability Overview: MLflow Model Serving (SP - August 25)
● New Capability Overview: Serverless Real-time Inference (SP - August 25)

Databricks Academy Content Descriptions
Certification exam/accreditation descriptions
For a full list of available certification exams/accreditations, along with their descriptions,
please click here.

Instructor-led course descriptions


For a full list of available instructor-led courses, along with their descriptions, please click
here.

Self-paced course descriptions


Note: All self-paced courses are free for Databricks customers and partners.
Non-customers can purchase some courses through role-based learning plans available via
the Databricks Academy.
Access Shared Data Externally with Delta Sharing


Click here for the customer enrollment link.

Duration: 35 Minutes

Delta Sharing is an open protocol for secure data sharing with other organizations
regardless of which computing platforms they use. Databricks provides both open
source and managed options for Delta Sharing. Databricks-managed Delta Sharing
allows data providers to share data and data recipients to access the shared data.
With Delta Sharing, users can share collections of tables in a Unity Catalog
metastore in real time without copying them, so that data recipients can
immediately begin working with the latest version of the shared data.

This course will focus on sharing data externally and accessing shared data from
external tools such as PowerBI and Apache Spark. First, we will demonstrate how to
share data externally. Then, we will show how to access the shared data from
external tools.
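
As a rough sketch of what accessing a share from an external tool can look like (a minimal example assuming the open-source delta-sharing Python client and a credential/profile file downloaded from the data provider; the share, schema, and table names below are illustrative, not values from this course):

```python
# A minimal sketch of reading externally shared data with the open-source
# delta-sharing client (pip install delta-sharing). The profile file and the
# share/schema/table names are illustrative placeholders.
import delta_sharing

profile = "/path/to/config.share"          # credential file from the data provider
table_url = f"{profile}#retail_share.sales.orders"

# Load the shared table directly into a pandas DataFrame.
orders_pdf = delta_sharing.load_as_pandas(table_url)
print(orders_pdf.head())

# Or, with Apache Spark and the Delta Sharing connector on the classpath,
# load it as a Spark DataFrame instead:
# orders_sdf = delta_sharing.load_as_spark(table_url)
```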

Prerequisites:

● Beginning-level knowledge of the Databricks Lakehouse platform (high-level
knowledge of the structure and benefits of the Lakehouse platform)
● Basic understanding of how Unity Catalog is integrated into the Databricks
Lakehouse Platform
● Beginning-level knowledge of Python, the PySpark API, and the pandas API
● Admin account privileges, or having been granted the privileges required for
creating and managing shares

Learning objectives:

● Describe the external data sharing process with Databricks-managed Delta Sharing
● Share data externally with Databricks-managed Delta Sharing
● Access shared data in PowerBI
● Access shared data using PySpark
● Access shared data using pandas
Advanced Data Engineering with Databricks
Click here for the customer enrollment link.

Duration: 12 hours

Course description: In this course, students will build upon their existing knowledge
of Apache Spark, Structured Streaming, and Delta Lake to unlock the full potential of
the data lakehouse by utilizing the suite of tools provided by Databricks. This course
places a heavy emphasis on designs favoring incremental data processing, enabling
systems optimized to continuously ingest and analyze ever-growing data. By
designing workloads that leverage built-in platform optimizations, data engineers
can reduce the burden of code maintenance and on-call emergencies, and quickly
adapt production code to new demands with minimal refactoring or downtime. The
topics in this course should be mastered prior to attempting the Databricks
Certified Data Engineer Professional exam.

Prerequisites:

● Experience using PySpark APIs to perform advanced data transformations


● Familiarity implementing classes with Python
● Experience using SQL in production data warehouse or data lake
implementations
● Experience working in Databricks notebooks and configuring clusters
● Familiarity with creating and manipulating data in Delta Lake tables with SQL
● Ability to use Spark Structured Streaming to incrementally read from a Delta
table

Learning objectives:

● Design databases and pipelines optimized for the Databricks Lakehouse Platform.
● Implement efficient incremental data processing to validate and enrich data
driving business decisions and applications.
● Leverage Databricks-native features for managing access to sensitive data
and fulfilling right-to-be-forgotten requests.
● Manage code promotion, task orchestration, and production job monitoring
using Databricks tools.
Apache Spark Programming with Databricks
Click here for the customer enrollment link.

Duration: 12 hours

NOTE: This is an e-learning version of the Apache Spark Programming with Databricks instructor-led course. It is an on-demand recording available via the
Databricks Academy and covers the same content as the instructor-led course. For
more information about what’s in the course itself, please visit this link.

AWS Databricks Cloud Integrations


Click here for the customer enrollment link.

Duration: 2 hours

Course description: This 2 hour video series will walk you through three common
integrations to help build out the capabilities of your AWS Databricks applications.

Prerequisites:

● Beginner-level knowledge of AWS (EC2, IAM, Kinesis, Redshift, S3)


● Access to your AWS console, with the ability to create buckets, Kinesis data
streams, Redshift clusters, and IAM roles and policies
● Account administrator capabilities in your Databricks account

Learning objectives:

● Create your own S3 bucket and access data from Databricks


● Set up a data stream in Amazon Kinesis and stream records from Databricks
● Set up a data warehouse in Amazon Redshift, connect it to Databricks, and
exchange data.

AWS Databricks Networking and Security Fundamentals
Click here for the customer enrollment link.

Duration: 1 hour
Course description: This 1 hour video series will provide you with the background
needed to customize the structure of your environment and reinforce security at
the infrastructural level.

Prerequisites:

● Beginner-level knowledge of AWS (IAM, KMS, S3, VPC)


● Knowledge of networking concepts (IPv4 addressing, DNS)
● Access to your AWS console, with the ability to create buckets, VPCs and
their associated resources, and IAM roles and policies
● Account administrator capabilities in your Databricks account

Learning objectives:

● Create your own VPCs


● Deploy workspaces into your own managed VPCs
● Create your own customer-managed keys
● Apply customer-managed keys to achieve different levels of encryption in
your Databricks workspaces

AWS Databricks Platform Administration Fundamentals
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This 1 hour video series will provide you with a high-level
overview of the AWS environment as it relates to Databricks, and it will guide you
through how to perform some fundamental tasks related to deploying and managing
Databricks workspaces in your organization.

Prerequisites:

● Beginner-level knowledge of AWS (IAM and S3)


● Access to your AWS console, with the ability to create buckets and IAM roles
● Account administrator capabilities in your Databricks account

Learning objectives:

● Describe and identify elements of the AWS Databricks architecture


● Create and manage workspaces and metastores, and supporting resources
● Automate administration operations

Azure Databricks Cloud Integrations


Click here for the customer enrollment link.

Duration: 40 Minutes.

Course description: This course is designed to provide additional information about individual topics of integration in the Azure Cloud with Azure Databricks. These
videos are stand-alone integration videos that build on the foundational concepts
covered in the associated courses, Azure Databricks Workspace Administration
Fundamentals and Azure Databricks Networking and Security Fundamentals.

Prerequisites:

● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require the following
privileges:
● Azure Databricks Account administration
● Azure Cloud administration

Learning objectives:

● Describe the basics of establishing cloud integrations with an Azure Databricks workspace.
● Identify common integration methods used in an Azure Databricks workspace
for specific cloud services integrations.
● Explain the importance of integrating a storage account with Azure
Databricks.
● Connect an Azure storage account to an Azure Databricks workspace.
● Explain why you would use Azure Data Factory with Azure Databricks.
● Connect Azure Data Factory to an Azure Databricks workspace.
● Explain why you would use Power BI with Azure Databricks.
● Connect Power BI to an Azure Databricks workspace.

Azure Databricks Networking and Security Fundamentals
Click here for the customer enrollment link.
Duration: 1 hour 10 minutes

Course description: This course focuses on providing you with a foundation for the
networking and security needs for an Azure Databricks workspace in your Azure
Cloud ecosystem. You’ll be able to explore how identity and access is managed
through Azure Active Directory and Unity Catalog. Additionally, you’ll be able to
review some foundational networking concepts and how they are applicable to the
Azure Databricks environment, such as Azure Software Defined Networks, CIDR
ranges, subnets, and VNet peering. You’ll also explore how Azure Databricks
workspaces can be secured through IP Access List, User Defined Routes, private and
service endpoints, and private DNS zones to support Data Loss Prevention
strategies.

Prerequisites:

● "The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require the following privileges:
● Account administration
● Cloud administration
● Additional prerequisites include a solid understanding of IPv4 addresses,
subnets, CIDR ranges, and general networking concepts."

Learning objectives:

● Describe components of the Azure Databricks platform architecture and deployment model.
● Explain network security features including no public IP address, Bring Your
Own VNET, VNET peering, and IP access lists.
● Describe identity provider and Azure Active Directory integrations and
access control configurations for an Azure Databricks workspace.
● Explain encryptions and permissions available for data protection, such as
identity provider authentication, secrets, and table access control.
● Describe security standards and configurations for compliance, including
cluster policies, Bring Your Own Key, and audit logs.

Azure Databricks Workspace Administration Fundamentals
Click here for the customer enrollment link.
Duration: 1 hour 10 minutes

Course description: This course is designed to introduce you to the fundamentals of Azure Databricks Workspace Administration, including the reference architecture
and deployment options for your workspace. You’ll also be introduced to some of
the resources necessary to deploy an Azure Databricks Workspace in your
environment.

Prerequisites:

● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require the following privileges:
● Account administration
● Cloud administration
● Additional prerequisites include a solid understanding of IPv4 addresses,
subnets, CIDR ranges, and general networking concepts.

Learning objectives:

● "Explain the first-party service relationship Databricks has with Microsoft.


● Identify the responsibilities of the Platform Administrator/Platform Architect
with an Azure Databricks implementation.
● Describe foundational concepts of the Azure cloud ecosystem.
● Identify how Azure Databricks is part of the Azure ecosystem.
● Describe additional resources that may be included with Azure Databricks in
an Azure based architecture.
● Recognize the impact of Azure Databricks on the platform’s cost
management and planning.
● Review the decisions necessary to implement Azure Databricks for your
architecture.
● Identify the resources needed to implement an Azure Databricks workspace.
● Create necessary backing resources for Azure Databricks.
● Differentiate between the available options for deploying an Azure Databricks
workspace.
● Determine the impact of networking for Azure on your workspace.
● Deploy an Azure Databricks workspace using the default method.
● Deploy an Azure Databricks workspace with VNet Injection.
● Describe how Terraform can automate the deployment process of Azure
Databricks.
Build Data Pipelines with Delta Live Tables and Spark
SQL
Click here for the customer enrollment link.

Duration: 3 hours

Course description: Students will use Delta Live Tables with Spark SQL and Python
to define and schedule pipelines that incrementally process new data from a variety
of data sources into the Lakehouse.

Prerequisites:

● Beginner familiarity with cloud computing concepts (virtual machines, object storage, etc.)
● Ability to perform basic code development tasks using the Databricks Data
Engineering & Data Science workspace (create clusters, run code in
notebooks, use basic notebook operations, import repos from git, etc)
● Beginning programming experience with Delta Lake
● Use Delta Lake DDL to create tables, compact files, restore previous table
versions, and perform garbage collection of tables in the Lakehouse.
● Use CTAS to store data derived from a query in a Delta Lake table
● Use SQL to perform complete and incremental updates to existing tables.
● Beginning programming experience with Spark SQL or PySpark
● Extract data from a variety of file formats and data sources
● Apply a number of common transformations to clean data
● Reshape and manipulate complex data using advanced built-in functions
● Production experience working with data warehouses and data lakes

Learning objectives:

● Describe how Delta Live Tables tracks data dependencies in data pipelines
● Configure and run data pipelines using the Delta Live Tables UI
● Use Python or Spark SQL to define data pipelines that ingest and process
data through multiple tables in the lakehouse using Auto Loader and Delta
Live Tables
● Use APPLY CHANGES INTO syntax to process Change Data Capture feeds
● Review event logs and data artifacts created by pipelines and troubleshoot
DLT syntax
Build Data Pipelines with Delta Live Tables and
PySpark
Click here for the customer enrollment link.

Duration: 3 hours

Course description: Students will use Delta Live Tables with Spark SQL and Python
to define and schedule pipelines that incrementally process new data from a variety
of data sources into the Lakehouse.
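
As a rough sketch of the kind of pipeline code this course works toward (a minimal example assuming the `dlt` Python module provided inside Delta Live Tables pipelines; the table names, source path, and expectation below are illustrative, not taken from the course materials):

```python
# A minimal Delta Live Tables sketch in Python. This runs only inside a DLT
# pipeline on Databricks, where the `dlt` module and the `spark` session are
# provided by the runtime; paths and table names are illustrative.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream
        .format("cloudFiles")                      # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/Volumes/demo/raw/orders")          # illustrative source location
    )

@dlt.table(comment="Cleaned orders with a basic quality expectation")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("ingested_at", F.current_timestamp())
    )
```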

Prerequisites:

● Beginner familiarity with cloud computing concepts (virtual machines, object storage, etc.)
● Ability to perform basic code development tasks using the Databricks Data
Engineering & Data Science workspace (create clusters, run code in
notebooks, use basic notebook operations, import repos from git, etc)
● Beginning programming experience with Delta Lake
● Use Delta Lake DDL to create tables, compact files, restore previous table
versions, and perform garbage collection of tables in the Lakehouse.
● Use CTAS to store data derived from a query in a Delta Lake table
● Use SQL to perform complete and incremental updates to existing tables.
● Beginning programming experience with Spark SQL or PySpark
● Extract data from a variety of file formats and data sources
● Apply a number of common transformations to clean data
● Reshape and manipulate complex data using advanced built-in functions
● Production experience working with data warehouses and data lakes

Learning objectives:

● Describe how Delta Live Tables tracks data dependencies in data pipelines
● Configure and run data pipelines using the Delta Live Tables UI
● Use Python or Spark SQL to define data pipelines that ingest and process
data through multiple tables in the lakehouse using Auto Loader and Delta
Live Tables
● Use APPLY CHANGES INTO syntax to process Change Data Capture feeds
● Review event logs and data artifacts created by pipelines and troubleshoot
DLT syntax
Certification Overview: Databricks Certified
Associate Developer for Apache Spark Exam
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.

Prerequisites:

● Describe the basics of the Apache Spark architecture.


● Apply the Spark DataFrame API to complete individual data manipulation
tasks, including:
● selecting, renaming and manipulating columns
● filtering, dropping, sorting, and aggregating rows
● joining, reading, writing and partitioning DataFrames
● working with UDFs and Spark SQL functions

Learning objectives:

● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.

Certification Overview: Databricks Certified Data Analyst Associate Exam
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.

Prerequisites:

● Describe Databricks SQL and its capabilities


● Manage data with Databricks tools and best practices
● Use Structured Query Language (SQL) to complete tasks in the Lakehouse
● Create production-grade data visualizations and dashboards
● Develop analytics applications to solve common data analytics problems

Learning objectives:

● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.

Certification Overview: Databricks Certified Data Engineer Associate Exam
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.

Prerequisites:

● Understand how to use the Databricks Lakehouse Platform and its tools, and the benefits they provide
● Build ETL pipelines using Apache Spark SQL and Python
● Incrementally process data
● Build production pipelines for data engineering applications and Databricks
SQL queries and dashboards
● Understand and follow best security practices

Learning objectives:

● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Certification Overview: Databricks Certified Data
Engineer Professional Exam
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.

Prerequisites:

● Describe how to use the Databricks platform and developer tools, and the benefits
they provide
● Build optimized and cleaned data processing pipelines using the Spark and
Delta Lake APIs
● Model data into a Lakehouse using knowledge of general data modeling
concepts
● Ensure data pipelines are secure, reliable, monitored, and tested before
deployment

Learning objectives:

● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.

Certification Overview: Databricks Certified Machine Learning Associate Exam
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.

Prerequisites:
● Use Databricks Machine Learning and its capabilities within machine learning
workflows
● Implement correct decisions in machine learning workflows
● Implement machine learning solutions at scale using Spark ML and other tools
● Understand advanced scaling characteristics of classical machine learning
models

Learning objectives:

● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.

Certification Overview: Databricks Certified Machine Learning Professional Exam
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.

Prerequisites:

● Track, version, and manage machine learning experiments


● Manage the machine learning model lifecycle
● Implement strategies for deploying machine learning models
● Build monitoring solutions for drift detection.

Learning objectives:

● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.

CI/CD Administration in Databricks


Click here for the customer enrollment link.
Duration: 45 Minutes

Course description: This Continuous Integration and Delivery Administration in Databricks course covers the various elements provided in the Databricks environment that can help fulfill the requirements of your organization’s CI/CD processes.

Prerequisites:

● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.

Learning objectives:

● Describe roles and responsibilities related to administering the Databricks platform
● Automate administration operations
● Describe Databricks identities and how they apply across the platform
● Configure groups, users and service principals
● Automate user and group provisioning
● Configure workspace settings
● Secure access to workspace assets
● Use Databricks secrets to implement security best practices in your
organization
● Describe compute options and features available in the Databricks platform
● Secure access to compute resources
● Secure access to Databricks from a BI tool
● Describe data access patterns in Databricks
● Define data access rules and manage data ownership
● Secure access to external storage
● Upgrade legacy data assets to Unity Catalog
● Describe features offered by the Databricks platform to support continuous
integration and deployment
● Implement revision control in the workspace
● Schedule execution of a Data Science and Engineering workloads using
Databricks Workflows
● Automate actions in response to code updates
Compute Resources and Unity Catalog
Click here for the customer enrollment link.

Duration: 30 Minutes

Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through creating compute
resources capable of accessing Unity Catalog.

Prerequisites:

● This course has no specific course prerequisites.

Learning objectives:

● Describe how to access Unity Catalog through Databricks compute resources


● Create a Unity Catalog enabled cluster
● Access Unity Catalog through Databricks SQL

Data Access Control in Unity Catalog


Click here for the customer enrollment link.

Duration: 1 hour

Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through techniques for creating
and governing data objects in Unity Catalog.
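
As a rough illustration of the kind of governance statements covered (a minimal sketch assuming a Unity Catalog-enabled workspace; the catalog, schema, table, and group names below are illustrative, and `spark` is the session provided in Databricks notebooks):

```python
# A minimal sketch of Unity Catalog access-control statements, run from a
# Databricks notebook where `spark` is provided by the runtime.
# The catalog, schema, table, and group names are illustrative.
spark.sql("GRANT USE CATALOG ON CATALOG demo_catalog TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA demo_catalog.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE demo_catalog.sales.orders TO `data_analysts`")

# Transfer ownership of a data object to a group.
spark.sql("ALTER TABLE demo_catalog.sales.orders OWNER TO `data_engineers`")
```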

Prerequisites:

● This course has no specific course prerequisites

Learning objectives:

● Describe the security model for governing data objects in Unity Catalog
● Define data access rules and manage data ownership
● Secure access to external storage
Data Administration in Databricks
Click here for the customer enrollment link.

Duration: 1.5 hours

Course description: This Data Administration in Databricks course will provide you
with a basis of how data is managed in the Databricks platform with a feature known
as Unity Catalog.

Prerequisites:

● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.

Learning objectives:

● Describe data access patterns in Databricks


● Define data access rules and manage data ownership
● Secure access to external storage
● Upgrade legacy data assets to Unity Catalog

Data Analysis with Databricks SQL


Click here for the customer enrollment link.

Duration: 6 hours

NOTE: This is an e-learning version of the Data Analysis with Databricks instructor-led course. It is an on-demand recording available via the Databricks
Academy and covers the same content as the instructor-led course. For more
information about what’s in the course itself, please visit this link.

Data Engineering with Databricks V3


Click here for the customer enrollment link.

Duration: 12 hours

NOTE: This is an e-learning version of the Data Engineering with Databricks instructor-led course. It is an on-demand recording available via the Databricks
Academy and covers the same content as the instructor-led course. For more
information about what’s in the course itself, please visit this link.

Data Visualization on Databricks SQL


Click here for the customer enrollment link.

Duration: 1 hour and 30 Minutes

Course description: Queries produce tabular data, which can be difficult to interpret on its own. This course will demonstrate how to create visualizations that highlight important aspects of a sample retail dataset. Databricks SQL’s visualization editor can create rich, customizable visualizations that help tell a story with data using more than words.

Prerequisites:

● Beginning knowledge of Databricks SQL


● Beginning knowledge of SQL (ability to write basic queries)
● Beginning knowledge of data visualization
● Access to a SQL endpoint set up by an administrator
● Administrator (or administrator access) for initial course setup

Learning objectives:

● Create basic visualizations using Databricks SQL.


● Explore the different types of visualizations that can be created using
Databricks SQL.
● Create customized data visualizations to aid in data storytelling.

Databricks Compute Resource Administration


Click here for the customer enrollment link.

Duration: 1 Hour

Course description: This Databricks Compute Resource Administration course, along with its accompanying labs, will show you how to create clusters and SQL
warehouses, going through some of the important parameters and their meanings.

Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.

Learning objectives:

● Create and configure clusters and SQL warehouses


● Control access to clusters and SQL warehouses
● Manage costs associated with compute resources
● Optimize clusters for performance and cost

Databricks Identity Administration


Click here for the customer enrollment link.

Duration: 40 Minutes

Course description: This Databricks Identity Administration course will provide you
with a basis of how identity and access management is done in the Databricks
platform.

Prerequisites:

● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.

Learning objectives:

● Describe Databricks identities and how they apply across the platform
● Configure groups, users and service principals
● Automate user and group provisioning

Databricks Platform Administrator Fundamentals


Click here for the customer enrollment link.

Duration: 1.25 hours

Course description:
The Databricks Platform Administration Fundamentals course will provide you with a
high-level overview of the Databricks platform, targeting the specific needs of
platform administrators.

Prerequisites:

The course is primarily demo-based, but learners can follow along assuming they
have sufficient privileges. Some exercises require account administration
capabilities.

Learning objectives:

● Describe and identify the roles and responsibilities related to administering the Databricks platform
● Automate interactions with the Databricks platform

Databricks Workspace Administration and Security


Click here for the customer enrollment link.

Duration: 45 Minutes

Course description: This Databricks Workspace Administration and Security course, along with its accompanying labs, will show you how to configure the workspace using the Workspace Admin console and the SQL admin console.

Prerequisites & Requirements

● Prerequisites
○ The course is primarily demo-based, but learners can follow along
assuming they have sufficient privileges. Some exercises require
account administration, workspace administration, and/or metastore
ownership.

Learning objectives

● Configure workspace settings


● Configure workspace groups, users and service principals
● Employ access control to secure workspace assets
● Implement and advocate security best practices in your notebooks
Delta Sharing Best Practices
Click here for the customer enrollment link.

Duration: 2 hours

Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data.

While data sharing brings a wide range of opportunities for data practitioners,
performance and security concerns also become apparent. Thus, it is crucial to ensure
the security and performance of the shared data. This course will focus on the most
important best practices for data sharing with Databricks-managed Delta Sharing.

Prerequisites:

● Beginning-level knowledge of the Databricks Lakehouse platform (high-level
knowledge of the structure and benefits of the Lakehouse platform)
● Basic understanding of how Unity Catalog is integrated into the Databricks
Lakehouse Platform
● Admin account privileges, or having been granted the privileges required for
creating and managing shares
● Databricks CLI (only for configuring the IP access list)

Learning objectives:

● Describe data sharing best practices with Delta Sharing


● Configure and manage data access permissions
● Configure and manage recipient tokens
● Share and access partial data
● Enable audit logging for shared data
Deep Learning with Databricks
Click here for the customer enrollment link.

Duration: 12 hours

NOTE: This is an e-learning version of the Deep Learning with Databricks instructor-led
course. It is an on-demand recording available via the Databricks Academy and covers the
same content as the instructor-led course. For more information about what’s in the course
itself, please visit this link.

Deploy Workloads with Databricks Workflows


Click here for the customer enrollment link.

Duration: 3 hours

Course description: Moving a data pipeline to production means more than just
confirming that code and data are working as expected. By scheduling tasks with
Databricks Jobs, applications can be run automatically to keep tables in the
Lakehouse fresh. Using Databricks SQL to schedule updates to queries and
dashboards allows quick insights using the newest data. In this course, students will
be introduced to task orchestration using the Databricks Workflow Jobs UI.
Optionally, they will configure and schedule dashboards and alerts to reflect
updates to production data pipelines.
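
As a rough sketch of what task orchestration looks like when driven programmatically rather than through the UI (a minimal example against the Databricks Jobs REST API 2.1; the workspace host, token, cluster ID, notebook path, and schedule below are illustrative placeholders):

```python
# A minimal sketch of creating a scheduled job via the Databricks Jobs REST API
# (POST /api/2.1/jobs/create). Host, token, cluster ID, and notebook path are
# illustrative placeholders, not values from this course.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

job_spec = {
    "name": "nightly-refresh",
    "tasks": [
        {
            "task_key": "refresh_tables",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Repos/demo/pipelines/refresh"},
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 02:00
        "timezone_id": "UTC",
    },
}

response = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
response.raise_for_status()
print("Created job:", response.json()["job_id"])
```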

Prerequisites:

● Ability to perform basic code development tasks using the Databricks Data
Engineering & Data Science workspace (create clusters, run code in
notebooks, use basic notebook operations, import repos from git, etc)
● Ability to configure and run data pipelines using the Delta Live Tables UI
● Beginner experience defining Delta Live Tables (DLT) pipelines using PySpark
● Ingest and process data using Auto Loader and PySpark syntax
● Process Change Data Capture feeds with APPLY CHANGES INTO syntax
● Review pipeline event logs and results to troubleshoot DLT syntax
● Production experience working with data warehouses and data lakes.

Learning objectives:

● Orchestrate tasks with Databricks Workflow Jobs.


● Use Databricks SQL for on-demand queries.
● Configure and schedule dashboards and alerts to reflect updates to
production data pipelines.

Getting Started with Databricks Engineering on Databricks
Click here for the customer enrollment link.

Duration: 1.5 hours

Course description: The Databricks Data Science and Engineering Workspace (Workspace) provides a collaborative analytics platform to help data practitioners
get the most out of Databricks when it comes to data science and engineering
tasks. This course guides practitioners through fundamental Workspace concepts
and components necessary to achieve a basic development workflow.

Prerequisites & Requirements

● Prerequisites
○ Beginning-level knowledge of the Databricks Lakehouse platform
(high-level knowledge of the structure and benefits of the Lakehouse
platform)
○ Intermediate-level knowledge of Python (good understanding of the
language as well as ability to read and write code)
○ Beginning-level knowledge of SQL (ability to understand and construct
basic queries)

Learning objectives

● Describe the Databricks architecture and the services it provides.


● Navigate the Databricks Data Science and Engineering Workspace.
● Create and manage Databricks clusters for running code.
● Manage data using the Databricks File System and Delta Lake.
● Create and run Databricks Notebooks.
● Schedule non-interactive execution of Databricks Notebooks using
Databricks Jobs.
● Integrate a hosted Git service for revision control using Databricks Repos.
Get Started with Databricks for Business Leaders
Click here for the customer enrollment link.

Duration: 1.5 hours

Course description: The Databricks Data Science and Engineering Workspace (Workspace) provides a collaborative analytics platform to help data practitioners
get the most out of Databricks when it comes to data science and engineering
tasks. This course guides practitioners through fundamental Workspace concepts
and components necessary to achieve a basic development workflow.

Prerequisites & Requirements

● Prerequisites
○ Beginning-level knowledge of the Databricks Lakehouse platform
(high-level knowledge of the structure and benefits of the Lakehouse
platform)
○ Intermediate-level knowledge of Python (good understanding of the
language as well as ability to read and write code)
○ Beginning-level knowledge of SQL (ability to understand and construct
basic queries)

Learning objectives

● Describe the Databricks architecture and the services it provides.


● Navigate the Databricks Data Science and Engineering Workspace.
● Create and manage Databricks clusters for running code.
● Manage data using the Databricks File System and Delta Lake.
● Create and run Databricks Notebooks.
● Schedule non-interactive execution of Databricks Notebooks using
Databricks Jobs.
● Integrate a hosted Git service for revision control using Databricks Repos.

Get Started with Machine Learning on Databricks


Click here for the customer enrollment link.

Duration: 2 hours
Course description: This is an introductory course for machine learning
practitioners and data scientists to become familiar with the Databricks Machine
Learning (ML) support available in the Databricks Lakehouse Platform. Learners will
be introduced to how Databricks Machine Learning supports machine learning
initiatives, Feature Store, AutoML, and MLflow as part of Databricks Machine
Learning. Hands-on examples will give learners an opportunity to experience the
machine learning workflow support available in the Databricks Lakehouse Platform
from featurization to inference and model retraining.
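
As a rough sketch of the experiment-tracking step within that workflow (a minimal example using MLflow and scikit-learn, both pre-installed on Databricks ML runtimes; the dataset, parameters, and registered model name below are illustrative, not from the course labs):

```python
# A minimal sketch of tracking and registering a baseline model with MLflow.
# The dataset, parameters, and registered model name are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)

with mlflow.start_run(run_name="baseline_rf"):
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_r2", model.score(X, y))
    # Log the fitted model and register it so it can be managed in Model Registry.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="demo_diabetes_model",
    )
```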

Prerequisites & Requirements

● To ensure your success in this course, please be sure you meet the following
prerequisites:
● Beginner-level knowledge of the Databricks Lakehouse Platform.
● Intermediate-level knowledge of Python.
● Knowledge of basic Machine Learning concepts and workflows.
● Access to Databricks Machine Learning

Learning objectives

● Describe how Databricks supports Machine Learning initiatives.


● Define Databricks Machine Learning.
● Navigate the Databricks Lakehouse Platform UI interface for Databricks
Machine Learning.
● Identify the unique features of Databricks ML
● Explain the purpose of the Databricks Feature Store.
● Use Feature Store to create a feature table.
● Describe how Databricks ML supports manual and automatic model
development.
● Use AutoML to develop a baseline model.
● Define the purpose of MLflow.
● Describe how MLflow supports Machine Learning initiatives.
● Use Model Registry to manage the model lifecycle.
● Use a registered model and feature table to perform batch inference.
● Schedule a model refresh using Databricks Workflows and AutoML.

Getting Started with Data Analysis on Databricks


Click here for the customer enrollment link.
Duration: 1 hour

Course description: This course is designed to introduce Business Leaders to Databricks and the Databricks Lakehouse Platform. They will learn about the benefits the lakehouse provides to their businesses through this introductory content. This content will cover high-level, business-impacting topics of value to a business leader and will not go into technical depth on Databricks products.

Prerequisites & Requirements

● Prerequisites
○ None as this is an introductory course.

Learning objectives

● Describe Databricks and the Databricks Lakehouse Platform, its services, the
availability of ISV partnerships, and how to begin migrating to Databricks.
● Identify members of the Databricks customer account team that will be
interacting with you throughout your customer journey.
● Locate where to find further information and training on the Databricks
Lakehouse Platform.
● Describe how Databricks promotes a secure data environment that can be
easily governed and scaled.
● Explain how, as an organization using Databricks, your company will be able to
reduce its total cost of ownership of data management solutions by using
Databricks.
● Define common data science terms used by Databricks when discussing the
Databricks Lakehouse Platform.

GCP Databricks Platform Administration Fundamentals
Click here for the customer enrollment link.

Duration: 1 hour 30 minutes

Course description: This video series will provide you with a high-level overview of
the Google Cloud Platform (GCP) environment as it relates to Databricks, and it will
guide you through how to perform some fundamental tasks related to deploying and
managing Databricks workspaces in your organization through GCP.
Prerequisites:

● Beginner-level knowledge of GCP


● Beginner-level knowledge of Terraform is helpful
● Access to a GCP project is required, with the ability to add principals and
service accounts

● Account administrator capabilities in your Databricks account

Learning objectives:

● Describe and identify elements of the GCP Databricks architecture


● Create and manage workspaces and metastores, and supporting resources
● Automate administration operations

GCP Databricks Networking and Security Fundamentals
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This video series will provide you with the background needed
to customize the structure of your environment and reinforce security at the
infrastructural level.

Prerequisites:

● Beginner-level knowledge of GCP


● Knowledge of networking concepts (IPv4 addressing, DNS).
● Account administrator capabilities in your Databricks account
● Access to a GCP project, with the ability to enable APIs and create
service accounts and VPCs

Learning objectives:

● Create your own VPCs.


● Deploy workspaces into your own managed VPCs.
● Create your own customer-managed keys.
● Apply customer-managed keys to achieve different levels of encryption in
your Databricks workspaces.
GCP Databricks Cloud Integrations
Click here for the customer enrollment link.

Duration: 1 hour

Course description: This video series will provide you with the background needed
to customize the structure of your environment and reinforce security at the
infrastructural level.

Prerequisites:

● Beginner-level knowledge of GCP


● Account administrator capabilities in your Databricks account
● Access to a GCP project, with the ability to enable APIs and create
service accounts and buckets

Learning objectives:

● Create your own external GCP bucket and access data from Databricks.
● Set up a topic and subscription in Google Pub/Sub (Lite) and stream
messages from Databricks.
● Set up a data warehouse in Google BigQuery, connect it to Databricks, and
exchange data.

Generative AI Fundamentals
Click here for the customer enrollment link.

Duration: 2 hours

Course description: Welcome to Generative AI Fundamentals. This course provides an introduction to how organizations can understand and utilize generative artificial
intelligence (AI) models. First, we'll start off with a quick introduction to generative AI
- we'll discuss what it is and pay special attention to large language models, also
known as LLMs. Then, we’ll move into how organizations can find success with
generative AI - we’ll take a deeper dive into what LLM applications are, discuss how
Lakehouse AI can help you succeed, and discuss essential considerations for
adopting AI in general. Finally, we'll tackle important aspects to consider when
evaluating the potential risks and challenges associated with using/adopting
generative AI.

Prerequisites:

● None

Learning objectives:

By the end of this course, you will be able to:
● Describe how generative artificial intelligence (AI) is being used to
revolutionize practical AI applications.
● Describe how organizations can find success with generative AI applications.
● Recognize the potential legal and ethical considerations of using generative AI
applications.

Generative AI Fundamentals Accreditation


Click here for the customer enrollment link.

Duration: 10 Minutes

Course description: This is a 10-minute assessment that will test your knowledge of
fundamental concepts related to Generative AI.

After successfully completing this assessment, you will be awarded the Databricks
Generative AI Fundamentals badge.

How to Ingest Data for Databricks SQL


Click here for the customer enrollment link.

Duration: 30 Minutes

Course description: Before an analyst can analyze data, that data needs to be
ingested into the Lakehouse. This course shows three different ways to ingest data:
1. Using the Data Science & Engineering UI, 2. Using SQL, and 3. Using Partner
Connect.
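
As a rough illustration of the SQL ingestion path mentioned above (a minimal sketch assuming a target table you can create and a cloud location you can read from; the table name, columns, and source path below are illustrative placeholders):

```python
# A minimal sketch of SQL-based ingestion with COPY INTO, run from a Databricks
# notebook where `spark` is provided by the runtime. The table, columns, and
# source location are illustrative placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.sales.raw_orders (
        order_id STRING,
        amount DOUBLE,
        order_date DATE
    )
""")

spark.sql("""
    COPY INTO demo.sales.raw_orders
    FROM 's3://demo-bucket/incoming/orders/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```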

Prerequisites:

● Intermediate knowledge of Databricks SQL


● Administrator privileges in Databricks SQL

Learning objectives:

● Upload data using the Data Science & Engineering UI


● Import data using Databricks SQL
● Provide proper data access privileges to users
● Import data using Partner Connect

Introduction to Delta Sharing


Click here for the customer enrollment link.

Duration: 20 minutes

Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data. It can share collections of tables in a Unity
Catalog metastore in real time without copying them, so that data recipients can
immediately begin working with the latest version of the shared data.

This course will give an overview of Delta Sharing with an emphasis on the benefits
of Delta Sharing over other methods of data sharing and how Delta Sharing fits into
the Lakehouse architecture. Then, there will be a demonstration on how to configure
Delta Sharing on a metastore and how to enable external data sharing.

Prerequisites:

● Beginning-level knowledge of the Databricks Lakehouse platform


● Basic understanding of how Unity Catalog is integrated into the Databricks
Lakehouse Platform
● Metastore admin or account admin privileges (optional, but needed if
following along with the configuration demo)

Learning objectives:

● Describe the benefits of Delta Sharing compared to traditional data sharing systems
● Describe features and use cases of Delta Sharing
● Define key components of Delta Sharing
● Configure Delta Sharing on a metastore
● Enable external data sharing for an account

Introduction to Photon
Click here for the customer enrollment link.

Duration: 30 minutes

Course description: In this course, you’ll learn how Photon can be used to reduce
Databricks total cost of ownership (TCO) and dramatically improve query
performance. You’ll also learn best practices for when to use and not use Photon.
Finally, the course will include a demonstration of a query run with and without
Photon to show the improvement in query performance.

Prerequisites:

● Administrator privileges
● Introductory knowledge about the Databricks Lakehouse Platform (what the
Databricks Lakehouse Platform is, what it does, main components, etc.)

Learning objectives:

● Explain fundamental concepts about Photon on Databricks.


● Describe the benefits of enabling Photon on Databricks.
● Identify queries that would benefit from using Photon
● Describe the performance differences between a query run with and without
Photon enabled

Introduction to Python for Data Science and Data Engineering
Click here for the customer enrollment link.

Duration: 12 hours

NOTE: This is an e-learning version of the Introduction to Python for Data Science
and Data Engineering instructor-led course. It is an on-demand recording available
via the Databricks Academy and covers the same content as the instructor-led
course. For more information about what’s in the course itself, please visit this link.
Introduction to Time Series Forecasting with AutoML
Click here for the customer course enrollment link.

Duration: 31 minutes

Course description: Time series forecasting is an incredibly important component of any organization’s portfolio of models. But the logic, setup, and tuning of time series models
can be overwhelming. Databricks offers Time Series models in its AutoML product, which
takes much of the time and headache out of creating these models! This course will
introduce learners to the time series functionality in AutoML.
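
As a rough sketch of the API path this course covers (a minimal example assuming the `databricks.automl` module available on Databricks ML runtimes; the source table and column names below are illustrative placeholders):

```python
# A minimal sketch of launching an AutoML forecasting experiment from a notebook
# on a Databricks ML runtime. The table and column names are illustrative, and
# `spark` is the session provided by the runtime.
import databricks.automl as automl

# A DataFrame with a timestamp column and a numeric target to forecast.
sales_df = spark.table("demo.forecasting.daily_sales")

summary = automl.forecast(
    dataset=sales_df,
    target_col="sales",     # value to forecast
    time_col="date",        # timestamp column
    horizon=30,             # forecast 30 periods ahead
    frequency="d",          # daily data
    timeout_minutes=30,
)
print(summary.best_trial.model_path)
```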

Prerequisites:

● Beginning-level knowledge of and experience with AutoML.


● Beginning-level knowledge of time series modeling
Learning objectives:

● Locate key features of the AutoML Time Series UI
● Employ knowledge of the Prophet time series model to create a forecast in AutoML
via the API

Introduction to Unity Catalog


Click here for the customer course enrollment link.

Duration: 1 hour

Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through fundamental Unity Catalog
concepts and tasks.

Prerequisites:

● This course has no specific course prerequisites.

Learning objectives:

● Describe key concepts related to Unity Catalog


● Describe how Unity Catalog is integrated with the Databricks platform
● Perform a variety of tasks with Unity Catalog (managing users and groups,
creating a Unity Catalog-enabled cluster, accessing Unity Catalog through
Databricks SQL, creating and governing table assets, limiting table access with
views, and governing file access)
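
For a flavor of the hands-on tasks listed above, here is a minimal sketch run from a
Databricks notebook attached to a Unity Catalog-enabled cluster or SQL warehouse;
the catalog, schema, table, and group names are placeholders:

    # `spark` is the SparkSession preconfigured in Databricks notebooks.
    spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
    spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.sales")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo_catalog.sales.orders
        (order_id INT, amount DOUBLE)
    """)

    # Grant a group just enough privileges to query the table.
    spark.sql("GRANT USE CATALOG ON CATALOG demo_catalog TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA demo_catalog.sales TO `analysts`")
    spark.sql("GRANT SELECT ON TABLE demo_catalog.sales.orders TO `analysts`")

    # Query using the three-level namespace.
    display(spark.sql("SELECT * FROM demo_catalog.sales.orders"))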

Machine Learning in Production (V2)


Click here for the customer enrollment link.

Duration: 12 hours

NOTE: This is an e-learning version of the Machine Learning in Production
instructor-led course. It is an on-demand recording available via the Databricks
Academy and covers the same content as the instructor-led course. For more
information about what’s in the course itself, please visit this link.

Manage Data with Delta Lake


Click here for the customer enrollment link.

Duration: 3 hours

Course description: The Databricks Lakehouse Platform enables individuals
throughout an organization to collaboratively develop, productionalize, and derive
insights from data assets using a set of common tools and a unified collection of
databases. This module presents an overview of the data lakehouse and provides an
in-depth hands-on introduction to Delta Lake.

Prerequisites:

● Beginner familiarity with cloud computing concepts (virtual machines, object
storage, etc.)
● Ability to perform basic code development tasks using the Databricks Data
Engineering & Data Science workspace (create clusters, run code in
notebooks, use basic notebook operations, import repos from git, etc.)
● Beginning programming experience with Spark SQL
● Ability to extract data from a variety of file formats and data sources
● Ability to apply a number of common transformations to clean data
● Ability to reshape and manipulate complex data using advanced built-in functions

Learning objectives:
● Describe how Delta Lake transactional guarantees provide the foundation for
the data lakehouse architecture.
● Use Delta Lake DDL to create tables, compact files, restore previous table
versions, and perform garbage collection of tables in the Lakehouse.
● Use CTAS to store data derived from a query in a Delta Lake table
● Use SQL to perform complete and incremental updates to existing tables.
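
The SQL below is a minimal sketch of the Delta Lake operations named in these
objectives, runnable from a Databricks notebook where `spark` is preconfigured; the
table names and data are illustrative:

    spark.sql("CREATE TABLE IF NOT EXISTS sales_bronze (id INT, amount DOUBLE, updated DATE)")
    spark.sql("INSERT INTO sales_bronze VALUES (1, 10.0, '2023-08-01'), (2, 20.0, '2023-08-02')")

    # CTAS: store the result of a query as a new Delta table.
    spark.sql("""
        CREATE OR REPLACE TABLE sales_silver AS
        SELECT id, amount FROM sales_bronze WHERE amount > 0
    """)

    # Compact small files, inspect table history, and restore an earlier version.
    spark.sql("OPTIMIZE sales_silver")
    spark.sql("DESCRIBE HISTORY sales_silver").show(truncate=False)
    spark.sql("RESTORE TABLE sales_silver TO VERSION AS OF 0")

    # Garbage-collect data files no longer referenced by the table.
    spark.sql("VACUUM sales_silver")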

Manage Data Access with Unity Catalog


Click here for the customer enrollment link.

Duration: 3 hours

Course description: Unity Catalog is a central hub for administering and securing
your data, enabling granular access control and built-in auditing across the
Databricks platform. This course guides learners through fundamental Unity Catalog
concepts and tasks necessary to satisfy data governance requirements.

Prerequisites:

● Beginning-level knowledge of the Databricks Lakehouse platform (high-level
knowledge of the structure and benefits of the Lakehouse platform)
● Beginning-level knowledge of SQL (ability to understand and construct basic
queries)

Learning objectives:

● Describe Unity Catalog key concepts and how it integrates with the
Databricks platform
● Access Unity Catalog through clusters and SQL warehouses
● Create and govern data assets in Unity Catalog
● Adopt Databricks recommendations into your organization’s Unity
Catalog-based solutions
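
One pattern covered in courses like this is limiting what different groups can see.
The sketch below uses a dynamic view with is_account_group_member(); the catalog,
schema, column, and group names are placeholders, and granting the view via the
TABLE securable keyword is an assumption to check against the GRANT documentation:

    # `spark` is the SparkSession preconfigured in Databricks notebooks.
    spark.sql("""
        CREATE OR REPLACE VIEW demo_catalog.hr.employees_redacted AS
        SELECT
          employee_id,
          CASE WHEN is_account_group_member('hr_admins') THEN salary ELSE NULL END AS salary
        FROM demo_catalog.hr.employees
    """)

    # Assumption: the view is granted like a table-level securable here.
    spark.sql("GRANT SELECT ON TABLE demo_catalog.hr.employees_redacted TO `analysts`")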


New Capability Overview: bamboolib


Click here for the customer enrollment link.
Duration: 70 minutes

Course description: bamboolib delivers an extendable GUI that exports Python code
for fast, simple data exploration and transformation without requiring users to write
any code. The UI-based workflows help make Databricks accessible to citizen
data scientists and experts alike and reduce employee onboarding and training
costs. These no-code use cases include data exploration, data transformation, and
data visualization.

There are many benefits of using bamboolib on Databricks, both for team members who
have coding skills and for those who cannot code. Team members who have
coding skills can speed up their data exploration and transformation process by
avoiding repetitive tasks and using out-of-the-box best-practice analyses provided
by bamboolib. Those who cannot code can use bamboolib for all stages of the data
analysis process without writing code, or they can use bamboolib as a great
entry point for getting started with coding.

This course will introduce you to bamboolib, discuss its use cases, and demonstrate
how to use it on the Databricks platform. The demonstration will cover loading data
from various sources, exploring data, transforming data, and visualizing data.

Prerequisites & Requirements

● In this course, learners are expected to have:
● Intermediate-level knowledge of data analysis concepts and methods, including data
transformation, data visualization, and summary statistics
● Basic understanding of statistical methods

Learning objectives

● Describe features and use cases of bamboolib on Databricks


● Import data from static files and database tables into the bamboolib UI
● Utilize bamboolib for data exploration
● Utilize bamboolib for data transformation
● Utilize bamboolib for data visualization
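
A minimal sketch of getting started, assuming bamboolib has been installed on the
cluster (for example with %pip install bamboolib in a separate notebook cell); the
data here is a toy example:

    import bamboolib as bam   # importing bamboolib activates its notebook display hooks
    import pandas as pd

    df = pd.DataFrame({
        "city": ["Berlin", "Oslo", "Lima"],
        "population": [3_600_000, 700_000, 9_700_000],
    })
    df   # displaying a pandas DataFrame in a cell now offers the bamboolib UI
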
New Capability Overview: Data Lineage with Unity
Catalog
Click here for the customer enrollment link.

Duration: 1 hour 30 minutes

Course description: Databricks Unity Catalog, the unified governance solution for all
data and AI assets on the lakehouse, brings support for data lineage. Data lineage
includes capturing all the relevant metadata and events associated with the data in
its lifecycle, including the source of the data set, what other data sets were used to
create it, who created it and when, what transformations were performed, what
other data sets leverage it, and many other events and attributes. With a data
lineage solution, data teams get an end-to-end view of how data is transformed
and how it flows across their data estate. Lineage is supported for all languages and
is captured down to the column level. Lineage data includes notebooks, workflows,
and dashboards related to the query. Lineage can be visualized in Data Explorer in
near real-time and retrieved with the Databricks REST API.

In this course, we are going to cover the fundamentals of data lineage, show how to create
lineage-enabled clusters, and demonstrate how to capture and view lineage data.

Prerequisites:

● Basic knowledge of Databricks SQL

Learning objectives:

● Describe fundamental concepts of data lineage


● Explain common use cases and features of data lineage with Unity Catalog
● Describe the technical requirements for data lineage on Databricks
● Configure a cluster and SQL warehouse to enable data lineage
● Utilize the Data Explorer UI to explore lineage information at the table and
column level
● Manage data lineage permissions at the data and entity level
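
As a small sketch, creating one table from another on a Unity Catalog-enabled
cluster is enough for lineage to be captured; the REST retrieval shown afterwards
uses an endpoint and parameters that are assumptions to verify against the current
API reference, and the names, URL, and token are placeholders:

    # `spark` is the SparkSession preconfigured in Databricks notebooks.
    spark.sql("""
        CREATE OR REPLACE TABLE demo_catalog.sales.daily_totals AS
        SELECT order_date, SUM(amount) AS total
        FROM demo_catalog.sales.orders
        GROUP BY order_date
    """)

    # Lineage for the new table can be viewed in Data Explorer or fetched via REST.
    import requests
    resp = requests.get(
        "https://<workspace-url>/api/2.0/lineage-tracking/table-lineage",   # assumed endpoint
        headers={"Authorization": "Bearer <token>"},
        params={"table_name": "demo_catalog.sales.daily_totals", "include_entity_lineage": "true"},
    )
    print(resp.json())
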
New Capability Overview: Data Profiles in Databricks
Notebooks
Click here for the customer enrollment link.

Duration: 30 minutes

Course description: Exploratory data analysis is a key part of the repeating cycle of
exploration, development, and validation that makes up data asset development,
and establishing a baseline understanding of a data set is a crucial job that is done
as part of EDA. Databricks simplifies this process of understanding data sets,
making it possible to generate a robust profile or summary of the data set with the
push of a button. These profiles are available in notebooks in Databricks Machine Learning
and Databricks Data Science and Engineering Workspaces.

Prerequisites:

● Basic Python skills (e.g., can declare variables, distinguish between methods
and attributes).
● Ability to describe and utilize basic summary statistics (e.g., mean, standard
deviation, median).

Learning objectives:

● Explain fundamental concepts about creating Data Profiles to help the
process of EDA.
● Perform basic tasks using Data Profiles in Databricks notebooks.
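
For reference, the same profile can also be generated programmatically in a
Databricks notebook with dbutils.data.summarize(); the table name below is a
placeholder:

    # `spark` and `dbutils` are preconfigured in Databricks notebooks.
    df = spark.read.table("demo_catalog.sales.orders")
    dbutils.data.summarize(df)   # renders the data profile in the results pane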

New Capability Overview: Databricks SQL Serverless


Click here for the customer enrollment link.

Duration: 10 minutes

Course description: In this course, you will learn how Databricks SQL users can
leverage the nearly instant-on capability of Serverless to make working with Databricks SQL
faster than ever. In addition, you will learn how customer data is protected and isolated.
Finally, you will learn how customers can enable this feature.

Prerequisites & Requirements


● Prerequisites
○ Experience with Databricks SQL
○ Familiarity with the basics of how cloud compute resources are provisioned
○ Administrator access

Learning objectives

● Explain how Databricks SQL Serverless fits into the lakehouse architecture
● Describe how Databricks SQL Serverless works
● Explain how Databricks SQL Serverless keeps customer data secure and isolated
● Implement Databricks SQL Serverless on customer accounts
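
A hedged sketch of enabling this feature programmatically with the SQL Warehouses
REST API; the endpoint and field names (notably enable_serverless_compute) follow
the API as best recalled and should be verified against the current documentation,
and the URL and token are placeholders:

    import requests

    resp = requests.post(
        "https://<workspace-url>/api/2.0/sql/warehouses",
        headers={"Authorization": "Bearer <token>"},
        json={
            "name": "serverless-demo",
            "cluster_size": "Small",
            "warehouse_type": "PRO",
            "enable_serverless_compute": True,   # assumption: requests serverless compute
            "auto_stop_mins": 10,
        },
    )
    print(resp.json())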

New Capability Overview: Global Search


Click here for the customer enrollment link.

Duration: 20 minutes


New Capability Overview: Model Serving


Click here for the customer enrollment link.

Duration: 1 Hour

Course description: Databricks Model Serving is a turnkey solution that provides a highly
available and low-latency service for deploying machine learning models as scalable REST
API endpoints. This course will teach you how to deploy machine learning models using
Databricks Model Serving. You will learn how to integrate your lakehouse data with
Databricks Model Serving, how to deploy multiple models behind an endpoint, and how to
monitor Model Serving endpoints using the internal dashboard and external tools.

Prerequisites & Requirements

● Prerequisites
○ Ability to write basic machine learning code
○ Ability to train models within the Databricks platform
○ Previous experience deploying models is helpful but not required

Learning objectives

● Describe the main features of Model Serving
● Explain how Model Serving solves common ML model deployment problems
● Create and configure model endpoints for serving machine learning models
● Query model endpoints in real time for inference
● Monitor endpoint performance using the internal dashboard and external tools
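
For a sense of the real-time querying objective, here is a minimal sketch of calling
a serving endpoint over REST; the workspace URL, endpoint name, token, and feature
columns are placeholders:

    import requests

    resp = requests.post(
        "https://<workspace-url>/serving-endpoints/my-endpoint/invocations",
        headers={"Authorization": "Bearer <token>"},
        json={"dataframe_records": [{"feature_a": 1.0, "feature_b": 2.0}]},
    )
    print(resp.json())   # model predictions returned as JSON
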
New Capability Overview: ipywidgets
Click here for the customer enrollment link.

Duration: 1 hour 15 minutes

Course description: Until recently, Databricks widgets have been the only option for users
wanting to use widgets in Databricks notebooks. Now, with the integration of ipywidgets,
Databricks users have an alternative method for building interactive notebooks.

Widgets can be embedded in Databricks notebooks to provide a user-friendly
interface that can be used to collect user input and use it in the code without having to
change the code. This method can be used to create interactive data apps in notebooks.

This course will introduce you to ipywidgets, discuss use cases for when to use them, and
demonstrate how to embed various ipywidgets controls in Databricks notebooks.
Prerequisites & Requirements

● Prerequisites
○ In this course, learners are expected to have:
○ Intermediate-level Python coding skills
○ Basic knowledge of PySpark for reading and querying data
○ Basic knowledge of CSS for styling the controls

Learning objectives

● Describe ipywidgets features and use cases


● Explain the difference between ipywidgets and Databricks widgets
● List supported widget types
● List supported and unsupported ipywidgets controls on Databricks
● Utilize ipywidgets controls to create interactive notebooks
● Use layout and styling controls to build custom layouts
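
A minimal sketch of the kind of interactive notebook this course builds, runnable in
a Databricks notebook on a runtime that supports ipywidgets; the table names are
placeholders:

    import ipywidgets as widgets

    layer_picker = widgets.Dropdown(options=["bronze", "silver", "gold"], description="Layer:")

    def show_rows(layer):
        # `spark` and `display` are provided by the Databricks notebook environment.
        display(spark.sql(f"SELECT * FROM demo_catalog.sales.orders_{layer} LIMIT 10"))

    # Re-runs show_rows whenever a new layer is selected in the dropdown.
    widgets.interact(show_rows, layer=layer_picker)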

New Capability Overview: Personal Compute


Click here for the customer course enrollment link.

Duration: 30 Minutes

Course description: Databricks is no longer a tool that is only suitable for massive,
Spark-powered workloads. Personal Compute is a new, Databricks-managed default
compute policy that will appear in all customers’ workspaces.

Prerequisites:

● Ability to log in to Databricks
● Access to cluster creation

Learning objectives:

● Compare and contrast single-node Personal Compute clusters and multi-node clusters
● Create a Personal Compute cluster

New Capability Overview: VS Code Extension for Databricks
Click here for the customer course enrollment link.

Duration: 45 minutes

Course description: Integrated development environments (IDEs) are important for
Databricks users because they provide a variety of features that can increase
productivity and streamline the development process. The new Databricks
extension for VS Code allows users to develop for the Databricks Lakehouse Platform
from VS Code. This extension allows Databricks users to connect their local VS Code
environment to remote Databricks workspaces, synchronize local code in VS Code
with remote workspaces, and run Python files and Python notebooks in remote
workspaces. In this course, we are going to introduce you to this new extension and
discuss its main features. In the demo section of the course, we are going to
demonstrate how to install and configure the extension for VS Code and run locally
developed Python applications on Databricks clusters in the cloud.

Prerequisites:

● Basic knowledge of Python programming.


● Basic knowledge of Databricks cluster and notebook concepts.
● Familiarity with VS Code interface.

Learning objectives:

● Explain the main features of VS Code extension for Databricks.


● Describe the local and workspace requirements for using the extension.
● Install and configure the extension to run local Python files.
● Run Python files and Python notebooks as standalone and workflow jobs.
● Run a unit test function using custom runtime configuration.

New Capability Overview: Workspace Browser for Databricks SQL
Click here for the customer course enrollment link.

Duration: 20 minutes

Course description: Creating, managing and sharing workspace entities on the
Databricks platform is a very common task among data engineers, data scientists
and data analysts. The new Workspace Browser for DBSQL unifies browsing for
content across all three Databricks personas. With this feature, you are able to
create, browse, manage and share existing workspace content (notebooks,
experiments, etc.) and DBSQL-specific content (queries, dashboards, alerts, etc.).

This short course will introduce you to the new Workspace Browser
within Databricks SQL and demonstrate how to use this feature effectively. The first
section of the course will cover features and use cases of the new Workspace
Browser and in the second section of the course we will demonstrate how to create,
organize and share assets using this new feature.

Prerequisites:

● There are no prerequisites for this course.

Learning objectives:

● Describe the Workspace Browser features and use cases


● Migrate existing objects into the Workspace Browser
● Create, organize and share new objects using the Workspace Browser

Optimizing Apache Spark on Databricks


Click here for the customer course enrollment link.

Duration: 12 hours
Course description: In this course, students will explore five key problems that
represent the vast majority of performance problems in an Apache Spark
application: Skew, Spill, Shuffle, Storage, and Serialization. With each of these topics,
we explore coding examples based on 100 GB to 1+ TB datasets that demonstrate
how these problems are introduced, how to diagnose these problems with tools like
the Spark UI, and conclude by discussing mitigation strategies for each of these
problems.

We continue the conversation by looking at a series of key ingestion concepts that
promote strategies for processing terabytes of data, including managing
Spark partition sizes, disk partitioning, bucketing, Z-ordering, and more. With each
of these topics, we explore when and how each of these techniques should be
implemented, new challenges that productionalizing these solutions might introduce,
along with corresponding mitigation strategies.

Finally, we introduce a couple of other key topics such as issues with data locality,
I/O caching and Spark caching, pitfalls of broadcast joins, and new features like
Spark 3’s Adaptive Query Execution and Dynamic Partition Pruning. We then
conclude the course with discussions and exercises on designing and configuring
clusters for optimal performance given specific use cases, personas, the divergent
needs of various teams, and cross-team security concerns.

Prerequisites:

● Intermediate to advanced programming experience in Python or Scala


● Hands-on experience developing Apache Spark applications

Learning objectives:

● Articulate how the five most common performance problems in a Spark
application can be mitigated to achieve better application performance.
● Summarize some of the most common performance problems associated
with data ingestion and how to mitigate them.
● Articulate how new features in Spark 3.0 can be employed to mitigate
performance problems in your Spark applications.
● Configure a Spark cluster for maximum performance given specific job
requirements and while considering a multitude of other factors.
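
The snippet below is an illustrative (not course-specific) sample of mitigations in
this space, using standard Spark 3 settings and the DataFrame API; table names are
placeholders:

    from pyspark.sql.functions import broadcast

    # `spark` is the active SparkSession in a Databricks notebook.
    # Let Adaptive Query Execution coalesce shuffle partitions and split skewed ones.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

    # Avoid shuffling a large fact table when joining to a small dimension table.
    fact = spark.read.table("transactions")
    dim = spark.read.table("store_locations")
    joined = fact.join(broadcast(dim), "store_id")

    # Right-size output files by controlling the number of partitions written.
    joined.repartition(64).write.mode("overwrite").saveAsTable("transactions_enriched")
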
Scalable Machine Learning with Apache Spark (V2)
Click here for the customer course enrollment link.

Duration: 12 hours

NOTE: This is an e-learning version of the Scalable Machine Learning with Apache
Spark instructor-led course. It is an on-demand recording available via the
Databricks Academy and covers the same content as the instructor-led course. For
more information about what’s in the course itself, please visit this link.

Share Data within Databricks Using Delta Sharing


Click here for the customer course enrollment link.

Duration: 45 Minutes

Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data. With Delta Sharing, users can share
collections of tables in a Unity Catalog metastore in real time without copying them,
so that data recipients can immediately begin working with the latest version of the
shared data.

This course will focus on the data sharing process with Delta Sharing. On Databricks,
users manage data sharing processes through the UI or using Databricks SQL queries. In
this course, we will start by demonstrating how to share data using the UI and then we
will use SQL for the same process. It is highly recommended that you become familiar
with both methods.

Prerequisites:

● Beginning-level knowledge of the Databricks Lakehouse platform (high-level
knowledge of the structure and benefits of the Lakehouse platform)
● Basic understanding of how Unity Catalog is integrated into the Databricks
Lakehouse Platform
● Beginning-level knowledge of SQL
● Admin account privileges, or having been granted the required privileges

Learning objectives:

● Describe the data sharing and data access process with Delta Sharing within
Databricks
● Share data and access shared data within Databricks using the UI
● Share data and access shared data within Databricks using SQL queries
● Share and access partial data within Databricks
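
For a sense of the SQL-based workflow, here is a minimal sketch of
Databricks-to-Databricks sharing run via spark.sql from a notebook; the share,
table, recipient, provider, and sharing-identifier values are placeholders:

    # Provider side: create a share, add a table, and grant it to a recipient.
    spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
    spark.sql("ALTER SHARE sales_share ADD TABLE demo_catalog.sales.orders")
    spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org USING ID '<sharing-identifier>'")
    spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_org")

    # Recipient side (in the recipient's workspace): mount the share as a catalog and query it.
    spark.sql("CREATE CATALOG IF NOT EXISTS shared_sales USING SHARE provider_org.sales_share")
    display(spark.sql("SELECT * FROM shared_sales.sales.orders LIMIT 10"))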

Transform Data with Spark


Click here for the customer course enrollment link.

Duration: 3 hours

Course description: While the data lakehouse combines the best aspects of the
data warehouse and the data lake, users familiar with one or both of these
environments may still encounter new concepts as they move to Databricks. By the
end of these lessons, students will feel comfortable defining databases, tables, and
views in the Lakehouse, ingesting arbitrary data from a variety of sources, and
writing simple applications to drive ETL pipelines.

NOTE: This course covers both PySpark and SQL. Also, the lessons in this course are
a subset of the Data Engineering with Databricks course.

Prerequisites:

● Beginner familiarity with basic cloud concepts (virtual machines, object
storage, identity management)
● Ability to perform basic code development tasks using the Databricks Data
Engineering & Data Science workspace (create clusters, run code in
notebooks, use basic notebook operations, import repos from git, etc)
● Intermediate familiarity with basic SQL concepts (select, filter, groupby, join,
etc)
● Beginner programming experience with Python (syntax, conditions, loops,
functions)
● Beginner programming experience with the Spark DataFrame API: Configure
DataFrameReader and DataFrameWriter to read and write data, Express
query transformations using DataFrame methods and Column expressions,
Navigate the Spark documentation to identify built-in functions for various
transformations and data types.

Learning objectives:

● Extract data from a variety of file formats and data sources.


● Apply a number of common transformations to clean data.
● Reshape and manipulate complex data using advanced built-in functions.
● Leverage UDFs for reusable code and apply best practices for performance.
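
A short, illustrative PySpark sketch of the kind of ETL these objectives describe;
the source path, columns, and target table name are placeholders:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Extract: read raw JSON files (`spark` is preconfigured in Databricks notebooks).
    raw = spark.read.format("json").load("dbfs:/path/to/raw/events/")

    # Transform: clean and reshape with built-in functions where possible.
    clean = (raw
             .withColumn("event_date", F.to_date("event_ts"))
             .withColumn("tag", F.explode_outer("tags"))
             .dropDuplicates(["event_id"]))

    # UDFs cover logic the built-in functions do not, at some performance cost.
    @F.udf(StringType())
    def normalize_code(code):
        return code.strip().upper() if code else None

    # Load: write the cleaned result to a managed table.
    (clean
     .withColumn("country_code", normalize_code("country_code"))
     .write.mode("overwrite")
     .saveAsTable("events_silver"))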

Unity Catalog Patterns and Best Practices


Click here for the customer course enrollment link.

Duration: 30 Minutes

Course description: Unity Catalog is a central hub for administering and securing
your data, enabling granular access control and built-in auditing across the
Databricks platform. This course guides learners through recommended practices
and patterns for implementing data architectures centered on Unity Catalog.

Prerequisites:

● This course has no specific course prerequisites.

Learning objectives:

● Adopt Databricks recommendations into your organization’s Unity
Catalog-based solutions

What is Big Data?


Click here for the customer course enrollment link.

Duration: 1 hour

Course description: This course was created for individuals who are new to the big
data landscape and want to become conversant with big data terminology. It will
cover foundational concepts related to the big data landscape including:
characteristics of big data; the relationship between big data, artificial intelligence,
and data science; how individuals on data science teams work with big data; and
how organizations can use big data to enable better business decisions.
Prerequisites:

● Experience using a web browser

Learning objectives:

● Explain foundational concepts used to define big data.


● Explain how the characteristics of big data have changed traditional
organizational workflows for working with data.
● Summarize how individuals on data science teams work with big data on a
daily basis to drive business outcomes.
● Articulate examples of real-world use-cases for big data in businesses across
a variety of industries.
