Course Catalog
UPDATED: AUGUST 2023
Via the Databricks Academy, you can access self-paced e-learning and
instructor-led courses that help you prepare for Databricks certification exams and
focus on how to use the Databricks Lakehouse Platform for:
● Data engineering
● Machine learning and AI
● Data analysis
● Platform administration
Training Offerings
Self-paced e-learning is virtual training available 24/7 to individuals signed up for
the Databricks Academy. Databricks customers and partners are granted access to
self-paced e-learning for free. Non-Databricks customers and partners are able to
purchase a subset of the available content. Training currently consists mostly of
lectures and demos on how to use the Databricks Lakehouse Platform.
Accreditations are 30-minute quizzes available via the Databricks Academy after
completing a selection of Databricks courses or learning plans. Upon successful
completion of an accreditation, badges are issued that can be shared on social
media sites and professional networks.
Certifications are 1.5- to 2-hour exams available via our certification platform.
Upon successful completion of an exam, badges are issued that can be shared on
social media sites and professional networks and that validate your data and AI
skills on the Databricks Lakehouse Platform.
Learning paths
Learning paths are designed to help guide users to the courses most relevant to
them.
Current pathways are available for Databricks fundamentals, data analysts, data
engineers, machine learning practitioners, and Apache Spark. The credential
milestones for each step within these pathways are shown in the images below.
Below, you’ll find a breakdown of the courses required for each of these steps. We
will update these regularly, as new courses are released.
Data engineering
Click here for the customer enrollment link for this learning plan.
Machine learning
Click here for the customer enrollment link for this learning plan.
Platform administration (cloud agnostic)
Click here for the customer enrollment link for this learning plan.
Apache Spark
Click here for the customer enrollment link for this learning plan.
Databricks Academy Updates for this quarter (August - September - October)
SP = Self-paced | LP = Self-paced learning plan | ACCRED = Free accreditation exam via
Databricks Academy | CERT = Proctored certification exam via WebAssessor (for a fee)
Content published/updated
● Data Analysis with Databricks (SP - August 11)
Content retired/to-be-retired
● New Capability Overview: Databricks SQL Serverless (SP - August 25)
● New Capability Overview: Workspace Browser for Databricks SQL (SP - August 25)
● Data Visualization on Databricks SQL (SP - August 25)
● Introduction to Time Series Forecasting with AutoML (SP - August 25)
● New Capability Overview: MLflow Model Serving (SP - August 25)
● New Capability Overview: Serverless Real-time Inference (SP - August 25)
Duration: 35 Minutes
Delta Sharing is an open protocol for secure data sharing with other organizations
regardless of which computing platforms they use. Databricks provides both open
source and managed options for Delta Sharing. Databricks-managed Delta Sharing
allows data providers to share data and data recipients to access the shared data.
With Delta Sharing, users can share collections of tables in a Unity Catalog
metastore in real time without copying them, so that data recipients can
immediately begin working with the latest version of the shared data.
This course will focus on sharing data externally and accessing shared data from
external tools such as Power BI and Apache Spark. First, we will demonstrate how to
share data externally. Then, we will show how to access the shared data from
external tools.
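For orientation, the sketch below shows how a recipient outside Databricks might read a shared table using the open-source delta-sharing Python connector; the profile path and the share, schema, and table names are hypothetical placeholders.

    import delta_sharing

    # Profile file issued by the data provider (path is hypothetical)
    profile = "/dbfs/FileStore/config.share"

    # Discover the tables this recipient can access
    client = delta_sharing.SharingClient(profile)
    print(client.list_all_tables())

    # Load a shared table as a pandas DataFrame
    df = delta_sharing.load_as_pandas(f"{profile}#my_share.my_schema.my_table")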
Prerequisites:
Learning objectives:
Duration: 12 hours
Course description: In this course, students will build upon their existing knowledge
of Apache Spark, Structured Streaming, and Delta Lake to unlock the full potential of
the data lakehouse by utilizing the suite of tools provided by Databricks. This course
places a heavy emphasis on designs favoring incremental data processing, enabling
systems optimized to continuously ingest and analyze ever-growing data. By
designing workloads that leverage built-in platform optimizations, data engineers
can reduce the burden of code maintenance and on-call emergencies, and quickly
adapt production code to new demands with minimal refactoring or downtime. The
topics in this course should be mastered prior to attempting the Databricks
Certified Data Engineer Professional exam.
Prerequisites:
Learning objectives:
Duration: 12 hours
Duration: 2 hours
Course description: This 2-hour video series will walk you through three common
integrations to help build out the capabilities of your AWS Databricks applications.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: This 1-hour video series will provide you with the background
needed to customize the structure of your environment and reinforce security at
the infrastructural level.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: This 1-hour video series will provide you with a high-level
overview of the AWS environment as it relates to Databricks, and it will guide you
through how to perform some fundamental tasks related to deploying and managing
Databricks workspaces in your organization.
Prerequisites:
Learning objectives:
Duration: 40 Minutes
Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require the following
privileges:
● Azure Databricks Account administration
● Azure Cloud administration
Learning objectives:
Course description: This course focuses on providing you with a foundation for the
networking and security needs for an Azure Databricks workspace in your Azure
Cloud ecosystem. You’ll be able to explore how identity and access is managed
through Azure Active Directory and Unity Catalog. Additionally, you’ll be able to
review some foundational networking concepts and how they are applicable to the
Azure Databricks environment, such as Azure Software Defined Networks, CIDR
ranges, subnets, and VNet peering. You’ll also explore how Azure Databricks
workspaces can be secured through IP Access List, User Defined Routes, private and
service endpoints, and private DNS zones to support Data Loss Prevention
strategies.
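As one illustration of these controls, the sketch below adds an IP access list to a workspace through the Databricks REST API; the workspace URL, token, and CIDR range are hypothetical placeholders, and the feature must already be enabled for the workspace.

    import requests

    # Allow connections only from a specific CIDR range (values hypothetical)
    resp = requests.post(
        "https://<workspace>.azuredatabricks.net/api/2.0/ip-access-lists",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={
            "label": "office",
            "list_type": "ALLOW",
            "ip_addresses": ["203.0.113.0/24"],
        },
    )
    print(resp.status_code)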
Prerequisites:
● "The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require the following privileges:
● Account administration
● Cloud administration
● Additional prerequisites include a solid understanding of IPv4 addresses,
subnets, CIDR ranges, and general networking concepts."
Learning objectives:
Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require the following privileges:
● Account administration
● Cloud administration
● Additional prerequisites include a solid understanding of IPv4 addresses,
subnets, CIDR ranges, and general networking concepts.
Learning objectives:
Duration: 3 hours
Course description: Students will use Delta Live Tables with Spark SQL and Python
to define and schedule pipelines that incrementally process new data from a variety
of data sources into the Lakehouse.
Prerequisites:
Learning objectives:
● Describe how Delta Live Tables tracks data dependencies in data pipelines
● Configure and run data pipelines using the Delta Live Tables UI
● Use Python or Spark SQL to define data pipelines that ingest and process
data through multiple tables in the lakehouse using Auto Loader and Delta
Live Tables
● Use APPLY CHANGES INTO syntax to process Change Data Capture feeds
● Review event logs and data artifacts created by pipelines and troubleshoot
DLT syntax
Build Data Pipelines with Delta Live Tables and
PySpark
Click here for the customer enrollment link.
Duration: 3 hours
Course description: Students will use Delta Live Tables with Spark SQL and Python
to define and schedule pipelines that incrementally process new data from a variety
of data sources into the Lakehouse.
Prerequisites:
Learning objectives:
● Describe how Delta Live Tables tracks data dependencies in data pipelines
● Configure and run data pipelines using the Delta Live Tables UI
● Use Python or Spark SQL to define data pipelines that ingest and process
data through multiple tables in the lakehouse using Auto Loader and Delta
Live Tables
● Use APPLY CHANGES INTO syntax to process Change Data Capture feeds (see the sketch after this list)
● Review event logs and data artifacts created by pipelines and troubleshoot
DLT syntax
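The SQL APPLY CHANGES INTO syntax referenced above has a direct Python counterpart in dlt.apply_changes. Below is a minimal sketch of the kind of PySpark pipeline this course covers, assuming hypothetical source paths and column names; note that code importing dlt runs as part of a configured DLT pipeline, not interactively.

    import dlt
    from pyspark.sql.functions import col

    # Bronze: incrementally ingest raw JSON files with Auto Loader
    @dlt.table(comment="Raw orders ingested with Auto Loader")
    def orders_bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/demo/raw/orders")  # hypothetical path
        )

    # Silver: apply a Change Data Capture feed into a target table
    dlt.create_streaming_table("orders_silver")

    dlt.apply_changes(
        target="orders_silver",
        source="orders_bronze",
        keys=["order_id"],               # hypothetical key column
        sequence_by=col("ingest_time"),  # hypothetical ordering column
    )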
Certification Overview: Databricks Certified
Associate Developer for Apache Spark Exam
Click here for the customer enrollment link.
Duration: 1 hour
Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.
Prerequisites:
Learning objectives:
● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Duration: 1 hour
Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.
Prerequisites:
Learning objectives:
● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Duration: 1 hour
Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.
Prerequisites:
Learning objectives:
● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Certification Overview: Databricks Certified Data
Engineer Professional Exam
Click here for the customer enrollment link.
Duration: 1 hour
Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.
Prerequisites:
● Describe how to use and the benefits of the Databricks platform and
developer tools
● Build optimized and cleaned data processing pipelines using the Spark and
Delta Lake APIs
● Model data into a Lakehouse using knowledge of general data modeling
concepts
● Ensure data pipelines are secure, reliable, monitored, and tested before
deployment
Learning objectives:
● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Duration: 1 hour
Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.
Prerequisites:
● Use Databricks Machine Learning and its capabilities within machine learning
workflows
● Implement correct decisions in machine learning workflows
● Implement machine learning solutions at scale using Spark ML and other tools
● Understand advanced scaling characteristics of classical machine learning
models
Learning objectives:
● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Duration: 1 hour
Course description: This course will cover the format and structure of the exam,
topics assessed by the exam, example questions, and tips for exam preparation.
Prerequisites:
Learning objectives:
● Describe the learning context, format, and structure behind the exam.
● Describe the topics covered in the exam.
● Recognize the different types of questions provided on the exam.
● Identify resources that can be used to learn the material covered in the exam.
Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.
Learning objectives:
Duration: 30 Minutes
Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through creating compute
resources capable of accessing Unity Catalog.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through techniques for creating
and governing data objects in Unity Catalog.
Prerequisites:
Learning objectives:
● Describe the security model for governing data objects in Unity Catalog
● Define data access rules and manage data ownership
● Secure access to external storage
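To make the objectives above concrete, the sketch below runs a few representative Unity Catalog governance statements from a notebook; the table, group, and external location names are hypothetical.

    # Grant a group read access to a table and transfer its ownership
    spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")
    spark.sql("ALTER TABLE main.sales.orders OWNER TO `data-owners`")

    # Secure access to external storage through an external location
    spark.sql("GRANT READ FILES ON EXTERNAL LOCATION landing_zone TO `analysts`")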
Data Administration in Databricks
Click here for the customer enrollment link.
Course description: This Data Administration in Databricks course will provide you
with a foundation in how data is managed on the Databricks platform with a feature
known as Unity Catalog.
Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.
Learning objectives:
Duration: 6 hours
Duration: 12 hours
Prerequisites:
Learning objectives:
Duration: 1 Hour
Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.
Learning objectives:
Duration: 40 Minutes
Course description: This Databricks Identity Administration course will provide you
with a foundation in how identity and access management works on the Databricks
platform.
Prerequisites:
● The course is primarily demo-based, but learners can follow along assuming
they have sufficient privileges. Some exercises require account
administration, workspace administration, and/or metastore ownership.
Learning objectives:
● Describe Databricks identities and how they apply across the platform
● Configure groups, users and service principals
● Automate user and group provisioning
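As a taste of the provisioning automation covered in the last objective, the sketch below creates a user through the workspace SCIM API; the workspace URL, token, and user name are hypothetical placeholders.

    import requests

    resp = requests.post(
        "https://<workspace>.cloud.databricks.com/api/2.0/preview/scim/v2/Users",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={
            "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
            "userName": "new.user@example.com",
        },
    )
    print(resp.status_code)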
Course description:
The Databricks Platform Administration Fundamentals course will provide you with a
high-level overview of the Databricks platform, targeting the specific needs of
platform administrators.
Prerequisites:
The course is primarily demo-based, but learners can follow along assuming they
have sufficient privileges. Some exercises require account administration
capabilities.
Learning objectives:
Duration: 45 Minutes
● Prerequisites
○ The course is primarily demo-based, but learners can follow along
assuming they have sufficient privileges. Some exercises require
account administration, workspace administration, and/or metastore
ownership.
Learning objectives
Duration: 2 hours
Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data.
While data sharing brings a wide range of opportunities for data practitioners, it
also raises performance and security concerns. Thus, it is crucial to ensure the
security and performance of the shared data. This course will focus on the most
important best practices for data sharing with Databricks-managed Delta Sharing.
Prerequisites:
Learning objectives:
Duration: 12 hours
NOTE: This is an e-learning version of the Deep Learning with Databricks instructor-led
course. It is an on-demand recording available via the Databricks Academy and covers the
same content as the instructor-led course. For more information about what’s in the course
itself, please visit this link.
Duration: 3 hours
Course description: Moving a data pipeline to production means more than just
confirming that code and data are working as expected. By scheduling tasks with
Databricks Jobs, applications can be run automatically to keep tables in the
Lakehouse fresh. Using Databricks SQL to schedule updates to queries and
dashboards allows quick insights using the newest data. In this course, students will
be introduced to task orchestration using the Databricks Workflow Jobs UI.
Optionally, they will configure and schedule dashboards and alerts to reflect
updates to production data pipelines.
Prerequisites:
● Ability to perform basic code development tasks using the Databricks Data
Engineering & Data Science workspace (create clusters, run code in
notebooks, use basic notebook operations, import repos from git, etc)
● Ability to configure and run data pipelines using the Delta Live Tables UI
● Beginner experience defining Delta Live Tables (DLT) pipelines using PySpark
● Ingest and process data using Auto Loader and PySpark syntax
● Process Change Data Capture feeds with APPLY CHANGES INTO syntax
● Review pipeline event logs and results to troubleshoot DLT syntax
● Production experience working with data warehouses and data lakes.
Learning objectives:
● Prerequisites
○ Beginning-level knowledge of the Databricks Lakehouse platform
(high-level knowledge of the structure and benefits of the Lakehouse
platform)
○ Intermediate-level knowledge of Python (good understanding of the
language as well as ability to read and write code)
○ Beginning-level knowledge of SQL (ability to understand and construct
basic queries)
Learning objectives
● Prerequisites
○ Beginning-level knowledge of the Databricks Lakehouse platform
(high-level knowledge of the structure and benefits of the Lakehouse
platform)
○ Intermediate-level knowledge of Python (good understanding of the
language as well as ability to read and write code)
○ Beginning-level knowledge of SQL (ability to understand and construct
basic queries)
Learning objectives
Duration: 2 hours
Course description: This is an introductory course for machine learning
practitioners and data scientists to become familiar with the Databricks Machine
Learning (ML) support available in the Databricks Lakehouse Platform. Learners will
be introduced to how Databricks Machine Learning supports machine learning
initiatives, Feature Store, AutoML, and MLflow as part of Databricks Machine
Learning. Hands-on examples will give learners an opportunity to experience the
machine learning workflow support available in the Databricks Lakehouse Platform
from featurization to inference and model retraining.
● To ensure your success in this course, please be sure you meet the following
prerequisites:
● Beginner-level knowledge of the Databricks Lakehouse Platform.
● Intermediate-level knowledge of Python.
● Knowledge of basic Machine Learning concepts and workflows.
● Access to Databricks Machine Learning
Learning objectives
● Prerequisites
○ None as this is an introductory course.
Learning objectives
● Describe Databricks and the Databricks Lakehouse Platform, its services, the
availability of ISV partnerships, and how to begin migrating to Databricks.
● Identify members of the Databricks customer account team that will be
interacting with you throughout your customer journey.
● Locate where to find further information and training on the Databricks
Lakehouse Platform.
● Describe how Databricks promotes a secure data environment that can be
easily governed and scaled.
● Explain how, as an organization using Databricks, your company will be able to
reduce its total cost of ownership of data management solutions by using
Databricks.
● Define common data science terms used by Databricks when discussing the
Databricks Lakehouse Platform.
Course description: This video series will provide you with a high-level overview of
the Google Cloud Platform (GCP) environment as it relates to Databricks, and it will
guide you through how to perform some fundamental tasks related to deploying and
managing Databricks workspaces in your organization through GCP.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: This video series will provide you with the background needed
to customize the structure of your environment and reinforce security at the
infrastructural level.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: This video series will provide you with the background needed
to customize the structure of your environment and reinforce security at the
infrastructural level.
Prerequisites:
Learning objectives:
● Create your own external Google Cloud Storage (GCS) bucket and access data from Databricks (see the sketch after this list).
● Set up a topic and subscription in Google Pub/Sub (Lite) and stream
messages from Databricks.
● Set up a data warehouse in Google BigQuery, connect it to Databricks, and
exchange data.
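For a sense of what these integrations look like in code, the sketch below reads from an external GCS bucket and from BigQuery on a Databricks-on-GCP cluster; the bucket, project, dataset, and table names are hypothetical, and the cluster's service account is assumed to have access.

    # Read files directly from an external GCS bucket
    events = spark.read.format("json").load("gs://my-demo-bucket/events/")

    # Exchange data with BigQuery via the built-in connector
    bq_df = (
        spark.read.format("bigquery")
        .option("table", "my_project.my_dataset.my_table")
        .load()
    )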
Generative AI Fundamentals
Click here for the customer enrollment link.
Duration: 2 hours
Prerequisites:
● None
Learning objectives:
Duration: 10 Minutes
Course description: This is a 10-minute assessment that will test your knowledge of
fundamental concepts related to Generative AI.
After successfully completing this assessment, you will be awarded the Databricks
Generative AI Fundamentals badge.
Duration: 30 Minutes
Course description: Before an analyst can analyze data, that data needs to be
ingested into the Lakehouse. This course shows three different ways to ingest data:
1. Using the Data Science & Engineering UI, 2. Using SQL, and 3. Using Partner
Connect.
Prerequisites:
Learning objectives:
Duration: 20 minutes
Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data. It can share collections of tables in a Unity
Catalog metastore in real time without copying them, so that data recipients can
immediately begin working with the latest version of the shared data.
This course will give an overview of Delta Sharing with an emphasis on the benefits
of Delta Sharing over other methods of data sharing and how Delta Sharing fits into
the Lakehouse architecture. Then, there will be a demonstration on how to configure
Delta Sharing on a metastore and how to enable external data sharing.
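The configuration steps demonstrated in the course boil down to a handful of SQL statements; a minimal sketch follows, assuming hypothetical share, table, and recipient names and metastore-admin privileges.

    # Create a share, add a table to it, and grant a recipient access
    spark.sql("CREATE SHARE IF NOT EXISTS quarterly_metrics")
    spark.sql("ALTER SHARE quarterly_metrics ADD TABLE main.finance.kpis")
    spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org")
    spark.sql("GRANT SELECT ON SHARE quarterly_metrics TO RECIPIENT partner_org")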
Prerequisites:
Learning objectives:
Introduction to Photon
Click here for the customer enrollment link.
Duration: 30 minutes
Course description: In this course, you’ll learn how Photon can be used to reduce
Databricks total cost of ownership (TCO) and dramatically improve query
performance. You’ll also learn best practices for when to use and not use Photon.
Finally, the course will include a demonstration of a query run with and without
Photon to show improvement in query performance.
Prerequisites:
● Administrator privileges
● Introductory knowledge about the Databricks Lakehouse Platform (what the
Databricks Lakehouse Platform is, what it does, main components, etc.)
Learning objectives:
Duration: 12 hours
NOTE: This is an e-learning version of the Introduction to Python for Data Science
and Data Engineering instructor-led course. It is an on-demand recording available
via the Databricks Academy and covers the same content as the instructor-led
course. For more information about what’s in the course itself, please visit this link.
Introduction to Time Series Forecasting with AutoML
Click here for the customer course enrollment link.
Duration: 31 minutes
Prerequisites:
Duration: 1 hour
Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through fundamental Unity Catalog
concepts and tasks.
Prerequisites:
Learning objectives:
Duration: 12 hours
Duration: 3 hours
Prerequisites:
Learning objectives:
● Describe how Delta Lake transactional guarantees provide the foundation for
the data lakehouse architecture.
● Use Delta Lake DDL to create tables, compact files, restore previous table
versions, and perform garbage collection of tables in the Lakehouse.
● Use CTAS to store data derived from a query in a Delta Lake table
● Use SQL to perform complete and incremental updates to existing tables.
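A minimal sketch of these operations, using hypothetical table names, might look as follows.

    # CTAS: store the result of a query as a Delta table
    spark.sql("""
        CREATE TABLE sales_summary AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    """)

    # Incremental update with MERGE
    spark.sql("""
        MERGE INTO sales_summary t
        USING sales_updates u ON t.region = u.region
        WHEN MATCHED THEN UPDATE SET t.total = u.total
        WHEN NOT MATCHED THEN INSERT (region, total) VALUES (u.region, u.total)
    """)

    # Compact small files, restore an earlier version, garbage-collect old files
    spark.sql("OPTIMIZE sales_summary")
    spark.sql("RESTORE TABLE sales_summary TO VERSION AS OF 1")
    spark.sql("VACUUM sales_summary")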
Duration: 3 hours
Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through fundamental Unity Catalog
concepts and tasks necessary to satisfy data governance requirements.
Prerequisites:
Learning objectives:
● Describe Unity Catalog key concepts and how it integrates with the
Databricks platform
● Access Unity Catalog through clusters and SQL warehouses
● Create and govern data assets in Unity Catalog
● Adopt Databricks recommendations into your organization’s Unity Catalog
based solutions
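A short sketch of Unity Catalog's three-level namespace in action follows; the catalog, schema, table, and group names are hypothetical.

    # catalog.schema.table: create the hierarchy and grant access
    spark.sql("CREATE CATALOG IF NOT EXISTS demo")
    spark.sql("CREATE SCHEMA IF NOT EXISTS demo.sales")
    spark.sql("CREATE TABLE IF NOT EXISTS demo.sales.orders (id INT, amount DOUBLE)")
    spark.sql("GRANT USE CATALOG ON CATALOG demo TO `analysts`")

    # Reference the table by its fully qualified name
    df = spark.table("demo.sales.orders")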
Course description: bamboolib delivers an extendable GUI that exports Python code,
enabling fast, simple data exploration and transformation without requiring users to
write any code. The UI-based workflows help make Databricks accessible to citizen
data scientists and experts alike and reduce employee onboarding and training
costs. These no-code use cases include data exploration, data transformation, and
data visualization.
There are many benefits to using bamboolib on Databricks for team members who
can code and for those who cannot. Team members with coding skills can speed up
their data exploration and transformation by avoiding repetitive tasks and using the
out-of-the-box, best-practice analyses provided by bamboolib. Those who cannot
code can use bamboolib for all stages of the data analysis process without writing
code, or they may use it as a great entry point for learning to code.
This course will introduce you to bamboolib, discuss its use cases, and demonstrate
how to use it on the Databricks platform. The demonstration will cover loading data
from various sources, exploring data, transforming data, and visualizing data.
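Getting started is as simple as the sketch below; after importing bamboolib, displaying a pandas DataFrame opens the interactive UI in the notebook (the file path is a hypothetical placeholder).

    import bamboolib as bam
    import pandas as pd

    # Displaying the DataFrame opens the bamboolib UI for no-code exploration
    df = pd.read_csv("/dbfs/FileStore/sales.csv")
    df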
Learning objectives
Course description: Databricks Unity Catalog, the unified governance solution for all
data and AI assets in the lakehouse, brings support for data lineage. Data lineage
includes capturing all the relevant metadata and events associated with the data in
its lifecycle, including the source of the data set, what other data sets were used to
create it, who created it and when, what transformations were performed, what
other data sets leverage it, and many other events and attributes. With a data
lineage solution, data teams get an end-to-end view of how data is transformed
and how it flows across their data estate. Lineage is supported for all languages and
is captured down to the column level. Lineage data includes notebooks, workflows,
and dashboards related to the query. Lineage can be visualized in Data Explorer in
near real-time and retrieved with the Databricks REST API.
In this course, we are going to cover the fundamentals of data lineage, how to create
lineage enabled clusters, and demonstrate how to capture and view lineage data.
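Because lineage is captured automatically on Unity Catalog-enabled compute, even a simple downstream query produces table- and column-level lineage viewable in Data Explorer; a sketch with hypothetical table names follows.

    # Creating a table from an upstream table records lineage between them
    spark.sql("""
        CREATE TABLE demo.sales.daily_totals AS
        SELECT order_date, SUM(amount) AS total
        FROM demo.sales.orders
        GROUP BY order_date
    """)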
Prerequisites:
Learning objectives:
Duration: 30 minutes
Course description: Exploratory data analysis is a key part of the repeating cycle of
exploration, development, and validation that makes up data asset development,
and establishing a baseline understanding of a data set is a crucial job that is done
as part of EDA. Databricks simplifies this process of understanding data sets,
making it possible to generate a robust profile or summary of the data set with the
push of a button. These are available in notebooks in Databricks Machine Learning
and Databricks Data Science and Engineering Workspaces.
Prerequisites:
● Basic Python skills (e.g., can declare variables, distinguish between methods
and attributes).
● Ability to describe and utilize basic summary statistics (e.g., mean, standard
deviation, median).
Learning objectives:
Duration: 10 minutes
Course description: In this course, you will learn about how Databricks SQL users can
leverage the nearly instant-on capability of Serverless to make working with Databricks SQL
faster than ever. In addition, you will learn how customer data is protected and isolated.
Finally, you will learn how customers can enable this feature.
Learning objectives
● Explain how Databricks SQL Serverless fits into the lakehouse architecture
● Describe how Databricks SQL Serverless works
● Explain how Databricks SQL Serverless keeps customer data secure and isolated
● Implement Databricks SQL Serverless on customer accounts
Duration: 20 minutes
Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data. It can share collections of tables in a Unity
Catalog metastore in real time without copying them, so that data recipients can
immediately begin working with the latest version of the shared data.
This course will give an overview of Delta Sharing with an emphasis on the benefits
of Delta Sharing over other methods of data sharing and how Delta Sharing fits into
the Lakehouse architecture. Then, there will be a demonstration on how to configure
Delta Sharing on a metastore and how to enable external data sharing.
Prerequisites:
Duration: 1 Hour
Course description: Databricks Model Serving is a turnkey solution that provides a highly
available and low-latency service for deploying machine learning models as scalable REST
API endpoints. This course will teach you how to deploy machine learning models using
Databricks Model Serving. You will learn how to integrate your lakehouse data with
Databricks Model Serving, how to deploy multiple models behind an endpoint, and
how to monitor Model Serving endpoints using the built-in dashboard and external tools.
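As a preview of the deployment workflow, the sketch below queries a served model's REST endpoint; the workspace URL, token, endpoint name, and feature names are hypothetical placeholders.

    import requests

    workspace_url = "https://<workspace>.cloud.databricks.com"
    endpoint = f"{workspace_url}/serving-endpoints/my-model/invocations"

    # Send one record for scoring and print the prediction
    response = requests.post(
        endpoint,
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={"dataframe_records": [{"feature_a": 1.0, "feature_b": 2.0}]},
    )
    print(response.json())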
● Prerequisites
○ Ability to write basic machine learning code
○ Ability to train models within the Databricks platform
○ Previous experience deploying models is helpful but not
required
Learning objectives
Until recently, Databricks widgets have been the only option for users wanting to use
widgets in Databricks notebooks. Now, with the integration of ipywidgets, Databricks users
have an alternative method for building interactive notebooks.
This course will introduce you to ipywidgets, discuss use cases for when to use them, and
demonstrate how to embed various ipywidgets controls in Databricks notebooks.
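A minimal sketch of the pattern the course demonstrates appears below: a slider control whose value changes are observed and echoed to an output area.

    import ipywidgets as widgets
    from IPython.display import display

    slider = widgets.IntSlider(min=0, max=100, value=50, description="Threshold")
    output = widgets.Output()

    # React to slider changes by printing the new value
    def on_change(change):
        with output:
            output.clear_output()
            print(f"Current threshold: {change['new']}")

    slider.observe(on_change, names="value")
    display(slider, output)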
Prerequisites & Requirements
● Prerequisites
○ In this course, learners are expected to have:
○ Intermediate-level Python coding skills
○ Basic knowledge of PySpark for reading and querying data
○ Basic knowledge of CSS for styling the controls
Learning objectives
Duration: 30 Minutes
Course description: Databricks is no longer a tool suitable only for massive,
Spark-powered workloads. Personal Compute is a new, Databricks-managed default
compute policy that will appear in all customers’ workspaces.
Prerequisites:
● Ability to log in to Databricks
● Access to cluster creation
Learning objectives:
Prerequisites:
Learning objectives:
Duration: 20 minutes
This short course will introduce you to the new Workspace Browser within
Databricks SQL and demonstrate how to use this feature effectively. The first
section of the course covers the features and use cases of the new Workspace
Browser; in the second section, we demonstrate how to create, organize, and share
assets using this new feature.
Prerequisites:
Learning objectives:
Duration: 12 hours
Course description: In this course, students will explore five key problems that
represent the vast majority of performance problems in an Apache Spark
application: Skew, Spill, Shuffle, Storage, and Serialization. With each of these topics,
we explore coding examples based on 100 GB to 1+ TB datasets that demonstrate
how these problems are introduced, how to diagnose these problems with tools like
the Spark UI, and conclude by discussing mitigation strategies for each of these
problems.
Finally, we introduce a couple of other key topics such as issues with Data Locality,
IO-Caching and Spark-Caching, Pitfalls of Broadcast Joins, and new features like
Spark 3’s Adaptive Query Execution and Dynamic Partition Pruning. We then
conclude the course with discussions and exercises on designing and configuring
clusters for optimal performance given specific use cases, personas, the divergent
needs of various teams, and cross-team security concerns.
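Several of the mitigations discussed are single configuration switches in Spark 3; the sketch below enables Adaptive Query Execution and its skew-join handling and checks that dynamic partition pruning is active (settings shown with their standard Spark names).

    # Enable AQE and its automatic skew-join mitigation
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

    # Dynamic partition pruning is on by default in Spark 3; verify:
    print(spark.conf.get("spark.sql.optimizer.dynamicPartitionPruning.enabled"))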
Prerequisites:
Learning objectives:
Duration: 12 hours
NOTE: This is an e-learning version of the Scalable Machine Learning with Apache
Spark instructor-led course. It is an on-demand recording available via the
Databricks Academy and covers the same content as the instructor-led course. For
more information about what’s in the course itself, please visit this link.
Duration: 45 Minutes
Course description: Delta Sharing is an open protocol for secure data sharing with
other organizations regardless of which computing platforms they use. Databricks
provides both open source and managed options for Delta Sharing.
Databricks-managed Delta Sharing allows data providers to share data and data
recipients to access the shared data. With Delta Sharing, users can share
collections of tables in a Unity Catalog metastore in real time without copying them,
so that data recipients can immediately begin working with the latest version of the
shared data.
This course will focus on the data sharing process with Delta Sharing. On Databricks,
users manage data sharing through the UI or with Databricks SQL queries. In this
course, we will start by demonstrating how to share data using the UI and then
perform the same process using SQL. It is highly recommended that you become
familiar with both methods.
Prerequisites:
Learning objectives:
● Describe the data sharing and data access process with Delta Sharing within
Databricks
● Share data and access shared data within Databricks using the UI
● Share data and access shared data within Databricks using SQL queries
● Share and access partial data within Databricks
Duration: 3 hours
Course description: While the data lakehouse combines the best aspects of the
data warehouse and the data lake, users familiar with one or both of these
environments may still encounter new concepts as they move to Databricks. By the
end of these lessons, students will feel comfortable defining databases, tables, and
views in the Lakehouse, ingesting arbitrary data from a variety of sources, and
writing simple applications to drive ETL pipelines.
NOTE: This course covers both PySpark and SQL. Also, the lessons in this course are
a subset of the Data Engineering with Databricks course.
Prerequisites:
Learning objectives:
Duration: 30 Minutes
Course description: Unity Catalog is a central hub for administering and securing
your data which enables granular access control and built-in auditing across the
Databricks platform. This course guides learners through recommended practices
and patterns for implementing data architectures centered on Unity Catalog.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: This course was created for individuals who are new to the big
data landscape and want to become conversant with big data terminology. It will
cover foundational concepts related to the big data landscape including:
characteristics of big data; the relationship between big data, artificial intelligence,
and data science; how individuals on data science teams work with big data; and
how organizations can use big data to enable better business decisions.
Prerequisites:
Learning objectives: