APJ Elevate - Databricks Certification Exam Overview Training Data Analyst Associate

Databricks Certification Exam Overview Training
Databricks Certified Data Analyst Associate

Dave Harris

©2022 Databricks Inc. — All rights reserved 1


HOUSEKEEPING
● This session will be recorded and the slides and notebooks will be shared by
the end of this week
● Please use the Q&A chat function to ask our team any questions during the
training. We will do our best to answer them in the chat or out loud. If we are
unable to answer your questions during the session, your Databricks account
team will follow up with you
● Find all important links under the Related Content widget including the links to
the course in Databricks Academy, the notebooks in GitHub, our survey and
more!
● Feel free to adjust the widgets on your audience dashboard
● Refresh your page if you have any issues
● Enjoy this training :) and don’t forget to complete our survey!



THE CERTIFICATION VOUCHER
Complete the Fundamentals of the Databricks Lakehouse Platform Accreditation by August
12 and provide your proof of completion on our feedback survey, and you will receive 1 voucher to
take any Databricks certification exam before October 31, 2022. You can expect to receive
your voucher by August 19.
• This is a 20-minute assessment that tests your knowledge of fundamental concepts
of the Databricks Lakehouse Platform. This accreditation is the first step in all of the
Databricks Academy learning plans for data analysts, machine learning practitioners, data
engineers, and platform administrators. Business leaders are also welcome to take this
assessment.



Meet your instructor!



Agenda
Topics addressed in this session

1 Introduction and Overview: Goals
2 Why Databricks certification?: Benefits of certification
3 Overview of concepts: Key concepts tested on exam
4 Certification Exam Logistics: Information about the exam
5 Q&A: …your last chance to ask anything :)
Introduction and Overview
Training series goals


Before we get started… who is this for?
• Data analyst
• Beginner-level (Associate) certification
• Assess candidates at a level equivalent to six months of experience with data analysis on Databricks SQL


Associate Data Analyst Expectations
Therefore, the following is expected of an associate-level data analyst:

• Describe Databricks SQL and its capabilities
• Manage data with Databricks tools and best practices
• Use Structured Query Language (SQL) to complete tasks in the Lakehouse
• Create production-grade data visualizations and dashboards
• Develop analytics applications to solve common data analytics problems


What this training will not do
• Prepare you 100% to take the exam
• Provide answers to exam questions.



Why Databricks certification?
Benefits of certification


Stand out from the crowd
Achieve personal development goals and become more competitive

1 Validate your skills with a respected certification exam in the big data and AI space
2 Increase your efficiency by learning the things you need to be successful on Databricks
3 Turn recruiter and future employer heads by proudly displaying your certification
4 Improve stakeholder and peer perceptions when it comes to reputation, credibility, and confidence


Certification Overview
Databricks Certified Data Analyst Associate

Databricks Certification Program


Data Analyst

DATABRICKS LAKEHOUSE FUNDAMENTALS ACCREDITATION
• What is the Databricks Lakehouse Platform? (SP)
• What is Databricks SQL? (SP)
• What is Databricks Data Science & Engineering Workspace? (SP)
• What is Databricks Machine Learning? (SP)
• Databricks Lakehouse Fundamentals Accreditation (SP)

DATA ANALYST ASSOCIATE CERTIFICATION
• Data Analysis with Databricks SQL (ILT/SP)
• Certification Overview Course for the Databricks Certified Data Analyst Associate (SP coming 22-July-2022)
• Databricks Certified Data Analyst Associate Exam (for $200 USD)

SP = self-paced training (free for Databricks customers, available in Data Analyst Learning plan via Databricks Academy)
ILT = instructor-led training course (available for a fee)


Course Objectives
1 Describe the learning context, format, and structure behind the exam.

2 Describe the topics covered in the exam.

3 Recognize the different types of questions provided on the exam.

4 Identify resources to learn the material covered in the exam.



Certification Overview
Databricks Certified Data Analyst Associate Exam

Lesson 1: Certification Exam Overview
Lesson 2: Certification Exam Topics
Lesson 3: Certification Exam Questions
Lesson 4: Certification Exam Preparation


Certification Exam Overview


Describe the learning context, format, and structure behind the exam.


Audience, Expectations, and Scope


Target Audience
• Data Analyst
• Comprehensive, practitioner certification
• Assess candidates at a level equivalent to six months of experience with Databricks SQL


Data Analyst Associate Expectations
Therefore, the following is expected of an associate-level data analyst:

• Describe Databricks SQL and its capabilities
• Manage data with Databricks tools and best practices
• Use Structured Query Language (SQL) to complete tasks in the Lakehouse
• Create production-grade data visualizations and dashboards
• Develop analytics applications to solve common data analytics problems


Out-of-scope
And the following is not expected of an associate-level data analyst:

• Spark SQL
• SQL in the Data Science and Engineering Workspace
• Databricks SQL Administration
• Use of any third-party BI tools



Up-to-date Exam Details
You can always find up-to-date exam details on the exam’s webpage:

https://databricks.com/learn/certification/data-analyst-associate


Key Concepts



Question Distribution

Databricks SQL – 22% (10/45)
Data Management – 20% (9/45)
SQL – 29% (13/45)
Data Visualization and Dashboards – 18% (8/45)
Analytics Applications – 11% (5/45)
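
As a quick arithmetic check, the percentages above follow from the question counts out of 45. The following is a minimal sketch in Databricks SQL that uses an inline VALUES list, so no table is needed. The alias names `t`, `topic`, `questions`, and `pct_of_exam` are illustrative and not part of the exam material.

```sql
-- Recompute each topic's share of the 45 exam questions.
SELECT
  topic,
  questions,
  ROUND(100 * questions / SUM(questions) OVER ()) AS pct_of_exam
FROM VALUES
  ('Databricks SQL', 10),
  ('Data Management', 9),
  ('SQL', 13),
  ('Data Visualization and Dashboards', 8),
  ('Analytics Applications', 5) AS t(topic, questions)
ORDER BY questions DESC;
```

The rounded shares should line up with the slide: 29, 22, 20, 18, and 11 percent.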



Lakehouse Platform
Simple, open, collaborative

[Diagram, top to bottom]
Workloads: Data Engineering; BI & SQL Analytics; Real-time Data Applications; Data Science & Machine Learning
Data Management & Governance
Open Data Lake
Data types: Structured, Semi-structured, Unstructured, Streaming


Unify your data ecosystem with open source, standards, and formats

[Diagram: the Lakehouse Platform surrounded by partner categories]
Categories shown: Business Intelligence; Visual ETL & Data Ingestion; Machine Learning; Data Providers; Centralized Governance; Top Consulting & SI Partners
Products named: Azure Data Factory, Azure Synapse, Google BigQuery, Amazon Redshift, Amazon SageMaker, Azure Machine Learning, Google AI Platform, AWS Glue

450+ partners across the data landscape


Databricks SQL



Databricks SQL
Delivering analytics on the freshest data with data warehouse performance and data lake economics

■ Better price / performance than other cloud data warehouses
■ Simplify discovery and sharing of new insights
■ Connect to familiar BI tools, like Tableau or Power BI
■ Simplified administration and governance


Better price / performance
Run SQL queries on your lakehouse and analyze your freshest data with up to 6x better price/performance than traditional cloud data warehouses.

Source: Performance Benchmark with Barcelona Supercomputing Center


Better together | Broad integration with BI tools
Connect your preferred BI tools with optimized connectors that provide fast performance, low latency, and high user concurrency to your data lake.


Why use Databricks SQL?


A new home for data analysts
Enable data analysts to quickly perform ad-hoc and exploratory data analysis with a new SQL query editor, visualizations, and dashboards. Automatic alerts can be triggered for critical changes, allowing analysts to respond to business needs faster.


Simple administration and governance
Quickly set up SQL/BI-optimized compute with SQL warehouses. Databricks automatically determines instance types and configuration for the best price/performance. Then, easily manage usage and perform quick auditing and troubleshooting with query history.


Use Cases

Connect existing BI tools to one source of truth for all your data
Maximize existing investments by connecting your preferred BI tools to your data lake with Databricks SQL Warehouses. Re-engineered and optimized connectors ensure fast performance, low latency, and high user concurrency to your data lake. Now analysts can use the best tool for the job on one single source of truth for your data while minimizing additional ETL and data silos.

Collaboratively explore the latest and freshest data
Respond to business needs faster with a self-served experience designed for every analyst in your organization. Databricks SQL provides simple and secure access to data, the ability to create or reuse SQL queries to analyze the data that sits directly on your data lake, and a way to quickly mock up and iterate on visualizations and dashboards that best fit the business.

Build data-enhanced applications
Build rich and custom data-enhanced applications for your own organization or your customers. Benefit from the ease of connectivity, management, and better price / performance of Databricks SQL to simplify development of data-enhanced applications at scale, all served from your data lake.


01-2 – DEMO: NAVIGATING DATABRICKS SQL


Unity Catalog on Databricks SQL


Lesson goals

1 Describe the object model in Unity Catalog.

2 Write queries using three-level namespace notation.



Unity Catalog Object Model


Metastore

• Stores data assets
• Permissions
• Created with default storage location (external object store)
• Metastore Admin


Catalog

• First level of organization
• Users can see all catalogs where USAGE is granted


Schema

• Also known as a database
• Second level of organization
• Users can see all schemas where USAGE is granted on both the schema and the catalog
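
The visibility rules above can be sketched with GRANT statements. This is an illustration only: the group name `analysts` is hypothetical, `main.default.department` is borrowed from the namespace example in this deck, and USAGE is the privilege name as given on these slides, so check the current documentation for the privilege names in your workspace.

```sql
-- Let the (hypothetical) analysts group see the catalog...
GRANT USAGE ON CATALOG main TO `analysts`;

-- ...and a schema inside it. Per the slide, a schema is visible only when
-- USAGE is granted on both the schema and its parent catalog.
GRANT USAGE ON SCHEMA main.default TO `analysts`;

-- USAGE by itself does not allow reading data; SELECT is granted separately.
GRANT SELECT ON TABLE main.default.department TO `analysts`;
```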



Managed Table

• Third level of organization
• Supported format: Delta
• Data is written to a new directory in the metastore’s default location
• Created using a CREATE TABLE statement with no LOCATION clause
• Example:
CREATE TABLE table1 …
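
A fuller version of the statement above might look like the following sketch. The column definitions and the `main.default` prefix are assumptions added for illustration; the slide elides them.

```sql
-- No LOCATION clause, so this is a managed table. Its data files are
-- written under the metastore's default storage location.
CREATE TABLE IF NOT EXISTS main.default.table1 (
  id   INT,
  name STRING
);

-- Inspect the table's metadata, including whether it is managed or external.
DESCRIBE TABLE EXTENDED main.default.table1;
```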



External Table

• Third level of organization
• Data stored in a location outside the managed storage location
• DROP TABLE does not delete data
• Can easily clone a table to a new schema or table name without moving data
• Supported formats: Delta, CSV, JSON, Avro, Parquet, ORC, text


Creating External Tables

• Two credential types: Storage Credential or External Location
• Use the LOCATION clause
• Example using External Location only:
CREATE TABLE table2
LOCATION 's3://<bucket_path>/<table_directory>'
...

• Example using Storage Credential:
CREATE TABLE table2
LOCATION 's3://<bucket_path>/<table_directory>'
...
WITH CREDENTIAL <credential-name>;
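
To connect this with the External Table slide, here is a minimal sketch of an external table's life cycle. The column list and the `main.default` prefix are assumptions for illustration, and the path placeholders are kept exactly as they appear on the slide.

```sql
-- External table: the LOCATION clause points at storage outside the
-- managed storage location.
CREATE TABLE main.default.table2 (
  id   INT,
  name STRING
)
LOCATION 's3://<bucket_path>/<table_directory>';

-- Removes the table definition from the metastore only. As the External
-- Table slide states, the underlying data files are not deleted.
DROP TABLE main.default.table2;
```

Because the files stay in place, the same location can later be registered again under a new table or schema name without moving data.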



View

• Third level of organization
• Can be composed from tables and views in multiple schemas or catalogs
• Created using CREATE VIEW:

CREATE VIEW view1 AS
SELECT column1, column2
FROM table1 ...
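
As a sketch of a view composed from more than one schema, consider the statement below. The table `main.hr.employee`, its columns, and the view name are hypothetical; `main.default.department` follows the column names used in the namespace example in this deck.

```sql
-- A view that reads from tables in two different schemas of one catalog.
CREATE VIEW main.default.employee_departments AS
SELECT e.id, e.name, d.deptname
FROM main.hr.employee AS e
INNER JOIN main.default.department AS d
  ON e.deptno = d.deptcode;
```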



Three-Level Namespace Notation

• Data objects are specified with up to three elements, depending on the granularity required: catalog, schema, and table (catalog.schema.table)
• Example:
CREATE TABLE main.default.department
(
  deptcode INT,
  deptname STRING,
  location STRING
);
• Or, with a USE statement:
USE main.default;
SELECT * FROM department;
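
The different levels of granularity can be sketched as follows, assuming the `main.default.department` table from the example above exists. `current_catalog()` and `current_database()` are built-in functions that report what unqualified names resolve against.

```sql
-- Three-level name: catalog.schema.table
SELECT * FROM main.default.department;

-- Set the default catalog, then use a two-level name: schema.table
USE CATALOG main;
SELECT * FROM default.department;

-- Check the current defaults for the session.
SELECT current_catalog(), current_database();
```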



Suggestions for Best Practices

• Use Delta as the data format when creating tables
• If possible, use external tables


01-4 – DEMO: SCHEMAS, TABLES, AND VIEWS ON DATABRICKS SQL


Ingesting Data for Databricks SQL


Lesson goals

1 Describe how to connect Databricks SQL to an object store

2 Explain how Partner Connect can be used to ingest data

3 Provide proper data access privileges to users



Ingesting Existing Data

• Databricks SQL can ingest Parquet, JSON, CSV, Delta, and more
• Individual file
• Full directory of files of a single type
• Example (Azure Databricks):
CREATE TABLE table1 LOCATION
'wasbs://[container]@[account].blob.core.windows.net/[path/]'
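
For non-Delta files, the file format usually has to be stated. The sketch below registers a directory of CSV files as a table. The table name `sales_raw`, the reader options, and the `<cloud-storage-path>` placeholder are assumptions for illustration, not values from the course.

```sql
-- Register every CSV file in a directory as one table.
-- header and inferSchema are CSV reader options; adjust them to your files.
CREATE TABLE sales_raw
USING CSV
OPTIONS (header = 'true', inferSchema = 'true')
LOCATION '<cloud-storage-path>';

-- Sanity-check the result.
SELECT * FROM sales_raw LIMIT 10;
```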



Partner Connect

• Connect to Databricks partners
• Data ingestion, preparation, BI, and visualization tools
• Data ingestion: Fivetran, Rivery
• Click Partner Connect in the sidebar menu to get started
• More detail in the Databricks Academy course How to Ingest Data for Databricks SQL


GRANT and REVOKE

• Databricks SQL supports standard GRANT and REVOKE statements in SQL
• Permission types include CREATE, MODIFY, SELECT, USAGE, and more.
• Permissions can be granted to users, groups, or both
• Can also grant all permissions
• Example:
GRANT ALL PRIVILEGES ON TABLE table1 TO finance;
• Revoke privileges in the same way
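
A short sketch of the grant, review, and revoke cycle, reusing `table1` and the `finance` group from the example above:

```sql
-- Grant a narrower privilege than ALL PRIVILEGES.
GRANT SELECT ON TABLE table1 TO finance;

-- Review what has been granted on the table.
SHOW GRANTS ON TABLE table1;

-- REVOKE mirrors GRANT, with FROM in place of TO.
REVOKE ALL PRIVILEGES ON TABLE table1 FROM finance;
```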
Data Explorer

• A UI tool for working with database entities
• Grant and revoke permissions, view schema details, preview sample data, and see table details and properties
• Click “Data” in the sidebar menu to access the Data Explorer


02-1 – DEMO: INGESTING DATA


Joins



Join

• Combine rows from two relations based on a join criterion
• Relations are tables, views, and more
• Many join types: INNER, LEFT, RIGHT, FULL, SEMI, ANTI, and CROSS
• The join criterion is a Boolean expression that specifies how the relations will be joined


Join

• Example:
SELECT id, name, deptname
FROM employee
INNER JOIN department ON employee.deptno = department.deptno;


INNER JOIN
SELECT name, f_color FROM table1 INNER JOIN table2 ON table1.id = table2.pid;

LEFT JOIN
SELECT name, f_color FROM table1 LEFT OUTER JOIN table2 ON table1.id = table2.pid;

RIGHT JOIN
SELECT name, f_color FROM table1 RIGHT OUTER JOIN table2 ON table1.id = table2.pid;

FULL JOIN
SELECT name, f_color FROM table1 FULL OUTER JOIN table2 ON table1.id = table2.pid;

SEMI JOIN
SELECT name FROM table1 LEFT SEMI JOIN table2 ON table1.id = table2.pid;

ANTI JOIN
SELECT name FROM table1 LEFT ANTI JOIN table2 ON table1.id = table2.pid;


CROSS JOIN
SELECT name, f_color FROM table1 CROSS JOIN table2;
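
To see how the join types differ, it can help to run them against a small data set. The sketch below assumes hypothetical rows for `table1` (id, name) and `table2` (pid, f_color), using the column names from the join queries in this deck. The comments list the rows each query should return. Row order is not guaranteed without ORDER BY.

```sql
-- Hypothetical sample data: ids 1 and 2 match, 3 and 4 do not.
CREATE OR REPLACE TEMPORARY VIEW table1 AS
SELECT * FROM VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cal') AS t(id, name);

CREATE OR REPLACE TEMPORARY VIEW table2 AS
SELECT * FROM VALUES (1, 'red'), (2, 'blue'), (4, 'green') AS t(pid, f_color);

-- INNER: matching rows only -> (Ann, red), (Bob, blue)
SELECT name, f_color FROM table1 INNER JOIN table2 ON table1.id = table2.pid;

-- LEFT: all of table1 -> (Ann, red), (Bob, blue), (Cal, NULL)
SELECT name, f_color FROM table1 LEFT OUTER JOIN table2 ON table1.id = table2.pid;

-- RIGHT: all of table2 -> (Ann, red), (Bob, blue), (NULL, green)
SELECT name, f_color FROM table1 RIGHT OUTER JOIN table2 ON table1.id = table2.pid;

-- FULL: all rows from both sides
-- -> (Ann, red), (Bob, blue), (Cal, NULL), (NULL, green)
SELECT name, f_color FROM table1 FULL OUTER JOIN table2 ON table1.id = table2.pid;

-- SEMI: table1 rows that have a match in table2 -> Ann, Bob
SELECT name FROM table1 LEFT SEMI JOIN table2 ON table1.id = table2.pid;

-- ANTI: table1 rows that have no match in table2 -> Cal
SELECT name FROM table1 LEFT ANTI JOIN table2 ON table1.id = table2.pid;

-- CROSS: every pairing of rows -> 3 x 3 = 9 rows
SELECT name, f_color FROM table1 CROSS JOIN table2;
```

Note that SEMI and ANTI joins return columns from the left relation only, which is why those two queries select just `name`.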



02-5-1 – DEMO: DELTA COMMANDS IN DATABRICKS SQL


02-5-2 – DEMO: OPTIONAL: BASIC SQL


Data Visualization


Table

• Default visualization
• Customizable columns
• Change heading
• Add description
• Change font color
• Conditional font color
• Based on each data value


Details, Counter, Pivot



Charts

• Chart types: Line, Bar, Area, Pie, Scatter, Bubble, Heatmap, and Box
• Grouping
• Stacking
• Error Bars


Histogram

• Displays a count
• Control over the number of buckets


Cohort, Funnel, Word Cloud



Maps

Choropleth Map, Marker Map


Sankey and Sunburst



Databricks Academy

• We aren’t going to cover all visualizations in this course.
• More detail can be found in the Databricks Academy course Data Visualization with Databricks SQL
• If you are using Tableau or Power BI, you can connect both to Databricks SQL
• More detail in the Databricks Academy course How to Integrate BI Tools with Databricks SQL


03-2 – DEMO: DATA VISUALIZATIONS AND DASHBOARDS


03-4 – DEMO: NOTIFYING STAKEHOLDERS


Exam Logistics



Exam Platform
• Databricks Academy certifications are offered through Kryterion’s Webassessor platform.
• Webassessor is a simple, scalable assessment solution resulting in an easy test-taking experience.
• Test-takers can register for exams by heading to https://webassessor.com/databricks


Proctoring Details
• During the exam, you will be monitored via webcam by a Webassessor
proctor.
• The proctor will:
• Monitor you during the exam.
• Answer any exam delivery questions you might have.
• Provide technical support.
• The proctor will not provide assistance on the content of the exam.
• No test aids will be available during the exam.



Exam Grading
• Certification exams are automatically graded.
• You will receive your pass/fail grade immediately, and you will receive
topic-level percentage scores to assist you in focusing any study efforts
moving forward.
• Databricks reserves the right to adjust any exam scores and pass/fail
statuses based on the proctor’s session notes.



Certificate Awarding Process
• If it’s been determined that you’ve
passed the exam, your badge and
certificate will be awarded via
credentials.databricks.com.
• You will receive the badge and
certificate within 24 hours of
passing the exam.
• You will be notified at the email
associated with your Webassessor
account.



Exam Format and Structure


Basic Exam Details
• Time allotted to complete exam = 1.5 hours (90 minutes)
• Passing score = At least 70% on the overall exam
• Exam fee = $200
• Retake policy = As many times as you want, whenever you want (for the
same fee)
• Number of Questions = 45
• More info. on the Databricks Academy FAQ:
http://files.training.databricks.com/lms/docebo/databricks-academy-faq.pdf
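
For planning purposes, the numbers above translate into a time budget per question and a rough count of correct answers needed. This sketch assumes every one of the 45 questions is scored and weighted equally, which the slide does not state. The column aliases are illustrative.

```sql
-- 90 minutes across 45 questions, and the smallest whole number of
-- correct answers that reaches 70% of 45 (31.5 rounds up to 32).
SELECT
  90 / 45         AS minutes_per_question,
  CEIL(0.70 * 45) AS min_correct_to_pass;
```

Under that assumption, the budget is 2 minutes per question and 32 correct answers are needed, since 31 of 45 is just under 69%.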



Code Examples
• All code examples will be in SQL
• All SQL will adhere to ANSI standards



Q&A



Thank you
