Learn to Use Databricks for Data Science
Sean Owen, Principal Solutions Architect
Austin Ford, Sr. Product Manager
Data Science is a tough job
▪ Today, companies are becoming more and more data-driven, and the ones getting the most out of their data will be the ones to succeed
▪ As a result, Data Science is now a core capability of many businesses
▪ Unfortunately, it comes with a challenging, complex workflow at scale
What does a data science workflow look like?
1. Setup
I’ve been given a business question to answer with data. Before I can even get started on the data science, I need to set up my development environment.
▪ I need to be able to find and access the right data sources to fuel my analysis
▪ I need the correctly sized compute resource for my task
▪ I need to be sure my toolbox is ready with the packages and libraries required for my work
What does a data science workflow look like?
2. Data Science
Once the initial overhead of setup is complete, the real work begins.
▪ I start with exploratory data analysis to familiarize myself with the data and form hypotheses
▪ I uncover insights through statistical inference, modeling, or other methods
▪ I synthesize the results of my work and the answers to the original business question
At any point, I could be sent back to the Setup phase to add another data source, change the size of my compute resource, or pull in another library.
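The exploratory analysis, inference, and synthesis steps on this slide can be sketched in a few lines of Python. This is only an illustration, not something from the talk: the two samples, the names `variant_a` and `variant_b`, and the choice of Welch's t statistic are all assumptions made for the example.

```python
import statistics
from math import sqrt

# Hypothetical data (invented for this sketch): order values for two page variants.
variant_a = [20.0, 22.0, 19.0, 24.0, 21.0]
variant_b = [25.0, 27.0, 24.0, 28.0, 26.0]


def summarize(values):
    """Exploratory step: basic descriptive statistics for one sample."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def welch_t(a, b):
    """Inference step: Welch's t statistic for the difference in means (b - a)."""
    standard_error = sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(b) - statistics.mean(a)) / standard_error


summary_a = summarize(variant_a)
summary_b = summarize(variant_b)
t_stat = welch_t(variant_a, variant_b)

# Synthesis step: report the numbers that answer the original question.
print(f"Variant A mean: {summary_a['mean']:.2f}")
print(f"Variant B mean: {summary_b['mean']:.2f}")
print(f"Welch t statistic: {t_stat:.2f}")
```

In practice the data would come from a shared table rather than a hard-coded list, and a library such as pandas or SciPy would likely replace the hand-written helpers.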
What does a data science workflow look like?
3. Sharing Results
The most important step comes once I finish the analysis: sharing the results with my stakeholders.
▪ I formulate the results into a report or dashboard so they can be consumed
▪ I share the results with my business stakeholders via email or Slack
▪ I get feedback about my work from my stakeholders and iterate with them to have the biggest impact
Our answer: The Databricks Lakehouse Platform
We want to remove the overhead so you can focus on the most important part of your work: data science
Lakehouse Platform
[Diagram. Data types: Structured, Semi-structured, Unstructured, Streaming. Platform labels: Open Data Storage; Data Management & Governance; Data Science & Engineering; BI & SQL Analytics; Machine Learning; Real-time Data Applications. Taglines: Simple | Open | Collaborative; Reliable | Scalable | Secure. The diagram is shown twice; the second copy adds the callout "Our focus today".]
Databricks makes setup easy
1. Setup
▪ The Lakehouse brings all your company’s data together into a single place so you don’t have to go digging through a variety of data sources
▪ Easily choose the right compute resource for your task and switch as needed: single-machine VMs, GPUs, Spark clusters
▪ Databricks’ runtimes come prepackaged with the most common data science tools, and customization is easy: add Python libraries on top of a runtime with a single line of code
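The slide does not name the single line it has in mind. In Databricks notebooks, libraries are typically added with the `%pip install` magic command, so the sketch below assumes that is what is meant. Because a magic command is not plain Python, it appears here only as a comment; the helper `is_installed` is a hypothetical name introduced to check the result from ordinary Python.

```python
# In a Databricks notebook cell, the one-line install would look like this
# (a notebook magic, assumed here; replace the placeholder with a real package):
#
#   %pip install <package-name>
#
# The helper below is plain Python and only checks whether a top-level
# module can be found in the current environment.
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module is importable in this environment."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so it serves as a safe demonstration target.
print(is_installed("json"))
```

A check like this can be run at the top of a notebook to confirm that the environment has what the analysis needs before any heavier work starts.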
Databricks has the tools to enable you to focus on your work
2. Data Science
▪ Multi-language, collaborative notebooks with co-presence, commenting, and co-editing
▪ Built-in visualizations that take you from raw data to insights in two clicks
▪ Auto-logged revision history and a Git integration to ensure reproducibility and enable version control
Databricks lets you share results and iterate quickly
3. Sharing Results
▪ Easily share your notebooks with stakeholders, who can view them as reports
▪ Create a dashboard directly from your notebook’s results
▪ Iterate with your stakeholders directly in the notebook through comments and co-presence
Getting practical: hands-on with an expert
Sean Owen
Principal Solutions Architect
