Azure Databricks Overview
Apache Spark
VISION: Accelerate innovation by unifying data science, engineering, and business.
In a Keystone Research study, top-quartile companies that harness cloud, data, and AI vastly outperformed bottom-quartile companies, nearly doubling operating margins and realizing $100M in additional operating income.
Hardest Part of AI isn’t AI, it’s Data
“Hidden Technical Debt in Machine Learning Systems,” Google, NIPS 2015
[Figure: Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring surrounding a small ML Code box]
Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.
Data & AI Technologies Are in Silos
Some technologies are great for data but not for AI; others are great for AI but not for data.
Apache Spark: The First Unified Analytics Engine
Uniquely combines data and AI technologies
Components: Runtime, Delta, Spark Core Engine
Big Data Processing: ETL + SQL + Streaming
Machine Learning: MLlib + SparkR
Enterprises face challenges beyond Apache Spark
Disconnect between data engineers and data scientists
AZURE DATABRICKS COLLABORATIVE WORKSPACE
Jobs, Notebooks, Models, APIs, Dashboards
Data sources: Blob Storage
Best of Databricks, Best of Microsoft
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage).
Databricks Accelerating Innovation
Business driver: Genomic processing takes too long and costs too much.
Description: Time required to process full exomes increases non-linearly as the number of exomes increases. Able to leverage the elasticity of the cloud and the Databricks Runtime (DBR).
Business driver: Customer receives a broad set of data requiring ETL and advanced analytics.
Description: Required a solution adept at ETL, data warehousing, and advanced analytics, including NLP and machine learning, that could interface with existing infrastructure.
Business driver: Massive ETL process with constantly changing input formats.
Description: Needed to ingest, transform, and load a wide variety of ever-changing input streams. Traditional ETL tools couldn't scale in performance or keep up with the changes.
GOAL: Analyze IoT data to predict switch failures and keep customers online.
Internal sponsor changes the game. Starbucks had been trying to move its analytics platform to the cloud for two years to support complex modeling and analysis across its lines of business (LOBs), which would allow them to retire their on-premises Oracle Exadata system. The problem was that the HDI-based solution they were trying to implement just didn't work despite a spiderweb of technologies they had implemented to prop it up. Then a new Director of Analytics came on board, who had just finished implementing Databricks at Nike. He immediately reached out to Databricks to see if it would work on Azure. Azure Databricks was in public preview at the time, so Databricks quickly pulled in the Microsoft team. Together they mapped out plans for a POC. Because the Director of Analytics knew what the solution could do first hand, the team didn't need to spend time on building
Power sponsor a key factor.

Walking in technical lock-step. Led by Microsoft CSAs Jason Robey and Ed Hagan and Databricks Solution Architect Bilal Obeidat, the technical teams for both companies worked like a single unit to develop a new reference architecture, implement the POC, and triage feature requests.
Jointly navigating the business. Romeo Bolibol, Sr. AE, Tony Clark, Databricks AE, and Pouneh Partowkia, Databricks Alliance Lead, used their respective connections to build support across cloud, BI, and LOB decision makers, with Nate Shea-han, GBB, serving as the catalyst between Microsoft and Databricks.

Unlocking the cloud. Starbucks wanted to deprecate Oracle Exadata. But after two years, they had only enabled 15 (out of 300+) data scientists and analysts on an HDI-based cloud solution, so teams kept falling back to the old system. Microsoft and Databricks started from scratch with a new reference architecture that would support all required use cases and provide cloud efficiencies.
One advanced analytics solution for all businesses and roles. Starbucks wanted a single data lake that every line of business could leverage. Azure Databricks deployed with Azure Data Lake Store provides the central advanced analytics and data lake platform. Starbucks data engineering, data scientist, and data analyst teams can all work in the same place, decreasing time to market.

POC proves velocity and security. The POC was geared toward proving the solution can deliver the speed to market Starbucks wanted while also meeting their stringent security compliance requirements. Azure Databricks integration with Azure Active Directory was a big help on the security front. And after seeing Azure Databricks in action, the marketing team estimated it will drive $100M annually in top-line revenue growth and efficiencies.
Near-term ROI. Cost recovery from Exadata would be slow, so Starbucks needed to show near-term ROI. The team got very creative, using $800K in ECIF ($300K in HDI consumption credit during migration and $500K in services). Databricks also contributed $1.4 million in services.

Azure Databricks is Starbucks' Unified Analytics Platform. After 11 months of engagement, Starbucks committed to Azure Databricks as their advanced analytics platform. Marketing analytics will be the first use case deployed, with 9 additional use cases planned, such as supply chain, loyalty, and fraud detection. Starbucks has committed to $5M in Databricks licenses, driving $16M in Azure consumption over 2.5 years. There is opportunity for exponential growth as new use cases are developed.
What's next? One immediate opportunity the team is pursuing is how Azure Databricks could be rolled out to China, Starbucks' biggest growth market.

Key resources: Databricks, CSAs, POC, Databricks ECIF.
MICROSOFT CONFIDENTIAL / FOR INTERNAL USE ONLY
Microsoft Overview
Results
Q5 Pipeline Generation Targets
1 LMCO = Analytics (Baylor) / Security (Gordon)
2 NGC = ESS Analytics (Vitek) / Security (Raber/Papay)
3 RTN = GBS Analytics (Lee) / Security (Brown/Costa)
4 MITRE = Analytics (Sorensen) / Security (Finn)
5 ULA = Analytics / Security (IBM)
6 General Dynamics = Analytics (?) / Security (Baker/Olmstead)
7 HII/NNS = Analytics (Bharat) / Security (Forest (ret.))
8 SAIC = Analytics (? not Chitra, Onstatt) / Security (Lynch/ )
Action Plan - Here is my ask:
End