
Azure Databricks Overview

Uploaded by

Ankita Karmakar

Unified Analytics Platform with Databricks & Apache Spark
VISION: Accelerate innovation by unifying data science, engineering and business

SOLUTION: Unified Analytics Platform

WHO WE ARE:
• Original creators of Apache Spark and Databricks Delta
• 2000+ global companies use our platform across the big data & machine learning lifecycle
AI has huge promise
Huge disruptive innovations are affecting most enterprises on the planet

Transportation | Healthcare and Genomics | Internet of Things | Fraud Prevention | Personalization

and many more...

According to a Keystone Research study, companies in the top quartile for harnessing cloud, data and AI
vastly outperformed companies in the bottom quartile, nearly doubling operating margins
and realizing $100M in additional operating income.
Hardest Part of AI isn’t AI, it’s Data
“Hidden Technical Debt in Machine Learning Systems,” Google NIPS 2015

[Figure: a small "ML Code" box surrounded by Configuration, Data Collection, Data Verification, Machine Resource Management, Monitoring, Serving Infrastructure, Analysis Tools, Feature Extraction and Process Management Tools]

Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the
small green box in the middle. The required surrounding infrastructure is vast and complex.
Data & AI Technologies are in Silos

Great for data, but not for AI | Great for AI, but not for data
Apache Spark: The First Unified Analytics Engine
Uniquely combines Data & AI technologies

Runtime | Delta
Spark Core Engine
Big Data Processing: ETL + SQL + Streaming
Machine Learning: MLlib + SparkR
Enterprises face challenges beyond Apache Spark

Disconnect
Engineers Scientists

Complex data pipelines and infrastructure

Unified Analytics Engine


Data & AI People are in Silos

DATA ENGINEERS | DATA SCIENTISTS
AZURE DATABRICKS
COLLABORATIVE WORKSPACE (DATA ENGINEERS, DATA SCIENTISTS): Notebooks, Jobs, Models, APIs, Dashboards
DATABRICKS RUNTIME: for Big Data, for Machine Learning; Batch & Streaming; Data Lakes & Data Warehouses
DATABRICKS CLOUD SERVICE
DATA SOURCES: Blob Storage, Data Lake Store, SQL Data Warehouse, Cosmos DB, Event Hub, IoT Hub
BI Reporting, Dashboards
Security Integration, Azure Data Factory, Azure Portal, One-Click setup, Unified Billing
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized
for Azure

Best of Databricks | Best of Microsoft

Designed in collaboration with the founders of Apache Spark

One-click set up; streamlined workflows

Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)

Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)


Differentiated experience on Azure
ENHANCE PRODUCTIVITY
• Get started quickly by launching your new Spark environment with one click.
• Share your insights in powerful ways through rich integration with Power BI.
• Improve collaboration amongst your analytics team through a unified workspace.
• Innovate faster with native integration with the rest of the Azure platform.

BUILD ON THE MOST COMPLIANT CLOUD
• Simplify security and identity control with built-in integration with Active Directory.
• Regulate access with fine-grained user permissions to Azure Databricks’ notebooks, clusters, jobs and data.
• Build with confidence on the trusted cloud backed by unmatched support, compliance and SLAs.

SCALE WITHOUT LIMITS
• Operate at massive scale without limits globally.
• Accelerate data processing with the fastest Spark engine.
Broad Customer Adoption
• Now generally available (as of March 2018)
• Over 500 customers took part in the preview of Azure Databricks
• Widely adopted in many industries (e.g. Retail, Media & Entertainment, Healthcare)

Databricks Accelerating Innovation
BUSINESS DRIVER: Genomic processing takes too long and costs too much
DESCRIPTION: Time required to process full exomes increases non-linearly as the number of exomes increases. Able to leverage the elasticity of the cloud and DBR

BUSINESS DRIVER: Customer receives broad set of data requiring ETL and advanced analytics
DESCRIPTION: Required a solution that was adept at ETL, data warehousing, and advanced analytics including NLP and machine learning, and that could interface with existing infrastructure

BUSINESS DRIVER: Massive ETL process with constantly changing input formats
DESCRIPTION: Necessity to ingest, transform and load a wide variety of ever-changing input streams. Traditional ETL tools couldn’t scale in performance and keep up with changes

BUSINESS DRIVER: Generalized data analytic solution for all mission centers
DESCRIPTION: Predictive maintenance and age of aircraft use case based off of sensor and telemetry data collected during operations.
Customer Case Study
INDUSTRY: MANUFACTURING

GOAL: Analyze IoT data to predict switch failures and keep customers online

CHALLENGE: Inefficient detection of equipment failures resulted in a 60% detection rate of failures, leaving customers with more downtime

DATA: 2 million switch records took 6 hours to process. Increased to 10 billion records with Databricks

DATABRICKS IMPACT: 10 billion records processed in 14 minutes and a 94% detection rate meant 25,000 homes were kept online, resulting in a better customer experience
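The scale of the change in the case-study figures above is easier to see as a rate. The sketch below is a back-of-the-envelope calculation that uses only the numbers quoted on the slide (2 million records in 6 hours, 10 billion records in 14 minutes, detection rates of 60% and 94%). The function and variable names are illustrative, and it assumes the two runs can be compared as simple average processing rates.

```python
# Back-of-the-envelope comparison using only the figures quoted in the case study.
# Names are illustrative; rates are simple averages over each run.

def records_per_minute(records: float, minutes: float) -> float:
    """Average processing rate in records per minute."""
    return records / minutes

before_rate = records_per_minute(2_000_000, 6 * 60)   # 2 million records in 6 hours
after_rate = records_per_minute(10_000_000_000, 14)   # 10 billion records in 14 minutes

# How many times more records per minute the later run handled.
speedup = after_rate / before_rate

# Share of failures that went undetected before and after.
missed_before = 1 - 0.60
missed_after = 1 - 0.94

print(f"Before: {before_rate:,.0f} records/min")
print(f"After:  {after_rate:,.0f} records/min")
print(f"Throughput ratio: about {speedup:,.0f}x")
print(f"Missed failures: {missed_before:.0%} -> {missed_after:.0%}")
```

Under these assumptions the average throughput ratio is roughly 128,000x, and the share of undetected failures falls from 40% to 6%. The slide does not say how much of the gain comes from the platform versus larger clusters, so the ratio is best read as an end-to-end comparison.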
Information Security Risk - Example
Business Challenges
• Threat Response & Data Eng. teams working on separate infrastructure
• Threat Response team has access to 2 weeks of historical data, which is insufficient to triage and investigate potential breaches
• Unable to ingest and ETL a large number of data sources
• Only able to write SQL queries; unable to develop more advanced analytics

Positive Business Outcomes
• Unification of Big Data analytical pipeline
• Data retention of 2 years
• Ingest a more comprehensive set of data
• Move from quantitative analytics with SQL to predictive analytics

Critical Capabilities
• Ease of use for cluster management: creation, auto-scaling, tuning & shutdown
• Ability for Threat Response engineers to build predictive models & leverage a distributed computation framework without Eng. assistance
• Threat Response team can autonomously run full data sets easily at scale
• Access to expertise on Spark & advanced ML concepts

Business Results
• Customer average 20% decrease of EC2 cost
• Customer able to run investigations on 2 years of historical data, which significantly reduces the risk of a breach
• Customer is able to automate investigations, which reduces time to decision
• Estimated impact to customer business: $10M+ in savings
○ Cost Savings & Avoidance
○ Risk Mitigation
○ Impact Revenue and Productivity
The anatomy of the win
Microsoft and Databricks unlock cloud analytics at Starbucks; sideline Oracle Exadata
1 Engage the customer

Internal sponsor changes the game. Starbucks had been trying to move its analytics platform to the cloud for two years to support complex modeling and analysis across its lines of business (LOBs), which would allow them to retire their on-premises Oracle Exadata system. The problem was that the HDI-based solution they were trying to implement just didn’t work despite a spiderweb of technologies they had implemented to prop it up. Then a new Director of Analytics came on board, who had just finished implementing Databricks at Nike. He immediately reached out to Databricks to see if it would work on Azure. Azure Databricks was in public preview at the time, so Databricks quickly pulled in the Microsoft team. Together they mapped out plans for a POC.

2 Build the team

Walking in technical lock-step. Led by Microsoft CSAs Jason Robey and Ed Hagan and Databricks Solution Architect Bilal Obeidat, the technical teams for both companies worked like a single unit to develop a new reference architecture, implement the POC, and triage feature requests.

Jointly navigating the business. Romeo Bolibol, Sr. AE, Tony Clark, Databricks AE, and Pouneh Partowkia, Databricks Alliance Lead, used their respective connections to build support across cloud, BI, and LOB decision makers, with Nate Shea-han, GBB, serving as the catalyst between Microsoft and Databricks.

Power sponsor a key factor. Because the Director of Analytics knew what the solution could do first hand, the team didn’t need to spend time on building

3 Identify priorities and challenges

Unlocking the cloud. Starbucks wanted to deprecate Oracle Exadata. But after two years, they had only enabled 15 (out of 300+) data scientists and analysts on an HDI-based cloud solution, so teams kept falling back to old system. Microsoft and Databricks started from scratch with a new reference architecture that would support all required use cases and provide cloud efficiencies.

One advanced analytics solution for all businesses and roles. Starbucks wanted a single data lake that every line of business could leverage. Azure Databricks deployed with Azure Data Lake Store provides the central advanced analytics and data lake platform. Starbucks data engineering, data scientist, and data analyst teams can all work in the same place, decreasing time to market.

4 Demonstrate proof

POC proves velocity and security. The POC was geared toward proving the solution can deliver the speed to market Starbucks wanted while also meeting their stringent security compliance requirements. Azure Databricks integration with Azure Active Directory was a big help on the security front. And after seeing Azure Databricks in action, the marketing team estimated it will drive $100M annually in top-line revenue growth and efficiencies.

Near-term ROI. Cost recovery from Exadata would be slow, so Starbucks needed to show near-term ROI. The team got very creative, using $800K in ECIF ($300K in HDI consumption credit during migration and $500K in services). Databricks also contributed $1.4 million in services.

5 Land and expand

Azure Databricks is Starbucks’ Unified Analytics Platform. After 11 months of engagement, Starbucks committed to Azure Databricks as their advanced analytics platform. Marketing analytics will be the first use case deployed, with 9 additional use cases planned, such as supply chain, loyalty, and fraud detection. Starbucks has committed to $5M in Databricks licenses, driving $16M in Azure consumption over 2.5 years. There is opportunity for exponential growth as new use cases are developed.

What’s next? One immediate opportunity the team is pursuing is how Azure Databricks could be rolled out to China, Starbucks’ biggest growth market.

Key Resources: Databricks, CSAs, POC, ECIF

MICROSOFT CONFIDENTIAL / FOR INTERNAL USE ONLY
Microsoft Overview
Results
Q5 Pipeline Generation Targets
1 LMCO = Analytics (Baylor)/ Security (Gordon)
2 NGC = ESS Analytics (Vitek)/ Security (Raber/ Papay)
3 RTN = GBS Analytics (Lee)/ Security (Brown/ Costa)
4 MITRE = Analytics (Sorensen)/ Security (Finn)
5 ULA = Analytics / Security (IBM)
6 General Dynamics = Analytics (?) / Security (Baker/ Olmstead)
7 HII/ NNS = Analytics (Bharat) / Security (Forest (ret))
8 SAIC = Analytics (? not Chitra, Onstatt) / Security (Lynch/ )

Action Plan - Here is my ask:

1 hour strategy session on each account


Who, what, where?

Specific Use Cases for DB and the SSP/ PS team to target

Targeted plan to POC in each account/ multiple BUs or Divisions

Let’s get this party started!

Complete the above by 6/21


Report out to Yagy and Davis on the plan by 6/25 (e-mail)
Azure Databricks
For more information:
databricks.com/azure

Get Started with Azure Databricks:
http://bit.ly/AzureDatabricks

End
