
Azure Databricks - An Introduction 2019 Roadshow

Uploaded by Ankita Karmakar
Copyright: © All Rights Reserved

Azure Databricks

An Introduction

Why Spark?
• Open-source data processing engine built around speed, ease of use, and sophisticated analytics
• In-memory engine that is up to 100 times faster than Hadoop MapReduce on some workloads
• Largest open-source big data project, with 1,000+ contributors
• Highly extensible, with support for Scala, Java, Python and R alongside Spark SQL, GraphX, Spark Streaming and the Machine Learning Library (MLlib)

Why Databricks?
• Databricks is the premium version of Spark available on the market
• Databricks was founded by the creators of Spark
• Spark is the dominant workload in Hadoop
• Databricks contributes 75% of the code to open-source Spark

Hadoop MapReduce
MapReduce in Hadoop

[Diagram: data flows from Azure Storage to the Driver, which distributes work across groups of VMs; each group writes its results to Disk before the next group reads them.]

Azure Storage > Driver > VM/Parallelization > write to disk > VM/Parallelization > write to disk > repeat…

Writing to disk takes time, and MapReduce pays that cost at every stage of every run.
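The disk round trip described above can be illustrated with a small, standard-library-only Python sketch. It is a toy model, not Hadoop code: the `map_stage`, `reduce_stage` and `run_disk_staged` names are hypothetical, and a local temporary directory stands in for the disk that MapReduce writes to between stages. Each stage writes its full output to a file, and the next stage can only start after reading that file back.

```python
import json
import os
import tempfile
from collections import Counter


def map_stage(lines):
    """Map step: split every line into lowercase words."""
    return [word.lower() for line in lines for word in line.split()]


def reduce_stage(words):
    """Reduce step: count how often each word occurs."""
    return dict(Counter(words))


def run_disk_staged(lines):
    """Hypothetical helper: run map then reduce, writing each stage's output
    to disk and reading it back, as a chain of MapReduce jobs does.

    Returns (word_counts, number_of_disk_writes).
    """
    disk_writes = 0
    # A temporary directory stands in for the disk between stages.
    with tempfile.TemporaryDirectory() as workdir:
        # Stage 1: map, then persist the intermediate result.
        stage1 = os.path.join(workdir, "stage1.json")
        with open(stage1, "w") as f:
            json.dump(map_stage(lines), f)
        disk_writes += 1

        # Stage 2 cannot start until stage 1's output is read back from disk.
        with open(stage1) as f:
            words = json.load(f)
        stage2 = os.path.join(workdir, "stage2.json")
        with open(stage2, "w") as f:
            json.dump(reduce_stage(words), f)
        disk_writes += 1

        # The final result is read back from disk as well.
        with open(stage2) as f:
            result = json.load(f)
    return result, disk_writes


if __name__ == "__main__":
    print(run_disk_staged(["Spark is fast", "spark is easy"]))
```

Every additional stage in such a chain adds another write and another read, which is the cost the slide points to.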
What is Azure Databricks?
Apache® Spark™ is FASTER and EASIER than MapReduce in Hadoop

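[Diagram: data flows from Azure Storage to the Driver, which distributes work across groups of VMs; intermediate results are held in Cache between groups instead of being written to disk.]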

Faster: In Spark, intermediate data stays in memory (cache) instead of being written to disk between steps as in MapReduce; this gives Spark its speed advantage.

Easier: In Spark you can use the language you are most comfortable with (Python, Scala, R, SQL).
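The caching idea can be sketched in plain Python. In PySpark the real call is `DataFrame.cache()`, which keeps a dataset in memory once an action has computed it. The `CachedDataset` class below is hypothetical and only mimics that behaviour: like Spark's cache it is filled lazily on the first query, and later queries reuse the in-memory copy instead of going back to the source.

```python
from collections import Counter


class CachedDataset:
    """Hypothetical toy stand-in for a cached Spark DataFrame."""

    def __init__(self, raw_lines):
        self._raw_lines = raw_lines
        self._cache = None      # filled on the first query, then reused
        self.parse_count = 0    # how many times the source was actually parsed

    def _words(self):
        # Parse the source only if nothing is cached yet.
        if self._cache is None:
            self.parse_count += 1
            self._cache = [w.lower() for line in self._raw_lines for w in line.split()]
        return self._cache

    def word_counts(self):
        """First query: count each word."""
        return dict(Counter(self._words()))

    def distinct_words(self):
        """Second query: list the distinct words, reusing the cached data."""
        return sorted(set(self._words()))


if __name__ == "__main__":
    ds = CachedDataset(["Spark is fast", "spark is easy"])
    print(ds.word_counts())
    print(ds.distinct_words())
    print(ds.parse_count)
```

After both queries `parse_count` is still 1: the second query never returned to the source, which is the behaviour the slide credits for Spark's speed.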
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized
for Azure

Best of Databricks + Best of Microsoft

Designed in collaboration with the founders of Apache Spark

One-click setup; streamlined workflows

Interactive workspace that enables collaboration between data scientists, data engineers, and
business analysts.

Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)

Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)


Azure Databricks key audiences & benefits

Data scientist
• Integrated workspace
• Easy data exploration
• Collaborative experience
• Interactive dashboards
• Faster insights

Data engineer
• Improved ETL performance
• Zero management clusters, serverless
• Easy to schedule jobs
• Automated workflows
• Enhanced monitoring & troubleshooting
• Automated alerts & easy access to Spark logs

CDO, VP of analytics
• Fast, collaborative analytics platform accelerating time to market
• No dev-ops required
• Enterprise-grade security: encryption, end-to-end auditing, role-based control, compliance
• Best Spark & serverless: Databricks managed Spark, zero management Spark, cluster democratization (high concurrency)
Unified analytics platform

Azure Databricks
• Collaborative Workspace: data engineer, data scientist, business analyst
• Deploy production jobs & workflows: multi-stage pipelines, job scheduler, notification & logs
• Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, high concurrency
• Connects to: IoT / streaming data, cloud storage, Hadoop storage, data warehouses, machine learning models, BI tools, data exports, REST APIs

Enhance productivity | Build on secure & trusted cloud | Scale without limits
DATA WAREHOUSING PATTERN IN AZURE
Loading and preparing data for analysis with a data warehouse

• Sources: logs, files and media (unstructured); business / custom apps (structured); operational data (SQL DB, Cosmos DB)
• Data loading: Data Factory, Azure Import/Export Service, Azure Data Box, APIs, CLI & GUI tools
• Ingest storage: Data Lake Store, Azure Storage
• Data processing: Azure Databricks, HDInsight
• Serving storage: Cosmos DB, Azure SQL DW, Azure Analysis Services (AAS)
• Consumers: applications, dashboards
ADVANCED ANALYTICS PATTERN IN AZURE
Performing data collection/understanding, modeling and deployment

• Sources: sensors and IoT (unstructured); logs, files and media (unstructured); business / custom apps (structured)
• Long-term storage: Data Lake Store, Azure Storage, Cosmos DB, SQL DB
• Data processing: Azure Databricks, HDInsight
• Model training: Azure ML Service, Azure Databricks (Spark ML), SQL Server (in-database ML), Data Science VM
• Trained model hosting: Azure Kubernetes Service, SQL Server (in-database ML)
• Serving storage: Cosmos DB, SQL DB, SQL DW, Azure Analysis Services
• Orchestration: Data Factory
• Consumers: applications, dashboards
BIG DATA STREAMING PATTERN WITH AZURE

• Sources: sensors and IoT (unstructured); logs, files and media (unstructured); business / custom apps (structured)
• Stream ingestion: Event Hubs, IoT Hub, Kafka on HDInsight
• Stream analytics: Stream Analytics, Azure Databricks (Spark Streaming)
• Machine learning: Azure ML Studio, R Server, Azure Databricks (Spark ML)
• Long-term storage
• Consumers: real-time applications, real-time dashboards
KNOWING THE VARIOUS BIG DATA SOLUTIONS

From CONTROL to EASE OF USE (reduced administration):

BIG DATA ANALYTICS
• Azure Marketplace (HDP | CDH | MapR), IaaS clusters: any Hadoop technology, any distribution
• Azure HDInsight, managed clusters: workload-optimized, managed clusters
• Azure Databricks, managed clusters: frictionless & optimized Spark clusters
• Azure Data Lake Analytics

BIG DATA STORAGE
• Azure Data Lake Store
• Azure Storage
Azure Databricks: Next Steps
Azure Databricks home page: documentation, pricing and getting-started information
https://azure.microsoft.com/en-us/services/databricks/
Demo
