Azure Data Platform Overview
The “evolution” of data platforms
On-Premises, IaaS, PaaS, Pay per query
[Figure labels: Dedicated; Higher cost; Higher administration / Lower administration]
Microsoft Big Data Portfolio
[Figure: portfolio chart. Labels: Scale Up; Sequential; Scale Out + Across; Relational; Non-relational. Products shown: SQL Server 2016, Fast Track, Analytics Platform System]
VM hosted on Microsoft Azure Infrastructure (“IaaS”)
• From Microsoft images (gallery) or your own images (custom)
  • SQL Server 2008 R2 / 2012 / 2014 / 2016 / 2017; Web / Standard / Enterprise editions
  • Images refreshed with latest version, SP, CU
• Windows Server 2008 R2 / 2012 R2 / 2016, Linux RHEL / Ubuntu
• Fast provisioning (~10 minutes).
• Accessible via RDP and PowerShell
• Full compatibility with SQL Server “Box” software
Elasticity
• From 1 core / 2 GB mem / 1 TB up to 128 cores / 3.5 TB mem / 256 TB
Azure SQL Database
A relational database-as-a-service (“PaaS”), fully managed by Microsoft.
For cloud-designed apps when near-zero administration and enterprise-grade capabilities are key.
Perfect for organizations looking to dramatically increase the DB:IT ratio.
Azure SQL Database Managed Instance
Managed Instance
• Best for modernization at scale with low cost and effort
• Instance-scoped programming model with high compatibility to on-premises databases
Single
• Standalone managed database best for predictable and stable workloads
Elastic pool
• Shared resource model best for greater efficiency through multi-tenancy
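As a rough illustration of why a shared resource model can be more efficient, the sketch below compares sizing each database for its own peak with sizing one pool for the combined peak. The tenant names, the hourly usage figures, and the generic "resource units" are hypothetical; they are not Azure pricing, DTU, or vCore values.

```python
# Hypothetical hourly resource usage per tenant database (arbitrary units).
usage_by_tenant = {
    "tenant_a": [10, 80, 10, 5],
    "tenant_b": [70, 10, 15, 5],
    "tenant_c": [5, 10, 90, 10],
}


def standalone_capacity(usage):
    """Capacity needed when every database is sized for its own peak."""
    return sum(max(series) for series in usage.values())


def pooled_capacity(usage):
    """Capacity needed when all databases share one pool sized for the combined peak."""
    return max(sum(hour) for hour in zip(*usage.values()))


print(standalone_capacity(usage_by_tenant))  # 240
print(pooled_capacity(usage_by_tenant))      # 115
```

Because the tenants in this made-up data peak at different hours, the pool needs less total capacity than the sum of the individual peaks. If all tenants peaked together, the two numbers would be equal.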
Security
• TDE
• Row-level security
• SQL Audit
• Always Encrypted
100 TB
More choices and full integration into Azure’s ecosystem and services
• Managed community MySQL, PostgreSQL, and MariaDB
• Languages and frameworks of your choice
• Scale in seconds with built-in high availability
• Secure and compliant
• Industry-leading global reach
SMP vs MPP
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• SQL Server implementations traditionally have been SMP
• Mostly, the solution is housed on shared storage
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using a high-speed network between nodes
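The shared-nothing idea can be sketched in a few lines: rows are hash-distributed so that each node owns a disjoint slice of the data, each node aggregates only its own slice, and the partial results are then combined. This is a single-process simulation under stated assumptions, not how any particular MPP engine is implemented; the function names and the sample rows are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key: str, node_count: int) -> int:
    """Pick the owning node for a key with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Hash-distribute (key, amount) rows so each node holds its own slice."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for(key, node_count)].append((key, amount))
    return nodes


def local_totals(node_rows):
    """Work done independently on one node, using only that node's rows."""
    totals = defaultdict(int)
    for key, amount in node_rows:
        totals[key] += amount
    return dict(totals)


def mpp_totals(rows, node_count):
    """Combine per-node results; every key lives on exactly one node."""
    merged = {}
    for node_rows in distribute(rows, node_count):
        merged.update(local_totals(node_rows))
    return merged


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(mpp_totals(rows, node_count=3))
```

Because all rows for a given key hash to the same node, the per-node totals never overlap and the final merge is a plain union. The totals are the same for any node count; only the amount of work per node changes.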
Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft.
The industry’s first elastic cloud data warehouse with enterprise-grade capabilities.
Support your smallest to your largest data storage needs while handling queries up to 100x faster.
AZURE RELATIONAL DATABASE PLATFORM
SQL Database, SQL Data Warehouse, PostgreSQL, MySQL, MariaDB
Power BI, App Services, Data Factory, Analytics, ML, Cognitive, Bot…
Azure Compute
Azure Storage
Hadoop: Made up of Hadoop Distributed File System (HDFS), YARN and MapReduce (Ideal for data lake)
OLTP vs OLAP/DW
SMP vs MPP
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
APIs: Table API, MongoDB API, Cassandra API
Data models: Key-value, Column-family, Document, Graph
INGEST | STORE | PREP | MODEL & SERVE (& store)
Sources: CRM, Graph, Image, Social, IoT
Ingest (data orchestration and monitoring): Azure Data Factory, SSIS
Store (big data store): Azure Data Lake Storage Gen2, Blob Storage, Azure Data Lake Storage Gen1, SQL Server 2019 Big Data Cluster
Prep (transform & clean): Azure Databricks, Azure HDInsight, PolyBase & Stored Procedures, Power BI Dataflow, Azure Data Lake Analytics
Model & serve (data warehouse): Azure SQL Data Warehouse, Azure Analysis Services, SQL Database (Single, MI, HyperScale), SQL Server in a VM, Cosmos DB, Power BI Aggregations
Consumers: BI + Reporting, Advanced Analytics, AI
Blob Storage Data Lake Store
LRS
• Multiple replicas across a datacenter
• Protect against disk, node, rack failures
• Write is ack’d when all replicas are committed
• Superior to dual-parity RAID
• 11 9s of durability
• SLA: 99.9%
ZRS
• Replicas across 3 Zones
• Protect against disk, node, rack and zone failures
• Synchronous writes to all 3 zones
• 12 9s of durability
• Available in 8 regions
• SLA: 99.9%
GRS
• Multiple replicas across each of 2 regions
• Protects against major regional disasters
• Asynchronous to secondary
• 16 9s of durability
• SLA: 99.9%
RA-GRS
• GRS + Read access to secondary
• Separate secondary endpoint
• RPO delay to secondary can be queried
• SLA: 99.99% (read), 99.9% (write)
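To make the SLA percentages above more tangible, the following sketch converts an availability percentage into the unavailability budget it implies. It assumes a 30-day month and treats the SLA as a simple uptime fraction, which is a simplification of how a storage SLA is actually measured; the function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of unavailability permitted by an SLA over `days` days.

    Assumption: the SLA is read as a plain uptime fraction of wall-clock time.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per 30 days")
```

Under these assumptions, 99.9% allows roughly 43 minutes per 30 days, while 99.99% allows roughly 4 minutes.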
[Figure: storage platform layers]
Compute: Analytics Engines (Hadoop, Spark, SCOPE …), AI / ML, High Performance Compute
Caching Layer (Avere tech)
Protocols: REST, HDFS, NFS, SMB …
Tiers (Automatic Lifecycle Management): Extra Hot Tier - Premium (SSD + NVMe), Hot Tier (HDD), Cool Tier (HDD), Cooler Tier
Edge: Data Box, Data Box Edge, Avere FXT, Azure Stack, Azure File Sync
Offline
• Data Box
• Data Box Heavy
• Data Box Disk
• Disk Import / Export
Data Box
• Capacity: 100 TB
• Weight: ~50 lbs
• Secure, ruggedized appliance
• GA September 2018
• Data Box enables bulk migration to Azure when network isn’t an option.
Data Box Disk (PREVIEW)
• Capacity: 8 TB ea.; 40 TB/order
• Secure, ruggedized USB drives orderable in packs of 5 (up to 40 TB).
• Currently in Preview
• Perfect for projects that require a smaller form factor, e.g., autonomous vehicles.
Data Box Heavy (PREVIEW)
• Capacity: 1 PB
• Weight: 500+ lbs
• Secure, ruggedized appliance
• Preview: September 2018
• Same service as Data Box, but targeted to petabyte-sized datasets.
Data Box Gateway (PREVIEW)
• Virtual device provisioned in your hypervisor
• Supports storage gateway, SMB, NFS, Azure blob, files
• Preview: September 2018
• Virtual network transfer appliance (VM), runs on your choice of hardware.
Data Box Edge (PREVIEW)
• Local Cache Capacity: ~12 TB
• Includes Data Box Gateway and Azure IoT Edge.
• Preview: September 2018
• Data Box Edge manages uploads to Azure and can pre-process data prior to upload.
Offline workflow: Order, Send, Fill, Return, Upload
Online scenarios: Cloud to Edge, Edge to Cloud, Pre-processing, ML Inferencing
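A quick back-of-the-envelope calculation shows when "network isn’t an option" for bulk migration. The sketch below estimates how long an upload would take over a given link; the link speeds are hypothetical examples, capacities use decimal units (1 TB = 10^12 bytes), and protocol overhead is ignored unless expressed through the assumed `utilization` parameter.

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to move `size_tb` terabytes over a link of `link_mbps` megabits/s.

    `utilization` is the assumed fraction of the link actually available (0-1].
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, link speed > 0, utilization in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


# Hypothetical links: 100 Mbps and 1 Gbps, for one 100 TB device worth of data.
for mbps in (100, 1000):
    print(f"100 TB over {mbps} Mbps: {transfer_days(100, mbps):.1f} days")
```

Under these assumptions, 100 TB takes about 93 days at a fully used 100 Mbps link and about 9 days at 1 Gbps, which is the kind of gap that makes shipping a physical device attractive.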
Exactly what is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native
format until it is needed.
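The phrase "raw data in its native format until it is needed" is often described as schema-on-read: files are stored untouched, and structure is applied only when a consumer reads them. The sketch below shows the idea with an in-memory stand-in for a lake; the file names, fields, and helper functions are hypothetical and no Hadoop or Azure API is involved.

```python
import csv
import io
import json

# Hypothetical "raw zone": files are kept exactly as they arrived.
raw_zone = {
    "events.json": '{"user": "a", "amount": "5"}\n{"user": "b", "amount": "7", "extra": true}\n',
    "events.csv": "user,amount\nc,11\n",
}


def read_amounts(name: str, raw_text: str):
    """Apply a (user, amount) schema at read time, based on the file's native format."""
    if name.endswith(".json"):
        records = [json.loads(line) for line in raw_text.splitlines() if line.strip()]
    elif name.endswith(".csv"):
        records = list(csv.DictReader(io.StringIO(raw_text)))
    else:
        raise ValueError(f"no reader for {name}")
    # Fields the consumer does not need (such as "extra") are simply ignored.
    return [(record["user"], int(record["amount"])) for record in records]


def total_amount(zone) -> int:
    """One possible downstream question, answered without reshaping the stored files."""
    return sum(amount for name, text in zone.items() for _, amount in read_amounts(name, text))


print(total_amount(raw_zone))  # 23
```

A different consumer could read the same raw files with a different schema later, which is the flexibility the definition points at.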
Security Boundaries: Department, Business unit, etc…
Business Impact / Criticality: High (HBI), Medium (MBI), Low (LBI), etc…
Confidential Classification: Public information, Internal use only, Supplier/partner confidential, Personally identifiable information (PII), Sensitive – financial, Sensitive – intellectual property, etc…
Downstream App/Purpose
Owner / Steward / SME
Data Warehouse
Serving, Security & Compliance
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc query
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Easily create reports (Self-service BI)
• Known questions
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure HDInsight
Hadoop and Spark as a Service on Azure
• Fully-managed Hadoop and Spark for the cloud
• 100% Open Source Hortonworks data platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Hortonworks Data Platform (HDP) 3.0
(under the covers of HDInsight 4.0 – public preview)
Simply put, Hortonworks ties all the open source products together (20)
[Figure: Custom apps, BI, and Analytics connecting to the SQL Server master instance]
Which experience do you want? Code first Visual tooling Notebooks Jobs
© Microsoft Corporation
Advanced analytics pattern in Azure
Data collection and understanding, modeling, and deployment
[Figure: advanced analytics pattern]
Sources: SQL DB; Logs, files, and media (unstructured)
Components: Azure Data Lake store, Azure Data Lake Analytics, Azure Databricks, HDInsight, Azure Cosmos DB, SQL DB, Storage, SQL DW
Supporting functions: Orchestration, Trained model hosting
Artificial Intelligence Decision Tree
Big Data Decision Tree v4
Business Intelligence Solutions Decision Tree
Q&A ?
James Serra, Big Data Evangelist
Email me at: [email protected]
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)
CLOUD DATA WAREHOUSE
Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Components: Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
MODERN DATA WAREHOUSE
Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
ADVANCED ANALYTICS ON BIG DATA
Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Components: Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services, such as Azure HDInsight and Azure Machine Learning, to allow customers to tailor the above architecture to meet their unique needs.
REAL TIME ANALYTICS
Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Components: Apache Kafka for HDInsight, Azure Databricks, Cosmos DB, Real-time apps, Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Microsoft Azure also supports other Big Data services, such as Azure IoT Hub, Azure Event Hubs, and Azure Machine Learning, to allow customers to tailor the above architecture to meet their unique needs.
DATA MART CONSOLIDATION
PolyBase
Hadoop
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
HUB & SPOKE ARCHITECTURE FOR BI
Sources: Logs (unstructured)
Components: Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Data Marts, Power BI
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
AUTOSCALING DATA WAREHOUSE
Sources: Logs (unstructured), Media (unstructured), Files (unstructured)
Components: Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Azure Functions (Auto-scaling)
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
DATA WAREHOUSE MIGRATION
Sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Business/custom apps
Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.