Azure Data Platform Overview

Uploaded by

praveeninday

About Me

• Microsoft, Big Data Evangelist
• In IT for 30 years, worked on many BI and DW projects
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
• Been perm employee, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
• Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
• Blog at JamesSerra.com
• Former SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”
I tried to understand the Microsoft data platform on my own…

And felt like I was body slammed by Randy Savage:

Let’s prevent that from happening…


Data platform continuum
On premises ↔ Hybrid Cloud ↔ Off premises
On premises: Dedicated, higher cost, higher administration
Off premises: Shared, lower cost, lower administration
The “evolution” of data platforms
On-Premises → IaaS → PaaS → Pay per query
Microsoft Big Data Portfolio

Microsoft has solutions covering and connecting all four quadrants – that’s why SQL Server is one of the most utilized databases in the world.

Cloud, Scale Up (Sequential): Azure SQL Database, SQL Server in Azure VM
Cloud, Scale Out + Across: Azure SQL DW, Databricks, Cosmos DB, HDInsight
Cloud and on-premises: SQL Server Stretch
On-premises, Scale Up (Sequential): SQL Server 2017, SQL Server 2016 Fast Track
On-premises, Scale Out + Across: Hadoop, Analytics Platform System
Insights: Business intelligence; Machine learning analytics
Key: Relational, Non-relational
• VM hosted on Microsoft Azure Infrastructure (“IaaS”)
  • From Microsoft images (gallery) or your own images (custom)
    • SQL 2008R2 / 2012 / 2014 / 2016 / 2017 Web / Standard / Enterprise
    • Images refreshed with latest version, SP, CU
  • Windows Server 2008 R2 / 2012 R2 / 2016, Linux RHEL / Ubuntu
  • Fast provisioning (~10 minutes)
  • Accessible via RDP and PowerShell
  • Full compatibility with SQL Server “Box” software
• Pay per use
  • Per minute (only when running)
  • Cost depends on size and licensing
  • EA customers can use existing SQL licenses (BYOL)
  • Network: only outgoing (not incoming)
  • Storage: only used (not allocated)
• Elasticity
  • 1 core / 2 GB mem / 1 TB up to 128 cores / 3.5 TB mem / 256 TB
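
The pay-per-use bullets above come down to simple arithmetic. A minimal sketch in Python, assuming hypothetical rates: the names `VmUsage` and `estimate_cost` and every rate value below are illustrative placeholders, not Azure prices or an Azure API.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    running_minutes: int     # compute is billed only while the VM is running
    storage_used_gb: float   # storage is billed on used, not allocated, space
    egress_gb: float         # only outgoing network traffic is billed


def estimate_cost(usage: VmUsage,
                  rate_per_minute: float,
                  rate_per_storage_gb: float,
                  rate_per_egress_gb: float) -> float:
    """Return an estimated bill; every rate is a hypothetical input."""
    compute = usage.running_minutes * rate_per_minute
    storage = usage.storage_used_gb * rate_per_storage_gb
    network = usage.egress_gb * rate_per_egress_gb
    return round(compute + storage + network, 2)


# Example: a VM that ran 8 hours a day for 20 days (placeholder rates).
usage = VmUsage(running_minutes=8 * 60 * 20, storage_used_gb=200, egress_gb=50)
estimate = estimate_cost(usage,
                         rate_per_minute=0.01,
                         rate_per_storage_gb=0.05,
                         rate_per_egress_gb=0.08)
```

Stopping the VM outside working hours lowers `running_minutes` directly, which is why the per-minute model tends to favor workloads that do not run around the clock.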
Azure SQL Database
A relational database-as-a-service (“PaaS”), fully managed by Microsoft.
For cloud-designed apps when near-zero administration and enterprise-grade capabilities are key.
Perfect for organizations looking to dramatically increase the DB:IT ratio.
Azure SQL Database Managed Instance

Managed Instance
Instance scoped programming model with high compatibility to on-premises databases. Best for modernization at scale with low cost and effort.

Single
Standalone managed database best for predictable and stable workloads

Elastic pool
Shared resource model best for greater efficiency through multi-tenancy
Security
• TDE • Row level security
• SQL Audit • Always Encrypted

Supports compatibility modes (SQL Server 2005+), Instance sizes up to 8TB


Adapts on-demand to your workload's needs, auto-scaling up to 100TB per database.


Reliable and available Scalable High performance


Programming Model    General Purpose    Business Critical    Hyperscale                Elastic Pools
Instance (MI)        GA, 8TB            GA, 4TB              Private Preview, 100TB    April private preview
Database (Single)    GA, 4TB            GA, 4TB              Public Preview, 100TB     GA
AZURE DATABASE SERVICES FOR MYSQL, POSTGRESQL, AND MARIADB

More choices and full integration into Azure’s ecosystem and services

• Managed community MySQL, PostgreSQL, and MariaDB
• Languages and frameworks of your choice
• Scale in seconds with built-in high availability
• Secure and compliant
• Industry-leading global reach

Easy Lift and Shift
Enterprise Ready


SMP vs MPP

SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• SQL Server implementations traditionally have been SMP
• Mostly, the solution is housed on a shared storage

MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
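
The shared-nothing idea behind MPP can be sketched in a few lines of Python. This is an illustration only, not how any Microsoft engine is implemented: the function names (`distribute`, `local_sum`, `mpp_sum`) and the sample rows are hypothetical, and the "nodes" are plain lists processed one after another, whereas a real MPP system runs them in parallel on separate machines.

```python
from collections import defaultdict


def distribute(rows, node_count, dist_key):
    """Hash-distribute rows so each node owns a disjoint slice (shared nothing)."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[dist_key]) % node_count].append(row)
    return nodes


def local_sum(node_rows, group_key, value_key):
    """Work done independently on one node, using only that node's rows."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row[group_key]] += row[value_key]
    return totals


def mpp_sum(rows, node_count, dist_key, group_key, value_key):
    """Scatter rows to nodes, aggregate locally, then combine the partial results."""
    partials = [local_sum(node, group_key, value_key)
                for node in distribute(rows, node_count, dist_key)]
    combined = defaultdict(float)
    for partial in partials:
        for group, subtotal in partial.items():
            combined[group] += subtotal
    return dict(combined)


sales = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 5.0},
    {"id": 3, "region": "east", "amount": 20.0},
]
totals = mpp_sum(sales, node_count=2, dist_key="id",
                 group_key="region", value_key="amount")
```

The final combine step stands in for the traffic that crosses the high-speed network between nodes; in an SMP system the same query would simply scan one shared table in shared memory.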
Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft.
Industry's first elastic cloud data warehouse with enterprise-grade capabilities.
Support your smallest to your largest data storage needs while handling queries up to 100x faster.
AZURE RELATIONAL DATABASE PLATFORM

SQL Database, SQL Data Warehouse, PostgreSQL, MySQL, MariaDB
Power BI, App Services, Data Factory, Analytics, ML, Cognitive, Bot…

Database Services Platform
• Intelligent: Advisors, Tuning, Monitoring
• Flexible: On-demand scaling, Resource governance
• Trusted: HA/DR, Backup/Restore, Security, Audit, Isolation

Azure Compute
Azure Storage
Global Azure with 54 Regions


Azure Database Migration Service (DMS)
A seamless, end-to-end solution for moving on-premises SQL Server, Oracle, and other relational
databases to the cloud.

Azure Database Migration Guide


https://datamigration.microsoft.com/
Relational and non-relational defined
Relational databases (RDBMS, SQL Databases)
• Example: Microsoft SQL Server, Oracle Database, IBM DB2
• Mostly used in large enterprise scenarios
• Analytical RDBMS (OLAP, MPP) solutions are SQL DW, Redshift, Teradata, Netezza
Non-relational databases (NoSQL databases)
• Example: Azure Cosmos DB, MongoDB, Cassandra
• Four categories: Key-value stores, Wide-column stores, Document stores and Graph stores
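
To make the four NoSQL categories concrete, here is a small sketch that models the same customer in each shape using plain Python structures. The customer, keys, and helper `orders_placed_by` are hypothetical and chosen for illustration; real products add indexing, distribution, and their own query languages on top of these shapes.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ada", "city": "London"}'}

# Document store: the value is a structured document that can be queried by field.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "orders": [1001, 1002],
}

# Wide-column store: rows keyed by id, columns grouped into families,
# and each row may carry a different set of columns.
wide_column = {
    42: {
        "profile": {"name": "Ada", "city": "London"},
        "orders": {"1001": "shipped", "1002": "open"},
    }
}

# Graph store: entities are vertices, relationships are labeled edges.
vertices = {"customer:42": {"name": "Ada"}, "order:1001": {}, "order:1002": {}}
edges = [
    ("customer:42", "PLACED", "order:1001"),
    ("customer:42", "PLACED", "order:1002"),
]


def orders_placed_by(customer, edge_list):
    """Traverse PLACED edges that start at the given customer vertex."""
    return [dst for src, label, dst in edge_list
            if src == customer and label == "PLACED"]


city_from_kv = json.loads(key_value["customer:42"])["city"]
placed = orders_placed_by("customer:42", edges)
```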

Hadoop: Made up of Hadoop Distributed File System (HDFS), YARN and MapReduce (Ideal for data lake)

OLTP vs OLAP/DW
SMP vs MPP
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service

APIs: Table API, MongoDB API, Cassandra API
Data models: Key-value, Column-family, Document, Graph

• Guaranteed low latency at the 99th percentile
• Elastic scale out of storage & throughput
• Five well-defined consistency models
• Turnkey global distribution
• Comprehensive SLAs


Sources: LOB, CRM, Graph, Image, Social, IoT

INGEST: Data orchestration and monitoring
• Azure Data Factory
• SSIS

STORE: Big data store
• Azure Data Lake Storage Gen2
• Blob Storage
• Azure Data Lake Storage Gen1
• SQL Server 2019 Big Data Cluster

PREP: Transform & Clean
• Azure Databricks
• Azure HDInsight
• PolyBase & Stored Procedures
• Power BI Dataflow
• Azure Data Lake Analytics

MODEL & SERVE (& store): Data warehouse
• Azure SQL Data Warehouse
• Azure Analysis Services
• SQL Database (Single, MI, HyperScale)
• SQL Server in a VM
• Cosmos DB
• Power BI Aggregations

Outputs: BI + Reporting, Advanced Analytics, AI
Blob Storage
• Large partner ecosystem
• Global scale – All 50 regions
• Durability options
• Tiered - Hot/Cool/Archive
• Cost Efficient

Data Lake Store
• Built for Hadoop
• Hierarchical namespace
• ACLs, AAD and RBAC
• Performance tuned for big data
• Very high scale capacity and throughput

Azure Data Lake Storage Gen2
Brings together the Blob Storage and Data Lake Store capabilities listed above.
LRS
• Multiple replicas across a datacenter
• Protect against disk, node, rack failures
• Write is ack’d when all replicas are committed
• Superior to dual-parity RAID
• 11 9s of durability
• SLA: 99.9%

ZRS
• Replicas across 3 Zones (Zone 1, Zone 2, Zone 3)
• Protect against disk, node, rack and zone failures
• Synchronous writes to all 3 zones
• 12 9s of durability
• Available in 8 regions
• SLA: 99.9%

GRS
• Multiple replicas across each of 2 regions
• Protects against major regional disasters
• Asynchronous to secondary
• 16 9s of durability
• SLA: 99.9%

RA-GRS
• GRS + Read access to secondary
• Separate secondary endpoint
• RPO delay to secondary can be queried
• SLA: 99.99% (read), 99.9% (write)
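
The "nines" and SLA percentages above are easier to compare once converted into numbers. The sketch below assumes that durability is quoted per object per year and that the availability SLA is measured over a 30-day month; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def annual_loss_probability(nines: int) -> float:
    """Probability of losing a given object in a year at N nines of durability."""
    return 10.0 ** (-nines)


def expected_objects_lost_per_year(object_count: int, nines: int) -> float:
    """Expected number of objects lost per year across the whole store."""
    return object_count * annual_loss_probability(nines)


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of unavailability permitted by an availability SLA in one period."""
    return (1.0 - sla_percent / 100.0) * days * 24 * 60


# One billion objects at 11 nines (the LRS figure): about 0.01 objects lost per year.
lrs_loss = expected_objects_lost_per_year(10**9, 11)

# 99.9% allows roughly 43 minutes per 30-day month; 99.99% roughly 4 minutes.
downtime_999 = allowed_downtime_minutes(99.9)
downtime_9999 = allowed_downtime_minutes(99.99)
```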
Compute
• Analytics Engines (Hadoop, Spark, SCOPE …)
• High Performance Compute
• AI / ML

Caching Layer (Avere tech)
REST, HDFS, NFS, SMB …

Storage tiers (Automatic Lifecycle Management)
• Extra Hot Tier - Premium (SSD + NVME)
• Hot Tier (HDD)
• Cool Tier (HDD)
• Cooler Tier
• Archive Tier
• Deep Storage Tier (Glass, DNA, etc.)

Edge
• Data Box
• Data Box Edge
• Avere FXT
• Azure Stack
• Azure File Sync
• Azure Backup

Legend: Current, Future
File Sync
• Windows Srv <-> Azure
• Local caching
• With offline (Databox) can 'sync' remainder

Fuse
• Mount blobs as local FS
• Commit on write
• Linux

Site Replication
• On premise & cloud
• Windows, Linux
• Physical, virtual
• Hyper-V, VMWare

Network Acceleration
• Aspera
• Signiant

AZCopy
• Throughput +30%
• S3 to Azure Blobs
• Sync to cloud
• Hi Latency 10-100%

NetApp
• CloudSync
• SnapMirror
• SnapVault

Data Factory
• On premise & cloud sources
• Structured & unstructured
• Over 60 connectors
• UI design data flow

Partners
• Peer Global File Service
• Talon FAST
• Zerto
• …

Offline
• Data Box
• Data Box Heavy
• Data Box Disk
• Disk Import / Export

Fast Data Transfer
microsoft.com/en-us/garage/profiles/fast-data-transfer/
Offline Data Transfer

Data Box
• Capacity: 100 TB
• Weight: ~50 lbs
• Secure, ruggedized appliance
• GA September 2018
• Data Box enables bulk migration to Azure when network isn’t an option.

Data Box Disk (PREVIEW)
• Capacity: 8TB ea.; 40TB/order
• Secure, ruggedized USB drives orderable in packs of 5 (up to 40TB).
• Currently in Preview
• Perfect for projects that require a smaller form factor, e.g., autonomous vehicles.

Data Box Heavy (PREVIEW)
• Capacity: 1 PB
• Weight 500+ lbs
• Secure, ruggedized appliance
• Preview September 2018
• Same service as Data Box, but targeted to petabyte-sized datasets.

Online Data Transfer

Data Box Gateway (PREVIEW)
• Virtual device provisioned in your hypervisor
• Supports storage gateway, SMB, NFS, Azure blob, files
• Preview: September 2018
• Virtual network transfer appliance (VM), runs on your choice of hardware.

Data Box Edge (PREVIEW)
• Local Cache Capacity: ~12 TB
• Includes Data Box Gateway and Azure IoT Edge.
• Preview: September 2018
• Data Box Edge manages uploads to Azure and can pre-process data prior to upload.

Offline workflow: Order → Send → Fill → Return → Upload
Network Data Transfer: Cloud to Edge, Edge to Cloud
Edge Compute: Pre-processing, ML Inferencing
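
Whether the network "isn’t an option" is largely a matter of arithmetic. A rough sketch, assuming decimal terabytes and a link that sustains a fixed fraction of its nominal bandwidth; the function name and the 80% utilization default are assumptions for illustration, not measured figures.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to push data_tb terabytes over a link_mbps link.

    Assumes 1 TB = 10**12 bytes and that only `utilization` of the nominal
    bandwidth is actually achieved (a hypothetical default).
    """
    total_bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / 86400


# Filling one 100 TB Data Box worth of data over a 1 Gbps link instead.
days_over_1gbps = transfer_days(100, 1000)
# The same data over a 100 Mbps link.
days_over_100mbps = transfer_days(100, 100)
```

Under these assumptions the 1 Gbps case takes about a week and a half and the 100 Mbps case several months, which is the range where shipping an appliance starts to look attractive.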
Exactly what is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native
format until it is needed.

• Inexpensively store unlimited data


• Collect all data “just in case”
• Store data with no modeling – “Schema on read”
• Complements EDW
• Frees up expensive EDW resources
• Quick user access to data
• ETL Hadoop tools
• Easily scalable
• Place to move older data (archive)
• Place to backup data to
Needs data governance so your data lake does not turn
into a data swamp!
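
“Schema on read” means the raw records are stored untouched and a schema is applied only when someone reads them. A minimal sketch in Python, assuming JSON-lines input; the sample events, field names, and the helper `read_with_schema` are hypothetical.

```python
import json

# Raw data lands in the lake as-is: fields are untyped strings and may be missing.
RAW_EVENTS = [
    '{"device": "a1", "temp": "21.5", "ts": "2019-01-01T00:00:00"}',
    '{"device": "b2", "temp": "19.0", "ts": "2019-01-01T00:01:00", "humidity": "40"}',
    '{"device": "c3"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick the wanted fields and cast them."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


# Two consumers can read the same raw files with different schemas.
temperature_rows = list(read_with_schema(RAW_EVENTS, {"device": str, "temp": float}))
humidity_rows = list(read_with_schema(RAW_EVENTS, {"device": str, "humidity": int}))
```

Nothing was modeled when the events were written, so a new question only needs a new schema at read time. The flip side is that nothing validated the data on the way in, which is where governance comes in.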
Objectives
• Plan the structure based on optimal data retrieval
• Avoid a chaotic, unorganized data swamp

Common ways to organize the data:

Time Partitioning
• Year/Month/Day/Hour/Minute

Subject Area

Security Boundaries
• Department
• Business unit
• etc…

Downstream App/Purpose

Data Retention Policy
• Temporary data
• Permanent data
• Applicable period (ex: project lifetime)
• etc…

Business Impact / Criticality
• High (HBI)
• Medium (MBI)
• Low (LBI)
• etc…

Owner / Steward / SME

Probability of Data Access
• Recent/current data
• Historical data
• etc…

Confidential Classification
• Public information
• Internal use only
• Supplier/partner confidential
• Personally identifiable information (PII)
• Sensitive – financial
• Sensitive – intellectual property
• etc…
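
Several of these organizing choices usually end up encoded in the folder path itself. A small sketch, assuming a layout of zone, then subject area, then time partition; the zone name "raw", the function `lake_path`, and the layout order are illustrative choices, not a prescribed standard.

```python
from datetime import datetime


def lake_path(zone: str, subject_area: str, event_time: datetime, filename: str) -> str:
    """Build a folder path partitioned by subject area and Year/Month/Day/Hour."""
    time_partition = event_time.strftime("%Y/%m/%d/%H")
    return f"{zone}/{subject_area}/{time_partition}/{filename}"


path = lake_path("raw", "sales", datetime(2019, 3, 7, 14, 5), "orders.json")
```

With this layout a query for one day only has to list one folder, and security boundaries or retention policies can be attached at the zone or subject-area level.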
Data Warehouse
Serving, Security & Compliance
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc query
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Easily create reports (Self-service BI)
• Known questions
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure

Best of Databricks Best of Microsoft

Designed in collaboration with the founders of Apache Spark

One-click set up; streamlined workflows

Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)

Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure HDInsight
Hadoop and Spark as a Service on Azure

• Fully-managed Hadoop and Spark for the cloud
• 100% Open Source Hortonworks data platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Hortonworks Data Platform (HDP) 3.0
(under the covers of HDInsight 4.0 – public preview)

Simply put, Hortonworks ties all the open source products together (20)
[Architecture diagram]
• Clients: Custom apps, BI, Analytics
• SQL Server master instance (SQL)
• Compute pools: SQL Compute Nodes
• Data mart: SQL Data Nodes
• Storage pool: each Kubernetes pod holds SQL Server, Spark and an HDFS Data Node, with Storage attached; data is read directly from HDFS
• IoT data
• Nodes with persistent storage
Machine learning and AI portfolio
When to use what Microsoft ML & AI products

Build your own or consume pre-trained models?
• Build your own: Azure Machine Learning; Spark ML, SparkR, SparklyR
• Consume: Cognitive services, bots

Which experience do you want?
• Code first (On-prem): ML Server
• Code first (cloud): BYOT
• Visual tooling (cloud): AML Studio
• Notebooks, Jobs: Azure Databricks

What engines do you want to use? (Deployment target)
• On-prem Hadoop, SQL Server, SQL Server, Hadoop, Azure Batch, DSVM, Spark, Spark

© Microsoft Corporation
Advanced analytics pattern in Azure
Data collection and understanding, modeling, and deployment

Sources
• Sensors and IoT (unstructured)
• Logs, files, and media (unstructured)
• Business/custom apps (structured)

Long-term storage: Azure Data Lake store, Azure Storage
Data processing: Azure Data Lake Analytics, Azure Databricks, HDInsight
Model training: Azure ML Services, Azure ML Studio, ML server, Azure Databricks (Spark ML), SQL Server (in-database ML), Data Science VM, Batch AI
Trained model hosting: Azure Container Service, SQL Server (in-database ML)
Serving storage: Cosmos DB, SQL DB, SQL DW, Azure Analysis Services
Orchestration: Azure Data Factory
Outputs: Applications, Power BI, Dashboards
Artificial Intelligence Decision Tree
Big Data Decision Tree v4
Business Intelligence Solutions Decision Tree
Q&A ?
James Serra, Big Data Evangelist
Email me at: [email protected]
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)
CLOUD DATA WAREHOUSE

Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: (no service shown)
MODEL & SERVE: PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI

Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
MODERN DATA WAREHOUSE

Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: Azure Databricks
MODEL & SERVE: PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI

Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
ADVANCED ANALYTICS ON BIG DATA

Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: Azure Databricks (SparkR)
MODEL & SERVE: Cosmos DB, Real-time apps; PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI

Microsoft Azure also supports other Big Data services like Azure HDInsight, Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
REAL TIME ANALYTICS

Sources: Sensors and IoT (unstructured), Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
INGEST: Apache Kafka for HDInsight, Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: Azure Databricks
MODEL & SERVE: Cosmos DB, Real-time apps; PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI

Microsoft Azure also supports other Big Data services like Azure IoT Hub, Azure Event Hubs, Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
DATA MART CONSOLIDATION

Sources: RDBMS data marts, Hadoop
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
MODEL & SERVE: PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI

Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
HUB & SPOKE ARCHITECTURE FOR BI

Sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: Azure Databricks
MODEL & SERVE: PolyBase, Azure SQL Data Warehouse, Power BI
• Data Marts: Multiple Azure SQL Database instances
• Data Cubes: Multiple Azure Analysis Services instances

Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
AUTOSCALING DATA WAREHOUSE

Sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: Azure Databricks
MODEL & SERVE: PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
Auto-scaling: Azure Functions

Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
DATA WAREHOUSE MIGRATION

Sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
INGEST: Azure Data Factory
STORE: Azure Data Lake Store Gen2
PREP & TRAIN: Azure Databricks
MODEL & SERVE: PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Business/custom apps

Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
