Grant Fritchey | www.ScaryDBA.com
www.ScaryDBA.com
Introducing
Azure SQL Data Warehouse
Grant Fritchey
grant@scarydba.com
Goals
 Understand the basic infrastructure and architecture behind Azure SQL
Data Warehouse
 Learn different methods of design, querying, and data migration in
order to begin an implementation of Azure SQL Data Warehouse
 Investigate the tooling available in support of automation and
monitoring around Azure SQL Data Warehouse
Get in touch Grant Fritchey
scarydba.com
grant@scarydba.com
@gfritchey
Azure SQL Data Warehouse
 Analytics Platform System (APS)
 Not simply a database
» Massively parallel computing platform
 Platform as a Service (PaaS)
 Pay for what you use
» Pay for when you use it
 Connectivity dependent
 Just a database
ARCHITECTURE
Azure SQL Data Warehouse
Azure SQL Data Warehouse
 Built on a combination of Azure SQL Database and Analytics Platform
System (APS)
 DBMS = Azure SQL Database
 Processing = APS
 Storage = Azure BLOB Storage
 Default storage is through columnstore
 It’s still SQL Server at its core
[Architecture diagram: Application, Massively Parallel Processing Engine (APS), Blob Storage]
 Control Node: coordinates data movement and workload management
 Compute Nodes: provide processing mechanisms in parallel or individually
 Read-Access Geo-Redundant Storage (RA-GRS): stores multi-terabyte data across Azure geo regions
Table Architecture
 Clustered columnstore by default
 Each “table” consists of 60 tables
 Tables consist of segments
» At least 100k rows per compressed row group improves performance
» 1 million rows per row group is the maximum
 Columnstore storage
» Compressed columnstore segments
» Delta store (standard clustered index)
Protection Features
 Locally Redundant Storage
 Geo-Redundant Storage
 Automated backups
» Every 8 hours
» Kept for 7 days
 Transparent Data Encryption
Security
 SQL Server logins
 Azure Active Directory
 Manage Resource Groups
 Firewall
 Built-in Auditing
DATABASE DESIGN
Azure SQL Data Warehouse
Actually, Table Design
 Define table distribution
 Partitioning
 Statistics
 General Tips
 Unsupported
Table Distribution
 Each table consists of 60 tables
» 60 distributions
 Round-robin
» One, then the next
 Hash
 For best performance, pick the distribution method
Round-Robin Distribution
 Starting out
 No join key to other tables
 No good hash candidate
 Joins against this table aren’t significant
 Staging or temporary table
Hash Distribution
 Ensure
» No updates
» Even data distribution
» Minimal data movement
 Suggestions for Hash key
» Highly selective data
» Minimal nulls and duplicates
» Avoid dates
» Avoid fewer than 60 values
» Foreign key columns
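As a rough illustration of the two distribution choices, the sketch below creates one hash-distributed fact table and one round-robin staging table. The table and column names (dbo.FactSales, dbo.StageSales, CustomerKey) are hypothetical and not from the deck; CustomerKey is assumed to be a never-updated, well-spread join column.

```sql
-- Hypothetical fact table, hash-distributed on a join column that is
-- assumed to be never updated, mostly non-null, and to have far more
-- than 60 distinct values.
CREATE TABLE dbo.FactSales
(
    SaleKey     BIGINT         NOT NULL,
    CustomerKey INT            NOT NULL,
    SaleDate    DATE           NOT NULL,
    Amount      DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);

-- Hypothetical staging table: no good hash candidate is needed, so rows
-- are spread round-robin across the 60 distributions and stored as a heap.
CREATE TABLE dbo.StageSales
(
    SaleKey     BIGINT         NOT NULL,
    CustomerKey INT            NOT NULL,
    SaleDate    DATE           NOT NULL,
    Amount      DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```

Joining two tables that are hash-distributed on the same key is what keeps data movement minimal, which is why foreign key columns are suggested as hash keys.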
Ensuring Index Quality
 Avoid memory pressure when building indexes
» Balance memory with concurrency
 Avoid high volume DML operations
» Deleted rows are not physically removed until the table is rebuilt
» Inserts are added to the delta row group
» Updates are a logical delete followed by an insert (delta row group)
» Large bulk operations behave differently: at 102,400 or more rows per distribution (6.144 million rows per operation, spread evenly across 60 distributions), rows go directly to compressed storage
 Avoid small or trickle load operations
» Very small data loads always go to delta group
 Be cautious with the number of partitions
» Each partition is a new table
» Each table is 60 tables
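One way to recover index quality after heavy DML or trickle loads is an index rebuild. This is only a sketch against a hypothetical dbo.FactSales table; the partition number is likewise an assumption for illustration.

```sql
-- Threshold arithmetic from the slide:
--   102,400 rows per distribution x 60 distributions = 6,144,000 rows
-- A single load smaller than that may land in delta row groups.

-- Rebuild every index on the hypothetical table. This recompresses the
-- row groups and physically removes rows that were only logically deleted.
ALTER INDEX ALL ON dbo.FactSales REBUILD;

-- On a partitioned table, a single partition can be rebuilt instead,
-- which needs less memory and time than rebuilding the whole table.
ALTER INDEX ALL ON dbo.FactSales REBUILD PARTITION = 5;
```

Running the rebuild under a user with a larger memory allocation ties back to the first point on the slide: memory pressure during the build produces smaller, lower-quality row groups.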
Table Tips
 Row Store
» < 60 million rows
» Frequent updates
» Small dimension tables
 Columnstore
» > 60 million rows
» Infrequent updates
» Fact tables & large dimension tables
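For the row store side of this comparison, a small dimension table might look like the following sketch. dbo.DimProduct and its columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical small dimension table kept as a row store: a standard
-- clustered index instead of the default clustered columnstore index.
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX (ProductKey)
);
```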
Partitioning
 60 million rows per partition to see benefits
 There can be too many partitions
 Partitioning can prevent 1 million rows per group
 Partitioning can cause rows to go to delta row group instead of
compressed row group
 Partition elimination must occur to see benefits
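A partitioned table can be sketched as below. The table name, columns, and boundary values are hypothetical; the point is how quickly the physical pieces multiply and that the query has to filter on the partitioning column.

```sql
-- Hypothetical partitioned fact table. Three boundary values give four
-- partitions; with 60 distributions that is 4 x 60 = 240 physical pieces,
-- each of which needs enough rows to fill compressed row groups.
CREATE TABLE dbo.FactOrders
(
    OrderKey     BIGINT         NOT NULL,
    OrderDateKey INT            NOT NULL,
    CustomerKey  INT            NOT NULL,
    Amount       DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20150101, 20160101, 20170101))
);

-- Filtering on the partitioning column gives the engine the chance to
-- skip partitions that cannot contain matching rows.
SELECT SUM(Amount) AS TotalAmount
FROM dbo.FactOrders
WHERE OrderDateKey >= 20160101
  AND OrderDateKey <  20170101;
```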
Statistics
 No automatic creation
 No automatic update
 Microsoft suggests creating statistics on every column as a starting point
» I don’t agree, but this is a better choice than no statistics
 Multi-column statistics supported
» Histogram is still only on first column
 Syntax is the same
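Because nothing is created automatically, statistics have to be declared by hand. A minimal sketch, assuming a hypothetical dbo.FactSales table with CustomerKey and SaleDate columns; the statistics names are made up for illustration.

```sql
-- Single-column statistics on a hypothetical join/filter column.
CREATE STATISTICS stats_FactSales_CustomerKey
ON dbo.FactSales (CustomerKey);

-- Multi-column statistics: the histogram covers only the first column
-- (CustomerKey); the remaining columns contribute density information.
CREATE STATISTICS stats_FactSales_CustomerKey_SaleDate
ON dbo.FactSales (CustomerKey, SaleDate);
```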
General Tips
 Denormalization is actually viable
 Use minimum viable data size
 Heap tables for transient data
Unsupported
 Currently (these things change)
» Identity
» Primary key, foreign key, unique and check constraints
» Unique indexes
» Computed columns
» Sparse columns
» User-Defined types
» Sequence
» Triggers
» Indexed views
» Synonyms
And Memory
 Connection group setting
 More memory and more processing power as ADW size increases
 Still only 30 connections
 Fundamental to data loads as well as querying
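The slide does not name the setting, so the following is an assumption: in Azure SQL Data Warehouse the per-query memory grant is governed by resource classes, which are database roles (smallrc, mediumrc, largerc, xlargerc). LoadUser is a hypothetical database user.

```sql
-- Assumption: the memory setting referred to above is the resource class.
-- Give the hypothetical LoadUser a larger memory grant for loads and
-- index builds. Larger grants reduce how many queries can run at once.
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Return the user to the default (smallest) resource class afterwards.
EXEC sp_droprolemember 'largerc', 'LoadUser';
```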
D-SQL
Azure SQL Data Warehouse
New & Different
 CREATE TABLE AS SELECT
 GROUP BY differences
 Labels
 Stored procedures limitations
 View limitations
 General Notes
CREATE TABLE AS SELECT
 Must define distribution
 Uses parallel processing
 Uses
» Copy a table
» Change structure on a table
» Replace ANSI derived tables (unsupported)
» External data import
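The "change structure on a table" use can be sketched as a CTAS followed by a rename swap. The table and column names are hypothetical.

```sql
-- Build a restructured copy of a hypothetical table in parallel.
-- The distribution must be stated explicitly in the WITH clause.
CREATE TABLE dbo.FactSales_New
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT SaleKey, CustomerKey, SaleDate, Amount
FROM dbo.FactSales;

-- Swap the new table into place and keep the old one until verified.
RENAME OBJECT dbo.FactSales     TO FactSales_Old;
RENAME OBJECT dbo.FactSales_New TO FactSales;
```

Statistics are not carried over by CTAS, so they need to be recreated on the new table.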
GROUP BY
 Unsupported
» ROLLUP
» GROUPING SETS
» CUBE
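A common workaround is to write each grouping level as its own query and combine them with UNION ALL. The sketch below approximates GROUP BY ROLLUP (SaleDate) on a hypothetical dbo.FactSales table.

```sql
-- Detail level: one row per SaleDate.
SELECT SaleDate, SUM(Amount) AS TotalAmount
FROM dbo.FactSales
GROUP BY SaleDate

UNION ALL

-- Grand-total level: NULL stands in for "all dates", as ROLLUP would show.
SELECT NULL AS SaleDate, SUM(Amount) AS TotalAmount
FROM dbo.FactSales;
```

GROUPING SETS and CUBE can be handled the same way, with one SELECT per grouping combination.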
Labels
 Mark a query
 Useful for troubleshooting
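A label is attached with a query hint and can then be searched for in the request DMV. The label text and table name below are hypothetical.

```sql
-- Mark the query with a label.
SELECT COUNT(*) AS SaleCount
FROM dbo.FactSales
OPTION (LABEL = 'Demo: FactSales row count');

-- Later, find that exact query among all submitted requests.
SELECT request_id, status, submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Demo: FactSales row count';
```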
Stored procedures limitations
 Unsupported
» Temporary stored procedures
» Numbered stored procedures
» Extended stored procedures
» CLR stored procedures
» Encryption
» Replication
» Table-valued parameters
» Read-only parameters
» Default parameters
» Execution contexts
» RETURN statement
View Limitations
 Schema binding
 No data manipulation through view
 No temporary tables
 No support for EXPAND/NOEXPAND
 No indexed views
General Notes
 Cursors are not supported
» Use WHILE
 Transaction isolation level is limited to READ UNCOMMITTED
 No SELECT or UPDATE for variable assignment
» Instead
SET @i = (SELECT COUNT(*) FROM dbo.[Table]);
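The cursor replacement usually follows one pattern: number the rows into a temporary table with CTAS, then walk the numbers with WHILE. This is a sketch only; the #work table name is hypothetical, and the generated statement (UPDATE STATISTICS per table) is just an example payload.

```sql
-- Build a numbered work list. CTAS needs an explicit distribution,
-- even for a temporary table.
CREATE TABLE #work
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Sequence,
       'UPDATE STATISTICS ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS sql_code
FROM sys.tables AS t
JOIN sys.schemas AS s
  ON t.schema_id = s.schema_id;

-- Variables are assigned from scalar subqueries, not SELECT @var = ...
DECLARE @total INT = (SELECT COUNT(*) FROM #work),
        @i     INT = 1;

-- Loop over the numbered rows instead of opening a cursor.
WHILE @i <= @total
BEGIN
    DECLARE @sql_code NVARCHAR(4000) =
        (SELECT sql_code FROM #work WHERE Sequence = @i);

    EXEC sp_executesql @sql_code;

    SET @i += 1;
END;

DROP TABLE #work;
```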
DATA IMPORT MECHANISMS
Azure SQL Data Warehouse
Import Processes
 Azure Data Factory
 SSIS
 Polybase
 3rd Party
Azure Data Factory
 Currently single core through control node
» Can use Polybase
 Reads from
» Azure blob storage
» Azure SQL Database
» On-premises SQL Server
» SQL Server VM in Azure
 Requires software installations locally to On-Premise and VMs
 Second slowest method (unless Polybase is used)
SSIS
 Single core through control node only
 Include retry logic
 Increase timeout, radically
 Use “all or nothing” load processing
 Parallel loads from multiple SSIS can help
 Slowest method according to Microsoft
Polybase
 Supports delimited files and Hadoop
 Supports compressed files
» Gzip, zlib, Snappy
 Single compressed file per reader; for better performance, split the data into multiple
compressed files scaled for DWU
 Compressed files load slower, but upload faster
 Single operation
 Load speed increases with scale
» Readers increase
» Writers increase
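A PolyBase load from blob storage can be sketched in a few steps: a credential, an external data source, a file format, an external table, and a CTAS that does the load as a single parallel operation. Every object name, the container path, and the pipe delimiter are assumptions for illustration; the angle-bracket placeholders must be filled in by the reader.

```sql
-- One-time setup: a master key protects the stored credential.
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user',
     SECRET   = '<storage-account-key>';

-- Hypothetical blob container holding the gzip-compressed, delimited files.
CREATE EXTERNAL DATA SOURCE SalesBlobStorage
WITH
(
    TYPE       = HADOOP,
    LOCATION   = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedGzip
WITH
(
    FORMAT_TYPE      = DELIMITEDTEXT,
    FORMAT_OPTIONS   (FIELD_TERMINATOR = '|'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- The external table is metadata only; it points at every file in the folder.
CREATE EXTERNAL TABLE dbo.SalesExternal
(
    SaleKey     BIGINT         NOT NULL,
    CustomerKey INT            NOT NULL,
    SaleDate    DATE           NOT NULL,
    Amount      DECIMAL(18, 2) NOT NULL
)
WITH
(
    LOCATION    = '/sales/',
    DATA_SOURCE = SalesBlobStorage,
    FILE_FORMAT = PipeDelimitedGzip
);

-- The actual load: a single CTAS reading the files in parallel.
CREATE TABLE dbo.StageSalesLoad
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT SaleKey, CustomerKey, SaleDate, Amount
FROM dbo.SalesExternal;
```

Splitting the source into several compressed files in the folder, rather than one large file, is what lets additional readers participate as the warehouse scales.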
3rd Party
Data Loading Tips
 Network bandwidth must be considered unless the load is all done
within Azure
» Express Route, paid access, can help
 Memory affects columnstore, so use more memory for load processes
 Fixed length file format not currently supported by Polybase
 Remember, it’s all a balancing act between upload speed & import
speeds
 100k chunks to get data onto compressed segments in columnstore
TOOLING
Azure SQL Data Warehouse
Available Tools
 Azure Portal
 Visual Studio
 SQL Server Management Studio
 PowerShell
MAINTENANCE
Azure SQL Data Warehouse
SQL Server
 Index Maintenance
» But not for defragmentation
 Statistics maintenance
 Monitoring
 Backups
» Managed for you, just monitor
Statistics
 No automatic creation
 No automatic update
» Update after data loads
» Update after data modification
» If either of the above doesn’t change data distribution, don’t update the
statistics
 Target columns
» JOIN
» GROUP BY
» ORDER BY
» WHERE
» HAVING
 Syntax is the same as SQL Server
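A post-load maintenance step might look like this sketch. The table and statistics names are hypothetical.

```sql
-- Refresh one named statistics object after a load that changed
-- the distribution of values in its column.
UPDATE STATISTICS dbo.FactSales (stats_FactSales_CustomerKey);

-- Or refresh every statistics object on the table. Simpler to schedule,
-- but it takes longer on wide tables.
UPDATE STATISTICS dbo.FactSales;
```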
DBCC SHOW_STATISTICS()
 Limits
» No undocumented features
» No stats_stream
» Square brackets not supported
» Cannot use column names to identify stats
— Must use the stats name
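Within those limits, a call looks like the following sketch. The table and statistics names are hypothetical; note that the statistics object is referenced by name and without square brackets.

```sql
-- Header, density vector, and histogram for one statistics object.
DBCC SHOW_STATISTICS (dbo.FactSales, stats_FactSales_CustomerKey);

-- Only the histogram.
DBCC SHOW_STATISTICS (dbo.FactSales, stats_FactSales_CustomerKey) WITH HISTOGRAM;
```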
Monitoring
 Portal
 Dynamic Management Views
» sys.pdw_loader_backup_runs
» sys.dm_pdw_exec_sessions
» sys.dm_pdw_exec_requests
» sys.dm_pdw_request_steps
» sys.dm_pdw_sql_requests
» sys.dm_pdw_dms_workers
» sys.dm_pdw_waits
 DBCC
» PDW_SHOWEXECUTIONPLAN
» PDW_SHOWSPACEUSED
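A typical troubleshooting pass with these views starts at the request level and drills into the steps. The request ID and table name below are placeholders, not real values.

```sql
-- Requests that are still queued or running, newest first.
SELECT TOP 10
       request_id, session_id, status, submit_time,
       total_elapsed_time, [label], command
FROM sys.dm_pdw_exec_requests
WHERE status NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY submit_time DESC;

-- Distributed plan steps for one request; long-running data movement
-- steps show up here. 'QID1234' is a placeholder request_id.
SELECT step_index, operation_type, status, total_elapsed_time, row_count
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID1234'
ORDER BY step_index;

-- Rows and space per distribution for a hypothetical table; uneven row
-- counts point to a skewed hash key.
DBCC PDW_SHOWSPACEUSED("dbo.FactSales");
```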
Microsoft Marketing Slide
Resources
 Microsoft Documentation
 Azure Data Platform Learning Resources
 Grant Fritchey
 Columnstore Architecture
 Troubleshooting
 Creating Artificial Key Values
Most useful docs
 https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-best-practices/
 https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-tables-index/#causes-of-poor-columnstore-index-quality
 https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-tables-distribute/