Data50 2020 02 - Feb 09

aka.ms/DATA50 #MSIgniteTheTour
Optimize data warehousing query performance

Resources

Session Resources Hub: aka.ms/DATA50
Session Code on GitHub: aka.ms/DATA50Repo
All Event Session Resources: aka.ms/mymsignitethetour

Agenda

What is Azure Synapse Analytics

Maximizing Performance

Query Performance Tuning

Agenda

What is Azure Synapse Analytics

Using PolyBase to Load Data into a Data Warehouse

Data Loading Best Practices

What is Azure Synapse Analytics?

Azure Synapse Analytics

A limitless analytics service with unmatched time to insight that delivers insights from all your data, across data warehouses and big data analytics systems, with blazing speed.

Data Warehouse Processes

Provision, Load, Query

Automate workflow via Azure Data Factory

Data Warehouse Architecture

[Figure: a Control Node coordinating six Compute Nodes]
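The architecture is massively parallel: the control node accepts a query and hands the work to the compute nodes, which each process only the data they hold and run at the same time. The sketch below is a simplified Python model of that fan-out and combine step, not Synapse code. Threads stand in for compute nodes, and a SUM is used because partial sums combine trivially; the `sales_amount` values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial_sum(rows):
    """One compute node aggregates only the rows stored on it."""
    return sum(rows)


def control_node_sum(node_data):
    """Control node: send the same aggregate to every compute node in parallel,
    then combine the partial results into the final answer."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_data))) as pool:
        partials = list(pool.map(compute_node_partial_sum, node_data))
    return sum(partials)


# Six compute nodes (as in the diagram), each holding a slice of a
# hypothetical sales_amount column.
nodes = [[10, 20], [5], [7, 8, 9], [], [1, 1, 1], [100]]
print(control_node_sum(nodes))  # 162
```

Aggregates that do not combine this simply (for example a GROUP BY on a column the data is not distributed on) force the nodes to exchange rows first, which is the data movement the following slides try to minimize.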
Data warehouse performance in Azure Synapse Analytics

[Figure: reference architecture with stages for data ingestion, data storage, data preparation, serving, and transactional storage]
• Data ingestion: Azure Data Factory loads flat files (logs, files, and media; unstructured) into the data lake on a schedule, and extracts and transforms relational data from business and custom apps (structured)
• Data storage: Azure Storage / Data Lake Store
• Data preparation: Azure Databricks reads data from files using DBFS and loads the processed data into tables optimized for analytics
• Azure Synapse Analytics: data is loaded into SQL DW tables, where query performance matters
• Serving: Power BI dashboards and applications visualize the results
• Transactional storage: applications manage their transactional data directly in SQL DB
Maximizing Performance

Maximizing Query Performance
Table distribution

• Round-robin tables
• Hash-distributed tables
• Replicated tables
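The rules of thumb on the distribution slides in this section can be summarized as a small decision function. The 2 GB threshold and the staging, dimension, and fact roles come from those slides; the function name, its parameters, and the order in which the rules are checked are assumptions made for illustration. The returned strings mirror the values of the DISTRIBUTION option in a dedicated SQL pool CREATE TABLE statement (for HASH you still have to name a column).

```python
def suggest_distribution(size_gb, role, frequently_updated=False):
    """Hypothetical helper encoding the slides' rules of thumb.

    role is one of "staging", "dimension", or "fact" (illustrative labels).
    The result is a starting point to validate, not a final design.
    """
    if role == "staging":
        # Round-robin loads fast; extra data movement at query time is acceptable.
        return "ROUND_ROBIN"
    if role == "dimension" and size_gb < 2 and not frequently_updated:
        # Small, rarely updated dimension: keep a full copy on every compute node.
        return "REPLICATE"
    if size_gb > 2:
        # Large table: hash on a well-chosen, non-volatile column.
        return "HASH"
    # Otherwise fall back to the default for newly created tables.
    return "ROUND_ROBIN"


print(suggest_distribution(0.5, "dimension"))  # REPLICATE
print(suggest_distribution(120, "fact"))  # HASH
print(suggest_distribution(0.5, "dimension", frequently_updated=True))  # ROUND_ROBIN
```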
Maximizing Query Performance
Round-robin distribution

• Is the default option for newly created tables
• Distributes rows evenly across all distributions, and therefore across the available compute nodes, in a random manner
• Loading into round-robin tables is fast
• Queries on round-robin tables may require more data movement, because data is "reshuffled" to organize it for the query
• Great to use for loading staging tables
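The sketch below models why round-robin tables load fast but can need reshuffling at query time. It is a simplified Python model, not Synapse code: cycling through distributions in order is a stand-in for the service's random placement (both give an even spread), and the 3-distribution case exists only to keep the output small.

```python
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def round_robin_assign(num_rows, num_distributions=NUM_DISTRIBUTIONS):
    """Return the distribution id for each row, assigned strictly in turn.

    Simplification: the service assigns rows randomly; cycling in order
    gives the same even spread and keeps the example deterministic.
    """
    return [i % num_distributions for i in range(num_rows)]


# Even spread: 600 rows land 10 per distribution.
counts = Counter(round_robin_assign(600))
print(min(counts.values()), max(counts.values()))  # 10 10

# Row placement ignores values, so rows sharing a key end up scattered.
keys = ["A", "B", "A", "C", "A"]
dists = round_robin_assign(len(keys), num_distributions=3)
spread = {d for k, d in zip(keys, dists) if k == "A"}
print(sorted(spread))  # [0, 1, 2]
```

Because the three "A" rows sit on different distributions, a GROUP BY or join on that key would first have to move them together, which is the data movement the slide warns about.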
Maximizing Query Performance
Hash distribution

• Distributes rows based on the value in the distribution column, using a deterministic hash function to assign each row to one distribution
• Is designed to achieve high performance for queries that run against large fact tables in a star schema
• Choosing a good distribution column is important to ensure the hash distribution performs well
• As a starting point, use it on tables that are larger than 2 GB and have frequent inserts, updates, and deletes
• Don't choose a volatile column (one whose values are frequently updated) as the hash distribution column
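A minimal Python model of hash distribution follows. MD5 is only a stand-in for the service's internal hash function, and `customer_id` and `order_status` are hypothetical columns; the point is that the mapping is deterministic and that the column's cardinality decides how many of the 60 distributions receive any rows.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60


def hash_distribution(value, num_distributions=NUM_DISTRIBUTIONS):
    """Map a distribution-column value to a distribution id.

    MD5 is only a stand-in for the service's internal hash function; what
    matters is that the mapping is deterministic, so equal values always
    land on the same distribution.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


# A high-cardinality column (hypothetical customer_id) reaches every distribution.
good = Counter(hash_distribution(cid) for cid in range(6000))
# A low-cardinality column (hypothetical order_status) can reach at most 3.
poor = Counter(hash_distribution(s) for s in ["open", "closed", "pending"] * 2000)
print(len(good), len(poor))
```

With the low-cardinality column, all 6,000 rows pile onto at most three of the 60 distributions while the rest sit idle; this is the data skew the summary slide says to detect.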
Maximizing Query Performance
Replicated tables

• A full copy of the table is placed on every compute node to minimize data movement
• Works well for dimension tables in a star schema that are less than 2 GB in size and are used regularly in queries with simple predicates
• Should not be used for dimension tables that are updated on a regular basis
• You can convert existing round-robin tables to replicated tables with a CTAS (CREATE TABLE AS SELECT) statement
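The CTAS conversion can be sketched as a Python function that builds the T-SQL for you. It assumes the common pattern of creating a new table with DISTRIBUTION = REPLICATE, then swapping names with RENAME OBJECT; the `_REPLICATE` and `_old` suffixes, the clustered columnstore index, and the `DimProduct` table are assumptions for illustration. Run the generated statements with whatever client you already use.

```python
def ctas_to_replicated(schema, table):
    """Build the T-SQL to rebuild a round-robin table as a replicated table.

    Pattern: CTAS into a new table with DISTRIBUTION = REPLICATE, then swap
    names with RENAME OBJECT. The old table is kept (as <table>_old) so it
    can be dropped once the new one has been checked.
    """
    new_table = f"{table}_REPLICATE"
    return "\n".join([
        f"CREATE TABLE [{schema}].[{new_table}]",
        "WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX)",
        f"AS SELECT * FROM [{schema}].[{table}];",
        f"RENAME OBJECT [{schema}].[{table}] TO [{table}_old];",
        f"RENAME OBJECT [{schema}].[{new_table}] TO [{table}];",
    ])


print(ctas_to_replicated("dbo", "DimProduct"))
```

Building a new table and renaming, rather than altering in place, keeps the original available until the replicated copy has been verified.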
Create statistics after loading

Create statistics on production tables in Azure Synapse Analytics after loading to improve query performance for users.
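A minimal sketch of generating the statements, assuming single-column statistics on columns your queries join, filter, or group on (common guidance, but the `FactSales` table, its columns, and the `stat_<table>_<column>` naming scheme here are hypothetical). CREATE STATISTICS is run once after the initial load; UPDATE STATISTICS refreshes the table's statistics after later loads.

```python
def create_statistics_sql(schema, table, columns, fullscan=False):
    """One single-column CREATE STATISTICS statement per column (run after the first load)."""
    option = " WITH FULLSCAN" if fullscan else ""
    return [
        f"CREATE STATISTICS [stat_{table}_{col}] ON [{schema}].[{table}] ([{col}]){option};"
        for col in columns
    ]


def update_statistics_sql(schema, table):
    """Refresh every statistics object on the table (run after later loads)."""
    return f"UPDATE STATISTICS [{schema}].[{table}];"


for stmt in create_statistics_sql("dbo", "FactSales", ["CustomerKey", "OrderDateKey"]):
    print(stmt)
print(update_statistics_sql("dbo", "FactSales"))
```

Sampling (the default) is cheaper; WITH FULLSCAN reads every row and is worth the cost mainly when sampled statistics lead to poor plans.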
Demo: Query Performance Tuning

Query Performance Tuning
Query Data Store

• Overcomes the 10,000-row limit on DMV output
• Pinpoint and fix queries with plan regression
  • View queries that produce multiple plans
  • 7-day retention period
  • Full query text
• A/B testing with your Azure Synapse Analytics (SQL DW)
• Identify, improve, and tune ad hoc queries
  • Top-hitting queries for performance tuning
© Microsoft Corporation
Query Data Store
Dynamic Management Views

• Require the VIEW DATABASE STATE permission
• Times in these DMVs are in the UTC time zone
• Query: sys.query_store_query, query_id (PK)
• Query text: sys.query_store_query_text, query_text_id (PK)
• Plan: sys.query_store_plan, plan_id (PK)
• Runtime stats: sys.query_store_runtime_stats, runtime_stats_id (PK)
• Runtime stats interval: sys.query_store_runtime_stats_interval, runtime_stats_interval_id (PK)
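To pinpoint plan regressions with these views, join sys.query_store_plan to sys.query_store_runtime_stats on plan_id and compare plans of the same query_id. The sketch below keeps that SQL as a reference string and analyzes the returned rows in Python; fetching the rows (for example with a database driver) is left out, and the 1.5 slowdown threshold and the execution-weighted averaging are assumptions, not Query Data Store behavior.

```python
from collections import defaultdict

# Query that returns the input rows: one row per plan per runtime-stats interval.
PLAN_STATS_SQL = """
SELECT p.query_id, p.plan_id, rs.avg_duration, rs.count_executions
FROM sys.query_store_plan AS p
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id;
"""


def find_plan_regressions(rows, threshold=1.5):
    """Flag queries whose slowest plan is at least `threshold` times slower
    than their fastest plan.

    rows: iterable of (query_id, plan_id, avg_duration, count_executions).
    Returns (query_id, slow_plan_id, ratio) tuples, worst first.
    """
    # query_id -> plan_id -> [total duration, total executions]
    totals = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for query_id, plan_id, avg_duration, executions in rows:
        acc = totals[query_id][plan_id]
        acc[0] += avg_duration * executions  # weight each interval by its executions
        acc[1] += executions

    flagged = []
    for query_id, plans in totals.items():
        avgs = {pid: dur / n for pid, (dur, n) in plans.items() if n}
        if len(avgs) < 2 or min(avgs.values()) <= 0:
            continue  # a query needs two or more plans to show a regression
        slow_plan = max(avgs, key=avgs.get)
        ratio = avgs[slow_plan] / min(avgs.values())
        if ratio >= threshold:
            flagged.append((query_id, slow_plan, ratio))
    return sorted(flagged, key=lambda f: f[2], reverse=True)


sample = [
    (1, 10, 100.0, 5), (1, 10, 100.0, 5),  # plan 10 across two intervals
    (1, 11, 300.0, 2),                      # plan 11 for the same query
    (2, 20, 50.0, 10),                      # query 2 has only one plan
]
print(find_plan_regressions(sample))  # [(1, 11, 3.0)]
```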
Query execution with Query Data Store

[Figure: queries enter the Control node (Engine, Query Data Store, Shell DB, DMS), which coordinates four Compute nodes, each running DMS and a SQL DB; the compute nodes host distributions Dist_DB_1 to Dist_DB_15, Dist_DB_16 to Dist_DB_30, Dist_DB_31 to Dist_DB_45, and Dist_DB_46 to Dist_DB_60]

• Query Data Store flushes data to disk every 15 minutes
• 10 GB is the maximum storage size
• The retention period is 7 days
• The maximum number of plans per query is 200
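The figure places the 60 distributions on the four compute nodes in contiguous blocks of 15. The sketch below reproduces that layout for any node count that divides 60; the contiguous-block rule is read off the figure and is an assumption, since the service manages the actual placement. What it illustrates is that the number of distributions stays at 60, so changing the node count only changes how many distributions each node hosts.

```python
def map_distributions(num_compute_nodes, num_distributions=60):
    """Assign distributions to compute nodes in contiguous, equal-sized blocks,
    matching the figure's layout (node 1 hosts Dist_DB_1 to Dist_DB_15, ...)."""
    if num_distributions % num_compute_nodes:
        raise ValueError("node count must divide the distribution count evenly")
    per_node = num_distributions // num_compute_nodes
    return {
        node: [f"Dist_DB_{d}"
               for d in range((node - 1) * per_node + 1, node * per_node + 1)]
        for node in range(1, num_compute_nodes + 1)
    }


layout = map_distributions(4)
print(layout[1][0], layout[1][-1])  # Dist_DB_1 Dist_DB_15
print(layout[4][0], layout[4][-1])  # Dist_DB_46 Dist_DB_60
```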
Azure Synapse Analytics recommendations

[Figure: Azure Synapse Analytics sends telemetry to the Recommendation API, which generates recommendations every 24 hours and surfaces them in the Azure Advisor recommendation blade]

Recommendations cover data skew, replicated tables, statistics, tempdb, and adaptive cache.
In Summary: Query Performance

• Select the proper table distribution
• Detect data skew
  • Use Query Data Store
  • Consider changing key columns
  • A query is only as fast as its slowest distribution
• Provision additional adaptive cache capacity
• Reduce tempdb contention
• Create and update statistics
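Data skew can be checked from per-distribution row counts, which DBCC PDW_SHOWSPACEUSED reports for a table. The sketch below turns such counts into a rough slowdown estimate: because a query is only as fast as its slowest distribution, it uses the largest distribution divided by the mean. That max/mean measure is an assumption of this sketch (it presumes work grows with row count) and is not the calculation Azure Advisor uses.

```python
from statistics import mean


def skew_report(rows_per_distribution):
    """Summarize data skew from per-distribution row counts.

    If work grows with row count, a query waits for its largest distribution,
    so max/mean approximates the slowdown compared with a perfectly even spread.
    """
    avg = mean(rows_per_distribution)
    largest = max(rows_per_distribution)
    return {
        "avg_rows": avg,
        "max_rows": largest,
        "slowdown": largest / avg if avg else 0.0,
    }


even = [100] * 60
skewed = [100] * 59 + [1300]
print(skew_report(even)["slowdown"])  # 1.0
print(round(skew_report(skewed)["slowdown"], 2))  # 10.83
```

A slowdown well above 1.0 suggests revisiting the distribution column, in line with the "consider changing key columns" advice above.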
MS Learn alert
Complete interactive learning exercises, watch videos, and practice and apply your new skills.
aka.ms/DATA50MSLearnCollection

Resources

Session Resources: aka.ms/DATA50
Session Code on GitHub: aka.ms/DATA50repo
All Event Resources: aka.ms/mymsignitethetour
Get Certified: aka.ms/azuredataengineer
