
SQL Server Development
SQLDEV 320 A
Spring 2021
Week 10

Instructor: boB Taylor
[email protected]
MCA, MCM, MCSM, MCSE, MCSD, MCT, Data Scientist
TODAY
• REVIEW
  • RECAP WEEK 9
  • ASSIGNMENT REVIEW
  • MEET RANDAL ROOT (TBD) ~7:30
• QUERY EXECUTION LIFECYCLE
• STATISTICS & ACTIVITY MONITORING
• QUERY PLAN ANALYSIS


Query Execution Lifecycle

• Parse the SELECT statement into logical units: keywords, expressions, operators, and identifiers.
• Build a query tree describing the logical steps needed to transform the source data into the format required by the result set.
• The query optimizer analyzes different ways the source tables can be accessed. It then selects the series of steps that returns the results fastest while using fewer resources. The query tree is updated to record this exact series of steps. The final, optimized version of the query tree is called the execution plan.
• The relational engine starts executing the execution plan. As the steps that require data from the base tables are processed, the relational engine requests that the storage engine pass up data from the rowsets requested from the relational engine.
• The relational engine processes the data returned from the storage engine into the format defined for the result set and returns the result set to the client.
Query Compilation

Query compilation is the process of choosing a good enough execution plan in the short amount of time the optimizer has to act:
• Parse the query into a tree representation
• Normalize and validate the query
• Evaluate possible query plans
• Pick a good enough plan, based on cost
Query Execution

Query execution is the process of executing the plan that is created during query compilation and optimization:
• Not necessarily performed directly after query compilation
• May trigger a query recompilation
• Compilation versus recompilation: query recompiles may occur because of correctness-related reasons or plan optimality-related reasons
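As an illustration (not from the slides), a recompilation can also be requested explicitly. Both statements below use documented T-SQL; the procedure name is hypothetical:

-- Mark a (hypothetical) procedure so its plan is recompiled on next execution
EXEC sp_recompile N'dbo.usp_GetWorkOrders';

-- Or request a fresh plan for a single statement
SELECT WorkOrderID
FROM Production.WorkOrder
WHERE DueDate > '20050801'
OPTION (RECOMPILE);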
Query Plans and Execution Contexts

Query Plan (shared): Parameter A = ?, Parameter B = ?, User = ?
• Execution Context 1: Parameter A = 12, Parameter B = 'xy', User = Jorge
• Execution Context 2: Parameter A = 100, Parameter B = 'ftr', User = Nabil
• Execution Context 3: Parameter A = 11, Parameter B = 'sd', User = Walter
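To illustrate how one cached plan can serve many execution contexts, here is a minimal sketch (not from the slides) using parameterized T-SQL; the parameter values are arbitrary:

-- The same parameterized statement text produces one query plan;
-- each call gets its own execution context with its own parameter values.
EXEC sp_executesql
    N'SELECT * FROM Production.WorkOrder WHERE WorkOrderID = @id;',
    N'@id int',
    @id = 12;

EXEC sp_executesql
    N'SELECT * FROM Production.WorkOrder WHERE WorkOrderID = @id;',
    N'@id int',
    @id = 100;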
Compilation and Execution Overview

Is the plan in cache?

NO – compile path:
• Parse/Normalize the T-SQL
• Any outdated statistics? If YES, update statistics one by one (optionally asynchronously)
• Compile the T-SQL
• Load all relevant statistics from disk to memory
• Optimize the SQL statements
• Generate the execution plan; save the recompilation thresholds of all referenced tables in the query with the execution plan
• Place the plan in cache

YES – execution path:
• Test the execution plan for correctness-related reasons
• Is the plan correct? If NO, trigger a recompilation (back to the compile path)
• Compare the recompilation thresholds with table cardinalities or modification counters
• Based on the comparison, any outdated statistics? If YES, trigger a recompilation
• Wait for the memory grant or for the scheduler to OK the request
• Open (activate) the plan
• Run the plan to completion
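A minimal sketch (not from the slides) of inspecting what is currently in the plan cache, using documented dynamic management views:

-- Most frequently reused cached plans, with their text and XML plan
SELECT TOP (10)
    cp.usecounts,          -- how many times the cached plan has been reused
    cp.objtype,
    st.text,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;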


First Stage – Compilation

Parse statement
• Dissecting and transforming your SQL statements into compiler-ready data structures
• Also includes validation of the syntax to ensure that it is legal

Create an algebrized tree, then normalize the tree
• Object binding, which includes verifying that the tables and columns exist, and expanding the views
• Loading the metadata information
• Syntax-based optimizations

Is it DML?
• YES: convert the algebrized tree to a query graph and optimize the query graph
• NO (DDL or utility statement): produce a query plan directly
Second Stage – Optimization

Stage 1 - Trivial Plan
• Not a cost-based optimizer
• Statistics are loaded and validated at this stage
• This step generates plans for which there are no alternatives that require a cost-based decision
• Example: an INSERT statement with a VALUES clause has only one possible plan

Stage 2 - Simplification
• Cost-based optimizer, used if the previous stage was unsuccessful
• Has three phases:
  • Phase 0 - Transaction Processing
  • Phase 1 - Quick Plan
  • Phase 2 - Full Optimization
Second Stage – Optimization

• The Query Optimizer may use various inputs (for example, statistics and parameterized values) related to density/selectivity and cardinality that are available to create the execution plan.
• It evaluates the cost of various plan alternatives and gives you the best one, based on the provided information.
• If it gets it wrong, you get what is perceived as an inefficient plan.

Sources of inefficiency:
• Bad cardinality estimation? Look at the plan
• Parameter-sensitive plans?
• Dynamic, un-parameterized SQL?
• Bad physical database design?
• Missing indexes?
Recompilation Threshold (RT)

The RT is a mechanism used by SQL Server to determine if a table has changed enough to force a recompile of a query plan, to determine if a more efficient plan is available for the current data distribution.

The threshold crossing test is performed to decide whether to recompile a query plan:

| colmodctr(current) – colmodctr(snapshot) | >= RT

If there are no statistics, or nothing is interesting, then table cardinality is used:

| cardinality(current) – cardinality(snapshot) | >= RT
Recompilation Threshold Calculation

• Permanent table: if n <= 500, RT = 500; if n > 500, RT = 500 + 0.20 * n
• Temporary table: if n < 6, RT = 6; if 6 <= n <= 500, RT = 500; if n > 500, RT = 500 + 0.20 * n
• Table variable: RT does not exist
• With TF2371: RT is crossed when colmodctr > SQRT(table cardinality * 1000)

n = table rows (cardinality) or colmodctr of the leading column of the statistics object
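A minimal sketch (not from the slides) of inspecting the modification counters that feed the RT test; the table is from the AdventureWorks sample database already used in the join examples:

-- Per-statistics row counts and modification counters for one table
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.modification_counter    -- colmodctr for the leading statistics column
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'Production.WorkOrder');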
Query Execution Lifecycle: Worktables

• The relational engine may need to build a worktable to perform a logical operation specified in an SQL statement.
• Worktables are internal tables that are used to hold intermediate results.
• Worktables are generated for certain GROUP BY, ORDER BY, or UNION queries.
  • For example, if an ORDER BY clause references columns that are not covered by any indexes, the relational engine may need to generate a worktable to sort the result set into the order requested.
• Worktables are also sometimes used as spools that temporarily hold the result of executing a part of a query plan.
• Worktables are built in tempdb and are dropped automatically when they are no longer needed.
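As an illustration (not from the slides), a query like the following may require such a sort, assuming no index on Production.WorkOrder provides rows in DueDate order:

-- With no index covering the sort column, the ORDER BY is satisfied by a Sort
-- operator, which may spill intermediate results to tempdb for large inputs
SELECT WorkOrderID, ProductID, DueDate
FROM Production.WorkOrder
ORDER BY DueDate;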
Query Execution Lifecycle: Execution Plan

• SQL Server has a pool of memory that is used to store both execution plans and data buffers.
• SQL Server execution plans have the following main components:
  • Query Plan
  • Execution Context
• SQL Server reuses any existing plan it finds, saving the overhead of recompiling the SQL statement.
• If no existing execution plan exists, SQL Server generates a new execution plan for the query.
• Execution plans remain in the procedure cache as long as there is enough memory to store them.
• SQL Server detects the changes that invalidate an execution plan and marks the plan as not valid.
Statistics & Activity Monitoring: Statistics

• The query optimizer uses statistics to create query plans that improve query performance.
• For most queries, the query optimizer already generates the necessary statistics for a high quality query plan; in a few cases, you need to create additional statistics or modify the query design for best results.
• Statistics for query optimization are objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view.
  • The query optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result.
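A minimal sketch (not from the slides) of creating and inspecting a statistics object; the statistics name is hypothetical and the table is from AdventureWorks:

-- Create single-column statistics manually (the optimizer usually does this for you
-- when AUTO_CREATE_STATISTICS is ON)
CREATE STATISTICS Stats_WorkOrder_DueDate
ON Production.WorkOrder (DueDate)
WITH FULLSCAN;

-- Inspect the header, density vector, and histogram
DBCC SHOW_STATISTICS ('Production.WorkOrder', Stats_WorkOrder_DueDate);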
Statistics & Activity Monitoring: Statistics Options

There are three options that you can set that affect when and how statistics are created and updated. These options are set at the database level only.

• AUTO_CREATE_STATISTICS: when ON, the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan.
• AUTO_UPDATE_STATISTICS: when ON, the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query.
• INCREMENTAL: when ON, the statistics created are per-partition statistics.
Statistics & Activity Monitoring: Setting Statistics Options

ALTER DATABASE { database_name | CURRENT }
SET
{
    AUTO_CREATE_STATISTICS { OFF | ON [ ( INCREMENTAL = { ON | OFF } ) ] }
    | AUTO_UPDATE_STATISTICS { ON | OFF }
    | AUTO_UPDATE_STATISTICS_ASYNC { ON | OFF }
}
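A concrete example of the syntax above (not from the slides); adjust the option values to your environment:

-- Enable automatic creation of per-partition column statistics
ALTER DATABASE CURRENT
SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON);

-- Enable automatic statistics updates, performed asynchronously
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;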
Contents of a Query Plan

Query plans include:
• How data is accessed
• How data is joined
• Sequence of operations
• Use of temporary worktables and sorts
• Estimated rowcounts, iterations, and costs from each step
• Actual rowcounts and iterations
• How data is aggregated
• Use of parallelism
• Query execution warnings
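One way (not shown on the slides) to capture an actual plan, including actual rowcounts and warnings, is the documented SET STATISTICS XML option:

-- Returns the actual execution plan (as XML) alongside the results
SET STATISTICS XML ON;

SELECT ProductID, COUNT(*) AS WorkOrders
FROM Production.WorkOrder
GROUP BY ProductID;

SET STATISTICS XML OFF;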
Operator Characteristics
• Blocking or non-blocking?
• Requires a memory grant?
• Order preserving?
Graphical Showplan Flow

[Diagram: outer table (top input) and inner table (bottom input) feeding a sequence of join operators; resultsets numbered 1 through 6]

• Resultsets 1 and 2 are joined using a nested loops join, creating resultset 3
• Resultsets 3 and 4 are joined using a hash match join, creating resultset 5
• Resultsets 5 and 6 are joined using a nested loops join, creating a resultset for the SELECT clause
Traversing a Query Plan

• Query plans are trees. Therefore, any join branch can be as substantial as an entire, separate query
• Examine major sub-branches first by looking top-down at the outermost joins
• Examine leaves of the separate significant sub-branches separately
• Total Subtree Cost tells you which branches are the most costly
Operators – Physical Joins

Logical JOINs:
• INNER JOIN
• OUTER JOIN (LEFT, RIGHT or FULL)
• CROSS JOIN

Physical JOINs:
• Nested Loops
• Merge
• Hash
Nested Loops Join

Pseudo-code:
for each row R1 in the outer table
begin
    for each row R2 in the inner table
        if R1 joins with R2
            return (R1, R2)
    if R1 did not join
        return (R1, NULL)
end
Nested Loops Join
SELECT *
FROM Production.WorkOrder
INNER JOIN Production.WorkOrderRouting
ON Production.WorkOrder.WorkOrderID =
Production.WorkOrderRouting.WorkOrderID
WHERE Production.WorkOrderRouting.ModifiedDate =
CAST('2005-08-01' AS DATETIME);
Merge Join

Pseudo-code (both inputs sorted on the join key):
get first row R1 from the outer input
get first row R2 from the inner input
while neither input is exhausted
begin
    if R1 joins with R2
        return (R1, R2)
        get next row R2 from the inner input
    else if R1 < R2
        get next row R1 from the outer input
    else
        get next row R2 from the inner input
end
Merge Join
SELECT *
FROM Production.WorkOrder
INNER JOIN Production.WorkOrderRouting
ON Production.WorkOrder.WorkOrderID =
Production.WorkOrderRouting.WorkOrderID
WHERE Production.WorkOrderRouting.ModifiedDate >
CAST('2005-08-01' AS DATETIME);
Hash Join

Pseudo-code:
-- Build phase
for each row R1 in the build input
begin
    calculate the hash value on R1's join key(s)
    insert R1 into the appropriate hash bucket
end
-- Probe phase
for each row R2 in the probe input
begin
    calculate the hash value on R2's join key(s)
    for each row R1 in the matching hash bucket
        if R1 joins with R2
            return (R1, R2)
end
Hash Join
SELECT FirstName, LastName, EmailAddress
FROM Person.Person
INNER JOIN Person.EmailAddress ON
Person.EmailAddress.BusinessEntityID =
Person.Person.BusinessEntityID
WHERE Person.EmailAddress.ModifiedDate > CAST('2005-08-01'
AS DATETIME);
Hash Join

There are three types of hash joins:
• In-memory: during the build phase the hash table fits completely in memory
• Grace: the build-phase hash table does not fit completely in memory and spills to disk (a worktable in tempdb)
• Recursive: the build-phase hash table is very large and has to use many levels of merge joins and hash partitioning

The term hash bailout is sometimes used to describe grace hash joins or recursive hash joins.
Seek Operators
• Index Seek
• Clustered Index Seek

The Clustered Index Seek operator uses the seeking ability of indexes to retrieve rows from a clustered index.

The Index Seek operator uses the seeking ability of indexes to retrieve rows from a nonclustered index. The Argument column contains the name of the nonclustered index being used.
Lookup Operators
• Row Identifier (RID) Lookup
• Key Lookup

RID Lookup is a bookmark lookup on a heap: a supplied row identifier (RID) is used to look up the row in the named table. That operator is always accompanied by a Nested Loops join.

The Key Lookup operator is a bookmark lookup on a table with a clustered index. Key Lookup is always accompanied by a Nested Loops operator. That operator may indicate that the query might benefit from a covering index.
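As an illustration (not from the slides), a Key Lookup can often be removed with a covering nonclustered index; the index name here is hypothetical and the table is the AdventureWorks table used in the hash join example:

-- Covers the filter column and the output column on Person.EmailAddress,
-- so the plan no longer needs a Key Lookup against that table
CREATE NONCLUSTERED INDEX IX_EmailAddress_ModifiedDate
ON Person.EmailAddress (ModifiedDate)
INCLUDE (EmailAddress);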
Scan Operators
Table Scan
Clustered Index Scan
Index Scan
Columnstore Index Scan
Other Notable Operators
Table Spool

Index Spool

Sort

Stream Aggregation

Hash Match (Aggregation)


Tips to Optimize Query Plans
Review and optimize joins
• Nested loops for small inputs
Review scans and RID Lookups
• Remove them by using covering indexes
Review Key Lookups
• Eliminate them by using covering indexes
Review warnings
• For example, an operator warning may indicate a plan-affecting convert (implicit conversion)
Other considerations
• Eliminate spools
• Review rewinds and rebinds
Join Optimization

1. Limit the number of joins
2. Limit the number of rows to be joined
3. Create indexes on join columns
4. Join on mostly unique columns
5. Join on columns with the same data type
6. Avoid using SELECT *
Join Optimization (continued)

• Avoid negative logic, such as !=, <>, NOT (…): this introduces additional contention, because it often results in evaluation of each row (index scans)
• Do not use ORDER BY unless it is required: if covered by an index, make the index sort by the desired order
• The LIKE operator with leading wildcards almost always causes a table scan: if you must use LIKE, make the first character a literal (see the example below)
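A small sketch (not from the slides) of the LIKE guidance, using the AdventureWorks Person.Person table and assuming an index exists on LastName:

-- Leading wildcard: the predicate cannot seek, so expect an index or table scan
SELECT LastName
FROM Person.Person
WHERE LastName LIKE N'%son';

-- Leading literal: the predicate is sargable and can use an index seek on LastName
SELECT LastName
FROM Person.Person
WHERE LastName LIKE N'S%';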
Stored Procedure Optimization

• Use SET NOCOUNT ON: prevents sending the DONE_IN_PROC message for each statement in a stored procedure (this still increments @@ROWCOUNT)
• Always validate function parameters early in the code construction
• Return only the columns required in a SELECT (avoid SELECT *)
• Beware of widely varying parameter inputs, which can lead to parameter sniffing issues
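A minimal sketch (not from the slides) that applies these tips; the procedure name and validation logic are hypothetical:

CREATE OR ALTER PROCEDURE dbo.usp_GetWorkOrdersByProduct
    @ProductID int
AS
BEGIN
    SET NOCOUNT ON;                           -- suppress DONE_IN_PROC messages

    IF @ProductID IS NULL OR @ProductID <= 0  -- validate parameters early
    BEGIN
        RAISERROR('Invalid @ProductID.', 16, 1);
        RETURN;
    END;

    SELECT WorkOrderID, OrderQty, DueDate     -- only the columns required, no SELECT *
    FROM Production.WorkOrder
    WHERE ProductID = @ProductID;
END;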
User Defined Function (UDF) Optimization

• These seemingly harmless constructs can be truly detrimental to performance, because they are called for every row of the result set
• They can also be obfuscated from the query plan
• Options to replace UDFs include:
  • Considering inline expressions for simple functions
  • Considering derived tables if possible
• If you are using UDFs that do not access data, make sure you specify the SCHEMABINDING option during creation
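A minimal sketch (not from the slides) of the SCHEMABINDING guidance; the function name and body are hypothetical:

-- A scalar UDF that touches no data; SCHEMABINDING lets SQL Server know it is
-- non-data-accessing, which can help the optimizer avoid unnecessary spools
CREATE OR ALTER FUNCTION dbo.ufn_FullName
(
    @FirstName nvarchar(50),
    @LastName  nvarchar(50)
)
RETURNS nvarchar(101)
WITH SCHEMABINDING
AS
BEGIN
    RETURN (@FirstName + N' ' + @LastName);
END;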
Query Plan Analysis
• Demo

Don't forget your class evaluation!!

Week 10 Assignment
• NOTHING!
