SQL Server Query Performance Tuning Basics
Why is query tuning necessary for developers?
• Remember that your reputation will follow you.
• YES, if you want to write sustainable code.
• You don’t want to be known as the developer who cost the company 3 years of development time to undo their mess.
• A simple mistake can impact the entire SQL Server instance, not just a single database.
Query execution process
Query execution plan basics
Questions
1. What happens when the optimizer underestimates the memory requirement for a query?
2. What is the effect of incorrect cardinality estimates on the query plan?
3. How does the optimizer derive cardinality estimates?
4. What is the direction of data flow within a query plan?
5. Do DDL statements have execution plans?
Common operators
Data Access Operators
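Lookup is the process of retrieving the corresponding data rows from the base table (a Key Lookup on a clustered table, an RID Lookup on a heap) for rows located through a nonclustered index.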
Understanding Tables and Indexes
Heap
• Table with no clustered index
• Rows are stored in no particular order, wherever free space is found as they are inserted
• Pages are not linked in any way to each other
Clustered Index
• The leaf level of a clustered index is the data itself
• When a clustered index is created, data is physically copied and ordered
based on the clustering key
• Order is maintained logically through a doubly linked list (page chain)
Nonclustered Indexes
• The leaf level consists of the index key, any included columns, and the data row’s bookmark value (either the clustering key if the table is clustered, or the row’s physical RID if the table is a heap)
• The leaf level has exactly the same number of rows as the base table (unless the index is filtered)
• Included Columns and Covering Indexes
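A minimal Python sketch of the covering idea, assuming a clustered table: since the nonclustered leaf level carries the index key, the included columns, and the clustering key, a query is covered when every column it references is in one of those three sets; otherwise the plan needs a Key Lookup. The function name `covers` and the `Orders` columns are hypothetical.

```python
def covers(index_keys, included_columns, clustering_key, query_columns):
    """Return True if a nonclustered index can answer the query without a lookup.

    Hypothetical model: the leaf level holds the index key, any included
    columns, and the bookmark. On a clustered table the bookmark is the
    clustering key; on a heap pass an empty clustering_key (the RID
    carries no column values).
    """
    available = set(index_keys) | set(included_columns) | set(clustering_key)
    return set(query_columns) <= available


# Hypothetical Orders table clustered on OrderID, index keyed on CustomerID.
print(covers(["CustomerID"], [], ["OrderID"], ["CustomerID", "OrderID"]))               # True
print(covers(["CustomerID"], [], ["OrderID"], ["CustomerID", "OrderDate"]))             # False: Key Lookup needed
print(covers(["CustomerID"], ["OrderDate"], ["OrderID"], ["CustomerID", "OrderDate"]))  # True: OrderDate is included
```

Adding OrderDate as an included column, rather than as a key column, makes the index covering without widening its upper levels.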
Structure of Clustered/Non-Clustered Index
Guidelines for Clustered Index Key
Clustering key should be
• Unique
• Narrow
• Static
• Ever-increasing (preferably)
• Order of data access/retrieval matters
• These guidelines are not mandatory and are not necessarily applicable to all scenarios
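Why "narrow" matters can be shown with back-of-the-envelope arithmetic: every nonclustered index leaf row stores the clustering key as its bookmark, so the key width is paid once per row per nonclustered index. The sketch assumes a hypothetical table with 10 million rows and 5 nonclustered indexes, and it ignores row overhead, compression, non-leaf levels, and the 4-byte uniquifier SQL Server adds to duplicate values of a non-unique clustering key.

```python
def bookmark_overhead_bytes(row_count, clustering_key_bytes, nonclustered_index_count):
    """Approximate leaf-level bytes spent copying the clustering key into nonclustered indexes.

    Simplified model: one copy of the key per row per nonclustered index,
    ignoring row headers, compression, and upper index levels.
    """
    return row_count * clustering_key_bytes * nonclustered_index_count


ROWS = 10_000_000  # hypothetical table size
NC_INDEXES = 5     # hypothetical number of nonclustered indexes

int_key = bookmark_overhead_bytes(ROWS, 4, NC_INDEXES)    # INT clustering key (4 bytes)
guid_key = bookmark_overhead_bytes(ROWS, 16, NC_INDEXES)  # UNIQUEIDENTIFIER clustering key (16 bytes)
print(f"INT key: {int_key / 1e6:.0f} MB, UNIQUEIDENTIFIER key: {guid_key / 1e6:.0f} MB")
# INT key: 200 MB, UNIQUEIDENTIFIER key: 800 MB
```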
Useful SET options
SET STATISTICS TIME: Displays the number of milliseconds required to parse, compile, and execute each statement.
SET STATISTICS IO: Displays information about the amount of disk activity generated by Transact-SQL statements.
SQL Server Execution Model
DMV sys.dm_os_wait_stats
Data Distribution Statistics
Parameters influencing the execution plan
The optimizer refers to statistics (summaries built from a sample or a full scan of the actual data) to estimate cardinality.
Statistics Concepts
Statistics reduce the amount of data that has to be processed during
optimization. If the optimizer had to scan the actual table or index data for
cardinality estimations, plan generation would be costly and lengthy.
The main reason for statistics is to speed up plan generation.
• Statistics have to be created and maintained, and this requires some resources.
• As statistics are stored separately from the table or index they relate to, some effort is needed to keep them in sync with the original data.
• Statistics reduce, or summarize, the original amount of information. Because of this, all statistical information involves a certain amount of uncertainty.
Whenever a distribution statistic no longer reflects the source data, the optimizer may make wrong assumptions about cardinalities, which in turn may lead to poor execution plans.
Contents of statistics
DBCC SHOW_STATISTICS(<table_name>, <stats_name>)
sys.dm_db_stats_properties(<object_id>, <stats_id>)
Density
Density = 1 / (number of distinct values) for the columns comprising the statistics
High density → less unique data
What is the density of a column that contains only a single value
repeated in every row?
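The density definition can be checked with a small Python sketch. The helper names and sample rows are hypothetical; SQL Server stores the result in the density vector ("All density" in DBCC SHOW_STATISTICS), one entry per leading prefix of the statistics columns. A column holding one repeated value has exactly one distinct value, so its density is 1.

```python
def density(values):
    """1 / (number of distinct values), as defined for statistics density."""
    distinct = len(set(values))
    if distinct == 0:
        raise ValueError("density is undefined for an empty input")
    return 1 / distinct


def prefix_density(rows, columns):
    """Density of a multi-column prefix, e.g. ("Country",) then ("Country", "City")."""
    return density([tuple(row[c] for c in columns) for row in rows])


print(density(["NY"] * 1000))  # 1.0 (one value repeated in every row)
print(density(range(1000)))    # 0.001 (every value unique)

rows = [  # hypothetical sample rows
    {"Country": "US", "City": "Boston"},
    {"Country": "US", "City": "Denver"},
    {"Country": "UK", "City": "Leeds"},
    {"Country": "UK", "City": "Leeds"},
]
print(prefix_density(rows, ["Country"]))          # 0.5
print(prefix_density(rows, ["Country", "City"]))  # 0.3333333333333333
```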
Histogram: A summary of the data distribution for a column, or for the first column of a multi-column index or statistics object, in up to 200 steps (rows).
RANGE_HI_KEY: A key value showing the upper boundary of a histogram step.
RANGE_ROWS: How many rows fall inside the step's range (greater than the previous RANGE_HI_KEY and smaller than this RANGE_HI_KEY).
EQ_ROWS: How many rows are exactly equal to RANGE_HI_KEY.
AVG_RANGE_ROWS: Average number of rows per distinct value inside the range (RANGE_ROWS / DISTINCT_RANGE_ROWS).
DISTINCT_RANGE_ROWS: How many distinct key values are inside this range (excluding the previous RANGE_HI_KEY and this RANGE_HI_KEY itself).
Automatic statistics update thresholds
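The thresholds can be expressed as a short Python sketch, following the rules Microsoft documents for permanent tables, where n is the table's row count when the statistics were last updated. Under the older fixed rule (the default below compatibility level 130 unless trace flag 2371 is enabled), statistics become stale after 500 modifications for n ≤ 500, or 500 + 20% of n otherwise; the dynamic rule uses the smaller of that value and √(1000·n). Empty tables, temporary tables, and table variables are not modeled.

```python
import math


def legacy_threshold(n):
    """Modifications before statistics are considered stale (older fixed rule)."""
    return 500 if n <= 500 else 500 + 0.20 * n


def dynamic_threshold(n):
    """Compatibility level 130+ (or trace flag 2371): grows with sqrt(1000 * n)."""
    return 500 if n <= 500 else min(500 + 0.20 * n, math.sqrt(1000 * n))


for rows in (400, 10_000, 1_000_000, 100_000_000):
    print(f"{rows:>11,} rows: legacy {legacy_threshold(rows):>12,.0f}  dynamic {dynamic_threshold(rows):>9,.0f}")
```

For large tables the fixed 20% rule lets statistics drift for millions of modifications, which is why the dynamic rule was introduced.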
Manual updates
• UPDATE STATISTICS
• sp_updatestats
Filtered statistics (new in SQL Server 2008)
Index fragmentation
Fragmentation: Any condition that causes more than the optimal amount of disk I/O to be performed when accessing a table or index.
Internal Fragmentation: When pages are less than fully used, the unused part of each page constitutes a form of fragmentation, since the table’s or index’s rows are no longer packed together as tightly as they could be.
External Fragmentation
Logical Fragmentation: Occurs when the logical ordering of pages, which is based on the key value, does not match the physical ordering inside the data file.
Extent Fragmentation: Occurs when the extents of a table or index are not contiguous within the database, leaving extents from one or more indexes intermingled in the file.
DMF sys.dm_db_index_physical_stats
avg_fragmentation_in_percent: A percentage value that represents external fragmentation. For a clustered table and the leaf level of index pages, this is logical fragmentation; for a heap, it is extent fragmentation. The lower this value, the better.
avg_page_space_used_in_percent: The average percentage of page space in use, which represents internal fragmentation. The higher this value, the better. It is returned only in SAMPLED or DETAILED mode (NULL in LIMITED mode).
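As a rough illustration of what the two columns measure, the Python sketch below computes them for a toy index. It assumes hypothetical physical page numbers listed in logical (next-page pointer) order, counts a page as out of order when its next-page pointer does not lead to the next physically allocated page, and assumes 8,096 bytes of row space on an 8 KB page. It is a model of the definitions, not how sys.dm_db_index_physical_stats is implemented.

```python
PAGE_ROW_SPACE = 8096  # approximate bytes available for rows on an 8 KB page


def logical_fragmentation_pct(chain):
    """Percent of leaf pages whose next-page pointer skips the next physical page.

    `chain` lists physical page numbers in logical (key) order.
    """
    if not chain:
        return 0.0
    physical = sorted(chain)
    next_physical = dict(zip(physical, physical[1:]))
    out_of_order = sum(
        1 for page, next_page in zip(chain, chain[1:])
        if next_physical.get(page) != next_page
    )
    return 100.0 * out_of_order / len(chain)


def avg_page_space_used_pct(bytes_used_per_page):
    """Average fullness of the pages; low values mean internal fragmentation."""
    if not bytes_used_per_page:
        return 0.0
    return 100.0 * sum(bytes_used_per_page) / (len(bytes_used_per_page) * PAGE_ROW_SPACE)


print(logical_fragmentation_pct([100, 101, 102, 103]))  # 0.0: logical order matches physical order
print(logical_fragmentation_pct([100, 105, 101, 102]))  # 50.0: a page split inserted page 105 mid-chain
print(avg_page_space_used_pct([8096, 4048]))            # 75.0: one full page, one half-full page
```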
Physical Join Operators
http://sqlity.net/en/1480/a-join-a-day-the-sort-merge-join/
http://technet.microsoft.com/en-in/library/ms189313(v=sql.105).aspx
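The three physical join operators can be sketched as textbook algorithms in Python. These are simplified in-memory versions of an equality join on tuple position `key`, with hypothetical rows; SQL Server's operators add details this sketch omits, such as residual predicates, outer and semi join variants, and hash spills to tempdb. Nested loops searches the inner input once per outer row, merge join needs both inputs sorted on the join key, and hash join builds a hash table on one input and probes it with the other.

```python
from collections import defaultdict


def nested_loops_join(outer, inner, key):
    """For each outer row, search the whole inner input (an index seek would narrow this)."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]


def merge_join(left, right, key):
    """Both inputs must already be sorted on the join key; each is read once."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row carrying the same key (many-to-many case).
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result


def hash_join(build, probe, key):
    """Build a hash table on one input (ideally the smaller), then probe it with the other."""
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [(b, p) for p in probe for b in table.get(p[key], [])]


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]        # hypothetical, sorted by id
orders = [(1, "o1"), (1, "o2"), (3, "o3"), (4, "o4")]  # hypothetical, sorted by customer id
print(nested_loops_join(customers, orders, 0))
print(merge_join(customers, orders, 0))
print(hash_join(customers, orders, 0))
```

All three calls return the same three matching pairs; they differ in how much work they do and in what they require of their inputs.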
Parallelism Operators
Noteworthy Patterns in Query Plan
https://www.simple-talk.com/sql/learn-sql-server/operator-of-the-week---spools,-eager-spool/
• Data Distribution Statistics and Query Plan Quality
• Density and Histogram
• Literals, Variables, Parameters
• Parameter Sniffing
• Hints
• OPTIMIZE FOR (specific value/UNKNOWN) hint
• RECOMPILE
• Limitations of Statistics
• Filtered Statistics
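The effect of literals, variables, and parameter sniffing on estimates can be sketched in Python. The model assumes exact per-value counts stand in for the histogram (used for a literal or sniffed parameter), while a local variable or OPTIMIZE FOR UNKNOWN uses density × total rows. The procedure name, the data, and the 1,000-row tipping point between a seek-plus-lookup plan and a scan are all hypothetical; the real optimizer costs alternative plans rather than applying a fixed cutoff.

```python
from collections import Counter

# Hypothetical, heavily skewed Country column (10,000 rows).
counts = Counter({"US": 9000, "UK": 500, "FR": 300, "DE": 200})
TIPPING_POINT = 1_000  # hypothetical row count where a scan becomes cheaper


def estimate_known_value(value):
    # Literal or sniffed parameter: look the value up (stand-in for the histogram).
    return max(counts.get(value, 0), 1)


def estimate_unknown_value():
    # Local variable or OPTIMIZE FOR UNKNOWN: density (1 / distinct) x total rows.
    return (1 / len(counts)) * sum(counts.values())


def choose_plan(estimated_rows):
    return "Index Seek + Key Lookup" if estimated_rows < TIPPING_POINT else "Clustered Index Scan"


plan_cache = {}


def execute(proc_name, param, recompile=False):
    if recompile:
        # RECOMPILE: compile for the current value every time and do not reuse the plan.
        return choose_plan(estimate_known_value(param))
    if proc_name not in plan_cache:
        # First execution: the parameter value is sniffed and the plan is cached.
        plan_cache[proc_name] = choose_plan(estimate_known_value(param))
    return plan_cache[proc_name]


print(execute("usp_OrdersByCountry", "DE"))                  # Index Seek + Key Lookup (200 rows)
print(execute("usp_OrdersByCountry", "US"))                  # Index Seek + Key Lookup reused for 9,000 rows
print(execute("usp_OrdersByCountry", "US", recompile=True))  # Clustered Index Scan
print(choose_plan(estimate_unknown_value()))                 # Clustered Index Scan (estimate 2,500)
```

The second call shows the sniffing problem: the plan compiled for a rare value is reused for a common one, which RECOMPILE or an OPTIMIZE FOR hint can avoid at the cost of compile time or estimate precision.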