Sybase SQL Server Performance and Tuning Guide
Contributing authors: Server Publications Group, Learning Products Group, and Product Performance Group
Document Orders
To order additional documents, U.S. and Canadian customers should call
Customer Fulfillment at (800) 685-8225, fax (617) 229-9845.
Customers in other countries with a U.S. license agreement may contact Customer
Fulfillment via the above fax number. All other international customers should
contact their Sybase subsidiary or local distributor.
Upgrades are provided only at regularly scheduled software release dates.
Copyright © 1989–1996 by Sybase, Inc. All rights reserved.
No part of this publication may be reproduced, transmitted, or translated in any
form or by any means, electronic, mechanical, manual, optical, or otherwise,
without the prior written permission of Sybase, Inc.
Sybase Trademarks
APT-FORMS, Data Workbench, DBA Companion, Deft, GainExposure, Gain
Momentum, Navigation Server, PowerBuilder, Powersoft, Replication Server,
S-Designor, SQL Advantage, SQL Debug, SQL SMART, SQL Solutions, SQR,
SYBASE, the Sybase logo, Transact-SQL, and VQL are registered trademarks of
Sybase, Inc. ADA Workbench, AnswerBase, Application Manager, APT-Build,
APT-Edit, APT-Execute, APT-Library, APT-Translator, APT Workbench, Backup
Server, Bit-Wise, Client-Library, Configurator, Connection Manager, Database
Analyzer, DBA Companion Application Manager, DBA Companion Resource
Manager, DB-Library, Deft Analyst, Deft Designer, Deft Educational, Deft
Professional, Deft Trial, Developers Workbench, DirectCONNECT, Easy SQR,
Embedded SQL, EMS, Enterprise Builder, Enterprise Client/Server, Enterprise
CONNECT, Enterprise Manager, Enterprise SQL Server Manager, Enterprise Work
Architecture, Enterprise Work Designer, Enterprise Work Modeler, EWA,
ExElerator, Gain Interplay, Gateway Manager, InfoMaker, Interactive Quality
Accelerator, Intermedia Server, IQ Accelerator, Maintenance Express, MAP, MDI,
MDI Access Server, MDI Database Gateway, MethodSet, Movedb, Navigation
Server Manager, Net-Gateway, Net-Library, New Media Studio, ObjectCONNECT,
OmniCONNECT, OmniSQL Access Module, OmniSQL Gateway, OmniSQL
Server, OmniSQL Toolkit, Open Client, Open Client CONNECT, Open
Client/Server, Open Client/Server Interfaces, Open Gateway, Open Server, Open
Server CONNECT, Open Solutions, PC APT-Execute, PC DB-Net, PC Net Library,
Powersoft Portfolio, Powersoft Professional, Replication Agent, Replication
Driver, Replication Server Manager, Report-Execute, Report Workbench, Resource
Manager, RW-DisplayLib, RW-Library, SAFE, SDF, Secure SQL Server, Secure SQL
Toolset, SKILS, SQL Anywhere, SQL Code Checker, SQL Edit, SQL Edit/TPU, SQL
Server, SQL Server/CFT, SQL Server/DBM, SQL Server Manager, SQL Server
Monitor, SQL Station, SQL Toolset, SQR Developers Kit, SQR Execute, SQR
Toolkit, SQR Workbench, Sybase Client/Server Interfaces, Sybase Gateways,
Sybase Intermedia, Sybase Interplay, Sybase IQ, Sybase MPP, Sybase SQL Desktop,
Sybase SQL Lifecycle, Sybase SQL Workgroup, Sybase Synergy Program, Sybase
Virtual Server Architecture, Sybase User Workbench, SyBooks, System 10, System
11, the System XI logo, Tabular Data Stream, Warehouse WORKS, Watcom SQL,
web.sql, WebSights, WorkGroup SQL Server, XA-Library, and XA-Server are
trademarks of Sybase, Inc.
All other company and product names used herein may be trademarks or
registered trademarks of their respective companies.
Restricted Rights
Use, duplication, or disclosure by the government is subject to the restrictions set
forth in subparagraph (c)(1)(ii) of DFARS 52.227-7013 for the DOD and as set forth
in FAR 52.227-19(a)-(d) for civilian agencies.
Sybase, Inc., 6475 Christie Avenue, Emeryville, CA 94608.
Table of Contents
About This Book
Audience
How to Use This Book
Related Documents
Conventions
    Formatting SQL Statements
    SQL Syntax Conventions
        Case
        Obligatory Options {You Must Choose At Least One}
        Optional Options [You Don’t Have to Choose Any]
        Ellipsis: Do It Again (and Again)...
    Expressions
Examples
If You Need Help
3. Data Storage
Performance and Object Storage
    Major Performance Gains Through Query Optimization
Query Processing and Page Reads
SQL Server Data Pages
    Row Density on Data Pages
    Extents
    Linked Data Pages
    Text and Image Pages
Additional Page Types
    Global Allocation Map (GAM) Pages
    Allocation Pages
Glossary
Index
Audience
This manual is intended for:
• Sybase® System Administrators
• Database designers
• Application developers
Related Documents
SQL Server relational database management system documentation
is designed to satisfy both the inexperienced user’s preference for
simplicity and the experienced user’s desire for convenience and
comprehensiveness. The user’s guide and the reference manuals
address the various needs of end users, database and security
administrators, application developers, and programmers.
Other manuals you may find useful are:
• SQL Server installation and configuration guide, which describes
the installation procedures for SQL Server and the operating
system-specific system administration, security administration,
and tuning tasks.
• SQL Server Reference Manual, which contains detailed information
on all of the commands and system procedures discussed in this
manual.
Conventions
In syntax statements, each clause of a SQL statement begins on a new line. Clauses that have more than one part extend to additional lines, which are indented.
Key          Definition

command      Command names, command option names, utility names, utility flags, and other keywords appear in bold Courier in syntax statements and in bold Helvetica in paragraph text.

variable     Variables, or words that stand for values that you fill in, appear in italics.

{ }          Curly braces indicate that you must choose at least one of the enclosed options. Do not type the braces.

[ ]          Brackets indicate that choosing one or more of the enclosed options is optional. Do not type the brackets.

( )          Parentheses are typed as part of the command.

|            The vertical bar means you may select only one of the options shown.

,            The comma means you may choose as many of the options shown as you like, separating your choices with commas, which are typed as part of the command.
Obligatory Options {You Must Choose At Least One}
• Curly Braces and Vertical Bars: Choose one and only one option.
{die_on_your_feet | live_on_your_knees |
live_on_your_feet}
• Curly Braces and Commas: Choose one or more options. If you
choose more than one, separate your choices with commas.
{cash, check, credit}
Ellipsis: Do It Again (and Again)...

An ellipsis (...) means that you can repeat the last unit as many times as you like. In this syntax statement, buy is a required keyword:
buy thing = price [cash | check | credit]
[, thing = price [cash | check | credit]]...
You must buy at least one thing and give its price. You may choose a
method of payment: one of the items enclosed in square brackets.
You may also choose to buy additional things: as many of them as
you like. For each thing you buy, give its name, its price, and
(optionally) a method of payment.
Expressions
Usage                Definition

expression           Can include constants, literals, functions, column identifiers, variables, or parameters
logical expression   An expression that returns TRUE, FALSE, or UNKNOWN
constant expression  An expression that always returns the same value, such as "5+3" or "ABCDE"
float_expr           Any floating-point expression, or an expression that implicitly converts to a floating value
integer_expr         Any integer expression, or an expression that implicitly converts to an integer value
numeric_expr         Any numeric expression that returns a single value
char_expr            Any expression that returns a single character-type value
binary_expression    An expression that returns a single binary or varbinary value
Examples
Many of the examples in this manual are based on a database called
pubtune. The database schema is the same as the pubs2 database, but
the tables used in the examples have more rows: titles has 5000,
authors has 5000, and titleauthor has 6250. Different indexes are
generated to show different features for many examples, and these
indexes are described in the text.
The pubtune database is not provided. Since most of the examples
show the results of commands such as set showplan or set statistics io,
running the queries in this manual on pubs2 tables will not produce
the same I/O results, and in many cases, will not produce the same
query plans.
1. Introduction to Performance Analysis
Response Time
Response time is the time that a single task takes to complete. You
can shorten response time by:
• Reducing contention and wait times, particularly disk I/O wait
times
• Using faster components
• Reducing the amount of time the resources are needed
In some cases, SQL Server is also optimized to reduce initial response
time, that is, the time it takes to return the first row to the user. This
is especially useful in applications where a user may retrieve several
rows with a query, but then browse through them slowly with a
front-end tool.
Throughput

Throughput is the volume of work completed in a fixed time period, for example, the number of transactions per second that the server can process.
What Is Tuning?
Tuning is optimizing performance. A system model of SQL Server
and its environment can be used to identify performance problems at
each layer.
[Figure: a system model of SQL Server. The access manager mediates between the data cache and procedure cache in memory and the data tables, indexes, transaction log, and system procedures on disk.]
Tuning Levels
SQL Server and its environment and applications can be broken into
components, or tuning layers, in order to isolate certain components
of the system for analysis. In many cases, two or more layers must be
tuned to work optimally together.
In some cases, removing a resource bottleneck at one layer can reveal another problem area. On a more optimistic note, resolving one problem can sometimes alleviate other problem areas as well.
Application Layer
The majority of this guide describes tuning queries, and the majority of your effort in maintaining high SQL Server performance will involve tuning the queries on your server.
Issues at the application layer include the following:
• Decision support and online transaction processing (OLTP) require different performance strategies
• Transaction design can reduce concurrency, since long transactions hold locks and reduce the access of other users to the data
• Referential integrity requires joins for data modification
• Indexing to support selects increases the time needed to modify data
• Auditing for security purposes can limit performance
Database Layer
Devices Layer
Network Layer
Virtually all users of SQL Server access their data via the network.
Major issues with the network layer are:
• The amount of network traffic
• Network bottlenecks
• Network speed
Options include:
• Configuring packet sizes to match application needs
• Configuring subnets
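For example, a System Administrator can raise the packet sizes that clients may request. This is a sketch only: the parameter names are the System 11 network configuration parameters, and the values shown are illustrative, not recommendations.

sp_configure "default network packet size", 1024
sp_configure "max network packet size", 4096

Clients such as isql and bcp can then request a larger packet size for large transfers (for example, with their -A option).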
Hardware Layer
Database Design
[Figure: the database design process. Real-world data requirements are modeled as entities, relationships, and attributes; the relational model (Third Normal Form) expresses them as tables, columns, and keys; the physical implementation adds indexes, views, referential integrity, triggers, and segments, shaped by data access requirements and DBMS constraints.]
Normalization
When a table is normalized, the non-key columns depend on the key,
the whole key, and nothing but the key.
From a relational model point of view, it is standard to have tables
that are in Third Normal Form. Normalized physical design
provides the greatest ease of maintenance, and databases in this form
are clearly understood by teams of developers.
However, a fully normalized design may not always yield the best
performance. It is recommended that you design for Third Normal
Form, and then, as performance issues arise, denormalize to solve
them.
Levels of Normalization
[Figure: levels of normalization, from not normalized through First (1NF) and Second (2NF) to Third Normal Form (3NF); each level builds on the one below.]
Benefits of Normalization
[Figure 2-4: Correcting first normal form violations by creating two tables. The Employee table (emp_num, emp_lname, dept_no), in which dept_no is a repeating group, is split into Employee (emp_num, emp_lname) and Emp_dept (emp_num, dept_no); for example, employee 10052 (Jones) has Emp_dept rows for departments A10 and C66.]
[Figure 2-6: Correcting second normal form violations by creating two tables. In Emp_dept (emp_num, dept_no), dept_name depends on only part of the primary key, so it moves to a new Dept table (dept_no, dept_name), with dept_no as its primary key.]
The Dept table shown below violates Third Normal Form: mgr_lname depends on the nonkey field mgr_emp_num rather than on the primary key, dept_no.

Dept
dept_no   dept_name     mgr_emp_num   mgr_lname
A10       accounting    10073         Johnson
D60       development   10089         White
M80       marketing     10035         Dumont
The solution is to split the Dept table into two tables, as shown in Figure 2-8. In this case, the Employee table, shown in Figure 2-4, already stores this information, so removing the mgr_lname field from Dept brings the table into Third Normal Form.
Dept (dept_no, dept_name, mgr_emp_num)

[Figure 2-8: Correcting Third Normal Form violations by creating two tables. Dept keeps dept_no (the primary key), dept_name, and mgr_emp_num (for example, A10, accounting, 10073); the manager's name is looked up in Employee (emp_num, emp_lname), for example 10073, Johnson.]
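A minimal sketch of the corrected design in Transact-SQL (the datatypes and constraint choices are assumptions for illustration):

create table Dept
    (dept_no     char(3)     not null primary key,
     dept_name   varchar(30) not null,
     mgr_emp_num int         not null)

create table Employee
    (emp_num   int         not null primary key,
     emp_lname varchar(30) not null)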
Risks of Denormalization
Disadvantages of Denormalization
Denormalization Input
Denormalization Techniques
Adding Redundant Columns

[Figure: adding the redundant column au_lname to titleauthor. Duplicating au_lname from authors in titleauthor eliminates the join on au_id for queries that need only the author's name.]
Adding Derived Columns

Adding derived columns can help eliminate joins and reduce the time needed to produce aggregate values. The total_sales column in the titles table of the pubs2 database provides one example of a derived column used to reduce aggregate value processing time.

The example in Figure 2-11 shows both benefits. Frequent joins are needed between the titleauthor and titles tables to provide the total advance for a particular book title.
[Figure 2-11: adding the derived column sum_adv to titles. Instead of joining titles (title_id, title) to titleauthor (title_id, advance) and aggregating at run time, sum_adv stores the precomputed total advance for each title.]
You can create and maintain a derived data column in the titles table, eliminating both the join and the aggregate at run time. This increases storage needs and requires maintenance of the derived column whenever the underlying advance values in titleauthor change.
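For example, sum_adv could be maintained by a trigger on titleauthor. This is a simplified sketch: it handles only inserts and assumes at most one new titleauthor row per title_id in a given statement; a real application would also need delete and update triggers.

create trigger titleauthor_ins
on titleauthor
for insert
as
begin
    /* fold each newly inserted advance into its title's total */
    update titles
    set sum_adv = titles.sum_adv + inserted.advance
    from titles, inserted
    where titles.title_id = inserted.title_id
end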
Collapsing Tables
If most users need to see the full set of joined data from two tables,
collapsing the two tables into one can improve performance by
eliminating the join.
For example, users frequently need to see the author name, author
ID, and the blurbs copy data at the same time. The solution is to
collapse the two tables into one. The data from the two tables must be
in a one-to-one relationship to collapse tables.
[Figure: collapsing authors (au_id, au_lname) and blurbs (au_id, copy), joined on au_id, into a single newauthors table (au_id, au_lname, copy).]
Collapsing the tables eliminates the join, but loses the conceptual
separation of the data. If some users still need access to just the pairs
of data from the two tables, this access can be restored by queries that
select only the needed columns or by using views.
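For instance, a view can restore the blurbs-only picture after the tables are collapsed (a sketch using the column names from this example):

create view blurbs_view
as
select au_id, copy
from newauthors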
Duplicating Tables
[Figure: duplicating data. The newauthors table (au_id, au_lname, copy) is duplicated so that the heavily used copy column also exists in a separate blurbs table (au_id, copy).]
Splitting Tables
[Figure: splitting tables. A horizontal split divides a table into groups of rows; a vertical split divides it into groups of columns.]
Horizontal Splitting
[Figure: horizontal splitting. Because usually only active records are accessed, the active and inactive rows of Authors are split into two tables, Active_Authors and Inactive_Authors.]
Vertical Splitting
Vertical table splitting makes even more sense when both of the
above conditions are true. When a table contains very long columns
that are not accessed frequently, placing them in a separate table can
greatly speed the retrieval of the more frequently used columns.
With shorter rows, more data rows fit on a data page, so fewer pages need to be accessed for many queries.
Figure 2-16 shows how the authors table can be partitioned.
[Figure 2-16: vertical splitting. Users frequently access lname and fname but infrequently access phone and city, so Authors (au_id, lname, fname, phone, city) is partitioned vertically into Authors_Frequent (au_id, lname, fname) and Authors_Infrequent (au_id, phone, city).]
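A sketch of the split in Transact-SQL (the datatypes are assumptions loosely based on pubs2):

create table Authors_Frequent
    (au_id varchar(11) not null primary key,
     lname varchar(40) not null,
     fname varchar(20) not null)

create table Authors_Infrequent
    (au_id varchar(11) not null primary key,
     phone char(12)    null,
     city  varchar(20) null)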
[Figure: maintaining derived data. Changes to advance in titleauthor (title_id, au_id, advance) must be propagated to the derived sum_adv columns in titles (title_id, sum_adv) and authors (au_id, sum_adv).]
If you use application logic, be very sure that the data integrity
requirements are well documented and well known to all application
developers and to those who must maintain applications.
➤ Note
Using application logic to manage denormalized data is risky. The same
logic must be used and maintained in all applications that modify the data.
Batch Reconciliation
[Figure: SQL Server data page layout. Each 2048-byte page has a 32-byte page header; the remaining 2016 bytes of usable space hold the data rows and the row offset table.]
Page headers use 32 bytes, leaving 2016 bytes for data storage on
each page.1 Information in the page header includes pointers to the
next page and the previous page used by the object, and the object ID
of the table or index using that page.
Each row is stored contiguously on the page. The information stored
for each row consists of the actual column data plus information
such as the row number (one byte) and the number of variable-
length and null columns in the row (one byte).
Rows cannot cross page boundaries, except for text and image
columns. Each data row has at least 4 bytes of overhead; rows that
contain variable-length data have additional overhead. Chapter 5,
“Estimating the Size of Tables and Indexes,” explains overhead in
detail.
The row offset table stores pointers to the starting location for each
data row on the page. Each pointer requires 2 bytes.
1. The maximum number of bytes for a data row is 1960 (plus two bytes of overhead) due to
overhead for logging: the row, plus the overhead about the transaction, must fit on a
single page in the transaction log.
The usable space on a page, divided by the row size, tells us how
many rows can be stored on a page. This figure gives us the row
density. The size of rows can affect your performance dramatically:
the smaller the data rows, the more rows you can store per page.
When rows are small, you’ll need to read fewer pages to answer your
select queries, so your performance will be better for queries that
perform frequent table scans.
[Figure: row density. At 5 rows per page (less dense), rows A through J need two pages; at 10 rows per page (more dense), the same ten rows fit on a single page.]
Linked Data Pages
Each table and each level of each index forms a doubly-linked list of
pages. Each page in the object stores a pointer to the next page in the
chain and to the previous page in the chain. When new pages need to
be inserted, the pointers on the two adjacent pages change to point to
the new page. When SQL Server scans a table, it reads the pages in
order, following these page pointers.
[Figure: linking a new page into a page chain. The old link between two adjacent pages is broken, and their next and previous pointers are changed to point to the new page.]
SQL Server tries to keep the page allocations for an object close together.
Text and Image Pages

Text and image columns for a table are stored as a separate page chain, consisting of a set of text or image pages. Each table with text or image columns has exactly one of these page chains, even if it has several such columns. The table itself stores a 16-byte pointer to the first page of the text value for the row. Additional pages for the value are linked by next and previous pointers, just like the data pages. The first page stores the number of bytes in the text value. The last page in the chain for a value is terminated with a null next-page pointer.
Figure 3-4 shows a table with text values. Each of the three rows
stores a pointer to the starting location of its text value in the
text/image page chain.
[Figure 3-4: a table with text values. Each of the three data rows stores a 16-byte pointer (for example, row 998723567 points to 0x00015f...) to the start of its value in the separate text object; the chain of pages for a value ends with a null next-page pointer.]
Each text or image page stores up to 1800 bytes. Every non-null value
uses at least one full data page.
Text objects are listed separately in sysindexes. The index ID column,
indid, is always 255, and the name is the table name, prefixed with the
letter “t”.
Global Allocation Map (GAM) Pages

Each database has a GAM page. It stores a bitmap for all allocation
units of a database, with one bit per allocation unit. When an
allocation unit has no free extents available to store objects, its
corresponding bit in the GAM is set to 1. This mechanism expedites
allocating new space for objects. Users cannot view the GAM; it
appears in the system catalogs as the table sysgams.
Object Allocation Map (OAM) Pages
Each table, index and text chain has one or more OAM pages stored
on pages allocated to the table or index. If a table has more than one
OAM page, the pages are linked in a chain. These OAM pages store
pointers to each allocation unit that contains pages for the object. The
first page in the chain stores allocation hints, indicating which OAM
page in the chain has information about allocation units with free
space. This provides a fast way to allocate additional space for
objects, keeping the new space close to pages already used by the
object.
Each OAM page holds allocation mappings (OAM entries) for 250
allocation units. A single OAM page stores information for 2000 to
63,750 data or index pages.
Each entry in the OAM page stores the page ID of the allocation page
and the number of free and used pages for the object within that
allocation page. If a table is widely spread out across the storage
space for a database so that each allocation unit stores only one
extent (8 pages) for the table, the 250 rows on the OAM page can only
point to 250 * 8 = 2000 database pages. If the table is very compactly
stored, so that it uses all 255 pages available in each of its allocation
units, one OAM page stores allocation information for 250 * 255 =
63,750 pages.
Figure 3-5 shows how allocation units, extents, and objects are
managed by OAM pages and allocation pages.
• There are two allocation units shown, one starting at page 0 and
one at page 256. The first page of each is the allocation page.
• A table is stored on four extents, starting at pages 1 and 24 on the
first allocation unit and pages 272 and 504 on the second unit.
• The first page of the table is the table’s OAM page. It points to the
allocation page for each allocation unit where the object uses
pages, so it points to pages 0 and 256.
• Allocation pages 0 and 256 store object IDs and information about
the extents and pages used on the extent. So, allocation page 0
points to page 1 and 24 for the table, and allocation page 256
points to pages 272 and 504. Of course, these allocation pages also
point to other objects stored in the allocation unit, but these
pointers are not shown here.
[Figure 3-5: how OAM pages and allocation pages manage objects. Two allocation units are shown, one starting at page 0 and one at page 256, each beginning with an allocation page; each extent holds 8 pages. The table's OAM page points to allocation pages 0 and 256, which record the table's extents starting at pages 1, 24, 272, and 504, as well as other pages used by other objects.]
When you insert data into a heap, the data row is always added to
the last page of the table. If the last page is full, a new page is
allocated in the current extent. If the extent is full, SQL Server looks
for empty pages on other extents in use by the table. If there are no
available pages, a new extent is allocated to the table.
insert employee
values (17823, "White", "Snow", ...)

[Figure: the new row is added to the last page of the heap.]
When you delete rows from a heap, and there is no useful index, SQL
Server scans all of the data rows in the table to find the rows to delete.
It has no way of knowing how many rows match the conditions in
the query without examining every row.
When a data row is deleted from the page, the rows that follow it on
the page move up so that the data on the page remains contiguous.
delete from employee
where emp_id = 12854

[Figure: deleting rows from a heap. SQL Server scans the page chain to find matching rows; after the delete, the remaining rows on each page are contiguous and the freed space is empty.]
If you delete the last row on a page, the page is deallocated. If there
are other pages on the extent still in use by the table, the page can be
used again by the table when a page is needed. If all other pages on
the extent are empty, the whole extent is deallocated. It can be
allocated to other objects in the database. The first data page for a
table or index is never deallocated.
• If the length of the row changes, and there is enough free space on
the page, the row remains in the same place on the page, but other
rows move up or down to keep the rows contiguous on the page.
The row offset pointers at the end of the page are adjusted to
point to the changed row locations.
• If the row does not fit on the page, the row is deleted from its
current page, and the “new” row is inserted on the last page of
the table. This type of update can cause contention on the last
page of the heap, just as inserts do.
For more information on how updates are performed, see “Update
Operations” on page 7-32.
SQL Server has two major strategies for using its data cache
efficiently:
• LRU Replacement Strategy reads the data pages sequentially into
the cache, replacing a “least recently used” buffer. The buffer is
placed on the MRU end of the data buffer chain. It moves down
the cache toward the LRU end as more pages are read into the
cache.
SQL Server uses this strategy for:
- Statements that modify data on pages
- Pages that are needed more than once by a single query
- OAM pages
- Many index pages
• MRU (fetch-and-discard) replacement strategy places pages into the cache just before the wash marker rather than at the MRU end, so that their buffers can be reused quickly. SQL Server uses this strategy for pages that a query needs only once, such as the pages read by a large table scan.

[Figure 3-9: LRU strategy takes a clean page from the LRU end of the cache and links the newly read page at the MRU end of the chain.]

[Figure 3-10: MRU strategy places pages just before the wash marker.]
➤ Note
Large I/O on heaps is effective as long as the page chains are not
fragmented. See “Maintaining Heaps” on page 3-19 for information on
maintaining heaps.
Inserts on heaps take place on the last page of the heap table. If an
insert is the first row on a new page for the table, a clean data buffer
is allocated to store the data page, as shown in Figure 3-12. This page
starts to move down the MRU/LRU chain in the data cache as other
processes read pages into memory.
If a second insert to the page takes place while the page is still in
memory, the page is located in cache, and moves back to the top of
the MRU/LRU chain.
The changed data page remains in cache until it moves past the wash
marker or until a checkpoint or the housekeeper task writes it to disk.
“The Data Cache” on page 15-7 explains more about these processes.
When you update or delete a row from a heap table, the effects on the
data cache are similar to the process for inserts. If a page is already in
the cache, the whole buffer (a single page, or up to eight pages,
depending on the I/O size) is placed on the MRU end of the chain,
and the row is changed. If the page is not in cache, it is read from the
disk into a clean buffer from the LRU end of the cache. Its placement on the MRU/LRU chain then depends on the cache strategy used for the query.
Maintaining Heaps
Over time, I/O on heaps can become inefficient. Deletes and
updates:
• Can result in many partially filled pages
• Can lead to inefficient large I/O, since page chains will not be
contiguous on the extents
There are two methods to reclaim space in heaps after deletes and
updates have created empty space on pages or have caused
fragmentation:
• Create and then drop a clustered index
• Use bcp (the bulk copy utility) and truncate table
You can create and drop a clustered index on a heap table in order to
reclaim space if updates and deletes have created many partially full
pages in a heap table. To create a clustered index, you must have free
space in the database of at least 120 percent of the table size. Since the
leaf level of the clustered index consists of the actual data rows of the
table, the process of creating the index makes a complete copy of the
table before it deallocates the old pages. The additional 20 percent
provides room for the root and intermediate index levels. If you use
long keys for the index, it will take more space.
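For example, to compact the employee heap (the index name and key are illustrative; any key will do, since the index exists only to rewrite the table):

create clustered index employee_compact_cix
on employee (emp_id)

drop index employee.employee_compact_cix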
[Figure: transaction log storage is sequential. Log records are always appended to the end of the log's page chain.]
4. How Indexes Work

Indexes are database objects that can be created for a table to speed direct access to specific data rows. Indexes store the values of the key or keys that were named when the index was created, and logical pointers to the data pages or to other index pages.
[Figure: an index on au_id. Index pages hold key values such as 177-32-1176, 267-41-2394, 409-56-7008, and 756-30-7391, with logical pointers leading to lower index levels and ultimately to the data pages.]
Types of Indexes
SQL Server provides two types of indexes:
• Clustered indexes, where the table data is physically stored in the
order of the keys on the index.
• Nonclustered indexes, where the storage order of data in the table
is not related to index keys.
You can create only one clustered index on a table because there is
only one possible physical ordering of the data rows. You can create
up to 249 nonclustered indexes per table.
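For example (the index names are illustrative):

create clustered index emp_id_cix
on employees (empid)

create nonclustered index emp_lname_ix
on employees (lname)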
A table that has no clustered index is called a heap. The rows in the
table are in no particular order, and all new rows are added to the
end of the table. Chapter 3, “Data Storage,” discusses heaps and SQL
operations on heaps.
Index Pages
Index entries are usually much smaller than a data row in a data
page, and index pages are much more densely populated. A data
row might have 200 bytes (including row overhead), so there would
be 10 rows per page. An index on a 15-byte field would have about
100 rows per page (the pointers require 4–9 bytes per row, depending
on the type of index and the index level).
Indexes can have multiple levels:
• Root level
• Leaf level
• Intermediate level
Root Level
The root level is the highest level of the index. There is only one root
page. If the table is very small, so that the entire index fits on a single
page, there are no intermediate levels, and the root page stores
pointers to the data pages. For larger tables, the root page stores
pointers to the intermediate level index pages.
Leaf Level
The lowest level of the index is the leaf level. At the leaf level, the
index contains a key value for each row in the table, and the rows are
stored in sorted order by the index key:
• For clustered indexes, the leaf level is the data.
• For nonclustered indexes, the leaf level contains the index key
values, a pointer to the page where the rows are stored, and a
pointer to the rows on the data page. The leaf level is the level just
above the data.
Intermediate Level
All levels between root and leaf are intermediate levels. An index on
a large table or an index using long keys may have many
intermediate levels. A very small table may not have an intermediate
level; the root pages point directly to the leaf level.
Each level (except the root level) of the index is a page chain: The
page headers contain next page and previous page pointers to other
pages at the same index level.
[Figure: the levels of an index. Level 2 is the root, Level 1 the intermediate level, and Level 0 the leaf level; the diagram shows pointers between index levels, next and previous page chain pointers within each level, and the start and end of the chained pages.]
Clustered Indexes
In clustered indexes, leaf-level pages are also the data pages. The
data rows are physically ordered by the index key. Physical ordering
means that:
• All entries on a page are in index key order.
• By following the “next page” pointers at the data level, you read
the entire table in index key order.
select *
from employees
where lname = "Green"

[Figure: selecting a row using a clustered index. The root page (1001: Bennet 1007, Karsen 1009, Smith 1062) points to intermediate page 1007 (Bennet 1132, Greane 1133, Hunter 1127); the entry for Greane leads to data page 1133, which holds the rows Greane, Green, and Greene.]
On the root level page, “Green” is greater than “Bennet,” but less than “Karsen,” so the pointer for “Bennet” is followed to page 1007.
On page 1007, “Green” is greater than “Greane,” but less than
“Hunter,” so the pointer to page 1133 is followed to the leaf level
page, where the row is located and returned to the user.
This retrieval via the clustered index requires:
• One read for the root level of the index
• One read for the intermediate level
• One read for the data page
These reads may come either from cache (called a logical read) or
from disk (called a physical read). “Indexes and I/O Statistics” on
page 6-8 provides more information on physical and logical I/O and
SQL Server tools for reporting it. On tables that are frequently used,
the higher levels of the indexes are often found in cache, with lower
levels and data pages being read from disk. See “Indexes and
Caching” on page 4-23 for more details on how indexes use the
cache.
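You can watch the logical and physical reads for such a query with set statistics io (a sketch; the counts you see depend on your data and on what is already in cache):

set statistics io on
select * from employees
where lname = "Green"
set statistics io off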
This description covers point queries: queries that use the index key in the where clause to find a single row or a small set of rows.
When you insert a row into a table with a clustered index, the data
row must be placed in physical order according to the key value on
the table. Other rows on the data page move down on the page, as
needed, to make room for the new value. As long as there is room for
the new row on the page, the insert does not affect any other pages in
the database. The clustered index is used to find the location for the
new row. Figure 4-5 shows a simple case where there is room on an
existing data page for the new row. In this case, the key values in the
index do not need to change.
insert employees (lname)
values ("Greco")

[Figure 4-5: inserting a row into a clustered index when the data page has room. Greco is placed in key order on data page 1133, between Greane and Green; the keys on root page 1001 and intermediate page 1007 do not change.]
If there is not enough room on the data page for the new row, a page
split must be performed:
• A new data page is allocated on an extent already in use by the
table. If there is no free page, a new extent is allocated.
If a new row needs to be added to a full index page, the page split
process on the index page is similar to the data page split. A new
page is allocated, and half the index rows are moved to the new page.
A new row is inserted at the next highest level of the index to point
to the new index page.
Overflow Pages
[Figure: inserting a duplicate key that requires an overflow page. Before the insert, page 1133 holds Greane, Greco, Green, and Greene; inserting another Greene allocates overflow data page 1156, linked into the page chain between page 1133 and page 1134 (Gresham, Gridley).]
The only rows that will be placed on this overflow page are
additional rows with the same key value. In a non-unique clustered
index with many duplicate key values, there can be numerous
overflow pages for the same value.
The clustered index does not contain pointers directly to overflow
pages. Instead, the next page pointers are used to follow the chain of
overflow pages until a value is found that does not match the search
value.
When you delete a row from a table that has a clustered index, other
rows on the page move up to fill the empty space so that data
remains contiguous on the page. Figure 4-8 shows a page with four
rows before a delete removes the second row on the page. The
following two rows move up.
[Figure 4-8: deleting a row. Green is removed from the page, and the rows that followed it move up so that the data remains contiguous.]
If you delete the last row on a page, the page is deallocated. If the deallocated page was the only page remaining in use on an extent that belongs to the table, the extent is also deallocated, and it becomes available for the expansion of other objects in the database.

[Figure 4-9: Deleting the last row on a page (before the delete)]
In Figure 4-10, which shows the table after the delete, the pointer to
the deleted page has been removed from index page 1007 and the
following index rows on the page have been moved up to keep the
space used contiguous.
delete
from employees
where lname = "Gridley"

[Figure 4-10: Deleting the last row on a page (after the delete). Gridley's data page is now empty and available for reallocation; the pointer to it has been removed from index page 1007, and the remaining index rows have moved up to keep the used space contiguous.]
If you delete a pointer from an index page, leaving only one row on
that page, the row is moved onto an adjacent page, and the empty
page is deallocated. The pointers on the parent page are updated to
reflect the changes.
Nonclustered Indexes
The B-tree works much the same for nonclustered indexes as it does
for clustered indexes, but there are some differences. In nonclustered
indexes:
• The leaf pages are not the same as the data pages.
• The leaf level stores one key-pointer pair for each row in the
table.
• The leaf level pages store the index keys and page pointers, plus
a pointer to the row offset table on the data page. This
combination of page pointer plus the row offset number is called
the row ID, or RID.
• The root and intermediate levels store index keys and page
pointers to other index pages. They also store the row ID of the
key’s data row.
With keys of the same size, nonclustered indexes require more space
than clustered indexes.
Row IDs are managed by an offset table on each data page. The offset
table starts at the last byte on the page. There is a 2-byte offset table
entry for each row on the page. As rows are added, the offset table
grows from the end of the page upward as the rows fill from the top
of the page. The offset table stores the byte at which its
corresponding row on the page starts.
When a new row is inserted, a row offset entry for it is added to the offset table. Note that the row offset values are not sequential.

[Figure: the row offset table after an insert. The entries 56, 156, 96, 76, and 32 give the starting byte of each row on the page.]
When you select a row using a nonclustered index, the search starts
at the root level. In the example in Figure 4-14, “Green” is greater
than “Bennet,” but less than “Karsen,” so the pointer to page 1007 is
followed. “Green” is greater than “Greane,” but less than “Hunter,”
so the pointer to page 1133 is followed. Page 1133 is the leaf page,
showing that the row for “Green” is the second position on page
1421. This page is fetched, the “2” byte in the offset table is checked,
and the row is returned from the byte position on the data page.
When you insert rows into a heap that has a nonclustered index, the
insert goes to the last page of the table. If the heap is partitioned, the
insert goes to the last page on one of the partitions. Then the
nonclustered index is updated to include the new row. If the table
has a clustered index, it is used to find the location for the row. The
clustered index is updated, if necessary, and the nonclustered index
is updated to include the new row.
Figure 4-15 shows an insert into a table with a clustered index. Since
the ID value is 24, the row is placed at the end of the table. A row is
also inserted into the leaf level of the nonclustered index, containing
the row ID of the new values.
insert employees (empid, lname)
values (24, "Greco")

[Figure 4-15: inserting a row into a table with a nonclustered index. The data row (24, Greco) is placed on the last page of the table, page 1409; a leaf row with key Greco and row ID 1409,4 is inserted in key order on leaf page 1133, between Greane (1307,4) and Green (1421,2).]
When you delete a row from a table, the query can use a
nonclustered index on the column or columns in the where clause to
locate the data row to delete. The row in the leaf level of the
nonclustered index that points to the data row is also removed. If
there are other nonclustered indexes on the table, the rows on the leaf
level of those indexes are also deleted.
delete employees
where lname = "Green"

[Figure: deleting a row using a nonclustered index. The leaf row Green (1421,2) is removed from index page 1133, and the data row for Green is removed from data page 1421, where the rows below it move up to keep the data contiguous.]
If the delete operation removes the last row on the data page, the
page is deallocated and the adjacent page pointers are adjusted.
References to the page are also deleted in higher levels of the index.
If the delete operation leaves only a single row on an index
intermediate page, index pages may be merged, as with clustered
indexes. See “Index Page Merges” on page 4-13.
There is no automatic page merging on data pages, so if your
applications make many random deletes, you can end up with data
pages that have only a single row, or a few rows, on a page.
Index Covering
Nonclustered indexes can provide a special type of optimization
called index covering. Since the leaf level of nonclustered indexes
contains the key values for each row in a table, queries that access
only the key values can retrieve the information by using the leaf
level of the nonclustered index as if it were the actual data. This is
index covering.
You can create indexes on more than one key, called composite
indexes. Composite indexes can have up to 16 columns adding up to
a maximum 256 bytes.
A nonclustered index that covers the query is faster than a clustered
index, because it reads fewer pages: index rows are smaller, more
rows fit on the page, so fewer pages need to be read.
A clustered index, by definition, is covered. Its leaf level contains the
complete data rows. This also means that scanning at that level (that
is, the entire table) is the same as performing a table scan.
There are two forms of optimization using indexes that cover the
query:
• The matching index scan
• The non-matching index scan
For both types of covered queries, the nonclustered index keys must
contain all of the columns named in the select list and any clauses of
your query: where, having, group by, and order by. Matching scans have
additional requirements. “Choosing Composite Indexes” on page
6-28 describes query types that make good use of covering indexes.
This type of index covering lets you skip the last read for each row
returned by the query, the read that fetches the data page. For point
queries that return only a single row, the performance gain is slight—
just one page. For range queries, the performance gain is larger, since
the covering index saves one read for each row returned by the
query.
In addition to having all columns named in the query included in the index, the columns in the where clauses of the query must include the leading column of the columns in the index. For example, for an index on columns A, B, C, D, the following sets can perform matching scans: A, AB, ABC, AC, ACD, ABD, AD, and ABCD.
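For instance, assuming a composite nonclustered index on authors (au_lname, au_id), an index invented for this illustration, the following query is covered and can use a matching scan, since au_lname is the leading key and both columns are in the index:

select au_id
from authors
where au_lname = "Greene"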
[Figure 4-17: Matching index access does not have to read the data row]
When the columns specified in the where clause do not name the leading column in the index, but all of the columns named in the select list and other query clauses (such as group by or having) are included in the index, SQL Server saves I/O by scanning the entire leaf level of the nonclustered index rather than scanning the table.
Indexes and Caching

Index pages receive special treatment in the data cache:

• Index pages can cycle through the cache many times, if number of index trips is configured.
When a query that uses an index is executed, the root, intermediate,
leaf, and data pages are read in that order. If these pages are not in
cache, they are read into the MRU end of the cache and move toward
the LRU end as additional pages are read in.
[Figure 4-19: Caching used for a point query via a nonclustered index. The root (R), intermediate (I), leaf (L), and data (D) pages enter the cache in that order at the MRU end and migrate toward the LRU end.]
Each time a page is found in cache, it is moved to the MRU end of the
page chain, so the root page and higher levels of the index tend to
stay in the cache. Figure 4-20 shows a root page moving back to the
top of the cache for a second query using the same index.
[Figure 4-20: a root page found in cache for a second query moves back to the MRU end of the chain, ahead of the intermediate (I), leaf (L), and data (D) pages.]
Indexes and the tables they index can use different caches. A System
Administrator or table owner can bind a clustered or nonclustered
index to one cache, and its table to another.
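For example, assuming two named caches have already been created (the cache names here are illustrative):

sp_bindcache "table_cache", "pubtune", "titles"
sp_bindcache "index_cache", "pubtune", "titles", "title_ix"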
[Figure 4-21: Caching with separate caches for index and data pages. The root (R), intermediate (I), and leaf (L) index pages move through one cache, and the data (D) pages move through another.]
A special strategy keeps index pages in cache. Data pages make only
a single trip through the cache: They are read in at the MRU end or
the cache or placed just before the wash marker, depending on the
cache strategy chosen for the query. Once the pages reach the LRU
end of the cache, the buffer for that page is reused when another page
needs to be read into cache.
Index pages can make multiple trips through the cache, controlled by
a counter. When the counter is greater than 0 for an index page and it
reaches the LRU end of the page chain, the counter is decremented
by one, and the page is placed at the MRU end again.
By default, the number of trips that an index page makes through the
cache is set to 0. A System Administrator can set the configuration
parameter number of index trips. For more information, see “number of
index trips” on page 11-22 of the System Administration Guide.
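For example, to let index pages make three extra trips through the cache (the value is illustrative; see the System Administration Guide for guidance):

sp_configure "number of index trips", 3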
Importance of Sizing
You should know the size of your tables and indexes, and you should
be able to predict the size of your database objects as your tables
grow. Knowing this information will help you:
• Decide on storage allocation, especially if you use segments
• Decide whether it is possible to improve performance for specific
queries
• Determine the optimum size for named data caches for specific
tables and indexes
SQL Server provides several tools that provide information on
current object size or that can predict future size:
• The system procedure sp_spaceused reports on the current size of
an existing table and any indexes.
• The system procedure sp_estspace can predict the size of a table
and its indexes, given a number of rows as a parameter.
• The output of some dbcc commands reports on page usage, in addition to performing database consistency checks.
You can also compute the size using formulas provided in this
chapter.
The sp_spaceused and dbcc commands report actual space usage. The
other methods presented in this chapter provide size estimates.
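For example, to report the current size of the titles table, and then to estimate its size at 500,000 rows (the row count is illustrative):

sp_spaceused titles

sp_estspace titles, 500000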
Over time, the effects of data modifications on a set of tables tends to
produce data pages and index pages that average approximately 75
percent full. The major factors are:
• When you insert a row onto a full page of a table with a clustered
index, the page splits, leaving two pages that are about 50 percent
full.
• When you delete rows from heaps or from tables with clustered
indexes, the space used on the page decreases. You can have
pages that contain very few rows or even a single row.
• After some deletes and page splits have occurred, inserting rows
into tables with a clustered index tends to fill up pages that have
been split or pages where rows have been deleted.
Page splits also take place when rows need to be inserted into full
index pages, so index pages also tend to end up being approximately
75 percent full.
Column      Meaning

rowtotal    An estimate of the number of rows, read from the OAM page. Though not always exact, this estimate is much quicker and causes less contention than select count(*).

reserved    The kilobytes reserved for use by the table and its indexes, including both used and unused pages in extents allocated to the objects; the sum of data, index_size, and unused.

data        The kilobytes on pages used by the table.

index_size  The total kilobytes on pages in use for the indexes.

unused      The kilobytes of unused pages in extents allocated to the object, including the unused pages for the object’s indexes.
If you want to see the size of the indexes reported separately, use this
command:
sp_spaceused titles, 1
index_name size reserved unused
-------------------- ---------- ---------- ----------
title_id_cix 14 KB 1294 KB 38 KB
title_ix 256 KB 272 KB 16 KB
type_price_ix 170 KB 190 KB 20 KB
➤ Note
The “1” in the sp_spaceused syntax indicates that index information should
be printed. It has no relation to index IDs or other information.
Advantages of sp_spaceused
Disadvantages of sp_spaceused
dbcc tablealloc(titles)
The default report option of OPTIMIZED is used for this run.
The default fix option of FIX is used for this run.
***************************************************************
TABLE: titles OBJID = 208003772
INDID=1 FIRST=2032 ROOT=2283 SORT=1
Data level: 1. 864 Data Pages in 109 extents.
Indid : 1. 15 Index Pages in 3 extents.
INDID=2 FIRST=824 ROOT=827 SORT=1
Indid : 2. 47 Index Pages in 7 extents.
TOTAL # of extents = 119
Alloc page 2048 (# of extent=2 used pages=10 ref pages=10)
Alloc page 2304 (# of extent=1 used pages=7 ref pages=7)
Alloc page 1536 (# of extent=25 used pages=193 ref pages=193)
Alloc page 1792 (# of extent=27 used pages=216 ref pages=216)
Alloc page 2048 (# of extent=29 used pages=232 ref pages=232)
Alloc page 2304 (# of extent=28 used pages=224 ref pages=224)
Alloc page 256 (# of extent=1 used pages=1 ref pages=1)
Alloc page 768 (# of extent=6 used pages=47 ref pages=47)
Total (# of extent=119 used pages=930 ref pages=930) in this
database
DBCC execution completed. If DBCC printed error messages,
contact a user with System Administrator (SA) role.
The dbcc report shows output for titles with a clustered index (the
information starting with “INDID=1”) and a nonclustered index.
For the clustered index, dbcc reports both the amount of space taken
by the data pages themselves, 864 pages in 109 extents, and by the
root and intermediate levels of the clustered index, 15 pages in 3
extents.
For the nonclustered index, it reports the number of pages and
extents used by the index.
Notice that some of the allocation pages are reported more than once
in this output, since the output reports on three objects: the table, its
clustered index, and its nonclustered index.
At the end, it reports the total number of extents used by the table
and its indexes. The OAM pages and distribution pages are included.
You can use dbcc indexalloc to display the information for each index
on the table. This example displays information about the
nonclustered index on titles:
dbcc indexalloc(titles, 2)
Advantages of dbcc
Disadvantages of dbcc
Total_Mbytes
-----------------
138.30
Advantages of sp_estspace
Disadvantages of sp_estspace
➤ Note
Do not confuse this figure with the maximum row size, which is 1960 bytes,
due to overhead in other places in SQL Server.
For the most accurate estimate, round down divisions that calculate
the number of rows per page (rows are never split across pages) and
round up divisions that calculate the number of pages.
If your table includes text or image datatypes, use 16 (the size of the
text pointer that is stored in the row) in the calculations below. Then
see “text and image Data Pages” on page 5-26.
If the configuration parameter page utilization percent is set to less than
100, SQL Server may allocate new extents before filling all pages on
the allocated extents. This does not change the number of pages that
an object uses, but leaves empty pages in extents allocated to the
object. See “page utilization percent” on page 11-29 in the System
Administration Guide.
The storage sizes for SQL Server datatypes are shown in the following table:

Datatype          Size
char              Defined size
nchar             Defined size * @@ncharsize
varchar           Actual number of characters
nvarchar          Actual number of characters * @@ncharsize
binary            Defined size
varbinary         Data size
int               4
smallint          2
tinyint           1
float             4 or 8, depending on precision
double precision  8
real              4
numeric           2–17, depending on precision and scale
decimal           2–17, depending on precision and scale
money             8
smallmoney        4
datetime          8
smalldatetime     4
bit               1
text              16 bytes + 2K * number of pages used
image             16 bytes + 2K * number of pages used
timestamp         8
The formulas that follow show how to calculate the size of tables and
clustered indexes. If your table does not have clustered indexes, skip
Steps 3, 4, and 5. Once you compute the number of data pages in Step
2, go to Step 6 to add the number of OAM pages.
To compute the data row size for a table with only fixed-length columns:

    4 (Overhead)
  + Sum of bytes in all fixed-length columns
  = Data row size

For a table with variable-length columns, compute this subtotal first; additional per-row overhead is then added to the subtotal:

    4 (Overhead)
  + Sum of bytes in all fixed-length columns
  + Sum of bytes in all variable-length columns
  = Subtotal

To compute the clustered index row size when the index keys are all fixed length:

    5 (Overhead)
  + Sum of bytes in the fixed-length index keys
  = Clustered row size

For variable-length index keys, compute this subtotal first:

    5 (Overhead)
  + Sum of bytes in the fixed-length index keys
  + Sum of bytes in variable-length index keys
  = Subtotal
(2016 / Clustered row size) - 2 = No. of clustered index rows per page
No. of rows / No. of CI rows per page = No. of index pages at next level
If the result for the number of index pages at the next level is greater
than 1, repeat the following division Step, using the quotient as the
next dividend, until the quotient equals 1, which means that you
have reached the root level of the index:
Add the number of pages at each level to determine the total number
of pages in the index:
Each table and each index on a table has an object allocation map
(OAM). The OAM is stored on pages allocated to the table or index.
A single OAM page holds allocation mapping for between 2,000 and
63,750 data or index pages.
In the clustered index example that follows, there are 750,000 data
pages, requiring between 12 and 376 OAM pages. The clustered
index has 3411 pages and requires 1 or 2 OAM pages. In the
nonclustered index example, the index has 164,137 pages and
requires between 3 and 83 OAM pages. In most cases, the number of
OAM pages required is closer to the minimum value. See “Why the
Range?” on page 3-8 for more information.
                          Minimum     Maximum
Clustered index pages    ________    ________
OAM pages              + ________  + ________
Data pages             + ________  + ________
OAM pages              + ________  + ________
Total                  = ________  = ________
The following example computes the size of the data and clustered
index for a table containing:
• 9,000,000 rows
• Sum of fixed-length columns = 100 bytes
• Sum of 2 variable-length columns = 50 bytes
• Clustered index key, fixed length, 4 bytes
4 (Overhead)
+ 100 Sum of bytes in all fixed-length columns
+ 50 Sum of bytes in all variable-length columns
154 = Subtotal
In the first part of this Step, the number of rows per page is rounded
down:
5 (Overhead)
+ 4 Sum of bytes in the fixed-length index keys
9 = Clustered Row Size
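As a check, here are the divisions behind the numbers in the summary table below (rows per page are rounded down, page counts are rounded up; the full data row size, once the variable-length overhead is added to the 154-byte subtotal, works out to 12 rows per page):

9,000,000 rows / 12 rows per page = 750,000 data pages
(2016 / 9) - 2 = 222 clustered index rows per page
750,000 data pages / 222 rows per page = 3379 index pages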
Both the table and the clustered index require one or more OAM
Pages.
750,000/63,750 = 12 (minimum)
750,000/2000 = 376 (maximum)
3379/63,750 = 1 (minimum)
3379/2000 = 2 (maximum)
Minimum Maximum
Clustered index pages 3379 3379
OAM pages 1 2
Data pages 750000 750000
OAM pages 12 376
Total 753392 753757
If the keys of the nonclustered index are all fixed length, compute the size of a leaf index row as follows:

    7 (Overhead)
  + Sum of bytes in the fixed-length index keys
  = Size of leaf index row

If the index contains variable-length keys, compute this subtotal first:

    9 (Overhead)
  + Sum of length of fixed-length index keys
  + Sum of length of variable-length index keys
  + Number of variable-length keys + 1
  = Subtotal
2016/ Size of leaf index row = No. of leaf rows per page
No. of rows in table / No. of leaf rows per page = No. of leaf pages
(2016 / Size of non-leaf row) - 2 = No. of non-leaf index rows per page
If the number of index pages at the next level above is greater than 1,
repeat the following division step, using the quotient as the next
dividend, until the quotient equals 1, which means that you have
reached the root level of the index:
No. of index pages at previous level / No. of non-leaf index rows per page = No. of index pages at next level
Add the number of pages at each level to determine the total number
of pages in the index:
                             Minimum     Maximum
Nonclustered index pages     _______     _______
OAM pages                  + _______   + _______
Total                        _______     _______
9 (Overhead)
+ 4 Sum of length of fixed-length keys
+ 20 Sum of length of variable-length keys
+ 2 Number of variable-length keys + 1
35 = Subtotal
Minimum Maximum
Index pages 164137 164137
OAM pages 3 83
Total pages 164140 164220
➤ Note
Fillfactor affects size at index creation time. Fillfactors are not maintained
as tables are updated. Use these adjustments for read-only tables.
Other values for fillfactor reduce the number of rows per page on data
pages and leaf index pages. To compute the correct values when
using fillfactor, multiply the size of the available data page (2016) by
the fillfactor. For example, if your fillfactor is 75 percent, your data page
would hold 1512 bytes. Use this value in place of 2016 when you
calculate the number of rows per page. See “Step 2: Compute the
Number of Data Pages” on page 5-14 and “Step 8: Calculate the
Number of Leaf Pages in the Index” on page 5-19.
Distribution Pages
In Step 1
Use the sum of the average length of the variable-length columns
instead of the sum of the defined length of the variable-length
columns to determine the average data row size.
In Step 2
Use the average data row size in the first formula.
In Step 3
You must perform the addition twice. The first time, calculate the
maximum index row size, using the given formula. The second time,
calculate the average index row size, substituting the sum of the
average number of bytes in the variable-length index keys for the
sum of the defined number of bytes in the variable-length index
keys.
In Step 4
Substitute this formula for the first formula in Step 4, using the two
length values:
In Step 6
You must perform the addition twice. The first time, calculate the
maximum leaf index row size, using the given formula. The second
time, calculate the average leaf index row size, substituting the
average number of bytes in the variable-length index keys for the
sum of bytes in the variable-length index keys.
In Step 7
Use the average leaf index row size in the first division procedure.
In Step 8
Use the average leaf index row size.
In Step 9
Substitute this formula for the first formula in Step 9, using the
maximum and averages calculated in Step 6:
SQL Server cannot store more than 256 data or index rows on a page.
Even if your rows are extremely short, the minimum number of data
pages will be:
Number of rows / 256 = Minimum number of data pages
Each text or image column stores a 16-byte pointer in the data row
with the datatype varbinary(16). Each text or image column that is
initialized requires at least 2K (one data page) of storage space.
text and image columns are designed to store “implicit” null values,
meaning that the text pointer in the data row remains null, and there
is no text page initialized for the value, saving 2K of storage space.
If a text or image column is defined to allow null values, and the row
is created with an insert statement that includes NULL for the text or
image column, the column is not initialized, and the storage is not
allocated.
If a text or image column is changed in any way with update, then the
text page is allocated. Of course, inserts or updates that place actual
data in a column initialize the page. If the text or image column is
subsequently set to NULL, a single page remains allocated.
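For example, with a hypothetical notes table, an insert that supplies
NULL for the text column allocates no text page, while a subsequent
update initializes one:
create table notes
    (note_id   int  not null,
     note_text text null)
insert notes (note_id, note_text)
values (1, NULL)       /* no text page is initialized */
update notes
set note_text = "Storage is allocated now."
where note_id = 1      /* this update allocates a 2K text page */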
Each text or image page stores 1800 bytes of data. To calculate the
number of text chain pages that a particular entry will use, use this
formula:
Data length / 1800 = Number of 2K pages
The result should be rounded up in all cases; that is, a data length of
1801 bytes requires two 2K pages.
Introduction
This chapter introduces the basic query analysis tools that can help
you choose appropriate indexes, and discusses index selection
criteria for point queries, range queries, and joins.
This chapter discusses:
• Indexing and performance
• Index limits
• Tools for index tuning, especially set statistics io
• How to estimate I/O
• How indexes are used to avoid sorting
• How to choose indexes
• The distribution page
• Maintaining indexes
• Additional tips and techniques
Underlying Problems
➤ Note
The optimizer costs both physical and logical I/O for the query, as well as
other costs. The rule of thumb above does not describe how the optimizer
determines query costs.
Large index entries cause large indexes, so try to keep them as small
as possible. You can create indexes with keys up to 256 bytes, but
these indexes can store very few rows per index page, which
increases the amount of disk I/O needed during queries. The index
has more levels, and each level has more pages. Nonmatching index
scans can be very expensive.
The following example uses sp_estspace to demonstrate how the
number of index pages and leaf levels required increases with key
size. It creates nonclustered indexes using 10-, 20-, and 40-character
keys.
create table demotable (c1 char(10),
c2 char(20),
c4 char(40))
create index t1 on demotable(c1)
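The output fragment below could come from a call such as the
following; the row count of 500,000 is an assumed value, since the
original call is not shown:
sp_estspace demotable, 500000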
Total_Mbytes
-----------------
83.58
Tool Function
set showplan on Shows the query plan for a query, including the indexes
selected, join order, and worktables. See Chapter 8,
“Understanding Query Plans.”
set statistics io on Shows how many logical and physical reads and writes
are performed to process the query. See “Indexes and I/O
Statistics” on page 6-8.
set statistics time on Shows how long it takes to execute the query.
set noexec on Usually used with set showplan on, this command
suppresses execution of the query. You see the plan the
optimizer would choose, but the query is not executed.
noexec is useful when the query would return very long
results or could cause performance problems on a
production system. Note that output from statistics io is
not shown when noexec is in effect (since the query does
not perform I/O).
dbcc traceon (302) This special trace flag lets you see the calculations the
optimizer uses to determine whether indexes should be
used. See “dbcc traceon 302” on page 9-14.
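For example, to examine a query plan without executing the query
(the select statement here is illustrative):
set showplan on
set noexec on
go
select au_lname, au_fname
from authors
where city = "Oakland"
go
set noexec off
go
Since set commands are still processed while noexec is in effect, the
final set noexec off restores normal execution.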
Tool Function
sp_configure fillfactor Sets or displays the default fillfactor for index
pages.
sp_help, sp_helpindex Provides information on indexes that exist for a
table.
sp_estspace Provides estimates of table and index size, the
number of pages at each level of an index, and
the time needed to create each index.
sp_spaceused Provides information about the size of a table and
its indexes.
update statistics Updates the statistics kept about distribution and
density of keys in an index.
Tool Function
set forceplan Forces the query to use the tables in the order
specified in the from clause.
set table count Increases the number of tables optimized at once.
select, delete, update clauses Specifies the index, I/O size, or cache strategy to
(index...prefetch...mru|lru) use for the query.
set prefetch Toggles prefetch for query tuning
experimentation.
sp_cachestrategy Sets status bits to enable or disable prefetch and
fetch-and-discard cache strategy.
[Figure: Query processing steps (parse, optimize, compile, execute)
and where the tuning tools intervene. set forceplan on, set table count,
select...index...prefetch...mru|lru, and set prefetch size change the
optimizer's choices; dbcc traceon(302) lets you "eavesdrop" on the
optimizer as it works; set showplan on displays the compiled query
plan.]
Use the system procedure sp_sysmon (or the separate product, SQL
Server Monitor) as you work on index tuning. Look at the output for
improved cache hit ratios, a reduction in the number of physical
reads, and fewer context switches for physical reads.
For more information about using sp_sysmon, see Chapter 19,
“Monitoring SQL Server Performance with sp_sysmon,” especially
the section “Index Management” on page 19-32.
Output Meaning
scan count Number of times an index or table was searched
logical reads Number of times a page is referenced in cache
physical reads Number of reads performed from disk
Total writes Number of writes to disk
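To produce this kind of output, enable the option for your session
before running a query; for example (the query is borrowed from a
later example in this guide):
set statistics io on
go
select au_lname, au_fname
from authors
where city = "Oakland"
go
set statistics io off
go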
Scan Count
The scan count shows the number of times a table or index was used
in the query. It does not necessarily mean that a table scan was
performed. A scan can represent any of these access methods:
• A table scan.
• An access via a clustered index. Each time the query starts at the
root page of the index, and follows pointers to the data pages, it is
counted.
• An access via a nonclustered index. Each time the query starts at
the root page of the index, and follows pointers to the leaf level of
the index (for a covered query) or to the data pages, it is counted.
You need to use showplan, as described in Chapter 8, “Understanding
Query Plans,” to determine which access method is used.
With 2K I/O, the number of times that a page is found in cache for a
query is logical reads minus physical reads. When you see output
like this:
logical reads: 624, physical reads: 624
it means that all of the pages for a table had to be read from disk.
Often, when indexes are used to access a table, or when you are
rerunning queries during testing, statistics io reports a combination of
logical and physical reads.
Physical reads are not reported in pages, but are reported as the
actual number of times SQL Server needs to access the disk. If the
query uses 16K I/O (as reported by showplan), a single physical read
brings 8 data pages into cache. If a query reports 100 16K physical
reads, it has read 800 data pages. If the query needs to scan each of
those data pages, it reports 800 logical reads. If a query, such as a join
query, must read the page multiple times because other I/O has
flushed the page from the cache, each read is counted.
Reads and writes are also reported for any worktable that needs to be
created for the query. When a query creates more than one
worktable, the worktables are numbered in statistics io output to
correspond to the worktable numbers used in showplan output.
If you are testing a query and checking its I/O, and you execute the
same query a second time, you may get surprising physical reads
results if the query uses LRU replacement strategy. The first
execution reports a high number of physical reads, while the second
attempt reports 0 reads. However, this does not mean that your
tuning efforts have been instantly successful.
The first time you execute the query, all the data pages are read into
cache and remain there until some other server process flushes
them from the cache. Depending on the cache strategy used for the
query, the pages may remain in cache for a longer or shorter time.
• If the query performs fetch-and-discard (MRU) caching, the
pages are read into the cache at the wash marker. In small or very
active caches, pages read into the cache at the wash marker are
flushed fairly quickly.
• If the query reads the pages in at the top of the MRU/LRU chain,
the pages remain in cache for much longer periods of time. This is
especially likely to happen if you have a large data cache and the
activity on your server is low.
For more information on testing and cache performance, see “Testing
Data Cache Performance” on page 15-10.
Estimating I/O
Checking the output from set statistics io provides information when
you actually execute a query. However, if you know the approximate
size of your tables and indexes, you can make I/O estimates without
running queries. Once you develop this knowledge of the size of
your tables and indexes, and the number of index levels in each
index, you can quickly determine whether I/O performance for a
query is reasonable, or whether a particular query needs tuning
efforts.
Following are some guidelines and formulas for making these
estimates.
Table Scans
• Fetch-and-discard (MRU) replacement strategy: Pages are read
into the cache at the wash marker. They remain in cache for a
short time, and do not tend to flush other more heavily used
pages out of cache.
• LRU replacement strategy: Pages replace a least-recently-used
buffer and are placed on the most-recently-used end of the chain.
They remain in cache until other disk I/O flushes them from the
cache.
Table scans are performed:
• When no index exists on the columns used in the query.
• When the optimizer chooses not to use an index. It makes this
choice when it determines that using the index is more expensive
than a table scan. This is more likely with nonclustered indexes.
The optimizer may determine that it is faster to read the table
pages directly than it is to go through several levels of indexes for
each row that is to be returned.
As a rule of thumb, table scans are chosen over nonclustered
index access when the query returns more rows than there are
pages in the table when using 2K I/O, and more rows than
pages divided by 8 when using 16K I/O.
For example, if sp_estspace gives table size of 76,923 pages and your
system reads 50 pages per second into 2K buffers, the time to execute
a table scan on the table is:
76923 pages/50 reads per second = 1538 seconds, about 25
minutes
If your cache can use 16K buffers, the value is:
76,923 pages/8 pages per read = 9615 reads
9615 reads/50 reads per second = 192 seconds, about 3 minutes
The speed could improve if some of the data were in cache.
A point query that uses an index performs one I/O for each index
level plus one read for the data page. In a frequently used table, the
root page and intermediate pages of indexes are often found in
cache, so that physical I/O is lower by one or two reads.
If a query returns 150 rows, and the table has 10 rows per page, the
query needs to read 15 data pages, plus the needed index pages. If
the query uses 2K I/O, it requires 15 or 16 I/Os for the data pages,
depending on whether the range starts in the middle of a page. If
your query uses 16K I/O, these 15 data pages require a minimum of
2 or 3 I/Os for the data. 16K I/O reads entire extents in a single
I/O, so 15 pages might occupy 2 or 3 extents if the page chains are
contiguous in the extents. If the page chains are not contiguous,
because the table has been frequently updated, the query could
require as many as 15 or 16 16K I/Os to read the entire range.
[Figure 6-6: Computing reads for a covering nonclustered index range query]
The optimizer estimates that a range query that returns 500 rows,
with an index structure of 3 levels and 100 rows per page on the leaf
level of the nonclustered index, requires 507 or 508 I/Os:
• 1 read for the root level and 1 read for the intermediate level
• 5 or 6 reads for the leaf level of the index
• 500 reads for the data pages
Although it is possible that some of the rows in the result set will be
found on the same data pages, or that they will be found on data
pages already in cache, this is not predictable. The optimizer costs a
physical I/O for each row to be returned, and if this estimate exceeds
the cost of a table scan, it chooses the table scan. If the table in this
example has fewer than 508 pages, the optimizer chooses a table scan.
If the data is clustered in the order required by the sort, the sort is not
needed and is not performed.
select fname, lname, id
from employees
order by lname
[Figure: A clustered index on lname. Because the data pages are
already stored in order by lname, the rows can be returned without a
sort step.]
The following range query returns about 2000 rows. It can use a
clustered index on title_id to reduce I/O:
select * from titles
where title_id between ’T43’ and ’T791’
order by title_id
Table: titles scan count 1, logical reads: 246, physical reads: 246
Total writes for this command: 0
Since the data is stored in ascending order, a query requiring
descending sort order (for example, order by title_id desc) cannot use
any indexes, but must sort the data.
When all the columns named in the select list, the search arguments,
and the order by clause are included in a nonclustered index, SQL
Server uses the leaf level of the nonclustered index to retrieve the
data and does not have to read the data pages.
If the sort is in ascending order, and the order by columns form a prefix
subset of the index keys, the rows are returned directly from the
nonclustered index leaf pages.
If the sort is in descending order, or the columns do not form a prefix
subset of the index keys, a worktable is created and sorted.
With an index on au_lname, au_fname, au_id of the authors table, this
query can return the data directly from the leaf pages:
select au_id, au_lname
from authors
order by au_lname, au_fname
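By contrast, requesting the same columns in descending order
cannot be satisfied directly from the leaf pages. An illustrative
variant that creates and sorts a worktable:
select au_id, au_lname
from authors
order by au_lname desc, au_fname desc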
Choosing Indexes
Questions to ask when working with index selection are:
• What indexes are associated currently with a given table?
• What are the most important processes that make use of the
table?
• What is the overall ratio of select operations to data modifications
performed on the table?
• Has a clustered index been assigned to the table?
• Can the clustered index be replaced by a nonclustered index?
• Do any of the indexes cover one or more of the critical queries?
• Is a composite index required to enforce the uniqueness of a
compound primary key?
• What indexes can be defined as unique?
• What are the major sorting requirements?
• Do the indexes support your joins, including those referenced by
triggers and referential integrity constraints?
• Does indexing affect update types (direct vs. deferred)?
• What indexes are needed for cursor positioning?
• If dirty reads are used, are there unique indexes to support the
scan?
• Should IDENTITY columns be added to tables and indexes to
generate unique indexes? (Unique indexes are required for
updatable cursors and dirty reads.)
When deciding how many indexes to use, consider:
• Space constraints
• Access paths to table
• Percentage of data modifications vs. select operations
• Performance requirements of reports vs. OLTP
• Performance impacts of index changes
• How often you can update statistics
➤ Note
If the primary key is a monotonically increasing value, placing a clustered
index on this key can cause contention for the data page where the inserts
take place. This severely limits concurrency. Be sure that your clustered
index key randomizes the location of inserts.
Also, remember that with each insert, all nonclustered indexes have
to be updated, so there is a performance price to pay. The leaf level
of each nonclustered index has one entry per row, so every insert
adds an entry to each of these indexes.
All nonclustered indexes need to be updated:
• For each insert into the table.
• For each delete from the table.
• For any update to the table that changes any part of an index’s
key, or that deletes a row from one page and inserts it on another
page.
• For almost every update to the clustered index key. Usually, such
an update means that the row moves to a different page.
• For every data page split.
If your needs analysis shows that more than one column would
make a good candidate for a clustered index key, you may be able to
Small index entries yield small indexes, producing less index I/O to
execute queries. Longer keys produce fewer entries per page, so an
index requires more pages at each level, and in some cases,
additional index levels.
[Figure 6-9: Sample rows for small and large index entries]
Index Choices                Index Pages   Range Query on price           Point Query on title

1  Nonclustered on title     36,800        Clustered index, about         Nonclustered index, 6 I/Os
   Clustered on price        650           26,600 pages (140,000 * .19)
                                           With 16K I/O: 3,125 I/Os

2  Clustered on title        3,770         Table scan, 140,000 pages      Clustered index, 6 I/Os
   Nonclustered on price     6,076         With 16K I/O: 17,500 I/Os

3  Nonclustered on           36,835        Nonmatching index scan,        Nonclustered index, 5 I/Os
   title, price                            about 35,700 pages
                                           With 16K I/O: 4,500 I/Os

4  Nonclustered on           36,835        Matching index scan, about     Nonmatching index scan,
   price, title                            6,800 pages (35,700 * .19)     about 35,700 pages
                                           With 16K I/O: 850 I/Os         With 16K I/O: 4,500 I/Os
Index Statistics
When you create an index on a table that contains data, SQL Server
creates a distribution page containing two kinds of statistics about
index values:
• A distribution table
• A density table
An index’s distribution page is created when you create an index on
a table that contains data. If you create an index on an empty table,
no distribution page is created. If you truncate the table (removing all
of its rows) the distribution page is dropped.
The data on the distribution page is not automatically maintained by
SQL Server. You must run the update statistics command to update the
data. You should run this command:
• When you feel that the distribution of the keys in an index has
changed
• If you truncate a table and reload the data
• When you determine that query plans may be less optimal due to
incorrect statistics
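Run it by naming the table, or a table and one of its indexes. For
example (using table and index names that appear elsewhere in this
guide):
update statistics titles
go
update statistics titles title_ix
go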
index, the density uses 2 bytes of storage. The rest of the page is
available to store the steps. Figure 6-10 shows how to compute the
number of steps that will be stored on the distribution page. Fixed-
length columns have 2 bytes of overhead per step; variable-length
columns have 7 bytes of overhead per step.
Fixed-Length Key
Number of steps = (2016 - (Number of keys * 2)) / (Bytes per key + 2)
Variable-Length Key
Number of steps = (2016 - (Number of keys * 2)) / (Bytes per key + 7)
SQL Server also stores a density value for each prefix of columns in
composite indexes. That is, for an index
on columns A, B, C, D, it stores the density for:
• A
• A, B
• A, B, C
• A, B, C, D
If density statistics are not available, the optimizer uses default
percentages, as shown in Table 6-7.
When the optimizer checks for a value in the distribution table, it will
find that one of these conditions holds:
• The value falls between two consecutive rows in the table.
• The value equals one row in the middle of the table.
• The value equals the first row or the last row in the table.
• The value equals more than one row in the middle of the table.
• The value equals more than one row, including the first or last
row in the table.
• The value is less than the first row, or greater than the last row in
the table. (In this case, you should run update statistics.)
Depending on which cases match the query, the optimizer uses
formulas involving the step location (beginning, end, or middle of
the page), the number of steps, the number of rows in the table, and
the density to compute an estimated number of rows.
The optimizer uses the density table to help compute the number of
rows that a query will return. Even if the value of a search argument
is not known when the query is optimized, SQL Server can use the
density values in an index, as long as the leading column or columns
are specified for composite indexes.
Index Maintenance
Indexes should evolve as your system evolves.
• Over time, indexes should be based on the transactions and
processes that are being run, not on the original database design.
• Drop and rebuild indexes only if they are hurting performance.
• Keep index statistics up to date.
➤ Note
Failure to update statistics can severely hurt performance.
Rebuilding Indexes
➤ Note
The sorted_data option copies the entire data level of a clustered index, so
you need approximately 120 percent of the space required for the table
available in your database.
• Heap table
• Table that contains text or image columns.
The contents are maintained by SQL Server. To display index
information, use sp_helpindex.
Table 6-8: Page pointers for unpartitioned tables in the sysindexes table
The fillfactor option to create index lets you specify how full to create
index pages and the data pages of clustered indexes. Figure 6-12
illustrates a table with a fillfactor of 50 percent.
[Figure 6-12: Table and clustered index with fillfactor set to 50 percent]
If you are creating indexes for tables that will grow in size, you can
reduce the impact of page splitting on your tables and indexes by
using the fillfactor option for create index. Note that the fillfactor is used
only when you create the index; it is not maintained over time. The
purpose of fillfactor is to provide a performance boost for tables that
will experience growth; maintaining that fillfactor by continuing to
split partially full pages would defeat the purpose.
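For example, to leave 25 percent free space on index and data pages
at creation time, you might create the index as follows (the index
name is illustrative):
create index title_ix
on titles (title)
with fillfactor = 75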
When you use fillfactor, except for a fillfactor value of 100 percent, data
and index rows are spread out across the disk space for the database
farther than they are by default.
If you use fillfactor, especially a very low fillfactor, you may notice these
effects on queries and maintenance activities:
• More pages must be read for each query that does a table scan or
leaf-level scan on a nonclustered index. In some cases, it may also
add a level to an index’s B-tree structure, since there will be more
pages at the data level and possibly more pages at each index
level.
[Figure: Query processing steps. A query is parsed and normalized,
optimized, compiled, and executed, drawing on the indexes, tables
(titles, authors, titleauthor), and caches to produce results.]
The goal of the optimizer is to select the access method that reduces
the total time needed to process a query. The optimizer bases its
choice on the contents of the tables being queried and other factors
such as cache strategies, cache size, and I/O size. Since disk access is
generally the most expensive operation, the most important task in
optimizing queries is to provide the optimizer with appropriate
index choices, based on the transactions to be performed.
SQL Server’s cost-based query optimizer has evolved over many
years, taking into account many different issues. However, because
of its general-purpose nature, the optimizer may select a query plan
that is different from the one you expect. In certain situations, it may
make the incorrect choice of access methods. In some cases, this may
be the result of inaccurate or incomplete information. In other cases,
additional analysis and the use of special query processing options
can determine the source of the problem and provide solutions or
workarounds. Chapter 9, “Advanced Optimizing Techniques,”
describes additional tools for debugging problems like this.
You can use many of these options at the same time, but some of
them suppress others, as described below.
showplan, statistics io, and other commands produce their output while
stored procedures are run. The system procedures that you might
use for checking table structure or indexes as you test optimization
strategies can produce voluminous output. You may want to have
hard copies of your table schemas and index information or you can
use separate windows for running system procedures such as
sp_helpindex.
For longer queries and batches, you may want to save showplan and
statistics io output in files. The “echo input” flag to isql echoes the input
into the output file, with line numbers included. The syntax is:
Novell NetWare
load isql -P password -e -i input_file
-o outputfile
VMS
isql /password = password
/echo
/input = inputfile
/output = outputfile
While showplan and noexec make useful companions, noexec stops all
the output of statistics io. The statistics io command reports actual disk
I/O; while noexec is on, no I/O takes place, so the reports are not
printed.
set statistics time displays information about the time it takes to execute
SQL Server commands. It prints these statistics:
• Parse and compile time – the number of CPU ticks taken to parse,
optimize, and compile the query.
• Execution time – the number of CPU ticks taken to execute the
query.
• SQL Server CPU time – the number of CPU ticks taken to execute
the query, converted to milliseconds.
To see the clock_rate for your system, execute:
sp_configure "sql server clock tick length"
See “sql server clock tick length” on page 11-96 of the System
Administration Guide for more information.
• SQL Server elapsed time – the elapsed time is the difference
between the time the command started and the current time, as
taken from the operating system clock, in milliseconds.
The following formula converts ticks to milliseconds:
Milliseconds = (CPU_ticks * clock_rate) / 1000
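For example, with the 100,000-microsecond clock tick length implied
by the output below (1 tick = 100 ms.), 120 ticks convert to
(120 * 100,000) / 1000 = 12,000 milliseconds.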
The following output shows that the query was parsed and compiled
in one clock tick, or 100 ms. It took 120 ticks, or 12,000 ms., to execute.
Total elapsed time was 17,843 ms., indicating that SQL Server spent
some time processing other tasks or waiting for disk or network I/O
to complete.
Parse and Compile Time 1.
SQL Server cpu time: 100 ms.
Execution Time 120.
SQL Server cpu time: 12000 ms.  SQL Server elapsed time: 17843 ms.
type
------------ ------------------------
UNDECIDED 210,500.00
business 256,000.00
cooking 286,500.00
news 266,000.00
Optimizer Strategies
The following sections explain how the optimizer analyzes these
specific types of queries:
• Search arguments in the where clause
• Joins
• Queries using or clauses and the in (values_list) predicate
• Aggregates
• Subqueries
• Updates
The optimizer looks for SARGs in the where clauses of a query and for
indexes that match the columns. If your query uses one or more of
these clauses to scan an index, you will see the showplan output “Keys
are: <keylist>” immediately after the index name information. If you
think your query should be using an index, and it causes table scans
instead, look carefully at the search clauses and operators.
Clause Conversion
between Converted to >= and <= clauses.
like If the first character in the pattern is a constant, like clauses can
be converted to greater than or less than queries. For example,
like "sm%" becomes >= "sm" and < "sn". The expression like "%x" is
not optimizable.
expressions If the right-hand portion of the where clause contains arithmetic
expressions that can be converted to a constant, the optimizer
can use the density values, and may use the index, but cannot
use the distribution table on the index.
Use these guidelines when you write search arguments for your
queries:
• Avoid functions, arithmetic operations, and other expressions on
the column side of search clauses.
• Avoid incompatible datatypes.
• Use the leading column of a composite index. Search clauses that
name only the secondary keys of a composite index gain much
less from the index.
• Use all the search arguments you can to give the optimizer as
much as possible to work with.
• Check showplan output to see which keys and indexes are used.
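For example, assuming an index on price, the first query below
presents an optimizable search argument, while the second performs
arithmetic on the column side, so the clause cannot be used to
position an index scan:
select title_id
from titles
where price > 20

select title_id
from titles
where price * 2 > 40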
Figure 7-3 shows how predicates are applied by the optimizer and in
query execution, and questions to ask when examining predicates
and index choices.
[Figure 7-3: How the optimizer applies predicates, and questions to
ask about each one. Is there an index on the column? If not, can an
index be created or changed so that the query requires fewer I/Os
than with existing indexes? Is this the best index available for the
query? Is performance acceptable using this index, or does more
work on indexing need to be done? When the index is used to
retrieve data pages and qualify rows: how many rows did it qualify,
and how many I/Os did it require? When a predicate is used to
qualify rows during query execution: how many of the rows
qualified by the index are not qualified by the predicate?]
Optimizing Joins
Joins pull information from two or more tables, requiring nested
iterations on the tables involved. In a two-table join, one table is
treated as the outer table; the other table becomes the inner table.
SQL Server examines the outer table for rows that satisfy the query
conditions. For each row that qualifies, SQL Server must then
examine the inner table, looking at each row where the join columns
match.
Optimizing the join columns in queries is extremely important.
Relational databases make extremely heavy use of joins. Queries that
perform joins on several tables are especially critical, as explained in
the following sections.
Some subqueries are also converted to joins. These are discussed on
page 7-27.
Join Syntax
The process of creating the result set for a join is to nest the tables,
and to scan the inner tables repeatedly for each qualifying row in the
outer table.
Table A:                          Table B:
1,000,000 rows                    100,000 rows
10 rows per page                  10 rows per page
100,000 pages                     10,000 pages
No index                          Clustered index on join column
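The discussion that follows assumes a join of this general form,
where col1 is the join column named in the text and the select list is
illustrative:
select *
from TableA, TableB
where TableA.col1 = TableB.col1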
If TableA is the outer table, it is accessed via a table scan. When the
first qualifying row is found, the clustered index on TableB is used to
find the row or rows where TableB.col1 matches the value retrieved
from TableA. When that completes, the scan on TableA continues until
another match is found. The clustered index is used again to retrieve
the next set of matching rows from TableB. This continues until TableA
has been completely scanned. If 10 rows from TableA match the
search criteria, the number of page reads required for the query is:
Pages Read
Table scan of TableA 100,000
10 clustered index accesses of TableB + 30
Total 100,030
If TableB is the outer table, the clustered index is used to find the first
row that matches the search criteria. Then, TableA is scanned to find
the rows where TableA.col1 matches the value retrieved from TableB.
When the table scan completes, another row is read from the data
pages for TableB, and TableA is scanned again. This continues until all
matching rows have been retrieved from TableB. If there are 10 rows
in TableB that match, this access choice would require the following
number of page reads:
Pages Read
1 clustered index access of TableB + 3
10 table scans of TableA 1,000,000
Total 1,000,003
For any join using an index, the optimizer uses a statistic called the
density to help optimize the query. The density is the average
proportion of duplicate keys in the index. It varies between 0 percent
and 100 percent. An index whose keys are all duplicates of each other
will have a density of 100 percent, while an index with N rows,
whose keys are all unique, will have a density of 1/N.
The query optimizer uses the density to estimate the number of rows
that will be returned for each scan of the inner table of a join for a
particular index. For example, if the optimizer is considering a join
with a 10,000-row table, and an index on the table has a density of 25
percent, the optimizer would estimate 2500 rows per scan for a join
using that index.
SQL Server maintains a density for each prefix of columns in
composite indexes. That is, it keeps a density on the first column, the
first and second columns, the first, second, and third columns, and so
on, up to and including the entire set of columns in the index. The
optimizer uses the appropriate density for an index when estimating
the cost of a join using that index. In a 10,000-row table with an index
on seven columns, the entire seven-column key might have a density
of 1/10,000, while the first column might have a density of only 1/2,
indicating that it would return 5000 rows.
The densities on an index are part of the statistics that are maintained
by the create index and update statistics commands.
If statistics are not available, the optimizer uses default percentages:
10 percent for equality search arguments, 25 percent for closed-range
searches, and 33 percent for open-range searches.
Join Permutations
When you are joining four or fewer tables, SQL Server considers all
possible permutations of those tables. It establishes this cutoff
because the number of permutations of join orders multiplies with
each additional table, requiring lengthy computation time for large
joins.
The method the optimizer uses to determine join order has excellent
results for most queries with much less CPU time than examining all
permutations of all combinations. The set table count command allows
you to specify the number of tables that the optimizer considers at
once. See “Increasing the Number of Tables Considered by the
Optimizer” on page 9-7.
Changing the order of the tables in the from clause normally has no
effect on the query plan, even for queries that join more than four
tables.
When you have more than four tables in the from clause, SQL Server
optimizes each subset of four tables. Then, it remembers the outer
table from the best plan involving four tables, eliminates it from the
set of tables in the from clause, and optimizes the best set of four tables
out of the remaining tables. It continues until only four tables remain,
at which point it optimizes those four tables normally.
For example, suppose you have a select statement with the following
from clause:
from T1, T2, T3, T4, T5, T6
The optimizer looks at all possible sets of 4 tables taken from these 6
tables. The 15 possible combinations of all 6 tables are:
T1, T2, T3, T4
T1, T2, T3, T5
T1, T2, T3, T6
T1, T2, T4, T5
T1, T2, T4, T6
T1, T2, T5, T6
T1, T3, T4, T5
T1, T3, T4, T6
T1, T3, T5, T6
T1, T4, T5, T6
T2, T3, T4, T5
T2, T3, T4, T6
T2, T3, T5, T6
T2, T4, T5, T6
T3, T4, T5, T6
For each one of these combinations, the optimizer looks at all the join
orders (permutations). For example, for the set of tables T2, T3, T5,
T6, there are 24 possible join orders or permutations for this
combination of 4 tables. SQL Server looks at these 24 possible orders:
T2, T3, T5, T6
T2, T3, T6, T5
T2, T5, T3, T6
T2, T5, T6, T3
T2, T6, T3, T5
T2, T6, T5, T3
T3, T2, T5, T6
T3, T2, T6, T5
T3, T5, T2, T6
T3, T5, T6, T2
T3, T6, T2, T5
T3, T6, T5, T2
T5, T2, T3, T6
T5, T2, T6, T3
T5, T3, T2, T6
T5, T3, T6, T2
T5, T6, T2, T3
T5, T6, T3, T2
T6, T2, T3, T5
T6, T2, T5, T3
T6, T3, T2, T5
T6, T3, T5, T2
T6, T5, T2, T3
T6, T5, T3, T2
The fact that the optimizer examines all permutations of every
combination of four tables that appear in the from clause makes the
order of tables in the from clause irrelevant.
The only time that the order of tables in the from clause can make any
difference is when the optimizer comes up with the same cost
estimate for two join orders. In that case, it chooses the first of the two
join orders that it encounters. The order of tables in the from clause
affects the order in which the optimizer evaluates the join orders, so
in this one case, it can have an effect on the query plan. Notice that it
does not have an effect on the query cost, or on the query
performance.
or syntax
The optimizer estimates the cost of index access for each clause in the
query. For queries with or clauses on different columns in the same
table, SQL Server can choose to use a different index for each clause.
The query uses a table scan if either of these conditions is true:
• The cost of all the index accesses is greater than the cost of a table
scan.
• At least one of the clauses names a column that is not indexed, so
the only way to resolve the clause is to perform a table scan.
Queries in cursors cannot use the OR strategy, and must perform a
table scan. However, queries in cursors can use the multiple
matching index scans strategy.
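For example, assuming separate indexes on price and type, the
optimizer can cost a separate index access for each clause of this
illustrative query:
select title_id, price, type
from titles
where price < 5.00
or type = "business"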
Optimizing Aggregates
Aggregates are processed in two steps:
• First, appropriate indexes are used to retrieve the appropriate
rows, or the table is scanned. For vector (grouped) aggregates, the
results are placed in a worktable. For scalar aggregates, results
are computed in a variable in memory.
• Second, the worktable is scanned to return the results for vector
aggregates, or the results are returned from the internal variable.
In many cases, aggregates can be optimized to use a composite
nonclustered index on the aggregated column and the grouping
column, if any, rather than performing table scans.
For example, if the titles table has a nonclustered index on type, price,
the following query retrieves its results from the leaf level of the
nonclustered index:
select type, avg(price)
from titles
group by type
Table 7-3 shows some of the other optimization methods for aggregates.
Optimizing Subqueries
➤ Note
This section describes SQL Server release 11.0 subquery processing. If
your stored procedures, triggers, and views were created on SQL Server
prior to release 11.0 and have not been dropped and re-created, they may
not use the same processing. See sp_procqmode in the SQL Server
Reference Manual for more information on determining the processing
mode.
When a subquery computes a scalar aggregate, SQL Server
processes it in two steps: the first step computes the aggregate into
an internal variable, and the second step substitutes that variable
into the outer query's search clause:
select title_id
from titles
where total_sales = <internal_variable>
The search clause in the second step of this transformation can be
optimized. If there is an index on total_sales, the query can use it.
Short Circuiting
When and joins the clauses, the evaluation of the list stops as soon as
any clause evaluates to FALSE.
This query contains two and clauses in addition to the subquery:
select au_fname, au_lname, title, royaltyper
from titles t, authors a, titleauthor ta
where t.title_id = ta.title_id
and a.au_id = ta.au_id
and advance >= (select avg(advance)
from titles t2
where t2.type = t.type)
and price > 100
and au_ord = 1
SQL Server orders the execution steps to evaluate the subquery last.
If a row does not meet an and condition, SQL Server discards the row
without checking any more and conditions and begins to evaluate the
next row, so the subquery is not processed unless the row meets all of
the and conditions.
Update Operations
SQL Server handles updates in different ways, depending on the
changes being made to the data and the indexes used to locate the
rows. The two major types of updates are deferred updates and
direct updates.
Direct Updates
In-Place Updates
[Figure 7-8: An update that changes the length of a row on the page.
Row lengths in bytes, before and after the update:
Row   Before   After
1     24       24
2     20       100
3     60       60
4     48       48]
The update in Figure 7-8 changes the length of the second row from
20 to 100 bytes, so the row offsets change for the rows that follow it
on the page.
Cheap direct updates are almost as fast as in-place updates. They
require the same amount of I/O, but more processing. Two changes
are made to the data page (the row and the offset table). Any changed
index keys are updated by deleting old values and inserting new
values. This affects only indexes whose keys change, since the page
and row ID do not change.
If the data does not fit on the same page, SQL Server performs an
expensive direct update, if possible. An expensive direct update
deletes the data row, including all index entries, and then inserts the
modified row and index entries.
SQL Server uses a table scan or index to find the row in its original
location and then deletes the row. If the table has a clustered index,
SQL Server uses the index to determine the new location for the row;
otherwise, SQL Server inserts the new row at the end of the heap.
For an expensive direct update, all the following requirements must
be met:
• The length of a data row changes so that the row no longer fits on
the same data page and the row needs to move to a different page,
or the update affects key columns for the clustered index.
• The index used to find the row is not changed by the update.
• The update statement does not include a join.
• The affected columns are not used for referential integrity.
[Figure: An expensive direct update. The row is deleted from one
data page and the modified row is inserted on a different data page.]
Expensive direct updates are the slowest type of direct update. The
delete is performed on one data page and the insert is performed on
a different data page. All index entries must be updated, since the
row location changes.
Deferred Updates
SQL Server uses deferred updates when direct update conditions are
not met. Deferred updates are the slowest type of update.
The steps involved in deferred updates are:
• Locate the affected data rows, writing the log records for deferred
delete and insert of the data pages as rows are located.
• Read the log records for the transaction. Perform the deletes on
the data pages and delete any affected index rows.
• At the end of the operation, re-read the log, and make all inserts
on the data pages and insert any affected index rows.
Deferred updates are always required for:
• Updates that use joins
• Updates to columns used for referential integrity
Some other situations that require deferred updates are:
• The update moves the row to a new page while the table is being
accessed via a table scan or clustered index.
• Duplicate rows are not allowed in the table, and there is no
unique index to prevent them.
• The index used to find the data row is not unique, and the row
moves because the update changes the clustered index key or
because the new row does not fit on the page.
Deferred updates incur more overhead than direct updates because
they require re-reading the transaction log to make the final changes
to the data and indexes. This involves additional traversal of the
index trees.
For example, if there is a clustered index on title, this query performs
a deferred update:
update titles set title = "Portable C Software"
where title = "Designing Portable Software"
update employee
set lname = "Hubbard"
where lname = "Green"
[Figure: A deferred index update. The nonclustered index on lname
locates the row for "Green" on its data page; the data page is then
changed, replacing "Green" with "Hubbard"; the affected index
entries are deleted and reinserted as the log is reread.]
Optimizing Updates
Table 7-4 shows the effects of index type on update mode for three
different updates: the update of a key column, a variable-length
column, and a fixed-length column. In all cases, duplicate rows are
not allowed. For the indexed cases, the index is on title_id.
update titles set [ title_id | var_len_col |
fixed_len_col ] = value
where title_id = "T1234"
This table shows how a unique index can promote a more efficient
update mode than a nonunique index on the same key. For example,
with a unique clustered index, all of these updates can be performed
in direct mode, but must be performed in deferred mode if the index
is not unique.
For tables with clustered indexes that are not unique, a unique index
on any other column in the table provides improved update
performance. In some cases, you may want to add an identity
column to a table in order to include it as a key in an index that
would otherwise be non-unique.
                                   Update to:
Indexing                           Variable-length key   Fixed-length column   Variable-length column
No index                           N/A                   direct                deferred_varcol
Clustered, unique                  direct                direct                direct
Clustered, not unique              deferred              deferred              deferred
Clustered, not unique, with a
unique index on another column     deferred              direct                deferred_varcol
Nonclustered, unique               deferred_varcol       direct                direct
Nonclustered, not unique           deferred_varcol       direct                deferred_varcol
If the key for a table is fixed length, the only difference in update
modes from those shown in the table occurs for nonclustered
indexes. For a nonclustered, non-unique index, the update mode is
deferred_index for updates to the key. For a nonclustered, unique
index, the update mode is direct for updates to the key.
You can use showplan to figure out if updates are deferred or direct,
but it does not give you more detailed information about the type of
deferred or direct update it is. Output from the system procedure
sp_sysmon (or the separate product, SQL Server Monitor) supplies
detailed statistics about the type of updates performed during a
sample interval.
Run sp_sysmon as you tune updates and look for reduced numbers of
deferred updates, reduced locking, and reduced I/O.
See “Transaction Profile” on page 19-22 in Chapter 19, “Monitoring
SQL Server Performance with sp_sysmon.”
For longer queries and batches, you may want to save output into
files. The “echo input” flag to isql echoes the input into the output file,
with line numbers included. The syntax is shown on page 7-6.
Using showplan
The set showplan on command is your main tool for understanding
how the optimizer executes your queries. The following sections
explore query plan output.
In this chapter, the discussion of showplan messages is divided into
four sections:
• Basic showplan messages—those you see when using fairly simple
select statements and data modification commands. See Table 8-1
on page 8-2.
• showplan messages for particular clauses, predicates, and so on,
such as group by, aggregates, or order by. See Table 8-2 on page 8-13.
• showplan messages describing access methods. See Table 8-3 on
page 8-23.
• showplan messages for subqueries. See Table 8-4 on page 8-36.
Each message is explained in detail under its own heading. The
message and related messages are shown in bold type in the showplan
output.
Step Message
STEP N
showplan output displays “STEP N” for every query, where N is an
integer, beginning with “STEP 1”. For some queries, SQL Server
cannot effectively retrieve the results in a single step and must break
the query plan into several steps. For example, if a query includes a
group by clause, SQL Server breaks it into at least two steps:
• One step to select the qualifying rows from the table and to group
them, placing the results in a worktable
• Another step to return the rows from the worktable
This example demonstrates a single-step query.
select au_lname, au_fname
from authors
where city = "Oakland"
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
Multiple-step queries are demonstrated under the group by command
on page 8-13 and in other places in this chapter.
STEP 1
The type of query is CREATE INDEX.
TO TABLE
titleauthor
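The next example shows a three-table join. The query itself is not
reproduced here; judging from the indexes and keys in the output, it
had roughly this shape (the search value is an assumption):
select au_lname, au_fname, title
from authors a, titleauthor ta, titles t
where a.au_id = ta.au_id
and ta.title_id = t.title_id
and a.au_lname = "Bennet"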
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Index : au_lname_ix
Ascending scan.
Positioning by key.
Keys are:
au_lname
Using I/O Size 2 Kbytes.
FROM TABLE
titleauthor
Nested iteration.
Index : ta_au_tit_ix
Ascending scan.
Positioning by key.
Index contains all needed columns. Base
table will not be read.
Keys are:
au_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
titles
Nested iteration.
Using Clustered Index.
Index : tit_id_ix
Ascending scan.
Positioning by key.
Keys are:
title_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
The sequence of tables in this output shows the order chosen by the
SQL Server query optimizer, which is not the order in which they
were listed in the from clause or where clause:
• First, the qualifying rows from the authors table are located (using
the search clause on au_lname).
• Those rows are then joined with the titleauthor table (using the
join clause on the au_id columns).
• Finally, the titles table is joined with the titleauthor table to retrieve
the desired columns (using the join clause on the title_id
columns).
STEP 1
The type of query is INSERT.
The update mode is direct.
FROM TABLE
titles
Using Clustered Index.
Index : tit_id_ix
Ascending scan.
Positioning by key.
Keys are:
title_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
salesdetail
The clustered index on title_id provided the best access method for
looking up the referenced value.
For queries that use worktables (discussed later), “TO TABLE”
indicates that the results are going to the “Worktable” table rather
than a user table. The following
examples illustrate the use of the “TO TABLE” statement:
insert sales
values ("8042", "QA973", "12/7/95")
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is INSERT.
The update mode is direct.
TO TABLE
sales
update publishers
set city = "Los Angeles"
where pub_id = "1389"
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is UPDATE.
The update mode is direct.
FROM TABLE
publishers
Nested iteration.
Index : pub_id_ix
Ascending scan.
Positioning by key.
Keys are:
pub_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
publishers
The second query indicates that the publishers table is used as both
the “FROM TABLE” and the “TO TABLE”. In the case of update
operations, the optimizer needs to read the table that contains the
row(s) to be updated, resulting in the “FROM TABLE” statement,
and then needs to modify the row(s), resulting in the “TO TABLE”
statement.
STEP 1
The type of query is DELETE.
The update mode is direct.
FROM TABLE
authors
Nested iteration.
Index : au_names
Ascending scan.
Positioning by key.
Keys are:
au_lname
au_fname
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
authors
Deferred Mode
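A statement that selects a table's rows and inserts them back into the
same table is processed in deferred mode. A minimal sketch, using
the table name from the output below:
insert mytable
select * from mytable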
STEP 1
The type of query is INSERT.
The update mode is deferred.
FROM TABLE
mytable
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
mytable
This command copies every row in the table and appends the rows
to the end of the table. The query processor needs to differentiate
between the rows that are currently in the table (prior to the insert
command) and the rows being inserted, so that it does not get into a
continuous loop of selecting a row, inserting it at the end of the table,
selecting the row that it just inserted, and re-inserting it again. The
query processor solves this problem by performing the operation in
two steps:
1. It scans the existing table and writes insert records into the
transaction log for each row that it finds.
2. When all the “old” rows have been read, it scans the log and
performs the insert operations.
Evaluate Grouped type AGGREGATE      The query contains an aggregate.          page 8-15
or                                   "Grouped" indicates that there is a
Evaluate Ungrouped type              grouping column for the aggregate         page 8-17
AGGREGATE                            (vector aggregate); "Ungrouped"
                                     indicates there is no grouping
                                     column. The variable indicates the
                                     type of aggregate.

Evaluate Grouped ASSIGNMENT          Query includes compute by (grouped)       page 8-16
OPERATOR or                          or compute (ungrouped).
Evaluate Ungrouped ASSIGNMENT
OPERATOR

WorktableN created for               The query contains a distinct keyword     page 8-20
DISTINCT.                            in the select list that requires a sort
                                     to eliminate duplicates.

WorktableN created for ORDER         The query contains an order by clause     page 8-21
BY.                                  that requires ordering rows.

This step involves sorting.          The query includes an order by or         page 8-22
                                     distinct clause, and results must be
                                     sorted.

Using GETSORTED.                     The query created a worktable and         page 8-23
                                     sorted it. GETSORTED is a particular
                                     technique used to return the rows.
This statement appears in the showplan output for any query that
contains a group by clause. Queries that contain a group by clause are
always executed in at least two steps:
• One step selects the qualifying rows into a worktable and groups
them.
• Another step returns the rows from the worktable.
STEP 1
The type of query is SELECT (into
Worktable1).
GROUP BY
Evaluate Grouped COUNT AGGREGATE.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
FROM TABLE
Worktable1.
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With MRU Buffer Replacement Strategy.
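A vector aggregate that computes an average produces both a
COUNT and a SUM OR AVERAGE message, since the average is
computed as a sum divided by a count. A query of roughly this
shape (an assumed reconstruction, since the original statement is not
shown) generates the following plan:
select type, avg(advance)
from titles
group by type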
STEP 1
The type of query is SELECT (into Worktable1).
GROUP BY
Evaluate Grouped COUNT AGGREGATE.
Evaluate Grouped SUM OR AVERAGE AGGREGATE.
FROM TABLE
titles
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
The type of query is SELECT.
FROM TABLE
Worktable1.
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With MRU Buffer Replacement Strategy.
In the first step, the worktable is created and the aggregates are
computed. The second step selects the results from the worktable.
compute by Message
Evaluate Grouped ASSIGNMENT OPERATOR
Queries using compute by display the same aggregate messages as
group by as well as the “Evaluate Grouped ASSIGNMENT
OPERATOR” message. The values are placed in a worktable in one
step, and the computation of the aggregates is performed in a second
step. This query uses type and advance, like the group by query
example:
select type, advance from titles
having title like "Compu%"
order by type
compute avg(advance) by type
In the showplan output, the computation of the aggregates takes place
in Step 2:
STEP 1
The type of query is INSERT.
The update mode is direct.
Worktable1 created for ORDER BY.
FROM TABLE
titles
Nested iteration.
Index : title_ix
Ascending scan.
Positioning by key.
Keys are:
title
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
The type of query is SELECT.
Evaluate Grouped SUM OR AVERAGE AGGREGATE.
Evaluate Grouped COUNT AGGREGATE.
Evaluate Grouped ASSIGNMENT OPERATOR.
This step involves sorting.
FROM TABLE
Worktable1.
Using GETSORTED
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With MRU Buffer Replacement Strategy.
Ungrouped Aggregates
STEP 1
The type of query is SELECT.
Evaluate Ungrouped COUNT AGGREGATE.
Evaluate Ungrouped SUM OR AVERAGE AGGREGATE.
FROM TABLE
titles
Nested iteration.
Index : tp
Ascending scan.
Positioning by key.
Keys are:
type
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
STEP 2
The type of query is SELECT.
Notice that showplan considers this a two-step query, which is similar
to the showplan from the group by query shown earlier. Since the scalar
aggregate returns a single value, SQL Server uses an internal variable
to compute the result of the aggregate function as the qualifying
rows from the table are evaluated. After all rows from the table have
been evaluated (Step 1), the final value from the variable is selected
(Step 2) to return the scalar aggregate result.
compute Messages
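A compute clause without by produces ungrouped aggregates, as the
plan below shows. A sketch resembling the earlier compute by
example, but with a plain compute, would generate it:

select type, advance from titles
where title like "Compu%"
order by type
compute avg(advance)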
STEP 1
The type of query is INSERT.
The update mode is direct.
Worktable1 created for ORDER BY.
FROM TABLE
titles
Nested iteration.
Index : titles_ix
Ascending scan.
Positioning by key.
Keys are:
title
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
The type of query is SELECT.
Evaluate Ungrouped SUM OR AVERAGE AGGREGATE.
Evaluate Ungrouped COUNT AGGREGATE.
Evaluate Ungrouped ASSIGNMENT OPERATOR.
This step involves sorting.
FROM TABLE
Worktable1.
Using GETSORTED
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With MRU Buffer Replacement Strategy.
Some queries that include distinct use a sort step to locate the
duplicate values in the result set. distinct queries and order by queries
can avoid the sorting step when the indexes used to locate rows
support the order by or distinct clause.
For those cases where the sort must be performed, the distinct
keyword in a select list and the order by clause share some showplan
messages:
• Each generates a worktable message
• The message “This step involves sorting.”
• The message “Using GETSORTED”
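For example, a distinct query with no index to support the
uniqueness check, such as this sketch:

select distinct city
from authors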
STEP 1
The type of query is INSERT.
The update mode is direct.
Worktable1 created for DISTINCT.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
The type of query is SELECT.
This step involves sorting.
FROM TABLE
Worktable1.
Using GETSORTED
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With MRU Buffer Replacement Strategy.
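The next plan shows the same pattern for an order by query; a sketch
like the following, assuming no index supports the ordering, would
produce it:

select au_fname, au_lname
from authors
order by au_fname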
STEP 1
The type of query is INSERT.
The update mode is direct.
Worktable1 created for ORDER BY.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
STEP 2
The type of query is SELECT.
This step involves sorting.
FROM TABLE
Worktable1.
Using GETSORTED
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With MRU Buffer Replacement Strategy.
The messages “This step involves sorting” and “Using
GETSORTED” are explained on page 8-22.
Sorting Message
This step involves sorting.
This showplan message indicates that the query must sort the
intermediate results before returning them to the user. Queries that
use distinct or that have an order by clause not supported by an index
require an intermediate sort. The results are put into a worktable,
and the worktable is then sorted. For examples of this message, see
“Worktable Message for distinct” on page 8-20 or “Worktable
Message for order by” on page 8-21.
“GETSORTED” Message
Using GETSORTED
This statement indicates one of the ways that SQL Server returns
result rows from a table. In the case of “Using GETSORTED,” the
rows are returned in sorted order. However, not all queries that
return rows in sorted order include this step. For example, order by
queries whose rows are retrieved using an index with a matching
sort sequence do not require “GETSORTED.”
The “Using GETSORTED” method is used when SQL Server must
first create a temporary worktable to sort the result rows and then
return them in the proper sorted order. The examples for distinct on
page 8-20 and for order by on page 8-21 show the “Using
GETSORTED” message.
Message                          Explanation                             See

Table Scan.                      Indicates that the query performs a     page 8-24
                                 table scan.

Using N Matching Index Scans     Indicates that a query with in or or    page 8-25
                                 is performing multiple index scans,
                                 one for each or condition or in list
                                 item.

Using Clustered Index.           Query uses the clustered index on       page 8-26
                                 the table.

Index : index_name               Query uses an index on the table;       page 8-27
                                 the variable shows the index name.

Ascending scan.                  Indicates the direction of the scan.    page 8-28
                                 All scans are ascending.

Positioning at start of table.   These messages indicate how scans       page 8-28
Positioning by Row IDentifier    are taking place.
(RID).
Positioning by key.
Positioning at index start.

Scanning only up to the first    These messages indicate min and max     page 8-29
qualifying row.                  optimization, respectively.
Scanning only the last page
of the table.

Index contains all needed        Indicates that the nonclustered         page 8-29
columns. Base table will not     index covers the query.
be read.

Keys are:                        Included when the positioning           page 8-31
                                 message indicates “Positioning by
                                 key.” The next line(s) show the
                                 index key(s) used.

Using Dynamic Index.             Reported during some queries using      page 8-31
                                 or clauses or in (values list).

WorktableN created for           Indicates that an inner table of a      page 8-33
REFORMATTING.                    join has no useful indexes, and that
                                 SQL Server has determined that it is
                                 cheaper to build a worktable and an
                                 index on the worktable than to
                                 perform repeated table scans.

Log Scan.                        Query fired a trigger that uses         page 8-35
                                 inserted or deleted tables.

Using I/O size N Kbytes.         Variable indicates the I/O size for     page 8-35
                                 disk reads and writes.

With LRU/MRU buffer              Reports the caching strategy for        page 8-36
replacement strategy.            the table.
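For example, a query with no useful search argument, such as:

select au_lname, au_fname
from authors

produces the table scan messages shown here: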
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
select title
from titles
where title_id in ("T18168","T55370")
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Using 2 Matching Index Scans
Index : title_id_ix
Ascending scan.
Positioning by key.
Keys are:
title_id
Index : title_id_ix
Ascending scan.
Positioning by key.
Keys are:
title_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
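The next plan shows a scan using a clustered index; a point query on
the clustered index key, along these lines, would produce it:

select title
from titles
where title_id = "T18168"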
STEP 1
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Using Clustered Index.
Index : tit_id_ix
Ascending scan.
Positioning by key.
Keys are:
title_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
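The following plan shows an ordinary nonclustered index scan; a
query like this sketch (assuming the index au_name_ix has au_fname
as its leading column) would produce it:

select au_id, au_lname
from authors
where au_fname = "Bill"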
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Index : au_name_ix
Ascending scan.
Positioning by key.
Keys are:
au_fname
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
Positioning Messages
Positioning at start of table.
Positioning by Row IDentifier (RID).
Positioning by key.
Positioning at index start.
These messages describe how access to a table or to the leaf level of a
nonclustered index takes place. The choices are:
• “Positioning at start of table.” This message indicates a table scan,
starting at the first row of the table.
• “Positioning by Row IDentifier (RID).” This message is printed
after the OR strategy has created a dynamic index of row IDs. See
“Using Dynamic Index.” on page 8-31 for more information
about how row IDs are used.
• “Positioning by key.” This message indicates that the index is
used to find the qualifying row or the first qualifying row. It is
printed for clustered index scans and matching nonclustered
index scans.
• “Positioning at index start.” This message indicates a
nonmatching index scan, which begins at the first page of the
index leaf level.
Scanning Messages
Scanning only the last page of the table.
This message indicates that a query containing an ungrouped
(scalar) max aggregate needs to access only the last page of the table.
In order to use this special optimization, the aggregate column needs
to be the leading column in an index. See “Optimizing Aggregates”
on page 7-25 for more information.
Scanning only up to the first qualifying row.
This message appears only for queries using an ungrouped (scalar)
min aggregate. The aggregated column needs to be the leading
column in an index. See “Optimizing Aggregates” on page 7-25 for
more information.
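As a sketch, assuming an index with price as its leading column,
these queries qualify for the two optimizations:

/* prints "Scanning only up to the first qualifying row." */
select min(price) from titles

/* prints "Scanning only the last page of the table." */
select max(price) from titles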
The next query shows output for a matching scan, using a composite
index on au_lname, au_fname, au_id:
select au_fname, au_lname, au_id
from authors
where au_lname = "Williams"
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Index : au_names_id
Ascending scan.
Positioning by key.
Index contains all needed columns. Base
table will not be read.
Keys are:
au_lname
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
The index is used to find the first occurrence of “Williams” on the
nonclustered leaf page. The query scans forward, looking for more
occurrences of “Williams” and returning any it finds. Once a value
greater than “Williams” is found, the query has found all the
matching values, and the query stops. All the values needed in the
where clauses and select list are included in this index, so no access to
the table is required.
With the same composite index on au_lname, au_fname, au_id, this
query performs a nonmatching scan, since the leading column of the
index is not included in the where clause:
select au_fname, au_lname, au_id
from authors
where au_id = "A93278"
Note that the showplan output does not contain a “Keys are...”
message, and the positioning message is “Positioning at index start.”
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Index : au_names_id
Ascending scan.
Positioning at index start.
Index contains all needed columns. Base
table will not be read.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
This query must scan the entire leaf level of the nonclustered index,
since the rows are not ordered by au_id; a value of an inner column
can occur anywhere in a composite index.
Keys Message
Keys are:
keys_list
This message is followed by the key(s) used whenever SQL Server
uses a clustered or a matching nonclustered index scan to locate
rows. For composite indexes, all keys in the where clauses are listed.
Examples are included under those messages.
SQL Server does not use the OR strategy for all queries that contain
or clauses. The following conditions must be met:
• All columns in the or clause must belong to the same table.
• If any portion of the or clause requires a table scan (due to lack of
an appropriate index or poor selectivity of a given index), then a
table scan is used for the entire query.
If the query contains or clauses on different columns of the same
table, and each of those columns has a useful index, SQL Server can
use different indexes for each clause.
The OR strategy cannot be used for cursors.
The showplan output below includes three “FROM TABLE” sections:
• The first two “FROM TABLE” blocks in the output show the two
index accesses, one for “Bill” and one for “William”.
• The final “FROM TABLE” block shows the “Using Dynamic
Index” output with its companion positioning message,
“Positioning by Row IDentifier (RID).” This is the step where the
dynamic index is used to locate the table rows to be returned.
select au_id, au_fname, au_lname
from authors
where au_fname = "Bill"
or au_fname = "William"
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Index : au_fname_ix
Ascending scan.
Positioning by key.
Keys are:
au_fname
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
authors
Nested iteration.
Index : au_fname_ix
Ascending scan.
Positioning by key.
Keys are:
au_fname
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
authors
Nested iteration.
Using Dynamic Index.
Ascending scan.
Positioning by Row IDentifier (RID).
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
Reformatting Message
WorktableN Created for REFORMATTING.
When joining two or more tables, SQL Server may choose to use a
reformatting strategy to join the tables when the tables are large and
the tables in the join do not have a useful index. The reformatting
strategy:
• Inserts the needed columns from qualifying rows of the smaller
of the two tables into a worktable.
• Creates a clustered index on the join column(s) of the worktable.
The index is built using the keys that join the worktable to the
outer table in the query.
• Uses the clustered index in the join to retrieve the qualifying rows
from the table.
See “Saving I/O Using the Reformatting Strategy” on page 7-17 for
more information on reformatting.
➤ Note
If your queries frequently employ the reformatting strategy, examine the
tables involved in the query. Unless there are other overriding factors, you
may want to create an index on the join columns of the table.
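As a sketch, an index like the following (the name and column are
illustrative) on the inner table’s join column gives the optimizer a
direct alternative to reformatting:

create nonclustered index ta_au_ix
on titleauthor (au_id)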
STEP 1
The type of query is INSERT.
The update mode is direct.
Worktable1 created for REFORMATTING.
FROM TABLE
titleauthor
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
The type of query is INSERT.
The update mode is direct.
Worktable2 created for REFORMATTING.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable2.
STEP 3
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
Worktable1.
Nested iteration.
Using Clustered Index.
Ascending scan.
Positioning by key.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
Worktable2.
Nested iteration.
Using Clustered Index.
Ascending scan.
Positioning by key.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
If a table, index, or database used in the query uses a data cache
with large I/O sized pools, the SQL Server optimizer can choose to
use large I/O for some types of queries.
See Chapter 15, “Memory Use and Performance,” for more
information on large I/Os and the data cache.
Flattened Queries
When subqueries are flattened into existence joins, the output looks
like normal showplan output for a join, with the possible exception of
the message “EXISTS TABLE: nested iteration.”
This message indicates that instead of the normal join processing,
which looks for every row in the table that matches the join column,
SQL Server uses an existence join and returns TRUE as soon as the
first qualifying row is located. For more information on subquery
flattening, see “Flattening in, any, and exists Subqueries” on page
7-27.
SQL Server flattens the following subquery into an existence join:
select title
from titles
where title_id in
(select title_id
from titleauthor)
and title like "A Tutorial%"
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Index : title_ix
Ascending scan.
Positioning by key.
Keys are:
title
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
titleauthor
Materialized Queries
STEP 1
The type of query is SELECT (into Worktable1).
GROUP BY
Evaluate Grouped MAXIMUM AGGREGATE.
FROM TABLE
sales_summary
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
TO TABLE
Worktable1.
STEP 2
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
Worktable1.
EXISTS TABLE : nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
The showplan message “EXISTS TABLE: nested iteration,” near the
end of the output, shows that SQL Server has performed an existence
join.
The structure is shown in Figure 8-1, using the showplan output from
this query:
select title_id
from titles
where total_sales > all (select total_sales
from titles
where type = 'business')
STEP 1
The type of query is SELECT.
Correlated Subquery.
Subquery under an ALL predicate.
STEP 1
The type of query is SELECT.
Evaluate Ungrouped ANY AGGREGATE.
FROM TABLE
titles
EXISTS TABLE : nested iteration.
Index : tp
Ascending scan.
Positioning by key.
Keys are:
type
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
Type of Subquery
Correlated Subquery.
Non-correlated Subquery.
Every subquery is either correlated or noncorrelated. showplan
evaluates the type of subquery and, if the subquery is correlated,
prints the “Correlated Subquery.” message.
Subquery Predicates
Subquery under an IN predicate.
Subquery under an ANY predicate.
Subquery under an ALL predicate.
Subquery under an EXISTS predicate.
Subquery under an EXPRESSION predicate.
Table 8-5 lists the showplan messages that identify the operator or
expression that introduces the subquery.
Message                          Explanation

Subquery under an IN             The subquery is introduced by in or
predicate.                       not in.

Subquery under an ANY            The subquery is introduced by any.
predicate.

Subquery under an ALL            The subquery is introduced by all.
predicate.

Subquery under an EXISTS         The subquery is introduced by exists
predicate.                       or not exists.

Subquery under an                The subquery is introduced by an
EXPRESSION predicate.            expression, or the subquery is in
                                 the select list.
For example:
select type, title_id
from titles
where price > all
(select price
from titles
where advance < 15000)
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
QUERY PLAN FOR SUBQUERY 1 (at nesting level 1 and at line 4).
Correlated Subquery.
Subquery under an ALL predicate.
STEP 1
The type of query is SELECT.
Evaluate Ungrouped ANY AGGREGATE.
FROM TABLE
titles
EXISTS TABLE : nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
STEP 1
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Index : title_ix
Ascending scan.
Positioning by key.
Keys are:
title
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
QUERY PLAN FOR SUBQUERY 1 (at nesting level 1 and at line 1).
Correlated Subquery.
Subquery under an EXPRESSION predicate.
STEP 1
The type of query is SELECT.
Evaluate Ungrouped ONCE AGGREGATE.
FROM TABLE
publishers
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
STEP 1
The type of query is SELECT.
FROM TABLE
publishers
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
QUERY PLAN FOR SUBQUERY 1 (at nesting level 1 and at line 3).
Correlated Subquery.
Subquery under an EXPRESSION predicate.
STEP 1
The type of query is SELECT.
Evaluate Ungrouped ONCE-UNIQUE AGGREGATE.
FROM TABLE
titles
Nested iteration.
Index : comp_i
Ascending scan.
Positioning by key.
Keys are:
price
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
STEP 1
The type of query is SELECT.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
QUERY PLAN FOR SUBQUERY 1 (at nesting level 1 and at line 4).
Correlated Subquery.
Subquery under an EXISTS predicate.
STEP 1
The type of query is SELECT.
Evaluate Ungrouped ANY AGGREGATE.
FROM TABLE
publishers
EXISTS TABLE : nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
➤ Note
This chapter suggests workarounds to certain optimization problems. If you
experience these types of problems, call Sybase Technical Support.
◆ WARNING!
Use these options with caution. The forced plans may be
inappropriate in some situations and cause very poor performance. If
you include these options in your applications, be sure to check their
query plans, I/O statistics, and other performance data regularly.
These options are generally intended for use as tools for tuning and
experimentation, not as long-term solutions to optimization
problems.
forceplan example
STEP 1
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Index : title_ix
Ascending scan.
Positioning by key.
Keys are:
title
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
titleauthor
Nested iteration.
Index : ta_au_tit_ix
Ascending scan.
FROM TABLE
authors
Nested iteration.
Using Clustered Index.
Index : au_id_ix
Ascending scan.
Positioning by key.
Keys are:
au_id
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
statistics io for the query shows a total of 154 physical reads and 2431
logical reads:
Table: titles scan count 1, logical reads: 29, physical
reads: 27
Table: authors scan count 34, logical reads: 102, physical
reads: 35
Table: titleauthor scan count 25, logical reads: 2300,
physical reads: 92
Total writes for this command: 0
If you use forceplan, the optimizer chooses a reformatting strategy on
titleauthor, resulting in this showplan report:
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is INSERT.
The update mode is direct.
Worktable1 created for REFORMATTING.
FROM TABLE
titleauthor
Nested iteration.
Index : ta_au_tit_ix
Ascending scan.
Positioning at index start.
Index contains all needed columns. Base table will not
be read.
Using I/O Size 2 Kbytes.
STEP 2
The type of query is SELECT.
FROM TABLE
titles
Nested iteration.
Index : title_ix
Ascending scan.
Positioning by key.
Keys are:
title
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
authors
Nested iteration.
Table Scan.
Ascending scan.
Positioning at start of table.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strategy.
FROM TABLE
Worktable1.
Nested iteration.
Using Clustered Index.
Ascending scan.
Positioning by key.
Using I/O Size 2 Kbytes.
With LRU Buffer Replacement Strateg
Table: titles scan count 1, logical reads: 29, physical
reads: 27
Table: authors scan count 25, logical reads: 5525, physical
reads: 221
Table: titleauthor scan count 1, logical reads: 92, physical
reads: 60
Table: Worktable1 scan count 125000, logical reads: 389350,
physical reads: 27
Total writes for this command: 187
Figure 9-1 shows the sequence of the joins and the number of scans
required for each query plan.
[Figure 9-1 (not reproduced): join order and scan counts for the two
plans. The optimizer’s plan uses indexes throughout, finding 25 and
34 rows with 25 and 34 scans of the inner tables. The forced plan
requires reformatting, uses an index to find 25 rows, performs
25 * 5000 table scans on the worktable, and then uses an index.]
• If the query joins more than four tables, use set table count to see if
it results in an improved join order. See “Increasing the Number
of Tables Considered by the Optimizer” on page 9-7.
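For example, to have the optimizer consider join orders for up to six
tables at a time:

set table count 6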
select select_list
from table_name
(index index_name)
[, table_name ...]
where ...
delete table_name
from table_name (index index_name) ...
update table_name set col_name = value
from table_name (index index_name) ...
Here’s an example:
select pub_name, title
from publishers p, titles t (index date_type)
where p.pub_id = t.pub_id
and type = "business"
and pubdate > "1/1/93"
Specifying an index in a query can be helpful when you suspect that
the optimizer is choosing a suboptimal query plan. When you use
this option:
• Always check statistics io for the query to see whether the index
you choose requires less I/O than the optimizer’s choice.
• Be sure to test a full range of valid values for the query clauses,
especially if you are tuning range queries, since the access
methods for these queries are sensitive to the size of the range. In
some cases, skew of values in a table or out-of-date statistics may
be other causes of an apparent failure to use the correct index.
Use this option only after testing to be certain that the query
performs better with the specified index option. Once you include
this index option in applications, you should check regularly to be
sure that the resulting plan is still superior to other choices that the
optimizer makes.
If you want to force a table scan, use the table name in place of
index_name.
➤ Note
If you have a nonclustered index with the same name as the table,
attempting to specify a table name causes the nonclustered index to be
used. You can force a table scan using select select_list from tableA (0).
➤ Note
If you are experimenting with prefetch sizes and checking statistics io for
physical reads, you may need to clear pages from the cache so that SQL
Server will perform physical I/O on the second execution of a query. If the
table or index, or its database, is bound to a named data cache, you can
unbind and rebind the object. If the query uses the default cache, or if other
tables or indexes are bound to the object’s cache, you can run queries on
other tables that perform enough I/O to push the pages out of the memory
pools.
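For example, assuming the table is bound to a named cache (the
cache and table names here are illustrative), you could unbind and
rebind it between test runs:

sp_unbindcache pubtune, titles
go
sp_bindcache "titles_cache", pubtune, titles
go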
set prefetch on
When you trace queries through this facility, run your queries in the
same manner as your application, as follows:
• Supply the same parameters and values to your stored
procedures or SQL statements.
• If the application uses cursors, use cursors in your tests.
Be very careful to ensure that your trace tests cause the optimizer to
make the same decisions as in your application. You must supply the
same parameters and values to your stored procedures or where
clauses.
If you are using stored procedures, make sure that they are actually
being optimized during the trial by executing them with recompile.
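For example (the procedure name and parameter are illustrative):

exec get_titles @type = "business" with recompile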
In most situations, SQL Server can use only one index per table in a
query. This means the optimizer must often choose between indexes
when there are multiple where clauses supporting both search
arguments and join clauses. The optimizer’s first step is to match
search arguments and join clauses to available indexes.
The most important item that you can verify using this trace facility
is that the optimizer is evaluating all possible where clauses included
in each Transact-SQL statement.
If a clause is not included in this output, then the optimizer has
determined it is not a valid search argument or join clause. If you
believe your query should benefit from the optimizer evaluating this
clause, find out why the clause was excluded, and correct it if
possible. The most common reasons for “non-optimizable” clauses
include:
• Data type mismatches
• Use of functions, arithmetic, or concatenation on the column
• Numerics compared against constants that are larger than the
definition of the column
See “Search Arguments and Using Indexes” on page 7-8 for more
information on requirements for search arguments.
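As a sketch, the first query below buries the column in a function, so
the clause cannot be used as a search argument; the rewrite exposes
the column to the optimizer:

/* not optimizable: function on the column */
select title_id from titles
where substring(title_id, 1, 2) = "T1"

/* optimizable rewrite of the same test */
select title_id from titles
where title_id like "T1%"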
Identifying how the optimizer estimates I/O often leads to the root
of the problem and to a solution. You will be able to see when the
optimizer uses your distribution page statistics and when it uses
default values.
The first line identifies the table name and its associated object ID.
The actual output for this line looks like this:
Entering q_score_index() for table ’titles’ (objectid 208003772),
varno = 0
The optimizer analyzes all search arguments for all tables in each
query, followed by all join clauses for each table in the query.
Therefore, you first see q_score_index() called for all tables in which the
optimizer has found a search clause. The routine numbers the tables
in the order in which they were specified in the from clause and
displays the numbers as the varno. It starts numbering with 0 for the
first table.
Any search clause not included in this section should be evaluated to
determine whether its absence impacts performance.
Following the search clause analysis, q_score_index() is called for all
tables where the optimizer has found a join clause. As above, any join
clause not included in this section should be evaluated to determine
whether its absence is impacting performance.
The next line prints the size of the table in both rows and pages:
The table has 5000 rows and 624 pages.
These sizes are pulled from the system tables where they are
periodically maintained. There are some known problems where
inaccurate row estimates cause bad query plans, so verify that this is
not the cause of your problem.
The next two lines indicate the type of clause and a representation of
the clause itself with column names and abbreviations for the
operators. It indicates:
• That it is evaluating a search clause, like this:
Scoring the SEARCH CLAUSE:
au_fname EQ
• That it is evaluating a join clause, like this:
Scoring the JOIN CLAUSE:
au_id EQ au_id
All search clauses for all tables are evaluated before any join clauses
are evaluated.
If your queries include a range query or clauses that are treated like
range queries, they are evaluated in a single analysis to produce an
estimate of the number of rows for the range. For example,
Scoring the SEARCH CLAUSE:
au_lname LT
au_lname GT
Range queries include:
• Queries using the between clause
• Interval clauses with and on the same column name, such as:
datecol1 >= "1/1/94" and datecol1 < "2/1/94"
• like clauses such as:
like "k%"
Specified Indexes
If the query has specified the use of a specific index by including the
index keyword and the index name in parentheses after the table
name in the from clause, this is noted in the output:
User forces index IndexID.
Specifying an index prevents consideration of other alternatives.
If the I/O size and cache strategy are also included in the query, these
messages are printed:
User forces data prefetch of 8K
User forces LRU buffer replacement strategy
The next line of output displays the cost of a table scan for
comparison, provided that there is at least one other qualification or
index that can be considered. It reports index ID 0 and should match
the table size estimate displayed earlier. The line looks like this:
Base cost: indid: IndexID rows: rows pages: pages prefetch: <S|N>
I/O size: io_size cacheid: cacheID replace: <LRU | MRU>
Here is an example:
Base cost: indid: 0 rows: 5000 pages: 624 prefetch: N
I/O size: 2 cacheid: 0 replace: LRU
Output     Meaning

indid      The index ID from sysindexes; 0 for the table itself.
rows       The number of rows in the table.
pages      The number of pages in the table.
prefetch   Whether prefetch would be considered for the table scan.
I/O size   The I/O size to be used.
cacheid    The ID of the data cache to be used.
replace    The cache replacement strategy to be used, either LRU or MRU.
Verify page and row counts for accuracy. Inaccurate counts can cause
bad plans. To get a completely accurate count, use the set statistics io on
command along with a select * from tablename query. In a VLDB (very
large database) or in 24x7 shops (applications that must run 24 hours
a day, 7 days a week), where that is not practical, you may need to
rely on the reasonable accuracy of the sp_spaceused system procedure.
dbcc allocation-checking commands print the object size and correct
the values on which sp_spaceused and other object-size estimates are
based.
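For example, to check the stored estimates and then have them
corrected (the table name is illustrative):

sp_spaceused titles
go
dbcc tablealloc(titles)
go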
Costing Indexes
Next, the optimizer evaluates each useful index for a given clause to
determine its cost. The optimizer first looks for a unique index that is
totally qualified—meaning that the query contains where clauses on
each of the keys in the index. If such an index is available, the
optimizer immediately knows that only a single row satisfies the
clause, and it prints the following line:
Unique index_type index found--return rows 1 pages pages
The index_type is either clustered or nonclustered. There are three
possibilities for the number of pages:
• The unique index is clustered. The logical I/O cost is the height of
the index tree. In a clustered index, the data pages are the leaf
level of the index, so the data page access is included.
• The unique nonclustered index covers the query. The logical I/O
is the height of the index tree. The data page access is not needed,
and not counted.
• The unique nonclustered index does not cover the query. An
additional logical I/O is necessary to get from the leaf level of the
nonclustered index to the data page, so the logical I/O cost is the
height of the nonclustered index plus one page.
If the index is not unique, then the optimizer determines the cost, in
terms of logical I/Os, for the clause. Before doing so, it prints this
line:
Relop bits are: integer
This information can be ignored. It merely restates the comparison
operator (that is, =, <, >, interval, and so on) listed in the q_score_index()
line mentioned earlier as an integer bitmap. This information is only
necessary for Sybase Engineering to debug optimizer problems and
it has no value for customer-level troubleshooting.
To estimate the I/O cost for each clause, the optimizer has a number
of tools available to it, depending on the clause type (search clause or
join clause) and the availability of index statistics. For more
information, see “Index Statistics” on page 6-35.
When you create an index on a table that contains data, SQL Server
stores a distribution page with the index unless the table is empty.
This histogram is a sampling of the index key values every N rows.
N is dependent on the full size of the key (including overhead) and
the number of rows in the table. Each sampling is known as a step.
Since the optimizer knows how many rows exist between steps and
the density of keys in the index, it can estimate the number of rows
satisfying a clause with reasonable accuracy.
For search clauses, the optimizer can look up specific values on the
distribution page, if these values are known at compile time. In this
case, it first identifies the distribution page and the number of steps
with the following trace output:
Qualifying stat page; pgno: page_number steps: steps
For atomic datatypes (datatypes such as tinyint, smallint, int, char,
varchar, binary, and varbinary, which are not internally implemented
as structures), it prints the constant value the search argument
supplied to the optimizer. It looks like this:
Search value: constant_value
If the value is implemented as a structure, the following message is
output to indicate that the optimizer does not waste time building
the structure’s printable representation:
*** CAN’T INTERPRET ***
Next, the optimizer reports how the search value matched the steps
on the distribution page, using one of these messages:

equal to several rows (1st or last) -use endseveralSC

This indicates that several steps matched the constant and that they
were found either at the beginning or at the end of the distribution
page.

equal to a single row (1st or last) -use endsingleSC

This indicates that only one step matched the constant and it was
found either at the beginning or at the end of the distribution page.

equal to several rows in middle of page -use midseveralSC

This indicates that several steps matched the constant and that they
were found in the middle of the distribution page.

equal to single row in middle of page -use midsingleSC

This indicates that only one step matched the constant and it was
found in the middle of the distribution page.
For a range query, the trace facility looks up the steps for both the
upper and lower bounds of the query. This message appears:
Scoring SARG interval, lower bound.
After displaying the costing estimates for the lower bound, the net
selectivity is calculated and displayed as:
Net selectivity of interval: float_value
The selectivity of each clause is printed last. Search clauses are
output as:
Search argument selectivity is float_val.
The selectivity for search clauses is printed as the fraction of the
rows in the table expected to qualify. Therefore, the lower the
number, the more selective the search clause and the fewer the rows
that are expected to qualify. Join clauses are output as:
Join selectivity is float_val.
The selectivity for join clauses is output as the whole number of the
fraction 1 divided by the selectivity. Therefore, the higher the
number, the more selective the join clause and the fewer the rows
that are expected to qualify.
At this point, the optimizer has evaluated all indexes for this clause
and will proceed to optimize the next clause.
Introduction
This chapter presents certain types of SQL queries where simple
changes in the query can improve performance. This chapter
emphasizes only queries and does not focus on schema design.
Many of the tips are not related to the SQL Server query optimizer.
These tips are intended as suggestions and guidelines, not absolute
rules. You should use the query analysis tools to test the alternate
formulations suggested here.
Performance of these queries may change with future releases of
SQL Server.
For example, the optimizer cannot optimize the final select in the
following procedure, because it cannot know the value of @x until
execution time:
create procedure p
as
declare @x int
select @x = col
from tab where ...
select *
from tab2
where indexed_col = @x
When SQL Server encounters unknown values, it uses
approximations to develop a query plan, based on the operators in
the search argument, as shown in Table 10-1.
➤ Note
SQL Server does optimize search arguments that are linked with or. This
description applies only to join clauses.
SQL Server can optimize selects with joins that are linked with union.
The result of or is somewhat like the result of union, except for the
treatment of duplicate rows and empty tables:
• union removes all duplicate rows (in a sort step); union all does not
remove any duplicates. The comparable query using or might
return some duplicates.
• A join with an empty table returns no rows.
For example, when SQL Server processes this query, it must look at
every row in one of the tables for each row in the other table:
select *
from tab1, tab2
where tab1.a = tab2.b
or tab1.x = tab2.y
If you use union, each side of the union is optimized separately:
select *
from tab1, tab2
where tab1.a = tab2.b
union all
select *
from tab1, tab2
where tab1.x = tab2.y
You can use union instead of union all if you want to eliminate
duplicates, but this eliminates all duplicates. It may not be possible
to get exactly the same set of duplicates from the rewritten query.
Aggregates
SQL Server uses special optimizations for the max and min aggregates
when there is an index on the aggregated column.
For min, it reads the first value on the root page of the index.
For max, it goes directly to the end of the index to find the last row.
min and max optimizations are not applied if:
• The expression inside the max or min is anything but a column.
Compare max(numeric_col*2) and max(numeric_col)*2, where
numeric_col has a nonclustered index. The second uses max
optimization; the first performs a scan of the nonclustered index.
• The column inside the max or min is not the first column of an
index. For nonclustered indexes, it can perform a scan on the leaf
level of the index; for clustered indexes, it must perform the table
scan.
• There is another aggregate in the query.
• There is a group by clause.
In addition, the max optimization is not applied if there is a where
clause.
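These sketches illustrate two of the cases above, assuming an index
with price as its leading column:

/* optimized: goes straight to the end of the index */
select max(price) from titles

/* not optimized: a second aggregate in the query */
select max(price), min(price) from titles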
Note that char null is really stored as varchar, and binary null is really
varbinary. Joining char not null with char null involves a conversion;
the same is true of the binary types. This affects all character and
binary types, but does not affect numeric datatypes and datetimes.
Introduction
SQL Server protects the tables or data pages currently used by active
transactions by locking them. Locking is a concurrency control
mechanism: it ensures the consistency of data across transactions. It
is needed in a multi-user environment, since several users may be
working with the same data at the same time.
This chapter discusses:
• Consistency issues that arise in multiuser databases
• SQL Server options for enforcing different levels of isolation
• Locks used in SQL Server
• How different isolation levels affect SQL Server locks
• Defining an isolation level using the set transaction isolation level
command or the at isolation clause
• How the holdlock and noholdlock keywords affect locking
• Cursors and locking
• Locks used by Transact-SQL commands
• System procedures for examining locks and user processes
blocked by locks (sp_lock and sp_who)
• SQL Server’s handling of deadlocks
• Locking and performance issues
• Strategies for reducing lock contention
• Configuration options that affect locking
Overview of Locking
Consistency of data means that if multiple users repeatedly execute
a series of transactions, the results are the same each time. This
means that simultaneous retrievals and modifications of data do not
interfere with each other.
For example, assume that the transactions in Figure 11-1, T1 and T2,
are run at approximately the same time.
[Figure 11-1 (not reproduced): transactions T1 and T2 begin at
approximately the same time; later figures show similar parallel
event sequences for T3/T4, T5/T6, and T7/T8.]
Granularity of Locks
Table locks also provide a way to avoid lock collisions at the page
level. SQL Server automatically uses table locks for some commands.
Page Locks
The following examples show what kind of page locks SQL Server
uses for the respective statement (assuming that indexes are used on
the search arguments):
select balance from account              Shared page lock
where acct_number = 25

insert account values(34, 500)           Exclusive page lock

delete account                           Update page locks, then
where balance < 0                        exclusive page locks

update account set balance = 0           Update page lock, then
where acct_number = 25                   exclusive page lock
Table Locks
Demand Locks
[Figure (not reproduced): the lock promotion decision. When the
promotion threshold is reached, SQL Server checks whether any
other process holds an exclusive lock on the object; if not, it
promotes to a table lock; if so, it does not promote.]
Precedence of Settings
You can change the lock promotion thresholds for any user database
or an individual table. Settings for an individual table override the
database or server-wide settings; settings for a database override the
server-wide values.
Server-wide values for lock promotion are represented by the lock
promotion HWM, lock promotion LWM, and lock promotion PCT configuration
parameters. Server-wide values apply to all user tables on the server
unless the database or tables have lock promotion values configured
for them.
Use the system procedure sp_sysmon to see how often lock
promotions take place and what types of promotions they are. See
Chapter 19, “Monitoring SQL Server Performance with sp_sysmon”
and the topic “Lock Promotions” on page 19-46.
If there is a problem, look for signs of lock contention in “Granted”
and “Waited” data in the Lock Detail section of the sp_sysmon output.
(See “Lock Detail” on page 19-42 for more information.) If lock
contention is high and lock promotion is frequent, consider changing
the lock promotion thresholds for the tables involved.
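For example, to raise the promotion boundaries for a single table
(the sp_setpglockpromote arguments shown here, scope, object,
LWM, HWM, and PCT, follow the System 11 reference; the values
are illustrative):

sp_setpglockpromote "table", account, 400, 600, 50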
Use SQL Server Monitor, a separate Sybase product, to see how
changes to the lock promotion threshold affect the system at the
object level.
[Figure (not reproduced): event sequence for transactions T3 and T4.]
You can also change the isolation level for a query by using the at
isolation clause with the select or readtext statements. The options in the
at isolation clause are:
Level Option
0 read uncommitted
1 read committed
3 serializable
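For example, a dirty read of a single row, using the account table
from the examples in this chapter:

select balance
from account
where acct_number = 25
at isolation read uncommitted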
[Figure (not reproduced): event sequence for transactions T7 and T8
at isolation level 3.]
In transaction T7, SQL Server applies shared page locks (if an index
exists on the acct_number argument) or a shared table lock (if no
index exists) and holds those locks until the end of T7. The insert in T8
cannot get its exclusive lock until T7 releases those shared locks. If T7
is a long transaction, T8 (and other transactions) may wait for longer
periods of time using isolation level 3 instead of the other levels. As
a result, you should use level 3 only when required.
If a cursor is declared for update, SQL Server uses update page locks
by default when scanning tables or views referenced with the for
update clause of declare cursor. If the for update list is empty, all tables
and views referenced in the from clause of the select_statement receive
update locks.
SQL Server releases shared locks when the cursor position moves off
a data page. If a row of an updatable cursor is updated or deleted,
SQL Server promotes its shared (for cursors declared without the for
update clause) or update lock to an exclusive lock. Any exclusive locks
acquired by a cursor in a transaction are held until the end of that
transaction. This also applies to shared or update locks for cursors
using the holdlock keyword or isolation level 3.
The following describes the locking behavior for cursors at each
isolation level:
• At level 0, SQL Server uses no locks on any base table page that
contains a row representing a current cursor position. Cursors
acquire no read locks for their scans, so they do not block other
applications from accessing the same data. However, cursors
operating at this isolation level are not updatable, and they
require a unique index on the base table to ensure accuracy.
• At level 1, SQL Server uses a shared or update lock on base table
or index pages that contain a row representing a current cursor
position. The page remains locked until the current cursor
position moves off the page as a result of fetch statements.
• At level 3, SQL Server uses a shared or update lock on any base
table or index pages that have been read in a transaction through
the cursor. SQL Server holds the locks until the transaction ends;
it does not release the locks when the data page is no longer
needed.
If you do not set the close on endtran option, a cursor remains open past
the end of the transaction, and its current page lock remains in effect.
It could also continue to acquire locks as it fetches additional rows.
When declaring an updatable cursor using the for update clause, you
can tell SQL Server to use shared page locks (instead of update page
locks) in the cursor’s declare cursor statement:
declare cursor_name cursor
for select select_list
from {table_name | view_name} shared
for update [of column_name_list]
Table 11-1: Summary of locks for insert and create index statements
Statement                    Table Lock   Page Lock

insert                       IX           X
create clustered index       X            -
create nonclustered index    S            -

IX = intent exclusive, S = shared, X = exclusive
Table 11-2 describes the types of locks SQL Server applies for select,
delete, and update statements. It divides the select, update, and delete
statements into two groups, since the types of locks they use can vary
if the statement’s search argument references indexed columns on
the object.
Table 11-2: Summary of locks for select, update and delete statements
Note that the above tables do not describe situations in which SQL
Server initially uses table locks (if a query requires the entire table),
or when it promotes to a table lock after reaching the lock promotion
threshold.
Example of Locking
[Figure (not reproduced): transactions T1 and T2 begin at
approximately the same time.]
T1 Locks                                  T2 Locks

Update lock page 1                        Shared lock page 1 denied; waits
Exclusive lock page 1                     for release
Intent exclusive table lock on account
Update lock page 5
Exclusive lock page 5
Release all locks at commit
                                          Shared lock page 1, release lock page 1
                                          Intent shared table lock on account
                                          Shared lock page 2, release lock page 2
                                          Shared lock page 3, release lock page 3
                                          Shared lock page 4, release lock page 4
                                          Shared lock page 5, release lock page 5
                                          Release intent shared table lock
T1 Locks                                  T2 Locks

Exclusive table lock on account           Shared lock page 1 denied; waits
Release exclusive table lock at commit    for release
                                          Shared lock page 1, release lock page 1
                                          Intent shared table lock on account
                                          Shared lock page 2, release lock page 2
                                          Shared lock page 3, release lock page 3
                                          Shared lock page 4, release lock page 4
                                          Shared lock page 5, release lock page 5
                                          Release intent shared table lock
If you add a holdlock or make isolation level 3 the default using the
transaction isolation level option for transaction T2, the lock sequence is
as follows (assuming an index exists for acct_number):
T1 Locks                                  T2 Locks

Update lock page 1                        Shared lock page 1 denied; waits
Exclusive lock page 1                     for release
Intent exclusive table lock on account
Update lock page 5
Exclusive lock page 5
Release all locks at commit
                                          Shared lock page 1
                                          Intent shared table lock on account
                                          Shared lock page 2
                                          Shared lock page 3
                                          Shared lock page 4
                                          Shared lock page 5
                                          Release all locks at commit
If you add holdlock or make transaction isolation level 3 for T2 and no index
exists for acct_number, SQL Server applies table locks for both
transactions instead of page locks:
T1 Locks                                  T2 Locks

Exclusive table lock on account           Shared table lock denied; waits
Release exclusive table lock at commit    for release
                                          Shared table lock on account
                                          Release shared table lock at commit
To get a report on the locks currently being held on SQL Server, use
the system procedure sp_lock:
sp_lock
The class column will display the cursor name for locks
associated with a cursor for the current user and the cursor id
for other users.
spid locktype table_id page dbname class
---- ----------- ---------- ---- ------ ---------------
1 Ex_intent 1308531695 0 master Non cursor lock
1 Ex_page 1308531695 761 master Non cursor lock
5 Ex_intent 144003544 0 userdb Non cursor lock
5 Ex_page 144003544 509 userdb Non cursor lock
5 Ex_page 144003544 1419 userdb Non cursor lock
5 Ex_page 144003544 1420 userdb Non cursor lock
5 Ex_page 144003544 1440 userdb Non cursor lock
5 Sh_page 144003544 1440 userdb Non cursor lock
5 Sh_table 144003544 0 userdb Non cursor lock
5 Update_page 144003544 1440 userdb Non cursor lock
4 Ex_table 240003886 0 pubs2 Non cursor lock
4 Sh_intent 112003436 0 pubs2 Non cursor lock
4 Ex_intent-blk 112003436 0 pubs2 Non cursor lock
The locktype column indicates not only whether the lock is a shared
lock (“Sh” prefix), an exclusive lock (“Ex” prefix), or an “update”
lock, but also whether it is held on a table (“table” or “intent”) or on
a “page.”
A “blk” suffix indicates that this process is blocking another process
that needs to acquire a lock. As soon as the blocking process
completes, the other processes move forward. A “demand” suffix
indicates that the process will acquire an exclusive lock as soon as all
current shared locks are released.
Avoiding Deadlocks
Table locks cause more lock contention than page locks, since no
other process can access the table. Creating a useful index for the
query allows the data modification statement to use page locks,
improving concurrent access to the table.
If creating an index for a lengthy update or delete transaction is not
possible, you can perform the operation in a cursor, with frequent
commit transaction statements to reduce the number of page locks.
Hot spots occur when all updates take place on a certain page, as in
a heap table, where all inserts happen on the last page of the page
chain. For example, an unindexed history table that is updated by
everyone will always have lock contention on the last page.
The best solution to this problem is to partition the history table.
Partitioning a heap table creates multiple page chains in the table,
and therefore multiple last pages for inserts. Concurrent inserts to
the table are less likely to block one another, since multiple last pages
are available. Partitioning provides a way to improve concurrency
for heap tables without creating separate tables for different groups
of users. See “Improving Insert Performance with Partitions” on
page 13-12 for information about partitioning tables.
Another solution for hot spots is to create a clustered index to
distribute the updates across the data pages in the table. Like
partitioning, this solution creates multiple insertion points for the
table. However, it also introduces some overhead for maintaining the
physical order of the table’s rows.
select into does not carry over the base table’s max_rows_per_page value,
but creates the new table with a max_rows_per_page value of 0. Use
sp_chgattribute to set the max_rows_per_page value on the target table.
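For example (the table name and value are illustrative):

sp_chgattribute new_history, "max_rows_per_page", 4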
These locking guidelines can help reduce lock contention and speed
performance:
• Never include user interaction between the beginning of a
transaction and its commit or rollback.
Since SQL Server holds some locks until the transaction ends, if a
user can hold up the commit or rollback (even for a short time),
there will be higher lock contention.
• Keep transactions short.
SQL Server releases exclusive and update locks only on commit
or rollback. The longer the transaction, the longer these locks are
held. This blocks other activity and leads to blocking and
deadlocks.
• Keep transactions in one batch.
Network interaction during a transaction can introduce
unnecessary delays in completing the transaction and releasing
its locks.
• Use the lowest level of locking required by each application, and
use isolation level 3 only when necessary.
Each lock counts toward SQL Server’s limit of total number of locks.
By default, SQL Server is configured with 5000 locks. A System
Administrator can change this limit using the sp_configure system
procedure. For example:
sp_configure number of locks, 10000
You may also need to adjust the total memory option of sp_configure,
since each lock uses 72 bytes of memory.
The number of locks required by a server can vary depending on the
number of concurrent processes and the types of actions performed
by the transactions. However, a good starting assumption is that
each concurrent process uses about 20 locks.
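As a rough sizing sketch using that assumption, 300 concurrent
processes at about 20 locks each call for roughly 6000 locks:

sp_configure number of locks, 6000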
What Is a Cursor?
[Figures (not reproduced): the cursor life cycle (declare cursor, open
cursor, fetch row, process row (examine/update/delete), loop to
fetch the next row, close cursor, deallocate cursor), annotated with
the table (intent) locks, page locks, and memory acquired and
released at each stage.]
The memory resource descriptions in Figure 12-3 and Table 12-1 refer
to ad hoc cursors sent using isql or Client-Library™. For other kinds
of cursors, the locks are the same, but the memory allocation and
deallocation differ somewhat according to the type of cursor, as
described in “Memory Use and Execute Cursors” on page 12-5.
Table 12-1: Locks and memory use for isql and Client-Library client cursors
Cursor Command     Resource Use

declare cursor     When you declare a cursor, SQL Server allocates memory to
                   the cursor and to store the query plan that is generated. The
                   size of the query plan depends on the select statement, but it
                   generally ranges from one to two pages.

open               When you open a cursor, SQL Server starts processing the
                   select statement. The server optimizes the query, traverses
                   indexes, and sets up memory variables. The server does not
                   access rows yet, except when it needs to build worktables.
                   However, it does set up the required table-level locks (intent
                   locks) and, if there are subqueries or joins, page locks on the
                   outer table(s).

fetch              When you execute a fetch, SQL Server sets up the required
                   page lock, gets the row or rows required and reads specified
                   values into the cursor variables, or sends the row to the
                   client. The page lock is held until a fetch moves the cursor off
                   the page or until the cursor is closed. This page lock is either
                   a shared page lock or an update page lock, depending on
                   how the cursor is written.

close              When you close a cursor, SQL Server releases the shared
                   locks and some of the memory allocation. You can open the
                   cursor again, if necessary.

deallocate cursor  When you deallocate a cursor, SQL Server releases the rest of
                   the memory resources used by the cursor. To reuse the
                   cursor, you must declare it again.
The descriptions of declare cursor and deallocate cursor in Table 12-1 refer
to ad hoc cursors that are sent using isql or Client-Library. Other kinds
of cursors allocate memory differently:
• For cursors that are declared on stored procedures, only a small
amount of memory is allocated at declare cursor time. Cursors
declared on stored procedures are sent using Client-Library or
the pre-compiler and are known as “execute cursors.”
• For cursors declared within a stored procedure, memory is
already available for the stored procedure, and the declare
statement does not require additional memory.
Specify the cursor mode when you declare the cursor. Note that if the
select statement includes certain options, the cursor is not updatable
even if you declare it for update.
return
Results from tests like these can vary widely. They are most
pronounced on systems with busy networks, larger numbers of
active database users, and multiple users accessing the same table.
Using sp_lock, examine the locks that are in place at each arrow:
If you issue another fetch command after the last row of the result set
has been fetched, the locks on the last page are released, so there will
be no cursor-related locks.
Cursor connection                  Second connection

fetch curs2
go
                                   begin tran
                                   go
                                   select *
                                   from authors holdlock
                                   where au_id = au_id
                                       /* the au_id value fetched at left */
                                   go
                                   sp_lock
                                   go
delete from authors
where current of curs2
go
/* what happens? */
close curs2
go
• Fetch more than one row if you are returning rows to the client.
• Keep cursors open across commits and rollbacks.
• Open multiple cursors on a single connection.
Cursors cannot use the dynamic index of row IDs generated by the
OR strategy. Queries that use the OR strategy in standalone select
statements usually table scan using read-only cursors. If they are
updatable cursors, they may need to use a unique index and still
require access to each data row in sequence in order to evaluate the
query clauses.
Read-only cursors using union create a worktable when the cursor is
declared, and sort it to remove duplicates. Fetches are performed on
the worktable. Cursors using union all can return duplicates and do
not require a worktable.
SQL Server acquires update locks on all the tables that have columns
listed in the for update clause of the cursor select statement. If the for
update clause is not included in the cursor declaration, all the tables
referenced in the from clause acquire update locks.
This query includes the name of the column in the for update clause:
declare curs3 cursor
for
select au_lname, au_fname, price
from titles t, authors a,
titleauthor ta
where advance <= $1000
and t.title_id = ta.title_id
and a.au_id = ta.au_id
for update of price
Table 12-5 shows the effects of:
• Omitting the for update clause entirely—no shared clause
• Omitting the column name from the for update clause
• Including the name of the column to be updated in the for update
clause
• Adding shared after the name of the titles table while using for
update of price
In the table, the additional locks, or more restrictive locks, for the two
versions of the for update clause are emphasized.
Table 12-5: Effects of for update clause and shared on cursor locking
The following problems may indicate that your system could benefit
from attention to object placement:
• Single-user performance is all right, but response time increases
significantly when multiple processes are executed.
• Access to a mirrored disk takes twice as long as access to an
unmirrored disk.
• Query performance degrades when system table activity
increases.
• Maintenance activities seem to take a long time.
• Stored procedures seem to slow down as they create temporary
tables.
• Insert performance is poor on heavily used tables.
Underlying Problems
[Figure: a logical device, userdev1, mapped to physical disks. Desirable: access spread equally across the disks. Undesirable: master, tempdb, sybsecurity, and the application databases crowded onto the same disks. Desirable: master, tempdb, sybsecurity, and the application databases placed on separate disks.]
➤ Note
Using operating system files for user data devices is not recommended on
UNIX systems, since these systems buffer I/O in the operating system.
Databases placed on operating system files may not be recoverable after a
system crash.
Placing the transaction log on the same device as the data itself is
such a common but dangerous reliability problem that both create
database and alter database require the use of the with override option if
you attempt to put the transaction log on the same device as the data
itself. Placing the log on a separate segment:
• Limits log size, which keeps it from competing with other objects
for disk space
• Allows use of threshold management techniques to prevent the
log from filling up and to automate transaction log dumps
• Improves performance, if the log is placed on a separate physical
disk
• Ensures full recovery in the event of hard disk crashes on the data
device, if the log is placed on a separate physical disk
[Figure: the pubtune database with its data on device1 (Disk 1) and its log on device2 (Disk 2).]
The log device can perform significant I/O on systems with heavy
update activity. SQL Server writes log records to syslogs when
transactions commit and may need to read log pages into memory
for deferred updates or transaction rollbacks.
If your log and data are on the same database devices, the extents
allocated to store log pages are not contiguous; log extents and data
extents are mixed. When the log is on its own device, the extents tend
to be allocated sequentially, reducing disk head travel and seeks,
thereby maintaining a higher I/O rate.
If you mirror data, put the mirror on a separate physical disk from
the device that it mirrors. Disk hardware failure often results in
whole physical disks being lost or unavailable. Do not mirror a
database device to another portion of the same physical disk.
[Figure: mirroring device1 and device2. Undesirable: a device mirrored to another portion of the same physical disk. Desirable: each device mirrored to a separate physical disk.]
• Serial mode increases the time required to write data even more
than noserial mode. SQL Server starts the first write and waits
for it to complete before initiating the second write. The time
required is W1+W2.
◆ WARNING!
Unless you are sure that your mirrored database system does not
need to be absolutely reliable, do not use noserial mode.
A System Administrator must initialize the device with disk init, and
the disk must be allocated to the database by the System
Administrator or the database owner with create database or alter
database.
Once the devices are available to the database, the database owner or
object owners can create segments and place objects on the devices.
If you create a user-defined segment, you can place tables or indexes
on that segment with the create table and create index commands:
create table tableA(...) on seg1
create nonclustered index myix on tableB(...)
on seg2
By controlling their location, you can arrange for active tables and
indexes to be spread across disks.
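Segments themselves are created with sp_addsegment. For example, a
database owner might define the segments used above like this (the
segment, database, and device names are illustrative):
sp_addsegment seg1, userdb, device1
sp_addsegment seg2, userdb, device2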
[Figure: spreading objects across segments. TableA is placed on segment1 (device1, Disk1) and its nonclustered indexes on segment2 (device2, Disk2); segment3 spans both devices.]
When a table includes a text or image datatype, the table itself stores
a pointer to the text or image value. The actual text or image data is
stored on a separate linked list of pages. Writing or reading a text
value requires at least two disk accesses, one to read or write the
pointer and subsequent reads or writes for the text values. If your
application frequently reads or writes these values, you can improve
performance by placing the text chain on a separate physical device.
Isolate text and image chains to disks that are not busy with other
application-related table or index access.
When you create a table with a text or image column, SQL Server
creates a row for the text chain in sysindexes. The value in the name
column is the table name prefixed with a “t”; the indid is always 255.
Note that if you have multiple text or image columns in a single table,
there is only one text chain. By default, the text chain is placed on the
same segment with the table.
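For example, you can examine the text-chain row for a table (a
minimal sketch, using the blurbs table for illustration):
select name, indid
from sysindexes
where id = object_id("blurbs")
and indid = 255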
You can use sp_placeobject to move all future allocations for the text
columns to a separate segment. See “Placing Text Pages on a Separate
Device” on page 16-15 for more information.
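A sketch of the call, assuming a segment named textseg has already
been created, and using the “t”-prefixed text chain name described
above:
sp_placeobject textseg, "blurbs.tblurbs"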
If the current last page becomes full, SQL Server allocates and links a
new last page.
The single page chain model works well for tables that have modest
insert activity. However, as multiple transactions attempt to insert
data into the table at the same time, performance problems can occur.
Only one transaction at a time can obtain an exclusive lock on the last
page, so other concurrent insert transactions block, as shown in
Figure 13-12.
[Figure 13-12: concurrent inserts contending for the last page of a heap table; other inserts block until transaction A releases its exclusive lock. Case 1: I/O performance not addressed. Case 2: better I/O performance, with insert transactions spread across partitions.]
Heap tables that have large amounts of concurrent insert activity will
benefit from partitioning. Partitioning can also reduce I/O
contention for certain tables, as discussed under “How Partitions
Address I/O Contention” on page 13-14.
You can partition tables that contain data or tables that are empty. For
best performance, partition a table before inserting data.
Partitioned tables require slightly more disk space than
unpartitioned tables, since SQL Server reserves a dedicated control
page for each partition. If you create 30 partitions for a table, SQL
Server immediately allocates 30 control pages for the table, which
cannot be used for storing data.
Restrictions
Prior to release 11.0, all of a heap’s data was inserted at the end of a
single page chain. This meant that a cursor scan of a heap table could
read all data up to and including the final insertion made to that
table, even if insertions took place after the cursor scan started.
With release 11.0, data can be inserted into one of many page chains
of a partitioned table. The physical insertion point may be before or
after the current position of a cursor scan. This means that a cursor
scan against a partitioned table is not guaranteed to scan the final
inserts made to that table; the physical location of the insert is
unknown.
If your cursor operations require all inserts to be made at the end of
a single page chain, do not partition the table used in the cursor scan.
Partitioning Tables
The syntax for using the partition clause to alter table is:
alter table table_name partition n
where table_name is the name of the table and n is the number of
partitions (page chains) to create.
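For example, to create ten page chains for a heap table named sales
(the table name is illustrative):
alter table sales partition 10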
➤ Note
You cannot include the alter table...partition command in a user-defined
transaction.
Any data that was in the table before invoking alter table remains in
the first partition. Partitioning a table does not move the table’s data.
If a partition runs out of space on the device to which it is assigned,
it will try to allocate space from any device in the table’s segment.
This behavior is called page stealing.
After you partition the table, SQL Server randomly assigns each
insert transaction (including internal transactions) to one of the
table’s partitions. Once a transaction is assigned to a partition, all
insert statements within that transaction go to the same partition.
You cannot assign transactions to specific partitions.
SQL Server manages partitioned tables transparently to users and
applications. Partitioned tables appear to have a single page chain
when queried or when viewed with most utilities.
➤ Note
Partitioning or unpartitioning a table does not affect the sysindexes rows for
that table’s nonclustered indexes. (The indid values for these rows are
greater than 1.) The root values for the table’s nonclustered indexes still point to
the root page of each index, since the indexes themselves are not
partitioned.
The dbcc checktable and dbcc checkdb commands show the number of
data pages in each of a table’s partitions. See Chapter 17, “Checking
Database Consistency,” in the System Administration Guide for
information about dbcc.
Unpartitioning Tables
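To concatenate a table’s partitions back into a single page chain, use
the unpartition clause of alter table (shown here for the illustrative sales
table):
alter table sales unpartition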
The default SQL Server configuration works well for most servers
that use partitioned tables. If you require very large numbers of
partitions, you may want to change the default values for the partition
groups and partition spinlock ratio configuration parameters. See Chapter
11, “Setting Configuration Parameters,” in the System Administration
Guide for more information.
What Is tempdb?
tempdb is a database that is used by all users of SQL Server. Anyone
can create objects in tempdb. Many processes use it silently. It is a
server-wide resource that is used primarily for:
• Internal processing of sorts, creating worktables, reformatting,
and so on
• Storing temporary tables and indexes created by users
Many applications use stored procedures that create tables in tempdb
to expedite complex joins or to perform other complex data analysis
that is not easily performed in a single step.
You can create truly temporary tables by using “#” as the first
character of the table name:
create table #temptable (...)
or:
select select_list
into #temptable ...
Temporary tables:
• Exist only for the duration of the user session or for the scope of
the procedure that creates them
• Cannot be shared between user connections
• Are automatically dropped at the end of the session or procedure
(or can be dropped manually)
When you create indexes on temporary tables, the indexes are stored
in tempdb:
create index tempix on #temptable(col1)
Worktables
Worktables are created in tempdb by SQL Server for sorts and other
internal server processes. These tables:
• Are never shared
• Disappear as soon as the command completes
[Figure: at installation, tempdb (2MB, data and log) resides on the master device, d_master.]
Use sp_helpdb to see the size and status of tempdb. The following
example shows tempdb defaults at installation time:
1> sp_helpdb tempdb
name db_size owner dbid created status
--------- -------- ------ ------ ----------- --------------------
tempdb 2.0 MB sa 2 May 22, 1995 select into/bulkcopy
Sizing tempdb
tempdb needs to be big enough to handle the following processes for
every concurrent SQL Server user:
• Internal sorts
• Other internal worktables that are created for distinct, group by, and
order by, for reformatting and for the OR strategy
• Temporary tables (those created with “#” as the first character of
their names)
• Indexes on temporary tables
• Regular user tables in tempdb
• Procedures built by dynamic SQL
Some applications may perform better if you use temporary tables to
split up multi-table joins. This strategy is often used for:
• Cases where the optimizer does not choose a good query plan for
a query that joins more than four tables
• Queries that exceed the 16-table join limit
• Very complex queries
To estimate the correct size for tempdb, you need the following
information:
• Maximum number of concurrent user processes (an application
may require more than one process)
• Size of sorts, as reported by set statistics io writes, for queries with
order by clauses that are not supported by an index
• Size of worktables, as reported by set statistics io writes, for
reformatting, group by, distinct, and the OR strategy (but not for
sorts)
• Number of steps in the query plans for reformatting, group by, and
so on, which indicates the number of temporary tables created
• Number of local and remote stored procedures and/or user
sessions that create temporary tables and indexes
• Size of temporary tables and indexes, as reported by statistics io
• Number of temporary tables and indexes created per stored
procedure
Sizing Formula
Add the space required for internal processing to the space required
for temporary tables, and then add 25 percent as a margin of error:
    Processing
  + Temp tables
  = Estimate
  * 1.25
  = Final estimate
For example:
    Processing        8.2MB
  + Temp tables      16MB
  = Estimate         24.2MB
  * 1.25
  = Final estimate   30MB
Placing tempdb
Keep tempdb on separate physical disks from your critical application
databases at all costs. Use the fastest disks available. If your platform
supports solid state devices and your tempdb use is a bottleneck for
your applications, use them.
These are the principles to apply when deciding where to place
tempdb. Note that the pages in tempdb should be as contiguous as
possible because of its dynamic nature.
• Expand tempdb on the same device as the master database. If the
original logical device is completely full, you can initialize
another database (logical) device on the same physical device,
and expand tempdb onto it.
It is not a good idea to have tempdb span disks. If you do, your
temporary tables or worktables will span disk media, and this will
definitely slow things down. It is better for tempdb to have a single,
contiguous allocation.
When you create and populate temporary tables in tempdb, use the
select into command, rather than create table and insert...select whenever
possible. The select into/bulkcopy database option is turned on by
default in tempdb to enable this behavior.
select into operations are faster because they are only minimally
logged. Only the allocation of data pages is tracked, not the actual
changes for each data row. Each data insert in an insert...select query is
fully logged, resulting in more overhead.
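For example, these two ways of building the same temporary table
differ sharply in logging overhead (a sketch, using the pubs2 titles
table for illustration):
/* fully logged: every inserted row is written to the log */
create table #hightitles (title_id varchar(6), total_sales int)
insert #hightitles
select title_id, total_sales
from titles
where total_sales > 5000
/* minimally logged: only page allocations are logged */
select title_id, total_sales
into #hightitles2
from titles
where total_sales > 5000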
To give the optimizer accurate information about a temporary table,
you can separate the creation and indexing of the table from the
access to it by using more than one procedure or batch.
[Figure: query processing steps: parse and normalize, compile, execute, and return results.]
When you create a table in the same stored procedure or batch where
it is used, the query optimizer cannot determine how large the table
is, since the work of creating the table has not been performed at the
time the query is optimized. This applies to temporary tables and to
regular user tables.
The optimizer assumes that any such table has 10 data pages and 100
rows. If the table is really large, this assumption can lead the
optimizer to choose a suboptimal query plan.
These two techniques can improve the optimization of temporary
tables:
• Creating indexes on temporary tables
• Breaking complex uses of temporary tables into multiple batches
or procedures to provide information for the optimizer
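A minimal sketch of the second technique, using illustrative names:
the inner procedure is compiled when it is first executed, after the
outer procedure has created and populated the table, so the
optimizer sees the table’s actual size. The temporary table must exist
when the inner procedure is created:
/* create the table so that select_proc can be created */
create table #huge_result (type char(12), total_sales int)
go
create procedure select_proc as
    select type, sum(total_sales)
    from #huge_result
    group by type
go
drop table #huge_result
go
/* the outer procedure creates and fills the table,
   then calls the inner procedure */
create procedure base_proc as
    select type, total_sales
    into #huge_result
    from titles
    exec select_proc
go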
Memory Fundamentals
Having ample memory reduces disk I/O, which improves
performance, since memory access is much faster than disk access.
When a user issues a query, the data and index pages must be in
memory, or read into memory, in order to examine the values on
them. If the pages already reside in memory, SQL Server does not
need to perform disk I/O.
Adding more memory is cheap and easy, but developing around
memory problems is expensive. Give SQL Server as much memory
as possible.
[Figure: how SQL Server uses memory. Physical memory holds the kernel and SQL Server internal structures; the remainder of SQL Server memory, up to the configured size, is divided between the procedure cache and the data cache.]
[Figure: the procedure cache as an MRU/LRU chain of query plans, with a plan such as myproc aging from the MRU end toward the LRU end.]
The memory allocated for the procedure cache holds the optimized
query plans (and occasionally trees) for all batches, including any
triggers.
If more than one user uses a procedure or trigger simultaneously,
there will be multiple copies of it in cache. If the procedure cache is
too small, users trying to execute stored procedures or queries that
fire triggers receive an error message, and have to resubmit the
query. Space becomes available when unused plans age out of the
cache.
[Figure 15-3: Effect of increasing procedure cache size on the data cache]
When you first install SQL Server, the default procedure cache size is
configured as 20 percent of memory that remains after other memory
needs have been met. The optimum value for procedure cache varies
from application to application, and it may also vary as usage
patterns change throughout the day, week, or month. The
configuration parameter to set the size, procedure cache percent, is
documented in Chapter 11 of the System Administration Guide.
When SQL Server is started, the error log states how much procedure
cache is available.
[Error log excerpt: the numbers of proc buffers and proc headers allocated determine the maximum number of procedures in cache.]
How big should the procedure cache be? On a production server, you
want to minimize the procedure reads from disk. When users need to
execute a procedure, SQL Server should be able to find an unused
tree or plan in the procedure cache for the most common procedures.
The percentage of times the server finds an available plan in cache is
called the cache hit ratio. Keeping a high cache hit ratio for
procedures in cache improves performance.
The formula in Figure 15-5 makes a good starting point:
Procedure cache size = (Max # of concurrent users) * (Size of largest plan) * 1.25
When you first install SQL Server, it has a single data cache which is
used by all SQL Server processes and objects for data, index, and log
pages.
The following pages describe the way this single data cache is used.
“Named Data Caches” on page 15-12 describes how to improve
performance by dividing the data cache into named caches, and how
to bind particular objects to these named caches. Most of the
concepts presented here apply to user-defined named caches as well.
If the cache is smaller than the total number of used pages, there is a
chance that a given statement will have to perform disk I/O. A cache
does not reduce the maximum possible response time, but it does
decrease the likelihood that the maximum delay will be suffered by a
particular process.
[Figure: data cache behavior over time. The cache fills, pages begin aging, and dirty pages reach a steady state as checkpoints occur; average response time stabilizes.]
To see the cache hit ratio for a single query, use set statistics io to see the
number of logical and physical reads, and set showplan on to see the
I/O size used by the query.
To compute the cache hit ratio, use the formula in Figure 15-8:
Cache hit ratio = (Logical reads - Physical reads) / Logical reads
The sp_sysmon system procedure reports on cache hits and misses for:
• All caches on SQL Server
• The default data cache
• Any user-configured caches
The server-wide report provides the total number of cache searches
and the percentage of hits and misses. See “Cache Statistics
Summary (All Caches)” on page 19-50.
For each cache, the report contains the search, hit and miss statistics
and also reports on the number of times that a needed buffer was
found in the wash section. See “Cache Management By Cache” on
page 19-54.
• You can configure a cache large enough to hold the critical tables
and indexes used by your applications. This keeps the most
frequently used pages in cache, with some space for the less
frequently used pages.
• You can assign tables or databases used in decision support (DSS)
to specific caches with large I/O configured. This keeps DSS
applications from contending for cache space with online
transaction processing (OLTP) applications. DSS applications
typically access large numbers of sequential pages, and OLTP
applications typically access relatively few random pages.
• You can bind tempdb to its own cache. All processes that create
worktables or temporary tables use tempdb, so binding it to its
own cache keeps its cache use from contending with other user
processes. Proper sizing of tempdb’s cache can keep most tempdb
activity in memory for many applications. If this cache is large
enough, tempdb activity can avoid performing I/O.
• You can bind a database’s log to a cache, again reducing
contention for cache space and access to the cache.
Most of these possible uses for named data caches have the greatest
impact on multiprocessor systems with high transaction rates or
frequent DSS queries and multiple users. Some of them can increase
performance on single CPU systems when they lead to improved
utilization of memory and reduce I/O.
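For example, tempdb might be bound to its own named cache (the
cache name and size are illustrative; a newly created cache is
available only after a server restart):
sp_cacheconfig "tempdb_cache", "10M"
sp_bindcache "tempdb_cache", tempdb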
You can configure the default cache and any named caches you
create for large I/O by splitting a cache into pools. The default I/O
size is 2K, one SQL Server data page. For queries where pages are
stored sequentially and accessed sequentially, you can read up to
eight data pages in a single I/O. Since the majority of I/O time is
spent doing physical positioning and seeking on the disk, large I/O
can greatly reduce disk access time.
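For example, part of the default data cache might be moved from the
2K pool into a 16K pool (the size is illustrative):
sp_poolconfig "default data cache", "8M", "16K"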
Large I/O can increase performance for:
• Queries that table scan, both single-table queries and queries that
perform joins
• Queries that scan the leaf level of a nonclustered index
• Queries that use text or image data
• Queries that allocate several pages, such as select into
• Bulk copy operations on heaps, both copy in and copy out
• The update statistics command, dbcc checktable, and dbcc checkdb
When a cache is configured for 16K I/O and the optimizer chooses
16K I/O for the query plan, SQL Server reads an entire extent, eight
2K data pages, when it needs to access a page that is not in cache.
There are some occasions when 16K I/O cannot be performed. See
“When prefetch Specification Is Not Followed” on page 9-11.
Certain types of SQL Server queries are likely to benefit from large
I/Os. Identifying these types of queries can help you determine the
correct size for data caches and memory pools.
In the following examples, the database or the specific table, index or
text and image page chain must be bound to a named data cache that
has large memory pools, or the default data cache must have large
I/O pools. Most of the queries shown here use fetch and discard
(MRU) replacement strategy. Types of queries that can benefit from
large I/O are:
• Queries that scan entire tables, either heap tables or tables with
clustered indexes:
select title_id, price from titles
select count(*) from authors
where state = "CA" /* no index on state */
• Range queries on tables with clustered indexes. These include
queries like:
where indexed_colname < value
where indexed_colname > value
where indexed_colname between value1 and value2
where indexed_colname > value1
and indexed_colname < value2
where indexed_colname like "string%"
• Queries that scan the leaf level of a nonclustered index, both
matching and nonmatching scans. If there is a nonclustered index
on type, price, this query could use large I/O on the leaf level of
the index, since all the columns used in the query are contained
in the index:
select type, sum(price)
from titles
group by type
• Queries that select text or image columns:
select au_id, copy from blurbs
Figure 15-9: Caching strategies joining a large table and a small table
You can configure up to 4 pools in any data cache, but in most cases,
caches for individual objects will perform best with only a 2K pool
and a 16K pool. Caches for databases where the log is not bound to a
separate cache should also have a 4K pool configured for syslogs if 4K
log I/O size is configured for the database.
Pages can be linked into a cache at two locations: at the head of the
MRU/LRU chain in the pool, or at the pool’s wash marker. The SQL
Server optimizer chooses the cache replacement strategy, unless the
strategy is specified in the query. The two strategies are:
• “LRU replacement strategy” replaces a least-recently used page,
linking the newly read page or pages at the beginning of the page
chain in the pool.
• “Fetch-and-discard” strategy or “MRU replacement strategy”
links the newly read buffers at the wash marker in the pool.
Cache replacement strategies can affect the cache hit ratio for your
query mix:
• Pages that are read into cache with the fetch-and-discard strategy
remain in cache a much shorter time than pages read in at the
MRU end of the cache. If such a page is needed again, for example
if the same query is run again very soon, the pages will probably
need to be read from disk again.
• Pages that are read into cache with the fetch-and-discard strategy
do not displace pages that already reside in cache before the wash
marker. This means that pages before the wash marker are much
more likely to still be in cache when they are needed again.
By the time SQL Server has optimized a query and needs to access
data pages, it:
• Has a good estimate of the number of pages it needs to read for
each table
• Knows the size of the data cache(s) available to the tables and
indexes in the query and the I/O size available for the cache(s),
and has used this information to incorporate the I/O size and
cache strategy into the query plan
• Has determined whether the data will be accessed via a table
scan, clustered index access, nonclustered index, or other
optimizer strategy
• Has determined which cache strategy to use for each table and
index
The optimizer’s knowledge is limited, though, to the single query it
is analyzing, and to certain statistics about the table and cache. It
does not have information about how many other queries are
simultaneously using the same data cache, and it has no statistics on
whether table storage is fragmented in such a way that large I/Os
would be less effective. This combination of factors can lead to
excessive I/O in some cases. For example, users may experience
higher I/O and poor performance if many queries with large result
sets are using a very small memory pool.
Command Function
sp_cacheconfig Creates or drops named caches and changes the size or
cache type. Reports on sizes of caches and pools.
sp_poolconfig Creates and drops I/O pools and changes their size.
sp_bindcache Binds databases or database objects to a cache.
sp_unbindcache Unbinds specific objects or databases from a cache.
sp_unbindcache_all Unbinds all objects bound to a specified cache.
sp_helpcache Reports summary information about data caches and lists
the databases and database objects that are bound to a
cache. Also reports on the amount of overhead required
by a cache.
sp_sysmon Reports statistics useful for tuning cache configuration,
including cache spinlock contention, cache utilization, and
disk I/O patterns.
You can affect the I/O size and cache strategy for select, delete, and
update commands. These options are described in Chapter 9,
“Advanced Optimizing Techniques.”
• For information about specifying the I/O size, see “Specifying
I/O Size in a Query” on page 9-9.
• For information about specifying cache strategy, see “Specifying
the Cache Strategy” on page 9-12.
[Figure: distributing I/O across devices. salesdb, with its log and a segment holding one table, accounts for 70% of I/O; saleshistorydb, with its log, accounts for 20%; the master device, with tempdb also using a second disk, accounts for 10%.]
Creating caches for tempdb, the transaction logs, and for a few tables
or indexes that you want to keep completely in cache can reduce
cache spinlock contention and improve cache hit ratios.
When users perform operations that require logging, log records are
first stored in a “user log cache” until certain events flush the user’s
log records to the current transaction log page in cache. Log records
are flushed when a transaction ends, when the log page is full, when
the transaction changes tables in another database, at certain system
events, and when another process needs to write a page referenced in
the user log cache.
To economize on disk writes, SQL Server holds partially filled
transaction log pages for a very brief span of time so that records of
several transactions can be written to disk simultaneously. This
process is called “group commit.”
In environments with high transaction rates or transactions that
create large log records, the 2K transaction log pages fill quickly, and
a large proportion of log writes are due to full log pages, rather than
group commits. Creating a 4K pool for the transaction log can greatly
reduce log writes in these environments.
sp_sysmon reports on the ratio of transaction log writes to transaction
log allocations. You should try using 4K log I/O if all of these
conditions are true:
• Your database is using 2K log I/O
• The number of log writes per second is high
• The average number of writes per log page is slightly above one
Here is some sample output showing that a larger log I/O size might
help performance:
per sec per xact count % of total
Transaction Log Writes 22.5 458.0 1374 n/a
Transaction Log Alloc 20.8 423.0 1269 n/a
Avg # Writes per Log Page n/a n/a 1.08274 n/a
See “Transaction Log Writes” on page 19-32 for more information.
To check the log I/O size for a database, you can check the server’s
error log. The size of I/O for each database is printed in the error log
when SQL Server starts. You can also use the sp_logiosize system
procedure. To see the size for the current database, execute
sp_logiosize with no parameters. To see the size for all databases on the
server and the cache in use by the log, use:
sp_logiosize "all"
To set the log I/O size for a database to 4K, the default, you must be
using the database. This command sets the size to 4K:
sp_logiosize "default"
By default, SQL Server sets the log I/O size for user databases to 4K.
If no 4K pool is available in the cache that the log uses, 2K I/O is
automatically used instead.
If a database is bound to a cache, all objects not explicitly bound to
other caches use the database’s cache. This includes the syslogs table.
In order to bind syslogs to another cache, you must first put the
database in single user mode with sp_dboption, and then use the
database and execute sp_bindcache. Here is an example:
sp_bindcache pubs_log, pubtune, syslogs
For further tuning after configuring a cache for the log, check
sp_sysmon output. Look at output for:
• The cache used by log (the cache it is explicitly bound to, or the
cache that its database uses)
• The disk that the log is stored on
• The average number of writes per log page
When looking at the log cache section, check “Cache Hits” and
“Cache Misses” to determine whether most of the pages needed for
deferred operations, triggers and rollbacks are being found in cache.
In the “Disk Activity Detail” section, look at the number of “Reads”
performed.
When you choose to divide a cache for tables and/or indexes into
pools, try to base the division on the proportion of I/O performed
by the queries that use the corresponding I/O sizes. If
most of your queries can benefit from 16K I/O, and you configure a
very small 16K cache, you may actually see worse performance.
Most of the space in the 2K pool will remain unused, and the 16K
pool will experience high turnover. The cache hit ratio will be
significantly reduced. The problem will be most severe with join
queries that have to repeatedly re-read the inner table from disk.
Making a good choice about pool sizes requires:
• A thorough knowledge of the application mix and the I/O size
your queries can use
• Careful study and tuning, using monitoring tools to check cache
utilization, cache hit rates, and disk I/O
You can examine query plans and I/O statistics to determine those
queries that are likely to perform large I/O and the amount of I/O
these queries perform. This information can form the basis for
estimating the amount of 16K I/O the queries should perform with a
16K memory pool. For example, a query that table scans and
performs 800 physical I/Os using a 2K pool should perform about
100 16K I/Os. See “Types of Queries That Can Benefit From Large
I/O” on page 15-14 for a list of types.
To test out your estimates, however, you need to actually configure
the pools and run the individual queries and your target mix of
queries to determine optimum pool sizes. Choosing a good initial
size for your first test using 16K I/O depends on a good sense of the
types of queries in your application mix. This estimate is especially
important if you are configuring a 16K pool for the first time on an
active production server. Make the best possible estimate of
simultaneous uses of the cache.
The wash area for each pool in each cache is configurable. If the wash
size is set too high, SQL Server may perform unnecessary writes. If
the wash area is too small, SQL Server may not be able to find a clean
buffer at the end of the buffer chain and may have to wait for I/O to
complete before it can proceed. Generally, wash size defaults are
correct, and only need to be adjusted in large pools with very high
rates of data modification. See “Changing the Wash Area for a
Memory Pool” on page 9-18 of the System Administration Guide for
more information.
When you bind or unbind an object, all of the object’s pages that are
currently in the cache are flushed to disk (if dirty) or dropped from
the cache (if clean) during the binding process. The next time the
pages are needed by user queries, they must be read from the disk
again, slowing the performance of the queries.
SQL Server acquires an exclusive lock on the table or index while the
cache is being cleared, so binding can slow other users of the object.
The binding process may have to wait for transactions to
complete in order to acquire the lock.
➤ Note
The fact that binding and unbinding objects from caches removes them
from memory can be useful when tuning queries during development and
testing. If you need to check physical I/O for a particular table, and earlier
tuning efforts have brought pages into cache, you can unbind and rebind
the object. The next time the table is accessed, all pages used by the query
must be read into the cache.
The plans of all stored procedures and triggers using the bound
objects are recompiled the next time they are run. If a database is
bound to the cache, this affects all the objects in the database.
For example, if a table has 624 data pages, and the cache is
configured for 16K I/O, SQL Server reads 8 pages per I/O. Dividing
624 by 8 equals 78 I/Os. If a table scan that performs large I/O
performs significantly more I/O than the optimum, you should
explore the causes.
There are several reasons why a query that performs large I/O might
require more reads than you anticipate:
• The cache used by the query has a 2K pool and many other
processes have brought pages from the table into the 2K pool. If
SQL Server is performing 16K I/O and finds that one of the pages
it needs to read is already in the 2K pool, it performs 2K I/O on
all of the other pages in the extent.
• The first extent on each allocation unit stores the allocation page,
so if a query needs to access all 255 pages in the allocation unit, it
must perform 2K I/O on the 7 pages that share the extent with the
allocation page. The other 31 extents can be read using 16K I/O.
So, the minimum number of reads for an entire allocation unit is
always 38, not 32.
• In nonclustered indexes, an extent may store both leaf-level pages
and pages from higher levels of the index. Regular index access,
finding pages by starting from the root and following index
pointers, always performs 2K I/O, so it is likely that some of
these pages will be in the 2K pool during index-level scans. The
rest of the pages in the extent will therefore be read using 2K
I/O. Note that this applies only to nonclustered indexes and their
leaf pages; it does not apply to clustered index pages and data
pages, which are always on separate extents.
• The table storage is fragmented, due to page-chain pointers that
cross extent boundaries and allocation pages. Figure 15-11 shows
a table that has become fragmented.
[Figure 15-11: a fragmented table whose pages span extents 40, 48, 56, 64, 72, and 80, with next/previous page pointers crossing extent boundaries.]
The steps that lead to the fragmentation shown in Figure 15-11 are as
follows:
1. Table is loaded. The gray boxes indicate the original pages of the
table.
2. First additional page is allocated for inserts, indicated by the first
heavily striped box.
3. Deletes cause page 3 of the table, located in extent 40, to be
deallocated.
4. Another page is needed, page 2 is allocated and linked into the
page chain, as shown by the lightly striped box.
5. Two more pages are allocated, as shown by the other two
heavily striped boxes.
Instead of 5 reads using 16K I/O with the MRU strategy (because the
table occupies 5 extents), the query performs 7 I/Os. The query reads
the pages by following the page pointers, so it:
• Performs a 16K I/O to read extent 40, and performs logical I/O
on pages 1, 2, and 4-8, skipping page 3.
• Performs physical I/O on extents 48, 56, and 64, in turn, followed
by logical I/O on the pages of each.
• Follows the pointer on the second-to-last page in extent 64 back to
page 3. In this small table, of course, it is extremely likely that
extent 40 is still in the 16K pool. The query examines page 3,
which then points back to a page in extent 64.
• The last page in extent 64 points to extent 72.
With a small table, the pages would still be in the data cache, so there
would be no extra physical I/O. But when the same kind of
fragmentation occurs in large tables, the I/O required rises,
especially if a large number of users are performing queries with
large I/O that could flush buffers out of the cache. One way to slow
fragmentation is to leave free space on pages by using fillfactor. This
example sets fillfactor to 80:
create unique clustered index title_id_ix
on titles(title_id)
with fillfactor = 80
The sp_sysmon output for each data cache includes information that
can help you determine the effectiveness for large I/Os:
• “Large I/O Usage” on page 19-60 reports the number of large
I/Os performed and denied, and provides summary statistics.
• “Large I/O Detail” on page 19-61 reports the total number of
pages that were read into the cache by a large I/O, and the
number of pages that were actually accessed while in the cache.
To eliminate fragmentation:
• For clustered indexes, drop and re-create the clustered index. All
nonclustered indexes will be re-created automatically.
• For covering nonclustered indexes, drop and re-create the index.
For clustered indexes and nonclustered indexes on tables that will
continue to receive updates, using a fillfactor to spread the data
slightly should slow fragmentation. This is described in the next
section. Fillfactor does not apply to heap tables.
Speed of Recovery
As users modify data in SQL Server, only the transaction log is
written to disk immediately, in order to ensure recoverability. The
changed or “dirty” data and index pages stay in the data cache until
one of these events causes them to be written to disk:
• The checkpoint process wakes up, determines that the changed
data and index pages for a particular database need to be written
to disk, and writes out all dirty pages in each cache used by the
database. The combination of the setting for recovery interval and
the rate of data modifications on your server determines how
often the checkpoint process writes changed pages to disk.
• As pages move down the MRU/LRU chain in the cache, they
move into the buffer wash area of the cache, where dirty pages
are automatically written to disk.
• SQL Server has spare CPU cycles and disk I/O capacity between
user transactions, and the housekeeper task uses this time to
write dirty buffers to disk.
• A user issues a checkpoint command.
This combination of write strategies has two major benefits:
• Many transactions may change a page in the cache or read the
page in the cache, but only one physical write is performed.
• SQL Server performs many physical writes at times when the I/O
does not cause contention with user processes.
The size of the audit queue can be set by a System Security Officer.
The default configuration is:
• A single audit record requires a minimum of 22 bytes, up to a
maximum of 424 bytes. This means that a single data page stores
between 4 and 80 records.
• The default size of the audit queue is 100 records, requiring
approximately 42K. The minimum size of the queue is 1 record,
the maximum size is 65535 records.
There are trade-offs in sizing the audit queue. If the audit queue is
large, so that you do not risk having user processes sleep, you run the
risk of losing any audit records in memory if there is a system failure.
The maximum number of records that can be lost is the size of the
audit queue. If security is your chief concern, keep the queue small.
If you can risk losing audit records and require high performance,
make the queue larger.
Increasing the size of the in-memory audit queue takes memory from
the total memory allocated to the data cache.
[Figure: audit records pass through the in-memory audit queue, of configurable size, before being written to sysaudits.]
• Choose the events that you audit. Heavy auditing slows overall
system performance. Audit what you need, and only what you
need.
• If possible, place sysaudits on its own device. If that is not
possible, place it on a device that is not used for your most
critical applications.
Techniques Summary
There is always a point at which increasing the packet size will not
improve performance, and in fact it may decrease performance,
because the packets are not always full. Although there are analytical
methods for predicting that point, it is more common to vary the size
experimentally and plot the results. If such experiments are
conducted over a period of time and conditions, a packet size that
works well for a lot of processes can be determined. However, since
the packet size can be customized for every connection, specific
experiments for specific processes can be beneficial.
[Figure: transfer time plotted against packet size; beyond an optimal size, larger packets no longer reduce transfer time.]
UNIX:
isql -Asize
bcp -Asize
Novell NetWare:
load isql -Asize
load bcp -Asize
VMS:
isql /tdspacketsize = size
bcp /tdspacketsize = size
For Open Client Client-Library™, use:
ct_con_props(connection, CS_SET, CS_PACKETSIZE,
    &packetsize, sizeof(packetsize), NULL);
Applications should request only the rows and columns they need,
filtering as much data as possible at the server. In many cases, this
can also reduce the disk I/O load. For example, avoid sending
queries like this one, which returns every column of every row:
select *
from view_a
Large Transfers
Type Characteristics
Token ring Token ring hardware responds better than Ethernet hardware
during periods of heavy use.
Fiber optic Fiber-optic hardware provides very high bandwidth, but is
usually too expensive to use throughout an entire network.
Separate network A separate network can be used to handle network traffic
between the highest volume workstations and SQL Server.
Network Overload
Login Protocol
You must take the presence of other users into consideration before
trying to solve a database problem, especially if those users are using
the same network. Since most networks can transfer only one packet
at a time, many users may be delayed while a large transfer is in
progress. Such a delay may cause locks to be held longer, which
causes even more delays. When response time is “abnormally” high,
and normal tests indicate no problem, it could be because of other
users on the same network. In such cases, ask the user when the
process was being run, if the operating system was generally
sluggish, if other users were doing large transfers, and so on. In
general, consider multi-user impacts before digging deeper into the
database system to solve an abnormal response time problem.
• Configure max network packet size and additional network memory just for
the applications that need it.
[Figure: before and after adding a second network card to a server machine. After: clients accessing Server A and clients accessing Server B use two separate network cards.]
Use two (or more) ports listening for a single SQL Server. Front-end
software may be directed to any configured network ports by setting
the DSQUERY environment variable.
Using multiple network ports spreads out the network load and
eliminates or reduces network bottlenecks, thus increasing SQL
Server throughput.
[Figure: SQL Server SMP architecture. Clients connect to multiple engines, each RUNNING as a process on the operating system and sharing one executable. Shared memory holds the data and procedure caches, locks, and sleep queues; the engines perform disk and network I/O.]
➤ Note
Before measuring CPU usage, disable the housekeeper task to eliminate
its effect on these measurements.
Use sp_monitor to see the percentage of time SQL Server uses the CPU
during an elapsed time interval.
Using sp_sysmon
The “Kernel Utilization” section displays how busy the engine was
during the sample period. The percentage in this output is based on
the time that the CPU was allocated to SQL Server; it is not a
percentage of the total sample interval.
The “CPU Yields by Engine” section displays information about how
often the engine yielded to the operating system during the interval.
When you measure the CPU usage for SQL Server using operating
system utilities, note that the percentage of time SQL Server uses the
CPU during an elapsed time interval reflects the processing
performed across all of the CPUs available to SQL Server.
Engine CPU
0      2 (the start_cpu number specified)
1      3
2      0
3      1
➤ Note
The housekeeper task does not improve performance for read-only caches
or for data that fits entirely within a cache.
If the housekeeper task can flush all active buffer pools in all
configured caches, it wakes up the checkpoint task. The checkpoint
task determines whether it can checkpoint the database. The
additional checkpoints that occur as a result of the housekeeper
process may improve recovery speed for the database.
In applications that repeatedly update the same database page, the
housekeeper task may initiate some database writes that are not
necessary. Although these writes occur only during the server’s idle
cycles, they may be unacceptable on systems with overloaded disks.
Multiple Indexes
Managing Disks
You may need to adjust the fillfactor in create index commands. Because
of the added throughput with multiple processors, setting a lower
fillfactor may temporarily reduce contention for the data and index
pages.
Setting max_rows_per_page
The use of fillfactor places fewer rows on data and index pages when
the index is created, but the fillfactor is not maintained. Over time, data
modifications can increase the number of rows on a page.
For tables and indexes that experience contention, max_rows_per_page
provides a permanent means to limit the number of rows on data and
index pages.
The sp_helpindex system procedure reports the current
max_rows_per_page setting of indexes. Use the sp_chgattribute system
procedure to change the max_rows_per_page setting.
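For example, to limit the rows per page on the titles table (the value
is illustrative):
sp_chgattribute titles, "max_rows_per_page", 40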
Setting max_rows_per_page to a lower value does not reduce index
splitting, and, in most cases, increases the number of index page
splits. It can help reduce other lock contention on index pages. If
your problem is index page splitting, careful choice of fillfactor is a
better option.
Transaction Length
Temporary Tables
➤ Note
When create database copies model, it uses 2K I/O.
A single set of six buffers is available for large I/O by create database,
alter database, dbcc checkalloc, and the portion of load database that zeros
pages. If all six buffers are in use when another process issues one of
these commands, the second command performs 2K I/O.
Creating Indexes
Creating indexes affects performance by locking other users out of a
table. The type of lock depends on the index type:
• Creating a clustered index requires an exclusive table lock,
locking out all table activity. Since rows in a clustered index are
arranged in order by the index key, create clustered index reorders
data pages.
• Creating a nonclustered index requires a shared table lock,
locking out update activity.
If you do not configure number of extent i/o buffers, SQL Server performs
2K I/O while it creates indexes. This parameter allows SQL Server to
use 16K buffers for reading and writing intermediate and final
results. Each buffer you configure requires 16K of memory.
Configuring number of extent i/o buffers has these impacts:
• Increasing this parameter decreases the memory available for the
procedure and data caches.
• Only one user at a time can use extent I/O buffers when creating
an index. Other users who start create index commands are
restricted to page I/O.
• Setting number of extent I/O buffers to 10 works well with small
configurations.
• Settings above 100 yield only marginal benefits.
If you have ample memory and perform frequent index
maintenance, configure extent I/O buffers on a permanent basis. In
other cases, it makes sense to schedule index maintenance for off-
hours. Then, I/O extents can be allocated for optimum performance.
When the index maintenance is completed, deallocate the extra I/O
extents, and resume normal memory allocations.
➤ Note
You need to shut down and restart SQL Server in order to change the
number of extents allocated.
If you are creating very large indexes at a time when other SQL
Server activity is at a minimum, setting number of sort buffers and sort
page count can greatly increase create index performance. Both of these
configuration parameters are dynamic and use memory from the
default data cache for each sort operation.
◆ WARNING!
If you use these parameters, be sure to dump the database soon after
the index is created to ensure the compatibility of database dumps.
When you create an index, SQL Server writes the create index
transaction and the page allocations to the transaction log, but does
not log the actual changes to the data and index pages. If you need to
recover a database, and you have not dumped it since you created
the index, the entire create index process is executed again while
loading transaction log dumps.
If you perform routine index re-creations (for example, to maintain
the fillfactor in the index), you may want to schedule these operations
at a time shortly before a routine database dump.
If your data has already been sorted and is in the desired clustered
index order, use the with sorted_data option when creating indexes.
This saves the time needed for the actual sort phase.
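For example:
create clustered index title_id_ix
on titles(title_id)
with sorted_data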
➤ Note
The sorted data option still requires space of approximately 120 percent of
the table size to copy the data and store the index pages.
Local Backups
SQL Server sends the local Backup Server instructions, via remote
procedure calls, telling the Backup Server which pages to dump or
load, which backup devices to use, and other options. Backup Server
performs all the disk I/O. SQL Server does not read or send dump
and load data, just instructions.
Remote Backups
Online Backups
If your database has limited log space, and you occasionally hit the
last-chance threshold, install a second threshold that provides
ample time to perform a transaction log dump. Running out of log
space has severe performance impacts. Users cannot execute any
data modification commands until log space has been freed.
You can help minimize recovery time, the time required to reboot
SQL Server, by changing the recovery interval configuration parameter.
The default value of 5 minutes per database works for most
installations. Reduce this value only if functional requirements
dictate a faster recovery period. It can increase the amount of I/O
required. See “Tuning the Recovery Interval” on page 15-35.
Recovery speed may also be affected by the value of the housekeeper
free write percent configuration parameter. The default value of this
parameter allows the server’s housekeeper task to write dirty buffers
to disk during the server’s idle cycles, as long as disk I/O does not
increase by more than 20 percent. See “Configuring the Housekeeper
Task” on page 17-10 for more information on tuning this parameter.
Recovery Order
Bulk Copy
Bulk copy into a table on SQL Server runs fastest when there are no
indexes or triggers on the table. When you are running fast bulk
copy, SQL Server performs reduced logging. It does not log the
actual changes to the database, only the allocation of pages. And,
since there are no indexes to update, it saves all the time updating
indexes for each data insert, and the logging of the changes to the
index pages.
To use fast bulk copy, the select into/bulkcopy option must be set for the
database with sp_dboption. Remember to turn the option off after the
bulk copy operation completes.
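A sketch of enabling the option, using the pubtune database for
illustration (sp_dboption is run from master, and checkpoint makes the
change take effect):
use master
go
sp_dboption pubtune, "select into/bulkcopy", true
go
use pubtune
go
checkpoint
go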
During fast bulk copy, rules are not enforced but defaults are
enforced.
Since changes to the data are not logged, you should perform a dump
database soon after a fast bulk copy operation. Performing a fast bulk
copy in a database blocks the use of dump transaction, since the
unlogged data changes cannot be recovered from the transaction log
dump.
If you specify a batch size during a fast bulk copy, each new batch
must start on a new data page, since only the page allocations, and
not the data changes, are logged during a fast bulk copy. Copying
1000 rows with a batch size of 1 requires 1000 data pages and 1000
allocation records in the transaction log. If you are using a small
batch size to help detect errors in the input file, you may want to
choose a batch size that corresponds to the numbers of rows that fit
on a data page.
If you are replacing all the data in a large table, use the truncate table
command instead of the delete command. truncate table performs
reduced logging. Only the page deallocations are logged. delete is
completely logged, that is, all the changes to the data are logged.
If you are loading data into a table without a clustered index, you can
create partitions on the heap table and split the batch of data into
multiple batches, one for each partition you create. See “Improving
Insert Performance with Partitions” on page 13-12.
Bulk copying large tables in or out may affect other users’ response
time. If possible:
• Schedule bulk copy operations for off-hours.
• Use fast bulk copy, since it does less logging and less I/O.
Introduction
This chapter describes output from sp_sysmon, a system procedure
that produces SQL Server performance data for the following
categories of SQL Server system activities:
• Kernel Utilization 19-8
• Task Management 19-14
• Transaction Profile 19-22
• Transaction Management 19-27
• Index Management 19-32
• Lock Management 19-40
• Data Cache Management 19-46
• Procedure Cache Management 19-61
• Memory Management 19-63
• Recovery Management 19-63
• Disk I/O Management 19-66
• Network I/O Management 19-72
This chapter explains the sp_sysmon report and gives suggestions for
interpreting its output and deducing possible implications. sp_sysmon
output is most valuable when you use it together with a good
understanding of your unique SQL Server environment and its
specific mix of applications. The output has little relevance on its
own.
◆ WARNING!
sp_sysmon and SQL Server Monitor use the same internal counters.
sp_sysmon resets these counters to 0, producing erroneous output for
SQL Server Monitor when it is used simultaneously with sp_sysmon.
Also, starting a second execution of sp_sysmon while an earlier
execution is running clears all of the counters, so the first iteration
reports will be inaccurate.
➤ Note
sp_sysmon will not produce accurate results on pre-11.0 SQL Servers
because many of the internal counters sp_sysmon uses were added in SQL
Server release 11.0. In addition, the uses and meanings of many pre-
existing counters have changed.
Invoking sp_sysmon
To invoke sp_sysmon, execute the following command using isql:
sp_sysmon interval
where interval is an integer time in minutes from 1 to 10.
An sp_sysmon report contains hundreds of lines of output. Use isql
input and output redirect flags to save the output to a file. See the
SQL Server utility programs manual for more information on isql.
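For example, to take a ten-minute sample and save the report (file
names are illustrative):
isql -Usa -Ppassword -i sysmon_in.sql -o sysmon_out.txt
where sysmon_in.sql contains:
sp_sysmon 10
go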
[Figure: how sp_sysmon works. Start, clear counters, waitfor until the interval has elapsed, read counters, print output, stop.]
You can run sp_sysmon both before and after tuning SQL Server
configuration parameters to gather data for comparison. This data
gives you a basis for performance tuning and lets you observe the
results of configuration changes.
Use sp_sysmon when the system exhibits the behavior you want to
investigate. For example, if you are interested in finding out how the
system behaves under typically loaded conditions, run sp_sysmon
when conditions are normal and typically loaded. In this case, it does
not make sense to run sp_sysmon for ten minutes starting at 7:00 pm,
before the batch jobs begin and after most of the day’s OLTP users
have left the site. In this example, it would be a good idea to run
sp_sysmon both during the normal OLTP load and during batch jobs.
In many tests, it is best to start the applications, and then start
sp_sysmon when caches have had a chance to fill. If you are trying to
measure capacity, be sure that the amount of work you give the
server to do keeps it busy for the duration of the test. Many of the
statistics, especially those that measure data per second, can look
extremely low if the server is idle during part of the sample period.
In general, sp_sysmon produces valuable information under the
following circumstances:
• Before and after changing cache configuration or pool
configuration
• Before and after certain sp_configure changes
• Before and after adding new queries to your application mix
• Before and after increasing or reducing the number of SQL Server
engines
• When adding new disk devices and assigning objects to them
• During peak periods, to look for contention
• During stress tests to evaluate a SQL Server configuration for a
maximum expected application load
• When performance seems slow or behaves abnormally
It can also help with micro-level understanding of certain queries or
applications during development. Some examples are:
• When working with indexes and updates, you can see whether
updates reported as deferred_varcol are resulting in direct or
deferred updates.
• Checking the caching behavior of particular queries or a mix of
queries.
[Figure: reading an sp_sysmon report. The rows and columns of the output can point to performance bottlenecks such as logical lock contention, log I/O, and disk queueing.]
Weigh the importance of the per second and per transaction data in
light of your environment and the category you are measuring. The
per transaction data is generally more meaningful in benchmarks or
in test environments where the workload is well defined.
It is likely that you will find per transaction data more meaningful
for comparing test data than per second data alone because in a
benchmark test environment, there is usually a well-defined number
of transactions, making comparison straightforward. Per transaction
data is also useful for determining the validity of percentage results.
In most cases, per engine data for a category will show a fairly even
balance of activity across all engines. There are a few exceptions:
• If you have fewer processes than CPUs, some of the engines will
show no activity.
• If most processes are doing fairly uniform activity, such as simple
inserts and short selects, and one process performs some I/O
intensive operation such as a large bulk copy, you will see
unbalanced network and disk I/O.
Kernel Utilization
“Kernel Utilization” reports on SQL Server activities. It tells you how
busy SQL Server engines were during the time that the CPU was
available to SQL Server, how often the CPU yielded to the operating
system, the number of times that the engines checked for network
and disk I/O, and the average number of I/Os they found waiting at
each check.
Operating system commands to check CPU activity may show high
usage for a SQL Server engine because they measure the looping
activity, while “Engine Busy Utilization” does not include time spent
looping; it is considered idle time.
One measurement that cannot be made from inside SQL Server is the
percentage of time that SQL Server had control of the CPU versus the
time the CPU was in use by the operating system. Check your
operating system documentation for the correct commands.
See “Engine Busy Utilization” on page 19-9 for an explanation of
why operating system commands report different information on
utilization than SQL Server does.
If you want to reduce the time that SQL Server spends checking for
I/O while idle, you can lower the sp_configure parameter runnable
process search count. This parameter specifies the number of times a
SQL Server engine loops looking for a runnable task before yielding
the CPU. For more information, see “runnable process search count”
on page 11-91 of the System Administration Guide.
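A hedged example, lowering the parameter from its usual default of 2000 so that engines yield the CPU sooner (the value shown is illustrative):

sp_configure "runnable process search count", 1000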
“Engine Busy Utilization” measures how busy SQL Server engines
were during the CPU time they were given. If the engine is available
to SQL Server for 80 percent of a ten-minute sample interval, and
“Engine Busy Utilization” was 90 percent, it means that SQL Server
was busy for 7 minutes and 12 seconds and idle for 48 seconds as
Figure 19-3 shows.
[Figure 19-3: How SQL Server spends its available CPU time (7 minutes and 12 seconds busy, 48 seconds idle, of the 8 available minutes).]
This category can help you decide whether there are too many or too
few SQL Server engines. SQL Server’s high scalability is due to
tunable mechanisms that avoid resource contention. By checking
sp_sysmon output for problems and tuning to alleviate contention,
response time can remain acceptable even at “Engine Busy” values in
the 80 to 90 percent range. If values are consistently very high (over 90
percent), it is likely that response time and throughput could benefit
from an additional engine.
The “Engine Busy” values are averages over the sample interval, so
very high averages indicate that engines may be 100 percent busy
during part of the interval. When engine utilization is extremely
high, the housekeeper process writes few or no pages out to disk
(since it runs only during idle CPU cycles). This means that a
checkpoint will find many pages that need to be written to disk, and
the checkpoint process, a large batch job, or a database dump is likely
to send CPU usage to 100 percent for a period of time, causing a
perceptible dip in response time.
If “Engine Busy Utilization” percentages are consistently high, and
you want to improve response time and throughput by adding SQL
Server engines, carefully check for increased resource contention
after adding each engine.
Network Checks
Non-Blocking
Blocking
After a SQL Server engine completes a task, it loops waiting for the
network to deliver a runnable task. After a certain number of loops
(determined by the sp_configure parameter runnable process search count),
the SQL Server engine goes to sleep after a blocking network I/O.
When a SQL Server engine yields to the operating system because
there are no tasks to process, it wakes up once per clock tick to check
for incoming network I/O. If there is I/O, the operating system
blocks the engine from active processing until the I/O completes.
If a SQL Server engine has yielded and is doing blocking checks, it
might continue to sleep for a period of time after a network packet
arrives. This period of time is referred to as the latency period.
You can reduce the latency period by increasing the runnable process
search count parameter so the SQL Server engine loops for longer
periods of time. See “runnable process search count” on page 11-91
of the System Administration Guide for more information.
“Avg Net I/Os per Check” reports the average number of network
I/Os (both sends and receives) per check for all SQL Server engine
checks that took place during the sample interval.
The sp_configure parameter i/o polling process count specifies the
maximum number of processes that SQL Server runs before the
scheduler checks for disk and/or network I/O completions. Tuning
i/o polling process count affects both the response time and throughput
of SQL Server. See “i/o polling process count” on page 11-79 of the
System Administration Guide.
If SQL Server engines check frequently but retrieve network I/O
infrequently, you can try reducing the frequency of network I/O
checking.
This section reports on the total number of disk I/O checks, and the
number of checks returning I/O.
“Total Disk I/O Checks” reports the number of times a SQL Server
engine checked disk I/O.
When a task needs to perform I/O, the SQL Server engine running
that task immediately issues an I/O request and puts the task to
sleep waiting for the I/O to complete. The SQL Server engine
processes other tasks, if any, but also uses a scheduling loop to check
for completed I/Os. When the engine finds completed I/Os, it
moves the task from the sleep queue to the run queue.
The percentage of checks returning I/O depends on how busy the
engines were during the sample period. If the sample includes
idle time, or the I/O traffic is bursty, it is possible that during the
busy period, a high percentage of the checks were returning I/O.
If the results in this category seem low or high, you can configure i/o
polling process count so that the SQL Server engine checks less or more
frequently. See “i/o polling process count” on page 11-79 in the
System Administration Guide.
“Avg Disk I/Os Returned” reports the average number of disk I/Os
returned over all SQL Server engine checks combined.
Increasing the amount of time that SQL Server engines wait between
checks could result in better throughput because SQL Server engines
can spend more time processing if they spend less time checking for
I/O. However, you should verify this for your environment. Use the
sp_configure parameter i/o polling process count to increase the length of
the checking loop. See “i/o polling process count” on page 11-79 in
the System Administration Guide.
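A hedged example, raising the parameter from its default of 10 so that engines check for completed I/Os less often (the value shown is illustrative):

sp_configure "i/o polling process count", 20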
Task Management
“Task Management” provides information on opened connections,
task context switches by engine, and task context switches by cause.
“Task Context Switches Due To” provides an overview of the reasons
that tasks were switched off engines. The possible performance
problems shown in this section can be investigated by checking other
sp_sysmon output, as indicated below in the sections that describe the
causes.
Connections Opened
“Task Context Switches Due To” reports the number of times that
SQL Server switched context for a number of common reasons. “% of
total” is the percentage of times the context switch was due to each
specific cause as a percentage of the total number of task context
switches for all SQL Server engines combined.
“Task Context Switches Due To” data can help you identify the
problem and give you clues about how to fix it. For example, if most
of the task switches are caused by physical I/O, try minimizing
physical I/O, by adding more memory or reconfiguring caches.
However, if lock contention causes most of the task switches, check
“Lock Management” on page 19-40.
Voluntary Yields
“Disk Writes” reports the number of times a task was switched out
because it needed to perform a disk write or because it needed to
access a page that was being written by another process, such as the
housekeeper or the checkpoint process.
Most SQL Server writes happen asynchronously, but processes sleep
during writes for page splits, recovery, and OAM page writes.
If this number seems high, check “Page Splits” on page 19-36 to see if
the problem is caused by data pages and index page splits. In other
cases, you cannot affect this value by tuning.
I/O Pacing
SQL Server paces the number of disk writes that it issues in order to
keep from flooding the disk I/O subsystems during certain
operations that need to perform large amounts of I/O. Checkpoints
and transaction commits that write a large number of log pages are
two examples. The task is switched out and sleeps until the batch of
writes completes, and then wakes up and issues another batch.
By default, the number of writes per batch is set to 10. You may want
to increase the number of writes per batch if:
• You have a high-throughput, high-transaction environment with
a large data cache
• Your system is not I/O bound
Valid values are from 1 to 50. This command sets the number of
writes per batch to 20:
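A sketch of the command, assuming the batch size is controlled by the maxwritedes parameter of dbcc tune, in the style of the dbcc tune examples later in this chapter:

dbcc tune(maxwritedes, 20)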
“Last Log Page Writes” is the number of times a task was switched
out because it was put to sleep while writing the last log page.
The task switched out because it was responsible for writing the last
log page as opposed to sleeping while waiting for some other task to
write the log page, as described in “Group Commit Sleeps” on page
19-19.
If this value is high, check “Avg # Writes per Log Page” on page 19-32
to see if SQL Server is repeatedly rewriting the same last page to the
log. If the log I/O size is greater than 2K, reducing the log I/O size
might reduce the number of unneeded log writes.
Modify Conflicts
“Network Packet Sent” reports the number of times a task went into
a send sleep state waiting for the network to send each TDS packet.
Under the TDS model, there can be only one outstanding TDS packet
per connection at any one point in time. This means that the task
sleeps after each packet it sends.
If there is a lot of data to send, and the task is sending many small
packets (512 bytes per packet), the task could end up sleeping a
number of times. The TDS data packet size is configurable, and
different clients can request different packet sizes. For more
information, see “Changing Network Packet Sizes” on page 16-3 and
“default network packet size” on page 11-48 in the System
Administration Guide.
If “Network Packet Sent” is a major cause for task switching, see
“Network I/O Management” on page 19-72 for more information.
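A hedged example of raising the server-wide default packet size (the value is illustrative; max network packet size must be at least as large, and the change takes effect when the server is restarted):

sp_configure "default network packet size", 2048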
SYSINDEXES Lookup
Other Causes
This section reports the number of tasks switched out for any reasons
not described above. In a well-tuned server, this value will rise as
tunable sources of task switching are reduced.
Transaction Profile
This category reports on transaction-related activities, including the
number of data modification transactions, user log cache (ULC)
activity, and transaction log activity.
Updates per sec per xact count % of total
------------------------- --------- --------- ------- ----------
Deferred 0.0 0.0 0 0.0 %
Direct In-place 360.2 3.0 21774 100.0 %
Direct Cheap 0.0 0.0 0 0.0 %
Direct Expensive 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total Rows Updated 360.2 3.0 21774 75.0 %
Deletes
Deferred 0.0 0.0 0 0.0 %
Direct 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total Rows Deleted 0.0 0.0 0 0.0 %
Transaction Summary
Committed Transactions
1> insert …
2> insert …
3> insert …
4> commit transaction
5> go
is counted as one transaction.
This number can reflect more transactions than actually started
during the sample interval, because transactions that started before
the interval began and completed during the interval are counted. If
transactions do not complete during the interval, “Total # of Xacts”
does not count them. In Figure 19-4, both T1 and T2 are counted, but
transaction T3 is not.
[Figure 19-4: How transactions are counted. T1 and T2 complete within the sample interval and are counted; T3 does not complete within the interval and is not counted.]
Transaction Detail
Inserts
Updates
Deletes
Transaction Management
“Transaction Management” reports on transaction management
activities, including user log cache (ULC) flushes to transaction logs,
ULC log records, ULC semaphore requests, log semaphore requests,
transaction log writes, and transaction log allocations.
ULC Flushes to Xact Log per sec per xact count % of total
------------------------- --------- --------- ------- ----------
by Full ULC 0.0 0.0 0 0.0 %
by End Transaction 120.1 1.0 7261 99.7 %
by Change of Database 0.0 0.0 0 0.0 %
by System Log Record 0.4 0.0 25 0.3 %
by Other 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total ULC Flushes 120.5 1.0 7286
“ULC Flushes to Xact Log” is the total number of times the user log
caches (ULCs) were flushed to a transaction log. “% of total” for each
category is the percentage of times the type of flush took place as a
percentage of the total number of ULC flushes. This category can
help you identify areas in the application that cause problems with
ULC flushes.
There is one user log cache (ULC) for each configured user
connection. SQL Server uses ULCs to buffer transaction log records.
On both SMP and uniprocessor systems, this helps reduce
transaction log I/O. For SMP systems, it reduces the contention on
the current page of the transaction log.
You can configure the size of the ULCs with the user log cache size
parameter of sp_configure. See “user log cache size” on page 11-111 of
the System Administration Guide.
ULC flushes are caused by the following activities:
• “by Full ULC” – a process’s ULC becomes full
• “by End Transaction” – a transaction ended (rollback or commit,
either implicit or explicit)
• “by Change of Database” – a transaction modified an object in a
different database (a multidatabase transaction)
• “by System Log Record” – a system transaction (such as an OAM
page allocation) occurred within the user transaction
• “by Other” – any other reason, including needing to write to disk
• “Total ULC Flushes” – total number of all ULC flushes that took
place during the sample interval
When one of these activities causes a ULC flush, SQL Server copies
all log records from the user log cache to the database transaction log.
By Full ULC
A high value for “by Full ULC” indicates that SQL Server is flushing
the ULCs more than once per transaction, negating some
performance benefits of user log caches. A good rule of thumb is that
if the “% of total” for “by Full ULC” is greater than 20 percent,
consider increasing the size of the user log cache size parameter.
Increasing the ULC size increases the amount of memory required
for each user connection, so you do not want to configure the ULC
size to suit a small percentage of large transactions.
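A hedged example, doubling the ULC from its default of 2048 bytes (the value is illustrative; remember that the memory cost is per configured user connection):

sp_configure "user log cache size", 4096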
By End Transaction
By Change of Database
SQL Server uses semaphores to protect the user log caches, since more
than one process can access the records of a ULC and force a flush.
This category provides the following information:
• Granted – The number of times a task was granted a ULC
semaphore immediately upon request. There was no contention
for the ULC.
• Waited – The number of times a task tried to write to ULCs and
encountered semaphore contention.
• Total ULC Semaphore Requests – The total number of ULC
semaphore requests that took place during the interval. This
includes requests that were granted or had to wait.
This row uses the previous two values to report the average number
of times each log page was written to disk. The value is reported in
the “count” column.
In high throughput applications, you want to see this number as
close to 1 as possible. With low throughput, the number will be
significantly higher. In very low throughput environments, it may be
as high as one write per completed transaction.
Index Management
This category reports on index management activity including
nonclustered maintenance, page splits, and index shrinks.
Nonclustered Maintenance
The data in this section gives information about how insert and
update operations affect indexes. For example, if each insert to a
table with a clustered index and three nonclustered indexes requires
updates to all three nonclustered indexes, the average number of
operations that result in maintenance to nonclustered indexes is three.
However, an update to the same table may require only one
maintenance operation, to the index whose key value was changed.
The row ID (RID) entry shows how many times a data page split
occurred in a table with a clustered index. These splits require
updating the nonclustered indexes for all of the rows that move to
the new data page.
Page Splits
“Page Splits” reports on the number of times that SQL Server split a
data page, a clustered index page, or non-clustered index page
because there was not enough room for a new row.
When a data row is inserted into a table with a clustered index, the
row must be placed in physical order according to the key value.
Index rows must also be placed in physical order on the pages. If
there is not enough room on a page for a new row, SQL Server splits
the page, allocates a new page, and moves some rows to the new
page. Page splitting incurs overhead because it involves updating
the parent index page and the page pointers on the adjoining pages,
and adds lock contention. For clustered indexes, page splitting also
requires updating all nonclustered indexes that point to the rows on
the new page.
See “Choosing Fillfactors for Indexes” on page 6-44 and “Decreasing
the Number of Rows per Page” on page 11-30 for more information
about how to temporarily reduce page splits using fillfactor and
max_rows_per_page. Note that using max_rows_per_page almost always
increases the rate of splitting.
The table sales has a clustered index on store_id, customer_id. There are
three stores (A, B, C), and each of them adds customer records in
ascending numerical order. The table contains rows for the key
values A,1; A,2; A,3; B,1; B,2; C,1; C,2; and C,3, and each page holds 4
rows, as shown in Figure 19-5.
Page 1007        Page 1009
A 1 ...          B 2 ...
A 2 ...          C 1 ...
A 3 ...          C 2 ...
B 1 ...          C 3 ...

Figure 19-5: The sales table before the inserts

You can set “ascending inserts mode” for a table, so that pages are
split at the point of the inserted row, rather than in the middle of the
page. Starting from the original table shown in Figure 19-5 on page
19-37, the insertion of “A,4” results in a split at the insertion point,
with the remaining rows on the page moving to a newly allocated
page. Adding “A,6”, “A,7”, and “A,8” fills the new pages, as shown in
Figure 19-11:

Page 1007        Page 1129        Page 1134        Page 1137        Page 1009
A 1 ...          A 3 ...          A 5 ...          A 7 ...          B 2 ...
A 2 ...          A 4 ...          A 6 ...          A 8 ...          C 1 ...
B 1 ...                                                             C 2 ...
                                                                    C 3 ...

Figure 19-11: Page splitting with ascending inserts mode
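Ascending inserts mode is set with dbcc tune; a sketch for the sales table is shown below, where 1 enables the mode and 0 disables it (the exact parameter usage is assumed to follow the dbcc tune examples later in this chapter):

dbcc tune(ascinserts, 1, "sales")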
Retries
Deadlocks
“Add Index Level” reports the number of times a new index level
was added. This does not happen frequently, so you should expect to
see result values of zero most of the time. The count could have a
value of 1 or 2 if your sample includes inserts into an empty table or
a small table with indexes.
Page Shrinks
“Page Shrinks” reports the number of times that delete or update
operations caused an index to shrink off a page. Repeated “count”
values greater than zero indicate there may be many pages in the
index with fairly small numbers of rows per page due to delete and
update operations. If there are a high number of shrinks, consider
rebuilding indexes.
Lock Management
“Lock Management” reports on locks, deadlocks, lock promotions,
and freelock contention.
Exclusive Table
Total EX-Table Requests 0.0 0.0 0 0.0 %
Shared Table
Total SH-Table Requests 0.0 0.0 0 0.0 %
Exclusive Intent
Granted 480.2 4.0 29028 100.0 %
Waited 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total EX-Intent Requests 480.2 4.0 29028 18.9 %
Shared Intent
Granted 120.1 1.0 7261 100.0 %
Waited 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total SH-Intent Requests 120.1 1.0 7261 4.7 %
Exclusive Page
Granted 483.4 4.0 29227 100.0 %
Waited 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total EX-Page Requests 483.4 4.0 29227 19.0 %
Update Page
Granted 356.5 3.0 21553 99.0 %
Waited 3.7 0.0 224 1.0 %
------------------------- --------- --------- -------
Total UP-Page Requests 360.2 3.0 21777 14.2 %
Shared Page
Granted 3.2 0.0 195 100.0 %
Waited 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total SH-Page Requests 3.2 0.0 195 0.1 %
Exclusive Address
Granted 134.2 1.1 8111 100.0 %
Waited 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total EX-Address Requests 134.2 1.1 8111 5.3 %
Shared Address
Granted 959.5 8.0 58008 100.0 %
Waited 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total SH-Address Requests 959.5 8.0 58008 37.8 %
Deadlock Detection
Deadlock Searches 0.1 0.0 4 n/a
Searches Skipped 0.0 0.0 0 0.0 %
Avg Deadlocks per Search n/a n/a 0.00000 n/a
Lock Promotions
Total Lock Promotions 0.0 0.0 0 n/a
Note that shared and exclusive table locks, “Deadlocks by Lock
Type,” and “Lock Promotions” do not contain detail rows because
there were no occurrences of them during the sample interval.
Lock Summary
Deadlock Percentage
Lock Detail
Address Locks
“Last Page Locks on Heaps” is the number of times there was lock
contention for the last page of a partitioned or unpartitioned heap
table.
This information can indicate if there are tables in the system that
would benefit from partitioning or from increasing the number of
partitions. If you know that one or more tables is experiencing a
problem with last page locks, SQL Server Monitor is a tool that can
help.
See “Improving Insert Performance with Partitions” on page 13-12
for information on how partitions can help solve the problem of last
page locking on unpartitioned heap tables.
Deadlock Detection
Deadlock Searches
Searches Skipped
Lock Promotions
sp_sysmon reports the following activities for the default data cache
and each named cache:
• Spinlock contention
• Utilization
• Cache searches including hits and misses
• Pool turnover for all configured pools
• Buffer wash behavior including buffers passed clean, already in
I/O, and washed dirty
• Prefetch requests performed and denied
• Dirty read page requests
Figure 19-12 shows how these caching features relate to disk I/O and
the data caches.
[Figure 19-12: Cache management categories. Searches in the 2K pool result in hits or misses; the buffer chain runs from MRU to LRU past the wash marker; the cache strategy places buffers as cached (LRU) or discarded (MRU); large I/O is either performed or denied; large I/O detail tracks 16K pool pages used.]
You can use sp_cacheconfig and sp_helpcache output to help you analyze
the data from this category. sp_cacheconfig provides information about
caches and pools, and sp_helpcache provides information about objects
bound to caches. See Chapter 9, “Configuring Data Caches,” in the
System Administration Guide for information on how to use these
procedures. See “Named Data Caches” on page 15-12 for more
information on performance issues and named caches.
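For example (the cache name matches the sample output below), you can report a cache’s size, pools, and bound objects with:

sp_cacheconfig branch_cache
sp_helpcache branch_cache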
The following sample shows sp_sysmon output for the “Data Cache
Management” categories. The first block of data, “Cache Statistics
Summary,” includes information for all caches. The output also
reports a separate block of data for each cache. These blocks are
identified by the cache name. The sample output shown here
includes only a single user-defined cache, although more caches
were configured during the interval.
Data Cache Management
---------------------
Cache Turnover
Buffers Grabbed 56.7 0.5 3428 n/a
Buffers Grabbed Dirty 0.0 0.0 0 0.0 %
----------------------------------------------------------------------
branch_cache
per sec per xact count % of total
------------------------- --------- --------- ------- ----------
Cache Searches
Cache Hits 360.3 3.0 21783 100.0 %
Found in Wash 0.0 0.0 0 0.0 %
Cache Misses 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total Cache Searches 360.3 3.0 21783
Pool Turnover
0.0 0.0 0 n/a
------------------------- --------- --------- -------
Total Cache Turnover 0.0 0.0 0
Cache Strategy
Cached (LRU) Buffers 354.9 3.0 21454 100.0 %
Discarded (MRU) Buffers 0.0 0.0 0 0.0 %
This section summarizes behavior for the default data cache and all
named data caches combined. Corresponding information is printed
for each data cache. For a full discussion of these rows, see “Cache
Management By Cache” on page 19-54.
This summary shows how effective the overall cache design is. A
high number of cache misses indicates that you should investigate
the statistics for each cache.
Cache Turnover
Buffers Grabbed
“Buffers Grabbed” is the number of buffers that were replaced in all
of the caches. The “count” column represents the number of times
that SQL Server fetched a buffer from the LRU end of the cache,
replacing a database page. If the server was recently restarted, so that
the buffers are empty, reading a page into an empty buffer is not
counted here.
Spinlock Contention
Utilization
Cache Hits
“Cache Hits” is the number of times that a needed page was found in
the data cache. “% of total” is the percentage of cache hits compared
to the total number of cache searches.
Found in Wash
The number of times that the needed page was found in the wash
section of the cache. “% of total” is the percentage of times that the
buffer was found in the wash area as a percentage of the total
number of hits.
If the data indicate a large percentage of cache hits found in the wash
section, it may mean the wash is too big. A large wash section might
lead to increased physical I/O because SQL Server initiates a write
on all dirty pages as they cross the wash marker. If a page in the wash
area is re-dirtied, I/O has been wasted.
If queries on tables in the cache use “fetch-and-discard” strategy, the
first cache hit for a page in one of these buffers finds it in the wash.
The page is moved to the MRU end of the chain, so a second hit soon
after the first finds it still outside the wash area.
See “Specifying the Cache Strategy” on page 9-12 for information
about controlling caching strategy.
If necessary, you can change the wash size. See “Changing the Wash
Area for a Memory Pool” on page 9-18 for more information. If you
make the wash size smaller, run sp_sysmon again under fully loaded
conditions and check the output for “Grabbed Dirty” values greater
than 0. See “Buffers Grabbed Dirty” on page 19-51.
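The wash size is changed per pool with sp_poolconfig; a hedged sketch for the 2K pool of the sample cache, with an illustrative size:

sp_poolconfig branch_cache, "2K", "wash=512K"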
Cache Misses
“Cache Misses” reports the number of times that a needed page was
not found in the cache and had to be read from disk. “% of total” is
the percentage of times that the buffer was not found in the cache as
a percentage of the total searches.
Pool Turnover
16 Kb Pool
LRU Buffer Grab 0.2 0.1 73 15.8 %
Grabbed Dirty 0.0 0.0 0 0.0 %
------------------------- --------- --------- -------
Total Cache Turnover 1.4 0.3 463
This information helps you to determine if the pools and cache are
the right size.
Grabbed Dirty
“Grabbed Dirty” gives statistics for the number of dirty buffers that
reached the LRU before they could be written to disk. When SQL
Server needs to grab a buffer from the LRU end of the cache in order
to fetch a page from disk, and finds a dirty buffer instead of a clean
one, it must wait for I/O on the dirty buffer to complete. “% of total”
is the percentage of buffers grabbed dirty as a percentage of the total
number of buffers grabbed.
If “Grabbed Dirty” is non-zero, it indicates that the wash area of the
pool is too small for the throughput in the pool. Remedial actions
depend on the pool configuration and usage:
• If the pool is very small and has high turnover, consider
increasing the size of the pool and the wash area.
• If the pool is large and is used for a large number of data
modification operations, increase the size of the wash area.
• If there are several objects using the cache, moving some of them
to another cache could help.
• Check query plans and I/O statistics for objects that use the cache
for queries that perform a lot of physical I/O in the pool. Tune
queries, if possible, by adding indexes.
Check the “per second” values for “Buffers Washed Dirty” on page
19-59 and “Buffers Already in I/O” on page 19-59. The wash area
should be large enough so that I/O can be completed on dirty buffers
before they reach the LRU. This depends on the actual number of
physical writes per second that your disk drives achieve.
Also check “Disk I/O Management” on page 19-66 to see if I/O
contention is slowing disk writes.
It might help to increase the housekeeper free write percent parameter. See
“How the Housekeeper Task Improves CPU Utilization” on page
17-9 and “housekeeper free write percent” on page 11-75 in the
System Administration Guide.
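A hedged example, raising the limit on the extra database writes the housekeeper may cause from its default of 1 percent (the value shown is illustrative):

sp_configure "housekeeper free write percent", 2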
Cache Strategy
This section provides data about SQL Server prefetch requests for
large I/O. It reports statistics on the numbers of large I/O requests
performed and denied.
Pages Cached
“Pages by Lrg I/O Cached” prints the total number of pages read
into the cache.
Pages Used
“Pages by Lrg I/O Used” is the number of pages used by a query
while in cache.
Procedure Requests
Procedure Removals
Memory Management
Memory management reports on the number of pages allocated and
deallocated during the sample interval.
Pages Allocated
“Pages Allocated” reports the number of times that a new page was
allocated in memory.
Pages Released
“Pages Released” reports the number of times that a page was freed.
Recovery Management
This data indicates the number of checkpoints caused by the normal
checkpoint process, the number of checkpoints initiated by the
housekeeper task, and the average length of time for each type. This
information is helpful for setting the recovery and housekeeper
parameters correctly.
Recovery Management
-------------------
Checkpoints per sec per xact count % of total
------------------------- --------- --------- ------- ----------
# of Normal Checkpoints 0.00117 0.00071 1 n/a
# of Free Checkpoints 0.00351 0.00213 3 n/a
------------------------- --------- --------- -------
Total Checkpoints 0.00468 0.00284 4
Checkpoints
Checkpoints write all dirty pages (pages that have been modified in
memory, but not written to disk) to the database device. SQL Server’s
automatic (normal) checkpoint mechanism works to maintain a
minimum recovery interval. By tracking the number of log records in
the transaction log since the last checkpoint was performed, it
estimates whether the time required to recover the transactions
exceeds the recovery interval. If so, the checkpoint process scans all
caches and writes all changed data pages to the database device.
When SQL Server has no user tasks to process, a housekeeper task
automatically begins writing dirty buffers to disk. Because these
writes are done during the server’s idle cycles, they are known as
“free writes.” They result in improved CPU utilization and a
decreased need for buffer washing during transaction processing.
If the housekeeper process finishes writing all dirty pages in all
caches to disk, it checks the number of rows in the transaction log
since the last checkpoint. If there are more than 100 log records, it
issues a checkpoint. This is called a “free checkpoint” because it
requires very little overhead. In addition, it reduces future overhead
for normal checkpoints.
Total Checkpoints
“Avg Time per Normal Chkpt” is the time, on average over the
sample interval, that normal checkpoints lasted.
“Avg Time per Free Chkpt” is the time, on average over the sample
interval, that free (or housekeeper) checkpoints lasted.
When the housekeeper task reaches its batch limit of writes for a
device, it stops writing dirty pages in the current buffer pool and
begins checking for dirty pages in another pool. If the writes from the
next pool need to go to the same device, it continues to another pool.
Once the housekeeper has checked all of the pools, it waits until the
last I/O it has issued has completed, and then begins the cycle again.
The default batch limit of 3 is designed to provide good device I/O
characteristics for slow disks. You may get better performance by
increasing the batch size for fast disk drives. This value can be set
globally for all devices on the server, or to different values for disks
with different speeds. This command must be reissued each time
SQL Server is restarted.
This command sets the batch size to 10 for a single device, using the
virtual device number from sysdevices:
dbcc tune(deviochar, 8, "10")
To see the device number, use sp_helpdevice, or this query:
select name, low/16777216
from sysdevices
where status&2=2
To change the housekeeper’s batch size for all devices on the server,
use -1 in place of a device number:
dbcc tune(deviochar, -1, "5")
Legal values for batch size are 1 to 255. For very fast drives, setting
the batch size as high as 50 has yielded performance improvements
during testing.
You may want to try setting this value higher if:
• The average time for normal checkpoints is high.
• There are no problems with exceeding I/O configuration limits or
contention on the semaphores for the devices.
• The “# of Free Checkpoints” is 0 or very low, that is, the
housekeeper process is not clearing the cache and writing
checkpoints. If you are tuning this parameter, check for I/O
contention and queue lengths.
The following sample shows sp_sysmon output for the “Disk I/O
Management” categories.
Disk I/O Management
-------------------
I/Os Delayed by
Disk I/O Structures n/a n/a 0 n/a
Server Config Limit n/a n/a 0 n/a
Engine Config Limit n/a n/a 0 n/a
Operating System Limit n/a n/a 0 n/a
----------------------
/dev/rdsk/c1t3d0s6
bench_log per sec per xact count % of total
------------------------- --------- --------- ------- ----------
Reads 0.1 0.0 5 0.1 %
Writes 80.6 0.7 4873 99.9 %
------------------------- --------- --------- ------- ----------
Total I/Os 80.7 0.7 4878 40.0 %
------------------------------------------------------------------
d_master
master per sec per xact count % of total
------------------------- --------- --------- ------- ----------
Reads 56.6 0.5 3423 46.9 %
Writes 64.2 0.5 3879 53.1 %
------------------------- --------- --------- ------- ----------
Total I/Os 120.8 1.0 7302 60.0 %
I/Os Delayed By
SQL Server can exceed its limit for the number of asynchronous disk
I/O requests that can be outstanding for the entire SQL Server at one
time. You can raise this limit using sp_configure with the max async i/os
per server parameter. See “max async i/os per server” on page 11-58 in
the System Administration Guide.
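A hedged example of raising the server-wide limit (the value is illustrative and depends on what the operating system allows):

sp_configure "max async i/os per server", 4096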
The operating system kernel has a per process and per system limit
on the maximum number of asynchronous I/Os that either a process
or the entire system can have pending at any point in time. This
value indicates how often the system has exceeded that limit. See
“disk i/o structures” on page 11-27 in the System Administration
Guide, and consult your operating system documentation.
This data shows the total number of disk I/Os requested by SQL
Server, and the number and percentage of I/Os completed by each
SQL Server engine.
“Total Requested Disk I/Os” reports the number of times that SQL
Server requested disk I/Os.
“Total Completed Disk I/Os” reports the number of times that each
SQL Server engine completed I/O. “% of total” is the percentage of
times each SQL Server engine completed I/Os as a percentage of the
total number of I/Os completed by all SQL Server engines
combined.
You can also use this information to determine if the operating
system is able to keep pace with disk I/O requests made by all of the
SQL Server engines.
“Reads” and “Writes” report the number of times that reads or writes
to a device took place. The “% of total” column is the percentage of
reads or writes as a percentage of the total number of I/Os to the
device.
Total I/Os
The following sample shows sp_sysmon output for the “Network I/O
Management” categories.
Total TDS Packets Received per sec per xact count % of total
------------------------- --------- --------- ------- ----------
Engine 0 7.9 0.1 479 6.6 %
Engine 1 12.0 0.1 724 10.0 %
Engine 2 15.5 0.1 940 13.0 %
Engine 3 15.7 0.1 950 13.1 %
Engine 4 15.2 0.1 921 12.7 %
Engine 5 17.3 0.1 1046 14.4 %
Engine 6 11.7 0.1 706 9.7 %
Engine 7 12.4 0.1 752 10.4 %
Engine 8 12.2 0.1 739 10.2 %
------------------------- --------- --------- ------- ----------
Total TDS Packets Rec'd 120.0 1.0 7257
--------------------------------------------------------------------
Total TDS Packets Sent per sec per xact count % of total
------------------------- --------- --------- ------- ----------
Engine 0 7.9 0.1 479 6.6 %
Engine 1 12.0 0.1 724 10.0 %
Engine 2 15.6 0.1 941 13.0 %
Engine 3 15.7 0.1 950 13.1 %
Engine 4 15.3 0.1 923 12.7 %
Engine 5 17.3 0.1 1047 14.4 %
Engine 6 11.7 0.1 705 9.7 %
Engine 7 12.5 0.1 753 10.4 %
Engine 8 12.2 0.1 740 10.2 %
This row reports the average number of bytes received per packet by
each SQL Server engine during the sample interval.
“Total TDS Packets Sent” represents the number of times SQL Server
sends a packet to a client application.
“Total Bytes Sent” is the number of bytes sent by each SQL Server
engine during the sample interval.
This row reports the average number of bytes sent per packet by
each SQL Server engine during the sample interval.
access method
The method used to find the data rows needed to satisfy a query. Access methods
include: table scan, nonclustered index access, clustered index access.
affinity
See process affinity.
aggregate function
A function that works on a set of cells to produce a single answer or set of answers,
one for each subset of cells. The aggregate functions available in Transact-SQL are:
average (avg), maximum (max), minimum (min), sum (sum), and count of the number
of items (count).
allocation page
The first page of an allocation unit, which tracks the use of all pages in the
allocation unit.
allocation unit
A logical unit of 1/2 megabyte. The disk init command initializes a new database file
for SQL Server and divides it into 1/2 megabyte pieces called allocation units.
argument
A value supplied to a function or procedure that is required to evaluate the
function.
arithmetic expression
An expression that contains only numeric operands and returns a single numeric
value. In Transact-SQL, the operands can be of any SQL Server numeric datatype.
They can be functions, variables, parameters, or they can be other arithmetic
expressions. Synonymous with numeric expression.
arithmetic operators
Addition (+), subtraction (-), division (/), and multiplication (*) can be used with
numeric columns. Modulo (%) can be used with int, smallint, and tinyint columns
only.
audit trail
Audit records stored in the sybsecurity database.
auditing
Recording security-related system activity that can be used to detect penetration of
the system and misuse of system resources.
automatic recovery
A process that runs every time SQL Server is stopped and restarted. The process
ensures that all transactions that completed before the server went down are brought
forward and all incomplete transactions are rolled back.
B-tree
Short for balanced tree. SQL Server uses B-tree indexing. All leaf
pages in a B-tree are the same distance from the root page of the index. B-trees
provide consistent and predictable performance, good sequential and random
record retrieval, and a flat tree structure.
backup
A copy of a database or transaction log, used to recover from a media failure.
batch
One or more Transact-SQL statements terminated by an end-of-batch signal, which
submits them to SQL Server for processing.
Boolean expression
An expression that evaluates to TRUE (1), or FALSE (0). Boolean expressions are
often used in control of flow statements, such as if or while conditions.
buffer
A unit of storage in a memory pool. A single data cache can have pools configured
for different I/O sizes, or buffer sizes. All buffers in a pool are the same size. If a
pool is configured for 16K I/O, all buffers are 16K, holding eight data pages.
Buffers are treated as a unit; all data pages in a buffer are simultaneously read,
written, or flushed from cache.
built-in functions
A wide variety of functions that take one or more parameters and return results.
The built-in functions include mathematical functions, system functions, string
functions, text functions, date functions, and type conversion functions.
bulk copy
The utility for copying data in and out of databases, called bcp.
Cartesian product
All the possible combinations of the rows from each of the tables specified in a join.
The number of rows in the Cartesian product is equal to the number of rows in the
first table times the number of rows in the second table. Once the Cartesian
product is formed, the rows that do not satisfy the join conditions are eliminated.
character expression
An expression that returns a single character-type value. It can include literals,
concatenation operators, functions, and column identifiers.
check constraint
A check constraint limits what values users can insert into a column of a table. A
check constraint specifies a search_condition which any value must pass before it is
inserted into the table.
checkpoint
The point at which all data pages that have been changed are guaranteed to have
been written to the database device.
clauses
A set of keywords and parameters that tailor a Transact-SQL command to meet a
particular need. Also called a keyword phrase.
client cursor
A cursor declared through Open Client calls or Embedded-SQL. The Open Client
keeps track of the rows returned from SQL Server and buffers them for the
application. Updates and deletes to the result set of client cursors can only be done
through the Open Client calls.
clustered index
An index in which the physical order and the logical (indexed) order is the same.
The leaf level of a clustered index represents the data pages themselves.
column
The logical equivalent of a field. A column contains an individual data item within
a row or record.
column-level constraint
Limits the values of a specified column. Place column-level constraints after the
column name and datatype in the create table statement, before the delimiting
comma.
command
An instruction that specifies an operation to be performed by the computer. Each
command or SQL statement begins with a keyword, such as insert, that names the
basic operation performed. Many SQL commands have one or more keyword
phrases, or clauses, that tailor the command to meet a particular need.
comparison operators
Used to compare one value to another in a query. Comparison operators include
equal to (=), greater than (>), less than (<), greater than or equal to (>=), less than or
equal to (<=), not equal to (!=), not greater than (!>), and not less than (!<).
compatible datatypes
Types that SQL Server automatically converts for implicit or explicit comparison.
composite indexes
Indexes which involve more than one column. Use composite indexes when two or
more columns are best searched as a unit because of their logical relationship.
composite key
An index key that includes two or more columns; for example, authors(au_lname,
au_fname).
concatenation
Combine expressions to form longer expressions. The expressions can include any
combination of binary or character strings, or column names.
constant expression
An expression that returns the same value each time the expression is used. In
Transact-SQL syntax statements, constant_expression does not include variables
or column identifiers.
control page
A reserved database page that stores information about the last page of a partition.
control-of-flow language
Transact-SQL’s programming-like constructs (such as if, else, while, and goto) that
control the flow of execution of Transact-SQL statements.
correlated subquery
A subquery that cannot be evaluated independently, but depends on the outer
query for its results. Also called a repeating subquery, since the subquery is
executed once for each row that might be selected by the outer query. See also
nested query.
correlation names
Distinguish the different roles a particular table plays in a query, especially a
correlated query or self-join. Assign correlation names in the from clause and
specify the correlation name after the table name:
select au1.au_fname, au2.au_fname
from authors au1, authors au2
where au1.zip = au2.zip
covered query
See index covering.
covering
See index covering.
cursor
A symbolic name associated with a Transact-SQL select statement through a
declaration statement. Cursors consist of two parts: the cursor result set and the
cursor position.
data cache
Also referred to as named cache or cache. A cache is an area of memory within SQL
Server that contains the in-memory images of database pages and the data
structures required to manage the pages. By default, SQL Server has a single data
cache named “default data cache.” Additional caches configured by users are also
called “user defined caches.” Each data cache is given a unique name that is used
for configuration purposes.
data definition
The process of setting up databases and creating database objects such as tables,
indexes, rules, defaults, procedures, triggers, and views.
data dictionary
The system tables that contain descriptions of the database objects and how they
are structured.
data integrity
The correctness and completeness of data within a database.
data modification
Adding, deleting, or changing information in the database with the insert, delete,
and update commands.
data retrieval
Requesting data from the database and receiving the results. Also called a query.
database
A set of related data tables and other database objects that are organized and
presented to serve a specific purpose.
database device
A device dedicated to the storage of the objects that make up databases. It can be
any piece of disk or a file in the file system that is used to store databases and
database objects.
database object
One of the components of a database: table, view, index, procedure, trigger,
column, default, or rule.
Database Owner
The user who creates a database. A Database Owner has control over all the
database objects in that database. The login name for the Database Owner is “dbo.”
datatype
Specifies what kind of information each column will hold, and how the data will
be stored. Datatypes include char, int, money, and so on. Users can construct their
own datatypes based on the SQL Server system datatypes.
datatype hierarchy
The hierarchy that determines the results of computations using values of different
datatypes.
dbo
In a user’s own database, SQL Server recognizes the user as “dbo.” A database
owner logs into SQL Server using his or her assigned login name and password.
deadlock
A situation which arises when two users, each having a lock on one piece of data,
attempt to acquire a lock on the other’s piece of data. SQL Server detects
deadlocks and kills one user’s process.
default
The option chosen by the system when no other option is specified.
deferred update
An update operation that takes place in two steps. First, the log records for deleting
existing entries and inserting new entries are written to the log, but only the delete
changes to the data pages and indexes take place. In the second step, the log pages
are rescanned, and the insert operations are performed on the data pages and
indexes. Compare to direct update.
demand lock
A demand lock prevents any more shared locks from being set on a data resource
(table or data page). Any new shared lock request has to wait for the demand lock
request to finish.
density
The average fraction of all the rows in an index that have the same key value.
Density is 1 if all of the data values are the same and 1/N if every data value is
unique.
dependent
Data is logically dependent on other data when master data in one table must be
kept synchronized with detail data in another table in order to protect the logical
consistency of the database.
detail
Data that logically depends on data in another table. For example, in the pubs2
database, the salesdetail table is a detail table. Each order in the sales table can have
many corresponding entries in salesdetail. Each item in salesdetail is meaningless
without a corresponding entry in the sales table.
device
Any piece of disk (such as a partition) or a file in the file system used to store
databases and their objects.
direct update
An update operation that takes place in a single step, that is, the log records are
written and the data and index pages are changed. Direct updates can be
performed in three ways: in-place updates, on-page updates, and delete/insert
direct updates. Compare to deferred update.
dirty read
Occurs when one transaction modifies a row, and then a second transaction reads
that row before the first transaction commits the change. If the first transaction rolls
back the change, the information read by the second transaction becomes invalid.
disk initialization
The process of preparing a database device or file for SQL Server use. Once the
device is initialized, it can be used for storing databases and database objects. The
command used to initialize a database device is disk init.
disk mirror
A duplicate of a SQL Server database device. All writes to the device being mirrored
are copied to a separate physical device, making the second device an exact copy of
the device being mirrored. If one of the devices fails, the other contains an up-to-date
copy of all transactions. The command disk mirror starts the disk mirroring process.
dump striping
Interleaving of dump data across several dump volumes.
dump volume
A single tape, partition, or file used for a database or transaction dump. A dump
can span many volumes, or many dumps can be made to a single tape volume.
dynamic dump
A dump made while the database is active.
dynamic index
A worktable built by SQL Server for the resolution of queries using or. As each
qualifying row is retrieved, its row ID is stored in the worktable. The worktable is
sorted to remove duplicates, and the row IDs are joined back to the table to return
the values.
engine
A process running a SQL Server that communicates with other server processes
using shared memory. An engine can be thought of as one CPU’s worth of
processing power. It does not represent a particular CPU on a machine. Also referred
to as “server engine.” A SQL Server running on a uniprocessor machine will always
have one engine, engine 0. A SQL Server running on a multiprocessor machine can
have one or more engines. The maximum number of engines running on SQL Server
can be reconfigured using the max online engines configuration variable.
entity
A database or a database object that can be identified by a unique ID and that is
backed by database pages. Examples of entities: the database pubs2, the log for
database pubs2, the clustered index for table titles in database pubs2, the table
authors in database pubs2.
equijoin
A join based on equality.
error message
A message that SQL Server issues, usually to the user’s terminal, when it detects an
error condition.
exclusive locks
Locks which prevent any other transaction from acquiring a lock until the original
lock is released at the end of a transaction, always applied for update (insert, update,
delete) operations.
execute cursor
A cursor which is a subset of client cursors whose result set is defined by a stored
procedure which has a single select statement. The stored procedure can use
parameters. The values of the parameters are sent through Open Client calls.
existence join
A type of join performed in place of certain subqueries. Instead of the usual nested
iteration through a table that returns all matching values, an existence join returns
TRUE when it finds the first value and stops processing. If no matching value is
found, it returns FALSE.
expression
A computation, column data, a built-in function, or a subquery that returns values.
extent
Whenever a table or index requires space, SQL Server allocates a block of 8 2K
pages, called an extent, to the object.
fetch
A fetch moves the current cursor position down the cursor result set. Also called a
cursor fetch.
fetch-and-discard strategy
Reading pages into the data cache at the LRU end of the cache chain, so that the
same buffer is available for reuse immediately. This strategy keeps select
commands that require large numbers of page reads from flushing other data from
the cache.
field
A data value that describes one characteristic of an entity. Also called a column.
foreign key
A key column in a table that logically depends on a primary key column in another
table. Also, a column (or combination of columns) whose values are required to
match a primary key in some other table.
fragment
When you allocate only a portion of the space on a device with create or alter database,
that portion is called a fragment.
free-space threshold
A user-specified threshold that specifies the amount of space on a segment, and the
action to be taken when the amount of space available on that segment is less than
the specified space.
functions
See built-in functions.
global variable
System-defined variables that SQL Server updates on an ongoing basis. For
example, @@error contains the last error number generated by the system.
grouped aggregate
See vector aggregate.
Halloween problem
An anomaly associated with cursor updates, whereby a row seems to appear twice
in the result set. This happens when the index key is updated by the client and the
updated index row moves farther down in the result set.
heap table
A table where all data is stored in a single page chain. For example, an
unpartitioned table that has no clustered index stores all data in a single “heap” of
pages.
identifier
A string of characters used to identify a database object, such as a table name or
column name.
implicit conversions
Datatype conversions that SQL Server automatically performs to compare
datatypes.
in-place update
A type of direct update operation. An in-place update does not cause data rows to
move on the data page. Compare to on-page update and insert/delete direct
update.
index
A database object that consists of key values from the data tables, and pointers to
the pages that contain those values. Indexes speed up access to data rows.
index covering
A data access condition where the leaf-level pages of a nonclustered index contain
the data needed to satisfy a query. The index must contain all columns in the select
list as well as the columns in the query clauses, if any. The server can satisfy the
query using only the leaf level of the index. When an index covers a query, the
server does not access the data pages.
index selectivity
The ratio of duplicate key values in an index. An index is selective when it lets the
optimizer pinpoint a single row, such as a search for a unique key. An index on
nonunique entries is less selective. An index on values such as “M” or “F” (for male
or female) is extremely nonselective.
inner query
Another name for a subquery.
int
A signed 32-bit integer value.
integrity constraints
Constraints declared in the create table statement that form a model of database
integrity.
Database integrity has two complementary components: validity, which
guarantees that all false information is excluded from the database, and
completeness, which guarantees that all true information is included in the
database.
intent lock
Indicates the intention to acquire a shared or exclusive lock on a data page.
isolation level
Specifies the kinds of actions that are not permitted while the current transactions
execute; also called “locking level.” The ANSI standard defines four levels of
isolation for SQL transactions. Level 0 prevents other transactions from changing
data already modified by an uncommitted transaction. Level 1 prevents dirty
reads. Level 2 (not supported by SQL Server) also prevents non-repeatable reads.
Level 3 prevents both types of reads and phantoms; it is equivalent to doing all
queries with holdlock. The user controls the isolation level with the set
transaction isolation level option or with the at isolation clause of select or readtext. The default is
level 1.
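For example, the following sets level 3 for a session; the holdlock keyword
requests the same behavior for a single query:
    set transaction isolation level 3

    /* equivalent behavior for one query */
    select title from titles holdlock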
join
A basic operation in a relational system which links the rows in two or more tables
by comparing the values in specified columns.
join selectivity
An estimate of the number of rows from a particular table that will join with a row
from another table. If index statistics are available for the join column, SQL Server
bases the join selectivity on the density of the index (the average number of
duplicate rows). If no statistics are available, the selectivity is 1/N, where N is the
number of rows in the smaller table.
kernel
A module within SQL Server that acts as the interface between SQL Server and the
operating system.
key
A field used to identify a record, often used as the index field for a table.
key value
Any value that is indexed.
keyword
A word or phrase that is reserved for exclusive use by Transact-SQL. Also known
as a reserved word.
keyword phrases
A set of keywords and parameters that tailor a Transact-SQL command to meet a
particular need. Also called a clause.
language cursor
A cursor declared in SQL without using Open Client. As with server cursors,
Open Client is completely unaware of the cursor, and the results are sent back to
the client in the same format as for a normal select.
last-chance threshold
A default threshold in SQL Server that suspends or kills user processes if the
transaction log has run out of room. This threshold leaves just enough space for the
de-allocation records for the log itself. The last-chance threshold always calls a
procedure named sp_thresholdaction. Sybase does not supply this procedure; it must
be written by the System Administrator.
leaf level
The level of an index at which all key values appear in order. For SQL Server
clustered indexes, the leaf level and the data level are the same. For nonclustered
indexes, the last index level above the data level is the leaf level, since key values
for all of the data rows appear there in sorted order.
livelock
A request for an exclusive lock that is repeatedly denied because a series of
overlapping shared locks keeps interfering. SQL Server detects the situation after
four denials, and refuses further shared locks.
local variables
User-defined variables defined with a declare statement.
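For example, this batch declares a local variable, assigns it a value, and returns
the value (pubs2 titles table shown):
    declare @max_price money
    select @max_price = max(price) from titles
    select @max_price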
locking
The process of restricting access to resources in a multi-user environment to
maintain security and prevent concurrent access problems. SQL Server
automatically applies locks to tables or pages.
locking level
See isolation level.
logical expression
An expression that evaluates to TRUE (1), FALSE (0) or UNKNOWN (NULL).
Logical expressions are often used in control of flow statements, such as if or while
conditions.
logical key
The primary, foreign, or common key definitions in a database design that define
the relationship between tables in the database. Logical keys are not necessarily the
same as the physical keys (the keys used to create indexes) on the table.
logical operators
The operators and, or, and not. All three can be used in where clauses. The operator
and joins two or more conditions and returns results when all of the conditions are
true; or connects two or more conditions and returns results when any of the
conditions is true.
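For example, the following query uses both and and or (pubs2 authors table; the
search values are illustrative):
    select au_lname, au_fname
    from authors
    where state = "CA"
        and (city = "Oakland" or city = "Berkeley")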
logical read
The process of accessing a data or index page already in memory to satisfy a query.
Compare to physical read.
login
The name a user uses to log into SQL Server. A login is valid if SQL Server has an
entry for that user in the system table syslogins.
master database
Controls the user databases and the operation of SQL Server as a whole. Known as
master, it keeps track of such things as user accounts, ongoing processes, and
system error messages.
master table
A table that contains data on which data in another table logically depends. For
example, in the pubs2 database, the sales table is a master table. The salesdetail table
holds detail data which depends on the master data in sales. The detail table
typically has a foreign key that joins to the primary key of the master table.
master-detail relationship
A relationship between sets of data where one set of data logically depends on the
other. For example, in the pubs2 database, the sales table and salesdetail table have a
master-detail relationship. See detail and master table.
memory pool
An area of memory within a data cache that contains a set of buffers linked
together on an MRU/LRU (most recently used/least recently used) list.
message number
The number that uniquely identifies an error message.
mirror
See disk mirror.
model database
A template for new user databases. The installation process creates model when
SQL Server is installed. Each time the create database command is issued, SQL Server
makes a copy of model and extends it to the size requested, if necessary.
natural join
A join in which the values of the columns being joined are compared on the basis
of equality, and all the columns in the tables are included in the results, except that
only one of each pair of joined columns is included.
nested queries
select statements that contain one or more subqueries.
nonclustered index
An index that stores key values and pointers to data. The leaf level points to data
pages rather than containing the data itself.
non-repeatable read
Occurs when one transaction reads a row and then a second transaction modifies
that row. If the second transaction commits its change, subsequent reads by the
first transaction yield different results than the original read.
normalization rules
The standard rules of database design in a relational database management
system.
not-equal join
A join on the basis of inequality.
null
Having no explicitly assigned value. NULL is not equivalent to zero, or to blank. A
value of NULL is not considered to be greater than, less than, or equivalent to any
other value, including another value of NULL.
numeric expression
An expression that contains only numeric values and returns a single numeric
value. In Transact-SQL, the operands can be of any SQL Server numeric datatype.
They can be functions, variables, parameters, or they can be other arithmetic
expressions. Synonymous with arithmetic expression.
object permissions
Permissions that regulate the use of certain commands (data modification
commands, plus select, truncate table, and execute) on specific tables, views, or columns.
See also command permissions.
objects
See database objects.
operating system
A group of programs that translates your commands to the computer, so that you
can perform such tasks as creating files, running programs, and printing
documents.
operators
Symbols that act on two values to produce a third. See comparison operators,
logical operators, or arithmetic operators.
optimizer
SQL Server code that analyzes queries and database objects and selects the
appropriate query plan. The SQL Server optimizer is a cost-based optimizer. It
estimates the cost of each permutation of table accesses in terms of CPU cost and
I/O cost.
OR Strategy
An optimizer strategy for resolving queries using or and queries using in (values list).
Indexes are used to retrieve and qualify data rows from a table. The row IDs are
stored in a worktable. When all rows have been retrieved, the worktable is sorted
to remove duplicates, and the row IDs are used to retrieve the data from the table.
outer join
A join in which both matching and nonmatching rows are returned. The operators
*= and =* are used to indicate that all the rows in the first or second tables should
be returned, regardless of whether or not there is a match on the join column.
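For illustration, this outer join returns every title together with any matching
sales quantities; titles with no sales appear with NULL in the qty column (pubs2
tables):
    select t.title, s.qty
    from titles t, salesdetail s
    where t.title_id *= s.title_id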
outer query
Another name for the principal query in a statement containing a subquery.
overflow page
A data page for a table with a nonunique clustered index, which contains only
rows that have duplicate keys. The key value is the same as the last key on the
previous page in the chain. There is no index page pointing directly to an overflow
page.
page chain
See partition.
page split
Page splits occur when new data or index rows need to be added to a page, and
there is not enough room for the new row. Usually, the data on the existing page is
split approximately evenly between the newly allocated page and the existing
page.
page stealing
Page stealing occurs when SQL Server allocates a new last page for a partition from
a device or extent that was not originally assigned to the partition.
parameter
An argument to a stored procedure.
partition
A linked chain of database pages that stores a database object.
performance
The speed with which SQL Server processes queries and returns results.
Performance is affected by several factors, including indexes on tables, use of raw
partitions compared to files, and segments.
phantoms
Occur when one transaction reads a set of rows that satisfy a search condition, and
then a second transaction modifies the data (through an insert, delete, update, and so
on). If the first transaction repeats the read with the same search conditions, it
obtains a different set of rows.
physical key
A column name, or set of column names, used in a create index statement to define
an index on a table. Physical keys on a table are not necessarily the same as the
logical keys.
physical read
A disk I/O to access a data, index, or log page. SQL Server estimates physical reads
and logical reads when optimizing queries. See logical read.
point query
A query that restricts results to a single specific value, usually using the form
“where column_value = search_argument”.
precision
The maximum number of decimal digits that can be stored by numeric and decimal
datatypes. The precision includes all digits, both to the right and to the left of the
decimal point.
prefetch
The process of performing multipage I/Os on a table, nonclustered index, or the
transaction log. For logs, the server can fetch up to 256 pages; for nonlog tables
and indexes, the server can fetch up to 8 pages.
prefix subset
Used to refer to keys in a composite index. Search values form a prefix subset when
leading columns of the index are specified. For an index on columns A, B, and C,
these are prefix subsets: A, AB, ABC. These are not: AC, B, BC, C. See matching
index scan and non-matching index scan for more information.
primary key
The column or columns whose values uniquely identify a row in a table.
process
An execution environment scheduled onto physical CPUs by the operating
system.
process affinity
The tendency of a certain SQL Server task to run only on a certain engine, or of a
certain engine to run only on a certain CPU.
projection
One of the basic query operations in a relational system. A projection is a subset of
the columns in a table.
qualified
The name of a database object can be qualified, that is, preceded by the names of
the database and the object owner.
query
1. A request for the retrieval of data with a select statement.
2. Any SQL statement that manipulates data.
query plan
The ordered set of steps required to carry out a query, complete with the access
methods chosen for each table.
query tree
An internal tree structure that represents the user’s query. A large portion of query
processing and compilation is built around the shape and structure of this internal
data structure. For stored procedures, views, triggers, rules, and defaults, these tree
structures are stored in the sysprocedures table on disk and read back from disk
when the procedure or view is executed.
range query
A query that requests data within a specific range of values. These include greater
than/less than queries, queries using between, and some queries using like.
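For example (the price range is illustrative):
    select title, price
    from titles
    where price between $5 and $15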
recovery
The process of rebuilding one or more databases from database dumps and log
dumps. See also automatic recovery.
referential integrity
The rules governing data consistency, specifically the relationships among the
primary keys and foreign keys of different tables. SQL Server addresses referential
integrity with user-defined triggers.
reformatting strategy
A strategy used by SQL Server to resolve join queries on large tables that have no
useful index. SQL Server builds a temporary clustered index on the join columns
of the inner table, and uses this index to retrieve the rows. SQL Server estimates the
cost of this strategy and the cost of the alternative—a table scan—and chooses the
cheapest method.
relational expression
A type of Boolean or logical expression of the form:
    arith_expression relational_operator arith_expression
In Transact-SQL, a relational expression can return TRUE, FALSE, or UNKNOWN.
The results can evaluate to UNKNOWN if one or both of the expressions evaluates
to NULL.
relational operator
An operator that compares two operands and yields a truth value, such as “5 < 7”
(TRUE), “ABC” = “ABCD” (FALSE), or “@value > NULL” (UNKNOWN).
response time
The time it takes for a single task, such as a Transact-SQL query sent to SQL Server,
to complete. Contrast to initial response time, the time required to return the first
row of a query to a user.
restriction
A subset of the rows in a table. Also called a selection, it is one of the basic query
operations in a relational system.
return status
A value that indicates either that the procedure completed successfully or the
reason for its failure.
RID
See row ID.
roles
Provide individual accountability for users performing system administration and
security-related tasks in SQL Server. The System Administrator, System Security
Officer, and Operator roles can be granted to individual server login accounts.
rollback transaction
A Transact-SQL statement used with a user-defined transaction (before a commit
transaction has been received) that cancels the transaction and undoes any changes
that were made to the database.
row
A set of related columns that describes a specific entity. Also called record.
row ID
A unique, internal identifier for a data row. The row ID, or RID, is a combination of
the data page number and the row number on the page.
rule
A specification that controls what data may be entered in a particular column, or in
a column of a particular user-defined datatype.
run values
Values of the configuration variables currently in use.
sa
The login name for the Sybase System Administrator.
scalar aggregate
An aggregate function that produces a single value from a select statement that
does not include a group by clause. This is true whether the aggregate function is
operating on all the rows in a table or on a subset of rows defined by a where clause.
(See also vector aggregate.)
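For example, this query returns a single value (pubs2 titles table):
    select avg(price)
    from titles
    where type = "business"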
scale
The maximum number of digits that can be stored to the right of the decimal point
by a numeric or decimal datatype. The scale must be less than or equal to the
precision.
search argument
A predicate in a query’s where clause that can be used to locate rows via an index.
segment
A named subset of database devices available to a particular database. It is a label
that points to one or more database devices. Segments can be used to control the
placement of tables and indexes on specific database devices.
select list
The columns specified in the main clause of a select statement. In a dependent view,
the target list must be maintained in all underlying views if the dependent view is
to remain valid.
selection
A subset of the rows in a table. Also called a restriction, it is one of the basic query
operations in a relational system.
selectivity
See index selectivity, join selectivity.
self-join
A join used for comparing values within a column of a table. Since this operation
involves a join of a table with itself, you must give the table two temporary names,
or correlation names, which are then used to qualify the column names in the rest
of the query.
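For example, this sketch pairs authors who live in the same city, using the
correlation names a1 and a2 (pubs2 authors table; the a1.au_id < a2.au_id condition
avoids duplicate and self-paired rows):
    select a1.au_lname, a2.au_lname
    from authors a1, authors a2
    where a1.city = a2.city
        and a1.au_id < a2.au_id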
server cursor
A cursor declared inside a stored procedure. The client executing the stored
procedure is not aware of the presence of these cursors. Results returned to the
client for a fetch appear exactly the same as the results from a normal select.
server engine
See engine.
server user ID
The ID number by which a user is known to SQL Server.
shared lock
A lock created by nonupdate (“read”) operations. Other users may read the data
concurrently, but no transaction can acquire an exclusive lock on the data until all
the shared locks have been released.
sort order
Used by SQL Server to determine the order in which to sort character data. Also
called collating sequence.
spinlock
A special type of lock or semaphore that protects critical code fragments that must
be executed in a single-threaded fashion. Spinlocks exist for extremely short
durations and protect internal server data structures such as a data cache.
SQL Server
The server in Sybase’s client-server architecture. SQL Server manages multiple
databases and multiple users, keeps track of the actual location of data on disks,
maintains mapping of logical data description to physical data storage, and
maintains data and procedure caches in memory.
statement
Begins with a keyword that names the basic operation or command to be
performed.
statement block
A series of Transact-SQL statements enclosed between the keywords begin and end
so that they are treated as a unit.
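For example, both statements in this block run only when the if condition is true
(pubs2 titles table; the price threshold is illustrative):
    if exists (select * from titles where price > $100)
    begin
        print "Some titles are priced above $100."
        select title, price from titles where price > $100
    end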
stored procedure
A collection of SQL statements and optional control-of-flow statements stored
under a name. SQL Server-supplied stored procedures are called system
procedures.
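A minimal sketch of creating and executing a stored procedure from isql (the
procedure name is hypothetical; titles is from pubs2):
    create procedure titles_by_type
        @type varchar(30)
    as
        select title, price
        from titles
        where type = @type
    go
    exec titles_by_type "business"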
subquery
A select statement that is nested inside another select, insert, update or delete statement,
or inside another subquery.
System Administrator
A user authorized to handle SQL Server system administration, including creating
user accounts, assigning permissions, and creating new databases.
system databases
The databases on a newly installed SQL Server: master, which controls user
databases and the operation of the SQL Server; tempdb, used for temporary tables;
model, used as a template to create new user databases; and sybsystemprocs, which
stores the system procedures.
system function
A function that returns special information from the database, particularly from
the system tables.
system procedures
Stored procedures that SQL Server supplies for use in system administration. These
procedures are provided as shortcuts for retrieving information from the system
tables, or mechanisms for accomplishing database administration and other tasks
that involve updating system tables.
system table
One of the data dictionary tables. The system tables keep track of information
about the SQL Server as a whole and about each user database. The master database
contains some system tables that are not in user databases.
table
A collection of rows (records) that have associated columns (fields). The logical
equivalent of a database file.
table scan
A method of accessing a table by reading every row in the table. Table scans are
used when there are no conditions (where clauses) on a query, when no index exists
on the columns named in the query clauses, or when the SQL Server optimizer
determines that using an index would be more expensive than a table scan.
table-level constraint
Limits values on more than one column of a table. Enter table-level constraints as
separate comma-delimited clauses in the create table statement. You must declare
constraints that operate on more than one column as table-level constraints.
task
An execution environment within the SQL Server scheduled onto engines by the
SQL Server.
temporary database
The temporary database in SQL Server, tempdb, which provides a storage area for
temporary tables and other temporary working storage needs (for example,
intermediate results of group by and order by).
text chain
A special data structure used to store text and image values for a table. Data rows
store pointers to the location of the text or image value in the text chain.
theta join
A join that uses comparison operators as the join condition. Comparison
operators include equal (=), not equal (!=), greater than (>), less than (<), greater
than or equal to (>=), and less than or equal to (<=).
threshold
The estimate of the number of log pages required to back up the transaction log,
and the action to be taken when the amount of space falls below that value.
throughput
The volume of work completed in a given time period. It is usually measured in
transactions per second (TPS).
transaction
A mechanism for ensuring that a set of actions is treated as a single unit of work.
See also user-defined transaction.
transaction log
A system table (syslogs) in which all changes to the database are recorded.
trigger
A special form of stored procedure that goes into effect when a user gives a change
command such as insert, delete, or update to a specified table or column. Triggers are
often used to enforce referential integrity.
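As a sketch, this trigger (the trigger name is hypothetical) cascades deletes from
the pubs2 titles table to its detail table:
    create trigger del_title
    on titles
    for delete
    as
        delete salesdetail
        from salesdetail, deleted
        where salesdetail.title_id = deleted.title_id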
ungrouped aggregate
See scalar aggregate.
unique constraint
A constraint requiring that all non-null values in the specified columns be
unique. No two rows in the table are allowed to have the same value in the
specified column. The unique constraint creates a unique index on the specified
columns to enforce this data integrity.
unique indexes
Indexes that do not permit any two rows in the specified columns to have the
same value. SQL Server checks for duplicate values when you create the index (if
data already exists) and each time data is added.
update
An addition, deletion, or change to data, involving the insert, delete, truncate table, or
update statements.
update in place
See in-place update.
update locks
Locks that ensure that only one operation can change data on a page. Other
transactions are allowed to read the data through shared locks. SQL Server applies
update locks when an update or delete operation begins.
variable
An entity that is assigned a value. SQL Server has two kinds of variables, called
local variables and global variables.
vector aggregate
A value that results from using an aggregate function with a group by clause. See
also scalar aggregate.
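For example, this query returns one average per type (pubs2 titles table):
    select type, avg(price)
    from titles
    group by type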
view
An alternative way of looking at the data in one or more tables. Usually created as
a subset of columns from one or more tables.
view resolution
In queries that involve a view, the process of verifying the validity of database
objects in the query, and combining the query and the stored definition of the view.
wash area
An area of a buffer pool near the LRU end of the MRU/LRU page chain. Once
pages enter the wash area, SQL Server initiates an asynchronous write on the
pages. The purpose of the wash area is to provide clean buffers at the LRU for any
query that needs to perform a disk I/O.
wildcard
A special character used with the Transact-SQL like keyword: the underscore (_)
stands for a single character, and the percent sign (%) stands for any number of
characters in pattern matching.
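For example, this query matches au_lname values that begin with “Mc” (pubs2
authors table):
    select au_lname
    from authors
    where au_lname like "Mc%"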
write-ahead log
A log, such as the transaction log, that SQL Server automatically writes to when a
user issues a statement that would modify the database. After all changes for the
statement have been recorded in the log, they are written to an in-cache copy of the
data page.
Index
The index is divided into three sections:
• Symbols
Indexes each of the symbols used in Sybase SQL Server
documentation.
• Numerics
Indexes entries that begin numerically.
• Subjects
Indexes subjects alphabetically.
Page numbers in bold are primary references.
Symbols
@@pack_received global variable 16-6
@@pack_sent global variable 16-6
@@packet_errors global variable 16-6

A
Allocation map. See Object Allocation Map (OAM)
Allocation pages
  large I/O and 19-60
Allocation units 3-4, 3-8
  database creation and 18-1
  dbcc report on 5-7
and keyword
  subqueries containing 7-30
any keyword
  subquery optimization and 7-27
Application design 19-4
  cursors and 12-16
  deadlock avoidance 11-27
  deadlock detection in 11-26
  delaying deadlock checking 11-27
  denormalization for 2-9
  DSS and OLTP 15-13
  index specification 9-8
  isolation level 0 considerations 11-17
  levels of locking 11-33
  managing denormalized data with 2-16, 2-17
  network packet size and 16-5
  network traffic reduction with 16-8
  primary keys and 6-27
  procedure cache sizing 15-6
  SMP servers 17-11
  temporary tables in 14-4
  user connections and 19-15
  user interaction in transactions 11-33
Architecture, Server SMP 17-5
Arguments, search. See Search arguments
Artificial columns 6-44
Ascending scan showplan message 8-28
Ascending sort 6-22
Auditing
  disk contention and 13-2
  performance effects 15-36
  queue, size of 15-37
Average disk I/Os returned (sp_sysmon report) 19-14
Average lock contention, sp_sysmon report on 19-42

B
Backup Server 18-5
Backups 18-5 to 18-6
  network activity from 16-9
  planning 1-4
Base cost (optimization) 9-19
Batch processing
  bulk copy and 18-7
  I/O pacing and 19-17
  managing denormalized data with 2-18
  performance monitoring and 19-4
  temporary tables and 14-12
  transactions and lock contention 11-33
bcp (bulk copy utility) 18-6
  heap tables and 3-12
  large I/O for 15-13
  reclaiming space with 3-20
  temporary tables 14-3
binary datatype
  null becomes varbinary 10-6
Binary expressions xl
Binding
  caches 15-12, 15-29
  objects to data caches 3-15
  tempdb 14-9, 15-13
  testing prefetch size and 9-10
  transaction logs 15-13
Blocking network checks, sp_sysmon report on 19-12
Blocking process
  avoiding during mass operations 11-34
  partitioning to avoid 13-13
  sp_lock report on 11-25
  sp_who report on 11-25
Brackets. See Square brackets [ ]
W
Wash area 15-8
configuring 15-29
Wash marker 3-15
where clause
creating indexes for 6-25
evaluating 9-17
optimizing 9-15
search arguments and 7-9
table scans and 3-11
Worktables
distinct and 8-20
locking and 14-10
or clauses and 7-23
order by and 8-21
reads and writes on 6-13