Maximiliano Damian Accotto
MVP en SQL Server
http://www.triggerdb.com
http://blog.maxiaccotto.com
Best Practices:
- Establish a baseline
- Measure performance
- Identify bottlenecks
- Focus on specific issues
- Make one change at a time
- Optimize for real-world workloads
- Monitor/review performance regularly
- Repeat (if desired)
Tools by scope:
- System/OS: Windows Performance Monitor; Alerts (Performance-Based)
- SQL Server: Activity Monitor; SQL Profiler / SQL Trace; Database Engine Tuning Advisor; Dynamic Management Views (DMVs)
- Query level: Database Engine Tuning Advisor; Query Execution Plans
Category | Metric
Largest single database | 70 TB
Largest table | 20 TB
Biggest total data, 1 application | 88 PB
Highest database transactions per second, 1 db (from Perfmon) | 130,000
Fastest I/O subsystem in production (SQLIO 64k buffer) | 18 GB/sec
Fastest real-time cube | 5 sec latency
Data load for 1 TB | 30 minutes
Largest cube | 12 TB
Company Profile
- World's largest publicly listed online gaming platform
- 20 million registered customers in more than 25 core markets
- >14,000 bets offered simultaneously on more than 90 sports
- ~90 live events with video every day; bwin is the world's largest broadcaster of live sports
- >70,000 payment transactions per day (PCI Level 1 and ISO 27001 certified)
Business Requirements
- Failure is not an option
- 100% transactional consistency, zero data loss
- 99.998% availability, even after loss of a data center
- Performance critical: must scale to handle every user and give them a great experience
- Protect users' privacy and financial information
- Provide a secure, PCI-compliant environment for all customers
SQL Server Environment
- 100+ SQL Server instances
- 120+ TB of data
- 1,400+ databases
- 1,600+ TB of storage
- 450,000+ SQL statements per second on a single server
- 500+ billion database transactions per day
- Core component in solutions designated for: financial transactions, gaming environments, tracking user state throughout the system
- Solutions primarily scale up using commodity hardware
SQL Server Infrastructure
- Almost 200 production SQL Server instances
- A single high-transaction-throughput system that is mission critical to the business in terms of performance and availability
Project Description
- Maintains US equities and options trading data
- Processes tens of billions of transactions per day
- Averages over 1 million business transactions/sec into SQL Server; peak: 10 million/sec
- Requires the last 7 years of data online
- Data is used to comply with government regulations
- Requirements for real-time query and analysis
- Approximately 500 TB per year, totaling over 2 PB of uncompressed data
- Largest tables approaching 10 TB (page compressed) in size
Early adopter, upgraded to SQL Server 2014 in order to:
- Better manage data growth
- Improve query performance
- Reduce database maintenance time
Data at this scale requires breaking things down into manageable units:
- Separate data into different logical areas:
  - A database per subject area (17)
  - A database per subject area per year (last 7 years)
- Table and index partitioning:
  - 255 partitions per database
  - 25,000 filegroups
  - Filegroup-to-partition alignment for easier management and less impact when moving data
  - Filegroup backups
- Taking advantage of compression:
  - Compression per partition
  - Backup compression
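A filegroup-aligned partitioning scheme with per-partition compression can be sketched as below. Table, filegroup, and boundary names are hypothetical, not taken from the project described above:

```sql
-- Hypothetical sketch: date-range partitions aligned to their own filegroups.
-- Filegroups FG2022, FG2023, FG2024 must already exist in the database.
CREATE PARTITION FUNCTION pf_TradeDate (date)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

CREATE PARTITION SCHEME ps_TradeDate
AS PARTITION pf_TradeDate TO (FG2022, FG2023, FG2024);

CREATE TABLE dbo.Trades (
    TradeDate date   NOT NULL,
    TradeId   bigint NOT NULL
) ON ps_TradeDate (TradeDate);

-- Compression can then be applied per partition:
ALTER TABLE dbo.Trades
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = PAGE);
```

Because each partition lives on its own filegroup, older partitions can be backed up, compressed, or moved with minimal impact on current data.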
Tuning layers: Hardware, Operating System, SQL Server, Database Design, Application
- Use disk alignment at 1024 KB
- Use GPT if MBR is not large enough
- Format partitions at a 64 KB allocation unit size
- One partition per LUN
- Only use dynamic disks when there is a need to stripe LUNs using Windows striping (i.e. an Analysis Services workload)
- Tools: Diskpar.exe, DiskPart.exe and DmDiag.exe; Format.exe, fsutil.exe; Disk Manager
- A graph in Microsoft's white paper shows the performance improvement from sector alignment
Sector Alignment: Basic MBR Partition Example
Commands:
  diskpart
  select disk 0
  list partition
- RAID-1 is OK for log files and data files, but you can do better
- RAID-5 is a BIG NO for anything except read-only or read-mostly data files
- RAID-10 is your best bet (but most expensive)
- NEVER put OLTP log files on RAID-5!
- If you can afford it: Stripe And Mirror Everything (SAME), one HUGE RAID-10
- SSD is even better; consider it for tempdb and/or log files
- If adventurous, use RAW partitions (see BOL)
Memory: as much as you can get, and more!
- 64-bit is great for memory-intensive workloads
- If still on 32-bit, use AWE
- Are you sharing the box? How much memory needs to be set aside? Set max/min server memory as needed.
- Observe where all this memory goes: data cache vs. procedure cache vs. lock manager vs. other
- Keep an eye out for the "A significant part of sql server process memory has been paged out" error in the errorlog.
- Min/max server memory when needed.
- Locked pages:
  - 32-bit: when using AWE
  - x64 Enterprise Edition: just grant the Lock Pages in Memory privilege
  - x64 Standard Edition: must have the hotfix and enable TF845 (see KB970070 for details)
- Large pages:
  - ONLY on dedicated 64-bit servers with more than 8 GB of RAM!
  - Enabled with TF834; see KB920093
  - Server is sloooooooow to start, be warned!
- CPU is rarely the real bottleneck; look for WHY we are using so much CPU power!
- Use affinity mask as needed:
  - Splitting the CPUs between applications (or SQL instances)
  - Moving SQL Server OFF the CPU that serves NIC IRQs
- With a really busy server:
  - Increase max worker threads (but be careful, it's not for free!)
  - Consider lightweight pooling (be SUPER careful: no SQLCLR and some other features; see KB319942 and BOL).
- Parallelism is good:
  - Gives you query results faster
  - But at the cost of using a lot more CPU resources
- The MAXDOP setting is your friend:
  - At the server level (sp_configure 'max degree of parallelism')
  - On a Resource Governor workload group
  - On a single query (OPTION (MAXDOP 1))
- Often overlooked: sp_configure 'cost threshold for parallelism' (default 5)
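The server-level knobs above can be set as in this sketch; the values shown are illustrative, not recommendations:

```sql
-- Illustrative values only: tune for your workload.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;

-- Or override parallelism for a single statement:
SELECT COUNT(*) FROM Person.Person OPTION (MAXDOP 1);
```

Raising the cost threshold keeps cheap queries serial while still letting expensive ones go parallel.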
- Data file layout matters
- Choose your recovery model carefully:
  - Full: highest recoverability but lowest performance
  - Bulk-logged: middle ground
  - Simple: no log backups; bulk operations minimally logged
- Always leave ON: auto create statistics, auto update statistics
- Always leave OFF: auto shrink
Rebuild optimizes processing time by using more CPU cores:

ALTER INDEX ALL ON Person.Person
REBUILD WITH (MAXDOP = 4)

MaxDOP | CPU ms | Duration ms
-      | 7344   | 7399
-      | 9797   | 5997
-      | 15845  | 5451
Designing High-Performance I/O Systems: SQL Server's View of I/O
- A high rate of allocations to any data file can cause scaling issues due to contention on allocation structures
  - Impacts the decision on the number of data files per filegroup
  - Especially a consideration on servers with many CPU cores
- PFS/GAM/SGAM are structures within a data file which manage free space
- Easily diagnosed by looking for contention on PAGELATCH_UP
  - Either in real time via sys.dm_exec_requests, or tracked in sys.dm_os_wait_stats
  - The resource description is in the form DBID:FILEID:PAGEID
  - Can be cross-referenced with sys.dm_os_buffer_descriptors to determine the type of page
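A sketch of that diagnosis using the standard DMVs: list current page-latch waits and the pages they wait on, then look at the cumulative picture.

```sql
-- Current requests waiting on page latches, with the waited-on page.
SELECT r.session_id,
       r.wait_type,
       r.wait_resource          -- format: DBID:FILEID:PAGEID
FROM sys.dm_exec_requests AS r
WHERE r.wait_type LIKE 'PAGELATCH%';

-- Cumulative waits since the last service restart:
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
ORDER BY wait_time_ms DESC;
```

The DBID:FILEID:PAGEID in wait_resource can then be matched against sys.dm_os_buffer_descriptors to confirm whether the hot page is a PFS, GAM, or SGAM page.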
- More data files does not necessarily equal better performance
  - Determined mainly by 1) hardware capacity and 2) access patterns
- The number of data files may impact the scalability of heavy write workloads
  - Potential for contention on allocation structures (PFS/GAM/SGAM; more on this later)
  - Mainly a concern for applications with a high rate of page allocations on servers with >= 8 CPU cores
- Multiple files can be used to maximize the number of spindles: data files can stripe the database across more physical spindles and/or service processors (applies to many small/mid-range arrays)
- A single file provides less flexibility with respect to mapping data files onto differing storage configurations, and prevents possible optimizations related to file placement of certain objects (relatively uncommon)
- Allocation-heavy workloads (PFS contention) may incur waits on allocation structures, which are maintained per file.
- The primary filegroup contains all system objects
  - These CANNOT be moved to another filegroup
- If using filegroup-based backup, you must back up PRIMARY as part of regular backups
  - If not, you cannot restore!
  - PRIMARY must be restored before other filegroups
Best Practice:
- Allocate at least one additional filegroup and set it as the default.
- Do not place objects in PRIMARY.
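That best practice might be sketched as follows; the database, filegroup, and file names are hypothetical:

```sql
-- Hypothetical names: add a user filegroup and make it the default,
-- so newly created objects land outside PRIMARY.
ALTER DATABASE SalesDB ADD FILEGROUP UserData;

ALTER DATABASE SalesDB
ADD FILE (NAME = 'UserData1',
          FILENAME = 'D:\SQLData\SalesDB_UserData1.ndf',
          SIZE = 10GB)
TO FILEGROUP UserData;

ALTER DATABASE SalesDB MODIFY FILEGROUP UserData DEFAULT;
```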
Virtual Log Files & Performance
TRACE FLAGS
- DBCC TRACEON: use -1 to turn a trace flag on globally
- DBCC TRACEOFF
- DBCC TRACESTATUS
- -T startup flag: use -T# entries separated by semicolons (;)
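For example, using flag 610 (discussed next); the commands themselves are standard DBCC syntax:

```sql
-- Turn trace flag 610 on for all sessions (-1 = global scope).
DBCC TRACEON (610, -1);

-- Check which trace flags are currently active.
DBCC TRACESTATUS (-1);

-- Turn it back off globally.
DBCC TRACEOFF (610, -1);
```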
- Trace flag 610 controls minimally logged inserts into indexed tables
  - Allows for high-volume data loading
  - Less information is written to the transaction log, so transaction log file size can be greatly reduced
- Introduced in SQL Server 2008
- Very fussy
- Documented in the Data Loading Performance Guide white paper:
  http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
- Trace flag 1224 disables lock escalation based on the number of locks
  - Memory pressure can still trigger lock escalation: the database engine will escalate row or page locks to table locks when lock memory reaches 40% of the memory available for locking ('locks' via sp_configure, or 40% of non-AWE memory)
- Scope: Global | Session
- Documented: BOL
- Trace flag 1117 forces all files in a filegroup to auto-grow at the same time
- Trace flag 1118 directs SQL Server to allocate full extents to each tempdb object (instead of mixed extents)
  - Less contention on internal structures such as SGAM pages
  - The story has improved in subsequent releases of SQL Server
Tempdb holds user objects:
- Local and global temporary tables (and indexes if created)
- User-defined tables and indexes
- Table variables
- Tables returned in table-valued functions
Note: this list, and the following lists, are not designed to be all-inclusive.
Internal objects:
- Work tables for DBCC CHECKDB and DBCC CHECKTABLE.
- Work tables for hash operations, such as joins and aggregations.
- Work tables for processing static or keyset cursors.
- Work tables for processing Service Broker objects.
- Work files needed for many GROUP BY, ORDER BY, UNION, SORT, and SELECT DISTINCT operations.
- Work files for sorts that result from creating or rebuilding indexes (SORT_IN_TEMPDB).
The version store is a collection of pages used to store row-level versioning of data. There are two types of version stores:
1. Common version store. Examples include:
   - Triggers.
   - Snapshot isolation or read-committed snapshot isolation (which uses less tempdb than snapshot isolation).
   - MARS (when multiple active result sets are used).
2. Online-index-build version store:
   - Used for online index builds or rebuilds. Enterprise Edition only.
- TEMPDB is dropped and recreated every time the SQL Server service is stopped and restarted.
- When SQL Server is restarted, TEMPDB inherits many of the characteristics of model, and creates an MDF file of 8MB and an LDF file of 1MB (default setting).
- By default, autogrowth is set to grow by 10% with unrestricted growth.
- Each SQL Server instance may have only one TEMPDB, although TEMPDB may have multiple physical files.
- Many TEMPDB database options can't be changed (e.g. Database Read-Only, Auto Close, Auto Shrink).
- TEMPDB only uses the simple recovery model.
- TEMPDB may not be backed up, restored, or mirrored, have database snapshots made of it, or have many DBCC commands run against it.
- TEMPDB may not be dropped, detached, or attached.
- TEMPDB logging works differently from regular logging. Operations are minimally logged, as redo information is not included, which reduces TEMPDB transaction log activity.
- The log is truncated constantly during the automatic checkpoint process, and should not grow significantly, although it can grow with long-running transactions, or if disk I/O is bottlenecked.
- If a TEMPDB log file grows wildly:
  - Check for long-running transactions (and kill them if necessary).
  - Check for I/O bottlenecks (and fix them if possible).
  - Manually running a checkpoint can often temporarily rein in a wildly growing log file if bottlenecked disk I/O is the problem.
Generally, there are three major problems you run into with TEMPDB:
1. TEMPDB is experiencing an I/O bottleneck, hurting server performance.
2. TEMPDB is experiencing contention on various global allocation structures (metadata pages) as temporary objects are created, populated, and dropped. Any space-changing operation acquires a latch on a PFS, GAM or SGAM page to update space allocation metadata. A large number of such operations can cause excessive waits while latches are acquired, creating a bottleneck (hotspot) and hurting performance.
3. TEMPDB has run out of space.
Ideally, you should be monitoring all of these on a proactive basis to identify potential problems.
Use Performance Monitor to determine how busy the disks holding your TEMPDB MDF and LDF files are.
- LogicalDisk: Avg. Disk Sec/Read: the average time, in seconds, of a read of data from disk. The numbers below are a general guide only and may not apply to your hardware configuration:
  - Less than 10 milliseconds (ms) = very good
  - Between 10-20 ms = okay
  - Between 20-50 ms = slow, needs attention
  - Greater than 50 ms = serious I/O bottleneck
- LogicalDisk: Avg. Disk Sec/Write: the average time, in seconds, of a write of data to disk. See the guidelines above.
- LogicalDisk: % Disk Time: the percentage of elapsed time that the selected disk drive is busy servicing read or write requests. A general guideline: if this value is greater than 50%, there is a potential I/O bottleneck.
Use these performance counters to monitor allocation/deallocation contention in SQL Server:
- Access Methods: Worktables Created/sec: the number of work tables created per second. Work tables are temporary objects used to store results for query spool, LOB variables, and cursors. This number should generally be less than 200, but can vary based on your hardware.
- Access Methods: Workfiles Created/sec: the number of work files created per second. Work files are similar to work tables but are created by hashing operations, and are used to store temporary results for hash joins and hash aggregates. High values may indicate contention potential; create a baseline.
- Temp Tables Creation Rate: the number of temporary tables created per second. High values may indicate contention potential; create a baseline.
- Temp Tables For Destruction: the number of temporary tables or variables waiting to be destroyed by the cleanup system thread. Should be near zero, although spikes are common.
- Minimize the use of TEMPDB
- Enhance temporary object reuse
- Add more RAM to your server
- Locate TEMPDB on its own array
- Locate TEMPDB on a fast I/O subsystem
- Leave auto create statistics and auto update statistics on
- Pre-allocate TEMPDB space: everyone needs to do this
- Don't shrink TEMPDB if you don't need to
- Divide TEMPDB among multiple physical files
- Avoid using Transparent Data Encryption (2008)
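Pre-allocating tempdb and dividing it among multiple equally sized files can be sketched like this; file count, sizes, and paths are hypothetical:

```sql
-- Hypothetical sizes/paths: pre-size the existing file and add equally
-- sized files so allocations round-robin across them.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 4GB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 4GB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3,
          FILENAME = 'T:\TempDB\tempdev3.ndf',
          SIZE = 4GB, FILEGROWTH = 512MB);
```

Keeping all files the same size matters: SQL Server's proportional-fill algorithm only spreads allocations evenly across files of equal size.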
- Generally, if you are building a new SQL Server instance, it is a good idea to assume that TEMPDB performance will become a problem, and to take proactive steps to deal with this possibility.
- It is easier to deal with TEMPDB performance issues before they occur than after they occur.
- The following TEMPDB performance tips may or may not apply to your particular situation. It is important to evaluate each recommendation and determine which ones best fit your particular SQL Server instance. It is not a one-size-fits-all approach.
If latches are waiting to be acquired on TEMPDB pages for various connections, this may indicate allocation page contention. Use this code to find out:

SELECT session_id, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGE%LATCH_%'
  AND resource_description LIKE '2:%'

Allocation page contention:
- 2:1:1 = PFS page
- 2:1:2 = GAM page
- 2:1:3 = SGAM page
Installation & Configuration Best Practices for Performance
- Server Role. The server should be a member server of a Microsoft Active Directory network, dedicated only to SQL Server. Windows File, Print, and Domain Controller services should be left to other machines.
- BIOS. Change Power Management to Maximum Performance.
- BIOS. Disable QPI Power Management.
- BIOS. Change Power Profile to Maximum Performance.
- BIOS. Change Power Regulator to High Performance Mode.
- System Architecture. Use a 64-bit architecture server.
- 32-Bit Systems. Include the /PAE parameter in the boot.ini file on Windows Server 2003 on servers with more than 4GB RAM.
- SQL Server Edition. Use the DEVELOPER edition on development and test servers. Use the ENTERPRISE edition on QA and production servers.
- RAM Modules. Validate with the server's manufacturer the low-latency recommendations for CPU and memory SIMM combinations, as well as memory SIMM placement across the multiple memory channels per processor.
- RAM per CPU Core. For OLTP systems, use 2GB-4GB RAM per CPU core.
- RAM per CPU Socket in Fast Track v3 (Data Warehousing). For 2 CPU sockets, use a minimum of 96 GB RAM; for 4 CPU sockets, a minimum of 128 GB RAM; for 8 CPU sockets, a minimum of 256 GB RAM.
- Processor Scheduling. Be sure that in Computer properties, Performance Options, the Processor Scheduling parameter is configured for Background Services.
- Network Interface Cards. Have at least two network interface cards connected to two different networks, in order to separate application load from administrative load.
- CPU Cache. Use servers with CPUs that have an L3 memory cache.
- Whitepapers. Look for low-latency best-practice configurations on server manufacturers' websites.
- BIOS. Disable CPU Hyper-Threading (or Logical Processor) at the BIOS level. Use Intel's Processor ID utility to verify it.
- BIOS. Disable CPU Turbo Mode (or Turbo Boost Optimization).
- BIOS. Disable CPU C-States (C-3, C6, etc.).
- BIOS. Disable CPU C1E.
- Network Interface Cards. Configure each network interface adapter for "Maximize data throughput for network applications".
- Network Interface Cards. For OLAP systems (data warehouses and cubes), database mirroring, log shipping, and replication, evaluate using jumbo frames (~9000-byte MTU) on all devices that interact with each other (switches, routers, and NICs).
- Fast Track v3 (DW) Disks. For the Windows operating system and SQL Server binary files, use a 2-spindle RAID-1 local disk array.
- Disk Volumes. Assign separate virtual disks (e.g. SAN LUNs) for SQL Server data, log, tempdb, and backups.
- Disk Host Bus Adapter (HBA). Insert the HBA adapter into the fastest PCI-E slot. For reference: PCIe x1 v1.0 delivers up to 250MB/sec; PCIe x1 v2.0 up to 500MB/sec; PCIe x4 v1.0 up to 1GB/sec; PCIe x4 v2.0 up to 2GB/sec.
- Disk Volumes. Use solid-state (SSD) disks or 15K disks.
- Disk Volumes. Use RAID-10 (or RAID-1) arrays when possible. Use RAID-5 as a last option. Never use RAID-0. RAID-5 is excellent for reading, but not for writing (especially bad at random writes). On direct-attached systems (DAS), if you need to balance performance and space between solid-state disks (SSD) and 15K (SAS) disks, one strategy is to put the solid-state disks at RAID-5 and the 15K disks at RAID-10.
- RAID Controller. On virtual disks, set the cache configuration to Write Policy = Write-Through (instead of Write-Back). The objective is to acknowledge completion to the operating system only when the write has reached the storage system, rather than the RAID controller's cache. Otherwise there is a consistency risk if the controller's battery is not working and power goes down.
- Disk Host Bus Adapter (HBA). Configure the HBA's Queue Depth parameter (in the Windows Registry) with the value that reports the best performance in SQLIO tests (x86 and x64 only) or SQLIOSIM (x86, x64, and IA64).
- Fast Track v3 (DW) Disks. For data files (*.MDF, *.NDF), use multiple SAN/DAS storage enclosures that have multiple RAID-10 groups, each with at least 4 spindles, but dedicate one RAID-10 group on each storage enclosure to log files (*.LDF). In Fast Track v3, tempdb is mixed with user databases.
- Disk Volumes. Have each operating system disk partitioned as one volume only. Don't divide a disk into multiple logical volumes.
- Disk Volumes. Partition each disk volume with a starting offset of 1024KB (1048576 bytes).
- Disk Volumes. Do NOT use Windows NTFS file compression.
- Disk Volumes. Format disk volumes using NTFS. Do not use FAT or FAT32.
- Disk Volumes. Use Windows mount point volumes (folders) instead of drive letters in failover clusters.
- Disk Volumes. Format each SQL Server disk volume (data, log, tempdb, backups) with an allocation unit of 64KB, and do a quick format if the volumes are SAN logical units (LUNs).
- Disk Volumes. Ratio #1. Be sure that Disk Partition Offset (e.g. 1024KB) divided by RAID Controller Stripe Unit Size (e.g. 64KB) equals an integer value. NOTE: this specific ratio is critical to minimize disk misalignment.
- Disk Volumes. Ratio #2. Be sure that RAID Controller Stripe Unit Size (e.g. 64KB) divided by Disk Partition Allocation Unit Size (e.g. 64KB) equals an integer value.
- Disk Volumes. Assign a dedicated disk volume to the MS DTC log file. Also, before installing a SQL Server failover cluster, create a separate resource dedicated to MS DTC.
- Windows Internal Services. Disable any Windows service not needed by SQL Server.
- Windows Page File. Be sure that Windows paging is configured to use the operating system disk only. Do not place the paging file on any of the SQL Server disks.
- Antivirus. The antivirus software should be configured NOT to scan the SQL Server database, log, tempdb, and backup folders (*.mdf, *.ldf, *.ndf, *.bak).
- SQL Server Engine Startup Flags for Fast Track v3 (Data Warehousing). Start the SQL Server engine with the -E and -T1117 startup flags.
- SQL Server Service Accounts. Assign a different Active Directory service account to each SQL Server service installed.
- Service Account and Windows Special Rights. Assign the SQL Server service account the following Windows user-right policies: 1) Lock pages in memory, and 2) Perform volume maintenance tasks.
- Fast Track v3 (DW) Multi-Path I/O (MPIO) to SAN. Install Multi-Path I/O (MPIO), configure each disk volume to have multiple MPIO paths defined with at least one active path, and consult the SAN vendor's prescribed documentation.
- Address Windowing Extensions (AWE). If the SQL Server service account has the "Lock pages in memory" Windows user right, then enable the SQL instance's AWE memory option. (Note: AWE was removed from SQL Server 2012; use 64-bit!)
- Instance Maximum Server Memory. If there is only one SQL Database Engine instance and no other SQL engines, configure the instance's Maximum Server Memory option with a value of about 85% of the physical memory available.
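For example, on a hypothetical server with 64 GB of RAM, 85% works out to roughly 55,705 MB:

```sql
-- Illustrative value for a 64 GB server (85% of 65536 MB is about 55705 MB).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 55705;
RECONFIGURE;
```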
- Tempdb Data Files. Be sure the tempdb database has as many data files as CPU cores, all of the same size.
- Startup Parameter T1118. Evaluate the use of trace flag T1118 as a startup parameter for the RDBMS engine, to minimize allocation contention in tempdb.
- Maximum Degree of Parallelism (MAXDOP). For OLTP systems, configure the instance's MAXDOP=1 or higher (up to 8) depending on the number of physical CPU chips. For OLAP systems, configure MAXDOP=0 (zero).
- Maximum Worker Threads. Configure the instance's Maximum Worker Threads = 0 (zero).
- Boost SQL Server Priority. Configure the instance's Boost SQL Server Priority = 0 (zero).
- Database Data and Log Default Locations. Configure the instance's database default locations for data and log files.
- Backup Files Default Location. Configure the instance's backup location.
- Backup Compression. In SQL Server 2008, enable the instance's backup compression option.
- Filegroups. Before creating any database object (tables, indexes, etc.), create a new default filegroup (NOT PRIMARY) for data.
- Data and Log Files Initial Size. Pre-allocate data and log file sizes. This helps minimize disk block fragmentation, and avoids the cost of auto-grow events, which halt processing until the growth completes.
- Fast Track v3 (DW) Compression. For fact tables, use page compression. On the other hand, compression for dimension tables should be considered on a case-by-case basis.
- Fast Track v3 (DW) Index Defragmentation. When defragmenting indexes, use ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (MAXDOP = 1, SORT_IN_TEMPDB = ON) to improve performance and avoid filegroup fragmentation. Do not use the ALTER INDEX REORGANIZE statement. To defragment indexes, especially on fact tables in data warehouses, include DATA_COMPRESSION = PAGE.
- Tools. Use the Microsoft SQL Server 2008 R2 Best Practices Analyzer (BPA) to determine whether anything was left unconfigured vs. best practices.
- Tools. Use the Microsoft NT Testing TCP Tool (NTttcp) to determine actual network throughput.
- Tools. Use Microsoft SQLIO and Microsoft SQLIOSim to stress-test storage and validate communication errors.
- Tools. Use CPUID's CPU-Z to determine processor information, especially the speed at which it is currently running.
- Tools. Use the Intel Processor Identification utility to determine processor information, especially whether Hyper-Threading is running.
Object | Counter | Value | Notes
Paging File | % Usage | <70% | Amount of page file currently in use
Processor | % Processor Time | <=80% | The higher it is, the more likely users are delayed.
Processor | % Privileged Time | <30% of % Processor Time | Amount of time spent executing kernel commands like SQL Server I/O requests.
Process (sqlservr), Process (msmdsrv) | % Processor Time | <80% | Percentage of elapsed time spent on SQL Server and Analysis Server process threads.
System | Processor Queue Length | <4 | <12 per CPU is good/fair, <8 is better, <4 is best
Logical Disk Counter | Storage Guy's Term | Description
Disk Reads/sec; Disk Writes/sec | IOPS | Measures the number of I/Os per second. Discuss with your vendor the sizing of spindles of different types and rotational speeds. Impacted by disk head movement (i.e. short-stroking the disk provides more I/O-per-second capacity).
Avg. Disk sec/Read; Avg. Disk sec/Write | Latency | Measures disk latency. Numbers will vary; optimal values for averages over time: 1-5 ms for log (ideally 1 ms or better); 5-20 ms for data (OLTP) (ideally 10 ms or better); <=25-30 ms for data (DSS).
Avg. Disk Bytes/Read; Avg. Disk Bytes/Write | Block size | Measures the size of the I/Os being issued. Larger I/Os tend to have higher latency (example: BACKUP/RESTORE).
Avg./Current Disk Queue Length | Outstanding or waiting IOPS | Should not be used to diagnose good/bad performance. Provides insight into the application's I/O pattern.
Disk Read Bytes/sec; Disk Write Bytes/sec | Throughput (aggregate) | Measure of total disk throughput. Ideally, larger block scans should be able to heavily utilize connection bandwidth.
Object | Counter | Value | Notes
Physical Disk | Avg Disk Reads/sec | <8 | >20 is poor, <20 is good/fair, <12 is better, <8 is best
Physical Disk | Avg Disk Writes/sec | <8 or <1 | Without cache: >20 poor, <20 fair, <12 better, <8 best. With cache: >4 poor, <4 fair, <2 better, <1 best
Memory | Available MBytes | >100 | Amount of physical memory available to run processes on the machine
SQL Server: Memory Manager | Memory Grants Pending | ~0 | Current number of processes waiting for a workspace memory grant.
SQL Server: Buffer Manager | Page Life Expectancy | >=300 | Time, in seconds, that a page stays in the memory pool without being referenced before it is flushed
SQL Server: Buffer Manager | Free List Stalls/sec | <2 | Frequency with which requests for db buffer pages are suspended because there are no free buffers.
Object | Counter | Value | Notes
:Access Methods | Forwarded Records/sec | <10* | Tables with records traversed by a pointer. Should be <10 per 100 batch requests/sec.
:Access Methods | Page Splits/sec | <20* | Number of 8K pages that filled and split into two new pages. Should be <20 per 100 batch requests/sec.
:Databases | Log Growths/sec; Percent Log Used | <1 and <80%, resp. | Don't let transaction log growth happen randomly!
:SQL Statistics | Batch Requests/sec | - | No firm number without benchmarking, but >1000 is a very busy system.
:SQL Statistics | Compilations/sec; Recompilations/sec | - | Compilations should be <10% of batch requests/sec; recompilations should be <10% of compilations/sec.
:Locks | Deadlocks/sec | <1 | Number of lock requests that caused a deadlock.
DON'T run SQL Profiler on the server. Then what?
- Run SQL Profiler on your own computer.
- Connect to the server.
- Indicate the events and columns wanted.
- Filter by the database to be evaluated.
- Run the trace for 1 second, then stop it.
- Export the trace as a script.
- Optimize the script.
- And then, and only then, run the SQL Trace script on the server.
And to evaluate?
- Use the fn_trace_gettable() function to query the content of the SQL Trace file(s).
- You can use the SQL Trace file(s) with SQL Server Database Engine Tuning Advisor to evaluate the creation of new indexes.
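Reading a trace file back might look like this; the file path is hypothetical:

```sql
-- Load a server-side trace file into a queryable rowset.
-- DEFAULT reads all rollover files belonging to the trace.
SELECT TextData, Duration, Reads, Writes, StartTime
FROM fn_trace_gettable('C:\Traces\MyTrace.trc', DEFAULT)
ORDER BY Duration DESC;
```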
Extended Events
- General event handling: the goal is to make well-defined data available, in XML format, from execution points in code
- Baked into the SQL Server code
- Layers on top of Event Tracing for Windows
- Used by SQL Trace, Performance Monitor and SQL Server Audit; output to the Windows Event Log or SQL Error Log; consumable as desired by users in administration or development
- Introduced in SQL Server 2008
Event Tracing for Windows (ETW)
- A superset of Extended Events
- Can be used in conjunction with Extended Events
- Can be a consumer or target of Extended Events
- A kernel-level facility
Packages
- A built-in set of objects in an EXE or DLL (aka a module)
- SQL Server has three packages: Package0, SQLServer, SQLOS
- A package contains one or more object types: events, actions, predicates, targets, types, maps
Events
- A monitoring point of interest in the code of a module
- Event firing implies: the point of interest in code was reached, and state information was available at the time the event fired
- Events are defined statically in the package registration
- A versioned schema defines the contents, with well-defined data types
- Event data always has its columns in the same order; targets can pick which columns to consume
Targets
- Targets are event consumers
- Targets can: write to a file; aggregate event data; start a task/action related to an event; process data synchronously or asynchronously
- Either file targets or in-memory targets:
  - File targets: Event File, ETW File
  - In-memory targets: Ring Buffer, Event Bucketing, Event Pairing, Synchronous Event Counting
Actions
- Executed on top of events, before the event info is stored in buffers (which may later be sent to storage)
- Currently used to: get additional data related to the event (TSQL statement, user, TSQL process info); generate a mini-dump
- Defined in the ADD/ALTER EVENT clause
Predicates
- A logical expression that gates whether an event fires
- Pred_Compare: an operator for a pair of values (Value Compare Value)
  - Example: Severity < 16
  - Example: Error_Message = 'Hello World!'
- Pred_Source: generic data not usually in the event (Package.Pred_Source Compare Value)
  - Example: SQLServer.user_name = 'Chuck'
  - Example: SQLOS.cpu_id = 0
- Defined in the ADD/ALTER EVENT clause
Extended Events summary
- Real-time data capture
- No performance penalty
- Based on Event Tracing for Windows (ETW)
- Full programmability support
- Building blocks: packages, events and actions, filters and predicates, sessions, targets
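Putting the pieces together, a minimal Extended Events session (one event, two actions, a predicate, and a ring-buffer target) might be defined like this; the session name is hypothetical:

```sql
-- Minimal sketch: capture errors of severity >= 16, adding the SQL text
-- and user name, into an in-memory ring buffer target.
CREATE EVENT SESSION demo_errors ON SERVER
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.sql_text, sqlserver.username)
    WHERE severity >= 16
)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION demo_errors ON SERVER STATE = START;
-- Later: ALTER ... STATE = STOP; then DROP EVENT SESSION demo_errors ON SERVER;
```

The ring buffer can then be read back as XML from sys.dm_xe_session_targets.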