Example Health Check Report
Prepared by
UpSearch, LLC
177 Front Street
Berea, OH 44017
www.upsearch.com/sql/
Colleen Morrow
David Maxwell
Michael Fal
Ben Miller
UpSearch, LLC © 2016
Proprietary and Confidential
This is an example report with recommendations.
Contents
Executive Summary
Current Environment Overview
  Server and Infrastructure
  SQL Server Instance
  Database Information
SQL Server Performance Overview
  Host Performance (CPU and Memory)
  Disk and I/O Subsystem Performance
  SQL Instance Activity and Performance
Query and Indexing Overview
  Long Running and Expensive Queries
  Top Missing Indexes
  Duplicate Indexes
Analysis
  Configuration
  Performance
  Disaster Recovery
  Security
  Additional Concerns
Conclusions
About the Authors
About UpSearch
Scenario
SG Heavy Industries, LLC (“SG Heavy Industries” or “client”) engaged UpSearch, LLC (“UpSearch”) to
facilitate an assessment of SG Heavy Industries’ current database platform in order to identify
performance tuning and monitoring options.
Scope of Work
UpSearch provided SG Heavy Industries with Colleen Morrow, David Maxwell, Michael Fal and Ben Miller
to gather and analyze operating system and SQL Server configurations and performance metrics,
recommend performance changes based on SG Heavy Industries’ environment and industry best
practices, and subsequently present this report.
Test Strategy
UpSearch ran a series of PowerShell scripts between February 15 and 16, 2016 (assessment period), which
explored and gathered information from both the Windows operating system and from SQL Server on
server ENTERPRISE. In addition, UpSearch ran a set of queries against the SQL Server instance and against
the databases running on the server. UpSearch also collected performance counters for both the
operating system and SQL Server.
Conclusion
UpSearch provided SG Heavy Industries this SQL Server Health Check Report after evaluating current
configurations, performance metrics and existing maintenance procedures in order to better understand
current performance bottlenecks. The findings and recommended actions of this report are not only
intended to improve performance, but are also made with an eye towards simplifying and standardizing
SQL Server management, ensuring business continuity and managing security.
The assessment revealed several configuration and performance related findings. The rough order of
magnitude estimate for UpSearch’s time required to complete the ensuing recommendations is between
50 and 65 hours. See the Conclusions section for time estimates by recommendation.
The following overview represents the current state of ENTERPRISE as of the assessment period.
Logical Disk  Size (GB)  Free Space (GB)  Volume Name   Allocation Unit Size (B)
C:                60.00            18.63  OS                               4,096
D:               239.99            21.29  Data                            65,536
E:               600.00            57.01  Backups                          4,096
Z:               152.57            90.48  Temp Storage                     4,096
Database Information
Database Name  Owner  Compatibility Level  Recovery Model  Last Known DBCC  Last Full Backup Date
master  sa  SQL Server 2008 (100)  Simple  02/14/2016 1:00:08 AM
model  sa  SQL Server 2008 (100)  Full  02/14/2016 1:00:09 AM
msdb  sa  SQL Server 2008 (100)  Simple  02/14/2016 1:00:11 AM
PICARD_db  DEN_DATANOMIX\ckasten  SQL Server 2005 (90)  Full  07/27/2014 7:59:39 PM  02/07/2016 5:09:49 AM
tempdb  sa  SQL Server 2008 (100)  Simple
Database Name  Data File Count  Log File Count  Data File Size (MB)  Log File Size (MB)  Available Data Space (MB)
master                       1               1                 4.00                0.75                       1.06
model                        1               1                 2.25                2.00                       0.94
msdb                         1               1                35.06               31.69                       3.69
PICARD_db                    1               1           201,608.31           22,250.50                   7,451.81
tempdb                       8               1            32,768.00                1.50                  32,391.81
Host performance indicates no overall stress at the server level. Low overall CPU usage (average 8.58%)
indicates that the server is not under CPU stress. Available memory averages almost 2GB and does
not fall below 1GB, showing that the server itself is not being starved for memory and has available
resources for the operating system and other non-SQL Server functions. The low use of the paging file is
evidence that applications are not having to swap memory to disk; such swapping would be a key
indicator of application memory pressure.
The disk subsystem’s latency is within Microsoft’s recommendations for operation. Typically, it is expected
that read operations should be less than 20 milliseconds and write operations be less than 10 milliseconds.
The maximum values for these metrics indicate periodic spikes. For volumes hosting database files, this
could be an indicator of specific queries that require tuning or a result of contention from the database
log and data files being on the same volume.
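The latency guideline above can be expressed as a simple check. The following is a minimal sketch in Python, assuming average latencies have already been collected in milliseconds; the function name and the sample values are hypothetical and are not measurements from the assessment.

```python
# Thresholds taken from the report text: reads under 20 ms, writes under 10 ms.
READ_LATENCY_LIMIT_MS = 20.0
WRITE_LATENCY_LIMIT_MS = 10.0


def latency_status(avg_read_ms: float, avg_write_ms: float) -> dict:
    """Report whether average read and write latency fall under the limits."""
    return {
        "read_ok": avg_read_ms < READ_LATENCY_LIMIT_MS,
        "write_ok": avg_write_ms < WRITE_LATENCY_LIMIT_MS,
    }


# Hypothetical sample values for illustration only.
print(latency_status(avg_read_ms=12.5, avg_write_ms=14.0))
```

A check like this is only meaningful on averages over a representative window; the periodic spikes noted above would need to be reviewed separately against the maximum values.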
In general, SQL Server is operating as expected in relation to internal processing. Query Compilations and
Re-Compilations fall within expected ratios compared to the number of Batch Requests per second. The
Page Life Expectancy (PLE) counter is an overall indication of SQL Server memory efficiency. It is useful to
distinguish PLE during business hours on the server (7am-6pm) from the 24-hour average. In general,
PLE averages 4,496 seconds during the day, but when the server enters its low-use timeframe, PLE
climbs steadily.
SQL Server operates on a "waits and queues" system. This means that when tasks within the engine must
"wait" for a resource to become available in order to complete, they are placed in a "queue" until that
resource becomes available. These waits are tracked and can indicate which resources within the SQL
Server instance are subject to performance issues.
Top waits by share of total waits (chart): 77.40% (CXPACKET), 15.18%, 2.73%, 2.52%, 2.16%
SQL queries can perform poorly for a variety of reasons. Some factors are how they are written,
inadequate hardware, or missing indexes to support the workload. It is important to identify the queries
that are performing poorly based on:
Number of Executions
CPU cost
Disk Impact (Reads and Writes)
Duration
This is a listing of the most expensive queries currently running in the SG Heavy Industries environment,
based on these factors.
Query: {CURSOR} SELECT * FROM (SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY LNAME, FNAME ASC) AS ROWNUMBE…
Metrics: 1  1,357  184,378  99  15,742,859
SQL Server tracks individual query statistics as they are executed. From these statistics, it compiles a list
of suggested indexes that could help query processing. Listed here is the top selection of these
suggestions. It should be noted that implementation of these indexes should be reviewed and tested for
impact, as SQL Server cannot take all factors of implementing these indexes into account.
Duplicate Indexes
There are times when an index is created with the same leading keys as another index. When indexes
have the same key columns in the same order, they are duplicates. Additionally, indexes that share the
same leading column order but do not share the same total columns are considered overlapping and could
possibly be consolidated. Listed here are indexes that can be considered partial duplicates.
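The distinction between duplicate and overlapping indexes can be sketched as a comparison of ordered key column lists. This is a simplified illustration in Python, not the query used during the assessment: it assumes both indexes are on the same table, treats "overlapping" as one key list being a leading prefix of the other, and ignores included columns, filters, and uniqueness. The function name and the column names in the usage lines are hypothetical.

```python
def classify_index_pair(keys_a, keys_b):
    """Classify two indexes on the same table by their ordered key columns.

    Returns "duplicate" when the key lists are identical, "overlapping" when
    one key list is a leading prefix of the other, and "distinct" otherwise.
    """
    a, b = list(keys_a), list(keys_b)
    if a == b:
        return "duplicate"
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    # An empty key list is never treated as a prefix of another index.
    if shorter and longer[: len(shorter)] == shorter:
        return "overlapping"
    return "distinct"


# Hypothetical key columns for illustration only.
print(classify_index_pair(["LNAME", "FNAME"], ["LNAME", "FNAME"]))
print(classify_index_pair(["LNAME"], ["LNAME", "FNAME"]))
print(classify_index_pair(["FNAME", "LNAME"], ["LNAME", "FNAME"]))
```

Key order matters: the third pair shares the same columns but in a different order, so neither index can substitute for the other.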
The following findings are category specific and include detailed descriptions:
Configuration
The PICARD_db database is running at compatibility level 90, meaning that it uses the rules and
features of SQL Server 2005. As this is not the current version of the SQL Server instance that is hosting the
database, it adds overhead to query processing due to the compatibility translation. Additionally,
the database cannot take advantage of newer features available in SQL Server. Upgrading the
compatibility level is a low-risk effort that will provide a fundamental support and performance
improvement.
The PICARD_db and tempdb databases have several files that are configured to grow by a percent. Use of
percentage growth settings for SQL Server can lead to unpredictable growth of the database files. There
can also be a performance impact, as SQL Server queries will wait when a file needs to be grown. It is
recommended that explicit file growth be set relevant to the size of the database.
Action: Set explicit filegrowth settings for data and log files
Action: Ensure TempDB filegrowth settings are the same for all data files.
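To illustrate why percentage growth is unpredictable, the sketch below compares the size of successive autogrowth events under a percentage setting and under a fixed increment. It is a minimal Python illustration; the starting size, the 10% rate, and the 512MB increment are hypothetical values, since the configured growth settings are not listed in this report.

```python
def growth_increments(start_mb, events, percent=None, fixed_mb=None):
    """Return the size in MB added by each of `events` autogrowth events."""
    if (percent is None) == (fixed_mb is None):
        raise ValueError("specify exactly one of percent or fixed_mb")
    size = float(start_mb)
    increments = []
    for _ in range(events):
        # Percentage growth is computed from the current (already grown) size.
        step = size * percent / 100.0 if percent is not None else float(fixed_mb)
        increments.append(round(step, 2))
        size += step
    return increments


# Hypothetical settings for illustration only.
print(growth_increments(1000, 3, percent=10))
print(growth_increments(1000, 3, fixed_mb=512))
```

With percentage growth each event is larger than the last, so both the wait incurred by a growth event and the disk space consumed increase as the file grows; a fixed increment keeps both constant.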
The SQL Server error log is a key resource for identifying and troubleshooting issues in the instance. To
make it easier to spot when problems are occurring, it should be kept free of non-critical messages as
much as possible. By default, every successful backup operation adds an entry in the SQL Server error log
and in the system event log. If you create very frequent log backups, these success messages accumulate
quickly, resulting in huge error logs in which finding other messages is problematic.
By enabling trace flag 3226, you can suppress these log entries. This is useful if you are running frequent
log backups and if none of your scripts depend on those entries.
Action: Enable trace flag 3226 to suppress successful log backup messages from the SQL Server
error log.
The PICARD_db data and transaction log files are housed on the same disk volume (D:). In a typical OLTP
environment the IO profile of a data file is composed of random IOs, split relatively evenly between read
and write operations. The IO of the transaction log file, on the other hand, is predominantly sequential
write operations. Housing data and transaction log files on the same physical disk generally leads to IO
contention and a degradation in write performance to the transaction log file. Because of the critical role
the transaction log plays in overall database performance, best practices from Microsoft recommend
separating out this traffic between multiple volumes.
Currently the D drive (PICARD_db data and transaction log files) is nonaligned with a starting offset of
32KB. Additionally, the Z drive (TempDB data and transaction log files) is formatted with a 4KB allocation
unit size.
All database LUNs should have the allocation unit size (partition cluster size) set to 64KB (65,536 bytes)
for all partitions. Doing so will improve sequential write performance at a minimum and may reduce the
split I/O (inefficient I/O) count. If partitions are backed up, destroyed, and then recreated on a Windows
Server 2008 (or higher) system, the OS will automatically align the partitions to 1024KB. However, the
cluster size for the partition will still need to be manually set to 64KB (65,536 bytes). Note: this step
requires new LUNs to be created in advance.
Action: Move data files and tempdb to volumes that are properly aligned to 1024KB and
configured with a 64KB allocation unit size.
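The two targets in this recommendation (a starting offset aligned to 1024KB and a 64KB allocation unit size) can be checked with simple arithmetic. The following Python sketch is illustrative only; the function names are hypothetical, and the starting offset used for the Z drive is an assumed value because it is not stated in this report.

```python
KB = 1024


def is_aligned(starting_offset_bytes: int, boundary_bytes: int = 1024 * KB) -> bool:
    """A partition is aligned when its offset is a whole multiple of the boundary."""
    return starting_offset_bytes % boundary_bytes == 0


def volume_findings(starting_offset_bytes: int, allocation_unit_bytes: int) -> list:
    """List the deviations from the 1024KB alignment and 64KB allocation unit targets."""
    findings = []
    if not is_aligned(starting_offset_bytes):
        findings.append("partition offset is not a multiple of 1024KB")
    if allocation_unit_bytes != 64 * KB:
        findings.append("allocation unit size is not 64KB")
    return findings


# D: 32KB starting offset and 65,536-byte allocation unit, as reported above.
print(volume_findings(32 * KB, 65536))
# Z: 4,096-byte allocation unit as reported; the 1024KB offset is an assumption.
print(volume_findings(1024 * KB, 4096))
```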
The PICARD_db log consists of 892 Virtual Log Files (VLFs). When the transaction log grows (either
manually or via auto-growth), the new space allocated for the log is made up of virtual log files. When the log
grows frequently, a large number of VLFs can accumulate, which can cause performance issues during
startup, restores, and even inserts/updates/deletes. It is recommended to back up the log and then shrink
it to reduce the number of VLFs. Then, grow the file to the appropriate size in 8000MB increments. The
“appropriate size” for the transaction log will depend on the level of activity on the database and how
frequently the log is being backed up.
Action: Perform a transaction log backup, shrink the transaction log file to 0, and manually grow
the file to the appropriate size. For log files larger than 8GB, grow the file in 8000MB
increments.
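The manual growth step can be planned ahead of time as a list of intermediate file sizes. This is a minimal Python sketch that assumes the log has been shrunk to (approximately) zero first; the function name and the 20,000MB target in the usage line are hypothetical, as the appropriate size has not been determined in this report.

```python
def log_growth_plan(target_mb: int, increment_mb: int = 8000) -> list:
    """Return the cumulative file sizes (MB) to step through to reach target_mb."""
    if target_mb <= 0 or increment_mb <= 0:
        raise ValueError("sizes must be positive")
    sizes = []
    current = 0
    while current < target_mb:
        # Grow by one increment, without overshooting the target size.
        current = min(current + increment_mb, target_mb)
        sizes.append(current)
    return sizes


# Hypothetical target size for illustration only.
print(log_growth_plan(20000))
```

Each value in the returned list would correspond to one manual resize of the log file, so a 20,000MB target would be reached in three steps rather than through many small auto-growth events.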
As data is inserted into tables and indexes, SQL Server creates additional pages to store this data in. These
new pages are not always efficiently stored for access, creating fragmentation in the database. Over time,
this fragmentation can have an impact on performance. Microsoft best practices are to run regular index
maintenance operations to rebuild or reorganize fragmented indexes. A maintenance job to reorganize
all indexes in the PICARD_db database exists; however, it has not been executed since July of 2015.
There are two ways of defragmenting indexes in SQL Server: rebuilding and reorganizing. The index rebuild
process is highly IO intensive and generates a large amount of log activity. In addition, when an index is
rebuilt offline, as is the case in the SG Heavy Industries environment, it will also lock the underlying table
for the duration of the rebuild process. Performing this operation during peak periods can cause long-
term blocking and deadlocks, and be highly disruptive to transactional processing. However, rebuilding
an index is the most effective way to correct high-level fragmentation. Index reorganization is an online
and relatively low-impact operation which is appropriate for low levels of fragmentation.
Because of the potential overhead, determining which indexes are rebuilt and when should be carefully
considered. The level of fragmentation in an index should always dictate whether a rebuild is necessary.
Microsoft recommends that indexes with a fragmentation level > 5% and <= 30% be reorganized. Indexes
more than 30% fragmented should be rebuilt. Bear in mind that these values are general
recommendations. Depending on the size of the objects or the time available in your maintenance
window, you may wish to raise those thresholds in order to be more selective in the rebuilds being
performed. Note that smaller indexes (1000 pages or less) and indexes with very low levels of
fragmentation (<=5%) should not be addressed, as they generally do not see a benefit from rebuilding.
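The thresholds described above reduce to a small decision rule. The sketch below is a hedged Python illustration of that rule using the general values quoted in this section; the function name is hypothetical, and the thresholds may reasonably be raised for a given maintenance window.

```python
def index_maintenance_action(fragmentation_pct: float, page_count: int) -> str:
    """Choose a maintenance action from fragmentation level and index size."""
    # Small indexes and lightly fragmented indexes are left alone.
    if page_count <= 1000 or fragmentation_pct <= 5:
        return "none"
    # Fragmentation above 5% and up to 30%: reorganize (online, low impact).
    if fragmentation_pct <= 30:
        return "reorganize"
    # Fragmentation above 30%: rebuild.
    return "rebuild"


# Hypothetical index measurements for illustration only.
for frag, pages in [(3.0, 5000), (18.5, 5000), (62.0, 5000), (62.0, 800)]:
    print(frag, pages, index_maintenance_action(frag, pages))
```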
Count of Indexes (chart): 59, 39
However, it is the accuracy of statistics, rather than the level of index fragmentation, that has the biggest
impact on query performance. SQL Server relies heavily on statistics regarding the data distribution in a
table to form an optimal query plan. Outdated statistics lead to poor performance. To handle this, we
recommend creating a job to update statistics on a daily basis. Updating statistics is a low-impact operation.
Action: Create and schedule a weekly job to reorganize indexes with >5% and <=30%
fragmentation, and rebuild indexes with > 30% fragmentation. Limit the procedure to indexes
with more than 1000 pages (not based on row count).
Action: Create and schedule a daily job to update modified index and column statistics with a
100% sample.
CXPACKET accounts for 77.4% of waits. In general, CXPACKET indicates heavy use of parallelism in queries
running on the SQL Server. This could be the result of several different factors, including (but not limited
to) improper parallelism configurations, inefficient queries, and a lack of indexing within the database.
There is no single answer to addressing this issue, so recommended actions will cover several options.
Numerous high-read and high-duration queries captured during the assessment included a search
predicate on the account_rqst_id. Creating an index on this column will help to eliminate full scans on
the dashboard table or related indexes and facilitate seek operations instead, thereby dramatically
reducing the IO incurred by each execution of these queries. It is notable that this is also the most
recommended index based on SQL Server’s internal tracking mechanism.
The three queries with the highest average and total read operations captured over an 8-hour period
during the assessment are shown in the table below.
It appears that a cursor is being used to repeatedly read each of these tables, causing a large amount of
read IO. Wherever possible, it is recommended to select only the columns needed to satisfy the
application's needs, and to include a predicate to limit the number of rows returned, thereby minimizing
IO operations.
Action: Review code to ensure only relevant data is being retrieved to avoid unnecessary IO.
Disaster Recovery
Database integrity checks (commonly referred to as CHECKDB) check the logical and physical structures
of a database for possible signs of corruption. The earlier corruption is detected, the greater the chance
of recovering from that corruption with minimal data loss. Therefore, it is critical to perform integrity
checks on all databases, including system databases, on a regular basis. The amount of I/O generated by
an integrity check can be extremely high (the entire database must be read into memory) and it is
recommended that these checks be performed during off hours, at least once a week.
Action: Create a job or set of jobs to check database integrity on a daily or weekly basis.
SQL Server uses several internal mechanisms to validate the data stored on disk. The Page Verify setting
defines how SQL Server will check for both logical and physical corruption within the data pages. When
it is set to NONE, SQL Server performs none of these checks. Changing this setting is low risk and
gives SQL Server the ability to detect and alert on corruption.
Because the system databases are stored on the OS volume, SQL Server may halt if the OS drive
fills up. A secondary risk is that any I/O operations performed by the OS will contend with these system
databases. It is a best practice to place the system databases on another volume, either their own or
the volumes used for the user database files.
Having the default backup path targeting the OS volume introduces the risk that a backup could
accidentally be made to that volume. Database backups vary in size, but a large enough backup would fill
up the OS volume and potentially cause SQL Server to halt. It is recommended that the default backup
drive be set to the current backup volume for the instance.
Security
At the time of the assessment every login on the ENTERPRISE instance was a member of the sysadmin
server role and the corresponding user was a member of the db_owner role in the PICARD_db database.
This means that every connection, including those from the web application, has the ability to perform
any operation in the SQL Server instance.
Action: Review existing logins and users to determine if the level of access currently assigned is
appropriate.
The BUILTIN\Administrators group is a login group that allows local Windows administrators on the host
server access. The group has sysadmin level access in SQL Server, meaning that anyone within the local
Windows Administrators group will have elevated SQL Server privileges. This is a concern because it makes
the separation of access and control more difficult. It is recommended that this group be removed and
sysadmin access be granted via another mechanism for better control.
Action: Remove the BUILTIN\Administrators group and replace it with an Active Directory group
dedicated to SQL Server administrators.
Database ownership grants complete control to a database and the objects within it. Unless specifically
required, having a database owned by an individual login means that it cannot be secured completely
against the login that owns it. Best practice is to set database ownership to sa within SQL Server.
SQL Agent Jobs can only be edited and properly managed by either the logins that own them or logins
with sysadmin level privileges. Because of this, it is generally recommended to set the ownership of Agent
Jobs to sa. This allows for more consistent job management.
Extended stored procedures are a set of code within SQL Server that allows for it to interact with
components of the Windows environment outside of the database. The procedure xp_cmdshell allows
Windows command shell commands to be run from SQL Server. The issue is that these commands have the potential to
execute as the SQL Server service account, which typically has elevated privileges in the OS. Because of
this, it is considered a security risk. It is recommended that, unless there is a specific need for this
functionality, it be disabled.
Additional Concerns
Finding: SQL Server 2008 R2 is not at the current service pack level
The SQL Server 2008 R2 installation is at Service Pack 2, Cumulative Update 13. Service Pack 3 is currently
available. Service Packs provide all the combined updates as well as additional functionality
improvements. In general, it is recommended to have the most recent Service Pack installed.
The performance metrics gathered during the assessment period indicate adequate processor and
memory resources to support the current workload. While occasional spikes were recorded, disk read
and write latency metrics, on average, were also within recommended parameters. It is believed that the
recommendations put forward in this report will further improve performance and concurrency within
the ENTERPRISE instance.
UpSearch provided the following rough order of magnitude time estimates to complete the
aforementioned recommendations. The following time estimates by recommendation are provided as a
courtesy and do not constitute a formal Statement of Work.
Update SQL Server Configurations (backup locations, trace flags, parallelism settings)
Estimate: 1 hour
Update Database Configurations (compatibility levels, database file growth settings, database
ownership, page verify settings)
Estimate: 2 hours
Install, configure, and validate database maintenance scripts (index maintenance, integrity
checks).
Estimate: 10 hours
Publications
What Changed? Auditing Solutions in SQL Server, Tribal SQL: New voices in SQL Server, Chapter
7 (http://www.amazon.com/Tribal-SQL-Tony-Davis-ebook/dp/B00H3JP4R0)
Colleen Morrow has presented at PASS Summit 2015, Enterprise Auditing with SQL Server Audit
(Enterprise Database Administration & Deployment), PASS Summit 2014, SQL Audit: from Introduction
to Automation (Enterprise Database Administration & Deployment), and at many SQL Saturday
conferences and user groups throughout the United States.
Since 2012, David has presented sessions on SQL Server maintenance, performance and database
corruption at numerous SQL Saturday events throughout the Midwest, in addition to being a presenter
and director for his local Columbus, OH based PASS chapter. David has also presented for several online
events such as the PASS DBA Fundamentals and Performance Virtual Chapters, and 24 Hours of PASS. He
is also the winner of the PASS Summit 2015 Speaker Idol competition.
Publications
Mike has presented at PASS Summit 2015, Powershell and the Art of SQL Server Deployment, IT/Dev
Connections 2015, The Scalable SQL Server Enterprise, PASS Summit 2013, Monitoring Methodologies –
The Hierarchy of Database Needs, and at many SQL Saturday conferences and user groups throughout the
United States.
SQL Server MVP Deep Dives, Vol. 2, Chapter 26 – SQL Server Filestream : To Blob or not to Blob
PowerShell Deep Dives, Chapter 23 – SQL Server Provider in PowerShell
SQL Server PowerShell Stairway, SQL Server Central
Ben has presented at many SQL Saturday conferences throughout the United States. He helped plan and
organize Salt Lake City SQL Saturday events in 2012 and 2014.
UpSearch’s Block of Hours Model provides support against a pre-negotiated block of hours. The
larger your annual commitment, the lower the bill rate. Our clients typically assemble twelve
months’ worth of projects and support to lock in the best bill rate.
Our team will complement your team across a broad spectrum of database initiatives, such as:
Migrations to Azure
Health Checks and Baseline Assessments
High Availability & Disaster Recovery
Database Design and Development
DevOps and Data Management
Performance Analysis and Tuning
Business Intelligence
Conversions to SQL Server
Server Consolidations / Upgrades
Virtualization
On-premises or in the cloud, UpSearch can help you protect, unlock and optimize your data’s value.