SQL Server Tacklebox
Essential Tools and Scripts for the day-to-day DBA
Rodney Landrum
ISBN: 978-1-906434-24-3
The right of Rodney Landrum to be identified as the author of this work has been asserted by
him in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored or introduced into
a retrieval system, or transmitted, in any form, or by any means (electronic, mechanical,
photocopying, recording or otherwise) without the prior written consent of the publisher.
Any person who does any unauthorized act in relation to this publication may be liable to
criminal prosecution and civil claims for damages.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be
lent, re-sold, hired out, or otherwise circulated without the publisher's prior consent in any
form other than that in which it is published and without a similar condition,
including this condition, being imposed on the subsequent purchaser.
ABOUT THE AUTHOR
Rodney Landrum has been working with SQL Server technologies for
longer than he can remember (he turned 40 in May of 2009, so his memory
is going). He writes regularly about many SQL Server technologies,
including Integration Services, Analysis Services, and Reporting Services. He
has authored three books on Reporting Services. He is a regular contributor
to SQL Server Magazine and Simple-Talk, on the latter of which he blogs
sporadically about SQL and his plethora of geek tattoos. His day job finds him
overseeing the health and well-being of a large SQL Server infrastructure in
Pensacola, Florida. He swears he owns the expression "Working with
Databases on a Day to Day Basis" and anyone who disagrees is itching to
arm wrestle. Rodney is also a SQL Server MVP.
ACKNOWLEDGEMENTS
I would like to thank everyone involved in the making of this book,
peripherally and personally, but first and foremost Karla…my love, who has
been with me, spurred me on and understood when I needed a fishing or
beer respite through 5 books now. I love you.
To all my kids who also sacrificed during the writing of this book. Megan,
Ethan, Brendan and Taylor. Well, OK, Ethan did not sacrifice so much, but
he did help me understand that "Buffalo buffalo Buffalo buffalo buffalo
buffalo Buffalo buffalo" is a legitimate sentence.
Thanks to my Mom and Dad, as always. I love you. There will still be a
novel, I promise; just not a Western. Sorry, Mom.
Thanks also to Shawn McGehee, my good friend and DBA colleague, who
tech-edited the book. It is much better for it. Also, thanks Shawn, for letting
me use snippets of your hard-won code as well.
Special thanks also go to Truett Woods who has opened my eyes in a lot of
ways to good coding practices, and for the use of one of his base code
queries in Chapter 1.
Thanks to Joe Healy of devfish fame, a straight up bud whose .Net
tacklebox is more full than mine. I will be getting the devfish tattoo next.
Finally, I would personally like to thank Throwing Muses, The Pixies and
Primus for providing the music that helped me through the many late
nights. OK, so they will never read this and offer to come over to play a set
at a backyard BBQ, I know, but one can hope.
INTRODUCTION
This book, as with almost all books, started out as an idea. I wanted to gather
together the scripts and tools that I have built over the years, so that DBAs
could sort through them and perhaps adapt them for their own circumstances. I
know that not every script herein will be useful, and that you might ask "Why are
you using this table and not that DMV?" or "Why is this code not compatible with
SQL Server 2008?" After writing about SQL Server solutions for the past 10
years, I can expect this criticism, and understand it fully. Everyone has their own
ways of solving problems. My goal is to provide useful samples that you can
modify as you please.
I wrote the book the way it is because I did not want to bore DBAs to tears with
another 500+ page textbook-style tome with step-by-step instructions. I wanted
this book to be a novel, a book of poetry, a murder mystery, a ghost story, an epic
trilogy, a divine comedy. But realizing that this is, after all, a technical book, I
compromised by imbuing it with some humor and personality. If you make it as
far as the monster at the end of this book, my hope is that you will have been
entertained and can use the code from the Tacklebox in some fashion that will
make your lives as DBAs easier. Why "The Tacklebox," you might ask, rather than
"Zombie Queries," "You Can’t Handle the Code" or "You had me at BEGIN?" I
think, as I push halfway through my career as a DBA and author, this book is as
close as I will ever get to "The Old Man and the Sea"…oh yes, and apparently the
"Toolbox" had been copyrighted. Plus, come on! Look at the cover of the book.
How can I live here and not go fishing once in a while?
Chapter 1
Here you will find wholesome SQL Server installations on the menu, complete
with Express, Continental and Deluxe breakfast choices, depending on your
application’s appetite. And there will be a little GUI setup support here. This
chapter is about automation, and a lengthy script is included that will help you
automate SQL installations and configurations. There is some foreshadowing
lurking as well, such as code to enable a DDL trigger that I will show later in the
book. This is the chapter where your new SQL Server installation is completely
yours, having not as yet been turned over to the general populace of developers or
users. Enjoy it while you can.
Chapter 2
In this chapter, I introduce the DBA Repository, a documentation tool I have
built using Integration Services and Reporting Services. It is easy to manage one,
two or three SQL Server instances with the panoramic view the tool gives. It is
even easy to work with ten SQL Servers without documentation, but when you
have 70 or 100 or 2,000 SQL Servers without an automated documentation
solution, you cannot successfully manage your SQL Server Landscape – ironically,
that is the name of the chapter, "The SQL Server Landscape."
Chapter 3
I think we can all agree that data at rest never stays that way. No, far from it. The
data in this chapter has begun the swim up river to its spawning grounds and will
migrate and transform like the intrepid salmon (hey, a fishing reference) from the
open ocean, to river, to stream. Here, I look at different ways that data moves
about, and I investigate tools such as SSIS and BCP that help facilitate such
moves, whatever the reason, be it high availability, disaster recovery or offloaded
reporting.
Chapter 4
In this chapter, I describe one of the first hungry monsters of the book, the disk-
space consuming databases. The hunger may not be abated entirely, but it can be
controlled with proper planning and also with queries that will help you to
respond to runaway growth. Here, I will show how to battle the appetite of space-
killers with just a bit of planning, tempered with an understanding of how and why
data and log files grow to consume entire disks.
Chapter 5
There is a murder in this chapter. Someone or something is killed and most likely
you, the DBA, will be the lone killer. Of course, I am talking about processes,
SPIDs, that we see every day. Some are easier to kill than others. Some will just
not die and I will explain why, using ridiculous code that I hope you never see in
real life. The queries here were designed to help you get to the bottom of any issue
as quickly as possible without the need for DNA testing. And I am not talking
about Windows DNA, if you are old like me and remember this acronym for
Distributed interNet Applications Architecture, a precursor to .NET. No, no .NET here.
Chapter 6
To sleep...perchance to dream…about failures. Here, I will introduce the sleep
killer of DBAs everywhere, where jobs fail in the hours of darkness, and the on-
call DBA is awakened from slumber several times a night in order to sway,
zombie-like to the computer to restart failed backup jobs. You cannot resolve an
issue unless you know about it and here, while discussing notifications and
monitoring your SQL Server infrastructure, we will stay up late and tell horror
stories. But, in the end, you will sleep better knowing all is well.
Chapter 7
Surely, like me, you are afraid of break-ins. I would not like to come home and
find my things strewn about, wondering what all was taken. If you work for a large
company that is regulated by one or more government mandates, like HIPAA or
SOX (Sarbanes-Oxley) you cannot afford to be without a security monitoring
solution. Here, I will introduce a set of scripts to show how to analyze login
accounts, the first barrier between the outside world and your data. I will also give
pointers to other solutions that will help you track down potential breaches in
your outer wall defenses.
Chapter 8
In this chapter I will unveil the monster: the Data Corruption beast. Despite
advances in hardware (the number one cause of corruption), it does still exist. You
will need to hunt out corruption before you can slay it, and you need to find
it early, in its lair, so as not to spread the corruption to backup files. Here, I will
intentionally, though begrudgingly, corrupt a database and show some of the ways
to discover and fix it, emphasizing the need for a solid backup strategy.
Code Download
All of the scripts provided in this 'tacklebox', as well as the full DBA Repository
solution, presented in Chapter 2, are available to download from:
http://www.simple-talk.com/RedGateBooks/RodneyLandrum/SQL_Server_Tacklebox_Code.zip
CHAPTER 1: EATING SQL SERVER INSTALLATIONS FOR BREAKFAST
For many DBAs, choosing an appropriate SQL Server installation is a lot like
ordering breakfast at a diner: there is something to suit all appetites, tastes and
budgets, and the range of choices can often be mind-boggling. A sample SQL
Server breakfast menu might look something like this:
The Express Breakfast (For the cost-conscious)
• 1 SQL Server Express on top of Windows XP Professional
• 1 large hard drive
• 2GB of RAM
The Continental (Enough to hold you over for a while)
• 1 SQL Server Standard Edition 32-bit on Windows Server 2003 Standard
• 1 instance of Reporting Server
• 1 instance of Analysis Server
• 250GB RAID 5 disk subsystem
• 4GB of RAM
The Deluxe (When cost is no barrier to hunger)
• 1 SQL Server Enterprise Edition 64-bit, clustered
• 2 Windows Server 2003 Enterprise Edition servers
• 1 RAID 10 1TB data partition
• 1 RAID 10 200GB log partition
• 1 RAID 0 100GB TempDB partition
• 64GB of RAM
It is the DBA's task to choose the SQL Server configuration that most accurately
reflects the specific needs of a given project, whether it is for cost-conscious
deployments, high availability, data analysis or high-performing online transactions
for a critical business application. In this chapter, I will investigate the myriad
available installation options and propose a custom post-installation script with
which to automate the most common configurations.
RAM
SQL Server, like any other application, is going to use memory. RAM, like CPU
and disk arrays, comes at a cost and how much RAM you are going to need
depends on several factors. If you know that you will have 250 users connected
simultaneously to a 200GB database, then 4GB of RAM on SQL Server Standard
32-bit and Windows 2003 is not going to be enough.
Without wishing to be overly formulaic, I will say that connections come at a
RAM cost: more connections mean more RAM. SQL Server will run comfortably
in 1GB of memory, but you would not want to roll out a production server on that
amount. Ironically, one of the most important factors to consider is one that a
DBA has very little control over: what application is going to access your beloved
database objects? Is it "homegrown" or "third-party"? This is an important
question because, if you do not "own" the database schemas, you could find
yourself unable to employ basic performance tuning techniques, like adding
indexes. In these situations, you are at the mercy of the vendor, whose
recommendation is often to "add more RAM," even when all that is required is to
add an overlooked index.
At this planning stage, it is always safer to overestimate the need for memory. Get
as much as you "think" you will need, and more if performance outweighs cost,
which it should. Note though, that buying the additional physical RAM is not the
only cost and is seldom the cure. You will also have to purchase software that can
support the additional RAM and this might mean, for example, buying Windows
Server Enterprise instead of Standard Edition.
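As a quick sanity check once the hardware arrives, you can compare what Windows
actually presents to the instance against what you ordered, and cap SQL Server's
memory use accordingly. The snippet below is a minimal sketch for SQL Server
2005/2008 (the memory column was renamed in later versions), and the 6144MB cap is
purely an illustrative figure, not a recommendation:

-- How much physical memory does the instance see? (SQL Server 2005/2008)
SELECT physical_memory_in_bytes / 1048576 AS physical_memory_mb
FROM sys.dm_os_sys_info;

-- Cap SQL Server at an example value of 6144MB, leaving the remainder for the OS
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 6144;
RECONFIGURE;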
CPU
Specifying processor requirements for a SQL Server is a slightly more complex
task than specifying RAM. For one thing, software such as SQL Server can be
licensed per processor. This can be quite expensive. As a DBA, you must
understand the difference between the different processor types. For example, is
the processor 32- or 64-bit architecture? Is it single-core or multi-core, meaning
you can gain additional virtual processors with one physical processor? If it is
multi-core, is it dual-core or quad-core, or octa-core? I'm not even sure if that last
one exists yet, but it probably will soon.
Why is it important to know the answer to all these questions? Well, you do not
want to be the DBA who relays to your boss that your new 2-proc, quad-core
SQL Server is going to require 8 "per proc" licenses when, in fact, it will only
require 1 license per "physical" processor, even if SQL Server will use all 8 cores.
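Before that licensing conversation, it helps to know what the server actually
reports. A rough check, assuming SQL Server 2005 or later, is sys.dm_os_sys_info;
cpu_count is the number of logical processors and hyperthread_ratio is the number
of logical processors per physical socket, so dividing the two gives an estimate of
the physical processor count:

-- Logical CPUs versus an estimate of physical processors
SELECT cpu_count AS logical_cpus,
       hyperthread_ratio,
       cpu_count / hyperthread_ratio AS estimated_physical_cpus
FROM sys.dm_os_sys_info;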
Disk subsystem
The choice of disk subsystem is the most difficult pre-installation hardware
decision that the DBA has to make. There are just so many options. Fortunately,
you have put together the documentation for your SQL Server infrastructure that
will help you narrow down the choices, right? You know, for example, that your
performance requirements dictate that you are going to need RAID 1-0 data and
log partitions, with a separate volume allocated for TempDB split across as many
spindles as possible.
OK, so you don't have that document; not a big deal. If you are able to at least
have a RAID 5 configuration then you are off to a good start. It is also worth
noting that if you are installing SQL Server in a clustered environment, you will
need to have a shared disk array, typically implemented via a Storage Area
Network (SAN) back end.
Free tools are available for you to stress test the disk subsystem of the SQL Server
installation, prior to moving it to production. One such tool is SQLIO, provided
by Microsoft:
http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19&displaylang=en
Installing SQL Server is, at best, a mundane task. If you do it twice a month then
it is probably OK to simply springboard through the GUI installation wizard,
manually choosing, clicking, and typing your way to a successful install. However,
for me and many other DBAs, standardization and automation are important. A
constant theme of this book is that whenever a task can be simplified, automated
and repeated, you should make it so.
Installation is no exception. I need a standard install process that can be controlled
programmatically, in order to eradicate costly mistakes. As such, I want to avoid
the GUI-driven installation altogether. Fortunately, Microsoft continues to
support command line installs and that is what I will be demonstrating here: how
to automate the installation with setup options, from the command line.
I'll begin by examining some of the installation options available for SQL Server
2008. There are many optional parameters that I'll ignore, but several that are
required. I'll show how to string together the command line and execute it on your
new server. When it is done, assuming there are no errors, you will be ready for
the real fun. If there are errors, then refer to my previous comment about the 2
years spent in Help Desk. They will stand you in good stead, as you will need
every ounce of perseverance to solve any issues. I have read volumes in the
various SQL Server forums on installation errors and how to overcome them.
However, let's assume, as is typical, that there will be no errors.
To get a full list of all of the available command line installation options for SQL
Server 2008, including the valuable samples, simply run Setup /?, as shown in
Figure 1.1.
Figure 1.2 shows the different, and less friendly, outcome of performing the same
step for SQL Server 2005.
NOTE
Starting with service pack 1 for SQL Server 2008, you can now "slipstream"
service packs for SQL Server, much like you can do for Windows service packs.
See http://msdn.microsoft.com/en-us/library/dd638062.aspx#Slipstream for
further details.
If you do not enable AWE ("awe enabled") for 32-bit installations of SQL Server
on Windows 2003 Enterprise, you will not use the memory that you may think you
should be using; SQL Server will live within the 2GB memory range to which 32-bit
applications are generally relegated.
However, there are also options that you will not want to change. Two of these
options are "priority boost" and "lightweight pooling". Changes to these options
should typically be made only with affirmation from Microsoft Support that they
will help, and not hinder, your environment. In general, please do not change a
configuration unless you have thoroughly tested it.
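If you want to verify that these two options are still at their default of 0
before the server goes live, a quick look with sp_configure is enough; both are
advanced options, so "show advanced options" must be on to see them:

-- Inspect, but do not change, the two options mentioned above
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'priority boost';
EXEC sp_configure 'lightweight pooling';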
• DDL Triggers
  • Add Server Trigger to notify upon database create or drop
• Security
  • Set to Log Successful and Failed logins
• DB Maintenance Database
  • Create the _DBAMain database
  • Create the stored procedures in the _DBAMain database
  • Create and Schedule Maintenance Jobs via stored procedures
• Other Modifications
  • Change Model Database Options.
Listing 1.2 displays the actual T-SQL automation script to implement the above
steps, which you can execute against your newly installed SQL Server instance. It
is documented at stages to distinguish between server, database and custom
additions.
/* SQL Server Automated Configuration Script
2009 - Rodney Landrum
*/
SET NOCOUNT ON
GO
GO
ELSE
IF @PhysMem > 4096 AND @ProcType <> 8664
BEGIN
SET @MaxMem = @PhysMem - 3072
EXEC sp_configure 'awe enabled', 1
Reconfigure
EXEC sp_configure 'max server memory', @MaxMem
Reconfigure
END
-- Add Profile
EXECUTE msdb.dbo.sysmail_add_profile_sp
@profile_name = 'Admin Profile',
@description = 'Mail Profile For Alerts' ;
EXECUTE msdb.dbo.sysmail_add_account_sp
EXECUTE msdb.dbo.sysmail_add_profileaccount_sp
@profile_name = 'Admin Profile',
@account_name = 'Admin Account',
@sequence_number = 1 ;
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'Admin Profile',
@recipients = '<Your DBA e-mail Account>',
@body = 'Server Mail Configuration Completed',
@subject = 'Successful Mail Test';
END
ELSE
BEGIN
PRINT 'For SQL Server 2000, you will need to configure a MAPI client'
PRINT 'such as Outlook and create a profile to use for SQL Mail and SQL Agent'
PRINT 'mail. Instructions can be found at:______________________________'
END
USE [master]
GO
/*
Run Script To Create Stored Procedures
In _DBAMain
*/
sp_configure 'xp_cmdshell', 1
Reconfigure
/*
Usage:
spxCreateIDXMaintenanceJob
'Owner Name'
, 'Operator'
, 'Sunday'
, 0
*/
Create Procedure
[dbo].[spxCreateIDXMaintenanceJob]
(
@JobOwner nvarchar(75)
, @ValidOperator nvarchar(50)
, @DayToReindex nvarchar(8)
, @NightlyStartTime int --230000 (11pm), 0 (12am), 120000 (12pm)
)
As
BEGIN TRANSACTION
DECLARE
@ReturnCode INT
, @jobId BINARY(16)
, @MyServer nvarchar(75)
, @SQL nvarchar(4000)
, @CR nvarchar(2)
SELECT
@ReturnCode = 0
, @CR = char(13) + char(10)
IF NOT EXISTS (
SELECT
name
FROM
msdb.dbo.syscategories
WHERE
name = N'Database Maintenance'
AND
category_class = 1
)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category
@class = N'JOB'
, @type = N'LOCAL'
, @name = N'Database Maintenance'
IF
@@ERROR <> 0
OR
@ReturnCode <> 0
Begin
GOTO QuitWithRollback
End
END
IF EXISTS (
SELECT
name
FROM
msdb.dbo.sysjobs
WHERE
name = N'IDX Maintenance'
AND
category_id = (
Select
category_id
From
msdb.dbo.syscategories
Where
name = 'Database Maintenance'
)
)
Begin
Exec msdb.dbo.sp_delete_job
@job_name = 'IDX Maintenance'
End
IF
@@ERROR <> 0
OR
@ReturnCode <> 0
Begin
GOTO QuitWithRollback
End
IF
@@ERROR <> 0
OR
@ReturnCode <> 0
Begin
GOTO QuitWithRollback
End
IF
@@ERROR <> 0
OR
@ReturnCode <> 0
Begin
GOTO QuitWithRollback
End
IF
@@ERROR <> 0
OR
@ReturnCode <> 0
Begin
GOTO QuitWithRollback
End
IF
@@ERROR <> 0
OR
@ReturnCode <> 0
Begin
GOTO QuitWithRollback
End
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF @@TRANCOUNT > 0
Begin
ROLLBACK TRANSACTION
End
EndSave:
GO
EXEC _dbaMain..spxCreateIDXMaintenanceJob
'sa'
, 'sqlsupport'
, 'Sunday'
, 0
SET QUOTED_IDENTIFIER ON
GO
+ 'Server Name: ' + ISNULL(@serverName, 'No Server Given') + CHAR(13)
+ 'Login Name: ' + ISNULL(@loginName, 'No LOGIN Given') + CHAR(13)
+ 'User Name: ' + ISNULL(@username, 'No User Name Given') + CHAR(13)
+ 'DB Name: ' + ISNULL(@databaseName, 'No Database Given') + CHAR(13)
+ 'Object Name: ' + ISNULL(@objectName, 'No Object Given') + CHAR(13)
+ 'Object Type: ' + ISNULL(@objectType, 'No Type Given') + CHAR(13)
+ '-------------------------------------------';
GO
sp_configure 'xp_cmdshell', 0
reconfigure
-- End Script
PRINT 'All Done...Add Server to DBA Repository for further documentation'
Using the above script you will, in about 3 seconds, have configured many options
that might have taken 30 minutes to do manually. Without such a script it is very
easy to miss an important configuration such as setting the model database to
"simple" recovery mode.
This script is a mere sampling of what you can control and automate, prior to
releasing the server into the wild. As we proceed through the rest of the book, I
will demonstrate many more scripts that can be used to make your life easier,
freeing up more of your time to write or extend your own scripts and then give
them back to me so I can use them. Ha!
Bon Appétit
Your server installation is now complete and is stepping out into the real world,
to be eaten alive by various applications, but it is by no means out of your
hands. No, now you have the task of protecting it. Every day. The first step
toward that goal is to make sure you monitor, maintain and document the server
during the course of its life.
Documenting will be the focus of Chapter 2, where I will introduce you to the
DBA Repository, a tool that incorporates the combined reporting and data
migration strengths of SQL Server Reporting Services and SQL Server Integration
Services. It is within the DBA Repository that you will truly come to know your
servers.
CHAPTER 2: THE SQL SERVER LANDSCAPE
I started my DBA career working for a small software development company,
where I had a handful of SQL Servers to administer. As in many small companies,
the term "DBA" was interpreted rather loosely. I was also, as required, a Windows
server administrator, network engineer, developer, and technical support
representative. I divided my day equally between these tasks and only actually
spent a fifth of my professional time managing the SQL infrastructure.
When I moved on to a much larger organization, I found that my first days, as a
DBA managing nearly 100 SQL Servers, were daunting, to say the least. I was
astounded by the lack of documentation! Some fragmented efforts had been made
to pull together information about the SQL infrastructure, but it was sparse. As a
result, my first week found me manually and hastily clicking through server
property windows, perusing SQL Server error logs, poring over reams of stored
procedure code, sifting through SQL Agent job failures on each server, and
generally just floundering about picking up whatever tidbits of information I
could.
I recall feeling very tired that first weekend and, to add insult to injury, also as
though I had accomplished very little. With the wonderful benefit of hindsight
that I have while writing this book, I can say that what I really needed in those
early weeks was a "documentation tool" that would have allowed me to automate
the collection of all of the essential information about every server under my
control, and have it stored in a single, central location, for reporting.
Over the course of this chapter, I'll describe how I built just such a documentation
tool. First, I'll describe the information that I felt I needed to have about each of
the servers under my control and the scripts to retrieve this information for each
server. I'll then move on to discuss the various ways of automating this data
collection process over all of your servers. Finally, I'll demonstrate how I actually
achieved it, using SSIS and a central DBA Repository database.
NOTE
The material in this chapter describing how to build a DBA Repository using
SSIS and SSRS is adapted with permission from my article "Use SSRS and
SSIS to Create a DBA Repository," which originally appeared in SQL Server
Magazine, February 2008, copyright Penton Media, Inc.
Server information
I needed to retrieve a number of useful pieces of server information for each of
my servers, such as:
• The server name
• The physical location of the server
• The SQL Server version, level and edition
• Security mode – Either Windows (Integrated) or Mixed mode
• SQL Server collation.
Listing 2.1 shows the script I developed to return this information (at least most
of it) for a given server.
SELECT CONVERT(CHAR(100), SERVERPROPERTY('Servername'))
AS Server,
CONVERT(CHAR(100), SERVERPROPERTY('ProductVersion'))
AS ProductVersion,
CONVERT(CHAR(100), SERVERPROPERTY('ProductLevel'))
AS ProductLevel,
CONVERT(CHAR(100),
SERVERPROPERTY('ResourceLastUpdateDateTime'))
AS ResourceLastUpdateDateTime,
CONVERT(CHAR(100), SERVERPROPERTY('ResourceVersion'))
AS ResourceVersion,
CASE WHEN SERVERPROPERTY('IsIntegratedSecurityOnly') = 1
THEN 'Integrated security'
WHEN SERVERPROPERTY('IsIntegratedSecurityOnly') = 0
THEN 'Not Integrated security'
END AS IsIntegratedSecurityOnly,
CASE WHEN SERVERPROPERTY('EngineEdition') = 1
THEN 'Personal Edition'
WHEN SERVERPROPERTY('EngineEdition') = 2
THEN 'Standard Edition'
WHEN SERVERPROPERTY('EngineEdition') = 3
THEN 'Enterprise Edition'
WHEN SERVERPROPERTY('EngineEdition') = 4
THEN 'Express Edition'
END AS EngineEdition,
CONVERT(CHAR(100), SERVERPROPERTY('InstanceName'))
AS InstanceName,
CONVERT(CHAR(100),
SERVERPROPERTY('ComputerNamePhysicalNetBIOS'))
AS ComputerNamePhysicalNetBIOS,
CONVERT(CHAR(100), SERVERPROPERTY('LicenseType'))
AS LicenseType,
CONVERT(CHAR(100), SERVERPROPERTY('NumLicenses'))
AS NumLicenses,
CONVERT(CHAR(100), SERVERPROPERTY('BuildClrVersion'))
AS BuildClrVersion,
CONVERT(CHAR(100), SERVERPROPERTY('Collation'))
AS Collation,
CONVERT(CHAR(100), SERVERPROPERTY('CollationID'))
AS CollationID,
CONVERT(CHAR(100), SERVERPROPERTY('ComparisonStyle'))
AS ComparisonStyle,
CASE WHEN CONVERT(CHAR(100),
SERVERPROPERTY('EditionID')) = -1253826760
THEN 'Desktop Edition'
WHEN SERVERPROPERTY('EditionID') = -1592396055
THEN 'Express Edition'
WHEN SERVERPROPERTY('EditionID') = -1534726760
THEN 'Standard Edition'
WHEN SERVERPROPERTY('EditionID') = 1333529388
THEN 'Workgroup Edition'
WHEN SERVERPROPERTY('EditionID') = 1804890536
THEN 'Enterprise Edition'
WHEN SERVERPROPERTY('EditionID') = -323382091
THEN 'Personal Edition'
WHEN SERVERPROPERTY('EditionID') = -2117995310
THEN 'Developer Edition'
WHEN SERVERPROPERTY('EditionID') = 610778273
THEN 'Enterprise Evaluation Edition'
WHEN SERVERPROPERTY('EditionID') = 1044790755
THEN 'Windows Embedded SQL'
As you can see, it's a pretty simple script that makes liberal use of the
SERVERPROPERTY function to return the required data.
NOTE
All of the various properties of the SERVERPROPERTY function can be
found in Books Online or MSDN; see
http://msdn.microsoft.com/en-us/library/ms174396.aspx.
If you were to run this query against one of your SQL Servers, you'd see results
similar to those shown in Figure 2.1, all of which will be useful in your daily
reporting of your infrastructure.
One piece of information that this script does not return is the location of the
server. There is no way to glean the location information from a query. Some
things, at present, still have to be manually gathered.
Database management
It is obviously important that DBAs know what databases are on each of their
servers. While DBAs may not be intimately familiar with every database schema
on every SQL Server, it is essential that they are aware of the existence of every
database, and at least understand the basic characteristics of each, such as what
server they are on, what size they are and where on disk they are located.
You can also gather the information you need to monitor the growth of the data
and log files for each database, or answer questions such as "where are the system
database files located?" This question brings up the interesting topic of
implementing standards across all of your servers. Are the data files for each
server stored on the correct, predetermined data drive? The log files on the correct
log drive? Are naming conventions consistently enforced? Is each database using
the correct default recovery model (e.g. SIMPLE) unless specified otherwise?
You may find that the answer is, generally, "no". It is an unfortunate reality that,
often, a DBA will inherit an infrastructure whereby a hodge-podge of different
standards have been set and only erratically imposed by a variety of former DBAs.
However, once you've got all of this data stored in a central repository, for every
server, you can quickly report on how well your current standards have been
enforced, and can start the job of pulling the "non-standard" ones into shape. And
then, who knows, if you can stay in the position long enough, say ten years, you
may actually get to see an infrastructure that properly adheres to all the standards
you set forth.
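Even before the repository is in place, a single-instance spot check of one such
standard, the default recovery model, takes only a moment. This sketch assumes
SQL Server 2005 or later, since it reads sys.databases:

-- Spot-check the recovery model standard on one instance
SELECT name,
       recovery_model_desc
FROM sys.databases
ORDER BY name;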
To gather this database management information, you will need to run the same
two queries on each SQL 2000, 2005 and 2008 instance. The first of these queries
is shown in Listing 2.2. It makes use of the sp_msforeachdb system stored
procedure, which issues the same query for each database on a server, and saves
you the time of writing your own cursor or set-based query to iterate through each
database. I create a temp table, HoldforEachDB and then populate that table with
the results from each database. In this way, I have one result set for all databases
on the server, instead of individual result sets for each database, which would have
otherwise been the case. Also, since I know that I will ultimately want to get this
information from SSIS, and into a central DBA repository, having the temp table
pre-defined is ideal.
IF EXISTS ( SELECT *
FROM tempdb.dbo.sysobjects
WHERE id =
OBJECT_ID(N'[tempdb].[dbo].[HoldforEachDB]') )
DROP TABLE [tempdb].[dbo].[HoldforEachDB] ;
CREATE TABLE [tempdb].[dbo].[HoldforEachDB]
(
[Server] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NULL,
[DatabaseName] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NOT NULL,
[Size] [int] NOT NULL,
[File_Status] [int] NULL,
[Name] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NOT NULL,
[Filename] [nvarchar](260) COLLATE
SQL_Latin1_General_CP1_CI_AS
NOT NULL,
[Status] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NULL,
[Updateability] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NULL,
[User_Access] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NULL,
[Recovery] [nvarchar](128) COLLATE
SQL_Latin1_General_CP1_CI_AS
NULL
)
ON [PRIMARY]
INSERT INTO [tempdb].[dbo].[HoldforEachDB]
EXEC sp_MSforeachdb 'SELECT CONVERT(char(100),
SERVERPROPERTY(''Servername'')) AS Server,
''?'' as DatabaseName,[?]..sysfiles.size,
[?]..sysfiles.status, [?]..sysfiles.name,
[?]..sysfiles.filename,convert(sysname,DatabasePropertyEx(''?''
,''Status'')) as Status,
convert(sysname,DatabasePropertyEx(''?'',''Updateability'')) as
Updateability,
convert(sysname,DatabasePropertyEx(''?'',''UserAccess'')) as
User_Access,
convert(sysname,DatabasePropertyEx(''?'',''Recovery'')) as
Recovery From [?]..sysfiles '
The second query, shown in Listing 2.3, simply selects from the HoldforEachDB
temporary table.
SELECT [Server]
,[DatabaseName]
,[Size]
,[File_Status]
,[Name]
,[Filename]
,[Status]
,[Updateability]
,[User_Access]
,[Recovery]
FROM [tempdb].[dbo].[HoldforEachDB]
The output of this query can be seen in Figure 2.2, which displays the server
name, as well as the database name, size, filename and recovery model.
Database backups
Having backup information is critical for the DBA, especially when working with
a large infrastructure. Knowing where the full, differential or log backups are
located is more than helpful; it is essential. This type of information can easily be
gathered directly from the MSDB database, which has not changed substantially
from SQL Server 2000 to 2008. Listing 2.4 shows the driving query for gathering
from MSDB the vital database backup information that you need for each server,
including information such as backup start date, end date and size. Notice, in the
WHERE clause, that this query actually retrieves 30 days' worth of history.
SELECT CONVERT(char(100), SERVERPROPERTY('Servername'))
AS Server,
msdb.dbo.backupmediafamily.logical_device_name,
msdb.dbo.backupmediafamily.physical_device_name,
msdb.dbo.backupset.expiration_date,
msdb.dbo.backupset.name,
msdb.dbo.backupset.description,
msdb.dbo.backupset.user_name,
msdb.dbo.backupset.backup_start_date,
msdb.dbo.backupset.backup_finish_date,
CASE msdb..backupset.type
WHEN 'D' THEN 'Database'
WHEN 'L' THEN 'Log'
END AS backup_type,
msdb.dbo.backupset.backup_size,
msdb.dbo.backupset.database_name,
msdb.dbo.backupset.server_name AS Source_Server
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON
msdb.dbo.backupmediafamily.media_set_id =
msdb.dbo.backupset.media_set_id
WHERE ( CONVERT(datetime,
msdb.dbo.backupset.backup_start_date, 102) >= GETDATE()
- 30 )
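The same MSDB tables can answer the more pointed question of which databases have
missed their full backups. The variation below is not part of the repository load,
just a handy single-server check, and the 7-day window is an arbitrary example:

-- Databases with no full backup in the last 7 days
SELECT d.name
FROM master.sys.databases d
     LEFT JOIN msdb.dbo.backupset b
       ON b.database_name = d.name
      AND b.type = 'D'
      AND b.backup_finish_date >= DATEADD(day, -7, GETDATE())
WHERE d.name <> 'tempdb'
GROUP BY d.name
HAVING COUNT(b.backup_set_id) = 0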
Security
For security reporting, we essentially want to know who has access to which
databases, and with which permissions. A sample query of the kind of information
that can be gathered is shown in Listing 2.5.
IF EXISTS ( SELECT *
FROM tempdb.dbo.sysobjects
WHERE id =
OBJECT_ID(N'[tempdb].[dbo].[SQL_DB_REP]') )
DROP TABLE [tempdb].[dbo].[SQL_DB_REP] ;
GO
WHERE
( usu.islogin = 1 AND
usu.isaliased = 0 AND
usu.hasdbaccess = 1) AND
(usg.issqlrole = 1 OR
usg.uid is null)'
As for the database management query, a temp table is populated again using
sp_msforeachdb. Ultimately, our SSIS package (Populate_DBA_Repository)
will read the data from this temp table and then store it in our central repository
(DBA_Rep).
A simple Select * from [tempdb].[dbo].[SQL_DB_REP], the output of
which is shown in Figure 2.4, delivers a lot of information about security, some of
which may be surprising. You might be interested to know, for example, that
"MyDomain\BadUser" has DBO access to several user databases.
msdb.dbo.sysjobs.date_modified,
GETDATE() AS Package_run_date,
msdb.dbo.sysschedules.name AS Schedule_Name,
msdb.dbo.sysschedules.enabled,
msdb.dbo.sysschedules.freq_type,
msdb.dbo.sysschedules.freq_interval,
msdb.dbo.sysschedules.freq_subday_interval,
msdb.dbo.sysschedules.freq_subday_type,
msdb.dbo.sysschedules.freq_relative_interval,
msdb.dbo.sysschedules.freq_recurrence_factor,
msdb.dbo.sysschedules.active_start_date,
msdb.dbo.sysschedules.active_end_date,
msdb.dbo.sysschedules.active_start_time,
msdb.dbo.sysschedules.active_end_time,
msdb.dbo.sysschedules.date_created AS
Date_Sched_Created,
msdb.dbo.sysschedules.date_modified AS
Date_Sched_Modified,
msdb.dbo.sysschedules.version_number,
msdb.dbo.sysjobs.version_number AS Job_Version
FROM msdb.dbo.sysjobs
INNER JOIN msdb.dbo.syscategories ON
msdb.dbo.sysjobs.category_id =
msdb.dbo.syscategories.category_id
LEFT OUTER JOIN msdb.dbo.sysoperators ON
msdb.dbo.sysjobs.notify_page_operator_id =
msdb.dbo.sysoperators.id
LEFT OUTER JOIN msdb.dbo.sysjobservers ON
msdb.dbo.sysjobs.job_id = msdb.dbo.sysjobservers.job_id
LEFT OUTER JOIN msdb.dbo.sysjobschedules ON
msdb.dbo.sysjobschedules.job_id = msdb.dbo.sysjobs.job_id
LEFT OUTER JOIN msdb.dbo.sysschedules ON
msdb.dbo.sysjobschedules.schedule_id =
msdb.dbo.sysschedules.schedule_id
Figure 2.5 shows the output of the SQL Agent Job Information query.
http://www.microsoft.com/downloads/details.aspx?FamilyID=eedd10d6-75f7-4763-86de-d2347b8b5f89&displaylang=en
• PowerShell – while I am not necessarily a developer, I do aspire, occasionally,
to expand my knowledge base and find tools that will make my life easier. One of
these tools is PowerShell, which has been incorporated into SQL Server 2008 and
promoted extensively by Microsoft. While I have not used this tool to build DBA
solutions, others have, and it is worth reviewing some of their solutions.
One such solution, by Allen White, can be found at:
http://www.simple-talk.com/sql/database-administration/let-powershell-do-an-inventory-of-your-servers/
• SQL Server 2008 Data Collector – this is a new feature in SQL Server 2008 that
you may choose to use in your day-to-day DBA data gathering tasks. One such
technique, performance data gathering, is described by Brad McGehee at:
http://www.simple-talk.com/sql/learn-sql-server/sql-server-2008-performance-data-collector/.
In addition to free tools and technologies, you could certainly acquire a vendor
application that will provide some form of out-of-the-box "DBA repository".
There is no shame in that whatsoever. Many DBAs think that the only way they
will get the solution they really want is to build it themselves, often believing that
they will save their company lots of money in the process. While this attitude is
admirable, it is often misguided. For one, it is unlikely that you'll be able to create
a documenting solution that will match the capability of a vendor product, the
most obvious reason being that you are one person and your time will be limited.
If your time can be given full bore to such an endeavor, you will then have to
weigh the man-hours it will take you to build, test, deploy and maintain the
solution.
Even without huge time constraints, a vendor-supplied solution is likely to have
features that a home-grown one will never have, plus updates are regularly
released and features continually added. In general, my advice to DBAs is that it's
certainly worth researching vendor tools: if you find one that works for you, then
that's probably the best route to go down.
However, despite the apparent advantages of going down the Microsoft or vendor
tool route, that is not the way that I went and here is why: I reasoned that as a
DBA I would need to use SSIS often, either because I would be directly
responsible for data loading/transfer tasks that needed SSIS, or I would have to
work with developers who used SSIS.
In short, a considerable fringe benefit of building my own repository solution
would be that it would require me to expand my knowledge of creating all kinds of
SSIS packages and use objects I would not normally use. Also, I did not want to
be in a position where I did not know more than the developer. That is just me.
So, in fact, I set out on the project primarily to gain valuable experience in SSIS.
Fortunately for me, and many other DBAs with whom I have shared the solution,
it turned out to be quite useful. For the remainder of this chapter, I will focus on
how to create and use the DBA Repository, how to load it with all of the
previously-described database information, using SSIS, and finally how to generate
reports on the assembled data using SSRS.
On this note, it is a pity that Microsoft, at present, does not offer "Extended
Server Properties", in the same way that they provide Extended Properties for the
internal documentation of database objects. For example, I can create a table in a
database and add an extended property that explains the purpose of the table but I
cannot do the same at the server level. If there was an extended Server property
that held the location of the server, and its primary function, I could automate the
collection of this data and store it in the SQL_Servers table, which is designed to
store the bevy of Server information that we have collected about our SQL
servers, including version, edition, collation and so on.
To demonstrate how the DBA repository works, I will walk through an example
using the SQL_Servers table, and how it is populated using the SSIS package. The
other tables in the database, aside from Server_Location, are populated in the
same way.
In this simple example, we have discovered one server, with two instances of SQL
Server, MW4HD1 and MW4HD1\SRVSAT. The former is a version 10, or SQL
Server 2008, instance, and the latter a version 9, or SQL Server 2005, instance.
This is a simple scenario, but the principles would be exactly the same were the
instances installed on two separate servers on the network, or if there were
hundreds of instances distributed across 10, 50 or 200 servers on your network. In
short, this solution will work with any number of servers.
All I need to do is place these server instance details into a table, called
ServerList_SSIS, which can then be used to "seed" the
Populate_DBA_Repository SSIS package. The ServerList_SSIS table is
shown in Figure 2.9.
The Populate_DBA_Repository package will then loop through the SQL Server
instances in our ServerList_SSIS table and, one by one, connect to the
appropriate server, execute the queries, and then store the results in the DBA_Rep
database.
In SSIS you can use many types of variables, such as a String variable and a System
Object variable. In order to iterate through each server in the ServerList_SSIS
table, I had to use both. The Foreach Loop container that iterates through the list
of servers requires the ADO Object source type of variable. However, the
Expressions that dynamically control the actual connection string of the target
servers (explained shortly) require a String variable. Figure 2.10 shows the
"Populate ADO Variable with Server Names from ServerList_SSIS" task, as
well as the list of variables that I use.
Figure 2.11: Using the SQL_RS ADO variable in the Foreach Loop
container.
Connection manager
The Connection Manager object, called MultiServer, controls which servers to
connect to in order to execute the underlying SQL code, in this case the script
shown in Listing 2.1. In other words, the Multiserver Connection Manager
object is used to connect to each listed SQL Server instance, for every Foreach
Loop Container in the Populate_DBA_Rep package. You can see other
Connection Manager objects, as well as the MultiServer object, in Figure 2.13.
SQL_Servers.BuildClrVersion,
SQL_Servers.[Collation],
SQL_Servers.CollationID,
SQL_Servers.ComparisonStyle,
SQL_Servers.ProductEdition,
SQL_Servers.IsClustered,
SQL_Servers.IsFullTextInstalled,
SQL_Servers.SqlCharSet,
SQL_Servers.SqlCharSetName,
SQL_Servers.SqlSortOrderID,
SQL_Servers.SqlSortOrderName,
SQL_Servers.LocationID
from SQL_Servers
The output, as you can see in Figure 2.17, shows data for the two SQL Server
instances, MW4HD1 and MW4HD1\SRVSAT.
SSRS reporting
More powerful than ad hoc querying of DBA_Rep is designing fully fledged
Reporting Services reports that can be scheduled and subscribed to. You can also
offer a level of historical data viewing via SSRS execution snapshots.
While I have created and written about many SSRS reports that I have developed
for the DBA_Rep solution, one SSRS report, in particular, stands out for me, as I
use it daily for many tasks. It is the Job Interval Report, and it allows me to
filter SQL Agent Job information by a range of criteria, including whether or not
the job is scheduled or enabled, if it is a database backup job, and also if it failed
or succeeded on the last run. Additional details tell me how long the job ran, at
what time and even how many jobs exist on each monitored server.
Figure 2.18 shows the Job Interval Report listing all jobs that are labeled as
"backups", based on the parameter "Backup Jobs Only". This helps me narrow
down my search criteria, for server, location and job status. While a detailed
description of creating such reports is outside the scope of this chapter, I've
included several of them in the code download for the book.
Figure 2.18: SSRS Job Interval report based on DBA_Rep database query.
Summary
Whether you build your own, deploy someone else's, or purchase one from a
vendor, having a documentation solution is critical for a DBA, especially when
dealing with more than 20 servers. In this chapter, I dove headlong into how to
build just such a solution, using SSIS to gather the information into a central
DBA_Rep database, and SSRS to report on it.
CHAPTER 3: THE MIGRATORY DATA
When someone, usually a developer or project manager, enters my office and
states, "We are going to need to move data from X to Y …" there usually follows
a short inquisition, starting with the question, "Why?" Of course, I can probably
guess why, as it is such a common request. As a data store grows, it often becomes
necessary to "offload" certain processes in order to maintain performance levels.
Reporting is usually the first to go, and this can involve simply creating a mirror
image of the original source data on another server, or migrating and transforming
the data for a data warehouse. QA and Dev environments also need to be
refreshed regularly, sometimes daily.
As a DBA, I have to consider many factors before deciding how to allocate the
resources required in building a data migration solution. As Einstein may have
posited, it is mostly a matter of space and time or, more correctly, spacetime. The
other, less scientific, variable is cost. Moving data is expensive. Each copy you
need to make of a 1.5 terabyte, 500-table production database is going to double
not only the cost of storage space, but also the time required for backup and
restore, or to push the data to target systems for reporting, and so on.
In this chapter, I am going to cover the myriad ways to push, pull, or pour data
from one data store to another, assessing each in terms of space, time and cost
criteria. These data migration solutions fall into three broad categories:
• Bulk Data Transfer solutions – this includes tools such as Bulk Copy
Program (BCP) and SSIS
• Data Comparison solutions – using third party tools such as Red Gate's
SQL Data Compare, a built-in free tool such as TableDiff, or perhaps a
homegrown T-SQL script that uses the new MERGE statement in SQL
2008.
• "High Availability" solutions – using tools for building highly available
systems, such as log shipping, replication, database mirroring and
database snapshots.
I'll review some of the available tools in each category, so that you're aware of the
options when you come to choose the best fit for you, and your organization, and
I will provide sample solutions for BCP, SSIS and TableDiff. I will also cover log
shipping in some detail as, in my experience, it continues to be one of the most
cost effective data migration solutions, in terms of cost, space and time.
At this point, I have the information that I need. There are several possible
solutions, in this case, and which one you choose largely depends on cost.
Log Shipping is a solution that has served me well in my career, across the space,
time and cost boundaries. However, this solution would not allow us to add
indexes on the target system. In addition, it is not possible to log ship between
different versions of SQL Server, say SQL 2000 to 2005, and reap all of the
benefits for a reporting instance because you will be unable to leave the target
database in Standby mode and therefore cannot access the database. There are
many potential solutions to the "once-a-day refresh" requirement. Database
snapshots may be a viable option, but require Enterprise Edition and that the
snapshot resides on the same SQL Server instance as the source database.
While, on our imaginary whiteboard, we might cross off Log Shipping and
Snapshots as potential solutions, for the time being, it would be a mistake to rule
them out entirely. As I mentioned before, log shipping has served me well in
similar scenarios, and it's possible that some future criteria will drive the decision
toward such a solution. Bear in mind also that, with log shipping in place, it is
possible to use your log shipped target database instance both as a hot standby
server for disaster recovery and as a server to offload reporting processes.
However, for now let's assume that another solution, such as SSIS, BCP or
TableDiff, would be more appropriate. Over the following sections, I'll
demonstrate how to implement a solution using these tools, noting along the way
how, with slight modifications to the criteria, other data migration solutions could
easily fit the need.
This table is a heap; in other words it has no indexes. It is populated using an SSIS
job that collects connection information from each SQL Server instance defined
in the DBA Repository, and merges this data together.
NOTE
For a full article describing the process of gathering this data, please refer to:
http://www.simple-talk.com/sql/database-administration/using-ssis-to-monitor-sql-server-databases-/
I chose this table only because it provides an example of the sort of volume of
data that you might be faced with as a DBA. Over time, when executing the
scheduled job every hour for many tens of servers, the table can grow quite large.
However, as a side note, it is worth gathering as the data offers many insights into
how your servers are being utilized.
TIP
To view data from a sample data file that might otherwise be too large to open
in Notepad, I use tail.exe to view the last n lines of the file. Tail.exe is
available in the Windows 2003 Resource Kit.
data from source to target can accommodate, and on the speed of your network
link.
The two bulk transfer tools that we'll consider here are:
• Bulk Copy Program (BCP) – This tool has been around for nearly as
long as SQL Server itself. DBAs have a hard time giving it up. It is a
command line tool and, if speed of data loading is your main criterion, it is
still hard to beat. There are several caveats to its use, though, which I will
cover.
• SQL Server Integration Services (SSIS) – I have found that SSIS is
one of the best choices for moving data, especially in terms of cost, and
in situations where near real-time data integration is not a requirement,
such as you may achieve with native replication or Change Data Capture
technologies. Transforming data is also a chore that SSIS handles very
well, which is perfect for data warehousing. I will show how to use SSIS
to load data from a source to destination, and watch the data as it flows
through the process.
Whether you choose to use BCP or SSIS will depend on the exact nature of the
request. Typically, I will choose BCP if I receive a one-time request to move or
copy a single large table, with millions of records. BCP can output data based on a
custom query, so it is also good for dumping data to fulfill one-off requests for
reports, or for downstream analysis.
SSIS adds a level of complexity to such ad-hoc requests, because DBAs are then
forced to "design" a solution graphically. In addition, many old school DBAs
simply prefer the command line comfort of BCP. I am not sure how many old
school DBAs remain, but as long as Microsoft continues to distribute BCP.exe, I
will continue to use it and write about it, for its simple and fast interface.
SSIS has come a long way from its forebear, Data Transformation Services (DTS)
and, in comparison to BCP, can be a bit daunting for the uninitiated DBA.
However, I turn to it often when requested to provide data migration solutions,
especially when I know there may be data transformations or aggregations to
perform, before loading the data into a data warehouse environment. SSIS
packages are easy to deploy and schedule, and Microsoft continues to add
functionality to the SSIS design environment making it easy for developers to
control the flow of processing data at many points. Like BCP, SSIS packages
provide a way to import and export data from flat files, but with SSIS you are not
limited to flat files. Essentially any ODBC or OLEDB connection becomes a data
source. Bulk data loads are also supported; they are referred to as "fast Load" in
SSIS vernacular.
Over the coming section, I'll present some sample solutions using each of these
tools. First, however, we need to discuss briefly the concept of minimally logged
transactions.
The dangers of rampant log file growth can be mitigated to some extent by
committing bulk update, inserts or delete transactions in batches, say every
100,000 records. In BCP, for example, you can control the batch size using the
batch size flag. This is a good practice regardless of recovery model, as it means
that the committed transaction can be removed from the log file, either via a log
backup or a checkpoint truncate.
The model in normal use for a given database will depend largely on your
organization's SLAs (Service Level Agreements) on data availability. If point-in-
time recovery is not a requirement, then I would recommend using the Simple
recovery model, in most cases. Your bulk operations will be minimally logged, and
you can perform Full and Differential backups as required to meet the SLA.
However, if recovering to a point in time is important, then your databases will
need to be in Full recovery mode. In this case, I'd recommend switching to Bulk
logged mode for bulk operations, performing a full backup after bulk loading the
data and then subsequently switching back to Full recovery and continuing log
backups from that point.
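That switch-and-backup sequence is only a few statements. A minimal T-SQL sketch, using the DBA_Rep database as the example and an assumed backup path, might be:

ALTER DATABASE DBA_Rep SET RECOVERY BULK_LOGGED;
-- ... run the bulk load here (BCP, SSIS fast load, etc.) ...
BACKUP DATABASE DBA_Rep TO DISK = 'C:\Backups\DBA_Rep_PostLoad.bak';
ALTER DATABASE DBA_Rep SET RECOVERY FULL;
-- transaction log backups then continue on their normal schedule
BACKUP LOG DBA_Rep TO DISK = 'C:\Backups\DBA_Rep_PostLoad.trn';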
NOTE
I cover many tips and tricks for monitoring file growth in Chapter 4, on
managing space.
BCP.EXE
BCP has been a favorite of command line-driven DBAs ever since it was
introduced in SQL Server 6.5. It has retained its popularity in spite of the
introduction of smarter, prettier new tools with flashy graphical interfaces and the
seeming ability to make data move just by giving it a frightening glare. I have used
BCP for many tasks, either ad hoc, one-off requests or daily scheduled loads. Of
course, other tools and technologies such as SSIS and log shipping shine in their
own right and make our lives easier, but there is something romantic about
BCP.exe and it cannot be overlooked when choosing a data movement solution
for your organization.
Basic BCP
Let's see how to use BCP to migrate data from our SQL_Conn table in the
DBA_Rep database. We'll dump the 58K rows that currently exist in my copy of the
table to a text file, and then use a script to repeatedly load data from the file back
into the same SQL_Conn table, until we have 1 million rows.
Knowing that the table SQL_Conn is a heap, meaning that there are currently no indexes defined for it, I can rest easy that the transactions should be minimally logged, as long as the database is set to the Bulk logged or Simple recovery model.
With BCP, just like with SSIS dataflow, data is either going in or coming out.
Listing 3.2 shows the BCP output statement, to copy all of the data rows from the
SQL_Conn table on a local SQL Server, the default if not specified, into a text file.
After the bcp command, we define the source table, in this case
dba_rep..SQL_Conn. Next, we specify out, telling BCP to output the contents of
the table to a file, in this case, "C:\Writing\Simple Talk Book\Ch3\Out1.txt".
Finally, the -T tells BCP to use a trusted connection and -n instructs BCP to use
native output as opposed to character format, the latter being the default.
Native output is recommended for transferring data from one SQL Server
instance to another, as it uses the native data types of a database. If you are using
identical tables, when transferring data from one server to another or from one
table to another, then the native option avoids unnecessary conversion from one
character format to another.
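Pieced together from that description, the statement looks something like this:

bcp dba_rep..SQL_Conn out "C:\Writing\Simple Talk Book\Ch3\Out1.txt" -n -T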
Figure 3.1 shows a BCP command line execution of this statement, dumping all
58,040 records out of the SQL_Conn table.
According to Figure 3.1, BCP dumped 17 thousand records per second in a total
of 3344 milliseconds, or roughly 3 seconds. I would say, from first glance, that this
is fast. The only way to know is to add more data to this table and see how the
times change. Remember that at this point, we are just performing a straight
"dump" of the table and the speed of this operation won't be affected by the lack
of indexes on the source table. However, will this lack of indexes affect the speed
when a defined query is used to determine the output? As with any process, it is
fairly easy to test, as you will see.
Let's keep in mind that we are timing how fast we can dump data out of this
sample table, which in the real world may contain banking, healthcare or other
types of business critical data. 58 thousand is actually a miniscule number of
records in the real world, where millions of records is the norm. So let's simulate a
million records so that we may understand how this solution scales in terms of
time and space. I roughly equate 1 million records to 1 Gigabyte of space on disk,
so as you are dumping large amounts of data, it is important to consider how
much space is required for the flat file and if the file will be created locally or on a
network share. The latter, of course, will increase the amount of time for both
dumping and loading data.
You will see that the main difference between this BCP statement and the
previous one is that instead of out I am specifying in as the clause, meaning that
we are loading data from the text file back in to the SQL_Conn table, which
currently holds 58K rows.
The -h TABLOCK hint forces a lock on the receiving table. This is one of the
requirements to guarantee minimally logged transactions. The -b option tells BCP to commit the load in batches of n rows, in this case every 50,000 rows. If there are any issues during the BCP load process, the rollback only goes back as far as the last committed batch. So, say I wanted to load 100,000 records, and I batched the BCP load every 20,000 records. If there were an issue while loading record 81,002, I would know that 80,000 records were successfully imported. I would lose only the 1,002 rows loaded since the last batch committed at the 80,000-record mark.
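Put together, an import command along those lines would look something like the following sketch (the file path simply mirrors the one used for the earlier output; adjust to suit):

bcp dba_rep..SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -b 50000 -h "TABLOCK"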
The batch file takes one parameter, which is the number of times to run the BCP
command in order to load the required number of rows into the table. How did I
choose 20 iterations? Simple math: 20 * 58,040 = 1,160,800 records.
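A minimal sketch of such a batch file, assuming the import command above is used verbatim and saved as, say, load_sql_conn.bat (a hypothetical name), would be:

@echo off
rem usage: load_sql_conn.bat <iterations>
for /L %%i in (1,1,%1) do (
    echo Iteration %%i
    bcp dba_rep..SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -b 50000 -h "TABLOCK"
)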
As you can see in Figure 3.2, this is exactly the number of rows that is now in the
SQL_Conn table, after 20 iterations of the BCP command, using the 58,040 records
in the f1_out.txt file as the source.
Figure 3.2: Query to count SQL_Conn after loading over 1 million records.
NOTE
For what it is worth, I have also used this batch file to load a Terabyte worth of
data to test how we could effectively manage such a large data store.
If you re-run the BCP command in Listing 3.2, to dump the table out to a file, you will find that the process takes a little over a minute for the million-plus rows, as opposed to the previous 3 seconds for 58K rows, so the throughput remains good (at 58,040 / 3 = roughly 19,300 records per second, a minute's work is about 1.16 million records). I am still seeing nearly 20,000 records per second despite the increase in data, attesting to the efficiency of the old tried and true BCP.
There are many duplicate rows in the SQL_Conn table, and no indexes defined, so I would expect a filtered query against it to take many seconds, possibly half a minute, to execute. The BCP command, this time using queryout with a date-range query, is shown in Listing 3.5.
bcp "Select * from dba_rep..SQL_Conn
where run_date > '10/01/2008'"
queryout
"C:\Writing\Simple Talk Book\Ch3\bcp_query_dba_rep.txt" -n -T
Listing 3.5: BCP output statement limiting rows to a specific date range,
using the queryout option.
As you can see in Figure 3.3, this supposedly inefficient query ran through more
than a million records and dumped out 64,488 of them to a file in 28 seconds,
averaging over 2,250 records per second.
Of course, at this point I could fine tune the query, or make recommendations for
re-architecting the source table to add indexes if necessary, before moving this
type of process into production. However, I am satisfied with the results and can
move safely on to the space age of data migration in SSIS.
SSIS
We saw an example of an SSIS package in the previous chapter, when discussing
the DBA Repository. The repository is loaded with data from several source
servers, via a series of data flow objects in an SSIS package (Populate_DBA_Rep).
Let's dig a little deeper into an SSIS data flow task. Again, we'll use the SQL_Conn
table, which we loaded with 1 million rows of data in the previous section, as the
source and use SSIS to selectively move data to an archive table; a process that
happens frequently in the real world.
Figure 3.4 shows the data flow task, "SQL Connections Archive", which will copy
the data from the source SQL_Conn table to the target archive table,
SQL_Conn_Archive, in the same DBA_Rep database. There is only a single
connection manager object. This is a quite simple example of using SSIS to
migrate data, but it is an easy solution to build on.
Inside the SQL Connections Archive data flow, there are two data flow objects, an
OLE DB Source and OLE DB Destination, as shown in Figure 3.5.
Figure 3.7 shows the editor for the OLE DB Destination object, where
we define the target table, SQL_Conn_Archive, to which the rows will be copied.
There are a few other properties of the destination object that are worth noting. I
have chosen to use the Fast Load option for the data access mode, and I have
enabled the Table Lock option, which as you might recall from the BCP section,
is required to ensure minimally logged transactions.
Although I did not use it in this example, there is also the Rows per batch option
that will batch the load process so that any failures can be rolled back to the
previous commit, rather than rolling back the entire load.
It is worth noting that there are other Fast Load options that you cannot see
here. In SSIS, these options are presented to you only in the Properties window
for the destination object. Additional fast load properties include:
• FIRE_TRIGGERS, which forces any triggers on the destination table to
fire. By default, fast or bulk loading bypasses triggers.
• ORDER, which speeds performance when working with tables with
clustered indexes so that the data being loaded is pre-sorted to match the
physical order of the clustered index.
Figure 3.8: Additional options for Fast Loading data within SSIS
destination objects.
It is almost time to execute this simple data migration package. First, however, I
would like to add a data viewer to the process. A data viewer lets you, the package
designer, view the data as it flows through from the source to the destination
object. To add a data viewer, simply right-click on the green data flow path and
select "Data Viewer". This will bring up the "Configure Data Viewer" screen, as
shown in Figure 3.9. The data viewer can take several forms: Grid, Histogram,
Scatter Plot and Column Chart. In this example, I chose Grid.
When we execute the package, the attached data viewer displays the flow of data.
You can detach the data viewer to allow the records to flow through without
interaction. The data viewer is useful while developing a package to ensure that
the data you are expecting to see is indeed there. Of course, you will want to
remove them before deploying the package to production, via a scheduled job.
Figure 3.10 shows the data viewer, as well as the completed package, as the 64,488
records are migrated.
If this was indeed an archive process, the final step would be to delete the data
from the source table. I will not cover this step except to say that it too can be
automated in the SSIS package with a simple DELETE statement, matching the
criteria we used for the source query when migrating the data.
I am always careful when deleting data from a table, not because I am fearful of
removing the wrong data (good backup practices and transactions are safety
measures) but because I am mindful of how it might affect my server. For
example, how will the log growth be affected by deleting potentially millions of
records at a time? Should I batch the delete process? Will there be enough space
for log growth when accounting for each individual delete? How long will it take?
These are all questions the answers to which have, over the years, taught me to
tread carefully when handling the delete process.
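One way to address several of those concerns at once is to batch the delete, so that each chunk commits, and its log space can be reused, before the next begins. A minimal sketch, reusing the date criterion from the earlier query (the batch size is arbitrary):

WHILE 1 = 1
BEGIN
    DELETE TOP (50000)
    FROM DBA_Rep.dbo.SQL_Conn
    WHERE run_date > '10/01/2008';

    IF @@ROWCOUNT = 0
        BREAK;
END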
With data comparison, you are migrating a much smaller subset of transactions,
for example those that have occurred over the last day, or even hour for that
matter. This is similar in nature to log shipping in the sense that only new
transactions are migrated, but with the added benefit of maintaining much more
control over the target data. For example, once the data is migrated from source
to target, via a data comparison tool, you can add indexes to the target that did not
exist on the source. This is not possible with log shipping, as I will discuss shortly.
Several tools come to mind immediately, for performing this data comparison and
"merge" process:
• Native Change Data Capture (SQL Server 2008 only) – this new
technology allows you to capture data changes and push them to a target
in near-real time. I have been anxiously awaiting such a technology in
SQL Server but I would have to say that I have not found CDC to be
fully realized in SQL Server 2008, and I don't cover it in this book. Don't
get me wrong, it is there and it works but, much akin to table partitions
and plan guides, it is a bit daunting and not very intuitive.
• T-SQL Scripts – Pre-SQL Server 2008, many DBAs developed their
own ways of merging data from one source to another, using T-SQL
constructs such as EXCEPT and EXISTS. Essentially, such code tests for
the existence of data in the receiving table and acts accordingly (see the
sketch after this list). This was not difficult code to produce; it was just
time consuming.
• TableDiff – this is another tool that has been around for many years. It
was designed to help compare replication sets for native SQL Server
replication but it is also a handy tool for comparing and synchronizing
data.
• Third party Data Comparison tools – there are several available on the
market, but I am most familiar with Red Gate's SQL Data Compare.
Where tablediff.exe is excellent for comparing one table to another, SQL
Data Compare allows you to compare entire databases, or subsets of
objects, and data therein. The process can be scripted and automated to
ensure data is synchronized between data sources. It is particularly
valuable for synching your production environment to your test and dev
environments, or for reporting.
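To illustrate the home-grown T-SQL approach from the list above, a minimal sketch using EXCEPT might look like the following; the table and column names are placeholders, and it assumes both tables share the same column list:

INSERT INTO dbo.Target_Table (KeyCol, Col1, Col2)
SELECT s.KeyCol, s.Col1, s.Col2
FROM dbo.Source_Table AS s
EXCEPT
SELECT t.KeyCol, t.Col1, t.Col2
FROM dbo.Target_Table AS t;
-- the same existence test can be written with WHERE NOT EXISTS against the target's key column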
I will cover TableDiff.exe in this chapter. While I do not cover SQL Data
Compare here, I would highly recommend trying it out if you do a lot of data
migration and synchronization:
http://www.red-gate.com/products/SQL_Data_Compare/index.htm.
However, before I present the sample solution, using TableDiff, we need to
discuss briefly the concept of uniqueness.
So, now I am going to add the clustered index and an identity column (ID) to both
tables. Figure 3.13 shows the 5 fields from the SQL_Conn table that I used to
guarantee uniqueness. I am heartened by the fact that, as this table fills up over
time, the clustered index will also benefit me when running interrogative queries
for reports of server activity.
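A sketch of those two steps, with placeholder column names standing in for the five columns shown in Figure 3.13, might be:

ALTER TABLE dbo.SQL_Conn ADD ID INT IDENTITY;

CREATE UNIQUE CLUSTERED INDEX CIX_SQL_Conn
ON dbo.SQL_Conn (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE);
-- ColumnA through ColumnE are placeholders for the five columns in Figure 3.13;
-- repeat both steps for dbo.SQL_Conn_Archive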
With the newly built index and identity column on both source (SQL_Conn) and
target (SQL_Conn_Archive), it is time to introduce Tablediff.exe, which we will
use to keep the two tables in sync.
Tablediff.exe
Tablediff.exe is a little known and free tool that comes with SQL Server, and has
done so for many releases. It was designed to assist with comparing replication
sets for native SQL Server replication. However, even if you are not using SQL
replication, it can be put to good use in comparing and synchronizing source and
target tables.
It compares one table at a time and displays the differences between the two
tables. Further, it can generate scripts that will sync the two tables, if there are
differences. For SQL Server 2005, Tablediff.exe can be found in the C:\Program
Files\Microsoft SQL Server\90\COM\ folder. It has options that allow you to
define the source and destination servers, as well the required databases and tables.
Listing 3.6 shows the command line execution that will compare the freshly
loaded SQL_Conn source table and the destination SQL_Conn_Archive. The
options -q and -t 200, respectively, tell tablediff to do a simple record count
rather than a full blown compare, and to timeout after 200 seconds. You also have
the ability to lock the source and target tables, as I am doing here with
-sourcelocked and -destinationlocked.
Figure 3.14 shows the 32 records that are different in SQL_Conn and
SQL_Conn_Archive tables, by doing a simple row count. It took less than a tenth
of a second to deliver the results.
Without the -q option, you will get a list of differences, by ID column, as shown
in Figure 3.15. This comparison took 0.15 seconds for the 32 records.
The best thing about Tablediff.exe is that it will generate a script that will bring the
two tables in sync. That option is -f which takes a filename and path, as shown in
Listing 3.7.
"C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe" -
sourceserver MW4HD1 -sourcedatabase DBA_Rep -sourcetable
SQL_Conn -sourcelocked -destinationlocked -destinationserver
MW4HD1 -destinationdatabase DBA_Rep -destinationtable
SQL_Conn_Archive -q -t 200 -f C:\Output\SQL_Conn.sql
Running this tablediff command, with the -f option, generates a file containing all
of the T-SQL statements to make the two tables identical. Figure 3.16 shows the
SQL_Conn.sql file that the command created.
Executing the script against the SQL_Conn_Archive table and then re-running the
tablediff.exe will show that the two tables are now identical.
the data in them needs to be regularly resynched with the source. The added
benefit of using a tool such as log shipping is that you can segregate reporting
processes from transactional processes, which can offer performance gains in
some scenarios.
It is when you start using these techniques that the issues of cost, space and time
really come to the fore and it is important to understand what problems each will
solve, compared to their cost. Native replication and database mirroring, while
certainly valid solutions, come with a higher price tag if you want to cross breed a
high availability solution with a reporting solution.
While I may choose replication or mirroring as options at some stage, so far I
have found that, in my career as a stolid, bang-for-the-buck DBA and manager,
log shipping has come out ahead of the three solutions nearly 100% of the time,
when considering cost, space and time. Therefore, it is the solution I will focus on
here.
SQL Backup. However, of course, one then needs to add the cost of this tool to
the overall cost of the solution.
However, what if there were 2 G worth of log data, and the target server was
reached via a 3MB WAN connection? What if the request was for more than one
target? What could have taken 15 minutes, on first analysis, is now taking 45
minutes or more, and pushing past the start of business. DBAs constantly find
themselves making accommodations based on unexpected changes to the original
requests. Proper communication and expectations need to be set upfront so that
planning follows through to execution as seamlessly as possible, with
contingencies in place in case of failure.
Don't forget also that if the target database ever gets out of synchronization with
the source logs for whatever reason (it happens), then the entire database needs to
be restored from a full backup to reinstate the log shipping. If the full database is
over 200G and you have a slow WAN link, then this could be a big issue. Those
45 minutes just became hours. No one likes seeing a critical, tier 1 application
down for hours.
Finally, there will be a need to store this redundant data. As you add servers to the
mix, the amount of space required grows proportionately. Soon, the 200G
database, growing at a rate of 2G per day, becomes a space management
nightmare. It is always best to perform upfront capacity planning, and overestimate your needs. It is not always easy to add disk space on the fly without bringing down a server, which is especially true of servers with local disk arrays rather than SAN-attached storage. If you have SAN storage, the process is more straightforward, but
comes with issues as well. Also, consider if the disk subsystem is using slower
SATA drives (often used for QA and Dev environments) or faster SCSI drives,
which are more expensive per Gig, to the tune of thousands of dollars.
Having specified the backup location, it is time to add the information for the
secondary database, where the source transaction logs will be applied. A nice
feature, when setting up log shipping for the first time, is the ability to let
Management Studio ready the secondary database for you, by performing the
initial backup and restore, as seen in Figure 3.19. The secondary database, non-
creatively enough, is DBA_Rep1.
Most often, you will want to transfer the log files to a different server from the
source. By selecting the "Copy Files" tab in Figure 3.19, you can specify where the
copied transaction log backup files will reside for the subsequent restores to the
target database. However, in our simple example, both the backup and restore will
reside on the same server, as shown in Figure 3.20.
Figure 3.20: Setting up Copy Files options for Transaction Log Shipping.
The next and final tab, "Restore Transaction Log", is very important. This is
where you set the database state to either "No Recovery" or "Standby" mode. You
will want to use Standby mode if you are planning on using the target database as
a reporting database, while still allowing subsequent logs to be applied. The other
important option is "Disconnect users in the database when restoring backups",
seen in Figure 3.21. Without this important option, the log restore would fail
because the database would be in use.
Once all of the backup and restore options are set, you can choose whether or not
you want to use the log shipping monitoring service. Essentially, this is an alerting
mechanism in case there are any issues with the log shipping process. I do not
typically set up the monitoring service, though it may be useful in your
environment. Once you are happy with the backup and restore options, select
OK, and everything else will be done for you, including backing up the source database and restoring it to create the secondary, and setting up all the SQL Agent jobs to back up, copy and restore the transaction logs on an automated schedule. Figure 3.22
shows the completion of these steps.
With log shipping set up and configured for Standby mode, you have conquered
two very important DBA tasks:
• Separating reporting activity from the source transactional data, to reduce
the risk of contention with online processes on the production server.
• Assuring a secondary copy of the source data in case there is a disaster.
As I mentioned earlier, however, there are downsides to log shipping, such as the
difficulty in creating indexes on the target and assigning specific permissions to
users (both hard to do when the database is read only).
There is one final trick I will leave you with for log shipping and security. You can create a login and matching database user on the source, so that the user exists in the database, and then delete the login on the source but not the database user. Next, create the login on the target system, preferably a Windows account, which will always map the login to the database user. If it is a SQL-authenticated account you are trying to align on the target, you will need to ensure that the account Security IDs (SIDs) are the same.
This is where you will want to use the ultra-handy sp_help_revlogin stored
procedure (http://support.microsoft.com/kb/246133). Because user permission
assignment is a logged transaction in the source database, it will move with the
next log restore and the user and login on the target system will align. Thus, you
have no access on the source and the access you desire on the target.
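A sketch of that SID alignment, with an assumed login name and placeholder values:

-- on the target server, read the orphaned user's SID from the standby database
SELECT name, sid
FROM DBA_Rep1.sys.database_principals
WHERE name = 'app_user';    -- 'app_user' is a placeholder

-- recreate the login on the target with that same SID
CREATE LOGIN app_user
WITH PASSWORD = 'StrongPasswordHere',
SID = 0x912EE7754A3CFE41B2F8E2C8093BE111;   -- placeholder; use the value returned above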
Summary
In this chapter, we covered several tools that will facilitate the migration of data
from a source to a target, or multiple targets. Data is moved for several reasons,
the main ones being either for Disaster Recovery, High Availability, or to offload
reporting from the source to increase performance of an application. There are as
many reasons to move data as there are ways and means. Fortunately, you and I,
as DBAs, can make informed decisions that will ultimately equate to cost savings
for the companies we work for. Speaking of saving money, the next chapter is
devoted to storing all of this migratory data. It is sometimes challenging to
capacity plan for new projects, and even more challenging, as your SQL
infrastructure grows, to force adherence to standards that would mitigate many
storage issues. I will show you how I try to do this daily, in the next compartment
of our SQL Server tacklebox.
CHAPTER 4: MANAGING DATA
GROWTH
When I look back over my career as a SQL Server DBA, analyzing the kinds of
issues that I have had to resolve, usually under pressure, nothing brings me out in
a colder sweat than the runaway data, log or TempDB file. I would estimate that
for every time I've had to deal with an emergency restore, including point in time
restores using transaction log backups, I've probably had to deal with a hundred
disk capacity issues. Overall, I would estimate that such issues account for around
80% of the problems that a DBA team faces on a weekly basis.
Occasionally, the cause of these space issues is just poor capacity planning. In
other words, the growth in file size was entirely predictable, but someone failed to
plan for it. Predictable growth patterns are something that should be analyzed
right at the start, preferably before SQL Server is even installed. In my experience,
though, these space issues are often caused by bugs, or failure to adhere to best
practices.
In this chapter, I'll delve into the most common causes of space management
issues, covering model database configuration, inefficient bulk modifications,
indexes and TempDB abuse, and how to fix them. I will finish the chapter by
describing a query that you should store securely in your SQL Server
tacklebox, SizeQuery. I use this query on more or less a daily basis to monitor
and track space utilization on my SQL Server instances. Used in conjunction with
the DBA repository to query multiple SQL Servers, it has proved to be an
invaluable reporting tool.
I have given a name to the time in the morning at which a DBA typically staggers
in to work, bleary eyed, having spent most of the previous night shrinking log files
and scouring disks for every precious Gigabyte of data, in order to find enough
space to clear an alert. That name is DBA:M (pronounced D-BAM), and it's
usually around 9.30AM. My main goal with this chapter is to help fellow DBAs
avoid that DBA:M feeling.
In SQL Server storage terms, 1024K is 128 pages; pages are stored in 8K blocks.
For applications that are going to potentially load millions of records, growing the
data file of a database every 128 pages incurs a large performance hit, given that
one of the major bottlenecks of SQL Server is I/O requests.
Figure 4.1: Initial sizes and growth characteristics for the model database
data and log files.
Rather than accept these defaults, it is a much better practice to size the data file
appropriately at the outset, at say 2G. The same advice applies for the log file.
Generally, growth based on a percentage is fine until the file reaches a threshold
where the next growth will consume the entire disk. Let's say you had a 40G log
file on a 50G drive. It would only take two 10% growths to fill the disk, and then
the alerts go out and you must awake, bleary-eyed, to shrink log files and curse the
Model database.
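A sketch of that kind of pre-sizing, assuming a database and logical file names of MyDatabase and MyDatabase_log (both are assumptions, not the book's examples):

ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase, SIZE = 2GB, FILEGROWTH = 512MB);

ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_log, SIZE = 1GB, FILEGROWTH = 512MB);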
Coupled with the previously-described file growth characteristics, our databases
will also inherit from the default model database a recovery model of Full.
Transactions in the log file for a Full recovery database are only ever removed
from the log upon a transaction log backup. This is wonderful for providing point
in time recovery for business critical applications that require Service Level
Agreements (SLAs), but it does mean that if you do not backup the transaction
log, you run the risk of eventually filling up your log drive.
If you have a database that is subject to hefty and/or regular (e.g. daily) bulk
insert operations, and you are forcing the data file to be incremented in size
regularly, by small amounts, then it's likely that the performance hit will be
significant. It is also likely that the size of your log file will increase rapidly, unless
you are performing regular transaction log backups.
To find out how significant an impact this can have, let's take a look at an
example. I'll create a database called All_Books_Ever_Read, based on a default
model database, and then load several million rows of data into a table in that
database, while monitoring file growth and disk I/O activity, using Profiler and
PerfMon, respectively. Loading this amount of data may sound like an extreme
case, but it's actually "small fry" compared to many enterprise companies that
accumulate, dispense and disperse Terabytes of data.
NOTE
I just happen to own a file, Books-List.txt, that allegedly contains a listing of all
books ever read by everyone on the planet Earth, which I'll use to fill the table.
Surprisingly the file is only 33 MB. People are just not reading much any more.
The first step is to create the All_Books_Ever_Read database. The initial sizes of
the data and log files, and their growth characteristics, will be inherited from the
Model database, as described in Figure 4.1. Once I've created the database, I can
verify the initial data (mdf) and log file (ldf) sizes are around 3 and 2 MB
respectively, as shown in Figure 4.2.
Figure 4.2: Data and log file sizes prior to data load.
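Creating the database with no explicit file specification is enough to inherit model's defaults, and sp_helpfile confirms the starting sizes shown above; a minimal sketch:

CREATE DATABASE All_Books_Ever_Read;
GO
USE All_Books_Ever_Read;
EXEC sp_helpfile;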
The next step is to back up the database. It's important to realize that, until I have
performed a full database backup, the log file will not act like a typical log file in a
database set to Full recovery mode. In fact, when there is no full backup of the
database, it is not even possible to perform a transaction log backup at this point,
as demonstrated in Figure 4.3.
Until the first full backup of the database is performed, this database is acting as if
it is in Simple recovery mode and the transaction log will get regularly truncated at
checkpoints, so you will not see the full impact of the data load on the size of the
log file.
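The backup itself is a single statement; the backup path below is an assumption:

BACKUP DATABASE All_Books_Ever_Read
TO DISK = 'C:\Backups\All_Books_Ever_Read.bak';

-- only after that first full backup will a log backup succeed
BACKUP LOG All_Books_Ever_Read
TO DISK = 'C:\Backups\All_Books_Ever_Read.trn';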
With the database backed up, I need to set up Profiler and PerfMon so that I can
monitor the data load. To monitor auto growth behavior using Profiler, simply
start it up, connect to the SQL Server 2008 instance that holds the
All_Books_Ever_Read database, and then set up a trace to monitor Data and
Log file Auto Grow events, as shown in Figure 4.4.
Figure 4.4: Setting SQL Server Profiler to capture data and log file growth.
With all monitoring systems a go, I am ready to load up a heap table called
book_list that I created in the All_Books_Ever_Read database. The Books-
List.txt file has approximately 58 thousand records, so I'm going to use the BCP
batch file technique (see Listing 3.3, in Chapter 3) to iterate through the file 50
times, and load 2.9 million records into the database. Now it is time to begin the
load. A quick peek at Perfmon, see Figure 4.6, shows the current absence of
activity prior to executing a hefty query.
Executing Load … now! Please don't turn (or create) the next page …!!
Sorry! I could not resist the Sesame Street reference to The Monster at the End of
This Book. In fact, the load proceeds with little fanfare. Imagine this is being done
in the middle of the afternoon, perhaps after a big lunch or, worse, early in the
AM (DBA:M most likely) before your second sip of coffee, with you blissfully
unaware of what's unfolding on one of your servers. Figure 4.7 shows the BCP
bulk insert process running.
You can see that the batch process ran 50 times at an average of 2.5 seconds a run,
with a total load time of roughly 2 minutes. Not bad for 2.9 million records. Now
for the bad news: Figure 4.8 shows how much growth can be directly attributed to
the load process.
Figure 4.8: Log file growth loading millions of records into table.
NOTE
For comparison, in a test I ran without ever having backed up the database,
the data file grew to over 3 GB, but the log file grew only to 150 MB.
Both the data file and the log file have grown to over 3GB. The Profiler trace, as
shown in Figure 4.9, reveals that a combined total of 3291 Auto Grow events took
place during this data load. Notice also that the duration of these events, when
combined, is not negligible.
Figure 4.9: Data and log file growth captured with Profiler.
Finally, Figure 4.10 shows the Perfmon output during load. As you can see, %
Disk Time obviously took a hit at 44.192 %. This is not horrible in and of itself;
obviously I/O processes require disk reads and writes and, because "Avg Disk
Queue Length" is healthily under 3, it means the disk is able to keep up with the
demands. However, if the disk being monitored has a %DiskTime of 80%, or
more, coupled with a higher (>20) Avg Disk Queue Length, then there will be
performance degradation because the disk cannot meet the demand. Inefficient
queries or file growth may be the culprits.
Average and Current Disk Queue Lengths are indicators of whether or not
bottlenecks might exist in the disk subsystem. In this case, an Average Disk Queue
Length of 1.768 is not intolerably high and indicates that, on average, fewer than 2
requests were queued, waiting for I/O processes, either read or write, to complete
on the Disk.
What this also tells me is that loading 2.9 million records into a heap table,
batching or committing every 50,000 records, and using the defaults of the Model
database, is going to cause significant I/O lag, resulting not just from loading the
data, but also from the need to grow the data and log files a few thousand times.
Furthermore, with so much activity, the database is susceptible to unabated log file
growth, unless you perform regular log backups to remove inactive log entries
from the log file. Many standard maintenance procedures implement full backups
for newly created databases, but not all databases receive transaction log backups.
This could come back to bite you, like the monster at the end of this chapter, if you
forget to change the recovery model from Full to Simple, or if you restore a
database from another system and unwittingly leave the database in Full recovery
mode.
As you can see, the Book_List table is using all 3.3 GB of the space allocated to
the database for the 2.9 million records. Now simply issue the TRUNCATE
command.
Truncate Table Book_List
And then rerun sp_spaceused. The results are shown in Figure 4.12.
You can verify that the data file, although now "empty", is still 3.3GB in size using
the Shrink File task in the SSMS GUI. Right click on the database, and select
"Tasks |Shrink | Files". You can see in Figure 4.13 that the
All_Books_Ever_Read.mdf file is still 3.3 GB in size but has 99% available free
space.
What this means to me as a DBA, knowing I am going to load the same 2.9
million records, is that I do not expect that the data file will grow again. Figure
4.14 shows the command window after re-running the BCP bulk insert process,
superimposed on the resulting Profiler trace.
Figure 4.13: Free space in data file after truncate table statement.
This time there were no Auto Grow events for the data file, and only 20 for the
log file. The net effect is that the average time to load 50,000 records is reduced
from 2.5 seconds to 1.3 seconds. A time saving of just over 1 second per load may
not seem significant at first, but consider the case where the same process
normally takes an hour. Just by ensuring log and data growth was controlled, you
have cut the process down to under 30 minutes, and saved a lot of I/O processing
at the same time.
changing the recovery models while bulk loading data. For instance, you
will be unable to perform a point-in-time recovery for the bulk
transactions.
If you fail to plan properly, or are simply subject to unexpected and unpredictable
file growth, what does this mean for the DBA?
Suppose a database has been inadvertently set to Full recovery with no log
backups. The log file has grown massively in size and, ultimately, the drive will run
out of space. If you are lucky enough, as I am, to have an alerting system (see
Chapter 6), the problem will be caught before that happens and I will get an alert,
predictably at 2:30 AM when I have just gone to bed after resolving a different
issue.
What I do in such situations, after cursing myself or other innocent people on my
team for not catching this sooner, is to issue the following simple statement:
BACKUP LOG <databasename> WITH Truncate_Only
This statement has the net effect of removing all of the inactive transactions from
the log file that would have otherwise been removed with a standard log backup.
Next, I shrink the log file via the GUI (or, if I am not too tired, with code) and
then change the recovery model to Simple and go back to bed. Doing this will
generally reclaim the necessary disk space to clear all alerts, and ensure that no
further log growth will ensue. You can use DBCC to physically shrink a data or
log file, as follows:
DBCC SHRINKFILE (filename, target_size)
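For example, to shrink the sample database's log file back down to 512 MB (the logical file name here is an assumption based on the default naming convention):

DBCC SHRINKFILE (All_Books_Ever_Read_log, 512);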
Many of the situations that require you to shrink a log file can be avoided simply
by planning accordingly and being diligent and fastidious in your installation
process (see Chapter 1), in particular by making sure the model database is always
set to Simple and not Full recovery mode. It only needs to happen to you once or
twice. I quote George W. Bush, "Fool me once … shame on … shame on you …
Fool me can't get fooled again."
Take that, SQL Server Model Database.
obliged to point out the specifics of why queries will and will not benefit from the
indexes that the developers suggest.
Often, these index recommendations come from sources like the Database Tuning
Advisor (DTA), so we DBAs often eschew them in favor of our own. I do not
mean to seem high-minded on this point, my DBA nose pointed straight up in the
air. However, rightly or wrongly, DBAs want to control the types of objects
(triggers, temp tables, linked servers, and so on) that are added to their servers,
and indexes are just another type of object that DBAs must understand, manage
and maintain.
I am all in favor of a clustered index on almost every table, backed by a healthy
volume of covering non-clustered indexes, but I also know from experience that
indexes, for all their good, will only be utilized when proper code is executed that
will take advantage of them. It is always worthwhile to explain to SQL developers
why their queries do not perform as they expect, with their proposed indexes.
In this section, I am going to add indexes to the Book_List table in order to find
out:
• How much extra space is required in order to add a clustered index to a
table containing 2.9 million rows.
• Whether this space consumption is justified, by examining the proposed
queries that intend to take advantage of the indexes.
Let's first get a "before" glimpse of space utilization in our Book_List table, using
the sp_spaceused stored procedure, as shown in Figure 4.15. Notice the 8K of
index size.
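For reference, the call is simply:

EXEC sp_spaceused 'Book_List';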
Before I can add a clustered index, I need to add an identity column, called
Read_ID, on which to place the clustered index. Adding the identity column is, in
itself, an expensive task for 2.9 million records. The code is as follows:
ALTER TABLE Book_list ADD
Read_ID INT IDENTITY
We can now create the clustered index on this Read_ID column, as shown in
Listing 4.1.
USE [All_Books_Ever_Read]
GO
CREATE UNIQUE CLUSTERED INDEX [Read_ID] ON [dbo].[Book_List] (
[Read_ID] ASC )
WITH (
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
As you can see from Figure 4.16, building a clustered index on almost 3 million
records takes some time and processing power.
Also, it should be noted that users will be unable to connect to the Book_List
table for the duration of the index build. Essentially, SQL Server has to physically
order those millions of records to align with the definition of the clustered index.
Let's see what the index took out of my hide by way of space. The former index
space for this table was 8K and data space was over 3 Gig. What does
sp_spaceused tell me now? See Figure 4.17.
Figure 4.17: Building the clustered index has increased the index_size to
5376KB.
An increase in index_size to 5376K does not seem too significant. When you
create a clustered index, the database engine takes the data in the heap (table) and
physically sorts it. In the simplest terms, a heap and a clustered table (a table with a clustered index) both store the actual data; in the clustered table it is simply kept in physical order. So,
I would not expect that adding a clustered index for the Read_ID column to cause
much growth in index_size.
However, while the data size and index size for the Book_List table did not grow
significantly, the space allocated for the database did double, as you can see from
Figure 4.18.
Figure 4.18: Creating the clustered index caused the data file to double in
size.
So not only did the index addition take the table offline for the duration of the
build, 12 minutes, it also doubled the space on disk. The reason for the growth is
that SQL Server had to do all manner of processing to reorganize the data from a
heap to a clustered table and additional space, almost double, was required to
accommodate this migration from a heap table to a clustered table. Notice,
though, that after the process has completed there is nearly 50% free space in the
expanded file.
The question remains, did I benefit from adding this index, and do I need to add
any covering non-clustered indexes? First, let's consider the simple query shown in
Listing 4.2. It returns data based on a specified range of Read_ID values (I know I
have a range of data between 1 and 2902000 records).
Select book_list.Read_ID,
book_list.Read_Date,
book_list.Book,
book_list.Person
from book_list
where Read_Id between 756000 and 820000
This query returned 64,001 records in 2 seconds which, at first glance, appears to
be the sort of performance I'd expect. However, to confirm this, I need to
examine the execution plan, as shown in Figure 4.19.
Figure 4.19: Beneficial use of clustered index for the Book_list table.
You can see that an Index Seek operation was used, which indicates that this index
has indeed served our query well. It means that the engine was able to retrieve all
of the required data based solely on the key values stored in the index. If, instead,
I had seen an Index Scan, this would indicate that the engine decided to scan every
single row of the index in order to retrieve the ones required. An Index Scan is
similar in concept to a table scan and both are generally inefficient, especially
when dealing with such large record sets. However, the query engine will
sometimes choose to do a scan even if a usable index is in place if, for example, a
high percentage of the rows need to be returned. This is often an indicator of an
inefficient WHERE clause.
Let's say I now want to query a field that is not included in the clustered index,
such as the Read_Date. I would like to know how many books were read on July
24th of 2008. The query would look something like that shown in Listing 4.3.
Select count(book_list.Read_ID),
book_list.Read_Date
from book_list
where book_list.Read_Date between '07/24/2008 00:00:00'
and '07/24/2008 23:59:59'
Group By book_list.Read_Date
Executing this query, and waiting for the results to return, is a bit like watching
paint dry or, something I like to do frequently, watching a hard drive defragment.
It took 1 minute and 28 seconds to complete, and returned 123 records, with an
average count of the number of books read on 7/24/2008 of 1000.
The execution plan for this query, not surprisingly, shows that an index scan was
utilized, as you can see in Figure 4.20.
What was a bit surprising, though, is that the memory allocation for SQL Server
shot up through the roof as this query was executed. Figure 4.21 shows the
memory consumption at 2.51G which is pretty drastic considering the system only
has 2G of RAM.
The reason for the memory increase is that, since there was no available index to
limit the data for the query, SQL Server had to load several million records into
the buffer cache in order to give me back the 123 rows I needed. Unless you have
enabled AWE, and set max server memory to 2G (say) less than total server
memory (see memory configurations for SQL Server in Chapter 1), then the
server is going to begin paging, as SQL Server grabs more than its fair share of
memory, and thrashing disks. This will have a substantial impact on performance.
If there is one thing that I know for sure with regard to SQL Server configuration
and management, it is that once SQL Server has acquired memory, it does not like
to give it back to the OS unless prodded to do so. Even though the query I ran
completed many minutes ago, my SQL Server instance still hovers at 2.5G of
memory used, most of it by SQL Server.
It's clear that I need to create indexes that will cover the queries I need to run, and
so avoid SQL Server doing such an expensive index scan. I know that this is not
always possible in a production environment, with many teams of developers all
writing their own queries in their own style, but in my isolated environment it is an
attainable goal.
The first thing I need to do is restart SQL Server to get back down to a
manageable level of memory utilization. While there are other methods to reduce
the memory footprint, such as freeing the buffer cache (DBCC
DROPCLEANBUFFERS), I have the luxury of an isolated environment and restarting
SQL Server will give me a "clean start" for troubleshooting. Having done this, I
can add two non-clustered indexes, one which will cover queries on the Book field
and the other the Read_Date field.
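A minimal sketch of two such indexes (the index names are placeholders of my own) would be:

CREATE NONCLUSTERED INDEX IX_Book_List_Book
ON dbo.Book_List (Book);

CREATE NONCLUSTERED INDEX IX_Book_List_Read_Date
ON dbo.Book_List (Read_Date);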
Having created the two new indexes, let's take another look at space utilization in
the Book_List table, using sp_spaceused, as shown in Figure 4.22.
The index_size has risen from 5MB to 119MB, which seems fairly minimal, and
an excellent trade-off assuming we get the expected boost in the performance of
the read_date query.
If you are a DBA, working alongside developers who give you their queries for
analysis, this is where you hold your breath. Breath held, I click execute. And …
the query went from 1 minute 28 seconds to 2 seconds without even a baby's burp
in SQL Server memory. The new execution plan, shown in Figure 4.23, tells the
full story.
So, while indexes do indeed take space, this space utilization is usually more than
warranted when they are used correctly, and we see the desired pay-off in query
performance.
The issue with indexes arises when development teams adopt a scattergun
approach to indexes, sometimes to the point of redundancy and harm to the
database. Adding indexes arbitrarily can often do as much harm as good, not only
because of the space that they take up, but because each index will need to be
maintained, which takes time and resources.
TempDB
No DBA who has been working with SQL Server for long will have been immune
to runaway TempDB growth. If this growth is left unchecked, it can eventually fill
up a drive and prohibit any further activity in SQL Server that also requires the use
of the TempDB database.
SQL Server uses the TempDB database for a number of processes, such as sorting
operations, creating indexes, cursors, table variables, database mail and user
defined functions, to name several. In addition to internal processes, users have
the ability to create temporary tables and have free rein to fill these tables with as
much data as they wish, assuming that growth of the TempDB data file is not
restricted to a specific value, which by default it is not.
I do not recommend restricting growth for TempDB files, but I do recommend
that you be aware of what will happen if TempDB does fill up. Many SQL Server
processes, including user processes, will cease and an error message will be
thrown, as I will show.
The TempDB database is created each time SQL Server is restarted. It is never
backed up nor can it be. It is always in Simple mode and the recovery model
cannot be changed.
There are a couple of TempDB "properties", though, that you can and should
change when configuring your server:
• Its location
• Its autogrowth rate
By default, TempDB is created in the default data folder, which is set during SQL
installation. It is highly recommended that, if possible, this location be changed so
that TempDB resides on its own disk. Many DBAs also create multiple TempDB
files, typically one per processor, with the aim of boosting performance still
further. However, be warned that you will need to spread the load of these
multiple files across multiple disks, in order to achieve this.
Like all other databases, TempDB adopts the default configuration of the model
database, which means that it will grow in 10% increments with unrestricted
growth, unless you specify otherwise. In my opinion, having an autogrowth of
10% on TempDB is a bad idea because when rogue queries hit your server, calling
for temporary tables, as they will do eventually, you do not want the TempDB
database filling up the drive. Let's assume that you have a 30G TempDB database
sitting on a 50G drive and autogrowing in 10% (i.e. 3G) increments. It would take
only 6 growth events to fill the drive. Ideally, you will want to set a fixed growth
rate of 3G for TempDB and use multiple TempDB data files across multiple
disks.
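A sketch of those TempDB changes follows; the drive letters and sizes are illustrative, tempdev and templog are the default logical file names, and a file move only takes effect after SQL Server is restarted:

ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf', FILEGROWTH = 3GB);

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf', FILEGROWTH = 1GB);

-- optionally add a second data file on another disk
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'U:\TempDB\tempdb2.ndf', SIZE = 10GB, FILEGROWTH = 3GB);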
When loading multiple tens of millions of records into TempDB, bearing in mind
that 1 million records is roughly equivalent to 1G, you can see how this can
happen fairly easily. So, what happens when TempDB fills up? Let's find out!
I'd have to generate a lot of TempDB activity to fill up 50GB of disk, so I am
going to artificially restrict the data file for TempDB to a size of 200 MB, via the
"maximum file size" property. Figure 4.24 shows the configuration.
Figure 4.24: Changing the TempDB maximum file size to 200 MB for the
simulation.
Now that I've set the maximum file size for TempDB, it is time to fill it up and for
that I will turn to our old friend, the endless loop. I have seen only a few of these
in the wild but they do exist, I promise, and when you combine an endless loop
with data or log space limitation, something has to give. Listing 4.4 shows the
loopy code.
CREATE TABLE #HoldAll
(
Read_ID INT,
Read_Date DATETIME,
Person VARCHAR(100)
)
GO
DECLARE @cnt int = 1
WHILE @cnt = 1
BEGIN
    -- loop body (assumed reconstruction): keep pumping rows into the temp table
    INSERT INTO #HoldAll
    SELECT Read_ID, Read_Date, Person
    FROM All_Books_Ever_Read.dbo.Book_List
END
Notice that @cnt is given the value of 1, but nowhere subsequently is the value
changed, so this query will run and run until it fills up a drive or surpasses a file
size threshold, whichever comes sooner. In this example, the query runs for 3
minutes before we hit the 200MB file size limit, as shown in Figure 4.25, and get
an error that the filegroup is full.
At this point the query fails, obviously, as will any other queries that need to use
TempDB. SQL Server is still functioning properly, but as long as the temp table
#HoldAll exists, TempDB will stay filled.
Hopefully, you've got notifications and alerts set up to warn you of the imminent
danger, before the file actually fills up (I will cover notifications, alerts and
monitoring in depth in Chapter 6). In any event, you are likely to experience that
DBA:M feeling, having spent half the night trying to track down the problem
query and resolve the issue.
Your three options, as a DBA, are to:
• Restart SQL Server.
• Try to shrink the TempDB database.
• Find the errant query and eradicate it.
Generally speaking, restarting is not always an option in a production system.
Shrinking TempDB is a valid option, assuming that it can be shrunk. Sometimes,
when there are open transactions, it is not possible. Therefore, finding and killing
the offending query is the more likely course of action. The techniques you can
use to do this are the focus of the very next chapter, on Troubleshooting.
For now, I am going to simply close the query window which should force the
temp table to be deleted and so allow the shrink operation to go ahead. Sure
enough, once I'd closed the connection I was able to select Tasks | Shrink | Database from within SSMS, and so shrink TempDB from 200 MB back down
to its original size of 8K. Problem solved.
Now, back to bed with a sleepy note to self to find the developer who wrote this
code, and chastise him or her. Wait, I am the DBA who let this get into
production in the first place, so new list … chastise self, get back to sleep, find the
developer tomorrow and chastise him or her anyway; if they ask how it got into
production … change subject.
)
ON [PRIMARY]
Select @@Servername
print '' ;
Select rtrim(Cast(DatabaseName as varchar(75))) as
DatabaseName,
Drive,
Filename,
Cast(Size as int) AS Size,
Cast(MBFree as varchar(10)) as MB_Free
from #HoldforEachDB_size
INNER JOIN #fixed_drives ON
LEFT(#HoldforEachDB_size.Filename, 1) = #fixed_drives.Drive
GROUP BY DatabaseName,
Drive,
MBFree,
Filename,
Cast(Size as int)
ORDER BY Drive,
Size Desc
print '' ;
Select Drive as [Total Data Space Used |],
Cast(Sum(Size) as varchar(10)) as [Total Size],
Cast(MBFree as varchar(10)) as MB_Free
from #HoldforEachDB_size
You can see that the All_Books_Ever_Read database has 6.4G of allocated space
on the C: drive. Since my sample databases reside only on the C: drive, all
allocation is for this drive. However, if I were to have my log files on E: and
TempDB on F:, for example, then query output would show the breakdown for
each drive that actually stores any database file. You can see that there is 61G free on the C: drive, and that 11G of the drive is taken up by database files.
Summary
In this chapter, I have explored some of the scenarios where disk space is
consumed by processes, in many cases because of incorrect configurations for
recovery models, data growth for large objects and queries that overtax TempDB
resources. Many of these scenarios can be avoided with proper planning.
However, it can be expected that, at some point, there will arise a situation that
requires the DBA team to jump in and rescue the SQL Server.
When this happens, and it happens quite frequently, DBAs need to have an
arsenal of troubleshooting tools at their disposal. In the next chapter I am going to
introduce some of the tools and techniques that I have used to quickly
troubleshoot common problems that crop up for DBAs.
Hey, there was no monster at the end of this chapter after all. Surely it will be in
the next chapter.
CHAPTER 5: DBA AS DETECTIVE
If you consider it fun to find and fix SQL Server problems then I can say without
fear of contradiction that this chapter is going to come at you in a clown suit.
I always feel better at the end of the day if I've been able to isolate a problem and
offer a fix. Being a SQL Server DBA, overseeing terabytes of critical business data,
can be both highly stressful and highly rewarding. Frightening? Yes, like a horror
movie with suspect code lurking in every shadow. Fulfilling? Absolutely, when you
discover that you are only one temp table or sub-query away from being the day's
hero or heroine.
This chapter is all about sleuthing in SQL Server, peeling back layer after layer of
data until you've uncovered the bare metal of the problem. It can be both fun and
painstaking. Words like "Deadlock" and "Victim" are common, so we must tread
with care through this twilight world. And, if worse comes to worse, we may have
to "Kill" something. These murderous tendencies in a DBA make many, mainly
developers, fearful to approach us. They creep up to our cubicle and tempt us
with their feigned courtesy; "Can you please kill me?" they ask expectantly.
"Absolutely" is our reply.
Using sp_who2
The first troubleshooting tool in every DBA's tackle box is the tried-and-true
stored procedure, sp_who2. Granted there is Activity Monitor, which is also quite
handy, but I have found that there are two things wrong with Activity Monitor.
Firstly, when the server is heavily burdened with locks or temporary tables,
Activity Monitor often cannot be launched, and you generally receive an error
message to this effect. Secondly, Activity Monitor for SQL Server 2008 is radically
different in layout from its SQL Server 2005 predecessor.
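Running sp_who2 itself could not be simpler; called with no parameters it returns
one row per session, and the BlkBy column shows which SPID, if any, is blocking
each one:

EXEC sp_who2;
-- Optionally, pass a single SPID to narrow the output, e.g. EXEC sp_who2 51;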
I can tell at first glance that SPID 55 is blocked by SPID 51, and that SPID 54 is
blocked by 55. I can also see that the database context of the blocking SPID is the
DBA_Rep database, which ironically and for argument's sake is the same database
that the fictitious A.R.G application uses.
With sp_who2, I have discovered a blocking process and it has been blocking for
quite some time now. Users are getting frantic, and soon this will escalate and
there will be three or four people at my cubicle, who otherwise would not give the
SQL Server infrastructure a second glance, now laser-focused on my every action,
and fully expecting me to solve the problem quickly.
In order to do so, I am going to have to find fast answers to the following
questions:
• Who is running the query and from where?
• What is the query doing?
• Can I kill the offending query?
• If I kill the query, will it rollback successfully and will this free up the
blocked processes?
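To answer the second question, what the query is doing, DBCC INPUTBUFFER can be
pointed at the blocking SPID to return the last statement that session submitted:

DBCC INPUTBUFFER (51);   -- 51 is the blocking SPID reported by sp_who2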
As you can see, the output lacks formatting when returned in a grid format. I could
expand the EventInfo field to get a better look at the query, but it would still lack
proper formatting. Returning the results to text, which is simply a matter of
clicking the "Results to Text" button on the Management Studio toolbar, usually
delivers better results, as shown in Figure 5.3.
Clearly, someone has been tasked with filling the Important_Data table (shown
in Listing 5.1 for those who want to work through the example) with values and
will do whatever it takes to get the job done!
Let's take a look at this "Bad Query" in all its ugly glory, as shown in Listing 5.2.
BEGIN Tran T_Time
END
Exec xp_cmdshell 'C:\Windows\notepad.exe'
If I saw this query on a real system, my concern would begin to build at around
line 15, and by line 24 I think I would be a bit red-faced. At line 29, where I see
the query call xp_cmdshell and execute Notepad.exe, I would need a warm
blankie and soft floor where I would lie in a fetal position for a few hours thinking
about happy things.
Of course, at this stage I should make it clear that this query is an exercise in the
ridiculous; it is one that I specifically designed to cause locking and blocking so
that I could demonstrate how to resolve similar issues on your servers. The "bad
query" is not the work of a reasonable person but that does not mean that
something similar will never occur on one of your servers (although it would
probably never occur twice). Wrapped in a transaction called T_Time, it inserts
one row at a time, 1,000 times, into the Important_Data table, based on random
patterns for T_Desc and T_Back. It does this insert every 1 second. While doing
so, it explicitly locks out the Important_Data table using a table hint (XLock) so
that no other query can access the Important_Data table until it is complete,
which will not be until 1020 seconds, or 17 minutes, passes.
Finally, we have the heinous call to xp_cmdshell. Again, one would think that no
one would really do this in the real world. Unfortunately, I know for a fact that
some developers make liberal use of xp_cmdshell. Sometimes, it is the path of
least resistance to kicking off another process that will return a value to the calling
query. But what if, at some point, the expected value is not returned and a
dialogue box appears instead, awaiting user input? Suffice it to say that it would be
very bad, but I am getting ahead of myself. All we need to know right now is that,
for the sake of our example, this query is "happening" and I do not have a few
hours or soft floor, and the warm blankie was wrenched from my grasp by my
boss who is standing over me. So, it is best to just proceed ahead to resolution.
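Resolution, in this case, means issuing the KILL command against the blocking
SPID:

KILL 51;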
And that is it, right? If you issue this command in SSMS, you will receive the usual
reassuring "success" message, as shown in Figure 5.4.
However, that message can be misleading. In some cases, the SPID will indeed be
killed but there may be a significant time lag while the offending statement is
being rolled back. An option of the KILL command exists that I was not aware of
at one point in my career, and that is WITH STATUSONLY. After killing a SPID you
can issue the KILL command again with this option and get a status of how long
SQL Server estimates that a ROLLBACK will take. If you have been rebuilding an
index for 10 minutes, for example, and kill that process, you can see the "%
completion of rollback" counting up to 100%.
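Checking that progress is simply a matter of re-issuing the command with the
extra option:

KILL 51 WITH STATUSONLY;   -- reports estimated rollback completion and time remaining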
In other cases, it may be that, despite issuing the KILL command, the SPID will
not be dead at all and the blocking will still be evident. If you issue the KILL WITH
STATUSONLY command for the Bad Query, you will see something similar to
Figure 5.5.
As you can see, the SPID shows an estimated rollback completion of 0%,
and an estimated time remaining for rollback of 0 seconds, indicating that it is not
going to be possible to kill this SPID directly. This situation can occur for the
reason that I foreshadowed earlier: the blocking process has kicked off another
process, such as an executable, and SQL Server is waiting, indefinitely, for that
other process to complete. The only way to kill the blocking SPID is either to
restart SQL Server or find and kill the executable that SQL Server is waiting for.
In this example, I know that the Bad Query launched Notepad.exe so I have a
head start. Figure 5.6 shows the culprit in Task Manager.
Remember that Notepad is only an example; this could have been any
other process that got called from xp_cmdshell and was waiting for user input
to finish.
All I should have to do is end the Notepad.exe process and the blocking
will be cleared and the resources freed. Notice that the user name for Notepad.exe
is SYSTEM. When SQL Server issued the command to the OS, via xp_cmdshell,
Notepad was launched as a System process, not as a user process.
Right-clicking Notepad.exe and selecting "End Process" finishes off the Notepad
executable, allowing SPID 51 to be killed, and all previously blocked processes to
move forward.
Any INSERT statements that were issued as part of the transaction, before
Notepad was executed, should be considered discarded, as Figure 5.7 shows.
Using sp_lock
Before I deliver a query that is going to automate the discovery of problem queries
(there I go foreshadowing again), I want to talk about another important
characteristic of poorly performing queries, namely their rampant use of resources.
It is very important to monitor usage of CPU and I/O resources and I will cover
those in great detail in the next chapter, on Performance Monitoring and
Notifications. However, here I want to focus on locking resources. While
sp_who2 gives you a good picture of processes that may be blocking other
processes, and some initial insight into the resource utilization via CPU and Disk
I/O, it does not give you any details about the various locks that have been
acquired in order to execute the process.
Locking is a "normal" activity in SQL Server, in that it is the mechanism by
which SQL Server mediates the concurrent access of a given resource by several
"competing" processes. However, as a DBA you will come to recognize
certain locking behavior that is an immediate tell-tale sign of something being
intrinsically wrong.
Figure 5.9 shows the output of sp_lock for SPID 51, the Bad Query.
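That output comes from nothing more than passing the SPID to the procedure:

EXEC sp_lock 51;   -- show only the locks held by SPID 51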
You can see that there are many locks acquired, mostly exclusive locks at the row
level, as indicated by the mode "X" and the type "RID". When I see one SPID
that has acquired this number of locks, especially exclusive locks, I get very
concerned that something is definitely not as it should be.
Often, a simple count of the locks and, more importantly, the types of locks for a
specific SPID, is enough to help me locate a poorly performing query, even if
there is no obvious blocking. Acquiring locks, just like acquiring connections,
requires memory resources and even shared locks, which may not block others
from accessing data, can sometimes have a major performance impact due to
memory or other resource pressures.
Listing 5.3 shows the query that will, in one fell swoop, find and report on blocked
and blocking processes and the number of locks that they are holding. First it
creates a temp table to store the output of sp_lock and then it lists all locked and
blocked processes, along with the query that each process is currently executing,
or that is waiting on resources before it can be executed.
SET NOCOUNT ON
GO
--Check if the holding table for sp_lock output already exists
IF OBJECT_ID('tempdb..#Hold_sp_lock') IS NOT NULL
    --If so, drop it
    DROP TABLE #Hold_sp_lock
GO
CREATE TABLE #Hold_sp_lock
    (
      spid INT,
      dbid INT,
      ObjId INT,
      IndId SMALLINT,
      Type VARCHAR(20),
      Resource VARCHAR(50),
      Mode VARCHAR(20),
      Status VARCHAR(20)
    )
INSERT INTO #Hold_sp_lock
       EXEC sp_lock
SELECT COUNT(spid) AS lock_count,
       SPID,
       Type,
       Cast(DB_NAME(DBID) as varchar(30)) as DBName,
       mode
FROM   #Hold_sp_lock
GROUP BY SPID,
       Type,
       DB_NAME(DBID),
       MODE
Order by lock_count desc,
       DBName,
       SPID,
       MODE
Select Distinct blocked, 'BLOCKING'
from   master..sysprocesses
where  blocked <> 0

SELECT TOP 1
       @tSPID = bSPID,
       @blkst = BLK_Status
from   #Catch_SPID
WHERE  bSPID > @tSPID
Order by bSPID
END
There is nothing overly complicated about this query. It is a base starting point
from which you can quickly analyze locking and blocking issues in SQL Server. In
the case of non-blocking locks, it will show you any query that is a potential issue
with regard to other resources such as memory or I/O.
Figure 5.10 shows the output of this query, captured while the "Bad Query"
was executing.
Notice the high lock count of 99 for SPID 51, the culprit query. The next output
section shows that, in this case, SPID 51 is indeed causing blocking, and the code
that the SPID is executing follows, as we have seen previously from DBCC
INPUTBUFFER.
In addition, the Automated Discovery Query also lists all of the blocked SPIDs
behind the main blocking SPID. Figure 5.11 shows the queries, in this case simple
select statements against the Important_Data table, which are blocked by
SPID 51.
You might decide that you would like to take this query, and make it into a stored
procedure. You can then load it into a maintenance database on each server so
that you have it always available. It also means that you can parameterize it to
control its behavior. For example, you may decide that you do not want to execute
the portion of the query that counts locks, which on a very busy system could take
quite a bit of time.
Listing 5.4 shows the code to create this stored procedure, named
usp_Find_Problems, with a flag to execute the lock count portion based on
need.
USE [DBA_Rep]
GO
/****** Object: StoredProcedure [dbo].[usp_Find_Problems]
Script Date: 06/22/2009 22:41:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
Get_Blocks:
bSPID INT,
BLK_Status CHAR(10)
)
SELECT TOP 1
@tSPID = bSPID,
@blkst = BLK_Status
FROM #Catch_SPID
WHERE bSPID > @tSPID
ORDER BY bSPID
END
END
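Once the procedure is in place, calling it is trivial; the parameter name below is
hypothetical, standing in for whatever you choose to call the lock-count flag:

EXEC dbo.usp_Find_Problems @Count_Locks = 1;   -- @Count_Locks is a hypothetical flag name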
Summary
In this chapter I demonstrated how I go about detecting SQL Server problems in
the form of excessive locking and blocking. While this is a good start for the
DBA detective, there is much more ground to cover. I mentioned CPU and I/O
in this chapter only peripherally, as it relates to problem code. In the next chapter,
I will continue on the path of analyzing performance issues, but will extend the
topic to explain how to make sure you get notified immediately of performance,
and other, issues.
After all, if you do not know about the problem, you can't fix it. I would much
rather be notified of a potential issue from a system that is monitoring such events
than from an irate application user, or from the Help Desk. Granted, you will be
hard pressed to escape emails from users entirely, and that is OK; generally, they
are understanding. It is their bosses who are not. If you can find and fix, or even just
report, an issue before anyone else, it appears that you are ahead of the game. And
you are … you are a DBA after all. Now let's delve into performance monitoring
and notifications for SQL Server before someone beats us to it.
CHAPTER 6: MONITORING AND
NOTIFICATIONS
As is probably clear by this stage, there are many potential monsters, lurking
around corners, waiting to pounce upon the unwary DBA as he goes about his
day-to-day duties. Often, however, the biggest problem is not the monster itself
but the fact that the DBA is unaware that it exists.
Imagine a problem as trivial as a SQL Agent Service that fails to start; a very easy
problem to fix once you know about it. But what if you don't know about it and
then suddenly find out that the backup process that this service was supposed to
be running has not been executed for over two weeks! The feeling at this moment
for a DBA, or DBA manager, is one of frustration and disbelief. These emotions
are quickly displaced however, perhaps after a few minutes alone with the warm
blankie and a soft floor, by an unswerving confidence. This confidence derives
from the fact that you know that positive steps will be taken to ensure that this
never happens again.
In this chapter, I will describe how I use monitoring tools and techniques to make
sure that my Blackberry will always buzz whenever a backup fails, a disk drive fills
up, or a rogue process is threatening the performance of a SQL Server.
When the inevitable happens, and the e-mail notification hits your mobile device,
probably at some awful hour of the morning, I'll show what you can do to easily
ascertain the problem and be notified, using a mix of third party tools, such as Red
Gate's SQL Response, and standard tools like Database Mail.
DBAs need to know, as soon as possible, when something goes awry, so that they can respond to the event and resolve any issues
arising from it. There are many ways that DBAs can set up such notifications,
either using native SQL Server tools, such as Database Mail or SQL Agent Mail
(two separate entities), or a third party monitoring and notification system. There
are quite a number of such applications on the market.
In my career, I have generally employed a hybrid solution of both native and
third-party monitoring, because of the benefits and limitations of each. For
example, a third-party application may not be able to retrieve the internal error
messages of a failed SQL Agent database backup job. Conversely, it would be
impractical to set up performance monitoring, say for a sustained level of high
CPU utilization, for each instance of a 200-strong SQL Server farm, when a
centralized third party solution could easily be maintained.
In this chapter, I will expound upon the types of events that would require such a
monitoring and notification system to be in place and then give examples of
solutions for each.
By the time we enter the unresponsive phase, it may be too late to glean what
nefarious query it was that caused the problem in the first place. What is needed,
in order to ensure a DBA's restful sleep, is an application that can monitor the
server, using a time-based algorithm, and fire a notification when a specific
threshold is crossed. For example, we may request a notification if CPU utilization
exceeds 85% for a set number of minutes.
There are many such third-party applications that will handle this very well;
Idera Diagnostic Manager, Argent Guardian and Microsoft Operations Manager
(MOM) are a few that come to mind. I will show how to use one such application,
Red Gate's SQL Response, to trigger these performance alerts.
Further, once notified of the performance issue at hand, I will demonstrate how to
use two indispensable tools, Performance Monitor and SQL Profiler, to quickly
and easily analyze and resolve the problem.
Service availability
It should go without saying that a stopped SQL service, such as the SQL Server
service (the database engine) or the SQL Server Agent service, is an event of
which the sleepy DBA should be notified (I make it sound like these alerts always
happen at night; they don't. It's just that the ones that do tend to stay with you).
So, how does SQL Server notify you that it is down? It doesn't. It is down, and so
cannot send a notification. This, again, would be the work of a third party
monitoring application. Such monitoring solutions should also have the ability to
take corrective action when encountering a stopped service, such as trying to
restart the service.
I'll show how to use a third party tool, such as SQL Response, to monitor these
services, and also how to configure the SQL services to have some resilience when
they stop unexpectedly.
Enabling notifications
Enabling notifications in SQL Server is a straightforward process that simply
entails setting up a mechanism by which to send the notification emails, and then
defining who should receive them.
Setting up Database Mail in SQL Server 2005 or 2008 is very straightforward. You
just need to configure:
• The default profile that will be used to send mail
• An SMTP server address
• An account from which the mail will be sent
Figure 6.1 shows the profile information from the Database Mail Configuration
Wizard, launched by double-clicking Database Mail under the Management node
in SQL Server Management Studio's Object Explorer.
NOTE
You may notice that sp_send_dbmail is now located in the MSDB database,
whereas xp_sendmail, in versions prior to SQL Server 2005, was located in the
Master database.
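Once a profile exists, a test message can be sent directly from T-SQL; the profile
name here is illustrative:

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'DBA Mail Profile',   -- illustrative profile name
    @recipients = '[email protected]',
    @subject = 'Database Mail test',
    @body = 'If you are reading this, Database Mail is working.';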
If all went well with the test, you should receive a test message similar to that
shown in Figure 6.3 (and yes, that is Outlook Express. I will not apologize).
Setting up an operator
Having configured Database Mail and SQL Server Agent, the final step is to set up
an operator, i.e. the person (or people) who will receive the messages from any
failures, whether raised internally in the code or by the SQL Agent job. This can be a
single email address but it is far better to use a distribution list, such as
[email protected], so that every member of the team receives the
notification messages.
NOTE
Of course, you should use a single account to validate that everything works, prior
to putting this server into production, so that other team members do not get
inundated with false alarms during testing.
It's important that the whole team is aware of any errors or failures that occur,
even if they are not on-call. Generally, the on call DBA will have his or her mobile
device configured in such a way that a "fail" or "error" will cause a raucous and
marriage-damaging alert to scream forth when the fated message hits it, while
other DBAs can continue to sleep soundly, having set their devices to a phone-
only profile. However, it does also mean that if the on-call DBA does not
respond, for whatever reason, someone else can.
Setting up an operator is as easy as right-clicking Operators in SSMS and selecting
"New Operator". This opens the dialogue shown in Figure 6.5. Here is where you
will set the e-mail address for the operator.
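The same operator can be created in T-SQL; the operator name below is
illustrative, while the distribution list is the one suggested above:

EXEC msdb.dbo.sp_add_operator
    @name = 'DBA_Team',                        -- illustrative operator name
    @enabled = 1,
    @email_address = '[email protected]';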
NOTE
It is important to remember to restart the SQL Agent Service after you enable
mail. Otherwise, you may receive an error stating that the attempt to send mail
failed because no email session has been established.
The failure message contains useful information, as you can see from Figure 6.6, which states that a "non-
recoverable I/O error occurred".
Figure 6.6: Error mail message from Red Gate SQL Backup.
At this point, I know that it was an I/O error that caused the failure, and I can
respond by attempting to backup the database again, and then looking deeper into
the issue.
If this error were caused by lack of disk space, as it often is, I would need to free
up enough disk space on the destination to accommodate the backup file, and
then re-run the backup. The failure message also contains key words like "error"
that I can use to trigger a higher level alert on my mobile device, associated with a
really obnoxious ring tone of the sort you will want to avoid in theatres or quiet
dinners with your loved one.
However, what if this message did not get delivered for whatever reason … who
knows, we might have changed mail servers and forgotten to update the Red Gate
mail client properties. I still need to get a notification if the backup fails. You will
have seen in Listing 6.2 that I intentionally wrapped error checking around the
backup statement. If the backup script fails, it should report this fact to the calling
process, which is generally a SQL Agent job, which can then send a notification,
via Database Mail.
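The shape of that error checking is simple enough. A minimal sketch of the idea,
using a native BACKUP DATABASE for illustration, with placeholder database name,
destination and recipient:

BEGIN TRY
    -- Illustrative database name and destination
    BACKUP DATABASE [DBA_Rep]
    TO DISK = 'C:\Backups\DBA_Rep.bak'
    WITH INIT;
END TRY
BEGIN CATCH
    DECLARE @msg NVARCHAR(2048);
    SET @msg = 'Backup of DBA_Rep failed: ' + ERROR_MESSAGE();

    -- The detailed notification, sent from inside the job step
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = '[email protected]',
        @subject = 'Backup failure',
        @body = @msg;

    -- Re-raise the error so the job step itself fails and fires its own notification
    RAISERROR (@msg, 16, 1);
END CATCH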
In order to enable this notification mechanism, we first need to create a SQL
Agent Job that will run our custom backup script. To do this, simply right-click on
"SQL Server Agent" in SSMS and select New | Job. In the "General" section,
remember to name the job, in my case "Backup Database Test Failure" and then
give the job a valid category, in this case "Database Maintenance". Use of
meaningful job categories is valuable for reporting. In the DBA Repository, for
example, I run job reports based on each category of job.
Next, in the "Steps" section, select "New" and paste in the backup code from
Listing 6.2, so that your step looks as shown in Figure 6.7.
Now, let me check my mail. Yep, everything worked as expected, and I receive,
almost instantaneously, two separate email notifications: one from the SQL Agent
job, telling me the job itself failed, and the other, more detailed, mail from the
code inside the job, as shown side-by-side in Figure 6.10.
Figure 6.10: Two separate mail notifications from two separate sources.
Performance issues
If, over a sandwich, you were to ask a DBA to describe the performance issues
that he or she has faced during the preceding year, you will, as the lunch drifts into
the late afternoon, probably start to regret not being more focused in your line of
questioning. All DBAs are faced with all manner of performance issues. If you
were to ask, "What is the worst performance issue you had in the past year," you
will get a contemplative stare to the ceiling, hand on chin, eyes scanning and
finally, "Oh yeah, there was that one time when … I found code from a website
application that was flagrantly misusing MARS (Multiple Active Result Sets)
connections and instead of having an expected 300 logical connections, there were
over 8500, each simultaneously eating into available RAM until the server slowed
to a crawl."
In other words, the question is not if DBAs are going to face performance issues
in their environment, but what types of problems they are going to encounter, and
how they can deal with them.
Alternatively, you can acquire a third-party monitoring solution that is not only
SQL-centric, but can be centrally managed. One such product is Red Gate's SQL
Response. Price is definitely a consideration when choosing a monitoring solution
and, in my opinion, SQL Response works out to be much more affordable than
rolling your own. This is not a sales pitch; it is just a demonstration of a product
that is available to DBAs and one that happens to have become part of my tackle
box for performance monitoring and diagnostics.
The main performance metrics that you will want to monitor are CPU, Memory
and I/O. As an example of a typical performance alert, I am going to configure an
alert in SQL Response to monitor CPU utilization and then run a query that
should push the CPU utilization above a set threshold. Figure 6.12 shows the SQL
Response alert configuration window.
For this demonstration, I have customized the standard "CPU utilization unusual"
alert so that it fires if CPU utilization exceeds 70% for at least 5 seconds.
default is 90% for 10 seconds, which is more in line with what you would
normally use to trigger this alert.
Of course, you won't generally know beforehand what code is causing the issue,
so we'll also need a way to identify the query behind the spike in resource
utilization, so that it can be fine-tuned to use fewer resources and so improve
performance for other contending processes.
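For the demonstration, though, any deliberately expensive query will do; an
illustrative stand-in (not the query used here) is a large cross join, which can be
cancelled once the alert fires:

SELECT COUNT_BIG(*)
FROM master.sys.objects AS o1
CROSS JOIN master.sys.objects AS o2
CROSS JOIN master.sys.objects AS o3;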
I execute the query and wait. As the query is executing, I can monitor CPU
utilization using the System Information application, which can be launched from
Sysinternals Process Explorer (http://technet.microsoft.com/en-
us/sysinternals/bb896653.aspx), as shown in Figure 6.13.
You can certainly use Task Manager instead but I just happen to like the added eye
candy of System Information and Process Explorer. Once the CPU utilization has
hit 70% for 5 seconds, the alert is generated and emailed to me, as shown in
Figure 6.14.
data" for the monitored server, as shown in Figure 6.15 (most other monitoring
solutions offer similar query gathering features).
Most developers will not take offence at your scrutiny of their code. They have a job to do and they want to
excel just like you. If you tell them there is a problem they will genuinely listen and
try to fix it. As a DBA you have the ability to help and this team work is truly what
is required. You may not like being awoken at 2:00AM by an alert for high CPU
utilization, but it is important to not let that resentment spill over to other team
members. Use the knowledge to fix the issue so that it will not happen again.
Figure 6.17: Low disk space and SQL Server Agent not running.
One of the things that I really like about SQL Response is that it alerts on the jobs
scheduled by the SQL Agent, even if the SQL Agent is not running. What this
means is that if I have a backup job scheduled to run at 10:00 PM, and the SQL
Agent that should execute it is not running, then SQL Response will not only
notify me that the SQL Agent is down, but also that the job did not run as
scheduled.
Disk space, as we have learned in previous chapters, can be compromised for a
number of reasons, including uncontrolled log growth, TempDB filling up, and so
on. As noted earlier, the DBA needs to be warned of the imminent danger while
there is still time to act, not when all space is gone. Figure 6.18 shows the
threshold setting for disk space in SQL Response.
Summary
In most of the IT world, being a DBA is not a 9 to 5 job. Many organizations
have tens or hundreds of servers that must work around the clock and so,
sometimes, must DBAs. Notifications are necessary evils of the DBA world and
you will be asked to carry mobile devices that will go off at all times of the day,
and predictably at night, minutes after you have dozed off.
Servers do not sleep and neither do their scheduled jobs. Backup failures, though
not common, do happen, and if you miss a backup because you were not notified
of the failure, then you run the risk of data loss. To the DBA, and to those who
manage DBAs on up the chain, data loss is unacceptable. You do not want to be
the one to tell your boss that a backup failed, no one responded, and someone is
now desperately waiting for you to restore from the previous night's backup.
Performance notifications are nearly as important. Time lost waiting for queries to
complete, especially queries that block other queries, is not acceptable to the
business. The business does not want to know the details of the code; they only want
it to work and work correctly. Finding the issue, as I have said, is the first step to
resolving it. With the tools and techniques outlined in this chapter, you should
be able to quickly find issues and resolve 95% of them before others are even
aware of them, which is what you ultimately desire. If you must bring up the
problem, you can safely do it after the fact, when it has been eradicated. Telling
your boss there was a problem and you were able to respond to it and resolve it is
much better than him or her asking you about a problem that you were totally
unaware of.
CHAPTER 7: SECURING ACCESS
TO SQL SERVER
Thus far in the book we have covered a lot of ground in terms of automating
processes, battling data growth, troubleshooting code and getting notification of
impending danger. Now, I want to turn to a subject that is also sure to be near and
dear to every DBA's heart, and that is security.
Securing SQL Server is a broad topic, worthy of an entire book in its own right.
However, when securing access to a SQL Server instance, most DBAs think first
of logins, users, or credentials; in other words, the mechanisms by which they
control access to their databases. Such mechanisms are certainly the first line of
defense when it comes to restricting access to the sensitive data that your
databases store and it is these "outer defenses" that are the focus of this chapter.
Of course, this aspect of security alone is a huge topic, and there is much work to
be done by the DBA, or security administrator, in creating and managing users,
logins and roles, assigning permissions, choosing authentication schemes,
implementing password policies, and so on. Here, however, I am going to assume
that this security infrastructure is in place, and instead focus on the techniques and
scripts that DBAs can use on a day-to-day basis to monitor and maintain the users
that have access to their databases, and their activity. Specifically, I'll show you
how to:
• Find out who has access to data.
• Find out when and how they accessed the data.
• Use a DDL trigger (created in Listing 1.2, in Chapter 1) to capture
activity on database objects, such as deleting a table.
• Implement a server-side trace to capture exactly what the users have been
doing on a SQL instance.
These scripts, collectively, can be rolled into our SQL Server tacklebox (otherwise
known as the DBA Repository) so that you will know at a glance what accounts or
groups have been granted access to the data on each of the servers you manage.
language,
denylogin,
hasaccess,
isntname,
isntgroup,
isntuser,
sysadmin,
securityadmin,
serveradmin,
setupadmin,
processadmin,
diskadmin,
dbcreator,
bulkadmin,
loginname
FROM master..syslogins
Having taken a look at the big picture, I then use a pared-down version of the
same query, shown in Listing 7.2, which returns fewer columns and only those
rows where the logins have sysadmin privileges.
SELECT loginname,
sysadmin,
isntuser,
isntgroup,
createdate
FROM master..syslogins
WHERE sysadmin = 1
This query returns the name of each login that has sysadmin privileges, indicates
whether the login is a Windows user (isntuser), or a Windows Group
(isntgroup), and shows the date the login was created. Table 7.1 shows some
sample results.
loginname                                              sysadmin  isntuser  isntgroup  createdate
BUILTIN\Administrators                                 1         0         1          8/24/07
Server1\SQLServer2005MSSQLUser$Server1$MSSQLSERVER     1         0         1          8/24/07
Server1\SQLServer2005SQLAgentUser$Server1$MSSQLSERVER  1         0         1          8/24/07
NT AUTHORITY\SYSTEM                                    1         1         0          8/24/07
Apps1_Conn                                             1         0         0          9/9/08
sa                                                     1         0         0          4/8/03
RodDom\rodney                                          1         1         0          1/21/09
RodDom\All_DBA                                         1         0         1          5/26/09
The results reveal that we have two SQL logins, two Windows users and four
Windows groups who have sysadmin privileges. Let's take a look at each group
in turn.
Windows users
The two Windows users are RodDom\rodney and NT Authority\System. The
former is my own Windows account, and the latter is a "built-in" local system
account, which is automatically a member of the SQL Server sysadmin fixed
server role. Generally, neither of these is a primary concern. If you find you have
a high number of accounts that have sysadmin privileges, especially in production
systems, it is worth investigating further to understand why. It is much more
secure to provide users with only the privileges they need which, for anyone
other than the administrators of the instance, should be read-only.
SQL logins
For SQL Logins, there are two: sa and Apps1_Conn. The presence of the latter
brings up an aspect of security that is tiresome for many DBAs, namely the
presence of the ubiquitous "application account".
Many applications use their own mechanism for securing data or, more accurately,
the functioning of the application. For example, it is common practice to have an
application that makes all of its connections through a single login, usually of
escalated privileges, and then controls individual access via logins that it stores in
various "application tables" within the database.
As a DBA, when I discover these escalated privileges on a SQL Server instance, I
start to ask questions. When it is determined that the application account does not
need the escalated admin privileges and so they can be reduced, I feel I have made
headway and can rest assured that one more potentially compromising hole has
been plugged.
Sometimes, however, this level of access is "business justified" and there is little
the DBA can do but fume silently. The problem for the DBA is that there are no
individual SQL logins to audit and, unless there is an internal auditing mechanism,
there is often no auditing, full stop. What is worse is that many developers know
the credentials of these application accounts, and so can use them to log in to
production systems, as they see fit. The DBA is often defenseless in this scenario.
Nevertheless, the DBA should still audit connections via this account, and be on
the lookout for any instances where this account information is used to initiate a
connection from a source other than the application itself. I am not trying to
throw a damp towel on developers, or produce a tell-all book about their
nefarious deeds. However, in my time I have witnessed some "interesting"
authentication techniques, and I would be remiss if I did not point out the pitfalls
of some of these methods, in as much as they are not fully auditable and are prone
to abuse.
Let's assume for now, though, that the application we are concerned with uses
valid SQL or Windows AD accounts to connect to the database, and move on.
NOTE
If you are interested in discovering more about how to capture and analyze
connections over time, please read my article on gathering connection
information at:
http://www.simple-talk.com/sql/database-administration/using-ssis-to-
monitor-sql-server-databases-/
Windows groups
If a Windows user is granted access to database objects, say for running ad hoc
queries, then I highly recommend granting that access through a Windows group.
It makes life much easier for the DBA who is responsible for granting, revoking,
and otherwise reporting on, levels of access. Instead of having to administer 20 or
more individual users, all needing the same level of access, only one group is
needed. Furthermore, due to segregation of duties, it is often the Windows Server
Administrator who will ultimately be responsible for adding and/or removing the
user to the group via Windows administrative tools such as Active Directory Users
and Computers.
One of the caveats when using Windows groups, however, is that a default
schema cannot be defined for a Windows group, meaning that developers or
architects in a group will have to remember to qualify all objects to the schema
level. So, for example, they would need to use CREATE TABLE dbo.tablename, instead of
just CREATE TABLE tablename. I believe, though, that this caveat, which really is just
best practice anyway, is not enough to stop you from pushing for access via
Windows groups, where you can.
Returning to Table 7.1, we see that there are four rows that correspond to
Windows groups. Two of these are created during the installation of SQL Server,
one for the SQL Agent user:
Server1\SQLServer2005SQLAgentUser$Server1$MSSQLSERVER
And one for SQL Server:
Server1\SQLServer2005MSSQLUser$Server1$MSSQLSERVER
I am not worried so much about these accounts because a general search of these
local groups, via "Computer Management | Local Users and Groups", reveals that
there are no members other than NT Authority\System, which I already know has
sysadmin privileges.
Before unveiling this query, it should be noted that there are certain caveats. In my
experience, xp_logininfo does not work well if there are cross domain issues,
whereby the local Active Directory cannot deliver the account information when
users from external, trusted domains have been added. If you receive errors such
as the one shown in Listing 7.3, then you know that there is some issue, external
to SQL Server, that is preventing you from interrogating that particular group.
Msg 15404, Level 16, State 3, Procedure xp_logininfo, Line 42
Could not obtain information about Windows NT group/user
IF EXISTS (SELECT *
FROM tempdb.dbo.sysobjects
WHERE id =
OBJECT_ID(N'[tempdb].[dbo].[RESULT_STRING]'))
DROP TABLE [tempdb].[dbo].[RESULT_STRING];
DEALLOCATE Get_Groups
GO
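In outline, the query works along these lines (a simplified sketch; the full version
also captures the output into the RESULT_STRING table referenced above):

DECLARE @group SYSNAME;

DECLARE Get_Groups CURSOR FOR
    SELECT loginname
    FROM master..syslogins
    WHERE isntgroup = 1;

OPEN Get_Groups;
FETCH NEXT FROM Get_Groups INTO @group;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- 'members' expands the Windows group into its individual accounts
    EXEC xp_logininfo @acctname = @group, @option = 'members';
    FETCH NEXT FROM Get_Groups INTO @group;
END

CLOSE Get_Groups;
DEALLOCATE Get_Groups;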
The results of the query can be seen in Table 7.2. Notice the Account_Name field
corresponds with the Group_Name field. For example, I can see that there are
several users, including one called Server1\rodlan, who are members of the
Builtin\Administrators group. These users would have been invisible to me
without this query. The RodDom\All_DBA group has a single user,
Rodlan\rlandrum. I know from the syslogins query that RodDom\All_DBA is a
sysadmin.
Account_Name          Mapped_Login_Name     Type          Privilege  Group_Name              Server
RodDom\Domain Admins  RodDom\Domain Admins  Domain group  admin      BUILTIN\Administrators  Server1
Now I can place the emphasis not on the group but on the members of this
group, and begin questioning why a particular user is a member of a group that
has sysadmin privileges.
However, it's not only the sysadmin privilege that can be dangerous in the wrong
hands. Any user that has more than the minimum privileges required to do their
job is potentially a threat. Remember, I use words like "threat" and "danger"
because, as DBA, I feel I am responsible for all activity on the SQL Servers that I
manage. If a user gets into one of my databases as a result of obtaining some
elevated privilege, and accidentally drops or truncates a table, I am ultimately
responsible for getting the data back. It does happen.
Knowing that a user dropped or truncated a table does not undo the damage. The
user should not have had access to begin with. However, if you do not even know
what happened, you will be even worse off in the long run. Techniques such as
DDL triggers and server-side traces will provide you with that knowledge.
FROM
[?]..sysusers usu LEFT OUTER JOIN
([?]..sysmembers mem INNER JOIN [?]..sysusers usg ON
mem.groupuid = usg.uid) ON usu.uid = mem.memberuid
LEFT OUTER JOIN master.dbo.syslogins lo on usu.sid =
lo.sid
WHERE
(usu.islogin = 1 and usu.isaliased = 0 and usu.hasdbaccess =
1) and
(usg.issqlrole = 1 or usg.uid is null)'
SELECT [Server],
[DB_Name],
[User_Name],
[Group_Name],
[Account_Type],
[Login_Name],
[Def_DB]
FROM [tempdb].[dbo].[SQL_DB_REP]
This particular query does not deal so much with sysadmin privileges but more
with high database level privileges. For example, it investigates membership of the
db_owner database role, which can perform all configuration and maintenance
activities on a database. The DBA can also use it to investigate membership of
other database roles that may have been created to serve a purpose, such as the
execution of stored procedures.
The results of this query will instantly let the DBA know if any users have
escalated privileges of which he or she was previously unaware. Table 7.3 shows
some sample results from executing this query (due to space restrictions I omitted
the Server column; the value was Server1 in each case).
DB_Name       User_Name            Group_Name  Account_Type            Login_Name           Def_DB
DBA_Rep       dbo                  db_owner    Windows Domain Account  RodDom\rodney        master
ReportServer  dbo                  db_owner    Windows Domain Account  RodDom\rodney        master
ReportServer  NT AUTHORITY\SYSTEM  RSExecRole  Windows Domain Account  NT AUTHORITY\SYSTEM  master
Custom_HW     dbo                  db_owner    SQL Account             sa                   master
Custom_HW     HWC Development      db_owner    Windows Group           NULL                 NULL
Custom_HW     JimV                 db_owner    Windows Domain Account  NULL                 NULL
Custom_HW     jyoungblood          public      Windows Domain Account  NULL                 NULL
Custom_HW     RN                   public      Windows Group           NULL                 NULL
In addition to illuminating membership of the db_owner role, notice that there are
also some potentially orphaned HWC Development users in the Custom_HW
database, as indicated by the NULL value in the Login_Name field. This generally
happens when you restore a database from one server to another server where the
logins do not exist, and would warrant further investigation.
If it were determined that these are indeed orphaned users, or groups, then I
would add the accounts to the target system and execute
sp_change_users_login for SQL logins, or add the Windows user or group
account for non-SQL login accounts.
Figure 7.1: Security queries in DBA Repository SSIS package for multiple
servers.
NOTE
Chapter 2 provides further details of the DBA repository, and how to use the
associated SSIS package.
In addition to analysis, being able to find individual users by name makes removing
them very easy. This is especially important when a user leaves the organization,
for example. Yes, if access to database objects was granted via a Windows user or
group, then disabling the account in Windows Active Directory will alleviate the
security risk. However, if it was a SQL account that the user had access to, there is
still a potential risk. Having the combined data for all three types of login ensures
that the account can be removed successfully.
The next script allows me to find out what service accounts are set up to run SQL Server, and
other services such as Analysis Services and SQL Agent. Service credentials
control access to various resources, like network shares. It is important that you
know whether you are running SQL Server as "local system", which will not have
access to external resources, for example, or a valid Windows service account,
which will have access to said resources.
There are manual ways to obtain this information, but who wants manual when
you can get the information quickly with a few simple lines of T-SQL code?
Granted, this trick requires xp_cmdshell, the use of which is a habit I roundly
condemn in others but tolerate in myself. Such is the nature of the DBA (well, this
DBA anyway).
Listing 7.6 shows the fairly simple query that uses xp_cmdshell, Windows WMIC
and a few other functions from the text parsers grab bag, like LEN() and
CHARINDEX():
END
from #MyTempTable
The first thing the script does is to check whether I am executing this query
against a version of Microsoft SQL Server 2005 or higher:
IF @@microsoftversion / power(2, 24) >= 9
The reason for this is that xp_cmdshell has to be explicitly enabled in 2005 and
beyond, whereas in SQL Server 2000 it is enabled by default, though one still has
to have the required privileges to execute it.
If the version is SQL Server 2005 or higher, the script uses sp_configure to
enable advanced options followed by xp_cmdshell. I then create a temp table,
selfishly called #MyTempTable, and fill it with the output from the Windows
Management Instrumentation Command line utility (WMIC).
I pipe (what is this, UNIX?) the output to grep, sorry I mean the findstr
command, searching for the value "SQL" in the result set. Next, I parse the long
text string that is returned, called Big_String, into the temporary table.
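A minimal sketch of that approach (not the full Listing 7.6) looks something like
this; the parsing is deliberately crude and assumes service names contain no
spaces:

-- Enable xp_cmdshell on SQL Server 2005 and later
IF @@microsoftversion / POWER(2, 24) >= 9
BEGIN
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1;
    RECONFIGURE;
END

IF OBJECT_ID('tempdb..#MyTempTable') IS NOT NULL
    DROP TABLE #MyTempTable;

CREATE TABLE #MyTempTable (Big_String VARCHAR(500));

-- WMIC lists every service and its start (login) account; findstr keeps the SQL ones
INSERT INTO #MyTempTable (Big_String)
    EXEC xp_cmdshell 'wmic service get name, startname | findstr "SQL"';

SELECT @@SERVERNAME AS Server,
       RTRIM(LEFT(Big_String, CHARINDEX(' ', Big_String))) AS Service_Name,
       LTRIM(RTRIM(SUBSTRING(Big_String, CHARINDEX(' ', Big_String), LEN(Big_String)))) AS Service_Account
FROM   #MyTempTable
WHERE  Big_String IS NOT NULL;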
The end result, shown in Table 7.4, is a useful list of all SQL services and the
accounts that have been configured as login accounts for each service.
Server   Service_Name            Service_Account
Server1  MSSQLServerADHelper     NT AUTHORITY\NetworkService
Server1  MSSQLServerADHelper100  NT AUTHORITY\NETWORK SERVICE
While I do not use this query often, it always saves me many frustrating minutes
of trying to manually find the same information, via tools such as Computer
Management and Services.
Surveillance
To this point, I have focused on finding logins, users, groups and service
accounts. The queries presented so far have all been useful for managing many
hundreds if not thousands of accounts, all with some level of privilege to my SQL
Server instances.
However, knowing who has access to the data, and what level of access they have,
is only one aspect of security I want to touch on in this chapter. It is also crucial
for the DBA to track such actions as failed login attempts and to audit, as far as
possible, the actions of users once they are in amongst the data.
In this section, I will introduce three surveillance techniques to help with these
issues: Error Log interrogation with T-SQL, DDL Triggers and Server-side
Tracing.
While the error logs can be browsed in SSMS, I would much prefer to read them
with T-SQL. Fortunately, SQL Server offers two stored procedures to make this
possible, namely sp_enumerrorlogs and sp_readerrorlog.
As Figure 7.2 shows, sp_enumerrorlogs simply lists the available SQL Server
error logs.
Figure 7.2: Querying the SQL Server error logs with sp_enumerrorlogs.
The procedure sp_readerrorlog accepts the Archive #, from
sp_enumerrorlogs, as input and displays the error log in table form, as shown in
Figure 7.3, where you can see that the first archived log file (1) is passed in as a
parameter. Archive number 0 refers to the current error log.
It is possible to load and query every error log file by combining the two stored
procedures with a bit of iterative code. Listing 7.7 shows the custom code used to
loop through each log file, store the data in a temp table, and subsequently query
that data to find more than five consecutive failed login attempts, as well as the
last good login attempt.
In order for this to work, you will need to enable security logging for both
successful and failed logins, as most production servers should do. This can be
configured via the Security tab of the Server Properties. Finally, note that this
query will only work for SQL Server 2005 and 2008.
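The full listing goes further, but the core of the loop looks something like this
(a sketch, keeping only the collection step and a simple failed-login query):

-- Collect the list of available error logs
IF OBJECT_ID('tempdb..#LogList') IS NOT NULL
    DROP TABLE #LogList;
CREATE TABLE #LogList (ArchiveNo INT, LogDate VARCHAR(30), LogSizeBytes BIGINT);
INSERT INTO #LogList
    EXEC sp_enumerrorlogs;

-- Read every log, keeping only login-related messages
IF OBJECT_ID('tempdb..#ErrorLogs') IS NOT NULL
    DROP TABLE #ErrorLogs;
CREATE TABLE #ErrorLogs (LogDate DATETIME, ProcessInfo VARCHAR(50), LogText VARCHAR(MAX));

DECLARE @log INT, @maxlog INT;
SELECT @maxlog = MAX(ArchiveNo) FROM #LogList;
SET @log = 0;   -- archive 0 is the current log

WHILE @log <= @maxlog
BEGIN
    INSERT INTO #ErrorLogs (LogDate, ProcessInfo, LogText)
        EXEC sp_readerrorlog @log, 1, 'Login';
    SET @log = @log + 1;
END

-- Failed login attempts, most recent first
SELECT LogDate, LogText
FROM   #ErrorLogs
WHERE  LogText LIKE 'Login failed%'
ORDER BY LogDate DESC;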
DDL triggers
In Chapter 1, I included in the Configuration script (Listing 1.2) code to create a
DDL trigger that would alert the DBA to any database creation or deletion (drop).
I'm now going to demonstrate how to use this to track DDL actions, and what
you can expect to see with this DDL trigger enabled on your SQL Servers.
DDL (Data Definition Language) triggers are very similar to the DML (Data
Manipulation Language) triggers, with which you are undoubtedly familiar. DDL
triggers can be scoped at either the database or server level, meaning they can be
set to fire when a particular statement, such as ALTER TABLE, is issued against a
specific database, or when a DDL statement is issued at the server level, such as
CREATE LOGIN.
Listing 7.8 shows the code to create the DDL trigger, AuditDatabaseDDL, which
you may have missed amongst everything else going on in Listing 1.2.
Notice that the scope of the trigger, in this case, is ALL SERVER. The Eventdata()
function is employed to set the values of the variables that will ultimately be
mailed to the DBAs when the DDL event occurs, in this case when a database is
created or dropped from the server where the trigger is created.
--Setup DDL Triggers
--Setup Create Database or Drop Database DDL Trigger
SET QUOTED_IDENTIFIER ON
GO
GO
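In outline, the trigger body follows the pattern below; this is a sketch rather than
the exact code from Listing 1.2, and the recipient address is illustrative:

CREATE TRIGGER [AuditDatabaseDDL]
ON ALL SERVER
FOR CREATE_DATABASE, DROP_DATABASE
AS
BEGIN
    DECLARE @data XML,
            @EventType VARCHAR(100),
            @LoginName VARCHAR(100),
            @Command VARCHAR(MAX),
            @body VARCHAR(MAX);

    SET @data = EVENTDATA();
    SET @EventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'VARCHAR(100)');
    SET @LoginName = @data.value('(/EVENT_INSTANCE/LoginName)[1]', 'VARCHAR(100)');
    SET @Command = @data.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'VARCHAR(MAX)');

    SET @body = 'Event:   ' + @EventType + CHAR(13)
              + 'Login:   ' + @LoginName + CHAR(13)
              + 'Command: ' + ISNULL(@Command, '');

    -- Mail the captured details to the DBA team (illustrative recipient)
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = '[email protected]',
        @subject = 'DDL event captured',
        @body = @body;
END
GO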
With the trigger enabled, it is easy enough to test, simply by creating a database on
the server (CREATE DATABASE TEST_TRIGGER). As expected, and as shown in
Figure 7.5, the mail comes in and I can see the captured events, including the
username that created the database, as well as the time.
With only a slight modification to the DDL trigger, I can also be notified of any
login creations on the server, with a simple addition of CREATE_LOGIN to the FOR
clause of the CREATE TRIGGER statement:
CREATE TRIGGER [AuditDatabaseDDL]
ON ALL SERVER
FOR CREATE_DATABASE, DROP_DATABASE, CREATE_LOGIN
With the new trigger in place I can attempt to also create a login, as shown in
Listing 7.10.
CREATE LOGIN RogerKennsingtonJones WITH PASSWORD =
'MyPassword12'
Again, as expected I receive the mail notification that the account has been
created. At that point, the reaction would be one of concern, but at least I know
that I have several scripts available that will allow me to get to the bottom of who
created this account, why, and what privileges it has.
Server-side tracing
Most DBAs will have experienced the scenario whereby a red-faced
program/development manager storms up to them and demands to know why his
ultra-critical tables keep getting truncated, and who is doing it.
The problem is that in order to get a complete picture of your server at a
transactional level, you need either full time Profiler tracing, or to enable C2 level
auditing; both of which come at a high cost. These auditing tools will not only
slow down your servers considerably, but the drive space required in order to keep
a record of each transaction on your server is daunting, at best.
The solution I offer here presents a "lightweight" alternative. It is rooted in the
idea that issues in a system will inevitably show up more than once. If you are
having consistent issues with data loss, or unwanted record modification, then this
collection of stored procedures may help you out. With this set of scripts, you can:
• Capture trace information about a server, on a set schedule.
• Filter the captured data to suit your needs.
• Archive the data for future auditing.
This solution also allows the trace data to be stored in our central DBA repository
so you don't have scattered auditing information cluttering up your individual
instances.
USE [msdb]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*
Procedure Name : usp_startTrace
-------------------------------
Parameter 1 : traceName - Unique identifier of trace [Req]
Parameter 2 : traceFile - Physical file to hold trace data
while running [Req]
Parameter 3 : maxFileSize - Maximum size that traceFile can
grow to [Default: 5MB]
Parameter 4 : filterColumn - Trace event data column to
filter results on [Default: 0]
Parameter 5 : filterKeyword - Keyword used when filterColumn
is defined [Default: NULL]
*/
CREATE PROCEDURE [dbo].[usp_startTrace]
@traceName NVARCHAR(50),
@traceFile NVARCHAR(50),
@maxFileSize BIGINT = 5,
@filterColumn INT = 0,
@filterKeyword NVARCHAR(50) = NULL
AS
SET NOCOUNT ON
/*
Variable Declaration
--------------------
traceError - Will hold return code from sp_trace_create
to validate trace creation
TraceID - Will hold the system ID of the newly created
trace
IF @traceError <> 0
BEGIN
PRINT('Trace could not be started: ' + CAST(@traceError AS VARCHAR(10)))
RETURN
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*
Procedure Name : usp_stopTrace
-------------------------------
Parameter 1 : traceName - Unique identifier of trace to be
stopped [Req]
*/
CREATE PROCEDURE [dbo].[usp_stopTrace]
@traceName NVARCHAR(50)
AS
SET NOCOUNT ON
/*
Variable Declaration
--------------------
traceID - Will hold the ID of the trace that will be
stopped and archived
traceFile - The physical file to export data from
command - Variable to hold the command to clean the
traceFile from the server
*/
DECLARE @traceID INT,
@traceFile NVARCHAR(100),
@command NVARCHAR(150)
-- Alert the user that the trace has been stopped and
archived
PRINT('Trace ' + @traceName + ' Stopped and Archived')
RETURN
END
-- Alert the user that the trace was not found in the
repository
PRINT('Trace ' + @traceName + ' Not Found')
GO
The procedure takes only one parameter, traceName, which it uses to query the
central server to retrieve all of the data that was stored by the usp_startTrace
script. This information includes the name, trace id and trace file. Once the data
has been received, the trace is stopped and the records from the trace_data
table are archived into the trace_archive table. The new trace file is then
pushed into the trace_data table. So, you can always find the newest trace data
in the trace_data table and any older trace information in the trace_archive
table.
The trace file is then deleted from the server, via xp_cmdshell, and the trace
identity is removed from the central repository to free up the trace name and id
for future use.
Implementation
There are just a few steps that you will need to take in order to get these trace
procedures working correctly in your environment.
• Choose a main repository for your trace data, and modify the procedures
to point to the machine on which you want to store the trace data. For
example, I have my database, DBA_Info, residing on a test machine. Any
version of SQL Server will work for the storage of data; the differences
in the scripts are only due to changes in definitions of data collected in
the traces.
• Create a new database for the trace data, using the create database/table
scripts included in the source code zip file, to hold all of this data. You
only need to run the create scripts for the version of the script you will
be using, or both if you will be using the procedures on multiple
machines, utilizing both SQL 2000 and SQL 2005. The results from
either version are stored in separate tables so your repository database
can contain both 2000 and 2005 archive trace data.
Testing
Here is a quick example that demonstrates how to find the cause of all this grief
for our beloved program manager. Create a new SQL Agent job to kick off at 5:30
PM (after business hours) involving only one step, as shown in Figure 7.6. In this
step, execute the start trace procedure, with the parameters needed to gather only
the data relevant to the issue at hand.
This will produce a trace named TruncateTrace. The trace file will be stored in
the root of the C drive. The maximum space the trace file should take is 10 MB
and we will place a filter on the first column (text data) looking for any instances
of the word "truncate".
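The job step from Figure 7.6 boils down to a call along these lines (the trace file
path is illustrative; sp_trace_create appends the .trc extension itself):

EXEC msdb.dbo.usp_startTrace
    @traceName = 'TruncateTrace',
    @traceFile = 'C:\TruncateTrace',   -- illustrative path
    @maxFileSize = 10,
    @filterColumn = 1,                 -- column 1 = TextData
    @filterKeyword = 'truncate';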
The last three parameters are optional and will be defaulted to 5 (trace file size in
MB), 0 (no trace column) and NULL (no filter keyword) respectively. If you do
not specify these parameters then a bare bones trace will be created with a
maximum file size of 5 MB, and it will perform no filtering, so you will get all
available data from the trace events.
Alternatively, create another job to run at 6:00 AM, calling the usp_stopTrace
giving the same trace name, as shown in Figure 7.7.
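That job step is simply:

EXEC msdb.dbo.usp_stopTrace @traceName = 'TruncateTrace';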
This will stop any trace named TruncateTrace on this particular server and
export all of the collected data into the repository table (trace_table or
trace2k_table) on the linked data collection server.
Any older information will have been archived to the trace_archive (or
trace2k_archive) table. All data is marked with the server name so we can still
filter the archive table to look at older data from any server. The trace file is also
cleaned up from the traced server so the filename will be available for future use.
This will require that xp_cmdshell is available for use by the SQL Agent service
account. From this point, all we have to do is look through our newly acquired
trace_table data for the suspect.
I hope that these scripts can make life a little easier for those of you out there who do not run full auditing on all of your SQL Servers. The trace scripts can easily be modified to include other columns and other trace events. I am presenting this as a springboard for any DBA out there who needs an automated solution for Profiler-style tracing. If you do want to add any events or trace columns, I would look to http://msdn2.microsoft.com/en-us/library/ms186265.aspx for a complete list of all trace events and available data columns.
In any event, the next time you encounter a red-faced program manager demanding to know who truncated his tables, much job satisfaction can be gained from being able to respond with something along the lines of:
"So <Manager Name>, we have been tracing that server all week and it seems that one of the
DTS packages you wrote, and have running each night, is the problem. It is truncating the table
in question each morning at 4:00 AM. Don't be too hard on yourself though. We all make
mistakes."
Summary
All SQL Server DBAs are tasked with securing the SQL Servers they manage.
While it is not the most glamorous of tasks, it is one of the most important, if not the most important, aspects of being a DBA. This is especially true in a world where compromised
data results in large fines, humiliation and potential loss of the coveted job that
you were hired to do.
Knowing who has access to the data you oversee is the first step. Working to alleviate potential threats, either harmfully innocent or innocently harmful, is essential. The scripts provided here will assist you in isolating and resolving these threats. There is so much more to securing SQL Server, and I have only touched on the obvious first line of defense: user accounts and logins, error logs, DDL triggers, and server-side tracing.
Next and last up is the topic of data corruption, which ranks right up there with
security in terms of threats to the integrity of the DBA's precious data. I'll show
you how to detect it, how to protect yourself and your databases and, most importantly, what to do when you realize you have a problem … which, statistically speaking, you will eventually. Be afraid; I saved the monster at the end of the book
until the end of the book. Don't turn the data page.
CHAPTER 8: FINDING DATA CORRUPTION
I have mentioned a couple of times previously the monster at the end of this book. This being the final chapter, it is time for the monster to be revealed. The monster can be a silent and deceptive job killer. It can strike at once or lie in wait
for weeks before launching an attack. No, I am not talking about developers; I am
talking about database corruption.
If you have been a DBA for long enough, you will have encountered the data
corruption monster in at least one of its many forms. Often corruption occurs
when there is a power failure and the server, rather than shutting down gracefully,
simply dies in the middle of processing data. As a result of this, or some other
hardware malfunction, data or indexes become corrupt on disk and can no longer
be used by SQL Server, until repaired.
Fortunately, there are several steps you can take to protect your data and, equally importantly, your job, in the event of data corruption. First and foremost, it should
go without saying that not having a good backup strategy is equivalent to playing
Solitary Russian Roulette. However, I'll also demonstrate a few other techniques,
based around the various DBCC commands, and a script that will make sure
corruption issues are discovered and reported as soon as they occur, before they
propagate through your data infrastructure. Hopefully, suitably armed, the DBA
can limit the damage caused by this much-less-friendly version of the monster at
the end of the book.
P.S. If you are unfortunate enough never to have read The Monster at the End of This
Book (by Jon Stone, illustrated by Michael Smollin. Golden Books), starring the
lovable Grover Monster from Sesame Street, you have firstly my sympathy and
secondly my apologies, because the previous references will have meant little to
you. I can only suggest you buy it immediately, along with The Zombie Survival Guide
(by Max Brooks, Three Rivers Press), and add them both to your required reading
list for all new DBAs.
Causes of corruption
There are many ways that a database can become "corrupt". Predominantly, it happens when a hardware malfunction occurs, typically in the disk subsystem that is responsible for ensuring that the data written to disk is exactly the data that SQL Server expected to be written when it passed along this responsibility
to the operating system, and subsequently the disk controller driver and disk itself.
For example, I have seen this sort of data corruption caused by a power outage in
the middle of a transaction.
However, it is not just disk subsystem failures that cause data corruption. If you
upgrade a database from SQL Server 2000 to SQL Server 2005 or 2008, and then
interrogate it using the corruption-seeking script provided in this chapter, you may
be surprised to find that you will receive what can be construed as errors in the
database files. However, fortunately these are just warnings regarding space usage
between versions, and there are recommended steps to address the issue, such as
running DBCC UPDATEUSAGE.
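For example, correcting those space-usage warnings on an upgraded database is typically a one-liner; the database name below is just a placeholder:
-- Reports and corrects page and row count inaccuracies carried over from the older version
DBCC UPDATEUSAGE ('MyUpgradedDB')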
Whatever the cause, the DBA does not want to live in ignorant bliss of possible
corruption for any length of time. Unfortunately, the corruption monster is often
adept at hiding, and will not rear its head until you interact with the corrupt data.
By this time, the corruption may have worked its way into your backup files and,
when falling through to your last resort of restoring the database, you may simply
restore the same corruption. The importance of a solid, regular backup strategy
cannot be overstated (so I will state it quite often). On top of that, you need a
script or tool that will regularly check, and report on any corruption issues, before
it's too late. I'll provide just such a script in this chapter.
Consequences of corruption
As noted in the previous section, most of the time corruption occurs due to failure
in an external hardware source, like a hard disk controller or power supply. SQL
Server 2005, and later, uses a feature called Page Checksum to detect potential
problems that might arise from this. This feature creates a checksum value during
writes of pages to, and subsequent reads from, disk. Essentially, if the checksum
value read for a page does not match what was originally written, then SQL Server
knows that the data was modified outside of the database engine. Prior to SQL
Server 2005, but still included as an option, is Torn Page Detection, which
performs similar checks.
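On SQL Server 2005 and later you can check, and if necessary change, which of these page verification options a database is using; something like the following will do it (the database name is a placeholder):
-- See which page verification option each database currently uses
SELECT name, page_verify_option_desc FROM sys.databases
-- Switch a database from TORN_PAGE_DETECTION (or NONE) to CHECKSUM
ALTER DATABASE MyDatabase SET PAGE_VERIFY CHECKSUM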
If SQL Server detects a corruption issue, its response to the situation will vary
depending on the scale of the damage. If the damage is such that the database is
unreadable by SQL Server then it would be unable to initialize and load that
database. This would require a complete restore of the database in almost all cases.
If the damage is more contained, perhaps with only one or two data pages being
affected, then SQL Server should still be able to read and open the database, and
at that stage we can use tools such as DBCC to assess and hopefully repair the
damage. Bear in mind, too, that as part of your overall backup and restore
procedure, you have the ability to perform a page level restore, if perhaps only one or two pages are damaged.
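A page level restore, assuming an edition and recovery model that support it, follows a pattern something like the sketch below. The database name, page id and backup file names are all illustrative, and in practice you would restore every subsequent log backup before recovering:
-- Restore just the damaged page from the last known good full backup
RESTORE DATABASE MyDatabase PAGE = '1:184'
FROM DISK = N'C:\Backups\MyDatabase.bak'
WITH NORECOVERY
-- Bring the restored page current, then recover the database
RESTORE LOG MyDatabase FROM DISK = N'C:\Backups\MyDatabase_log.trn' WITH RECOVERY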
Fighting corruption
Aside from having frequent and tested backups, so that you can at least return to a
version of the data from the recent past, if the absolute worst happens, the well-
prepared DBA will have some tools in his tacklebox that he can use to pinpoint
the location of, and hopefully repair, any corrupt data.
However, before I dive in with the equivalent of a machete in a bayou, I should let
you know that I am by no means an expert in database corruption. Like you, I am
just a day-to-day DBA hoping with all hope that I do not encounter corrupt
databases, but wanting to be as well-prepared as I can be in case it happens.
As such, I'm going to maintain my focus on the practicalities of the tools and
scripts that a DBA can use to fight corruption, mainly revolving around the use of
the DBCC family of commands.
I will not dive too deeply into the bowels of the SQL Server storage engine, where
one is likely to encounter all manner of esoteric terms that refer to how SQL
Server allocates or maps data in the physical file, such as GAM pages (Global
Allocation Map), SGAM pages (Shared GAM), PFS pages (Page Free Space),
IAM chains (Index Allocation Map), and more. For this level of detail I can do no
better than to point you towards the work of Paul Randal:
http://www.sqlskills.com/BLOGS/PAUL/category/Corruption.aspx.
He has done a lot of work on the DBCC tool, is a true expert on the topic of data
corruption, and is certainly the man with the most oxygen in the tank for the
required dive.
DBCC CHECKDB
DBCC CHECKDB is the main command the DBA will use to test and fix consistency
errors in SQL Server databases. DBCC has been around for many years, through
most versions of SQL Server. Depending on who you ask, it stands for either
Database Consistency Checks or Database Console Commands, the latter of
which is more accurate since DBCC includes commands that fall outside the
scope of just checking the consistency of a database.
For our purpose, though, we are concerned only with consistency and integrity of
our databases. DBCC CHECKDB is actually an amalgamation of other DBCC
commands, DBCC CHECKCATALOG, DBCC CHECKALLOC and DBCC CHECKTABLE.
Running DBCC CHECKDB includes these other commands, so it negates the need to run them separately.
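A basic consistency check, with the informational messages suppressed, is a one-liner; substitute your own database name:
DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS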
In order to demonstrate how to use this, and other tools, to seek out and repair
data corruption, I'm first going to need to create a database, and then perform the
evil deed of despoiling the data within it. If we start from scratch, it will make it
easier to find and subsequently corrupt data and/or index pages, so let's create a
brand new, unsullied database, aptly named "Neo". As you can see in Figure 8.1,
there are no objects created in this new database. It is pristine.
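If you want to follow along, creating the empty database, and taking the clean backup that we will fall back on later in the chapter, might look like this (the backup path is just an example):
CREATE DATABASE NEO
GO
-- Take a known good backup before we deliberately corrupt anything
BACKUP DATABASE NEO TO DISK = N'C:\Backups\NEO_clean.bak' WITH INIT
GO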
DBCC PAGE
Aha, you are still reading I see. Well, before we unleash the monster, I want to
show you one more very important DBCC command, of which you may not be
aware, namely DBCC PAGE. It's "officially" undocumented, in that Microsoft does
not support it, but in reality I have found piles of information on this command
from well known and respected sources, like Paul Randal, so I no longer consider
it undocumented.
The syntax is simple:
dbcc page ( {'dbname' | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])
However, the output of the command can be quite daunting to the uninitiated
DBA. So before we introduce the monster that corrupts databases, I want to run
DBCC PAGE against the NEO database. The command is as follows:
DBCC PAGE ('NEO', 1, 1, 3)
The first "1" is the file number of the data file, the second "1" is the page number, and the final "3" is the print option which, depending on the value chosen (0-3), returns differing levels of information. A value of "3" indicates that we want to see
both page header information, as well as details. The not-very-exciting results are
shown in Figure 8.3.
By default, DBCC PAGE sends its output to the SQL Server error log rather than to the client session, so to see the results in a query window you first need to turn on trace flag 3604, with DBCC TRACEON(3604). Figure 8.4 shows the output from rerunning DBCC PAGE, with this trace flag turned on.
Figure 8.4: DBCC PAGE with trace flag 3604 turned on.
At the bottom of the output I can see that pages 1:172 – 1:383 are not allocated,
and all pages are 0% full. Recall, this is a database with no tables or any other
objects created and with no data inserted.
So, let's now create a simple table and insert some data into it. The script to do
this is shown in Listing 8.1. It creates a table in the NEO database, called ONE,
and inserts into it 1000 records (well, 999 really). Simple stuff, but the important
point in the context of this example is that this data load will cause additional
pages to be allocated to the database and be filled with data, and I'll be able to
home in on these new pages.
USE [NEO]
GO
GO
END
Commit Tran T_Time
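In outline, the load is nothing more than a table with a NEOID identifier column and a loop that inserts 999 rows; a minimal version of that pattern could look like the following (the ONE table and NEOID column come from the text, while the second column and the row values are illustrative assumptions):
USE [NEO]
GO
-- A heap (no indexes yet) with an identifying column and some filler text
CREATE TABLE dbo.ONE (NEOID INT NOT NULL, NEOTEXT NVARCHAR(50))
GO
BEGIN TRAN T_Time
DECLARE @i INT
SET @i = 1
WHILE @i < 1000
BEGIN
    INSERT INTO dbo.ONE (NEOID, NEOTEXT)
    VALUES (@i, N'NEO Row ' + CAST(@i AS NVARCHAR(10)))
    SET @i = @i + 1
END
COMMIT TRAN T_Time
GO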
Figure 8.6: New Pages added to NEO database after loading data.
I can see that pages 1:184 – 1:189, for example, are now allocated and are 100
percent full. Having identified one of the new pages (1:184) that contains the data
that I just loaded, I can run DBCC PAGE again for that specific page and return a
basket full of information, as shown in Figure 8.7.
Most hexadecimal editors, Hex Editor Neo included, have the ability to search for
values within the data file. Here, referring back to the DBCC PAGE information for
page 1:184, I simply search for the value 10006c00 29020000 to find record 553.
As you can see in Figure 8.8, the record in the Hex editor looks almost identical to
the output of the previous DBCC PAGE command.
Next, I save the file and close the Hex editor; you have to close it, otherwise the data file will be in use and you will be unable to start SQL Server and initialize the database. Now, at last, we are about to unleash the monster …
This is indeed the horror show that DBAs do not want to see. It is obviously a
very severe error and major corruption. This error will be thrown each time record
553 is included in the query results, and so any table scan will reveal the problem.
This has to be fixed quickly. Fortunately, we took a backup of the database prior
to corrupting the data file so if all else fails I can resort to that backup file to
restore the data. It is critical, when dealing with corruption issues, that you have
known good backups. Unfortunately, in the real world, it's possible this corruption
could have gone undetected for many days, which will mean that your backups
will also carry the corruption.
If this is the case then, at some point you may be faced with accepting the very
worst possible scenario, namely data loss. Before accepting that fate, however, I
am going to face down the monster, and see if I can fix the problem using
DBCC CHECKDB.
There are many options for DBCC CHECKDB and I'll touch on only a few of them
here. DBCC CHECKDB has been enhanced many times in its life and received major
re-writes for SQL Server 2005 and above. One of the best enhancements for the
lone DBA, working to resolve corruption issues, is the generous proliferation of
more helpful error messages.
So, let's jump in and see how bad the situation is and what, if anything, can be
done about it. To begin, I will perform a limited check of the physical consistency
of the database, with the following command:
DBCC CHECKDB('neo') WITH PHYSICAL_ONLY;
GO
Figure 8.10 shows the results which are, as expected, not great.
The output recommends REPAIR_ALLOW_DATA_LOSS as the minimum repair level for the errors found. The database will need to be in single user mode to perform the repair, so the syntax will be:
ALTER DATABASE NEO SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
DBCC CHECKDB('neo', REPAIR_ALLOW_DATA_LOSS)
GO
The results of running the DBCC CHECKDB command are as shown in Listing 8.3.
DBCC results for 'ONE'.
The good news is that the errors have now been repaired. The bad news is that it
took the data with it, deallocating the entire data page from the file. Notice, in
passing, that the output shows an object ID for the table on which the corruption
occurred, and also an index ID, which in this case is 0 as there are no indexes on
the table.
So, at this point, I know that I've lost data from a data page, but only one page; how much data exactly? A simple SELECT statement reveals that not
only have I lost the row I tampered with (NEOID 553), but also another 68 rows,
up to row 621. Figure 8.11 rubs it in my face.
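Quantifying the loss takes nothing more than a couple of simple queries against the repaired table; NEOID is the identifying column used throughout this example:
-- How many of the original 999 rows survived the repair?
SELECT COUNT(*) AS RowsRemaining FROM dbo.ONE
-- Inspect the range around the tampered row to see the gap
SELECT NEOID FROM dbo.ONE WHERE NEOID BETWEEN 540 AND 630 ORDER BY NEOID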
In this "controlled example", the fix is fairly simple. Other scenarios, with much
higher levels of corruption, may require you to turn to other measures to get the
data back, after repairing with data loss. These measures will almost always involve a restore of the database from backup, which is why I stress again the importance
of a solid, verified and well documented database backup policy.
First, I need to find out the page value of the index I defined for the ONE table. I
will then plug this page of the non-clustered index into DBCC PAGE so that I know,
again, exactly what data to modify to simulate index corruption, instead of data
page corruption of the heap.
To retrieve the page value of the index, I can use another DBCC command, again undocumented: DBCC IND. The syntax for this command is:
DBCC IND (DBID, TABLEID, -1)
So, to execute this for my newly-indexed ONE table, the command will be:
DBCC IND (23, 2121058592, -1)
The results reveal several IndexIDs, mostly zero, along with several IndexID
values of 2, indicating a non-clustered index. Notice in Figure 8.11 the IndexID of
2 and the associated page of that index, 180.
The results look a lot different than when looking at a data page. I see returned
the Hexadecimal value (HEAP RID) that represents each row in the index for the
page interrogated, as shown in Figure 8.12.
Figure 8.13: Looking at the non-clustered index for the ONE table with
DBCC PAGE.
I used the Hex editor again to modify, or zero out, the HEAP RID, and once again
this does indeed corrupt the database in much the same way as changing an actual
data page. However, there is one major difference: this time, when I run DBCC
CHECKDB('neo') WITH PHYSICAL_ONLY, the IndexID of the corrupt object is
reported as "2" i.e. a non-clustered index.
Armed with this knowledge, I have options open to me for repairing the damage,
other than restoring from backup, or running DBCC CHECKDB with
REPAIR_ALLOW_DATA_LOSS, with the potential loss of data that this entails.
I can simply drop and recreate the non-clustered index using the code in Listing
8.5.
USE [NEO]
GO
USE [NEO]
GO
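The index name and key column will be whatever you chose when you created the index; assuming a non-clustered index named IX_ONE_NEOID on the NEOID column, the drop and recreate amounts to:
USE [NEO]
GO
-- Remove the corrupted non-clustered index
DROP INDEX IX_ONE_NEOID ON dbo.ONE
GO
-- Rebuild it from the intact data pages of the heap
CREATE NONCLUSTERED INDEX IX_ONE_NEOID ON dbo.ONE (NEOID)
GO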
Now that I have delved somewhat into corrupting, finding and fixing some problems, let's turn to the discovery process.
With this code, and an easy way to read the error logs where the DBCC CHECKDB
results will be written (which I covered in Chapter 7), you will be comforted by
the knowledge that you will not let corruption seep into your data infrastructure
and go unnoticed. And that you can act thoughtfully to resolve the issue, once
discovered.
The custom query, in Listing 8.6, will iterate through all databases on a SQL
Server instance, capture errors and mail the top error to you so that you can look
further into the matter.
CREATE TABLE #CheckDBTemp (
Error INT
, [Level] INT
, [State] INT
, MessageText NVARCHAR(1000)
, RepairLevel NVARCHAR(1000)
, [Status] INT
, [DBID] INT
, ObjectID INT
, IndexID INT
, PartitionID BIGINT
, AllocUnitID BIGINT
, [File] INT
, Page INT
, Slot INT
, RefFile INT
, RefPage INT
, RefSlot INT
, Allocation INT
)
-- Needed variables
DECLARE @TSQL NVARCHAR(1000)
DECLARE @dbName NVARCHAR(100)
DECLARE @dbErrorList NVARCHAR(1000)
DECLARE @dbID INT
DECLARE @ErrorCount INT
DECLARE @EmailSubject NVARCHAR(255)
DECLARE @ProfileName VARCHAR(100)
DECLARE @EmailRecipient VARCHAR(255)
-- Init variables
SET @dbID = 0
SET @dbErrorList = ''
SET @EmailSubject = 'Integrity Check Failure on ' +
CAST(COALESCE(@@SERVERNAME, 'Server Name Not Available') AS
NVARCHAR)
SET @ProfileName = 'Notifications'
SET @EmailRecipient = '[email protected]'
-- CYCLE THROUGH DATABASES
WHILE(@@ROWCOUNT > 0)
BEGIN
IF SUBSTRING(CONVERT(varchar(50),
SERVERPROPERTY('ProductVersion')),1,1) = '8'
BEGIN
SELECT TOP 1 @dbName = name, @dbID = dbid
FROM sysdatabases WHERE dbid > @dbID
AND name NOT IN ('tempdb')
AND DATABASEPROPERTYEX(name, 'Status') = 'Online'
ORDER by dbid
END
ELSE
BEGIN
SELECT TOP 1 @dbName = name, @dbID = database_ID
FROM sys.databases WHERE database_ID > @dbID
AND name NOT IN ('tempdb')
AND DATABASEPROPERTYEX(name, 'Status') = 'Online'
ORDER by database_ID
END
END
-- If errors were found
IF( @dbErrorList <> '' )
BEGIN
IF SUBSTRING(CONVERT(varchar(50),
SERVERPROPERTY('ProductVersion')),1,1) = '8'
BEGIN
EXEC master..xp_sendmail @recipients = @EmailRecipient,
@subject = @EmailSubject, @message = @dbErrorList
END
ELSE
BEGIN
-- SQL 2005 and later: send the error list via Database Mail
EXEC msdb.dbo.sp_send_dbmail @profile_name = @ProfileName,
@recipients = @EmailRecipient, @subject = @EmailSubject,
@body = @dbErrorList
END
END
Listing 8.6: A script for seeking out and reporting database corruption.
You will notice that the code uses a DBCC CHECKDB option that I've not
previously covered, and that is WITH TABLERESULTS. As the name suggests, it
causes the results to be returned in table format. This option is not covered in
Books Online, but is highly useful for automating error checking via SQL Agent
Jobs or custom code.
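The pattern the script relies on is simply to run the command through INSERT ... EXEC, so that the tabular output lands somewhere you can query. In isolation, and using the temporary table defined at the top of Listing 8.6, it looks like this (I have added NO_INFOMSGS so that only error rows come back):
-- Capture CHECKDB error output as rows rather than messages
INSERT INTO #CheckDBTemp
EXEC ('DBCC CHECKDB(''NEO'') WITH TABLERESULTS, NO_INFOMSGS')
-- Any rows returned indicate reported consistency errors
SELECT * FROM #CheckDBTemp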
This code can easily be modified to return an email reporting that all databases
except NEO are in good shape. It might soften the blow somewhat to know that of
20 databases only one is corrupt. I know it would help me. In any
event, when corruption occurs you are going to receive the mail, seen in Figure
8.14, which is truly the monster that wakes you up in the middle of the night in a
cold sweat.
In this mail, I can see the ObjectID, the IndexID and the corrupted page, as well
as the database name. This should be enough to go on for further investigation
with the newfound tools, DBCC PAGE, DBCC IND and DBCC CHECKDB, with repair options. Or, it should be a wake-up call to the fact that you might have to
restore from a good backup.
Summary
In this final chapter, I have discussed how to corrupt a database and delved into
several undocumented DBCC options that will assist you when corruption
happens to your data. Notice I said "when". I have only touched the surface of the
topic here by showing, at a very high level, how to translate pages to hexadecimal
values and understand how to correlate the results of various DBCC commands,
while troubleshooting corruption issues.
I cannot stress enough that having a good backup plan is the most important task
for the DBA. While I did not cover backups and restores in great depth in this
chapter (an entire book can be written on this topic alone), I have at least shown
the best reason to have good backups as part of your overall high availability
and disaster recovery plan. A corrupt database will indeed be a disaster and could
incur much downtime. You do not want to have to go to your boss, or your
boss's boss, and tell them that you have lost data irrevocably. If you do, you
might as well pull your resume out from whatever disk drive it may be on
(assuming that's not corrupt as well) and update it.
There is often panic when discovering any level of corruption in your databases.
Without verified backups and some basic troubleshooting tips, there is no safe
place to hide when the monster rears up. All you can do is perform a repair,
potentially allowing data loss for hundreds of data pages, and then duck away into
the nearest cubicle which, if it was yours, will soon be empty.
If you do have good backups and can repair the damage without data loss, then
that cubicle may one day turn into an executive office where the wall-to-wall tinted
windows reveal the flowing brook outside, where no monsters live.
The End
About Redgate
Redgate is the leading provider of software
solutions for Compliant Database DevOps.
We’ve specialized in database software for
over 20 years.
www.redgate.com